replace method with Regex - unexpected output logging match $& and Group $1 - javascript

I have a regex that I intend to use with the .replace method to extract paragraphs from a string and push each one to an array.
I was struggling with my getValues function, and when I logged both the match and group 1 to the console I got some unexpected results.
Here's the work-in-progress code:
var mystring = 'Valid prater\nLorem ipsum dolor sit amet, consectetur adipiscing elit. \nProin volutpat facilisis imperdiet. \n Nunc porttito\nMorbi non eros nec arcu condimentum ultrices in ut nunc. \nMaecenas elit tellus, scelerisque ac auctor fermentum, bibendum. '
var paragraphs = [];
var obj = {};
var getValues = function(match,p1) {
console.log('Match: ' + match );
console.log('p1: ' + p1 );
// obj= {};
// obj['paragraph'] = p1;
// paragraphs.push(obj);
};
mystring.replace(/([^\\n][^\\]+)/g, getValues);
https://jsfiddle.net/7293mo7y/
Expected output:
Match: Valid prater
p1: Valid prater
Match: Lorem ipsum dolor sit amet, consectetur adipiscing elit.
p1: Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Match: Proin volutpat facilisis imperdiet.
p1: Proin volutpat facilisis imperdiet.
Match: Nunc porttito
p1: Nunc porttito
Match: Morbi non eros nec arcu condimentum ultrices in ut nunc.
p1: Morbi non eros nec arcu condimentum ultrices in ut nunc.
Match: Maecenas elit tellus, scelerisque ac auctor fermentum, bibendum.
p1: Maecenas elit tellus, scelerisque ac auctor fermentum, bibendum.
I'm expecting similar behaviour to this example
Actual output:
Match: Valid prater
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Proin volutpat facilisis imperdiet.
Nunc porttito
Morbi non eros nec arcu condimentum ultrices in ut nunc.
Maecenas elit tellus, scelerisque ac auctor fermentum, bibendum.
p1: Valid prater
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Proin volutpat facilisis imperdiet.
Nunc porttito
Morbi non eros nec arcu condimentum ultrices in ut nunc.
Maecenas elit tellus, scelerisque ac auctor fermentum, bibendum.
Could anyone explain why I'm not getting the expected output when logging match and p1 to the console?
Why is the behaviour different to this example?
What needs to change to get the expected output?
Thanks!

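In your regex literal, \\ matches a literal backslash, so [^\\n] means "any character except a backslash or the letter n" and [^\\]+ means "one or more characters that are not a backslash". Your string contains real newline characters and no backslashes, so the whole string is consumed as a single match. You can instead use the MULTILINE flag (m) in your regex. It lets the anchors ^ and $ match at the start and end of each line, so each match is one full line: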
var mystring = 'Valid prater\nLorem ipsum dolor sit amet, consectetur adipiscing elit. \nProin volutpat facilisis imperdiet. \n Nunc porttito\nMorbi non eros nec arcu condimentum ultrices in ut nunc. \nMaecenas elit tellus, scelerisque ac auctor fermentum, bibendum. '
var paragraphs = [];
var obj = {};
var getValues = function(match,p1) {
console.log('Match: ' + match);
console.log('p1: ' + p1);
};
mystring.replace(/^(.*)$/mg, getValues);
Updated JS Fiddle
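For the original goal of pushing each paragraph into an array, the same multiline pattern can drive the callback. This is only a sketch: the shortened sample string and the trim() call are assumptions added for illustration, and a new object is created for each match so that the array entries do not all point at one shared obj.

```javascript
// Shortened sample input (assumed for illustration).
const mystring = 'Valid prater\nLorem ipsum dolor sit amet. \n Nunc porttito';
const paragraphs = [];

mystring.replace(/^(.*)$/mg, function (match, p1) {
  // Create a new object per line instead of reusing one shared object.
  paragraphs.push({ paragraph: p1.trim() });
  return match; // leave the text itself unchanged
});

console.log(paragraphs.map(function (p) { return p.paragraph; }));
```

If the return value of replace is not needed, mystring.split('\n') would probably be a simpler way to get the same lines.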

Related

How to alert all p using javascript?

I learned how to alert all the HTML of a web page using this code:
var markup = document.documentElement.innerHTML;
alert(markup);
I want to alert only the <p> elements.
I tried this code:
var markup = document.getElementsByTag('p').innerHTML;
alert(markup);
But it's not working.
document.getElementsByTagName("p") returns an HTMLCollection, which you can convert to an array using the spread operator. Then you need to get the innerHTML of each element of the array. Finally, you can join those innerHTML strings together to output them:
var pElements = [ ... document.getElementsByTagName("p") ];
var pMarkup = pElements.map( element => element.innerHTML );
alert( pMarkup.join( "\n" ) );
<p>abc<strong>def</strong></p>
<table><tr><td>Don't show this</td></tr></table>
<p>ghi<em>jkl</em></p>
I think you are after Element.outerHTML:
// All p's
const pTags = document.querySelectorAll('p');
const output = [...pTags].map(p => p.outerHTML).join("");
console.log(output);
.red {
color: red;
}
<p class="red">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis at fermentum turpis. Maecenas congue accumsan enim, et dictum turpis malesuada et.
</p>
<p>
Mauris vitae pretium tortor. Aenean nulla ante, scelerisque in erat ac, tincidunt porttitor dolor. Sed blandit sed mi at vulputate.
</p>
<p id="three">
Curabitur lobortis at augue at hendrerit. Mauris id ligula cursus ligula dictum viverra.
</p>
<p>
Sed suscipit varius orci. Duis sit amet fermentum eros. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin commodo turpis in neque aliquam, et laoreet odio consequat.
</p>
<p data-number="5">
Nam dolor neque, lacinia sed viverra et, cursus ac ipsum. Cras gravida quam enim, sit amet tristique urna faucibus non.
</p>
<p>
Phasellus cursus, justo a volutpat pulvinar, ligula metus mollis turpis, in tincidunt ante nisl non nunc.
</p>

CSS Transitions not working inside parametrized functions

So I got the following code to create a simple fixed red box:
var red_box = document.createElement('div');
red_box.id = 'red_box';
red_box.style.width = "40%";
red_box.style.overflow = "hidden";
red_box.style.backgroundColor = "white";
red_box.style.color = "black";
red_box.style.border = "5px double red";
/* Centralizing */
red_box.style.position = "fixed";
red_box.style.left = "50%";
red_box.style.marginLeft = "-20%"; // Because the width is 40%...
red_box.style.transition = "max-height 1s";
red_box.style.display = "none";
red_box.style.zIndex = "99999999999999";
red_box.style.marginTop = "50px";
red_box.style.maxHeight = "0px";
document.documentElement.insertBefore(red_box,document.body);
So, the idea is that, when I pass some text to this box, it enlarges slowly in order to display it. I get this behaviour with the following code:
var timerHeight;
function expandBox(text){
clearInterval(timerHeight);
/* if the box is empty...*/
if(document.querySelector("#red_box").style.maxHeight == "0px"){
document.querySelector("#red_box").style.display = "inline-block";
red_box.innerHTML = text;
/* Call a function that enlarges the maxHeight property, theoretically with the transition making it look nicer */
var someText = "text";
timerHeight= setTimeout(enlargeBoxHeight(someText),1);
}
}
function enlargeBoxHeight(anyText){
document.querySelector("#red_box").style.maxHeight ="50px";
}
expandBox("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut sollicitudin euismod metus, at blandit neque maximus ac. Integer fermentum nulla at nibh suscipit, a placerat est pretium. Morbi varius ornare enim, ac pulvinar elit aliquet in. Nullam non diam in nibh consectetur fringilla id nec enim. Mauris lacinia a augue ac consectetur. Etiam tempor et elit a dictum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam interdum pulvinar pharetra. Aliquam erat volutpat. Aliquam non diam eget turpis tincidunt venenatis at in est. Duis laoreet nibh ultrices erat faucibus hendrerit.")
You can see the fiddle here.
So, I know that 50px is not a good height, but what matters here is that the transition is not working. You may have noticed that the variable someText is useless here, but it serves to illustrate my question. I tried removing it from the enlargeBoxHeight call, so the last part of the code is now:
...
timerHeight= setTimeout(enlargeBoxHeight,1);
}
}
function enlargeBoxHeight(){
document.querySelector("#red_box").style.maxHeight ="50px";
}
expandBox("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut sollicitudin euismod metus, at blandit neque maximus ac. Integer fermentum nulla at nibh suscipit, a placerat est pretium. Morbi varius ornare enim, ac pulvinar elit aliquet in. Nullam non diam in nibh consectetur fringilla id nec enim. Mauris lacinia a augue ac consectetur. Etiam tempor et elit a dictum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam interdum pulvinar pharetra. Aliquam erat volutpat. Aliquam non diam eget turpis tincidunt venenatis at in est. Duis laoreet nibh ultrices erat faucibus hendrerit.")
And the surprise: the transition works now. Why? What am I missing here?
When you do:
setTimeout(enlargeBoxHeight,1);
Without the parentheses (), you pass the function enlargeBoxHeight itself to the timeout without calling it yet. The timeout calls it after 1ms, which produces the expected behavior, including the transition.
When you do:
timerHeight= setTimeout(enlargeBoxHeight(someText),1);
You pass the result of calling enlargeBoxHeight to the timeout. The () part forces JavaScript to call the function immediately and process everything before the timeout is even set, so the transition does not work. After the 1ms, JavaScript handles the result (which is undefined here, so nothing happens).
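If you want to pass a parameter to the function called by a timeout, bind it. The first argument of bind is the this value, so the parameter goes second:
timerHeight= setTimeout(enlargeBoxHeight.bind(null, someText),1);
Which should work as expected.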
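The difference between calling a function and passing it can be checked without any DOM. In this sketch enlargeBoxHeight is a stand-in that just returns a string (an assumption for illustration; the real one sets maxHeight), and the names calledImmediately, viaArrow and viaBind are hypothetical.

```javascript
// Stand-in for the real function, which would change style.maxHeight.
function enlargeBoxHeight(anyText) {
  return 'enlarged with: ' + anyText;
}
const someText = 'text';

// Calling with () runs the function now and yields its return value.
const calledImmediately = enlargeBoxHeight(someText);

// These two produce functions that can be handed to setTimeout.
const viaArrow = () => enlargeBoxHeight(someText);
const viaBind = enlargeBoxHeight.bind(null, someText);

// setTimeout also forwards extra arguments to the callback.
const timerHeight = setTimeout(enlargeBoxHeight, 1, someText);
clearTimeout(timerHeight); // cancelled only to keep this sketch free of side effects

console.log(typeof calledImmediately, typeof viaArrow, typeof viaBind);
```

Any of the three deferred forms (arrow wrapper, bind with null as the first argument, or the extra setTimeout argument) should delay the call until the timer fires.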

Javascript or Python: Newline after each sentence

I'm curious whether there's a library for Python or JavaScript that tokenizes a string into sentences and puts a newline after each sentence.
For example:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum aliquet leo in urna hendrerit placerat. Donec adipiscing dignissim adipiscing. Duis adipiscing mollis cursus. Etiam fringilla elit nec enim sagittis a auctor nisi gravida. Nunc sollicitudin, leo sit amet consequat pharetra, mi orci vestibulum mi, a suscipit odio tellus tincidunt erat. Suspendisse a consequat turpis. Morbi eget ante leo, a dignissim mi.
to
Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n
Vestibulum aliquet leo in urna hendrerit placerat.\n
Donec adipiscing dignissim adipiscing. \n
Duis adipiscing mollis cursus. Etiam fringilla elit nec enim sagittis a auctor nisi gravida. Nunc sollicitudin, leo sit amet consequat pharetra, mi orci vestibulum mi, a suscipit odio tellus tincidunt erat. \n
Suspendisse a consequat turpis. \n
Morbi eget ante leo, a dignissim mi.
You are looking for a natural language library.
For Python there is Natural Language Toolkit (NLTK). For example you could take a look at the PunktSentenceTokenizer.
The PunktSentenceTokenizer divides a text into a list of sentences, by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences. It must be trained on a large collection of plaintext in the target language before it can be used. The algorithm for this tokenizer is described in Kiss & Strunk (2006):
Kiss, Tibor and Strunk, Jan (2006): Unsupervised Multilingual Sentence
Boundary Detection. Computational Linguistics 32: 485-525.
The NLTK data package includes a pre-trained Punkt tokenizer for English.
In Python, use str.replace():
>>> s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum aliquet leo in urna hendrerit placerat. Donec adipiscing dignissim adipiscing. Duis adipiscing mollis cursus. Etiam fringilla elit nec enim sagittis a auctor nisi gravida. Nunc sollicitudin, leo sit amet consequat pharetra, mi orci vestibulum mi, a suscipit odio tellus tincidunt erat. Suspendisse a consequat turpis. Morbi eget ante leo, a dignissim mi."
>>> print(s.replace('. ', '.\n'))
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vestibulum aliquet leo in urna hendrerit placerat.
Donec adipiscing dignissim adipiscing.
Duis adipiscing mollis cursus.
Etiam fringilla elit nec enim sagittis a auctor nisi gravida.
Nunc sollicitudin, leo sit amet consequat pharetra, mi orci vestibulum mi, a suscipit odio tellus tincidunt erat.
Suspendisse a consequat turpis.
Morbi eget ante leo, a dignissim mi.
Also, you may be interested in the textwrap module.
If you're just looking for javascript that would do that, you could do this:
var str = "Lorem ipsum 4.00 dolor sit amet, consectetur adipiscing elit. Vestibulum aliquet leo in urna hendrerit placerat. Donec adipiscing dignissim adipiscing. Duis adipiscing mollis cursus. Etiam fringilla elit nec enim sagittis a auctor nisi gravida. Nunc etc.... sollicitudin, leo sit amet consequat pharetra, mi orci vestibulum mi, a suscipit odio tellus tincidunt erat. Suspendisse a consequat turpis. Morbi eget ante leo, a dignissim mi."
str = str.replace(/(\S\.)\s*([A-Z])/g, "$1\n$2");
You can see it work here: http://jsfiddle.net/jfriend00/NR5Nc/.
This particular algorithm only adds a newline when a non-whitespace character is followed by a period, then optional whitespace, then a capital letter. So it's safe from things like $4.00 and etc... which don't actually end sentences. It's also flexible about the amount of whitespace between sentences.
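To see how the pattern treats those edge cases, here is a small sketch. The wrapper name addSentenceBreaks and the sample sentence are assumptions made up for illustration; only the regex comes from the answer above.

```javascript
// Hypothetical helper wrapping the regex from the answer.
function addSentenceBreaks(text) {
  return text.replace(/(\S\.)\s*([A-Z])/g, '$1\n$2');
}

// Sample containing a decimal number and an ellipsis that should not split.
const sample = 'It costs $4.00 today. Next comes etc... and more. Done.';
const result = addSentenceBreaks(sample);

console.log(result);
```

Abbreviations followed by a capitalized word (for example "Dr. Smith") would still be split, which is the kind of case a trained tokenizer handles better.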

Regex insert commas before spaces

How could you insert N number of commas into this string, before a space but not after a period or another comma? Using ruby or javascript.
One option:
>>> var str = "Lorem ipsum dolor sit amet consectetur adipiscing elit. Praesent mauris neque adipiscing nec malesuada id fermentum at eros. Curabitur eu neque nunc, et porta risus.";
>>> str.replace(/([^,.]) /g, '$1, ');
"Lorem, ipsum, dolor, sit, amet, consectetur, adipiscing, elit. Praesent, mauris, neque, adipiscing, nec, malesuada, id, fermentum, at, eros. Curabitur, eu, neque, nunc, et, porta, risus."
Alternatively, you can go another way in order to mimic negative lookbehind:
>>> var str = "Lorem ipsum dolor sit amet consectetur adipiscing elit. Praesent mauris neque adipiscing nec malesuada id fermentum at eros. Curabitur eu neque nunc, et porta risus.";
>>> str.replace(/([,.])? /g, function($0, $1) { return $1 ? $0 : ', '; });
"Lorem, ipsum, dolor, sit, amet, consectetur, adipiscing, elit. Praesent, mauris, neque, adipiscing, nec, malesuada, id, fermentum, at, eros. Curabitur, eu, neque, nunc, et, porta, risus."
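JavaScript engines that implement ES2018 support negative lookbehind directly, so the workaround may no longer be needed there. A minimal sketch, assuming such an engine; the shortened sample string and the name withCommas are made up for illustration:

```javascript
// Shortened sample input (assumed for illustration).
const str = 'Lorem ipsum dolor. Praesent mauris, neque eros.';

// Replace a space only when it is NOT preceded by a comma or a period.
const withCommas = str.replace(/(?<![,.]) /g, ', ');

console.log(withCommas);
```

Unlike the capture-group version, the lookbehind does not consume the preceding character, so nothing has to be put back in the replacement.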
Ruby variation of @jensgram's answer:
str.gsub(/([^,.]) /, '\1, ')

How can I select the first word of every line of a block of text?

I'm trying to select each first word, to wrap it in a specific span.
Lorem ipsum dolor sit amet,
consectetur adipiscing elit. Cras
sagittis nunc non nisi venenatis
auctor. Aliquam consectetur pretium
sapien, eget congue purus egestas nec.
Maecenas sed purus ut turpis varius
dictum. Praesent a nunc ipsum, id
mattis odio. Donec rhoncus posuere
bibendum. Fusce nulla elit, laoreet
non posuere.
If this is the text, the script should select Lorem, Aliquam, varius and nulla.
You can do this by using JavaScript to wrap every word in the paragraph in its own span, and then walking through the spans finding out what their actual position on the page is, and then applying your style changes to the spans whose Y position is greater than the preceding span. (Best do it beginning-to-end, though, as earlier ones may well affect the wrapping of later ones.) But it's going to be a lot of work for the browser, and you'll have to repeat it each time the window is resized, so the effect will have to be worth the cost.
Something like this (used jQuery as you've listed the jquery tag on your question):
jQuery(function($) {
var lasty;
var $target = $('#target');
$target.html(
"<span>" +
$target.text().split(/\s/).join("</span> <span>") +
"</span>");
lasty = -1;
$target.find('span').each(function() {
var $this = $(this),
top = $this.position().top;
if (top > lasty) {
$this.css("fontWeight", "bold");
lasty = top;
}
});
});
<div id='target' style='width: 20em'>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras sagittis nunc non nisi venenatis auctor. Aliquam consectetur pretium sapien, eget congue purus egestas nec. Maecenas sed purus ut turpis varius dictum. Praesent a nunc ipsum, id mattis odio. Donec rhoncus posuere bibendum. Fusce nulla elit, laoreet non posuere.</div>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
Naturally that's making a huge set of assumptions (that all whitespace should be replaced with a single space, that there's no markup in the text, probably others). But you get the idea.
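The line-detection step on its own can be sketched without the DOM. Here the measured positions are hard-coded sample data (an assumption; in the page they would come from each span's position), and firstWordsOfLines is a hypothetical name:

```javascript
// Returns the text of every word whose top offset is greater than the
// largest top offset seen so far, i.e. the first word on each new line.
function firstWordsOfLines(words) {
  let lastTop = -1;
  const firsts = [];
  for (const word of words) {
    if (word.top > lastTop) {
      firsts.push(word.text);
      lastTop = word.top;
    }
  }
  return firsts;
}

// Made-up measurements: three visual lines at tops 0, 18 and 36.
const measured = [
  { text: 'Lorem', top: 0 }, { text: 'ipsum', top: 0 },
  { text: 'dolor', top: 18 }, { text: 'sit', top: 18 },
  { text: 'amet', top: 36 }
];

console.log(firstWordsOfLines(measured));
```

This mirrors the top > lasty comparison in the jQuery code above; the expensive part in a real page is measuring the spans, not this loop.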
Here's a version that handles window resize, 50ms after the last resize event occurs (so we're not doing it interim) and with Gaby's suggestion (below) that we unbold at the start of the resize:
jQuery(function($) {
var resizeTriggerHandle = 0;
// Do it on load
boldFirstWord('#target');
// Do it 50ms after the end of a resize operation,
// because it's *expensive*
$(window).resize(function() {
if (resizeTriggerHandle != 0) {
clearTimeout(resizeTriggerHandle);
}
unboldFirstWord('#target');
resizeTriggerHandle = setTimeout(function() {
resizeTriggerHandle = 0;
boldFirstWord('#target');
}, 50);
});
function boldFirstWord(selector) {
var lasty;
// Break into spans if not already done;
// if already done, remove any previous bold
var $target = $(selector);
if (!$target.data('spanned')) {
$target.html(
"<span>" +
$target.text().split(/\s/).join("</span> <span>") +
"</span>");
$target.data('spanned', true);
}
else {
unboldFirstWord($target);
}
// Apply bold to first span of each new line
lasty = -1;
$target.find('span').each(function() {
var $this = $(this),
top = $this.position().top;
if (top > lasty) {
$this.css("fontWeight", "bold");
lasty = top;
}
});
$target.data('bolded', true);
}
function unboldFirstWord(selector) {
var $target = selector.jquery ? selector : $(selector);
if ($target.data('spanned') && $target.data('bolded')) {
$target.find('span').css("fontWeight", "normal");
$target.data('bolded', false);
}
}
});
<div id='target'>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras sagittis nunc non nisi venenatis auctor. Aliquam consectetur pretium sapien, eget congue purus egestas nec. Maecenas sed purus ut turpis varius dictum. Praesent a nunc ipsum, id mattis odio. Donec rhoncus posuere bibendum. Fusce nulla elit, laoreet non posuere. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras sagittis nunc non nisi venenatis auctor. Aliquam consectetur pretium sapien, eget congue purus egestas nec. Maecenas sed purus ut turpis varius dictum. Praesent a nunc ipsum, id mattis odio. Donec rhoncus posuere bibendum. Fusce nulla elit, laoreet non posuere. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras sagittis nunc non nisi venenatis auctor. Aliquam consectetur pretium sapien, eget congue purus egestas nec. Maecenas sed purus ut turpis varius dictum. Praesent a nunc ipsum, id mattis odio. Donec rhoncus posuere bibendum. Fusce nulla elit, laoreet non posuere. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras sagittis nunc non nisi venenatis auctor. Aliquam consectetur pretium sapien, eget congue purus egestas nec. Maecenas sed purus ut turpis varius dictum. Praesent a nunc ipsum, id mattis odio. Donec rhoncus posuere bibendum. Fusce nulla elit, laoreet non posuere. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras sagittis nunc non nisi venenatis auctor. Aliquam consectetur pretium sapien, eget congue purus egestas nec. Maecenas sed purus ut turpis varius dictum. Praesent a nunc ipsum, id mattis odio. Donec rhoncus posuere bibendum. Fusce nulla elit, laoreet non posuere.</div>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
Try this:
$(function() {
$('p').each(function() {
var text_splited = $(this).text().split(" ");
$(this).html("<strong>"+text_splited.shift()+"</strong> "+text_splited.join(" "));
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"></script>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras sagittis nunc non nisi venenatis auctor.</p>
<p>Aliquam consectetur pretium sapien, eget congue purus egestas nec. Maecenas sed purus ut turpis varius dictum.</p>
<p>Praesent a nunc ipsum, id mattis odio. Donec rhoncus posuere bibendum. Fusce nulla elit, laoreet non posuere.</p>
To bold every first word of a <p> tag, including whitespace after the initial <p>, use some regular expressions:
$('p').each(function(){
var me = $(this);
me.html( me.text().replace(/(^\w+|\s+\w+)/,'<strong>$1</strong>') );
});
