JavaScript: replace each occurrence of a character when they occur consecutively

So I want to replace each '+' in this string with a space ' ':
EST++++++++++%5E0+90310++162
So the output I want is:
EST          %5E0 90310  162
I've tried this:
var l = l.replace(/\+/g, " ");
This mostly works, except that when the '+' characters occur consecutively, it replaces the whole run of them with a single space.
So I'm getting this instead of what I want:
EST %5E0 90310 162

My psychic powers tell me that you are actually getting multiple spaces just fine, but you are displaying the result as HTML, where (as explained here) consecutive whitespace is collapsed into a single space.
EDIT: In fact, it appears that exactly this happened to your question itself when you posted it, and caused some confusion in this thread ;)
If you want to keep the whitespace, either replace it with a non-breaking space (&nbsp; in HTML, but this will modify the value of the string) or display it in a way that preserves whitespace, for example inside a <pre> element or by using the CSS property white-space: pre; on the containing element.
See this example:
var value = 'EST++++++++++%5E0+90310++162'.replace(/\+/g, " ");
document.getElementById('element1').innerHTML = value;
document.getElementById('element2').innerHTML = value;
<p>
<span id="element1"></span>
</p>
<div>
<pre id="element2"></pre>
</div>
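(Or, if you are assigning the content using .innerHTML like I did in my snippet, you could switch to .innerText or .textContent so the string is not parsed as HTML at all. Note, though, that the browser still collapses consecutive spaces when rendering an ordinary element, so you still need white-space: pre or a <pre> element to actually see them. I don't know exactly where you use this code, so this may not apply.)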
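For the non-breaking-space option, a minimal sketch might look like this. It replaces each '+' with U+00A0 (the character the &nbsp; entity stands for) instead of a regular space, so the run is not collapsed even inside an ordinary <span>; as noted, the string's value changes, since it now contains U+00A0 rather than U+0020. The variable names are illustrative only.

```javascript
// Replace every '+' with a non-breaking space (U+00A0) instead of a normal space.
// CSS whitespace collapsing does not apply to U+00A0, so the run stays visible
// even when the text is shown in an ordinary element.
var input = 'EST++++++++++%5E0+90310++162';
var withNbsp = input.replace(/\+/g, '\u00A0');

// Each '+' became exactly one U+00A0, so the length is unchanged.
console.log(withNbsp.length === input.length); // true
// There are no ordinary spaces in the result.
console.log(withNbsp.indexOf(' ')); // -1
```

If the string is later compared or sent somewhere else, keep in mind that '\u00A0' !== ' ', which is why the display-side fix (white-space: pre) is usually the less invasive choice.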

Working fine for me; maybe the way you are outputting the value is collapsing the extra spaces?
var l = 'EST++++++++++%5E0+90310++162';
// expected result: 'EST          %5E0 90310  162'
l = l.replace(/\+/g, " ");
console.log(l);
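To confirm that the replacement itself keeps every space, you can measure the runs of spaces in the resulting string instead of looking at rendered HTML. A small check along these lines (variable names are illustrative):

```javascript
var l = 'EST++++++++++%5E0+90310++162';
l = l.replace(/\+/g, " ");

// Length of each run of consecutive spaces: one space per original '+'.
var runs = l.match(/ +/g).map(function (run) { return run.length; });
console.log(runs.join(", ")); // 10, 1, 2

// JSON.stringify wraps the value in quotes, which makes the spaces easy to count.
console.log(JSON.stringify(l));
```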

Related

IE11 innerHTML strange behaviour

I have very strange behaviour with element.innerHTML in IE11.
As you can see there: http://pe281.s3.amazonaws.com/index.html, some riotjs expressions are not evaluated.
I've tracked it down to 2 things:
- The euro sign above it. It's encoded as €, but I have the same behaviour with \u20AC or €. It happens with all characters in the currency symbols range, and some other ranges. Removing it, or using a standard character instead, does not cause the issue.
- The way riotjs creates a custom tag and template. Basically it does this:
var html = "{reward.amount.toLocaleString()}<span>€</span>{moment(expiracyDate).format('DD/MM/YYYY')}";
var e = document.createElement('div');
e.innerHTML = html;
In the resulting e node, e.childNodes returns the following array:
[0]: {reward.amount.toLocaleString()}
[1]: <span>€</span>
[2]: {
[3]: moment(expiracyDate).format('DD/MM/YYYY')}
Obviously nodes 2 and 3 should be a single node. Having them split means Riot doesn't recognize an expression to evaluate, hence the issue.
But there's more: the problem is not consistent and, for instance, cannot be reproduced on a fiddle: https://jsfiddle.net/5wg3zxk5/4/, where the HTML string is parsed correctly.
So I guess my question is: how can specific characters change the way element.innerHTML parses its input, and how can this be solved?
.childNodes is a NodeList (not an array) that contains ELEMENT_NODEs but may also contain other node types, such as TEXT_NODE, COMMENT_NODE, CDATA_SECTION_NODE and PROCESSING_INSTRUCTION_NODE.
You probably want only nodes of type ELEMENT_NODE (div and such), and maybe also TEXT_NODE.
Use a simple loop to keep just the nodes with .nodeType === Node.ELEMENT_NODE (or compare against its numeric value, 1).
You can also simply use .children, which contains only the element nodes.
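A minimal sketch of that loop, written so it works on anything that has a childNodes list. It uses the numeric value 1 directly because the Node constants only exist in a browser, and it avoids newer APIs so it also runs in IE11; the helper name elementChildren is hypothetical.

```javascript
// nodeType value for elements; in a browser this equals Node.ELEMENT_NODE.
var ELEMENT_NODE = 1;

// Return only the element children of a node, skipping text, comment and other nodes.
function elementChildren(parent) {
  var result = [];
  var nodes = parent.childNodes;
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].nodeType === ELEMENT_NODE) {
      result.push(nodes[i]);
    }
  }
  return result;
}

// In a browser, elementChildren(e) holds the same elements as e.children.
```

Note that this only filters out non-element nodes; it does not merge text nodes that were split during parsing.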
Replace <br> with <br /> (they are self-closing tags). IE is trying to close the tags for you. That's why you have doubled br tags.
I think it should be something like this:
var html = reward.amount.toLocaleString() + "€<br>" + moment(expiracyDate).format('DD/MM/YYYY') + " <br>";
var e = document.createElement('div');
e.innerHTML = html;
The parts I moved out of the quotes seem to be variables or expressions, not strings, so they should not be in quotes.

Is there any way for me to work with this 100,000 item new-line separated string of words?

I've got a list of 100,000+ English words in plain text. I want to use split() to convert the list into an array, which I can then convert to an associative array, giving each list item a key equal to its own name, so I can very efficiently check whether or not a string is an English word.
Here's the problem:
The list is new-line separated.
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
This means that var list = ' <copy/paste list> ' isn't going to work, because JavaScript string literals can't span multiple lines.
Is there any way for me to work with this 100,000 item new-line separated string?
Replace the newlines with commas in any text editor before copying the list into your JS file.
One workaround would be to paste the list into Notepad++, then select all and choose Edit > Line Operations > Join Lines.
This removes the newlines and replaces them with spaces.
If you're doing this client side, you can use jQuery's get function to get the words from a text file and do the processing there:
jQuery.get('wordlist.txt', function(results){
//Do your processing on results here
});
If you're doing this in Node.js, follow the guide here to see how to read a file into memory.
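For the Node.js route, a minimal sketch, assuming the list is saved as wordlist.txt next to the script (the file name and the helpers buildWordSet and loadWordSet are hypothetical). It splits on newlines, tolerating Windows \r\n endings, and stores the words in a Set, which gives the same fast membership test the question is after with an associative array:

```javascript
var fs = require("fs");

// Build a Set of words from newline-separated text (handles both \n and \r\n).
function buildWordSet(text) {
  var set = new Set();
  text.split(/\r?\n/).forEach(function (line) {
    var word = line.trim();
    if (word !== "") set.add(word); // skip blank lines, e.g. after a trailing newline
  });
  return set;
}

// Read the whole file synchronously and build the Set.
function loadWordSet(path) {
  return buildWordSet(fs.readFileSync(path, "utf8"));
}

// Usage (hypothetical file name):
// var words = loadWordSet("wordlist.txt");
// words.has("aahed"); // true if the word is in the file
```

Reading the whole file at once is fine for a list of this size; a streaming reader would only matter for much larger files.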
You can use Notepad++ or any semi-advanced text editor.
In Notepad++, press Ctrl+H to bring up the Replace dialog.
Towards the bottom, select the "Extended" search mode.
Find "\r\n" and replace it with ", ".
This will remove the newlines and replace them with commas.
jsfiddle Demo
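Once the list is on one line, splitting it back apart and building the lookup object the question describes takes only a few lines. A sketch, assuming the editor produced words separated by ", " (the names list, dictionary and isWord are illustrative, and the list is shortened here):

```javascript
// The word list after joining the lines with ", " in the editor (shortened).
var list = 'aa, aah, aahed, aahing, aahs, aal, aalii, aaliis, aals';

// Build an "associative array": each word is a key whose value is true.
// Object.create(null) avoids inherited keys such as "constructor".
var dictionary = Object.create(null);
list.split(', ').forEach(function (word) {
  dictionary[word] = true;
});

function isWord(str) {
  return dictionary[str] === true;
}

console.log(isWord('aalii'));       // true
console.log(isWord('ahh'));         // false
console.log(isWord('constructor')); // false
```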
This answer addresses the problem purely as a string you want to copy and paste into JavaScript, specifically the issues raised by "This means that var list = ' <copy/paste list> ' isn't going to work, because JavaScript quotes don't work multi-line." and "Is there any way for me to work with this 100,000 item new-line separated string?".
You can put the text inside a comment in a JavaScript function and then extract it from the function's source. Although counter-intuitive, this is an interesting approach. Here is the main function:
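function convertComment(c) {
  // c is a function whose body is a single /* ... */ comment.
  return c.toString()
    // strip everything up to and including the opening /* (or /*!)
    .replace(/^[^\/]+\/\*!?/, '')
    // strip the closing */ and the rest of the function source
    .replace(/\*\/[^\/]+$/, '');
}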
It can be used in your situation as follows:
var s = convertComment(function() {
/*
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
*/
});
At that point you can do whatever you like with s. The demo simply places it into a div for display.
jsFiddle Demo
Further, here is an example of taking the list of words, getting them into an array, and then referencing a single word in the array.
//previously shown code
var all = s.match(/[^\r\n]+/g);
var rand = Math.floor(Math.random() * all.length);
document.getElementById("random").innerHTML = "Random index #"+rand+": "+all[rand];
If the words are in a separate file, you can load them directly into the page and go from there. I've used a script element with a MIME type that should mean browsers ignore the content (provided it's in the head):
<script type="text/plain" id="wordlist">
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
</script>
<script>
var words = (function() {
var words = '\n' + document.getElementById('wordlist').textContent + '\n';
return {
checkWord: function (word) {
return words.indexOf('\n' + word + '\n') != -1;
}
}
}());
console.log(words.checkWord('aaliis')); // true
console.log(words.checkWord('ahh')); // false
</script>
The result is an object with one method, checkWord, that has access to the word list in a closure. You could add more methods like addWord or addVariant, whatever.
Note that textContent may not be supported in all browsers; you may need to feature-detect and use innerText or an alternative in some.
For variety, another solution is to put the unaltered content into
A data attribute - HTML attributes can contain newlines
or a "non-script" script - eg. <SCRIPT TYPE="text/x-wordlist">
or an HTML comment node
or another hidden element that allows content
Then the content could be read and split/parsed. Since this would be done outside of JavaScript's string literal parsing it doesn't have the issue regarding embedded newlines.

Match text not inside span tags

Using Javascript, I'm trying to wrap span tags around certain text on the page, but I don't want to wrap tags around text already inside a set of span tags.
Currently I'm using:
html = $('#container').html();
var regex = /([\s| ]*)(apple)([\s| ]*)/g;
html = html.replace(regex, '$1<span class="highlight">$2</span>$3');
It works but if it's used on the same string twice or if the string appears in another string later, for example 'a bunch of apples' then later 'apples', I end up with this:
<span class="highlight">a bunch of <span class="highlight">apples</span></span>
I don't want it to replace 'apples' the second time because it's already inside span tags.
It should match 'apples' here:
Red apples are my <span class="highlight">favourite fruit.</span>
But not here:
<span class="highlight">Red apples are my favourite fruit.</span>
I've tried using this but it doesn't work:
([\s| ]*)(apples).*(?!</span)
Any help would be appreciated. Thank you.
First off, you should know that parsing HTML with regex is generally considered a bad idea; a DOM parser is usually recommended. With this disclaimer, I will show you a simple regex solution.
This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
We can solve it with a beautifully-simple regex:
<span.*?<\/span>|(\bapples\b)
The left side of the alternation | matches complete <span... /span> tags. We will ignore these matches. The right side matches and captures apples to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
This program shows how to use the regex (see the results in the right pane of the online demo). Please note that in the demo I replaced with [span] instead of <span> so that the result would show in the browser (which interprets the html):
var subject = 'Red apples are my <span class="highlight">favourite apples.</span>';
var regex = /<span.*?<\/span>|(\bapples\b)/g;
var replaced = subject.replace(regex, function(m, group1) {
// group1 is undefined when the left alternative (a whole <span>...</span>) matched
if (group1 === undefined) return m;
else return "<span class=\"highlight\">" + group1 + "</span>";
});
document.write("<br>*** Replacements ***<br>");
document.write(replaced);
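To check the "used on the same string twice" case from the question, the same regex can be wrapped in a function and applied twice: a span created by the first pass is matched by the left alternative on the second pass and left alone. A sketch (highlightApples is a hypothetical name); keep in mind that .*? stops at the first </span> and does not cross line breaks, so nested spans or spans containing newlines are not handled:

```javascript
var regex = /<span.*?<\/span>|(\bapples\b)/g;

// Wrap "apples" in a highlight span unless it already sits inside <span>...</span>.
function highlightApples(html) {
  return html.replace(regex, function (m, group1) {
    // group1 is undefined when the left alternative (a whole span) matched.
    if (group1 === undefined) return m;
    return '<span class="highlight">' + group1 + '</span>';
  });
}

var once = highlightApples('Red apples are my favourite fruit.');
var twice = highlightApples(once);
console.log(once);           // Red <span class="highlight">apples</span> are my favourite fruit.
console.log(once === twice); // true: the second pass changes nothing
```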
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...

JavaScript remove ZERO WIDTH SPACE (unicode 8203) from string

I'm writing some javascript that processes website content. My efforts are being thwarted by SharePoint text editor's tendency to put the "zero width space" character in the text when the user presses backspace.
The character's unicode value is 8203, or B200 in hexadecimal. I've tried to use the default "replace" function to get rid of it. I've tried many variants, none of them worked:
var a = "o​m"; //the invisible character is between o and m
var b = a.replace(/\u8203/g,'');
= a.replace(/\uB200/g,'');
= a.replace("\\uB200",'');
and so on and so forth. I've tried quite a few variations on this theme. None of these expressions work (tested in Chrome and Firefox). The only thing that works is typing the actual character in the expression:
var b = a.replace("​",''); //it's there, believe me
This poses potential problems. The character is invisible, so that line doesn't make sense on its own. I can get around that with comments, but if the code is ever reused and the file is saved in a non-Unicode encoding (or when it's deployed to SharePoint, where there's no guarantee the encoding won't get messed up), it will stop working. Is there a way to write this using Unicode escape notation instead of the character itself?
[My ramblings about the character]
In case you haven't met this character, (and you probably haven't, seeing as it's invisible to the naked eye, unless it broke your code and you discovered it while trying to locate the bug) it's a real a-hole that will cause certain types of pattern matching to malfunction. I've caged the beast for you:
[​] <- careful, don't let it escape.
If you want to see it, copy those brackets into a text editor and then iterate your cursor through them. You'll notice you'll need three steps to pass what seems like 2 characters, and your cursor will skip a step in the middle.
The number in a unicode escape should be in hex, and the hex for 8203 is 200B (which is indeed a Unicode zero-width space), so:
var b = a.replace(/\u200B/g,'');
Live Example:
var a = "o​m"; //the invisible character is between o and m
var b = a.replace(/\u200B/g,'');
console.log("a.length = " + a.length); // 3
console.log("a === 'om'? " + (a === 'om')); // false
console.log("b.length = " + b.length); // 2
console.log("b === 'om'? " + (b === 'om')); // true
The accepted answer didn't work for my case.
But this one did:
text.replace(/(^[\s\u200b]*|[\s\u200b]*$)/g, '')
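Note that this pattern only strips whitespace and zero-width spaces at the start and end of the string, while /\u200B/g removes every zero-width space. A quick comparison on a made-up sample string:

```javascript
// Zero-width spaces at the start, in the middle and at the end.
var text = '\u200Bo\u200Bm\u200B';

// Remove every zero-width space.
var removedAll = text.replace(/\u200B/g, '');

// Trim whitespace and zero-width spaces from both ends only.
var trimmedEnds = text.replace(/(^[\s\u200b]*|[\s\u200b]*$)/g, '');

console.log(removedAll.length);  // 2 ("om")
console.log(trimmedEnds.length); // 3 ("o", U+200B, "m")
```

Which one you want depends on whether zero-width spaces inside the text should survive.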

Trying to use Javascript regex to grab a section of "&" delimited text whether or not it's the last value

My text looks similar to this:
action=addItem&siteId=4&lang_locale=en_US&country=US&catalogId=1&productId=417689&displaySize=7&skuSize=2194171&qty=1&pil=7&psh=had+AIRJRnjbp7+rGivIKg00
and I want to replace the value of 'psh'. It may sometimes not be the last value (it may be followed by &something=else).
I've tried doing these lines of code:
var text = text.replace(/&psh=.*(?=&|$)/, "&psh=" + data.psh);
var text = text.replace(/&psh=.*(?=[&|$]+)/, "&psh=" + data.psh);
var text = text.replace(/(?:&psh=)(.*)(?=[&|$]+)/, data.psh);
None of them work for both situations. Use this site to check regexes.
This should work:
var text = text.replace(/&psh=[^&]*/, "&psh=" + data.psh);
[^&]* matches a string of any length that consists of any characters except &, therefore the match will continue until the end of the string or until (but not including) the next &, whichever comes first.
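To see why the first attempt fails when psh is not the last parameter, compare the greedy .* with [^&]* on shortened, made-up query strings (newValue stands in for data.psh):

```javascript
var newValue = 'NEW'; // stands in for data.psh
var last   = 'action=addItem&psh=old';
var middle = 'action=addItem&psh=old&qty=1';

// Greedy .* runs to the end of the string, where the lookahead is satisfied by $,
// so everything after &psh= is replaced, including &qty=1.
var greedy = middle.replace(/&psh=.*(?=&|$)/, '&psh=' + newValue);

// [^&]* stops at the next & (or at the end), so both positions work.
var fixedLast = last.replace(/&psh=[^&]*/, '&psh=' + newValue);
var fixedMiddle = middle.replace(/&psh=[^&]*/, '&psh=' + newValue);

console.log(greedy);      // action=addItem&psh=NEW
console.log(fixedLast);   // action=addItem&psh=NEW
console.log(fixedMiddle); // action=addItem&psh=NEW&qty=1
```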
Tim's answer may work, but I fear it is not the best possible answer. The string you are giving as an example looks a lot like a url. If it is, that means there can sometimes be a pound sign in it as well (#). To compensate for that you actually need to modify your code to look like this:
var text = text.replace(/&psh=[^&#]*/, "&psh=" + data.psh);
Notice the #, which was added so the match doesn't get tripped up by a fragment (#anchor) at the end of the URL.
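A small comparison of the two patterns on a URL with a fragment; the URL and newValue are made-up examples:

```javascript
var newValue = 'NEW'; // stands in for data.psh
var url = 'http://example.com/cart?qty=1&psh=old#summary';

// [^&]* also swallows the fragment, because # is not excluded.
var withoutHash = url.replace(/&psh=[^&]*/, '&psh=' + newValue);
// [^&#]* stops at the #, so the fragment survives.
var withHash = url.replace(/&psh=[^&#]*/, '&psh=' + newValue);

console.log(withoutHash); // http://example.com/cart?qty=1&psh=NEW
console.log(withHash);    // http://example.com/cart?qty=1&psh=NEW#summary
```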
