IE11 innerHTML strange behaviour - javascript

I have very strange behaviour with element.innerHTML in IE11.
As you can see there: http://pe281.s3.amazonaws.com/index.html, some riotjs expressions are not evaluated.
I've tracked it down to 2 things:
- the euro sign above it. It's encoded as an HTML entity, but I have the same behaviour with \u20AC or a literal €. It happens with all characters in the currency symbols range, and some other ranges. Removing it or using a standard character does not cause the issue.
- The way riotjs creates a custom tag and template. Basically it does this:
var html = "{reward.amount.toLocaleString()}<span>€</span>{moment(expiracyDate).format('DD/MM/YYYY')}";
var e = document.createElement('div');
e.innerHTML = html;
In the resulting e node, e.childNodes returns the following array:
[0]: {reward.amount.toLocaleString()}
[1]: <span>€</span>
[2]: {
[3]: moment(expiracyDate).format('DD/MM/YYYY')}
Obviously nodes 2 and 3 should be a single node. Because they are split, riot does not recognize an expression to evaluate, hence the issue.
But there's more: The problem is not consistent, and for instance cannot be reproduced on a fiddle: https://jsfiddle.net/5wg3zxk5/4/, where the html string is correctly parsed.
So I guess my question is how can some specific characters change the way element.innerHTML parses its input? How can it be solved?

.childNodes is a generated array (...well NodeList) that is filled with ELEMENT_NODE but may also be filled with: ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_REFERENCE_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DOCUMENT_FRAGMENT_NODE, NOTATION_NODE, ...
You probably want only nodes from the type: ELEMENT_NODE (div and such..) and maybe also TEXT_NODE.
Use a simple loop to keep just those nodes with .nodeType === Element.ELEMENT_NODE (or just compare it to its enum which is 1).
You can also just use the much simpler alternative, .children, which contains only element nodes.
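A minimal sketch of that filtering loop. Plain objects stand in for DOM nodes here so it also runs outside a browser; keepNodes and fakeChildNodes are hypothetical names used for illustration, not part of riot or the DOM API. In a page you would pass e.childNodes instead.

```javascript
// Numeric nodeType values as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Keep only the nodes whose nodeType is in wantedTypes.
// Accepts any array-like, so a live NodeList works as well.
function keepNodes(nodes, wantedTypes = [ELEMENT_NODE]) {
  return Array.from(nodes).filter(node => wantedTypes.includes(node.nodeType));
}

// Stand-ins for what e.childNodes might contain (assumed data).
const fakeChildNodes = [
  { nodeType: TEXT_NODE, nodeValue: '{reward.amount.toLocaleString()}' },
  { nodeType: ELEMENT_NODE, nodeName: 'SPAN' },
  { nodeType: COMMENT_NODE, nodeValue: ' a comment ' },
];

console.log(keepNodes(fakeChildNodes).length);                            // elements only
console.log(keepNodes(fakeChildNodes, [ELEMENT_NODE, TEXT_NODE]).length); // elements and text
```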

Replace <br> with <br /> (they are self-closing tags). IE is trying to close the tags for you. That's why you have doubled br tags

I think it should be something like this:
var html = {reward.amount.toLocaleString()} + "€<br>" +{moment(expiracyDate).format('DD/MM/YYYY')} + " <br>";
var e = document.createElement('div');
e.innerHTML = html;
The stuff I removed from the quotes seems to be variables or other expressions, not strings, so it should not be in quotes.

Regex to find a specific string that is not in a HTML attribute

My case is: I have a string with HTML elements:
This is a text and "specific_string"
I need a Regex to match only the one that is not in a HTML attribute.
This is my current Regex, it works but it gives a false positive when the string is wrapped by double quotes
((?!\"[\w\s]*)specific_string(?![\w\s]*\"))
If you want to get what's inside the tag, you could try split() to cut the string at every ">" or "<", basically like this:
let string = "<a href='something+specific_string' title='testing'>This is a text and 'specific_string'</a>";
string = string.split('>');
string = string[1].split('<');
console.log(string)
So, when you want to manipulate it, just use position 0 of the resulting array. It's not a regex like you want, but it's an idea.
Though it can suffice in simple cases, you should know it's often said that RegExp is ill-suited for parsing HTML, and depending on environment you could be better off using more robust techniques. (There's http://htmlparsing.com/ dedicated to the topic but yet it doesn't discuss JS.)
That said, the following works in Chrome 107 and Node 16.13.
(s=>s.match(/(?<=>[^<]*|^[^<]*)specific_string/))
('This is a text and "specific_string"')
It uses look-behind. In lieu of that you could use /(>[^<]*|^[^<]*)(specific_string)/ and compensate index/lengths to get the position of a match...
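A rough sketch of that capture-group alternative, for engines without look-behind. The function name findOutsideTags is hypothetical, and I made the quantifiers lazy (an assumption on my part) so that the first occurrence in a text run is reported rather than the last. The match position is recovered by adding the length of the first group to the match index.

```javascript
// Return the index of the first `needle` that sits in text content
// (after a ">" or at the start of the string, with no "<" in between),
// or -1 if there is none. Attribute values are skipped because they
// are always preceded by an unclosed "<".
function findOutsideTags(html, needle) {
  // Escape regex metacharacters so the needle is matched literally.
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const re = new RegExp('(>[^<]*?|^[^<]*?)(' + escaped + ')');
  const m = re.exec(html);
  // m.index points at the ">" (or string start); skip the prefix group.
  return m ? m.index + m[1].length : -1;
}

const html = "<a href='something+specific_string' title='testing'>This is a text and 'specific_string'</a>";
const pos = findOutsideTags(html, 'specific_string');
console.log(pos, html.slice(pos, pos + 'specific_string'.length));
```

Like the look-behind version, this is only a heuristic: a ">" inside an attribute value would fool it.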
As you answer in a comment that you'll replace in user-provided HTML, I encourage you to consider security implications (namely XSS).
Back on the topic of parsing HTML w/o RegExp we obviously have the techniques in a web browser and I couldn't stop myself writing a quick and dirty textNode replacer in web JS, working in Chrome 107:
((html, fun) => {
const el = document.createElement('body')
el.innerHTML = html
const X = new XPathEvaluator, R = X.evaluate('//*[text()]', el)
const A = []; for (let n; n = R.iterateNext();) A.push(n) // mutating el while iterating XPathResult is illegal
for (let n of A) fun(n)
return el.innerHTML})
('This is a text and "specific_string"',
n => n.innerHTML = n.innerHTML
.replace(/specific_string/, '<b>replaced</b>'))

Tokenize HTML string in JavaScript [duplicate]

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 5 years ago.
I would like to split a string that looks like:
This is <strong>a</strong> test link and <br /> line. break
into the following with JavaScript:
[
'This',
'is',
'<strong>a</strong>',
'test',
'link',
'<br />',
'line.',
]
I tried splitting on spaces, and < >, but that obviously doesn't work for tags like strong and a. I'm not sure how to write a regex that doesn't split within HTML tags. I also tried to use jQuery children(), but it doesn't extract plain text, just the html tags. Any help would be great.
If the code is executing in a browser, using the browser's parser to separate the string into text and tag components may provide an alternative workaround:
var text = 'This is <strong>a</strong> link and <br /> line. break'
function splitHTML( text) {
var parts = [];
var div = document.createElement('DIV');
div.innerHTML = text;
div.normalize();
for( var node = div.firstChild; node; node=node.nextSibling) {
if( node.nodeType == Node.TEXT_NODE) {
parts.push.apply( parts, node.textContent.split(" "));
}
else if( node.nodeType == Node.ELEMENT_NODE) {
parts.push( node.outerHTML);
}
}
return parts;
}
console.log( splitHTML( text));
Note the line that adds text nodes split by spaces to the result
parts.push.apply( parts, node.textContent.split(" "));
is for demonstration and needs further work to prevent zero-length strings in the output for spaces between text and HTML elements. Also, the HTML tags are reconstructed from the DOM elements and may not exactly match the input: in this case the XHTML-style <br /> tags are returned as <br> HTML tags (which don't take a closing tag).
The general idea is to side step parsing html using a regex by parsing it with the browser. Understandably this may or may not fit with the target environment and a full set of requirements.
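One possible way to drop the zero-length strings mentioned above is to filter the split result. This is only a sketch; splitWords is a hypothetical helper name, and inside splitHTML it would take the place of the plain split(" ") call.

```javascript
// Split on single spaces, then discard the empty strings produced by
// leading, trailing, or doubled spaces.
function splitWords(text) {
  return text.split(' ').filter(Boolean);
}

// A text node sitting between two tags typically starts and ends with a space.
console.log(splitWords(' test link and '));
```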
To achieve what you want, you need to consider this:
Rule 1) if no "<" occurred yet, simply split at " ".
Rule 2) if "<" occurred, look for "/>" or "/"..">" and split after it, then start at rule 1 again.
Apply those rules while looping through a string and you are golden.
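The two rules could be sketched roughly like this. It is a simplification under stated assumptions: tokenizeHtml is a hypothetical name, the markup is flat (no element nested inside another element of the same name), and a tag without a matching closing tag is emitted on its own.

```javascript
function tokenizeHtml(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    if (input[i] === '<') {
      // Rule 2: consume a self-closing tag, or a whole element up to its closing tag.
      const tagEnd = input.indexOf('>', i);
      if (tagEnd === -1) { tokens.push(input.slice(i)); break; } // unterminated tag: keep the rest
      const openTag = input.slice(i, tagEnd + 1);
      const name = (/^<\s*([A-Za-z][\w-]*)/.exec(openTag) || [])[1];
      let end = tagEnd + 1;
      if (name && !openTag.endsWith('/>')) {
        const closeTag = '</' + name + '>';
        const closeAt = input.indexOf(closeTag, end);
        if (closeAt !== -1) end = closeAt + closeTag.length;
      }
      tokens.push(input.slice(i, end));
      i = end;
    } else {
      // Rule 1: plain text up to the next "<" is split on whitespace.
      let next = input.indexOf('<', i);
      if (next === -1) next = input.length;
      for (const word of input.slice(i, next).split(/\s+/)) {
        if (word) tokens.push(word);
      }
      i = next;
    }
  }
  return tokens;
}

console.log(tokenizeHtml('This is <strong>a</strong> test and <br /> line. break'));
```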
Making this recursive, i.e. handling nested tags like
<div>
<p>Hi</p>
<p>Bye</p>
</div>
is harder. As mentioned above, actually parsing a html tree is very complex.
Try this:
#(?:(?!<)[^<>]+(?!>))|(?:<(?=[^/>]+\/>).*\/>)|(?:<([^\s]+).*>.*(?=<\/\1>)<\/\1>)#g
It should work in simple cases; that's all I can think of right now.
Use the captured group to find the tag name, then execute it recursively for block elements such as div.

javascript replace each occurrence of a character when they occur consecutively

so I want to replace each '+' in this string to a space ' '
EST++++++++++%5E0+90310++162
So the output I want is:
EST %5E0 90310 162
I've tried this:
var l = l.replace(/\+/g, " ");
Which works a little, except when they occur consecutively: it replaces all the consecutive +'s with a single space.
So I'm getting this instead of what I want:
EST %5E0 90310 162
My psychic powers tell me that you are actually getting multiple spaces just fine, but you are displaying it as HTML and there (as explained here) consecutive whitespace is collapsed to one space.
EDIT: In fact, it appears that exactly this happened to your question itself when you posted it, and caused some confusion in this thread ;)
If you want to keep the whitespace, either replace it with a non-breaking space (&nbsp; in HTML, but this will modify the value of the string) or display it in a different way which preserves whitespace, for example inside a <pre> element or by using the CSS property white-space: pre; on the containing element.
See this example:
var value = 'EST++++++++++%5E0+90310++162'.replace(/\+/g, " ");
document.getElementById('element1').innerHTML = value;
document.getElementById('element2').innerHTML = value;
<p>
<span id="element1"></span>
</p>
<p>
<pre id="element2"></pre>
</p>
(Or, if you are assigning the content using .innerHTML like I did in my snippet, the solution could be as simple as changing to .innerText. But I don't know where you use this code exactly so it can be that this solution doesn't apply.)
Working fine for me, maybe the way you are outputting the value is trimming extra spaces?
var l = 'EST++++++++++%5E0+90310++162'
//'EST %5E0 90310 162'
l = l.replace(/\+/g, " ");
console.log(l);
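A quick way to confirm that the replacement itself keeps every space, independent of how the result is displayed, is to count characters. This is just a sketch; the variable names are mine, and the &nbsp; variant is the display workaround mentioned above, which changes the string's value.

```javascript
const raw = 'EST++++++++++%5E0+90310++162';

// One space per "+": the string length must not change.
const spaced = raw.replace(/\+/g, ' ');
const plusCount = raw.split('+').length - 1;
const spaceCount = spaced.split(' ').length - 1;
console.log(plusCount === spaceCount, raw.length === spaced.length);

// For HTML output where whitespace collapsing is the problem,
// non-breaking spaces survive rendering.
const forHtml = raw.replace(/\+/g, '&nbsp;');
console.log(forHtml);
```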

Replace with RegExp only outside tags in the string

I have strings in which some HTML tags may be present, like
this is a nice day for bowling <b>bbbb</b>
how can I replace with RegExp all b symbols, for example, with :blablabla: (for example) but ONLY outside html tags?
So in that case the resulting string should become
this is a nice day for :blablabla:owling <b>bbbb</b>
EDIT: I would like to be more specific, based on the answers I have received. So first of all I have just a string, not DOM element, or anything else. The string may or may not contain tags (opening and closing). The main idea is to be able to replace anywhere in the text except inside tags. For example if I have a string like
not feeling well today :/ check out this link http://example.com
the regexp should replace only first :/ with real smiley image, but should not replace second and third, because they are inside (and part of) tag. Here's an example snippet using the regexp from one of the answer.
var s = 'not feeling well today :/ check out this link http://example.com';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
It is strange but the DEMO shows that it captured the correct group, but the same regexp in replace function seem not to be working.
The regex itself to replace all bs with :blablabla: is not that hard:
.replace(/b/g, ":blablabla:")
It is a bit tricky to get the text nodes where we need to perform search and replace.
Here is a DOM-based example:
function replaceTextOutsideTags(input) {
var doc = document.createDocumentFragment();
var wrapper = document.createElement('myelt');
wrapper.innerHTML = input;
doc.appendChild( wrapper );
return textNodesUnder(doc);
}
function textNodesUnder(el){
var n, walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
while(n=walk.nextNode())
{
if (n.parentNode.nodeName.toLowerCase() === 'myelt')
n.nodeValue = n.nodeValue.replace(/:\/(?!\/)/g, "smiley_here");
}
return el.firstChild.innerHTML;
}
var s = 'not feeling well today :/ check out this link http://example.com';
console.log(replaceTextOutsideTags(s));
Here, we only modify the text nodes that are direct children of the custom-created element named myelt.
Result:
not feeling well today smiley_here check out this link http://example.com
var input = "this is a nice day for bowling <b>bbbb</b>";
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:')
+ c;
});
document.querySelector("pre").textContent = result;
<pre></pre>
You can do this:
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:') // you may do something else here
+ c;
});
Note that in most (not all, but most) real complex use cases, it's much more convenient to manipulate a parsed DOM rather than just a string. If you're starting with an HTML page, you might use a library (some, like my one, accept regexes to do so).
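For the plain-string case, a different technique that may be easier to reason about is to split the string into tag and text segments and run the replacement on the text segments only. This is a sketch, not the approach from the answer above: replaceInText is a hypothetical name, the sample string with its link markup is my own assumption, and a ">" inside an attribute value would break it.

```javascript
// Splitting on a capturing group keeps the tags in the result:
// even indices are text, odd indices are tags.
function replaceInText(html, pattern, replacement) {
  return html
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(pattern, replacement)))
    .join('');
}

const sample = 'not feeling well today :/ check out <a href="http://example.com">this link</a>';
console.log(replaceInText(sample, /:\//g, 'smiley_here'));
```

Only the ":/" in the text is touched; the one inside the href attribute stays as it is because the whole tag is an odd-indexed segment.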
I think you can use a regex like this : (Just for a simple data not a nested one)
/<[^\/]*?b.*?<\/.*?>|(b)/ig
If you wanna use a regex I can suggest you use below regex to remove all tags recursively until all tags removed:
/<[^\/][^<]*>[^<]*<\/.*?>/g
then use a replace for finding any b.

regex replace characters within tags

I'm already using a html parser, but I need to create a regex that will select the < and > symbols within the first instance of <code> tags - in this case, the one with the class "html".
<code class="html">
<b>test</b><script>lol</script>
<code>test</code> <b> test </b>
<lol>
</lol>
<test>
</code>
So every < or > within the indented area starting from <b> to the start of the last </code> should be replaced, leaving the outer <code> tags alone.
I'm using JavaScript's .replace method and would like all < and > symbols within the code area to turn into the entities &lt; and &gt;.
I imagine its best to use a look forward/back regex using $1 etc. but can't figure out where to begin, so any help would be much appreciated.
How about something like this? In this example I'm creating a variable and populating the variable with html, just to get things started
var doc = document.createElement( 'div' );
doc.innerHTML = ---your input html here
Here I'm pulling the code tag
var string = doc.getElementsByTagName( 'code' )[0].innerHTML; // [0]: the first <code> element
Once you have the string, simply replace the desired brackets with
string = string.replace(/[<]/g, "&lt;"); // the g flag replaces every occurrence
string = string.replace(/[>]/g, "&gt;");
then just reinsert the replaced value back into your source html
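As a sketch of the replacement step on its own, working on a plain string: escapeAngleBrackets is a hypothetical helper name, and it only handles < and > as asked in the question, so ampersands are left alone. The global flag matters, since without it only the first bracket of each kind is replaced.

```javascript
// Replace every "<" and ">" with its HTML entity.
function escapeAngleBrackets(source) {
  return source.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

console.log(escapeAngleBrackets('<b>test</b><script>lol</script>'));
```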
The easy way:
var elem = $('.html');
elem.text(elem.html());
This will not necessarily use literally &lt; for escaping; if you're fine with a different escape, it's much simpler than anything else you can do, though.
If you have multiple elements like that, you might need to wrap the second line in an elem.each(); otherwise the html() method will probably just concatenate the content from all elements or something similarly pointless.
