JavaScript htmlentities French - javascript

I have a .NET MVC page with a list of items that each have <%: %> encoded descriptions in the rel attribute.
I want to be able to search for all items with a rel that contains my search query.
One of the fields has a value containing HTML entities: rel='D&#233;coration'
I type "Décoration" in the search box and let jQuery search for all elements whose 'rel' attribute contains that value (indexOf != -1):
No results!
Why? Because Décoration != D&#233;coration.
What would be the best solution to compare these two? (Has to work for all special accented characters, not just é)
P.S. (I tried escape/unescape on both sides, also tried the trick to append it to a div and then read it as text, this replaces dangerous stuff, but doesn't replace é (it doesn't have to because it's valid in utf-8 anyway))

Since &#233; and the like are HTML entities, you can set the HTML content of a temporary div to the encoded string and retrieve the decoded string from the element's text content. The browser will do the decoding work for you.
Using jQuery:
function searchInRel(needle) {
  return $('[rel]').filter(function(i, e) {
    // e is a plain DOM element, so wrap it in $() before calling .attr()
    var decodedText = $('<div/>').html($(e).attr('rel')).text();
    return decodedText.indexOf(needle) != -1;
  });
}
Using just the DOM:
function decodeEntities(text) {
  // Assumes an element with id "tempDiv" already exists in the page
  var tempDiv = document.getElementById('tempDiv');
  tempDiv.innerHTML = text;
  return tempDiv.textContent;
}

If you serve your pages with UTF-8 encoding, you won't need to use entities for all the accented characters. Problem solved.

You can decode the HTML entities.
Just copy the two JavaScript methods from HERE
var decoded = 'Décoration';
var entity = html_entity_decode('D&#233;coration');
console.log(decoded == entity);
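If no DOM is available, or you would rather not route untrusted strings through innerHTML, numeric entities such as &#233; can be decoded with a plain string replacement. The following is only a minimal sketch: decodeBasicEntities and the NAMED table are hypothetical names, and the table covers just a handful of named entities, so it would need extending for real data.

```javascript
// Hypothetical helper, not a complete HTML entity decoder.
// NAMED is a deliberately tiny table; add the named entities your data uses.
const NAMED = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'",
  nbsp: '\u00A0', eacute: '\u00E9'
};

function decodeBasicEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === '#') {
      // Numeric entity: decimal (&#233;) or hexadecimal (&#xE9;)
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range values untouched instead of throwing
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Named entity: names are case-sensitive; unknown names stay as they are
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

const query = 'D\u00E9coration'; // what the user typed: "Décoration"
console.log(decodeBasicEntities('D&#233;coration').indexOf(query) !== -1); // true
```

Decoding the attribute value first and searching afterwards keeps the comparison on plain characters on both sides.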

Related

How do I count all the image tags within an HTML string?

I have a string of HTML produced by a WYSIWYG editor. I am using Angular for my project.
Let's say I have some stringified HTML like this:
'<p>Here is some text with an image</p><br><img src="data:base64;{ a base64 string }"/>'
How can I parse this string on the browser-side and count all the <img> elements?
Do I need to use regex or is there an NPM package that I can use on the browser-side that will do this for me?
Note: I don't want to render this HTML in the browser. I just want to count the image tags for validation purposes.
With DOMParser, you can create a document from the string and use querySelectorAll to select and count them:
const str = '<p>Here is some text with an image</p><br><img src="data:base64;{ a base64 string }"/>';
const doc = new DOMParser().parseFromString(str, 'text/html');
const imgs = doc.querySelectorAll('img');
console.log(imgs.length);
I'd say the fastest and simplest way is to consider that any html element <img> must always start with <img. You can then just search the number of occurrences. This also supports malformed html such as <iMg
var msg = `<p>Here is some text with an image</p><br>
<img src="data:base64;{ a base64 string }"/>
<iMg src="" />`
const n = (msg.match(/<img/gi) || []).length // "|| []" avoids a TypeError when nothing matches
console.log(n) // 2
CertainPerformance's answer works, however, you can also use javascript's built-in match function (because it looks like you are looking for a regex-type solution):
var str = '<p>Here is some text with an image</p><br><img src="data:base64;{ a base64 string }"/>';
var numMatches = str.match(/<img/igm);
if (numMatches != null) {
numMatches = numMatches.length;
} else {
numMatches = 0;
}
//The string "<img" is needed because a text can have the string "img", but angle brackets are specially reserved for HTML tags. Also, this prevents the matching of </img>, in case there is a closing tag (though there typically isn't)
console.log(numMatches);
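The regex approach from the two answers above can be wrapped in a small function. This is a sketch under assumptions: countImgTags is a hypothetical name, and the added \b is a small tightening that is not in the original answers, used so that a tag name which merely starts with "img" is not counted. Like any regex over HTML, it will still count "<img" inside comments or attribute values, which DOMParser would not.

```javascript
// Hypothetical helper: counts opening <img ...> tags in an HTML string.
function countImgTags(html) {
  // The i flag accepts "<IMG" or "<iMg"; \b rejects names such as "<imgx"
  const matches = html.match(/<img\b/gi);
  // match() returns null, not an empty array, when nothing is found
  return matches === null ? 0 : matches.length;
}

const sample = '<p>text</p><br><img src="a.png"/><iMg src="" /><imgx>';
console.log(countImgTags(sample));             // 2
console.log(countImgTags('<p>no images</p>')); // 0
```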

How to display special html characters properly via javascript

I'm using JavaScript to read some ASP.NET server variables and display them. The problem is that if they contain HTML special characters, the string is not assigned as it is on the server and displays incorrectly.
For example the string :
`ALBERTO GÓMEZ SÁNCHEZ`
is displaying like
`ALBERTO GóMEZ SáNCHEZ`
I know I could use a Replace function but doing that for every possible special html character seems too time consuming... I guess there must be some built-in function that solves that easily but I cannot find it or an easier method than trying to replace every possible html special character.
Do you know any way? Thanks for your help.
If you want to decode an HTML string, do it this way:
// Scratch element, created once and reused on every call
var element = document.createElement('div');
function decodeHTMLEntities (str) {
  if (str && typeof str === 'string') {
    // strip script/html tags
    str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
    str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
    // let the browser decode the entities
    element.innerHTML = str;
    str = element.textContent;
    element.textContent = '';
  }
  return str;
}
Taken from here: HTML Entity Decode
If you want to put this HTML string into your DOM, you don't need to decode it; the browser will do this job for you.
Just insert it like this:
$("body").html(encodedHtmlStringFromServer);

Is there any way for me to work with this 100,000 item new-line separated string of words?

I've got a 100,000+ long list of English words in plain text. I want to use split() to convert the list into an array, which I can then convert to an associative array, giving each list item a key equal to its own name, so I can very efficiently check whether or not a string is an English word.
Here's the problem:
The list is new-line separated.
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
This means that var list = ' <copy/paste list> ' isn't going to work, because JavaScript quotes don't work multi-line.
Is there any way for me to work with this 100,000 item new-line separated string?
Replace the newlines with commas in any text editor before copying the list into your JS file.
One workaround would be to paste the list into Notepad++, then select all and choose Edit > Line Operations > Join Lines.
This removes new lines and replaces them with spaces.
If you're doing this client side, you can use jQuery's get function to get the words from a text file and do the processing there:
jQuery.get('wordlist.txt', function(results){
//Do your processing on results here
});
If you're doing this in Node.js, follow the guide here to see how to read a file into memory.
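However the text is loaded, the processing step can be the same. Below is a minimal sketch of turning the raw file contents into the kind of lookup object the question describes; buildWordLookup is a hypothetical name, and the input string stands in for the contents of wordlist.txt.

```javascript
// Hypothetical helper: builds a word -> true lookup from newline-separated text.
function buildWordLookup(text) {
  // Object.create(null) has no inherited keys, so words such as
  // "constructor" or "toString" are not reported as present by accident
  const lookup = Object.create(null);
  for (const line of text.split(/\r?\n/)) { // handles both \n and \r\n endings
    const word = line.trim();
    if (word !== '') {
      lookup[word] = true;
    }
  }
  return lookup;
}

const lookup = buildWordLookup('aa\naah\r\naahed\n');
console.log(lookup['aah'] === true);  // true
console.log('constructor' in lookup); // false
```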
You can use notepad++ or any semi-advanced text editor.
Go to notepad++ and push Ctrl+H to bring up the Replace dialog.
Towards the bottom, select the "Extended" Search Mode
You want to find "\r\n" and replace it with ", "
This will remove the newlines and replace them with commas.
This addresses the problem purely in terms of having a string and trying to work with it in JavaScript through copy and paste, specifically the points "This means that var list = ' ' isn't going to work, because JavaScript quotes don't work multi-line." and "Is there any way for me to work with this 100,000 item new-line separated string?".
You can place the string inside a comment in a JavaScript function and read it back from the function's source text. Although counter-intuitive, this is an interesting approach. Here is the main function:
function convertComment(c) {
return c.toString().
replace(/^[^\/]+\/\*!?/, '').
replace(/\*\/[^\/]+$/, '');
}
It can be used in your situation as follows:
var s = convertComment(function() {
/*
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
*/
});
At which point you may do whatever you like with s. The demo simply places it into a div for displaying.
jsFiddle Demo
Further, here is an example of taking the list of words, getting them into an array, and then referencing a single word in the array.
//previously shown code
var all = s.match(/[^\r\n]+/g);
var rand = Math.floor(Math.random() * all.length);
document.getElementById("random").innerHTML = "Random index #"+rand+": "+all[rand];
If the words are in a separate file, you can load them directly into the page and go from there. I've used a script element with a MIME type that should mean browsers ignore the content (provided it's in the head):
<script type="text/plain" id="wordlist">
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
</script>
<script>
var words = (function() {
var words = '\n' + document.getElementById('wordlist').textContent + '\n';
return {
checkWord: function (word) {
return words.indexOf('\n' + word + '\n') != -1;
}
}
}());
console.log(words.checkWord('aaliis')); // true
console.log(words.checkWord('ahh')); // false
</script>
The result is an object with one method, checkWord, that has access to the word list in a closure. You could add more methods like addWord or addVariant, whatever.
Note that textContent may not be supported in all browsers, you may need to feature detect and use innerText or an alternative for some.
For variety, another solution is to put the unaltered content into
A data attribute - HTML attributes can contain newlines
or a "non-script" script - eg. <SCRIPT TYPE="text/x-wordlist">
or an HTML comment node
or another hidden element that allows content
Then the content could be read and split/parsed. Since this would be done outside of JavaScript's string literal parsing it doesn't have the issue regarding embedded newlines.
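In environments that support ES2015, a template literal is another option, since backtick strings may span multiple lines. The sketch below assumes such an environment and uses a shortened list; rawList and isEnglishWord are hypothetical names, and a Set stands in for the associative array from the question.

```javascript
// Backtick (template literal) strings can contain literal newlines
const rawList = `aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals`;

// Split on either line-ending style and drop blank lines
const wordSet = new Set(
  rawList.split(/\r?\n/).map(w => w.trim()).filter(w => w.length > 0)
);

// Hypothetical helper: constant-time membership test
function isEnglishWord(candidate) {
  return wordSet.has(candidate.toLowerCase());
}

console.log(isEnglishWord('aaliis')); // true
console.log(isEnglishWord('ahh'));    // false
```

One caveat: a word list containing a backtick or the sequence ${ would need escaping inside a template literal.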

Removing non-break-spaces in JavaScript

I am having trouble removing spaces from a string. First I am converting the div to text with text() to remove the tags (which works), and then I'm trying to remove the "&nbsp;" part of the string, but it won't work. Any idea what I'm doing wrong?
newStr = $('#myDiv').text();
newStr = newStr.replace(/&nbsp;/g, '');
$('#myText').val(newStr);
<html>
<div id = "myDiv"><p>remove&nbsp;space</p></div>
<input type = "text" id = "myText" />
</html>
When you use the text function, you're not getting HTML but text: the &nbsp; entities have already been converted to (non-breaking) space characters.
So simply replace spaces:
var str = " a     b   ", // bunch of NBSPs
newStr = str.replace(/\s/g,'');
console.log(newStr)
If you want to replace only the spaces coming from &nbsp;, do the replacement before the conversion to text:
newStr = $($('#myDiv').html().replace(/&nbsp;/g,'')).text();
.text()/textContent do not contain HTML entities (such as &nbsp;); these are returned as literal characters. Here's a regular expression using the non-breaking space Unicode escape sequence:
var newStr = $('#myDiv').text().replace(/\u00A0/g, '');
$('#myText').val(newStr);
It is also possible to use a literal non-breaking space character instead of the escape sequence in the Regex, however I find the escape sequence more clear in this case. Nothing that a comment wouldn't solve, though.
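The difference between the two replacements discussed here can be seen on a plain string, without any DOM. This is a small sketch; the sample text is made up for illustration.

```javascript
// \u00A0 is the character that &nbsp; becomes once HTML has been parsed
const text = 'remove\u00A0space and more';

// Removes only non-breaking spaces; ordinary spaces are kept
const withoutNbsp = text.replace(/\u00A0/g, '');

// \s matches all whitespace, and in JavaScript that includes U+00A0
const withoutAnySpace = text.replace(/\s/g, '');

console.log(withoutNbsp);     // "removespace and more"
console.log(withoutAnySpace); // "removespaceandmore"
```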
It is also possible to use .html()/innerHTML to retrieve the HTML containing HTML entities, as in @Dystroy's answer.
Below is my original answer, where I misinterpreted the OP's use case. I'll leave it here in case anyone needs to remove &nbsp; from DOM elements' text content.
[...] However, be aware that re-setting the .html()/innerHTML of an element means trashing out all of the listeners and data associated with it.
So here's a recursive solution that only alters the text content of text nodes, without reparsing HTML nor any side effects.
function removeNbsp($el) {
$el.contents().each(function() {
if (this.nodeType === 3) {
this.nodeValue = this.nodeValue.replace(/\u00A0/g, '');
} else {
removeNbsp( $(this) );
}
});
}
removeNbsp( $('#myDiv') );

Remove and extract text in javascript

I'm wanting to do the following in JavaScript as efficiently as possible:
Remove <ul></ul> tags from a string and everything in between.
For what remains, every string that is encased within <li> and </li> I want dumped in an array, without any newline characters lurking at the end.
I'm thinking regexes are the answer but I've never used them before. Guess I could figure out a way but eventually it would probably not be the most efficient.
As others have said, you do have to be careful parsing HTML with regexes. If the HTML is controlled (e.g. it comes from a known source in a known format), has no nested ul or li tags, and has no embedded strings containing valid HTML tags or < or > characters, it can work fine. Here's one way to do what I think you were asking for:
function parseList(str) {
var output = [], matches;
var re = /<\s*li[^>]*>(.*?)<\/li>/gi;
// remove newlines
str = str.replace(/\n|\r/igm, "");
// get text between ul tags
matches = str.match(/<\s*ul[^>]*>(.*?)<\/ul\s*>/);
if (matches) {
str = matches[1];
// get text between each li tag
while (matches = re.exec(str)) {
output.push(matches[1]);
}
}
return(output);
}
It is more foolproof to use an actual HTML parser that understands the finer points of the format (like nested tags, tag values in embedded strings, etc...), but if you have none of that, a simpler parser like this can be used.
You can see it work here: http://jsfiddle.net/jfriend00/c9ZLT/
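For reference, here is a compact variant of the same idea with a usage example. It is a sketch rather than the answer's exact code: extractListItems is a hypothetical name, it relies on String.prototype.matchAll (ES2020), and it uses [\s\S] instead of stripping newlines first. The same restrictions apply: controlled HTML with no nested lists.

```javascript
// Hypothetical helper: returns the trimmed contents of each <li> in the first <ul>.
function extractListItems(html) {
  // [\s\S]*? matches across newlines, so no pre-cleaning of the input is needed
  const list = /<ul\b[^>]*>([\s\S]*?)<\/ul\s*>/i.exec(html);
  if (list === null) {
    return [];
  }
  const items = [];
  for (const m of list[1].matchAll(/<li\b[^>]*>([\s\S]*?)<\/li\s*>/gi)) {
    items.push(m[1].trim()); // trim drops the trailing newline characters
  }
  return items;
}

const sample = '<ul>\n<li>first\n</li>\n<li class="x"> second </li>\n</ul>';
console.log(extractListItems(sample)); // [ 'first', 'second' ]
```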
