Javascript Apostrophe Decoding


I have a JavaScript string like this one:
var message = "merci d&#39;ajouter";
And I want to decode it into this:
var result = "merci d'ajouter";
I don't want to use a replace method; I want a general JavaScript solution that works for every encoded character. Thanks in advance.

This is actually possible in native JavaScript.
Keep in mind that IE8 and earlier do not support textContent, so we will have to use innerText for them.
function decode(string) {
var div = document.createElement("div");
div.innerHTML = string;
return typeof div.textContent !== 'undefined' ? div.textContent : div.innerText;
}
var testString = document.getElementById("test-string");
var decodeButton = document.getElementById("decode-button");
var decodedString = document.getElementById("decoded-string");
var encodedString = "merci d&#39;ajouter";
decodeButton.addEventListener("click", function() {
decodedString.innerHTML = decode(encodedString);
});
<h1>Decode this html</h1>
<p id="test-string"></p>
<input type=button id="decode-button" value="Decode HTML"/>
<p id="decoded-string"></p>
An easier solution would be to use the Underscore.js library. This is a fantastic library that provides you with a lot of additional functionality.
Underscore provides an _.unescape(string) function:
The opposite of escape, replaces &amp;, &lt;, &gt;, &quot;, &#x60; and &#x27; with their unescaped counterparts.
_.unescape('Zebras, Elephants &amp; Penguins');
=> "Zebras, Elephants & Penguins"

Related

Javascript: Convert chars like ' to UTF8 [duplicate]

How do I encode and decode HTML entities using JavaScript or jQuery?
var varTitle = "Chris&apos; corner";
I want it to be:
var varTitle = "Chris' corner";

I recommend against using the jQuery code that was accepted as the answer. While it does not insert the string to decode into the page, it does cause things such as scripts and HTML elements to get created. This is way more code than we need. Instead, I suggest using a safer, more optimized function.
var decodeEntities = (function() {
// this prevents any overhead from creating the object each time
var element = document.createElement('div');
function decodeHTMLEntities (str) {
if(str && typeof str === 'string') {
// strip script/html tags
str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
element.innerHTML = str;
str = element.textContent;
element.textContent = '';
}
return str;
}
return decodeHTMLEntities;
})();
http://jsfiddle.net/LYteC/4/
To use this function, just call decodeEntities("&amp;") and it will use the same underlying techniques as the jQuery version, but without jQuery's overhead and after sanitizing the HTML tags in the input. See Mike Samuel's comment on the accepted answer for how to filter out HTML tags.
This function can be easily used as a jQuery plugin by adding the following line in your project.
jQuery.decodeEntities = decodeEntities;

You could try something like:
var Title = $('<textarea />').html("Chris&apos; corner").text();
console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
A more interactive version:
$('form').submit(function() {
var theString = $('#string').val();
var varTitle = $('<textarea />').html(theString).text();
$('#output').text(varTitle);
return false;
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<form action="#" method="post">
<fieldset>
<label for="string">Enter a html-encoded string to decode</label>
<input type="text" name="string" id="string" />
</fieldset>
<fieldset>
<input type="submit" value="decode" />
</fieldset>
</form>
<div id="output"></div>

Like Robert K said, don't use jQuery.html().text() to decode html entities as it's unsafe because user input should never have access to the DOM. Read about XSS for why this is unsafe.
Instead try the Underscore.js utility-belt library which comes with escape and unescape methods:
_.escape(string)
Escapes a string for insertion into HTML, replacing &, <, >, ", `, and ' characters.
_.escape('Curly, Larry & Moe');
=> "Curly, Larry &amp; Moe"
_.unescape(string)
The opposite of escape, replaces &amp;, &lt;, &gt;, &quot;, &#x60; and &#x27; with their unescaped counterparts.
_.unescape('Curly, Larry &amp; Moe');
=> "Curly, Larry & Moe"
To support decoding more characters, just copy the Underscore unescape method and add more characters to the map.
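A rough sketch of that "extend the map" idea. This is a hand-written map, not Underscore's actual source, and the names unescapeMap, unescapePattern and unescapeEntities are made up for illustration:

```javascript
// Hypothetical, hand-written entity map (not Underscore's own source).
// Add more entries here to support more characters.
const unescapeMap = new Map([
  ["&amp;", "&"],
  ["&lt;", "<"],
  ["&gt;", ">"],
  ["&quot;", '"'],
  ["&#x27;", "'"],
  ["&#39;", "'"],
  ["&apos;", "'"],
  ["&#x60;", "`"],
  ["&nbsp;", "\u00A0"],
]);

// Build one alternation regex from the map keys
// (none of these keys contain regex metacharacters).
const unescapePattern = new RegExp(Array.from(unescapeMap.keys()).join("|"), "g");

function unescapeEntities(text) {
  // A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return String(text).replace(unescapePattern, (entity) => unescapeMap.get(entity));
}

console.log(unescapeEntities("Chris&apos; corner"));
```

Because nothing is handed to the HTML parser, no markup in the input can execute; the trade-off is that only the entities listed in the map are decoded.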

Original author answer here.
This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.
function decodeHtml(html) {
var txt = document.createElement("textarea");
txt.innerHTML = html;
return txt.value;
}
Example: http://jsfiddle.net/k65s3/
Input:
Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Output:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Here's a quick method that doesn't require creating a div, and decodes the "most common" HTML escaped chars:
function decodeHTMLEntities(text) {
var entities = [
['amp', '&'],
['apos', '\''],
['#x27', '\''],
['#x2F', '/'],
['#39', '\''],
['#47', '/'],
['lt', '<'],
['gt', '>'],
['nbsp', ' '],
['quot', '"']
];
for (var i = 0, max = entities.length; i < max; ++i)
text = text.replace(new RegExp('&'+entities[i][0]+';', 'g'), entities[i][1]);
return text;
}
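The numeric entries in a table like this (#39, #x27, ...) don't have to be listed one by one. Here is a minimal sketch that decodes any decimal or hexadecimal character reference without touching the DOM. The function name decodeNumericEntities is hypothetical, and edge cases such as &#0; are not remapped the way a browser's parser would remap them:

```javascript
// Hypothetical helper: decodes numeric character references such as
// &#39; (decimal) and &#x27; (hexadecimal) without creating any element.
function decodeNumericEntities(text) {
  return String(text).replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const codePoint = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    // Leave out-of-range values untouched instead of throwing a RangeError.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities("merci d&#39;ajouter")); // → "merci d'ajouter"
```

Named entities such as &amp; are deliberately left alone here, so this would be combined with a lookup table for those.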

Here is another version:
function convertHTMLEntity(text){
const span = document.createElement('span');
return text
.replace(/&[#A-Za-z0-9]+;/gi, (entity,position,text)=> {
span.innerHTML = entity;
return span.innerText;
});
}
console.log(convertHTMLEntity('Large &lt; &#163; 500'));

Inspired by Robert K's solution, this version does not strip HTML tags, and is just as secure.
var decode_entities = (function() {
// Remove HTML Entities
var element = document.createElement('div');
function decode_HTML_entities (str) {
if(str && typeof str === 'string') {
// Escape HTML before decoding for HTML Entities
str = escape(str).replace(/%26/g,'&').replace(/%23/g,'#').replace(/%3B/g,';');
element.innerHTML = str;
if(element.innerText){
str = element.innerText;
element.innerText = '';
}else{
// Firefox support
str = element.textContent;
element.textContent = '';
}
}
return unescape(str);
}
return decode_HTML_entities;
})();

jQuery provides a way to encode and decode html entities.
If you use a "<div/>" tag, it will strip out all the html.
function htmlDecode(value) {
return $("<div/>").html(value).text();
}
function htmlEncode(value) {
return $('<div/>').text(value).html();
}
If you use a "<textarea/>" tag, it will preserve the html tags.
function htmlDecode(value) {
return $("<textarea/>").html(value).text();
}
function htmlEncode(value) {
return $('<textarea/>').text(value).html();
}

To add yet another "inspired by Robert K" to the list, here is another safe version which does not strip HTML tags. Instead of running the whole string through the HTML parser, it pulls out only the entities and converts those.
var decodeEntities = (function() {
// this prevents any overhead from creating the object each time
var element = document.createElement('div');
// regular expression matching HTML entities
var entity = /&(?:#x[a-f0-9]+|#[0-9]+|[a-z0-9]+);?/ig;
return function decodeHTMLEntities(str) {
// find and replace all the html entities
str = str.replace(entity, function(m) {
element.innerHTML = m;
return element.textContent;
});
// reset the value
element.textContent = '';
return str;
}
})();

Inspired by Robert K's solution, this version strips HTML tags and prevents scripts and event handlers from executing, for example: <img src=fake onerror="prompt(1)">
Tested on latest Chrome, FF, IE (should work from IE9, but haven't tested).
var decodeEntities = (function () {
//create a new html document (doesn't execute script tags in child elements)
var doc = document.implementation.createHTMLDocument("");
var element = doc.createElement('div');
function getText(str) {
element.innerHTML = str;
str = element.textContent;
element.textContent = '';
return str;
}
function decodeHTMLEntities(str) {
if (str && typeof str === 'string') {
var x = getText(str);
while (str !== x) {
str = x;
x = getText(x);
}
return x;
}
}
return decodeHTMLEntities;
})();
Simply call:
decodeEntities('<img src=fake onerror="prompt(1)">');
decodeEntities("<script>alert('aaa!')</script>");

Here is a full version
function htmldecode(s){
window.HTML_ESC_MAP = {
"nbsp":"\u00A0","iexcl":"¡","cent":"¢","pound":"£","curren":"¤","yen":"¥","brvbar":"¦","sect":"§","uml":"¨","copy":"©","ordf":"ª","laquo":"«","not":"¬","reg":"®","macr":"¯","deg":"°","plusmn":"±","sup2":"²","sup3":"³","acute":"´","micro":"µ","para":"¶","middot":"·","cedil":"¸","sup1":"¹","ordm":"º","raquo":"»","frac14":"¼","frac12":"½","frac34":"¾","iquest":"¿",
"Agrave":"À","Aacute":"Á","Acirc":"Â","Atilde":"Ã","Auml":"Ä","Aring":"Å","AElig":"Æ","Ccedil":"Ç","Egrave":"È","Eacute":"É","Ecirc":"Ê","Euml":"Ë","Igrave":"Ì","Iacute":"Í","Icirc":"Î","Iuml":"Ï","ETH":"Ð","Ntilde":"Ñ","Ograve":"Ò","Oacute":"Ó","Ocirc":"Ô","Otilde":"Õ","Ouml":"Ö","times":"×","Oslash":"Ø","Ugrave":"Ù","Uacute":"Ú","Ucirc":"Û","Uuml":"Ü","Yacute":"Ý","THORN":"Þ","szlig":"ß",
"agrave":"à","aacute":"á","acirc":"â","atilde":"ã","auml":"ä","aring":"å","aelig":"æ","ccedil":"ç","egrave":"è","eacute":"é","ecirc":"ê","euml":"ë","igrave":"ì","iacute":"í","icirc":"î","iuml":"ï","eth":"ð","ntilde":"ñ","ograve":"ò","oacute":"ó","ocirc":"ô","otilde":"õ","ouml":"ö","divide":"÷","oslash":"ø","ugrave":"ù","uacute":"ú","ucirc":"û","uuml":"ü","yacute":"ý","thorn":"þ","yuml":"ÿ","fnof":"ƒ",
"Alpha":"Α","Beta":"Β","Gamma":"Γ","Delta":"Δ","Epsilon":"Ε","Zeta":"Ζ","Eta":"Η","Theta":"Θ","Iota":"Ι","Kappa":"Κ","Lambda":"Λ","Mu":"Μ","Nu":"Ν","Xi":"Ξ","Omicron":"Ο","Pi":"Π","Rho":"Ρ","Sigma":"Σ","Tau":"Τ","Upsilon":"Υ","Phi":"Φ","Chi":"Χ","Psi":"Ψ","Omega":"Ω",
"alpha":"α","beta":"β","gamma":"γ","delta":"δ","epsilon":"ε","zeta":"ζ","eta":"η","theta":"θ","iota":"ι","kappa":"κ","lambda":"λ","mu":"μ","nu":"ν","xi":"ξ","omicron":"ο","pi":"π","rho":"ρ","sigmaf":"ς","sigma":"σ","tau":"τ","upsilon":"υ","phi":"φ","chi":"χ","psi":"ψ","omega":"ω","thetasym":"ϑ","upsih":"ϒ","piv":"ϖ",
"bull":"•","hellip":"…","prime":"′","Prime":"″","oline":"‾","frasl":"⁄","weierp":"℘","image":"ℑ","real":"ℜ","trade":"™","alefsym":"ℵ","larr":"←","uarr":"↑","rarr":"→","darr":"↓","harr":"↔","crarr":"↵","lArr":"⇐","uArr":"⇑","rArr":"⇒","dArr":"⇓","hArr":"⇔",
"forall":"∀","part":"∂","exist":"∃","empty":"∅","nabla":"∇","isin":"∈","notin":"∉","ni":"∋","prod":"∏","sum":"∑","minus":"−","lowast":"∗","radic":"√","prop":"∝","infin":"∞","ang":"∠","and":"∧","or":"∨","cap":"∩","cup":"∪","int":"∫","there4":"∴","sim":"∼","cong":"≅","asymp":"≈","ne":"≠","equiv":"≡","le":"≤","ge":"≥","sub":"⊂","sup":"⊃","nsub":"⊄","sube":"⊆","supe":"⊇","oplus":"⊕","otimes":"⊗","perp":"⊥","sdot":"⋅",
"lceil":"⌈","rceil":"⌉","lfloor":"⌊","rfloor":"⌋","lang":"〈","rang":"〉","loz":"◊","spades":"♠","clubs":"♣","hearts":"♥","diams":"♦","quot":"\"","amp":"&","lt":"<","gt":">","OElig":"Œ","oelig":"œ","Scaron":"Š","scaron":"š","Yuml":"Ÿ","circ":"ˆ","tilde":"˜","ndash":"–","mdash":"—","lsquo":"‘","rsquo":"’","sbquo":"‚","ldquo":"“","rdquo":"”","bdquo":"„","dagger":"†","Dagger":"‡","permil":"‰","lsaquo":"‹","rsaquo":"›","euro":"€"};
if(!window.HTML_ESC_MAP_EXP)
window.HTML_ESC_MAP_EXP = new RegExp("&("+Object.keys(HTML_ESC_MAP).join("|")+");","g");
return s?s.replace(window.HTML_ESC_MAP_EXP,function(x){
return HTML_ESC_MAP[x.substring(1,x.length-1)]||x;
}):s;
}
Usage
htmldecode("&sum;&nbsp;&gt;&euro;");

Injecting untrusted HTML into the page is dangerous as explained in How to decode HTML entities using jQuery?.
One alternative is to use a JavaScript-only implementation of PHP's html_entity_decode (from http://phpjs.org/functions/html_entity_decode:424). The example would then be something like:
var varTitle = html_entity_decode("Chris&apos; corner");

A more functional approach to @William Lahti's answer:
var entities = {
'amp': '&',
'apos': '\'',
'#x27': '\'',
'#x2F': '/',
'#39': '\'',
'#47': '/',
'lt': '<',
'gt': '>',
'nbsp': ' ',
'quot': '"'
}
function decodeHTMLEntities (text) {
return text.replace(/&([^;]+);/gm, function (match, entity) {
return entities[entity] || match
})
}

I know I'm a bit late to the game, but I thought I might provide the following snippet as an example of how I decode HTML entities using jQuery:
var varTitleE = "Chris&apos; corner";
var varTitleD = $("<div/>").html(varTitleE).text();
console.log(varTitleE + " vs. " + varTitleD);
Don't forget to fire-up your inspector/firebug to see the console results -- or simply replace console.log(...) w/alert(...)
That said, here's what my console via the Google Chrome inspector read:
Chris&apos; corner vs. Chris' corner

Because @Robert K and @mattcasey both have good code, I thought I'd contribute here with a CoffeeScript version, in case anyone in the future could use it:
String::unescape = (strict = false) ->
  ###
  # Take escaped text, and return the unescaped version
  #
  # @param string str | String to be used
  # @param bool strict | Strict mode will remove all HTML
  #
  # Test it here:
  # https://jsfiddle.net/tigerhawkvok/t9pn1dn5/
  #
  # Code: https://gist.github.com/tigerhawkvok/285b8631ed6ebef4446d
  ###
  # Create a dummy element
  element = document.createElement("div")
  decodeHTMLEntities = (str) ->
    if str? and typeof str is "string"
      unless strict is true
        # escape HTML tags
        str = escape(str).replace(/%26/g,'&').replace(/%23/g,'#').replace(/%3B/g,';')
      else
        str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '')
        str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '')
      element.innerHTML = str
      if element.innerText
        # Do we support innerText?
        str = element.innerText
        element.innerText = ""
      else
        # Firefox
        str = element.textContent
        element.textContent = ""
    unescape(str)
  # Remove encoded or double-encoded tags
  fixHtmlEncodings = (string) ->
    string = string.replace(/\&amp;#/mg, '&#') # The rest, for double-encodings
    string = string.replace(/\&quot;/mg, '"')
    string = string.replace(/\&quote;/mg, '"')
    string = string.replace(/\&#95;/mg, '_')
    string = string.replace(/\&#39;/mg, "'")
    string = string.replace(/\&#34;/mg, '"')
    string = string.replace(/\&#62;/mg, '>')
    string = string.replace(/\&#60;/mg, '<')
    string
  # Run it
  tmp = fixHtmlEncodings(this)
  decodeHTMLEntities(tmp)
See https://jsfiddle.net/tigerhawkvok/t9pn1dn5/7/ or https://gist.github.com/tigerhawkvok/285b8631ed6ebef4446d (includes compiled JS, and is probably updated compared to this answer)

To do it in pure JavaScript without jQuery or predefining everything, you can cycle the encoded HTML string through an element's innerHTML and innerText (or textContent) properties for every decode step that is required:
<html>
<head>
<title>For every decode step, cycle through innerHTML and innerText </title>
<script>
function decode(str) {
var d = document.createElement("div");
d.innerHTML = str;
return typeof d.innerText !== 'undefined' ? d.innerText : d.textContent;
}
</script>
</head>
<body>
<script>
var encodedString = "&lt;p&gt;name&lt;/p&gt;&lt;p&gt;&lt;span style=\"font-size:xx-small;\"&gt;ajde&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;da&lt;/em&gt;&lt;/p&gt;";
</script>
<input type=button onclick="document.body.innerHTML=decode(encodedString)"/>
</body>
</html>

I think that is the exact opposite of the solution chosen.
var decoded = $("<div/>").text(encodedStr).html();
Try it :)

Javascript string replace() method malfunctioning for dots and commas

I want to replace a text by using the user input values but for the below script dots and commas are malfunctioning when replacing. I tried (/\x/) method but it's not working, maybe because it's a value. So, how can I execute output more accurately?
function myFunction() {
var str = document.getElementById("text").value;
var x = new RegExp(document.getElementById("x").value, "g");
var y = document.getElementById("y").value;
var txt = str.replace(x, y);
document.getElementById("newText").innerHTML = txt;
}
function reset() {
document.getElementById("text").value = "";
}
example:
text = ..........a.a.a..a..a..aaaaaa..a.a.
x = ..a
y = B
output = ........B.BBBBBaaB.a.
but output should be
........B.a.aBBBaaaaaB.a.
(Sorry for the unprofessional example...)
I am just now learning JS and not a professional and I'm trying to make a replacer web page using JS like in MS Notepad where you can press ctrl+H and replace any word or letter.

You're looking for RegExp.escape. Unluckily for you, the smart people at the JavaScript technical committee decided to postpone its inclusion in the standard because of an edge case that you, or anyone else, will likely never run into. In the meantime you can define it yourself:
if(!RegExp.escape){
RegExp.escape = function(s){
return String(s).replace(/[\\^$*+?.()|[\]{}]/g, '\\$&');
};
}
Then, you can call it on a value and it'll escape it for use in new RegExp:
var raw = document.getElementById("x").value;
var x = new RegExp(RegExp.escape(raw), "g");
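To check this against the example in the question, here is a small sketch that runs outside the page. The helper names escapeRegExp and replaceAllLiteral are made up for illustration, and the inputs are hard-coded instead of being read from the form fields:

```javascript
// Hypothetical helper using the same character class as the polyfill above.
function escapeRegExp(source) {
  return String(source).replace(/[\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Replaces every literal occurrence of `search` in `text`.
function replaceAllLiteral(text, search, replacement) {
  // A function replacement keeps "$&", "$1", ... in `replacement` from being interpreted.
  return text.replace(new RegExp(escapeRegExp(search), "g"), () => replacement);
}

const text = "..........a.a.a..a..a..aaaaaa..a.a.";
console.log(replaceAllLiteral(text, "..a", "B"));
// → "........B.a.aBBBaaaaaB.a."
```

Without the escaping step, each dot in "..a" matches any character, which is where the unexpected output in the question comes from.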

You want this regex: [.]{2}[a] or [.][.][a].
It requires two literal dots followed by an a, so each match is exactly 3 characters long.

Javascript replace tag but preserve content

Say I have a text like this:
This should also be extracted, <strong>text</strong>
I need the text only from the entire string, I have tried this:
r = r.replace(/<strong[\s\S]*?>[\s\S]*?<\/strong>/g, "$1"); but failed (strong is still there). Is there any proper way to do this?
Expected Result
This should also be extracted, text
Solution:
To target specific tag I used this:
r = r.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/i, "**$1**")
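For the expected result in the question (tag removed, text kept, no asterisks), the missing piece in the first attempt is a capture group for "$1" to refer to. A minimal sketch, assuming the input never contains nested or malformed <strong> tags; the function name stripStrong is hypothetical:

```javascript
// Hypothetical helper: unwraps <strong>...</strong>, keeping the inner text.
function stripStrong(html) {
  // ([\s\S]*?) is the capture group that "$1" refers to;
  // the g flag handles every occurrence, the i flag handles <STRONG>.
  return html.replace(/<strong\b[^>]*>([\s\S]*?)<\/strong>/gi, "$1");
}

console.log(stripStrong("This should also be extracted, <strong>text</strong>"));
// → "This should also be extracted, text"
```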

To parse HTML, you need an HTML parser. See this answer for why.
If you just want to remove <strong> and </strong> from the text, you don't need parsing, but of course simplistic solutions tend to fail, which is why you need an HTML parser to parse HTML. Here's a simplistic solution that removes <strong> and </strong>:
str = str.replace(/<\/?strong>/g, "")
var yourString = "This should also be extracted, <strong>text</strong>";
yourString = yourString.replace(/<\/?strong>/g, "")
display(yourString);
function display(msg) {
// Show a message, making sure any HTML tags show
// as text
var p = document.createElement('p');
p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
document.body.appendChild(p);
}
Back to parsing: In your case, you can easily do it with the browser's parser, if you're on a browser:
var yourString = "This should also be extracted, <strong>text</strong>";
var div = document.createElement('div');
div.innerHTML = yourString;
display(div.innerText || div.textContent);
function display(msg) {
// Show a message, making sure any HTML tags show
// as text
var p = document.createElement('p');
p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
document.body.appendChild(p);
}
Most browsers provide innerText; Firefox provides textContent, which is why there's that || there.
In a non-browser environment, you'll want some kind of DOM library (there are lots of them).

You can do this
var r = "This should also be extracted, <strong>text</strong>";
r = r.replace(/<(.+?)>([^<]+)<\/\1>/,"$2");
console.log(r);
I have just included some strict regex. But if you want relaxed version, you can very well do
r = r.replace(/<.+?>/g,"");

Is there any method to unescape '&gt;' to '>' in JavaScript?

I want to unescape some HTML in JavaScript. How can I do that?

I often use the following function to decode HTML Entities:
function htmlDecode(input){
var e = document.createElement('div');
e.innerHTML = input;
// an empty input produces no child nodes, so guard against that
return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}
htmlDecode('&lt;&gt;'); // "<>"
Simple, cross-browser and works with all the HTML 4 Character Entities.

You could create a dummy textarea, set its innerHTML to your escaped HTML [the HTML with &gt;s] and use the textarea.value
var ta = document.createElement('textarea');
ta.innerHTML = "&gt;";
alert(ta.value);
... had to use this on a CMS once [although when i used it, it was bad practice]

Regex: how to get contents from tag inner (use javascript)?

page contents:
aa<b>1;2'3</b>hh<b>aaa</b>..
.<b>bbb</b>
blabla..
I want to get this result:
1;2'3aaabbb
The tags to match are <b> and </b>.
How do I write this regex in JavaScript?
Thanks!

Lazyanno,
If and only if:
you have read SLaks's post (as well as the previous article he links to), and
you fully understand the numerous and wondrous ways in which extracting information from HTML using regular expressions can break, and
you are confident that none of the concerns apply in your case (e.g. you can guarantee that your input will never contain nested, mismatched etc. <b>/</b> tags or occurrences of <b> or </b> within <script>...</script> or comment <!-- .. --> tags, etc.)
you absolutely and positively want to proceed with regular expression extraction
...then use:
var str = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";
var match, result = "", regex = /<b>(.*?)<\/b>/ig;
while (match = regex.exec(str)) { result += match[1]; }
alert(result);
Produces:
1;2'3aaabbb
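On engines that support String.prototype.matchAll (ES2020), the same extraction can be written without the exec loop. This is only a sketch of an alternative, with the same caveats about regular expressions and HTML as above:

```javascript
const str = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";

// matchAll requires the g flag; index 1 of each match is the captured text.
const result = Array.from(str.matchAll(/<b>(.*?)<\/b>/gi), (m) => m[1]).join("");

console.log(result); // → "1;2'3aaabbb"
```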

You cannot parse HTML using regular expressions.
Instead, you should use Javascript's DOM.
For example (using jQuery):
var text = "";
$('<div>' + htmlSource + '</div>')
.find('b')
.each(function() { text += $(this).text(); });
I wrap the HTML in a <div> tag to find both nested and non-nested <b> elements.

Here is an example without a jQuery dependency:
// get all elements with a certain tag name
var b = document.getElementsByTagName("B");
// getElementsByTagName() returns an HTMLCollection, which has no map() method,
// so borrow Array.prototype.map to build a new array from the function results...
var text = Array.prototype.map.call(b, function(element) {
// ...in this case we are interested in the element text
if (typeof element.textContent != "undefined")
return element.textContent; // standards compliant browsers
else
return element.innerText; // IE
});
// now that we have an array of strings, we can join it
var result = text.join('');

var regex = /(<([^>]+)>)/ig;
var bdy="aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";
var result =bdy.replace(regex, "");
alert(result) ;
See : http://jsfiddle.net/abdennour/gJ64g/

If you want to use regular expressions, just add a '?' character after the quantifier in the pattern for your inner text, which makes it lazy.
For example:
".*" to "(.*?)"
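A quick way to see the difference, using a made-up input string rather than the one from the question:

```javascript
const html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the LAST </b> it can find.
const greedy = html.match(/<b>(.*)<\/b>/)[1];
// Lazy: .*? stops at the FIRST </b>.
const lazy = html.match(/<b>(.*?)<\/b>/)[1];

console.log(greedy); // → "one</b> and <b>two"
console.log(lazy);   // → "one"
```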
