Dumb quotes into smart quotes only for text not HTML code - javascript

I’m transforming dumb quotes into smart quotes in a contenteditable but the problem is that it also replaces them inside HTML elements like:
<a href=“something” title=“something”
thus making them invalid. I want to only do it for user’s text. Here’s the catch. I have to keep the original formatting elements, so I can’t do something like:
clean($('#something_container').text());
This would remove all HTML elements (formatting) when returned. Here’s the code that I have:
content = clean($('#post_content').html());
$('#post_content').html(content);
// replaces ", ', --, <div> with <p>
function clean(html) {
html = html.replace(/'\b/g, "\u2018") // opening singles
.replace(/\b'/g, "\u2019") // closing singles
.replace(/"\b/g, "\u201c") // opening doubles
.replace(/\b"/g, "\u201d") // closing doubles
.replace(/--/g, "\u2014") // em-dashes
.replace(/<div>/g, "<p>") //<div> to <p>
.replace(/<\/div>/g, "</p>"); //</div> to </p>
return html;
};
What would be the best (most efficient) way to replace dumb quotes only in user’s text and skip the HTML tags like <img src="" />? Thanks!

Here’s a possible approach (don’t know about efficiency, but if you only handle strings typed in by users by hand, they probably won’t be very long, so it shouldn’t matter):
1. split your string into non-overlapping chunks: HTML tags vs. the rest
2. “educate quotes” only in the non-tags, leaving the tags alone
3. put the string back together
If the HTML you’re dealing with is well-formed (in particular, if there’s no "<" floating around), the splitting into chunks is easy:
var html = '<p style="color:red">some "quotes" in here</p>'
var chunks = html.match(/(<.+?>|[^<]+)/g)
// returns Array: ['<p style="color:red">', 'some "quotes" in here', '</p>']
Then, given your clean() function that handles the replacements, you can say:
var cleaned = chunks.map(function(chunk){
return /</.test(chunk) ? chunk : clean(chunk)
}).join('');
to apply your replacements anywhere except between < and >.
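Putting the three steps together, a minimal sketch might look like the following. The names smartenText and smartenHtml are made up for this illustration; the replacements are the quote and dash rules from the question (the <div> to <p> rule is left out, since it targets tags rather than text). It assumes well-formed markup, with no stray < in the text and no > inside attribute values.

```javascript
// Hypothetical helper: applies the question's quote/dash rules to plain text only.
function smartenText(text) {
  return text
    .replace(/'\b/g, "\u2018")  // opening singles
    .replace(/\b'/g, "\u2019")  // closing singles
    .replace(/"\b/g, "\u201c")  // opening doubles
    .replace(/\b"/g, "\u201d")  // closing doubles
    .replace(/--/g, "\u2014");  // em-dashes
}

// Hypothetical helper: splits into tag / non-tag chunks and only touches non-tags.
function smartenHtml(html) {
  var chunks = html.match(/<[^>]*>|[^<]+/g) || []; // match() returns null for ""
  return chunks.map(function (chunk) {
    return chunk.charAt(0) === "<" ? chunk : smartenText(chunk);
  }).join("");
}

var result = smartenHtml('<p style="color:red">some "quotes" in here</p>');
console.log(result);
```

The attribute quotes inside the <p ...> tag are left as they were, while the quotes in the text chunk are converted.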


&#65279; appearing in textarea elements but not in string

I am working on an autocomplete used inside a textarea. I know some autocompletes already exist, but anyway.
It works well, but when I'm typing and I select one or more characters and delete them, a &#65279; appears at the end of my string (or wherever I was inside it). I tried to replace it with replaceAll while retrieving my HTML, but it doesn't work (indexOf doesn't find this special character either). The problem is that the search doesn't find any results because of this character. Here is an example:
This is my array (a little bit cut but we don't really care)
let array = [{
name: "test",
value: "I'm a test value"
},
{
name: "valueorange",
value: "I'm just an orange"
},
// This is how I get the contents of my span (I tried both innerHTML and innerText, same results).
// Same while using .text() or .html() with jquery
let value = jqElement.find("#searching-span")[0].innerHTML.substring(1).toLowerCase();
value = value.replaceAll("&nbsp;", " ");
value = value.replaceAll("&#65279;", "");
I can replace every &nbsp; without any problems. Finally, I loop over the values and call indexOf on each one; every value that matches is pushed into a new array. But when the string contains &#65279;, I get no results.
Any idea how I can resolve it ?
I tried to be clear, I hope my english wasn't so bad, sorry if I made many mistakes !
Character entities and HTML escaped characters like &nbsp; and &#65279; appearing in HTML source code are converted by the HTML parser into Unicode characters like \u00a0 and \ufeff before being inserted into the DOM.
If replacing them in JavaScript, use their unicode characters, not HTML escape sequences, to match them in DOM strings. For example:
p.textContent = p.textContent.replaceAll("\ufeff", '*'); // zero width no-break space (U+FEFF)
p.textContent = p.textContent.replaceAll("\xa0", '-'); // nbsp
<p id="p">   </p>
Note that zero width joiners (U+200D, a different character from U+FEFF) are used a lot in emoji character sequences, and arbitrarily removing them may break emoji character decoding (although decoding badly formed emoji strings is almost a prerequisite for handling emojis in the wild).
Second note: I am not suggesting this as a means of working around badly decoded characters that were encoded using a Unicode Transformation Format. Making sure decoding is performed correctly is always the better option.
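Applied to the search in the question, the idea could be sketched roughly as follows. normalizeSearchText is a hypothetical helper, and the typed string is only a simulation of what reading the span's text might return after an edit; the sample items are modelled on the array in the question.

```javascript
// Hypothetical helper: remove U+FEFF and turn no-break spaces into ordinary spaces.
function normalizeSearchText(text) {
  return text
    .replace(/\uFEFF/g, "")   // zero width no-break space
    .replace(/\u00A0/g, " ")  // no-break space (what &nbsp; becomes in the DOM)
    .toLowerCase();
}

// Sample data modelled on the array in the question.
var items = [
  { name: "test", value: "I'm a test value" },
  { name: "valueorange", value: "I'm just an orange" }
];

// Simulated text content containing a stray U+FEFF (assumption for the demo).
var typed = "Value\uFEFFOr";

var needle = normalizeSearchText(typed);
var matches = items.filter(function (item) {
  return item.name.indexOf(needle) !== -1;
});
console.log(needle, matches);
```

Matching on the actual characters rather than on the entity text is what makes the second replacement take effect.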

JavaScript with Special Character

I have an HTML page in which I need to pass a string variable to a JavaScript function. This works as long as the string does not contain a special character.
<html>
<head>
<script>
function test(v){
alert(v);
}
</script>
</head>
<body>
<input type="button" value="Test Button" onClick="test('BlahBlah')"/>
</body>
</html>
As soon as I change onClick like below, it stops working.
onClick="test('Blah'Blah')"
Is there any solution for this problem? Please note that the parameter being passed to the JavaScript function is dynamic. The source of the parameter is the backend, and I cannot change that piece of code. Also, even if I escape it, it still does not work. My problem is that I have to retain the special character for some processing at the backend.
There are two layers to this:
1. The content of onClick attributes, like all attributes, is HTML text. That means that any character that's special in HTML (like <) must be replaced with an HTML entity (e.g., &lt;). Additionally, if you use double quotes around the attribute value, any double quotes within the value must be replaced with entities (&quot;); if you used single quotes around the attribute, you'd need to replace ' with &apos;.
2. Your attribute contains a JavaScript string literal. That means that any characters that are special inside JavaScript string literals must be escaped according to the JavaScript rules. Since you've used single quotes to delimit the JavaScript string, for instance, you have to escape any single quotes in the string with a backslash.
I'm assuming that HTML is generated server-side. If so, the work above must be done server-side, when building the HTML of the page. You haven't said what server-side tech you're using, so it's hard to point you at solutions that your server-side tech/environment might provide.
In the simple case of your
onClick="test('Blah'Blah')"
...you just need to add the backslash within the JavaScript string
onClick="test('Blah\'Blah')"
...but that's just that one specific case.
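For the general case, both layers have to be handled in order: first the JavaScript string literal, then the HTML attribute. Since the server-side technology is unknown, here is a rough sketch of the idea written in JavaScript. escapeHtmlAttr and buildOnclickAttr are hypothetical names, and a real backend would normally use its own framework's escaping helpers instead.

```javascript
// Layer 1 (HTML): hypothetical helper that escapes characters special in attribute values.
function escapeHtmlAttr(s) {
  return s
    .replace(/&/g, "&amp;")   // must come first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: builds an onclick attribute that calls fnName with one string argument.
function buildOnclickAttr(fnName, arg) {
  // Layer 2 (JavaScript): JSON.stringify yields a double-quoted string literal
  // with quotes and backslashes already escaped.
  var jsCall = fnName + "(" + JSON.stringify(arg) + ")";
  return 'onclick="' + escapeHtmlAttr(jsCall) + '"';
}

var attr = buildOnclickAttr("test", "Blah'Blah");
console.log(attr);
```

The browser undoes the HTML layer when it parses the attribute, so the JavaScript engine ends up seeing test("Blah'Blah").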
The dramatically simpler option is to not put JavaScript code in attribute values. Instead, use modern techniques (addEventListener, or attachEvent on old versions of Internet Explorer) to hook up JavaScript code.
But if you must use an onClick attribute, avoid having text in it (or deal with the complexities above); have it call a function defined in a script element that then has the text, as you then have only the one layer (#2 above) to deal with.
"Source of Parameter is backend and I cannot change that piece of code."
That backend is broken and needs fixing.
If:
the backend is only producing invalid JavaScript code (not invalid HTML)
and the code consists of a single function call
and the code is always a single function call
and the function call always has a single string literal argument
and that argument is always delimited with single quotes
and the single quotes within the string are never correctly escaped
...we might be able to salvage it client-side. But my guess is that the backend will also produce invalid HTML, for instance when the text has a " in it. (We can't do anything about that, because the attribute value will be chopped off at that point.)
But let's keep a good thought: Given the ridiculous list of caveats above, this might do it:
var elm = document.getElementById("the-div");
var code = elm.getAttribute("onclick");
var m = code.match(/^([^(]+)\('(.*)'\)$/);
if (m) {
code = m[1] + "('" + m[2].replace(/'/g, "\\'") + "')";
}
elm.setAttribute("onclick", code);
Live Example:
function foo(str) {
alert(str);
}
var elm = document.getElementById("the-div");
var code = elm.getAttribute("onclick");
var m = code.match(/^([^(]+)\('(.*)'\)$/);
if (m) {
code = m[1] + "('" + m[2].replace(/'/g, "\\'") + "')";
}
elm.setAttribute("onclick", code);
<div id="the-div" onclick="foo('blah'blah')">Click me</div>
Well, this is a very common problem: you want to put a single quote inside a single-quoted string. To do that, you have to escape the single quote by putting a backslash in front of it.
onClick="test('Blah\'Blah')"

Havoc while escaping quotes in Javascript

I am trying to build a custom HTML/Javascript command using the following Javascript (for now, "dialogText" contains the name of a vegetable, but it may later contain HTML tags too):
str = str + "<span onClick=showDialog('"+dialogText+"')>";
When dialogText is only one word long (i.e. "Basil"), this works correctly, giving the following result:
<span onclick="showDialog('Basil')">
But when dialogText includes more than one word (i.e. "Beet root"), this fails. The result is syntactically invalid and generates a Javascript error:
<span onclick="showDialog('Beet" root')="">
Why does this happen (where did the equals sign come from?)?
And how can I change the code so that it works?
You aren't looking at the HTML you are generating, you are converting that HTML to a DOM and then serializing it back to HTML.
Since you have a " as data in the attribute value, but haven't represented it as a character reference (&quot;), and the value is delimited with ", the " ends the attribute.
You then start a new attribute.
Since the next attribute doesn't have a value, it gets assigned an empty string when the browser attempts to error correct.
In general, avoid mashing strings together to generate HTML for conversion to a DOM. Use DOM methods directly instead.
var span = document.createElement('span');
span.addEventListener('click', function (event) {
showDialog(dialogText);
});
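A slightly fuller sketch of the same idea is below. makeDialogSpan is a hypothetical helper, and it takes the document, a visible label, and the showDialog function as parameters purely so the example is self-contained; in a page you would pass the global document and your own showDialog.

```javascript
// Hypothetical helper: builds the span with DOM methods instead of an HTML string.
function makeDialogSpan(doc, label, dialogText, showDialog) {
  var span = doc.createElement("span");
  span.textContent = label; // assigned as text, so nothing here is parsed as HTML
  span.addEventListener("click", function () {
    // dialogText is captured by the closure, so spaces and quotes need no escaping
    showDialog(dialogText);
  });
  return span;
}

// In a browser this might be used as (container is an assumed existing element):
// container.appendChild(makeDialogSpan(document, "Beet root", "Beet root", showDialog));
```

Because the text never passes through an attribute value, multi-word strings such as "Beet root" cannot end the attribute early.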

Is there a javascript function that converts characters to the &code; equivalent?

I have text that was created with CKEditor, and it seems to be inserting &nbsp; where spaces should be. It appears to do similar conversions with >, <, &, etc., which is fine, except that when I make a DOMSelection, those codes are removed.
So, this is what is selected:
beforeHatchery (2)
But this is what is actually in the DOM:
beforeHatchery (2)
note that I outputted the selection and the original text stored in the database using variable.inspect, so all the quotes are escaped (they wouldn't be when sent to the browser).
To save everyone the pain of looking for the difference:
From the first: Hatchery</a> (2) (The Selection)
From the second: Hatchery</a>&nbsp;(2) (The original)
These differences are at the very end of the selection.
So... there are three ways I can see to approach this.
1) - Replace all characters commonly replaced with codes with their codes,
and hope for the best.
2) - Javascript may have some uncommon function / a library may exist that
replaces these characters for me (I think this might be the way CKeditor
does its character conversion).
3) - Figure out the way CKeditor converts and do the conversion exactly that way.
I'm using Ruby on Rails, but that shouldn't matter for this problem.
Some other things that I found out that it converts:
1: It seems to only convert spaces to &nbsp; if the space(s) is before or after a tag:
e.g.: "With quick <a href..."
2: It changes apostrophes to their numeric character reference
e.g.: "opponent&#39;s"
3: It changes "&" to "&amp;"
4: It changes angle brackets to "&gt;" and "&lt;" appropriately.
Does anyone have any thoughts on this?
To encode html entities in str (your question title asks for this, if I understand correctly):
$('<div/>').text(str).html();
To decode html entities in str:
$('<div/>').html(str).text();
These rely on jQuery, but vanilla alternatives are basically the same but more verbose.
To encode html entities in str:
var el = document.createElement('div');
el.textContent = str; // textContent keeps newlines as-is; the innerText setter would turn them into <br>
el.innerHTML;
To decode html entities in str:
var el = document.createElement('div');
el.innerHTML = str;
el.innerText;
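Where no DOM is available (for example in a script running outside the browser), a rough string-only counterpart of the encoding step could look like this. encodeHtml is a hypothetical name, and the sketch only covers the five characters that are special in HTML markup. It does not produce named entities such as &nbsp;, and unlike the DOM-based versions above it also encodes quotes.

```javascript
// Lookup table for the characters that are special in HTML markup.
var HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;"
};

// Hypothetical helper: replaces each special character with its character reference.
function encodeHtml(str) {
  return str.replace(/[&<>"']/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}

var encoded = encodeHtml("opponent's <b> & \"x\"");
console.log(encoded);
```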
1. Conversion of spaces to &nbsp; is usually done by the browser while editing content.
2. Conversion of ' to &#39; can be controlled with http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.config.html#.entities_additional
3. and 4. are usually needed to avoid breaking code that is written in design view when that content is loaded again. You can try to change http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.config.html#.basicEntities but that can lead to problems in the future.

How do I extract the title value from a string using Javascript regexp?

I have a string variable from which I would like to extract the title value of the id="resultcount" element. The output should be 2.
var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';
I tried the following regex but it is not working:
/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/
Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't parse HTML using regular expressions, but if you are using jQuery you can take advantage of the browser's own parser without much effort, like this:
$('#resultcount', '<div>'+str+'</div>').attr('title')
It will return undefined if resultcount is not found or if it has no title attribute.
To make sure it doesn't matter which attribute (id or title) comes first in the string, first take the entire HTML element with the required id:
var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")
Then extract the title from that string:
var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is 2
But, as people have previously mentioned, it is unreliable to use regex for parsing HTML; something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
Please see this earlier response, entitled "You can't parse [X]HTML with regex":
RegEx match open tags except XHTML self-contained tags
Well, since no one else is jumping in on this and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into the JavaScript format for you, but I think you'll be able to do that.
span id="resultcount" title="(\d+)"
The part you're looking to get is the capturing group $1, which is the '\d+' part. It will match one or more digits between the quote marks.
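For what it's worth, a JavaScript version of this kind of pattern might look like the sketch below. extractTitle is a hypothetical helper; it first grabs the opening tag that carries the given id and then reads title from that tag, so attribute order does not matter. It carries the same caveats as any regex over HTML, and it assumes the id contains no regex metacharacters.

```javascript
// Hypothetical helper: returns the title attribute of the tag with the given id,
// or undefined if the tag or the attribute is missing.
function extractTitle(html, id) {
  // Step 1: find the opening tag that contains id="..." (single or double quotes).
  var tagPattern = new RegExp('<[^<>]*\\bid=["\']' + id + '["\'][^<>]*>');
  var tagMatch = html.match(tagPattern);
  if (!tagMatch) return undefined;

  // Step 2: read the title value from that tag only; \1 matches the opening quote.
  var titleMatch = tagMatch[0].match(/\btitle=(["'])(.*?)\1/);
  return titleMatch ? titleMatch[2] : undefined;
}

var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table>' +
  '<span id="resultcount" title="2" style="display:none;">2</span>' +
  '<span style="font-size: 10pt">2 matching results.</span>';

var title = extractTitle(str, "resultcount");
console.log(title);
```

The value comes back as a string; wrap it in Number(...) if a numeric count is needed.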
