jQuery Escaping Special Characters Fails - javascript

I am trying to write a jQuery selector that selects an HTML element by an arbitrary id. The ids may contain special characters that need to be escaped. An example is test_E///_AAAAA
I am basically doing exactly what is going on in this working fiddle (which uses v1.11.0, whereas I am using v1.11.3 and have also tested with 2.1.3).
However, in my scaled up environment, it doesn't work. I get Syntax error, unrecognized expression: #test_E\\/\\/\\/_AAAAA
There must be some obscure factoid about jQuery that is the difference between this working and not working. I, being a novice, have no hope of identifying it.
I notice that I am not alone though. A commentator on this thread had the same issue.
The code files are thousands of lines long, and I'm probably prohibited from posting more than a couple lines by my employer. I'm just looking for a hint, a clue, a shot in the dark about what would cause a perfectly reasonable selection string to be rejected.

You just need enough backslashes :)
ID:
The ID of the element is test_E\\/\\/\\/_AAAAA. Note that backslashes don't have any special meaning in HTML, so there really are six backslashes in the ID.
jQuery selector: Backslashes, forward slashes, and several other characters have special meaning in jQuery selectors, so we need to escape them with a backslash. The selector therefore needs to be #test_E\\\\\/\\\\\/\\\\\/_AAAAA. This tells jQuery to look for an element whose ID contains test_E, then two backslashes, then one forward slash, and so on.
JavaScript string literal: To represent that selector using a JavaScript string literal, each backslash needs to be escaped. So the string literal would be "#test_E\\\\\\\\\\/\\\\\\\\\\/\\\\\\\\\\/_AAAAA".
var selectionString = "#test_E\\\\\\\\\\/\\\\\\\\\\/\\\\\\\\\\/_AAAAA";
snippet.log("actual id: " + $("p")[0].id);
snippet.log("selection string given to jQuery: " + selectionString);
snippet.log("text: " + $(selectionString).text());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<!-- Provides the `snippet` object, see http://meta.stackexchange.com/a/242144 -->
<script src="http://tjcrowder.github.io/simple-snippets-console/snippet.js"></script>
<p id="test_E\\/\\/\\/_AAAAA">This is a test :)</p>
As you can see, this is extremely ugly, hard to understand, and hard to get right. I highly recommend avoiding such IDs. Another option is to use good old document.getElementById(), which only requires the string literal escapes:
$(document.getElementById('test_E\\\\/\\\\/\\\\/_AAAAA')).text()
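The three layers described above can be checked in plain JavaScript, with no browser or jQuery needed. This is only a sketch: the variable names are illustrative, String.raw is used so that the ID and the selector appear exactly as written in the answer, and the one-line unescape step is a simplification that ignores hexadecimal CSS escapes.

```javascript
// String.raw keeps every backslash as typed, so these read like the prose above.
const id = String.raw`test_E\\/\\/\\/_AAAAA`;                  // the ID in the HTML
const selector = String.raw`#test_E\\\\\/\\\\\/\\\\\/_AAAAA`;  // what jQuery must receive

// The same selector as an ordinary string literal: every backslash is doubled.
const literal = "#test_E\\\\\\\\\\/\\\\\\\\\\/\\\\\\\\\\/_AAAAA";

// Simplified selector unescaping: drop each escaping backslash, keep the next character.
const targetId = selector.slice(1).replace(/\\(.)/g, "$1");

console.log(literal === selector); // true
console.log(targetId === id);      // true
```

If either comparison prints false, a backslash was lost or added at one of the layers.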

The code in the fiddle doesn't work either. I have tried it in IE, Firefox and Chrome, and none of them finds the element.
You need to escape a slash to use it in a # selector. If you use a backslash, you have to escape it twice, once to put it in a string, and once for the selector.
To match the id test\A you need the selector #test\\A which as a string is "#test\\\\A".
To match the id test/A you need the selector #test\/A which as a string is "#test\\/A".
To match the id test_E\\/\\/\\/_AAAAA you need the selector #test_E\\\\\/\\\\\/\\\\\/_AAAAA which as a string is "#test_E\\\\\\\\\\/\\\\\\\\\\/\\\\\\\\\\/_AAAAA".
Demo: https://jsfiddle.net/Guffa/463849xj/4/
Generally you should avoid unusual characters in an id. Even if you can make it work, there is still a risk that some browser handles it differently.
Update:
The error message is shown with the selector unescaped, so as the error message shows the selector #test_E\\/\\/\\/_AAAAA it means that you actually use the string "#test_E\\\\/\\\\/\\\\/_AAAAA". That leaves the slashes unescaped, which causes the syntax error.
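One way to see why the string from the error message fails is to scan the selector for metacharacters that are not preceded by an escaping backslash. The helper below is a hypothetical illustration, not part of jQuery, and it deliberately ignores hexadecimal CSS escapes.

```javascript
// Returns every metacharacter in a selector name that is NOT backslash-escaped.
function findUnescaped(name) {
  const meta = "!\"#$%&'()*+,./:;<=>?@[]^`{|}~";
  const hits = [];
  for (let i = 0; i < name.length; i++) {
    const ch = name[i];
    if (ch === "\\") {
      i++; // a backslash escapes the next character, so skip it
      continue;
    }
    if (meta.includes(ch)) hits.push({ index: i, char: ch });
  }
  return hits;
}

// Selector body from the error message: the slashes are left bare.
const broken = "test_E\\\\/\\\\/\\\\/_AAAAA";
// Selector body with every backslash and slash escaped.
const fixed = "test_E\\\\\\\\\\/\\\\\\\\\\/\\\\\\\\\\/_AAAAA";

console.log(findUnescaped(broken).map((h) => h.char).join("")); // "///"
console.log(findUnescaped(fixed).length);                       // 0
```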

Related

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my Regex has removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.
You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
^---- add here the '?' to make it lazy
To
"renderer":(.*?)(?:,)
Working demo
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to learn more about greediness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
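The behaviour described in the extract is easy to reproduce. The following sketch uses the same sample sentence; the negated character class on the last line is an additional alternative that is not part of the quoted text.

```javascript
const html = "This is a <EM>first</EM> test";

// Greedy: .+ runs to the end, then backtracks only as far as the last ">".
const greedy = html.match(/<.+>/)[0];    // "<EM>first</EM>"

// Lazy: .+? stops at the first ">" that lets the match succeed.
const lazy = html.match(/<.+?>/)[0];     // "<EM>"

// Alternative: forbid ">" inside the tag so there is nothing to backtrack over.
const negated = html.match(/<[^>]+>/g);  // ["<EM>", "</EM>"]

console.log(greedy, lazy, negated);
```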
Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution is to turn it into a non-greedy quantifier by adding a question mark after the asterisk, changing that part of your expression to look like this:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
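Applied to the string from the question, the difference between the two expressions looks like this. It is a small sketch for checking the behaviour in a console; the variable names are illustrative.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Greedy: (.*) runs to the LAST comma, swallowing the rest of column B
// and the start of column C.
const greedyResult = txt.replace(/"renderer":(.*),/g, '"renderer":gbpFormat,');

// Lazy with a backreference: only the quotes around the renderer value go away,
// and $1 re-inserts whatever name was captured.
const lazyResult = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedyResult);
console.log(lazyResult);
```

The first line printed is missing column C, as in the question; the second keeps all three columns.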
Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, booleans, null, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
column.renderer = gbpFormat;
});
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
This assumes that the value is available in the global namespace.
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.
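A minimal sketch of the parse-then-modify approach follows. It swaps the global window lookup for an explicit renderers table, which is an assumption made here so that the example runs outside a browser and only known names are resolved. The body of gbpFormat is likewise invented for illustration.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Hypothetical renderer registry; the real gbpFormat is not shown in the question.
const renderers = {
  gbpFormat: (value) => "GBP " + value.toFixed(2),
};

const config = JSON.parse(txt);
config.columns.forEach((column) => {
  // Replace the renderer NAME with the function it refers to, if one is registered.
  if (Object.prototype.hasOwnProperty.call(renderers, column.renderer)) {
    column.renderer = renderers[column.renderer];
  }
});

console.log(config.columns.length);          // 3
console.log(config.columns[1].renderer(80)); // "GBP 80.00"
```

Columns without a renderer property are left untouched, and an unknown renderer name stays a plain string rather than resolving to an arbitrary global.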

Match attribute value of XML string in JS

I've searched Stack Overflow and found similar questions, but none of them is quite what I want.
Given an xml string: "<a b=\"c\"></a>" in javascript context, I want to create a regex that will capture the attribute value including the quotation marks.
NOTE: this is similar if you're using single quotation marks.
Currently I have a regular expression tailored to the XML specification:
[_A-Za-z][\w\.\-]*(?:=\"[^\"]*\")?
[_A-Za-z][\w\.\-]* //This will match the attribute name.
(?:=\"[^\"]*\")? //This will match the attribute value.
\"[^\"]*\" //This part concerns me.
My question now is, what if the xml string looks like this:
<shout statement="Hi! \"Richeve\"."></shout>
I know this is a dumb question to ask but I just want to capture rare cases that this scenario might happen (I know the coder can use single quotes on this scenario) but there are cases that we don't know the current value of the attribute given that the attribute value changes dynamically at runtime.
So to make this clearer, the result of that using the correct regex should be:
"Hi! \"Richeve\"."
I hope my question is clear. Thanks for all the help!
PS: Note that the language context is Javascript and I know it is tempting to use lookbehinds but currently lookbehinds are not supported.
PS: I know it is really hard to parse XML, but I have an elegant solution to this :) so I just need this small problem to be solved. The only focus of this question is capturing quoted string tokens that contain quotation marks inside the token.
The standard pattern for content with matching delimiters and embedded escaped delimiters goes like this:
"[^"\\]*(?:\\.[^"\\]*)*"
Ignoring the obvious first and last characters in the pattern, here's how the rest of the pattern works:
[^"\\]*: Consume all characters until a delimiter OR backslash (matching Hi! in your example)
(?:\\.[^"\\]*)* Try to consume a single escaped character \\. followed by a series of non delimiter/backslash characters, repeatedly (matching \"Richeve first and then \". next in your example)
That's it.
You can try to use a more generic delimiter approach using (['"]) and back references, or you can just allow for an alternate pattern with single quotes like so:
("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')
Here's another description of this technique that might also help (see the section called Strings): http://www.regular-expressions.info/examplesprogrammer.html
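The pattern can be tried on the example from the question as follows. This is a sketch: String.raw is used only so that the backslashes in the XML survive as typed, and the second sample with single quotes is made up for illustration.

```javascript
// Double- or single-quoted token that may contain backslash-escaped characters.
const quoted = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'/;

// String.raw keeps the backslashes exactly as they appear in the XML.
const xml = String.raw`<shout statement="Hi! \"Richeve\"."></shout>`;
const single = String.raw`<a b='it\'s'></a>`; // hypothetical single-quoted sample

const match = xml.match(quoted);
console.log(match[0]);                // "Hi! \"Richeve\"."
console.log(single.match(quoted)[0]); // 'it\'s'
```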
Description
I'm pretty sure embedding double quotes inside a double quoted attribute value is not legal. You could use the unicode equivalent of a double quote \x22 inside the value.
However to answer the question, this expression will:
allow escaped quotes inside attribute values
capture the statement attribute's value
allow attributes to appear in any order inside the tag
avoid many of the edge cases that trip up pattern matching inside HTML text
doesn't use lookbehinds
<shout\b(?=\s)(?=(?:[^>=]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*?\sstatement=(['"])((?:\\['"]|.)*?)\1(?:\s|\/>|>))(?:[^>=]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/shout>
Example
Pretty Rubular
Ugly RegexPlanet set to Javascript
Sample Text
Note the difficult edge case in the first attribute :)
<shout onmouseover=' statement="He said \"I am Inside the onMouseOver\" " ; if ( 6 > a ) { funRotate(statement) } ; ' statement="Hi! \"Richeve\"." title="sometitle">SomeString</shout>
Matches
Group 0 gets the entire tag from open to close
Group 1 gets the quote surrounding the statement attribute value, this is used to match the closing quote correctly
Group 2 gets the statement attribute value which may include escaped quotes like \" but not including the surrounding quotes
[0][0] = <shout onmouseover=' statement="He said \"I am Inside the onMouseOver\" " ; if ( 6 > a ) { funRotate(statement) } ; ' statement="Hi! \"Richeve\"." title="sometitle">SomeString</shout>
[0][1] = "
[0][2] = Hi! \"Richeve\".

Selecting element by unescaped data attribute

Without going into specifics why I'm doing this... (it should be encoded to begin with, but it's not for reasons outside my control)
Say I have a bit of HTML that looks like this
<tr data-path="files/kissjake's files"">...</tr> so the actual data-path is files/kissjake's files"
How do I go about selecting that <tr> by its data path?
The best I can currently do is when I bring the variables into JS and do any manipulation, I URLEncode it so that I'm always working with the encoded version. jQuery seems smart enough to determine the data-path properly so I'm not worried about that.
The problem is on one step of the code I need to read from a data-path of another location, and then compare them.
Actually selecting this <tr> is what's confusing me.
Here is my coffeescript
oldPriority = $("tr[data-path='#{path}']").attr('data-priority')
If I interpolate the URLEncoded version of the path, it doesn't find the TR. And I can't URLDecode it because then jQuery breaks as there are multiple ' and " conflicting in the path.
I need some way to select any <tr> that matches a particular data-attribute, even if it's not encoded in the HTML to begin with.
First, did you mean to have the extra " in there? You will have to escape that, as it's not valid HTML.
<tr data-path="files/kissjake's files"">...</tr>
To select it, you need to escape inside the selector. Here's an example of how that would look:
$("tr[data-path='files/kissjake\\'s files\"']")
Explanation:
\\' is used to escape the ' inside the CSS selector. Since ' is inside other single quotes, it must be escaped at the CSS level. The reason there are two backslashes (\\) is that we must escape the backslash itself so that it makes it into the selector string.
Simpler example: 'John\\'s' yields the string John\'s.
\" is used to escape the double quote which is contained inside the other double quotes. This one is being escaped on the JS level (not the CSS level), so only one slash is used because we don't need a slash to actually be inside the string contents.
Simpler example: 'Hello \"World\"' yields the string Hello "World".
Update
Since you don't have control over how the HTML is output, and you are doomed to deal with invalid HTML, that means the extra double quote should be ignored. So you can instead do:
$("tr[data-path='files/kissjake\\'s files']")
Just the \\' part to deal with the single quote. The extra double quote should be handled by the browser's lenient HTML parser.
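Since the path arrives in a variable, the escaping can be done by a small helper rather than by hand. The sketch below is an illustration under stated assumptions, not jQuery API: attrSelector is a made-up name, and it wraps the value in double quotes so that only backslashes and double quotes need escaping inside the CSS string. Values containing line breaks are not handled.

```javascript
// Builds tag[attr="value"] with the value escaped for a double-quoted CSS string.
function attrSelector(tag, attr, value) {
  const escaped = value.replace(/["\\]/g, "\\$&");
  return tag + "[" + attr + '="' + escaped + '"]';
}

const path = "files/kissjake's files";
console.log(attrSelector("tr", "data-path", path));
// tr[data-path="files/kissjake's files"]

// In the page this would be used as:
// $(attrSelector("tr", "data-path", path)).attr("data-priority")
```

Another option that avoids selector escaping altogether is to compare attribute values directly, for example $("tr").filter(function () { return $(this).attr("data-path") === path; }).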
Building off of @Nathan Wall's answer, this will select all <tr> tags with a data-path attribute on them.
$("tr[data-path]");

When do I need to escape metacharacters? (jQuery Selectors)

According to the jQuery docs, I need to escape metacharacters that occur in my selector strings, when they occur as a literal. However, I couldn't find very many specific examples of when and when not to escape selectors. So when and when don't I need to escape metacharacters, when they are to be interpreted as a literal, in:
Attribute selectors? ie
$("[attr=value]")
Id selectors? ie
$("#id")
Class selectors? ie
$(".class");
And, is there a way to write a function that replaces metachars in selector strings, while still preserving the beginning character? ie:
// replace all meta chars, preserving the id selection?
$("#id.rest$of*string")
// replace all meta chars, preserving the attribute selection?
// going back to my previous question, do I even need to escape the metachars in this situation?
$("[attr=blah.dee#foo*yay]")
The reason I ask this question, is because I'm working with a website that happens to have some really nasty selectors. And I don't have control over the website, so I can't go change the selectors to be nicer to work with.
THANKS!!
From the jQuery docs:
If you wish to use any of the meta-characters (#;&,.+*~':"!^$=>|/ ) as a literal part of a name, you must escape the character with two backslashes ...
All of these must be escaped:
id
class name
attribute name
attribute value
element name
The first four are obvious, and here's an example for the fifth. Element names in XML can contain a "." character for instance and still be valid.
<user.name>John Doe</user.name>
If you had to select all elements of user.name, then that . must be escaped
$(xml).find("user\\.name");
Rather than blatantly stealing someone else's answer, I'll point you to it: jQuery selector value escaping, where jQuery's selector parsing method is described in detail.
The short answer: you may be in trouble, since jQuery's selector parser is not 100% standards-compliant. Per the suggestion in the linked answer, you may be able to work around this by calling the regular DOM methods (document.getElementById()), which work with funny ids, and then passing the raw DOM element to jQuery.
$(document.getElementById("id.rest$of*string"));
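For the second part of the question, a function that escapes metacharacters while leaving the leading # or . alone, one possible sketch is shown below. escapeName, idSelector, and classSelector are hypothetical names, and the character list is the set of meta-characters named in the jQuery documentation plus the space character. jQuery 3.0 and later also ship jQuery.escapeSelector(), and browsers provide CSS.escape(); both cover more edge cases, such as names that begin with a digit.

```javascript
// Prefix every selector metacharacter in a name with a backslash.
function escapeName(name) {
  return name.replace(/[ !"#$%&'()*+,.\/:;<=>?@[\\\]^`{|}~]/g, "\\$&");
}

// The leading "#" or "." is added AFTER escaping, so it is never escaped itself.
const idSelector = (id) => "#" + escapeName(id);
const classSelector = (cls) => "." + escapeName(cls);

console.log(idSelector("id.rest$of*string")); // #id\.rest\$of\*string
console.log(idSelector("something[500]"));    // #something\[500\]
console.log(classSelector("col:1"));          // .col\:1
```

The returned strings are already the selector text, so they can be passed straight to $() without adding further backslashes.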

Find DOM element by ID when ID contains square brackets?

I have a DOM element with an ID similar to:
something[500]
which was built by my Ruby on Rails application. I need to be able to get this element via jQuery so that I can traverse my way up the DOM to delete the parent of its parent, which has a variable ID that I don't have access to beforehand.
Does anyone know how I could go about this? The following code doesn't seem to be working:
alert($("#something["+id+"]").parent().parent().attr("id"));
Upon further inspection, the following:
$("#something["+id+"]")
returns an object, but when I run ".html()" or ".text()" on it, the result is always null or just an empty string.
You need to escape the square brackets so that they are not counted as attribute selectors. Try this:
alert($("#something\\["+id+"\\]").parent().parent().attr("id"));
See Special Characters In Selectors, specifically the second paragraph:
To use any of the meta-characters (such as !"#$%&'()*+,./:;<=>?@[\]^`{|}~) as a literal part of a name, it must be escaped with two backslashes: \\. For example, an element with id="foo.bar", can use the selector $("#foo\\.bar"). The W3C CSS specification contains the complete set of rules regarding valid CSS selectors. Also useful is the blog entry by Mathias Bynens on CSS character escape sequences for identifiers.
You can also do
$('[id="something['+id+']"]')
An id cannot include square brackets. It is forbidden by the spec.
Some browsers might error-correct and cope, but you should fix your data instead of trying to deal with bad data.
Square brackets have special meaning to jQuery selectors, the attribute filters specifically.
Just escape these and it will find your element fine
$( "#something\\[" + id + "\\]" )
Try this:
alert($("#something\\["+id+"\\]").parent().parent().attr("id"));
You can escape them using \\ or you could do something like this...
$(document.getElementById("something[" + id + "]"))
"Any of the meta-characters
!"#$%&'()*+,./:;<=>?@[\]^`{|}~
as a literal part of a name, it must be escaped with two backslashes: \\.
For example, an element with id="foo.bar", can use the selector
$("#foo\\.bar")
" [source: jquery doc], and an element with id="foo[bar]" (even though not valid for W3C but recognised by JQuery), can use the selector
$("#foo\\[bar\\]")
(Just an answer like the many others, but all together :))
