Storing '\n\r' inside a string constant makes the JavaScript engine throw an error such as "unterminated string".
How to solve this?
More info: basically I want to use JavaScript to select text in a TEXTAREA HTML field and insert newlines. When I try to insert those constants, I get an error.
String literals must not contain plain line break characters like CR and LF:
A 'LineTerminator' character cannot appear in a string literal, even if preceded by a backslash \. The correct way to cause a line terminator character to be part of the string value of a string literal is to use an escape sequence such as \n or \u000A.
So having a line break like this is invalid:
"foo
bar"
Instead you need to use an escape sequence like:
"foo\nbar"
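For the textarea case in the question, a minimal sketch might look like the following. The helper name insertAt and the element id are assumptions made for illustration; the point is only that the escape sequence "\n" in the source produces a single real line feed in the value.

```javascript
// Hypothetical helper: returns `text` with `insertion` placed at index `pos`.
function insertAt(text, pos, insertion) {
  return text.slice(0, pos) + insertion + text.slice(pos);
}

// "\n" is an escape sequence in the source; the resulting value holds one LF character.
const withBreak = insertAt("foobar", 3, "\n");
console.log(withBreak.length);      // 7
console.log(withBreak.split("\n")); // [ 'foo', 'bar' ]

// In a browser, the same idea applied to a textarea (the id is an assumption):
// const ta = document.getElementById("my-textarea");
// ta.value = insertAt(ta.value, ta.selectionStart, "\n");
```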
Related
I have a script (in another language) that generates pieces of valid JavaScript which is then executed in the browser. The generated javascript looks e.g. like this:
my_function(123,"long string with lots of weird characters");
That "long string" can potentially contain quotes, apostrophes, backslashes etc... For example the "long string" can be any of these:
hello"there
hello'there
hello\\\\"\\\'\'\\\'"'\"\\"""there
All these characters should be passed as they are to my_function(), without assuming that backslash is a special character that escapes something.
Does javascript have some sort of unique "tag" to delimit a long string literal in which nothing should be "escaped" or "interpreted"? For example a construct similar to this:
my_function(123, [<STRING_START>]long string with lots of weird characters[<STRING_END>]);
I need something like this and I can guarantee that my long string won't contain the string "][<STRING_END>]" so this would work. However I cannot easily guarantee that it won't contain quotes and/or backslashes.
I know that I can use e.g. the normal quotes to delimit my string and programmatically add backslashes (in my JavaScript generator) before all required characters inside the string, but the existence of "tags" like the ones shown above (or something similar) would make life easier for me.
You can use String.raw:
It's used to get the raw string form of template literals, that is,
substitutions (e.g. ${foo}) are processed, but escapes (e.g. \n) are
not.
var hello= String.raw` there
hello'there
hello\\\\"\\\'\'\\\'"'\"\\"""there `;
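As a rough sketch of the trade-off: String.raw leaves backslashes alone, but the text still must not contain a backtick or the sequence ${. If that cannot be guaranteed, one alternative is to let the generator emit a JSON-encoded string, since most languages ship a JSON encoder. Below, JSON.stringify stands in for that encoder, and makeCall and the stand-in my_function are hypothetical names used only for the demonstration.

```javascript
// String.raw keeps escapes unprocessed; a plain template literal does not.
const rawValue = String.raw`a\nb`; // four characters: a, \, n, b
const cookedValue = `a\nb`;        // three characters: a, LF, b
console.log(rawValue.length, cookedValue.length); // 4 3

// Hypothetical generator step: JSON.stringify plays the role of the
// JSON encoder available in the generating language.
function makeCall(fnName, num, text) {
  return fnName + "(" + num + "," + JSON.stringify(text) + ");";
}

const tricky = 'hello\\\\"\\\'there';
const generated = makeCall("my_function", 123, tricky);
console.log(generated);

// Evaluate the generated call with a stand-in my_function that
// simply returns the string argument it receives.
const received = new Function("my_function", "return " + generated)((n, s) => s);
console.log(received === tricky); // true
```

Very old engines reject unescaped U+2028 and U+2029 inside string literals, so a generator targeting them may want to escape those two characters as well.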
I noticed that backslash is escaped when I get "attribute value including backslash" with JavaScript in the following code.
console.log(document.getElementById("test").getAttribute("class")); // -> \A
console.log(document.getElementById("test").getAttribute("class").replace("\\A", "\A")); // -> A
console.log(document.getElementById("test").dataset.b); // -> \B
console.log(document.getElementById("test").dataset.b.replace("\\B", "\B")); // -> B
<div id="test" class="\A" data-b="\B"></div>
The backslash is treated as a special character in JavaScript, and two backslashes (\\) represent one backslash (\).
The result of the above code suggests that, when the attribute value is read in JavaScript with getAttribute(), one backslash (\) is escaped to two backslashes (\\) somewhere along the way.
However, in the specification, it seems that the corresponding process is not applied.
Question
In which step of getAttribute() is the backslash of the HTML attribute escaped (\ -> \\)?
There's a difference between string literals (which require escaping) and string values from other places (like html, ajax, etc), which are what they look like. Only when converted to literals (ex: JSON.stringify, some console views, etc) do JS strings have backslash escaping. The escape is an output formatting artifact; internally, there are no escapes in the sequence of characters.
HTML doesn't need the same escaping of backslashes, because its standard has different roots. An attribute isn't "converted" to one with escaped backslashes unless it's formatted as a string literal. That would happen at a stage between the string and its visible output. You can use alert() instead of console.log() to see the string as it really is. I believe that, specifically for the console, the goal is to be more helpful to developers than accurate to the internals.
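The distinction can be checked without a DOM. In this small sketch, attr is an assumed stand-in for the value returned by getAttribute("class") in the question, i.e. a two-character string made of a backslash and the letter A.

```javascript
// "\\A" is the literal notation for the two-character value: backslash, A.
const attr = "\\A";

console.log(attr.length);          // 2
console.log(attr);                 // \A
console.log(JSON.stringify(attr)); // "\\A"  (escapes appear only in literal form)

// In a string literal, \A is an unnecessary escape and simply means A.
console.log("\A" === "A");         // true

// So the replace() call in the question swaps the two characters \A for A.
console.log(attr.replace("\\A", "\A")); // A
```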
While searching for a regex that efficiently parses strings containing escaped quotes, I ended up with the following regular expression literal:
/"[^"\\]*(?:\\.[^"\\]*)*"/
It works well and fast if, for example, it is used to split a string like:
var str = 'This is a block of text containing a "string with a \\" (escaped quote) in it"';
str.split(/("[^"\\]*(?:\\.[^"\\]*)*")/);
The trouble starts when I try to build the regex dynamically using the built-in RegExp object:
/* splits by space characters and
strings containing escaped quote */
var re = new RegExp("(\\s|\"[^\"\\]*(?:\\.[^\"\\]*)*\")");
As you can see, I know that this use case requires escaping metacharacters and quotes. Nevertheless, I get the following errors:
Safari says
SyntaxError: Invalid regular expression: missing terminating ] for character class
Firefox:
SyntaxError: unterminated character class
The error message from Safari gives me a little more of a clue: the regex engine detects a missing closing square bracket, so the backslash before the bracket must itself be escaped, like so:
                                  vv              vv
var re = new RegExp("(\\s|\"[^\"\\\\]*(?:\\.[^\"\\\\]*)*\")");
but with this change, strings containing escaped quotes are no longer parsed correctly.
Any help or suggestion is really appreciated.
Taking the comment above into account as well, I examined the topic in more depth and finally, thanks to the observation pointed out by #NullUserException, I found the solution. I realized that the regex object:
var re = new RegExp("(\\s|\"[^\"\\\\]*(?:\\.[^\"\\\\]*)*\")");
didn't work because, through a mere oversight, I didn't correctly escape the part that detects characters preceded by a backslash (escaped chars). So, in a string, the sequence \\. must be written as \\\\.:
var re = new RegExp("(\\s|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");
Here's a simple live demonstration: http://jsfiddle.net/9ctw66pu/
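One way to check the escaping, sketched here, is to compare the source property of the constructed regex with that of the literal. Building the pattern with String.raw is an alternative not mentioned above that avoids the doubled backslashes; it works here only because the pattern contains no backtick or ${ sequence.

```javascript
// The regex literal, and two ways of building the same pattern from a string.
const literal = /(\s|"[^"\\]*(?:\\.[^"\\]*)*")/;
const fromString = new RegExp("(\\s|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");
const fromRaw = new RegExp(String.raw`(\s|"[^"\\]*(?:\\.[^"\\]*)*")`);

// All three describe the same pattern text.
console.log(literal.source === fromString.source); // true
console.log(literal.source === fromRaw.source);    // true

// The input value really contains a backslash before the inner quote.
const input = 'say "a \\" b" now';
const parts = input.split(fromString);
console.log(parts.includes('"a \\" b"')); // true
```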
I encountered this regular expression, which detects string literals with Unicode characters in JavaScript.
'"'("\\x"[a-fA-F0-9]{2}|"\\u"[a-fA-F0-9]{4}|"\\"[^xu]|[^"\n\\])*'"'
but I couldn't understand the role of, and the need for:
1) "\\x"[a-fA-F0-9]{2}
2) "\\"[^xu]|[^"\n\\]
My guess about 1) is that it is detecting control characters.
"\\x"[a-fA-F0-9]{2}
This is a literal \x followed by two characters from the hex-digit group.
This matches the shorter-form character escapes for the code points 0–255, \x00–\xFF. These are valid in JavaScript string literals but they aren't in JSON, where you have to use \u0000–\u00FF instead.
"\\"[^xu]|[^"{esc}\n]
This matches one of:
backslash followed by one more character, except for x or u. The valid cases for \xNN and \uNNNN were picked up in the previous |-separated clauses, so what this does is avoid matching invalid syntax like \uqX.
anything else, except for the " or newline. It is probably also supposed to be excluding other escape characters, which I'm guessing is what {esc} means. That isn't part of the normal regex syntax, but it may be some extended syntax or templating over the top of regex. Otherwise, [^"{esc}\n] would mean just any character except ", {, e, s, c, } or newline, which would be wrong.
Notably, the last clause, that picks up ‘anything else’, doesn't exclude \ itself, so you can still have \uqX in your string and get a match even though that is invalid in both JSON and JavaScript.
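To make the point about the last clause concrete, here is a rough translation of the pattern into JavaScript regex syntax. The translation is an assumption (the original uses lex-style quoting), and the names strict and loose are invented for this sketch: strict excludes the backslash in the final clause, loose does not.

```javascript
// strict: the "anything else" clause excludes ", newline and backslash.
const strict = /^"(\\x[a-fA-F0-9]{2}|\\u[a-fA-F0-9]{4}|\\[^xu]|[^"\n\\])*"$/;
// loose: the "anything else" clause still accepts a backslash.
const loose = /^"(\\x[a-fA-F0-9]{2}|\\u[a-fA-F0-9]{4}|\\[^xu]|[^"\n])*"$/;

// Candidate literals, written so that each value contains real backslashes.
const samples = ['"a\\x41b"', '"\\u00e9"', '"tab\\tend"', '"\\uqX"'];

for (const s of samples) {
  console.log(s, strict.test(s), loose.test(s));
}
// Only the last sample, "\uqX", differs: strict rejects it, loose accepts it.
```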
I would like to take a string that the user will enter to a text box in a form and turn it into a javascript literal. So I'd, for instance, turn the " character into \".
Is there any complete list of characters that would need to be escaped?
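At a minimum you need to escape your delimiter, that is, whichever enclosing quotes you choose (escape double quotes if you enclose your string in double quotes, single quotes otherwise). Backslashes and line breaks (CR, LF) must be escaped as well, because a backslash starts an escape sequence and a plain line terminator cannot appear inside a string literal.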
If you're going to render it as HTML, you may want to convert special characters to entities (e.g. & -> &amp;).
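Rather than maintaining a list of characters by hand, one option is to lean on JSON.stringify, whose output for a string is also a double-quoted JavaScript string literal. The sketch below uses a hypothetical helper named toJsLiteral and additionally escapes U+2028 and U+2029, which older engines did not accept unescaped inside literals. It does not cover embedding the result inside an HTML script element.

```javascript
// Hypothetical helper: turn an arbitrary string value into a JS string literal.
function toJsLiteral(text) {
  return JSON.stringify(text)
    .replace(/\u2028/g, "\\u2028")  // line separator
    .replace(/\u2029/g, "\\u2029"); // paragraph separator
}

const userInput = 'He said "hi"\\ and left\nnext line';
const literal = toJsLiteral(userInput);
console.log(literal);

// Evaluating the literal gives back the original value.
const roundTrip = new Function("return " + literal + ";")();
console.log(roundTrip === userInput); // true
```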