When is the HTML attribute backslash escaped as a JavaScript string? - javascript

I noticed that a backslash appears to be escaped when I read an attribute value containing a backslash with JavaScript, as in the following code.
console.log(document.getElementById("test").getAttribute("class")); // -> \A
console.log(document.getElementById("test").getAttribute("class").replace("\\A", "\A")); // -> A
console.log(document.getElementById("test").dataset.b); // -> \B
console.log(document.getElementById("test").dataset.b.replace("\\B", "\B")); // -> B
<div id="test" class="\A" data-b="\B"></div>
The backslash is treated as a special character in JavaScript, and two backslashes (\\) represent one backslash (\).
The result of the above code seems to mean that, when the attribute value is read with getAttribute(), one backslash (\) is escaped to two backslashes (\\) somewhere.
However, the specification does not seem to describe any such step.
Question
In which step of getAttribute() is the backslash of the HTML attribute escaped (\ -> \\)?

There's a difference between string literals (which require escaping) and string values that come from other places (HTML, Ajax responses, etc.), which are exactly what they look like. JS strings only gain backslash escapes when they are converted to literals (e.g. by JSON.stringify or by some console views). The escape is an output-formatting artifact; internally, the sequence of characters contains no escapes.
HTML doesn't need the same escaping of backslashes, because the two standards have different roots. An attribute isn't "converted" to one with escaped backslashes unless it's formatted as a string literal, and that would happen at a stage between the string and its visible output. You can use alert() instead of console.log() to see the string as it really is. I believe that, for the console specifically, the goal is to be more helpful to developers than accurate to the internals.
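To make the distinction concrete, here is a small sketch that runs outside the browser. The variable attributeValue is a stand-in I made up for what getAttribute("class") returns in the question; it is not read from a real DOM. Note that the literal "\A" used as the replacement in the question is simply "A", which is why replace() appears to strip a backslash.

```javascript
// Stand-in for document.getElementById("test").getAttribute("class"):
// the literal "\\A" produces the two-character value \A.
const attributeValue = "\\A";

// An unrecognized escape such as \A drops the backslash, so this literal is "A".
const replacement = "\A";

console.log(attributeValue.length);                      // 2
console.log(replacement.length);                         // 1
console.log(attributeValue.replace("\\A", replacement)); // A

// The doubled backslash only appears when the value is formatted as a literal.
console.log(JSON.stringify(attributeValue));             // "\\A"
```

So the value never contained two backslashes; only its literal representation does.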

Related

How to Escape the unescaped double quotes in a CSV string in Node

This is very similar to
Regular expression to find unescaped double quotes in CSV file
However, the solutions presented don't work with Node.js's regex engine. Given a CSV string where columns are quoted with double quotes, but some columns have unescaped double quotes in them, what regex could be used to match these unescaped quotes and remove them?
Example rows
"123","","SDFDS SDFSDF EEE "S"","asdfas","b","lll"
"123","","SDFDS SDFSDF EEE "S"","asdfas","b","lll"
So the two double quotes surrounding the S in the third column would get matched and removed. Needs to work in Node.js (14.16.1)
I have tried (?m)""(?![ \t]*(,|$)) but get an "Invalid regular expression: /(?m)""(?![ \t]*(,|$))/: Invalid group" exception.
I don't know much about node.js, but assuming it is like the JavaScript flavor of regex then I have the following comments about the example you took from the prior answer:
I think your example is choking on the first element, (?m), which is unsupported in JavaScript. However, that part is not essential to your task. It only turns on multiline processing, and you don't need that if you feed the regex engine each line individually. If you find you still want to feed it a multiline string, you can still turn on multiline mode in JavaScript with the "m" flag after the final delimiter: /myregex/m. All of the other elements, including the negative lookahead, are supported by JavaScript and probably by your engine as well. So, drop the (?m) part of your expression and try it again.
Even after you get it to work, the example row you provided will not be parsed according to your expectations by the sample regular expression. Its function is to identify all occurrences of two double-quotes that are not followed by a comma (or end of string). The ONLY two occurrences of doubled quotes in your example each have a comma after, so you will get no matches on this regex in your example.
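As a quick check of both points, here is a sketch of the same pattern with the inline (?m) dropped and the "m" flag used instead, run against the example row from the question. The variable names are mine.

```javascript
// The pattern from the question with the inline (?m) removed and the
// multiline behaviour requested through the "m" flag instead.
const unescapedPair = /""(?![ \t]*(,|$))/gm;

const exampleRow = '"123","","SDFDS SDFSDF EEE "S"","asdfas","b","lll"';

// Both doubled quotes in this row are followed by a comma, so nothing matches.
const matches = exampleRow.match(unescapedPair);
console.log(matches); // null
```

The pattern now compiles, but as described above it finds nothing to remove in this row.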
It seems like you want some context-sensitive scanning to match and remove the inner pairs of double quotes while leaving the outer ones in place and handling commas inside your strings and possibly correctly quoted double quotes. Regular expression engines are really bad at this kind of processing and I don't think you are going to get satisfactory results whatever you come up with.
You can get an approximate solution to your problem by using regex once to parse the individual elements of the .csv stripping the outer quotes as you go and then running a second regex against each parsed element to either remove single occurrences of double quote or adding a second double-quote, where necessary. Then you can reassemble the string under program control.
This still will break if someone embeds a "", sequence in a data field string, so it's not perfect but it might be good enough for you.
The regex for splitting the .csv and stripping the double quotes is:
/(("(.*?)")|([^,]*))(,|$)/gm
This will accept either "anything", or anything, repeatedly until the source is exhausted. Because of the capturing groups, the parsed text will be either in $3 (if the field was quoted) or in $4 (if it was not quoted), but not both.
Here's a regexpReplace of your string with $3&$4 and a semicolon after each iteration (I took the liberty of adding a numeric field without the quotes so you could see that it handles both cases):
"123","","SDFDS SDFSDF EEE "S"",456,"asdfas","b","lll"
RegexpReplace(<above>,"((""(.*?)"")|([^,]*))(,|$)","$3$4;")
=> 123;;SDFDS SDFSDF EEE "S";456;asdfas;b;lll;;
See how the outer quotes have been stripped away. Now it's a simple thing to go through all the matches to remove all the remaining quotes, and then you can reconstruct the string from the array of matches.
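In Node.js, the two passes described above might look roughly like this. It is only a sketch: cleanCsvLine is a name I made up, it re-quotes every field (including the numeric one), and it inherits the limitation mentioned earlier for a "", sequence inside a field.

```javascript
// Hypothetical helper: split one CSV line with the pattern above, drop the
// quotes left inside each field, then re-quote and rejoin the fields.
function cleanCsvLine(line) {
  const fieldPattern = /(("(.*?)")|([^,]*))(,|$)/g;
  const fields = [];
  for (const match of line.matchAll(fieldPattern)) {
    // Group 3 holds a quoted field, group 4 an unquoted one.
    const raw = match[3] !== undefined ? match[3] : match[4];
    fields.push(raw.replace(/"/g, ""));
    // An empty separator means the pattern matched $ (end of line), so stop
    // before the trailing empty match that produced the extra ";" above.
    if (match[5] === "") break;
  }
  return fields.map((field) => `"${field}"`).join(",");
}

const sampleRow = '"123","","SDFDS SDFSDF EEE "S"",456,"asdfas","b","lll"';
console.log(cleanCsvLine(sampleRow));
// "123","","SDFDS SDFSDF EEE S","456","asdfas","b","lll"
```

If you would rather keep the inner quotes, replacing each remaining " with "" in the second pass gives standard CSV escaping instead of removal.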

How to write a long string literal in which no characters are escaped?

I have a script (in another language) that generates pieces of valid JavaScript which is then executed in the browser. The generated javascript looks e.g. like this:
my_function(123,"long string with lots of weird characters");
That "long string" can potentially contain quotes, apostrophes, backslashes etc... For example the "long string" can be any of these:
hello"there
hello'there
hello\\\\"\\\'\'\\\'"'\"\\"""there
All these characters should be passed as they are to my_function(), without assuming that backslash is a special character that escapes something.
Does javascript have some sort of unique "tag" to delimit a long string literal in which nothing should be "escaped" or "interpreted"? For example a construct similar to this:
my_function(123, [<STRING_START>]long string with lots of weird characters[<STRING_END>]);
I need something like this and I can guarantee that my long string won't contain the string "][<STRING_END>]" so this would work. However I cannot easily guarantee that it won't contain quotes and/or backslashes.
I know that I can use e.g. the normal quotes to delimit my string and programmaticaly add backslashes (in my javascript generator) before all required characters inside the string but the existence of "tags" shown above (or something similar) would make the life easier for me.
You can use String.raw:
It's used to get the raw string form of template literals, that is, substitutions (e.g. ${foo}) are processed, but escapes (e.g. \n) are not.
var hello= String.raw` there
hello'there
hello\\\\"\\\'\'\\\'"'\"\\"""there `;
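Here is a small sketch of how the raw and the normally processed forms differ. One caveat a generator would still have to handle: a backtick or a ${ inside the text still ends or interrupts the template, so those two sequences need to be passed in through a substitution (as in the last line) or handled some other way.

```javascript
// String.raw keeps every backslash exactly as typed in the source.
const raw = String.raw`hello\\"\'there\n`;
// The same characters in an ordinary template literal are processed as escapes.
const cooked = `hello\\"\'there\n`;

console.log(raw);           // hello\\"\'there\n
console.log(raw.length);    // 17
console.log(cooked.length); // 14

// Backticks and ${ cannot appear literally; substitutions can supply them.
const withSpecials = String.raw`tick: ${"`"} and dollar-brace: ${"${"}`;
console.log(withSpecials);  // tick: ` and dollar-brace: ${
```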

String.replace() in case of different encodings

When I use JSON.stringify().replace(/[\t\r\n]/g,"").trim() on response messages (Lambda function callbacks) from different systems, I face an issue where \t is replaced with \\t and \ with \\\
Is there a way to avoid this?
I tried to search for answers but only found articles for base cases.
JSON.stringify's specific purpose is to convert what you give it to JSON. If what you give it is a string with backslashes in it, then what you'll get back is the JSON representation of that string, which is the string encased in double quotes (") with any special characters, such as backslashes, escaped with a backslash, newlines converted to \n, carriage returns converted to \r, etc.
Example:
const str = document.querySelector("input").value;
console.log("The string:", str);
console.log("JSON.stringify's output:", JSON.stringify(str));
<input type="text" value="This string has a backslash in it: \ For instance, here's a backslash followed by a t: \t">
That's what JSON.stringify does. If you don't want that, don't use JSON.stringify.
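The same thing can be seen without a page. In this sketch, message is a made-up stand-in for a response string that contains both a backslash followed by a t and a real tab character.

```javascript
// Stand-in for a response message: backslash + "t" (two characters),
// followed later by a real tab character.
const message = "a\\tb\tc";
console.log(message.length);   // 6

// JSON.stringify escapes the backslash and turns the real tab into \t.
const json = JSON.stringify(message);
console.log(json);             // "a\\tb\tc"
console.log(json.length);      // 10

// Stripping control characters works on the string itself...
const stripped = message.replace(/[\t\r\n]/g, "");
console.log(stripped);         // a\tbc

// ...but finds nothing in the JSON text, where the tab is now two characters.
console.log(json.replace(/[\t\r\n]/g, "") === json); // true
```

So if the goal is only to remove tabs and line breaks, it may be enough to run the replace on the string before (or instead of) calling JSON.stringify.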
...in case of different encodings
That part is irrelevant. By the time you're dealing with a JavaScript string, it doesn't matter what encoding was used to represent that string (in an HTML file, a .js file, etc.). Once it's in memory, it's in the one format for JavaScript strings defined by the language (which is essentially UTF-16, except invalid surrogate pairs are allowed).

Write HTML Special Character into a Variable

$("<h2/>", {"class" : "wi wi"+data.today.code}).text(" " + data.city + data.today.temp.now + "F").appendTo(custom_example);
Hi there, I'm trying to alter the code above to add the degrees icon just before the (F)ahrenheit marker. I've tried entering + html("°") + but it doesn't work. My JS is pretty rough and I was hoping I could get a quick answer here before I spent too long trying and failing. Thanks!
I want the end result to print something like: Encinitas 65°F
Special characters are characters that must be escaped with a backslash (\), like:
Single quote \'
Double quote \"
Backslash \\
The degree ° is not a special character, you can just write it, as it is.
Edit: If you want to use the unicode of °F, just write: '\u2109'.
Escape Special Characters JavaScript
JavaScript uses the \ (backslash) as an escape character for:
\' single quote
\" double quote
\\ backslash
\n new line
\r carriage return
\t tab
\b backspace
\f form feed
\v vertical tab (IE < 9 treats '\v' as 'v' instead of a vertical tab ('\x0B'). If cross-browser compatibility is a concern, use \x0B instead of \v.)
\0 null character (U+0000 NULL) (only if the next character is not a decimal digit; else it's an octal escape sequence)
Note that the \v and \0 escapes are not allowed in JSON strings.
First of all, the degree character does not need to be escaped. So simply entering "°F" should do the job.
However, if you are in doubt about the codepage of your JS code, you could use a JavaScript escape sequence. JS escape sequences are quite different from HTML escapes. They do not support decimal values at all. So first of all you have to convert 176 to hex: b0. The correctly escaped equivalent to "°F" is "\xb0F". It will work too and is more robust with respect to codepage issues of your platform's source editor.
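A short sketch of the equivalent spellings, assuming the source file itself is saved as UTF-8 so that the first literal survives. Note that \x always consumes exactly two hex digits, which is why the F after b0 is not swallowed.

```javascript
// Four ways to write the same two-character string.
const literal = "°F";
const hexEscape = "\xb0F";        // \x takes exactly two hex digits: b0
const unicodeEscape = "\u00b0F";  // \u takes exactly four hex digits: 00b0
const fromCode = String.fromCharCode(176) + "F";

console.log(hexEscape === unicodeEscape); // true
console.log(hexEscape === fromCode);      // true
console.log(hexEscape.charCodeAt(0));     // 176
console.log("Encinitas " + 65 + hexEscape);
```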
If you really want to assign HTML code you need to use the .html() function. But this is mutually exclusive with .text(), so in this case all of your content needs to be HTML rather than plain text. Otherwise an HTML injection vulnerability arises, i.e. you need to properly escape angle brackets and some other symbols in data.city and maybe data.today.temp.now as well.
JS itself has no built-in function to escape HTML. But jQuery provides a trick: $('<div/>').text(data.city).html() will return appropriately escaped HTML. See "HTML-encoding lost when attribute read from input field" for more details.
I would recommend not to use .html() unless you really need it, e.g. if you want to apply styles or formatting to parts of the text only.
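If you do go the .html() route without jQuery's trick, the escaping can be done by hand. This is only a sketch: escapeHtml is a hypothetical helper written for this answer, not a jQuery or DOM function, and the sample city value is made up to show what happens to markup in the data.

```javascript
// Hypothetical helper, not part of jQuery or the DOM API: replaces the
// characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Made-up input containing markup that must not reach the page as HTML.
const city = "<b>Encinitas</b> ";
const markup = escapeHtml(city) + escapeHtml(65) + "&deg;F";
console.log(markup); // &lt;b&gt;Encinitas&lt;/b&gt; 65&deg;F
```

The resulting string could then be passed to .html(); with .text() none of this is needed.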

Bug with Javascript's JSON.parse?

console.log(JSON.parse('{"data":"{\"json\":\"rocks\"}"}'));
gives error (tested on Firefox and Chrome's console). Is this a bug with JSON.parse? Same decodes well when tested with PHP.
print_r(json_decode('{"data":"{\"json\":\"rocks\"}"}', true));
This string is processed differently in PHP and JS, i.e. you get different results.
The only escape sequences in single-quoted strings in PHP are \\ and \'. All others are output literally, according to the documentation:
To specify a literal single quote, escape it with a backslash (\). To specify a literal backslash, double it (\\). All other instances of backslash will be treated as a literal backslash: this means that the other escape sequences you might be used to, such as \r or \n, will be output literally as specified rather than having any special meaning.
In JS on the other hand, if a string contains an invalid escape sequence, the backslash is discarded (CV means character value):
The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the NonEscapeCharacter.
The CV of NonEscapeCharacter :: SourceCharacter but not EscapeCharacter or LineTerminator is the SourceCharacter character itself.
The quote might not be helpful by itself, but if you follow the link and have a look at the grammar, it should become clear.
So in PHP the string will literally contain \", while in JS it will only contain ", which makes it invalid JSON:
{"data":"{"json":"rocks"}"}
If you want to create a literal backslash in JS, you have to escape it:
'{"data":"{\\"json\\":\\"rocks\\"}"}'
To have a literal backslash in a string literal, you need \\.
console.log(JSON.parse('{"data":"{\\"json\\":\\"rocks\\"}"}'));
This will successfully escape the inner quotation marks for the JSON processing.
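Here is a sketch of what each version of the literal does. The last part shows one way to avoid writing the backslashes by hand, by letting JSON.stringify produce the nested text; the variable names are mine.

```javascript
// With single backslashes, the JS literal loses them before JSON.parse runs.
let singleBackslashFailed = false;
try {
  JSON.parse('{"data":"{\"json\":\"rocks\"}"}');
} catch (err) {
  singleBackslashFailed = err instanceof SyntaxError;
}
console.log(singleBackslashFailed); // true

// With doubled backslashes the literal holds valid JSON text.
const text = '{"data":"{\\"json\\":\\"rocks\\"}"}';
const outer = JSON.parse(text);
console.log(typeof outer.data);     // string

// The inner value is itself JSON text, so it needs a second parse.
const inner = JSON.parse(outer.data);
console.log(inner.json);            // rocks

// Letting JSON.stringify do the escaping avoids hand-written backslashes.
const rebuilt = JSON.stringify({ data: JSON.stringify({ json: "rocks" }) });
console.log(rebuilt === text);      // true
```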
You need to escape the backslashes:
console.log(JSON.parse('{"data":"{\\"json\\":\\"rocks\\"}"}'));
If the input has been escaped more than once, JSON.parse won't return an object; it returns a string again, with one level of backslashes removed.
You can keep parsing until all the extra backslashes are gone.
myobj = {\"json\":\"rocks\"}
myobj = {\\"json\\":\\"rocks\\"}
The following lines worked for me:
// remove the extra levels of backslash escaping
while (typeof myobj === 'string') {
  myobj = JSON.parse(myobj); // throws if the string is not valid JSON
}
You don't really need to escape double quotes inside single quotes and you have two extra quotes in your input around inner object, just
console.log(JSON.parse('{"data":{"json":"rocks"}}'));
is enough.
