String.replace() in case of different encodings

When I use JSON.stringify().replace(/[\t\r\n]/g,"").trim() on response messages (Lambda function callbacks) from different systems, I run into an issue where \t is replaced with \\t and \ with \\\
Is there a way to avoid this?
I searched for answers but only found articles covering the basic cases.

JSON.stringify's specific purpose is to convert what you give it to JSON. If what you give it is a string with backslashes in it, then what you'll get back is the JSON representation of that string, which is the string encased in double quotes (") with any special characters, such as backslashes, escaped with a backslash, newlines converted to \n, carriage returns converted to \r, etc.
Example:
const str = document.querySelector("input").value;
console.log("The string:", str);
console.log("JSON.stringify's output:", JSON.stringify(str));
<input type="text" value="This string has a backslash in it: \ For instance, here's a backslash followed by a t: \t">
That's what JSON.stringify does. If you don't want that, don't use JSON.stringify.
...in case of different encodings
That part is irrelevant. By the time you're dealing with a JavaScript string, it doesn't matter what encoding was used to represent that string (in an HTML file, a .js file, etc.). Once it's in memory, it's in the one format for JavaScript strings defined by the language (which is essentially UTF-16, except invalid surrogate pairs are allowed).
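A minimal sketch of the difference, assuming the goal is to strip tabs and line breaks from the message text itself. The sample message and the variable names are made up for illustration; they are not from the question.

```javascript
// Hypothetical response message containing a real tab, a CR/LF pair and one backslash.
const message = "  line one\tcolumn\r\nline two \\ end  ";

// Calling replace() on the string itself removes the actual control characters.
const cleaned = message.replace(/[\t\r\n]/g, "").trim();

// JSON.stringify() first turns them into two-character escapes (backslash + letter),
// so the regular expression no longer finds any tab, CR or LF to remove.
const viaJson = JSON.stringify(message).replace(/[\t\r\n]/g, "").trim();

console.log(cleaned);
console.log(viaJson);
```

If the response is an object rather than a string, one option is to clean the individual string values first and serialize afterwards.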

Related

How to write a long string literal in which no characters are escaped?

I have a script (in another language) that generates pieces of valid JavaScript which is then executed in the browser. The generated javascript looks e.g. like this:
my_function(123,"long string with lots of weird characters");
That "long string" can potentially contain quotes, apostrophes, backslashes etc... For example the "long string" can be any of these:
hello"there
hello'there
hello\\\\"\\\'\'\\\'"'\"\\"""there
All these characters should be passed as they are to my_function(), without assuming that backslash is a special character that escapes something.
Does javascript have some sort of unique "tag" to delimit a long string literal in which nothing should be "escaped" or "interpreted"? For example a construct similar to this:
my_function(123, [<STRING_START>]long string with lots of weird characters[<STRING_END>]);
I need something like this and I can guarantee that my long string won't contain the string "][<STRING_END>]" so this would work. However I cannot easily guarantee that it won't contain quotes and/or backslashes.
I know that I can use normal quotes to delimit my string and programmatically add backslashes (in my JavaScript generator) before all required characters inside the string, but "tags" like the ones shown above (or something similar) would make life easier for me.

You can use String.raw:
It's used to get the raw string form of template literals, that is, substitutions (e.g. ${foo}) are processed, but escapes (e.g. \n) are not.
var hello= String.raw` there
hello'there
hello\\\\"\\\'\'\\\'"'\"\\"""there `;
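A short sketch of what String.raw does and does not protect against; the sample values are invented for illustration. Backslashes survive untouched, but substitutions are still evaluated, and an unescaped backtick in the data would end the literal, so the generator would still have to handle those two cases.

```javascript
// String.raw keeps every backslash exactly as typed in the source.
const path = String.raw`C:\temp\new`;
console.log(path);        // C:\temp\new
console.log(path.length); // 11

// Substitutions are still evaluated, so a literal "${" in the data would be interpreted.
const sum = String.raw`total: ${1 + 2}\n`;
console.log(sum);         // total: 3\n  (the backslash and "n" stay two separate characters)
```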

When is the HTML attribute backslash escaped as a JavaScript string?

I noticed that backslash is escaped when I get "attribute value including backslash" with JavaScript in the following code.
console.log(document.getElementById("test").getAttribute("class")); // -> \A
console.log(document.getElementById("test").getAttribute("class").replace("\\A", "\A")); // -> A
console.log(document.getElementById("test").dataset.b); // -> \B
console.log(document.getElementById("test").dataset.b.replace("\\B", "\B")); // -> B
<div id="test" class="\A" data-b="\B"></div>
The backslash is treated as a special character in JavaScript, and two backslashes (\\) represent one backslash (\).
The result of the above code suggests that when the attribute value is read from JavaScript with getAttribute(), one backslash (\) is escaped to two backslashes (\\) somewhere along the way.
However, the specification does not seem to describe any such step.
Question
In which step of getAttribute() is the backslash of the HTML attribute escaped (\ -> \\)?

There's a difference between string literals (which require escaping) and string values from other places (like html, ajax, etc), which are what they look like. Only when converted to literals (ex: JSON.stringify, some console views, etc) do JS strings have backslash escaping. The escape is an output formatting artifact; internally, there are no escapes in the sequence of characters.
HTML doesn't need the same escaping of backslashes, because the standard has different roots. An attribute isn't "converted" to one with escaped backslashes unless it's formatted as a string literal. That would happen at a stage between the string and its visible output. You can use alert() instead of console.log() to see the string as it really is. I believe that, specifically for the console, the goal is to be more helpful to developers than accurate to the internals.
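The literal-versus-value distinction can be checked without a DOM. In this sketch, attributeValue is a stand-in (an assumption for illustration) for what getAttribute("class") returns for class="\A".

```javascript
// Stand-in for the value returned by getAttribute("class") on <div class="\A">:
// two characters, a backslash (char code 92) followed by "A".
const attributeValue = String.fromCharCode(92) + "A";

console.log(attributeValue.length);               // 2
console.log(attributeValue === "\\A");            // true: the literal "\\A" denotes the same two characters
console.log("\A" === "A");                        // true: in a literal, \A is just "A"
console.log(JSON.stringify(attributeValue));      // "\\A" (escaping appears only in the formatted output)
console.log(attributeValue.replace("\\A", "\A")); // A
```

So the replace() calls in the question match a single backslash in the value; nothing was doubled by getAttribute().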

Convert unicode characters to their character

I have a file of localized properties coming in.
The file is like this:
str1=Rawr
str2=This is a dot \u00B7
In str2, \u00B7 is meant to be the Unicode character, not the literal string \\u00B7. Is there any way to parse the strings so that the Unicode escapes are converted to their characters?

Add double quotes around the value – then JSON.parse can do the job for you.
If you want to read and parse
str1=Rawr
str2=This is a dot \u00B7
as one value, then you will need to replace the line breaks with \n before doing so, otherwise it’ll break the syntax of the “string” you are passing to JSON.parse.
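A minimal sketch of the quoting approach for a single line, assuming the value contains no unescaped double quotes, no raw control characters, and only backslash sequences that JSON accepts. The variable names are illustrative.

```javascript
// One line of the properties file, exactly as read from disk
// (String.raw keeps the backslash so the text really contains "\u00B7").
const line = String.raw`str2=This is a dot \u00B7`;

// Split into key and raw value at the first "=".
const separator = line.indexOf("=");
const key = line.slice(0, separator);
const rawValue = line.slice(separator + 1);

// Wrapping the value in double quotes makes it a JSON string literal,
// so JSON.parse decodes the \uXXXX escape.
const value = JSON.parse('"' + rawValue + '"');

console.log(key, value); // str2 This is a dot ·
```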

In NODE.JS, the newline code (%0A) will decode back to what character?

I have a pretty simple question, but a few simple Google and Stack Exchange searches were not able to answer it, so I guess I'm missing something here.
Here are my simplified parameters:
I'm using JavaScript.
I have a text that needs to be URL-encoded, and the text has more than one line.
My question is: What is the newline character before the text gets encoded? (I know that after encoding, the newline becomes %0A.)
I guess asking "What char is decoded when decoding %0A" will be the same.

Those codes consist of a percent sign, followed by a two character hexadecimal number representing a byte value.
So in this case, the byte value is 0A, representing the ASCII newline character. This is commonly written as \n inside strings in JavaScript (and others, like PHP).
But I think your question suggests you want to do some search and replace for this character. I would not do that, since there can be other characters too that need encoding. Instead, use the function encodeURIComponent, which can encode the entire string for you. There is encodeURI as well, but in your case, I think the first is more appropriate.
This example shows how special characters (newline, space, and others) are encoded to an url-friendly format. Note that the diacritic é translates to the two bytes of its UTF-8 representation.
document.write(encodeURIComponent("Normal text\nEéy, check the specials: /, + and \t!"));
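The round trip can be checked directly. This is a small sketch with made-up sample text, using only the built-in encodeURIComponent and decodeURIComponent.

```javascript
// Sample text with a newline (\n, char code 10) and a non-ASCII character.
const text = "first line\nsecond line with é";

const encoded = encodeURIComponent(text);
console.log(encoded); // first%20line%0Asecond%20line%20with%20%C3%A9

// Decoding restores the original string, including the newline.
const decoded = decodeURIComponent(encoded);
console.log(decoded === text); // true

// %0A on its own decodes to the newline character.
console.log(decodeURIComponent("%0A") === "\n");      // true
console.log(decodeURIComponent("%0A").charCodeAt(0)); // 10
```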

Bug with Javascript's JSON.parse?

console.log(JSON.parse('{"data":"{\"json\":\"rocks\"}"}'));
gives error (tested on Firefox and Chrome's console). Is this a bug with JSON.parse? Same decodes well when tested with PHP.
print_r(json_decode('{"data":"{\"json\":\"rocks\"}"}', true));

This string is processed differently in PHP and JS, i.e. you get different results.
The only escape sequences in single-quoted strings in PHP are \\ and \'. All others are output literally, according to the documentation:
To specify a literal single quote, escape it with a backslash (\). To specify a literal backslash, double it (\\). All other instances of backslash will be treated as a literal backslash: this means that the other escape sequences you might be used to, such as \r or \n, will be output literally as specified rather than having any special meaning.
In JS on the other hand, if a string contains an invalid escape sequence, the backslash is discarded (CV means character value):
The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the NonEscapeCharacter.
The CV of NonEscapeCharacter :: SourceCharacter but not EscapeCharacter or LineTerminator is the SourceCharacter character itself.
The quote might not be helpful by itself, but if you follow the link and have a look at the grammar, it should become clear.
So in PHP the string will literally contain \" while in JS it will only contain ", which makes it invalid JSON:
{"data":"{"json":"rocks"}"}
If you want to create a literal backslash in JS, you have to escape it:
'{"data":"{\\"json\\":\\"rocks\\"}"}'
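A short sketch that shows both cases side by side; the variable names are illustrative.

```javascript
// With single backslashes, the JS string literal drops them before JSON.parse ever runs.
const broken = '{"data":"{\"json\":\"rocks\"}"}';
console.log(broken); // {"data":"{"json":"rocks"}"}

let parseFailed = false;
try {
  JSON.parse(broken);
} catch (error) {
  // The text is not valid JSON, so JSON.parse throws a SyntaxError.
  parseFailed = error instanceof SyntaxError;
}
console.log(parseFailed); // true

// With doubled backslashes, the string really contains \" and parses.
const fixed = '{"data":"{\\"json\\":\\"rocks\\"}"}';
const outer = JSON.parse(fixed);

// outer.data is itself a JSON string, so it needs a second parse.
const inner = JSON.parse(outer.data);
console.log(inner.json); // rocks
```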

To have a literal backslash in a string literal, you need \\.
console.log(JSON.parse('{"data":"{\\"json\\":\\"rocks\\"}"}'));
This will successfully escape the inner quotation marks for the JSON processing.

You need to escape the backslashes:
console.log(JSON.parse('{"data":"{\\"json\\":\\"rocks\\"}"}'));

If the object has been escaped with one or more levels of '\', JSON.parse won't return an object. It returns a string again, with one level of '\' removed.
You can parse again and again until all the '\' are gone.
myobj = {\"json\":\"rocks\"}
myobj = {\\"json\\":\\"rocks\\"}
The following lines worked for me to remove the backslashes:
// Keep parsing while the result is still a string.
while (typeof myobj == 'string') {
  myobj = JSON.parse(myobj);
}

You don't really need to escape double quotes inside single quotes, and you have two extra quotes in your input around the inner object. Just
console.log(JSON.parse('{"data":{"json":"rocks"}}'));
is enough.
