Read non printable characters from Javascript - javascript

Say I have the following Javascript instruction:
var a="hiàja, c . Non di–g t";
a contains binary data, i.e., any ASCII from 0-255.
Before what ASCII bytes should I add backslash so that a is read properly? (for example, before ").
Should I use an specific charset and content-type different than text/Javascript and UTF-8?
Thanks

The ASCII range is 0 to 127, but strings are not limited to ASCII in JavaScript. According to the ECMAScript standard, “All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed.” If the encoding of your document is suitable (e.g., windows-1252 or utf-8) and properly declared, you can use your example string as it is.

Related

insert unicode like \u1d6fc in a javascript text string

I'm writing some code that scans a string for TeX-style Greek character (like \Delta or \alpha), and replaces the string with the Unicode symbol. It works fine for the non-italic Greek characters. The problem is that I want to use mathematical italic for the lower case. These codes are one digit longer. For example, the code for the letter alpha is 1d6fc. When I put \u1d6fc into my string it displays as the character that matches \u1d6f (a lower case m with a superimposed tilde) followed by the letter c. How do I force the "correct" reading of the code?
You have to use UTF-16 surrogate pairs for characters beyond the UTF-16 range. In your particular case, you can use 0xD835 0xDEFC:
console.log('\uD835\uDEFC')
Here is a handy pair calculator. If you don't have to worry about Internet Explorer, you can also use String.fromCodePoint(), which will deal with that mess for you. If you do have to worry about Internet Explorer, MDN has a polyfill for that method.
To produce a \u escape sequence with more than 4 hex digits (code point belonging to a so-called astral plane), you can use the Unicode code point escape notation \u{xxxxx}:
console.log ('\u{1d6fc}');
or you can call String.fromCodePoint with the code point value expressed in hexadecimal using the 0x prefix notation:
console.log (String.fromCodePoint (0x1d6fc));

String.replace() in case of different encodings

When I use JSON.stringfy().replace(/[\t\r\n]/g,"").trim() on response messages (lambda functions callbacks) from different system I face an issue where \t will be replaced with \\t and \ to \\\
Is there a way to avoid this?
I tried to search for answers but only found articles for base cases.
JSON.stringify's specific purpose is to convert what you give it to JSON. If what you give it is a string with backslashes in it, then what you'll get back is the JSON representation of that string, which is the string encased in double quotes (") with any special characters, such as backslashes, escaped with a backslash, newlines converted to \n, carriage returns converted to \r, etc.
Example:
const str = document.querySelector("input").value;
console.log("The string:", str);
console.log("JSON.stringify's output:", JSON.stringify(str));
<input type="text" value="This string has a backslash in it: \ For instance, here's a backslash followed by a t: \t">
That's what JSON.stringify does. If you don't want that, don't use JSON.stringify.
...in case of different encodings
That part is irrelevant. By the time you're dealing with a JavaScript string, it doesn't matter what encoding was used to represent that string (in an HTML file, a .js file, etc.). Once it's in memory, it's in the one format for JavaScript strings defined by the language (which is essentially UTF-16, except invalid surrogate pairs are allowed).

javascript encodeURI() output

According to MDN, The 'encodeURI()' function:
replacing each instance of certain characters by one, two, three, or four escape sequences representing the UTF-8 encoding of the character
However, when invoking encodeURI('\u0082') (in Chrome) Im getting %C2%81 as output.
I expected to get %82 or %00%82. What does the %C2 mean?
The '0082' in '\u0082' is the Unicode code point, not the UTF-8 bytes representation.
UTF-8 maps u+0082 code point to two bytes: C2+81
Unicode to UTF-8 mapping table
Decoding %C2 at http://www.albionresearch.com/misc/urlencode.php leads to Â
When dealing with German texts and ISO 8859-15 / ISO 8859-1 vs. UTF-8 I often ran into the à character. The characters are quite close to each other. May this also be an encoding problem?
Maybe HTML encoding issues - "Â" character showing up instead of " " helps.

In NODE.JS, the newline code (%0A) will decode back to what character?

I have a pretty simple question, but a few simple googling and stachexchange queries were not able to answer it, so i guess i'm missing something here.
Here are my simplified parameters:
I'm using Javascript.
I have a text that needs to get URLEncoded and the text have more than 1 line.
My question is: What is the character for newline before the text get encoded? (I know that after the encoding the newline will be encoded into %0A)
I guess asking "What char is decoded when decoding %0A" will be the same.
Those codes consist of a percent sign, followed by a two character hexadecimal number representing a byte value.
So in this case, the byte value is 0A, representing the ASCII newline character. This is commonly written as \n inside strings in JavaScript (and others, like PHP).
But I think your question suggests you want to do some search and replace for this character. I would not do that, since there can be other characters too that need encoding. Instead, use the function encodeURIComponent, which can encode the entire string for you. There is encodeURI as well, but in your case, I think the first is more appropriate.
This example shows how special characters (newline, space, and others) are encoded to an url-friendly format. Note that the diacritic é translates to the two bytes of its UTF-8 representation.
document.write(encodeURIComponent("Normal text\nEéy, check the specials: /, + and \t!"));

Why does "\054321" produce ",321" in JavaScript?

Just tested in Chrome:
"\054321"
entered in the console gives:
",321"
Clearly "\054" gets converted to ",", but I fail to find a rule for this in the specification.
I am looking for a formal specification for this rule. Why does "\054" produce ","?
Update: It was an OctalEscapeSequence, as defined in B 1.2.
If you look at an ascii table you will find that the octal value 054 (decimal: 44) is a comma.
You can find more on the Values, variables, and literals - JavaScript | MDN page that specifies:
Table 2.1 JavaScript special characters
Character Meaning
XXX The character with the Latin-1 encoding specified by up to three octal digits XXX between 0 and 377. For example, \251 is the octal sequence for the copyright symbol.
So essentially, if you use an escape code of 3 digits, it will be evalutated as an octal value.
That said, be aware that octal escapes are deprecated in ES5 (These have been removed from this edition of ECMAScript) and you should use heximal or unicode escape codes instead (\XX or \uXXXX).
If you wish to have the literal string \012345 then simply use a double backslash: \\012345.
The Oct code for , is 054.
Check this Ascii Table
Why you have to put "\" before that Oct code then??
From a quick search you'll find lots of solutions. But this one gave me a better clearification.
Octal escape sequences
Any character with a character code lower than 256 (i.e. any character in the extended ASCII range) can be escaped using its octal-encoded character code, prefixed with . (Note that this is the same range of characters that can be escaped through hexadecimal escapes.)
To use the same example, the copyright symbol ('©') has character code 169, which gives 251 in octal notation, so you could write it as '\251'.
[The upper section was collected from an article which is here]
There are also some exceptions which you can find from that article.
Because using \ in a string, as in other languages like C or Python, implies a special character, like \r as carriage return or \n as newline. In this case, It's the ascii value of the coma, which is 054.
"\054" is octal for ",". See here: http://www.asciitable.com/index/asciifull.gif

Categories

Resources