Why does "\054321" produce ",321" in JavaScript?


Just tested in Chrome:
"\054321"
entered in the console gives:
",321"
Clearly "\054" gets converted to ",", but I fail to find a rule for this in the specification.
I am looking for a formal specification for this rule. Why does "\054" produce ","?
Update: It was an OctalEscapeSequence, as defined in B 1.2.

If you look at an ascii table you will find that the octal value 054 (decimal: 44) is a comma.
You can find more on the Values, variables, and literals - JavaScript | MDN page that specifies:
Table 2.1 JavaScript special characters
Character Meaning
\XXX The character with the Latin-1 encoding specified by up to three octal digits XXX between 0 and 377. For example, \251 is the octal sequence for the copyright symbol.
So essentially, a backslash followed by up to three octal digits is evaluated as an octal character code.
That said, be aware that octal escapes are deprecated as of ES5 (the specification says "These have been removed from this edition of ECMAScript") and you should use hexadecimal or Unicode escape codes instead (\xXX or \uXXXX).
If you wish to have the literal string \012345 then simply use a double backslash: \\012345.
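As a rough sketch of what the engine does with the escape, the octal digits can be decoded by hand. The helper name fromOctalEscape is made up for this illustration; the snippet deliberately avoids writing an octal escape literally so that it also runs in strict mode.

```javascript
// Hypothetical helper: decode the digits of a legacy octal escape by hand.
function fromOctalEscape(octalDigits) {
  const code = parseInt(octalDigits, 8); // "054" -> 44
  return String.fromCharCode(code);      // 44 -> ","
}

const comma = fromOctalEscape("054");
console.log(comma + "321");  // ,321

// The non-deprecated spellings of the same character:
console.log("\x2C321");      // ,321 (hexadecimal escape)
console.log("\u002C321");    // ,321 (Unicode escape)

// A doubled backslash keeps the text literal:
console.log("\\054321");     // \054321
```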

The octal code for , is 054.
Check this ASCII table.
Why do you have to put "\" before that octal code, then?
A quick search turns up plenty of explanations, but this one gave me the best clarification:
Octal escape sequences
Any character with a character code lower than 256 (i.e. any character in the extended ASCII range) can be escaped using its octal-encoded character code, prefixed with \. (Note that this is the same range of characters that can be escaped through hexadecimal escapes.)
To use the same example, the copyright symbol ('©') has character code 169, which gives 251 in octal notation, so you could write it as '\251'.
[The upper section was collected from an article which is here]
There are also some exceptions which you can find from that article.

Because a \ in a string, as in other languages like C or Python, introduces a special character, such as \r for carriage return or \n for newline. In this case, 054 is the octal ASCII value of the comma.

"\054" is octal for ",". See here: http://www.asciitable.com/index/asciifull.gif

Related

insert unicode like \u1d6fc in a javascript text string

I'm writing some code that scans a string for TeX-style Greek character (like \Delta or \alpha), and replaces the string with the Unicode symbol. It works fine for the non-italic Greek characters. The problem is that I want to use mathematical italic for the lower case. These codes are one digit longer. For example, the code for the letter alpha is 1d6fc. When I put \u1d6fc into my string it displays as the character that matches \u1d6f (a lower case m with a superimposed tilde) followed by the letter c. How do I force the "correct" reading of the code?

You have to use UTF-16 surrogate pairs for characters beyond the Basic Multilingual Plane (code points above U+FFFF). In your particular case, you can use 0xD835 0xDEFC:
console.log('\uD835\uDEFC')
Here is a handy pair calculator. If you don't have to worry about Internet Explorer, you can also use String.fromCodePoint(), which will deal with that mess for you. If you do have to worry about Internet Explorer, MDN has a polyfill for that method.
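If you would rather compute the pair yourself, the arithmetic is short. This is only a sketch; the function name toSurrogatePair is invented here, and it assumes the input is a code point above U+FFFF.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its
// UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1D6FC);
console.log(high.toString(16), low.toString(16)); // d835 defc

const alpha = String.fromCharCode(high, low);
console.log(alpha === "\uD835\uDEFC");              // true
console.log(alpha === String.fromCodePoint(0x1D6FC)); // true
console.log(alpha.length);                          // 2 (two UTF-16 code units)
```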

To produce a \u escape sequence with more than 4 hex digits (code point belonging to a so-called astral plane), you can use the Unicode code point escape notation \u{xxxxx}:
console.log ('\u{1d6fc}');
or you can call String.fromCodePoint with the code point value expressed in hexadecimal using the 0x prefix notation:
console.log (String.fromCodePoint (0x1d6fc));

What is the logic behind the output of console.log('\x') where x is an arbitrary number?

What is the logic behind the output of the following examples:
console.log('\272') // -> º
console.log('\364') // -> ô
As far as I know, \ is an escape character in javascript which means it tries to escape the following character but in the first example it is not equal to ASCII code of 72 which is character H.

That's because of the octal encoding.
Any character with a character code lower than 256 (i.e. any character
in the extended ASCII range) can be escaped using its octal-encoded
character code, prefixed with \. (Note that this is the same range of
characters that can be escaped through hexadecimal escapes.)
To use the same example, the copyright symbol ('©') has character code
169, which gives 251 in octal notation, so you could write it as
'\251'.
You can take a look at this explanation, which is quite illustrative: https://mathiasbynens.be/notes/javascript-escapes

They are octal values.
You can find all of them here
However, octal escapes are deprecated. Using them in strict mode throws a SyntaxError.
I suggest you use the hexadecimal code instead, which you can find at the provided link:
For the octal value 272, the hexadecimal value is BA. So you write it prefixed by a lowercase x, which denotes a hex value.
console.log('\xBA') // -> º
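If you have several octal escapes to migrate, the conversion can be scripted. A minimal sketch, with octalToHexEscape being a name I made up; it takes the octal digits as text and returns the source text of the equivalent \x escape.

```javascript
// Hypothetical helper: turn the digits of an octal escape into the
// text of the equivalent hexadecimal escape.
function octalToHexEscape(octalDigits) {
  const code = parseInt(octalDigits, 8);
  return "\\x" + code.toString(16).toUpperCase().padStart(2, "0");
}

console.log(octalToHexEscape("272")); // \xBA
console.log(octalToHexEscape("364")); // \xF4

// The characters themselves, decoded from the octal digits:
console.log(String.fromCharCode(parseInt("272", 8))); // º
console.log(String.fromCharCode(parseInt("364", 8))); // ô
```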

Javascript converting number to string turns it into another number?

If I try to convert 003050 to a string, it turns it into 1576. How can I turn 003050, or any other whole number, into a string without that happening? I tried '' + 003050 and it's still 1576.
String(003050);
"1576"

This has no relation with the conversion to string.
var n = 003050;
is enough to make the number interpreted as octal.
You would have gotten the same result with ""+parseInt("3050", 8).
Simply remove the leading zeros to get the number 3050:
var n = 3050;
If you want a literal string with leading 0, well, just make it
var s = "003050";
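To see the different readings side by side, here is a small sketch. It uses the ES2015 0o prefix in place of the legacy leading-zero literal, so it also runs in strict mode; the variable names are just for illustration.

```javascript
const text = "003050";

// As a string, the digits are untouched; parsing decides the base.
const asDecimal = Number(text);              // leading zeros ignored
const asDecimalExplicit = parseInt(text, 10);
const asOctal = parseInt(text, 8);           // same value the legacy literal 003050 has

console.log(asDecimal);         // 3050
console.log(asDecimalExplicit); // 3050
console.log(asOctal);           // 1576
console.log(0o3050);            // 1576 (explicit octal literal)
console.log(String(0o3050));    // 1576 (the string conversion is not the culprit)
```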

Seems you could do with reading a little about Javascript,
Reference: Standard ECMA-262 5.1 Edition / June 2011
and understand primitives, objects and literals etc.
It is worth noting Additional Syntax
Past editions of ECMAScript have included additional syntax and semantics for specifying octal literals and octal escape sequences. These have been removed from this edition of ECMAScript. This non-normative annex presents uniform syntax and semantics for octal literals and octal escape sequences for compatibility with some older ECMAScript programs.
What you have, 003050, is a Numeric Literal
The syntax and semantics of 7.8.3 can be extended as follows except that this
extension is not allowed for strict mode code:
Syntax
NumericLiteral ::
DecimalLiteral
HexIntegerLiteral
OctalIntegerLiteral
OctalIntegerLiteral ::
0 OctalDigit
OctalIntegerLiteral OctalDigit
OctalDigit :: one of
0 1 2 3 4 5 6 7
And so, outside strict mode (where this syntax has been removed), 003050 (an octal literal) is the decimal value 1576.
If you had, '003050', then you would have a String Literal
A string literal is zero or more characters enclosed in single or double quotes. Each character may be represented by an escape sequence. All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed. Any character may appear in the form of an escape sequence.
So, you really wanted to use a string literal, didn't you?
If you really meant to use an octal numeric literal and wanted it as a string, then you would need to do something like this.
Javascript
var x = 003050,
y = ('000000' + x.toString(8)).slice(-6);
console.log(y);
Output
003050
On jsFiddle
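In newer engines the same formatting can be sketched with padStart and the 0o prefix, which is allowed in strict mode. The helper name toOctalString is hypothetical.

```javascript
// Hypothetical helper: format a number as octal digits, zero padded
// on the left to the requested width.
function toOctalString(value, width) {
  return value.toString(8).padStart(width, "0");
}

console.log(toOctalString(0o3050, 6)); // 003050
console.log(toOctalString(1576, 6));   // 003050 (same number, written in decimal)
console.log(toOctalString(8, 1));      // 10
```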

Unicode characters not working

I'm really new to Javascript and I heard about unicode characters, but I don't know how they work. I did this:
alert("U+00BF");
which is the unicode for an upside-down question mark, but for some reason it just alerts the letters "U+00BF". I've tried using unicode characters with a format more like this:
alert("/xF3");
and those worked, but I don't know what I'm doing wrong with the first one. Does anyone know?

The "U+00BF" in alert("U+00BF"); is a string of length 6 containing the characters 'U', '+', '0', '0', 'B', 'F'. Hence the string "U+00BF" is echoed out in the alert.
Based on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Values,_variables,_and_literals#Unicode ,
You can use the Unicode escape sequence in string literals, regular expressions, and identifiers. The escape sequence consists of six ASCII characters: \u and a four-digit hexadecimal number. For example, \u00A9 represents the copyright symbol. Every Unicode escape sequence in JavaScript is interpreted as one character.
Which means, we need to do:
alert("\u00BF");
to see the "upside-down question mark" unicode character.
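If the U+XXXX text comes from somewhere else (a table, user input), it has to be converted at run time instead. A minimal sketch, assuming the input has the form U+ followed by 4 to 6 hex digits; fromUPlusNotation is a name invented for this example.

```javascript
// Hypothetical helper: convert "U+00BF" style notation to the character.
function fromUPlusNotation(notation) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation);
  if (!match) {
    throw new RangeError("Not in U+XXXX form: " + notation);
  }
  return String.fromCodePoint(parseInt(match[1], 16));
}

console.log("U+00BF".length);             // 6 (just six ordinary characters)
console.log(fromUPlusNotation("U+00BF")); // ¿
console.log(fromUPlusNotation("U+00BF") === "\u00BF"); // true
console.log("\u00BF" === "\xBF");         // true
```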

The notation U+00BF is simply a conventional way of emphasizing that you are mentioning, in text, a Unicode character by its code number 00BF. It is not an escape notation of any kind in JavaScript.
You can use the character as such,
alert("¿")
provided that you handle the character encoding issues, as you should anyway.
If, however, you find this simple approach not applicable due to some external constraints, you can use a classic JavaScript escape notation:
alert("\xBF")
for characters in the range up to U+00FF, or the Unicode-based JavaScript escape notation
alert("\u00BF")
for characters in the range up to U+FFFF. (For characters beyond that, you need a so-called surrogate pair.)
Note that the special character used in these notations is \ U+005C REVERSE SOLIDUS, commonly, and originally, called “backslash”, not / U+002F SOLIDUS, commonly called “slash”, or sometimes (for emphasis) “forward slash”.

What's the meaning about characterEncoding

I'm reading the Sizzle source code. I'm confused when I read the regular about characterEncoding. In the source code, the characterEncoding defined as below:
characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+"
It looks like it tries to match \\. or \w- or ^\x00-\xa0.
I know [\w-] means \ or w or -, and I also know [^\x00-\xa0] means anything not in \x00-\xa0. Can anyone tell me what \\. and \x00-\xa0 mean?
Thanks
I think I know what it is. The type of characterEncoding is string. So if we assign like below:
characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+"
The value of characterEncoding is:
(?:\\.|[\w-]|[^\x00-\xa0])+
So if I build a regular expression like above, it means:
[\w-] // A symbol of Latin alphabet or a digit or an underscore '_' or '-'
[^\x00-\xa0] // ISO 10646 characters U+00A1 and higher
\\. // '\' and '.'
So this time, my question is when will the pattern \\. work?

The variable would be better named css3Identifier or something.
Transforming [\w-]|[^\x00-\xa0] into an equivalent form that matches the spec better:
[a-zA-Z0-9_-]|[\u00A1-\uFFFF]
Consider that A1 is 161, _ is underscore and - is a dash and then
read this:
In CSS3, identifiers (including element names, classes, and IDs in selectors (see [SELECT] [or is this still true])) can contain only the characters [A-Za-z0-9] and ISO 10646 characters 161 and higher, plus the hyphen (-) and the underscore (_)
"and higher" is covered by -\uFFFF
The "\\\\." matches any single character preceded by a backslash, e.g. for \7B it would match \7, and then B would be caught
by the middle alternative. It also matches \n, \r, \t etc.
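You can try this out by building a RegExp from the string. This is only a sketch; the ^ and $ anchors and the test inputs are my own additions, not something Sizzle does in this form.

```javascript
// The string exactly as written in the Sizzle source.
const characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+";

// After string escaping, this is the pattern the regex engine sees.
console.log(characterEncoding); // (?:\\.|[\w-]|[^\x00-\xa0])+

// Anchored so the whole input has to be made of identifier characters.
const identifier = new RegExp("^" + characterEncoding + "$");

console.log(identifier.test("foo-bar_1")); // true  (word characters and hyphen)
console.log(identifier.test("caf\u00E9")); // true  (U+00E9 is above U+00A0)
console.log(identifier.test("B&W?"));      // false (& and ? are not allowed bare)
console.log(identifier.test("B\\&W\\?"));  // true  (each one is backslash escaped)
console.log(identifier.test("a b"));       // false (space is in \x00-\xa0)
```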

It is just a regex for the valid format of CSS identifiers, classes, tags and attributes. A link is also in the source code comment. Following are the rules, including the possible use of backslashes, which might answer your question:
4.1. Characters and case
The following rules always hold:
All CSS style sheets are case-insensitive, except for parts that are not under the control of CSS. For example, the case-sensitivity of values of the HTML attributes "id" and "class", of font names, and of URIs lies outside the scope of this specification. Note in particular that element names are case-insensitive in HTML, but case-sensitive in XML.
In CSS3, identifiers (including element names, classes, and IDs in selectors (see [SELECT] [or is this still true])) can contain only the characters [A-Za-z0-9] and ISO 10646 characters 161 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit or a hyphen followed by a digit. They can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F". (See [UNICODE310] and [ISO10646].)
In CSS3, a backslash (\) character indicates three types of character escapes.
First, inside a string (see [CSS3VAL]), a backslash followed by a newline is ignored (i.e., the string is deemed not to contain either the backslash or the newline).
Second, it cancels the meaning of special CSS characters. Any character (except a hexadecimal digit) can be escaped with a backslash to remove its special meaning. For example, "\"" is a string consisting of one double quote. Style sheet preprocessors must not remove these backslashes from a style sheet since that would change the style sheet's meaning.
Third, backslash escapes allow authors to refer to characters they can't easily put in a style sheet. In this case, the backslash is followed by at most six hexadecimal digits (0..9A..F), which stand for the ISO 10646 ([ISO10646]) character with that number. If a digit or letter follows the hexadecimal number, the end of the number needs to be made clear. There are two ways to do that:
with a space (or other whitespace character): "\26 B" ("&B"). In this case, user agents should treat a "CR/LF" pair (13/10) as a single whitespace character.
by providing exactly 6 hexadecimal digits: "\000026B" ("&B")
In fact, these two methods may be combined. Only one whitespace character is ignored after a hexadecimal escape. Note that this means that a "real" space after the escape sequence must itself either be escaped or doubled.
Backslash escapes are always considered to be part of an identifier or a string (i.e., "\7B" is not punctuation, even though "{" is, and "\32" is allowed at the start of a class name, even though "2" is not).
http://www.w3.org/TR/css3-syntax/#characters
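To make the second and third escape types concrete, here is a simplified decoder. It is a sketch under assumptions: unescapeCssIdentifier is a made-up name, the input is assumed to contain only valid code points, and edge cases such as a trailing backslash or the NUL character are ignored.

```javascript
// Hypothetical helper: decode CSS backslash escapes in an identifier.
// Handles "\" + 1 to 6 hex digits (with one optional trailing whitespace
// character, CR/LF counting as one) and "\" + any other single character.
function unescapeCssIdentifier(text) {
  return text.replace(
    /\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|([^\r\n\f0-9a-fA-F]))/g,
    (match, hex, literal) =>
      hex !== undefined ? String.fromCodePoint(parseInt(hex, 16)) : literal
  );
}

console.log(unescapeCssIdentifier("B\\&W\\?"));    // B&W?
console.log(unescapeCssIdentifier("B\\26 W\\3F")); // B&W?
console.log(unescapeCssIdentifier("\\26 B"));      // &B
console.log(unescapeCssIdentifier("\\000026B"));   // &B
console.log(unescapeCssIdentifier("\\7B"));        // {
```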
