Line endings (also known as Newlines) in JS strings - javascript

It is well known that Unix-like systems use LF characters for newlines, whereas Windows uses CR+LF.
However, when I test this code from a local HTML file on my Windows PC, it seems that JS treats all newlines as LF. Is this a correct assumption?
var string = `
foo
bar
`;
// There should be only one blank line between foo and bar.
// \n - Works
// string = string.replace(/^(\s*\n){2,}/gm, '\n');
// \r\n - Doesn't work
string = string.replace(/^(\s*\r\n){2,}/gm, '\r\n');
alert(string);
// That is, it seems that JS treats all newlines as
// `LF` instead of `CR+LF`?

I think I found an explanation.
You are using an ES6 Template Literal to construct your multi-line string.
According to the ECMAScript specs a
.. template literal component is interpreted as a sequence of Unicode
code points. The Template Value (TV) of a literal component is
described in terms of code unit values (SV, 11.8.4) contributed by the
various parts of the template literal component. As part of this
process, some Unicode code points within the template component are
interpreted as having a mathematical value (MV, 11.8.3). In
determining a TV, escape sequences are replaced by the UTF-16 code
unit(s) of the Unicode code point represented by the escape sequence.
The Template Raw Value (TRV) is similar to a Template Value with the
difference that in TRVs escape sequences are interpreted literally.
And below that, it is defined that:
The TRV of LineTerminatorSequence::<LF> is the code unit 0x000A (LINE
FEED).
The TRV of LineTerminatorSequence::<CR> is the code unit 0x000A (LINE FEED).
My interpretation here is, you always just get a line feed - regardless of the OS-specific new-line definitions when you use a template literal.
Finally, in JavaScript's regular expressions a
\n matches a line feed (U+000A).
which describes the observed behavior.
However, if you define a string literal '\r\n' or read text from a file stream, etc that contains OS-specific new-lines you have to deal with it.
Here are some tests that demonstrate the behavior of template literals:
`a
b`.split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
(String.raw`a
b`).split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
'a\r\nb'.split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
"a\
b".split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
Interpreting the results:
char(97) = a, char(98) = b
char(10) = \n, char(13) = \r
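The normalization described above can also be checked without relying on how an editor saved the file. The following is a minimal sketch, assuming it is acceptable to build source text with the Function constructor purely for demonstration; the variable names are illustrative. The source text handed to the constructor contains real CR and LF characters inside the backticks.

```javascript
// Helper: list the UTF-16 code units of a string.
const codes = (s) => Array.from(s, (ch) => ch.charCodeAt(0));

// The single-quoted strings below contain real CR/LF characters,
// so the evaluated source has a literal line break inside the template.
const fromCrLf = new Function('return `a\r\nb`;')();
const fromCr = new Function('return `a\rb`;')();
const rawFromCrLf = new Function('return String.raw`a\r\nb`;')();

// An escape sequence in an ordinary string literal is not normalized.
const escaped = 'a\r\nb';

console.log(codes(fromCrLf));    // [97, 10, 98]
console.log(codes(fromCr));      // [97, 10, 98]
console.log(codes(rawFromCrLf)); // [97, 10, 98]
console.log(codes(escaped));     // [97, 13, 10, 98]
```

Both the cooked and the raw value of the template contain a single line feed, while the escaped string literal keeps its carriage return.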

You could use the regular expression: /^\s*[\r\n]/gm
Code example:
let string = `
foo
bar
`;
string = string.replace(/^\s*[\r\n]/gm, '\r\n');
console.log(string);
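If the text may come from a file or a network response rather than a template literal, one option is to normalize the line endings first and then collapse the blank lines. This is a rough sketch, not the answer's method; the function name collapseBlankLines is made up for illustration.

```javascript
// Normalize CR+LF and lone CR to LF, then reduce any run of
// blank (or whitespace-only) lines to a single blank line.
function collapseBlankLines(text) {
  const normalized = text.replace(/\r\n?/g, '\n');
  return normalized.replace(/\n(?:[ \t]*\n){2,}/g, '\n\n');
}

console.log(JSON.stringify(collapseBlankLines('foo\r\n\r\n\r\n\r\nbar')));
// "foo\n\nbar"
console.log(JSON.stringify(collapseBlankLines('foo\n\nbar')));
// "foo\n\nbar" (already a single blank line, unchanged)
```

The output always uses LF; convert back with a final replace if CR+LF is required.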

Related

Escaping backslash in a string containing backslash

I have a string containing I\u2019m (with backslashes not escaped)
var myString = 'I\\u2019m'; // I\u2019m
But then I need a function that 'escape backslashes' that string, so the function I'm looking for would return I'm
backslashString(myString); // I'm
I've tried using eval:
function backslashString(input){
input = input.replace(/'/g, "\\'"); // Replace ' with \' that's going to mess up eval
return eval(`'${input}'`);
}
But is there a proper way of doing it? I'm looking for a function that escape backslashes a string containing I\u2019m to I'm and also handles if there's an extra backslash (A lost \ backslash)
EDIT:
I did not ask what I meant from the start. This not only applies to unicode characters, but applies to all backslash characters including \n
The backslashes aren’t the real problem here - the real problem is the difference between code and data.
\uXXXX is JavaScript syntax to write the Unicode codepoint of a character in a text literal. It gets replaced with the actual character, when the JavaScript parser interprets this code.
Now you have a variable that contains the value I\u2019m already - that is data. This does not get parsed as JavaScript, so it does mean the literal characters I\u2019m, and not I’m. eval can “fix” that, because the missing step of interpreting this as code is simply what eval does.
If you do not want to use eval (and thereby invite all the potential risks that entails, if the input data is not completely under your control), then you can parse those numeric values from the string using regular expressions, and then use String.fromCharCode to create the actual Unicode character from the given code point:
var myString = 'I\\u2019m and I\\u2018m';
var myNewString = myString.replace(/\\u([0-9a-fA-F]{4})/g, function(m, n) {
return String.fromCharCode(parseInt(n, 16));
});
console.log(myNewString)
/\\u([0-9a-fA-F]{4})/g - regular expression to match this \uXXXX format (X = hexadecimal digit), g modifier to replace all matches instead of stopping after the first.
parseInt(n, 16) - to convert the hexadecimal value to a decimal first, because String.fromCharCode wants the latter.
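Since the question's edit asks for all backslash sequences and not only \uXXXX, the same replace-with-callback idea can be extended. The following is a sketch under stated assumptions: the name unescapeBackslashes and the lookup table are illustrative, only the common escapes are covered, an unrecognized escape simply drops its backslash, and a lone trailing backslash is left untouched.

```javascript
// Single-character escapes and the characters they stand for.
const SIMPLE_ESCAPES = {
  n: '\n', r: '\r', t: '\t', b: '\b', f: '\f', v: '\v',
  0: '\0', '\\': '\\', "'": "'", '"': '"',
};

// Decode \u{...}, \uXXXX, \xXX and single-character escapes without eval.
function unescapeBackslashes(input) {
  const pattern = /\\(?:u\{([0-9a-fA-F]+)\}|u([0-9a-fA-F]{4})|x([0-9a-fA-F]{2})|([\s\S]))/g;
  return input.replace(pattern, (match, codePoint, unit, hex, single) => {
    if (codePoint !== undefined) return String.fromCodePoint(parseInt(codePoint, 16));
    if (unit !== undefined) return String.fromCharCode(parseInt(unit, 16));
    if (hex !== undefined) return String.fromCharCode(parseInt(hex, 16));
    return Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, single)
      ? SIMPLE_ESCAPES[single]
      : single;
  });
}

console.log(unescapeBackslashes('I\\u2019m'));              // I’m
console.log(JSON.stringify(unescapeBackslashes('a\\nb')));  // "a\nb"
```

Note that String.fromCodePoint throws a RangeError for values above 0x10FFFF, so untrusted input may need a guard.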
decodeURIComponent(JSON.parse('"I\\u2019m"'));
OR for multiple
'I\\\u2019m'.split('\\').join().replace(/,/g,'');
'I\u2019m'.split('\\').join().replace(/,/g,'');
Looks like there's no other way other than eval (JSON.parse doesn't like new lines in strings)
NOTE: The function would return false if it has a trailing backslash
function backslashString(input){
input = input.replace(/`/g, '\\`'); // Escape quotes for input to eval
try{
return eval('`'+input+'`');
}catch(e){ // Will return false if input has errors in backslashing
return false;
}
}

Replace new line characters with \r

I'm using hl7parser to parse ADM files.
The documentation states that to create a new Message object, a string should be passed:
var message = hl7parser.create("MSH|^~\&|||||20121031232617||ADT^A04|20381|P|2.3||||NE\rEVN|A04|20121031162617||01\rPID|1|16194|16194||Jones^Bob");
Notice that the string uses '\r' to separate segments (MSH, EVN, PID).
I'm fetching the data from a server, which returns for instance the following data.
MSH|^~\&|EPICADT|DH|LABADT|DH|201301011226||ADT^A01|HL7MSG00001|P|2.3.1|
EVN|A01|201301011223||
PID|||MRN12345^5^M11||APPLESEED^JOHN^A^III||19710101|M||C|1 CATALYZE STREET^^MADISON^WI^53005-1020|GL|(414)379-1212|(414)271-3434||S||MRN12345001^2^M10|123456789|987654^NC|
NK1|1|APPLESEED^BARBARA^J|WIFE||||||NK^NEXT OF KIN
PV1|1|I|2000^2012^01||||004777^GOOD^SIDNEY^J.|||SUR||||ADM|A0|
Replacing the \n with \r with replace() doesn't make the parsing work, neither does split('\n') and join('\r').
I noticed that there is a difference when logging the string passed in the example and the string after replacing with \r
With string in example:
PID|1|16194|16194||Jones^BobADT^A04|20381|P|2.3||||NE
It's only printing the last segment apparently because of the \r characters
With my replacement method:
PID|||MRN12345^5^M11||APPLESEED^JOHN^A^III||19710101|M||C|1 CATALYZE STREET^^MADISON^WI^53005-1020|GL|(414)379-1PV1|1|I|2000^2012^01||||004777^GOOD^SIDNEY^J.|||SUR||||ADM|A0|
The entire string is printed, not just the last segment.
I'm not sure why there is a difference when printing them. Is there a difference between passing a literal string with \r character and "adding" \r to a string?
Doing this should work:
const lines = "A\nB\nC";
const result = lines.split("\n").join("\r");
console.log(result);
The confusion probably comes from the fact that it looks like it didn't, since it looks like it just output ABC.
However, if we check out the length of the string produced:
const lines = "A\nB\nC";
const result = lines.split("\n").join("\r");
console.log(result);
console.log(result.length);
Notice that it is 5 characters long, not 3. The \r is there. It's just that when it is output to most things, it basically gets hidden because an \r doesn't really render to anything on its own.
It is a "carriage return" and only MacOS (before X) used it as a newline character. Windows uses a combination of \r\n to render a newline and Linux (and MacOSX) uses \n.
If the parser wanted a literal backslash followed by r to appear in the string, then you'd need to use an escaped one (though this is almost certainly not what it expects):
const lines = "A\nB\nC";
const result = lines.split("\n").join("\\r");
console.log(result);
console.log(result.length);
function replaceLfWithCr(text) {
return text.replace(/\n/g, '\r');
}
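Because a carriage return is invisible (or rewinds the cursor) when logged, it helps to inspect the converted string with JSON.stringify. The sketch below uses shortened, made-up segment data and a hypothetical helper name; it does not call hl7parser. It also converts CR+LF pairs, in case the server sends Windows line endings.

```javascript
// Replace CR+LF and LF with a single carriage return.
function toCarriageReturns(text) {
  return text.replace(/\r\n|\n/g, '\r');
}

// Hypothetical, abbreviated segments with mixed line endings.
const fetched = 'MSH|1\nEVN|2\r\nPID|3';
const converted = toCarriageReturns(fetched);

// JSON.stringify shows control characters as escape sequences.
console.log(JSON.stringify(converted));      // "MSH|1\rEVN|2\rPID|3"
console.log(converted.split('\r').length);   // 3
```

If the stringified output still shows \n, the input probably contained line endings that the replacement did not cover.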

Does javascript treat these differently: ' vs `? [duplicate]

In JavaScript, a backtick† seems to work the same as a single quote. For instance, I can use a backtick to define a string like this:
var s = `abc`;
Is there a way in which the behavior of the backtick actually differs from that of a single quote?
† Note that among programmers, "backtick" is one name for what is more generally called the grave accent. Programmers also sometimes use the alternate names "backquote" and "backgrave". Also, on Stack Overflow and elsewhere, other common spellings for "backtick" are "back-tick" and "back tick".
This is a feature called template literals.
They were called "template strings" in earlier drafts of the ECMAScript 2015 specification.
Template literals are supported by Firefox 34, Chrome 41, and Edge 12 and above, but not by Internet Explorer.
Examples: http://tc39wiki.calculist.org/es6/template-strings/
Official specification: ECMAScript 2015 Language Specification, 12.2.9 Template Literal Lexical Components (a bit dry)
Template literals can be used to represent multi-line strings and may use "interpolation" to insert variables:
var a = 123, str = `---
a is: ${a}
---`;
console.log(str);
Output:
---
a is: 123
---
What is more important, they can contain not just a variable name, but any JavaScript expression:
var a = 3, b = 3.1415;
console.log(`PI is nearly ${Math.max(a, b)}`);
ECMAScript 6 comes up with a new type of string literal, using the backtick as the delimiter. These literals do allow basic string interpolation expressions to be embedded, which are then automatically parsed and evaluated.
let person = {name: 'RajiniKanth', age: 68, greeting: 'Thalaivaaaa!' };
let usualHtmlStr = "<p>My name is " + person.name + ",</p>\n" +
"<p>I am " + person.age + " old</p>\n" +
"<strong>\"" + person.greeting + "\" is what I usually say</strong>";
let newHtmlStr =
`<p>My name is ${person.name},</p>
<p>I am ${person.age} old</p>
<strong>"${person.greeting}" is what I usually say</strong>`;
console.log(usualHtmlStr);
console.log(newHtmlStr);
As you can see, we used the ` around a series of characters, which are interpreted as a string literal, but any expressions of the form ${..} are parsed and evaluated inline immediately.
One really nice benefit of interpolated string literals is they are allowed to split across multiple lines:
var Actor = {"name": "RajiniKanth"};
var text =
`Now is the time for all good men like ${Actor.name}
to come to the aid of their
country!`;
console.log(text);
// Now is the time for all good men like RajiniKanth
// to come to the aid of their
// country!
Interpolated Expressions
Any valid expression is allowed to appear inside ${..} in an interpolated string literal, including function calls, inline function expression calls, and even other interpolated string literals!
function upper(s) {
return s.toUpperCase();
}
var who = "reader"
var text =
`A very ${upper("warm")} welcome
to all of you ${upper(`${who}s`)}!`;
console.log(text);
// A very WARM welcome
// to all of you READERS!
Here, the inner `${who}s` interpolated string literal was a little bit nicer convenience for us when combining the who variables with the "s" string, as opposed to who + "s". Also note that an interpolated string literal is just lexically scoped where it appears, not dynamically scoped in any way:
(e.g. below: a name variable gets interpolated with the value held in the scope where the template-literal is defined; assigning another value in the foo function's scope will have no effect)
function foo(str) {
var name = "foo";
console.log(str);
}
function bar() {
var name = "bar";
foo(`Hello from ${name}!`);
}
var name = "global";
bar(); // "Hello from bar!"
Using the template literal for the HTML is definitely more readable by reducing the annoyance.
The plain old way:
'<div class="' + className + '">' +
'<p>' + content + '</p>' +
'Let\'s go' +
'</div>';
With ECMAScript 6:
`<div class="${className}">
<p>${content}</p>
Let's go
</div>`
Your string can span multiple lines.
You don't have to escape quotation characters.
You can avoid groupings like: '">'
You don't have to use the plus operator.
Tagged Template Literals
We can also tag a template string, when a template string is tagged, the literals and substitutions are passed to function which returns the resulting value.
function myTaggedLiteral(strings) {
console.log(strings);
}
myTaggedLiteral`test`; //["test"]
function myTaggedLiteral(strings, value, value2) {
console.log(strings, value, value2);
}
let someText = 'Neat';
myTaggedLiteral`test ${someText} ${2 + 3}`;
//["test ", " ", ""]
// "Neat"
// 5
We can use a rest parameter here to collect multiple values. The first argument, which we called strings, is an array of all the plain strings (the stuff between any interpolated expressions).
We then gather up all subsequent arguments into an array called values using the ... gather/rest operator, though you could of course have left them as individual named parameters following the strings parameter like we did above (value, value2, etc.).
function myTaggedLiteral(strings, ...values) {
console.log(strings);
console.log(values);
}
let someText = 'Neat';
myTaggedLiteral`test ${someText} ${2 + 3}`;
//["test ", " ", ""]
//["Neat", 5]
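To make the shape of the arguments concrete, here is a small sketch with two illustrative tag functions (describe and plain are made-up names). The first returns what the tag receives; the second reassembles the string the way an untagged template literal would.

```javascript
// Return copies of everything a tag function receives.
function describe(strings, ...values) {
  return { strings: Array.from(strings), raw: Array.from(strings.raw), values };
}

// Interleave strings and values, like the default (untagged) behavior.
function plain(strings, ...values) {
  return strings.reduce(
    (out, str, i) => out + str + (i < values.length ? String(values[i]) : ''),
    ''
  );
}

const someText = 'Neat';
const parts = describe`test ${someText} ${2 + 3}`;
const rebuilt = plain`test ${someText} ${2 + 3}`;

console.log(parts.strings); // ["test ", " ", ""]
console.log(parts.values);  // ["Neat", 5]
console.log(rebuilt);       // test Neat 5
```

There is always exactly one more string than there are values, which is why the strings array ends with an empty string when the literal ends with a substitution.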
The argument(s) gathered into our values array are the results of the already evaluated interpolation expressions found in the string literal. A tagged string literal is like a processing step after the interpolations are evaluated, but before the final string value is compiled, allowing you more control over generating the string from the literal. Let's look at an example of creating reusable templates.
const Actor = {
name: "RajiniKanth",
store: "Landmark"
}
const ActorTemplate = templater`<article>
<h3>${'name'} is a Actor</h3>
<p>You can find his movies at ${'store'}.</p>
</article>`;
function templater(strings, ...keys) {
return function(data) {
let temp = strings.slice();
keys.forEach((key, i) => {
temp[i] = temp[i] + data[key];
});
return temp.join('');
}
};
const myTemplate = ActorTemplate(Actor);
console.log(myTemplate);
Raw Strings
Our tag functions receive a first argument we called strings, which is an array. But there’s an additional bit of data included: the raw unprocessed versions of all the strings. You can access those raw string values using the .raw property, like this:
function showraw(strings, ...values) {
console.log(strings);
console.log(strings.raw);
}
showraw`Hello\nWorld`;
As you can see, the raw version of the string preserves the escaped \n sequence, while the processed version of the string treats it like an unescaped real new-line. ECMAScript 6 comes with a built-in function that can be used as a string literal tag: String.raw(..). It simply passes through the raw versions of the strings:
console.log(`Hello\nWorld`);
/* "Hello
World" */
console.log(String.raw`Hello\nWorld`);
// "Hello\nWorld"
Backticks (`) are used to define template literals. Template literals are a new feature in ECMAScript 6 to make working with strings easier.
Features:
we can interpolate any kind of expression in the template literals.
They can be multi-line.
Note: we can easily use single quotes (') and double quotes (") inside the backticks (`).
Example:
var nameStr = `I'm "Alpha" Beta`;
To interpolate the variables or expression we can use the ${expression} notation for that.
var name = 'Alpha Beta';
var text = `My name is ${name}`;
console.log(text); // My name is Alpha Beta
Multi-line strings means that you no longer have to use \n for new lines anymore.
Example:
const name = 'Alpha';
console.log(`Hello ${name}!
How are you?`);
Output:
Hello Alpha!
How are you?
Apart from string interpolation, you can also call a function using back-tick.
var sayHello = function () {
console.log('Hello', arguments);
}
// To call this function using ``
sayHello`some args`; // Check console for the output
// Or
sayHello`
some args
`;
Check styled component. They use it heavily.
Backticks enclose template literals, previously known as template strings. Template literals are string literals that allow embedded expressions and string interpolation features.
Template literals have expressions embedded in placeholders, denoted by the dollar sign and curly brackets around an expression, i.e. ${expression}. The placeholder / expressions get passed to a function. The default function just concatenates the string.
To escape a backtick, put a backslash before it:
`\`` === '`'; // => true
Use backticks to more easily write multi-line string:
console.log(`string text line 1
string text line 2`);
or
console.log(`Fifteen is ${a + b} and
not ${2 * a + b}.`);
vs. vanilla JavaScript:
console.log('string text line 1\n' +
'string text line 2');
or
console.log('Fifteen is ' + (a + b) + ' and\nnot ' + (2 * a + b) + '.');
Escape sequences:
Unicode escapes started by \u, for example \u00A9
Unicode code point escapes indicated by \u{}, for example \u{2F804}
Hexadecimal escapes started by \x, for example \xA9
Octal literal escapes started by \ and (a) digit(s), for example \251 (allowed in non-strict string literals, but a syntax error inside an untagged template literal)
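The escape forms listed above can be compared directly. This is a small sketch with illustrative variable names; the octal check builds its source with the Function constructor so that a syntax error can be caught instead of stopping the whole script.

```javascript
const copyright = '©';

// Three ways to write U+00A9 inside a template literal.
const forms = {
  unicode: `\u00A9`,
  codePoint: `\u{A9}`,
  hex: `\xA9`,
};
const allEqual = Object.values(forms).every((s) => s === copyright);

// An octal escape inside an untagged template literal does not parse.
let octalRejected = false;
try {
  new Function('return `\\251`;');
} catch (err) {
  octalRejected = err instanceof SyntaxError;
}

console.log(allEqual);      // true
console.log(octalRejected); // true
```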
Summary:
Backticks in JavaScript are a feature introduced in ECMAScript 6 (ECMAScript 2015) for building dynamic strings easily. This ECMAScript 6 feature is also called a template string literal. It offers the following advantages when compared to normal strings:
In Template strings linebreaks are allowed and thus can be multiline. Normal string literals (declared with '' or "") are not allowed to have linebreaks.
We can easily interpolate variable values to the string with the ${myVariable} syntax.
Example:
const name = 'Willem';
const age = 26;
const story = `
My name is: ${name}
And I'm: ${age} years old
`;
console.log(story);
Browser compatibility:
Template string literal are natively supported by all major browser vendors (except Internet Explorer). So it is pretty safe to use in your production code. A more detailed list of the browser compatibilities can be found here.
The good part is we can make basic maths directly:
let nuts = 7
more.innerHTML = `
<h2>You collected ${nuts} nuts so far!</h2>
<hr>
Double it, get ${nuts + nuts} nuts!!
`
<div id="more"></div>
It became really useful in a factory function:
function nuts(it){
return `
You have ${it} nuts! <br>
Cosinus of your nuts: ${Math.cos(it)} <br>
Triple nuts: ${3 * it} <br>
Your nuts encoded in BASE64:<br> ${btoa(it)}
`
}
nut.oninput = (function(){
out.innerHTML = nuts(nut.value)
})
<h3>NUTS CALCULATOR</h3>
<input type="number" id="nut">
<div id="out"></div>
It's a pretty useful functionality, for example here is a Node.js code snippet to test the set up of a 3 second timing function.
const waitTime = 3000;
console.log(`setting a ${waitTime/1000} second delay`);
Explanation
Declare wait time as 3000
Using the backtick you can embed the result of the calculation of 'wait time' divided by 1000 in the same line with your chosen text.
Further calling a timer function using the 'waitTime' constant will result in a 3 second delay, as calculated in the console.log argument.
You can also turn an ordinary string into a template at run time, and have it read local variables.
var a= {e:10, gy:'sfdsad'}; //global object
console.log(`e is ${a.e} and gy is ${a.gy}`);
//e is 10 and gy is sfdsad
var b = "e is ${a.e} and gy is ${a.gy}" // template string
console.log( `${b}` );
//e is ${a.e} and gy is ${a.gy}
console.log( eval(`\`${b}\``) ); // convert template string to template
//e is 10 and gy is sfdsad
backtick( b ); // use function's variable
//e is 20 and gy is fghj
function backtick( temp ) {
var a= {e:20, gy:'fghj'}; // local object
console.log( eval(`\`${temp}\``) );
}
A lot of the comments answer most of your questions, but I mainly wanted to contribute to this question:
Is there a way in which the behavior of the backtick actually differs from that of a single quote?
A difference I've noticed for template strings is the inability to use one as an object property key. More information in this post; an interesting quote from the accepted answer:
Template strings are expressions, not literals.
But basically, if you ever wanted to use it as an object property you'd have to use it wrapped with square brackets.
// Throws error
const object = {`templateString`: true};
// Works
const object = {[`templateString`]: true};
The backtick character (`) in JavaScript is used to define template literals. A template literal is a special type of string that allows you to embed expressions, which are evaluated and included in the final string. They are denoted by being surrounded by the backtick (`) character instead of single quotes (') or double quotes (").
Here's an example of using a template literal to embed an expression in a string:
const name = "Akila";
const message = `Hello, ${name}!`;
console.log(message); // Output: Hello, Akila!
In the example above, the expression ${name} is evaluated and included in the final string, which is assigned to the message variable.
Template literals also provide several convenient features, such as multi-line strings and string interpolation. Multi-line strings allow you to include line breaks in your strings, which is especially useful for creating formatted text.
Here's an example of using a multi-line string with a template literal:
const message = `This is a
multi-line string.`;
console.log(message);
output
This is a
multi-line string.
In conclusion, the backtick character (`) in JavaScript is used to define template literals, which are a convenient way to include expressions and multi-line strings in your JavaScript code.

What are the actual uses of ES6 Raw String Access?

What are the actual uses of String.raw Raw String Access introduced in ECMAScript 6?
// String.raw(callSite, ...substitutions)
function quux (strings, ...values) {
strings[0] === "foo\n"
strings[1] === "bar"
strings.raw[0] === "foo\\n"
strings.raw[1] === "bar"
values[0] === 42
}
quux `foo\n${ 42 }bar`
String.raw `foo\n${ 42 }bar` === "foo\\n42bar"
I went through the below docs.
http://es6-features.org/#RawStringAccess
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
http://www.2ality.com/2015/01/es6-strings.html
https://msdn.microsoft.com/en-us/library/dn889830(v=vs.94).aspx
The only thing that I understand is that it is used to get the raw string form of template strings and for debugging template strings.
When can this be used in real-world development? They were calling this a tag function. What does that mean?
What concrete use cases am I missing?
The best, and very nearly only, use case for String.raw I can think of is if you're trying to use something like Steven Levithan's XRegExp library that accepts text with significant backslashes. Using String.raw lets you write something semantically clear rather than having to think in terms of doubling your backslashes, just like you can in a regular expression literal in JavaScript itself.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
So if we're using XRegExp on the site, since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, facepalm, and go back and escape that backslash, since it's being consumed by the string literal.
Easy enough in that simple regex, but in something complicated, remembering to double all those backslashes is a maintenance pain. (Just ask Java programmers trying to use Pattern.)
Enter String.raw:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`);
Example:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`); // L: Letter
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Now I just kick back and write what I mean. I don't even really have to worry about ${...} constructs used in template literals to do substitution, because the odds of my wanting to apply a quantifier {...} to the end-of-line assertion ($) are...low. So I can happily use substitutions and still not worry about backslashes. Lovely.
Having said that, though, if I were doing it a lot, I'd probably want to write a function and use a tagged template instead of String.raw itself. But it's surprisingly awkward to do correctly:
// My one-time tag function
function xrex(strings, ...values) {
let raw = strings.raw;
let max = Math.max(raw.length, values.length);
let result = "";
for (let i = 0; i < max; ++i) {
if (i < raw.length) {
result += raw[i];
}
if (i < values.length) {
result += values[i];
}
}
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Maybe the hassle is worth it if you're using it in lots of places, but for a couple of quick ones, String.raw is the simpler option.
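If XRegExp is not needed, the interleaving loop can be delegated to String.raw itself, which reads the .raw property of the strings array it is given. The following is a sketch with a made-up tag name (rx) that builds a native RegExp; it is not part of the answer above.

```javascript
// Tag function: join the raw strings and substitutions, then compile.
function rx(strings, ...values) {
  return new RegExp(String.raw(strings, ...values));
}

const width = 4;
const year = rx`^\d{${width}}$`;

console.log(year.source);       // ^\d{4}$
console.log(year.test('2024')); // true
console.log(year.test('24'));   // false
```

Substituted values are inserted as-is, so anything that should match literally would still need to be escaped before interpolation.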
First, a few things:
Template strings is old name for template literals.
A tag is a function.
String.raw is a method.
String.raw `foo\n${ 42 }bar` is a tagged template literal.
Template literals are basically fancy strings.
Template literals can interpolate.
Template literals can be multi-line without using \.
String.raw is required to escape the escape character \.
Try putting a string that contains a new-line character \n through a function that consumes newline character.
console.log("This\nis\nawesome"); // "This\nis\nawesome"
console.log(String.raw`This\nis\nawesome`); // "This\\nis\\nawesome"
If you are wondering, console.log is not one of them. But alert is. Try running these through http://learnharmony.org/ .
alert("This\nis\nawesome");
alert(String.raw`This\nis\nawesome`);
But wait, that's not the use of String.raw.
Possible uses of String.raw method:
To show string without interpretation of backslashed characters (\n, \t) etc.
To show code for the output. (As in example below)
To be used in regex without escaping \.
To print Windows directory/sub-directory locations without using \\ too much. (They use \ remember. Also, lol)
Here we can show output and code for it in single alert window:
alert("I printed This\nis\nawesome with " + String.raw`This\nis\nawesome`);
Though, it would have been great if It's main use could have been to get back the original string. Like:
var original = String.raw`This is awesome.`;
where original would have become: This\tis \tawesome.. This isn't the case sadly.
References:
http://exploringjs.com/es6/ch_template-literals.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
Template strings can be useful in many situations which I will explain below. Considering this, the String.raw prevents escapes from being interpreted. This can be useful in any template string in which you want to contain the escape character but do not want to escape it. A simple example could be the following:
var templateWithBackslash = String.raw `someRegExp displayed in template /^\//`
There are a few things inside that are nice to note with template strings.
They can contain unescaped line breaks without problems.
They can contain "${}". Inside these curly braces the javascript is interpreted instead.
(Note: running these will output the result to your console [in browser dev tools])
Example using line breaks:
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
{
asdf: "and some pretty printed json"
}
</pre>
</div>
`
console.log(myTemplate)
If you wanted to do the above with a normal string in Javascript it would look like the following:
var myTemplate = "\n\
<div class=\"myClass\">\n\
<pre>\n\
My formatted text\n\
with multiple lines\n\
{\n\
asdf: \"and some pretty printed json\"\n\
}\n\
</pre>\n\
</div>\n"
console.log(myTemplate)
You will notice the first probably looks much nicer (no need to escape line breaks).
For the second I will use the same template string but also insert some pretty printed JSON.
var jsonObj = {asdf: "and some pretty printed json", deeper: {someDeep: "Some Deep Var"}}
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
${JSON.stringify(jsonObj, null, 2)}
</pre>
</div>
`
console.log(myTemplate)
In NodeJS it is extremely handy when it comes to filepath handling:
var fs=require('fs');
var s = String.raw`C:\Users\<username>\AppData\Roaming\SomeApp\someObject.json`;
var username = "bob"
s=s.replace("<username>",username)
fs.readFile(s,function(err,result){
if (err) throw err;
console.log(JSON.parse(result))
})
It improves readability of filepaths on Windows. \ is also a fairly common separator, so I can definitely see why it would be useful in general. However it is pretty stupid how \ still escapes `... So ultimately:
String.raw`C:\Users\` //#==> C:\Users\`
console.log(String.raw`C:\Users\`) //#==> SyntaxError: Unexpected end of input.
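A path that must end in a backslash can still be written with String.raw by supplying the final backslash through a substitution or by concatenation. A minimal sketch, with illustrative variable names:

```javascript
// The trailing backslash comes from a substitution, not from the raw text.
const viaSubstitution = String.raw`C:\Users${'\\'}`;

// Alternatively, append it with ordinary concatenation.
const viaConcat = String.raw`C:\Users` + '\\';

console.log(viaSubstitution);               // C:\Users\
console.log(viaSubstitution === viaConcat); // true
console.log(viaSubstitution.length);        // 9
```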
In addition to its use as a tag, String.raw is also useful in implementing new tag functions as a tool to do the interleaving that most people do with a weird loop. For example, compare:
function foo(strs, ...xs) {
let result = strs[0];
for (let i = 0; i < xs.length; ++i) {
result += useFoo(xs[i]) + strs[i + 1];
}
return result;
}
with
function foo(strs, ...xs) {
return String.raw({raw: strs}, ...xs.map(useFoo));
}
The Use
(Requisite knowledge: tstring §.)
Instead of:
console.log(`\\a\\b\\c\\n\\z\\x12\\xa9\\u1234\\u00A9\\u{1234}\\u{00A9}`);
you can:
console.log(String.raw`\a\b\c\n\z\x12\xa9\u1234\u00A9\u{1234}\u{00A9}`);
"Escaping"
<\\u> is fine, yet <\u> needs "escaping", eg:
console.log(String.raw`abc${'\\u'}abc`);
.Dit <\\x>, <\x>,
<console.log(String.raw`abc${`\\x`}abc`)>;
.<\`>, <`>, <console.log(String.raw`abc${`\``}abc`)>;
.<\${>, <${&>, <console.log(String.raw`abc${`$\{`}abc`)>;
.<\\1> (till <\\7>), <\1>, <console.log(String.raw`abc${`\\1`}abc`)>;
.<\\>, endunit <\>, <console.log(String.raw`abc${`\\`}`)>.
Nb
There's also a new "latex" string. Cf §.
I've found it to be useful for testing
my RegExps. Say I have a RegExp which
should match end-of-line comments because
I want to remove them. BUT, it must not
match source-code for a regexp like /\// .
If your code contains /\// it is not the
start of an EOL comment but a RegExp, as
per the rules of JavaScript syntax.
I can test whether my RegExp in variable patEOLC
matches or doesn't /\// with:
String.raw`/\//` .match (patEOLC)
In other words it is a way to let my
code "see" code the way it exists in
source-code, not the way it exists
in memory after it has been read
into memory from source-code, with
all backslashes removed.
It is a way to "escape escaping" but
without having to do it separately
for every backslash in a string, but
for all of them at the same time.
It is a way to say that in a given
(back-quoted) string backslash
shall behave just like any other
character, it has no special
meaning or interpretation.

Regex converting & to &amp;

I am developing a small character encoder generator where the user input their text and on the click of a button, it outputs the encoded version.
I've defined an object of the characters that need to be encoded like so:
map = {
'©' : '&copy;',
'&' : '&amp;'
},
And here is the loop that gets the values from the map and replaces them:
Object.keys(map).forEach(function (ico) {
var icoE = ico.replace(/([.?*+^$[\]\\(){}|-])/g, "\\$1");
raw = raw.replace( new RegExp(icoE, 'g'), map[ico] );
});
I am them simply outputting the result to a textarea. This all works fine, however the problem I'm facing is this.
© is replaced with &copy; however the & symbol at the beginning of this is then converted to &amp; so it ends up being &amp;copy;.
I see why this is happening however I'm not sure how to go about ensuring that & is not replaced within character encoded strings.
Here is a JSFiddle for a live preview of what I mean:
http://jsfiddle.net/4m3nw/1/
Any help would be much appreciated
Prelude: Apart from regex, an idea worth considering is something like this JS function that already handles html entities. Now, on to the regex question.
HTML Special Characters, Negative Lookahead
In HTML, special characters can look not only like &copy; but also like &#8212;, and they can have upper-case characters.
To replace ampersands that are not immediately followed by a hash or word characters and a semicolon, you can use something like this:
&(?!(?:#[0-9]+|[a-z]+);)
See the demo.
Make sure to use the i flag to activate case-insensitive mode
& matches the literal ampersand
The negative lookahead (?!(?:#[0-9]+|[a-z]+);) asserts that it is not followed by...
(?:#[0-9]+|[a-z]+) a hash and digits, | OR letters...
then a semicolon.
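Applied in code, the lookahead pattern above might be used as follows. This is a sketch; the constant and function names are illustrative. Anything shaped like an entity, even an unknown name such as &foo;, is left untouched.

```javascript
// Ampersand NOT followed by "#digits;" or "letters;" (case-insensitive).
const BARE_AMPERSAND = /&(?!(?:#[0-9]+|[a-z]+);)/gi;

function encodeBareAmpersands(text) {
  return text.replace(BARE_AMPERSAND, '&amp;');
}

console.log(encodeBareAmpersands('Tom & Jerry &copy; &#169; &AMP; R&D'));
// Tom &amp; Jerry &copy; &#169; &AMP; R&amp;D
```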
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
The problem is that since you process the same string you replace the & in &copy;. If you re-order your map then that seemingly solves the problem. However according to the ECMAScript specifications, this is not a given, so you would be relying on implementation details of the ECMAScript engine used.
What you can do to make sure it will always work is to swap the keys so that & is always processed first:
map = {
'©' : '&copy;',
'&' : '&amp;'
};
var keys = Object.keys(map);
keys[keys.indexOf('&')] = keys[0];
keys[0] = '&';
keys.forEach(function (ico) {
var icoE = ico.replace(/([.?*+^$[\]\\(){}|-])/g, "\\$1");
raw = raw.replace( new RegExp(icoE, 'g'), map[ico] );
});
Obviously you need to add checks for the &'s existence if it isn't always there.
jsFiddle Demo.
Probably the simplest code change is to reorder your map by putting the ampersand on top.
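Another way to avoid depending on key order is to replace everything in a single pass, so that text produced by one replacement is never scanned again. The sketch below assumes the map from the question with its entity values written out; encodeEntities and escapeRegExp are illustrative names.

```javascript
const entityMap = { '©': '&copy;', '&': '&amp;' };

// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build one alternation from all keys and look up each match in the map.
function encodeEntities(text, map) {
  const keys = Object.keys(map).sort((a, b) => b.length - a.length);
  const pattern = new RegExp(keys.map(escapeRegExp).join('|'), 'g');
  return text.replace(pattern, (match) => map[match]);
}

console.log(encodeEntities('© 2014 Fish & Chips', entityMap));
// &copy; 2014 Fish &amp; Chips
```

Sorting the keys by length is only a precaution for maps where one key is a prefix of another.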
