What are the actual uses of ES6 Raw String Access? - javascript

What are the actual uses of String.raw Raw String Access introduced in ECMAScript 6?
// String.raw(callSite, ...substitutions)
function quux (strings, ...values) {
strings[0] === "foo\n"
strings[1] === "bar"
strings.raw[0] === "foo\\n"
strings.raw[1] === "bar"
values[0] === 42
}
quux `foo\n${ 42 }bar`
String.raw `foo\n${ 42 }bar` === "foo\\n42bar"
I went through the following docs:
http://es6-features.org/#RawStringAccess
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
http://www.2ality.com/2015/01/es6-strings.html
https://msdn.microsoft.com/en-us/library/dn889830(v=vs.94).aspx
The only thing I understand is that it is used to get the raw string form of template strings, and for debugging template strings.
When can this be used in real-world development? The docs call this a tag function. What does that mean?
What concrete use cases am I missing?
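A tag is just an ordinary function written directly in front of a template literal. Instead of building the string itself, JavaScript calls that function with the literal's pieces. A minimal sketch (the tag name inspect is hypothetical) showing what such a function receives:

```javascript
// Hypothetical tag that simply reports what JavaScript passes to it.
function inspect(strings, ...values) {
  return {
    cooked: [...strings],  // literal parts with escape sequences processed
    raw: [...strings.raw], // literal parts exactly as written in the source
    values                 // evaluated ${...} substitutions
  };
}

const info = inspect`foo\n${6 * 7}bar`;
console.log(info.cooked[0] === "foo\n"); // true: a real newline character
console.log(info.raw[0] === "foo\\n");   // true: backslash followed by "n"
console.log(info.values[0]);             // 42

// String.raw is a built-in tag that joins the raw parts with the values.
console.log(String.raw`foo\n${6 * 7}bar` === "foo\\n42bar"); // true
```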

The best, and very nearly only, use case for String.raw I can think of is if you're trying to use something like Steven Levithan's XRegExp library that accepts text with significant backslashes. Using String.raw lets you write something semantically clear rather than having to think in terms of doubling your backslashes, just like you can in a regular expression literal in JavaScript itself.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
So if we're using XRegExp on the site, since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, facepalm, and go back and escape that backslash, since it's being consumed by the string literal.
Easy enough in that simple regex, but in something complicated, remembering to double all those backslashes is a maintenance pain. (Just ask Java programmers trying to use Pattern.)
Enter String.raw:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`);
Example:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`); // L: Letter
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
(This snippet requires the XRegExp library: https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js)
Now I just kick back and write what I mean. I don't even really have to worry about ${...} constructs used in template literals to do substitution, because the odds of my wanting to apply a quantifier {...} to the end-of-line assertion ($) are...low. So I can happily use substitutions and still not worry about backslashes. Lovely.
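For comparison, engines that implement ES2018 Unicode property escapes can do this without XRegExp: \p{L} works natively in a RegExp created with the u flag. This swaps in the built-in feature rather than XRegExp's own \pL syntax, and assumes a modern engine (a current browser or Node 10+). String.raw still spares you the doubled backslash when the pattern comes from a string:

```javascript
// \p{L} (any Unicode letter) requires the "u" flag and an ES2018+ engine.
const isSingleUnicodeWord = new RegExp(String.raw`^\p{L}+$`, "u");

console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語"));   // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false (a space is not a letter)

// Without String.raw, the backslash has to be doubled in an ordinary string.
const sameWithoutRaw = new RegExp("^\\p{L}+$", "u");
console.log(sameWithoutRaw.source === isSingleUnicodeWord.source); // true
```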
Having said that, though, if I were doing it a lot, I'd probably want to write a function and use a tagged template instead of String.raw itself. But it's surprisingly awkward to do correctly:
// My one-time tag function
function xrex(strings, ...values) {
let raw = strings.raw;
let max = Math.max(raw.length, values.length);
let result = "";
for (let i = 0; i < max; ++i) {
if (i < raw.length) {
result += raw[i];
}
if (i < values.length) {
result += values[i];
}
}
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
(This snippet requires the XRegExp library: https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js)
Maybe the hassle is worth it if you're using it in lots of places, but for a couple of quick ones, String.raw is the simpler option.
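The manual loop in xrex isn't strictly needed: String.raw can be called as an ordinary function, and because the strings array passed to a tag carries a .raw property, String.raw(strings, ...values) does the interleaving for you. A sketch of a hypothetical rex tag that builds a native RegExp (u flag, ES2018+ engine assumed) instead of an XRegExp:

```javascript
// Hypothetical tag: builds a RegExp from the raw template text.
// String.raw(strings, ...values) interleaves strings.raw with the values.
function rex(strings, ...values) {
  const source = String.raw(strings, ...values);
  return new RegExp(source, "u");
}

const category = "L"; // L: Letter
const maybeEol = "$";
const isSingleUnicodeWord = rex`^\p{${category}}+${maybeEol}`;

console.log(isSingleUnicodeWord.source);          // ^\p{L}+$
console.log(isSingleUnicodeWord.test("日本語"));   // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
```

As in xrex, the substituted values are inserted verbatim, so they are treated as regex syntax rather than as literal text.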

First, a few things:
Template strings is the old name for template literals.
A tag is a function.
String.raw is a method.
String.raw `foo\n${ 42 }bar` is a tagged template literal.
Template literals are basically fancy strings.
Template literals can interpolate.
Template literals can be multi-line without using \.
String.raw keeps backslash sequences exactly as written, so you don't have to escape the escape character \ yourself.
Try passing a string that contains the newline escape \n to a function that renders newline characters:
console.log("This\nis\nawesome"); // prints This, is and awesome on three separate lines
console.log(String.raw`This\nis\nawesome`); // prints This\nis\nawesome on one line
Both console.log and alert render the newlines. Try running these through http://learnharmony.org/ (or any ES6-capable browser):
alert("This\nis\nawesome");
alert(String.raw`This\nis\nawesome`);
But wait, that's not the main use of String.raw.
Possible uses of String.raw method:
To show a string without interpreting backslash escapes (\n, \t, etc.).
To show the code that produces some output (as in the example below).
To use in regexes without escaping \.
To print Windows directory/subdirectory paths without writing \\ too much (Windows uses \ as its separator, remember).
Here we show the output and the code for it in a single alert window:
alert("I printed This\nis\nawesome with " + String.raw`This\nis\nawesome`);
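Though, it would have been great if it could also be used to get back the original, escaped form of a string. Like:
var original = String.raw`This is awesome.`;
where, if the template contained literal tab characters instead of spaces, original would have become This\tis \tawesome. Sadly, this isn't the case: String.raw only preserves backslash sequences that were written in the source, and a literal tab stays a tab.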
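If what you want is the escaped form of a string value, String.raw can't help, because it only ever sees source text. One common workaround (a different tool, not something String.raw does) is JSON.stringify, which escapes control characters such as tabs and newlines. Keep in mind that it also escapes double quotes and backslashes, and wraps the result in quotes:

```javascript
const original = "This\tis \tawesome."; // contains two real tab characters

// JSON.stringify writes each tab as backslash + "t" and adds surrounding quotes.
const escaped = JSON.stringify(original);
console.log(escaped); // "This\tis \tawesome."

// Strip the surrounding quotes if you only want the body.
const body = escaped.slice(1, -1);
console.log(body === String.raw`This\tis \tawesome.`); // true
```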
References:
http://exploringjs.com/es6/ch_template-literals.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw

Template strings can be useful in many situations, which I will explain below. String.raw specifically prevents escape sequences from being interpreted. This is useful in any template string in which you want to include a backslash without having to escape it. A simple example:
var templateWithBackslash = String.raw `someRegExp displayed in template /^\//`
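To see the difference concretely, here is the same text as an ordinary template literal and as a String.raw template (a small sketch; the variable names are illustrative):

```javascript
// In an ordinary template literal, "\/" is an unnecessary escape and becomes "/".
const cooked = `someRegExp displayed in template /^\//`;
// String.raw keeps the backslash exactly as written.
const raw = String.raw`someRegExp displayed in template /^\//`;

console.log(cooked); // someRegExp displayed in template /^//
console.log(raw);    // someRegExp displayed in template /^\//
console.log(raw.length - cooked.length); // 1 (the preserved backslash)
```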
A few things are worth noting about template strings:
They can contain unescaped line breaks without problems.
They can contain ${} placeholders: the JavaScript expression inside the braces is evaluated and its result is inserted.
(Note: running these will output the result to your console [in browser dev tools])
Example using line breaks:
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
{
asdf: "and some pretty printed json"
}
</pre>
</div>
`
console.log(myTemplate)
If you wanted to do the above with a normal string in JavaScript, you would have to escape every line break and every double quote:
var myTemplate = "\n\
<div class=\"myClass\">\n\
<pre>\n\
My formatted text\n\
with multiple lines\n\
{\n\
asdf: \"and some pretty printed json\"\n\
}\n\
</pre>\n\
</div>\n";
console.log(myTemplate)
You will notice the first version is much easier to read (no escaped line breaks or quotes).
For the second example, I use the same template string but insert some pretty-printed JSON with ${}.
var jsonObj = {asdf: "and some pretty printed json", deeper: {someDeep: "Some Deep Var"}}
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
${JSON.stringify(jsonObj, null, 2)}
</pre>
</div>
`
console.log(myTemplate)

In NodeJS it is extremely handy when it comes to filepath handling:
var fs=require('fs');
var s = String.raw`C:\Users\<username>\AppData\Roaming\SomeApp\someObject.json`;
var username = "bob"
s=s.replace("<username>",username)
fs.readFile(s,function(err,result){
if (err) throw err;
console.log(JSON.parse(result))
})
It improves the readability of file paths on Windows. \ is also a fairly common separator elsewhere, so I can see why it would be useful in general. However, it is annoying that \ still escapes the backtick, so a raw string cannot end with a single backslash:
String.raw`C:\Users\`` //#==> C:\Users\` (the \` is an escaped backtick, and String.raw keeps the backslash)
console.log(String.raw`C:\Users\`) //#==> SyntaxError: Unexpected end of input.
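Two related pitfalls are worth spelling out. A backslash directly before ${ escapes the dollar sign, so the placeholder is not substituted at all (which may be why the example above uses a <username> marker and replace()). For building Windows paths in Node, path.win32.join sidesteps both that and the trailing-backslash problem. A sketch, with the path segments taken from the example above:

```javascript
const path = require("path");

const username = "bob";

// Pitfall: "\$" escapes the dollar sign, so ${username} is NOT substituted.
const notInterpolated = String.raw`C:\Users\${username}\AppData`;
console.log(notInterpolated); // C:\Users\${username}\AppData

// Trailing backslash: inject it through a substitution instead of writing "\`".
const usersDir = String.raw`C:\Users${"\\"}`;
console.log(usersDir); // C:\Users\

// path.win32.join always uses backslashes, regardless of the host OS.
const file = path.win32.join(String.raw`C:\Users`, username,
  "AppData", "Roaming", "SomeApp", "someObject.json");
console.log(file); // C:\Users\bob\AppData\Roaming\SomeApp\someObject.json
```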

In addition to its use as a tag, String.raw is also useful when implementing new tag functions, as a tool for the interleaving that most people do with a hand-written loop. For example, compare (useFoo stands for whatever transformation your tag applies to each value):
function foo(strs, ...xs) {
let result = strs[0];
for (let i = 0; i < xs.length; ++i) {
result += useFoo(xs[i]) + strs[i + 1];
}
return result;
}
with
function foo(strs, ...xs) {
return String.raw({raw: strs}, ...xs.map(useFoo));
}
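To make that concrete, here is a sketch with a hypothetical escapeHtml standing in for useFoo: an html tag that escapes every substituted value but leaves the literal markup alone. Passing {raw: strs} means the cooked strings are used, so escapes such as \n in the template still behave normally:

```javascript
// Hypothetical helper standing in for useFoo: escapes HTML special characters.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// String.raw interleaves the cooked strings (strs) with the escaped values.
function html(strs, ...xs) {
  return String.raw({ raw: strs }, ...xs.map(escapeHtml));
}

const userInput = "<b>Tom & Jerry</b>";
console.log(html`<p>${userInput}</p>`); // <p>&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;</p>
```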

The Use
(Requisite knowledge: tstring §.)
Instead of:
console.log(`\\a\\b\\c\\n\\z\\x12\\xa9\\u1234\\u00A9\\u{1234}\\u{00A9}`);
...you can:
console.log(String.raw`\a\b\c\n\z\x12\xa9\u1234\u00A9\u{1234}\u{00A9}`);
"Escaping"
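Some sequences still can't be written directly. In each case below, the first form is what you can write inside String.raw, the second is what you actually want, and the third is a workaround that injects it through a substitution:
<\\u> is fine, yet <\u> needs "escaping", e.g.: console.log(String.raw`abc${'\\u'}abc`);
Ditto <\\x>, <\x>: console.log(String.raw`abc${`\\x`}abc`);
<\`>, <`>: console.log(String.raw`abc${`\``}abc`);
<\${>, <${>: console.log(String.raw`abc${`$\{`}abc`);
<\\1> (through <\\7>), <\1>: console.log(String.raw`abc${`\\1`}abc`);
<\\>, ending with <\>: console.log(String.raw`abc${`\\`}`).
(Since ES2018, tagged templates accept malformed escapes such as \u, \x and \1 and keep them in the raw text, so in current engines String.raw`\u` simply gives \u. The backtick, ${ and trailing-backslash cases still need the workaround.)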
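A quick way to check these cases yourself, assuming a modern (ES2018+) engine; each comment shows what the line prints:

```javascript
// Writing the escape directly keeps the backslash in the raw text...
console.log(String.raw`a\`b`);  // a\`b
console.log(String.raw`a\${b`); // a\${b
// ...so a bare backtick, "${" or a trailing backslash must be substituted in.
console.log(String.raw`a${"`"}b`);   // a`b
console.log(String.raw`a${"${"}b`);  // a${b
console.log(String.raw`abc${"\\"}`); // abc\
// Malformed escapes are allowed in tagged templates and stay raw (ES2018+).
console.log(String.raw`\u \x \1`);   // \u \x \1
```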
Nb
There's also a new "latex" string. Cf §.

I've found it to be useful for testing my RegExps. Say I have a RegExp which should match end-of-line comments because I want to remove them. BUT, it must not match source code for a regexp like /\//. If your code contains /\// it is not the start of an EOL comment but a RegExp literal, as per the rules of JavaScript syntax.
I can test whether my RegExp in variable patEOLC matches /\// or not with:
String.raw`/\//`.match(patEOLC)
In other words, it is a way to let my code "see" code the way it exists in the source, not the way it exists in memory after being read from the source, with its escaping backslashes removed.
It is a way to "escape escaping" for every backslash in a string at once, instead of doing it separately for each one. It says that in a given (back-quoted) string, backslash behaves just like any other character, with no special meaning or interpretation (apart from still protecting ` and ${).
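A minimal sketch of that kind of test. Both patterns are hypothetical stand-ins for patEOLC: a naive one that treats any // as a comment start, and a slightly stricter one that ignores // when it follows a backslash or another slash. Neither is a real JavaScript tokenizer:

```javascript
// Naive: anything from "//" to the end of the line is a comment.
const naiveEOLC = /\/\/.*$/m;
// Stricter (still a toy): "//" must not follow a backslash or another slash.
const patEOLC = /(^|[^\\/])\/\/.*$/m;

// String.raw gives the source text as written: slash, backslash, slash, slash.
const regexSource = String.raw`/\//`;
// An ordinary string literal drops the backslash and yields "///".
console.log("/\//"); // ///

console.log(naiveEOLC.test(regexSource)); // true (false positive)
console.log(patEOLC.test(regexSource));   // false
console.log(patEOLC.test("x = 1; // a real comment")); // true
```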

Related

Why the .replace() and toUppercase() did not work in the second function? [duplicate]

I want to replace smart quotes like ‘, ’, “ and ” with regular quotes. I also want to replace ©, ® and ™. I used the following code, but it doesn't help.
Kindly help me to resolve this issue.
str.replace(/[“”]/g, '"');
str.replace(/[‘’]/g, "'");
Use:
str = str.replace(/[“”]/g, '"');
str = str.replace(/[‘’]/g, "'");
or to do it in one statement:
str = str.replace(/[“”]/g, '"').replace(/[‘’]/g,"'");
In JavaScript (as in many other languages) strings are immutable - string "replacement" methods actually just return the new string instead of modifying the string in place.
The MDN JavaScript reference entry for replace states:
Returns a new string with some or all matches of a pattern replaced by a replacement.
…
This method does not change the String object it is called on. It simply returns a new string.
replace returns the resulting string:
str = str.replace(/["']/, '');
The OP doesn't say why it isn't working, but there seem to be problems related to the encoding of the file. If I have an ANSI-encoded file and I do:
var s = "“This is a test” ‘Another test’";
s = s.replace(/[“”]/g, '"').replace(/[‘’]/g,"'");
document.writeln(s);
I get:
"This is a test" "Another test"
I converted the encoding to UTF-8, fixed the smart quotes (which broke when I changed encoding), then converted back to ANSI and the problem went away.
Note that when I copied and pasted the double and single smart quotes off this page into my test document (ANSI encoded) and ran this code:
var s = "“This is a test” ‘Another test’";
for (var i = 0; i < s.length; i++) {
document.writeln(s.charAt(i) + '=' + s.charCodeAt(i));
}
I discovered that all the smart quotes showed up as ? = 63.
So, to the OP: determine where the smart quotes originate and make sure they are the character codes you expect. If they are not, consider changing the encoding of the source so they arrive as “ = 8220, ” = 8221, ‘ = 8216 and ’ = 8217. Use my loop to examine the source; if the smart quotes show up with any charCodeAt() values other than those listed, replace() will not work as written.
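One way to sidestep the file-encoding problem is to write the special characters as \u escapes, so the regex contains the intended code points no matter how the source file is saved. A sketch; normalizeQuotes is a hypothetical name, and the replacements chosen for ©, ® and ™ are an assumption, so pick whatever your output needs:

```javascript
// Special characters written as \u escapes so the file encoding can't corrupt them.
const replacements = {
  "\u201C": '"',   // “ left double quotation mark (8220)
  "\u201D": '"',   // ” right double quotation mark (8221)
  "\u2018": "'",   // ‘ left single quotation mark (8216)
  "\u2019": "'",   // ’ right single quotation mark (8217)
  "\u00A9": "(c)", // © (assumed replacement)
  "\u00AE": "(R)", // ® (assumed replacement)
  "\u2122": "(TM)" // ™ (assumed replacement)
};

function normalizeQuotes(str) {
  // replace returns a new string; the caller must use the return value.
  return str.replace(/[\u201C\u201D\u2018\u2019\u00A9\u00AE\u2122]/g,
    (ch) => replacements[ch]);
}

const s = "\u201CThis is a test\u201D \u2018Another test\u2019 \u00A9 ACME\u2122";
console.log(normalizeQuotes(s)); // "This is a test" 'Another test' (c) ACME(TM)
```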
To replace all regular quotes with smart quotes, I am using a similar function. You must specify the character code, because the default settings of different computers and browsers may identify the plain quote characters differently.
Using the character code calls up the exact character, which eliminates room for error across different browsers and operating systems. This is also helpful for bilingual use (accents, etc.).
To replace smart quotes with SINGLE QUOTES:
function unSmartQuotify(n){
var name = n;
var apos = String.fromCharCode(39); // plain apostrophe '
var leftQuote = String.fromCharCode(8216); // ‘
var rightQuote = String.fromCharCode(8217); // ’
// Loop on name (the string being modified) until no smart quote remains.
while (name.indexOf(leftQuote) > -1)
name = name.replace(leftQuote, apos);
while (name.indexOf(rightQuote) > -1)
name = name.replace(rightQuote, apos);
return name;
}
To find the other character codes you may need, check here.

RegExp replacement with variable and anchor

I've done some research, like 'How do you use a variable in a regular expression?' but no luck.
Here is the given input
const input = 'abc $apple 123 apple $banana'
Each variable has its own value.
Currently I'm able to query all variables from the given string using
const variables = input.match(/([$]\w+)/g)
Replacing by looping over the variables array with the following code does not work:
const r = `/(\$${variable})/`;
const target = new RegExp(r, 'gi');
input.replace(target, value);
However, without using the variable, it works:
const target = new RegExp(/(\$apple)/, 'gi');
input.replace(target, value);
I also changed the variable marker from $ to % or #, and it works with the following code:
// const target = new RegExp(`%{variable}`, 'gi');
// const target = new RegExp(`#{variable}`, 'gi');
input.replace(target, value);
How to match the $ symbol with variable in this case?
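If I understand correctly, use (\\$${variable}).
As the demo below shows, with only one backslash the RegExp becomes /($apple)/gi: in a template literal, \$ is an escape that produces a plain $, so the backslash never reaches the RegExp, and an unescaped $ in a regex means end of input. (The slashes in `/(\$${variable})/` would also become literal characters of the pattern when passed to new RegExp, so they are dropped too.)
So the solution is to add another backslash.
Demo: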
const input = 'abc $apple 123 apple $banana'
let variable = 'apple'
let value = '#test#'
const r = `(\\$${variable})`;
const target = new RegExp(r, 'gi');
console.log('template literals', r, `(\$${variable})`)
console.log('regex:', target, new RegExp(`(\$${variable})`, 'gi'))
console.log(input.replace(target, value))
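Since this page is about String.raw, note that it removes the need for the doubled backslash here: inside String.raw, \$ stays as a backslash followed by a dollar sign, and ${...} is still substituted. If the variable name might contain regex metacharacters it should be escaped as well; escapeRegExp below is a hypothetical helper written for this sketch, not a built-in:

```javascript
// Hypothetical helper: escapes characters that are special in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const input = 'abc $apple 123 apple $banana';
const variable = 'apple';
const value = '#test#';

// In String.raw, "\$" keeps its backslash, and ${...} still substitutes.
const source = String.raw`\$${escapeRegExp(variable)}`;
console.log(source); // \$apple

const target = new RegExp(source, 'gi');
console.log(input.replace(target, value)); // abc #test# 123 apple $banana
```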
I have been using regular expressions just about every day for almost a year now. I'll post my thoughts and just say:
Regular expressions are most useful for finding parts of a text or data file.
If there is a text file that contains the word "apple" or some derivative therein, regular expressions can be a great tool. I use them every day for parsing HTML content, as I write foreign-news translations (based in HTML).
I believe the code that was posted was in JavaScript (because I saw the replace(//,"gi") function, which I know is used in that scripting language). I use Java's java.util.regex package myself, and the syntax is just slightly different.
If all you want to do is put a "place-holder" inside of a String - this code could work, I guess - but again, understanding why "regular-expressions" are necessary seems like the real question. In this example, I have used the ampersand ('&') as the placeholder - since I know for a fact it is not one of the "reserved key words" used by (most, but not necessarily all of) the Regular Expression Compiler and Processor.
var s1 = "&VAR1"; // Unused variable - leaving it here for show!
var myString = "An example text-string with &VAR1, a particular kind of fruit.";
myString = myString.replace(/&VAR1/gi, "apple"); // replace returns a new string
If you want a great way to practice with Regular-Expressions, go to this web-site and play around with them:
https://regexr.com/
Here are the rules for "reserved key symbols" of RegEx Patterns (Copied from that Site):
The following characters have special meaning, and should be preceded
by a \ (backslash) to represent a literal character:
+*?^$.[]{}()|/
Within a character set, only \, -, and ] need to be escaped.
Also, and perhaps most importantly, regular expressions are "compiled" in Java. I'm not exactly sure how they work in JavaScript, but there is no such concept as a "variable" in the compiled part of a regular expression, only in the text and data it analyzes. This means that if you want to change what you are searching for in a particular piece of text or data in Java, you must recompile your expression using:
Pattern p = Pattern.compile(regExString, flags);
There is not an easy way to "dynamically change" particular values of text in the expression. The amount of complexity it would add would be astronomical, and the value, minimal. Instead, just compile another expression in your code, and search again. Another option is to better understand things like .* .+ .*? .+? and even (.*) (.+) (.*?) (.+?) so that things that change, do change, and things that don't change, won't!
For instance if you used this pattern to match different-variables:
input = input.replace(/&VAR.*?VAREND/gi, "apple");
All of your variables could be identified by the reused pattern "&VAR-stuff-VAREND" (the lazy .*? keeps a match from running past the first VAREND), but this is just one of many ways to skin the cat.
Using a replace callback function you can avoid building a separate regex for each variable and replace them all at once:
const input = 'abc $apple 123 apple $banana'
const vars = {
apple: 'Apfel',
banana: 'Banane'
}
let result = input.replace(/\$(\w+)/g, (_, v) => vars[v])
console.log(result)
This won't work if apple and banana were local variables, though; but using locals for this is a bad idea anyway.
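A small variation on the callback approach, assuming you'd rather leave unknown placeholders untouched than turn them into the string "undefined" (an assumption about the desired behaviour, not part of the answer above):

```javascript
const input = 'abc $apple 123 apple $banana $cherry';
const vars = { apple: 'Apfel', banana: 'Banane' };

const result = input.replace(/\$(\w+)/g, (match, name) =>
  // hasOwnProperty avoids picking up inherited keys such as "toString".
  Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match);

console.log(result); // abc Apfel 123 apple Banane $cherry
```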

Line endings (also known as Newlines) in JS strings

It is well known that Unix-like systems use LF characters for newlines, whereas Windows uses CR+LF.
However, when I test this code from a local HTML file on my Windows PC, it seems that JS treats all newlines as LF. Is that a correct assumption?
var string = `
foo
bar
`;
// There should be only one blank line between foo and bar.
// \n - Works
// string = string.replace(/^(\s*\n){2,}/gm, '\n');
// \r\n - Doesn't work
string = string.replace(/^(\s*\r\n){2,}/gm, '\r\n');
alert(string);
// That is, it seems that JS treats all newlines as
// `LF` instead of `CR+LF`?
I think I found an explanation.
You are using an ES6 Template Literal to construct your multi-line string.
According to the ECMAScript specs a
.. template literal component is interpreted as a sequence of Unicode
code points. The Template Value (TV) of a literal component is
described in terms of code unit values (SV, 11.8.4) contributed by the
various parts of the template literal component. As part of this
process, some Unicode code points within the template component are
interpreted as having a mathematical value (MV, 11.8.3). In
determining a TV, escape sequences are replaced by the UTF-16 code
unit(s) of the Unicode code point represented by the escape sequence.
The Template Raw Value (TRV) is similar to a Template Value with the
difference that in TRVs escape sequences are interpreted literally.
And below that, it is defined that:
The TRV of LineTerminatorSequence::<LF> is the code unit 0x000A (LINE
FEED).
The TRV of LineTerminatorSequence::<CR> is the code unit 0x000A (LINE FEED).
My interpretation is that when you use a template literal, you always get a line feed, regardless of the OS-specific newline convention.
Finally, in JavaScript's regular expressions, \n matches a line feed (U+000A), which explains the observed behavior.
However, if you write a string literal such as '\r\n', or read text containing OS-specific newlines from a file, a stream, etc., you have to deal with CR+LF yourself.
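For text that may contain CR+LF or a lone CR (file contents, for example), a common approach is to normalize line endings to LF first and then use \n-based regexes. A sketch; normalizeNewlines and collapseBlankLines are hypothetical names, and blank lines that contain spaces or tabs are not handled:

```javascript
// Convert CRLF and lone CR to LF so later regexes only need to handle \n.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Collapse runs of empty lines so at most one blank line remains.
function collapseBlankLines(text) {
  return normalizeNewlines(text).replace(/\n{3,}/g, "\n\n");
}

const windowsText = "foo\r\n\r\n\r\n\r\nbar\r\n";
console.log(JSON.stringify(collapseBlankLines(windowsText))); // "foo\n\nbar\n"
```

If the output must use CR+LF again, convert back afterwards with .replace(/\n/g, "\r\n").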
Here are some tests that demonstrate the behavior of template literals:
`a
b`.split('').forEach(function (char) {
console.log(char.charCodeAt(0));
});
(String.raw`a
b`).split('').forEach(function (char) {
console.log(char.charCodeAt(0));
});
'a\r\nb'.split('').forEach(function (char) {
console.log(char.charCodeAt(0));
});
"a\
b".split('').forEach(function (char) {
console.log(char.charCodeAt(0));
});
Interpreting the results:
char(97) = a, char(98) = b
char(10) = \n, char(13) = \r
You could use the regular expression: /^\s*[\r\n]/gm
Code example:
let string = `
foo
bar
`;
string = string.replace(/^\s*[\r\n]/gm, '\r\n');
console.log(string);

Trimming whitespace without affecting strings

So, I recently found this example on trimming whitespace, but I've found that it also affects strings in code. For instance, say I'm doing a lesson on string comparison, and to demonstrate that two strings differing only in whitespace, such as "Hello World!" and "Hello  World!", are different, I need the code compression to not have any effect on those two strings.
I'm using the whitespace compression so that people with different formatting styles won't be punished for using something that I don't use. For instance, I like to format my functions like this:
function foo(){
return 0;
};
While others may format it like this:
function foo()
{
return 0;
};
So I use whitespace compression around punctuation to make sure it always comes out the same, but I don't want it to affect anything within a string. Is there a way to add exceptions in JavaScript's replace() function?
UPDATE:
check this jsfiddle
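var str = 'dfgdfg fdgfd fd gfd g print("Hello World!"); sadfds dsfgsgdf';
// Group 1: a quoted string (lazy, so it stops at the next quote).
// Group 2: a run of whitespace.
var regex = /(".*?")|(\s+)/g;
var newStr = str.replace(regex, function (match, quoted) {
// Keep quoted strings unchanged; collapse whitespace to a single space.
return quoted ? quoted : ' ';
});
console.log(newStr);
console.log(str);
This code processes everything except the quoted strings.
To experiment with the regex and see how it works:
https://regex101.com/r/tG5qH2/1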
I made a jsfiddle here: https://jsfiddle.net/cuywha8t/2/
var stringSplitRegExp = /(".+?"|'.+?')/g;
var whitespaceRegExp = /\s+\{/g;
var whitespaceReplacement = "{"
var exampleCode = `var str = "test test test" + 'asdasd "sd"';\n`+
`var test2 = function()\n{\nconsole.log("This is a string with 'single quotes'")\n}\n`+
`console.log('this is a string with "double quotes"')`;
console.log(exampleCode)
var separatedStrings =(exampleCode.split(stringSplitRegExp))
for(var i = 0; i < separatedStrings.length; i++){
if (i%2 === 1){
continue;
}
var oldString = separatedStrings[i];
separatedStrings[i] = oldString.replace(whitespaceRegExp, whitespaceReplacement)
}
console.log(separatedStrings.join(""))
I believe this is what you are looking for. It handles cases where a string contains double quotes, etc., without modifying the string. This example only reformats the curly braces, as you mentioned in your post.
When the split regex contains a capturing group, split includes the matched separators (here, the string literals) in the resulting array. Because each string literal always sits between two non-string chunks, the literals land at odd indices, so you can loop over the array and modify only the even-indexed elements.
If you want to do general whitespace replacement you can of course modify the regexp or do multiple passes, etc.
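The same split trick can be wrapped in a reusable helper. A sketch; replaceOutsideStrings is a hypothetical name, and its string regex is as naive as the one above (it ignores escaped quotes, template literals and comments):

```javascript
// Applies pattern -> replacement only to code outside "..." and '...' literals.
function replaceOutsideStrings(code, pattern, replacement) {
  // The capturing group makes split() keep the string literals in the array,
  // always at odd indices.
  const parts = code.split(/("[^"]*"|'[^']*')/);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : part.replace(pattern, replacement)))
    .join("");
}

const code = 'function foo()\n{\n  return "Hello   World!";\n}';
console.log(replaceOutsideStrings(code, /\s+\{/g, "{"));
// function foo(){
//   return "Hello   World!";
// }
```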

Ignoring whitespace in Javascript regular expression patterns?

From my research it looks like Javascript's regular expressions don't have any built-in equivalent to Perl's /x modifier, or .NET's RegexOptions.IgnorePatternWhitespace modifier. These are very useful as they can make a complex regex much easier to read. Firstly, have I missed something and is there a Javascript built-in equivalent to these? Secondly, if there isn't, does anyone know of a good jQuery plugin that will implement this functionality? It's a shame to have to compress my complex regex into one line because of Javascript's apparent regex limitations.
If I understand you correctly you want to add white space that isn't part of the regexp?
As far as I know, it isn't possible with a regexp literal.
Example:
var a = /^[\d]+$/
You can break up the regexp into several lines like this:
var a = RegExp(
"^" +
"[\\d]+" + // This is a comment
"$"
);
Notice that since it is now a normal string, you have to escape \ with \\
Or if you have a complex one:
var digit_8 = "[0-9]{8}";
var alpha_4 = "[A-Za-z]{4}";
var a = RegExp(
digit_8 +
alpha_4 + // Optional comment
digit_8
);
Update: Using a temporary array to concatenate the regular expression:
var digit_8 = "[0-9]{8}";
var alpha_4 = "[A-Za-z]{4}";
var a = RegExp([
digit_8,
alpha_4, // Optional comment
digit_8,
"[0-9A-F]" // Another comment to a string
].join(""));
Unfortunately there isn't such an option in ES5, and I suspect it's unlikely to ever be in RegExp literals, because they're already very hard to parse and line breaks would make them even more ambiguous.
If you want easy escaping and syntax highlighting of RegExp literals, you can join them by taking advantage of the source property. It's not perfect, but IMHO less bad than falling all the way back to strings:
new RegExp(
/foo/.source +
/[\d+]/.source +
/bar/.source
);
In ES6 you can create your own template tag:
regexp`
foo
[\d+]
bar
`;
function regexp(parts) {
// I'm ignoring support for ${} placeholders for brevity,
// but full implementation should escape them.
// Note that `.raw` allows use of \d instead of \\d.
return new RegExp(parts.raw.join('').replace(/\s+/g,''));
}
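Filling in the placeholder support that the comment above leaves out, here is a sketch: whitespace is stripped only from the literal (raw) parts, and each substituted value is escaped so it matches literally. The helper names are hypothetical, and because all whitespace in the literal parts is removed, a literal space has to be written as \s or \x20:

```javascript
// Hypothetical helper: escapes RegExp metacharacters in substituted values.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Tag: ignores whitespace in the literal parts (similar in spirit to Perl's /x)
// and inserts each ${...} value as escaped literal text.
function regexp(strings, ...values) {
  let source = "";
  strings.raw.forEach((part, i) => {
    source += part.replace(/\s+/g, "");
    if (i < values.length) source += escapeRegExp(values[i]);
  });
  return new RegExp(source);
}

const suffix = "v1.2";
const re = regexp`
  ^ foo
    [\d+]
  bar ${suffix} $
`;
console.log(re.source);              // ^foo[\d+]barv1\.2$
console.log(re.test("foo7barv1.2")); // true
console.log(re.test("foo7barv1x2")); // false (the "." was escaped)
```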
