Escaping a string to be placed in generated JavaScript source, in JavaScript - javascript

I'm writing code that will generate some javascript. The javascript will involve assigning a variable in the generated code to a string passed into the generator. The generator is also in javascript.
Basically I want to do this:
function generate_code(text) {
return "var a = " + jsEscapeString(text) + "; alert(a);";
}
function jsEscapeString(text) {
// WHAT GOES HERE?
// e.g. it needs to:
// - surround with quotes
// - escape quotes inside the text
// - escape backslashes and newlines and other fun characters
// - defend against other horrible things I probably don't know about
}
I don't want something that only works in the happy case. I want something correct. Something that would survive a malicious adversary trying to do sandbox escapes on the resulting code (e.g. like what you do in the game 'Untrusted').

Super easy.
function jsEscapeString(text) {
return JSON.stringify(text);
}
No matter what you put in, you will ALWAYS get a valid representation of it that can be dumped into JS source. The result, when executed, will always be exactly what you put in.
This even works for strings, booleans, numbers, arrays, objects... basically everything you'll ever need.
Although I'm curious as to why you're doing this... This smells of eval fishiness...
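One caveat worth sketching: if the generated source ends up inside an inline <script> element, or has to run on engines older than ES2019, a few characters that JSON.stringify leaves untouched can still cause trouble. The helper name jsEscapeForInlineScript below is hypothetical, and this is a minimal sketch under those assumptions rather than a complete defense:

```javascript
// Hypothetical helper (not part of the original answer): JSON.stringify plus
// extra escaping for characters that are valid in JSON output but risky in
// inline scripts or in pre-ES2019 engines. Input is coerced to a string.
function jsEscapeForInlineScript(text) {
  return JSON.stringify(String(text))
    .replace(/</g, '\\u003c')       // avoids a literal "</script>" or "<!--" in the output
    .replace(/\u2028/g, '\\u2028')  // LINE SEPARATOR: a line terminator in JS source before ES2019
    .replace(/\u2029/g, '\\u2029'); // PARAGRAPH SEPARATOR: same issue
}

function generate_code(text) {
  return "var a = " + jsEscapeForInlineScript(text) + "; alert(a);";
}
```

The replacements only touch characters inside the string literal that JSON.stringify produced, so the literal still evaluates to the original text.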

You would need to escape backslash, the string delimiter, and control characters:
function jsEscapeString(text) {
return '"' +
text
.replace(/\\/g, '\\\\')
.replace(/"/g, '\\"')
.replace(/\r/g, '\\r')
.replace(/\n/g, '\\n')
.replace(/\t/g, '\\t')
.replace(/[\b]/g, '\\b') // [\b] matches a backspace; a bare \b in a regex is a word boundary
.replace(/\v/g, '\\v')
.replace(/\f/g, '\\f')
+ '"';
}

Related

Deformat Discord Text

I have been trying to make a "deformatting" function that escapes all formatting characters (*,`,_,|,~) and I have actually created this, but I want to check if the user already escaped it to prevent formatting happening on accident.
My current function:
function deformat(string) {
return string.replace(/([*_~|`])/g, '\\$1');
}
I would love to be able to improve it so that a string like "\*hey\*" doesn't become "\\*hey\\*", which would italicize it. (Note that the slashes are not escaped in the previous two strings for readability)
Try this:
function deformat(string) {
return string.replace(/(?<!\\)([*_~|`])/g, '\\$1');
}
console.log(deformat('*hey*'));
console.log(deformat('\\*hey\\*')); // '\*hey\*' in a string literal is just '*hey*', so the backslashes must be doubled
Edit:
Since the above didn't work, try this:
function deformat(string) {
return string.replace(/(^|[^\\])([*_~|`])/g, '$1\\$2');
}
console.log(deformat('*hey*'));
console.log(deformat('\\*hey\\*')); // '\*hey\*' in a string literal is just '*hey*', so the backslashes must be doubled
It uses (^|[^\\]), which matches the start of the string or anything that isn't '\', instead of a negative lookbehind.
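One limitation of the (^|[^\\]) approach is that the preceding character is consumed by the match, so the second of two adjacent formatting characters (as in **bold**) is skipped. A possible alternative, offered as a sketch rather than a tested Discord-specific solution, is to match any backslash-plus-character pair first and leave it alone, then escape whatever formatting characters remain:

```javascript
// Sketch: a backslash followed by any character counts as "already escaped"
// and is returned unchanged; every other formatting character gets a backslash.
function deformat(string) {
  return string.replace(/\\[\s\S]|[*_~|`]/g, function (match) {
    return match.length === 2 ? match : '\\' + match;
  });
}

console.log(deformat('*hey*'));     // \*hey\*
console.log(deformat('\\*hey\\*')); // \*hey\* (unchanged)
console.log(deformat('**bold**'));  // \*\*bold\*\*
```

Because the alternation tries the escaped pair first, an escaped backslash followed by a formatting character (\\*) is read as a literal backslash and then an unescaped *, which still gets escaped.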

Escaping backslash in a string containing backslash

I have a string containing I\u2019m (with backslashes not escaped)
var myString = 'I\\u2019m'; // I\u2019m
But then I need a function that 'escape backslashes' that string, so the function I'm looking for would return I'm
backslashString(myString); // I'm
I've tried using eval:
function backslashString(input){
input = input.replace(/'/g, "\\'"); // Replace ' with \' that's going to mess up eval
return eval(`'${input}'`);
}
But is there a proper way of doing it? I'm looking for a function that escape backslashes a string containing I\u2019m to I'm and also handles if there's an extra backslash (A lost \ backslash)
EDIT:
I did not ask what I meant from the start. This not only applies to unicode characters, but applies to all backslash characters including \n
The backslashes aren’t the real problem here - the real problem is the difference between code and data.
\uXXXX is JavaScript syntax to write the Unicode codepoint of a character in a text literal. It gets replaced with the actual character, when the JavaScript parser interprets this code.
Now you have a variable that contains the value I\u2019m already - that is data. This does not get parsed as JavaScript, so it does mean the literal characters I\u2019m, and not I’m. eval can “fix” that, because the missing step of interpreting this as code is simply what eval does.
If you do not want to use eval (and thereby invite all the potential risks that entails, if the input data is not completely under your control), then you can parse those numeric values from the string using regular expressions, and then use String.fromCharCode to create the actual Unicode character from the given code point:
var myString = 'I\\u2019m and I\\u2018m';
var myNewString = myString.replace(/\\u([0-9a-fA-F]{4})/g, function(m, n) {
return String.fromCharCode(parseInt(n, 16));
});
console.log(myNewString)
/\\u([0-9a-fA-F]{4})/g - regular expression to match this \uXXXX format (X = hexadecimal digit), g modifier to replace all matches instead of stopping after the first.
parseInt(n, 16) - to parse the hexadecimal digits into a number, because String.fromCharCode expects a numeric code unit.
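Since the question's edit asks about all backslash sequences and not only \uXXXX, the same replace-based idea can be extended. The following is a minimal sketch: the names unescapeBackslashes and SIMPLE_ESCAPES are hypothetical, and it covers only a subset of JavaScript's escapes (no \u{...} and no octal sequences):

```javascript
// Hypothetical lookup table for single-character escapes.
var SIMPLE_ESCAPES = { n: '\n', r: '\r', t: '\t', b: '\b', f: '\f', v: '\v', '0': '\0' };

// Hypothetical helper: decode a subset of JavaScript escape sequences without eval.
function unescapeBackslashes(input) {
  return input.replace(/\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|[\s\S])/g, function (match, seq) {
    if (seq.length > 1) {
      // \uXXXX or \xXX: parse the hex digits that follow the 'u' or 'x'
      return String.fromCharCode(parseInt(seq.slice(1), 16));
    }
    if (Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, seq)) {
      return SIMPLE_ESCAPES[seq];
    }
    // Any other escaped character (\\, \', \", or an unknown escape) stands for itself
    return seq;
  });
}

console.log(unescapeBackslashes('I\\u2019m')); // I’m
```

A lone backslash at the very end of the input has nothing to pair with, so it is left in place instead of raising an error.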
decodeURIComponent(JSON.parse('"I\\u2019m"'));
OR for multiple
'I\\\u2019m'.split('\\').join().replace(/,/g,'');
'I\u2019m'.split('\\').join().replace(/,/g,'');
Looks like there's no other way other than eval (JSON.parse doesn't like new lines in strings)
NOTE: The function would return false if it has a trailing backslash
function backslashString(input){
input = input.replace(/`/g, '\\`'); // Escape backticks so they cannot end the template literal passed to eval
try{
return eval('`'+input+'`');
}catch(e){ // Will return false if input has errors in backslashing
return false;
}
}

What are the actual uses of ES6 Raw String Access?

What are the actual uses of String.raw Raw String Access introduced in ECMAScript 6?
// String.raw(callSite, ...substitutions)
function quux (strings, ...values) {
strings[0] === "foo\n"
strings[1] === "bar"
strings.raw[0] === "foo\\n"
strings.raw[1] === "bar"
values[0] === 42
}
quux `foo\n${ 42 }bar`
String.raw `foo\n${ 42 }bar` === "foo\\n42bar"
I went through the below docs.
http://es6-features.org/#RawStringAccess
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
http://www.2ality.com/2015/01/es6-strings.html
https://msdn.microsoft.com/en-us/library/dn889830(v=vs.94).aspx
The only thing I understand is that it is used to get the raw string form of template strings, and that it is used for debugging template strings.
When can this be used in real-world development? The docs call this a tag function. What does that mean?
What concrete use cases am I missing?
The best, and very nearly only, use case for String.raw I can think of is if you're trying to use something like Steven Levithan's XRegExp library that accepts text with significant backslashes. Using String.raw lets you write something semantically clear rather than having to think in terms of doubling your backslashes, just like you can in a regular expression literal in JavaScript itself.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
So if we're using XRegExp on the site, since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, facepalm, and go back and escape that backslash, since it's being consumed by the string literal.
Easy enough in that simple regex, but in something complicated, remembering to double all those backslashes is a maintenance pain. (Just ask Java programmers trying to use Pattern.)
Enter String.raw:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`);
Example:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`); // L: Letter
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Now I just kick back and write what I mean. I don't even really have to worry about ${...} constructs used in template literals to do substitution, because the odds of my wanting to apply a quantifier {...} to the end-of-line assertion ($) are...low. So I can happily use substitutions and still not worry about backslashes. Lovely.
Having said that, though, if I were doing it a lot, I'd probably want to write a function and use a tagged template instead of String.raw itself. But it's surprisingly awkward to do correctly:
// My one-time tag function
function xrex(strings, ...values) {
let raw = strings.raw;
let max = Math.max(raw.length, values.length);
let result = "";
for (let i = 0; i < max; ++i) {
if (i < raw.length) {
result += raw[i];
}
if (i < values.length) {
result += values[i];
}
}
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Maybe the hassle is worth it if you're using it in lots of places, but for a couple of quick ones, String.raw is the simpler option.
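The same backslash-preserving trick also applies to the built-in RegExp constructor. As a small sketch, assuming an engine that supports Unicode property escapes (ES2018 or later) and therefore no XRegExp:

```javascript
// Assumption: the engine supports \p{...} with the 'u' flag (ES2018+).
// String.raw keeps the backslash, so the pattern text is exactly ^\p{L}+$
const isSingleUnicodeWord = new RegExp(String.raw`^\p{L}+$`, 'u');

console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語"));   // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
```

Without String.raw the pattern would have to be written as "^\\p{L}+$", which is the doubling this answer is trying to avoid.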
First, a few things:
Template strings is the old name for template literals.
A tag is a function.
String.raw is a method.
String.raw`foo\n${ 42 }bar` is a tagged template literal.
Template literals are basically fancy strings.
Template literals can interpolate.
Template literals can be multi-line without using \.
String.raw keeps the escape character \ as a literal backslash instead of interpreting it.
Try putting a string that contains a newline character \n through a function that consumes newline characters.
console.log("This\nis\nawesome"); // "This\nis\nawesome"
console.log(String.raw`This\nis\nawesome`); // "This\\nis\\nawesome"
If you are wondering, console.log is not one of them. But alert is. Try running these through http://learnharmony.org/ .
alert("This\nis\nawesome");
alert(String.raw`This\nis\nawesome`);
But wait, that's not the use of String.raw.
Possible uses of String.raw method:
To show string without interpretation of backslashed characters (\n, \t) etc.
To show code for the output. (As in example below)
To be used in regex without escaping \.
To print Windows directory/sub-directory locations without using \\ too much. (They use \ remember. Also, lol)
Here we can show output and code for it in single alert window:
alert("I printed This\nis\nawesome with " + String.raw`This\nis\nawesome`);
Though, it would have been great if its main use could have been to get back the original string. Like:
var original = String.raw`This is awesome.`;
where original would have become: This\tis \tawesome.. This isn't the case sadly.
References:
http://exploringjs.com/es6/ch_template-literals.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
Template strings can be useful in many situations which I will explain below. Considering this, the String.raw prevents escapes from being interpreted. This can be useful in any template string in which you want to contain the escape character but do not want to escape it. A simple example could be the following:
var templateWithBackslash = String.raw `someRegExp displayed in template /^\//`
There are a few things inside that are nice to note with template strings.
They can contain unescaped line breaks without problems.
They can contain "${}". Inside these curly braces the javascript is interpreted instead.
(Note: running these will output the result to your console [in browser dev tools])
Example using line breaks:
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
{
asdf: "and some pretty printed json"
}
</pre>
</div>
`
console.log(myTemplate)
If you wanted to do the above with a normal string in Javascript it would look like the following:
var myTemplate = "\n\
<div class=\"myClass\">\n\
<pre>\n\
My formatted text\n\
with multiple lines\n\
{\n\
asdf: \"and some pretty printed json\"\n\
}\n\
</pre>\n\
</div>\n"
console.log(myTemplate)
You will notice the first probably looks much nicer (no need to escape line breaks).
For the second I will use the same template string but also insert some pretty-printed JSON.
var jsonObj = {asdf: "and some pretty printed json", deeper: {someDeep: "Some Deep Var"}}
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
${JSON.stringify(jsonObj, null, 2)}
</pre>
</div>
`
console.log(myTemplate)
In NodeJS it is extremely handy when it comes to filepath handling:
var fs=require('fs');
var s = String.raw`C:\Users\<username>\AppData\Roaming\SomeApp\someObject.json`;
var username = "bob"
s=s.replace("<username>",username)
fs.readFile(s,function(err,result){
if (err) throw err;
console.log(JSON.parse(result))
})
It improves readability of filepaths on Windows. \ is also a fairly common separator, so I can definitely see why it would be useful in general. However it is pretty stupid how \ still escapes `... So ultimately:
String.raw`C:\Users\` //#==> C:\Users\`
console.log(String.raw`C:\Users\`) //#==> SyntaxError: Unexpected end of input.
In addition to its use as a tag, String.raw is also useful in implementing new tag functions as a tool to do the interleaving that most people do with a weird loop. For example, compare:
function foo(strs, ...xs) {
let result = strs[0];
for (let i = 0; i < xs.length; ++i) {
result += useFoo(xs[i]) + strs[i + 1];
}
return result;
}
with
function foo(strs, ...xs) {
return String.raw({raw: strs}, ...xs.map(useFoo));
}
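To see that the two versions interleave the same way, here is a small runnable comparison. The useFoo implementation and the names fooLoop and fooRaw are hypothetical stand-ins chosen for illustration:

```javascript
// Hypothetical transform applied to every substitution; stands in for useFoo.
function useFoo(x) {
  return String(x).toUpperCase();
}

// Loop-based interleaving of literal parts and substitutions.
function fooLoop(strs, ...xs) {
  let result = strs[0];
  for (let i = 0; i < xs.length; ++i) {
    result += useFoo(xs[i]) + strs[i + 1];
  }
  return result;
}

// Same interleaving delegated to String.raw. The cooked strings are passed
// as the `raw` property, so escapes such as \n have already been processed.
function fooRaw(strs, ...xs) {
  return String.raw({ raw: strs }, ...xs.map(useFoo));
}

const name = "world";
console.log(fooLoop`hello ${name}!`); // hello WORLD!
console.log(fooRaw`hello ${name}!`);  // hello WORLD!
```

Passing strs.raw instead of strs as the raw property would keep backslashes uninterpreted, which is the behavior of String.raw when used directly as a tag.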
The Use
(Requisite knowledge: tstring §.)
Instead of:
console.log(`\\a\\b\\c\\n\\z\\x12\\xa9\\u1234\\u00A9\\u{1234}\\u{00A9}`);
you can:
console.log(String.raw`\a\b\c\n\z\x12\xa9\u1234\u00A9\u{1234}\u{00A9}`);
"Escaping"
<\\u> is fine, yet <\u> needs "escaping", e.g.:
console.log(String.raw`abc${'\\u'}abc`);
Ditto for <\\x> versus <\x>:
console.log(String.raw`abc${`\\x`}abc`);
Ditto for <\`> versus <`>:
console.log(String.raw`abc${`\``}abc`);
Ditto for <\${> versus <${>:
console.log(String.raw`abc${`$\{`}abc`);
Ditto for <\\1> (through <\\7>) versus <\1>:
console.log(String.raw`abc${`\\1`}abc`);
Ditto for <\\> versus a trailing <\> at the end of the template:
console.log(String.raw`abc${`\\`}`);
Nb
There's also a new "latex" string. Cf §.
I've found it to be useful for testing
my RegExps. Say I have a RegExp which
should match end-of-line comments because
I want to remove them. BUT, it must not
match source-code for a regexp like /\// .
If your code contains /\// it is not the
start of an EOL comment but a RegExp, as
per the rules of JavaScript syntax.
I can test whether my RegExp in variable patEOLC
matches /\// or not with:
String.raw`/\//` .match (patEOLC)
In other words it is a way to let my
code "see" code the way it exists in
source-code, not the way it exists
in memory after it has been read
into memory from source-code, with
all backslashes removed.
It is a way to "escape escaping" but
without having to do it separately
for every backslash in a string, but
for all of them at the same time.
It is a way to say that in a given
(back-quoted) string backslash
shall behave just like any other
character, it has no special
meaning or interpretation.

Escape character Problems IN MY oBJECT

Is it possible to create a value like this for the key addBlueborder?
var Object = {
footer: '#footer',
mockup: '#mockup',
// My dude??????
addBlueborder: '"input[name='"+ oBlock +"_fix']"'
};
I don't know how to escape the double quotes.
I would like to be able to do this:
$(Object.addBlueborder).each(function(e) { ... });
Thanks
You have quotes confusion. It should be like this
addBlueborder: 'input[name="' + oBlock + '_fix"]'
Also note, that quotes around the attribute are not really necessary if your oBlock doesn't contain spaces or other weird characters. In this case expression will become even simpler
addBlueborder: 'input[name=' + oBlock + '_fix]'
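If oBlock might contain quotes or backslashes, the quoted form is the safer of the two, provided those characters are escaped. A minimal sketch follows; the helper name nameSelector is hypothetical, and it does not handle every character that is special in CSS strings (newlines, for instance):

```javascript
// Hypothetical helper: build the attribute selector with a quoted value,
// backslash-escaping any double quote or backslash so the value cannot
// end the quoted string early.
function nameSelector(oBlock) {
  var value = String(oBlock + '_fix').replace(/["\\]/g, '\\$&');
  return 'input[name="' + value + '"]';
}

console.log(nameSelector('header')); // input[name="header_fix"]
```

In browsers, the built-in CSS.escape function is another option for escaping the value.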

parse multiple levels of document.write nesting

The question is a bit weird, but I am trying this just for learning purpose.
I have this string
var str = "document.write('<script>document.write(\"<script>document.write('<script>document.write(\"Hello World\");<\/script>');<\/script>\");<\/script>')";
And, I am trying to parse/eval this string so that the inner most document.write prints "Hello World". Here's what I tried.
function (str) {
str = str.replace(/["']/g, '"')
.replace(/"<script>document.write/g, "")
.replace(/;<\/script\>"/g, "");
eval(str);
}
But, this looks like horrible cheating. I want to do it a more subtle way. for example escaping the \s or splitting </script> tags. But can't just simply escape backslashes, as the number of slashes would need to be more for deeper nesting.
Any ideas?
