Javascript \x escaping - javascript

I've seen a few other programs that have something like this:
var string = '\x32\x20\x60\x78\x6e\x7a\x9c\x89';
And I had to try to fiddle with the numbers and letters, to find the text I wanted to display.
I'm wondering if there is a function to find the \x escape of a string, like string.toUpperCase() in JS. I'm using processingJS, but it will be okay for me to use other programming languages to find the ASCII for \x.

If you have a string that you want escaped, you can use String.prototype.charCodeAt()
If you have the code with escapes, you can just evaluate them to get the original string. If it's a string with literal escapes, you can use String.fromCharCode()
If you have '\x32\x20\x60\x78\x6e\x7a\x9c\x89' and want "2 `xnz" then
'\x32\x20\x60\x78\x6e\x7a\x9c\x89' == "2 `xnz"
If you have '\\x32\\x20\\x60\\x78\\x6e\\x7a\\x9c\\x89', which is a string literal with the value \x32\x20\x60\x78\x6e\x7a\x9c\x89, then you can parse it by passing the numeric value of each pair of hex digits to String.fromCharCode() (a static method, not a method on String.prototype):
'\\x32\\x20\\x60\\x78\\x6e\\x7a\\x9c\\x89'.replace(/\\x([0-9a-f]{2})/ig, function(_, pair) {
return String.fromCharCode(parseInt(pair, 16));
})
Alternatively, eval is an option if you can be sure of the safety of the input and performance isn't important.
eval('"\\x32\\x20\\x60\\x78\\x6e\\x7a\\x9c\\x89"')
Note the " nested in the ' surrounding the input string.
If you know it's a program, and it's from a trusted source, you can eval the string directly, which won't give you the ASCII, but will execute the program itself.
eval('\\x32\\x20\\x60\\x78\\x6e\\x7a\\x9c\\x89')
Note that the input you provided is not a program and the eval call fails.
If you have "2 `xnz" and want '\x32\x20\x60\x78\x6e\x7a\x9c\x89' then
"2 `xnz".split('').map(function(e) {
return '\\x' + e.charCodeAt(0).toString(16);
}).join('')
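One caveat with the snippet above: toString(16) yields a single digit for character codes below 0x10 (a newline becomes \xa, which is not a valid escape), and codes above 0xff do not fit in a \x escape at all. A minimal sketch that pads the hex digits and falls back to \uHHHH might look like this; toHexEscapes and fromHexEscapes are hypothetical helper names, not built-ins.

```javascript
// Hypothetical helper: encode every UTF-16 code unit as \xHH or \uHHHH.
function toHexEscapes(text) {
  return Array.from({ length: text.length }, (_, i) => {
    const code = text.charCodeAt(i); // 0..65535
    const hex = code.toString(16);
    return code <= 0xff
      ? '\\x' + hex.padStart(2, '0')  // always two hex digits
      : '\\u' + hex.padStart(4, '0'); // always four hex digits
  }).join('');
}

// Hypothetical inverse: decode literal \xHH and \uHHHH sequences without eval.
function fromHexEscapes(escaped) {
  return escaped.replace(/\\x([0-9a-f]{2})|\\u([0-9a-f]{4})/gi, (_, x, u) =>
    String.fromCharCode(parseInt(x || u, 16)));
}

const escaped = toHexEscapes('2 `xnz\n');
console.log(escaped);                                 // \x32\x20\x60\x78\x6e\x7a\x0a
console.log(fromHexEscapes(escaped) === '2 `xnz\n');  // true
```

You can paste the printed result between quotes in your source to get the original text back.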

Related

How to avoid parsing "\" in JSON.parse () method

I'm trying to parse JSON into a JS object, but I have a problem with one property whose value always contains a "\" character followed by four more characters. E.g. the string looks something like this:
"key": "Z13g\u003d"
Once I parse it I get:
"key": "Z13g="
Is there any easy way to solve this problem?
If you have a string like "\u003d" in JavaScript, it's indistinguishable from its parsed string "=". Even the String.replace function won't find the \ character in the string.
However, if you are truly trying to represent a string that includes the backslash character, you need to escape it with another backslash.
Whereas "\u003d" represents the string value "=", "\\u003d" represents the string value "\u003d".
However, things get more complicated when you invoke JSON.parse; since it's parsing the string value again, it'll transform "\\u003d" to "=".
To get around this, you need to double-escape the backslash, so you'll have a string value of "\\\\u003d". The parser will transform that into "\u003d" instead of "=".
console.log(JSON.parse("\"\u003d\"")); // "\u003d" -> "="
console.log(JSON.parse("\"\\u003d\"")); // "\\u003d" -> "="
console.log(JSON.parse("\"\\\\u003d\"")); // "\\\\u003d" -> "\u003d"
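If the JSON arrives as text (say, from a server) and you really want to keep the \uXXXX sequences as literal text, one possible approach is to double the backslash of each escape before parsing. This is only a sketch: it assumes the text contains no backslashes that are already escaped, and the variable names are made up for illustration.

```javascript
// String.raw keeps the backslash as a literal character, like text read off the wire.
const jsonText = String.raw`{"key": "Z13g\u003d"}`;

// Normal parsing decodes the escape.
const decoded = JSON.parse(jsonText).key;
console.log(decoded); // Z13g=

// Assumption: no already-escaped backslashes in the input.
// Turn each \uXXXX into \\uXXXX so the parser yields a literal backslash.
const protectedText = jsonText.replace(/\\u([0-9a-fA-F]{4})/g, '\\\\u$1');
const preserved = JSON.parse(protectedText).key;
console.log(preserved); // Z13g\u003d
```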

How do I replace a double-quote with an escape-char double-quote in a string using JavaScript?

Say I have a string variable (var str) as follows-
Dude, he totally said that "You Rock!"
Now if I want to make it look like this:
Dude, he totally said that \"You Rock!\"
How do I accomplish this using the JavaScript replace() function?
str.replace("\"","\\""); is not working so well. It gives unterminated string literal error.
Now, if the above sentence were to be stored in a SQL database, say in MySQL as a LONGTEXT (or any other VARCHAR-ish) datatype, what other string handling do I need to perform?
Quotes and commas are not very friendly with query strings. I'd appreciate a few suggestions on that matter as well.
You need to use a global regular expression for this. Try it this way:
str.replace(/"/g, '\\"');
Check out regex syntax and options for the replace function in Using Regular Expressions with JavaScript.
Try this:
str.replace("\"", "\\\""); // (Escape backslashes and embedded double-quotes)
Or, use single-quotes to quote your search and replace strings:
str.replace('"', '\\"'); // (Still need to escape the backslash)
As pointed out by helmus, if the first parameter passed to .replace() is a string it will only replace the first occurrence. To replace globally, you have to pass a regex with the g (global) flag:
str.replace(/"/g, "\\\"");
// or
str.replace(/"/g, '\\"');
But why are you even doing this in JavaScript? It's OK to use these escape characters if you have a string literal like:
var str = "Dude, he totally said that \"You Rock!\"";
But this is necessary only in a string literal. That is, if your JavaScript variable is set to a value that a user typed in a form field, you don't need to do this escaping.
Regarding your question about storing such a string in an SQL database, again you only need to escape the characters if you're embedding a string literal in your SQL statement - and remember that the escape characters that apply in SQL aren't (usually) the same as for JavaScript. You'd do any SQL-related escaping server-side.
The other answers will work for most strings, but you can end up unescaping an already escaped double quote, which is probably not what you want.
To work correctly, you are going to need to escape all backslashes and then escape all double quotes, like this:
var test_str = '"first \\" middle \\" last "';
var result = test_str.replace(/\\/g, '\\\\').replace(/\"/g, '\\"');
Depending on how you need to use the string, and the other escaped characters involved, this may still have some issues, but I think it will probably work in most cases.
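To make the backslash-then-quote ordering concrete, here is a small sketch. escapeForDoubleQuotes is a hypothetical name, and the sketch only handles backslashes and double quotes (not newlines or other control characters). JSON.stringify is shown next to it because it is a built-in that produces a fully escaped, quoted literal.

```javascript
// Hypothetical helper: escape backslashes first, then double quotes.
// Doing it in the other order would double the backslashes just added.
function escapeForDoubleQuotes(text) {
  return text.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

const quote = 'Dude, he totally said that "You Rock!"';
console.log(escapeForDoubleQuotes(quote)); // Dude, he totally said that \"You Rock!\"

// Built-in alternative: includes the surrounding quotes and handles control characters.
console.log(JSON.stringify(quote)); // "Dude, he totally said that \"You Rock!\""

// An already escaped quote survives a round trip.
const tricky = 'first \\" last';
console.log(JSON.parse('"' + escapeForDoubleQuotes(tricky) + '"') === tricky); // true
```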
var str = 'Dude, he totally said that "You Rock!"';
var var1 = str.replace(/\"/g,"\\\"");
alert(var1);

Why does Closure Compiler insist on adding more bytes?

If I give Closure Compiler something like this:
window.array = '0123456789'.split('');
It "compiles" it to this:
window.array="0,1,2,3,4,5,6,7,8,9".split(",");
Now as you can tell, that's bigger. Is there any reason why Closure Compiler is doing this?
I think this is what's going on, but I am by no means certain...
The code that causes the insertion of commas is tryMinimizeStringArrayLiteral in PeepholeSubstituteAlternateSyntax.java.
That method contains a list of characters that are likely to have a low Huffman encoding, and are therefore preferable to split on than other characters. You can see the result of this if you try something like this:
"a b c d e f g".split(" "); //Uncompiled, split on spaces
"a,b,c,d,e,f,g".split(","); //Compiled, split on commas (same size)
The compiler will replace the character you try to split on with one it thinks is favourable. It does so by iterating over the characters of the string and finding the most favourable splitting character that does not occur within the string:
// These delimiters are chars that appears a lot in the program therefore
// probably have a small Huffman encoding.
NEXT_DELIMITER: for (char delimiter : new char[]{',', ' ', ';', '{', '}'}) {
for (String cur : strings) {
if (cur.indexOf(delimiter) != -1) {
continue NEXT_DELIMITER;
}
}
String template = Joiner.on(delimiter).join(strings);
//...
}
In the above snippet you can see the array of characters the compiler claims to be optimal to split on. The comma is first (which is why in my space example above, the spaces have been replaced by commas).
I believe the insertion of commas in the case where the string to split on is the empty string may simply be an oversight. There does not appear to be any special treatment of this case, so it's treated like any other split call and each character is joined with the first appropriate character from the array shown in the above snippet.
Another example of how the compiler deals with the split method:
"a,;b;c;d;e;f;g".split(";"); //Uncompiled, split on semi-colons
"a, b c d e f g".split(" "); //Compiled, split on spaces
This time, since the original string already contains a comma (and we don't want to split on the comma character), the comma can't be chosen from the array of low-Huffman-encoded characters, so the next best choice is selected (the space).
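The selection logic described above can be sketched in JavaScript as follows. This is an illustrative port, not the compiler's code: pickDelimiter and toSplitForm are made-up names, and returning null when no candidate fits is an assumption about what should happen in that case.

```javascript
// Candidate delimiters in the same preference order as the Java snippet.
const CANDIDATE_DELIMITERS = [',', ' ', ';', '{', '}'];

// Return the first candidate that occurs in none of the strings, or null.
function pickDelimiter(strings) {
  for (const delimiter of CANDIDATE_DELIMITERS) {
    if (strings.every(s => !s.includes(delimiter))) {
      return delimiter;
    }
  }
  return null; // assumption: no rewrite is possible in this case
}

// Build the "joined".split("delimiter") source text for an array of strings.
function toSplitForm(strings) {
  const delimiter = pickDelimiter(strings);
  if (delimiter === null) return null;
  return JSON.stringify(strings.join(delimiter)) +
    '.split(' + JSON.stringify(delimiter) + ')';
}

console.log(toSplitForm(['a', 'b', 'c']));  // "a,b,c".split(",")
console.log(toSplitForm(['a,', 'b', 'c'])); // "a, b c".split(" ")
```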
Update
Following some further research into this, it is definitely not a bug. This behaviour is actually by design, and in my opinion it's a very clever little optimisation, when you bear in mind that the Closure compiler tends to favour the speed of the compiled code over size.
Above I mentioned Huffman encoding a couple of times. The Huffman coding algorithm, explained very simply, assigns a weight to each character appearing in the text to be encoded. The weight is based on the frequency with which each character appears. These frequencies are used to build a binary tree in which the most common characters sit closest to the root. That means the most common characters get the shortest codes, so text dominated by them compresses better.
The Huffman algorithm is a large part of the DEFLATE algorithm used by gzip, so if your web server is configured to use gzip, your users will benefit from this clever optimisation.
This issue was fixed on Apr 20, 2012; see revision:
https://code.google.com/p/closure-compiler/source/detail?r=1267364f742588a835d78808d0eef8c9f8ba8161
Ironically, split in the compiled code has nothing to do with split in the source. Consider:
Source : a = ["0","1","2","3","4","5"]
Compiled: a="0,1,2,3,4,5".split(",")
Here, split is just a way to represent long arrays (long enough for the sum of all quotes + commas to be longer than split(",") ). So, what's going on in your example? First, the compiler sees a string function applied to a constant and evaluates it right away:
'0123456789'.split('') => ["0","1","2","3","4","5","6","7","8","9"]
At some later point, when generating output, the compiler considers this array to be "long" and writes it in the above "split" form:
["0","1","2","3","4","5","6","7","8","9"] => "0,1,2,3,4,5,6,7,8,9".split(",")
Note that all information about split('') in the source is already lost at this point.
If the source string were shorter, it would be generated in the array form, without extra splitting:
Source : a = '0123'.split('')
Compiled: a=["0","1","2","3"]
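The "long enough" trade-off is just character counting, which you can check yourself. The sketch below compares the two output forms; the function names are invented, it assumes no item contains a comma or a quote, and it says nothing about the exact rule the compiler applies internally.

```javascript
// Source text of a plain array literal: ["0","1",...]
function arrayLiteral(items) {
  return '[' + items.map(s => '"' + s + '"').join(',') + ']';
}

// Source text of the split form: "0,1,...".split(",")
function splitForm(items) {
  return '"' + items.join(',') + '".split(",")';
}

// Pick whichever form is shorter (ties go to the array literal).
function shorterForm(items) {
  const literal = arrayLiteral(items);
  const split = splitForm(items);
  return split.length < literal.length ? split : literal;
}

console.log(shorterForm(['0', '1', '2', '3']));           // ["0","1","2","3"]
console.log(shorterForm(['0', '1', '2', '3', '4', '5'])); // "0,1,2,3,4,5".split(",")
```

Each extra item costs two quote characters in the literal form, while the split form pays a fixed 11 characters for .split(","), so with one-character items the split form wins from six items upward.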

Regex validation rules

I'm writing a database backup function as part of my school project.
I need to write a regex rule so the database backup name can only contain legal characters.
By 'legal' I mean a string that doesn't contain ANY symbols or spaces. Only letters from the alphabet and numbers.
An example of a valid string would be '31Jan2012' or '63927jkdfjsdbjk623' or 'hello123backup'.
Here's my JS code so far:
// Check if the input box contains only the characters a-z, A-Z, or 0-9 with a regular expression.
function checkIfContainsNumbersOrCharacters(elem, errorMessage){
var regexRule = new RegExp("^[\w]+$");
if(regexRule.test( $(elem).val() ) ){
return true;
}else{
alert(errorMessage);
return false;
}
}
//call the function
checkIfContainsNumbersOrCharacters("#backup-name", "Input can only contain the characters a-z or 0-9.");
I've never really used regular expressions before, but after a quick bit of googling I found this tool, from which I wrote the following regex rule:
^[\w]+$
^ = start of string
[\w] = a-z/A-Z/0-9
'+' = characters after the string.
When running my function, whatever string I input seems to return false :( Is my code wrong? Or am I not using regex rules correctly?
The problem here is, that when writing \w inside a string, you escape the w, and the resulting regular expression looks like this: ^[w]+$, containing the w as a literal character. When creating a regular expression with a string argument passed to the RegExp constructor, you need to escape the backslash, like so: new RegExp("^[\\w]+$"), which will create the regex you want.
There is a way to avoid that, using the shorthand notation provided by JavaScript: var regex = /^[\w]+$/; which does not need any extra escaping.
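You can see the difference by inspecting the source property of each regex. A small sketch (the variable names are just for illustration):

```javascript
// "\w" inside a string literal is an unknown escape, so it collapses to "w".
const fromString = new RegExp("^[\w]+$");
console.log(fromString.source);            // ^[w]+$
console.log(fromString.test('hello123'));  // false
console.log(fromString.test('www'));       // true (only the letter w matches)

// Escaping the backslash keeps \w intact.
const fromEscapedString = new RegExp("^[\\w]+$");
console.log(fromEscapedString.source);           // ^[\w]+$
console.log(fromEscapedString.test('hello123')); // true

// A regex literal needs no extra escaping.
const fromLiteral = /^[\w]+$/;
console.log(fromLiteral.test('hello123'));       // true
```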
It can be simpler. This works:
function checkValid(name) {
return /^\w+$/.test(name);
}
/^\w+$/ is the literal notation for new RegExp(). Since the .test function returns a boolean, you only need to return its result. This also reads better than new RegExp("^\\w+$"), and you're less likely to goof up (thanks #x3ro for pointing out the need for two backslashes in strings).
In JavaScript, \w is shorthand for [A-Za-z0-9_]: it matches ASCII letters, digits, and the underscore. In some other regex flavours it behaves more like [[:alnum:]] plus the underscore and may match characters that are not part of the ASCII character encoding, which may or may not be what you want. If what you really intend to match is [0-9A-Za-z], then that's what you should use.
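Since the question rules out all symbols, the underscore matters here. A minimal sketch of a stricter check, with isValidBackupName as a made-up name:

```javascript
// Hypothetical validator: ASCII letters and digits only, at least one character.
function isValidBackupName(name) {
  return /^[0-9A-Za-z]+$/.test(name);
}

console.log(isValidBackupName('31Jan2012'));  // true
console.log(isValidBackupName('my_backup'));  // false
console.log(/^\w+$/.test('my_backup'));       // true: \w lets the underscore through
console.log(isValidBackupName('my backup'));  // false
console.log(isValidBackupName(''));           // false
```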
When you declare the regex as a string parameter to the RegExp constructor, you need to escape it. Both
var regexRule = new RegExp("^[\\w]+$");
...and...
var regexRule = new RegExp(/^[\w]+$/);
will work.
Keep in mind, though, that client-side validation for database data will never be enough, as the validation is easily bypassed by disabling JavaScript in the browser, and invalid or malicious data can still reach your DB. You need to validate the data on the server side. Validating on the client side as well, to stop requests with invalid data early, is still good practice.
This is the official spec: http://dev.mysql.com/doc/refman/5.0/en/identifiers.html but it's not very easily converted to a regular expression. Just a regular expression won't do it as there are also reserved words.
Why not just put it in the query (don't forget to escape it properly) and let MySQL give you an error? There might for instance be a bug in the MySQL version you're using, and even though your check is correct, MySQL might still refuse.

Double-Escaped Unicode Javascript Issue

I am having a problem displaying a JavaScript string with embedded Unicode character escape sequences (\uXXXX) where the initial "\" character is itself escaped as "&#92;".
What do I need to do to transform the string so that it properly evaluates the escape sequences and produces output with the correct Unicode character?
For example, I am dealing with input such as:
"this is a &#92;u201ctest&#92;u201d";
attempting to decode the "&#92;" using a regex expression, e.g.:
var out = text.replace('/&#92;/g','\');
results in the output text:
"this is a \u201ctest\u201d";
that is, the Unicode escape sequences are displayed as actual escape sequences, not the double quote characters I would like.
As it turns out, it's unescape() we want, but with '%uXXXX' rather than '\uXXXX':
unescape(yourteststringhere.replace(/&#92;/g,'%'))
This is a terrible solution, but you can do this:
var x = "this is a &#92;u201ctest&#92;u201d".replace(/&#92;/g,'\\')
// x is now "this is a \u201ctest\u201d"
eval('x = "' + x + '"')
// x is now "this is a “test”"
It's terrible because:
eval can be dangerous, if you don't know what's in the string
the string quoting in the eval statement will break if you have actual quotation marks in your string
Are you sure '&#92;' is the only character that might get HTML-escaped? Are you sure '\uXXXX' is the only kind of string escape in use?
If not, you'll need a general-purpose HTML-character/entity-reference-decoder and JS-string-literal-decoder. Unfortunately JavaScript has no built-in methods for this and it's quite tedious to do manually with a load of regexps.
It is possible to take advantage of the browser's HTML-decoder by assigning the string to an element's innerHTML property, and then ask JavaScript to decode the string as above:
var el= document.createElement('div');
el.innerHTML= s;
return eval('"'+el.firstChild.data+'"');
However this is an incredibly ugly hack and a security hole if the string comes from a source that isn't 100% trusted.
Where are the strings coming from? It would be nicer if possible to deal with the problem at the server end where you may have more powerful text handling features available. And if you could fix whatever it is that is unnecessarily HTML-escaping your backslashes you could find the problem fixes itself.
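If the answers to those questions are "yes" (only the backslash is HTML-escaped as &#92;, and only \uXXXX escapes occur), a narrow decoder without eval is possible. This is a sketch under exactly those assumptions, and decodeDoubleEscaped is a hypothetical name:

```javascript
// Hypothetical decoder for input where "\" arrived as the entity &#92;
// and the only string escapes in use are \uXXXX sequences.
function decodeDoubleEscaped(text) {
  return text
    .replace(/&#92;/g, '\\') // undo the HTML escaping of the backslash
    .replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
      String.fromCharCode(parseInt(hex, 16))); // decode each \uXXXX
}

const input = 'this is a &#92;u201ctest&#92;u201d';
console.log(decodeDoubleEscaped(input)); // this is a “test”
```

Because nothing is evaluated, quotation marks or other code in the input cannot break out of the string the way they can with eval.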
I'm not sure if this is it, but the answer might have something to do with eval(), if you can trust your input.
I was thinking along the same lines, but using eval() in every way I could imagine resulted in the same escaped output; e.g.,
eval(new String("this is a &#92;u201ctest&#92;u201d"));
or even
eval(new String("this is a &#92;u201ctest&#92;u201d".replace('/&#92;/g','\')));
all results in the same thing:
"this is a \u201ctest\u201d";
It's as if I need to get the JavaScript engine to somehow re-evaluate or re-parse the string, but I don't know what would do it. I thought perhaps eval() or just creating a new string from the properly escaped input would do it, but no luck.
The fundamental question is - what do I have to do to turn the given string:
"this is a &#92;u201ctest&#92;u201d"
into a string that uses the proper Unicode characters?
