I have this JS code:
var A = {};
A.new = function(n) { return new Array(n); }
It works well in all browsers, but when I try to obfuscate it with obfuscator, it shows an error.
Is it a valid JS code? I looked into specification, but didn't find anything. I know, that browsers sometimes accept syntactically wrong code, but I want to write syntactically correct code.
Note, that I am not doing var new = ...
BTW. what about this?
var a = { "new" : 2 }; // ok?
a.new = 3; // ok?
a["new"] = 3; // ok?
.
EDIT: Thank you all for your help! I wrote an email to obfuscator's authors and they fixed it! :)
Yes, your code is valid an the obfuscator is wrong (or old). new is a reserved word, that means it is not a valid identifier (e.g. to be used as a variable name). Yet, it is still an IdentifierName which is valid in object literals and in dot-notation property access.
However, this was not the case in EcmaScript 3, where both of these needed to be Identifiers, and keywords like new were invalid. Therefore it is considered bad practise to use these names unquoted in scripts on the web, which might be executed by older browsers.
Reserved words can be used as property identifiers:
A.new = 1
B = { function: 123 }
D.if = { else: 5 }
etc - it's all valid (not necessarily good) javascript.
These things are called "IdentifierName" in the specification, and the only difference between an "IdentifierName" and a regular Identifier is
Identifier :: IdentifierName but not ReservedWord
http://es5.github.io/#x7.6
I see no errors. Lead by a dot its just a property name, just like ['new'], not any more a reserved word.
Since new is a reserved word, it can't be used as an identifier, although it can be used as an identifierName. See this section of the ECMAScript specification. So your code is legal, and apparently the obfuscator doesn't understand this distinction.
To work around this, you can access it using array-style notation:
A['new'] = function(n) { return new Array(n); };
Related
If I want to use the following in JS, Is some option available to do it?
var break = "";
In C# we have the option to use it using #, is similar option available in JS
public int #public;
Well, if you insist, you can
var $break = 1;
which is the same as C# in terms of characters ;)
Seriously, you cannot use reserved keywords as variables, but they are allowed as property names:
myObj = {}
myObj.break = 'whatever';
Also, do note that the repertoire of the reserved words varies depending on the strictness. For example,
var interface = 1;
is valid in the non-strict mode, but breaks once you add use strict to your script.
You can't use Javascript keywords as a variable name.
I find myself, for reasons which are too irrelevant to go into here but which involve machine-generated code, needing to create variables with names which are Javascript reserved words.
For example, with object literals, by quoting the keys if they're invalid identifiers:
var o = { validIdentifier: 1, "not a valid identifier": 2 };
Is there a similar technique which works for simple variable references?
A poke around the spec shows that there used to be a mechanism that allowed this, by abusing Unicode escapes:
f\u0075nction = 7;
However this seems incredibly dubious, and is apparently rapidly vanishing (although my recent Chrome still appears to support it). Is there a more modern equivalent?
If they're object keys, you can call them what you like (even reserved names), and you don't need to quote them.
var o = { function: 'a' }
console.log(o.function) // a
DEMO
Is there a more modern equivalent?
No. Reserved words cannot be used as variable names. You can use names that look like those words but that's it.
From the spec:
A reserved word is an IdentifierName that cannot be used as an Identifier.
FYI, reserved words can be used as property names, even without quotes:
var o = { function: 1 };
Variable names can't be reserved words.
You can always do bad things like:
window["function"] = 'foo';
console.log(window.function);
> foo
You still can't reference it using a bare reserved, because it's reserved.
That it's machine-generated code means whatever is generating the code is broken.
To create an IDE that would autocomplete all variables the user declares but would be oblivious to other variables such as Math.PI or even the module Math, the IDE would need to be able to identify all identifiers relating to variables declared by the user. What mechanism could be used to capture all such variables, assuming you already have access to the AST (Abstract Symbol Table) for the program?
I am using reflect.js (https://github.com/zaach/reflect.js) to generate the AST.
I think it's pretty much impossible
Here is why I think it's pretty much impossible without executing it:
Let us go through the unexplored parts, from easy to hard.
Easy to catch:
Function scope is missed here:
(function(x){
//x is now an object with an a property equal to 3
// for the scope of that IIFE.
x;
})({a:3});
Here is some fun dirty tricks for you all.:
Introducing... drum roll... Block Scoping!!
with({x:3}){
x;//x is now declared in the scope of that with and is equal to 3.
}
try{ throw 5}catch(x){
x // x is now declared in the scope of the try block and is equal to 5;
}
(people reading: I beg you to please not use these last two for actual scoping in code :))
Not easy:
Bracket notation:
var n = "lo";
a["h"+"e"+"l"+n] = "world"; // need to understand that a.hello is a property.
// not a part of the ast!
The really hard parts:
Let us not forget invoking the compiler These would not show up in the AST:
eval("var x=5"); // declares x as 5, just a string literal and a function call
new Function("window.x = 5")();// or global in node
In node.js this can also be done with the vm module. In the browser using document.write or script tag injection.
What else? Of course they can obfuscate all they want:
new Function(["w","i","n","dow.x"," = ","5"].join(""))(); // Good luck finding this!
new Function('new Function(["w","i","n","dow.x"," = ","5"].join(""))()')();// Getting dizzy already?
So what can be done?
Execute the code, once, in a closed, timed environment when you update the symbol table (just the relevant parts)
See what's the generated symbol table is from the execution
Boom, you got yourself a symbol table.
This is not reliable but it's probably as close as you get.
The only other alternative I can think of, which is what most IDEs are doing is to simply ignore anything that is not:
object.property = ... //property definition
var a = ... //scoped
b = ... //global, or error in strict mode
function fn(){ //function declaration
object["property"] //property with a _fixed_ literal in bracket notation.
And also, function parameters.
I have seen no IDE that has been able to deal with anything but these. Since they're the most common by far, I think it's perfectly reasonable to count those.
By adding them onto am object that already exists....ie
window.mynewvar = 5;
function mynewfunc() {
}
Given this function:
function doThing(values,things){
var thatRegex = /^http:\/\//i; // is this created once or on every execution?
if (values.match(thatRegex)) return values;
return things;
}
How often does the JavaScript engine have to create the regex? Once per execution or once per page load/script parse?
To prevent needless answers or comments, I personally favor putting the regex outside the function, not inside. The question is about the behavior of the language, because I'm not sure where to look this up, or if this is an engine issue.
EDIT:
I was reminded I didn't mention that this was going to be used in a loop. My apologies:
var newList = [];
foreach(item1 in ListOfItems1){
foreach(item2 in ListOfItems2){
newList.push(doThing(item1, item2));
}
}
So given that it's going to be used many times in a loop, it makes sense to define the regex outside the function, but so that's the idea.
also note the script is rather genericized for the purpose of examining only the behavior and cost of the regex creation
From Mozilla's JavaScript Guide on regular expressions:
Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.
And from the ECMA-262 spec, ยง7.8.5 Regular Expression Literals:
A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated.
In other words, it's compiled once when it's evaluated as a script is first parsed.
It's worth noting also, from the ES5 spec, that two literals will compile to two distinct instances of RegExp, even if the literals themselves are the same. Thus if a given literal appears twice within your script, it will be compiled twice, to two distinct instances:
Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical.
...
... each time the literal is evaluated, a new object is created as if by the expression new RegExp(Pattern, Flags) where RegExp is the standard built-in constructor with that name.
The provided answers don't clearly distinguish between two different processes behind the scene: regexp compilation and regexp object creation when hitting regexp object creation expression.
Yes, using regexp literal syntax, you're gaining the performance benefit of one time regexp compilation.
But if your code executes in ES5+ environment, every time the code path enters the doThing() function in your example, it actually creates a new RegExp object, though, without need to compile the regexp again and again.
In ES5, literal syntax produces a new RegExp object every time code path hits expression that creates a regexp via literal:
function getRE() {
var re = /[a-z]/;
re.foo = "bar";
return re;
}
var reg = getRE(),
re2 = getRE();
console.log(reg === re2); // false
reg.foo = "baz";
console.log(re2.foo); // "bar"
To illustrate the above statements from the point of actual numbers, take a look at the performance difference between storedRegExp and inlineRegExp tests in this jsperf.
storedRegExp would be about 5 - 20% percent faster across browsers than inlineRegExp - the overhead of creating (and garbage collecting) a new RegExp object every time.
Conslusion:
If you're heavily using your literal regexps, consider caching them outside the scope where they are needed, so that they are not only be compiled once, but actual regexp objects for them would be created once as well.
There are two "regular expression" type objects in javascript.
Regular expression instances and the RegExp object.
Also, there are two ways to create regular expression instances:
using the /regex/ syntax and
using new RegExp('regex');
Each of these will create new regular expression instance each time.
However there is only ONE global RegExp object.
var input = 'abcdef';
var r1 = /(abc)/;
var r2 = /(def)/;
r1.exec(input);
alert(RegExp.$1); //outputs 'abc'
r2.exec(input);
alert(RegExp.$1); //outputs 'def'
The actual pattern is compiled as the script is loaded when you use Syntax 1
The pattern argument is compiled into an internal format before use. For Syntax 1, pattern is compiled as the script is loaded. For Syntax 2, pattern is compiled just before use, or when the compile method is called.
But you still could get different regular expression instances each method call. Test in chrome vs firefox
function testregex() {
var localreg = /abc/;
if (testregex.reg != null){
alert(localreg === testregex.reg);
};
testregex.reg = localreg;
}
testregex();
testregex();
It's VERY little overhead, but if you wanted exactly one regex, its safest to only create one instance outside of your function
The regex will be compiled every time you call the function if it's not in literal form.
Since you are including it in a literal form, you've got nothing to worry about.
Here's a quote from websina.com:
Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.
Calling the constructor function of the RegExp object, as follows:
re = new RegExp("ab+c")
Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
Is there some way I can define String[int] to avoid using String.CharAt(int)?
No, there isn't a way to do this.
This is a common question from developers who are coming to JavaScript from another language, where operators can be defined or overridden for a certain type.
In C++, it's not entirely out of the question to overload operator* on MyType, ending up with a unique asterisk operator for operations involving objects of type MyType. The readability of this practice might still be called into question, but the language affords for it, nevertheless.
In JavaScript, this is simply not possible. You will not be able to define a method which allows you to index chars from a String using brackets.
#Lee Kowalkowski brings up a good point, namely that it is, in a way, possible to access characters using the brackets, because the brackets can be used to access members of a JavaScript Array. This would involve creating a new Array, using each of the characters of the string as its members, and then accessing the Array.
This is probably a confusing approach. Some implementations of JavaScript will provide access to a string via the brackets and some will not, so it's not standard practice. The object may be confused for a string, and as JavaScript is a loosely typed language, there is already a risk of misrepresenting a type. Defining an array solely for the purposes of using a different syntax from what the language already affords is only gong to promote this type of confusion. This gives rise to #Andrew Hedges's question: "Why fight the language?"..
There are useful patterns in JavaScript for legitimate function overloading and polymorphic inheritance. This isn't an example of either.
All semantics aside, the operators still haven't been overridden.
Side note: Developers who are accustomed to the conventions of strong type checking and classical inheritance are sometimes confused by JavaScript's C-family syntax. Under the hood, it is working in an unfamiliar way. It's best to write JavaScript in clean and unambiguous ways, in order to prevent confusion.
Please note: Before anybody else would like to vote my answer down, the question I answered was:
IE javascript string indexers
is there some way I can define string[int] to avoid using string.CharAt(int)?"
Nothing about specifically overriding brackets, or syntax, or best-practice, the question just asked for "some way". (And the only other answer said "No, there isn't.")
Well, there is actually, kind of:
var newArray = oldString.split('');
...now you can access newArray using bracket notation, because you've just converted it to an array.
Use String.charAt()
It's standard and works in all browsers.
In non-IE browsers you can use bracket notation to access characters like this:
"TEST"[1]; // = E
You could convert a string into an array of characters doing this:
var myString = "TEST";
var charArray = myString.split(''); // charArray[1] == E
These would be discouraged. There isn't any reason not to use the charAt() method, and there is no benefit to doing anything else.
This is not an answer, just a trick (strongly deprecated!). It shows, in particular, that in Javascript you can do whatever you want. It's just a matter of your fantasy.
You can use a fact that you can set any additional properties to String Object like to all others, so you can create String.0, String.1, ... properties:
String.prototype.toChars = function() {
for (var i=0; i<this.length; i++) {
this[i+""] = this.charAt(i);
}
};
Now you can access single characters using:
var str = "Hello World";
str.toChars();
var i = 1+"";
var c = str[i]; // "e"
Note that it's useful only for access. It should be another method defined for assigning string chars in such manner.
Also note that you must call .toChars() method every time you modify the sting.