Crockford's JavaScript: The Good Parts contains the following text.
Reserved Words
The following words are reserved in JavaScript:
abstract boolean break byte case catch char class const continue
debugger default delete do double else enum export extends false final
finally float for
function goto if implements import in instanceof int interface long native new null
package private protected public return short static super switch synchronized this
throw throws transient true try typeof var volatile void while with
Most of these words are not used in the language.
They cannot be used to name variables or parameters. When reserved
words are used as keys in object literals, they must be quoted. They
cannot be used with the dot notation, so it is sometimes necessary to
use the bracket notation instead:
var method; // ok
var class; // illegal
object = {box: value}; // ok
object = {case: value}; // illegal
object = {'case': value}; // ok
object.box = value; // ok
object.case = value; // illegal
object['case'] = value; // ok
Some of the reserved words appear to not be reserved in my installed interpreters. For example, in both Chrome 48 (beta) and node.js 0.10.40 the following code will successfully add two numbers identified by reserved words.
var abstract = 1;
var native = 1;
abstract + native;
> 2
Why can I use these two reserved words as variable names? Am I missing something crucial?
Reserved keywords as of ECMAScript 6
break case class catch const continue debugger default delete do else
export extends finally for function if import in instanceof new return
super switch this throw try typeof var void while with yield
abstract and native were reserved as future keywords only by older ECMAScript specifications (ECMAScript 1 through 3), which is why current engines accept them as variable names.
Always reserved: enum
Reserved only when found in strict mode code:
implements package protected static let interface private public
Reserved only when found in module code: await
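The lists above can be checked against whatever engine you have installed. This is a minimal sketch, not a definitive test: `parses` and `canDeclare` are hypothetical helper names, and it assumes `new Function` is available to compile (but never run) a snippet.

```javascript
// Returns true if `source` parses as a non-strict function body.
// new Function only compiles the body here; it is never called.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Hypothetical helper: can `word` be declared with var, with or without strict mode?
function canDeclare(word, strict) {
  const prologue = strict ? '"use strict"; ' : "";
  return parses(prologue + "var " + word + " = 1;");
}

// Prints: word, result in non-strict code, result in strict code
for (const word of ["abstract", "native", "interface", "enum", "class"]) {
  console.log(word, canDeclare(word, false), canDeclare(word, true));
}
```

On an engine that follows ES2015 or later, abstract and native should be accepted in both modes, interface only in non-strict code, and enum and class in neither.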
A reserved word (also known as a reserved identifier or keyword) is a word that cannot be used as an identifier, such as the name of a variable, function, or label; it is "reserved from use". Reserved words or keywords have a special meaning within the programming language. They identify things such as data types, blocks, and loops, so their meaning is already defined by the language itself.
Using a keyword or reserved word as a name in your code would confuse other developers as well as the compiler or interpreter when the code runs. That is why many programming languages do not allow it. Several languages, such as C, C++, C#, and Java, share many of the same keywords.
Here you can get the most updated list of Reserved Words in JavaScript; it also contains useful examples.
For instance, "default" is a reserved keyword in Javascript, so I can't do this:
const default = 'does not work'
According to Mozilla, the "default" keyword is used in only two cases:
the switch statement and
the export statement.
Is there a good reason, from a design or technical perspective, why it couldn't be unreserved for variables? I like to think that many of these reserved JavaScript keywords could be disambiguated based on the context in which the keyword is found, but not sure. Is it just a convenience thing or more that it is practically impossible because of "X"?
It's for practical reasons. Imagine you were able to create a variable
var false = 1;
and then use it
if(false) {...}
How is the computer supposed to know what you mean (the real false or your variable holding 1)? That's why these words are reserved: the language relies on them having a specific meaning.
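To see where the language draws the line in practice, the sketch below compiles a few declarations without running them. The `parses` helper is a made-up name for this illustration. It also shows that JavaScript does disambiguate some words by context: of, async, and get have special meaning in certain positions but are not reserved.

```javascript
// Returns true if `source` parses as a non-strict function body (nothing is run).
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses("var false = 1;"));                     // false
console.log(parses("const default = 'does not work';"));   // false
console.log(parses("var of = 1, async = 2, get = 3;"));    // true
console.log(parses("var o = { default: 1 }; o.default;")); // true
```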
I'm trying to write a simple webpack loader which will automatically write console logs if a string literal appears after a function's opening brackets.
For example:
function myFunc(arg1) { "log" ...
Would become
function myFunc(arg1) { console.log("myFunc", arguments) ...
Since webpack will load files as strings, I'm using the string.replace method, beginning with a regex which at the moment only matches the string "log". Obviously that's pretty rigid, so I was hoping a regex aficionado might help me accomplish a couple things:
Match "log" and 'log' alike (double or single quotes - I know I could just do "log"|'log', but I imagine there's a cleaner way).
Capture the name of the function in the regex (i.e. myFunc in function myFunc() {... or var myFunc = function() {...). It's possible this can't be done with a regex, but I keep reading about "lookaheads" and "negative lookaheads" which sound like features that do what I'm looking to do.
Obviously I have a lot to learn about regular expressions, so any help with this concrete problem will really help me along. Any help/information is much appreciated.
Such a seemingly simple task turns out to be much more complex than you might think at first. The problem with using regular expressions for this is that you need a very specific pattern to match on. JavaScript's function definitions are too versatile for a simple regular expression to cover them.
Let's only consider the named function declaration (function name(args) {}) for the moment. You might think that starting from the function keyword, then capturing the name and looking for { "log" is a decent choice, because the opening curly bracket starts the function body. This does work for simple arguments but it will fail for object destructuring. For example:
// Take an object as argument, look for the `log` property and assign it to `arg`
// Remember that object keys can be strings
function destructure({ "log": arg }) { return arg; }
destructure({ log: 42 }); // returns 42
// Invalid function after transformation
function destructure({ console.log("destructure", arguments);: arg }) { return arg; }
// ^ SyntaxError: Unexpected Token .
For more details about destructuring see Exploring ES6 - Chapter 10. Destructuring.
You can fix this case by including the closing parenthesis of the arguments list, which means you would be looking for ) { "log" instead. Unfortunately, that is not good enough either. An example that combines a default parameter with an object literal that has a method would at least not produce a syntax error, but the result is semantically incorrect.
// If the argument is not given (undefined), use the specified object
function defaultParam(arg = { defaultMethod() { "log"; } }) { "log"; }
// After transformation: the log call lands in defaultMethod but reports "defaultParam"
function defaultParam(arg = { defaultMethod() { console.log("defaultParam", arguments); } }) { "log"; }
There are no simple patterns left you could use to find the beginning of the function body and any step further would involve finding the matching parentheses, for which there is no regular expression for the general case, as they would be recursive patterns and regular expressions don't support recursion.
Note: Some languages do have regular expression recursion, but JavaScript is not one of them. Have a look at Regular Expression Recursion for a brief description.
As you've asked for a regular expression, I will explain a regular expression that should work for function declarations without default parameters.
( # Start capture group for everything before "log" (will be $1)
function # function keyword
\s+ # One or more spaces, to avoid matching: functionVariable etc.
( # Start captured group for the function name (will be $2)
[^\s\(]+ # Any character except spaces or opening parenthesis (at least one)
) # End captured group for the function name
\s* # Optional spaces between name and opening parenthesis
\( # Opening parenthesis
[^\)]* # Any character but the closing parenthesis (arguments, which are optional)
\) # Closing parenthesis
\s* # Optional spaces between closing parenthesis and opening curly
{ # Opening curly bracket to start the body
\s* # Optional spaces between starting the body and the "log"
) # End captured group for everything before "log"
("log"|'log') # The actual log string
;? # Optional semicolon
Notice that the first group ($1) captures everything up until the "log" string. The reason is that you want to keep that and only replace the "log" string, but because you're matching it, you'll need to include it again in the pattern you want to replace it with. The name is available as $2 and the replacement pattern would be:
$1console.log("$2", arguments);
You can see it in action on Regexr.
The JavaScript version:
text.replace(/(function\s+([^\s\(]+)\s*\([^\)]*\)\s*{\s*)("log"|'log');?/g, '$1console.log("$2", arguments);');
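As a quick check, the pattern can be wrapped in a helper and run against the cases discussed earlier. This is only a sketch: `addLogging` is a hypothetical name, and the sample inputs are the ones from this answer.

```javascript
// Pattern and replacement copied from the answer; the helper name is made up.
const pattern = /(function\s+([^\s\(]+)\s*\([^\)]*\)\s*{\s*)("log"|'log');?/g;

function addLogging(text) {
  return text.replace(pattern, '$1console.log("$2", arguments);');
}

// Simple declaration: transformed as intended
console.log(addLogging('function myFunc(arg1) { "log"; return arg1; }'));
// function myFunc(arg1) { console.log("myFunc", arguments); return arg1; }

// Destructured parameter: left alone, because the pattern now requires `) {` before "log"
console.log(addLogging('function destructure({ "log": arg }) { return arg; }'));
// function destructure({ "log": arg }) { return arg; }

// Default parameter containing a method: the wrong "log" is replaced
console.log(addLogging('function defaultParam(arg = { defaultMethod() { "log"; } }) { "log"; }'));
// function defaultParam(arg = { defaultMethod() { console.log("defaultParam", arguments); } }) { "log"; }
```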
Match "log" and 'log' alike (double or single quotes - I know I could just do "log"|'log', but I imagine there's a cleaner way).
An alternative to ("log"|'log') is to capture the opening quote and require the same character again after log. As we have already used two capture groups, it will be $3 and it can be accessed in the regular expression with \3.
("|')log\3
As a tiny bonus you can use the same quotes in the replacement that were used in the "log" string.
$1console.log($3$2$3, arguments);
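Putting the backreference and the quote-preserving replacement together gives roughly the following. It is a sketch under the same limitations as the pattern above, and `addLogging` is again a hypothetical helper name.

```javascript
// Group 3 captures the opening quote; \3 requires the same quote to close the string.
const pattern = /(function\s+([^\s\(]+)\s*\([^\)]*\)\s*{\s*)("|')log\3;?/g;
const replacement = "$1console.log($3$2$3, arguments);";

function addLogging(text) {
  return text.replace(pattern, replacement);
}

console.log(addLogging("function a() { 'log'; }"));
// function a() { console.log('a', arguments); }

console.log(addLogging('function b() { "log"; }'));
// function b() { console.log("b", arguments); }

// Mismatched quotes are not a string literal, so nothing is replaced
console.log(addLogging("function c() { \"log'; }"));
// function c() { "log'; }
```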
Babel plugin - A much better solution
Regular expressions are clearly not the right tool for the job. A better solution is to write a parser for it. Thankfully you don't need to do that, because there are various JavaScript parsers out there, which you can use. With Babel you can go a step further and transform the code fairly easily by creating a Babel plugin. Chances are very high that you are already using Babel, so all you'd need to do is add the plugin to your .babelrc.
To get started you should read the Babel Plugin Handbook and use AST Explorer to look at the AST produced by Babel. As a bonus, you can even write the Babel plugin in the AST Explorer by selecting Babel in the Transform menu.
If you paste your example into AST Explorer, you'll find out that the "log" string is not just a simple string, but it's a directive. You might have encountered the "use strict"; directive before, which enables strict mode. We can use that fact to visit all Directive AST nodes and modify the AST if needed.
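A small illustration of what a directive is at runtime: an unknown directive such as "log" is simply ignored, while "use strict" changes how the function body is interpreted. The sketch uses `new Function` only so that the result does not depend on the mode of the surrounding file; the variable names are made up.

```javascript
// Bodies passed to new Function are non-strict unless they opt in themselves.
const withLogDirective = new Function('"log"; return this === undefined;');
const withUseStrict = new Function('"use strict"; return this === undefined;');

console.log(withLogDirective()); // false: unknown directive, `this` is the global object
console.log(withUseStrict());    // true: strict mode leaves `this` undefined
```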
Let's start by creating the Babel plugin, which is a function that returns an object with a visitor on it. For details see Writing your first Babel Plugin. We only care about a visitor for Directive nodes, so it looks like this:
export default function({ types: t }) {
  return {
    visitor: {
      Directive(path) {
        // Find "log" directive and replace it with a console.log call
      }
    }
  };
}
t has a lot of helpful methods to operate on AST nodes. We will use t.isDirectiveLiteral(node, opts) to find out whether the value of the Directive is actually "log". The t.isX(node, opts) methods check whether the given node is of type X, where X is any AST node type, which you can easily find out in the AST Explorer. Additionally, they check whether the node contains the properties specified in opts. See also Check if a node is a certain type.
if (t.isDirectiveLiteral(path.node.value, { value: "log" })) {}
When it is the "log" directive, we need to find the name of the function. First, we need to find the enclosing function, which can be done with path.getFunctionParent().
Now that we have the function, we need to find its name. This will be the only tricky part. When it's a function declaration, we can use its name. When it's a function expression, we need to find the name of the variable it is assigned to, which coincidentally will be the parent of the function expression. It can also be a method on an object, but that is the same concept except that it's an ObjectMethod instead of FunctionDeclaration and an ObjectProperty instead of VariableDeclarator. Those are at least the named functions I can think of, so we will use these and everything else will be considered an anonymous function.
const fnDefinition = path.getFunctionParent();
let name = "anonymous";
if (t.isFunctionDeclaration(fnDefinition.node)) {
  name = fnDefinition.node.id.name;
} else if (t.isVariableDeclarator(fnDefinition.parent)) {
  name = fnDefinition.parent.id.name;
} else if (t.isObjectMethod(fnDefinition.node)) {
  name = fnDefinition.node.key.name;
} else if (t.isObjectProperty(fnDefinition.parent)) {
  name = fnDefinition.parent.key.name;
}
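The lookup order can be tried out without Babel installed. The sketch below is not the Babel API: it uses hand-written plain objects that only mimic the `type`, `id`, and `key` fields of Babel nodes, and `resolveName` is a hypothetical function. It also guards against a declaration without an id, which is how `export default function () {}` is represented.

```javascript
// Plain-object stand-ins for Babel nodes; only `type`, `id`, and `key` are modelled.
function resolveName(fnNode, parentNode) {
  if (fnNode.type === "FunctionDeclaration") {
    // An anonymous default export is a declaration whose id is null
    return fnNode.id ? fnNode.id.name : "anonymous";
  }
  if (parentNode && parentNode.type === "VariableDeclarator") return parentNode.id.name;
  if (fnNode.type === "ObjectMethod") return fnNode.key.name;
  if (parentNode && parentNode.type === "ObjectProperty") return parentNode.key.name;
  return "anonymous";
}

const id = (name) => ({ type: "Identifier", name });

// function myFunc() {}
console.log(resolveName({ type: "FunctionDeclaration", id: id("myFunc") }, { type: "Program" }));
// var handler = function () {}
console.log(resolveName({ type: "FunctionExpression", id: null }, { type: "VariableDeclarator", id: id("handler") }));
// [1, 2].map(function () {})
console.log(resolveName({ type: "FunctionExpression", id: null }, { type: "CallExpression" }));
```

The three calls print myFunc, handler, and anonymous.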
Next, we can create an AST node that represents console.log("name", arguments);. That will be a CallExpression where the callee is console.log and the arguments will be a StringLiteral holding the function's name and the identifier arguments. Similar to the t.isX methods, there exist methods on t to create an AST node, which have the same names as the AST nodes, but start with a lowercase character, e.g. t.callExpression(callee, arguments).
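const logNode = t.expressionStatement(
  t.callExpression(
    // console.log is a member access, so build a MemberExpression
    // rather than a single Identifier named "console.log"
    t.memberExpression(t.identifier("console"), t.identifier("log")),
    [t.stringLiteral(name), t.identifier("arguments")]
  )
);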
Lastly, we insert the node into the body of the function and delete the Directive node, as we no longer need it in the resulting code.
fnDefinition.get("body").unshiftContainer("body", logNode);
path.remove();
Full code:
export default function({ types: t }) {
  return {
    visitor: {
      Directive(path) {
        // Check that it's the "log" directive
        if (t.isDirectiveLiteral(path.node.value, { value: "log" })) {
          // Find the enclosing function definition
          const fnDefinition = path.getFunctionParent();
          // Look for a possible name for the function otherwise use "anonymous"
          let name = "anonymous";
          if (t.isFunctionDeclaration(fnDefinition.node)) {
            name = fnDefinition.node.id.name;
          } else if (t.isVariableDeclarator(fnDefinition.parent)) {
            name = fnDefinition.parent.id.name;
          } else if (t.isObjectMethod(fnDefinition.node)) {
            name = fnDefinition.node.key.name;
          } else if (t.isObjectProperty(fnDefinition.parent)) {
            name = fnDefinition.parent.key.name;
          }
          // Create an AST node that corresponds to console.log("name", arguments)
          const logNode = t.expressionStatement(
            t.callExpression(
              // console.log is a member access, so build a MemberExpression
              // rather than a single Identifier named "console.log"
              t.memberExpression(t.identifier("console"), t.identifier("log")),
              [t.stringLiteral(name), t.identifier("arguments")]
            )
          );
          // Insert the node into the beginning of the function's body
          fnDefinition.get("body").unshiftContainer("body", logNode);
          // Delete the directive, it's no longer needed
          path.remove();
        }
      }
    }
  };
}
The full code and some test cases can be found in this AST Explorer Gist.
That was a lot easier than fiddling with the regular expression and it works for every case like the default parameter object with a method from above. Additionally, it also works for arrow functions, without having to change anything.
Helpful resources:
Babel Plugin Handbook
babel-types
AST Explorer
Code Transformation and Linting with ASTs - FrontendMasters course by Kent C. Dodds
You may want to try a more popular approach: method decorators.
@log
myMethod = function(foo) {
}
http://derpturkey.com/function-wrapping-with-javascript-decorators/
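Decorator syntax generally needs a transpiler, but the underlying idea is plain function wrapping, which runs anywhere. The following is a minimal sketch of that idea, not the code from the linked article; `withLog` and its optional `log` parameter are hypothetical names.

```javascript
// Hypothetical helper: wraps fn so that each call is logged before fn runs.
function withLog(fn, log = console.log) {
  const name = fn.name || "anonymous";
  return function (...args) {
    log(name, args);
    // Forward `this` and the arguments so the wrapper also works for methods
    return fn.apply(this, args);
  };
}

const add = withLog(function add(a, b) {
  return a + b;
});

console.log(add(1, 2)); // logs the call first, then prints 3
```

Passing a custom `log` function makes the wrapper easy to test or to redirect to another logger.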
If you absolutely want to use a regular expression, the following hopefully does what you want to do (if I understood your question):
var a = "function myFunc(arg1) { \"log\" ...";
var b = "function abc(argument,arg, argument_test )\n\n\n{ \n 'log' ...";
console.log(a);
console.log(b);
a = a.replace(/function(\s*)([a-zA-Z0-9_]+)(\s*)\((\s*[a-zA-Z0-9_]+\s*(?:,\s*[a-zA-Z0-9_]+\s*)*)\)(\s*){(\s*)(?:"|')log(?:"|')/g, "function$1$2$3($4)$5{$6console.log(\"$2\", arguments);");
b = b.replace(/function(\s*)([a-zA-Z0-9_]+)(\s*)\((\s*[a-zA-Z0-9_]+\s*(?:,\s*[a-zA-Z0-9_]+\s*)*)\)(\s*){(\s*)(?:"|')log(?:"|')/g, "function$1$2$3($4)$5{$6console.log(\"$2\", arguments);");
console.log(a);
console.log(b);
If I run the code above, I get the following output:
If you want to know more about the regular expression, leave a comment and if you want to learn about regular expressions in general, I recommend a tutorial and the following webpage which can be really helpful: https://regexr.com/
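For experimenting, the same pattern can be wrapped in a helper. This is a sketch; `addLogging` is a made-up name. Note that the pattern requires at least one plain parameter name, so a function without parameters is left untouched.

```javascript
// Pattern and replacement copied from the answer above; the helper name is hypothetical.
const pattern = /function(\s*)([a-zA-Z0-9_]+)(\s*)\((\s*[a-zA-Z0-9_]+\s*(?:,\s*[a-zA-Z0-9_]+\s*)*)\)(\s*){(\s*)(?:"|')log(?:"|')/g;

function addLogging(text) {
  return text.replace(pattern, 'function$1$2$3($4)$5{$6console.log("$2", arguments);');
}

console.log(addLogging('function myFunc(arg1) { "log" ...'));
// function myFunc(arg1) { console.log("myFunc", arguments); ...

console.log(addLogging('function noArgs() { "log" ...'));
// function noArgs() { "log" ...
```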
If I want to use the following in JS, is there some way to do it?
var break = "";
In C# we can do this by prefixing the name with @. Is a similar option available in JS?
public int @public;
Well, if you insist, you can
var $break = 1;
which is the same as C# in terms of characters ;)
Seriously, you cannot use reserved keywords as variables, but they are allowed as property names:
myObj = {}
myObj.break = 'whatever';
Also, do note that the set of reserved words depends on whether the code is in strict mode. For example,
var interface = 1;
is valid in non-strict mode, but breaks once you add "use strict" to your script.
You can't use Javascript keywords as a variable name.
I find myself, for reasons which are too irrelevant to go into here but which involve machine-generated code, needing to create variables with names which are Javascript reserved words.
This is possible for object literals, by quoting the keys if they're invalid identifiers:
var o = { validIdentifier: 1, "not a valid identifier": 2 };
Is there a similar technique which works for simple variable references?
A poke around the spec shows that there used to be a mechanism that allowed this, by abusing Unicode escapes:
f\u0075nction = 7;
However this seems incredibly dubious, and is apparently rapidly vanishing (although my recent Chrome still appears to support it). Is there a more modern equivalent?
If they're object keys, you can call them what you like (even reserved names), and you don't need to quote them.
var o = { function: 'a' }
console.log(o.function) // a
Is there a more modern equivalent?
No. Reserved words cannot be used as variable names. You can use names that look like those words but that's it.
From the spec:
A reserved word is an IdentifierName that cannot be used as an Identifier.
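Both points, including the Unicode-escape trick from the question, can be checked against a given engine. This is a sketch with a hypothetical `parses` helper; the expected results in the comments assume an engine that follows ES2015 or later.

```javascript
// Returns true if `source` parses as a non-strict function body (nothing is run).
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses("var function = 7;"));                    // false
console.log(parses("f\\u0075nction = 7;"));                  // false: escapes no longer hide a keyword
console.log(parses("var \\u0061bc = 1;"));                   // true: escapes are fine in ordinary names
console.log(parses("var $function = 7, function_ = 8;"));    // true: look-alike names
```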
FYI, reserved words can be used as property names, even without quotes:
var o = { function: 1 };
Variable names can't be reserved words.
You can always do bad things like:
window["function"] = 'foo';
console.log(window.function);
> foo
You still can't reference it using a bare reserved word, because it's reserved.
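Outside a browser there is no window, but the same experiment works with globalThis. This is a small sketch, assuming an engine that provides globalThis; the variable names are made up.

```javascript
// Create a global property whose name is a reserved word
globalThis["function"] = "foo";
const viaProperty = globalThis.function; // property access is allowed

// A bare `function` is still parsed as the keyword, so this body does not compile
let bareReferenceParses = true;
try {
  new Function("return function;");
} catch (err) {
  if (!(err instanceof SyntaxError)) throw err;
  bareReferenceParses = false;
}

delete globalThis["function"]; // clean up the global again

console.log(viaProperty);         // foo
console.log(bareReferenceParses); // false
```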
That it's machine-generated code means whatever is generating the code is broken.
I have this JS code:
var A = {};
A.new = function(n) { return new Array(n); }
It works well in all browsers, but when I try to obfuscate it with obfuscator, it shows an error.
Is it valid JS code? I looked into the specification, but didn't find anything. I know that browsers sometimes accept syntactically wrong code, but I want to write syntactically correct code.
Note, that I am not doing var new = ...
BTW. what about this?
var a = { "new" : 2 }; // ok?
a.new = 3; // ok?
a["new"] = 3; // ok?
EDIT: Thank you all for your help! I wrote an email to obfuscator's authors and they fixed it! :)
Yes, your code is valid and the obfuscator is wrong (or old). new is a reserved word, which means it is not a valid identifier (e.g. to be used as a variable name). Yet, it is still an IdentifierName, which is valid in object literals and in dot-notation property access.
However, this was not the case in ECMAScript 3, where both of these needed to be Identifiers, and keywords like new were invalid. Therefore it is considered bad practise to use these names unquoted in scripts on the web, which might be executed by older browsers.
Reserved words can be used as property identifiers:
A.new = 1
B = { function: 123 }
D.if = { else: 5 }
etc - it's all valid (not necessarily good) javascript.
These things are called "IdentifierName" in the specification, and the only difference between an "IdentifierName" and a regular Identifier is
Identifier :: IdentifierName but not ReservedWord
http://es5.github.io/#x7.6
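A runnable version of these examples, as a sketch for ES5 and later engines. The object names are placeholders, and the destructuring line (ES2015) shows the one place where the distinction still bites: the local binding needs a name that is not reserved.

```javascript
const A = {};
A.new = function (n) { return new Array(n); };

// Reserved words are fine as property names, quoted or not
const o = { function: 123, if: { else: 5 }, "new": 2 };

// Binding a reserved-word property to a variable needs a different local name
const { function: fn } = o;

console.log(A.new(3).length);    // 3
console.log(o.if.else);          // 5
console.log(fn);                 // 123
console.log(o["new"] === o.new); // true
```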
I see no errors. Preceded by a dot, it's just a property name, just like ['new'], and no longer a reserved word.
Since new is a reserved word, it can't be used as an Identifier, although it can be used as an IdentifierName. See this section of the ECMAScript specification. So your code is legal, and apparently the obfuscator doesn't understand this distinction.
To work around this, you can access it using bracket notation:
A['new'] = function(n) { return new Array(n); };