zen-coding: ability to ascend the DOM tree using ^

zen-coding: ability to ascend the DOM tree using ^ - javascript

I forked the excellent zen-coding project, with an idea to implement DOM ascension using a ^ - so you can do:
html>head>title^body>h1 rather than html>(head>title)+body>h1
Initially I implemented with rather shoddy regex methods. I have now implemented using #Jordan's excellent answer. My fork is here
What I still want to know
Are there any scenarios where my function returns the wrong value?

Disclaimer: I have never used zen-coding and this is only my second time hearing about it, so I have no idea what the likely gotchas are. That said, this seems to be a working solution, or at least very close to one.
I am using Zen Coding for textarea v0.7.1 for this. If you are using a different version of the codebase you will need to adapt these instructions accordingly.
A couple of commenters have suggested that this is not a job for regular expressions, and I agree. Fortunately, zen-coding has its own parser implementation, and it's really easy to build on! There are two places where you need to add code to make this work:
Add the ^ character to the special_chars variable in the isAllowedChar function (starts circa line 1694):
function isAllowedChar(ch) {
...
special_chars = '#.>+*:$-_!#[]()|^'; // Added ascension operator "^"
Handle the new operator in the switch statement of the parse function (starts circa line 1541):
parse: function(abbr) {
...
while (i < il) {
ch = abbr.charAt(i);
prev_ch = i ? abbr.charAt(i - 1) : '';
switch (ch) {
...
// YOUR CODE BELOW
case '^': // Ascension operator
if (!text_lvl && !attr_lvl) {
dumpToken();
context = context.parent.parent.addChild();
} else {
token += ch;
}
break;
Here's a line-by-line breakdown of what the new code does:
case '^': // Current character is ascension operator.
if (!text_lvl && !attr_lvl) { // Don't apply in text/attributes.
dumpToken(); // Operator signifies end of current token.
// Shift context up two levels.
context = context.parent.parent.addChild();
} else {
token += ch; // Add char to token in text/attribute.
}
break;
The implementation above works as expected for e.g.:
html>head>title^body
html:5>div#first>div.inner^div#second>div.inner
html:5>div>(div>div>div^div)^div*2
html:5>div>div>div^^div
You will doubtless want to try some more advanced, real-world test cases. Here's my modified source if you want a kick-start; replace your zen_textarea.min.js with this for some quick-and-dirty testing.
Note that this merely ascends the DOM by two levels and does not treat the preceding elements as a group, so e.g. div>div^*3 will not work like (div>div)*3. If this is something you want then look at the logic for the closing parenthesis character, which uses a lookahead to check for multiplication. (Personally, I suggest not doing this, since even for an abbreviated syntax it is horribly unreadable.)

You should look for Perl's Text::Balanced alternative in the language that you're using.

Related

Why these unambiguous expressions are different in JavaScript?

doing some JavaScript learning and playing around with the syntax. In JavaScript semicolons are optional. It is recommended to use them, but it seems the specification does not impose them in some cases.
In particular it is said that when there is no ambiguity, then it does not make a difference whether you use them or not.
I found this example (an exercise from a blog I googled), but no explanation. I think the semicolon should not make a difference here, but it does. I don't see an ambiguity here, why is it supposed to be different. I tried to play around with some examples, it really seems different. Can't wrap my head around that :-(
var y = x + f
(a+b).toString()
and
var y = x + f;
(a+b).toString();
Would be cool if some guru could shed light on this one.

The f followed by whitespace and a ( can be interpreted as a function call. Placing a semicolon will prevent this.

The ECMAScript Standard Specification mentions the rules of Automatic Semicolon Insertion in section 11.9.1, and the second rule described there is what we need to read:
When, as the Script or Module is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Script or Module, then a semicolon is automatically inserted at the end of the input stream. emphasis mine
So there we go. In over-simplified layman language, the semi-colon is inserted if continuation of parsing doesn't give us valid code.
In your case, if we continue parsing the code, we do get valid ECMAScript code as follows: (since JavaScript isn't white-space sensitive)
var y = x + f(a+b)...
Thus, a semicolon will not be inserted.
Exceptions to this are described in the third rule (like return and yield keyword usage), but they don't apply in your code.

Whitespace in JavaScript (including line breaks) are not syntaxical. So in the above, with a little bit of rearranging the difference should become clear.
Is this:
var y = x + f(a+b).toString()
The same as this:
var y = x + f;
(a+b).toString();
Nope. In the first example, f being called as though it is a function with the variable a+b being passes as a parameter of that function.
NOTE
There is one case I'm aware of where the above isn't exactly true; return statements.
A linebreak character does act in a syntaxical way for return statements, so this:
return {
tomatoes: 'are yummy'
}
would return an object with the property tomatoes, whereas this:
return
{
tomatoes: 'are yummy'
}
would return undefined

Why is String.replace() with lambda slower than a while-loop repeatedly calling RegExp.exec()?

One problem:
I want to process a string (str) so that any parenthesised digits (matched by rgx) are replaced by values taken from the appropriate place in an array (sub):
var rgx = /\((\d+)\)/,
str = "this (0) a (1) sentence",
sub = [
"is",
"test"
],
result;
The result, given the variables declared above, should be 'this is a test sentence'.
Two solutions:
This works:
var mch,
parsed = '',
remainder = str;
while (mch = rgx.exec(remainder)) { // Not JSLint approved.
parsed += remainder.substring(0, mch.index) + sub[mch[1]];
remainder = remainder.substring(mch.index + mch[0].length);
}
result = (parsed) ? parsed + remainder : str;
But I thought the following code would be faster. It has fewer variables, is much more concise, and uses an anonymous function expression (or lambda):
result = str.replace(rgx, function() {
return sub[arguments[1]];
});
This works too, but I was wrong about the speed; in Chrome it's surprisingly (~50%, last time I checked) slower!
...
Three questions:
Why does this process appear to be slower in Chrome and (for example) faster in Firefox?
Is there a chance that the replace() method will be faster compared to the while() loop given a bigger string or array? If not, what are its benefits outside Code Golf?
Is there a way optimise this process, making it both more efficient and as fuss-free as the functional second approach?
I'd welcome any insights into what's going on behind these processes.
...
[Fo(u)r the record: I'm happy to be called out on my uses of the words 'lambda' and/or 'functional'. I'm still learning about the concepts, so don't assume I know exactly what I'm talking about and feel free to correct me if I'm misapplying the terms here.]

Why does this process appear to be slower in Chrome and (for example) faster in Firefox?
Because it has to call a (non-native) function, which is costly. Firefox's engine might be able to optimize that away by recognizing and inlining the lookup.
Is there a chance that the replace() method will be faster compared to the while() loop given a bigger string or array?
Yes, it has to do less string concatenation and assignments, and - as you said - less variables to initialize. Yet you can only test it to prove my assumptions (and also have a look at http://jsperf.com/match-and-substitute/4 for other snippets - you for example can see Opera optimizing the lambda-replace2 which does not use arguments).
If not, what are its benefits outside Code Golf?
I don't think code golf is the right term. Software quality is about readabilty and comprehensibility, in whose terms the conciseness and elegance (which is subjective though) of the functional code are the reasons to use this approach (actually I've never seen a replace with exec, substring and re-concatenation).
Is there a way optimise this process, making it both more efficient and as fuss-free as the functional second approach?
You don't need that remainder variable. The rgx has a lastIndex property which will automatically advance the match through str.

Your while loop with exec() is slightly slower than it should be, since you are doing extra work (substring) as you use exec() on a non-global regex. If you need to loop through all matches, you should use a while loop on a global regex (g flag enabled); this way, you avoid doing extra work trimming the processed part of the string.
var rgR = /\((\d+)\)/g;
var mch,
result = '',
lastAppend = 0;
while ((mch = rgR.exec(str)) !== null) {
result += str.substring(lastAppend, mch.index) + sub[mch[1]];
lastAppend = rgR.lastIndex;
}
result += str.substring(lastAppend);
This factor doesn't disturb the performance disparity between different browser, though.
It seems the performance difference comes from the implementation of the browser. Due to the unfamiliarity with the implementation, I cannot answer where the difference comes from.
In terms of power, exec() and replace() have the same power. This includes the cases where you don't use the returned value from replace(). Example 1. Example 2.
replace() method is more readable (the intention is clearer) than a while loop with exec() if you are using the value returned by the function (i.e. you are doing real replacement in the anonymous function). You also don't have to reconstruct the replaced string yourself. This is where replace is preferred over exec(). (I hope this answers the second part of question 2).
I would imagine exec() to be used for the purposes other than replacement (except for very special cases such as this). Replacement, if possible, should be done with replace().
Optimization is only necessary, if performance degrades badly on actual input. I don't have any optimization to show, since the 2 only possible options are already analyzed, with contradicting performance between 2 different browser. This may change in the future, but for now, you can choose the one that has better worst-performance-across-browser to work with.

I need a Javascript literal syntax converter/deobfuscation tools

I have searched Google for a converter but I did not find anything. Is there any tools available or I must make one to decode my obfuscated JavaScript code ?
I presume there is such a tool but I'm not searching Google with the right keywords.
The code is 3 pages long, this is why I need a tools.
Here is an exemple of the code :
<script>([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()[(!![]+[])[!+[]+!+[]+!+[]]+(+(+[])+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[!+[]+!+[]+!+[]+[+[]]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]])(([]+[])[([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(!![]+
Thank you

This code is fascinating because it seems to use only nine characters ("[]()!+,;" and empty space U+0020) yet has some sophisticated functionality. It appears to use JavaScript's implicit type conversion to coerce arrays into various primitive types and their string representations and then use the characters from those strings to compose other strings which type out the names of functions which are then called.
Consider the following snippet which evaluates to the array filter function:
([][
(![]+[])[+[]] // => "f"
+ ([![]]+[][[]])[+!+[]+[+[]]] // => "i"
+ (![]+[])[!+[]+!+[]] // => "l"
+ (!![]+[])[+[]] // => "t"
+ (!![]+[])[!+[]+!+[]+!+[]] // => "e"
+ (!![]+[])[+!+[]] // => "r"
]) // => function filter() { /* native code */ }
Reconstructing the code as such is time consuming and error prone, so an automated solution is obviously desirable. However, the behavior of this code is so tightly bound to the JavaScript runtime that de-obsfucating it seems to require a JS interpreter to evaluate the code.
I haven't been able to find any tools that will work generally with this sort of encoding. It seems as though you'll have to study the code further and determine any patterns of usage (e.g. reliance on array methods) and figure out how to capture their usage (e.g. by wrapping high-level functions [such as Function.prototype.call]) to trace the code execution for you.

This question has already an accepted answer, but I will still post to clear some things up.
When this idea come up, some guy made a generator to encode JavaScript in this way. It is based on doing []["sort"]["call"]()["eval"](/* big blob of code here */). Therefore, you can decode the results of this encoder easily by removing the sort-call-eval part (i.e. the first 1628 bytes). In this case it produces:
if (document.cookie=="6ffe613e2919f074e477a0a80f95d6a1"){ alert("bravo"); }
else{ document.location="http://www.youtube.com/watch?v=oHg5SJYRHA0"; }
(Funny enough the creator of this code was not even able to compress it properly and save a kilobyte)
There is also an explanation of why this code doesn't work in newer browser anymore: They changed Array.prototype.sort so it does not return a reference to window. As far as I remember, this was the only way to get a reference to window, so this code is kind of broken now.

JSLint, else and Expected exactly one space between '}' and 'else' error

Why JSLint report in code:
function cos(a) {
var b = 0;
if (a) {
b = 1;
}
else {
b = 2;
}
return b;
}
error:
Problem at line 6 character 5: Expected exactly one space between '}' and 'else'.
This error can be turned off by disabling Tolerate messy white space option of JSLint.
Or in other words -- why syntax:
} else { is better then
...
}
else {
...
Google also uses syntax with } else { form.
But I don't understand why. Google mentioned ''implicit semicolon insertion'', but in context of opening {, not closing one.
Can Javascript insert semicolon after closing } of if block even if next token is else instruction?
Sorry that my question is a bit chaotic -- I tried to think loud.

JSLint is based on Crockford's preferences (which I share in this case).
It's a matter of opinion which is "better".
(Although clearly his opinion is right ;)

It's not a matter of style. It's how ECMAScript works.
For better or for worse, it will automatically insert semicolons at the end of statements where it feels necessary.
JavaScript would interpret this:
function someFunc {
return
{
something: 'My Value'
};
}
As this:
function someFunc {
return;
{
something: 'My Value'
};
}
Which is certainly what you don't want.
If you always put the bracket on the same line as the if and if else statement, you won't run into a problem like this.
As with any coding language, the coding style chosen should be the one that minimizes potential risk the most.
Mozilla Developer Network also promotes same line bracketing: https://developer.mozilla.org/en-US/docs/User:GavinSharp_JS_Style_Guidelines#Brackets

JSLint is being very picky here, just enforcing a style that you might not share.
Try JSHint instead:
The project originally started as an effort to make a more configurable version of JSLint—the one that doesn't enforce one particular coding style on its users [...]

JSLint is just being picky here. The guy who wrote it also baked in many stylistic suggestions in order to keep his own code more consistent.
As for semicolon insertion, you shouldn't need to worry here. Inserting a semicolon before the else clause would lead to a syntax error and automatic semicolon insertion only occurs in situations where the resulting code would still be syntactically valid.
If you want to read more on semicolon insertion, I recommend this nice reference
Basically if you insert semicolons everywhere you only need be careful about putting the argument to "return" or "throw" (or the label for "break" and "continue") on the same line.
And when you accidentally forget a semicolon, the only common cases that are likely to bite you are if you start the next line with an array literal (it might parsed as the subscript operator) or a parenthsised expression (it might be parsed as a function call)
Conclusion
Should you omit optional semicolons or not? The answer is a matter of
personal preference, but should be made on the basis of informed
choice rather than nebulous fears of unknown syntactical traps or
nonexistent browser bugs. If you remember the rules given here, you
are equipped to make your own choices, and to read any JavaScript
easily.
If you choose to omit semicolons where possible, my advice is to
insert them immediately before the opening parenthesis or square
bracket in any statement that begins with one of those tokens, or any
which begins with one of the arithmetic operator tokens "/", "+", or
"-" if you should happen to write such a statement.
Whether you omit semicolons or not, you must remember the restricted
productions (return, break, continue, throw, and the postfix increment
and decrement operators), and you should feel free to use linebreaks
everywhere else to improve the readability of your code.
By the way, I personally think that the } else { version is prettier. Stop insisting in your evil ways and joins us on the light side of the force :P

I have just finished reading a book titled Mastering JavaScript High Performance. I speak under correction here, but from what I can gather is that "white space" does in fact matter.
It has to do with the way the interpreter fetches the next function. By keeping white space to a minimum (i.e.) using a minifier when your code is ready for deployment, you actually speed up the process.
If the interpreter has to search through the white space to find the next statement this takes time. Perhaps you want to test this with a piece of code that runs a loop say 10,000 times with white space in it and then the same code minified.
The statement to put before the start of the loop would be console.time and finally console.timeEnd at the end of the loop. This will then tell you how many milliseconds the loop took to compute.

The JSLint error/warning is suggesting to alter code to
// naming convention winner? it's subjective
} else if{
b = 2;
}
from:
}
else if{
b = 2;
}
It prevents insert semicolons; considered more standard & conventional.
most people could agree a tab between the
}tabelse if{
is not the most popular method. Interesting how the opening bracket { is placed (space or not) obviously both ar subjected

Good resources for extreme minified JavaScript (js1k-style)

As I'm sure most of the JavaScripters out there are aware, there's a new, Christmas-themed js1k. I'm planning on entering this time, but I have no experience producing such minified code. Does anyone know any good resources for this kind of thing?

Google Closure Compiler is a good javascript minifier.
There is a good online tool for quick use, or you can download the tool and run it as part of a web site build process.
Edit: Added a non-exhaustive list of tricks that you can use to minify JavaScript extremely, before using a minifier:
Shorten long variable names
Use shortened references to built in variables like d=document;w=window.
Set Interval
The setInterval function can take either a function or a string. Pass in a string to reduce the number of characters used: setInterval('a--;b++',10). Note that passing in a string forces an eval invokation so it will be slower than passing in a function.
Reduce Mathematical Calculations
Example a=b+b+b can be reduced to a=3*b.
Use Scientific Notation
10000 can be expressed in scientific notation as 1E4 saving 2 bytes.
Drop leading Zeroes
0.2 = .2 saves a byte
Ternery Operator
if (a > b) {
result = x;
}
else {
result = y;
}
can be expressed as result=a>b?x:y
Drop Braces
Braces are only required for blocks of more than one statement.
Operator Precedence
Rely on operator precedence rather than adding unneeded brackets which aid code readability.
Shorten Variable Assignment
Rather than function x(){a=1,b=2;...}() pass values into the function, function x(a,b){...}(1,2)
Think outside the box
Don't automatically reach for standard ways of doing things. Rather than using d.getElementById('p') to get a reference to a DOM element, could you use b.children[4] where d=document;b=body.
Original source for above list of tricks:
http://thingsinjars.com/post/293/the-quest-for-extreme-javascript-minification/

Spolto is right.
Any code minifier won't do the trick alone. You need to first optimize your code and then make some dirty manual tweaks.
In addition to Spolto's list of tricks I want to encourage the use of logical operators instead of the classical if else syntax. ex:
The following code
if(condition){
exp1;
}else{
exp2;
}
is somewhat equivalent to
condition&&exp1||exp2;
Another thing to consider might be multiple variable declaration :
var a = 1;var b = 2;var c = 1;
can be rewritten as :
var a=c=1,b=2;
Spolto is also right about the braces. You should drop them. But in addition, you should know that they can be dropped even for blocks of more expressions by writing the expressions delimited by a comma(with a leading ; of course) :
if(condition){
exp1;
exp2;
exp3;
}else{
exp4;
exp5;
}
Can be rewritten as :
if(condition)exp1,exp2,exp3;
else exp4,exp5;
Although it's not much (it saves you only 1 character/block for those who are counting), it might come in handy. (By the way, the latest Google Closure Compiler does this trick too).
Another trick worth mentioning is the controversial with functionality.
If you care more about the size, then you should use this because it might reduce code size.
For example, let's consider this object method:
object.method=function(){
this.a=this.b;
this.c++;
this.d(this.e);
}
This can be rewritten as :
object.method=function(){
with(this){
a=b;
c++;
d(e);
}
}
which is in most cases signifficantly smaller.
Something that most code packers & minifiers do not do is replacing large repeating tokens in the code with smaller ones. This is a nasty hack that also requires the use of eval, but since we're in it for the space, I don't think that should be a problem. Let's say you have this code :
a=function(){/*code here*/};
b=function(){/*code here*/};
c=function(){/*code here*/};
/*...*/
z=function(){/*code here*/};
This code has many "function" keywords repeating. What if you could replace them with a single(unused) character and then evaluate the code?
Here's how I would do it :
eval('a=F(){/*codehere*/};b=F(){/*codehere*/};c=F(){/*codehere*/};/*...*/z=F(){/*codehere*/};'.replace(/function/g,'F'));
Of course the replaced token(s) can be anything since our code is reduced to an evaluated string (ex: we could've replaced =function(){ with F, thus saving even more characters).
Note that this technique must be used with caution, because you can easily screw up your code with multiple text replacements; moreover, you should use it only in cases where it helps (ex: if you only have 4 function tokens, replacing them with a smaller token and then evaluating the code might actually increase the code length :
var a = "eval(''.replace(/function/g,'F'))".length,
b = ('function'.length-'F'.length)*4;
alert("you should" + (a<b?"":" NOT") + " use this technique!");

In the following link you'll find surprisingly good tricks to minify js code for this competition:
http://www.claudiocc.com/javascript-golfing/
One example: (extracted from section Short-circuit operators):
if (p) p=q; // before
p=p&&q; // after
if (!p) p=q; // before
p=p||q; // after
Or the more essoteric Canvas context hash trick:
// before
a.beginPath
a.fillRect
a.lineTo
a.stroke
a.transform
a.arc
// after
for(Z in a)a[Z[0]+(Z[6]||Z[2])]=a[Z];
a.ba
a.fc
a.ln
a.sr
a.to
a.ac
And here is another resource link with amazingly good tricks: https://github.com/jed/140bytes/wiki/Byte-saving-techniques

First off all, just throwing your code into a minifier won't help you that much. You need to have the extreme small file size in mind when you write the code. So in part, you need to learn all the tricks yourself.
Also, when it comes to minifiers, UglifyJS is the new shooting star here, its output is smaller than GCC's and it's way faster too. And since it's written in pure JavaScript it should be trivial for you to find out what all the tricks are that it applies.
But in the end it all comes down to whether you can find an intelligent, small solution for something that's awsome.

Also:
Dean Edwards Packer
http://dean.edwards.name/packer/
Uglify JS
http://marijnhaverbeke.nl/uglifyjs

A friend wrote jscrush packer for js1k.
Keep in mind to keep as much code self-similar as possible.
My workflow for extreme packing is: closure (pretty print) -> hand optimizations, function similarity, other code similarity -> closure (whitespace only) -> jscrush.
This packs away about 25% of the data.
There's also packify, but I haven't tested that myself.

This is the only online version of #cowboy's packer script:
http://iwantaneff.in/packer/
Very handy for packing / minifying JS

Develop Reference

JavaScript is the programming language of the Web.