why does minified jQuery have line breaks? [duplicate] - javascript

I know about a similar question but it is a tiny bit off from what I am asking here, so please don't flag it as a duplicate.
When you see the production version of jQuery, why is there a newline after a while? I downloaded a copy and deleted all the newlines (apart from the licence) and it still worked. (I ran the entire unit test suite against my changes on Mozilla Firefox, Google Chrome and Opera.)
I know three newlines (not counting the license) is not going to slow it down a lot, but still, doesn't every tiny bit help?
I have assigned myself a small challenge, to squeeze every little bit of performance out of my JavaScript code.

jQuery currently use UglifyJS to minify their source code. In their build script, they specifically set the max_line_length directive to be 32 * 1024:
The documentation for UglifyJS has this to say on the max-line-len directive;
--max-line-len (default 32K characters) — add a newline after around 32K characters. I’ve seen both FF and Chrome croak when all the code was on a single line of around 670K. Pass –max-line-len 0 to disable this safety feature.

To cite the Closure Compiler FAQ:
The Closure Compiler intentionally adds line breaks every 500
characters or so. Firewalls and proxies sometimes corrupt or ignore
large JavaScript files with very long lines. Adding line breaks every
500 characters prevents this problem. Removing the line breaks has no
effect on a script's semantics. The impact on code size is small, and
the Compiler optimizes line break placement so that the code size
penalty is even smaller when files are gzipped.
This is relevant to any minification programs in general.

The lines (excluding the license) are all around 30k characters in length. It could be to avoid bugs where some Javascript parsers die on extremely long lines. This probably won't happen on today's browsers but maybe some older or more obscure ones have such limits.
(Old answer below, which might also be applicable, just not in this case)
This might be because JSMin, a popular Javascript minifier will retain line feeds in the output under certain conditions. This is because in Javascript line feeds are significant if you leave out semicolons, for example. The documentation says:
It is more conservative in omitting linefeeds, because linefeeds are sometimes treated as semicolons. A linefeed is not omitted if it precedes a non-ASCII character or an ASCII letter or digit or one of these characters:
\ $ _ { [ ( + -
and if it follows a non-ASCII character or an ASCII letter or digit or one of these characters:
\ $ _ } ] ) + - " '
Other minifiers might have similar rules.
So this is mostly a precaution against accidentally removing a line feed that may be necessary, syntax-wise. The last thing you want is that your minified JS won't work anymore because the minifier destroyed its semantics.
Regarding »I know three newlines (not counting the license) is not going to slow it down a lot, but still, doesn't every tiny bit help?«: When your server uses gzip compression the difference will likely be moot anyway.

Related

Dealing with illegal characters (apostrophes) in a TXT file with Node.js

I am relying on .txt files being sent externally in Node.js that sometimes have what i would class as "illegal" characters such as apostrophes and commas resulting in copying and pasting from webpages and programs such as Microsoft Word
How can I get Node.js or use Javascript to replace these incorrect formats such as apostrophes with correctly formatted apostrophes or strip out any illegal characters full stop?
Here is an example from a web page and shown in PasteBin:
Resilience is what happens when we’re able to move forward even when things don’t fit together the way we expect.
And tolerances are an engineer’s measurement of how well the parts meet spec. (The word ‘precision’ comes to mind). A 2018 Lexus is better than 1968 Camaro because every single part in the car fits together dramatically better. The tolerances are more narrow now.
One way to ensure that things work out the way you hope is to spend the time and money to ensure that every part, every form, every worker meets spec. Tighten your spec, increase precision and you’ll discover that systems become more reliable.
The other alternative is to embrace the fact that nothing is ever exactly on spec, and to build resilient systems.
You’ll probably find that while precision feels like the way forward, resilience, the ability to thrive when things go wrong, is a much safer bet.
The trap? Hoping for one, the other or both but not doing the work to make it likely. What will you do when it doesn’t work?
Neither resilience nor tolerances get better on their own.
https://pastebin.com/uJ7GAKk4
Copied from the following URL and pasted into Notepad and saved
https://seths.blog/storyoftheweek/
You could use a RegExp to remove the unwanted characters
// text is the pasted text
var filtered = text.replace(/[',]/gm, '');

JS lexing---multi line string

I am making a JS lexer as part of my study. In JS, single line stings start from " or ' and ends with the same character except if that character is preceded by a backslash.
In my current code, I loop through every character and append them to existing tokens based on flags like "string" or "regex". so it feels natural to implement multi line string with " or ' because it seems that it does not affect any other part of my lexer
Is there any practical reason why new line is not allowed as contents of strings?
Many languages, but not all, prohibit unescaped newlines in string literals. So JavaScript is certainly not unique here.
But the motivation really has little to do with the ease, difficulty or efficiency of lexical analysis. In fact, for lexical analysis the simplest syntax is to allow any character rather than having to include special-case checks. [Note 1]
There are other considerations, though; notably, the importance of a program to be readable and easy to debug. Long strings put an extra load on someone reading the code, because they may not be aware that a section of program text is actually part of a string literal. (There's a similar problem with multiline comments, which is why it's usually considered good style to mark every line in a long comment in some way, for example with a vertical column of stars at the left-hand margin. No such solution exists for string literals, though.)
Also, unterminated multiline strings can be annoying to correct. If strings are cannot span lines, the error will be detected on the line containing the problem. But multiline strings might continue until the beginning of the next string, then triggering a syntax error when the contents of the next string are accidentally parsed as program text. Or worse, resulting in a completely incorrect parse of what was supposed to be program text, followed by another incorrect string literal starting where the second literal ends, and continuing from there.
That also makes it hard for developer tools, such as editors and syntax highlighters, to deal with program text as it is being typed.
In the end, you may or may not find these arguments compelling, and a language designer might have other aesthetic preferences as well. I can't really speak for the original designers of the JavaScript language, and neither of us can take a voyage in time to argue with them and maybe change their decision.
For better or worse, languages are designed according to particular subjective judgements, and if the language is successful these judgements become permanent features. They are things you have to accept if you are using a language and they're not usually worth obsessing about. You get used to them, or you find a different language to program in, with its own syntax quirks.
When you design your own language, you will need to resolve a large number of syntactic questions, and you will undoubtedly run into cases where the answer is not clearcut because there is no objectively correct unique solution. Whatever you do, someone will want to argue with you. Perhaps you can refer them to this answer.
Notes:
There is actually a historic reason for not allowing multiline string literals, which is much clearer but has been more or less irrelevant for several decades.
Once Upon A Time, common filesystems considered text files to be linear arrays of fixed-length lines (often 80 character lines, matching a Hollerith card). One advantage of such a filesystem is that it could instantly navigate to a particular line number in a file, since all lines were the same length. But in any case, for systems where programs were entered on punched cards, the fixed length lines were just part of the environment.
To make all lines the same length, lines needed to be filled out with space characters. This would obviously make multiline string literals awkward, and that's why C never allowed multiline string literals, instead relying on a syntactic feature where consecutive string literals are automatically concatenated into a single literal.
In the end, fixed-line-length filesystems proved to be unpopular, and I don't think you're likley to run into one these days. But a careful reading of the C and Posix standards shows that such filesystems must still be usable by conforming implementations, with the consequence that a fully portable program must be prepared to deal with line length limits on output and trailing whitespace on input.
There is also such syntax
const string =
'line1\
line2\
line3'

Spaces in equal signs

I'm just wondering is there a difference in performance using removing spaces before and after equal signs. Like this two code snippets.
first
int i = 0;
second
int i=0;
I'm using the first one, but my friend who is learning html/javascript told me that my coding is inefficient. Is it true in html/javascript? And is it a huge bump in the performance? Will it also be same in c++/c# and other programming languages? And about the indent, he said 3 spaces is better that tab. But I already used to code like this. So I just want to know if he is correct.
Your friend is a bit misguided.
The extra spaces in the code will make a small difference in the size of the JS file which could make a small difference in the download speed, though I'd be surprised if it was noticeable or meaningful.
The extra spaces are unlikely to make a meaningful difference in the time to parse the file.
Once the file is parsed, the extra spaces will not make any difference in execution speed since they are not part of the parsed code.
If you really want to optimize download or parse speed, the way to do that is to write your code in the most readable fashion possible for best maintainability and then use a minimizer for the deployed code and this is a standard practice by many web sites. This will give you the best of both worlds - maintainable, readable code and minimum deployed size.
A minimizer will remove all unnecessary spacing, shorten the names of variables, remove comments, collapse lines, etc... all designed to make the deployed code as small as possible without changing the run-time meaning of the code at all.
C++ is a compiled language. As such, only the compiler that the developer uses sees any extra spaces (same with comments). Those spaces are gone once the code has been compiled into native code which is what the end-user gets and runs. So, issues about spaces between elements in a line are simply not applicable at all for C++.
Javascript is an interpreted language. That means the source code is downloaded to the browser and the browser then parses the code at runtime into some opcode form that the interpreter can run. The spaces in Javascript will be part of the downloaded code (if you don't use a minimizer to remove them), but once the code is parsed, those extra spaces are not part of the run-time performance of the code. Thus, the spaces could have a small influence on the download time and perhaps an even smaller influence on the parse time (though I'm guessing unlikely to be measurable or meaningful). As I said above, the way to optimize this for Javascript is to use spaces to enhance readability in the source code and then run a minimizer over the code to generate a deployed version of the code to minimize the deployed size of the file. This preserves maximum readability and minimizes download size.
There is little (javascript) to no (c#, c++, Java) difference in performance. In the compiled languages in particular, the source code compiles to the exact same machine code.
Using spaces instead of tabs can be a good idea, but not because of performance. Rather, if you aren't careful, use of tabs can result in "tab rot", where there are tabs in some places and spaces in others, and the indentation of the source code depends on your tab settings, making it hard to read.

JavaScript minifier that outputs linefeeds instead of semicolons - Windows based

We want line-numbers to be useful when an exception occurs in the clientside JavaScript. Ideally a solution that understands the ASI (Automatic Semicolon Insertion) rules i.e. that is syntax aware and not RegExp/text based.
The closest I have found is UglifyJS2 with semicolon option set to false. [edit: UglifyJS2 also needs the maximum line length option set to a low value, and perhaps sequences option unticked, otherwise it would still generate long lines e.g. 1000s of characters long].
PS: We must use Windows for our build process (locked in for various reasons), and our build scripts already have dependencies on Python and Java (closurecompiler for JS, yuicompressor for CSS, cssembed for embedding images into CSS) and I really don't want to add another build process dependency (Node, Ruby, or a web service etc), and I strongly disfavour using a RegExp solution as I think text processing is too fragile for a reliable production solution.
PPS: I don't want to rely on sourcemaps because they are significantly more complex (cross-browser issues, internal and client support issues, difficulties managing and finding correct sourcemap).
PPPS: The reason for the new requirement for line numbers is that we are starting to use https://github.com/getsentry/raven-js and it will be most useful if line numbers for exceptions are useful. Our client side code is a single page Ajax/json app. In theory using a linefeed instead of a semicolon would change the gzip size very little.
UglifyJS2 was the only solution I found that worked OK with the following settings:
semicolon option set to false
maximum line length option set to a reasonably low value
sequences option unticked
The line length choice involves a compromise: low line length is better for diagnosing where an exception occurred, but the gzip size increases by a small amount as you decrease the line length.

How few lines can javascript be on?

I am writing a little custom minifier for javascript, and I am wondering if I need to force line breaks after a certain number of characters on a single line, or if it matters at all?
For example, I can minify an entire script into a single line with thousands of characters, or every n characters, I could insert a line break (at an acceptable point in the js code) to drop it to the next line.
Is there a reason to do this, or is does javascript/browsers not care how many characters are on a line?
I wouldn't want to bet my site on various web servers, proxies, Internet Explorer, etc. all not-choking on huge line lengths. Of course, I don't know quite how I'd define huge, but jQuery - as an example - looks like it limits to about 512 characters per line. Put it this way, I'd prefer your minifier if it allowed me to choose whether I have line breaks every x characters or not.
There are some pitfalls regarding to long lines of JavaScript. The following is from the Closure Compiler's FAQ (this link):
Why are there random line feeds in compiled scripts?
The Closure Compiler intentionally adds line breaks every 500 characters or so. Firewalls and proxies sometimes corrupt or ignore large JavaScript files with very long lines. Adding line breaks every 500 characters prevents this problem. Removing the line breaks has no effect on a script's semantics. The impact on code size is small, and the Compiler optimizes line break placement so that the code size penalty is even smaller when files are gzipped.
You can put an entire script on one line, it makes no difference to the parser.
However breaking content onto lines of about 4kB may be beneficial to the downloader, slightly improving performance.

Categories

Resources