How would I create a Text to Html parser? [duplicate] - javascript

Edit: I recently learned about a project called CommonMark, which correctly identifies and deals with the ambiguities in the original Markdown specification: http://commonmark.org/. It has great C# library support.
You can find the syntax here.
The source that ships with the download is written in Perl, which I have no intention of following. It is riddled with regular expressions, and it relies on MD5 hashes to escape certain characters. Something is just wrong about that!
I'm about to hand-code a parser for Markdown. What is your experience with this?
If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution, that is, a third-party library.)
To help a bit with the answers: regular expressions are meant to identify patterns, NOT to parse an entire grammar. That people consider doing so is foobar.
If you think about Markdown, it's fundamentally based around the concept of paragraphs.
As such, a reasonable approach might be to split the input into paragraphs.
There are many kinds of paragraphs, for example, heading, text, list, blockquote, and code.
The challenge is thus to identify these paragraphs and in what context they occur.
I'll be back with a solution, once I find it worthy of being shared.
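In the meantime, here is a minimal JavaScript sketch of the paragraph-splitting idea above; the function names and the classification heuristics are my own, not anything from the Markdown spec, and it ignores setext headings and nesting entirely:
// A minimal sketch: split the input into blocks separated by blank lines,
// then classify each block by looking at its first characters.
function splitIntoBlocks(text) {
  return text.split(/\n\s*\n/).filter(function (block) { return block.trim() !== ""; });
}
function classifyBlock(block) {
  if (/^#{1,6}\s/.test(block)) return "heading";
  if (/^(\s{4}|\t)/.test(block)) return "code";
  if (/^>/.test(block)) return "blockquote";
  if (/^(\*|-|\+|\d+\.)\s/.test(block)) return "list";
  return "text";
}
// splitIntoBlocks("# Title\n\nSome text.").map(classifyBlock) -> ["heading", "text"]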

The only Markdown implementation I know of that uses an actual parser is John MacFarlane's peg-markdown. Its parser is based on a Parsing Expression Grammar parser generator called peg.
EDIT: Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog Weblog Engine. Because the parser is written in OCaml, it is extremely simple and short (268 SLOC for the parser, 43 SLOC for the HTML emitter), yet blazingly fast (20% faster than discount, written in hand-optimized C, and six hundred times faster than BlueCloth, in Ruby), despite the fact that it isn't even optimized for performance yet. Because it is only intended for internal use by Mauricio himself for his weblog, there are a few deviations from the official Markdown specification, but Mauricio has created a branch which reverts most of those changes.

I released a new parser-based Markdown Java implementation last week, called pegdown.
pegdown uses a PEG parser to first build an abstract syntax tree, which is subsequently written out to HTML. As such it is quite clean and much easier to read, maintain and extend than a regex based approach.
The PEG grammar is based on John MacFarlane's C implementation, "peg-markdown".
Maybe something of interest to you...

If I were to try to parse Markdown (and its extension, Markdown Extra), I think I would use a state machine and parse it one character at a time, linking together internal structures representing bits of text as I go along; then, once everything is parsed, I would generate the output from the objects all strung together.
Basically, I'd build a mini-DOM-like tree as I read the input file.
To generate the output, I would just traverse the tree and emit HTML or anything else (PS, LaTeX, RTF, ...).
Things that can increase complexity:
The fact that you can mix HTML and Markdown, although the rule could be easy to implement: just ignore anything that's between two balanced tags and output it verbatim.
URLs and notes can have their reference at the bottom of the text. The data structure for hyperlinks could simply record something like:
[my text to a link][linkkey]
results in a structure like:
URLStructure:
| InnerText : "my text to a link"
| Key : "linkkey"
| URL : <null>
Headers can be defined with an underline; that could force us to use a simple data structure for a generic paragraph and modify its properties as we read the file:
ParagraphStructure:
| InnerText : the current paragraph text
| (beginning of line until end of line).
| HeadingLevel : <null> or 1-4 when we can assess
| that paragraph heading level, if any.
Anyway, just some thoughts.
I'm sure that there are many small details to take care of and I'm pretty sure that Regexes could become handy during the process.
After all, they were meant to process text.
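As a rough JavaScript illustration of the structures described above (the field names mirror the pseudo-structures; the resolution step is my own guess at how the null URLs would be filled in once the reference definitions have been read):
// Hypothetical node objects for the mini-DOM described above.
function makeLink(innerText, key) {
  return { type: "link", innerText: innerText, key: key, url: null };
}
function makeParagraph(innerText) {
  return { type: "paragraph", innerText: innerText, headingLevel: null };
}
// Once the whole file has been read, the reference definitions collected at the
// bottom of the text can fill in the URLs that are still null.
function resolveLinks(nodes, definitions) {
  nodes.forEach(function (node) {
    if (node.type === "link" && node.url === null) {
      node.url = definitions[node.key] || null;
    }
  });
}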

I'd probably read the syntax specification enough times to know it, and get a feel for how to parse it.
Reading the existing parser code is of course brilliant, both to see what seems to be the main source of complexity and to see whether any special clever tricks are being used. The use of MD5 checksumming seems a bit weird, but I haven't studied the code enough to understand why it's being done. A comment in a routine called _EscapeSpecialChars() states:
We're replacing each such character with its corresponding MD5 checksum value;
this is likely overkill, but it should prevent us from colliding with the escape
values by accident.
Replacing a single character by a full MD5 does seem extravagant, but perhaps it really makes sense.
Of course, it would be worth considering creating a "true" grammar for a tool such as Flex, to get out of the regex bog.

If Perl isn't your thing, there are Markdown implementations in at least 10 other languages. They probably don't all have 100% compatibility, but tend to be pretty close.

MarkdownPapers is another Java implementation whose parser is defined in a JavaCC grammar.

If you are using a programming language that has more than three other users, you should be able to find a library to parse it for you. A quick Googling reveals libraries for CL, Haskell, Python, JavaScript, Ruby, and so on. It is highly unlikely that you will need to reinvent this wheel.
If you really have to write it from scratch, I recommend writing a proper parser. With this technique, you won't have to escape things with MD5 hashes. (I agree that if you have to do something like this, it's time to reconsider your design.)

There are libraries available in a number of languages, including PHP, Ruby, Java, C#, and JavaScript. I'd suggest looking at some of these for ideas.
The best way to implement it depends on which language you wish to use; there will be idiomatic and non-idiomatic ways to do it.
Regexes work in Perl, because Perl and regex are best friends.

Markdown is a JAWL (just another wiki language).
There are plenty of open-source wikis out there whose parser code you can examine. Most use regex.
Check out the ScrewTurn wiki; it has an interesting multi-pass formatter pipeline, a very nice technique - see /core/Formatter.cs and /core/FormatterPipeline.cs.
The best approach is to use or join an existing project; these sorts of things are always much harder than they appear.

Here you can find a JavaScript implementation of Markdown. It also relies heavily on regular expressions, as this is just the fastest and easiest way to parse the text.
But it spares you the MD5 part.
I cannot help directly with the coding of the parsing, but maybe this link can help you one way or another.

Related

Semi-obfuscate/uglify JavaScript

I know about JS minifiers and obfuscators. I was wondering if there is any existing tool (or any fast-to-code solution) to partially obfuscate JavaScript. By partially I mean that it should become difficult to read, but not appear as uglified/minified. It should keep indentation, but lose comments, and partially change variable names, making them unclear without converting them to "a, b, c" like an obfuscator would.
The purpose of this could be to take an explicit and reusable code and make it implicit and difficult to be reused by other people, without making it impossible to work with for yourself.
Any idea where to start to achieve this? Maybe by editing an existing obfuscator?
[This answer is a direct response to OP's request].
Semantic Designs' JavaScript obfuscator will do what you want, but you'll need two passes.
On the first pass, run it as an obfuscator; it will rename identifiers (although you can control how much or how that is done) and strip whitespace and comments. If you limit its ability to rename the identifiers, you lose some of the strength of the obfuscator, but that's your choice.
On the second pass, run it as a prettyprinter; it will introduce nice indentation again.
(In fact, the idea for obfuscation came from building a prettyprinter; if you can print-pretty, surely it is easy to print-ugly.)
From the point of view of working with the code, you are better off working with your master copy any way you like, complete with your indentation and nice commentary as documentation. When you are ready to obfuscate, you run the obfuscator and ship the obfuscated result. Errors reported in the obfuscated result that involve obfuscated names can be mapped back to the original names, using the map of obfuscated <--> original names produced during the obfuscation step.
This is a product of my company. I'd provide a link, but SO hates it when I do that, so you'll have to find it via my bio or by googling.
PS: It works exactly as #georg suggests, by parsing to an AST, mangling, and prettyprinting. It doesn't use esprima.
I'm not aware of a tool that would meet your specific requirements, but it seems to be relatively easy to create, given that the vital parts already exist.
parse the source into an AST, using esprima or similar
manipulate the tree in the way you want (e.g. remove comments, mangle identifiers, etc.)
rebuild the source from the tree using escodegen
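A minimal sketch of that pipeline, assuming the esprima, estraverse and escodegen npm packages; the renaming below is deliberately naive and does no scope analysis, so it only shows the shape of the approach, not a working obfuscator:
var esprima = require("esprima");       // parser
var estraverse = require("estraverse"); // AST walker
var escodegen = require("escodegen");   // code generator
function partiallyObfuscate(source) {
  var ast = esprima.parse(source); // comments are dropped because we never ask esprima to keep them
  var names = {};
  var counter = 0;
  estraverse.traverse(ast, {
    enter: function (node, parent) {
      if (node.type !== "Identifier") return;
      // Skip property names (obj.foo, { foo: 1 }) so only "real" variables are touched.
      if (parent && parent.type === "MemberExpression" && parent.property === node && !parent.computed) return;
      if (parent && parent.type === "Property" && parent.key === node) return;
      // No scope analysis here, so globals would be renamed too; a real tool
      // would use something like escope to decide what is safe to rename.
      if (!names[node.name]) names[node.name] = "q" + (counter++);
      node.name = names[node.name];
    }
  });
  // escodegen re-indents the output, so the result stays readable.
  return escodegen.generate(ast, { format: { indent: { style: "  " } } });
}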

Finding comments in HTML

I have an HTML file, and within it there may be JavaScript, PHP, and all this stuff people may or may not put into their HTML files.
I want to extract all comments from this html file.
I can point out two problems in doing this:
What is a comment in one language may not be a comment in another.
In JavaScript, the remainder of a line is commented out using the // marker. But URLs also contain //, and I therefore may well eliminate parts of URLs if I simply replace // and the remainder of the line with nothing.
So this is not a trivial problem.
Is there anywhere some solution for this already available?
Has anybody already done this?
Problem 2: Isn't every URL quoted, with either "www.url.com" or 'www.url.com', when you write it in either language? I'm not sure. If that's the case, then all you have to do is parse the code and check whether there are any quote marks preceding the slashes to know if it's a real URL or just a comment.
Look into parser generators like ANTLR which has grammars for many languages and write a nesting parser to reliably find comments. Regular expressions aren't going to help you if accuracy is important. Even then, it won't be 100% accurate.
Consider Problem 3: what looks like a comment in a language is not always a comment in that language.
<textarea><!-- not a comment --></textarea>
<script>var re = /[/*]not a comment[*/]/, str = "//not a comment";</script>
Problem 4: a comment embedded in a language may not obviously be a comment.
<button onclick="// this is a comment//
notAComment()">
Problem 5: what is a comment may depend on how the browser is configured.
<noscript><!-- </noscript> Whether this is a comment depends on whether JS is turned on -->
<!--[if IE 8]>This is a comment, except on IE 8<![endif]-->
I had to solve this problem partially for contextual templating systems that elide comments from source code to prevent leaking software implementation details.
https://github.com/mikesamuel/html-contextual-autoescaper-java/blob/master/src/tests/com/google/autoesc/HTMLEscapingWriterTest.java#L1146 shows a testcase where a comment is identified in JavaScript, and later testcases show comments identified in CSS and HTML. You may be able to adapt that code to find comments. It will not handle comments in PHP code sections.
It seems from your words that you are pondering an approach based on regular expressions. It is a pain to do so on the whole file; try instead to use some tools to highlight or discard interesting or uninteresting text, and then work on what is left in your sieve according to the keep/discard criteria. Have a look at HTML::Tree and TreeBuilder; they could be very useful for dealing with the HTML markup.
I would convert the HTML file into a character array and parse it. You can detect key strings like "<", "--", "www", "http" as you move forward and either skip or delete those segments.
The start/end indices will have to be identified properly, which is a challenge but you will have full power.
There are also other ways to simplify the process if performance is not a problem. For example, all tags can be grabbed with XML::Twig and the string can be parsed to detect JS comments.
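As a very rough sketch of that scanning idea, here is a JavaScript version that only looks for plain <!-- --> HTML comments and ignores all the script/CSS/PHP complications raised above:
// Walk the text and collect <!-- ... --> comments with their positions.
function findHtmlComments(html) {
  var comments = [];
  var i = 0;
  while (i < html.length) {
    var start = html.indexOf("<!--", i);
    if (start === -1) break;
    var end = html.indexOf("-->", start + 4);
    if (end === -1) break; // unterminated comment, give up
    comments.push({ start: start, end: end + 3, text: html.slice(start + 4, end) });
    i = end + 3;
  }
  return comments;
}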

Syntax / Logical checker In Javascript?

I'm building a solution for a client which allows them to create very basic code.
Now I've done some basic syntax validation, but I'm stuck at variable verification.
I know JSLint does this using JavaScript, and I was wondering if anyone knew of a good way to do it.
So for example say the user wrote the code
moose = "barry"
base = 0
if(moose == "barry"){base += 100}
Then I'm trying to find a way to verify that the "if" expression has correct syntax, that the variable moose has been initialized, etc.,
but I want to do this without scanning character by character.
The code is a mini language built just for this application, so it is very, very basic and doesn't need to manage memory or anything like that.
I had thought about splitting first by carriage return and then by space, but there is nothing to say the user won't write something like moose="barry" or if(moose=="barry"),
and there is nothing to say the user won't keep the result of a condition inline.
Obviously compilers and interpreters do this on a much more extensive scale, but I'm not sure whether they do it character by character, and if they do, how they have optimized it.
(The other option is I could send it back to PHP to process, which would then relieve the browser of responsibility.)
Any suggestions?
Thanks
The use case is limited and the syntax will never be extended. The language is a simple scripting language that enables the client to create a unique cost based on their users' input. The end result will be processed by PHP regardless, to ensure the calculation can't be adjusted by the end user and that there is some consistency.
So for example, say there is a base cost of £1.00
and there is a field on the form called "Additional Cost", the language will allow them to manipulate the base cost relative to the "Additional Cost" field.
So
base = 1;
if(additional > 100 && additional < 150){base += 50}
elseif(additional == 150){base *= 150}
else{base += additional;}
This is a basic example of how the language would be used.
Thank you for all your answers.
I've investigated a parser, and creating one would be far more complex than is required.
Having run several tests with thousands of lines of code, I found that character-by-character processing only takes a few seconds, even on a single-core P4 with 512 MB of memory (which is far less than the customer uses).
I've decided to build a PHP-based syntax checker which will check the information and convert the variables etc. into valid PHP code while it's checking (so that it's ready to be called later without recompilation). Using this instead of JavaScript seems more appropriate and will allow for more complex code without hindering the validation process.
It's only taken an hour and I have code which is able to check the validity of an if statement and isn't confused by nested ifs, spaces or odd expressions. There is very little left to be checked, whereas a parser and full-blown scripting language would have taken a lot longer.
You've all given me a lot to think about, and I've rated relevant answers. Thank you.
If you really want to do this — and by that I mean if you really want your software to work properly and predictably, without a bunch of weird "don't do this" special cases — you're going to have to write a real parser for your language. Once you have that, you can transform any program in your language into a data structure. With that data structure you'll be able to conduct all sorts of analyses of the code, including procedures that at least used to be called use-definition and definition-use chain analysis.
If you concoct a "programming language" that enables some scripting in an application, then no matter how trivial you think it is, somebody will eventually write a shockingly large program with it.
I don't know of any readily-available parser generators that generate JavaScript parsers. Recursive descent parsers are not too hard to write, but they can get ugly to maintain and they make it a little difficult to extend the syntax (esp. if you're not very experienced crafting the original version).
You might want to look at JS/CC, which is a parser generator that generates a parser for a grammar, in JavaScript. You will need to figure out how to describe your language using BNF or EBNF. Also, JS/CC has its own syntax (which is somewhat close to actual BNF/EBNF) for specifying the grammar. Given the grammar, JS/CC will generate a parser for it.
Your other option, as Pointy said, is to write your own lexer and recursive-descent parser from scratch. Once you have a BNF/EBNF, it's not that hard. I recently wrote a parser from an EBNF in JavaScript (the grammar was pretty simple, so it wasn't that hard to write one; YMMV).
To address your comments about it being "client specific", I will also add my own experience here. If you're providing a scripting language and a scripting environment, there is no better route than an actual parser.
Handling special cases through a bunch of if-elses is going to be horribly painful and a maintenance nightmare. When I was a freshman in college, I tried to write my own language. This was before I knew anything about recursive-descent parsers, or just parsers in general. I figured out by myself that code can be broken down into tokens. From there, I wrote an extremely unwieldy parser using a bunch of if-elses, and also splitting the tokens by spaces and other characters (exactly what you described). The end result was terrible.
Once I read about recursive-descent parsers, I wrote a grammar for my language and easily created a parser in a tenth of the time it took me to write my original parser. Seriously, if you want to save yourself a lot of pain, write an actual parser. If you go down your current route, you're going to be fixing issues forever. You're going to have to handle cases where people put the space in the wrong place, or perhaps they have one too many (or one too few) spaces. The only other alternative is to provide an extremely rigid structure (i.e., you must have exactly x number of spaces following this statement), which is liable to make your scripting environment extremely unattractive. An actual parser will automatically fix all these problems.
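To make the lexing step concrete, here is a hedged sketch of a tokenizer for a mini language like the one in the question; the token names and the operator list are my own assumptions:
// Split source into tokens regardless of how the user spaced things,
// so moose="barry" and moose == "barry" produce the same token types.
function tokenize(source) {
  var pattern = /\s*(?:([A-Za-z_]\w*)|(\d+(?:\.\d+)?)|("[^"]*")|(==|!=|<=|>=|&&|\|\||\+=|-=|\*=|[=<>+\-*\/{}();])|(\S))/g;
  var keywords = ["if", "elseif", "else"];
  var tokens = [];
  var m;
  while ((m = pattern.exec(source)) !== null) {
    if (m[1]) tokens.push({ type: keywords.indexOf(m[1]) !== -1 ? "keyword" : "identifier", value: m[1] });
    else if (m[2]) tokens.push({ type: "number", value: m[2] });
    else if (m[3]) tokens.push({ type: "string", value: m[3] });
    else if (m[4]) tokens.push({ type: "operator", value: m[4] });
    else throw new Error("Unexpected character: " + m[5]);
  }
  return tokens;
}
// tokenize('if(moose=="barry"){base += 100}') yields the same token types, in the
// same order, as tokenize('if (moose == "barry") { base += 100 }').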
Javascript has a function 'eval'.
var code = 'alert(1);';
eval(code);
It will show an alert. You can use 'eval' to execute basic code.

Programming tips with Japanese Language/Characters [closed]

I have an idea for a few web apps to write to help me, and maybe others, learn Japanese better since I am studying the language.
My problem is that the site will be mostly in English, so it needs to fluently mix in Japanese characters, usually hiragana and katakana, but later kanji. I am getting closer to accomplishing this; I have figured out that the pages and source files need to be Unicode, with UTF-8 content types.
However, my problem comes in the actual coding. What I need is to manipulate strings of text that are kana. One example is:
けす I need to take that verb and convert it to the te-form けして. I would prefer to do this in javascript as it will help down the road to do more manipulation, but if I have to will just do DB calls and hold everything in a DB.
My question is not only how to do it in JavaScript, but what are some tips and strategies for doing these kinds of things in other languages, too. I am hoping to get more into doing language-learning apps, but am lost when it comes to this.
Stick to Unicode and utf-8 everywhere.
Stay away from the native Japanese encodings: euc-jp, shiftjis, iso-2022-jp, but be aware that you'll probably encounter them at some point if you continue.
Get familiar with a segmenter for doing complicated stuff like POS analysis, word segmentation, etc. The standard tools used by most people who do NLP (natural language processing) work on Japanese are, in order of popularity/power:
MeCab (originally on SourceForge) is awesome: it allows you to take text like,
「日本語は、とても難しいです。」
and get all sorts of great info back
kettle:~$ echo 日本語は、難しいです | mecab
日本語 名詞,一般,*,*,*,*,日本語,ニホンゴ,ニホンゴ
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
、 記号,読点,*,*,*,*,、,、,、
難しい 形容詞,自立,*,*,形容詞・イ段,基本形,難しい,ムズカシイ,ムズカシイ
です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
EOS
which is basically a detailed run-down of the parts-of-speech, readings, pronunciations, etc. It will also do you the favor of analyzing verb tenses,
kettle:~$ echo メキシコ料理が食べたい | mecab
メキシコ 名詞,固有名詞,地域,国,*,*,メキシコ,メキシコ,メキシコ
料理 名詞,サ変接続,*,*,*,*,料理,リョウリ,リョーリ
が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
食べ 動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
たい 助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
EOS
However, the documentation is all in Japanese, and it's a bit complicated to set up and figure out how to format the output the way you want it. There are packages available for Ubuntu/Debian, and bindings in a bunch of languages including Perl, Python, Ruby...
Apt-repos for ubuntu:
deb http://cl.naist.jp/~eric-n/ubuntu-nlp intrepid all
deb-src http://cl.naist.jp/~eric-n/ubuntu-nlp intrepid all
Packages to install:
$ apt-get install mecab-ipadic-utf8 mecab python-mecab
should do the trick I think.
The other alternatives to MeCab are ChaSen, which was written years ago by the author of MeCab (who incidentally works at Google now), and Kakasi, which is much less powerful.
I would definitely try to avoid rolling your own conjugation routines. The problem with this is just that it will require tons and tons of work, which others have already done, and covering all the edge cases with rules is, at the end of the day, impossible.
MeCab is statistically driven, and trained on loads of data. It employs a sophisticated machine learning technique called conditional random fields (CRFs) and the results are really quite good.
Have fun with the Japanese. I'm not sure how good your Japanese is, but if you need help with the docs for mecab or whatever feel free to ask about that as well. Kanji can be quite intimidating at the beginning.
My question is not only how to do it in javascript, but what are some tips and strategies to doing these kinds of things in other languages too.
What you want to do is pretty basic string manipulation - apart from the missing word separators, as Barry notes, though that's not a technical problem.
Basically, for a modern Unicode-aware programming language (which JavaScript has been since version 1.3, I believe) there is no real difference between a Japanese kana or kanji and a Latin letter - they're all just characters. And a string is just, well, a string of characters.
Where it gets difficult is when you have to convert between strings and bytes, because then you need to pay attention to what encoding you are using. Unfortunately, many programmers, especially native English speakers, tend to gloss over this problem because ASCII is the de facto standard encoding for Latin letters, and other encodings usually try to be compatible. If Latin letters are all you need, then you can get along being blissfully ignorant about character encodings, believe that bytes and characters are basically the same thing - and write programs that mutilate anything that's not ASCII.
So the "secret" of Unicode-aware programming is this: learn to recognize when and where strings/characters are converted to and from bytes, and make sure that in all those places the correct encoding is used, i.e. the same one that will be used for the reverse conversion and one that can encode all the characters you're using. UTF-8 is slowly becoming the de facto standard and should normally be used wherever you have a choice.
Typical examples (non-exhaustive):
When writing source code with non-ASCII string literals (configure encoding in the editor/IDE)
When compiling or interpreting such source code (compiler/interpreter needs to know the encoding)
When reading/writing strings to a file (encoding must be specified somewhere in the API, or in the file's metadata)
When writing strings to a database (encoding must be specified in the configuration of the DB or the table)
When delivering HTML pages via a webserver (encoding must be specified in the HTML headers or the pages' meta header; forms can be even more tricky)
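To make the file and HTTP cases concrete, here is a hedged Node.js sketch (the file names are made up; only the encoding arguments matter):
var fs = require("fs");
// Reading: the second argument tells Node how to decode the file's bytes into a string.
var text = fs.readFileSync("verbs.txt", "utf8");
// Writing: the string is encoded back to bytes as UTF-8.
fs.writeFileSync("verbs-copy.txt", text, "utf8");
// Serving HTML: declare the same encoding so the browser decodes it correctly, e.g.
// response.setHeader("Content-Type", "text/html; charset=utf-8");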
What you need to do is to look at the rules of grammar. Have an array of rules for each conjugation. Let's take 〜て form for example.
Pseudocode:
def te_form(verb)
  switch verb.substr(-1, 1)   # look at the verb's final kana
    case "る" # return (verb - る) + て
    case "す" # return (verb - す) + して
etc. Basically, break it down into Type I, II and III verbs.
Your question is totally unclear to me.
However, I had some experience working with the Japanese language, so I'll give my 2 cents.
Since Japanese text does not feature word separation (e.g. a space character), the most important tool we had to acquire is a dictionary-based word recognizer.
Once you have the text split, it's easier to manipulate it with "normal" tools.
There were only 2 tools which did the above, and as a by-product they also worked as a tagger (i.e. noun, verb, etc.).
Edit:
Always use Unicode when working with languages.
If I recall correctly (and I slacked off a lot the year I took Japanese, so I could be wrong), the replacements you want to do are determined by the last symbol or two in the word. Taking your first example, any verb ending in 'す' will always have 'して' when conjugated this way. Similarly for む -> んで. Could you maybe establish a mapping of last character(s) -> conjugated form? You might have to account for exceptions, such as anything which conjugates to xxって.
As for portability between languages, you'll have to implement the logic differently based on how they work. This solution would be fairly straightforward to implement for Spanish as well, since the conjugation depends on whether the verb ends in -ar, -er, or -ir (with some verbs requiring exceptions in your logic). Unfortunately, that's the limit of my multilingual skills, so I don't know how well it would do beyond those two.
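A hedged JavaScript sketch of that mapping idea; it covers only the regular godan endings and deliberately punts on ichidan verbs, the irregulars, and exceptions like 行く:
// Map of final kana -> te-form ending for the regular godan verbs.
// Ichidan verbs (食べる -> 食べて) and the irregulars (する, 来る, and 行く -> 行って)
// would need their own cases, so treat this as a sketch of the idea only.
var teFormEndings = {
  "う": "って", "つ": "って", "る": "って",
  "む": "んで", "ぶ": "んで", "ぬ": "んで",
  "く": "いて", "ぐ": "いで", "す": "して"
};
function godanTeForm(verb) {
  var last = verb.slice(-1);
  var ending = teFormEndings[last];
  if (!ending) throw new Error("Not a regular godan verb: " + verb);
  return verb.slice(0, -1) + ending;
}
// godanTeForm("けす") -> "けして", godanTeForm("よむ") -> "よんで"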
Since most verbs in Japanese follow one of a small set of predictable patterns, the easiest and most extensible way to generate all the forms of a given verb is to have the verb know what conjugation it should follow, then write functions to generate each form depending on the conjugation.
Pseudocode:
generateDictionaryForm(verb)
case Ru-Verb: verb.stem + る
case Su-Verb: verb.stem + す
case Ku-Verb: verb.stem + く
...etc.
generatePoliteForm(verb)
case Ru-Verb: verb.stem + ります
case Su-Verb: verb.stem + します
case Ku-Verb: verb.stem + きます
...etc.
Irregular verbs would of course be special-cased.
Some variant of this would work for any other fairly regular language (i.e. not English).
Try installing my gem (rom2jap). It is in Ruby.
gem install rom2jap
Open your terminal and type:
require 'rom2jap'

What good is JSLint if jQuery fails the validation [closed]

So I have been exploring different methods to clean up and test my JavaScript. I figured that, just like with any other language, one way to get better is to read good code. jQuery is very popular, so it must have a certain degree of good coding.
So why, when I run jQuery through JSLint's validation, does it give me this message:
Error:
Problem at line 18 character 5:
Expected an identifier and instead saw
'undefined' (a reserved word).
undefined,
Problem at line 24 character 27:
Missing semicolon.
jQuery = window.jQuery = window.$ =
function( selector, context ) {
Problem at line 24 character 28:
Expected an identifier and instead saw
'='.
jQuery = window.jQuery = window.$ =
function( selector, context ) {
Problem at line 24 character 28:
Stopping, unable to continue. (0%
scanned).
This was done using JSLint and jquery-1.3.1.js
JSLint tests one particular person's (Douglas Crockford's) opinions regarding what makes good JavaScript code. Crockford is very good, but some of his opinions are anal retentive at best, like the underscore rule, or the use of the increment/decrement operators.
Many of the issues flagged by JSLint in the above output are issues that Crockford feels lead to difficult-to-maintain code, or are things that he feels have led him to do 'clever' things in the past that can be hard to maintain.
There are some things Crockford identifies as errors that I agree with though, like the missing semicolons thing. Dropping semicolons forces the browser to guess where to insert the end-of-statement token, and that can sometimes be dangerous (it's always slower). And several of those errors are related to JSLint not expecting or supporting multiple assignments like jQuery does on line 24.
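As an aside, a classic illustration of how automatic semicolon insertion can bite (my own example, nothing to do with jQuery):
// Without a semicolon the parser ends the statement right after `return`,
// so this function returns undefined instead of the object.
function broken() {
  return
  { ok: true };
}
function fixed() {
  return { ok: true };
}
// broken() -> undefined, fixed() -> { ok: true }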
If you've got a question about a JSLint error, e-mail Crockford, he's really good about replying, and with his reply, you'll at least know why JSLint was implemented that way.
Oh, and just because a library is popular doesn't mean its code is any good. jQuery is popular because it's a relatively fast, easy-to-use library. That it's well implemented is rather inconsequential to its popularity among many. However, you should certainly be reading more code; we all should.
JSLint can be very helpful in identifying problems with the code, even if jQuery doesn't pass the standards it desires.
JSLint helps you catch problems; it isn't a test of validity or a replacement for thinking. jQuery is pretty advanced as JS goes, which makes such a result understandable. I mean, the first couple of lines are speed hacks; no wonder the most rigid JS parser is going to have a couple of errors.
In any case, the assumption that popular code is perfectly correct code or even 'good' is flawed. jQuery's code is good, and you can learn a lot from reading it. You should still run your stuff through JSLint, if only because it's good to hear another opinion on what you've written.
From JSLint's description:
JSLint takes a JavaScript source and scans it. If it finds a problem, it returns a message describing the problem and an approximate location within the source. The problem is not necessarily a syntax error, although it often is. JSLint looks at some style conventions as well as structural problems. It does not prove that your program is correct. It just provides another set of eyes to help spot problems.
JSLint defines a professional subset of JavaScript, a stricter language than that defined by Edition 3 of the ECMAScript Language Specification. The subset is related to recommendations found in Code Conventions for the JavaScript Programming Language.
"jQuery is very popular so it must have a certain degree of good coding."
One would like to hope this is the case with jQuery, but unfortunately it's not really true. jQuery is useful and popular, but it is not a well written JavaScript library. David Mark recently posted a scathing critique of jQuery in comp.lang.javascript that examines a large number of examples of the poor code found in jQuery:
http://groups.google.com/group/comp.lang.javascript/msg/37cb11852d7ca75c?hl=en&
If you're not actively developing jQuery itself, why even run JSLint over it at all? If it works, it works, and you don't have to worry about it.
The jQuery developers' goals are not the same as your goals. jQuery is built for speed and compactness and achieving those goals trumps readability and maintainability.
Crockford's tests in JSLint have more to do with achieving something that he would feel comfortable taking home to meet his mother, which is a valid concern if you will be married to your code for some time.
The purpose of JSLint is clearly stated in the FAQ [1]:
"JSLint defines a professional subset of JavaScript, a stricter language than that defined by Edition 3 of the ECMAScript Language Specification. The subset is related to recommendations found in Code Conventions for the JavaScript Programming Language."
Now, if you are confused: ECMA3 is already a subset of the JS capabilities provided by any of today's JS interpreters (see [2] for an overview of the relation between JavaScript and ECMAScript versions).
To answer the question "what good is JSLint":
* use JSLint to verify you are using a "safe" subset of JavaScript that is unlikely to break across JS implementations.
* use JSLint to verify you followed the Crockford code conventions [4].
[1] http://www.jslint.com/lint.html
[2] https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/JavaScript_Overview#Relationship_between_JavaScript_Versions_and_ECMAScript_Editions
[3] https://developer.mozilla.org/en/New_in_JavaScript_1.7#Avoiding_temporary_variables
[4] http://javascript.crockford.com/code.html
I've found one case where JSLint is very, very useful: when you grab one of those big-arse libraries that float around the 'Net, then another, then again one other, you soon find yourself loading 50k of Javascript on every new page load (caching may help, but it's not a cure-all solution).
What would you do then? Compress those libraries. But your host doesn't do compression for non-HTML files! So what? You use a JavaScript compressor.
The best I've found is Dean Edwards'; I used it to compress John Fraser's Showdown (a Markdown for JavaScript library), but unfortunately, the compression broke the code. Since Showdown isn't supported anymore, I had to correct it myself - and JSLint was invaluable for that.
In short, JSLint is useful for preparing your JS code for heavy-duty compression.
If you like to daisy-chain methods like jQuery allows you to, you might appreciate YUI3.
jQuery is of course not the best thing in the world. That's already clear when you look at the notation. The dollar-parentheses combination is really bad for your eyes. Programming should be clear and simple. jQuery is far from that. That reason is enough for me not to use it. That it's not properly written doesn't surprise me and only underscores my thoughts on this JavaScript library.
