What is the canonical implementation of markdown? - javascript

The problem with writing my own Markdown parser in Clojure is that Markdown is not a well-specified language. There is no "official" grammar, just an informal "Here's how it works" description and a really ugly reference implementation in Perl.
http://briancarper.net/blog/415/
I can see Gruber's specification here and the implementation here.
This is an implementation that wins the google ranking test here
Then there is peg-markdown which appears to solve the 'there is no grammar' problem - but is not the canonical implementation.
My question is - what is the canonical implementation of markdown? (The one that everybody says defines the standard).
EDIT:
I acknowledge that "there is no canonical standard". I'm looking for the next best thing.
The answer seems to be showdown.js, but there are problems with it.
(using the definition of canonical being the one that everybody says defines the standard).
It gets referenced here and on github here.
I'll throw in pagedown as well (as alluded to by #deceze) because it appears to fix the bugs in showdown and be a little closer to Gruber's original.

I believe Gruber's is the original and sort-of-canonical (see, for example, his 'Introducing Markdown'), and then people have extended it from there. I think some extensions are more common than others though, so it's probably worth seeing what a few well-used packages have over his original.

The CommonMark project attempts to address some of the issues of the Markdown specification, in particular some ambiguities. It comes with a reference implementation, but that's obviously just the reference implementation for CommonMark, not for Markdown in general. It may become the de-facto standard in years to come, since some major users are involved in that project, but it might as well become just another dialect among many, in which case the reference implementation would add little value.


why the name `reduce` was adopted instead of `fold` in javascript?

I did a search on this topic, but I do not know what keyword to search for so I post this question.
I wondered why the name reduce was chosen in JavaScript even though other names, such as fold or accumulate, were already in use and are more traditional and meaningful (this is my personal opinion).
I've spoken about this topic with someone close to me (one of the people I know who has been dealing with Javascript for a long time and who has also worked with functional languages like Scheme, Racket, and Clojure). He cautiously speculated that this might be an effect of Python.
If you know any historical context that I am missing, or know someone who is familiar with the background of this name, I would be very grateful if you could answer.
If you look at https://en.wikipedia.org/wiki/Fold_(higher-order_function)#Folds_in_various_languages, you can see that the most common names for this operation are fold, reduce, and inject. reduce is by no means an unusual name. In particular, reduce is used in Perl (which JavaScript copied many features from), and Perl probably got it from Lisp (like many other features). The earliest reference I can find is REDUCE in Common Lisp (which was developed in the 1980s and standardized in 1994).

How would I create a Text to Html parser? [duplicate]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Edit: I recently learned about a project called CommonMark, which
correctly identifies and deals with the ambiguities in the original
Markdown specification. http://commonmark.org/ It has great C# library
support.
You can find the syntax here.
The source that follows with the download is written in Perl, which I have no intentions of honoring. It is riddled with regular expressions, and it relies on MD5 hashes to escape certain characters. Something is just wrong about that!
I'm about to hard code a parser for Markdown. What is your experience with this?
If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution, that is, a third-party library).
To help a bit with the answers, regular expressions are meant to identify patterns! NOT to parse an entire grammar. That people consider doing so is foobar.
If you think about Markdown, it's fundamentally based around the concept of paragraphs.
As such, a reasonable approach might be to split the input into paragraphs.
There are many kinds of paragraphs, for example, heading, text, list, blockquote, and code.
The challenge is thus to identify these paragraphs and in what context they occur.
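The paragraph-first idea above can be sketched roughly as follows. This is only a minimal illustration, assuming blank lines separate blocks; the names splitIntoBlocks and classifyBlock and the set of block kinds are made up for this example, and nesting and context are ignored entirely.

```javascript
// Classify one blank-line-separated block by looking at its lines.
// The kinds mirror the list above: heading, text, list, blockquote, code.
function classifyBlock(block) {
  const lines = block.split("\n");
  const first = lines[0];
  if (/^#{1,6}\s/.test(first)) return "heading"; // "# Title" style
  if (lines.length === 2 && /^(=+|-+)\s*$/.test(lines[1])) {
    return "heading"; // underlined style
  }
  if (lines.every((l) => /^( {4}|\t)/.test(l))) return "code"; // indented
  if (/^>/.test(first)) return "blockquote";
  if (/^([*+-]|\d+\.)\s/.test(first)) return "list";
  return "text";
}

// Split the source on blank lines and tag every block with its kind.
function splitIntoBlocks(source) {
  return source
    .replace(/\r\n?/g, "\n") // normalise line endings
    .split(/\n(?:[ \t]*\n)+/) // one or more blank lines
    .filter((b) => b.trim() !== "")
    .map((b) => ({ kind: classifyBlock(b), text: b }));
}
```

The hard part the question mentions, working out the context a block occurs in (a list inside a blockquote, say), is exactly what this sketch leaves out.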
I'll be back with a solution, once I find it's worthy to be shared.
The only Markdown implementation I know of that uses an actual parser is John MacFarlane's peg-markdown. Its parser is based on a Parsing Expression Grammar parser generator called peg.
EDIT: Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog Weblog Engine. Because the parser is written in OCaml, it is extremely simple and short (268 SLOC for the parser, 43 SLOC for the HTML emitter), yet blazingly fast (20% faster than discount, which is written in hand-optimized C, and six hundred times faster than BlueCloth, which is written in Ruby), despite the fact that it isn't even optimized for performance yet. Because it is only intended for internal use by Mauricio himself for his weblog, there are a few deviations from the official Markdown specification, but Mauricio has created a branch which reverts most of those changes.
I released a new parser-based Markdown Java implementation last week, called pegdown.
pegdown uses a PEG parser to first build an abstract syntax tree, which is subsequently written out to HTML. As such it is quite clean and much easier to read, maintain and extend than a regex based approach.
The PEG grammar is based on John MacFarlane's C implementation "peg-markdown".
Maybe something of interest to you...
If I were to try to parse Markdown (and its extension Markdown Extra), I think I would use a state machine and parse it one char at a time, linking together some internal structures representing bits of text as I go along and then, once everything is parsed, generating the output from the objects all strung together.
Basically, I'd build a mini-DOM-like tree as I read the input file.
To generate an output, I would just traverse the tree and output HTML or anything else (PS, LaTeX, RTF,...)
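As a rough illustration of the "traverse the tree and output HTML" step, here is a minimal sketch. The node shapes (type, children, text, level) and the renderHtml name are assumptions made for this example, not part of any existing library; a different walker over the same tree could emit LaTeX or RTF instead.

```javascript
// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Walk the tree depth-first and build the HTML string.
function renderHtml(node) {
  if (node.type === "text") return escapeHtml(node.text);
  const inner = node.children.map(renderHtml).join("");
  switch (node.type) {
    case "document":
      return inner;
    case "heading":
      return `<h${node.level}>${inner}</h${node.level}>\n`;
    case "paragraph":
      return `<p>${inner}</p>\n`;
    case "emphasis":
      return `<em>${inner}</em>`;
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// A hand-built tree standing in for what the parser would produce.
const exampleTree = {
  type: "document",
  children: [
    { type: "heading", level: 1, children: [{ type: "text", text: "Hello" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", text: "a < b, " },
        { type: "emphasis", children: [{ type: "text", text: "really" }] },
      ],
    },
  ],
};

// renderHtml(exampleTree) gives:
// "<h1>Hello</h1>\n<p>a &lt; b, <em>really</em></p>\n"
```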
Things that can increase complexity:
The fact that you can mix HTML and markdown, although the rule could be easy to implement: just ignore anything that's between two balanced tags and output it verbatim.
URLs and notes can have their reference at the bottom of the text. Using data structures for hyperlinks could simply record something like:
[my text to a link][linkkey]
results in a structure like:
URLStructure:
| InnerText : "my text to a link"
| Key : "linkkey"
| URL : <null>
Headers can be defined with an underline, that could force us to use a simple data structure for a generic paragraph and modify its properties as we read the file:
ParagraphStructure:
| InnerText : the current paragraph text
| (beginning of line until end of line).
| HeadingLevel : <null> or 1-4 when we can assess
| that paragraph heading level, if any.
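The deferred-link idea above could look something like this: record each link with a null URL while reading, then fill the URLs in once the definitions at the bottom have been seen. This is a sketch under simplifying assumptions (one definition per line, no titles, no inline links); collectLinks and the field names are hypothetical.

```javascript
// Collect reference-style links and resolve them after the whole
// input has been read, since definitions may come last.
function collectLinks(source) {
  const definitions = new Map();
  const links = [];
  for (const line of source.split("\n")) {
    // A definition line looks like: [linkkey]: http://example.com/
    const def = /^\s{0,3}\[([^\]]+)\]:\s*(\S+)/.exec(line);
    if (def) {
      definitions.set(def[1].toLowerCase(), def[2]);
      continue;
    }
    // A use looks like: [my text to a link][linkkey]
    for (const m of line.matchAll(/\[([^\]]+)\]\[([^\]]*)\]/g)) {
      // An empty key, as in "[text][]", falls back to the link text.
      const key = (m[2] || m[1]).toLowerCase();
      links.push({ innerText: m[1], key: key, url: null });
    }
  }
  // Second pass: every definition is known now.
  for (const link of links) {
    link.url = definitions.has(link.key) ? definitions.get(link.key) : null;
  }
  return links;
}
```

Links whose key is never defined keep a null URL, so the output stage can decide whether to emit them as plain text.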
Anyway, just some thoughts.
I'm sure that there are many small details to take care of and I'm pretty sure that Regexes could become handy during the process.
After all, they were meant to process text.
I'd probably read the syntax specification enough times to know it, and get a feel for how to parse it.
Reading the existing parser code is of course brilliant, both to see what seems to be the main source of complexity, and if any special clever tricks are being used. The use of MD5 checksumming seems a bit weird, but I haven't studied the code enough to understand why it's being done. A comment in a routine called _EscapeSpecialChars() states:
We're replacing each such character with its corresponding MD5 checksum value;
this is likely overkill, but it should prevent us from colliding with the escape
values by accident.
Replacing a single character by a full MD5 does seem extravagant, but perhaps it really makes sense.
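To make the purpose of the checksum trick more concrete, here is a minimal sketch of the general idea, not a port of the Perl code: backslash-escaped characters are swapped for tokens that no later regex pass will touch, and swapped back at the end. It assumes Node.js (for the crypto module); the function names and the small set of escapable characters are chosen only for illustration.

```javascript
const crypto = require("crypto");

// Characters that may be backslash-escaped (a small illustrative subset).
const ESCAPABLE = ["\\", "*", "_", "`", "#"];

// Map each character to the hex MD5 digest of itself.
const TOKEN_FOR = new Map(
  ESCAPABLE.map((ch) => [ch, crypto.createHash("md5").update(ch).digest("hex")])
);

// Replace "\*" and friends with their token so later passes skip them.
function hideEscapes(text) {
  return text.replace(/\\([\\*_`#])/g, (_, ch) => TOKEN_FOR.get(ch));
}

// Put the literal characters back once all other passes have run.
function restoreEscapes(text) {
  let out = text;
  for (const [ch, token] of TOKEN_FOR) {
    out = out.split(token).join(ch);
  }
  return out;
}

// One regex pass (emphasis only) sandwiched between hide and restore.
function renderEmphasis(text) {
  const hidden = hideEscapes(text);
  const html = hidden.replace(/\*([^*]+)\*/g, "<em>$1</em>");
  return restoreEscapes(html);
}
```

The digest is just a convenient way to get a token that is very unlikely to occur in real input; a parser that tracks escapes in its own data structures would not need it.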
Of course, it'd be clever to consider creating a "true" syntax, for a tool such as Flex to get out of the regex bog.
If Perl isn't your thing, there are Markdown implementations in at least 10 other languages. They probably don't all have 100% compatibility, but tend to be pretty close.
MarkdownPapers is another Java implementation whose parser is defined in a JavaCC grammar.
If you are using a programming language that has more than three other
users, you should be able to find a library to parse it for you. A
quick Google-ing reveals libraries for CL, Haskell, Python,
JavaScript, Ruby, and so on. It is highly unlikely that you will need
to reinvent this wheel.
If you really have to write it from scratch, I recommend writing a
proper parser. With this technique, you won't have to escape things
with MD5 hashes. (I agree that if you have to do something like this,
it's time to reconsider your design.)
There are libraries available in a number of languages, including PHP, Ruby, Java, C#, and JavaScript. I'd suggest looking at some of these for ideas.
The best way to implement it depends on which language you wish to use; there will be idiomatic and non-idiomatic ways to do it.
Regexes work in Perl, because Perl and regexes are best friends.
Markdown is a JAWL (just another wiki language)
There are plenty of open source wikis out there whose parser code you can examine. Most use regexes.
Check out the ScrewTurn wiki; it has an interesting multi-pass formatter pipeline, a very nice technique - see /core/Formatter.cs and /core/FormatterPipeline.cs
Best is to use/join an existing project, these sorts of things are always much harder than they appear
Here you can find a JavaScript-implementation of Markdown. It also relies heavily on regular expressions, as this is just the fastest and easiest way to parse the text.
But it spares the MD5 part.
I cannot help directly with the coding of the parsing, but maybe this link can help you one way or another.

Javascript optional type hinting

When a programming language is statically typed, the compiler can be more precise about memory allocation and thus be generally more performant (with all other things equal).
I believe ES4 introduced optional type hinting (from what I understand, Adobe had a huge part in contributing to its spec due to actionscript). Does javascript officially support type hinting as a result? Will ES6 support optional type hinting for native variables?
If Javascript does support type hinting, are there any benchmarks that show how it pays off in terms of performance? I have not seen an open source project use this yet.
My understanding, from listening to many Javascript talks on the various sites, is that type-hinting won't do as much to help as people think it will.
In short, most Javascript objects tend to have the same "shape", if you will. That is, they will have the same properties created in the same order. This "shape" can be thought of as the "type" of the object.
An example:
function Point(x, y) {
    this.x = x;
    this.y = y;
}
All objects made from "Point" will have the same "shape" and the newer internal Javascript engines can do some fancy games to get faster lookup.
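A small sketch of what "same shape" means in practice follows. The engine's internal shape bookkeeping cannot be observed from standard JavaScript, so the only visible hint used here is property order; the variable names are made up for the example.

```javascript
function Point(x, y) {
  this.x = x;
  this.y = y;
}

// Both objects get the properties x then y, in that order, so an
// engine can treat them as having the same shape.
const p1 = new Point(1, 2);
const p2 = new Point(3, 4);

// Same property names, but added in the opposite order: an engine
// may treat this as a different shape, even though it "looks" like a Point.
const oddPoint = {};
oddPoint.y = 5;
oddPoint.x = 6;

// Property order is observable, the shape itself is not.
const orderOfP1 = Object.keys(p1); // ["x", "y"]
const orderOfOdd = Object.keys(oddPoint); // ["y", "x"]
```

Code that always builds its objects the same way, as constructors tend to, is the case the engines optimise for.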
In Chrome (perhaps others), they use a tag bit in each value to indicate whether the rest of it is a small integer or a pointer.
With all of these fancy things going on, that just leaves typing for the human coders. I, for one, really like not having to worry about type and wouldn't use that feature.
You are semi-correct, though. Type hinting is a part of ActionScript 3 which is a derivative of ECMAScript -- but hinting has never made it into the standard. AFAIK, outside of wishful thinking, it hasn't been discussed.
This video describes things in far more detail:
http://www.youtube.com/watch?v=FrufJFBSoQY
I'm late, but since no one really answered your questions regarding the standards, I'll jump in.
Yes, type hinting was discussed as part of ECMAScript 4, and it looked like it was going to be the future of JavaScript... until ES4 bit the dust. ECMAScript 4 was abandoned and never finalized. ECMAScript 5 (the current standard) did not contain many of the things that were planned for ECMAScript 4 (including type hinting), and was really just a quickly beefed up version of the ECMAScript 3.1 draft -- to get some helpful features out the door in the wake of ES4's untimely demise.
As you mentioned, now they're working on churning out ECMAScript 6 (which has some totally awesome features!), but don't expect to see type hinting. The Adobe guys have, to a degree, parted ways with the ECMAScript committee, and the ES committee doesn't seem interested in bringing it back (I think for good reason).
If it is something you want, you might want to check out TypeScript. It's a brand new Microsoft project which is basically an attempt to be ES6+types. It's a superset of JavaScript (almost identical except for the inclusion of types), and it compiles to runnable JavaScript.
JavaScript JIT compilers have to do some pretty fancy stuff to determine the types of expressions and variables, since types are crucial to many optimizations. But the JavaScript compiler writers have spent the last five years doing all that work. The compilers are really smart now. Optional static types therefore would not improve the speed of a typical program.
Surprisingly, type annotations in ActionScript sometimes make the compiled code slower by requiring a type check (or implicit conversion) when a value is passed from untyped code to typed code.
There are other reasons you might want static types in a programming language, but the ECMAScript standards committee has no interest in adding them to JS.
ES7 (not coming soon) has a new feature called guard that might be the one you are asking about.
The syntax is currently a little similar to ES4 and TypeScript.
All of them use : and append the type to the variable.
But the syntax is not confirmed yet.
Javascript is prototype-based, so the 'type' of an object is entirely dynamic and able to change through its lifetime.
Have a look at Ben Firshman's findings on Javascript performance in regards to object types - http://jsconf.eu/2010/speaker/lessons_learnt_pushing_browser.html

I want to implement a scheme interpreter for studying SICP

I'm reading the book Structure and Interpretation
of Computer Programs, and I'd like to code a scheme interpreter gradually.
Do you know which Scheme implementation is the easiest to read (and short)?
I will make a JavaScript in C.
SICP itself has several sections detailing how to build a meta-circular interpreter, but I would suggest that you take a look at the following two books for better resources on Scheme interpreters: Programming Languages: Application and Interpretation and Essentials of Programming Languages. They're both easy to read and gradually guide you through building interpreters.
I would recommend the blog series Scheme from scratch which incrementally builds up a scheme interpreter in C.
Christian Queinnec's book Lisp In Small Pieces is superb. More modern than EoPL. Covers both Lisp and Scheme, and goes into detail about the gory low-level stuff that most books omit.
I would recommend reading Kent Dybvig's dissertation "Three Implementation Models for Scheme". Not the whole dissertation, but the first part (up to chapter 3) where he discusses the Heap-Based Model is very suitable for a naive implementation of Scheme.
Another great resource (if I understood it correctly and you want to implement it in C) is Nils Holm's "Scheme 9 from Empty Space". This link is to Nils's page, and there's a link at the bottom to the old, public domain, edition of the book and to the newer, easier to read, commercially available edition. Read both and loved 'em.
I can give you an overview on how my interpreter works, maybe it can give you an idea of the general thing.
Although the answer is pretty late, I hope this can help someone else, who has come to this thread and wants a general idea.
For each line of Scheme entered, a Command object is created. If the command is partial, its nest level is stored (the number of remaining right brackets needed to complete the expression). If the command is complete, an Expression object is created and the evaluators are fired on this object.
There are 5 types of evaluator classes defined, each derived from the base class Evaluator:
a) Define_Evaluator: for define statements
b) Funcall_Evaluator: for processing other user-defined functions
c) Read_Evaluator: for reading an expression and converting it to a Scheme object
d) Print_Evaluator: prints the object depending on the type of the object
e) Eval_Evaluator: does the actual processing of the expression
First, each expression is read using the Read_Evaluator, which creates a Scheme object out of the expression. Nested expressions are processed recursively until the expression is complete.
Next, the Eval_Evaluator is fired, which processes the Scheme expression object formed in the first step.
This happens as follows:
a) If the expression to be evaluated is a symbol, return its value. For example, the variable blk will return the object for that block.
b) If the expression to be evaluated is a list, print the list.
c) If the expression to be evaluated is a function, look up the definition of the function, which returns the evaluation using the Funcall_Evaluator.
Lastly, the Print_Evaluator is fired to print the outcome; how it prints depends on the type of the output expression.
Disclaimer:
This is how my interpreter works; it doesn't have to be that way.
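For a general feel of the read, eval, print flow described above, here is a compressed sketch using plain functions instead of evaluator classes. It is not the interpreter from this answer: all names are made up, only numbers, symbols, define and calls to two built-in functions are handled, and lists are always treated as calls.

```javascript
// Nest level: how many right brackets are still missing.
function nestLevel(source) {
  let depth = 0;
  for (const ch of source) {
    if (ch === "(") depth += 1;
    else if (ch === ")") depth -= 1;
  }
  return depth;
}

// Read: turn text into nested arrays, numbers and symbol strings.
function read(source) {
  const tokens = source.replace(/[()]/g, " $& ").trim().split(/\s+/);
  let pos = 0;
  function parse() {
    const token = tokens[pos++];
    if (token === "(") {
      const list = [];
      while (tokens[pos] !== ")") {
        if (pos >= tokens.length) throw new Error("Missing )");
        list.push(parse()); // nested expressions are read recursively
      }
      pos += 1; // skip the closing bracket
      return list;
    }
    const number = Number(token);
    return Number.isNaN(number) ? token : number;
  }
  return parse();
}

// Eval: symbols are looked up, define binds, anything else is a call.
function evaluate(expr, env) {
  if (typeof expr === "number") return expr;
  if (typeof expr === "string") {
    if (!env.has(expr)) throw new Error(`Unbound symbol: ${expr}`);
    return env.get(expr);
  }
  const [head, ...rest] = expr;
  if (head === "define") {
    env.set(rest[0], evaluate(rest[1], env));
    return rest[0];
  }
  const fn = evaluate(head, env);
  return fn(...rest.map((arg) => evaluate(arg, env)));
}

// Print: the output format depends on the type of the value.
function print(value) {
  return Array.isArray(value) ? `(${value.map(print).join(" ")})` : String(value);
}

// Feed lines in one at a time; partial commands wait for more input.
function runLines(lines) {
  const env = new Map([
    ["+", (...n) => n.reduce((a, b) => a + b, 0)],
    ["*", (...n) => n.reduce((a, b) => a * b, 1)],
  ]);
  const output = [];
  let pending = "";
  for (const line of lines) {
    pending += (pending ? " " : "") + line;
    if (nestLevel(pending) > 0) continue; // still incomplete
    output.push(print(evaluate(read(pending), env)));
    pending = "";
  }
  return output;
}

// runLines(["(define x", "  (+ 1 2))", "(* x 10)"]) returns ["x", "30"]
```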
I've been on a similar mission but several years later, recommendations:
Peter Michaux's scheme from scratch: http://michaux.ca/articles/scheme-from-scratch-introduction, and his github repo: https://github.com/petermichaux/bootstrap-scheme/blob/v0.21/scheme.c. Sadly his royal scheme effort seems to have stalled. There were promises of a VM, which with his clarity of explanations would have been great.
Peter Norvig's lis.py: http://norvig.com/lispy.html, although written in Python, is very understandable and exploits all the advantages of using a dynamically typed language to create another. He has a follow-up article that adds more advanced features.
Anthony C. Hay used lis.py as an inspiration to create an implementation in C++:
http://howtowriteaprogram.blogspot.co.uk/2010/11/lisp-interpreter-in-90-lines-of-c.html
A more complete implementation is chibi scheme: http://synthcode.com/scheme/chibi/ which does include a VM, but the code base is still not too large to comprehend.
I'm still searching for good blog posts on creating a lisp/scheme VM, which could be coupled with JIT (important for any competitive JS implementation :).
Apart from Queinnec's book, which probably is the most comprehensive one in scheme
to C conversion, you can read also literature from the old platform library.readscheme.org.

What good is JSLint if jQuery fails the validation [closed]

Closed 10 years ago.
So I have been exploring different methods to clean up and test my JavaScript. I figured just like any other language one way to get better is to read good code. jQuery is very popular so it must have a certain degree of good coding.
So why when I run jQuery through JSLint's validation it gives me this message:
Error:
Problem at line 18 character 5: Expected an identifier and instead saw 'undefined' (a reserved word).
undefined,
Problem at line 24 character 27: Missing semicolon.
jQuery = window.jQuery = window.$ = function( selector, context ) {
Problem at line 24 character 28: Expected an identifier and instead saw '='.
jQuery = window.jQuery = window.$ = function( selector, context ) {
Problem at line 24 character 28: Stopping, unable to continue. (0% scanned).
This was done using JSLint and jquery-1.3.1.js
JSLint tests one particular person's (Douglas Crockford) opinions regarding what makes good JavaScript code. Crockford is very good, but some of his opinions are anal retentive at best, like the underscore rule, or the use of the increment/decrement operators.
Many of the issues being tagged by JSLint in the above output are issues that Crockford feels lead to difficult-to-maintain code, or they are things that he feels have led him to do 'clever' things in the past that can be hard to maintain.
There are some things Crockford identifies as errors that I agree with though, like the missing semicolons thing. Dropping semicolons forces the browser to guess where to insert the end-of-statement token, and that can sometimes be dangerous (it's always slower). And several of those errors are related to JSLint not expecting or supporting multiple assignments like jQuery does on line 24.
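A well-known case of that guessing going wrong is a line break after return. The following is a minimal sketch with made-up function names; the first function looks as though it returns an object but does not.

```javascript
function brokenByLineBreak() {
  // A semicolon is inserted right after "return", so the braces below
  // are parsed as a block containing the label "ok", not as an object.
  return
  {
    ok: true
  };
}

function braceOnSameLine() {
  // Keeping the opening brace on the same line avoids the problem.
  return {
    ok: true
  };
}

// brokenByLineBreak() returns undefined
// braceOnSameLine() returns { ok: true }
```

A lint tool can flag this kind of construct before it silently turns into a bug.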
If you've got a question about a JSLint error, e-mail Crockford, he's really good about replying, and with his reply, you'll at least know why JSLint was implemented that way.
Oh, and just because a library is popular doesn't mean its code is any good. jQuery is popular because it's a relatively fast, easy to use library. That it's well implemented is rather inconsequential to its popularity among many. However, you should certainly be reading more code, we all should.
JSLint can be very helpful in identifying problems with the code, even if JQuery doesn't pass the standards it desires.
JSLint helps you catch problems; it isn't a test of validity or a replacement for thinking. jQuery is pretty advanced as JS goes, which makes such a result understandable. I mean, the first couple of lines are speed hacks, so no wonder the most rigid JS parser is going to have a couple of errors.
In any case, the assumption that popular code is perfectly correct code or even 'good' is flawed. jQuery code is good, and you can learn a lot from reading it. You should still run your stuff through JSLint, if only because it's good to hear another opinion on what you've written.
From JSLint's description:
JSLint takes a JavaScript source and scans it. If it finds a problem, it returns a message describing the problem and an approximate location within the source. The problem is not necessarily a syntax error, although it often is. JSLint looks at some style conventions as well as structural problems. It does not prove that your program is correct. It just provides another set of eyes to help spot problems.
JSLint defines a professional subset of JavaScript, a stricter language than that defined by Edition 3 of the ECMAScript Language Specification. The subset is related to recommendations found in Code Conventions for the JavaScript Programming Language.
"jQuery is very popular so it must have a certain degree of good coding."
One would like to hope this is the case with jQuery, but unfortunately it's not really true. jQuery is useful and popular, but it is not a well written JavaScript library. David Mark recently posted a scathing critique of jQuery in comp.lang.javascript that examines a large number of examples of the poor code found in jQuery:
http://groups.google.com/group/comp.lang.javascript/msg/37cb11852d7ca75c?hl=en&
If you're not actively developing jQuery itself, why even run JSLint over it at all? If it works, it works, and you don't have to worry about it.
The jQuery developers' goals are not the same as your goals. jQuery is built for speed and compactness and achieving those goals trumps readability and maintainability.
Crockford's tests in JSLint have more to do with achieving something that he would feel comfortable taking home to meet his mother, which is a valid concern if you will be married to your code for some time.
The purpose of JSLint is clearly stated in the FAQ [1]:
"JSLint defines a professional subset of JavaScript, a stricter language than that defined by Edition 3 of the ECMAScript Language Specification. The subset is related to recommendations found in Code Conventions for the JavaScript Programming Language."
Now if you are confused: ECMA3 is already a subset of the JS capabilities provided by any of today's JS interpreters (see [2] for an overview of the relation between JavaScript and ECMAScript versions)
To answer the question "what good is JSLint":
* use JSLint to verify you are using a "safe" subset of JavaScript that is unlikely to break across JS implementations.
* use JSLint to verify you followed the Crockford code conventions [4]
[1] http://www.jslint.com/lint.html
[2] https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/JavaScript_Overview#Relationship_between_JavaScript_Versions_and_ECMAScript_Editions
[3] https://developer.mozilla.org/en/New_in_JavaScript_1.7#Avoiding_temporary_variables
[4] http://javascript.crockford.com/code.html
I've found one case where JSLint is very, very useful: when you grab one of those big-arse libraries that float around the 'Net, then another, then again one other, you soon find yourself loading 50k of Javascript on every new page load (caching may help, but it's not a cure-all solution).
What would you do then? Compress those libraries. But your host doesn't do compression for non-html file! So what? You use a Javascript compressor.
The best I've found is Dean Edwards's; I used it to compress John Fraser's Showdown (a Markdown library for JavaScript), but unfortunately, the compression broke the code. Since Showdown isn't supported anymore, I had to correct it myself - and JSLint was invaluable for that.
In short, JSLint is useful to prepare your JS code for heavy duty compression.
If you like to daisy-chain methods like jQuery allows you, you might appreciate YUI3.
JQuery is of course not the best thing in the world. That's already clear when you look at the notation. The dollar parentheses combination is really bad for your eyes. Programming should be clear and simple. JQuery is far from that. That reason is enough for me not to use it. That it's not properly written doesn't surprise me and only underscores my thoughts on this JavaScript library.
