Converting markdown to HTML with JavaScript - restricting supported syntax

I am currently using marked.js to convert markdown to HTML, so that the users of my web app can create structured content. I am wondering if there is a way to restrict the supported syntax to just a subset, like
headers
italic text
bold text
lists with only 1 depth of indentation
quotes
I would like to prohibit conversion of lists with multiple levels of indentation, code blocks, headers in lists ...
The reason is that my web app should steer users to create content in a specific way, and if it is possible to create some crazily structured content (lists of headers, code in headers, lists of images ...), someone will surely do it.

You have a few different options:
Marked.js uses a multi-step method to parse Markdown. It uses a lexer, which breaks the document up into tokens, a parser to convert those tokens to an abstract syntax tree (AST), and a renderer to convert the AST to HTML. You can override any of those pieces to alter the handling of various parts of the syntax.
For example, if you simply want to ignore lists and leave them out of the rendered HTML, replace the list function of the renderer with one that returns an empty string.
Or, if you want the parser to act as if lists are not even a supported feature of Markdown, you could remove the list and listitem methods from the parser. In that case, the list would remain in the output, but would be treated as a paragraph instead.
Or, if you want to support one level of lists, but not nested lists, you could replace the list and/or listitem methods in the parser with your own implementation that parses lists as you desire.
Note that there are also a number of advanced options, which use the above methods to alter the parser and/or renderer in various ways. For the most part, those options would not provide the features you are asking for, but browsing through the source code might give you some ideas of how to implement your own modifications.
However, there is the sanitize option, which will accept a sanitizer function. You could provide your own sanitizer which removes any unwanted elements from the HTML output. The end result would be similar to overriding the renderer, but it would be implemented differently. Depending on what you want to accomplish, one or the other may be more effective.
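A different angle, independent of marked.js's own API, is to check the Markdown source before it ever reaches the converter and reject (or flag) the constructs you do not want. The sketch below is a rough line-based heuristic, not a Markdown parser: the function name, the "two or more leading spaces means nested" rule, and the reason strings are all assumptions made for illustration, and it does not detect indented (four-space) code blocks.

```javascript
// Heuristic pre-check for a restricted Markdown subset.
// Assumption: a list item indented by 2+ spaces counts as "nested".
function findDisallowedMarkdown(source) {
  const problems = [];
  const listItem = /^(\s*)(?:[-*+]|\d+[.)])\s+(.*)$/;
  const fence = /^\s*(`{3,}|~{3,})/;
  let inFence = false;

  source.split(/\r?\n/).forEach((line, index) => {
    const lineNumber = index + 1;

    // Fenced code blocks: report the opening fence, skip the contents.
    if (fence.test(line)) {
      if (!inFence) problems.push({ line: lineNumber, reason: 'code block' });
      inFence = !inFence;
      return;
    }
    if (inFence) return;

    const match = listItem.exec(line);
    if (!match) return;

    const indent = match[1];
    const content = match[2];
    if (indent.length >= 2) {
      problems.push({ line: lineNumber, reason: 'nested list' });
    }
    if (/^#{1,6}\s/.test(content)) {
      problems.push({ line: lineNumber, reason: 'header in list' });
    }
  });

  return problems;
}
```

If the returned array is empty, the text can be passed on to the converter unchanged; otherwise the editor can show the line numbers to the user instead of rendering the content.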

Another possibility would be to use Commonmark.js: parse the input, then walk the parsed tree and remove all nodes that have (or do not have) a specific type. See this example; it worked fine for images, but failed for code blocks.
The downside of this approach is that the parsed markdown source is traversed twice: once for editing and a second time for rendering.
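The "walk the tree and drop unwanted node types" idea can be sketched on a plain object tree. Note that this does not use Commonmark.js's own node and walker API; the node shape ({ type, children }) and the function name are hypothetical and only illustrate the filtering step.

```javascript
// Hypothetical node shape: { type: string, children?: node[], ... }.
// Returns a copy of the tree without any node whose type is blocked.
function removeNodeTypes(node, blockedTypes) {
  const children = (node.children || [])
    .filter((child) => !blockedTypes.includes(child.type))
    .map((child) => removeNodeTypes(child, blockedTypes));
  return { ...node, children };
}
```

Because a removed node takes its whole subtree with it, blocking a container type such as a list also removes everything nested inside it.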

Related

How to handle sanitizing in JavaScript editors that allow formatting

Many editors, like Medium's, offer formatting now. From what I see in the DOM, it simply adds HTML. But how do you sanitize this kind of input without losing the formatting applied by the user?
E.g. clicking bold adds:
<strong class="markup--strong markup--p-strong">text</strong>
but you wouldn't want to render that if the user enters it themselves. So how is that different? Also, would it be different if you styled with markdown, but didn't let users enter their own markdown and made it accessible only through the browser?
One way I could think of is escaping every HTML special character, but that seems odd. As far as I know, you sanitize the content only when outputting it.

You should use a server-side sanitizer, as stated by Vipin, since client-side validation is prone to tampering.
OWASP (Open Web Application Security Project) has some guides and sanitizers that you may use like the java-html-sanitizer.
For a generic brief on the concept please read this https://www.owasp.org/index.php/Data_Validation under the section Sanitize.

You could replace the white-listed elements with other characters, for example:
<strong.*> becomes |strong|
Then you remove ALL other HTML. Be aware of attributes such as onmouseover="alert(1)", so keep it really simple.
Also be careful when rendering the user input. Don't just add it as code. Instead parse it and create the elements using JavaScript. Never use innerHTML, but do use .innerText and document.createElement().
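A variant of the whitelist idea, sketched below, avoids the placeholder characters: escape every HTML special character first, then turn only the escaped whitelisted tags back into bare tags, dropping all of their attributes. The function names and the default tag list are assumptions for illustration; it is plain JavaScript with no DOM access, so it could run on the server as well.

```javascript
// Escape all HTML special characters so nothing is interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Re-enable only whitelisted tags, without any attributes.
// Assumption: allowedTags contains plain tag names such as 'strong'.
function sanitizeFormatting(input, allowedTags = ['strong', 'em']) {
  const escaped = escapeHtml(input);
  const pattern = new RegExp(
    '&lt;(/?)(' + allowedTags.join('|') + ')\\b.*?&gt;',
    'gi'
  );
  return escaped.replace(
    pattern,
    (match, slash, tag) => '<' + slash + tag.toLowerCase() + '>'
  );
}
```

Everything that is not a whitelisted tag stays escaped, so an attribute like onmouseover never reaches the output and a script tag is shown as text.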

Convert HTML to markdown using pagedown?

I have successfully setup pagedown on a site I am using, but I have run into an issue when trying to edit HTML that has already been created. I would like to take a HTML chunk that was created using pagedown, convert it back to markdown and place it in the editor.
I looked around but didn't see this covered in the documentation. I took a look in the Markdown.Converter.js file to see if there was a makeMarkdown function to match the makeHTML function but I didn't see anything.
How do I go about converting HTML back to markdown for editing?

As far as I know, no, there is no existing solution that will convert html to markdown. There are a few problems that would need to be solved before that can be done, for example, representing floats, text alignment, font sizes, etc in markdown. That leaves you with two options:
Store the markdown in the database, then convert the markdown to html on the fly. This has the advantage of being able to easily edit the text and reduces the amount of data you're storing in the database.
The second option is to store both the markdown and the html in the database. This uses more disk space, but fewer resources are needed to retrieve the html, because you no longer have to convert markdown to html on the fly.
Both options are viable, each with its own advantages. I usually use the first option so that I don't have duplicate data in the database, but the second option is likely easier to use, because the system that displays the content doesn't need a markdown processor; it just pulls the generated html directly from the database.
I'll likely move to the second option instead in future projects because it makes the data more portable. If you were to access the database in a different server-language, you wouldn't need a markdown processor written in that language to get the html.
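The second option can be sketched as follows. The in-memory Map stands in for the database, and createContentStore and renderMarkdown are hypothetical names; the converter is passed in as a function, so any markdown processor could be plugged in.

```javascript
// Stores the markdown source together with its rendered HTML.
// Assumption: renderMarkdown(markdown) returns an HTML string.
function createContentStore(renderMarkdown) {
  const rows = new Map(); // stand-in for a database table

  return {
    // Convert once, at save time, and keep both representations.
    save(id, markdown) {
      rows.set(id, { markdown, html: renderMarkdown(markdown) });
    },
    // Display path: no markdown processor needed.
    getHtml(id) {
      return rows.get(id)?.html;
    },
    // Edit path: load the original markdown back into the editor.
    getMarkdown(id) {
      return rows.get(id)?.markdown;
    }
  };
}
```

The edit path never has to convert HTML back to markdown, because the original source is always available.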

What javascript library/template engine to use in this case?

I have to make an "invite your Facebook friends" module which fetches the names and photos of your friends and allows you to message them. I need this to look like an integral part of my website, so I have to style it. I fetch the JSON with the friends' ids, names etc. and want to put those values into certain HTML tags and attributes. How do I approach this? I can make it in jQuery but want to avoid jQuery spaghetti code with ragu of strings and vars. What lib/template engine do you recommend? Ease of use and weight are the most important things. The website has jQuery already included.

I can make it in jQuery but want to avoid jQuery spaghetti code with ragu of strings and vars. What lib/template engine do you recommend me?
I’d suggest using no big additional lib or template engine – I’d just keep using jQuery, and embed one of the sprintf for jQuery implementations that are floating around the net.
That way you can define your “HTML template” for your output in one location as one string, and then replace placeholders in that string with variable values while you’re looping over the data in jQuery.
If you don’t like any of the sprintf jQuery plugins out there, here is another very simple and short function that implements just the basic string placeholder %s (but more than that you most likely won’t need anyway): http://www.nczonline.net/blog/2011/10/11/simple-maintainable-templating-with-javascript/.
(And if you have to insert values in multiple places of your template string, then have a look at my comment at the bottom of that page, where I have proposed a simple adjustment to Nicolas’ function that implements the “argument swapping” feature of PHP’s sprintf, so that you have to pass values to the function only once, but can use them in multiple places in your template string.)
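A minimal sketch of such a helper, assuming only the %s placeholder plus PHP-style argument swapping (%1$s, %2$s, ...). This is not the function from the linked article; the name format and its behaviour for missing arguments (the placeholder is left in place) are choices made for this example.

```javascript
// Replaces %s with the next argument and %N$s with the N-th argument.
function format(template, ...args) {
  let next = 0;
  return template.replace(/%(?:(\d+)\$)?s/g, (match, position) => {
    const index = position ? Number(position) - 1 : next++;
    // Leave the placeholder untouched if no value was supplied for it.
    return index < args.length ? String(args[index]) : match;
  });
}

// One template string, defined in one place; the name is used twice.
const friendTemplate =
  '<li data-id="%1$s"><img src="%2$s" alt="%3$s">%3$s</li>';
```

Calling format(friendTemplate, friend.id, friend.picture, friend.name) inside the loop over the JSON data yields one list item per friend (the property names here are hypothetical). Values that come from users should be HTML-escaped before they are inserted.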

The easiest way to do this is to use the Requests dialog. The first thing you need to do is create an app. Once you have this you should be able to use the JavaScript example on the request dialog page.

Where to store element specific extra data in a web page; data- or javascript

I have some data that I need to associate with specific element such as an individual table row. This data contains information such as the current state, and a unique identifier that correlates to an SQL row. When the user interacts with the element I want to read out the unique identifier, and with that identifier, issue an AJAX request to let the user change the state of that element.
After researching, it seems that there are two camps as to how to embed this element specific information.
1) Using a data- attribute in HTML5. My understanding is that this will work in modern web browsers as well as in older browsers that don't support HTML5. But while this works, it does not follow the HTML spec (versions before HTML5), so it won't validate if you run it through an HTML syntax checker.
2) Store the additional data into a javascript array, object etc. The extra work with this is now you need a way to correlate the javascript data to the html element.
What are the pros and cons of using these two different approaches to storing the data? And what approach would you recommend?
Thanks!

I wouldn't worry about the data- attributes not passing validators. The attribute is in HTML5 because people were using similar, non-standardized, attributes for a long time specifically to solve this problem. Go ahead and start writing "HTML5" by using the parts of the spec that work, i.e. don't break in a certain browser, and using the HTML5 doctype. The W3C validator at least already supports the doctype.
As for which method to use, I think it really boils down to when you want to parse the information in the JavaScript interpreter: on page load or when the data is needed. I imagine it depends on just how much information you have as to which will be most efficient. But you can't go wrong with adding it to the HTML with a data- attribute or two.
Personally, I like adding the information to the HTML with data- attributes. In the scenario you describe, I would use data-state and data-rid (or similar) so that I don't have to further parse the information (it sounds like you are thinking of putting two bits of information in one data- attribute). This way, your table of information is truly complete: the data is presented to the user and the markup contains further information that could be necessary to a parser.
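As a sketch of the two-attribute approach: in the browser, data-state and data-rid on a row are exposed through the element's dataset property, so the click handler only has to read them. The function name and the shape of the returned object are assumptions for illustration; what you send in the AJAX request is up to your server.

```javascript
// Assumed row markup: <tr data-rid="1234" data-state="open">...</tr>
// Builds the payload for a state-change request from a row element.
function buildStateChange(row, newState) {
  const { rid, state } = row.dataset; // data-rid -> rid, data-state -> state
  return { rid: Number(rid), from: state, to: newState };
}

// Browser usage (not executed here):
//   const row = document.querySelector('tr[data-rid]');
//   const payload = buildStateChange(row, 'closed');
```

The function only touches row.dataset, so it works with any object that has that property, which also makes it easy to try outside the browser.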

I'd definitely go with option (1) and either not worry about those attributes not validating or just validate your document as html5. It's simple. It works.
The "separation of concerns" theory that leads some people to option (2) is nonsensical for this sort of situation because if you put the data into a JS object of some kind you still need a way to tie it back to the actual html elements so then not only are the two not really separate, they're more complicated than they need to be on the client side, and the server side code needed to produce it in the first place is more complicated.

In option (2), no special correlation is needed any more than in option (1)—rather, less. You can put the data into a property of the DOM object that corresponds to the element. Why not make use of the basic feature of JavaScript that you can add properties to an object?

You've tagged your question with jQuery, so I'll assume you have it. You can use the .data() method to store arbitrary data and associate it with an element.
// Store the SQL row id on the first table row, then read it back.
$("tr").first().data('sqlId', 1234);
console.assert($("tr").first().data('sqlId') === 1234);

How to create a tree representation of generic JSON data in HTML?

I have some data stored in JSON format. I would like to be able to display it in a browser as a dynamic tree structure, similar to the way MongoVUE presents MongoDB documents:
Screenshot
I have found a very nice jquery plugin, called jsTree. Unfortunately, in order to process JSON documents it requires data to have a very specific verbose (and redundant, in my opinion) structure: link. Using it means significant modifications of my json documents. I am rather searching for a tool that is able to build the tree automagically, without making severe manual adjustments to the data, yet allowing me to potentially apply some modifications to the view, if I would need to.
The tool at json.bloople.net makes something similar using a table, but because I have several levels of nested documents, the output looks very bloated. Moreover, the structure is not dynamically collapsible.
I would appreciate any hints regarding the right tools to do the job, including both those that might require (automated!) pre-processing of JSON data in Java/Groovy or pure JavaScript-based solution.

This is just a simple example of how you could output a tree like JSON structure in html. http://jsfiddle.net/K2ZQQ/1/ (see here for browser support for white-space). Note that the second parameter to JSON.stringify is a replacer function:
From http://msdn.microsoft.com/en-us/library/ie/cc836459(v=vs.94).aspx
If replacer is a function, JSON.stringify calls the function, passing
in the key and value of each member. The return value is used instead
of the original value. If the function returns undefined, the member
is excluded. The key for the root object is an empty string: "".
So if you need to make any further modifications to the display of your JSON tree, the replacer function may be of help.
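A minimal sketch of that approach: JSON.stringify with a replacer function and an indentation argument produces an indented text tree, which can be shown in an element that preserves whitespace. The function name toTreeText and the hiddenKeys parameter are made up for this example.

```javascript
// Returns an indented text tree of the data, leaving out hidden keys.
function toTreeText(data, hiddenKeys = []) {
  // Returning undefined from the replacer excludes that member.
  const replacer = (key, value) =>
    hiddenKeys.includes(key) ? undefined : value;
  return JSON.stringify(data, replacer, 2); // 2 = spaces per nesting level
}

// Browser usage (not executed here):
//   document.querySelector('pre').textContent = toTreeText(doc, ['_id']);
```

This gives a readable nested view without changing the JSON documents themselves, but the output is static text; collapsible nodes would need additional markup and script.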
