Javascript Markdown Parsing

I'm working on a Markdown-to-HTML parser. I understand this is a big project and that there are third-party libraries, but nonetheless I want to roll a simple solution of my own that doesn't have to handle every aspect of Markdown.
So far the process is to take an input (in my case the value of a textarea) and parse it line by line.
var html = [];
var lines = txt.split('\n'); //Convert string to array
//Remove empty lines
for(var index = lines.length-1; index >= 0; index--) {
if(lines[index] == '') lines.splice(index, 1);
}
//Parse line by line
for(var index = 0; index <= lines.length-1; index++) {
var str = lines[index];
if(str.match(/^#[^#]/)) {
//Header
str = str.replace(/#(.*?)$/g, '<h1>$1</h1>');
} else if(str.match(/^##[^#]/)) {
//Header 2
str = str.replace(/##(.*?)$/g, '<h2>$1</h2>');
} else if(str.match(/^###[^#]/)) {
//Header 3
str = str.replace(/###(.*?)$/g, '<h3>$1</h3>');
} else if(str.trim().startsWith('+')) {
//Unordered List
var orig = str;
str = str.replace(/\+(.*?)$/, '<li>$1</li>');
var previous, next;
if(index > 0) previous = lines[index-1];
if(!previous || previous && previous.indexOf('+') < orig.indexOf('+')) {
str = '<ul>' + str;
}
if(index < lines.length-1) next = lines[index+1];
if(!next || next && next.indexOf('+') < orig.indexOf('+')) {
var count = Math.max(0, orig.indexOf('+') / 4);
if(next) count = count - Math.max(0, next.indexOf('+') / 4);
for(var i=1; i<=count; i++) {
str = str + '</ul>';
}
}
if(next && next.trim().indexOf('+') == -1) str = str + '</ul>';
} else if(str.match(/^[0-9a-zA-Z]/)) {
//Paragraph
str = str.replace(/^([0-9a-zA-Z].*?)$/g, '<p>$1</p>');
}
//Inline formatting
str = str.replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>'); //Bold
str = str.replace(/\_\_(.*?)\_\_/g, '<strong>$1</strong>'); //Another bold
str = str.replace(/\*(.*?)\*/g, '<em>$1</em>'); //Italics
str = str.replace(/\_(.*?)\_/g, '<em>$1</em>'); //Another italics
//Append formatted to return string
html.push(str);
}
Where I run into problems is with nested blocks such as ul. Currently the code looks at a line that starts with a + and wraps it in an li. Great, but these list items never get placed within a ul. I could run through the output again after the line by line and just wrap every group of li's, but that screws me up when I have nested li's that require their own ul.
Any thoughts on how to apply these additional wrapper tags? I've considered using my own special characters around list type elements so I know where to add the wrapper tags, but that breaks traditional markdown. I wouldn't be able to pass the raw markdown to someone other than myself and know they'd understand what was going on.
Edit: I updated my code sample to a working version, which also supports nested lists.

You need a very simple state machine.
When you encounter the first + you add <ul> and raise a flag.
If you don't see a line that starts with + and your flag is raised, then close the </ul>
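The flag-based approach described above could be sketched roughly as follows. This is a minimal illustration, not the asker's parser: `renderLists` is a hypothetical name, it only handles flat (non-nested) lists, and it assumes the input has already been split into lines.

```javascript
// Hypothetical helper: wraps consecutive "+" lines in a single <ul>.
// Flat lists only; nesting would need a stack of indent levels instead of one flag.
function renderLists(lines) {
  const html = [];
  let inList = false; // the "flag" from the answer

  for (const line of lines) {
    const isItem = line.trim().startsWith('+');

    if (isItem && !inList) {
      html.push('<ul>'); // first "+" line: open the list and raise the flag
      inList = true;
    }
    if (!isItem && inList) {
      html.push('</ul>'); // first non-"+" line while the flag is raised: close the list
      inList = false;
    }

    html.push(isItem ? '<li>' + line.trim().slice(1).trim() + '</li>' : line);
  }

  if (inList) html.push('</ul>'); // the input ended while still inside a list
  return html;
}

console.log(renderLists(['Intro', '+ one', '+ two', 'Outro']).join(''));
// Intro<ul><li>one</li><li>two</li></ul>Outro
```

For nested lists, the single boolean would presumably be replaced by a stack of indentation depths, pushing on a deeper "+" and popping (and closing a `</ul>`) on a shallower one.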

Related

Trying to find substring of string that equals "\" in javascript

What I'm trying to do is compare the order of backslashes in 2 texts (one is English text, the other is user inputted translated text).
I first do an if statement to make sure both texts have backslashes and then create separate lists for the two texts in order to compare the order.
NOTE: if sourcetext has a backslash, in dev tools, it does show as "\".
But even though I know there's a backslash at a specific index, my code isn't placing that sourcetext.substring(w+1, 1) into my tokenlist...
Not sure where to fix my bug and/or why it's not recognizing the comparison of the substring and "\".
The output I want from the tokenlist is:
tokenlist = "\\n\\n\\t";
But instead I get nothing put into tokenlist.
Any insight would be appreciated!
var sourcetext = "Example \n\t\n";
var newtext = "Exemple \n\t\n";
if (newtext.includes('\\') && sourcetext.includes('\\')) {
var tokenlist = "";
var tokenlisttran = "";
for (let w = 0; w < sourcetext.length - 1; w++) {
if (sourcetext.substring(w, 1) == "\\") {
tokenlist += "\\" + sourcetext.substring(w + 1, 1);
}
}
for (let i = 0; i < newtext.length - 1; i++) {
if (newtext.substring(i, 1) == "\\") {
tokenlisttran += "\\" + newtext.substring(i + 1, 1);
}
}
}
console.log(tokenlist,tokenlisttran)
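Two things in the code above are worth checking. First, `substring(start, end)` takes two indexes, not a start and a length, so `substring(w, 1)` does not return the single character at `w`; `charAt(w)` does. Second, `"\n"` inside a JavaScript string literal is a single newline character, so a string written that way contains no backslash at all. A minimal sketch, assuming the real input contains literal backslashes (simulated here with `String.raw`); `backslashTokens` is a hypothetical helper name:

```javascript
// Hypothetical helper: collects each backslash plus the character after it, in order.
function backslashTokens(text) {
  let tokens = '';
  for (let i = 0; i < text.length - 1; i++) {
    if (text.charAt(i) === '\\') {
      tokens += '\\' + text.charAt(i + 1);
      i++; // skip the character that belongs to this token
    }
  }
  return tokens;
}

// String.raw keeps the backslashes literal instead of turning \n into a newline.
const rawSource = String.raw`Example \n\t\n`;
console.log(backslashTokens(rawSource)); // \n\t\n

// A normal literal holds newline and tab characters, so no backslash is found.
console.log(backslashTokens('Example \n\t\n') === ''); // true
```

Comparing the order in two texts then reduces to comparing the two returned strings.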

Locate node by character position(position relative to innerText)

I have this structure (can be more complex/contain nested tags)
<div id="editor">
<p>Some <i>text</i>
<p>Some other text</p>
</div>
I'm extracting text as is visible on screen with editor.innerText
And I get:
Some text
Some other text
My backend service analyzes this text and gives me highlights with "positions". For the example above let's say it returns (0,9) meaning start is text node containing "Some" and end node is the one containing "text". How can I locate these nodes by given positions?
I've had some success with collecting text with DOM traversal and keeping track of positions but I'm losing newlines and some white space(due to textContent).

The answer to this isn't trivial.
The best solution is to pass the innerHTML to your backend service so that it can highlight the text correctly; the service will need to be able to parse HTML.
Alternatively, pass your innerText to the backend, then step through all the characters in the innerHTML and ignore every character inside angle brackets.
This will probably require some cleaning up of whitespace, and a bit of HTML mangling, but I'll leave that up to you.
Here's an example of what I mean
let searchHtml = "<p>Some <i>text</i><p>Some other text</p>";
let outputHtml = "";
let highlightOpenTag = "<b>";
let highlightCloseTag = "</b>";
let currentlyHighlighting = false;
// Start and end positions from your backend
let startIndex = 0;
let endIndex = 9;
let inTag = false;
let textIndex = 0;
for (let i = 0; i < searchHtml.length; i++) {
let char = searchHtml[i];
// We don't want to insert highlight tags when we're inside a tag already
if (char === '<') inTag = true;
if (inTag) {
outputHtml += char;
} else {
// If we're not in a tag, but we are within the text positions
// returned from the backend, let's get highlighting
if (textIndex >= startIndex && textIndex < endIndex) {
if (!currentlyHighlighting) {
outputHtml += highlightOpenTag;
currentlyHighlighting = true;
}
}
outputHtml += char;
// If we're about to hit a tag and we're already highlighting then
// insert our end highlight tag
if ((i + 1 >= searchHtml.length || searchHtml[i + 1] === '<') && currentlyHighlighting) {
outputHtml += highlightCloseTag;
currentlyHighlighting = false;
}
textIndex++;
}
if (char === '>') inTag = false;
}
console.log(outputHtml);

Adding style tags around specific characters only

I am trying to add inline styling to only numbers in paragraph elements. For example:
<p>This paragraph has the numbers 1 and 2 in it.</p>
So in this instance, I would want to wrap the two numbers in that paragraph as <span class="style">1</span> and <span class="style">2</span>.
I am trying to write a script in JavaScript to accomplish this so I don't have to go back into the document I'm working on and manually add the styling tags around each number, as the document is very long.
So far this is what I wrote, but I'm having difficulty figuring out the next step: how to incorporate the edits back into the paragraph HTML.
let regEx=/[0-9]/g;
let list = [];
let paragraphs = document.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
let html = paragraphs[i].innerHTML;
list.push(html);
}
// all paragraphs into one string.
let joined = list.join(' ');
// all the numbers in the paragraphs stored in array
let numbers = joined.match(regEx);
// define array for styling edits
let edits = [];
// adding the styling tags to each num
numbers.forEach(function(num){
edits.push('<span class="style">' + num + '</span>');
// outputs ["<span class='style'>3</span>", "<span class='style'>7</span>", "<span class='style'>4</span>", "<span class='style'>5</span>"]
});
// need to insert edits into paragraph html
If anyone can offer any suggestions on how I might be able to accomplish this that would be great, I am still relatively new to working with JS.

const paragraphs = document.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
const regEx=/([0-9])/g;
const newHtml = paragraphs[i].innerHTML.replace(regEx, '<span class="style">$1</span>');
paragraphs[i].innerHTML = newHtml;
}
I updated your regex to put the number in a capture group; in the replacement string you can then reference that group, and since there is only one, it is $1. As you can see, the replace wraps the match in the appropriate span and the result is assigned straight back to innerHTML.
I did notice that your regex only captures single digits. If you want to capture multi-digit numbers, you could update your regex like this: /([0-9]+)/g.
I created a simple jsfiddle to show you how it works: https://jsfiddle.net/andyorahoske/dd6k6ekp/35/
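The capture-group replacement can be tried on a plain string, without a DOM. This is a small sketch rather than the answerer's exact code: `wrapDigitRuns` is a hypothetical name, and it uses the multi-digit variant of the regex.

```javascript
// Hypothetical helper: wraps every run of digits in a span.
// $1 in the replacement string refers to the first capture group.
function wrapDigitRuns(text) {
  return text.replace(/([0-9]+)/g, '<span class="style">$1</span>');
}

console.log(wrapDigitRuns('This paragraph has the numbers 1 and 22 in it.'));
// This paragraph has the numbers <span class="style">1</span> and <span class="style">22</span> in it.
```

One caveat when applying this to innerHTML rather than plain text: digits inside tags or attributes (for example an id such as "s3") would match as well, so it is safest on paragraphs that contain only text.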

I broke out the most fundamental part of this into a reusable function that you may find helpful in other contexts.
/**
* Wraps numbers in a string with any provided wrapper.
* @param {String} str A string containing numbers to be wrapped.
* @param {String} wrapper A string with placeholder %s to define the wrapper. Example - <pre>%s</pre>
* @return {String} The original string with numbers wrapped using the wrapper param.
*/
function wrapNumbers(str, wrapper) {
var numbersInStr = str.match(/\d+/g) || [];
var chunks = [];
var segmentStart = 0;
for(var i = 0; i < numbersInStr.length; i += 1) {
var number = numbersInStr[i];
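// Search from segmentStart so a repeated number is not matched at its earlier position
var indexOfNumber = str.indexOf(number, segmentStart);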
var fWrapper = wrapper.replace('%s', number);
chunks.push(str.slice(segmentStart, indexOfNumber));
chunks.push(fWrapper);
segmentStart = indexOfNumber + number.length;
}
if(segmentStart < str.length) {
chunks.push(str.slice(segmentStart, str.length));
}
return chunks.join('');
}
To use this in your use case it might look like the following:
var paragraphs = document.getElementsByTagName("p");
var wrapper = '<span class="style">%s</span>';
for(var i = 0; i < paragraphs.length; i += 1) {
var paragraph = paragraphs[i];
paragraph.innerHTML = wrapNumbers(paragraph.innerHTML, wrapper);
}
Codepen: https://codepen.io/bryceewatson/pen/vRqeVy?editors=1111

Here's another approach: https://jsfiddle.net/fazanaka/au4jufrr/1/
var element = document.getElementById('text'),
text = element.innerText,
wordsArray = text.split(' '),
newString;
for(var i = 0; i < wordsArray.length; i++){
if(!isNaN(parseFloat(wordsArray[i])) && isFinite(wordsArray[i])){
wordsArray[i] = "<span class='style'>" + wordsArray[i] + "</span>";
}
}
newString = wordsArray.join(' ');
element.innerHTML = newString;
I hope it helps you
UPD:
For all paragraphs https://jsfiddle.net/fazanaka/qx2ehym4/

Pure Javascript - Find/Replace a list of words in DOM

I have a list of words and I want to replace them in DOM.
The replace is working but it has an issue. Before and after the replaced text, it puts an undesired slash bar (i.e. reduced to a brigade > reduced/ OTHER WORD 1 /a brigade)
Also, I need to wrap the replaced word with an html tag, like .
(i.e. reduced to a brigade > reduced OTHER WORD 1 a brigade)
You can see the code here:
https://jsfiddle.net/realgrillo/9uL6ofge/
var elements = document.getElementsByTagName('*');
var words = [
[/(^|\s)([Tt]o)(\s|$)/ig, 'OTHER WORD 1'],
[/(^|\s)([Aa]nd)(\s|$)/ig, 'OTHER WORD 2']
];
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
for (var j = 0; j < element.childNodes.length; j++) {
var node = element.childNodes[j];
if (node.nodeType === 3) {
var text = node.data;
for (var k = 0; k < words.length; k++)
{
var from = new RegExp(words[k][0]);
var to = new RegExp('$1' + words[k][1] + '$3');
var replacedText = text.replace(from, to);
//console.log("from: " + from);
//console.log("to: " + to);
//console.log("toDo: " + toDo);
if (replacedText !== text)
{
element.innerHTML = element.textContent.replace(from, to);
console.log("text: " + text);
console.log("replacedText: " + replacedText);
}
}
/*
//
//!!!!THIS CODE FOR 1 WORD WORKS!!!! I need to do this for the list of words.
//
var replacedText = text.replace(/(^|\s)([Tt]o)(\s|$)/ig, '$1OTHER WORD$3');
if (replacedText !== text) {
element.innerHTML = element.textContent.replace(/(^|\s)([Tt]o)(\s|$)/ig, '$1<b class=modified>OTHER WORD</b>$3');
}
*/
}
}
}
Masters of JS, can you help me? Pure javascript, not jQuery please.
Thanks.

Here is a much simpler function you can use. First define a function for replacing words like this:
String.prototype.replaceAll = function (search, replacement) {
try {
var target = this;
return target.split(search).join(replacement);
} catch (e) {
return "";
}
};
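This helper is useful because the native replace function replaces only the first occurrence when the pattern is a plain string. Note that split/join inserts the replacement literally, so group references such as $1 are not expanded; use it with plain strings, like this:
var replacedText = text.replaceAll(' to ', ' OTHER WORD ');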

You don't need to turn words[k][0] into a RegExp because it already is one, although doing so won't break anything since the constructor accepts a RegExp.
You don't need to turn words[k][1] into a RegExp either, because the second parameter of String.prototype.replace is expected to be a plain string (or a function). The slashes appear because your RegExp is converted back to a string, so they end up in the output.
So this is a change that will fix it:
var to = '$1' + words[k][1] + '$3';
And this is optional but probably a good practice:
var from = words[k][0];
For getting an html tag, you can either change it directly in each words[k][1] or in the to var declaration, depending on what you want.
Extra:
In the for loop, the code assumes that there are always at least three groups (because of the '$3' reference), an alternative could be to delegate this replacement directly in words[k][1], but it is up to you to decide.
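Putting these points together, the list-based replacement might look like the sketch below. It works on a plain string so it can be run outside the browser; `replaceWords` is a hypothetical name, and the `<b class="modified">` wrapper is taken from the commented-out single-word code in the question.

```javascript
// Word list in the same shape as the question: [pattern, replacement text].
const wordList = [
  [/(^|\s)(to)(\s|$)/gi, 'OTHER WORD 1'],
  [/(^|\s)(and)(\s|$)/gi, 'OTHER WORD 2']
];

// Hypothetical helper: applies every pattern in turn, using a plain
// replacement string so no stray slashes appear in the output.
function replaceWords(text, words) {
  let result = text;
  for (const [from, replacement] of words) {
    // $1 and $3 restore the whitespace (or string boundary) around the word.
    const to = '$1<b class="modified">' + replacement + '</b>$3';
    result = result.replace(from, to);
  }
  return result;
}

console.log(replaceWords('reduced to a brigade', wordList));
// reduced <b class="modified">OTHER WORD 1</b> a brigade
```

In the page itself, the returned string would be assigned to the element's innerHTML, as in the question's working single-word version.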

Get the next key from array from string with symbols

I'm working on a problem that is simple but difficult for me right now. I'm used to working in jQuery but need this to be done in plain JavaScript.
Simple as it is, the user inputs a string, let's say:
"hey, wanna hang today?". It should output the next character in my array for each letter, so it would be like this: "ifz, xboob iboh upebz?".
I have tried everything I can come up with. Hopefully some of you see the problem right away.
I have set up a short jsFiddle that shows something similar to what I have.
function gen() {
var str = document.getElementById('str').value,
output = document.getElementById('output');
var alph = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','æ','ø','å','a'];
for (var i=0;i<str.length;i++) {
var index = str[i].charAt(0),
e = alph.indexOf(index);
console.log(alph[e + 1]);
output.innerHTML += alph[e + 1];
}
}

If you only want to skip to next letter with those chars and leave the others like space and ? as they are:
var index = str[i].charAt(0),
e = alph.indexOf(index);
if(e == -1){
output.innerHTML += index;
}else{
output.innerHTML += alph[e + 1];
}
Update: using @David Thomas's method, you could do the following (it wouldn't work for 'å', though):
var index= str[i].toLowerCase().charCodeAt(0);
if((index > 96 && index < 123)){ // a to z
output.innerHTML += String.fromCharCode(str[i].charCodeAt(0)+1);
}else{
output.innerHTML += str[i];
}
}

I'd personally recommend the following approach, which should work with any alphabet for which there's a Unicode representation and, somewhat importantly, doesn't require a hard-coded array of letters/punctuation for each language:
function gen() {
var str = document.getElementById('str').value,
strTo = '',
output = document.getElementById('output');
for (var i = 0; i < str.length; i++) {
strTo += String.fromCharCode(str[i].charCodeAt(0) + 1);
}
output.textContent = strTo;
}
// hey, wanna hang today? -> ifz-!xboob!iboh!upebz@
JS Fiddle demo.
References:
String.prototype.charCodeAt().
String.fromCharCode().

Why does gen(',') === 'a'?
var alph = 'abcdefghijklmnopqrstuvwxyz';
var e = alph.indexOf(',');
console.log(e);
// -1
console.log(alph[e + 1]);
// 'a'
You need to take this case into account; otherwise, any characters that aren't in alph will map to 'a'.
(I see that you've also duplicated 'a' at the start and end of alph. This works, though it's more common either to use the modulus operator % or to check explicitly if e === alph.length - 1.)
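A minimal sketch of the modulus variant, combined with the check for characters that aren't in alph. The function name `shiftText` is hypothetical; the alphabet is the one from the question without the duplicated 'a'.

```javascript
const alphabet = 'abcdefghijklmnopqrstuvwxyzæøå';

// Hypothetical helper: maps each letter to the next one in `alphabet`,
// wrapping from the last letter back to the first, and leaves every
// other character (spaces, punctuation, ...) untouched.
function shiftText(str) {
  let out = '';
  for (const ch of str) {
    const e = alphabet.indexOf(ch);
    out += e === -1 ? ch : alphabet[(e + 1) % alphabet.length];
  }
  return out;
}

console.log(shiftText('hey, wanna hang today?'));
// ifz, xboob iboh upebz?
```

Because of the `% alphabet.length`, the last letter wraps around to 'a' without needing the extra array entry.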

You just have to account for the characters that aren't in your alphabet, such as these:
var ex = ['?','!',' ','%','$','&','/']
In full:
for (var i=0;i<str.length;i++) {
var index = str[i].charAt(0)
if (alph.indexOf(index) >-1) {
var e = alph.indexOf(index);
output.innerHTML += alph[e + 1];
} else {
var e = index;
output.innerHTML += e;
}
}
JSFIDDLE: http://jsfiddle.net/TRNCFRMCN/hs15f0kd/8/.
