How can I convert a piece of Markdown text to HTML with a JS library such as markdown-js or marked, without the output being enclosed in a paragraph tag?
For example, I would like to convert this *italic* text to this <i>italic</i> text without enclosing it in <p></p>.
Edit:
- My question is not how to remove p tags from output after conversion, my question is how to ask the library not to enclose output in p tags.
- markdown-js and marked enclose output inside <p></p> by default.
The marked library allows you to define your own renderer, which allows you to define the output for paragraphs.
You can pass in your own renderer by using:
marked.setOptions({
renderer: new MyRenderer(),
});
var output = marked('This **text** will be rendered with MyRenderer');
This will require you to define methods for blockquote, html, paragraph and all the other methods that the default marked.Renderer defines.
Here is an example:
function MyRenderer(options) {
this.options = options || {};
}
MyRenderer.prototype.paragraph = function(text) {
return 'This comes from my own renderer: ' + text + '\n';
};
However, this takes some effort, so the quickest way to get rid of the paragraphs (<p> tags) is to change the code of the existing Renderer in the marked.js file:
Replace:
Renderer.prototype.paragraph = function(text) {
return '<p>' + text + '</p>\n';
};
With:
Renderer.prototype.paragraph = function(text) {
return text + '\n';
};
markdown-it has an md.renderInline() method that does this.
I got around this by using a regex on the result:
rawMarkup.replace(/^(?:<p>)?(.*?)(?:<\/p>)?$/, "$1")
This will however only work on simple cases, and will fail in cases where there are two adjacent paragraphs, and possibly others.
After some testing I realized that there is a good reason why there are paragraph tags around the text, and that my implementation will have to adapt.
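If you still want to post-process the output, the two-adjacent-paragraphs failure mentioned above can be guarded against by unwrapping only when the output is exactly one paragraph. This is a minimal sketch, not part of marked or markdown-js; the name unwrapSingleParagraph is made up for illustration, and it assumes the library emits a bare <p> with no attributes.

```javascript
// Hypothetical helper: strips the wrapping <p>...</p> only when the
// rendered HTML consists of exactly one paragraph.
function unwrapSingleParagraph(html) {
  var trimmed = html.trim();
  var match = /^<p>([\s\S]*)<\/p>$/.exec(trimmed);
  // No wrapper, or a second <p> inside the match: leave the markup alone.
  if (!match || match[1].indexOf('<p>') !== -1) {
    return trimmed;
  }
  return match[1];
}

console.log(unwrapSingleParagraph('<p>this <i>italic</i> text</p>\n'));
console.log(unwrapSingleParagraph('<p>one</p>\n<p>two</p>\n'));
```

The first call prints the inline markup without the wrapper; the second is left untouched because it contains two paragraphs.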
I'm way late to the game on this question, but I had the exact same problem last night, and I opened up marked in order to build a solution to this issue.
I need to be able (for my i18n translation CMS) to embed tags inside any kind of tag, like headers, and have them render from Markdown without the enclosing <p> tag. So this fix makes it so that if there is only one line of text in my source (which for me is JSON), then it will not wrap in <p> tags. But if there is more, it will wrap them all in tags as normal.
Here is my pull request for the change
https://github.com/chjj/marked/pull/841
I highly doubt it will be included, but I am using it in my project and it works wonderfully.
Here's what worked for me (many years later). It assumes that you know what elements you don't want to wrap in paragraph tags.
const marked = require("marked");
// Every new line in markdown is considered a new paragraph, this prevents
// img tags from being wrapped in <p> tags which is helpful for resizing img's,
// centering captions, etc.
marked.Renderer.prototype.paragraph = (text) => {
if (text.startsWith("<img")) {
return text + "\n";
}
return "<p>" + text + "</p>";
};
const html = marked(markdownFile.body);
Related
I used Google Tag Manager to create a custom data level variable to get the content of an ajax form. The result is in the attributes.response that looks like:
response:"{\"current_url\":\"https:\\/\\/domainname.com\\/ +
"manufacturer\\/category\\/model-number\\/\",\"h" +
"tml"\":{\"cart_status_528\":\"\\n <div id=\\\"s" +
...
"<a href=\\\"https:\\/\\/domainname.com\\/manufacturer" +
"-name\\/long-store-category-name\\/model-number-x\\/\\" +
"\" class=\\\"ty-product-notification__product-name\\\"" +
">PRODUCT-NAME THAT I WANT<\\/a>\\n " +
...
" <p><\\n more escaped html content +
}"
I am trying to extract/parse the attribute.response to retrieve the PRODUCT-NAME text. I have tried the following which matches in regexr. But, GTM keeps complaining there is an error in my javascript at the double quote symbol. What am I missing? Or is there a cleaner way to retrieve the text? Thanks
function() {
var regex = (?<=product-name(.|\n)*">)(.*)(?=<\\\\\/a);
var attributesResponse = {{attributes.response}};
if(regex.test{{attributesResponse}}
var ProductAddedToCart = regex.exec(attributesResponse)[1];
return ProductAddedToCart;
}
return false;
}
First of all, please read the top answer here: RegEx match open tags except XHTML self-contained tags
Secondly, your JS has many problems. Even the SO code highlighter indicates it. See some examples of how regex is used in JS.
The proper way to solve your task, however, would be adding a dataLayer push with the proper response details neatly stored in a dataLayer object. You would normally ask your front-end developers to add a push in their response callback. It should be trivial for them to tackle. You can read more on DL here.
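If the dataLayer push is not an option, here is a rough sketch of what syntactically valid JS for the extraction might look like. It assumes the response is a JSON string whose html object holds the escaped markup, as the dump in the question suggests; extractProductName is a made-up name, and in GTM the body would sit inside the anonymous function of a Custom JavaScript variable reading {{attributes.response}}. It still pulls text out of HTML with a regex, so treat it as a stopgap.

```javascript
// Hypothetical helper, written in ES5 style.
function extractProductName(responseText) {
  var data;
  try {
    data = JSON.parse(responseText); // assumption: the response is JSON
  } catch (e) {
    return false;
  }
  if (!data || typeof data.html !== 'object' || data.html === null) {
    return false;
  }
  // A regex literal needs surrounding slashes; group 1 is the link text.
  var pattern = /class="ty-product-notification__product-name"[^>]*>([^<]*)<\/a>/;
  var keys = Object.keys(data.html);
  for (var i = 0; i < keys.length; i++) {
    var match = pattern.exec(String(data.html[keys[i]]));
    if (match) {
      return match[1].trim();
    }
  }
  return false;
}
```

Parsing the JSON first removes the layers of backslash escaping, so the pattern can be written against ordinary markup.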
I want to convert the contents of specific HTML tags to uppercase with plain JavaScript in a React-Native application.
Note: This is a React-Native application. There is no JS document object available, nor jQuery. Likewise, CSS text-transform: uppercase cannot be used because the text will not be displayed in a web browser.
Let's say, there is the following HTML text:
<p>This is an <mytag>simple Example</mytag></p>
The content of the Tag <mytag> shall be transformed to uppercase:
<p>This is an <mytag>SIMPLE EXAMPLE</mytag></p>
I tried this code:
let regEx = storyText.match(/<mytag>(.*?)<\/mytag>/g)
if(regEx) storyText = regEx.map(function(val){
return val.toUpperCase();
});
But the map() function returns only the matched content instead of the whole string variable with the transformed part of <mytag>.
Also, the match() method will return null, if the tag wasn't found. So a fluent programming style like storyText.match().doSomething isn't possible.
Since there are more tags to transform, an approach where I can pass variables to the regex-pattern would be appreciated.
Any hints to solve this?
(This code is used in a React-Native-App with the react-native-html-view Plugin which doesn't support text-transform out of the box.)
Since it seems that document and DOM manipulation (i.e., through jQuery and native JS document functions) are off limits, I guess you do have to use regex.
Then why not just create a function that does a job like the above: looping through each tag and replacing it via regex?
var storyText = "your HTML in a string";
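function tagsToUppercase(tags) {
    // index loop instead of for...in, which would create an implicit global
    for (var i = 0; i < tags.length; i++) {
        // build the pattern per tag name; "g" replaces every occurrence
        var regex = new RegExp("(<" + tags[i] + ">)([^<]+)(</" + tags[i] + ">)", "g");
        storyText = storyText.replace(regex, function(match, g1, g2, g3) {
            return g1 + g2.toUpperCase() + g3;
        });
    }
}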
// uppercase all <div>, <p>, <span> for example
tagsToUppercase(["div", "p", "span"]);
See it working on JSFiddle.
Also, although it probably doesn't apply to this case, (@Bergi urged me to remind you to) try to avoid using regular expressions to manipulate the DOM.
Edit, Updated
The content of the Tag <mytag> shall be transformed to uppercase:
<p>This is an <mytag>SIMPLE EXAMPLE</mytag></p>
You can use String.prototype.replace() with the RegExp /(<mytag>)(.*?)(<\/mytag>)/g to create three capture groups, then call .toUpperCase() on the second capture group:
let storyText = "<p>This is an <mytag>simple Example</mytag></p>";
let regEx = storyText.replace(/(<mytag>)(.*?)(<\/mytag>)/g
, function(val, p1, p2, p3) {
return p1 + p2.toUpperCase() + p3
});
console.log(regEx);
In general you shouldn't be parsing html with javascript. With that in mind, if this is what you truly need to do, then try something like this:
let story = '<p>smallcaps</p><h1>heading</h1><div>div</div><p>stuff</p>';
console.log( story.replace(/<(p|span|div)>([^<]*)<\/(p|span|div)>/ig,
(fullmatch, startag,content,endtag) => `<${startag}>${content.toUpperCase()}</${endtag}>` )
)
Consider the cases where you might have nested values, p inside a div, or an a or strong or em inside your p. For those cases this doesn't work.
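For the inline-nesting case (an em or strong inside the tag), one possible workaround is to uppercase only the text runs and pass any inner tags through unchanged. This is a sketch under assumptions: uppercaseInside is a made-up name, tagName is assumed to be a plain tag name with no regex metacharacters, the opening tag is assumed to have no attributes, and the same tag nested inside itself is still not handled. Character entities such as &amp; would also be uppercased, which breaks them.

```javascript
// Hypothetical helper: uppercases the text inside <tagName>...</tagName>
// while leaving any nested tags (and their attributes) untouched.
function uppercaseInside(html, tagName) {
  var pattern = new RegExp(
    '(<' + tagName + '>)([\\s\\S]*?)(<\\/' + tagName + '>)', 'g');
  return html.replace(pattern, function (match, open, content, close) {
    // Split the content into tags and text runs; only text is uppercased.
    var upper = content.replace(/(<[^>]*>)|([^<]+)/g, function (m, tag, text) {
      return tag ? tag : text.toUpperCase();
    });
    return open + upper + close;
  });
}

console.log(uppercaseInside(
  '<p>This is an <mytag>simple <em class="x">Example</em></mytag></p>',
  'mytag'));
```

Because the function returns a new string rather than mutating a shared variable, it can be called once per tag name in a loop.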
Why not this way ?
$("mytag").text($("mytag").text().toUpperCase())
https://jsfiddle.net/gub61haL/
I realize that there are several similar questions here but none of the answers solve my case.
I need to be able to take the innerHTML of an element and truncate it to a given character length with the text contents of any inner HTML element taken into account and all HTML tags preserved.
I have found several answers that cover this portion of the question fine as well as several plugins which all do exactly this.
However, in all cases the solution will truncate directly in the middle of any inner elements and then close the tag.
In my case I need the contents of all inner tags to remain intact, essentially allowing any "would be" truncated inner tags to exceed the given character limit.
Any help would be greatly appreciated.
EDIT:
For example:
This is an example of a link inside another element
The above is 51 characters long including spaces. If I wanted to truncate this to 23 characters, we would have to shorten the text inside the </a> tag. Which is exactly what most solutions out there do.
This would give me the following:
This is an example of a
However, for my use case I need to keep any remaining visible tags completely intact and not truncated in any way.
So given the above example, the final output I would like, when attempting to truncate to 23 characters is the following:
This is an example of a link
So essentially we are checking where the truncation takes place. If it is outside of an element we can split the HTML string to exactly that length. If on the other hand it is inside an element, we move to the closing tag of that element, repeating for any parent elements until we get back to the root string and split it there instead.
It sounds like you'd like to be able to truncate the length of your HTML string as a text string, for example consider the following HTML:
'<b>foo</b> bar'
In this case the HTML is 14 characters in length and the text is 7. You would like to be able to truncate it to X text characters (for example 2) so that the new HTML is now:
'<b>fo</b>'
Disclosure: My answer uses a library I developed.
You could use the HTMLString library - Docs : GitHub.
The library makes this task pretty simple. To truncate the HTML as we've outlined above (e.g to 2 text characters) using HTMLString you'd use the following code:
var myString = new HTMLString.String('<b>foo</b> bar');
var truncatedString = myString.slice(0, 2);
console.log(truncatedString.html());
EDIT: After additional information from the OP.
The following truncate function truncates to the last full tag and caters for nested tags.
function truncate(str, len) {
// Convert the string to a HTMLString
var htmlStr = new HTMLString.String(str);
// Check the string needs truncating
if (htmlStr.length() <= len) {
return str;
}
// Find the closing tag for the character we are truncating to
var tags = htmlStr.characters[len - 1].tags();
var closingTag = tags[tags.length - 1];
// Find the last character to contain this tag
for (var index = len; index < htmlStr.length(); index++) {
if (!htmlStr.characters[index].hasTags(closingTag)) {
break;
}
}
return htmlStr.slice(0, index);
}
var myString = 'This is an <b>example ' +
'of a link ' +
'inside</b> another element';
console.log(truncate(myString, 23).html());
console.log(truncate(myString, 18).html());
This will output:
This is an <b>example of a link</b>
This is an <b>example of a link inside</b>
Although HTML is notorious for being terribly formed and has edge cases which are impervious to regex, here is a super light way you could hackily handle HTML with nested tags in vanilla JS.
(function(s, approxNumChars) {
    var taggish = /<[^>]+>/g;
    s = s.slice(0, approxNumChars); // ignores tag lengths for solution brevity
    s = s.replace(/<[^>]*$/, ''); // rm any trailing partial tags
    var tags = s.match(taggish) || []; // match() returns null when there are no tags
    // find out which tags are unmatched
    var openTagsSeen = [];
    for (var i = 0; i < tags.length; i++) {
        var tag = tags[i];
        if (tag.charAt(1) !== '/') { // opening tag
            openTagsSeen.push(tag);
        }
        else {
            // quick version that assumes your HTML is correctly formatted (alas) -- else we would have to check the content inside for matches and loop through the opentags
            openTagsSeen.pop();
        }
    }
    // reverse and close unmatched tags
    openTagsSeen.reverse();
    for (var j = 0; j < openTagsSeen.length; j++) {
        s += ('</' + openTagsSeen[j].match(/\w+/)[0] + '>');
    }
    return s + '...';
})
In a nutshell: truncate it (ignores that some chars will be invisible), regex match the tags, push open tags onto a stack, and pop off the stack as you encounter closing tags (again, assumes well-formed); then close any still-open tags at the end.
(If you want to actually get a certain number of visible characters, you can keep a running counter of how many non-tag chars you've seen so far, and stop the truncation when you fill your quota.)
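The running-counter variant could look roughly like this. It is a sketch with the same caveats as above (blind, lexical, assumes well-formed HTML); truncateVisible and the void-tag list are my own additions for illustration.

```javascript
// Hypothetical helper: keeps at most maxChars visible (non-tag) characters
// and closes whatever tags are still open at the cut.
function truncateVisible(html, maxChars) {
  var tokens = html.match(/<[^>]+>|[^<]+/g) || [];
  var voidTags = /^(?:area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr)$/i;
  var open = [];
  var out = '';
  var seen = 0;
  for (var i = 0; i < tokens.length && seen < maxChars; i++) {
    var token = tokens[i];
    if (token.charAt(0) === '<') {
      var name = (token.match(/^<\/?\s*([\w-]+)/) || [])[1];
      if (token.charAt(1) === '/') {
        open.pop(); // closing tag: assumes it matches the last open one
      } else if (name && !voidTags.test(name) && !/\/>$/.test(token)) {
        open.push(name); // opening tag that will need a closing tag
      }
      out += token;
    } else {
      // Text run: take only as many characters as the quota allows.
      var piece = token.slice(0, maxChars - seen);
      seen += piece.length;
      out += piece;
    }
  }
  while (open.length) {
    out += '</' + open.pop() + '>';
  }
  return out;
}

console.log(truncateVisible('<b>foo</b> bar', 2)); // <b>fo</b>
```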
DISCLAIMER: You shouldn't use this as a production solution, but if you want a super light, personal, hacky solution, this will get basic well-formed HTML.
Since it's blind and lexical, this solution misses a lot of edge cases, including tags that should not be closed, like <img>, but you can hardcode those edge cases or, you know, include a lib for a real HTML parser if you want. Fortunately, since HTML is poorly formed, you won't see it ;)
You've tagged your question regex, but you cannot reliably do this with regular expressions. Obligatory link. So innerHTML is out.
If you're really talking characters, I don't see a way to do it other than to loop through the nodes within the element, recursing into descendant elements, totalling up the lengths of the text nodes you find as you go. When you find the point where you need to truncate, you truncate that text node and then remove all following ones — or probably better, you split that text node into two parts (using splitText) and move the second half into a display: none span (using insertBefore), and then move all subsequent text nodes into display: none spans. (This makes it much easier to undo it.)
Thanks to T.J. Crowder I soon came to the realization that the only way to do this with any kind of efficiency is to use the native DOM methods and iterate through the elements.
I've knocked up a quick, reasonably elegant function which does the trick.
function truncate(rootNode, max){
//Text method for cross browser compatibility
var text = ('innerText' in rootNode)? 'innerText' : 'textContent';
//If total length of characters is less than the limit, short circuit
if(rootNode[text].length <= max){ return; }
var cloneNode = rootNode.cloneNode(true),
currentNode = cloneNode,
//Create DOM iterator to loop only through text nodes
ni = document.createNodeIterator(currentNode, NodeFilter.SHOW_TEXT),
frag = document.createDocumentFragment(),
len = 0;
//loop through text nodes
while (currentNode = ni.nextNode()) {
//if nodes parent is the rootNode, then we are okay to truncate
if (currentNode.parentNode === cloneNode) {
//if we are in the root textNode and the character length exceeds the maximum, truncate the text, add to the fragment and break out of the loop
if (len + currentNode[text].length > max){
currentNode[text] = currentNode[text].substring(0, max - len);
frag.appendChild(currentNode);
break;
}
else{
frag.appendChild(currentNode);
}
}
//If not, simply add the node to the fragment
else{
frag.appendChild(currentNode.parentNode);
}
//Track current character length
len += currentNode[text].length;
}
rootNode.innerHTML = '';
rootNode.appendChild(frag);
}
This could probably be improved, but from my initial testing it is very quick, probably due to using the native DOM methods and it appears to do the job perfectly for me. I hope this helps anyone else with similar requirements.
DISCLAIMER: The above code will only deal with one level deep HTML tags, it will not deal with tags inside tags. Though it could easily be modified to do so by keeping track of the nodes parent and appending the nodes to the correct place in the fragment. As it stands, this is fine for my requirements but may not be useful to others.
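For comparison, the rule from the question (cut exactly when outside an element, otherwise run on to the closing tag of the outermost enclosing element) can also be sketched on the HTML string with a depth counter, which copes with nested tags. This is not the DOM approach above; it is lexical, assumes well-formed markup, and truncateKeepingElements is a made-up name. The href="#" in the example is also an assumption, since the original markup of the example is not shown.

```javascript
// Hypothetical helper: truncates top-level text at max visible characters,
// but never cuts inside an element; an element that is open when the limit
// is reached is kept whole.
function truncateKeepingElements(html, max) {
  var tokens = html.match(/<[^>]+>|[^<]+/g) || [];
  var voidTags = /^(?:area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr)$/i;
  var depth = 0;
  var seen = 0;
  var out = '';
  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i];
    if (token.charAt(0) === '<') {
      if (seen >= max && depth === 0) break; // limit hit outside any element
      if (token.charAt(1) === '/') {
        depth = Math.max(0, depth - 1);
      } else {
        var name = (token.match(/^<\s*([\w-]+)/) || [])[1];
        if (name && !voidTags.test(name) && !/\/>$/.test(token)) depth++;
      }
      out += token;
    } else if (depth > 0) {
      out += token; // inside an element: keep the text intact
      seen += token.length;
    } else {
      if (seen >= max) break;
      var piece = token.slice(0, max - seen); // top-level text: cut exactly
      out += piece;
      seen += piece.length;
    }
  }
  return out;
}

var sample = 'This is an example of <a href="#">a link</a> inside another element';
console.log(truncateKeepingElements(sample, 23));
```

With the limit of 23 from the question, the cut lands inside the link, so the whole link is kept and the output ends at its closing tag.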
I want to remove HTML tags from a given string using JavaScript. I looked into the current approaches, but some unsolved problems occurred with them.
Current solutions
(1) Using JavaScript: create a virtual div element and read its text
function remove_tags(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent||tmp.innerText;
}
(2) Using regex
function remove_tags(html)
{
return html.replace(/<(?:.|\n)*?>/gm, '');
}
(3) Using JQuery
function remove_tags(html)
{
return jQuery(html).text();
}
These three solutions work correctly, but if the string is like this
<div> hello <hi all !> </div>
the stripped string is
hello . But I need to remove only the HTML tags, giving hello <hi all !>
Edited: The background is that I want to remove all HTML tags from user input for a particular text area, but I want to allow users to enter <hi all> kind of text. The current approaches remove any content enclosed in <>.
Using a regex might not be a problem if you consider a different approach. For instance, looking for all tags, and then checking to see if the tag name matches a list of defined, valid HTML tag names:
var protos = document.body.constructor === window.HTMLBodyElement;
validHTMLTags =/^(?:a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|data|datalist|dd|del|details|dfn|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hgroup|hr|html|i|iframe|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|nav|nobr|noframes|noscript|object|ol|optgroup|option|output|p|param|plaintext|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)$/i;
function sanitize(txt) {
var // This regex normalises anything between quotes
normaliseQuotes = /=(["'])(?=[^\1]*[<>])[^\1]*\1/g,
normaliseFn = function ($0, q, sym) {
return $0.replace(/</g, '&lt;').replace(/>/g, '&gt;');
},
replaceInvalid = function ($0, tag, off, txt) {
var
// Is it a valid tag?
invalidTag = protos &&
document.createElement(tag) instanceof HTMLUnknownElement
|| !validHTMLTags.test(tag),
// Is the tag complete?
isComplete = txt.slice(off+1).search(/^[^<]+>/) > -1;
return invalidTag || !isComplete ? '&lt;' + tag : $0;
};
txt = txt.replace(normaliseQuotes, normaliseFn)
.replace(/<(\w+)/g, replaceInvalid);
var tmp = document.createElement("DIV");
tmp.innerHTML = txt;
return "textContent" in tmp ? tmp.textContent : tmp.innerHTML;
}
Working Demo: http://jsfiddle.net/m9vZg/3/
This works because browsers parse '>' as text if it isn't part of a matching '<' opening tag. It doesn't suffer the same problems as trying to parse HTML tags using a regular expression, because you're only looking for the opening delimiter and the tag name, everything else is irrelevant.
It's also future proof: the WebIDL specification tells vendors how to implement prototypes for HTML elements, so we try and create a HTML element from the current matching tag. If the element is an instance of HTMLUnknownElement, we know that it's not a valid HTML tag. The validHTMLTags regular expression defines a list of HTML tags for older browsers, such as IE 6 and 7, that do not implement these prototypes.
If you want to keep invalid markup untouched, regular expressions are your best bet. Something like this might work:
text = html.replace(/<\/?(span|div|img|p...)\b[^<>]*>/g, "")
Expand (span|div|img|p...) into a list of all tags (or only those you want to remove). NB: the list must be sorted by length, longer tags first!
This may provide incorrect results in some edge cases (like attributes with <> characters), but the only real alternative would be to program a complete html parser by yourself. Not that it would be extremely complicated, but might be an overkill here. Let us know.
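A small sketch of the expanded version, building the pattern from an array of tag names. stripKnownTags and the short tag list are only for illustration; supply the full list you want removed. The same edge cases apply (attributes containing < or >, for instance).

```javascript
// Hypothetical helper: removes only the tags named in tagNames and leaves
// any other <...> text, such as "<hi all !>", in place.
function stripKnownTags(html, tagNames) {
  var pattern = new RegExp(
    '<\\/?(?:' + tagNames.join('|') + ')\\b[^<>]*>', 'gi');
  return html.replace(pattern, '');
}

var tags = ['span', 'div', 'img', 'pre', 'p'];
console.log(stripKnownTags('<div> hello <hi all !> </div>', tags));
```

For the question's input this leaves the text hello <hi all !> with its surrounding spaces, since hi is not in the list.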
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
Here is my solution:
function removeTags(){
var txt = document.getElementById('myString').value;
var rex = /(<([^>]+)>)/ig;
alert(txt.replace(rex , ""));
}
I use a regular expression to prevent HTML tags in my textarea.
Example
<form>
<textarea class="box"></textarea>
<button>Submit</button>
</form>
<script>
$(".box").focusout( function(e) {
var reg =/<(.|\n)*?>/g;
if (reg.test($('.box').val()) == true) {
alert('HTML tags are not allowed');
}
e.preventDefault();
});
</script>
<script type="text/javascript">
function removeHTMLTags() {
var str="<html><p>I want to remove HTML tags</p></html>";
alert(str.replace(/<[^>]+>/g, ''));
}</script>
Let's say we have a paragraph:
<p>txt0</p>
... and we want to append some text:
$("p").append("_txt1");
$("p").append("_txt2");
$("p").append("_txt3");
The result will be, as expected:
txt0_txt1_txt2_txt3
However if we inspect the result in browser, what it really is:
<p>
"txt0"
"_txt1"
"_txt2"
"_txt3"
</p>
There are 4 different strings that are only rendered as one. The problem is I'm creating a flash object dynamically and appending strings this way will not work because of the quotes. I really need it to be one continuous string like this:
<p>
txt0_txt1_txt2_txt3
</p>
Is there a way to append in such a way? Or to remove all the quotes afterwards?
PS. before you say to make one big string before appending, that won't work because the string is too big and for ex. it works in Chrome but not in Firefox or IExplorer (but that's a different issue).
Use text, otherwise you're appending a new TextNode every time:
var $p = $('p');
$p.text('txt0');
$p.text($p.text() + '_txt1');
Or with textContent it's less confusing:
var p = $('p')[0];
p.textContent += 'txt0';
p.textContent += '_txt1';
...
You can manipulate the html inside the p-tag this way:
$('p').html($('p').html() + '_text');
My solution is similar to @DonnyDee's:
$("p").append("_txt1");
$("p").append("_txt2");
$("p").append("_txt3");
$("p").html($("p").html());
Somehow .html knows how to remove the quotation marks where there are two strings together, i.e. "txt0" "_txt1" etc.
afaik .text() is destructive, i.e. will replace html with text which is perhaps not what we want, although I agree for this example it would suffice.
$('p').html(function(index, oldHTML) { return oldHTML + '_text';});
looks like a good solution though. e.g.
$("p").append("_txt1");
$("p").append("_txt2");
$("p").append("_txt3");
$("p").html(function(index, oldHTML) { return oldHTML; });