Convert an HTML node to a one-line string (minify) - JavaScript

I have a DOM node in a variable, and I want to remove all line breaks and tabs between the HTML tags. Basically, I want to minify it without using an external library. How can I do it?
var target = document.getElementById('myid');
var wrap = document.createElement('div');
wrap.appendChild(target.cloneNode(true));
Now wrap contains a copy of the node.

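Not elegant, but it should work. Each newline or tab becomes a space. Note that reassigning innerHTML rebuilds the child nodes, so any event listeners attached to them are lost.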
target.innerHTML = target.innerHTML.replace(/\n|\t/g, ' ');

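You could replace the line breaks with an empty string. target is an element, not a string, so apply replace to its innerHTML:
target.innerHTML = target.innerHTML.replace(/(\r\n|\n|\r)/g, "");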
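If the goal is only to drop the whitespace between tags (rather than every newline, including those inside text), a regex on the serialized markup is a small alternative. This is a minimal sketch, not a full minifier; minifyBetweenTags is a hypothetical name. It assumes whitespace between tags is insignificant, which is not always true: it would also remove the space in <b>a</b> <i>b</i> and can alter the content of <pre> elements.

```javascript
// Removes whitespace (spaces, tabs, \r, \n) that sits between a '>' and the next '<',
// then trims leading/trailing whitespace. Whitespace-only gaps between tags disappear;
// text that contains visible characters is left as-is.
function minifyBetweenTags(html) {
  return html.replace(/>\s+</g, '><').trim();
}
```

In the browser you would call it on the serialized wrapper, e.g. var oneLine = minifyBetweenTags(wrap.innerHTML);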

Related

Parse contents of script tags inside string

Let's say I have the following string:
var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>"
I would like to use split to get an array with the contents of the script tags, e.g. I want my output to be:
["console.log('hello')", "console.log('world')"]
I tried myString.split(/[<script></script>]/), but did not get the expected output.
Any help is appreciated.
You can't reliably parse (X)HTML with regex.
Instead, you can parse it using innerHTML.
var element = document.createElement('div');
element.innerHTML = myString; // Parse HTML properly (but unsafely)
However, this is not safe. Even though innerHTML doesn't run the JS inside script elements, malicious strings can still run arbitrary JS, e.g. with <img src="//" onerror="alert()">.
To avoid that problem, you can use DOMImplementation.createHTMLDocument to create a new document, which can be used as a sandbox.
var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly
Alternatively, newer browsers support DOMParser:
var doc = new DOMParser().parseFromString(myString, 'text/html');
Once the HTML string has been parsed to the DOM, you can use DOM methods like getElementsByTagName or querySelectorAll to get all the script elements.
var scriptElements = doc.getElementsByTagName('script');
Finally, [].map can be used to obtain an array with the textContent of each script element.
var arrayScriptContents = [].map.call(scriptElements, function(el) {
return el.textContent;
});
The full code would be:
var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly
var arrayScriptContents = [].map.call(doc.getElementsByTagName('script'), function(el) {
return el.textContent;
});
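In browsers with ES2015 support, the same idea can be written with querySelectorAll and Array.from, whose optional second argument maps each element as it is copied into the new array. A minimal sketch; scriptContents is a hypothetical helper that takes the already-parsed document (from createHTMLDocument or DOMParser) as its argument:

```javascript
// Returns the source text of every <script> element in a parsed document.
// doc is assumed to be a Document, e.g. from new DOMParser().parseFromString(...).
function scriptContents(doc) {
  return Array.from(doc.querySelectorAll('script'), function (el) {
    return el.textContent;
  });
}
```

For example, scriptContents(new DOMParser().parseFromString(myString, 'text/html')) should give ["console.log('hello')", "console.log('world')"] for the string in the question.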
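JavaScript code:
function myFunction() {
var str = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";
// With the g flag, str.match() returns whole matches (tags included), so collect capture group 1 with exec() instead.
// [\s\S] also matches newlines inside a script body, which . does not.
var re = /<script\b[^>]*>([\s\S]*?)<\/script>/g;
var contents = [], m;
while ((m = re.exec(str)) !== null) contents.push(m[1]);
console.log(contents); // ["console.log('hello')", "console.log('world')"]
}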
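You have to escape the forward slash like so: \/. Because the pattern is a capturing group, split keeps the tags in the result, so each script body is the element right after a '<script>' entry:
var parts = myString.split(/(<script>|<\/script>)/);
var contents = parts.filter(function (part, i) { return parts[i - 1] === '<script>'; });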

Can jQuery work with HTML strings that are not in the DOM?

I have an HTML string that I created with a template.
This string contains an HTML table with a bunch of rows. I'd like to manipulate it with jQuery, for example to add classes to some rows based on some logic, or do other manipulation, and then have jQuery return a string. However, it seems that jQuery only manipulates the DOM, and I don't want to insert this string into the DOM yet.
var origString = "<table><tr id='bla'>...more html inside here...</tr></table>";
//Something like
var newString = $(origString -> '#bla').addClass('blaClass');
// this syntax is obviously wrong, but what I mean is I'm trying
// to look inside the string not the dom
Or maybe it's better to insert this string into an invisible div first and then manipulate it with jQuery?
Parse it to a variable, manipulate, then append:
var origString = "<table><tr id='bla'>...";
origString = $.parseHTML(origString);
$(origString).find("tr").addClass("test");
$("body").append(origString);
Concept demo: http://jsfiddle.net/6bkUv/
Yeah, you can add a class without appending it to the DOM.
var origString = "<table><tr id='bla'>...more html inside here...</tr></table>",
newString = $('<div>'+origString+'</div>');
newString.find('#bla').addClass('blaClass');
console.log(newString.html());
Yes, you can definitely manipulate a string with jQuery. Here is what the following code does:
Declares a div to wrap the string in
Wraps the string in the div and does the manipulation
Finally, produces the manipulated string
No interaction with the page's DOM whatsoever: the wrapper div is never attached to the document.
var htmlString = "<table><tr id='bla'>...";
var div = $('<div/>');
div.html( htmlString ).find( '#bla' ).addClass( 'class' );
var newString = div.html();
//OUTPUT
Original: <table><tr id='bla'><td></td></tr></table>
New: <table><tbody><tr id="bla" class="class"><td></td></tr></tbody></table>
NOTE: if your table string does not have a tbody element, one is added when the string is parsed, because the browser's HTML parser wraps bare table rows in a tbody.
The answers were too complicated. The answer is just a dollar sign and some parentheses:
var queryObj = $(str);
So
var str = "<table><tr>...</tr></table>"
var queryObj = $(str);
queryObj.find('tr').addClass('yoyo!');
// if you use 'find' make sure your original html string is a container
// in this case it was a 'table' container
$("body").append(queryObj);
works just fine.

How to insert HTML entities with createTextNode?

How can I add a special symbol from JS to a node somewhere?
I tried it as a text node, but the entity wasn't parsed:
var dropdownTriggerText = document.createTextNode('blabla ∧');
You can't create text nodes from HTML entities. Your alternatives would be to use Unicode escapes
var dropdownTriggerText = document.createTextNode('blabla \u0026');
or to set the element's innerHTML. You can, of course, directly input &...
createTextNode is supposed to take any text input and insert it into the DOM exactly as it is. This makes it impossible to insert, for example, HTML elements or HTML entities. It’s actually a feature, so you don’t need to escape these first. Instead, you just operate on the DOM to insert text nodes.
So, you can actually just use the & symbol directly:
var dropdownTriggerText = document.createTextNode('blabla &');
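I couldn't find an automated way to do this, so I made a function:
// render HTML as text for inserting into text nodes
// Caution: only pass trusted strings; assigning innerHTML, even on a detached div,
// can still run handlers such as <img src="//" onerror="...">.
function renderHTML(txt) {
var tmpDiv = document.createElement("div");
tmpDiv.innerHTML = txt;
return tmpDiv.innerText || tmpDiv.textContent || txt;
}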
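If the text only contains numeric character references (such as &#8743; or &#x2227;), they can also be decoded in plain JavaScript without touching the DOM, which sidesteps the innerHTML risk. A sketch under that assumption; decodeNumericEntities is a hypothetical name, and named entities like &amp; or &and; are deliberately left alone because decoding them needs a lookup table:

```javascript
// Hypothetical helper: turns numeric character references (&#8743; or &#x2227;)
// into the characters they stand for. Named entities are NOT handled.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of letting fromCodePoint throw.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}
```

The result can then go straight into a text node, e.g. document.createTextNode(decodeNumericEntities('blabla &#8743;')) shows "blabla ∧".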

How to remove only HTML tags in a string using JavaScript

I want to remove HTML tags from a given string using JavaScript. I looked into existing approaches, but they have some unsolved problems.
Current solutions
(1) Using JavaScript: create a virtual div element and get its text
function remove_tags(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent||tmp.innerText;
}
(2) Using regex
function remove_tags(html)
{
return html.replace(/<(?:.|\n)*?>/gm, '');
}
(3) Using jQuery
function remove_tags(html)
{
return jQuery(html).text();
}
These three solutions work correctly, but if the string looks like this
<div> hello <hi all !> </div>
the stripped string is just
hello. But I only want to remove HTML tags, so the result should be hello <hi all !>
Edit: Background: I want to remove all HTML tags that users type into a particular text area, but I want to allow them to enter text like <hi all>. The current approaches remove any content enclosed in <>.
Using a regex might not be a problem if you take a different approach: look for all tags, then check whether each tag name matches a list of defined, valid HTML tag names:
var protos = document.body.constructor === window.HTMLBodyElement;
var validHTMLTags = /^(?:a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|data|datalist|dd|del|details|dfn|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hgroup|hr|html|i|iframe|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|nav|nobr|noframes|noscript|object|ol|optgroup|option|output|p|param|plaintext|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)$/i;
function sanitize(txt) {
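var // This regex matches quoted attribute values, so any < or > inside them can be escaped
// ([^\1] would not work here: inside a character class \1 is not a backreference)
normaliseQuotes = /=("[^"]*"|'[^']*')/g,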
normaliseFn = function ($0, q, sym) {
return $0.replace(/</g, '&lt;').replace(/>/g, '&gt;');
},
replaceInvalid = function ($0, tag, off, txt) {
var
// Is it a valid tag?
invalidTag = protos &&
document.createElement(tag) instanceof HTMLUnknownElement
|| !validHTMLTags.test(tag),
// Is the tag complete?
isComplete = txt.slice(off+1).search(/^[^<]+>/) > -1;
return invalidTag || !isComplete ? '&lt;' + tag : $0;
};
txt = txt.replace(normaliseQuotes, normaliseFn)
.replace(/<(\w+)/g, replaceInvalid);
var tmp = document.createElement("DIV");
tmp.innerHTML = txt;
return "textContent" in tmp ? tmp.textContent : tmp.innerHTML;
}
Working Demo: http://jsfiddle.net/m9vZg/3/
This works because browsers parse '>' as text if it isn't part of a matching '<' opening tag. It doesn't suffer the same problems as trying to parse HTML tags using a regular expression, because you're only looking for the opening delimiter and the tag name; everything else is irrelevant.
It's also future-proof: the WebIDL specification tells vendors how to implement prototypes for HTML elements, so we try to create an HTML element from the currently matched tag. If the element is an instance of HTMLUnknownElement, we know that it's not a valid HTML tag. The validHTMLTags regular expression defines a list of HTML tags for older browsers, such as IE 6 and 7, that do not implement these prototypes.
If you want to keep invalid markup untouched, a regular expression is your best bet. Something like this might work:
text = html.replace(/<\/?(span|div|img|p...)\b[^<>]*>/g, "")
Expand (span|div|img|p...) into a list of all tags (or only those you want to remove). The \b after the group stops a short name from matching a longer tag (p will not match <pre>), so the order of the alternatives doesn't matter.
This may give incorrect results in some edge cases (such as attributes containing < or > characters), but the only real alternative would be to write a complete HTML parser yourself. That wouldn't be extremely complicated, but it might be overkill here. Let us know.
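A runnable version of that idea, with a deliberately short, hypothetical list of tag names (extend TAGS_TO_STRIP as needed). Unlike the generic /<[^>]+>/ patterns, it leaves text such as <hi all !> in place, because hi is not in the list. It shares the edge cases mentioned above:

```javascript
// Hypothetical list of tag names to remove; anything else between < and > is kept as text.
var TAGS_TO_STRIP = ['b', 'br', 'div', 'em', 'i', 'img', 'p', 'span', 'strong'];

// Matches opening, closing and self-closing tags whose name is in the list (case-insensitive).
// \b stops "p" from matching "<pre>", and [^<>]* skips over the attributes.
var STRIP_RE = new RegExp('<\\/?(?:' + TAGS_TO_STRIP.join('|') + ')\\b[^<>]*>', 'gi');

function stripKnownTags(html) {
  // replace() with a global regex starts from index 0 each call, so reusing STRIP_RE is safe.
  return html.replace(STRIP_RE, '');
}
```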
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
Here is my solution:
function removeTags(){
var txt = document.getElementById('myString').value;
var rex = /(<([^>]+)>)/ig;
alert(txt.replace(rex , ""));
}
I use a regular expression to prevent HTML tags in my textarea.
Example
<form>
<textarea class="box"></textarea>
<button>Submit</button>
</form>
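<script>
// Check on submit: focusout is not cancelable, so preventDefault() there cannot stop
// the form from being sent; on the form's submit event it can.
$("form").on("submit", function (e) {
var reg = /<(.|\n)*?>/;
if (reg.test($(".box").val())) {
alert("HTML tags are not allowed");
e.preventDefault();
}
});
</script>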
<script type="text/javascript">
function removeHTMLTags() {
var str="<html><p>I want to remove HTML tags</p></html>";
alert(str.replace(/<[^>]+>/g, ''));
}</script>

Add an HTML string to the DOM as an element

Is there any way to create a string and add it to the DOM, with JavaScript understanding the elements in the string?
I tried the code below, and the 4th line gives an error:
var bmdiv = document.createElement('div');
bmdiv.setAttribute('id', 'myDiv');
var str = "<b>aa</b>";
bmdiv.innerHTML(str);
I need to add several tags from str to the div myDiv.
I can't use jQuery, since the script will not load it.
Thanks.
The innerHTML property is not a function, you should assign it like this:
var bmdiv = document.createElement('div');
bmdiv.setAttribute('id', 'myDiv');
var str = "<b>aa</b>";
bmdiv.innerHTML = str;
document.body.appendChild(bmdiv); // attach the div so it is actually part of the page
Try
bmdiv.innerHTML = str;
Another way to do this is to manually create the DOM structure for each of the tags, then append them to the div.
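A minimal sketch of that approach, using a hypothetical buildElement helper. Strings passed as children become text nodes, so characters such as < or & in them are shown literally rather than parsed as markup. The document is passed in as a parameter only so the helper is easy to exercise outside a page; in a browser you would pass document:

```javascript
// Hypothetical helper: creates <tag> with DOM methods and appends the given children.
// String children become text nodes (never parsed as HTML); other children are
// assumed to be DOM nodes and are appended as-is.
function buildElement(doc, tag, children) {
  var el = doc.createElement(tag);
  (children || []).forEach(function (child) {
    el.appendChild(typeof child === 'string' ? doc.createTextNode(child) : child);
  });
  return el;
}
```

For the question's example:
var bmdiv = buildElement(document, 'div', [buildElement(document, 'b', ['aa'])]);
bmdiv.id = 'myDiv';
document.body.appendChild(bmdiv);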
