js: how to simplify an HTML string - javascript

Is there any way to simplify an HTML string, for example by removing all redundant tags from it?
For instance:
Source HTML:
<div><span><span>1</span></span><span>2</span></div>
Expected output:
<div><span>12</span></div>
(or even less)
<div>12</div>
I know some libraries such as quilljs can do this, but it's a huge library and overkill for my case.
Also, https://github.com/htacg/tidy-html5 is close to what I want, but it does not have a JS release.

You can try using the DOMParser:
let s = `<div><span><span>1</span></span><span>2</span></div>`
let d = new DOMParser()
let doc = d.parseFromString(s, 'application/xml')
let tag = doc.children[0].tagName
let text = doc.children[0].textContent
let result = `<${tag}>${text}</${tag}>`
console.log(result)

Please refer to the code below; it may help you go further.
var childs = document.querySelectorAll("div#parent")
var tmpTexts = []
for (const c of childs) {
if (tmpTexts.includes(c.innerText)) continue
tmpTexts.push((c.innerText).trim())
c.parentNode.removeChild(c)
}
const tmpTextArr = tmpTexts[0].split('\n'); // declared explicitly to avoid an implicit global
console.log(tmpTextArr);
const para = document.createElement("div");
tmpTextArr.forEach(function(text) {
var node = document.createElement("div");
var nodeTxt = document.createTextNode(text);
node.appendChild(nodeTxt);
para.appendChild(node)
});
document.body.appendChild(para);
https://jsfiddle.net/Frangly/pnLgr8ym/66/
Each line of text collected in tmpTexts gets its own div tag.
Create a new container element, iterate over the split lines, and append a div for each one.

Related

Wrapping a string in a div in pure javascript

Is there a way in pure javascript to wrap a naked string?
I have a string that I'm splitting based on a character to separate the header from the rest of the content. I would very much like to style that header, but I can't seem to find a good way to wrap a div around it with a class.
All I can seem to find is wrapping a div around something that already has other elements.
My code looks like this
var string = "Title*This is the very long content";
var title = string.split('*')[0]
var body = string.split('*')[1]
//put them back together
string = title + body;
but I can't seem to find a good way to wrap a div around it with a class?
You can create an element (which ends up as an HTML tag) with createElement and give it a class with className:
let string = "Title*This is the very long content";
/*
let title = string.split('*')[0]
let body = string.split('*')[1]
*/
let [title, body] = string.split('*'); // Destructuring assignment
let headerTitle = document.createElement('h1');
headerTitle.textContent = title;
headerTitle.className = "red"; // or: headerTitle.classList.add('red');
let bodyHTML = document.createElement('p');
bodyHTML.textContent = body;
// Append the elements themselves; concatenating their innerHTML would drop the h1 tag and its class
document.querySelector('#content').append(headerTitle, bodyHTML);
.red{
color: red;
}
<div id="content"></div>
Tip: avoid the var keyword for declaring variables; use let or const instead, and prefer destructuring assignment where it fits.
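One caveat with the destructuring above: if the content itself contains the separator, string.split('*') produces more than two parts and everything after the second '*' is silently dropped. A minimal sketch of splitting only at the first separator follows; splitOnce is a hypothetical helper name and the sample string is made up for illustration.

```javascript
// Hypothetical helper: split `text` at the first occurrence of `separator` only.
function splitOnce(text, separator) {
  const index = text.indexOf(separator);
  // No separator found: treat the whole string as the title, with an empty body.
  if (index === -1) return [text, ''];
  return [text.slice(0, index), text.slice(index + separator.length)];
}

// Made-up sample where the body itself contains the separator.
const [title, body] = splitOnce('Title*This is the *very* long content', '*');
console.log(title); // Title
console.log(body);  // This is the *very* long content
```

The two resulting strings can then be assigned to textContent exactly as in the answer above.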

How to call insertAdjacentElement multiple times

How do I call insertAdjacentElement multiple times, like below?
test.insertAdjacentElement('afterend',elm1);
test.insertAdjacentElement('afterend',elm2);
test.insertAdjacentElement('afterend',elm3);
I can make a function to refactor it, but is there any shortcut to doing this? Like:
test.insertAdjacentElement('afterend',elm1, elm2, elm3);
If you're inserting at the end of a container, you can use append, which accepts multiple arguments.
containerOfTest.append(elm1, elm2, elm3)
Otherwise, a very simple loop would do.
for (const elm of [elm1, elm2, elm3]) {
test.insertAdjacentElement('afterend',elm);
}
Another option is to create a DocumentFragment, insert all elements into it, and then insert the fragment (only once) into the DOM.
const test = document.querySelector('.test');
const elm1 = document.createElement('div');
const elm2 = document.createElement('div');
const elm3 = document.createElement('div');
const fragment = new DocumentFragment();
fragment.append(elm1, elm2, elm3);
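// nextSibling (not nextElementSibling) so the fragment lands directly after test, even before a text node
test.parentElement.insertBefore(fragment, test.nextSibling);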
console.log(document.body.innerHTML);
<div class="test">test</div>
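For the specific case of inserting several nodes right after an element, the DOM also provides after(), which accepts any number of nodes and inserts them in the order given. Note that this differs from repeated insertAdjacentElement('afterend', ...) calls on the same element, which leave the inserted elements in reverse order. A minimal sketch, where insertAllAfter is a hypothetical wrapper name:

```javascript
// Hypothetical wrapper around the standard after() method.
// `target` is expected to be a DOM element (anything exposing after()).
function insertAllAfter(target, ...nodes) {
  target.after(...nodes); // nodes end up right after target, in argument order
}

// Browser usage, with the names from the question:
//   insertAllAfter(test, elm1, elm2, elm3);
//   // which is the same as calling: test.after(elm1, elm2, elm3);
```

In practice the wrapper is unnecessary; calling test.after(elm1, elm2, elm3) directly is the shortcut.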

Using getElementsByTagName to find all hrefs in a variable

I'm holding HTML source code in a variable, which I obtained from a database. I'd like to search this content for all the "a href" attributes and list them in a table.
I've found here how to search the DOM (like below), but how do I use it to search within a variable?
var links = document.getElementsByTagName("a").getElementsByAttribute("href");
I currently have this, which searches by RegEx, but it doesn't work very well:
matches_temp = result_content.match(/\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’&quote]))/ig);
In result_content I'm holding that HTML Source.
getElementsByTagName returns a collection of elements, which has no method called getElementsByAttribute. The DOM-based versions further down work ONLY if you have DOM access.
Without a DOM (for example, in Node.js):
const hrefRe = /href="(.*?)"/g;
const urlRe = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’&quote]))/ig;
const stringFromDB = `000
Something something 001 something`
stringFromDB.match(hrefRe).forEach(
(href) => console.log(href.match(urlRe)[0] )
);
// oldschool:
// stringFromDB.match(hrefRe).forEach(function(href) { console.log(href.match(urlRe)[0] ) });
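As a self-contained illustration of the no-DOM approach, here is a minimal sketch that pulls the href values out of anchor tags with a single regex. The sample markup, the URLs, and the extractHrefs name are all assumptions made up for this example; a regex like this is only a rough filter and will not handle every piece of valid HTML (unquoted attribute values, for instance).

```javascript
// Hypothetical sample; the markup and URLs are made up for illustration.
const html = `<p><a href="https://example.com/a">000</a>
Something something <a class="x" href='https://example.org/b?x=1'>001</a> something
<a name="top">anchor without href</a></p>`;

// Collect the value of every quoted href attribute that sits inside an <a ...> tag.
function extractHrefs(source) {
  // (["']) remembers the opening quote so \1 can require the same closing quote.
  const hrefRe = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;
  return Array.from(source.matchAll(hrefRe), (match) => match[2]);
}

console.log(extractHrefs(html)); // logs the two href values found in the sample
```

Anchors without an href are skipped, which mirrors the a[href] selector used in the DOM-based versions.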
With a DOM: this code creates a DOM snippet first.
It also gets ONLY the anchors that have an href to begin with.
NOTE the use of getAttribute, so the browser does not try to resolve the URL.
With the regex, if you want to match only SPECIFIC types of href:
const re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’&quote]))/ig;
const stringFromDB = `000
001`
let doc = document.createElement("div");
doc.innerHTML = stringFromDB
doc.querySelectorAll("a[href]").forEach(
(x) => console.log(x.getAttribute("href").match(re)[0])
);
Without the regex
const stringFromDB = `000
001`
let doc = document.createElement("div");
doc.innerHTML = stringFromDB
doc.querySelectorAll("a[href]").forEach(
(x) => console.log(x.getAttribute("href"))
);
Firstly, you shouldn't be using RegEx to parse HTML. This answer explains why.
Secondly, getElementsByAttribute is not a standard DOM method, and a method with that name would return elements rather than attribute values. Use querySelectorAll to select all anchors with an href, and then map out the hrefs:
var hrefs = document.querySelectorAll("a[href*=http]");
var test = Array.prototype.slice.call(hrefs).map(e => e.href);
console.log(test);
Example
Example 1
Example 2
Example 3

Problems when parsing nested html tags from string

I have this code that parses a string into HTML and displays the text of each element.
That works well except when I have nested tags, for example <div><p>Element 1</p><p>Element 2</p></div>. In this case, the code displays <p>Element 1</p><p>Element 2</p>.
How can I get each tag one after the other? (Here I want Element 1 and then Element 2.)
Here's the code:
let text = new DOMParser().parseFromString(stringHtml, 'text/html');
let textBody = text.body.firstChild;
while (textBody) {
alert(textBody.innerHTML);
// other actions on the textBody element
textBody = textBody.nextSibling;
}
Thanks for helping me out
It sounds like you want a recursive function that prints the textContent of itself, or of its children, if it has children:
const stringHtml = '<div><p>Element 1</p><p>Element 2</p></div><div><p>Element 3</p><p>Element 4</p></div>';
const doc = new DOMParser().parseFromString(stringHtml, 'text/html');
const showElms = parent => {
const { children } = parent;
if (children.length) Array.prototype.forEach.call(children, showElms);
else console.log(parent.textContent);
}
showElms(doc.body);
That's assuming you want to iterate over the actual elements. If you want all text nodes instead, then recursively iterate over the childNodes instead.
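The text-node variant mentioned above can be sketched as follows. collectTexts is a hypothetical name, and skipping whitespace-only text is an assumption about what is wanted; the function only relies on the childNodes, nodeType, and nodeValue properties that DOM nodes expose.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

// Hypothetical helper: walk childNodes recursively and gather the trimmed
// content of every non-empty text node, in document order.
function collectTexts(node, out = []) {
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      const text = child.nodeValue.trim();
      if (text) out.push(text); // skip whitespace-only text nodes
    } else {
      collectTexts(child, out); // elements (and other node types) are descended into
    }
  }
  return out;
}

// Browser usage, with the string from the answer above:
//   const doc = new DOMParser().parseFromString(stringHtml, 'text/html');
//   collectTexts(doc.body).forEach((text) => console.log(text));
```

Unlike the children-based version, this also picks up text that sits directly next to child elements, such as the "Hello " in <div>Hello <b>World</b></div>.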

How to append an HTML string to a DocumentFragment?

I'm adding textnodes to a documentFragment like this:
var foo = document.createDocumentFragment();
var someText = "Hello World";
foo.appendChild(document.createTextNode(someText));
This works fine, but sometimes I'm being passed "text" which includes inline links like this:
var someOtherText = "Hello <a href='www.world.com'>World</a>";
Which in my handler is converted to hardcoded text instead of a link.
Question:
How do I append an HTML string like the above into a documentFragment? If I'm not using textNodes can this be done using appendChild?
Create a template element, add the text with .innerHTML, and get a DocumentFragment from its content property:
function stringToFragment(string) {
const temp = document.createElement('template');
temp.innerHTML = string;
return temp.content;
}
Now you can create a documentFragment from a string, and you can even append a documentFragment to a documentFragment:
const frag = stringToFragment('<div>Hello</div>');
frag.append(stringToFragment('<div>Stackoverflow</div>'));
document.createRange().createContextualFragment("<span>Hello World!</span>");
It returns a DocumentFragment.
Supported in IE >= 9.
EDIT:
Recent versions of Safari seem to fail with the short way; here is a slightly longer but working way:
var range = document.createRange();
range.selectNode(document.body); // Select the body because there is always one (documentElement fails...)
var fragment = range.createContextualFragment("<span>Hello World!</span>");
This may work:
var foo = document.createDocumentFragment();
var someText = 'Hello World';
var item = document.createElement('span');
item.innerHTML = someText
foo.appendChild(item);
document.body.appendChild(foo);
