Parsing HTML from a string into a document - JavaScript

I'm trying to parse HTML code from a string into a document and then append its nodes to the real DOM one at a time.
After some research I came across this API:
DOMImplementation.createHTMLDocument()
It works great, but if a node has descendants, I need to append the node first, and only once it is in the real DOM should I start appending its descendants. I think I can use
document.importNode(externalNode, deep);
with deep set to false so that only the parent node is copied.
Is my approach right for this case, and how should I preserve the order of the appended nodes so that I don't append the same node twice?
One more problem: in some cases I need to add more HTML at a specific location (for example, after some div node) and then continue appending. Any idea how to do that correctly?

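You can use the DOMParser for that:
const parser = new DOMParser();
const doc = parser.parseFromString('<h1>Hello World</h1>', 'text/html');
// Move the parsed nodes out of the new document's <body>; appending
// doc.documentElement would nest a second <html> element inside your body.
document.body.append(...doc.body.childNodes);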
But if you want to append the same thing multiple times, you will get better performance with a template:
const template = document.createElement('template');
template.innerHTML = '<h1>Hello World</h1>';
// Clone the template's content (a DocumentFragment) once per copy you insert.
document.body.appendChild(template.content.cloneNode(true));
document.body.appendChild(template.content.cloneNode(true));
Hope this helps
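The shallow-import approach described in the question (importNode with deep set to false, each parent appended before its descendants) can be sketched as a depth-first generator. This is a minimal sketch, not code from the answer above: the name appendNodeByNode is hypothetical, and it assumes sourceParent comes from the parsed document (for example doc.body from DOMParser or createHTMLDocument()) while targetParent is already in the live DOM. Because every source node is visited exactly once, in document order, nothing is appended twice and sibling order is preserved.

```javascript
// Hypothetical helper: copies the children of `sourceParent` into
// `targetParent` one node at a time, always parents before their descendants.
// `targetDoc` is the document that should own the copies (usually `document`).
function* appendNodeByNode(sourceParent, targetParent, targetDoc) {
  // Snapshot the child list so the loop is not affected by later DOM changes.
  for (const child of Array.from(sourceParent.childNodes)) {
    // Shallow import (deep = false): the node and its attributes, no descendants.
    const copy = targetDoc.importNode(child, false);
    targetParent.appendChild(copy);
    // Hand control back to the caller; `copy` is already attached at this point.
    yield copy;
    // Only now descend into the source node's children, appending them to `copy`.
    yield* appendNodeByNode(child, copy, targetDoc);
  }
}

// Usage in a browser (assumed context):
// const parsed = new DOMParser().parseFromString(html, 'text/html');
// for (const node of appendNodeByNode(parsed.body, document.body, document)) {
//   // inspect `node`, or pause here before the next one is appended
// }
```

Pulling nodes from the generator lets the caller decide when the next node goes in. For the second part of the question, element.insertAdjacentHTML('afterend', html) is one existing way to add extra HTML directly after a specific element.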

Related

Puppeteer querySelectorAll doesn't get elements properly

I'm using Puppeteer and trying to use document.querySelectorAll to get a list of elements that I can then loop over. However, something seems to be wrong with my code: it returns either nothing, undefined, or an empty {}, even though the elements are on the page. My JS:
let elements = await page.evaluate(() => document.querySelectorAll("div[class^='my-class--']"));
for (let el of Array.from(elements)) {
  // do something
}
What's wrong with my elements and page.evaluate here?
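page.evaluate() runs its callback inside the browser, but only a serializable value (strings, numbers, and plain objects or arrays) can be passed back to Node. A NodeList of DOM elements is not serializable, which is why you get undefined or an empty {} instead of your elements. Node itself has no DOM: if you want the page's HTML on the Node side, you get it as one big string (for example from page.content()), and DOM selectors won't work on a string.
What you can do to solve this is use the Cheerio module, which lets you select elements from that HTML string with jQuery-like syntax, as if it were a parsed DOM.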
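Since page.evaluate() can only hand serializable values back to Node, another fix is to turn the elements into plain data inside the page before returning them. A minimal sketch, assuming `page` is a Puppeteer Page; the helper name getMatchingTexts and the fields it returns are hypothetical choices for illustration. It uses Puppeteer's page.$$eval(), which runs querySelectorAll(selector) in the page and passes the matches as an array to the callback.

```javascript
// Hypothetical helper: returns plain { className, text } objects, which
// survive the trip from the browser back to Node (DOM elements do not).
async function getMatchingTexts(page, selector = "div[class^='my-class--']") {
  // This callback runs in the browser, so it must not use Node-side variables.
  return page.$$eval(selector, (elements) =>
    elements.map((el) => ({
      className: el.className,
      text: el.textContent.trim(),
    }))
  );
}

// Usage (inside an async function, with a Puppeteer page already open):
// for (const { className, text } of await getMatchingTexts(page)) {
//   console.log(className, text);
// }
```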
If you have the HTML as a string, you could also use DOMParser, as in the example below. Note that DOMParser is a browser API, so this has to run in the page (for example, inside page.evaluate()), not in Node.
let doc = new DOMParser().parseFromString('<template class="myClass"><span class="target">check it out</span></template>', 'text/html');
let templateElement = doc.querySelector("template");
// A <template>'s children live in its .content fragment rather than in the
// main tree, so query that fragment instead of re-parsing innerHTML.
let target = templateElement.content.querySelector("span");
console.log([templateElement, target]);

Is DOMParser().parseFromString() worth using?

https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#DOMParser_HTML_extension
It looks like the DOMParser uses innerHTML to add stringified elements to the DOM. What's the advantage of using it?
I have compared the difference between using DOMParser().parseFromString() and using element.innerHTML below. Am I overlooking something?
Using DOMParser
const main = document.querySelector('main');
const newNodeString = '<body><h2>I made it on the page</h2><p>What should I do now?</p><select name="whichAdventure"><option>Chill for a sec.</option><option>Explore all that this page has to offer...</option><option>Run while you still can!</option></select><p>Thanks for your advice!</p></body>';
// Works as expected
let newNode = new DOMParser().parseFromString(newNodeString, 'text/html');
let div = document.createElement('div');
console.log('%cArray.from: ', 'border-bottom: 1px solid yellow;font-weight:1000;');
Array.from(newNode.body.children).forEach((node, index, array) => {
div.appendChild(node);
console.log('length:', array.length, 'index: ', index, 'node: ', node);
})
main.appendChild(div);
Using innerHTML
const main = document.querySelector('main');
const newNodeString = '<h2>I made it on the page</h2><p>What should I do now?</p><select name="whichAdventure"><option>Chill for a sec.</option><option>Explore all that this page has to offer...</option><option>Run while you still can!</option></select><p>Thanks for your advice!</p>';
// Works as expected
let div = document.createElement('div');
div.innerHTML = newNodeString;
main.appendChild(div);
I expect that DOMParser().parseFromString() provides some additional functionality that I'm unaware of.
Well, for one thing, DOMParser can parse XML documents, not just HTML. It also checks that the XML is well formed; if it isn't, parseFromString() does not throw but returns a document containing a <parsererror> element that describes the problem. More to the point, it is the right tool for the job.
I've used it in the past to take an uploaded XML file and produce HTML using a predefined XSLT style sheet, without getting a server involved.
Obviously, if all you're doing is appending the string to an existing DOM, and innerHTML (or outerHTML) works for you, continue using it.
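Because parseFromString() reports malformed XML through a <parsererror> element rather than an exception, callers have to check for it themselves. A minimal sketch of a strict wrapper; parseXmlStrict is a hypothetical name, and the Parser parameter (defaulting to the browser's global DOMParser) exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: parse XML and throw if the parser reported an error.
// `Parser` defaults to the global DOMParser (browser only); it is a parameter
// just so the function can be tested with a stand-in.
function parseXmlStrict(xmlString, Parser = globalThis.DOMParser) {
  const doc = new Parser().parseFromString(xmlString, 'application/xml');
  // Malformed input yields a <parsererror> element instead of an exception.
  const error = doc.getElementsByTagName('parsererror')[0];
  if (error) {
    throw new Error('Malformed XML: ' + error.textContent);
  }
  return doc;
}

// Usage in a browser:
// const doc = parseXmlStrict('<root><item/></root>');
```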

How to execute document.querySelectorAll on a text string without inserting it into DOM

var html = '<p>sup</p>'
I want to run document.querySelectorAll('p') on that text without inserting it into the dom.
In jQuery you can do $(html).find('p')
If that's not possible, what's the cleanest way to do a temporary insert that doesn't interfere with anything, query only that element, and then remove it?
(I'm doing ajax requests and trying to parse the returned html)
In IE 10 and above (and other modern browsers), you can use the DOMParser object to parse a DOM directly from HTML.
var parser = new DOMParser();
var doc = parser.parseFromString(html, "text/html");
var paragraphs = doc.querySelectorAll('p');
You can create a temporary element, add the HTML to it, and run querySelectorAll on it; the element never has to be inserted into the document:
var element = document.createElement('div');
element.insertAdjacentHTML('beforeend', '<p>sup</p>');
element.querySelectorAll('p');
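A third option, sketched here under the assumption that it runs in a browser, is a <template> element: its content is parsed into an inert DocumentFragment, so scripts in the string do not run, images do not load, and nothing has to be inserted into the page or removed afterwards. queryHtmlString is a hypothetical name, and the doc parameter exists only so the function can be tried with a stand-in document.

```javascript
// Hypothetical helper: query an HTML string without touching the live page.
function queryHtmlString(html, selector, doc = globalThis.document) {
  const template = doc.createElement('template');
  // The markup is parsed into template.content, an inert DocumentFragment.
  template.innerHTML = html;
  return template.content.querySelectorAll(selector);
}

// Usage in a browser:
// const paragraphs = queryHtmlString('<p>sup</p>', 'p'); // NodeList with one <p>
```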

Create XML DOM Element while keeping case sensitivity

I'm trying to create the following element node tree:
<v:custProps>
<v:cp v:nameU="Cost"/>
</v:custProps>
with:
newCustprop = document.createElement("v:custProps");
newcp = document.createElement("v:cp");
newcp.setAttribute("v:nameU", "Cost");
newCustprop.appendChild(newcp);
However, document.createElement("v:custProps") generates <v:custprops> instead of <v:custProps>. Is there any way to avoid this lowercasing?
Edit 1:
I'm currently reading this article on nodeName case sensitivity. It's not quite relevant to my problem, though, because my code is wrapped in a <![CDATA[ ]]> section and I'd rather not use .innerHTML.
You need to use createElementNS()/setAttributeNS() and provide the namespace, not only the alias/prefix. The example uses urn:v as the namespace.
var xmlns_v = "urn:v";
var newCustprop = document.createElementNS(xmlns_v, "v:custProps");
var newcp = document.createElementNS(xmlns_v, "v:cp");
newcp.setAttributeNS(xmlns_v, "v:nameU", "Cost");
newCustprop.appendChild(newcp);
var xml = (new XMLSerializer).serializeToString(newCustprop);
xml:
<v:custProps xmlns:v="urn:v"><v:cp v:nameU="Cost"/></v:custProps>
It's not recommended to use document.createElement for qualified names. See if document.createElementNS better serves your purposes.
I still had issues with createElementNS: an extra xmlns attribute ended up in my string when I used new XMLSerializer().serializeToString(xmlDoc).
I ended up using the following function to create elements with case-sensitive tag names:
function createElement(tagName) {
  // Parsing as XML keeps the tag name's case; HTML parsing would lowercase it.
  const doc = new DOMParser().parseFromString(`<${tagName}></${tagName}>`, 'text/xml');
  return doc.documentElement;
}

XML sort by tag in JavaScript

I am trying to convert this XML tree
<IN1>
<IN1.1>
<IN1.1.1>1</IN1.1.1>
</IN1.1>
<IN1.17>
<IN1.17.1>1</IN1.17.1>
</IN1.17>
<IN1.47>
<IN1.47.1>C</IN1.47.1>
</IN1.47>
<IN1.3>
<IN1.3.1>paycode</IN1.3.1>
</IN1.3>
</IN1>
into this
<IN1>
<IN1.1>
<IN1.1.1>1</IN1.1.1>
</IN1.1>
<IN1.3>
<IN1.3.1>paycode</IN1.3.1>
</IN1.3>
<IN1.17>
<IN1.17.1>1</IN1.17.1>
</IN1.17>
<IN1.47>
<IN1.47.1>C</IN1.47.1>
</IN1.47>
</IN1>
My current code is
for each (field in msg['IN1'].children())
{
fields.push(field.toString());
}
fields.sort();
This sorts the last two elements but then re-arranges the first two. What is a good way to approach this?
You might have some luck with the jQuery TinySort plugin, which can sort DOM elements by numeric or alphabetical criteria.
XSLT was created to transform XML from one form to another (jsFiddle):
var xml = "<IN1><IN1.1><IN1.1.1>1</IN1.1.1></IN1.1><IN1.17><IN1.17.1>1</IN1.17.1></IN1.17><IN1.47><IN1.47.1>C</IN1.47.1></IN1.47><IN1.3><IN1.3.1>paycode</IN1.3.1></IN1.3></IN1>";
var xsl = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"><xsl:template match=\"IN1\"><IN1><xsl:apply-templates select=\"*\"><xsl:sort select=\"substring-after(name(), 'IN1.')\" data-type=\"number\"/></xsl:apply-templates></IN1></xsl:template><xsl:template match=\"*\"><xsl:copy-of select=\".\"/></xsl:template></xsl:stylesheet>";
var parser = new DOMParser();
var domToBeTransformed = parser.parseFromString(xml, "text/xml");
var xslt = parser.parseFromString(xsl, "text/xml");
var processor = new XSLTProcessor();
processor.importStylesheet(xslt);
var newDocument = processor.transformToDocument(domToBeTransformed);
var serializer = new XMLSerializer();
var newDocumentXml = serializer.serializeToString(newDocument);
alert(newDocumentXml);
The above code works in Chrome and Firefox; the fiddle has an implementation for IE. The catch with IE is its dependence on ActiveX, but that is all installed with IE, so no "external" libraries were used.
Good luck.
1) Write a recursive algorithm that sorts the children of each node in the tree.
For each node:
2) Gather all of its child elements into a JavaScript array.
3) Sort the array based on custom criteria (here, the numeric parts of the tag names, so that IN1.3 comes before IN1.17).
4) Iterate over the array, re-adding the children to the parent in sorted order.
Note: re-adding a child to the DOM automatically removes it from its previous location.
If you need more help with any of these parts, feel free to ask.
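The steps above can be sketched as follows. This is a minimal sketch under two assumptions: the tree is a standard DOM (for example, the result of DOMParser.parseFromString(xml, 'text/xml')), and the custom criterion compares the dot-separated parts of each tag name, numeric parts as numbers. A plain string comparison, like the question's fields.sort(), compares character by character and therefore puts IN1.17 before IN1.3. Both function names are hypothetical.

```javascript
// Compare names such as "IN1.3" and "IN1.17" part by part: numeric parts
// numerically, other parts as strings. On a tie, the shorter name sorts first.
function compareDottedNames(a, b) {
  const partsA = a.split('.');
  const partsB = b.split('.');
  const length = Math.min(partsA.length, partsB.length);
  for (let i = 0; i < length; i++) {
    const numA = Number(partsA[i]);
    const numB = Number(partsB[i]);
    const diff = Number.isNaN(numA) || Number.isNaN(numB)
      ? partsA[i].localeCompare(partsB[i])
      : numA - numB;
    if (diff !== 0) return diff;
  }
  return partsA.length - partsB.length;
}

// Recursively sort the child elements of `element` by tag name.
function sortChildrenRecursively(element) {
  // Step 2: gather the child elements (a snapshot, since they are reordered below).
  const children = Array.from(element.children);
  // Step 1: recurse so every subtree is sorted as well.
  children.forEach(sortChildrenRecursively);
  // Step 3: sort with the custom comparison.
  children.sort((x, y) => compareDottedNames(x.tagName, y.tagName));
  // Step 4: appendChild moves an existing child, so re-appending reorders them.
  children.forEach((child) => element.appendChild(child));
}

// Usage in a browser (assumed):
// const doc = new DOMParser().parseFromString(xml, 'text/xml');
// sortChildrenRecursively(doc.documentElement);
// const sortedXml = new XMLSerializer().serializeToString(doc);
```

Only element children are moved; whitespace-only text nodes between them stay where they are, so the indentation of the serialized output may change even though the element order is correct.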
