Is it possible to change domparser element to string? - javascript

I have an HTML string. I use DOMParser to update some values, and now I need to convert the result back to an HTML string with the updated values, because document.write only accepts a string.
Check out the sample snippet:
const domName = 'MOBILE_NO';
// The HTML below comes from an API.
const dom = '<html><head><title>Merchant Checkout Page</title></head><body><center><h1>Please do not refresh this page...</h1></center><form method="post" name="paytm_form"><input type="hidden" name="MOBILE_NO" value="xxxxxxxx"></form></body></html>';
const parser = new DOMParser();
const parsedHtml = parser.parseFromString(dom, 'text/html');
parsedHtml.getElementsByName(domName)[0].setAttribute('value', '1234567890');
// Now I need to replace the entire screen with the updated data
document.write(parsedHtml)
Thanks,
Gopal.R

"Bcoz document.write accept only string."
Using document.write is almost always poor practice.
But, if for some reason you really need to, what you get back from parseFromString is a Document object. It has a single documentElement, which you can get the innerHTML or outerHTML of:
document.write(parsedHtml.documentElement.innerHTML);
// or
document.write(parsedHtml.documentElement.outerHTML);
Live Example:
const domName = 'MOBILE_NO';
// The HTML below comes from an API.
const dom = '<html><head><title>Merchant Checkout Page</title></head><body><center><h1>Please do not refresh this page...</h1></center><form method="post" name="paytm_form"><input type="hidden" name="MOBILE_NO" value="xxxxxxxx"></form></body></html>';
const parser = new DOMParser();
const parsedHtml = parser.parseFromString(dom, 'text/html');
parsedHtml.getElementsByName(domName)[0].setAttribute('value', '1234567890');
// Replace the entire screen with the updated data
document.write(parsedHtml.documentElement.innerHTML);
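As a side note on the two options above: innerHTML of the documentElement leaves out the <html> tag itself, and neither property includes a doctype. A minimal sketch of a helper that rebuilds the full string is shown here; serializeDocument is a hypothetical name, and it only relies on the standard doctype and documentElement properties of a Document. The browser-only part is guarded so the snippet does nothing harmful where DOMParser is missing.

```javascript
// serializeDocument is a hypothetical helper name, not a built-in API.
// It only reads the standard Document properties `doctype` and `documentElement`.
function serializeDocument(doc) {
  // outerHTML includes the <html> tag itself; innerHTML would drop it.
  const html = doc.documentElement.outerHTML;
  // A parsed document has doctype === null when the source string had none.
  const doctype = doc.doctype ? `<!DOCTYPE ${doc.doctype.name}>` : '';
  return doctype + html;
}

// Browser-only usage; skipped where DOMParser is not available (e.g. Node.js).
if (typeof DOMParser !== 'undefined') {
  const parsed = new DOMParser().parseFromString(
    '<!DOCTYPE html><html><head></head><body>' +
      '<input type="hidden" name="MOBILE_NO" value="xxxxxxxx"></body></html>',
    'text/html'
  );
  parsed.getElementsByName('MOBILE_NO')[0].setAttribute('value', '1234567890');
  console.log(serializeDocument(parsed));
}
```

The returned value is a plain string, so it can be handed to anything that expects one.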
But, again, it is probably better to just append to the page, e.g.:
document.body.appendChild(parsedHtml.documentElement);
Live Example:
const domName = 'MOBILE_NO';
// The HTML below comes from an API.
const dom = '<html><head><title>Merchant Checkout Page</title></head><body><center><h1>Please do not refresh this page...</h1></center><form method="post" name="paytm_form"><input type="hidden" name="MOBILE_NO" value="xxxxxxxx"></form></body></html>';
const parser = new DOMParser();
const parsedHtml = parser.parseFromString(dom, 'text/html');
parsedHtml.getElementsByName(domName)[0].setAttribute('value', '1234567890');
document.body.appendChild(parsedHtml.documentElement);
Or loop through it and append its children:
let child = parsedHtml.documentElement.firstChild;
while (child) {
let next = child.nextSibling;
document.documentElement.appendChild(child);
child = next;
}
Live Example:
const domName = 'MOBILE_NO';
// The HTML below comes from an API.
const dom = '<html><head><title>Merchant Checkout Page</title></head><body><center><h1>Please do not refresh this page...</h1></center><form method="post" name="paytm_form"><input type="hidden" name="MOBILE_NO" value="xxxxxxxx"></form></body></html>';
const parser = new DOMParser();
const parsedHtml = parser.parseFromString(dom, 'text/html');
parsedHtml.getElementsByName(domName)[0].setAttribute('value', '1234567890');
let child = parsedHtml.documentElement.firstChild;
while (child) {
let next = child.nextSibling;
document.documentElement.appendChild(child);
child = next;
}
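If the goal is to swap the visible page content rather than add to it, a more compact variant of the loop above is possible with Element.replaceChildren, which is available in current browsers. This is only a sketch: replaceBodyContent is a hypothetical helper name, and it assumes both documents have a body element.

```javascript
// replaceBodyContent is a hypothetical helper name.
// `targetDoc` is the live document; `parsedDoc` is what parseFromString returned.
function replaceBodyContent(targetDoc, parsedDoc) {
  // Take a snapshot first: childNodes is a live list that shrinks as nodes move.
  const nodes = Array.from(parsedDoc.body.childNodes);
  // replaceChildren removes the current children and inserts the new ones.
  targetDoc.body.replaceChildren(...nodes);
  return nodes.length;
}

// Browser-only usage; skipped where there is no DOM (e.g. Node.js).
if (typeof document !== 'undefined' && typeof DOMParser !== 'undefined') {
  const parsed = new DOMParser().parseFromString(
    '<html><body><h1>Please do not refresh this page...</h1></body></html>',
    'text/html'
  );
  replaceBodyContent(document, parsed);
}
```

Nodes that come from another document are adopted automatically when they are inserted, so no explicit importNode call is needed here.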

Related

How can I group Javascript actions that force reflow?

I have a project which is responsible for managing the rendering of elements, but I'm running into a performance issue replacing elements and then focusing on whatever had focus before.
Below is a minimal example that replicates the performance issue:
const renderPage = () => {
// get the old section element
const oldSection = document.querySelector('section')
// create a new section element (we'll replaceWith later)
const newSection = document.createElement('section')
// create the render button
const newButton = document.createElement('button')
newButton.innerHTML = 'Render Page'
newButton.onclick = renderPage
newSection.appendChild(newButton)
// create a bunch of elements
const dummyDivs = [...new Array(100000)].forEach(() => {
const dummy = document.createElement('div')
dummy.innerHTML = 'dummy'
newSection.appendChild(dummy)
})
// replace the old page with the new one (causes forced reflow)
oldSection.replaceWith(newSection)
// reattach focus on the button (causes forced reflow)
newButton.focus()
}
window.renderPage = renderPage
<section>
<button onclick="renderPage()">Render</button>
</section>
When running this locally, I see the following in the performance report in Chrome/Edge
Both replaceWith and focus are triggering forced reflow. Is there a way to batch or group these actions so that only a single reflow occurs? I realize that there's no way to really get around this happening at all, but if I can batch them, I think that might improve my performance.
Indeed, focus always causes a reflow (see "What forces layout / reflow").
What you can do is reduce the reflow time by inserting the new section with only the button in it, calling focus, and appending the other children after that:
Working example:
const renderPage = () => {
// get the old section element
const oldSection = document.querySelector('section')
// create a new section element (we'll replaceWith later)
const newSection = document.createElement('section')
// create the render button
const newButton = document.createElement('button')
newButton.innerHTML = 'Render Page'
newButton.onclick = renderPage
newSection.appendChild(newButton)
// create a bunch of elements
const dummies = []; // store in a separate array
[...new Array(100000)].forEach(() => {
const dummy = document.createElement('div')
dummy.innerHTML = 'dummy';
dummies.push(dummy)
})
// insert new section only with new button
oldSection.replaceWith(newSection)
newButton.focus(); // always causes reflow; but fast because it's only one element
// append all other nodes after focus
newSection.append(...dummies)
}
window.renderPage = renderPage
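One caveat with the approach above is that spreading a very large array into a single call can exceed the engine's argument limit. A hedged alternative is to collect the nodes in a DocumentFragment and insert that once. The sketch below uses a hypothetical helper name, appendAll, and takes the document as a parameter so that it does not depend on a global.

```javascript
// appendAll is a hypothetical helper name.
// `doc` is the owning document, `parent` the element already in the page,
// `nodes` any iterable of detached nodes.
function appendAll(doc, parent, nodes) {
  const fragment = doc.createDocumentFragment();
  for (const node of nodes) {
    fragment.appendChild(node); // still detached, so no layout work yet
  }
  parent.appendChild(fragment); // a single insertion into the live tree
  return parent;
}

// Browser-only usage; skipped where there is no DOM (e.g. Node.js).
if (typeof document !== 'undefined') {
  const section = document.createElement('section');
  const divs = Array.from({ length: 1000 }, () => {
    const div = document.createElement('div');
    div.textContent = 'dummy';
    return div;
  });
  appendAll(document, section, divs);
}
```

In the answer's code this would take the place of newSection.append(...dummies), after the focus call.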

Getting rid of \r\n text showing up in web scraper

I am using async to fetch table tags in a website. It works great, however it is putting all of the \r\n tags at the bottom of my table. I can't figure out how to get rid of them in my .match(). Anyone have any answers?
var fetchCommand = "https://api.allorigins.win/get?url=" + encodeURIComponent("sampleurl");
(async () => {
const response = await fetch(fetchCommand);
const text = await response.text();
let result = text.match(/(?<=\<table>).*(?=\<\/table>)/);
console.log(result);
let html_content = document.getElementById("table");
html_content.innerHTML = result;
return html_content;
})()
This works for me: parse the response as JSON, parse its contents with DOMParser(), and then use querySelector() to get the table. You might want to look for all tables with querySelectorAll(). I also use outerHTML on the table element when adding it to the DOM, because innerHTML would leave out the <table> tag itself.
function fetchTable() {
fetch(`https://api.allorigins.win/get?url=${encodeURIComponent('https://www.w3schools.com/html/html_tables.asp')}`)
.then((res) => {
return res.json();
})
.then((data) => {
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(data.contents, 'text/html');
const firstTable = htmlDoc.querySelector('table');
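// "table" is the id of the container element used in the question
const html_content = document.getElementById('table');
html_content.innerHTML = firstTable.outerHTML;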
})
}
Edit: Here is a working example: https://jsfiddle.net/q0dj7rvz/
If you keep having issues with the end of line characters, please post the URL with the table.
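One likely reason the \r\n text showed up in the question's version is that the response was read with response.text(), so the line breaks were still JSON escape sequences rather than real characters. The sketch below illustrates the difference with a made-up response body; only the contents field name is taken from the answer above.

```javascript
// Hypothetical response body, shaped like the { contents: "..." } JSON
// that the answer reads via data.contents.
const rawBody = JSON.stringify({
  contents: '<table>\r\n<tr><td>cell</td></tr>\r\n</table>',
});

// Read as plain text, the line breaks are still escape sequences:
// a backslash followed by "r" or "n", which a browser renders as visible text.
const rawHasEscapes = rawBody.includes('\\r\\n');

// JSON.parse (which is what response.json() does) turns them back into
// real CR/LF characters, which HTML treats as ordinary whitespace.
const contents = JSON.parse(rawBody).contents;
const parsedHasEscapes = contents.includes('\\r\\n');
const parsedHasLineBreaks = contents.includes('\r\n');

console.log(rawHasEscapes, parsedHasEscapes, parsedHasLineBreaks);
// true false true
```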

Xpath doesn't recognize anchor tag?

I'm running some Node.js code to scrape a website and return some text from this part of the html:
And here's the code I'm using to get it
const fs = require('mz/fs');
const xpath = require('xpath');
const parse5 = require('parse5');
const xmlser = require('xmlserializer');
const dom = require('xmldom').DOMParser;
const axios = require('axios');
(async () => {
const response = await axios.get('https://www.aritzia.com/en/product/sculpt-knit-tank-%28arjun-knit-top%29/66139.html?dwvar_66139_color=17388');
const html = response.data;
const document = parse5.parse(html.toString());
const xhtml = xmlser.serializeToString(document);
const doc = new dom().parseFromString(xhtml);
const select = xpath.useNamespaces({"x": "http://www.w3.org/1999/xhtml"});
const nodes = select("//x:div[contains(@class, 'pdp-product-brand')]/*/text()", doc);
console.log(nodes.length ? nodes[0].nodeValue : nodes.length)
})();
The code above works as expected -- it prints Babaton.
But when I swap out the xpath above for one that uses a instead of * (i.e. //x:div[contains(@class, 'pdp-product-brand')]/a/text()), it instead tells me that nodes.length === 0.
I would expect it to give the same result because the div that it's pointing to does in fact have a child anchor tag (see screenshot above). I'm just confused why it doesn't work with a and was wondering if anybody else knew the answer. Thanks!

Selecting an html node's text content with htmlparser2 in Node.js

I want to parse some html with htmlparser2 module for Node.js. My task is to find a precise element by its ID and extract its text content.
I have read the documentation (quite limited) and I know how to setup my parser with the onopentag function but it only gives access to the tag name and its attributes (I cannot see the text). The ontext function extracts all text nodes from the given html string, but ignores all markup.
So here's my code.
const htmlparser = require("htmlparser2");
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';
const parser = new htmlparser.Parser({
onopentag: function(name, attribs){
if (attribs.id === "heading1"){
console.log(/*how to extract text so I can get "Some heading" here*/);
}
},
ontext: function(text){
console.log(text); // Some heading \n Foobar
}
});
parser.parseComplete(file);
I expect the output of the function call to be 'Some heading'. I believe there is some obvious solution, but somehow it escapes me.
Thank you.
You can do it like this using the library you asked about:
const htmlparser = require('htmlparser2');
const domUtils = require('domutils');
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';
var handler = new htmlparser.DomHandler(function(error, dom) {
if (error) {
console.log('Parsing had an error');
return;
} else {
const item = domUtils.findOne(element => {
const matches = element.attribs.id === 'heading1';
return matches;
}, dom);
if (item) {
console.log(item.children[0].data);
}
}
});
var parser = new htmlparser.Parser(handler);
parser.write(file);
parser.end();
The output you will get is "Some heading". However, in my opinion you will find it easier to use a querying library that is meant for this. You don't need to, of course, but note how much simpler the code becomes with Cheerio (see "How do I get an element name in cheerio with node.js") or, if you prefer native-style query selectors, with node-html-parser (https://www.npmjs.com/package/node-html-parser).
You can compare the code above with the leaner node-html-parser version, which supports querying directly:
const { parse } = require('node-html-parser');
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';
const root = parse(file);
const text = root.querySelector('#heading1').text;
console.log(text);
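If you would rather stay with the streaming callbacks from the question, one possible approach is to keep a little state between onopentag, ontext and onclosetag. This is a sketch only: createTextCollector is a hypothetical helper, and the events are simulated by hand here so the logic can be followed without installing anything. It assumes the parser reports a close event for every open event, including nested tags.

```javascript
// createTextCollector is a hypothetical helper, not part of htmlparser2.
// It returns an object with the callback names the question already uses
// (onopentag / ontext) plus onclosetag.
function createTextCollector(targetId) {
  let depth = 0; // > 0 while we are inside the target element
  const parts = [];
  return {
    onopentag(name, attribs) {
      if (depth > 0) depth += 1; // nested tag inside the target
      else if (attribs.id === targetId) depth = 1; // entering the target
    },
    ontext(text) {
      if (depth > 0) parts.push(text); // keep text only inside the target
    },
    onclosetag() {
      if (depth > 0) depth -= 1; // reaches 0 when the target closes
    },
    getText() {
      return parts.join('');
    },
  };
}

// Simulated events for:
// '<h1 id="heading1">Some <em>heading</em></h1><p>Foobar</p>'
const collector = createTextCollector('heading1');
collector.onopentag('h1', { id: 'heading1' });
collector.ontext('Some ');
collector.onopentag('em', {});
collector.ontext('heading');
collector.onclosetag('em');
collector.onclosetag('h1');
collector.onopentag('p', {});
collector.ontext('Foobar');
collector.onclosetag('p');

console.log(collector.getText()); // Some heading
```

With the real library, the same object would be passed to new htmlparser.Parser(collector), followed by parser.write(file) and parser.end() as in the answer above, and getText() read afterwards.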

Parse HTML string to HTML DOM element using Node.js

I am using Node.js to sanitize some HTML with the cheerio module. I am trying to use the module to parse tags into DOM elements. When I enter text into a textarea field of a form and that text contains HTML elements as a string, I would like the HTML string to be rendered as an actual DOM element.
exports.createStore = async (req, res) => {
req.body.author = req.user._id;
const store = await new Store(req.body).save();
const $ = cheerio.load(store.description);
$(store.description).text();
console.log($(store.description).text());
await store.save();
req.flash(
"success",
`Successfully Created ${store.name}. Care to leave a review?`
);
res.redirect(`/store/${store.slug}`);
};
Where store.description = 'Hello < b>World< /b>'
I would like store.description to equal Hello World
