Parse XML data by traversing through each node in JavaScript

I am trying to parse the XML data below by traversing through each node.
<example>
<name>BlueWhale</name>
<addr>101 Yesler Way, Suite 402</addr>
<city>Seattle</city>
<state>Washington</state>
</example>
Now I want to access each node without using getElementsByTagName and print each nodeName and nodeValue in JavaScript, using things like the root element, firstChild, and nextSibling, which I am not sure how to use.
I am trying it in the following manner:
var txt = " <example> <name>BlueWhale</name> <addr>101 Yesler Way, Suite 402</addr> <city>Seattle</city> <state>Washington</state> </example> ";
var domParser = new DOMParser();
var xml = domParser.parseFromString(txt, "text/xml");
var el = xml.documentElement.nodeName;
console.log(el);
and then I want to print each value.
Could anyone please help?

If your XML is stored in a string variable, you can use jQuery.
var xml = "<example>...";
$(xml).children().each(function() {
    var tagName = this.tagName;
    var text = this.textContent; // text inside the element (the original innerHtml is not a DOM property)
});

You should consider using a library that does this for you rather than doing it by hand. A commonly used one can be found here.
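For the traversal the question actually asks about (firstChild and nextSibling instead of getElementsByTagName), a minimal sketch might look like the following. The helper name listChildElements is hypothetical, and the input string is shortened. Note that an element's own nodeValue is null; its text lives in a child text node, so the sketch reads textContent instead. Whitespace between tags shows up as text nodes, which is why the loop checks nodeType.

```javascript
// Collect [nodeName, text] pairs for the element children of `parent`,
// using only firstChild / nextSibling (no getElementsByTagName).
function listChildElements(parent) {
  const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in browsers
  const result = [];
  for (let node = parent.firstChild; node; node = node.nextSibling) {
    if (node.nodeType !== ELEMENT_NODE) continue; // skip whitespace text nodes
    result.push([node.nodeName, node.textContent]);
  }
  return result;
}

// Browser-only usage: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== "undefined") {
  const txt = "<example> <name>BlueWhale</name> <city>Seattle</city> </example>";
  const xml = new DOMParser().parseFromString(txt, "text/xml");
  for (const [name, value] of listChildElements(xml.documentElement)) {
    console.log(name + ": " + value);
  }
}
```

The same loop can be applied recursively to each child if the document is nested more deeply than the example.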

Related

Is there a better way to turn HTML to plain text in JavaScript than a series of Regex search/replaces

My goal is to retrieve HTML via a REST API and convert it to plain text. Then I send it through another API to Slack, which does not accept HTML (so far as I'm aware).
I am using a series of Regex scripts to accomplish this.
var noHtml = text.replace(/<(?:.|\n)*?>/gm, '');
var noHtmlEncodeSpace = noHtml.replace(/&nbsp;/g, ' ');
var noHtmlEncodersquo = noHtmlEncodeSpace.replace(/&rsquo;/g, "'");
var noHtmlEncodeldsquo = noHtmlEncodersquo.replace(/&lsquo;/g, "'");
var noHtmlEncodeSingleQuote = noHtmlEncodeldsquo.replace(/&#39;/g, "'");
var noHtmlEncodeldquo = noHtmlEncodeSingleQuote.replace(/&ldquo;/g, "`");
var noHtmlEncodeDoubleQuote = noHtmlEncodeldquo.replace(/&quot;/g, "`");
var noHtmlEncoderdquo = noHtmlEncodeDoubleQuote.replace(/&rdquo;/g, "`");
The results are as expected. But transforming HTML to plain text seems like it is a common-enough task in JavaScript that there may be a smarter way to do it.
I am new to JavaScript. Thank you for any guidance.
You might use DOMParser to safely parse the HTML string into a document, after which you can retrieve the textContent of the document:
const htmlStr = `<div>
foo ’’
</div>
<script>
alert('evil');
</` + `script>
<img src="badsrc" onerror="alert('evil')">`;
const doc = new DOMParser().parseFromString(htmlStr, 'text/html');
console.log(doc.body.textContent);
Depending on the text spacing desired, you might use the innerText property instead:
doc.body.innerText
(This is in contrast to, for example, setting the innerHTML of a newly created element, which wouldn't be as safe - the "evil" scripts could be executed before the textContent is retrieved)
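Where no DOM is available (for example in Node.js, which has no built-in DOMParser), the chain of replace calls from the question could at least be collapsed into a single pass with a lookup table. This is only a sketch: the names ENTITY_MAP and htmlToPlainText are hypothetical, the table covers just the entities the question's variable names point to (with &#39; assumed for the single quote), and the backtick mapping for double quotes simply mirrors the question. Regex-based tag stripping is not a safe way to handle untrusted HTML.

```javascript
// Entities handled by the question's chain of replace() calls.
const ENTITY_MAP = {
  "&nbsp;": " ",
  "&rsquo;": "'",
  "&lsquo;": "'",
  "&#39;": "'", // assumed entity for the single quote
  "&ldquo;": "`",
  "&quot;": "`",
  "&rdquo;": "`",
};

function htmlToPlainText(html) {
  return html
    // Strip tags with the same pattern the question uses.
    .replace(/<(?:.|\n)*?>/g, "")
    // Replace known entities in one pass; unknown ones are left untouched.
    .replace(/&(?:#39|[a-z]+);/g, (entity) =>
      Object.prototype.hasOwnProperty.call(ENTITY_MAP, entity)
        ? ENTITY_MAP[entity]
        : entity
    );
}

console.log(htmlToPlainText("<p>It&rsquo;s&nbsp;<b>&ldquo;fine&rdquo;</b></p>"));
```

Adding another entity then means adding one line to the table rather than another intermediate variable.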

How to execute document.querySelectorAll on a text string without inserting it into DOM

var html = '<p>sup</p>'
I want to run document.querySelectorAll('p') on that text without inserting it into the dom.
In jQuery you can do $(html).find('p')
If it's not possible, what's the cleanest way to do a temporary insert that doesn't interfere with anything, query only that element, and then remove it?
(I'm doing ajax requests and trying to parse the returned html)
With IE 10 and above, you can use the DOMParser object to parse a DOM directly from an HTML string.
var parser = new DOMParser();
var doc = parser.parseFromString(html, "text/html");
var paragraphs = doc.querySelectorAll('p');
You can also create a temporary element, append the HTML to it, and run querySelectorAll on it:
var element = document.createElement('div');
element.insertAdjacentHTML('beforeend', '<p>sup</p>');
element.querySelectorAll('p')

Javascript: XMLDocument iteration

I have the following code:
var xmlString = ajaxRequest.responseText.toString();
var parser = new DOMParser();
var doc = parser.parseFromString(xmlString, "text/xml");
The response text is a complete HTML document. After I create the XMLDocument (doc), I want to go over each node, manipulate some stuff and print it.
How can I iterate over the XMLDocument? I want to visit each of its nodes.
Thanks!
Here is a small example that gets all the links from this XML and prints their text:
var links = doc.documentElement.getElementsByTagName("a");
for (var i = 0; i < links.length; i++) {
    var txt = links[i].firstChild.nodeValue; // assumes each link's first child is a text node
    document.write(txt + '<br>');
}
I'm almost sure this is correct, but I didn't have time to test it.
You may read these articles to go deeper:
getElementsByTagName
nodeName
NodeList
Hope this helps.
Best regards!
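To visit every node rather than only the links, one option is a small recursive walk over childNodes. This is a sketch under assumptions: walkNodes is a hypothetical helper name, the sample XML is made up, and the callback just prints an indented outline.

```javascript
// Depth-first walk over `node` and everything below it.
// `visit` is called with each node and its depth in the tree.
function walkNodes(node, visit, depth = 0) {
  visit(node, depth);
  // childNodes is a live NodeList in browsers; copy it so that changes
  // made while visiting descendants do not disturb this loop.
  const children = Array.from(node.childNodes || []);
  for (const child of children) {
    walkNodes(child, visit, depth + 1);
  }
}

// Browser-only usage: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== "undefined") {
  const doc = new DOMParser().parseFromString("<a><b>hi</b><c/></a>", "text/xml");
  walkNodes(doc.documentElement, (node, depth) => {
    console.log("  ".repeat(depth) + node.nodeName);
  });
}
```

Text and comment nodes are visited too; a check on node.nodeType inside the callback can skip them if only elements matter.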

How can I manipulate the DOM from a string of HTML in JavaScript?

I'm developing a Windows 8 Metro App using JavaScript. I need to manipulate a string of HTML to select elements like DOM.
How can I do that?
Example:
var html = data.responseText; // data.responseText is a string of HTML received from the xhr function.
// Now I need to extract an element from the string, as with document.getElementById("some_element")...
Thanks!
UPDATE:
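Solved it!
var parser = new DOMParser();
var xml = parser.parseFromString(data.responseText, "text/html"); // the MIME type argument is required; "text/html" is assumed here because the response is HTML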
I don't think this is the best approach to the problem; you could return JSON or XML instead. But if you need to do it this way:
To my knowledge, you won't be able to use getElementById without inserting a new element into the document (in the example below, that would mean inserting the div, for example with document.body.appendChild(div)), but you could do this:
var div = document.createElement("div");
div.innerHTML = '<span id="rawr"></span>'; //here you would put data.responseText
var elements = div.getElementsByTagName("span"); // [<span id="rawr"></span>], there you could ask elements[0].id === "rawr" or whatever you like

parsing XML in JavaScript, trying to get elements by classname

I am trying to parse a large XML file using JavaScript. Looking online, it seems that the easiest way to start is to use the browser's DOM parser. This works, and I can get elements by ID. I can also get the "class" attribute for those elements, and it returns what I would expect. However, I don't appear to be able to get elements by class.
The following was tried in the latest Chrome:
xmlString = '<?xml version="1.0"?>';
xmlString = xmlString + '<example class="test" id="example">content</example>'
parser = new DOMParser();
xmlDoc = parser.parseFromString(xmlString,"text/xml");
xmlDoc.getElementById("example");
// returns the example element (good)
xmlDoc.getElementById("example").getAttribute("class");
// returns "test" (good)
xmlDoc.getElementsByClassName("test");
// returns [] (bad)
Any ideas?
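This should get all elements of a given class, assuming that the tag name will be consistent. Note that XML tag names are case-sensitive, so the name must match the document exactly ('example', not 'Example').
var elements = xmlDoc.getElementsByTagName('example');
var classArray = [];
for (var i = 0; i < elements.length; i++) {
    // getAttribute("class") is what the question already confirmed works on this document
    if (elements[i].getAttribute("class") == "test") {
        classArray.push(elements[i]);
    }
}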
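If elements may carry several space-separated classes, comparing the whole attribute to "test" would miss them. A hedged sketch of a helper that checks individual class tokens through getAttribute follows; filterByClass is a hypothetical name and the sample XML is made up.

```javascript
// Return the elements whose "class" attribute contains `className`
// as one of its whitespace-separated tokens.
function filterByClass(elements, className) {
  return Array.from(elements).filter((el) => {
    const attr = el.getAttribute("class") || ""; // null when the attribute is absent
    return attr.split(/\s+/).includes(className);
  });
}

// Browser-only usage: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== "undefined") {
  const xmlString = '<root><example class="test big">a</example><example class="other">b</example></root>';
  const xmlDoc = new DOMParser().parseFromString(xmlString, "text/xml");
  const matches = filterByClass(xmlDoc.getElementsByTagName("example"), "test");
  console.log(matches.length);
}
```

Passing getElementsByTagName("*") as the first argument would drop the assumption that the tag name is consistent.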
You can use jQuery to parse an XML file and select elements with a class selector. http://jquery.com
Updating the parser type to HTML as opposed to XML should work.
parser = new DOMParser();
xmlDoc = parser.parseFromString(xmlString,"text/html")
