Check for XML errors using JavaScript - javascript

Question: How do I syntax-check my XML in modern browsers (anything but IE)?
I've seen a page on W3Schools which includes an XML syntax-checker. I don't know how it works, but I'd like to know how I may achieve the same behavior.
I've already performed many searches on the matter (with no success), and I've tried using the DOM Parser to check if my XML is "well-formed" (also with no success).
var xml = '<name>Caleb'; // the <name> tag is never closed
var parser = new DOMParser();
var doc = parser.parseFromString(xml, 'text/xml');
I expect the parser to tell me I have an XML syntax error (i.e. an unclosed name tag). However, it always returns an XML DOM object, as if there were no errors at all.
To summarize, I would like to know how I can automatically check the syntax of an XML document using JavaScript.
P.S. Is there any way I can validate an XML document against a DTD (using JS, and not IE)?

Edit: Here is a more concise example, from MDN:
var xmlString = '<a id="a"><b id="b">hey!</b></a>';
var domParser = new DOMParser();
var dom = domParser.parseFromString(xmlString, 'text/xml');
// print the name of the root element or error message
dump(dom.documentElement.nodeName == 'parsererror' ? 'error while parsing' : dom.documentElement.nodeName);

NoBugs' answer above did not work for me in a current version of Chrome. I suggest searching the whole document for a parsererror element instead of checking only the root:
var sMyString = "<a id=\"a\"><b id=\"b\">hey!<\/b><\/a>";
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(sMyString, "text/xml");
console.log(oDOM.getElementsByTagName('parsererror').length ?
(new XMLSerializer()).serializeToString(oDOM) : "all good"
);

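You can also use the fast-xml-parser package, which has a validate function for XML data (the named validate and parse exports below match the package's 3.x releases; later major versions expose a different API):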
import { validate, parse } from 'fast-xml-parser';
if( validate(xmlData) === true) {
var jsonObj = parse(xmlData,options);
}

Press F12 to open the developer tools on that page and look at its source. Search for validateXML and you will find a long, complete XML checker that you can use as a reference.
I am using React, and I use DOMParser to show the error message like this:
handleXmlCheck = () => {
const { fileContent } = this.state;
const parser = new window.DOMParser();
const theDom = parser.parseFromString(fileContent, 'application/xml');
if (theDom.getElementsByTagName('parsererror').length > 0) {
showErrorMessage(theDom.getElementsByTagName('parsererror')[0].getElementsByTagName('div')[0].innerHTML);
} else {
showSuccessMessage('Valid Xml');
}
}

A basic XML validator in JavaScript. This code only handles simple XML (plain tags without attributes) and may give wrong results for anything more advanced.
function xmlValidator(xml){
// var xml = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
while(xml.indexOf('<') != -1){
var sub = xml.substring(xml.indexOf('<'), xml.indexOf('>')+1);
var value = xml.substring(xml.indexOf('<')+1, xml.indexOf('>'));
var endTag = '</'+value+'>';
if(xml.indexOf(endTag) != -1){
// console.log('xml is valid');
// break;
}else{
console.log('xml is invalid');
break;
}
xml = xml.replace(sub, '');
xml = xml.replace(endTag, '');
console.log(xml);
console.log(sub+' '+value+' '+endTag);
}
}
var xml = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
xmlValidator(xml);
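
The string-replacement approach above cannot detect tags that are closed in the wrong order. A minimal sketch of a stack-based alternative follows; checkTagBalance is a hypothetical helper name, and it assumes the input contains no comments, CDATA sections, processing instructions, or '>' characters inside attribute values. It only checks tag balance and is not a full well-formedness check.

```javascript
// Hypothetical helper: returns null when all tags are balanced,
// otherwise a short description of the first problem found.
// Assumption: no comments, CDATA, processing instructions,
// or '>' inside attribute values.
function checkTagBalance(xml) {
  // Captures: optional leading '/', tag name, optional trailing '/'.
  var tagPattern = /<(\/?)([A-Za-z_][\w.\-:]*)[^<>]*?(\/?)>/g;
  var stack = [];
  var match;
  while ((match = tagPattern.exec(xml)) !== null) {
    var isClosing = match[1] === '/';
    var name = match[2];
    var isSelfClosing = match[3] === '/';
    if (isSelfClosing) {
      continue; // <br/> opens and closes itself
    }
    if (!isClosing) {
      stack.push(name); // remember the open tag
      continue;
    }
    if (stack.length === 0) {
      return 'Unexpected closing tag </' + name + '>';
    }
    var expected = stack.pop();
    if (expected !== name) {
      return 'Expected </' + expected + '> but found </' + name + '>';
    }
  }
  return stack.length > 0
    ? 'Unclosed tag <' + stack[stack.length - 1] + '>'
    : null;
}
```

Calling checkTagBalance('<a><b></a></b>') reports the mismatch at the first closing tag, a case the replacement-based version accepts.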

/**
* Check if the input is a valid XML file.
* @param xmlStr The input to be parsed.
* @returns If the input is invalid, this returns the <parsererror> element describing the problem.
* If the input is valid, this returns undefined.
*/
export function xmlIsInvalid(xmlStr : string) : HTMLElement | undefined {
const parser = new DOMParser();
const dom = parser.parseFromString(xmlStr, "application/xml");
// https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString
// says that parseFromString() will throw an error if the input is invalid.
//
// https://developer.mozilla.org/en-US/docs/Web/Guide/Parsing_and_serializing_XML
// says dom.documentElement.nodeName == "parsererror" will be true if the input
// is invalid.
//
// Neither of those is true when I tested it in Chrome. Nothing is thrown.
// If the input is "" I get:
// dom.documentElement.nodeName returns "html",
// dom.documentElement.firstElementChild.nodeName returns "body" and
// dom.documentElement.firstElementChild.firstElementChild.nodeName returns "parsererror".
//
// It seems that the parsererror can move around. It looks like it's trying to
// create as much of the XML tree as it can, then it inserts parsererror whenever
// and wherever it gets stuck. It sometimes generates additional XML after the
// parsererror, so .lastElementChild might not find the problem.
//
// In case of an error the <parsererror> element will be an instance of
// HTMLElement. A valid XML document can include an element with the name
// "parsererror", however it will NOT be an instance of HTMLElement.
//
// getElementsByTagName('parsererror') might be faster than querySelectorAll().
for (const element of Array.from(dom.querySelectorAll("parsererror"))) {
if (element instanceof HTMLElement) {
// Found the error.
return element;
}
}
// No errors found.
return;
}
(Technically that's TypeScript. Remove : string and : HTMLElement | undefined to make it JavaScript.)
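
The instanceof HTMLElement test above was observed in Chrome only. A sketch of another approach that is sometimes suggested: parse a deliberately broken string first to learn which namespace the current browser uses for its parsererror element, then search the real document in that namespace. findParserError is a hypothetical name, and the optional parser argument exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: returns the error text, or null if the XML parsed cleanly.
// `parser` is optional; in a browser it defaults to a new DOMParser.
function findParserError(xmlStr, parser) {
  parser = parser || new DOMParser();

  // Step 1: parse known-bad input to discover this browser's
  // namespace for the <parsererror> element.
  var probe = parser.parseFromString('<', 'application/xml');
  var probeError = probe.getElementsByTagName('parsererror')[0];
  var errorNs = probeError ? probeError.namespaceURI : null;

  // Step 2: parse the real input and look for <parsererror>
  // in that namespace only, so a legitimate element that merely
  // happens to be named "parsererror" is less likely to match.
  var doc = parser.parseFromString(xmlStr, 'application/xml');
  var errors = errorNs
    ? doc.getElementsByTagNameNS(errorNs, 'parsererror')
    : doc.getElementsByTagName('parsererror');

  return errors.length > 0 ? errors[0].textContent : null;
}
```

The extra probe parse costs a little time on every call; caching the discovered namespace would avoid that.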

Related

How to use XML Parser and get element by tagName when tag contains embedded element inside?

I'm using an XML parser to get data from a tag.
The tag looks like this:
<tagName elem="XXX"></tagName>
I want to get: XXX
Following the documentation, I'm doing this:
parseMyXML = new DOMParser();
xmlDoc = parseMyXML.parseFromString(contentXML,"text/xml");
var code_XXX = xmlDoc.getElementsByTagName("tagName")[0].childNodes[0].nodeValue;
I get this error: Uncaught (in promise) TypeError: Cannot read property 'nodeValue' of undefined
What I need is to go into tagName and get the content of elem.
It's an attribute, not a child node. The element is empty, so childNodes[0] is undefined. Read the attribute instead:
xmlDoc.getElementsByTagName("tagName")[0].getAttribute("elem");
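
The one-liner above throws if no matching element exists. A small sketch of a guarded version follows; readAttribute is a hypothetical helper name, and it takes an already parsed document such as the xmlDoc from the question.

```javascript
// Hypothetical helper: returns the attribute value of the first
// matching element, or null when the element or attribute is missing.
function readAttribute(xmlDoc, tagName, attrName) {
  var element = xmlDoc.getElementsByTagName(tagName)[0];
  if (!element) {
    return null; // no such element in the document
  }
  // getAttribute itself returns null for a missing attribute.
  return element.getAttribute(attrName);
}
```

With the document from the question, readAttribute(xmlDoc, "tagName", "elem") would be the call to make.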
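With jQuery you can do something like this. Note that $.parseXML returns a plain XML document, so it has to be wrapped in $() before calling jQuery methods on it, and the value is read with .attr() because it is an attribute:
var xmlDoc = $.parseXML(xml),
$xml = $(xmlDoc),
value = $xml.find('tagName').attr('elem');
If you're using plain JS, you should parse the XML like this: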
if (window.DOMParser) {
// code for modern browsers
parser = new DOMParser();
xmlDoc = parser.parseFromString(text,"text/xml");
} else {
// code for old IE browsers
xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.loadXML(text);
}

Does the browser have an HTML parser that I can access to get errors?

I'm able to parse XML documents in the browser and get error messages using the following code:
// Internet Explorer
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.loadXML(text);
var hasError = (xmlDoc.parseError.errorCode != 0);
// Firefox, Opera, Webkit
var parser = new DOMParser();
var xmlDoc = parser.parseFromString(text, "text/xml");
var hasError = (xmlDoc.getElementsByTagName("parsererror").length > 0);
But I need to be able to parse an HTML document and check for errors.
Is there an HTML parser I can access in the same way as I'm accessing the XML parser?
UPDATE:
It looks like at least in Firefox I can attempt to create an HTML dom and parse the contents. But it doesn't seem to throw any error or return an error message no matter what I throw at it:
var text = "<html><body><label>test</labe></body></html>";
var parser = new DOMParser();
//var func = function (e) { console.log(e); };
//parser.attachEvent("InvalidStateError", func); // no attachEvent, no addEventListener
var htmlDoc = parser.parseFromString(text, "text/html");
var hasError = (htmlDoc.getElementsByTagName("parsererror").length > 0);
console.log("Has error: " + hasError); // also false. no errors ever.
console.log(htmlDoc);
This page, this page and this page helped me understand the DOMParser class more.
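
The text/html parser follows HTML's error-recovery rules, so it always produces a document and never a parsererror element. If the markup is supposed to be XHTML-style (well-formed XML), one possible workaround is to run it through the XML parser instead. The sketch below assumes that; findXhtmlSyntaxError is a hypothetical name, and the optional parser argument exists only so the function can be exercised outside a browser. It reports XML well-formedness problems, not HTML validity, so ordinary HTML such as an unclosed <br> or an &nbsp; entity would also be reported as an error.

```javascript
// Hypothetical helper: returns the parser's error text when the markup
// is not well-formed XML, or null when it parses cleanly.
// Assumption: the markup is meant to be XHTML-style.
function findXhtmlSyntaxError(markup, parser) {
  parser = parser || new DOMParser();
  // Using the XML parser (not text/html) makes mismatched tags an error.
  var doc = parser.parseFromString(markup, 'application/xhtml+xml');
  var errors = doc.getElementsByTagName('parsererror');
  return errors.length > 0 ? errors[0].textContent : null;
}
```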

Error with XPath in a parsed XML document (WrongDocumentError)

I'm creating a Firefox for Android extension and I have a problem with an XML document retrieved from an XMLHttpRequest: I can't find a way to select a node. The best solution I found is this, but I got this error when selecting with XPath on the document:
WrongDocumentError: Node cannot be used in a document other than the one in which it was created
This is my code:
var parser = Cc["@mozilla.org/xmlextras/domparser;1"].createInstance(Ci.nsIDOMParser);
var parsedXml = parser.parseFromString(xmlhttp.responseText, "text/xml");
var xpathExpression = "//td[contains(.,'Raw text')]/../td[2]/pre";
var res = window.content.document.evaluate(xpathExpression, parsedXml, null, window.XPathResult.STRING_TYPE , null);
If I replace the "evaluate" with the next line:
var res = parsedXml.selectSingleNode(xpathExpression);
Then I get the following error:
[JavaScript Error: "parsedXml.selectSingleNode is not a function"
{file: "resource://gre/modules/addons/XPIProvider.jsm ->
jar:file:///data/data/org.mozilla.fennec/files/mozilla/ggz9zzjr.default/extensions/qrReader#qrReader.xpi!/bootstrap.js"
line: 61}]
Well, the name of the exception, WrongDocumentError, gave it away. You're calling .evaluate() on one Document (window.content.document) while passing it a context node that belongs to a different Document (parsedXml).
The nsIDOMParser will actually return a new XMLDocument that has an .evaluate() itself, which you'll have to use.
var parser = Cc["@mozilla.org/xmlextras/domparser;1"].
createInstance(Ci.nsIDOMParser);
var parsedDoc = parser.parseFromString(
'<?xml version="1.0"?>\n<doc><elem>Raw text</elem></doc>',
"text/xml");
var xpathExpression = "//elem[contains(.,'Raw text')]";
var res = parsedDoc.evaluate(
xpathExpression,
parsedDoc,
null,
XPathResult.STRING_TYPE,
null);
console.log(res, res.stringValue);
Instead of using nsIDOMParser, since your content seems to originate from XHR anyway and seems to be (X)HTML (judging by your expression), it might be better to set XHR.responseType = "document", which parses the response into a DOM using the HTML parser.
var req = new XMLHttpRequest();
req.onload = function() {
var doc = req.response;
var h1 = doc.evaluate("//h1", doc, null, XPathResult.STRING_TYPE, null);
console.log(h1.stringValue);
// Alternative in some cases
h1 = doc.querySelector("h1");
console.log(h1.textContent);
};
req.open("GET", "http://example.org/");
req.responseType = "document"; // Parse as text/html
req.send();
parseFromString returns a document object. selectSingleNode is not a document function. Can't you select a node using the standard document.getElementsByClassName, document.getElementById, or document.querySelector?
try
var window = parsedXml.ownerDocument.defaultView;
var res = window.content.document.evaluate(xpathExpression, parsedXml, null, window.XPathResult.STRING_TYPE , null);

troubles trying to parse an html string with DOMParser

Here's the snippet:
html = "<!doctype html>";
html += "<html>";
html += "<head><title>test</title></head>";
html += "<body><p>test</p></body>";
html += "</html>";
parser = new DOMParser();
dom = parser.parseFromString (html, "text/html");
Here's the error I get when trying to execute these lines:
Error: Component returned failure code: 0x80004001 (NS_ERROR_NOT_IMPLEMENTED) [nsIDOMParser.parseFromString]
I've tried to figure out what's going on, but the code seems right, and searching the web gave me no clues.
Have you encountered this failure before? If so, where is the bug hiding?
You should use the DOMParser extension described in "JavaScript DOMParser access innerHTML and other properties".
I created a fiddle for you: http://jsfiddle.net/CSAnZ/
/*
* DOMParser HTML extension
* 2012-02-02
*
* By Eli Grey, http://eligrey.com
* Public domain.
* NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
*/
/*! @source https://gist.github.com/1129031 */
/*global document, DOMParser*/
(function(DOMParser) {
"use strict";
var DOMParser_proto = DOMParser.prototype
, real_parseFromString = DOMParser_proto.parseFromString;
// Firefox/Opera/IE throw errors on unsupported types
try {
// WebKit returns null on unsupported types
if ((new DOMParser).parseFromString("", "text/html")) {
// text/html parsing is natively supported
return;
}
} catch (ex) {}
DOMParser_proto.parseFromString = function(markup, type) {
if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
var doc = document.implementation.createHTMLDocument("")
, doc_elt = doc.documentElement
, first_elt;
doc_elt.innerHTML = markup;
first_elt = doc_elt.firstElementChild;
if (doc_elt.childElementCount === 1
&& first_elt.localName.toLowerCase() === "html") {
doc.replaceChild(first_elt, doc_elt);
}
return doc;
} else {
return real_parseFromString.apply(this, arguments);
}
};
}(DOMParser));

javascript/ajax help needed

How can I get the document object out of this?
var xmlobject = (new DOMParser()).parseFromString(xmlstring, "text/xml");
In your example, xmlobject is the document object, according to MDC. According to w3schools, on IE, you need to use an IE-specific ActiveX object instead of DOMParser:
var xmlDoc, parser;
if (window.DOMParser) {
parser = new DOMParser();
xmlDoc = parser.parseFromString(text,"text/xml");
}
else { // Internet Explorer
xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.loadXML(text);
}
You've said that getElementById isn't working. Note that id is not a special attribute (an attribute of type "ID") in XML by default, so even if you're giving elements an id attribute, getElementById won't work (it should return null). Details in the W3C docs for getElementById. I've never done it, but I assume you'd assign an attribute the "ID" type via a DTD.
Without one, though, you can use other traversal mechanisms. For example (live copy):
var xmlDoc, parser, text, things, index, thing;
text =
'<test>' +
'<thing>Thing 1</thing>' +
'<thing>Thing 2</thing>' +
'</test>';
if (window.DOMParser) {
parser = new DOMParser();
xmlDoc = parser.parseFromString(text,"text/xml");
}
else { // Internet Explorer
xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.loadXML(text);
}
things = xmlDoc.documentElement.getElementsByTagName('thing');
for (index = 0; index < things.length; ++index) {
thing = things.item(index);
display(index + ": " + getText(thing));
}
...where getText is:
function getText(element) {
return textCollector(element, []).join("");
}
function textCollector(element, collector) {
for (var node = element.firstChild; node; node = node.nextSibling) { // "var" keeps node local, so recursive calls don't overwrite it
switch (node.nodeType) {
case 3: // text
case 4: // cdata
collector.push(node.nodeValue);
break;
case 8: // comment
break;
case 1: // element
if (node.tagName == 'SCRIPT') {
break;
}
// FALL THROUGH TO DEFAULT
default:
// Descend
textCollector(node, collector);
break;
}
}
return collector;
}
(getText is a good example of why I use libraries like jQuery, Closure, Prototype, YUI, or any of several others for this stuff. You'd think it would be simple to get the text inside an element, and it is if the element has exactly one text node inside it [as our things do above]. If it doesn't, well, it gets complicated fast.)
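
In browsers that support DOMParser, the standard textContent property covers much of what getText does: it concatenates the text and CDATA of all descendants and skips comments. Unlike getText above, it does not skip the contents of script elements. A minimal sketch follows; collectTexts is a hypothetical helper name, and it takes an already parsed document.

```javascript
// Hypothetical helper: returns an array with the text of every
// element named tagName in the given (already parsed) document.
function collectTexts(xmlDoc, tagName) {
  var nodes = xmlDoc.getElementsByTagName(tagName);
  var texts = [];
  for (var i = 0; i < nodes.length; i++) {
    // textContent joins all descendant text and CDATA nodes.
    texts.push(nodes[i].textContent);
  }
  return texts;
}
```

For the document parsed in the example above, collectTexts(xmlDoc, 'thing') would be the equivalent call.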
