Microsoft.XMLDOM js question - javascript

Is it possible to check whether an XML string loaded with xmlDoc.loadXML(xmlData); is invalid? For example, if a closing bracket or a tag is missing.

If you pass a string to loadXML that isn't a well-formed XML document, the document object will be empty (no childNodes) and xmlDoc.parseError.errorCode will be set to something other than 0. xmlDoc.parseError.reason will give you a user-readable error message.
If you want to test a snippet and not a full document, wrap it in <x>...</x> tags so that the parser will only see one root element.
(There are a few reasons that MSXML might fail to parse a document other than it being non-well-formed. For example an external DTD subset or entity might not be network-reachable, or the DTD might use features MSXML doesn't support. You can't use MSXML to parse XHTML documents with their DTD for this reason. But if DTD-cruft isn't involved, a parser failure means the input wasn't well-formed.)
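As a rough sketch of the check described above: the helper below reads errorCode and reason (plus line and linepos, which MSXML's parseError object also exposes) and returns a short message, or null when parsing succeeded. The names describeParseError and wrapSnippet are made up for this example, and the ActiveXObject part only runs in Internet Explorer.

```javascript
// Hypothetical helper: summarize an MSXML parse failure, or return null on success.
function describeParseError(xmlDoc) {
  var err = xmlDoc.parseError;
  if (err.errorCode === 0) {
    return null; // well-formed, nothing to report
  }
  // reason usually ends with a line break, so trim trailing whitespace.
  return err.reason.replace(/\s+$/, '') +
    ' (line ' + err.line + ', column ' + err.linepos + ')';
}

// Hypothetical helper: give a snippet a single root element before parsing.
function wrapSnippet(snippet) {
  return '<x>' + snippet + '</x>';
}

// Only attempted where MSXML is available (old Internet Explorer).
if (typeof window !== 'undefined' && 'ActiveXObject' in window) {
  var xmlDoc = new ActiveXObject('Microsoft.XMLDOM');
  xmlDoc.async = false;
  xmlDoc.loadXML(wrapSnippet('<a><b></a>')); // <b> is never closed
  console.log(describeParseError(xmlDoc));
}
```

Because the helper only looks at the parseError object, it can be called right after every loadXML call without any other setup.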

All info about parse errors is available in xmlDoc.parseError.

Related

Will exception occur when assigning innerHTML?

I read through https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML, which claims SyntaxError may happen.
dom = document.createElement('div')
// output: <div></div>
dom.innerHTML = '[try.various.strings.here]'
// output: "[try.various.strings.here]"
dom
// check final DOM
I have tried replacing the test string with <div> (partial), <div (broken) and <div></p> (unmatched), and I have never got an exception. I wonder whether I need to add a pre-check or protection (try..catch) for it.
TL;DR
The specifications say it can if the browser wants it to, but I don't think any major browser does so for HTML (well, I wouldn't make any guarantees about IE9-IE11 and certain element types, actually). (Whereas they do for XML.) From the definition of an HTML parser in the HTML 5.2 specification:
This specification defines the parsing rules for HTML documents, whether they are syntactically correct or not. Certain points in the parsing algorithm are said to be parse errors. The error handling for parse errors is well-defined (that’s the processing rules described throughout this specification), but user agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification.
(my emphasis)
Details
The DOM Parsing and Serialization spec defines innerHTML, and says:
On setting, these steps must be run:
Let fragment be the result of invoking the fragment parsing algorithm with the new value as markup, and the context object as the context element.
If the context object is a template element, then let context object be the template's template contents (a DocumentFragment).
Replace all with fragment within the context object.
If we follow the fragment parsing algorithm link, we get to:
The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element:
If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.
If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.
Let new children be the result of invoking algorithm with markup as the input, and context element as the context element.
Let fragment be a new DocumentFragment whose node document is context element's node document.
Append each Node in new children to fragment (in tree order).
Return the value of fragment.
If we follow the HTML parsing algorithm link above, we get steps creating and using an HTML parser. If we follow the link to the definition of an HTML parser, we get the first link in TL;DR above and the text quoted above.
Errors can occur.
Basically, it depends on the content type and how the document is served to the browser.
In the usual cases it will not throw an error, but in some edge cases it will, for example Uncaught SyntaxError: Failed to set the 'innerHTML' property on 'Element': The provided markup is invalid XML, or as can be seen in this bug report.
It is safer to wrap the assignment in a try..catch statement.
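A minimal sketch of that protection, assuming you only want to swallow the SyntaxError case and let anything else propagate. The function name trySetInnerHTML is made up for this example.

```javascript
// Hypothetical helper: returns true if the markup was accepted, false if the
// parser rejected it (which in practice happens in XML/XHTML documents).
function trySetInnerHTML(element, markup) {
  try {
    element.innerHTML = markup;
    return true;
  } catch (e) {
    // Browsers report rejected markup as an exception whose name is "SyntaxError".
    if (e && e.name === 'SyntaxError') {
      return false;
    }
    throw e; // anything else is not a markup problem, so do not hide it
  }
}

if (typeof document !== 'undefined') {
  var dom = document.createElement('div');
  console.log(trySetInnerHTML(dom, '<div></p>'));
}
```

In an ordinary HTML document the call should simply return true, since the HTML parser recovers from broken markup instead of throwing.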

Does UIWebView allow XML DOM Parser?

I am not quite sure, and haven't been able to find anything.
Using stringByEvaluatingJavaScriptFromString:, I have been able to manipulate JavaScript in my page from Objective-C. However, I now want to populate form fields by passing raw XML data from the iOS UIWebView into the HTML file (which is local to the app), and then using the parsed data.
Looking over the W3C document, it seems I need to do something like:
parser = new DOMParser();
xmlString = parser.parseFromString(txt, "text/xml");
Which should return a DOM object from the XML (which here is represented by the string txt). I should then be able to access the properties of this DOM object from
xmlString.getElementsByTagName("from")[0].childNodes[0].nodeValue;
Assuming we have an XML node such as:
<from>Sender</from>
However, this doesn't seem to work. Setting that nodeValue into a string and returning it returns nil. Likewise, form fields are not populated.
My question, then, is whether the embedded browser in an app can utilize the DOM Parser - and if it can, what syntax I might use to access values from it?
Solved, unfortunately very fast. It seems that when setting form fields you need to set value instead of innerHTML.
so I did:
document.getElementById("FIELDID").value =
xmlString.getElementsByTagName("from")[0].childNodes[0].nodeValue;
Also, I had the parameters passed to the DOM parser wrong - it is "text/xml" for the second parameter.
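Putting the pieces together, a hedged sketch of the working version might look like this. The helper name firstText is made up, FIELDID is the placeholder id from above, and the <note> wrapper element is an assumption. Note that DOMParser does not throw on malformed XML; browsers return a document containing a <parsererror> element instead, so the helper checks for that first.

```javascript
// Hypothetical helper: text of the first <tagName> element, or null if the
// XML failed to parse or the element is missing.
function firstText(xmlDoc, tagName) {
  if (xmlDoc.getElementsByTagName('parsererror').length > 0) {
    return null; // malformed input
  }
  var el = xmlDoc.getElementsByTagName(tagName)[0];
  // textContent also copes with an empty element, where childNodes[0] would be undefined.
  return el ? el.textContent : null;
}

if (typeof DOMParser !== 'undefined' && typeof document !== 'undefined') {
  var txt = '<note><from>Sender</from></note>'; // assumed sample input
  var xmlDoc = new DOMParser().parseFromString(txt, 'text/xml');
  var field = document.getElementById('FIELDID');
  if (field) {
    field.value = firstText(xmlDoc, 'from') || '';
  }
}
```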

Parsing XML in a Web Worker

I have been using a DOMParser object to parse a text string to an XML tree. However it is not available in the context of a Web Worker (and neither is, of course, document.ELEMENT_NODE or the various other constants that would be needed). Is there any other way to do that?
Please note that I do not want to manipulate the DOM of the current page. The XML file won't contain HTML elements or anything of the sort. In fact, I do not want to touch the document object at all. I simply want to provide a text string like the following:
<car color="blue"><driver/></car>
...and get back a suitable tree structure and a way to traverse it. I also do not care about schema validation or anything fancy. I know about XML for <SCRIPT>, which many may find useful (hence I'm linking to it here), however its licensing is not really suitable for me. I'm not sure if jQuery includes an XML parser (I'm fairly new to this stuff), but even if it does (and it is usable inside a Worker), I would not include an extra ~50K lines of code just for this function.
I suppose I could write a simple XML parser in JavaScript, I'm just wondering if I'm missing a quicker option.
According to the spec:
The DOM APIs (Node objects, Document objects, etc) are not available to workers in this version of this specification.
I guess that's why DOMParser is not available, but I don't really understand why that decision was made. (Fetching and processing an XML document in a Web Worker does not seem unreasonable.)
But you can import other available tools, such as "Cross Platform XML Parsing in JavaScript".
At this point I'd like to share my parser: https://github.com/tobiasnickel/tXml
With its tXml() method you can parse a string into an object, and it takes only 0.5kb minified + gzipped.
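For a sense of what the "simple XML parser in JavaScript" from the question could look like, here is a hand-rolled sketch. It is not tXml's API or implementation, just an illustration that touches no DOM objects and so can run inside a Worker. It assumes input limited to elements, quoted attributes and text: no comments, CDATA, processing instructions, doctype or entity decoding.

```javascript
// Minimal non-validating parser. Returns an array of top-level nodes, where an
// element is { name, attributes, children } and text is a plain string.
function parseXml(src) {
  var root = { name: '#document', attributes: {}, children: [] };
  var stack = [root];
  // 1: closing tag name, 2: opening tag name, 3: attribute text, 4: "/" if self-closing, 5: text
  var tokenRe = /<\/([^\s>]+)\s*>|<([^\s\/><!?]+)((?:\s+[^\s=\/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>|([^<]+)/g;
  var attrRe = /([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  var pos = 0, m, a;
  while ((m = tokenRe.exec(src)) !== null) {
    // A gap between tokens means something the grammar above does not cover.
    if (m.index !== pos) throw new Error('Unexpected markup at offset ' + pos);
    pos = tokenRe.lastIndex;
    var top = stack[stack.length - 1];
    if (m[1] !== undefined) {
      if (stack.length === 1 || top.name !== m[1]) {
        throw new Error('Mismatched closing tag: ' + m[1]);
      }
      stack.pop();
    } else if (m[2] !== undefined) {
      var node = { name: m[2], attributes: {}, children: [] };
      attrRe.lastIndex = 0;
      while ((a = attrRe.exec(m[3])) !== null) {
        node.attributes[a[1]] = a[2] !== undefined ? a[2] : a[3];
      }
      top.children.push(node);
      if (m[4] !== '/') stack.push(node);
    } else if (m[5].trim() !== '') {
      top.children.push(m[5]); // keep non-blank text as-is
    }
  }
  if (pos !== src.length) throw new Error('Unexpected markup at offset ' + pos);
  if (stack.length !== 1) throw new Error('Unclosed tag: ' + stack[stack.length - 1].name);
  return root.children;
}

// Inside a Worker, importScripts exists; parse whatever string is posted in.
if (typeof importScripts === 'function') {
  self.onmessage = function (e) {
    self.postMessage(parseXml(e.data));
  };
}
```

Since the result is made of plain objects, arrays and strings, it can be passed back to the page with postMessage without further conversion.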

Can EXSLT extensions be used in JavaScript XPaths?

I would like to use javascript XPaths in a web app using exslt extensions, but I can't figure out how to do this.
Pretend I've got an html doc with some divs in it. I want to run this:
namespaces = {'regexp': 'http://exslt.org/regular-expressions'};
result = document.evaluate(
    "//div[regexp:test(.,'$')]",
    document,
    function(ns) {
        return namespaces.hasOwnProperty(ns) ? namespaces[ns] : null;
    },
    XPathResult.ANY_TYPE,
    null);
Only that results in an invalid XPath expression exception in evaluate. I'm using chrome.
Is there anything else I need to do to make this stuff work? I see on exslt.org that there are implementations for javascript, but how do I make sure those are available? Do I need to insert my javascript into a namespaced script element in the dom or something insane?
UPDATE
If this isn't possible directly using browser dom + javascript and xpath, would it be possible to write XSLT using exslt extensions in the browser to simulate document.evaluate (returning a list of elements that match the xpath)?
I don't think the default browser XPath implementation supports EXSLT. The JavaScript support mentioned on the EXSLT page is likely about how you can provide your own implementation of the EXSLT functions using in-browser JavaScript. Here's one example I was able to find very quickly.
In Firefox, for example, you can have Saxon-B as an extension to run XSLT2.0 and Saxon-B has built-in support for exslt (unlike Saxon-HE), though you will likely be better off just using XSLT/XPath 2.0 features. Here's the regular expression syntax, for example. That said, however, relying on a Mozilla Saxon-B extension isn't something that will help you with Chrome or other browsers for that matter.
With that said I don't think you can find a cross-browser solution to use EXSLT extensions in your XPath. The conformance section of the DOM Level 3 XPath calls for XPath 1.0 support and doesn't mention EXSLT. The INVALID_EXPRESSION_ERR is said to be thrown:
if the expression has a syntax error or otherwise is not a legal expression according to the rules of the specific XPathEvaluator or contains specialized extension functions or variables not supported by this implementation.
Finally, here's an open bugzilla ticket for Firefox to open up EXSLT support for their DOM Level 3 XPath implementation. It seems to be sitting there in NEW status since 2007. The ticket says that:
Currently Mozilla gives an exception "The expression is not a legal expression." even if a namespace resolver correctly resolving the EXSLT prefixes to the corresponding URLs is passed in. Here's the test case.
--
If you don't mind me asking, what exactly you wanted to use the regex for? Maybe we can help you get away with a combination of standard XPath string functions?
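To illustrate what such a combination can look like: XPath 1.0 has no ends-with(), but it can be emulated with substring() and string-length(). The sketch below only builds the expression string. The helper names and the '-panel' suffix are invented for the example, and whether this covers your case depends on what the regex was meant to match.

```javascript
// Quote a JavaScript string as an XPath 1.0 literal. XPath 1.0 has no escape
// character, so strings containing both quote types need concat().
function xpathLiteral(s) {
  if (s.indexOf("'") === -1) return "'" + s + "'";
  if (s.indexOf('"') === -1) return '"' + s + '"';
  return "concat('" + s.split("'").join("', \"'\", '") + "')";
}

// Emulate ends-with(valueExpr, suffix) using XPath 1.0 string functions.
function endsWithExpr(valueExpr, suffix) {
  var lit = xpathLiteral(suffix);
  return 'substring(' + valueExpr + ', string-length(' + valueExpr +
    ') - string-length(' + lit + ') + 1) = ' + lit;
}

var xpath = '//div[' + endsWithExpr('@id', '-panel') + ']';

if (typeof document !== 'undefined' && document.evaluate) {
  var result = document.evaluate(xpath, document, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  console.log(result.snapshotLength);
}
```

contains() and starts-with() are available directly in XPath 1.0, so prefix and substring tests need no emulation.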
--
UPDATE You can build an XPath runner via XSLT (like you're asking in the update to your question) but it won't return the nodes from the source document, it will return new nodes that look exactly the same. XSLT produces a new result tree document and I don't think there's a way to let it return references to the original nodes.
As far as I can tell, Mozilla (and Chrome) both support XSLT not only for XML documents loaded from external sources, but also for DOM elements from the document being displayed. The XSLTProcessor documentation mentions how transformToFragment(), for example, will only produce HTML DOM objects if the owner document is itself an HTMLDocument, or if the output method of the stylesheet is HTML.
Here's a simple XPath runner that I built to test out your idea:
1) First you would need an XSLT template to work with.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:regexp="http://exslt.org/regular-expressions"
extension-element-prefixes="regexp">
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
I started building it in JavaScript using the document.implementation.createDocument API but figured it would be easier to just load it. FF still supports document.load while Chrome only lets you load stuff using XHR. You would need to start Chrome with --allow-file-access-from-files if you want to load files with XHR from your local disk.
2) Once we have the template loaded we would need to modify the value of the select attribute of the xsl:copy-of instruction to run the XPath we need:
function runXPath(xpath) {
    var processor = new XSLTProcessor();
    var xsltns = 'http://www.w3.org/1999/XSL/Transform';
    var xmlhttp = new window.XMLHttpRequest();
    xmlhttp.open("GET", "xpathrunner.xslt", false);
    xmlhttp.send(null);
    var transform = xmlhttp.responseXML.documentElement;
    var copyof = transform.getElementsByTagNameNS(xsltns, 'copy-of')[0];
    copyof.setAttribute('select', xpath);
    processor.importStylesheet(transform);
    var body = document.getElementById('body'); // I gave my <body> an id attribute
    return processor.transformToFragment(body, document);
}
You can now run it with something like:
var nodes = runXPath('//div[@id]');
console.log(nodes.hasChildNodes());
if (nodes.firstChild) {
    console.log(nodes.firstChild.localName);
}
It works great for "regular" XPath like //div[@id] (and fails to find //div[@not-there]) but I just can't get it to run the regexp:test extension function. With //div[regexp:test(string(@id), "a")] it doesn't error out, it just returns an empty set.
Mozilla documentation suggests their XSLT processor supports EXSLT. I would imagine they are all using libxml/libxslt behind the scenes anyway. That said, I couldn't get it to work in Mozilla either.
Hope it helps.
Any chance you can get away with jQuery regexp? It's not likely to be helpful for your XPath builder utility, but it is still a way to run a regexp on HTML nodes.

How can I access nonstandard tags in an HTML page with JS?

I wish to develop some kind of external API which will include users putting some nonstandard tags on their pages (which I will then replace with the correct HTML). For example:
<body>
...
...
<LMS:comments></LMS:comments>
...
...
...
</body>
How can I target and replace the <LMS:comments></LMS:comments> part?
Just use getElementsByTagName as usual to get the element.
You cannot change the tag name, you will have to replace the entire element.
See http://jsfiddle.net/2vcjm/
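In case the fiddle is unavailable, here is a rough sketch of the same idea (not a copy of the fiddle). The helper name replaceElements and the "comments" class name are invented for the example. It assumes an HTML document, where the parser lowercases the tag name and keeps the colon as part of it.

```javascript
// Hypothetical helper: replace every <tagName> element with whatever
// buildReplacement(oldElement) returns. Returns the number of replacements.
function replaceElements(doc, tagName, buildReplacement) {
  // getElementsByTagName returns a live collection, so copy it before
  // modifying the tree.
  var found = Array.prototype.slice.call(doc.getElementsByTagName(tagName));
  found.forEach(function (el) {
    el.parentNode.replaceChild(buildReplacement(el), el);
  });
  return found.length;
}

if (typeof document !== 'undefined') {
  replaceElements(document, 'lms:comments', function () {
    var div = document.createElement('div');
    div.className = 'comments'; // assumed class name
    return div;
  });
}
```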
You want to use regular expressions.
Take a look at this page to get started:
http://www.regular-expressions.info/brackets.html
That whole website is a great reference.
If your document is valid XHTML (as opposed to just HTML), you can use XSLT to parse it.
There are JavaScript XSLT libraries, such as Google's AJAXSLT.
Barring that, you will need to extract the relevant part of the DOM, take the value of "innerHTML" for the contents, and replace the custom tags using JavaScript's regex and replace() function.
However, this sort of processing is usually done server-side, by passing your custom "HTML+" through some sort of templating/enrichment engine (which will also use XSLT or HTML parsers or worst case regexes).
