Saxon error: how to ignore or skip it? - javascript

Using Saxon-HE (.NET version), wget and a batch file, I'm trying to transform a page I've downloaded with wget.
Whenever I run the command on the page, I get the following error:
SXXP0003: Error reported by XML parser: The entity name must immediately follow the '&' in the entity reference.
It is caused by a VERY awkward line of JavaScript. However, I have no control over the page I want to transform, so I can't fix this error at the source.
Is there any way to tell Saxon to skip such errors? I would not mind if it dropped the entire tag, since I'm not trying to read any data from the JavaScript elements.
Big thanks in advance!

As the error message says, the error is reported by the underlying XML parser that Saxon uses to parse the markup of the document you give it. If the document is not well-formed XML, any XML parser will reject it. Saxon lets you use an HTML "tag soup" parser such as TagSoup instead: put TagSoup (from http://home.ccil.org/~cowan/tagsoup/) on the class path and call Saxon with the option -x:org.ccil.cowan.tagsoup.Parser.
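If adding TagSoup is not an option, a cruder workaround is to repair this particular error in the downloaded file before handing it to Saxon. The sketch below is an assumption, not a Saxon feature: a hypothetical escapeBareAmpersands function (Node.js) that turns every & which does not start a character or entity reference into &amp;. It only fixes this one well-formedness error; other HTML-isms such as unclosed <br> tags will still make the XML parser fail, and it would also wrongly escape ampersands inside CDATA sections.

```javascript
// Hypothetical pre-processing step: escape "&" characters that do not begin a
// well-formed reference (&name;, &#123; or &#x7B;). Bare ampersands like the
// "&&" in inline JavaScript are what trigger
// "The entity name must immediately follow the '&' in the entity reference".
const BARE_AMPERSAND = /&(?![A-Za-z_:][\w.:-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/g;

function escapeBareAmpersands(markup) {
  return markup.replace(BARE_AMPERSAND, '&amp;');
}

// Example batch step (assumed file names), run after wget and before Saxon:
//   const fs = require('fs');
//   const html = fs.readFileSync('page.html', 'utf8');
//   fs.writeFileSync('page-fixed.html', escapeBareAmpersands(html));
```

Existing references such as &lt; or &#169; are left alone, so only the characters the parser would reject are changed.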

Related

XML parsing error when getting a query from a Perl script

I am new to Perl. I copied Cleb's answer (https://stackoverflow.com/a/28992992/5553963) and made ajax.pl executable, but it doesn't work and gives this error:
XML Parsing Error: not well-formed
Location: file:///home/workspace/snmp-agent/query_ui/ajax.pl
Line Number 1, Column 2:
When I run ajax.pl directly there is no error, but when I send the query via Ajax I get the error above. As you can see from Cleb's answer, the first line is: "#!/usr/bin/perl".
Can someone please give me a hint how to solve this?
CGI programs need to be executed by a web server (which has to be properly configured to execute CGI programs).
You are opening the file directly in a browser from the filesystem (we can tell because the Location is a file: scheme URL), and the browser is trying to parse it as XML (possibly your OS has somehow associated .pl files with the XML MIME type). Since it isn't XML, parsing fails.
Pick a web server, install it, and consult its manual on how to configure it to run CGI programs.

Uncaught SyntaxError: Unexpected token < on each JS file included in index.html

I am using the MEAN stack to build a website. When testing, Chrome returns an error like:
Uncaught SyntaxError: Unexpected token < angular.js:1.
I don't know what's wrong or what I should do.
Here is the directory of my app:
E-study
-client
-app
-components
-all the libraries are here.
-index.html
-controllers.js
-node_modules
-server
-config
-server.js
I run the server from the E-study folder like this: node server/config/server.js
The script tag in index.html is <script src="client/components/angular/angular.js"></script>
I just don't know why all the JS files turn into index.html when opened in the browser.
Open those library files and check whether there is a stray < symbol; you will probably find it at the very beginning. If you still can't fix it, simply download fresh copies of the libraries from the internet and try again.
Make sure you don't put <script></script> tags in the included .js files; that is invalid syntax in a script file.
Also make sure you are providing the correct path. An incorrect path can return a built-in or customized error page, which is HTML. That may be the source of the error: the returned page is HTML, which most likely starts with a < symbol, and of course it is not a JS file.
To check whether an incorrect path is the problem, copy the path from your code, paste it into your browser's URL bar and press Enter. If you don't get the script back as plain text, you are not providing the correct path.
And if it returns a customized error page such as "404 Not Found", then it is returning HTML, and that is where the error comes from.
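The URL-bar check above can also be done in code. A minimal sketch, assuming a runtime with the standard fetch API (modern browsers, Node 18+); the names responseLooksLikeHtml and checkScriptUrl are hypothetical. It flags a script URL whose response is declared as text/html or whose body starts with <, which is exactly what makes the browser report "Unexpected token <".

```javascript
// Decide from the Content-Type header and the body whether a "script" response is
// really an HTML page (for example a 404 page, or index.html served as a fallback).
function responseLooksLikeHtml(contentType, body) {
  const type = (contentType || '').toLowerCase();
  return type.includes('text/html') || body.trimStart().startsWith('<');
}

// Fetch a script URL and report whether the server returned HTML instead of JavaScript.
async function checkScriptUrl(url) {
  const response = await fetch(url);
  const body = await response.text();
  return {
    url,
    status: response.status,
    looksLikeHtml: responseLooksLikeHtml(response.headers.get('content-type'), body),
  };
}

// Usage in the browser console, relative to the page that loads the script:
//   checkScriptUrl('client/components/angular/angular.js').then(console.log);
```

If looksLikeHtml is true, the path (or the server's routing) is wrong rather than the library itself.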
In external JS files that you reference from other files, don't use <script>...</script> tags.
For an Express server, try setting the static path to the entire project folder. It worked for me:
app.use(express.static(__dirname ));
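Note that __dirname is the directory of the file that uses it, so in the layout from the question (server.js in server/config), express.static(__dirname) would serve server/config rather than the project folder. A minimal sketch under these assumptions: that layout, index.html served at the site root and requesting client/... paths, and Express installed. projectRoot is a hypothetical helper.

```javascript
const path = require('path');

// server.js lives in E-study/server/config, so the project folder ("E-study")
// is two levels above the directory that contains it.
function projectRoot(serverDir) {
  return path.resolve(serverDir, '..', '..');
}

// In server/config/server.js (assumes Express is installed):
//   const express = require('express');
//   const app = express();
//   const root = projectRoot(__dirname);
//   // Serve only the client folder, at the same URL prefix index.html uses,
//   // instead of exposing server/ and node_modules/ as well.
//   app.use('/client', express.static(path.join(root, 'client')));
//   // Register static middleware before any catch-all route that sends
//   // index.html; otherwise requests for .js files can be answered with HTML.
```

Mounting only client/ is a narrower alternative to serving the whole project folder, which would also make the server source publicly downloadable.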
It could be a reCAPTCHA-style bot checker intercepting requests for JS files and serving an HTML page instead. The browser then tries to run that HTML as JavaScript, so it throws the "Unexpected token <" error.
I know SiteGround specifically has issues with intercepting CDN-routed traffic.
Check with your host to have this removed; in this case it's their anti-bot security setup. This has fixed these issues with SiteGround for me.

How to change the url being called by javascript without access to the javascript

I have a jQuery('#Frame').animate360 script on the page, and it requests a file called Profile.xml to get its settings, etc.
The problem is that Profile.xml is on Azure Blob storage and its name is in upper case (PROFILE.XML), so the request gives a 404 Not Found error.
I can't change the file name (Profile.xml) on Azure.
The piece of JS that requests Profile.xml seems to be encrypted or obfuscated in a library (HTML5Loader.js): the text 'Profile.xml' does not appear in any file and can only be found with the Chrome debugger in an unnamed file.
My instinct was to use something like Application_BeginRequest to catch a request to https://storageblabla.blob.core.windows.net/uploads/ALPHA/3DIMAGES/1.010659/Profile.xml and change it to ...../PROFILE.XML, but by then it's too late: it already knows it's a 404.
There must be some way, in code, to intercept the point where a remote URL is requested.
I reckon it's a one-line fix, but I just can't find the right term to search for.
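One possible approach, not from the original thread and not confirmed for HTML5Loader.js: if the library requests the file with XMLHttpRequest (as jQuery's $.ajax does), the URL can be rewritten on the client by wrapping XMLHttpRequest.prototype.open before the library runs. The helper names upperCaseProfileXml and patchXhrOpen are hypothetical, and the rewrite does nothing if the loader uses fetch() or a <script>/<link> tag instead.

```javascript
// Rewrite a final "Profile.xml" path segment to "PROFILE.XML", leaving any query
// string or fragment untouched. Hypothetical helper for this specific blob layout.
function upperCaseProfileXml(url) {
  return url.replace(/\/Profile\.xml(?=$|[?#])/, '/PROFILE.XML');
}

// Wrap open() on the given XMLHttpRequest class so every request URL passes
// through `rewrite` before the request is opened; other arguments are unchanged.
function patchXhrOpen(XhrClass, rewrite) {
  const originalOpen = XhrClass.prototype.open;
  XhrClass.prototype.open = function (method, url, ...rest) {
    return originalOpen.call(this, method, rewrite(String(url)), ...rest);
  };
}

// In the page, in a <script> placed before HTML5Loader.js:
//   patchXhrOpen(window.XMLHttpRequest, upperCaseProfileXml);
```

Taking the XHR class as a parameter keeps the patch testable and makes it explicit which global is being modified.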

DOM parsing in JavaScript

Some background:
I'm developing a web-based mobile application using JavaScript. HTML rendering is Safari-based. The cross-domain policy is disabled, so I can make calls to other domains using XMLHttpRequest. The idea is to parse external HTML and get the text content of a specific element.
In the past I parsed the text line by line, finding the line I needed and then extracting the content of the tag as a substring of that line. This is very troublesome and requires a lot of maintenance each time the target HTML changes.
So now I want to parse the HTML text into a DOM and run CSS or XPath queries on it.
It works well:
$('<div></div>').append(htmlBody).find('#theElementToFind').text()
The only problem is that when I use the browser to load HTML text into a DOM element, it tries to load all external resources (images, JS files, etc.). Although this isn't causing any serious problem, I would like to avoid it.
Now the question:
How can I parse HTML text into a DOM without the browser loading external resources or running JS scripts?
Some ideas I've been thinking about:
creating a new document object with document.implementation.createDocument(), but I'm not sure it will skip loading external resources
using a third-party DOM parser in JS; the only one I've tried was very bad at handling errors
using an iframe to create a new document, so that external resources with relative paths will not throw errors in the console
It seems that the following piece of code works great:
var doc = document.implementation.createHTMLDocument("");
doc.documentElement.innerHTML = htmlBody;
var text = $(doc).find('#theElementToFind').text();
External resources aren't loaded and scripts aren't evaluated.
Found it here:
https://stackoverflow.com/a/9251106/95624
Origin:
https://developer.mozilla.org/en/DOMParser#DOMParser_HTML_extension_for_other_browsers
You can construct a jQuery object from any HTML string without appending it to the DOM:
$(htmlBody).find('#theElementToFind').text();

Parsing xml (xforms) document with javascript `document.getElementsByTagName` not working in phonegap android

I am having some trouble parsing XForms XML with JavaScript.
The structure of the root of the XML is:
<?xml version="1.0"?>
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:ex="http://www.w3.org/2001/xml-events" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jr="http://openrosa.org/javarosa">
<h:head>
<h:title>Phonegap Survey</h:title>
</h:head>
<h:body>A ton more tags here</h:body>
</h:html>
I am using Firefox (v12) for development and trying to deploy to PhoneGap/Cordova on Android. I am getting the data from an ODK server using jQuery Ajax. I am specifying dataType: 'xml' and it responds with a document object as expected. In Firefox I have written an app that presents data from this XML to the user. I am using built-in JavaScript DOM functions like:
var title = surveyXML.getElementsByTagName('h:title')[0].firstChild.data
var body = surveyXML.getElementsByTagName('h:body')[0];
var text = surveyXML.getElementsByTagName('text')
I have written the full app (and it works fine) using these methods. But now that I have copied my files over to PhoneGap (running Cordova 1.7.0 on the Android 2.3 emulator) in Eclipse, none of the DOM calls return elements! I get something like:
05-31 13:21:28.686: D/CordovaLog(841): file:///android_asset/www/formcontroller.js: line 159 : TypeError: Result of expression 'body' [undefined] is not an object.
This happens for all the calls. I have verified that the device has the correct XML (it isn't empty or anything).
So, needing to get this done, I tried to use jQuery to navigate the document object, hoping it knew something I didn't, with calls like:
$(surveyXML).find('title');
$(surveyXML).find('h:title')
These do not work at all. But tags without the h: prefix work fine: if I search for $(surveyXML).find('text'), it returns all text elements as expected.
Since the root element is <html>, I tried specifying dataType: 'html' (just to try it, even though the document is clearly marked as <?xml version="1.0"?>), and jQuery is unable to parse it, as expected.
So I am wondering: how can I parse this XML cross-platform so that it works both in a browser and in PhoneGap? And, supposing it can't work in both with the same DOM manipulation functions, how can I at least make it work in PhoneGap?
As always any help is appreciated. Thank you.
EDIT: For now I am just manually referencing the exact locations of the elements; for example, to get to the body I use xml.firstChild.children[1], because I am under a deadline. I definitely feel there should still be a way to use getElementsByTagName. Thanks.
This has been unanswered for 2 weeks so I am just posting what I did.
For now I am just manually referencing the exact locations of the elements; for example, to get to the body I use xml.firstChild.children[1].
So all I did was stop searching for tags by their prefixed (namespaced) names and reference the elements directly by position.
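A sketch of an alternative that keeps name-based lookups (not from the original thread, and untested on the Cordova 1.7 / Android 2.3 WebView): the standard DOM method getElementsByTagNameNS matches on namespace URI plus local name, so it does not depend on how a given engine treats the h: prefix. The namespace URIs are the ones declared on the root <h:html> element; readSurvey is a hypothetical helper.

```javascript
// Namespace URIs declared on the root <h:html> element of the XForms document.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';
const XFORMS_NS = 'http://www.w3.org/2002/xforms';

// Look elements up by namespace URI + local name instead of by "h:title" etc.
function readSurvey(surveyXML) {
  const titleEl = surveyXML.getElementsByTagNameNS(XHTML_NS, 'title')[0];
  const body = surveyXML.getElementsByTagNameNS(XHTML_NS, 'body')[0];
  // Unprefixed elements such as <text> are in the default namespace, which the
  // root element declares as the XForms namespace.
  const texts = surveyXML.getElementsByTagNameNS(XFORMS_NS, 'text');
  return {
    title: titleEl ? titleEl.textContent : null,
    body: body || null,
    texts,
  };
}

// Usage with the document returned by $.ajax({ dataType: 'xml', ... }):
//   const survey = readSurvey(surveyXML);
//   console.log(survey.title); // text of <h:title>
```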
