How to parse web page contents in Android with JS using jsoup - javascript

How can I parse an HTML page on Android so that the result includes content produced by JavaScript? The main problem is that if I simply use the Jsoup.connect() method, the Document object doesn't contain the JavaScript results, because the scripts need some time to run. Is it possible to delay the connection?

As already mentioned in the comments, jsoup does not run any JavaScript. For that you would need a JavaScript interpreter.
Since you mentioned that the page you want to read takes some time to render, it seems clear that the JavaScript has to run before the DOM is complete.
However, if you look into the source code of the page, you may be able to figure out how the JavaScript actually renders it. I see two possibilities:
1) The JavaScript only renders the page dynamically from information that already arrives with the initial response. Many modern websites send all relevant data along with the first response (often called isomorphic rendering). In that case the data you want is usually embedded in the page source as JSON objects; you can extract the JSON and parse it with a JSON parser.
2) The JavaScript loads some data asynchronously. In that case you can identify those HTTP requests (for example in the browser's network tab) and use jsoup to fetch the data directly. Such data is usually in JSON format, so here too it makes sense to use a JSON parser to read out the relevant parts.
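For possibility 1, the extraction step could look roughly like the sketch below. It is written in JavaScript only to keep the illustration short; on Android the same steps would be done with jsoup plus a JSON parser. The function name extractEmbeddedJson and the script id "initial-data" are hypothetical, and the sketch assumes the page embeds its data in a script tag with a known id that contains no regex special characters.

```javascript
// Hypothetical helper: pull the JSON text out of <script id="..."> in an HTML
// string and parse it. Assumes scriptId contains no regex special characters.
function extractEmbeddedJson(html, scriptId) {
  const pattern = new RegExp(
    '<script[^>]*\\bid=["\']' + scriptId + '["\'][^>]*>([\\s\\S]*?)</script>',
    'i'
  );
  const match = pattern.exec(html);
  // Return null when the page has no such script block.
  return match ? JSON.parse(match[1]) : null;
}

// Example page source (made up for illustration).
const pageSource =
  '<html><body>' +
  '<script id="initial-data" type="application/json">' +
  '{"title":"Hello","items":[1,2]}' +
  '</script></body></html>';

const data = extractEmbeddedJson(pageSource, 'initial-data');
console.log(data.title);
```

A real page may store its data differently (for example as a variable assignment inside a script), so the pattern has to be adapted to what the page source actually contains.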

Related

Duplicate an HTML file (and its content) with a different name in Javascript

I have an HTML file with some JavaScript and CSS applied to it.
I would like to duplicate that file to produce file1.html, file2.html, file3.html, and so on.
All of that using JavaScript, jQuery or something like that!
The idea is to create different pages (from that kind of template) that will be printed afterwards with different data in them (from an XML file).
I hope it is possible!
Feel free to ask for more details if you want!
Thank you all in advance.
Note: I do not want to copy only the content, but the entire file.
Edit: I know I should use a server-side language, I just don't have the option ):
There are a couple ways you could go about implementing something similar to what you are describing. Which implementation you should use would depend on exactly what your goals are.
First of all, I would recommend some sort of template system such as VueJS, AngularJS or React. However, given that you say you don't have the option of using a server side language, I suspect you won't have the option to implement one of these systems.
My next suggestion would be to build your own 'templating system'. A simple implementation that may suit your needs could be something like the following:
In your primary file (root file) which you want to route or copy the other files through to, you could use JS to include the correct HTML files. For example, you could have JS conditionally load a file depending on certain circumstances by putting something like the following after a conditional statement:
Note that while doing this could optimize your server's data usage (as it would only serve required files and not everything all the time), it would also probably increase loading times. Your site would need to wait for the additional HTTP request to come through and for whatever requested content to load & render on the client. While this sounds very slow it has the potential of not being that bad if you don't have too many discrete requests, and none of your code is unusually large or computationally expensive.
If you are using vanilla JS, the following snippet illustrates the above.
In a script that is loaded with your routing file:
// Request an HTML file from the server and hand the response to show().
function read(url) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.onload = show;
    xhr.send();
}

// Runs once the request completes; `this` is the XMLHttpRequest.
function show() {
    var text = this.response;
    // Replace document.body with whatever element should wrap the imported HTML.
    document.body.innerHTML = text;
}

read('path/to/file/on/server'); // the path must be passed as a string
Note a couple of things about the above code. If you are testing on your computer without a local server (i.e. opening your HTML file in a browser with a path like file://__), you will get some sort of cross-origin request error when the XMLHttpRequest is made. To get around this error, either test your code on an actual server (constantly pushing code is not ideal, I know) or, preferably, set up a local testing server. If this is something you want to explore, it's not that difficult to do; let me know and I'd be happy to walk you through the process.
Alternatively, you could implement the above loading system with jQuery and the .load() function. http://api.jquery.com/load/
If none of the above solutions work for you, let me know more specifically what it is that you need, and I'll be happy to give a more useful/ relevant answer!
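As a rough illustration of the 'templating system' idea, the sketch below fills one template string with different data for each page. Everything in it is an assumption made for illustration: the {{name}} placeholder syntax, the fillTemplate function, and the records array, which stands in for values that would come from the XML file.

```javascript
// Hypothetical helper: replace {{key}} placeholders in a template string with
// values from a data object. Unknown keys are left untouched.
function fillTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : placeholder
  );
}

// The template would normally be the HTML fetched from the server;
// here it is an inline string so the sketch is self-contained.
const template = '<h1>{{title}}</h1><p>{{ body }}</p>';

// Stand-in for records read from the XML file (made-up values).
const records = [
  { title: 'Page 1', body: 'First page' },
  { title: 'Page 2', body: 'Second page' },
];

// One filled-in page per record, ready to be inserted into the DOM and printed.
const pages = records.map((record) => fillTemplate(template, record));
console.log(pages[0]);
```

Note that this sketch does not escape HTML in the values, so it is only suitable for data you control.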

Loading JSON data as external script instead of AJAX

I have been shown a trick: instead of keeping the data in datafile.json and loading it with AJAX, the data is wrapped in a single object in datafile.js, such as var Data = { /* all data goes here */ }. Then datafile.js is simply loaded as an external script in the HTML head.
It works just as well as if the objects were loaded with AJAX. Are there any drawbacks to this?
JSON is essentially JavaScript object-literal syntax. (Back in the day, the idea was that you could simply eval it ...) Therefore, the file that you speak of is simply ... "a JavaScript assignment statement, stored in a file."
The only potential issue with storing this as a separate file might be "a timing hole." The source code must be retrieved separately. A plain script tag without async or defer blocks parsing and runs in document order, so scripts that come after it will see var Data. If the file is loaded with async or defer, or is injected dynamically, other JavaScript code can execute before that block of code has been retrieved and executed, and it will not see var Data.
When you have "lots of invariant fixed data," I customarily put everything, including the data, into one ("yeah, it's big ...") JS file, so that I know "it's all there" before any of it is executed. Yes, there are definite advantages to including fixed data directly in your source file, as you're effectively doing here, but I'm not sure I see an advantage (and I might see a hole ...) in using several JS files.
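One way to avoid the timing hole when the data file is loaded dynamically is to wait for the script's load event before touching the data. The sketch below is one possible shape for that; loadScript is a hypothetical helper, and the document is passed in as a parameter only so the function is easy to exercise outside a browser.

```javascript
// Hypothetical helper: insert a <script src="..."> into the page and return a
// Promise that settles when the browser has loaded (or failed to load) it.
function loadScript(src, doc) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement('script');
    script.src = src;
    script.onload = () => resolve(script);
    script.onerror = () => reject(new Error('Failed to load ' + src));
    doc.head.appendChild(script);
  });
}
```

In a browser this might be used as loadScript('datafile.js', document).then(function () { /* Data is defined here */ }), assuming datafile.js contains the var Data assignment from the question.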

AJAX Script Loading

I am currently writing a program that uses AJAX to load a form for editing objects on a website. I have found a similar question at Loading script tags via AJAX, but it doesn't really satisfy the needs of the program.
The AJAX response is a pre-built set of form elements. When the form contains certain widgets, say a TinyMCE textarea (which it does), the response also includes a set of script tags embedded in the markup.
So my question is, is it possible to run through the script tags that have been put in the div and run them?
Plus, I want to avoid using jQuery as it could be running on any number of platforms.
Yes, you can add the incoming HTML and scripts to the DOM, then search the DOM for any script tags. You would then eval the scripts, and could ignore any jQuery script tags if you wish.
However:
This sort of solution tends to be quite brittle.
It would be much better and more stable for you to modify the Ajax payload into separate html and javascript scripts. That way your Ajax handler would be able to handle them directly without trying to separate them.
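If the embedded script tags do have to be executed in place, a minimal sketch without jQuery might look like this. Note that it swaps in a different technique from eval: script elements inserted through innerHTML are not executed by the browser, so the sketch replaces each one with a freshly created script element, which the browser does run. The function name runInlineScripts is hypothetical.

```javascript
// Hypothetical helper: re-create every <script> inside `container` so the
// browser executes it. Returns the number of scripts that were replaced.
function runInlineScripts(container, doc) {
  const oldScripts = Array.from(container.querySelectorAll('script'));
  for (const oldScript of oldScripts) {
    const newScript = doc.createElement('script');
    // Copy attributes such as type or src from the original tag.
    for (const attr of Array.from(oldScript.attributes)) {
      newScript.setAttribute(attr.name, attr.value);
    }
    // Copy the inline code, if any.
    newScript.text = oldScript.text;
    oldScript.parentNode.replaceChild(newScript, oldScript);
  }
  return oldScripts.length;
}
```

In a browser this would be called after the response has been written into the div, for example runInlineScripts(formDiv, document), where formDiv is whatever element received the AJAX markup.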
Added
Re: how to send back the HTML and JavaScript parts: you can either make separate Ajax calls, or return a JSON object that includes both parts. E.g.:
{"js": "<the js part of the response>",
"html": "<the html part of the response>"}
Use a JSON library on your host system to take care of escaping any quotes or other JSON special characters in either the js or html values.
Returning both the html and js at once saves an Ajax call (which can be significant) and will usually simplify your code quite a bit vs two calls.
I use this technique in production and it works well.
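On the client, handling such a combined payload could be sketched as follows. The function applyPayload and its parameters are hypothetical; the way the script is run is passed in as a callback so that the choice (eval, new Function, or something else) stays with the caller.

```javascript
// Hypothetical handler: split a {"js": ..., "html": ...} response into its
// parts, put the HTML into the container, then run the script part.
function applyPayload(responseText, container, runScript) {
  const payload = JSON.parse(responseText);
  container.innerHTML = payload.html;
  if (payload.js) {
    runScript(payload.js);
  }
  return payload;
}

// What a server-side JSON library produces; JSON.stringify plays that role
// here and takes care of escaping quotes in both values.
const exampleResponse = JSON.stringify({
  html: '<textarea class="editor"></textarea>',
  js: 'console.log("editor ready");',
});

// Stand-in container so the sketch runs outside a browser.
const exampleContainer = {};
applyPayload(exampleResponse, exampleContainer, (code) => new Function(code)());
```

In a browser the container would be the real div, and the HTML is inserted before the script runs so that the script can find the new elements.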
Do you mean that you return a JS script from the Ajax call and want to run it? If so, you can use the eval function.

Importing external json files to a javascript script client-side without external libraries

I'm a bit new to javascript. Is there a way to do what I am describing in the title completely client-side and without any external libraries? Or is using jQuery the best/only way to go?
You can import a JSON file from a server via AJAX and then simply eval it. You don't need a library for that, but using one makes it a lot easier. Of course, just evaling a JSON string is not very secure, as it can contain arbitrary code, so all libraries parse it to check that it is well formed.
EDIT:
If you want to learn about AJAX, you can start with this tutorial from w3schools. Ajax stands for Asynchronous JavaScript And XML, and it allows you to send a request to the server without reloading the whole page. In your case you will not be using XML but JSON. Anyway, the tutorial explains the whole idea.
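A library-free version of this approach might look like the sketch below. It uses the built-in JSON.parse instead of eval, which rejects anything that is not well-formed JSON rather than executing it. The names parseJson and loadJson and the file name in the usage note are hypothetical.

```javascript
// Parse JSON text without executing it. Returns { ok, value } on success
// and { ok, error } when the text is not well-formed JSON.
function parseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Fetch a JSON file with XMLHttpRequest (browser only) and pass the parsed
// result to the callback.
function loadJson(url, onDone) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = () => onDone(parseJson(xhr.responseText));
  xhr.onerror = () => onDone({ ok: false, error: 'Request failed' });
  xhr.send();
}

console.log(parseJson('{"name":"example","tags":["a","b"]}').value.tags.length);
```

In a page this could be called as loadJson('data.json', function (result) { ... }), checking result.ok before using result.value.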
Yes, there is. You can use document.write to add scripts to the DOM while the page is loading. In your case:
document.write('<script ...><\/script>'); // the backslash keeps an inline script block from ending at this point
Basically, you are adding a script tag to the DOM that will request the new file.
However, there is something else to consider: although the script will be downloaded, it needs to contain a variable assignment so that you can use the data in your page:
var x = { /* json object */ };

Including Information in HTML for Javascript to Consume?

I am building a web app with App Engine (Java) and GWT, although I think my problem is more of a general JavaScript question.
In my application, I want to include a side menu which is generated from data in my database. This obviously needs to be done dynamically from a servlet.
Currently, I am sending a blank menu container, then making an Ajax call to get the information I need to populate the menu. I would rather send the menu information along with the original response, so that I do not waste time making a second request. I used this initial approach because it seemed simpler: I just wrote a GWT RPC service that grabbed the data I needed.
My question is, can I instruct a JavaScript library like GWT to look for its information in the current web page? Do I have to hide this information in my original HTML response, maybe in a hidden div or something?
If the data that you'd like to embed is restricted to menu items, why not directly generate lightweight HTML out of simple <ol> and <li> elements? You can still keep HTML out of your Java code by using a template engine. The menu markup could just be styled with CSS or if you need something fancier than mere <ol> and <li> elements, you can massage the DOM with JavaScript once the page loads (read: progressive enhancement).
If you're looking for a more generic solution, beyond the menu case, then you could embed a JSON block in your page, to be consumed when the page loads for the dynamic generation of your menu.
Or, you could look into using a microformat that is suitable for menu data.
You can include a script block in the original response defining the data and then use an onload event (or similar) to create the menu based on that data; that's very similar to what you're doing now, but without the extra trip to the server. I'm assuming there that the data to construct the menu is transformed in some way by JavaScript on the client; otherwise, just include the menu markup directly.
GWT has something called JSNI (JavaScript Native Interface) that can interface with other, non-GWT JavaScript. So, in your HTML page container you could have a <script> block containing the generated menu items as a JavaScript object. Then, in your GWT code, you make a JSNI call to fetch this data and put it in the right place in your UI/DOM with GWT methods.
I asked a similar question a few weeks ago about how to store data safely inside HTML tags. It also contains a few links to other questions. Here
There are in general 2 options:
Store the data in some XML tags somewhere in the HTML code and later get the information from there by traversing the DOM. (You can hide them with CSS.)
Store the data as JSON in a script tag. (There are JSON encoders and decoders for nearly every language.)
var data = { "foo" : "bar", "my_array":[] };
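The second option could be sketched as follows. The example is in JavaScript for illustration only; in the servlet the same string would be produced with a Java JSON library. The names toScriptBlock and readEmbeddedData and the id "menu-data" are hypothetical. The one non-obvious step is escaping "<" so that a value containing "</script>" cannot end the block early.

```javascript
// Build a <script type="application/json"> block holding the data.
// Escaping "<" as \u003c keeps the JSON valid and prevents a value such as
// "</script>" from closing the tag prematurely.
function toScriptBlock(id, data) {
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return '<script type="application/json" id="' + id + '">' + json + '</script>';
}

// In the browser: read the block back once the page has loaded.
function readEmbeddedData(id, doc) {
  return JSON.parse(doc.getElementById(id).textContent);
}

const block = toScriptBlock('menu-data', { items: ['Home', 'About'] });
console.log(block);
```

On the page, readEmbeddedData('menu-data', document) would then return the menu object without a second request to the server.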
