First of all: I don't know anything about AJAX or anything similar, so please keep that in mind.
I am trying to parse information from a website (http://www.sportstats.com/soccer/germany/bundesliga/). More specifically, I want to parse the information held by the <table id="nextMatches_0">. I found out that this is not possible with the library I have used so far, Jsoup, because the website loads that information from somewhere else; as far as I can tell, it is AJAX that fills in the table.
Since I haven't found a way to parse the information I want directly, it would be great to simply do the same thing the website does and send a request to that server myself. But I don't have a clue how to do this, which is why I am asking for help.
Big thanks already :)
It sounds like you're trying to reverse engineer how some data gets into a web page so you can figure out how to get that same data from your Java app. So far, you've concluded that the data itself is not in the HTML, so your guess is that some script in the web page is putting the data into the page via an Ajax call.
First off, to confirm whether that is the case, you can do two things:
Bring up that page in the web browser and use View Source. Examine the original HTML of the page and see if the content you want is in there. If it is, then you can simply request that page directly from the server, parse the HTML, and grab your content. If the content you want is not in the original HTML of the page, go to step 2.
Open the Chrome debugger and switch to the Network tab. Then load your page in the browser. Examine the requests in the Network tab and find all the requests whose "type" is listed as "xhr". These are the Ajax requests from that page; I see at least 3 xhr requests on that page. Then examine each xhr request to see if it is the one requesting and receiving the specific data you are interested in. If you find it, study how the request is formed to see whether you can send that same request to the same source from your Java app.
If, in the first step, you find the data is actually in the HTML, then you can just request that URL from your Java app, get the HTML, feed it to an HTML parser, and find the content you want in the parsed page.
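For that case, a minimal Jsoup sketch might look like the following; the selector uses the table id from the question, and if the table really is filled in by Ajax (as suspected) this will simply come back empty.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StaticTableScraper {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page; this only sees what is in the original HTML.
        Document doc = Jsoup.connect("http://www.sportstats.com/soccer/germany/bundesliga/").get();
        // Print each row of the table the question refers to (empty if the table is Ajax-filled).
        for (Element row : doc.select("table#nextMatches_0 tr")) {
            System.out.println(row.text());
        }
    }
}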
If, in the second step, you conclude that an Ajax call is fetching the data you want, then you need to see how the request is formed and which host it is sent to, and copy that type of request from your Java app to see if you can obtain the same data. I see that page makes a couple of Ajax calls that fetch JSON. If one of those is what you want, you would then parse the JSON in your Java app so you can access the data from your Java code.
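As a rough sketch of replaying such a JSON-returning call from Java: the URL below is a placeholder for whatever endpoint you find in the network tab, and org.json is just one of several libraries that can parse the response (the real response may be an object rather than an array).

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.json.JSONArray;

public class XhrFetcher {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: copy the real one from the xhr request in the Chrome network tab.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/path/to/the/xhr/endpoint"))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // If the endpoint returns a JSON array, this parses it; adjust to the real structure.
        JSONArray matches = new JSONArray(response.body());
        for (int i = 0; i < matches.length(); i++) {
            System.out.println(matches.get(i));
        }
    }
}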
Oh, and I'd suggest reading the licensing information on the site to see what you are actually allowed to do with someone else's content or Ajax calls.
I have a web page A created by a PHP script which would need to use a service only available on another page B – and for various reasons, A and B can't be merged. In this particular instance, page A is a non-WordPress page and page B is WordPress-generated. And the service in question is sending emails in a specific format which is supplied by a WP plugin.
My idea is to use page A to generate the email content and then send that content to page B which then, aided by the plugin, sends the email in the appropriate format and transfers control back to page A. This would be perfectly doable – but what I would like in addition is for page B never to be displayed. The visitor should have the impression that they are dealing only with page A all the time. Can that be done and if so, how?
I do not intend this to be a WordPress question (although maybe it is), but rather a more general one about using another page's script in passing without displaying that other page.
If you do have source access, it would be most reliable to use the add-on directly. But if you cannot, the next easiest option is to use curl to mimic the form post to page B. This happens server side, so the user wouldn't see it.
To figure out what you need to send in your POST request, open your browser's developer tools and watch the Network tab while you submit the form manually; note the URL and all of the POST data. Then you'll be able to mimic it.
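This answer is in PHP-and-curl territory, but the idea of replaying the captured form post is the same in any language. As a hedged sketch, here is roughly what it looks like in Java with java.net.http; the URL and form fields are placeholders you would copy from the captured request.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FormPostReplay {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and fields: use the exact values captured from the real form submission.
        String body = "subject=Hello&message=Generated+by+page+A";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://siteB.example.com/form-handler"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Page B's reply never reaches the visitor's browser; only your code sees it.
        System.out.println("Page B answered with status " + response.statusCode());
    }
}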
You may proxy https://SITEA.com/siteB/whatever to http://SITEB.com/whatever - or the other way around... I didn't fully understand the process :P
In case you just want the site B service call, you may also send the requests via curl or an HTTP library of your choice - which might be better, as you will have to get a nonce first and handle things like that.
I just want to know how I can post data without refreshing the page. For example, on Facebook, when you post a comment it is posted and shown to other people without refreshing the page. I do know how to insert data into MySQL without refreshing the page using AJAX, but the question is: how do I insert the data and get it back at the same time, without refreshing the page?
Thank You
OSDM's answer might seem to accomplish the behavior you want, but it isn't quite what you're asking about. His answer only provides updates when the current user uploads something, not as new items are created (uploaded) in the system.
There are 2 different ways you can accomplish fetching new information from the server: AJAX and WebSockets.
AJAX - AJAX stands for Asynchronous JavaScript and XML. It allows you to fetch content from a particular server behind the scenes and then insert the newly fetched data into your page to display it to the user. This, however, has to be explicitly triggered by the client and therefore doesn't really happen in real time. You could trigger the fetching of data either manually (e.g. with the press of a button) or on a timer (e.g. every 5 seconds, 10 minutes, etc.). It is important to note that it is hard for the server to know what information the page is currently displaying, so each AJAX call usually requests all of the information to be displayed and re-renders the content (deletes the current content and inserts the newly fetched content, including content that was already being displayed).
WebSockets - WebSockets can be thought of as an 'upgraded' HTTP connection. The client and the server establish a connection and are free to send data in either direction. You can set up WebSockets between your server and the website (client) such that whenever new content is inserted into the MySQL database, the server relays the new content to the client. Much like with AJAX, you would interpret the new information and add it to the page. The upside of WebSockets is that information is fed to you in real time as the server receives it. This means you only need to fetch data in bulk when you first load the site, and updates are pushed to you as they occur. You do not need to rely on a timer or manual input, because you are being fed data rather than fetching it.
Facebook, for example, doesn't rely on a timer or on you fetching new data (although that certainly happens if you refresh the page); rather, each client listens to the server for new information through WebSockets.
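If your server side happened to be Java, a bare-bones sketch of the WebSocket approach using the standard javax.websocket API might look like this; the endpoint path is made up, and the database insert is left as a comment.

import java.io.IOException;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import javax.websocket.OnClose;
import javax.websocket.OnMessage;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/comments")
public class CommentEndpoint {
    private static final Set<Session> sessions = new CopyOnWriteArraySet<>();

    @OnOpen
    public void onOpen(Session session) {
        sessions.add(session); // a browser opened the page and connected
    }

    @OnClose
    public void onClose(Session session) {
        sessions.remove(session);
    }

    @OnMessage
    public void onMessage(String comment, Session sender) throws IOException {
        // ...insert the comment into MySQL here...
        // then push it to every connected browser so it appears without a refresh
        for (Session s : sessions) {
            s.getBasicRemote().sendText(comment);
        }
    }
}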
That is all JavaScript (or jQuery). You already know how to send the data to your server. Now all you need to do is modify the HTML with JavaScript.
For example (jQuery):
$("#submit").click(function(){
$("#comments").append("<div class=newcomment>"+$("#textbox").val()+"</div>");
$.POST('upload.php',{comments:$("#textbox").val()});
});
Now the comment is sent to upload.php and also added to the comments section of your page.
If you also need data from the server to be included, add some output (or JavaScript) to the upload.php file and do something like this: $("#getdatefromserver").load('upload.php', {comments: $("#textbox").val()}); The response from upload.php will be inserted into the page, and any JavaScript in it will run.
And no page refresh is done.
I have been tasked with reading information from a table on a 3rd-party page. The website will have multiple pages, so the bookmarklet will have to be run once per page. I currently have the bookmarklet pulling the data and putting it into a pipe-delimited array. I would like to send this pipe-delimited array to a server-side function that sanitizes the data (to protect against injection) and then checks whether it already exists in a temp table; if it doesn't exist in the table, it inserts it.
After all of that is said and done, the script will send back information about what happened on the server side, and the results will be presented to the user on the web page where the bookmarklet was executed.
I have looked into JSON, AJAX, and JavaScript as possible ways to submit and work with the data (which I quickly detoured away from).
I am limited to using Microsoft solutions because of the environment I am working in.
So my question is, what would be best and how would I go about this? I have been unable to understand or execute any of these solutions.
What would be the most efficient way, in a Microsoft environment, to post data to a database from a bookmarklet on a 3rd-party page and get a response that the user sees?
I'm working on a new web app where a large amount of content (text, images, meta-data) is requested via an Ajax request.
No auth or login is required for a user to access this.
My concern is that you could easily look up the data source URL and hit it directly outside the app to pull down large amounts of data. In some ways, if you can do this, you could probably also scrape the static HTML pages elsewhere that contain this content.
Are there any suggestions on methods to obfuscate, hide, or otherwise make it very difficult to access the data directly?
Example: the web app's HTML page contains a key that is republished every 30 minutes. On the server side, the data is obfuscated based on this key. To get the data outside the app you would need to figure out the data source and also take the extra step of scraping the page for a fresh key every 30 minutes.
I realize there is no 100% way to stop someone, but I'm talking more about deterrence.
Use sessions in your webapp. Make a note (e.g. a database entry or some other mechanism your server-side code can access) when a valid request for the first page is received, and include code in the second page that withholds the data when processing a request without a corresponding session entry.
Obviously the specifics on how to do this will vary between languages, but most robust web platforms will support sessions, largely for this type of reason.
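As one hedged illustration in Java (the Servlet API; the attribute name and the response body are placeholders), the data endpoint could refuse any request that doesn't belong to a session in which the first page was viewed; the first page would set that attribute with request.getSession(true).setAttribute("visitedFirstPage", Boolean.TRUE).

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class DataServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Only use an existing session; don't create one for a direct hit on this URL.
        HttpSession session = req.getSession(false);
        if (session == null || session.getAttribute("visitedFirstPage") == null) {
            resp.sendError(HttpServletResponse.SC_FORBIDDEN); // no valid first-page view recorded
            return;
        }
        resp.setContentType("application/json");
        resp.getWriter().write("{\"data\": \"the protected content goes here\"}");
    }
}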
If you want to display real-time data and are concerned about scrapers, and if this is a big enough concern, then I suggest doing it with Flash instead of JS (AJAX). Have the data display within a Flash object. Flash can make real-time send/receive requests to the server just like AJAX, but the benefit of Flash is that the whole stage, data, code, etc. are inside a Flash object, which cannot be scraped. The Flash object makes the request, you output the content as an encrypted string, and you decrypt it within Flash and display it from there.
"Are there any suggestions on methods to obfuscate, hide, or otherwise make it very difficult to access the data directly?"
This answers your own question: if the data is worth getting, it will be obtained; obfuscation merely makes it harder to find.
In the server-side script that processes the Ajax request and returns the data, you could check where the request came from.
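As a rough sketch of that check (shown here with the Java Servlet API; the domain is a placeholder), you could look at the Referer header, keeping in mind it can be spoofed or stripped, so this is a deterrent rather than real protection.

import javax.servlet.http.HttpServletRequest;

public class RefererCheck {
    // Returns true if the Ajax request appears to come from a page on our own site.
    // The Referer header can be spoofed or missing, so treat this as a deterrent only.
    static boolean cameFromOurSite(HttpServletRequest request) {
        String referer = request.getHeader("Referer");
        return referer != null && referer.startsWith("https://www.example.com/");
    }
}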
This flickr blog post discusses the thought behind their latest improvements to the people selector autocomplete.
One problem they had to overcome was how to parse and otherwise handle so much data (i.e., all your contacts) client-side. They tried getting XML and JSON via AJAX, but found it too slow. They then had this to say about loading the data via a dynamically generated script tag (with callback function):
JSON and Dynamic Script Tags: Fast but Insecure
Working with the theory that large string manipulation was the problem with the last approach, we switched from using Ajax to instead fetching the data using a dynamically generated script tag. This means that the contact data was never treated as string, and was instead executed as soon as it was downloaded, just like any other JavaScript file. The difference in performance was shocking: 89ms to parse 10,000 contacts (a reduction of 3 orders of magnitude), while the smallest case of 172 contacts only took 6ms. The parse time per contact actually decreased the larger the list became. This approach looked perfect, except for one thing: in order for this JSON to be executed, we had to wrap it in a callback method. Since it’s executable code, any website in the world could use the same approach to download a Flickr member’s contact list. This was a deal breaker. (emphasis mine)
Could someone please go into the exact security risk here (perhaps with a sample exploit)? How is loading a given file via the "src" attribute in a script tag different from loading that file via an AJAX call?
This is a good question, and this exact sort of exploit was once used to steal contact lists from Gmail.
Whenever a browser fetches data from a domain, it sends across any cookie data that the site has set. This cookie data can then be used to authenticate the user and fetch user-specific data.
For example, when you load a new stackoverflow.com page, your browser sends your cookie data to stackoverflow.com. Stackoverflow uses that data to determine who you are, and shows the appropriate data for you.
The same is true for anything else that you load from a domain, including CSS and Javascript files.
The security vulnerability that Flickr faced was that any website could embed this javascript file hosted on Flickr's servers. Your Flickr cookie data would then be sent over as part of the request (since the javascript was hosted on flickr.com), and Flickr would generate a javascript document containing the sensitive data. The malicious site would then be able to get access to the data that was loaded.
Here is the exploit that was used to steal google contacts, which may make it more clear than my explanation above:
http://blogs.zdnet.com/Google/?p=434
If I were to put an HTML page on my website like this:
<script src="http://www.flickr.com/contacts.js"></script>
<script> // send the contact data to my server with AJAX </script>
Assuming contacts.js uses the session to know which contacts to send, I would now have a copy of your contacts.
However if the contacts are sent via JSON, I can't request them from my HTML page, because it would be a cross-domain AJAX request, which isn't allowed. I can't request the page from my server either, because I wouldn't have your session ID.
In plain English:
Unauthorised computer code (Javascript) running on people's computers is not allowed to get data from anywhere but the site on which it runs - browsers are obliged to enforce this rule.
There is no corresponding restriction on where code can be sourced from, so if you embed data in code, any website the user visits can employ the user's credentials to obtain the user's data.