Getting access to an XML file from JavaScript without Node, without jQuery - javascript

I am trying to develop modifications to a game. The thing is, the game is already compiled and the developers prefer not to decompile it (for the time being). Probably because of the compilation, every time I try to load jQuery or Node.js, whatever the version, I get an error that "a key already exists in the dictionary". Everything is fine without Node.js or jQuery.
What I am trying to achieve is to add some features to the game that unfortunately aren't available through the game's API calls themselves. I want to be able to access data inside the .xml files that hold the specifications of items/weapons/devices/engines inside the game. I've tried pretty much everything I could find on Stack Exchange for what I searched, which was Node and jQuery. I'm sorry if you think this is a duplicate question, but it isn't: I can't use Node.js, nor can I use jQuery. What else could I try? Can someone help me, please?
I am a bit new to programming, with only one year of experience in C# and JavaScript. Sorry if this feels really noobish to you.

What you need is Ajax. Modern browsers provide a pretty functional XMLHttpRequest, so you don't even need a framework anymore.
One important thing to know: you most likely won't be able to download the XML file with Ajax if it lives on a remote server, due to the same-origin policy. You need reliable access to it. The most convenient solution is to keep a copy of the file on a local server such as WAMP, XAMPP, and the like.
I'm not going to write yet another Ajax tutorial. Instead I'll just provide you with a minimal working HTML page and point you towards the XMLHttpRequest documentation.
<button>Request</button>
<script>
'use strict';
document.querySelector('button').addEventListener('click', function () {
  let req = new XMLHttpRequest();
  req.onload = function () {
    if (this.responseXML) {
      console.log(this.responseXML);
    } else {
      console.log(this.responseText);
    }
  };
  req.open('GET', xmlURL); // xmlURL should be the location of the .xml file
  req.send();
});
</script>
When you click the button, the script will request the file and then display the server's response, if any, in your browser console. To open the console, press F12 and select the Console tab.
Be aware that the responseXML property will only be populated if the XML sent by the server is strictly well-formed. XML parsing in JS is somewhat finicky, so you may want to rely on responseText as a fallback.
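If you do fall back to responseText, you can still parse it yourself with DOMParser, which modern browsers provide. A minimal sketch, assuming xmlURL is the same placeholder as above and that the file contains <item> elements with a name attribute (both are just illustrative):

'use strict';
let req = new XMLHttpRequest();
req.onload = function () {
  // Parse the raw text ourselves instead of relying on responseXML.
  let doc = new DOMParser().parseFromString(this.responseText, 'application/xml');
  // A parse failure produces a document containing a <parsererror> element.
  if (doc.querySelector('parsererror')) {
    console.log('The file is not well-formed XML');
  } else {
    // Example: list every <item> element's "name" attribute (illustrative names).
    doc.querySelectorAll('item').forEach(function (item) {
      console.log(item.getAttribute('name'));
    });
  }
};
req.open('GET', xmlURL); // xmlURL: placeholder for the .xml file location
req.send();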

Related

How to extract information from web page

I'm looking for a way to automatically extract information from a web page, more specifically an online game (https://www.virtualregatta.com/fr/offshore-jeu/).
In the game, I want to extract/copy the position of the boat. With Mozilla and its debug tools, I used the network debugger and I saw an HTTP POST request containing what I want.
It seems that we receive as a response a JSON containing a structure with latitude/longitude.
This is perfect for me, but I want a more user-friendly way to get it, and I would need advice. The problem is that I'm really a beginner in web development, haha.
Is it possible to do this using a script? (But I suppose it will be complicated to log into the game first.)
Is it possible to create a basic Mozilla plugin which would be able to catch the request/response and copy the position to the clipboard for me?
Anything else?
EDIT:
I've tried using a Mozilla plugin, and I managed to add a listener on POST requests. I see the request that fetches the boat information, but I can't find a way to get the JSON response in JS.
function logURL(responseDetails) {
  console.log(responseDetails);
}

browser.webRequest.onResponseStarted.addListener(
  logURL,
  {urls: ["*://*.virtualregatta.com/getboatinfos"]}
);
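Firefox's WebExtension API does let you read a response body through webRequest.filterResponseData (Firefox-only; it needs the "webRequest", "webRequestBlocking", and host permissions in the manifest). A rough sketch, reusing the URL pattern above; whether the body really parses as JSON is an assumption:

function captureBody(details) {
  let filter = browser.webRequest.filterResponseData(details.requestId);
  let decoder = new TextDecoder('utf-8');
  let body = '';

  filter.ondata = function (event) {
    body += decoder.decode(event.data, {stream: true});
    filter.write(event.data); // pass the data through so the page keeps working
  };

  filter.onstop = function () {
    filter.disconnect();
    console.log(JSON.parse(body)); // the boat position should be somewhere in here
  };
}

browser.webRequest.onBeforeRequest.addListener(
  captureBody,
  {urls: ["*://*.virtualregatta.com/getboatinfos"]},
  ["blocking"]
);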
In Chrome I use Broomo for these purposes. It helps you add scripts to web pages, so you can console.log the POST you found and, of course, create functions and use the web page's backend.
In Firefox I found this one: js-injector. But I haven't used it before.
Update:
Now there is a new extension for both browsers:
Chrome: ABC JS-CSS Injector
Firefox: ABC JS-CSS Injector

Best option for crawling a website that loads content via ajax [duplicate]

Please advise how to scrape AJAX pages.
Overview:
All screen scraping first requires a manual review of the page you want to extract resources from. When dealing with AJAX you usually need to analyze a bit more than just the HTML.
When dealing with AJAX, this just means that the value you want is not in the initial HTML document you requested, but that JavaScript will be executed which asks the server for the extra information you want.
You can therefore usually simply analyze the JavaScript, see which request it makes, and just call this URL yourself from the start.
Example:
Take this as an example; assume the page you want to scrape has the following script:
<script type="text/javascript">
function ajaxFunction()
{
  var xmlHttp;
  try
  {
    // Firefox, Opera 8.0+, Safari
    xmlHttp = new XMLHttpRequest();
  }
  catch (e)
  {
    // Internet Explorer
    try
    {
      xmlHttp = new ActiveXObject("Msxml2.XMLHTTP");
    }
    catch (e)
    {
      try
      {
        xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
      }
      catch (e)
      {
        alert("Your browser does not support AJAX!");
        return false;
      }
    }
  }
  xmlHttp.onreadystatechange = function()
  {
    if (xmlHttp.readyState == 4)
    {
      document.myForm.time.value = xmlHttp.responseText;
    }
  }
  xmlHttp.open("GET", "time.asp", true);
  xmlHttp.send(null);
}
</script>
Then all you need to do is make an HTTP request to time.asp on the same server directly. Example from w3schools.
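In other words, you skip the page entirely and hit the endpoint the JavaScript would have called. A minimal sketch using fetch; the host name is illustrative:

// Request the Ajax endpoint directly instead of loading the page that calls it.
fetch('http://www.example.com/time.asp')
  .then(function (response) {
    return response.text();
  })
  .then(function (text) {
    console.log('Server time:', text); // the same value the page would have shown
  })
  .catch(function (err) {
    console.error('Request failed:', err);
  });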
Advanced scraping with C++:
For complex usage, and if you're using C++, you could also consider using the Firefox JavaScript engine SpiderMonkey to execute the JavaScript on a page.
Advanced scraping with Java:
For complex usage, and if you're using Java, you could also consider using Rhino, the Mozilla JavaScript engine for Java.
Advanced scraping with .NET:
For complex usage, and if you're using .NET, you could also consider using the Microsoft.vsa assembly, recently replaced with ICodeCompiler/CodeDOM.
In my opinion the simplest solution is to use CasperJS, a framework based on the headless WebKit browser PhantomJS.
The whole page is loaded, and it's very easy to scrape any ajax-related data.
You can check this basic tutorial to learn Automating & Scraping with PhantomJS and CasperJS
You can also take a look at this example code showing how to scrape Google Suggest keywords:
/*global casper:true*/
var casper = require('casper').create();
var suggestions = [];
var word = casper.cli.get(0);

if (!word) {
    casper.echo('please provide a word').exit(1);
}

casper.start('http://www.google.com/', function() {
    this.sendKeys('input[name=q]', word);
});

casper.waitFor(function() {
    return this.fetchText('.gsq_a table span').indexOf(word) === 0;
}, function() {
    suggestions = this.evaluate(function() {
        var nodes = document.querySelectorAll('.gsq_a table span');
        return [].map.call(nodes, function(node) {
            return node.textContent;
        });
    });
});

casper.run(function() {
    this.echo(suggestions.join('\n')).exit();
});
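Assuming you save the script as, say, suggestions.js (the filename is just an example), you would run it as casperjs suggestions.js yourword; casper.cli.get(0) picks up the first positional command-line argument as the word to type into the search box.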
If you can get at it, try examining the DOM tree. Selenium does this as a part of testing a page. It also has functions to click buttons and follow links, which may be useful.
The best way to scrape web pages that use Ajax, or JavaScript in general, is with a browser itself or a headless browser (a browser without a GUI). Currently PhantomJS is a well-promoted headless browser using WebKit. An alternative that I have used with success is HtmlUnit (in Java, or in .NET via IKVM), which is a simulated browser. Another known alternative is using a web automation tool like Selenium.
I wrote many articles about this subject like web scraping Ajax and Javascript sites and automated browserless OAuth authentication for Twitter. At the end of the first article there are a lot of extra resources that I have been compiling since 2011.
I like PhearJS, but that might be partially because I built it.
That said, it's a service you run in the background that speaks HTTP(S) and renders pages as JSON for you, including any metadata you might need.
It depends on the Ajax page. The first part of screen scraping is determining how the page works. Is there some sort of variable you can iterate through to request all the data from the page? Personally I've used Web Scraper Plus for a lot of screen-scraping tasks because it is cheap, it is not difficult to get started with, and non-programmers can get it working relatively quickly.
Side note: the Terms of Use is probably something you want to check before doing this. Depending on the site, iterating through everything may raise some flags.
I think Brian R. Bondy's answer is useful when the source code is easy to read. I prefer an easier way, using tools like Wireshark or HttpAnalyzer to capture the packet and get the URL from the "Host" field and the "GET" field.
For example, I captured a packet like the following:
GET /hqzx/quote.aspx?type=3&market=1&sorttype=3&updown=up&page=1&count=8&time=164330 HTTP/1.1
Accept: */*
Referer: http://quote.hexun.com/stock/default.aspx
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: quote.tool.hexun.com
Connection: Keep-Alive
Then the URL is:
http://quote.tool.hexun.com/hqzx/quote.aspx?type=3&market=1&sorttype=3&updown=up&page=1&count=8&time=164330
As a low cost solution you can also try SWExplorerAutomation (SWEA). The program creates an automation API for any Web application developed with HTML, DHTML or AJAX.
Selenium WebDriver is a good solution: you program a browser and you automate what needs to be done in the browser. Browsers (Chrome, Firefox, etc.) provide their own drivers that work with Selenium. Since it works as an automated REAL browser, the pages (including JavaScript and Ajax) get loaded just as they do for a human using that browser.
The downside is that it is slow, since you would most probably want to wait for all images and scripts to load before you do your scraping on that single page.
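A minimal sketch of that approach in JavaScript with the selenium-webdriver package for Node.js; the page URL and the #result selector are placeholders:

// npm install selenium-webdriver (plus a browser driver such as chromedriver)
const { Builder, By, until } = require('selenium-webdriver');

(async function scrape() {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example.com/ajax-page'); // placeholder URL
    // Wait for the element that the page fills in via Ajax.
    const el = await driver.wait(until.elementLocated(By.css('#result')), 10000);
    console.log(await el.getText());
  } finally {
    await driver.quit();
  }
})();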
I have previously linked to MIT's Solvent and EnvJS as my answers for scraping Ajax pages. These projects seem no longer accessible.
Out of sheer necessity, I have invented another way to actually scrape Ajax pages, and it has worked for tough sites like findthecompany, which have ways to detect headless JavaScript engines and then show no data.
The technique is to use Chrome extensions to do the scraping. Chrome extensions are the best place to scrape Ajax pages because they actually give us access to the JavaScript-modified DOM. The technique is as follows (I will open-source the code at some point). Create a Chrome extension (assuming you know how to create one, and its architecture and capabilities; this is easy to learn and practice, as there are lots of samples), then:
Use content scripts to access the DOM via XPath. Get the entire list, table, or dynamically rendered content as a string of HTML nodes in a variable. (Only content scripts can access the DOM, but they can't contact a URL using XMLHttpRequest.)
From the content script, use message passing to send the entire stripped DOM, as a string, to a background script. (Background scripts can talk to URLs but can't touch the DOM.) We use message passing to get these two to talk; a sketch follows at the end of this answer.
You can use various events to loop through web pages and pass each stripped HTML node's content to the background script.
Now use the background script to talk to an external server (on localhost), a simple one created using Node.js/Python. Just send the entire HTML nodes as a string to the server, which persists the content posted to it into files, with appropriate variables to identify page numbers or URLs.
Now you have scraped the AJAX content (HTML nodes as a string), but these are partial HTML nodes. You can now use your favorite XPath library to load them into memory and use XPath to scrape the information into tables or text.
Please comment if you can't understand it and I can write it better (first attempt). Also, I am trying to release sample code as soon as possible.
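A rough sketch of the content-script/background-script hand-off described above, assuming Manifest V2 style background pages; the message type and the localhost endpoint are made up:

// content-script.js: grab the rendered nodes and hand them to the background script
var nodes = document.evaluate('//table', document, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
var html = '';
for (var i = 0; i < nodes.snapshotLength; i++) {
  html += nodes.snapshotItem(i).outerHTML;
}
chrome.runtime.sendMessage({type: 'scraped-dom', payload: html});

// background.js: receive the string and post it to a local server
chrome.runtime.onMessage.addListener(function (message) {
  if (message.type === 'scraped-dom') {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', 'http://localhost:3000/save'); // made-up endpoint
    xhr.setRequestHeader('Content-Type', 'text/plain');
    xhr.send(message.payload);
  }
});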

Intercepting window.external.notify call from javascript in objective C

I need to intercept a JavaScript call to window.external.notify which returns a security token string that I need to get into my Objective-C code. The JavaScript is being executed in a UIWebView. The script looks like:
<script type="text/javascript">
  try {
    window.external.notify('{<extremely long dictionary as a JSON string>}');
  }
  catch(err) {
    alert("Error ACS50021: window.external.notify is not registered.");
  }
</script>
I need to somehow get the JSON dictionary into a string in Objective-C. I've tried the method described here: http://www.stevesaxon.me/posts/2011/window-external-notify-in-ios-uiwebview/, but it just seems to interfere with the rendering of the HTML/JavaScript page and also does not capture the string (I don't have a handy ACS identifier to check for, to know I'm intercepting the right call). Other similar questions have been asked, but I haven't been able to get any working; many seem extremely hackish, and they are usually quite out of date. I've tried accessing the web view's HTML content, but the token isn't present there, because it's only sent through window.external.notify, which errors out with the alert that it isn't registered.
I know there's now a native JS-to-ObjC bridge in iOS 7, and I only need to support iOS 7+, but I've never used it and I can't seem to get it up and running either. It also appears to be mainly for shipping your own JS source files as part of your app, not for communicating with a server through a UIWebView, but if I'm wrong about that, let me know.
Try this method. Here the page is redirected and the token is loaded using the receive-data method.

Is there any way to save image/PDF content to the local file system that works in all browsers? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I got the image content from an Ajax response in an ArrayBuffer, appended that ArrayBuffer to a BlobBuilder, and now I want to write these contents to a file. Is there any way to do this?
I used window.requestFileSystem; it works fine in Chrome but does not work in Mozilla.
Here is my piece of code:
function retrieveImage(studyUID, seriesUID, instanceUID, sopClassUID, nodeRef) {
    window.requestFileSystem = window.requestFileSystem || window.webkitRequestFileSystem;
    var xhr = new XMLHttpRequest();
    var url = "/alfresco/createthumbnail?ticket=" + ticket + "&node=" + nodeRef;
    xhr.open('GET', url, true);
    xhr.responseType = 'arraybuffer';
    xhr.onload = function(e) {
        if (this.status == 200) {
            window.requestFileSystem(window.TEMPORARY, 1024 * 1024, function(fs) {
                var fn = '';
                if (sopClassUID == '1.2.840.10008.5.1.4.1.1.104.1') {
                    fn = instanceUID + '.pdf';
                } else {
                    fn = instanceUID + '.jpg';
                }
                fs.root.getFile(fn, {create: true}, function(fileEntry) {
                    fileEntry.createWriter(function(writer) {
                        writer.onwriteend = function(e) {
                            console.log(fileEntry.fullPath + " created");
                        };
                        writer.onerror = function(e) {
                            console.log(e.toString());
                        };
                        var bb;
                        if (window.BlobBuilder) {
                            bb = new BlobBuilder();
                        } else if (window.WebKitBlobBuilder) {
                            bb = new WebKitBlobBuilder();
                        }
                        bb.append(xhr.response);
                        if (sopClassUID == '1.2.840.10008.5.1.4.1.1.104.1') {
                            writer.write(bb.getBlob('application/pdf'));
                        } else {
                            writer.write(bb.getBlob('image/jpeg'));
                        }
                    }, fileErrorHandler);
                }, fileErrorHandler);
            }, fileErrorHandler);
        }
    };
    xhr.send();
}
The script of a web page is not allowed to write arbitrary files (such as PDFs) to the client's storage. And you should be thankful, because that means web pages have a hard time trying to put malware on your machine.
Instead, you should redirect the user (or open a new window/tab) to a URL where the browser can find the content to download, and let it handle it. Use the response headers to tell the client to download the file or display it, as explained here.
If you need to create the downloaded content dynamically, then manage it on the server, making it an active page (.php, .jsp, .aspx, etc.). What matters is to have the correct MIME type in the header of the response.
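For instance, a response meant to trigger a PDF download might carry headers along these lines (the filename is just an example):

Content-Type: application/pdf
Content-Disposition: attachment; filename="report.pdf"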
Note: yes, I'm telling you not to use Ajax, just window.open. Edit: I guess you may want to present the images in an img; in that case it is the same, just put the URL in the src attribute and use no Ajax, only some JavaScript to update the attribute if appropriate.
Given your comment, I understand that you want:
To cache the image on the client, to avoid having to get it back from the server every time.
To allow the user to customize his experience by allowing the use of images from local storage.
Now, again for security reasons, arbitrary access to the client's files is not allowed. In this case it works both ways: first, it prevents the web page from spying on you, and second, it prevents you from injecting malicious content into the page.
So, for the first part: as far as I know the default is to cache images (this is handled by your browser, and yes, you should clean the cache from time to time because it tends to grow). If that is not working for you, I guess you could try to use a cache manifest.
About the second: the usual way would be to use local storage (which, again, is handled by your browser, and is not arbitrary access to the client's files) to store/retrieve the URL of the image and use it to present the image.
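A minimal sketch of that idea; the 'customImageUrl' key and the img element id are made up:

// Save the user's chosen image URL (e.g. after they pick or paste one).
function rememberImage(url) {
  localStorage.setItem('customImageUrl', url); // key name is illustrative
}

// On page load, reuse the stored URL if there is one.
window.addEventListener('load', function () {
  var url = localStorage.getItem('customImageUrl');
  if (url) {
    document.getElementById('userImage').src = url; // element id is illustrative
  }
});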
The image can still be saved on the server, and yes, it can be cached. To get it to the server - of course - you can always upload it with <input type="file" ... /> and you may need to set enctype on your form. - You already knew that, right? - On the server, store the image in a database (or a dedicated folder). Now the page that is responsible for retrieving the image should:
check the request method
check the user's permissions (identify the user by the session / cookie)
check the parameters of the request (if any)
set the headers
output the file retrieved from the database (or dedicated folder)
Now, let's say you want to allow this to work as an xcopy-deployable application (that just happens to run in a browser). In this case you can always tell the user to store the images he wants in a particular location and access them with a relative path.
Or - just because - you are hosting in a place where there is no chance of server-side scripting. So you have to go along only with what JavaScript gives you. Well, you cannot use a relative path here, since it is not local... and if you try to use a local absolute path, the browser will just diss you (I mean, it just ignores it).
So, you can't get the image from a file on the client, and you can't store it on the server...
Well, as you know there is a working draft for that, and I notice it is what you are trying to use. The problem is that it is a working draft. The initial implementation is held back by the security issues; to quote Jonas Sicking:
The main problem with exposing this functionality to the web is security. You wouldn’t want just any website to read or modify your images. We could put up a prompt like we do with the GeoLocation API, given that this API potentially can delete all your pictures from the last 10 years, we probably want something more. This is something we are actively working on. But it’s definitely the case here that security is the hard part here, not implementing the low-level file operations.
So, I guess the answer is "not yet"? In fact, considering Microsoft's approach of only providing the parts of the standard that reach Recommendation status, and also its approach of launching a new version of IE with each new version of Windows... you will have to wait a while to have support in all the browsers. First wait until the File API reaches Recommendation status. Then wait until Microsoft updates IE to support it. And if, by any chance (as it seems will happen), that is only IE10 (or a future IE11), and those don't work on any Windows before Windows 8, you will be waiting for a lot of people to upgrade.
If this is your situation, I would suggest getting an API for some image-hosting web site and using that instead [that will probably not be free (or not be private), so you could just change your web hosting already].
You can't have a common way to store the response in files that is compatible with all the browsers.
There is a way: you can use FileReader in JavaScript, but that again wouldn't work in IE either.
I had a similar problem a few weeks ago. What I did was make an Ajax request to a server, passing the content; the server stored the content for me in a file, then it returned a reference to the stored file.
I stored my files in a temporary database table, and the server action returned an id for the file, by which we can access the file from the database whenever we want.
You can also store your files on the server in some thumbnail folder, but I preferred the database.
If you need any more details, let me know.
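A rough sketch of that server-assisted approach; arrayBuffer stands for the ArrayBuffer received earlier, and the /upload endpoint and the JSON shape of the reply are assumptions:

// Send the binary content to the server and let it store the file.
var form = new FormData();
form.append('file', new Blob([arrayBuffer], {type: 'image/jpeg'}), 'image.jpg');

var xhr = new XMLHttpRequest();
xhr.open('POST', '/upload'); // assumed endpoint
xhr.onload = function () {
  // Assume the server answers with something like {"id": 42}.
  var ref = JSON.parse(xhr.responseText);
  console.log('Stored file reference:', ref.id);
};
xhr.send(form);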

How to manipulate Javascript websites in Perl

I have been asked to automate logging into a web app (what I assume to be one; it runs a lot of .aspx and .js scripts) that, currently, can only run in IE. Now I am programming in Perl and have tried to use Win32::IE::Mechanize to drive the IE browser and log in. What I did was try to extract all the forms from the web app and, given the user's information, fill out the required forms, but this is where the problem arises: when I try to run the subroutine, no forms appear......
So then I transitioned to WWW::Mechanize and used the post subroutine (from LWP::UserAgent), which solved the problem for the most part. Now I've run into a problem with the response from the server: I get this script as the content of the response and I don't know what to do with it.
So my question is: using Perl, how can I go about manipulating JavaScript functions on a website? Would that even be a valid solution to the problem?
I am open to writing this in other programming languages as well. Thanks in advance for the help!
(So that I can fully log in to the webapp)
Update: The content of the response:
var msgTimerID;
var strForceLogOff = "false";

function WindowOnLoad() {
    if ("false" == "true" && "false" == "false")
        MerlinSystemMsg("", 64);
    if ("false" == "true")
        msgTimerID = window.setInterval("MerlinSystemMsg(10095,64)", 300000, 'javascript');
}

function MyShowModal() {
    showModalDialog("", window, strFeatures);
}

function clearMsgInterval() {
    window.clearInterval(msgTimerID);
}

function WindowOnUnLoad() {
    if (top.frames(0).document.getElementById("OPMODE").value == "LOGOFF") {
        strFeatures = "width=1,height=1,left=1000,top=1000,toolbar=no,scrollbars=no,menubar=no,location=no,directories=no,status=yes,resizable=1";
        window.open("ForceLogOff.aspx", "forcelogout", strFeatures);
    }
}

window.onbeforeunload = WindowOnUnLoad;
window.onload = WindowOnLoad;
There is also this Frame Title that has the src:
FRAME TITLE="Service Desk Express Navigator" SRC="options_nailogo.aspx" MARGINWIDTH=0 MARGINHEIGHT=0 NORESIZE scrolling=no
Trying to emulate the browser with a fully functioning JS engine is going to be a mighty big task. Instead, I'd suggest that you just try to emulate the actual interaction with the web site and not care what HTML/JS is actually sent back. Your server side code doesn't care how the HTTP submissions take place, only that they do. Admittedly this is more fragile if the forms change a lot, but at least you're not trying to implement a full browser.
So look at modules like LWP::UserAgent, HTTP::Request and HTTP::Response.
I'm copying and pasting my answer to your other, duplicate question here.
(You should consider deleting one of them?)
That content is the website source :)
How WWW::Mechanize deals with FRAME SRC as a link:
Note that <FRAME SRC="..."> tags are parsed out of the HTML and
treated as links, so this method works with them.
You'll want to use follow_link on that link.
As far as dealing with JavaScript goes, there is a Firefox add-on called MozRepl that you can use in conjunction with WWW::Mechanize::Firefox; I have used it in the past to call JavaScript code while crawling a page.
