How to capture the source code of an onload modified webpage - javascript

Well, I was scraping data from a website (purely within legal limits).
The situation is that the site has 5 questions on a page, along with their answers. But the source code that I see by pressing Ctrl+U is different from the code that I see with Inspect Element or Firebug in Firefox. That means the site changes the answers on page load to fool scrapers, which would normally grab the unmodified code. The correct answers are on the onload-modified page.
What I want is to capture the source code that I see in Firebug or Inspect Element (the modified code) instead of the code that I see on pressing Ctrl+U.
I used a scraping API, but it captures the original Ctrl+U code.
Is there any solution?

In Chrome, select the root element (<html>) in the developer tools, right-click -> Copy as HTML, and paste it wherever you need it.
[EDIT]
I suspect you are trying to scrape the data automatically, in which case this obviously won't work, and I'm not sure how to do it otherwise. There are some headless web browsers that support JS (PhantomJS, for example); they might do the trick. Also check out this Super User post
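The markup that the inspector shows can also be read from script through `document.documentElement.outerHTML`, which is what a headless browser would evaluate in the page once it has finished loading. Below is a minimal sketch of that idea. The function name `serializeRenderedDom` and the `fakeDocument` stand-in are hypothetical, used only so the snippet runs outside a browser; in a real page you would pass the actual `document`.

```javascript
// Serialize the *current* DOM (what Inspect Element shows), not the
// original response body (what Ctrl+U shows).
// `doc` is any object shaped like a browser `document`.
function serializeRenderedDom(doc) {
  // The doctype is not part of documentElement, so add it back if present.
  const doctype = doc.doctype ? `<!DOCTYPE ${doc.doctype.name}>\n` : '';
  return doctype + doc.documentElement.outerHTML;
}

// Hypothetical stand-in for a browser document so the sketch runs in Node.
const fakeDocument = {
  doctype: { name: 'html' },
  documentElement: {
    outerHTML: '<html><body><p id="answer">42</p></body></html>',
  },
};

console.log(serializeRenderedDom(fakeDocument));
```

In a browser console, `serializeRenderedDom(document)` would return the page as it stands at that moment, including any changes made by onload scripts that have already run.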

I'd suggest logging the HTML of your page before the onload happens, which can be done using jQuery.
Or use plain classic debugging with "debugger", which stops the execution of your page (while the developer tools are open) once the browser reaches the line where you put it.
As a HTML5 game dev, I usually do advanced logging through the console, to know what does what and what's executed when. Although it may take some time, it allows you to have a good comprehension of what you've written and to ensure optimization (mostly in number of execs of some stuff) and to catch bugs that might not be obvious.
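The logging idea can be sketched with plain event listeners instead of jQuery (a swapped-in technique, not what the answer used). The helper `logHtmlOn`, the `fakeWindow` object, and the placeholder markup are all hypothetical; in a browser you would pass `window` and read `document.documentElement.outerHTML`. Listeners fire in registration order, so the sketch assumes the site's own load handler was registered first.

```javascript
// Log the page HTML each time one of the named lifecycle events fires.
function logHtmlOn(target, eventNames, getHtml, log = console.log) {
  for (const name of eventNames) {
    target.addEventListener(name, () => log(`[${name}] ${getHtml()}`));
  }
}

// Hypothetical stand-ins so the sketch runs outside a browser.
const fakeWindow = new EventTarget();
let fakeHtml = '<p>placeholder</p>';

// Simulates the site's own onload script rewriting the page.
fakeWindow.addEventListener('load', () => {
  fakeHtml = '<p>real answer</p>';
});

const htmlLog = [];
logHtmlOn(fakeWindow, ['DOMContentLoaded', 'load'], () => fakeHtml, (line) => htmlLog.push(line));

// Simulate the browser firing the two events in order.
fakeWindow.dispatchEvent(new Event('DOMContentLoaded'));
fakeWindow.dispatchEvent(new Event('load'));

console.log(htmlLog.join('\n'));
```

Comparing the two logged snapshots shows what the onload script changed.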

Related

Reverse Engineer JSBin: Why are they using iframe to run the given code instead of 'eval'?

I'm reverse engineering how JSBin runs the JavaScript and outputs to its console. It seems that they create an IFRAME and put the code that the user entered into it, then override 'console.' methods (e.g. console.log) to capture the output, or something similar (I'm still reading the code).
My question is why they run the code in an iframe instead of simply calling 'eval'. I'm sure they must have a good reason. Is it because of XSS attacks? If so, how does running the code in an iframe prevent XSS? Any help would be appreciated!
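The console-override idea described in the question can be sketched as follows. This is not JSBin's actual code, only an illustration of the general technique; the function name `captureConsoleLog` is hypothetical, and the sketch does not involve an iframe at all.

```javascript
// Run `fn` while console.log is temporarily replaced, and return
// everything it tried to log as an array of strings.
function captureConsoleLog(fn) {
  const original = console.log;
  const captured = [];
  console.log = (...args) => {
    captured.push(args.map(String).join(' '));
  };
  try {
    fn();
  } finally {
    console.log = original; // always restore, even if fn throws
  }
  return captured;
}

const capturedOutput = captureConsoleLog(() => {
  console.log('hello', 1 + 1);
  console.log('done');
});

console.log(capturedOutput);
```

The `finally` block matters: without it, an exception in the user's code would leave `console.log` permanently replaced.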

Method to determine javascript functions calls made in a browser

In a browser you can determine what files are loaded when a website loads and you can even view the timeline.
But is there any way to determine what JavaScript calls are being made once the script loads for a website (in Firefox, Chrome, or any other software package)?
Hope you got my question
(Because that would be useful for debugging logical javascript errors and others I suppose)
I use Chrome's Developer Tools for this:
Check the click box, and then click on the element on the page you want to find the handler for. If you are using jQuery (or similar library), you may have to step through their code before you get to yours.
Taken from: How do I find out what javascript function is being called by an object's onclick event?
Personally, I typically just use logging in Firebug (although, despite my desperate attempts to avoid Chrome, its built-in developer tools are forcing me to adopt it).
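The logging approach can be made less manual by wrapping a function so that every call is recorded before it is passed through. This is a minimal sketch; `traceMethod` and the `cart` object are hypothetical names used for illustration, not part of any browser tool.

```javascript
// Replace obj[name] with a wrapper that records each call, then delegates
// to the original. Returns a function that removes the wrapper again.
function traceMethod(obj, name, log = console.log) {
  const original = obj[name];
  obj[name] = function (...args) {
    log(`call ${name}(${args.map((a) => JSON.stringify(a)).join(', ')})`);
    return original.apply(this, args);
  };
  return () => {
    obj[name] = original;
  };
}

// Hypothetical object whose calls we want to observe.
const cart = {
  items: [],
  add(item) {
    this.items.push(item);
    return this.items.length;
  },
};

const tracedCalls = [];
const untrace = traceMethod(cart, 'add', (line) => tracedCalls.push(line));

cart.add('book');
cart.add('pen');
untrace();
cart.add('ink'); // no longer recorded

console.log(tracedCalls);
```

Passing `console.log` (the default) instead of a collecting callback prints each call as it happens, which is closer to what the answer describes.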

Is the source code presented on view source in browser always accurate/complete?

I have a problem with a specific web page. When I click on a link I get an application error (not an HTTP error etc., but an application-level error).
But I had the developer tools and net console open, and I saw that no requests were sent to the server.
So I right-clicked and selected view source, and I saw that this error was part of the dynamically generated HTML, but the HTML page seemed malformed.
It seemed to end like this:
<div id="theId"> You can not access page
</html>
The page as a whole seems OK though. When I use either Chrome or Firefox I see the same thing. (IE is not an option as I don't have a Windows PC available.)
The view source does not allow me to "study" the page e.g. expand tags etc.
How can I debug something like this? Could it be for some reason that the browsers do not display the code correctly?
"View source" gives the HTML as it was sent by the server. What's generally way more useful is looking at the generated document structure.
In Chrome, you do this by opening developer tools (Cmd-Option-I on Mac, Ctrl-Shift-I on Windows) and then looking at the Elements tab (first tab).
In Firefox, you need to look at the "Inspector" tab.
In both Chrome and Firefox, you can right-click on any page element and select "inspect element". This brings up the tab with the document structure, with the selected element revealed. This is easier than hunting down the element you're interested in yourself.
Also, when hovering over an element in the document tree, both browsers highlight the element you're pointing at in the regular viewport.
There are two reasons why the final document structure (or at least, final up to that point in time) can differ from the HTML sent by the server:
Javascript that modifies the document. For sites that make very heavy use of Javascript (and especially the so-called single-page apps), the resulting document structure can be much different from the original HTML source. That source just forms the basis for rendering the page. There's a lot done by Javascript afterwards.
Malformed HTML. In this case, all browsers try to make the best of what they have before them to generate a valid document structure.
To answer the question in your title: if you're talking JavaScript, yes and no; if you're talking HTML, then it's a yes to accurate but no to complete.
In a lot of web applications, HTML can be (and is) generated by the server-side language, or even by JavaScript. So depending on what you have requested from the server, the HTML could be far from complete.
Also, when generating HTML with languages like Ruby or PHP, it can be very easy to end up generating bad/sucktacular HTML.
The story is a little different with JavaScript. By necessity of what it is, all of the source for JavaScript must be served up in some form to your browser. But because things like jQuery can be cached by your browser, the code for them might not show when you open the developer console.
Then you have the issue of minifiers and obfuscation, and then obfuscators used on minified obfuscated code! Which can leave a horrible mess.
My guess is that their server generated some bad HTML which then broke their JavaScript (resulting in the JavaScript not calling the server, as you saw in the net log), which was then handled on your side of things with an error message displayed by the JavaScript that captured the error.
In Firefox, using the Web Developer Toolbar, there is a 'View Source' menu with a dropdown option to 'View Generated Source'. This outputs the complete html source as your inspector sees it, after being processed and updated by javascript. A very useful plug-in, and was my mainstay before the advent of Firebug.
As far as I'm aware, "view source" gives you whatever the webserver sent your web browser. Web browsers are often very forgiving when it comes to rendering syntactically incorrect HTML. If you're the one that develops this web page, you may want to take a closer look at what it's sending.

Create a Addon to modify JavaScript Data before executed on Firefox

I want to create an addon for Firefox that checks every JavaScript on a loading page. If there is code which is not allowed, it should be blocked or modified (it is part of an XSS protection).
But I don't know how to implement this.
I tried to create an http-on-modify-request observer, so I have access to the scripts. But how can I modify them before Firefox executes them?
My second attempt was to create an addon like the Flashblock addon.
So I made a CSS file and bound the script tags to an XML file.
In the XML file I create a placeholder and replace the JavaScript.
When I load a page and look into the DOM Inspector it works fine... there are div tags instead of script tags.
The problem is that Firefox still executes the original scripts, so my attempt failed.
Does anybody have some tips for me?
ps: sry, for my english, but english is not my native language
I think you're looking for nsITraceableChannel:
http://www.softwareishard.com/blog/firebug/nsitraceablechannel-intercept-http-traffic/

Interactive Javascript console (preferably integrated with Firebug)

I'm looking for a way to have an interactive JIT debugger, preferably integrated with Firebug.
I got the idea from PHPEd, which has an "Immediate" debug tab where you can just type in PHP code and modify objects on the fly. This makes debugging a breeze as you can re-assign variables multiple times, re-execute functions, etc without leaving the program.
Here's what I think would be superb:
- set a breakpoint in Firebug
- arrive to breakpoint
- have an Execute JS tab where one could enter JS code, similar to what I described above
Does anything like this exist already?
TIA.
You can already do this in Firebug. Just get to a break point, then go to the "console" tab, and type your commands into the command line at the bottom (where there's the ">>>").
If I understand the question correctly, I think you can do that already in Firebug.
- Set a breakpoint (or use the debugger keyword)
- Click the console tab
- The bottom line allows you to enter a javascript command.
- If you need more space click the icon that looks like an upside down v in the bottom right part of the browser.
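For the `debugger` keyword mentioned in the steps above, here is a minimal sketch. The function `applyDiscount` is a hypothetical example. The statement pauses execution only when developer tools (or another debugger) are attached; otherwise it has no effect and the code runs straight through.

```javascript
// Hypothetical function used only to show where a breakpoint would land.
function applyDiscount(price, percent) {
  // With developer tools open, execution pauses on the next line and
  // `price` and `percent` can be inspected or reassigned from the console.
  debugger;
  return price - (price * percent) / 100;
}

console.log(applyDiscount(200, 10));
```

While paused, typing an expression such as `percent = 50` into the console command line changes the value that the rest of the function uses.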
You might also like the JS execute extension.
Actually, Firebug can do this and it's only a matter of a little investigation on their website to find out how to do this best :) Good luck!
Agree with parents that Firebug is the best choice. Another option that requires a good deal of configuration would be Aptana. For folks using the Eclipse IDE, Aptana is a solid editor for Javascript work. The plus with Aptana is that it's tied more to a code editing environment.
