How do I use Mechanize to process JavaScript? - javascript

I'm connecting to a web site, logging in.
The website redirects me to new pages and Mechanize deals with all cookie and redirection jobs, but, I can't get the last page. I used Firebug and did same job again and saw that there are two more pages I had to pass with Mechanize.
I took a quick look at the pages and saw that there is some JavaScript and HTML code but couldn't understand it because it doesn't look like normal page code. What are those pages for? How they can redirect to other pages? What should I do to pass these?

If you need to handle pages with Javascript, try WATIR or Selenium - those drive a real web browser, and can thus handle any Javascript. WATIR Classic requires either IE or Firefox with a certain extension installed, and you will see the pages flash on the screen as it works.
Your other option would be understanding what the Javascript on the offending page does and bypassing it manually, but that seems onerous.

At present, Mechanize doesn't handle JavaScript. There's talk of eventually merging Johnson's capabilities into Mechanize, but until that happens, you have two options:
Figure out the JavaScript well enough to understand how to traverse those pages.
Automate an actual browser that does understand JavaScript using Watir.

what are those pages for? how they can redirect to other pages. what should i do to pass these?
Sometimes work is done on those pages. Sometimes the JavaScript is there to prevent automated access like what you're trying to do :). A lot of websites have unnecessary checks to make sure you have a "good" browser, so make sure that your user_agent is set to something common, like IE. Sometimes setting the user_agent to look like an old browser will let you get past without JavaScript.
Website automation is fun because you have to outsmart the website and its software developers, using multiple strategies. Like the others said, Watir is the best tool for getting past JavaScript at the moment.

Related

How to bypass JavaScript detection when using requests in Python?

So there is a problem with JavaScript and requests (in Python) and that is, it does not use JavaScript when requesting a webpage.
The website I'm working with (https://access.paylocity.com/) requires JavaScript and without it, it changes the content of the page to just a text at the top saying, "Please enable JavaScript to view the page content."
(I could be wrong here but) I think one solution is the use of Selenium, but that would replace requests which I'm fine with as long as there are no other ways of fixing/bypassing this JavaScript detection.
(For those wondering, this python project of mine is supposed to automatically fetch the events on the Paylocity calendar, then port those events to another calendar that I frequently use everyday. It's also just intended for myself.)
Edit: Here is the code I have if that will help https://pastecode.io/s/GXTUO1BgtR (I didn't know where to paste my code, so I decided on that website. If I should change it, please comment or say something about it.)
Since the website you're working with is dynamically loading the JS as far I can tell, I think you have no other choice as to making use of Selenium. I had a project on my own a couple weeks ago and run into a similar problem which I could also solve using Selenium. But, I'm no expert, I'm just giving away my thoughts on this.

Logging in to target.com with script

I am new to the world of programming so I am trying to make a simple login script that will log me in to target.com when ran! Eventually I will want to add other things to it but I can’t seem to figure out the best way to do this. Any tips would be appreciated!
Using a tool like Puppeteer or Selenium can be used to automate browser tasks.
The basic idea is to use the tool to navigate to the website url, then while on the page you have to get references to the DOM elements you want to interact with. Once you do that you can use the tool to simulate events like clicking, typing etc.
However, logging in to sites like Target using scripts poses issues because they have ways to guard against automated scripts accessing their site. Some defenses include setting up Captchas and examining request headers.

Turning Javascript into a widget

I've made a tool take people can use and its made in Javascript.
They generate the code and then paste it into their webpage code.
I am just wondering how would I go about making it work in forum posts, or other site posts or do the websites normally block this sort of thing?
Would it be possible to convert it for social media sites also? Like have it so the users can interact with it inside posts? Thanks!
So first a warning: DO NOT DO THIS THE WAY THAT YOU HAVE IT RIGHT NOW
You are forcing people to inject a script directly into their website directly from your servers. This script sets and users global variables and can break other things. I would be very very angry if someone added something like this to a site I ran. For example your script creates a function called clicked() in the global scope. What if the site already had a function named that on it? Then all of a sudden that site breaks. It is possible to do this without these issues (see how twitter buttons work for example) but it still has limitations.
Doing this in the way you have it set up looks to another site exactly like an XSS (Cross Site Scripting) attack so no, you cannot do this in forums/etc - they specifically disallow it.
Occasionally a forum might allow iframes (they have much better security for the including site). In your case using iframes would actually be a much cleaner solution. Simply have them include an iframe that points to your site. Your site then (using php, .Net, node, perl, whatever server scripting language you want) would reply with the markup that you want. You can pass the parameters you want as parameters in your url.

Add enhancements to a website (whether it be by C#, Chrome Extensions, etc.) -- Not sure what would work?

There is a website that I visit often... let's call it www.example.com. And, I am able to interact with parts of this website. The interactions send XMLHttpRequest and get a response back through Javascript, jQuery I believe.
I'm not sure what technology will let me achieve what I want to do, and where to start. Basically, I want to add additional options/shortcuts that the site does not provide. I thought about maybe using a macro, but trying to use macro recording software is just a pain in the butt.
I inspected (using Google Chrome's Developer Tools) the XMLHttpRequest being sent back and forth and I noticed that it is simple JSON messages. I figured the best way to add enhancements to the site without waiting for the actual owners of the site to do so would be to simulate the website sending/recieving these XMLHttpRequest/Response and making additional adjustments to the DOM to provide extra shortcuts.
I don't want to interfere with the original site's functionality though... ie if I send a request and receive a response I want both the original script and my script to process the response. So, here is where I'm stuck... I'm not sure whether to go along the paths of creating a C# application or a Google Chrome extension (I use Google Chrome) or something else alltogether. Any pointers on what dev tools/languages will give me the ability to do what I want would be great. Thanks!
Chrome has built in support for user scripts. You can use these to modify the page as you see fit and also to make requests. Without more details regarding what exactly you want to do with these AJAX request it's hard to advise further.
I'm not 100% sure what your question is, but as I understand it, you want to be able to make changes to a certain website. If these changes can be done with js, i would recommend Greasemonkey for Firefox. It basically lets you run a custom script when you are visiting a certain webpage/domain. You can be as specific as you want about which pages use the script. Once your script loads jQuery, it is really easy to add any functionality.
https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/
You can find pre-written scripts for tons of sites here:
http://userscripts.org/

Emulate javascript _dopostback in python, web scraping

Here LINK it is suggested that it is possible to "Figure out what the JavaScript is doing and emulate it in your Python code: " This is what I would like help doing ie my question. How do I emulate javascript:__doPostBack ?
Code from a website (full page source here LINK:
<a style="color: Black;" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvSearchResults','Page$2')">2</a>
Of course I have basically know idea where to go from here.
Thanks in advance for your help and ideas
Ok there are lots of posts asking how to CLICK a javascript button when web scraping with python libraries mechanize, beautifulsoup....,similar. I see a lot of "that is not supported" responses use THIS non python solution. I think a python solution to this problem would be of great benefit to many. In that light I am not looking for answers like use x,y or z which are not python code or require interacting with a browser.
The mechanize page is not suggesting that you can emulate JavaScript in Python. It is saying that you can change a hidden field in a form, thus tricking the web server that a human1 has selected the field. You still need to analyse the target yourself.
There will be no Python-based solution to this problem, unless you wish to create a JavaScript interpreter in Python.
My thoughts on this problem have led me to three possible solutions:
create an XULRunner application
browser automation
attempt to interpret the client-side code
Of those three, I've only really seen discussion of 2. I've seen something
close to 1 in a commercial scraping application, where you basically create
scripts by browsing on sites and selecting things on the pages that you
would like the script to extract in the future.
1 could possibly made to work with a Python script by accepting a
serialisation (JSON ?) of wsgi Request objects, getting the app to fetch the
URL, then sending the processed page as a wsgi Response object. You could
possibly wrap some middleware around urllib2 to achieve this. Overkill
probably, but kind of fun to think about.
2 is usually achieved via Selenium RC (Remote Control), a testing-centric
tool. It provides a few methods like getHtmlSource but most people that I've
heard using it get don't like its API.
3 I have no idea about. node.js is very hot right now, but I haven't
touched it. I've never been able to build spidermonkey on my Ubuntu
machine, so I haven't touched that either. My hunch is that in order to do
this, you would provide the HTML source and your details to a JS
interpreter, that would need to fake being your User-Agent etc in case the
JavaScript wanted to reconnect with the server.
1 well, more technically, a JavaScript compliant User-Agent, which is almost always a web browser used by a human

Categories

Resources