Intercept page redirect - javascript

I'm scraping a website and want to redirect visitors to another site whenever they click a link, so I injected some JavaScript:
$('a').on('click', function() {
    for (var ls = document.links, numLinks = ls.length, i = 0; i < numLinks; i++) {
        ls[i].href = 'http://mywebsite.com/test.php?url=' + this;
    }
});
It works, but only on actual links (<a href..>). Sometimes a click on an element acts as a link due to some JavaScript, and I would also like to capture that 'event'. That got me thinking about XMLHttpRequest: if I'm not mistaken, the browser has a built-in XMLHttpRequest object whose open method can be wrapped to intercept Ajax calls:
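(function(open) {
    XMLHttpRequest.prototype.open = function(method, url, async) {
        // do something...
        return open.apply(this, arguments); // hand off to the original open()
    };
})(XMLHttpRequest.prototype.open);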
So my question is: does anything similar exist for listening to and altering outgoing URLs?

Just so I'm clear on this, whenever somebody clicks on a link, you want to figure out what that URL is, alter it, then redirect the user to your own URL?
$('a').on('click', function(event) {
    event.preventDefault();
    var url = $(this).attr('href');
});
preventDefault() stops the browser from following the link.
At this point, url holds the link's href string, and you can do whatever you want with it. To redirect the user, either assign the new URL to window.location.href or pass it to window.location.replace().
JSfiddle here
http://jsfiddle.net/JQ5qC/
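As a rough sketch of the redirect step (not taken from the fiddle): the helper below wraps the clicked URL in the test.php URL from the question. The name buildRedirectUrl is made up for illustration, and the use of encodeURIComponent is my assumption about how test.php should receive the original address, so that its own query string survives intact. The click wiring is delegated from document so links added later are covered too; it only runs when jQuery is present.

```javascript
// Hypothetical helper: wrap the original link target in the proxy URL from the question.
function buildRedirectUrl(originalUrl) {
  var proxy = 'http://mywebsite.com/test.php?url=';
  // Encode the target so its "?" and "&" do not leak into the proxy's own query string.
  return proxy + encodeURIComponent(originalUrl);
}

// Browser-only wiring; skipped when there is no window or no jQuery.
if (typeof window !== 'undefined' && typeof window.jQuery === 'function') {
  window.jQuery(document).on('click', 'a', function (event) {
    event.preventDefault();
    // this.href is the absolute URL of the clicked link.
    window.location.href = buildRedirectUrl(this.href);
  });
}
```

This still only covers real anchors; clicks that navigate through script would need a separate hook.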

Related

How to Handle redirects in Node.JS with HorsemanJs and PhantomJS

I've recently started using horseman.js to scrape a page with Node. I can't figure out exactly how it works, and I can't find good examples on the internet.
My main goal is to log in to a platform and extract some data. I've managed to do this with PhantomJS, but now I want to learn how to do it with horseman.js.
My code should open the login page, fill the login and password inputs and click on the "login" button. Pretty easy so far. However, after clicking on the "login" button the site makes 2 redirects before loading the actual page where I want to work.
My problem is that I don't know how to make my code wait for that page.
With PhantomJS I had a workaround based on the page URL. The following code shows how I've managed to do it with PhantomJS, and it works just fine:
var page = require('webpage').create();
var urlHome = 'http://akna.com.br/site/montatela.php?t=acesse&header=n&footer=n';
var fillLoginInfo = function(){
    $('#cmpLogin').val('mylogin');
    $('#cmpSenha').val('mypassword');
    $('.btn.btn-default').click();
};
page.onLoadFinished = function(){
    var url = page.url;
    console.log("Page Loaded: " + url);
    if(url == urlHome){
        page.evaluate(fillLoginInfo);
        return;
    }
    // After the redirects the url has a "sid" parameter; I wait for that to appear when the page loads.
    else if(url.indexOf("sid=") > 0){
        //Keep struggling with more codes!
        return;
    }
};
page.open(urlHome);
However, I can't find a way to handle the redirects with horseman.js.
Here is what I've been trying with horseman.js, without any success:
var Horseman = require("node-horseman");
var horseman = new Horseman();
var urlHome = 'http://akna.com.br/site/montatela.php?t=acesse&header=n&footer=n';
var fillLoginInfo = function(){
    $('#cmpLogin').val('myemail');
    $('#cmpSenha').val('mypassword');
    $('.btn.btn-default').click();
}
var okStatus = function(){
    return horseman.status();
}
horseman
    .open(urlHome)
    .type('input[name="cmpLogin"]','myemail')
    .type('input[name="cmpSenha"]','mypassword')
    .click('.btn-success')
    .waitFor(okStatus, 200)
    .screenshot('image.png')
    .close();
How do I handle the redirects?
I'm currently solving the same problem, and my best solution so far is to use the waitForSelector method to target something on the final page.
E.g.
horseman
    .open(urlHome)
    .type('input[name="cmpLogin"]','myemail')
    .type('input[name="cmpSenha"]','mypassword')
    .click('.btn-success')
    .waitForSelector("#loginComplete")
    .screenshot('image.png')
    .close();
Of course, you have to know something about the page you're waiting for to do this.
If you know there are two redirects, you can call .waitForNextPage() twice. A naive approach, if you don't know how many redirects to expect, would be to chain these until a timeout is reached (I don't recommend this, as it will be slow!).
Perhaps a cleverer way is to use .on() events to capture redirects, such as .on('navigationRequested') or .on('urlChanged').
Although it doesn't answer your question directly, this link may help: https://github.com/ariya/phantomjs/issues/11507
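To make the event idea a bit more concrete, here is a small sketch of the bookkeeping only. It is not Horseman-specific: hasSessionId and createRedirectTracker are names I made up, and the "sid" check is borrowed from the PhantomJS workaround in the question. The assumption is that something like an .on('urlChanged') handler would feed each new URL into record().

```javascript
// Hypothetical helper: true once the URL carries a "sid" query parameter,
// which (per the question) only shows up after both redirects.
function hasSessionId(url) {
  var query = url.split('#')[0].split('?')[1] || '';
  return query.split('&').some(function (pair) {
    return pair.split('=')[0] === 'sid';
  });
}

// Remembers every URL it is given and reports whether the latest one looks final.
function createRedirectTracker(isFinal) {
  var seen = [];
  return {
    record: function (url) {
      seen.push(url);
      return isFinal(url);
    },
    history: function () {
      return seen.slice(); // copy, so callers cannot mutate the log
    }
  };
}
```

Unlike url.indexOf("sid=") > 0, this compares the parameter name exactly, so a parameter such as "inside=" does not count as a match.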

Ajax with history.pushState and popstate - what do I do when popstate state property is null?

I'm trying out the HTML5 history API with ajax loading of content.
I've got a bunch of test pages connected by relative links. I have this JS, which handles clicks on those links. When a link is clicked the handler grabs its href attribute and passes it to ajaxLoadPage(), which loads content from the requested page into the content area of the current page. (My PHP pages are set up to return a full HTML page if you request them normally, but only a chunk of content if ?fragment=true is appended to the URL of the request.)
Then my click handler calls history.pushState() to display the URL in the address bar and add it to the browser history.
$(document).ready(function(){
    var content = $('#content');
    var ajaxLoadPage = function (url) {
        console.log('Loading ' + url + ' fragment');
        content.load(url + '?fragment=true');
    };
    // Handle click event of all links with href not starting with http, https or #
    $('a').not('[href^="http"], [href^="https"], [href^="#"]').on('click', function(e){
        e.preventDefault();
        var href = $(this).attr('href');
        ajaxLoadPage(href);
        history.pushState({page:href}, null, href);
    });
    // This mostly works - only problem is when popstate happens and state is null
    // e.g. when we try to go back to the initial page we loaded normally
    $(window).bind('popstate', function(event){
        console.log('Popstate');
        var state = event.originalEvent.state;
        console.log(state);
        if (state !== null) {
            if (state.page !== undefined) {
                ajaxLoadPage(state.page);
            }
        }
    });
});
When you add URLs to the history with pushState you also need to include an event handler for the popstate event to deal with clicks on the back or forward buttons. (If you don't do this, clicking back shows the URL you pushed to history in the address bar, but the page isn't updated.) So my popstate handler grabs the URL saved in the state property of each entry I created, and passes it to ajaxLoadPage to load the appropriate content.
This works OK for pages my click handler added to the history. But what happens with pages the browser added to history when I requested them "normally"? Say I land on my first page normally and then navigate through my site with clicks that do that ajax loading - if I then try to go back through the history to that first page, the last click shows the URL for the first page, but doesn't load the page in the browser. Why is that?
I can sort of see this has something to do with the state property of that last popstate event. The state property is null for that event, because it's only entries added to the history by pushState() or replaceState() that can give it a value. But my first loading of the page was a "normal" request - how come the browser doesn't just step back and load the initial URL normally?
This is an older question, but there is a much simpler answer to this issue using native JavaScript.
For the initial state you should use history.replaceState rather than history.pushState.
Both methods take the same arguments; the only difference is that pushState creates a NEW history record, which is the source of your problem. replaceState only replaces the state of the current history record and will behave as expected, that is, going back returns you to the initial starting page.
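A minimal sketch of that idea, assuming the same {page: ...} state shape the question uses. seedInitialState is a made-up name, and the window object is passed in as win only so the function can be tried outside a browser; in a page you would call seedInitialState(window) once on load.

```javascript
// Give the normally loaded first page a state object, so that stepping back
// to it delivers {page: ...} to popstate instead of null.
function seedInitialState(win) {
  var page = win.location.pathname;
  if (win.history.state === null || win.history.state === undefined) {
    // The URL argument is omitted, so the address bar is left unchanged.
    win.history.replaceState({ page: page }, '');
  }
  return page;
}
```

With this in place, the popstate handler from the question can keep reading state.page for every entry, including the first one.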
I ran into the same issue as the original question. This line
var initialPop = !popped && location.href == initialURL;
should be changed to
var initialPop = !popped;
This is sufficient to catch the initial pop. Then you do not need to add the original page to the pushState. i.e. remove the following:
var home = 'index.html';
history.pushState({page:home}, null, home);
The final code based on AJAX tabs (and using Mootools):
if ( this.supports_history_api() ) {
    var popped = ('state' in window.history && window.history.state !== null)
      , changingTabBack = false;
    window.addEvent('myShowTabEvent', function ( url ) {
        if ( url && !changingTabBack )
            setLocation(url);
        else
            changingTabBack = false;
        //Make sure you do not add to the pushState after clicking the back button
    });
    window.addEventListener("popstate", function(e) {
        var initialPop = !popped;
        popped = true;
        if ( initialPop )
            return;
        var tabLink = $$('a[href="' + location.pathname + '"][data-toggle*=tab]')[0];
        if ( tabLink ) {
            changingTabBack = true;
            tabLink.tab('show');
        }
    });
}
I still don't understand why the back button behaves like this - I'd have thought the browser would be happy to step back to an entry that was created by a normal request. Maybe when you insert other entries with pushState the history stops behaving in the normal way. But I found a way to make my code work better. You can't always depend on the state property containing the URL you want to step back to. But stepping back through history changes the URL in the address bar as you would expect, so it may be more reliable to load your content based on window.location. Following this great example I've changed my popstate handler so it loads content based on the URL in the address bar instead of looking for a URL in the state property.
One thing you have to watch out for is that some browsers (like Chrome) fire a popstate event when you initially hit a page. When this happens you're liable to reload your initial page's content unnecessarily. So I've added some bits of code from the excellent pjax to ignore that initial pop.
$(document).ready(function(){
    // Used to detect initial (useless) popstate.
    // If history.state exists, pushState() has created the current entry so we can
    // assume browser isn't going to fire initial popstate
    var popped = ('state' in window.history && window.history.state !== null), initialURL = location.href;
    var content = $('#content');
    var ajaxLoadPage = function (url) {
        console.log('Loading ' + url + ' fragment');
        content.load(url + '?fragment=true');
    };
    // Handle click event of all links with href not starting with http, https or #
    $('a').not('[href^="http"], [href^="https"], [href^="#"]').on('click', function(e){
        e.preventDefault();
        var href = $(this).attr('href');
        ajaxLoadPage(href);
        history.pushState({page:href}, null, href);
    });
    $(window).bind('popstate', function(event){
        // Ignore initial popstate that some browsers fire on page load
        var initialPop = !popped && location.href == initialURL;
        popped = true;
        if (initialPop) return;
        console.log('Popstate');
        // By the time popstate has fired, location.pathname has been changed
        ajaxLoadPage(location.pathname);
    });
});
One improvement you could make to this JS is only to attach the click event handler if the browser supports the history API.
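One way that check might look, as a sketch: supportsHistoryApi is a name I picked, and testing for a pushState function is just one common form of feature detection. The window object is passed in so the check can also be run against a stub.

```javascript
// True when the given window-like object exposes history.pushState.
function supportsHistoryApi(win) {
  return !!(win && win.history && typeof win.history.pushState === 'function');
}

// In a page this would guard the handlers, e.g.:
//   if (supportsHistoryApi(window)) { /* attach click + popstate handlers */ }
// Without support, links simply keep doing normal full-page loads.
```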
I actually found myself with a similar need today and found the code you provided to be very useful. I came to the same problem you did, and I believe all that you're missing is pushing your index file or home page to the history in the same manner that you are all subsequent pages.
Here is an example of what I did to resolve this (not sure if it's the RIGHT answer, but it's simple and it works!):
var home = 'index.html';
history.pushState({page:home}, null, home);
Hope this helps!
I realize this is an old question, but when trying to manage state easily like this, it might be better to take the following approach:
$(window).on('popstate', function(e) {
    var state = e.originalEvent.state;
    if (state != null) {
        if (state.hasOwnProperty('callback')) {
            // look up the named callback on window and call it with the state
            window[state.callback].call(window, state);
        }
    }
});
In this way, you can specify an optional callback function on the state object when adding to history; then, when popstate is triggered, that function is called with the state object as a parameter.
function pushState(title, url, callback)
{
    var state = {
        Url : url,
        Title : title
    };
    if (window[callback] && typeof window[callback] === 'function')
    {
        state.callback = callback;
    }
    history.pushState(state, state.Title, state.Url);
}
You could easily extend this to suit your needs.
And Finally says:
I'd have thought the browser would be happy to step back to an entry that was created by a normal request.
I found an explanation of that strange browser behavior here. The explanation is:
you should save the state when your site is loaded the first time and thereafter every time it changes state
I tested this - it works.
It means there is no need to load your content based on window.location.
I hope I don't mislead.

Window.Location Refreshes instead of Redirects

I have a jQuery function as follows:
this.getURL = function()
{
    var name = getName();
    alert("Menu.aspx?name"+name);
    //window.location = "Menu.aspx?name"+name;
}
When I alert the URL I am attempting to go to, it is correct. However, when I assign that string to window.location, the page just refreshes without going anywhere.
I have similar code where I have used window.location and it works. I typed in the url into my browser and it works as well.
At worst (even if the URL was wrong), I was hoping that it would just redirect me to some URL. However, I can't get it to do anything other than refresh the current page.
Also to clarify, the page which calls this function is not Menu.aspx
Thanks in advance.
If you're using a relative path try setting window.location.pathname, otherwise set window.location.href for a full path.
You may also want to try self.location.href
In my experience, it's been difficult to get redirects like this to work right. I've had to use window.location.replace(<url>). If you're just changing an anchor tag, it's even more difficult. You have to do the following to get it to work in all browsers:
window.location.replace(<url>);
window.location=<url>;
window.open(<url>,'_self');
window.location.reload();
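For the relative versus full path point, it can help to check what a given string resolves to against the current page before assigning it. This is only a debugging sketch: resolveTarget is a made-up name, the example.com addresses are placeholders, and it relies on the standard URL constructor, which resolves a relative reference against a base URL.

```javascript
// Resolve a (possibly relative) target against the current page's address.
// In a page, currentHref would be window.location.href.
function resolveTarget(target, currentHref) {
  return new URL(target, currentHref).href;
}

// A relative target stays in the current directory; a root-relative one does not.
console.log(resolveTarget('Menu.aspx?name=x', 'http://example.com/app/Default.aspx'));
console.log(resolveTarget('/Menu.aspx?name=x', 'http://example.com/app/Default.aspx'));
```

If the resolved address is the expected one and the page still only refreshes, the cause is more likely in how the function is called than in the URL string itself.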

Display Webpage current URL with Firefox extension

I've written the following code to do what the title says, but instead of the real URL I get the previous URL (e.g. if I'm on Google, type "car" in the search field and press Enter, I get "http://www.google.fr" and not the URL of the search).
Code:
window.addEventListener("change", function() { myExtension_with_change.init(); }, false);
var myExtension_with_change = {
    init: function() {
        var url = window.location.href;
        alert(url);
    }
}
You might need to add an event listener inside the first to wait for the window to load, such as:
window.addEventListener("change", function()
{
    window.addEventListener("load", function()
    {
        myExtension_with_change.init();
    }, false);
}, false);
I doubt that window is the correct anchor to listen on for changes of the URL. My first try would be to listen for change events on #urlbar (I didn't try that, though):
document.getElementById('urlbar').addEventListener("change", function() {
    myExtension_with_change.init();
}, false);
If your ultimate goal is to listen to URL changes on every tab, I suggest you also have a look at the Tabbed Browser documentation and this code snippet on location changes.
In another post https://stackoverflow.com/users/785541/wladimir-palant gave a perfect answer ( Get URL of a webpage dynamically with Javascript on a Firefox extension ) .
In my case, I followed the recommendation in http://forums.mozillazine.org/viewtopic.php?f=9&t=194671. Simply calling the following code snippet gives me the current url
gBrowser.mCurrentBrowser.currentURI.QueryInterface(Components.interfaces.nsIURI);
var currentUrl = gBrowser.mCurrentBrowser.currentURI.spec;

Trigger function based on result of custom function Referring URL

I need to use JavaScript (jQuery if applicable) to trigger my modal call if the result of my function is true and the referring URL is not of the domain.
The desire is that the user visits the main splash page and as long as they have not been redirected there by the site itself (via timeout on a session, invalid login credentials, etc) it displays the message so:
function showModalIf() {
if (checkFunction) {
if(////// REFERRING URL not from this site)
Trigger Modal Call
else
Don't Do anything else
}
}
Assuming you use jQuery UI Dialog to show the modal:
function checkReferrerExternal() {
    if (!document.referrer || document.referrer == '') return false;
    var regex = /^https?:\/\/\/?([^?:\/\s]+).*/;
    var referrermatch = regex.exec(document.referrer);
    var locationmatch = regex.exec(document.location);
    return referrermatch[1] != locationmatch[1];
}
function showModalIf() {
    if (checkReferrerExternal()) {
        //show jQuery UI Dialog modal or replace with whatever
        $("div#dialog").dialog('open');
    }
}
Check demo page http://jsbin.com/efico
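The same comparison can also be sketched without a regular expression by letting the standard URL constructor do the parsing. This is an alternative I am suggesting, not what the demo page uses; isExternalReferrer is a made-up name, and the two arguments stand in for document.referrer and window.location.href. Like the function above, it treats a missing referrer as "not external".

```javascript
// True when the referrer's hostname differs from the current page's hostname.
function isExternalReferrer(referrer, currentUrl) {
  if (!referrer) return false; // direct visit: nothing to compare
  try {
    return new URL(referrer).hostname !== new URL(currentUrl).hostname;
  } catch (e) {
    return false; // unparseable URL: treat as not external
  }
}
```

In the page this would be called as isExternalReferrer(document.referrer, window.location.href) inside showModalIf().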
If you are talking about forced redirection in the code, and not just a hyperlink click from elsewhere in the site, you could add a query string parameter on your redirection and check that way. Another option is to set a cookie and check for the cookie in javascript.
Here is a nice link on cookie handling in Javascript:
Javascript - Cookies
And here's one for parsing query string params/hashes in Javascript as well:
Parsing The Querystring with Javascript
Hope this points you in the right direction :)
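For the query string option, a short sketch using the standard URLSearchParams class. The parameter name "redirected" and the value "1" are placeholders I chose; the site would have to append them itself whenever it forces a redirect to the splash page.

```javascript
// True when the query string carries the (hypothetical) marker redirected=1.
// In a page, search would be window.location.search.
function wasRedirectedBySite(search) {
  return new URLSearchParams(search).get('redirected') === '1';
}
```

The modal would then be shown only when wasRedirectedBySite(window.location.search) is false.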
