I want to run a script when a user tries to access a webpage.
E.g., I type in google.com, and as the page loads I want a client-side script to know the protocol of the page (https in this case).
I know that window.location.protocol is one way of getting the protocol in JS.
But how do I make this run whenever a user accesses a webpage in a browser?
Also, can I send a request for a webpage and analyse the response using, say, AJAX? Suppose I send an HTTP request to Facebook and get a 301 redirect back. How do I analyse this response and know that it is a redirect?
Does this require browser modifications? Can it be done without them? Thanks.
Like @SachinKumar said...
1) Getting and using the protocol: create a browser plugin for your preferred browser. Chrome is the most obvious choice.
2) Examining the response: do you mean analyzing it with your own code when the user receives a 301? (Probably doable with a browser plugin.) Or do you mean you just want to check the response of some page yourself? (jQuery's HTTP/GET functions are one clear way to do that, as sketched below.)
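For the second case, here is a minimal sketch (the URL is a placeholder, and the target must allow cross-origin reads; facebook.com will not). Note that fetch and XMLHttpRequest follow redirects transparently, so you never see the raw 301 itself; you can only observe that a redirect happened:

    // Probe a page and report whether the response came via a redirect.
    fetch('https://example.com/some-page')
      .then(function (response) {
        console.log(response.url);        // final URL, after any redirects
        console.log(response.redirected); // true if a 301/302 was followed
        console.log(response.status);     // 200 here, not 301
      })
      .catch(function (err) {
        console.error('network error or CORS rejection:', err);
      });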
Is it possible to crawl a website from within an Angular app? I am talking about calling another website from Angular, not about crawling an Angular app. If so, I am wondering which IP will be shown to the crawled website. Since JavaScript is client-side, I would assume it's the IP of the client, not of the server (as it would be with, say, Node.js). But as far as I know, what we can use in JS is mostly browser-implemented functionality, so is it even possible to crawl websites with methods from JavaScript (or Angular)?
Best Regards
Buzz
In theory, you can create an AJAX request to fetch the data with the response type text/html. That would give you the remote document as a string. The browser wouldn't try to load the JavaScript and CSS in that document, though. That might not be a problem, but CORS is. For security reasons, most browsers prevent you from loading data from somewhere else (otherwise, it would be too easy for criminals to inject JavaScript into any web page). See here for details: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
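For illustration, a minimal sketch of such a request, assuming a recent Angular with HttpClient injected as http (the URL is a placeholder):

    // Fetch the remote document as a plain string instead of JSON.
    this.http.get('https://other-site.example/page', { responseType: 'text' })
      .subscribe(
        html => console.log('received', html.length, 'characters'),
        err => console.error('request failed, most likely due to CORS:', err)
      );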
If you have control over the second domain, you can configure the server there to send an Access-Control-Allow-Origin header to the browser to allow access from the Angular app.
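A minimal sketch of that configuration, assuming the second server runs Node with Express (the origin and port are placeholders):

    const express = require('express');
    const app = express();

    // Tell browsers that this (placeholder) origin may read our responses.
    app.use(function (req, res, next) {
      res.setHeader('Access-Control-Allow-Origin', 'https://my-angular-app.example');
      next();
    });

    app.get('/page', function (req, res) {
      res.type('html').send('<html>...</html>');
    });

    app.listen(3000);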
Note: You could use an iframe to load the other website, but when the domains of the current document and of the one in the iframe don't match, you can't access the contents of the iframe from JavaScript.
One way to work around this is to install a proxy on your server. The browser can then ask your server for the pages in question. In this case, the remote website will see the IP of your server.
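A bare-bones sketch of such a proxy using only Node's built-in modules (the /proxy route and port are made up); because this server makes the outgoing request, the remote site sees its IP:

    const http = require('http');
    const https = require('https');

    http.createServer(function (req, res) {
      // Expected form: GET /proxy?url=https://example.com/
      const target = new URL(req.url, 'http://localhost').searchParams.get('url');
      if (!target) {
        res.statusCode = 400;
        return res.end('missing url parameter');
      }
      https.get(target, function (remote) {
        res.writeHead(remote.statusCode, remote.headers);
        remote.pipe(res); // stream the remote document back to the browser
      }).on('error', function () {
        res.statusCode = 502;
        res.end('proxy error');
      });
    }).listen(3000);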
I'm working on a site that will keep track of when a user enters a page and how long they stay there (or, to be more specific, when they leave).
I have set up a Node server and I would like to run the action there, but the question is not really about Node itself; it's a JavaScript question.
My goal is to have JavaScript call a certain method with specific parameters and then forget about it. However, I would like to avoid AJAX if possible. I know I could do it with AJAX, but I think that's overkill, and I'm not even sure AJAX would work against a Node server.
What I'm looking at is something like:
* User opens the web page
* JavaScript runs the script, let's say:

    Run("http://server.com/User/EnteredPage/IDOFUSER");

and when the user closes the page:

    Run("http://server.com/User/LeftPage/IDOFUSER");
The point is, I don't need anything back from that call; I just want the JavaScript to fire it so the server can save the data, and that's it. One option is sketched below.
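One fire-and-forget option is navigator.sendBeacon, sketched here with the URLs from the question; it posts the request without waiting for, or exposing, any response:

    navigator.sendBeacon('http://server.com/User/EnteredPage/IDOFUSER');

    window.addEventListener('pagehide', function () {
      // Best effort only: this never fires if the browser crashes or the
      // network drops, as the answer below explains.
      navigator.sendBeacon('http://server.com/User/LeftPage/IDOFUSER');
    });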
HTTP is stateless. The browser asks for a resource. The server gives the browser the resource. The end.
The request is done and dealt with. There is no further communication about that request so the server doesn't know when the visitor has left the page that it served up. If you want to know that, then you need another request to tell the server about it.
The problem with that approach is that the visitor might leave the page by:
Quitting their browser entirely
Running out of battery
Getting disconnected from the network
… so you can't reliably send a new request when the user leaves the page.
So the best you can do is to have the browser tell the server that the user hasn't left yet (you could do this with AJAX or, potentially more efficiently, with WebSockets).
Combine this with a timer-based action on the server that checks how long it has been since the visitor's browser last sent an "I'm still here" message, and use that to trigger your "visitor has left" logic. A sketch of both halves follows.
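A heartbeat sketch (the endpoint name, the timings and the route-handler wiring are made up):

    // Client: ping the server every 5 seconds while the page is open.
    setInterval(function () {
      navigator.sendBeacon('http://server.com/User/StillHere/IDOFUSER');
    }, 5000);

    // Server (Node): record pings and declare visitors gone after
    // three missed ones.
    const lastSeen = {}; // user id -> timestamp of the last ping

    function onStillHere(userId) { // call this from your route handler
      lastSeen[userId] = Date.now();
    }

    setInterval(function () {
      const now = Date.now();
      for (const id in lastSeen) {
        if (now - lastSeen[id] > 15000) {
          console.log('visitor ' + id + ' has left');
          delete lastSeen[id];
        }
      }
    }, 5000);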
I'm attempting to create an app with Node.js (using http.createServer()) which will be a single-page application with requests for data via XMLHttpRequest. To do this I need to be able to differentiate between a user navigating to my domain, AJAX requests, and requests generated by the browser for linked resources.
If the request is from the user, I always want to return the index.html page, which will handle requesting content; but if the request is browser-generated or AJAX and is for CSS, JavaScript or other linked files, I want to serve those files. Is there any way to detect this?
Looking at the request headers for the different file types, I saw that the Referer header appeared when the request for content was generated by the page. I figured that was the solution I was looking for, but that header is also set when a user clicks on a link to the page, making it useless.
The only other thing which seems to change is the Accept header, which could sort of work but might not be a catch-all solution. User requests always seem to have text/html as the preferred return type, regardless of which URL was entered. I could detect that, but I'm pretty sure AJAX requests for HTML files would also carry that Accept header, which would cause problems.
Is there anything I'm missing here (any headers or properties I can look for)?
Edit: I do not need the solution to protect files, and I don't care about users bypassing it with their own requests. My intention is not to hide files or make them secure, but rather to keep any data that is requested within the scope of the app.
For example, if a user navigates to http://example.com/images/someimage.jpg, they are instead shown the index.html file, which can then show the image in a richer context and include all of the links and functionality to go with it.
TL;DR: I need to detect when someone is trying to access the app, serve them the index page, and have that page send them the content they want. I also need to detect when the browser has requested resources (JS, CSS, HTML, images, etc.) needed by the app, so I can return the actual resource, not the index file.
In terms of the HTTP protocol, there is NO difference between a user-generated query and a browser-generated query.
Every query is just... a query.
You can make a query from the command line or from a browser; you can click a link, send some ASCII text via telnet, or ask a proxy to make the query for you. It is never the server's job to identify how the query was initiated by the user.
Take, for example, a request served by a reverse proxy cache: that query will never reach your server (the response comes from the cache), and the first query made to build that cached response could have come from a real user or from a browser.
In terms of security, trying to ensure that the user never requests data by himself cannot be done by detecting whether the query is a real human click (and search Google for clickjacking if you want to be afraid). Every query that a browser can make can also be replayed by the user, every single one; you have no way to prevent that.
Some browser plugins even do pre-fetching: they detect links on the page and make the request before you do it yourself (if it's a GET query).
For AJAX, some libraries like jQuery add an X-Requested-With: XMLHttpRequest header, and most frameworks use this to detect AJAX mode, as in the sketch below.
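For example, a sketch of that check in plain Node (remembering, per the above, that the header is trivial for a client to omit or forge):

    const http = require('http');

    http.createServer(function (req, res) {
      // Node lowercases incoming header names.
      const isAjax = req.headers['x-requested-with'] === 'XMLHttpRequest';
      res.end(isAjax ? 'ajax request' : 'regular request');
    }).listen(8080);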
But it is more robust to rely on a location policy for that (like making your AJAX queries under a /format/ajax path, which could also be used in other ways, like /format/json, /format/html, or /format/csv). Spending time on location-policy-based routing is certainly more useful.
But one thing can make a difference: POST queries are not idempotent, which means the browser cannot make a POST query without a real user interaction, because a POST query may alter the state of the session or of the server's data (JS can still make POST queries; this is just the default behavior of browsers). The browser will never automatically prefetch or replay a POST query, so you could make a website where all user interactions are POST queries (via forms, or via some JS that alters link clicks to send POST AJAX queries instead). But I'm not sure that's your real goal.
Not technically an answer to the question, but I found a simple solution which does what I want: prefix all app-based requests with a subdomain, e.g. http://data.example.com/. It's then really simple to check the Host header for that subdomain: if present, send the resource; otherwise, send the index page.
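A sketch of that check with http.createServer (the host names and the public directory are placeholders):

    const http = require('http');
    const fs = require('fs');
    const path = require('path');

    http.createServer(function (req, res) {
      const host = req.headers.host || '';
      if (host.indexOf('data.') === 0) {
        // App-generated request: serve the file itself.
        const file = path.join(__dirname, 'public', path.normalize(req.url));
        fs.createReadStream(file)
          .on('error', function () { res.statusCode = 404; res.end(); })
          .pipe(res);
      } else {
        // Anything else: always serve the index page.
        res.writeHead(200, { 'Content-Type': 'text/html' });
        fs.createReadStream(path.join(__dirname, 'index.html')).pipe(res);
      }
    }).listen(80);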
When making a simple request in Python (Enthought Canopy, to be precise) with urllib, the server denies me access:
    import urllib

    # url omitted ("an url i cannot post because of reputation")
    data = urllib.urlopen(url, params)
    print data.read()
Error:
Access denied | play.pokemonshowdown.com used CloudFlare to restrict access
The owner of this website (play.pokemonshowdown.com) has banned your access based on your browser's signature (14e894f5bf8d0920-ua48).
This is apparently a generic issue, so I found several clues on the web.
https://support.cloudflare.com/hc/en-us/articles/200171806-Error-1010-The-owner-of-this-website-has-banned-your-access-based-on-your-browser-s-signature:
A firewall, proxy, a browser plugin or extension may be throwing a false positive. Try visiting the site with a different browser as an alternative way of accessing the site.
https://support.cloudflare.com/hc/en-us/articles/200170176-Why-am-I-getting-a-Checking-your-Browser-before-accessing-message-before-entering-a-site-on-CloudFlare-:
The "Checking your browser before accessing (insertsite.com) occurs when the site owner has turned on a DDoS protection and mitigation tool called "I'm Under Attack". The page will generally go away and grant you access to the site after 5 seconds.
Note: You will need to have both JavaScript and Cookies turned on in your browser to pass the check. The check is in place to make sure that you are not part of a botnet."
The answers are rather clear, except for one thing: I'm not using any browser! The request is done through a Python program, with an urllib.urlopen request...
Does this mean I'm supposed to have, like, cookies and JavaScript turned on in... Enthought Canopy? Does that even make sense? I barely understand why this browser-specific check activates when I access the site with a basic request from a programming console. That's why I'm asking for your help.
Why does this happen? How can I bypass it?
What this site is "checking" is not your browser, it's the "user agent": a string your client program (browser, Python script or whatever) sends as a request header. You can specify another user agent, cf. Changing user agent on urllib2.urlopen. A sketch follows.
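A sketch in Python 2, matching the snippet above (the User-Agent string is an arbitrary browser-like value; if the site's "I'm Under Attack" mode is on, this alone may not be enough, since that challenge also requires JavaScript and cookies):

    import urllib2

    url = 'http://play.pokemonshowdown.com/'  # the domain from the error message
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    print urllib2.urlopen(req).read()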
I just saw this with Safari from my home IP, looking at a site I author! After logging in to the CloudFlare website and hitting refresh, access was back. Probably my mobile internet was too slow (in New Zealand) and the JavaScript did not load in time? I have DDoS protection and "I'm Under Attack" enabled, AFAIK.
We have a database with millions of domain categorizations (storing it client-side is not an option) and we want to make a Chrome extension to blacklist sites based on how they are categorized in the MySQL database.
The server-side stuff is easy: we POST the domain, and it returns the category.
The tricky part is blocking requests based on the categorization. Here are a few potential implementations and why they won't (quite) work.
Idea 1:
Redirect all traffic using chrome.webRequest to mysite.com/script.php?url=www.theoriginalurl
This script checks the category in the database and either redirects them to theoriginalurl.com or denies the request, redirecting them to www.youGotBlocked...
Have the Chrome extension check the HTTP Referer header to make sure that they came from mysite.com (unless the URL is mysite.com, in which case do nothing).
Problems:
It doesn't seem like we can set the Referer header from PHP, so we have no way of knowing that they came from mysite.com. It seems like maybe we should pass the information via a cookie, but I haven't thought of an elegant solution involving cookies.
Idea 2:
Every time chrome.webRequest fires, make an AJAX POST request to mysite.com/categorizeURL.php with the URL, to get the category. Block or allow based on the server's response.
Problems:
Either we make the request asynchronous, and we can't get the response in time (there is no way that we have found to delay the callback until the server responds; more on that here). Or we make the request synchronous, and IT WORKS!!! Except that if the server is unreachable, the entire browser locks up and users essentially need to reload the extension to be able to access the internet again. A sketch of this synchronous variant is below.
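For reference, a sketch of that synchronous variant (categorizeURL.php is from the question; the 'blocked' response format is made up). The false flag makes the XHR synchronous, which is exactly what freezes the browser when the server is unreachable:

    chrome.webRequest.onBeforeRequest.addListener(
      function (details) {
        var xhr = new XMLHttpRequest();
        xhr.open('POST', 'https://mysite.com/categorizeURL.php', false); // synchronous
        xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
        xhr.send('url=' + encodeURIComponent(details.url));
        return { cancel: xhr.responseText === 'blocked' }; // cancel blocks the request
      },
      { urls: ['<all_urls>'] },
      ['blocking']
    );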
Other ideas?
Does anyone have other ideas for creating a blacklist via a Chrome extension? I simply refuse to believe that it is not possible.