Intercept browser request using chrome extension

I'm working on a Google Chrome extension that gets the page URL and analyzes it. How can I intercept the browser request and serve that request conditionally based on some criteria? I've been searching but couldn't find any material.

That's going to be very tricky, if at all possible.
The closest thing the extension APIs provide is the blocking webRequest API. There, you can intercept a request and decide whether to allow or block it, but:
You can only do that before the request is sent out, so you can rely only on the URL and maybe the request headers. Even in later events (when it's too late to redirect), at no point does the webRequest API give access to the response itself.
You have to make the decision synchronously, which severely limits your processing options.
What you could do (very much in theory) is always redirect the request to your own "loading" page, meanwhile trying to replicate the request yourself (near-impossible to fully do, also consider side-effects), analyze the response and then substitute the "loading" page with the real one.
It's going to be either very complicated or impossible to do in complex cases. You're basically trying to implement an intercepting proxy in a Chrome extension - it doesn't really provide the full toolset to do so.
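For the part the API does support, deciding synchronously from the URL alone, a minimal sketch might look like the following. It assumes a Manifest V2 extension with the webRequest and webRequestBlocking permissions; BLOCKED_HOSTS and decide are made-up names for illustration, and the host list is a placeholder for your own criteria.

```javascript
// Hypothetical criteria: hosts to block. Replace with your own rule.
const BLOCKED_HOSTS = new Set(['ads.example.com', 'tracker.example.net']);

// Pure, synchronous decision based only on the request URL.
// Returns a webRequest BlockingResponse: {cancel: true} blocks, {} allows.
function decide(url) {
  let host;
  try {
    host = new URL(url).hostname;
  } catch (e) {
    return {}; // Not a parsable URL: let it through.
  }
  return BLOCKED_HOSTS.has(host) ? { cancel: true } : {};
}

// Register the listener only when running inside an extension that exposes
// the blocking webRequest API (assumption: Manifest V2 background page).
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    (details) => decide(details.url),
    { urls: ['<all_urls>'] },
    ['blocking']
  );
}
```

Keeping the decision in a plain function makes it easy to test outside the browser. Anything that needs the response body is still out of reach, as described above.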

Related

Http request to a website to get the content of a specific html element

I am building a site to help students schedule their university courses. It will include things like days, times, professor, etc. I want to fetch the "rating" of professors from www.ratemyprofessors.com and have it show on my site. For example, at https://www.ratemyprofessors.com/ShowRatings.jsp?tid=1230754 you can see Michael has a rating of 4.6. I want to request that data and have it show on the site. I can't scrape it beforehand as their ratings change and I want it to show their current rating. Am I able to do this with an XMLHttpRequest? How would I do that? I'm hoping to do it in JavaScript.

Browsers restrict scripts on your page from making HTTP requests to third-party websites unless the target site allows it. The mechanism a site uses to grant that permission is called CORS. See https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS. You may be lucky and find that the site allows it (or doesn't disallow it), but that may change in the future, leaving you in a bind (a malfunctioning feature).
Also, what you're planning to do is called web scraping, and typically it isn't favored by webmasters, so you might eventually get blocked or stumble upon a change in content markup, again leaving you in the same bind.
I would ask the owner of that site for permission and, perhaps, API access.
Otherwise, your option #1 is to try making that HTTP request from a browser-level script (yes, you can use Ajax, XMLHttpRequest, the newer fetch API, or a third-party script), which will work only if CORS isn't a problem.
Your option #2 is to make the same request from the server (so, an Ajax call to your server app, which scrapes the remote site); this is the workaround for the potential CORS problem. Again, CORS is an obstacle only at the browser level, because browsers are coded to enforce it to minimize potential harm to the user's data. However, this option is subject to eventually having your server blocked from accessing the remote site, which that site's owner could do simply by configuring it to refuse connections from IP addresses they detect as belonging to your site. Pretty cool, huh?
Both of these options are further subject to the problem of dealing with content changes, which would be in the hands of your post-request script, whether it executes in the browser (option 1) or on the server (option 2), and which could mean ongoing maintenance. Either way, craft it so that the 3rd-party data is treated as a nice-to-have (so, don't crash your page when fetching that other data fails).
Edit: I would have to try this to be certain, but it's something to think about: you could embed a hidden iframe in your page, targeting that remote webpage (as in your example), then parse the iframe's content once it's available. Note that this endeavor (did I spell that right) is not trivial AT ALL, and it would cost quite a chunk of development time (and it wouldn't be a task a beginner could reasonably complete, at least not quickly), and, again, I am not 100% certain that it would even be possible, as the iframe-hosting webpage may not have access to the iframe's content when it's served by a 3rd-party website. So, this would potentially be option #3, and it would be an in-browser solution (so, lots of JavaScript), though not susceptible to CORS blocking. Phew, a lot of words, I know, but they do make sense, if you can believe me.
Hope that helps decide. Good luck.
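To illustrate the "nice-to-have" advice, here is a rough sketch that works for either option 1 or option 2. The markup it looks for is an assumption made up for this example (the real page structure is not known here and can change at any time), and extractRating and fetchRating are hypothetical helper names.

```javascript
// ASSUMPTION: the rating appears as <div class="grade">4.6</div>.
// The real markup of the remote site is unknown here and may change.
const RATING_PATTERN = /<div class="grade">\s*(\d+(?:\.\d+)?)\s*<\/div>/;

// Returns the rating as a number, or null when the markup does not match.
function extractRating(html) {
  const match = RATING_PATTERN.exec(html);
  return match ? Number(match[1]) : null;
}

// Fetches the page and extracts the rating. Any failure (network error,
// CORS rejection, non-2xx status, changed markup) yields null instead of
// an exception, so the caller can simply hide the rating.
async function fetchRating(url, fetchImpl = globalThis.fetch) {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) return null;
    return extractRating(await response.text());
  } catch (err) {
    return null;
  }
}
```

The page would then render the rating only when fetchRating resolves to a number, and show nothing otherwise.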

How much network troubleshooting could be done with a JavaScript AJAX script?

Sometimes it could be useful for a web application to be able to determine whether a seeming slowness in the application is due to network conditions or something else.
To make it easier to determine the cause in this kind of situation, I was thinking about creating something like a "check the connectivity" client-side script that makes, e.g., an AJAX call to the server and returns some useful metrics that make it easier to determine whether the cause lies with the network.
But how much useful network troubleshooting can be done using JavaScript that calls the server?

It sounds like you need a way to keep an eye on your site's performance from the end-user's perspective.
One way to do this is to have your client-side scripts include a way to log to a log aggregation site like SumoLogic. Here is a doc to reference about using client-side JavaScript to log to SumoLogic.
On your server side, you could implement a /ping API endpoint that would just immediately return true so you know how long it takes your user to at least reach your site. You can then log to SumoLogic how long that request took. You could do this with other requests as well to see which APIs are slower than others.
If you include geo-location when logging to SumoLogic, you can see how well your site performs around the world.
And if you want to get really fancy, then you should implement a custom header that your APIs understand which is a transaction token of some sort for all requests. When your server receives that header, it should use that token throughout the request's logs so you can see where things go wrong and what to do about them.
Another good site to check out for this sort of thing is New Relic - Browser Monitoring. This is much more performance-centric and you don't get the insights of your own logs, but it's an awesome app in its own right.
EDIT
As mentioned in the comments by @Bergi, you could also have your server respond with the headers immediately and measure performance that way.
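As a rough sketch of the client side of the /ping idea, the code below times a few round trips and summarizes them. It assumes the /ping endpoint described above exists on your server; timePing, summarize and checkConnectivity are hypothetical names, and sending the result to a log aggregator is left out.

```javascript
// Times one round trip to a lightweight endpoint, in milliseconds.
// ASSUMPTION: the server exposes the /ping endpoint described above.
async function timePing(url = '/ping', fetchImpl = globalThis.fetch) {
  const start = performance.now();
  await fetchImpl(url, { cache: 'no-store' }); // bypass the HTTP cache
  return performance.now() - start;
}

// Summarizes a list of round-trip times so one slow outlier stands out.
function summarize(samplesMs) {
  if (samplesMs.length === 0) return null;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
  return { min: sorted[0], median, max: sorted[sorted.length - 1] };
}

// Takes several samples one after another and returns the summary.
async function checkConnectivity(count = 5, url = '/ping', fetchImpl = globalThis.fetch) {
  const samples = [];
  for (let i = 0; i < count; i++) {
    samples.push(await timePing(url, fetchImpl));
  }
  return summarize(samples);
}
```

A consistently high median points at the network or the server being far away, while a low median with a slow application suggests looking at the application itself.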

How to add headers to a browser request without any browser extensions?

Sometimes, one needs to add special headers to each request or specific requests made from a browser. The common approach to do this is by using browser extensions which allow us to modify request headers. Is there another way to do this, without any browser extension?
PS - I have searched SO and not found a single post which actually suggests or shows how to do what I need.

Outside of APIs designed to make custom HTTP requests (XMLHttpRequest and fetch), it is impossible to add arbitrary HTTP headers to requests made by browsers using JS embedded in a page.

If you control the websites that you want this functionality on, you could achieve this by setting each application to install a ServiceWorker. In a nutshell, service workers run as a proxy server within your browser. They can do things like notify you of updates even if you don't have the website open.
Within a ServiceWorker you are able to set up event listeners that can do some asynchronous task on behalf of the client app. This includes the fetch event which is fired every time the web page makes a request.
Here's a write-up by someone implementing a ServiceWorker who also needed to intercept network requests. You could follow most of it and just alter the logic that inspects the request type. At that point you could add any special headers before dispatching the request on the application's behalf.
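A minimal sketch of such a fetch handler, assuming it lives in the service worker script of a site you control, might look like this. The header name and value in EXTRA_HEADERS are placeholders, and shouldModify is a hypothetical helper that limits the change to same-origin requests.

```javascript
// Hypothetical headers to attach; the name and value are placeholders.
const EXTRA_HEADERS = { 'X-Example-Token': 'abc123' };

// Only touch same-origin requests: custom headers on cross-origin requests
// can trigger CORS preflights that the remote server may reject.
function shouldModify(requestUrl, workerOrigin) {
  try {
    return new URL(requestUrl).origin === workerOrigin;
  } catch (e) {
    return false;
  }
}

// Register the handler only inside a service worker context.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    const request = event.request;
    if (!shouldModify(request.url, self.location.origin)) return;

    // Copy the existing headers and add the extra ones.
    const headers = new Headers(request.headers);
    for (const [name, value] of Object.entries(EXTRA_HEADERS)) {
      headers.set(name, value);
    }
    // Re-create the request with only the headers overridden.
    event.respondWith(fetch(new Request(request, { headers })));
  });
}
```

Requests that the handler returns from without calling respondWith go through the browser's normal network path unchanged.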

There's no way to edit the requests made by an existing page without using external tools. The most suitable tool is a browser extension, which can modify the existing DOM and the HTTP requests (XMLHttpRequest and fetch) made by JavaScript code.
There are many ways to add headers to requests if you own the website, and the solution depends on which library you use to make the requests.
But in general, it's not recommended to modify data on a website that isn't yours.
A browser extension, which you have already found, is exactly the right tool for your problem.
Hope my comment helps.

Chrome extension for blocking websites based on database blacklist

We have a database with millions of domain categorizations (storing it client side is not an option) and we want to make a Chrome extension to blacklist sites based on how they are categorized in the MySQL database.
The server side stuff is easy, we post the domain, and return the category.
The tricky part is blocking requests based on the categorization. Here are a few potential implementations and why they won't (quite) work.
Idea 1:
Redirect all traffic using chrome.webRequest to mysite.com/script.php?url=www.theoriginalurl
This script checks the database's category & either redirects them to the theoriginalurl.com or denies the request, redirecting them to www.youGotBlocked...
Have the chrome extension check the http referrer header to make sure that they came from mysite.com (unless the url is mysite.com, then do nothing).
Problems:
It doesn't seem like we can set the referrer header in PHP, so we have no way of knowing that they came from mysite.com. It seems like maybe we should be passing info via a cookie, but I haven't thought of an elegant solution involving cookies.
Idea 2:
Every time chrome.webRequest fires, make an AJAX POST request to mysite.com/categorizeURL.php with the URL to get the category. Block or allow based on the server's response.
Problems:
Either we make the request asynchronous and can't get the response in time (there is no way that we have found to delay the callback until the server responds -- more on that here), or we make the request synchronous, and IT WORKS!!! Except for the fact that if they can't reach our server, their entire browser locks up and they essentially need to refresh the extension to be able to access the internet again.
Other ideas?
Does anyone have other ideas for creating a blacklist via a Chrome extension? I simply refuse to believe that it is not possible.

How does google analytics collect its data?

Yes, I know you have to embed the google analytics javascript into your page.
But how is the collected information submitted to the google analytics server?
For example, an AJAX request will not be possible because of the browser's security settings (cross-domain scripting).
Maybe someone has already had a look at the confusing Google JavaScript code?

When an HTML page requests the ga.js file, the HTTP request itself carries a good deal of data: IP address, referrer, browser, language, and operating system. There is no need to use Ajax for that.
But some data can't be obtained this way, so the GA script inserts an image into the HTML with additional parameters. Take a look at this example:
http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=1464271798&utmhn=www.example.com&utmcs=UTF-8&utmsr=1920x1200&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=10.0%20r22&utmdt=Page title&utmhid=1805038256&utmr=0&utmp=/&utmac=cookie value
This is a blank image, sometimes called a tracking pixel, that GA puts into HTML.
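The same technique can be sketched in a few lines. This is only an illustration of the tracking-pixel idea, not Google's actual code; buildPixelUrl and sendPixel are hypothetical names, and the endpoint and parameter names are up to you.

```javascript
// Builds the URL of a tracking pixel with the data encoded as query parameters.
function buildPixelUrl(endpoint, params) {
  const query = new URLSearchParams(params).toString();
  return query ? `${endpoint}?${query}` : endpoint;
}

// Sends the data by requesting the image; the response body is ignored.
// The page only triggers the request and never reads the response, so the
// cross-domain restriction on AJAX does not get in the way.
function sendPixel(endpoint, params) {
  const img = new Image(1, 1);
  img.src = buildPixelUrl(endpoint, params);
  return img;
}
```

In a page, a call such as sendPixel('https://stats.example.com/collect.gif', { sr: '1920x1200', dt: document.title }) would deliver the screen resolution and page title to a server you control, which only has to log the query string.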

Some good answers here which individually tend to hit on one method or another for sending the data. There's a valuable reference which I feel is missing from the above answers, though, and covers all the methods.
Google refers to the different methods of sending data as 'transport mechanisms'.
In the analytics.js documentation, Google mentions the three main transport mechanisms that it uses to send data:
This specifies the transport mechanism with which hits will be sent. The options are 'beacon', 'xhr', or 'image'. By default, analytics.js will try to figure out the best method based on the hit size and browser capabilities. If you specify 'beacon' and the user's browser does not support the navigator.sendBeacon method, it will fall back to 'image' or 'xhr' depending on hit size.
One of the common and standard ways to send some of the data to Google (which is shown in Thinker's answer) is by adding the data as GET parameters to a tracking pixel. This would fall under the category which Google calls an 'image' transport.
Secondly, Google can use the 'beacon' transport method if the client's browser supports it. This is often my preferred method because it will attempt to send the information immediately. Or in Google's words:
This is useful in cases where you wish to track an event just before a user navigates away from your site, without delaying the navigation.
The 'xhr' transport mechanism is the third way that Google Analytics can send data back home, and the particular transport mechanism that is used can depend on things such as the size of the hit. (I'm not sure what other factors go into GA deciding the optimal transport mechanism to use)
In case you are curious how to force GA into using a specific transport mechanism, here is a sample code snippet which forces this event hit to be sent as a 'beacon':
ga('send', 'event', 'click', 'download-me', {transport: 'beacon'});
Hope this helps.
Also, if you are curious about this topic because you'd like to capture and send this data to your own site too, I recommend creating a binding to Google Analytics' send, which allows you to grab the payload and AJAX it to your own server.
ga(function(tracker) {
  // Grab a reference to the default sendHitTask function.
  var originalSendHitTask = tracker.get('sendHitTask');
  // Modify sendHitTask to send a copy of the request to a local server after
  // sending the normal request to www.google-analytics.com/collect.
  tracker.set('sendHitTask', function(model) {
    var payload = model.get('hitPayload');
    originalSendHitTask(model);
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/index.php?task=mycollect', true);
    xhr.send(payload);
  });
});

Without looking at the code, I assume their data is collected from the HTTP headers they receive in the asynchronous request.
Remember that most browsers send data such as OS, platform, browser, version, locale, etc... Also they do have the IP so they can guesstimate your location. And I assume they have some sort of clever algorithm to decide whether you are a unique visitor or not.
Time on the site is probably calculated by using an onUnload() event.

The Google Analytics website provides detailed information on how the Google Analytics server collects data: http://code.google.com/apis/analytics/docs/concepts/gaConceptsOverview.html
All Google Analytics data is collected and packed into the request URL's query string and sent to the Google Analytics server. The HTTP request is made for a GIF image (http://www.google-analytics.com/__utm.gif), triggered by the Google Analytics JS.

It's easy enough to tell by using something like Firebug's Net tab.
Ajax isn't needed - since data isn't being fetched from Google. They just encode the information in a query string, and then load a transparent gif using it.

To expand on other very good answers, Google does provide an API to track async "virtual pageviews" which are reported by website authors themselves in their scripts to Google.
_gaq.push(['_trackPageview', 'my_unique_action']);
They provide it so it is possible to track actions that are not part of regular page views and http requests.
Async tracking guide:
http://code.google.com/apis/analytics/docs/tracking/asyncUsageGuide.html#Syntax

Use the HttpFox or Firebug Firefox extension to see what HTTP requests the browser sends and what responses it receives.
I don't know how Google Analytics works, but one possibility is to make the browser download an image: <img src="http://my-analytics.com" width="1" height="1"> (with a single, transparent pixel), and log all the HTTP request headers (e.g. Referer:) on the server side.

Edit: see the comment at the bottom.
OK, I found an answer during a discussion with a friend of mine :-)
The information is submitted to Google Analytics in three ways:
The HTTP request can be analyzed using all the information in the HTTP headers.
A cookie is recognized by the Google Analytics server.
An Ajax call is made within the embedded JavaScript to submit information such as display resolution, Flash player version, etc. This information is not transmitted via the HTTP headers.
This is possible because the Ajax call is made in the context of the embedded JavaScript, so it's not cross-domain scripting. This was an error in reasoning on my part.
