We have a database with millions of domain categorizations (storing it client-side is not an option), and we want to build a Chrome extension that blacklists sites based on how they are categorized in the MySQL database.
The server-side part is easy: we POST the domain and the server returns the category.
The tricky part is blocking requests based on the categorization. Here are a few potential implementations and why they won't (quite) work.
Idea 1:
Redirect all traffic using chrome.webRequest to mysite.com/script.php?url=www.theoriginalurl
This script checks the database's category and either redirects them to theoriginalurl.com or denies the request, redirecting them to www.youGotBlocked...
Have the Chrome extension check the HTTP Referer header to make sure that they came from mysite.com (unless the URL is mysite.com, in which case do nothing).
Problems:
It doesn't seem like we can set the Referer header from PHP, so we have no way of knowing that they came from mysite.com. It seems like maybe we should be passing info via a cookie, but I haven't thought of an elegant solution involving cookies.
Idea 2:
Every time chrome.webRequest fires, make an AJAX POST request to mysite.com/categorizeURL.php with the URL to get the category. Block or allow based on the server's response.
Problems:
Either we make the request asynchronous, in which case we can't get the response in time (there is no way that we have found to delay the callback until the server responds -- more on that here), or we make the request synchronous, and it works! Except that if the browser can't reach our server, the entire browser locks up and the user essentially has to reload the extension to be able to access the internet again.
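One possible compromise, sketched below under the assumption of a Manifest V2 background page with the "webRequest" and "webRequestBlocking" permissions: keep a local cache of verdicts that the blocking listener consults synchronously, and fetch unknown domains asynchronously, so an unreachable server can never lock up the browser. The mysite.com endpoints are just the placeholders from above.

```js
// Sketch only: cache verdicts so the blocking listener never waits on the network.
const verdictCache = {}; // domain -> "allow" | "block"

function lookupCategory(domain) {
  // Fire-and-forget async lookup; the verdict is cached for later requests.
  const xhr = new XMLHttpRequest();
  xhr.open("POST", "https://mysite.com/categorizeURL.php", true); // async
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.onload = function () {
    verdictCache[domain] = xhr.responseText === "blocked" ? "block" : "allow";
  };
  xhr.onerror = function () {
    verdictCache[domain] = "allow"; // fail open so an unreachable server never locks the browser
  };
  xhr.send("url=" + encodeURIComponent(domain));
}

chrome.webRequest.onBeforeRequest.addListener(
  function (details) {
    const domain = new URL(details.url).hostname;
    const verdict = verdictCache[domain];
    if (verdict === undefined) {
      lookupCategory(domain); // start the async lookup
      return {};              // let the first request through (or redirect to a "checking" page)
    }
    if (verdict === "block") {
      return { redirectUrl: "https://mysite.com/youGotBlocked.html" };
    }
    return {}; // allowed
  },
  { urls: ["<all_urls>"], types: ["main_frame"] },
  ["blocking"]
);
```

The trade-off is that the very first visit to an unknown domain slips through (or has to be parked on an interstitial page) until the verdict arrives.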
Other ideas?
Does anyone have other ideas for creating a blacklist via a Chrome extension? I simply refuse to believe that it is not possible.
Related
I recently found (here: Does every web request send the browser cookies?) that every HTTP request to a domain includes the cookies stored for that domain.
Given this, what happens when the request is not sent through a browser but from Node.js, for example? Is it possible that no information is sent in the request?
Is it also possible to prevent them from being sent in browser requests?
Browsers
It is not possible to prevent the browser from sending cookies.
This is why it is generally recommended (Yahoo Developer Best Practices, see the section "Use Cookie-free Domains for Components") to serve static content such as CSS and images from a separate, cookie-free domain.
When the browser makes a request for a static image and sends cookies together with the request, the server doesn't have any use for those cookies. So they only create network traffic for no good reason. You should make sure static components are requested with cookie-free requests. Create a subdomain and host all your static components there.
Programmatically
From a programming language, by contrast, you can choose whether or not to send cookies.
Cookie management is up to the programmer, because HTTP libraries are written to make single requests.
So if you make a first request that returns cookies, you need to explicitly read them, hold them locally somewhere, and include them in a subsequent request to the same server if needed.
So from Node.js, if you don't explicitly add cookies to your requests, the HTTP call won't carry any.
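For instance, with Node's built-in https module (example.com is just a stand-in host), no cookies travel unless you copy them over by hand:

```js
// First request: capture whatever Set-Cookie headers the server returned.
const https = require("https");

https.get("https://example.com/login", (res) => {
  const cookies = res.headers["set-cookie"] || [];

  // A second request carries NO cookies unless we add them explicitly.
  https.get(
    {
      host: "example.com",
      path: "/profile",
      headers: { Cookie: cookies.map((c) => c.split(";")[0]).join("; ") },
    },
    (res2) => {
      res2.on("data", (chunk) => process.stdout.write(chunk));
    }
  );
});
```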
You can use fetch with the credentials option set to omit.
see
https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API
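A minimal browser-side example (example.com is a placeholder):

```js
// With credentials: "omit", the browser neither sends existing cookies
// nor stores any Set-Cookie header the response may contain.
fetch("https://example.com/api/data", { credentials: "omit" })
  .then((response) => response.json())
  .then((data) => console.log(data));
```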
You can strip cookies with a proxy server. For example, our product WinGate will allow you to modify requests (and responses), and you could use this to clear the Cookie header in requests.
However, this will prevent a large number of websites from functioning properly, as cookies are used to transport session IDs so that the server can identify each connection / request your browser makes as being from the same "session". HTTP itself does not have any concept of session.
Disclaimer: I work for Qbik who make WinGate.
I'm working on a Google Chrome extension which gets the page URL and analyzes it. How can I intercept the browser request and serve that request conditionally based on some criteria? I've been searching but couldn't find any material.
That's going to be very tricky, if at all possible.
The closest thing the extension APIs provide is the blocking webRequest API. There, you can intercept a request and decide to allow it or block it, but..
You can only do that before the request is sent out, so you can only rely on the URL and maybe the request headers. Even in later events (when it's too late to redirect), at no point does the webRequest API give access to the response itself.
You have to make the decision synchronously, which severely limits your processing options.
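For illustration, a minimal sketch of that blocking listener (Manifest V2, "webRequestBlocking" permission assumed); the verdict must be returned synchronously and only the URL is available:

```js
chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    // Only details.url (and, in other events, request headers) are visible
    // here; the response body never is.
    const blocked = details.url.includes("/ads/"); // placeholder criterion
    return { cancel: blocked };
  },
  { urls: ["<all_urls>"] },
  ["blocking"]
);
```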
What you could do (very much in theory) is always redirect the request to your own "loading" page, meanwhile trying to replicate the request yourself (near-impossible to fully do, also consider side-effects), analyze the response and then substitute the "loading" page with the real one.
It's going to be either very complicated or impossible to do in complex cases. You're basically trying to implement an intercepting proxy in a Chrome extension - it doesn't really provide the full toolset to do so.
I'm attempting to create an app with Node.js (using http.createServer()) which will be a single page application with requests for data via XMLHttpRequest. To do this I need to be able to differentiate between a user navigating to my domain, and AJAX requests and requests generated by the browser for linked resources.
If the request is from the user I always want to return the index.html page which will handle requesting content but if the request is browser generated or AJAX and is for CSS, Javascript or other linked files I want to serve those files. Is there any way to detect this?
Looking at the request headers for the different file types, I saw the Referer header appeared when the request for content was generated by the page. I figured that was the solution I was looking for, but that header is also set when a user clicks on a link to the page, making it useless.
The only other thing which seems to change is the Accept header, which could sort of work but might not be a catch-all solution. Any user request always seems to have text/html as the preferred return type regardless of which URL was entered. I could detect that, but I'm pretty sure AJAX requests for HTML files would also have that Accept header, which would cause problems.
Is there anything I'm missing here (any headers or properties I can look for)?
Edit: I do not need the solution to protect files and I don't care about users bypassing it with their own requests. My intention is not to hide files or make them secure, but rather to keep any data that is requested within the scope of the app.
For example, if a user navigates to http://example.com/images/someimage.jpg they are instead shown the index.html file which can then show the image in a richer context and include all of the links and functionality to go with it.
TL;DR: I need to detect when someone is trying to access the app, to then serve them the index page and have that send them the content they want. I also need to detect when the browser has requested resources (JS, CSS, HTML, images, etc.) needed by the app, to be able to return the actual resource rather than the index file.
In terms of the HTTP protocol, there is NO difference between a user-generated query and a browser-generated query.
Every query is just... a query.
You can make a query from the command line or from a browser, click a link, send some ASCII text via telnet, or ask a proxy to make the query for you; the server's goal is never to identify how the query was initiated by the user.
Consider, for example, a request served from a reverse proxy cache: that query will never reach your server (the response comes from the cache), and the first query made to build that response could have been made by a real user or by a browser.
In terms of security, trying to ensure that the user never requests data by himself cannot be done by detecting that the query is a real human click (and search Google for clickjacking if you want to be afraid). Every query that a browser can make can also be replayed by the user, every single one; you have no way to prevent that.
Some browser plugins even do pre-fetching: detecting links on the page and making the request before you do so yourself (if it's a GET query).
For AJAX, some libraries like jQuery will add an X-Requested-With: XMLHttpRequest header, and most frameworks use this to detect AJAX mode.
But it is more robust to rely on a location policy for that (like making your AJAX queries use a /format/ajax path, which could also be extended in other ways, like /format/json, /format/html, or /format/csv).
Spending time on routing based on such a location policy is certainly more useful.
But one thing can make a difference: POST queries are not idempotent, which means the browser cannot make a POST query without a real user interaction, because a POST query may alter the state of the session or of the server data (JavaScript can still make POST queries; this is just the default behavior of browsers). The browser will never automatically replay or pre-fetch a POST query, so you could make a website where all user interactions are POST queries (via forms, or via some JS that turns link clicks into POST AJAX queries instead). But I'm not sure that's your real goal.
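As a small illustration of both detection approaches, a hypothetical Node server (not from the answer itself):

```js
// Detect "AJAX mode" either via the X-Requested-With header or via a
// /format/ajax location policy, and answer accordingly.
const http = require("http");

http.createServer((req, res) => {
  const viaHeader = req.headers["x-requested-with"] === "XMLHttpRequest";
  const viaPath = req.url.startsWith("/format/ajax");

  if (viaHeader || viaPath) {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ ajax: true }));
  } else {
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end("<!doctype html><p>full page</p>");
  }
}).listen(8080);
```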
Not technically an answer to the question, but I found a simple solution which does what I want: prefix all app-based requests with a subdomain, e.g. http://data.example.com/. It's then really simple to check the Host header for that subdomain: if present, send the resource, else send the index page.
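A minimal sketch of that check in http.createServer (the data. subdomain is just the example above; error handling omitted):

```js
// Route on the Host header: data.example.com gets the requested file,
// anything else gets the index page.
const http = require("http");
const fs = require("fs");

http.createServer((req, res) => {
  const host = req.headers.host || "";
  if (host.startsWith("data.")) {
    fs.createReadStream("." + req.url).pipe(res); // app/resource request
  } else {
    res.writeHead(200, { "Content-Type": "text/html" });
    fs.createReadStream("./index.html").pipe(res); // user navigation
  }
}).listen(8080);
```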
I've got an internal web server that presents an interface to let me view and change some data on that server. When I am in one location I can access the server directly, but from another location I have to create an ssh tunnel to the server. As a result, the URL I put in my browser changes depending on my location: for instance, http://myserver/blah versus http://localhost:8000/blah. It's the same server, just a different host name.
This is inconvenient because occasionally I will forget to save my changes in one location and when I go to the other location the server is suddenly not found. It's also inconvenient because I keep having to reload the page. I would like to just load the page once and have it work in either place. So, I thought I would add some code in my XmlHttpRequest handling to detect if the server is not found and re-issue the request using the alternate server address. The problem is, when this happens I find that my cookies are not sent to the server.
I've got cookies for both localhost and myserver. They are really the same set of values, because it's really the same server, but they are obviously duplicated because the server is accessed via two different host names. If I manually change the host name of the server I have no problem, but obviously this is what I am trying to avoid having to do.
I suspect that perhaps there is some security issue, but after re-reading how cookies work I can't figure out specifically what might be tripping this behavior, or how to fix it.
By the way, the problem is NOT that I am trying to do a cross-site request. I explicitly allow this on the server side by returning the header "Access-Control-Allow-Origin: *", and I have had no problem with this part of the request. With Firebug I can see that the problem is that when the request is re-issued to the new host name, no cookies are sent, even though cookies exist for that host name.
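One likely cause worth checking (a hedged guess, not something confirmed in this post): cross-origin XMLHttpRequests only attach cookies when withCredentials is set, and once credentials are involved the wildcard Access-Control-Allow-Origin is rejected, so the server has to echo the exact Origin and also send Access-Control-Allow-Credentials: true. Roughly:

```js
// Client: ask the browser to include cookies on the cross-origin retry.
const xhr = new XMLHttpRequest();
xhr.open("GET", "http://localhost:8000/blah", true); // the alternate host name
xhr.withCredentials = true; // without this, no cookies go cross-origin
xhr.onload = () => console.log(xhr.responseText);
xhr.send();

// Server (sketch): a wildcard origin is not allowed once credentials are used.
// res.setHeader("Access-Control-Allow-Origin", req.headers.origin);
// res.setHeader("Access-Control-Allow-Credentials", "true");
```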
Suppose:
You have a website http://www.example.com that redirects to a project on Google App Engine (i.e. example.appspot.com);
You want communications with the user to pass over SSL (i.e. https://example.appspot.com); and
You want the domain shown to the user to be *://www.example.com (i.e. not https://example.appspot.com).
Given that Google's Appspot HTTPS support only works for https://example.appspot.com (i.e. you cannot set up https://www.example.com with GAE), I'd like to have an Ajax solution, namely:
http://www.example.com serves HTML and JavaScript over HTTP
Ajax requests go over SSL to https://example.appspot.com
My question/concern is: How does one ensure that the users logged into http://www.example.com (by way of Google's users API) pass their authentication credentials over Ajax to https://example.appspot.com?
This seems to be a violation of the same origin policy (which may or may not be a concern for the Google Users API), so how would one know what user is logged in to example.com for the Ajax requests to example.appspot.com?
Thoughts, comments and input are quite appreciated.
Thank you.
Brian
There are ways to work around same-origin when both sites cooperate, e.g. see this post, but only trial-and-error will reveal which techniques do work for your specific requirements (it may depend on how strictly the user has set security safeguards in their browser, as well as on server-side implementations).
You can try using JSONP to get around that. However, JSONP doesn't have the error recovery that JSON over XHR calls does.
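A rough sketch of the JSONP pattern, assuming the App Engine endpoint supports a callback parameter (the URL here is hypothetical):

```js
// JSONP: inject a <script> tag; the server is expected to respond with
// JavaScript that calls the named callback with the JSON payload.
function handleData(data) {
  console.log("got", data);
}

const script = document.createElement("script");
script.src = "https://example.appspot.com/api?callback=handleData";
document.head.appendChild(script);
```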
Wouldn't it be far simpler to use frames? Serve up a single full-size frameset from yourdomain.com containing content from https://yourapp.appspot.com/.
Note, though, that either solution has the problem that users see an unsecured site, not a secured one.
example.appspot.com does not share any cookies with example.com - it will be impossible for you to identify the user without making them sign in on example.appspot.com as well.
You could, of course, completely ditch Google Authentication on example.appspot.com and implement your own scheme; you could append a signature and the username to the AJAX requests you create and verify that signature in your App Engine app. If the signature is valid, just accept the user that was passed in as the authenticated user and pretend he logged in.
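A minimal sketch of the kind of signing scheme described here, assuming a secret shared between www.example.com and the App Engine app (the function names and the Node.js runtime are illustrative, not prescribed by the answer):

```js
// www.example.com signs the username; the page forwards username + signature
// with its AJAX calls; the App Engine side recomputes the HMAC and compares.
const crypto = require("crypto");

const SHARED_SECRET = "replace-with-a-real-secret"; // known only to both servers

// On www.example.com: produce a token for the logged-in user.
function signUser(username) {
  const signature = crypto
    .createHmac("sha256", SHARED_SECRET)
    .update(username)
    .digest("hex");
  return { username, signature };
}

// On the App Engine side: verify what the AJAX request passed in.
function verifyUser(username, signature) {
  const expected = crypto
    .createHmac("sha256", SHARED_SECRET)
    .update(username)
    .digest("hex");
  if (signature.length !== expected.length) return false;
  return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
}
```

A real implementation would also sign an expiry timestamp or nonce so the token cannot be replayed indefinitely.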