PHP $_SERVER['HTTP_REFERER'] vs. Javascript document.referrer? - javascript

Ultimately I need to know what domain is hosting one of my JavaScript files. I have read and experienced first-hand that $_SERVER['HTTP_REFERER'] is unreliable. One of the first three browser/computer combos I tested didn't send the HTTP_REFERER header, and I know that it can be spoofed. I implemented a different solution using two JavaScript methods.
document.referrer
AND
window.location.href
I use the former to get the URL of the window where someone clicked on one of my links. I use the latter to see which domain my JavaScript file is included in. I have tested it a little so far and it is grabbing the URLs from the browser with no hiccups. My question is: are the two JavaScript methods reliable? Will they return the URL from the browser every time, or are there caveats like those with $_SERVER['HTTP_REFERER'] that I haven't run into yet?
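For reference, a minimal sketch of the two reads described above (the fallback string is illustrative, not from the question):

    // Runs inside the included script file.
    // document.referrer: URL of the page the visitor came from (may be empty).
    var referrer = document.referrer || '(referrer not available)';

    // window.location.href: URL of the page that included this script,
    // i.e. the site embedding the JS file.
    var embeddingPage = window.location.href;
    var embeddingDomain = window.location.hostname;

    // Always handle the empty-referrer case: browsers may withhold it.
    console.log('Came from:', referrer);
    console.log('Embedded on:', embeddingDomain, '(' + embeddingPage + ')');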

You should always assume that any information about the referrer URI is going to be unavailable (or perhaps even unreliable), due to browsers or users wanting to conceal this information because of privacy issues.
In general, you won't have the referrer information when linking from an HTTPS to an HTTP domain. Check this question for more info on this:
https://webmasters.stackexchange.com/questions/47405/how-can-i-pass-referrer-header-from-my-https-domain-to-http-domains
About using window.location.href, I'd say it's reliable in practice, but only because it is in the client's interest to supply the correct information, so that applications depending on it behave as expected.
Just keep in mind that this is still the client side sending you information, so it'll always be up to the browser to send you something correct. You have no control over that; you can only trust that it will work according to what the standard specifies. The client might still decide to conceal it or fake it for any reason.
For example, in some situations, such as third-party included scripts (again for privacy reasons), the browser might opt to just leave it blank.

Related

HTTP request to a website to get the content of a specific HTML element

I am building a site to help students schedule their university courses. It will include things like days, times, professor, etc. I want to fetch the "rating" of professors off www.ratemyprofessors.com and have it show on my site. For example, at https://www.ratemyprofessors.com/ShowRatings.jsp?tid=1230754 you can see Michael has a rating of 4.6. I want to request that data and have it show on the site. I can't scrape it beforehand because their ratings change and I want to show the current rating. Am I able to do this with an XMLHttpRequest? How would I do that? I'm hoping to do it in JavaScript.
The browser won't let HTTP requests towards third-party websites leave your webpage unless the target site allows it. This is governed by CORS. See https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS. While you may be lucky if that site allows it (or doesn't disallow it), that may change in the future, leaving you in a bind (a malfunctioning feature).
Also, what you're planning to do is called web scraping, and typically it isn't favored by webmasters, so you might eventually get blocked or stumble upon a change in content markup, again leaving you in the same bind.
I would ask the owner of that site for permission and, perhaps, API access.
Otherwise, your option #1 is to try making that HTTP request from a browser-level script (yes, you can use Ajax, XMLHttpRequest, the newer fetch API, or a third-party script), which will work only if CORS isn't a problem.
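A sketch of option #1 using the fetch API (the selector is hypothetical, and whether the request succeeds at all depends entirely on the remote site's CORS headers):

    // Option #1: browser-level request. This only succeeds if the remote
    // site sends permissive CORS headers; otherwise the browser blocks it.
    fetch('https://www.ratemyprofessors.com/ShowRatings.jsp?tid=1230754')
      .then(function (response) {
        if (!response.ok) throw new Error('HTTP ' + response.status);
        return response.text();
      })
      .then(function (html) {
        // Parse the returned markup. The selector below is hypothetical
        // and will break whenever the remote site changes its markup.
        var doc = new DOMParser().parseFromString(html, 'text/html');
        var rating = doc.querySelector('.rating');
        console.log(rating ? rating.textContent : 'rating not found');
      })
      .catch(function (err) {
        // Treat the rating as a nice-to-have: degrade gracefully.
        console.warn('Could not fetch rating:', err);
      });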
Your option #2 is to make the same request from the server (so, Ajax to your server app, which scrapes the remote site); this is the workaround for the potential CORS problem. Again, CORS is an obstacle only at the browser level, because browsers are coded to intercept cross-origin requests to minimize potential harm to the user's data. However, this option is subject to eventually having your server blocked from accessing the remote site, which the site's owner could do by simply configuring it to refuse connections from IP addresses they detect as belonging to your site. Pretty cool, huh?
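Option #2 might look like this on the server; a minimal Node/Express sketch (the stack and route name are illustrative, not from the question, and Node 18+ is assumed for the built-in fetch):

    // server.js: a tiny proxy so the browser only talks to your own origin.
    const express = require('express');
    const app = express();

    app.get('/api/professor-rating', async (req, res) => {
      try {
        // The server fetches the remote page; CORS does not apply here.
        const response = await fetch(
          'https://www.ratemyprofessors.com/ShowRatings.jsp?tid=' +
            encodeURIComponent(req.query.tid)
        );
        const html = await response.text();
        // Extraction logic goes here; it is the part that breaks when the
        // remote markup changes, so keep it isolated and defensive.
        res.send({ html });
      } catch (err) {
        // Degrade gracefully: the rating is a nice-to-have.
        res.status(502).send({ error: 'upstream fetch failed' });
      }
    });

    app.listen(3000);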
Both of these options are further subject to the problem of dealing with content changes, which would be in the hands of your post-request script, whether executing in the browser (option #1) or on the server (option #2); that could mean ongoing maintenance. Either way, craft it so that the third-party data is treated as a nice-to-have (that is, don't crash your page when fetching that other data fails).
Edit: I would have to try this to be certain, but it's something to think about: you could embed a hidden iframe in your page, targeting that remote webpage (as in your example), then parse the iframe's content once it's available. Note that this endeavor is not trivial AT ALL; it would cost quite a chunk of development time (and it isn't a task a beginner could reasonably complete, at least not quickly), and, again, I am not 100% certain that it would even be possible, as the iframe-hosting webpage may not have access to the iframe's content when it's served by a third-party website. So, this would potentially be option #3, an in-browser solution (so, lots of JavaScript), but not susceptible to CORS blocking. Phew, a lot of words, I know, but they do make sense, if you can believe me.
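For what it's worth, a quick sketch of why the iframe route will most likely fail for a third-party page (browsers block cross-origin frame access, which is the doubt voiced above):

    var iframe = document.createElement('iframe');
    iframe.style.display = 'none';
    iframe.src = 'https://www.ratemyprofessors.com/ShowRatings.jsp?tid=1230754';
    iframe.onload = function () {
      try {
        // For a cross-origin frame, contentDocument is null and touching
        // contentWindow.document throws a SecurityError.
        var doc = iframe.contentDocument || iframe.contentWindow.document;
        console.log(doc.title);
      } catch (err) {
        console.warn('Cross-origin iframe content is not accessible:', err);
      }
    };
    document.body.appendChild(iframe);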
Hope that helps decide. Good luck.

How to get the hostname from JavaScript so that the user cannot spoof it?

We can get the hostname from JavaScript using window.location.hostname, but another user can download the JS and hard-code a valid hostname as a constant. I'm working on something for which I need to know where the script is hosted, and the user of that JS must not be able to spoof it.
One more solution I thought of is using request.headers.origin, but that can also be spoofed.
Is there any solution by which I can get the hostname where the JS is hosted, so that I can restrict unauthorized hosting of the JS?
I tried googling but couldn't find any solution. The solution most people suggest is to obfuscate the JS code.
Can we do better?
You can't.
Everything that happens in the browser is entirely under the control of the user.
If you don't trust the user, then you can't trust any information you get from the browser.
You should find a source for the information that doesn't depend so heavily on the browser (e.g. generating the information server side and then associating it with information from the browser via the use of a session).
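A minimal sketch of that last suggestion, using Node/Express with express-session (illustrative; any server-side session mechanism works the same way):

    const express = require('express');
    const session = require('express-session');
    const app = express();

    app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));

    app.get('/', (req, res) => {
      // The authoritative facts live server-side, keyed by the session ID.
      // The browser only holds an opaque session cookie it cannot usefully forge.
      req.session.issuedAt = req.session.issuedAt || Date.now();
      res.send('session started');
    });

    app.get('/whoami', (req, res) => {
      // Derive identity/host information server-side; never trust values
      // the client-side script reports about itself.
      res.json({ issuedAt: req.session.issuedAt });
    });

    app.listen(3000);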

Very Confused (And Worried) about security with JSON and Javascript

I've been attempting to do some research on this topic for a while, and can even cite the following Stack Overflow threads:
Javascript Hijacking - When and How Much Should I Worry
JSON Security Best Practices
But my basic problem is this.
When I am building my web applications, I use tools like Fiddler, Chrome Developer Tools, Firebug, etc. I change things on the fly to test things. I can even use Fiddler to change the data that gets sent to the server.
What stops someone else from just opening up my webpage and doing this too? All of the jQuery validation in the world is useless if a user can just hit F12 and open up Chrome Developer tools, and change the data being sent over the wire, right?
I'm still relatively new in this field and this just has me very concerned as I see "Open" Protocols become more and more ubiquitous. I don't understand SSL yet (which is on my list of things to begin researching), so perhaps that is the answer and I just haven't dug deep enough. But the level of flexibility I have over manipulating my pages seems very extreme - which has me very concerned about what someone malicious could do.
Your concerns are indeed justified. This is why you should always validate everything on the server. Client-side validation should only be used for UX.
JavaScript's security is, in a nutshell, based around a trusted server. If you always trust the code the server sends you, it should be safe. The same-origin policy also means a page on a third-party domain (like an ad supplier's) can't simply fetch data from your domain.
If the server also sends you user generated content, and in particular user generated code, then you have a potential security problem. This is what XSS attacks focus on (running a malicious script in a trusted environment).
Client-side validation should focus on ease of use: make it easy to correct mistakes, or guide the user so that no mistakes are made. The server should always do its own validation, of a stricter nature.
Validation should always happen server-side; client-side validation is only valuable for making the experience more convenient for the user. You can never trust a user not to manipulate the data on their end. (JavaScript is client-side.)
Next, if you want to secure your service so that only user1 can edit user1's profile, you'll need to sign your JSON requests with OAuth (or a similar protocol).
Yeah, nothing can stop anybody from interfering with the data that is sent from the browser to your server, and that's the reason you shouldn't trust it.
Always check the data from the user for authenticity and validity.
You can also inspect and tamper with the data that big sites like Google and Microsoft send back, and you might get an idea.
You have to assume that the client is malicious; using SSL does not prevent this at all. All data validation and authorization checking needs to be done server-side.
JavaScript isn't going to be your only line of defense against hackers; in fact, it shouldn't be used for security at all. Client-side code can be used to verify form input so that legitimate users get faster response times and the page runs smoothly. Anyone trying to hack your page isn't going to care whether your page works or not. No matter what, everything coming into your server should be verified and never assumed safe.
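To make the "validate on the server" advice above concrete, here is a minimal Node/Express sketch (the route and field names are illustrative):

    const express = require('express');
    const app = express();
    app.use(express.json());

    app.post('/profile', (req, res) => {
      // Re-validate everything, even if the client-side form already did:
      // anything arriving here may have been altered with Fiddler/DevTools.
      const { email, age } = req.body; // illustrative fields
      const errors = [];
      if (typeof email !== 'string' || !/^[^@\s]+@[^@\s]+$/.test(email)) {
        errors.push('invalid email');
      }
      if (!Number.isInteger(age) || age < 0 || age > 150) {
        errors.push('invalid age');
      }
      if (errors.length) return res.status(400).json({ errors });

      // Also do authorization here: is this user allowed to edit this profile?
      res.json({ ok: true });
    });

    app.listen(3000);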

Best practice to use the same AJAX in multiple browser windows?

I am developing a website that has some sort of real-time updates.
The website is generated with a JavaScript variable holding the current ID of the dataset.
Then, at an interval of a few seconds, an AJAX call is made passing on the current ID, and if there's something new the server returns it along with the latest ID, which is then updated in the JavaScript.
Very simple, but here comes the problem:
If the user opens the same page multiple times, every page makes these AJAX requests, which produces heavy server load.
Now I thought about the following approach:
The website is loaded with a JavaScript variable holding the current timestamp and the ID of the current dataset.
My desired refresh interval is for example 3 seconds.
In the website, an interval counter counts up every second, and every time the timestamp reaches a state where (timestamp % 3 === 0) returns true, the content is updated.
The link looks like http://www.example.com/refresh.php?my-revision=123&timestamp=123456
Now this should ensure that every browser window calls the same URL.
Then I can turn on browser level caching.
But I don't really like this solution.
I would prefer adding another layer of data sharing in a Cookie.
This shouldn't be much of a problem; I can just store every request in a cookie named by timestamp and data revision, with a TTL of 10 seconds or so, and check for its existence first.
BUT
The pages will make the request at the same time. So the whole logic of browser caching and cookies might not work, because the requests occur simultaneously and not one after another.
So I thought about limiting concurrent connections to 1 on the server side. But then I would need at least an extra vhost, because I really don't want to do that for the whole page.
And this runs into problems concerning cross-site policies!
Of course there are some super-complicated load-balancing solutions / server-side solutions bound to request URI and IP address or something, but that's all extreme overkill!
It must be a common problem! Just think of Facebook chat. I really don't think they do all the requests in every window you have open...
Any ideas? I'm really stuck with this one!
Maybe I can do some inter-window JavaScript communication? Shouldn't be a problem if it's all on the same domain?
One thing I can do, of course, is server-side caching, which avoids at least DB connections and intensive calculations... but it is still a request, which I would like to avoid.
You might want to check out Comet and Orbited.
This is best solved with server push technology.
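As a concrete, hedged sketch of server push using Server-Sent Events, which browsers now support natively (the endpoint name and payload shape are made up):

    // Instead of every window polling, the server pushes updates to each one.
    var source = new EventSource('/updates'); // hypothetical endpoint
    source.onmessage = function (event) {
      var data = JSON.parse(event.data); // e.g., { id: 124, payload: ... }
      // Update the page with the new revision as appropriate.
      console.log('new revision:', data.id);
    };
    source.onerror = function () {
      // EventSource reconnects automatically; log for visibility.
      console.warn('SSE connection problem');
    };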
The first thing is: Do server-side caching anyway, using Memcache or Redis or whatever. So you're defended against three machines doing the requests. But you knew that.
I think you're onto the right thing with cookies, frankly (but see below for a more modern option) — they are shared by all window instances, easily queried, etc. Your polling logic could look something like this (a code sketch follows the list):
On the polling interval:
- Look at the content cookie: is it fresher than what you have? If so, use it and you're done.
- Look at the status cookie: is someone else actively polling (e.g., the cookie is set and not stale)? If yes, come back in a second.
- Set the status cookie: "I'm actively polling as of (now)."
- Do the request.
On the response:
- If the new data is newer than the (possibly updated) contents of the content cookie, set the content cookie to the new data.
- Clear the status cookie if you're the one who set it.
Basically, the status cookie acts as a semaphore indicating to all window instances that someone, somewhere is on the job of updating the content.
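A rough sketch of that logic (cookie names, the 3-second tick, and the staleness threshold are illustrative; render() and fetchUpdate() stand in for your own display and Ajax code):

    function readCookie(name) {
      var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
      return match ? decodeURIComponent(match[1]) : null;
    }
    function writeCookie(name, value) {
      document.cookie = name + '=' + encodeURIComponent(value) + '; path=/';
    }
    function render(data) { /* update the DOM with the new data */ }
    function fetchUpdate(sinceRev, cb) { /* your Ajax call; calls cb(rev, data) */ }

    var lastRevision = 0;

    setInterval(function () {
      var content = readCookie('content'); // JSON: { rev: ..., data: ... }
      if (content) {
        var parsed = JSON.parse(content);
        if (parsed.rev > lastRevision) { // another window already fetched it
          lastRevision = parsed.rev;
          render(parsed.data);
          return;
        }
      }

      var status = readCookie('pollStatus'); // timestamp of the active poller
      if (status && Date.now() - Number(status) < 2000) {
        return; // someone else is actively polling; check again next tick
      }

      writeCookie('pollStatus', String(Date.now())); // claim the semaphore
      fetchUpdate(lastRevision, function (rev, data) {
        var current = readCookie('content');
        if (!current || JSON.parse(current).rev < rev) {
          writeCookie('content', JSON.stringify({ rev: rev, data: data }));
        }
        writeCookie('pollStatus', '0'); // release the semaphore
      });
    }, 3000);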
Your content cookie might contain the content directly, or if your content is large-ish and you're worried about running into limits, you could have each page have a hidden iframe, each with a unique name, and have your Ajax update write the output to the iframe. The content cookie would publish the name of the most up-to-date iframe, and other windows seeing that there's fresh content could use window.open to get at that iframe (since window.open doesn't open a window if you use the name of an existing one).
Be alert to race conditions. Although JavaScript within any given page is single-threaded (barring the explicit use of web workers), you can't expect that JavaScript in the other windows is necessarily running on the same thread (it is on some browsers, not on others — heck, on Chrome it's not even the same process). I also don't know that there's any guarantee of atomicity in writing cookies, so you'll want to be vigilant.
Now, HTML5 defines some useful inter-document communication mechanisms, and so you might consider looking to see if those exist and using them before falling back on this cookie approach, since they'll work in modern browsers today but not in older browsers you're probably having to deal with right now. Still, on the browsers that support it, great!
Web storage might also be an option worth investigating as an aspect of the above, though at the time of writing it was a fairly new thing and support varied across browsers.
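If web storage is available, the cross-window signalling could look like this (a sketch; the key name is arbitrary, and the 'storage' event fires in all other same-origin windows when a value changes):

    // One window does the polling; every other window hears about the result.
    window.addEventListener('storage', function (event) {
      if (event.key === 'content') {
        var parsed = JSON.parse(event.newValue);
        console.log('another window fetched revision', parsed.rev);
        // render(parsed.data);
      }
    });

    // The window that actually polled publishes the result:
    function publish(rev, data) {
      localStorage.setItem('content', JSON.stringify({ rev: rev, data: data }));
    }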

How can I use JavaScript to identify a client?

I have a problem where I cannot identify visitors to my intranet page because their browsers are configured to use a proxy, even for the local intranet. I always see the proxy's IP and no other details about the client. The SOE that my company uses has the proxy set up already for Firefox and Internet Explorer, and I cannot ask them to reconfigure their browsers because that is fairly complicated. I have tried using PHP's $_SERVER['REMOTE_ADDR'] and also one called $HTTP_SERVER_VARS['HTTP_X_FORWARD_FOR']. In fact, I wrote a page that lists both the $_SERVER and $HTTP_SERVER_VARS arrays, and there was nothing informative about the actual client connecting. This is why I think it needs to be done on the client's side.
I'm not looking for a secure solution because it is only a simple page, so I was hoping that I could use Javascript or something similar to find something revealing about the client and send it to my intranet page as a GET variable. It's basically for collating statistics. It is no use telling me most of the visitors are a proxy! :)
I also want to avoid having users log in if possible.
You could use a cookie with a random, unique ID that's set upon the first entrance, and then used for identification. Could be done either in JavaScript or in PHP.
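A minimal sketch of that idea in JavaScript (the cookie name and the /stats.php endpoint are made up; the ID could equally be generated and set server-side in PHP):

    function getClientId() {
      var match = document.cookie.match(/(?:^|; )clientId=([^;]*)/);
      if (match) return match[1];

      // No ID yet: generate a random one and persist it for a year.
      var id = Date.now().toString(36) + '-' +
               Math.random().toString(36).slice(2, 10);
      document.cookie = 'clientId=' + id +
        '; max-age=' + 60 * 60 * 24 * 365 + '; path=/';
      return id;
    }

    // Append it to requests so the stats page can tell visitors apart:
    var img = new Image();
    img.src = '/stats.php?client=' + encodeURIComponent(getClientId());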
I am pretty sure there's no universal way to do this; otherwise the whole concept of anonymous proxies would go down the drain :)
My advice would be to ask your IT department to configure the proxy to populate the X-Forwarded-For header (or some other identifying header).

Categories

Resources