The idea is simple in concept:
I would like to create a userscript that lets me press a button and save something on the page (most commonly, and most problematically, images).
Note: A userscript is a script that is injected client-side (by browser extensions such as Tampermonkey and Greasemonkey) and is used to add functionality to a site.
To do so, I merely need to call the saveAs() function and pass it the data.
The question then becomes: how do I obtain the data?
Most approaches I've seen run into the problem that the resource is not from the same domain as the page the script runs on (I'm not sure exactly how this works).
Now, Tampermonkey (and Greasemonkey) provide a function to deal with this problem specifically, GM_xmlhttpRequest, which can circumvent the need for proper CORS headers.
This, however, sends another request to the server for a file that has already been downloaded.
My question is: Is there a way to not have to send secondary requests to the server?
Here is a chronicle of my efforts:
From what research I've managed to do, you can create a canvas and draw the image on it. However, this "taints" the canvas, preventing it from running functions that extract the data (such as .toBlob() or .toDataURL()).
CORS offers two mechanisms, as far as I understand it: setting the proper HTTP headers, which requires control of the server, and a special attribute that can be put on HTML elements: crossorigin.
I tried adding this attribute after the image had loaded, but it doesn't work: the canvas is still tainted.
Tampermonkey offers several options for when to run the script, so the next idea was to run once the DOM is loaded but before the resources have been fetched. The earliest point where this seems possible is document-end (any earlier and the getElementById call returns null). However, this now produces an error when the page loads the image (before any of my other code runs):
Image from origin '...' has been blocked from loading by Cross-Origin Resource Sharing policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin '...' is therefore not allowed access.
There's also the --disable-web-security flag in Chrome, but I'd rather not go there.
No, there is no way to do it without a new request to the server.
When the first request is made, the image is marked as unsafe by the browser, which then blocks a few features, such as the canvas's toDataURL(), getImageData() and toBlob(), or, in the case of audio files, AudioContext's createMediaElementSource() and AnalyserNode's methods, and probably some others.
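One way to observe this marking from script is to attempt one of the blocked reads and catch the resulting SecurityError. The sketch below assumes a 2D canvas; `isCanvasTainted` is a hypothetical helper name, not a browser API, and it only probes a single pixel.

```javascript
// Hypothetical helper (not a built-in API): reports whether a canvas has been
// tainted by cross-origin image data, by attempting a blocked read.
function isCanvasTainted(canvas) {
  const ctx = canvas.getContext("2d");
  try {
    // Reading even one pixel from a tainted canvas throws a SecurityError.
    ctx.getImageData(0, 0, 1, 1);
    return false;
  } catch (err) {
    if (err && err.name === "SecurityError") return true;
    throw err; // some other problem (e.g. no 2D context): don't hide it
  }
}
```

In a page you would call it right after `ctx.drawImage(img, 0, 0)`. It returns true for any image fetched without CORS approval, however the image was later re-attached to the page.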
There is nothing you can do to circumvent this security: once a resource is marked as unsafe, it stays unsafe.
You then have to make a new request to get the file from the server again, in a safe way this time.
Commonly, you would just set the crossOrigin attribute on the media element before the request is made. This only works if the server has been configured to answer such requests with the appropriate CORS headers.
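For the case where the server does send CORS headers, here is a minimal browser sketch of that usual approach. The key point is ordering: `crossOrigin` must be set before `src`, because assigning `src` starts the request. The function names are hypothetical. The `ImageCtor` parameter exists only so the loader can be exercised outside a browser.

```javascript
// Load an image in CORS mode so a canvas it is drawn on stays untainted.
// This only succeeds if the server replies with a matching
// Access-Control-Allow-Origin header; otherwise onerror fires.
function loadCorsImage(url, ImageCtor = globalThis.Image) {
  return new Promise((resolve, reject) => {
    const img = new ImageCtor();
    img.crossOrigin = "anonymous"; // must precede src: setting src starts the fetch
    img.onload = () => resolve(img);
    img.onerror = () => reject(new Error(`failed to load ${url} in CORS mode`));
    img.src = url;
  });
}

// Draw the image on a fresh canvas and export it as a Blob (browser only).
function imageToBlob(img, type = "image/png") {
  const canvas = document.createElement("canvas");
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  canvas.getContext("2d").drawImage(img, 0, 0);
  // If the canvas were tainted, toBlob would throw a SecurityError,
  // which rejects this promise. A null Blob means encoding failed.
  return new Promise((resolve) => canvas.toBlob(resolve, type));
}
```

Typical use: `imageToBlob(await loadCorsImage(url))`. Setting `crossOrigin` on an image that has already loaded, as tried in the question, does not retroactively change how the existing response was fetched.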
Now in your case, you clearly can't configure the server of every site your script will be used on.
But as you noticed, extensions such as Greasemonkey or Tampermonkey have access to more features than basic JavaScript run from a web page. Among these features is one that lets your browser be less careful about such cross-origin requests, and this is what the GM_xmlhttpRequest method does.
But once again, even extensions don't have enough power to unmark unsafe media.
You must perform a new request, using their less restricted route.
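Here is a minimal Tampermonkey-style sketch of that route. It assumes `GM_xmlhttpRequest` with `responseType: "blob"`, which Tampermonkey supports (as far as I know, Greasemonkey 4 exposes `GM.xmlHttpRequest` instead). It also assumes FileSaver.js's `saveAs()` is loaded separately. The `@match` pattern, the Alt+S shortcut and the helper names are placeholders, not part of any API. It still performs a second request, exactly as described above.

```javascript
// ==UserScript==
// @name         Save hovered image (sketch)
// @match        https://example.com/*
// @grant        GM_xmlhttpRequest
// @connect      *
// ==/UserScript==
// Assumptions: @match is a placeholder; FileSaver.js's saveAs() is loaded
// separately (e.g. via an @require line); Alt+S is an arbitrary shortcut.

// Derive a file name from the last path segment of a URL.
function filenameFromUrl(url, fallback = "download") {
  const last = new URL(url).pathname.split("/").pop();
  return last ? decodeURIComponent(last) : fallback;
}

// Re-download the resource through the extension, which is not bound by the
// page's CORS rules. This is a second request to the server.
function fetchBlobViaGM(url) {
  return new Promise((resolve, reject) => {
    GM_xmlhttpRequest({
      method: "GET",
      url,
      responseType: "blob",
      onload: (res) =>
        res.status >= 200 && res.status < 300
          ? resolve(res.response)
          : reject(new Error(`HTTP ${res.status} for ${url}`)),
      onerror: () => reject(new Error(`request failed for ${url}`)),
    });
  });
}

async function saveImage(img) {
  const url = img.currentSrc || img.src;
  const blob = await fetchBlobViaGM(url);
  saveAs(blob, filenameFromUrl(url));
}

// Only wire up the UI when running in a page.
if (typeof document !== "undefined") {
  let hovered = null;
  document.addEventListener("mouseover", (e) => {
    if (e.target instanceof HTMLImageElement) hovered = e.target;
  });
  document.addEventListener("keydown", (e) => {
    if (e.altKey && e.code === "KeyS" && hovered) {
      e.preventDefault();
      saveImage(hovered).catch(console.error);
    }
  });
}
```

Hover over an image and press Alt+S. `@connect *` makes Tampermonkey ask the user before contacting arbitrary hosts. Listing specific hosts instead is stricter.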
Sometimes websites won't let you load images without a valid referrer. Take this image as an example: it returns a Forbidden error. In a similar question I found an answer that suggests doing something like this:
<img src onerror="fetch('https://i.pximg.net/img-original/img/2020/11/28/06/04/23/85949640_p0.png', {headers: {Referer: 'https://www.pixiv.net/en/'}}).then(r=>r.blob()).then(d=> this.src=window.URL.createObjectURL(d));" />
This, however, still fails to load the image. Checking the code of some extensions that promise to fix this error by installing scripts shows that they just change the referrer, which suggests this should get things working.
Sometimes websites won't let you load images without a proper referrer.
Yes. Plenty of websites do not want to pay to store images and transfer them over the network so that freeloaders can display them on their websites without shouldering the cost.
In a similar question I found an answer that suggests doing something like this
That fails for two reasons.
First, Referer is a forbidden header. Browsers are designed to prevent your JavaScript from lying about where requests are being triggered from.
Second, most sites hosting images don't grant third parties permission, via CORS, to read them with JavaScript. They are even less likely to if they check the referrer to stop freeloading!
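To make the first reason concrete: the Fetch Standard defines a list of forbidden request-header names that page scripts cannot set. Browsers drop them from `fetch()` calls rather than sending them. Below is a simplified sketch of that check. The list reflects my reading of the spec and omits some special cases (such as the method-override headers), so treat it as illustrative rather than authoritative. The function name is hypothetical.

```javascript
// Simplified, illustrative version of the Fetch Standard's
// "forbidden request-header" check (case-insensitive).
const FORBIDDEN_NAMES = new Set([
  "accept-charset", "accept-encoding", "access-control-request-headers",
  "access-control-request-method", "connection", "content-length", "cookie",
  "cookie2", "date", "dnt", "expect", "host", "keep-alive", "origin",
  "referer", "set-cookie", "te", "trailer", "transfer-encoding", "upgrade", "via",
]);

function isForbiddenRequestHeader(name) {
  const lower = name.toLowerCase();
  // Any name starting with "proxy-" or "sec-" is also reserved for the browser.
  return FORBIDDEN_NAMES.has(lower) || lower.startsWith("proxy-") || lower.startsWith("sec-");
}
```

As I understand it, the `headers: {Referer: ...}` option in the snippet quoted above is therefore silently ignored by the browser, so the image server never sees it.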
If a website doesn't want you displaying images they host, you need to respect that.
Pay for your own image hosting instead.
Don't copy the images from the third-party to your own site unless you are sure they aren't protected by copyright (or you have permission).
How can I copy the source code from a website (with JavaScript)? I want to copy the text that shows the temperature on this website: http://www.accuweather.com/
I want to copy only the number that displays the temperature. Is there a way to copy that exact line from the website's source code? I've heard about HTML scraping. If not JavaScript, what would be the simplest way of doing it? I just want to copy the temperature and display it on my webpage.
A simple way to do something like that would be to load the site into a hidden HTML element via AJAX and then search the DOM for the element you want.
jQuery's .load() method can do that directly. It would be something like:
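<div id='temp'></div>
<script>
// Assumes jQuery is loaded. Only elements matching the selector after the space are inserted.
// No data object is passed: an object such as { limit: 1 } would turn the request into a POST.
$('div#temp').load('https://www.accuweather.com/ #popular-locations-ul .large-temp');
</script>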
#popular-locations-ul .large-temp is a CSS selector for the specific elements that contain the temperature.
However, browsers have for some time enforced a security feature called CORS. To load something from another site via AJAX, the target site has to explicitly allow it by sending CORS headers. This particular site doesn't send them, so any attempt to load it via AJAX from another origin will be blocked.
You can only use a command like the one above against a site you control and have configured to send CORS headers, or against a site that already sends them.
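The browser's decision comes down to comparing the page's origin with the `Access-Control-Allow-Origin` response header. Below is a simplified sketch of that comparison. It ignores preflight requests and credentialed requests (for those, `*` is not accepted and `Access-Control-Allow-Credentials` also matters). The function name is hypothetical.

```javascript
// Simplified model of the CORS check a browser performs before letting page
// script read a cross-origin response (non-credentialed requests only).
function corsAllowsRead(pageOrigin, responseHeaders) {
  const allowed = responseHeaders.get("access-control-allow-origin");
  if (allowed === null) return false; // no header: the read is blocked
  const value = allowed.trim();
  return value === "*" || value === pageOrigin;
}
```

A site that sends no such header, like the one in the question, fails the first test, and that is where the AJAX call stops.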
But as people have told you, this is a bad idea from the start because websites are impermanent: things change a lot. So even if you could get a value from some other site the way I described, sooner or later the site would change and your code would break.
I answered because you are just learning and need guidance, not trying to do 'serious work'. Serious work would mean using an API, as people told you.
A web API is a special URL you access (something like https://www.accuweather.com:1234/api/temperature/somecity), normally with some kind of authentication, that responds with exactly the result you need. Services like this typically allow CORS because you are accessing them in a secure and 'official' way.
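For comparison, here is a sketch of what calling such an API typically looks like. The endpoint, the `city` and `key` query parameters, and the `{ "temperature": ... }` response shape are all hypothetical placeholders. A real weather API documents its own URL, authentication and response format.

```javascript
// Hypothetical endpoint and response shape, for illustration only.
const API_BASE = "https://api.example.com/v1/temperature";

async function fetchTemperature(city, apiKey, fetchImpl = fetch) {
  const url = new URL(API_BASE);
  url.searchParams.set("city", city); // assumed parameter name
  url.searchParams.set("key", apiKey); // assumed authentication scheme
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`API request failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.temperature; // assumed field name
}
```

Unlike scraping, this depends only on the documented response format, not on the page's HTML structure, so a site redesign doesn't break it.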
I hope this clarifies things a bit.
The plugin only applies filters to files on my own server; with remote images I get:
Unable to get image data from canvas because the canvas has been tainted by cross-origin data.
How to fix it?
The error you are receiving is part of the browser's built-in security mechanisms that prevent hackers from using remote data to manipulate sites. You can't get around it (and if you could, it would be a major security hole that the browser makers would be very quick to fix).
To confirm this, the same question has already been asked on the Vintage.js website (as a GitHub issue), and the answer given by the Vintage.js author was that it isn't possible to load remote images.
Sorry to disappoint you.
The workaround given was to use a proxy loader on your own server to load the images, so that they reach the page from your own server rather than as remote images.
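Here is a minimal Node.js sketch of such a proxy, assuming Node 18+ (global `fetch`, `Response`). The allowlisted host and the `/?url=` query format are hypothetical. The allowlist matters: an open proxy that fetches any URL on request can be abused. Because the bytes are re-served by a server you control, you choose the headers. It sends `Access-Control-Allow-Origin: *`, which matters when the proxy runs on a different origin from the page; the page then loads the image with `crossOrigin = "anonymous"`.

```javascript
const http = require("node:http");

// Hypothetical allowlist: only proxy hosts you are permitted to use.
const ALLOWED_HOSTS = new Set(["images.example.com"]);

// Fetch a remote image and return it with headers your own server controls.
async function proxyImage(target, fetchImpl = fetch) {
  let url;
  try {
    url = new URL(target);
  } catch {
    return { status: 400, headers: {}, body: Buffer.from("invalid url") };
  }
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    return { status: 403, headers: {}, body: Buffer.from("host not allowed") };
  }
  let upstream;
  try {
    upstream = await fetchImpl(url);
  } catch {
    return { status: 502, headers: {}, body: Buffer.from("upstream unreachable") };
  }
  if (!upstream.ok) {
    return { status: 502, headers: {}, body: Buffer.from(`upstream HTTP ${upstream.status}`) };
  }
  return {
    status: 200,
    headers: {
      "Content-Type": upstream.headers.get("content-type") ?? "application/octet-stream",
      "Access-Control-Allow-Origin": "*",
    },
    body: Buffer.from(await upstream.arrayBuffer()),
  };
}

// Wire the handler into an HTTP server: GET /?url=<encoded image URL>
function createProxyServer() {
  return http.createServer(async (req, res) => {
    const target = new URL(req.url, "http://localhost").searchParams.get("url") ?? "";
    const { status, headers, body } = await proxyImage(target);
    res.writeHead(status, headers);
    res.end(body);
  });
}
```

Start it with `createProxyServer().listen(8080)` and point the image's `src` at `http://localhost:8080/?url=` followed by the URL-encoded remote address.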
In some cases, it may in fact be possible: the remote server can be configured to allow it with CORS headers (the image then also has to be requested with the crossorigin attribute). However, this configuration has to be done on the remote server, so if you don't have access to it, the problem remains.
Every time I refresh a site and view its page source, the JavaScript src (i.e. js.js?version=1364903356) has a different version number.
My question is: what is the meaning of this number? And if I put plain js.js on every page, the site doesn't work.
The version is generally appended for caching purposes, or rather, for invalidating the cache: changing the version number changes the requested URL, so the file is seen as a new resource and downloaded afresh.
The number is probably meaningless. It is almost certainly just being appended to the URL so that the URL changes so the JS won't be fetched from the cache.
It's just there to avoid caching, so the resource is requested anew each time you visit the same content. If static content caching is enabled in IIS, the server answers a repeat request for an unchanged resource with HTTP 304 Not Modified.
You can see this in Chrome: open the developer tools (F12) and go to the Network tab. You will see request details like this:
Request Method:GET
Status Code:304 Not Modified
IIS (or any web server) determines whether the content has changed. If the content is the same as the copy in the browser's cache, the server answers with 304 and the browser reuses its cached copy instead of downloading the file again.
By appending a version number, the URL of the resource changes, so the browser issues a new GET request for it.
This is a common technique used to prevent or manage caching of javascript and other files that the browser would normally cache.
If the version number always changes, then it means that the page in question is preventing your browser from caching the file at all; every request will load a new copy of the file regardless of whether it's changed or not.
This is poor practice, and likely due to a misconfiguration of the site in question.
More commonly, the version number would remain static, but could be triggered to change by the site itself. This would mean that for most requests the browser's caching would be in play, but that the site owner has control over whether to refresh the cache, for example when he updates the script file.
Without this technique, a browser that has already cached the old version of the file might not know that the file has been updated, and may not fetch the updated version. This could result in version conflicts between script files on the page.
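Here is a minimal Node.js sketch contrasting the two variants, assuming the version string is generated on the server when the page is rendered. Deriving the version from a hash of the file's contents is a common choice, swapped in here for illustration; the site in the question may do something else. A content hash keeps the URL stable until the file actually changes. A timestamp changes on every request and defeats caching. The function names are hypothetical.

```javascript
const crypto = require("node:crypto");

// Version derived from the file's contents: stable until the file changes,
// so browsers can keep using their cached copy between releases.
function contentVersion(contents) {
  return crypto.createHash("sha256").update(contents).digest("hex").slice(0, 10);
}

function versionedSrc(path, contents) {
  return `${path}?version=${contentVersion(contents)}`;
}

// The pattern seen in the question: a new value on every page load,
// so the browser never gets a cache hit.
function timestampedSrc(path) {
  return `${path}?version=${Math.floor(Date.now() / 1000)}`;
}
```

With `versionedSrc`, editing `js.js` changes its URL on the next deploy, and the browser fetches the new file. Until then, every page reuses the cached copy.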
There are, in fact, more technically correct ways of doing this that don't involve adding random values to the end of your URLs. The HTTP standard defines conditional requests: the browser requests the URL and tells the server which version it has cached (for example with an If-None-Match or If-Modified-Since header). The server can then respond with "304 Not Modified", and the browser uses its cached version. This ought to mean that the technique used in the question isn't necessary.
However, the technique is necessary in some cases because some browsers and/or web server configurations may not work correctly with the standard method, and the browser may still end up using the cached version incorrectly.
This technique can therefore be seen as a work-around for that.
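The standard mechanism can be sketched in Node.js with an `ETag` validator. The server labels each version of the file. The browser echoes the label back in `If-None-Match`, and the server answers `304 Not Modified` while it still matches. This is a minimal sketch, not production code. The SHA-1 hash and the `Cache-Control: no-cache` policy (which asks browsers to revalidate before each reuse) are assumptions for illustration.

```javascript
const http = require("node:http");
const crypto = require("node:crypto");

// Strong ETag computed from the response body.
function etagFor(body) {
  return `"${crypto.createHash("sha1").update(body).digest("hex")}"`;
}

// If-None-Match uses weak comparison: ignore any W/ prefix on either side.
// The simple comma split is fine for hex tags like the ones produced above.
function isNotModified(ifNoneMatch, etag) {
  if (!ifNoneMatch) return false;
  const strip = (tag) => tag.trim().replace(/^W\//, "");
  return ifNoneMatch.split(",").some((tag) => tag.trim() === "*" || strip(tag) === strip(etag));
}

function createScriptServer(body) {
  const etag = etagFor(body);
  return http.createServer((req, res) => {
    if (isNotModified(req.headers["if-none-match"], etag)) {
      res.writeHead(304, { ETag: etag, "Cache-Control": "no-cache" });
      res.end();
      return;
    }
    res.writeHead(200, {
      "Content-Type": "text/javascript",
      ETag: etag,
      "Cache-Control": "no-cache",
    });
    res.end(body);
  });
}
```

After the first 200 response, the browser's next request carries `If-None-Match` with the stored tag and receives a body-less 304 until the script's contents change.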
We have a site with iframes pointing to dynamic URLs (supplied by user input).
In case of a 404/500 or any other error, we want to replace the iframe source with a different, user-friendly URL.
For this we can use the onerror event to detect when the dynamic websites have problems (and then, in case of a problem, replace the iframe URL).
This also works for cross-domain URLs. However, the dynamic URL might be malicious, which raises a security issue: the malicious code would execute in the same frame, on the same domain as our website.
Is this assumption correct?
Is there any solution for this?
Any other suggestions?
Thanks,
Tal
We have a site with iframes pointing to dynamic URLs (supplied by user input). In case of a 404/500 or any other error, we want to replace the iframe source with a different, user-friendly URL.
So it sounds like you are making a sort of "browser in a web page."
For this we can use the onerror event to detect when the dynamic websites have problems (and then, in case of a problem, replace the iframe URL).
Yes, except that not many things have onerror events. I assume you are aware of this from your comments on other answers. If I understand you correctly, you're talking about using a dummy script element to load the URL first (as a script, even though it's not really a script), and determining whether the URL is valid using the onload/onerror handlers of the script element (onerror will not fire on a script error, only on a network error).
This also works for cross-domain URLs. However, the dynamic URL might be malicious, which raises a security issue: the malicious code would execute in the same frame, on the same domain as our website.
Is this assumption correct?
Your assumption is correct. If the URL actually does contain a script, it will execute in the user's browser in the same domain as your site.
Is there any solution for this?
A simple workaround might be to do something like what jsfiddle.net does: have a separate subdomain act as a "firewall" between the third-party content and your real domain.
Any other suggestions?
The script preload hack is really just that, a hack. It misappropriates the script tag and makes needless requests. I would probably look into using XHR to fire off a HEAD request instead, or doing some light server-side proxying.
Yes, if you use a <script> tag to embed a remote JS file, you have a security problem as the code is going to be executed in the context of your page.
The only workaround that comes to mind is making a server-side request to the resource and parsing the response headers. However, this may behave differently from a client-side request: the call comes from the server, so it will have a different IP address, different cookies, etc.
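Here is a minimal Node.js sketch of that server-side check, assuming Node 18+ (global `fetch`). It sends a HEAD request and reports the status. A browser-side request to another origin would only reveal the status if that origin sends CORS headers, which is why the check runs on the server here. The function name is hypothetical. Some servers reject HEAD (e.g. with 405), so a GET fallback may be needed. A real deployment should also refuse URLs pointing at internal or private addresses.

```javascript
// Check whether a user-supplied URL currently answers successfully.
async function checkUrl(target, fetchImpl = fetch) {
  let url;
  try {
    url = new URL(target);
  } catch {
    return { ok: false, reason: "invalid URL" };
  }
  // Only allow web URLs; e.g. javascript: or file: URLs are rejected outright.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: `unsupported scheme ${url.protocol}` };
  }
  try {
    const res = await fetchImpl(url, { method: "HEAD", redirect: "follow" });
    return { ok: res.ok, status: res.status };
  } catch (err) {
    return { ok: false, reason: `network error: ${err.message}` };
  }
}
```

The page can then ask its own server whether a URL is usable before putting it in the iframe. No third-party code ever runs as a script on your domain.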
If the user can only specify the frame’s URL, then any scripting in the frame’s document runs in the context of the frame’s document, not in the context of the parent document the frame is embedded in.
Whether a script running inside the frame can access the parent’s document (i.e. your document) depends on the origins of both documents: only if they are equal are the two documents said to be same-origin, and only in that case can one document access the other.
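The origin comparison can be illustrated with the standard `URL` API. An origin is the scheme, host and port together, so a separate subdomain (as in the jsfiddle-style workaround above) counts as a different origin. The helper name is hypothetical.

```javascript
// Two URLs are same-origin when scheme, host and port all match.
// URL#origin normalizes default ports (e.g. :443 for https).
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}
```

For example, `https://example.com/a` and `https://example.com:443/b` are same-origin, while `https://sandbox.example.com/` is not same-origin with `https://example.com/`.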