Figuring out source of programmatic content [duplicate]

My page makes a strange network request for JavaScript files that I removed from every HTML file earlier. The cache is cleared, and I cannot find a single reference to them in the HTML source or the JavaScript files. To fix this, and also out of general curiosity, I would like to know whether there is a simple way to find out where a request was triggered, preferably using Chrome DevTools.
Update:
Thanks to jaredwilli I found the Initiator column under the Network tab. However, it only shows Other. What I would like to know is the (HTML or JavaScript) file where those requests were triggered.

On the Network panel, you can determine what the initiator of a request was by viewing the Initiator column. It gives you the file, line number and type of resource it was, either Script or something else.

Related

<!DOCTYPE html> in JS file

I am referencing two JS files in my map.HTML header. Chrome console gives
Uncaught SyntaxError: Unexpected token <
Here is why I'm confused. When I click on the Chrome Console error message, it takes me to the Sources tab. Under Sources, it puts me on the relevant JS tab and shows code starting with <!DOCTYPE html>, then continues with a ton of code that is not in my map.html file or JS file. Presumably this is generated when the JS is read?
The two JS files are:
https://github.com/socib/Leaflet.TimeDimension/tree/master/dist
https://github.com/calvinmetcalf/leaflet-ajax/tree/gh-pages/dist
I am opening map.HTML locally with Chrome using a simple python server using a batch file (python.exe -m http.server).
I am sure this is very basic, but it's confusing me because I reference plenty of other JS files both online and locally and I don't get this error.
Thanks

If you try https://github.com/socib/Leaflet.TimeDimension/blob/master/dist/leaflet.timedimension.min.js in your browser, you will get an HTML page.
If you try https://raw.githubusercontent.com/socib/Leaflet.TimeDimension/master/dist/leaflet.timedimension.min.js you will get what seems to be a JavaScript source file. But your browser may still refuse to run it as a script, because GitHub serves raw files with a text/plain Content-Type header rather than a JavaScript one.
You can use third party sites which will serve files with appropriate content-type header, (example: https://rawgit.com/socib/Leaflet.TimeDimension/master/dist/leaflet.timedimension.min.js ).

In the future, try to do more research before posting here, otherwise a lot of people are going to downvote your questions, and even insult you.
A simple Google search for the differences between html and javascript may be a good start. The first step would be to remove those doctype lines. They mean nothing in Javascript. Just like the word granola has no meaning in Japanese. Different languages.
However, looking at your code, I don't see any DOCTYPE text in your javascript. In order to really debug this, you're going to want to open your webpage (html) in a browser (I recommend Chrome) and press F12 to open the developer tools. Go to the console and trace the error back through all of the files to find the origin.
In order to check and make sure that you're trying to pull javascript files and not html, take all the src urls you're using and paste them in a browser. If you land on a webpage, that url will serve up html, not javascript like you want. If you get a wall of text, you're probably referencing it correctly.
Correct: https://api.mapbox.com/mapbox.js/v3.0.1/mapbox.js
Incorrect: https://github.com/socib/Leaflet.TimeDimension/blob/master/dist/leaflet.timedimension.min.js
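The same check can be roughed out in code. This is only a sketch: `looksLikeHtml` is a hypothetical helper name, and it assumes you already have the response body as a string (for example, copied from the Network tab). A body that starts with an HTML doctype or an <html> tag is exactly what makes the JavaScript parser complain with "Unexpected token <".

```javascript
// Hypothetical helper: guess whether a response body is an HTML page
// rather than a script, by looking at how it starts.
function looksLikeHtml(body) {
  const start = body.trimStart().slice(0, 100).toLowerCase();
  return start.startsWith("<!doctype html") || start.startsWith("<html");
}

console.log(looksLikeHtml("<!DOCTYPE html>\n<html><head></head></html>"));
console.log(looksLikeHtml("(function () { /* library code */ })();"));
```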
Hopefully this helps before this question gets deleted or put on hold. Also notice that people are going to downvote me for actually answering and trying to help.

You can't directly reference code stored in a github repo like you're trying to.
The URLs you're listing aren't javascript files; they're github webpages. That's why they contain HTML doctypes and code you don't recognize -- it's the github website code.
You can get the URL for the actual javascript files by clicking the "raw" button at the top of any of those pages (after selecting a specific individual file -- the urls you gave were for directories, not individual files.) For example:
This is an HTML file: https://github.com/socib/Leaflet.TimeDimension/blob/master/dist/leaflet.timedimension.min.js
This is the raw javascript:
https://raw.githubusercontent.com/socib/Leaflet.TimeDimension/master/dist/leaflet.timedimension.min.js
(That said, I don't believe it's a good idea to treat github like a CDN; usually you would use that purely as a repository and host the actual files in use elsewhere.)
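For illustration, the mapping between the two URL forms above can be written as a small function. This is a sketch, not an official GitHub API: `toRawGithubUrl` is a hypothetical name, and it assumes the branch or tag name contains no slashes. Directory ("tree") URLs like the ones in the question have no raw equivalent, so the function returns null for them.

```javascript
// Hypothetical helper: turn a github.com "blob" page URL into the
// matching raw.githubusercontent.com URL, or return null if the URL
// does not point at a single file.
function toRawGithubUrl(blobUrl) {
  const url = new URL(blobUrl);
  const parts = url.pathname.split("/").filter(Boolean);
  // Expected shape: owner / repo / "blob" / ref / ...path
  if (url.hostname !== "github.com" || parts.length < 5 || parts[2] !== "blob") {
    return null;
  }
  const [owner, repo, , ref, ...filePath] = parts;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${filePath.join("/")}`;
}

console.log(
  toRawGithubUrl(
    "https://github.com/socib/Leaflet.TimeDimension/blob/master/dist/leaflet.timedimension.min.js"
  )
);
```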

Preventing 'content-sniffing' type vulnerabilities when handling user-uploaded images?

The problem:
I work on an internal tool that allows users to upload images - and then displays those images back to them and others.
It's a Java/Spring application. I have the benefit of only needing to worry about IE11 exactly and Firefox v38+ (Chrome v43+ would be nice to have).
After first developing the feature, it seems that users can just create a text file like:
<script>alert("malicious code here!")</script>
and save it as "maliciousImage.jpg" and upload it.
Later, when that image is displayed inside image tags like:
<img src="blah?imgName=foobar" id="someImageID">
actualImage.jpg displays normally, and maliciousImage.jpg displays as a broken link - and most importantly no malicious content is interpreted!
However, if the user right-clicks on this broken link and clicks 'view image'... bad things happen.
The browser does 'content-sniffing' (a concept which is new to me), detects that 'maliciousImage.jpg' is actually a text file, and very kindly renders it as HTML without hesitation. Any script tags are passed to the JavaScript interpreter and, as you can imagine, we don't want this.
What I've tried so far
In short, every possible combination of response headers I can think of to prevent the browser from content-sniffing. All the answers I've found here on Stack Overflow, and other docs, imply that setting the Content-Type header should prevent most browsers from content-sniffing, and that setting X-Content-Type-Options should prevent it in some versions of IE.
I'm setting X-Content-Type-Options to nosniff, and I'm setting the response content type. The docs I've read lead me to believe this should stop content-sniffing.
response.setHeader("X-Content-Type-Options", "nosniff");
response.setContentType("image/jpg");
I'm intercepting the response and these headers are present, but seem to have no effect on how the malicious content is processed...
I've also tried detecting which images are and are not malicious at the point of upload, but I'm quickly realizing this is very much non-trivial...
End goal:
Naturally - any output at all for images that aren't really images (garbled nonsense, an unhandled exception, etc) would be better than executing the text-file as HTML/javascript in the clear, but displaying any malicious HTML as escaped/CDATA'd plain-text would be ideal... though maybe a bit impractical.

So I ended up fixing this problem but forgot to answer my own question:
Step 1: blocking invalid images
To get a quick fix out, I simply added some fairly blunt code that checked if an image was actually an image - during upload and before serving it, using the imageio lib:
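import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;
//......
Image img = attBO.getImage(imgId);
try (InputStream in = new ByteArrayInputStream(img.getData())) {
    // ImageIO.read returns null when no registered reader can decode the data
    BufferedImage decoded = ImageIO.read(in);
    if (decoded == null) {
        throw new myCustomException("Invalid image");
    }
} catch (IOException e) {
    // Thrown when the data cannot be read or decoded
    throw new myCustomException("Invalid image");
}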
Now, initially I'd hoped that would fix my problem, but in reality it wasn't that simple: it just made generating a payload more difficult.
While this would block:
<script>alert("malicious code here!")</script>
It's very possible to generate a valid image that's also an XSS payload - just a little more effort....
Step 2: framework silliness
It turned out there was an entire post-processing workflow that I'd never touched, that did things such as append tokens to response bodies and use additional frameworks to decorate responses with CSS, headers, footers etc.
This meant that, although the controller was explicitly returning image/png, post-processing was grabbing that bytestream and wrapping it in a header and footer to form a fully qualified 'view'. This view would always have the Content-Type text/html, and thus the image was never displayed correctly.
The crux of this problem was that my controller was directly returning an image, in a RESTful fashion, when the rest of the framework was built to handle controllers returning full fledged views.
So I had to step through this workflow and create exceptions for the controllers in my code that worked in a RESTful fashion and returned something other than a full view.
For example, with SiteMesh it was just an exclude (as always, a simple fix once I understood the problem...):
<decorators defaultdir="/WEB-INF/decorators">
    <excludes>
        <pattern>*blah.ctl*</pattern>
    </excludes>
    <decorator name="foo" page="myDecorator.jsp">
        <pattern>*</pattern>
    </decorator>
</decorators>
and then some other bespoke post-invocation interceptors.
Step 3: Content negotiation
Now, I finally got to the stage where only image bytes were being served and no view was being specified or explicitly generated.
A Spring feature called 'content negotiation' kicked in. It tries to reconcile the 'Accept' header of the request with the 'messageconverters' it has on hand to produce such responses.
Because spring by default doesn't have a messageconverter to produce image/png responses, it was falling back to text/html - and I was still seeing problems.
Now, were I using spring 4, I could've simply added the annotation:
@Produces("image/png")
to my controller - simple fix...
Step 4: Legacy dependencies
but because I only had spring 3.0.5 (and couldn't upgrade it) I had to try other things.
I tried registering new messageconverters, but that was a headache. I also tried adding a new post-method interceptor to simply change the content-type back to 'image/png', but that was a hacky headache too.
In the end I just exposed the request/response in the controller and wrote my image directly to the response body, circumventing Spring's content negotiation altogether.
....and finally my image was served as an image and displayed as an image - and no injected code was executed!

That sounds odd, because it works perfectly elsewhere. Are you sure the X-Content-Type-Options header is present in the responses?
Here is a demo I built a while back, where I have a file that's a valid html, gif and javascript. As you can see it first loads as an HTML, but then loads itself as an image and as a script (which executes):
http://research.insecurelabs.org/content-sniffing/gifjs.html
However if you load it using the "X-Content-Type-Options: nosniff" header, the script no longer executes:
http://research.insecurelabs.org/content-sniffing/nosniff/gifjs.html
Btw, the image renders properly in FF/IE, but not in Chrome.
Here is a demo, where I attempted what you described:
http://research.insecurelabs.org/content-sniffing/stackexchange.html
First image is without nosniff, and second is with, and it seems to work as intended. Second one does not run the script when opened with "view image".
Edit:
Firefox doesn't seem to support X-Content-Type-Options: nosniff
So, you should also add "Content-disposition: attachment;filename=image.gif" or similar to the images. The image will load normally if loaded through an image tag, but if you open the URL directly, you will force a download instead of showing the image directly in the browser.
Example: http://research.insecurelabs.org/content-sniffing/attachment/
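Put together, the headers discussed in this answer might be assembled like this. It is a minimal sketch: `imageResponseHeaders` is a hypothetical helper, the header names and values are the ones mentioned above, and the filename sanitising rule is an assumption added so that a user-supplied name cannot break out of the quoted value.

```javascript
// Hypothetical helper: build the response headers for serving a
// user-uploaded image defensively.
function imageResponseHeaders(mimeType, downloadName) {
  // Assumption: replace anything outside letters, digits, "_", "." and "-".
  const safeName = downloadName.replace(/[^\w.-]/g, "_");
  return {
    "Content-Type": mimeType,
    "X-Content-Type-Options": "nosniff",
    "Content-Disposition": `attachment; filename="${safeName}"`,
  };
}

console.log(imageResponseHeaders("image/gif", "image.gif"));
```

How you attach these to the response depends on your server framework; the object above is just plain data.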

adeneo is pretty much spot-on. You should use whatever image library you want to check if the uploaded file is a valid file for the type it claims to be. Anything the client sends can be manipulated.
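As a rough illustration of such a check, the sketch below compares the first bytes of an upload against the well-known PNG, JPEG and GIF file signatures. `detectImageType` is a hypothetical name, and this is deliberately weaker than decoding the file with a real image library: a file can start with a valid signature and still carry a script payload, so treat it as a first filter only.

```javascript
// Well-known leading bytes ("magic numbers") of common image formats.
const SIGNATURES = [
  { type: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { type: "image/gif", bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
];

// Hypothetical helper: return the detected MIME type, or null when the
// data does not start with any known signature. Accepts a Buffer or Uint8Array.
function detectImageType(data) {
  for (const { type, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return type;
    }
  }
  return null;
}

console.log(detectImageType(Buffer.from('<script>alert("malicious code here!")</script>')));
console.log(detectImageType(Buffer.from("GIF89a")));
```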

Force browser to reload all cache after site update

Is there a way to force the clients of a webpage to reload the cache (i.e. images, JavaScript, etc.) after an update to the code base has been pushed to the server? We get a lot of help desk calls asking why certain functionality no longer works. A simple hard refresh fixes the problems as it downloads the newly updated JavaScript file.
For specifics we are using Glassfish 3.x. and JSF 2.1.x. This would apply to more than just JSF of course.
To describe what behavior I hope is possible:
Website A has two images and two javascript files. A user visits the site and the 4 files get cached. As far as I'm concerned, no need to "re-download" said files unless user specifically forces a "hard" refresh or clears their cache. Once a site is pushed an update to one of the files, the server could have some sort of metadata in the header informing the client of said update. If the client chooses, the new files would be downloaded.
What I don't want to do is put meta-tag in the header of a page to force nothing from ever being cached...I just want something that tells the client an update has occurred and it should get the latest once something has been updated. I suppose this would just be some sort of versioning on the client side.
Thanks for your time!

The correct way to handle this is with changing the URL convention for your resources. For example, we have it as:
/resources/js/fileName.js
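To get the browser to still cache the file, but do it the proper way with versioning, add something to the URL. Adding a value to the query string can prevent caching, because HTTP/1.1 caches must not treat responses for URLs containing a "?" as fresh unless the server sends explicit expiration information, so the place to put it is after /resources/.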
A reference for querystring caching: http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.9
So for example, your URLs would look like:
/resources/1234/js/fileName.js
So what you could do is use the project's version number (or some value in a properties/config file that you manually change when you want cached files to be reloaded) since this number should change only when the project is modified. So your URL could look like:
/resources/cacheholder${project.version}/js/fileName.js
That should be easy enough.
The problem now is with mapping the URL, since that value in the middle is dynamic. The way we overcame that is with a URL rewriting module that allowed us to filter URLs before they got to our application. The rewrite watched for URLs that looked like:
/resources/cacheholder______/whatever
And removed the cacheholder_______/ part. After the rewrite, it looked like a normal request, and the server would respond with the correct file, without any other specific mapping/logic...the point is that the browser thought it was a new file (even though it really wasn't), so it requested it, and the server figures it out and serves the correct file (even though it's a "weird" URL).
Of course, another option is to add this dynamic string to the filename itself, and then use the rewrite tool to remove it. Either way, the same thing is done - targeting a string of text during rewrite, and removing it. This allows you to fool the browser, but not the server :)
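The rewrite step described above boils down to one regular expression. A minimal sketch, assuming the version segment always starts with the literal text "cacheholder" directly under /resources/ (`stripCacheHolder` is a hypothetical name; in practice this rule would live in your URL rewriting module's configuration rather than in application code):

```javascript
// Hypothetical helper: remove the "cacheholder<version>/" segment so the
// request maps back to the real file on disk.
function stripCacheHolder(path) {
  return path.replace(/^\/resources\/cacheholder[^/]*\//, "/resources/");
}

console.log(stripCacheHolder("/resources/cacheholder1.4.2/js/fileName.js"));
console.log(stripCacheHolder("/resources/js/fileName.js"));
```

Paths without the segment pass through unchanged, so the rule is safe to apply to every request under /resources/.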
UPDATE:
An alternative that I really like is to set the filename based on the contents, and cache that. For example, that could be done with a hash. Of course, this type of thing isn't something you'd manually do and save to your project (hopefully); it's something your application/framework should handle. For example, in Grails, there's a plugin that "hashes and caches" resources, so that the following occurs:
Every resource is checked
A new file (or mapping to this file) is created, with a name that is the hash of its contents
When adding <script>/<link> tags to your page, the hashed name is used
When the hash-named file is requested, it serves the original resource
The hash-named file is cached "forever"
What's cool about this setup is that you don't have to worry about caching correctly - just set the files to cache forever, and the hashing should take care of files/mappings being available based on content. It also provides the ability for rollbacks/undos to already be cached and loaded quickly.
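To make the idea concrete, here is a sketch of deriving a file name from its contents using Node's built-in crypto module. It is not how the Grails plugin works internally; `hashedName`, the choice of SHA-256 and the 16-character truncation are all assumptions made for illustration.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical helper: insert a content hash between the base name and
// the extension, e.g. fileName.js -> fileName.<hash>.js
function hashedName(fileName, contents) {
  const digest = crypto.createHash("sha256").update(contents).digest("hex").slice(0, 16);
  const ext = path.extname(fileName);
  const base = path.basename(fileName, ext);
  return `${base}.${digest}${ext}`;
}

const v1 = hashedName("fileName.js", "console.log('v1');");
const v2 = hashedName("fileName.js", "console.log('v2');");
console.log(v1, v2, v1 === v2);
```

Because the name changes only when the contents change, unchanged files keep their cached copies across deployments, and rolling back to old contents brings back the old name.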

I use a no-cache parameter for these situations...
I have a string constant value (from a config file) like
$no_cache = "v11";
and in pages I reference assets like
<img src="a.jpg?nc=$no_cache">
and when I update my code, I just change the $no_cache value, and it works like a charm.

What is a "?" for in the src attribute of a html script tag?

If this has been asked before, I apologize, but this is kind of a hard question to search for. This is the first time I have come across this in all my years of web development, so I'm pretty curious.
I am editing some HTML files for a website, and I have noticed that in the src attribute of the script tags that the previous author appended a question mark followed by data.
Ex: <script src="./js/somefile.js?version=3.2"></script>
I know that this is used in some languages, such as PHP, for passing values in GET requests, but as far as I ever knew, this wasn't done in JavaScript, at least not when calling a JavaScript file. Does anyone know what this does, if anything?
EDIT: Wow, a lot of responses. Thanks one and all. Since a lot of people are saying similar things, I will post a global update instead of commenting on everyone's answer.
In this case the JavaScript files are static, hence my curiosity. I have also opened them up and did not see anything attempt to access variables on file load. I've never thought about caching or plain version control, both of which seem more likely in this circumstance.

I believe what the author was doing was ensuring that if he creates version 3.3 of his script he can change the version= in the url of the script to ensure that users download the new file instead of running off of the old script cached in their browser.
So in this case it is part of the caching strategy.

My guess is it's so if he publishes a new version of the JavaScript file, he can bump the version in the HTML documents. This will not do anything server-side when requested, but it causes the browser to treat it as a different file, effectively forcing the browser to re-fetch the script and bypass the local cache of the file.
This way, you can set a really high cache time (like a week or a month!) but not sacrifice the ability to update scripts frequently if necessary.

What you have to remember is that this ./js/somefile.js?version=3.2 doesn't have to be a physical file. It can be a page which creates the file on the fly. So you could have it where the request says, "Hey give me version 3 of this js file," and the server side code creates it and writes it to the output stream.
The other option is to force the browser to not cache the file and pull down the new one when it makes the request. Since the URI changed, it will think the file is completely new.

A (well-configured) web server will send static files like JavaScript source code once and tell the web browser to cache that file locally for a certain period of time (could be a day, a week, a month, or longer). When the browser sees another request for that same file, it will just use that version instead of getting new code from the server.
If the URL changes -- for example by adding a query string -- then the browser suspects that its cached version is no good and gets a new one. As such, the ? helps developers say "Oops, I changed this file, make sure the browser gets a new copy."
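A small sketch of the technique, using the URL from the question: the helper appends a version parameter, so bumping the version yields a URL the browser has never cached. `withVersion` is a hypothetical name, and the sketch ignores URLs that contain a fragment ("#...").

```javascript
// Hypothetical helper: append a version parameter to an asset URL.
function withVersion(url, version) {
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}version=${encodeURIComponent(version)}`;
}

console.log(withVersion("./js/somefile.js", "3.2"));
console.log(withVersion("./js/somefile.js", "3.3"));
```

The two results differ only in the query string, which is enough for the browser to treat them as separate resources.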

In this case it's probably being used to ensure the source file isn't cached between versions.
Of course, it could also be used server side to generate the JavaScript file; without knowing what you have on the other end of the request, it's difficult to be definitive.
BTW, the ?... portion of the url is called the query string.

This is used to guarantee that the browser downloads a new version of the script when available. The version number in the URL is incremented each time a new version is deployed so that the browser sees it as a different file.

Just because the file extension is .js doesn't mean that the target is an actual .js file. They could set up their web server to pass the requested URL to a script (or literally have a script named somefile.js) and have that interpret the filename and version.

The query string has nothing to do with the JavaScript itself. It appears that some server-side code is serving up a different version depending on that query string.
You should never assume anything about paths in a URL. The extension on a path in a URL doesn't really tell you anything. URLs can be completely dynamic and served by some server-side code, or can be rewritten dynamically by the web server.
Now, it is common to add a query string to URLs when loading JavaScript files to prevent client-side caching. If the page updates and references a new version of the script, the new URL busts through the cache and causes the client to refresh its script.

Refused to execute a JavaScript script. Source code of script found within request

In WebKit I get the following error on my JavaScript:
Refused to execute a JavaScript script. The source code of script found within request.
The code is for a JavaScript spinner, see ASCII Art.
The code used to work OK and is still working correctly in Camino and Firefox. The error only seems to be thrown when the page is saved via a POST and then retrieved via a GET. It happens in both Chrome/Mac and Safari/Mac.
Anyone know what this means, and how to fix this?

This "feature" can be disabled by sending the non-standard HTTP header X-XSS-Protection on the affected page.
X-XSS-Protection: 0

It's a security measure to prevent XSS (cross-site scripting) attacks.
This happens when some JavaScript code is sent to the server via an HTTP POST request, and the same code comes back via the HTTP response. If Chrome detects this situation, it refuses to run the script, and you get the error message Refused to execute a JavaScript script. Source code of script found within request.
Also see this blogpost about Security in Depth: New Security Features.

Short answer: refresh the page after making your initial submission of the javascript, or hit the URL that will display the page you're editing.
Long answer: because the text you filled into the form includes javascript, and the browser doesn't necessarily know that you are the source of the javascript, it is safer for the browser to assume that you are not the source of this JS, and not run it.
An example: Suppose I gave you a link in your email or on Facebook with some JavaScript in it. And imagine that the JavaScript would message all your friends my cool link. So, the game of getting that link to be invoked becomes simply: find a place to send the JavaScript such that it will be included in the page.
Chrome and other WebKit browsers try to mitigate this risk by not executing any javascript that is in the response, if it was present in the request. My nefarious attack would be thwarted because your browser would never run that JS.
In your case, you're submitting it into a form field. The Post of the form field will cause a render of the page that will display the Javascript, causing the browser to worry. If your javascript is truly saved, however, hitting that same page without submitting the form will allow it to execute.

As others have said, this happens when an HTTP response contains a JavaScript and/or HTML string that was also in the request. This is usually caused by entering JS or HTML into a form field, but can also be triggered in other ways such as manually tweaking the URL's parameters.
The problem with this is that someone with bad intentions could put whatever JS they want as the value, link to that URL with the malicious JS value, and cause your users trouble.
In almost every case, this can be fixed by HTML encoding the response, though there are exceptions. For example, this will not be safe for content inside a <script> tag. Other specific cases can be handled differently - for example, injecting input into a URL is better served by URL encoding.
As Kendall Hopkins mentioned, there may be a few cases when you actually want JavaScript from form inputs to be executed, such as creating an application like JSFiddle. In those cases, I'd recommend that you at least scrub through the input in your backend code before blindly writing it back. After that, you can use the method he mentioned to prevent the XSS blockage (at least in Chrome), but be aware that it is opening you to attackers.
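A minimal sketch of the HTML encoding mentioned above, written as a standalone function. `escapeHtml` is a hypothetical helper (most frameworks ship their own equivalent, which you should prefer), and as noted it is not sufficient for content placed inside a <script> tag. For values injected into URLs, the built-in encodeURIComponent does the URL encoding.

```javascript
// Characters that are significant in HTML text and attribute values.
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// Hypothetical helper: encode user input before writing it into HTML.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("x")</script>'));
console.log(encodeURIComponent("a b&c"));
```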

I used this hacky PHP trick just after I commit to the database, but before the script is rendered from my $_GET request:
if (!empty($_POST['contains_script'])) {
    // Redirect with a plain GET so the submitted script is no longer part of the request
    echo "<script>document.location='template.php';</script>";
}
This was the cheapest solution for me.
