Get file size without downloading it using JavaScript [duplicate]

I can't seem to find a way to display the size of a file in JavaScript in a terminal simulator. (I'm very new to JavaScript)
I've tried these:
https://bitexperts.com/Question/Detail/3316/determine-file-size-in-javascript-without-downloading-a-file
Ajax - Get size of file before downloading
I expected to get the byte size, but nothing happens.
I'm not able to show any error messages (if there were any) as I am on my school laptop and they blocked Inspect Element.
The output needs to be displayed on the "terminal" itself and it must be strictly JavaScript.
Thanks!
Edit 1:
These are the "terminal" files, to make this easier than reconstructing files from snippets of the whole source. The commands are located in js/terminal.html. The main area we need to pay attention to is line 144.
I would post it in snippets, but that would make this question 20x the size it is. It's based on Andrew Barfield's HTML5 Terminal.

If the server supports HEAD, you can try to use that. However, there's no guarantee that the Content-Length header is returned even if HEAD requests are supported!
Run the code below in a console on stackoverflow.com to request the size of the HTML for their home page without downloading the full page. (Note that Stack Overflow no longer provides a Content-Length header, so the snippet now logs null there.)
fetch('/', {method: 'HEAD'}).then((result) => {
  console.log(result.headers.get("content-length"))
})
https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD
The HTTP HEAD method requests the headers that are returned if the specified resource would be requested with an HTTP GET method.
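If the Content-Length header is missing, a minimal fallback sketch (assuming the server honours HTTP range requests, which is not guaranteed either) is to request a single byte and read the total size from the Content-Range header, which has the form bytes 0-0/51234:

// Ask for only the first byte; a server that honours Range replies with
// 206 Partial Content and a Content-Range header ending in the total size.
fetch('/', {method: 'GET', headers: {'Range': 'bytes=0-0'}}).then((result) => {
  const contentRange = result.headers.get('content-range') // e.g. "bytes 0-0/51234"
  if (contentRange && contentRange.includes('/')) {
    console.log('Total size: ' + contentRange.split('/')[1] + ' bytes')
  } else {
    console.log('Server ignored the Range header; size unknown')
  }
})

Like the HEAD approach, this transfers only headers plus a single byte, so the file itself is still not downloaded.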

Related

Figuring out source of programmatic content [duplicate]

I have a weird network request in my page that refers to JavaScript files which I removed from every HTML file earlier. The cache is cleared and there is not a single reference to be found in the source HTML or the JavaScript files. To fix that, and also out of general curiosity, I would like to know if there is a simple way to find out where a request was triggered, preferably using the Chrome DevTools.
Update:
Thanks to jaredwilli I found the Initiator column under the Network tab. However, this only shows "Other". What I would like to know is the (HTML or JavaScript) file where those requests were triggered.
On the Network panel, you can determine what the initiator of a request was by viewing the Initiator column. It gives you the file, line number and type of resource it was, either Script or something else.


Chrome webRequest onCompleted: Size of the page

I am working on a Chrome Extension, and I wanted to know if there was a way to know the amount of data which has been downloaded while loading a page.
For example, if the user activates the extension and goes to google.com, I want to show them the size of the google.com page.
Is there a way to do this?
Thanks!
There are a multitude of ways you could determine the size of a page using just JavaScript.
You could manually calculate the size of the page by counting the amount of characters in the page and in its scripts. The example script below calculates the size of an ASCII-encoded HTML document (so not including pictures, scripts loaded from URLs, etc.). I'm not too sure how accurate or fast this is, so don't quote me on it.
Example script:
var html = document.getElementsByTagName('HTML')[0].outerHTML, // get all html as a string
    sizeKB = html.length / 1024; // assuming the page is ASCII-encoded, each char is one byte; 1024 bytes = 1 KB
For determining size of images, this question could help: Determining image file size + dimensions via Javascript?
You could load the results from another service like Google's PageSpeed Insights or Pingdom. You could try to load the service in an iframe in your background page and use content scripts to input the URL, extract the site's statistics, and send them to the popup. I'm sure plenty of other services could help you do the same with ajax calls, although I don't know of any.
Using ajax and jQuery, you could determine the size of all the assets in the page and add them together: Get size of file requested via ajax. Used correctly, this could fetch all of the files from the cache, so it wouldn't use more of the network, but it might be a bit slow for pages with a lot of non-inline scripts, stylesheets, and images.
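As a rough sketch of that idea using plain fetch instead of jQuery (this assumes the assets are same-origin or CORS-readable, and that the server sends Content-Length; anything else is counted as 0):

// Collect the URLs of scripts, stylesheets and images in the page, issue a
// HEAD request for each, and sum the Content-Length headers.
function estimatePageWeight() {
  var urls = [].slice.call(
    document.querySelectorAll('script[src], link[rel="stylesheet"], img[src]')
  ).map(function (el) { return el.src || el.href; });

  return Promise.all(urls.map(function (url) {
    return fetch(url, {method: 'HEAD'})
      .then(function (res) { return Number(res.headers.get('content-length')) || 0; })
      .catch(function () { return 0; }); // blocked or failed requests count as 0
  })).then(function (sizes) {
    return sizes.reduce(function (a, b) { return a + b; }, 0);
  });
}

// Usage: estimatePageWeight().then(function (bytes) { console.log(bytes + ' bytes'); });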
Using the chrome.webRequest API, you could read the 'Content-Length' header to determine the file size. I haven't tested this script either, so tell me how it works. Make sure to have a fallback if the header is missing!
Example script:
chrome.webRequest.onHeadersReceived.addListener(
  function (details) {
    var fileSize;
    details.responseHeaders.forEach(function (v, i, a) {
      // header names are case-insensitive, so compare in lower case
      if (v.name.toLowerCase() == 'content-length')
        fileSize = v.value;
    });
    if (!fileSize) // if the Content-Length header is missing, fall back to another method
      fallBackGetFileSize(details);
  },
  {urls: ["http://*/*"]},
  ["responseHeaders"]);

Preventing 'content-sniffing' type vulnerabilities when handling user-uploaded images?

The problem:
I work on an internal tool that allows users to upload images - and then displays those images back to them and others.
It's a Java/Spring application. I have the benefit of only needing to worry about IE11 exactly and Firefox v38+ (Chrome v43+ would be a nice to have)
After first developing the feature, it seems that users can just create a text file like:
<script>alert("malicious code here!")</script>
and save it as "maliciousImage.jpg" and upload it.
Later, when that image is displayed inside image tags like:
<img src="blah?imgName=foobar" id="someImageID">
actualImage.jpg displays normally, and maliciousImage.jpg displays as a broken link - and most importantly no malicious content is interpreted!
However, if the user right-clicks on this broken link and clicks 'view image'... bad things happen.
The browser does 'content-sniffing', a concept which is new to me: it detects that 'maliciousImage.jpg' is actually a text file and very kindly renders it as HTML without hesitation. Any script tags are passed to the JavaScript interpreter and, as you can imagine, we don't want this.
What I've tried so far
In short, every possible combination of response headers I can think of to prevent the browser from content-sniffing. All the answers I've found here on Stack Overflow, and other docs, imply that setting the Content-Type header should prevent most browsers from content-sniffing, and that setting X-Content-Type-Options should prevent some versions of IE.
I'm setting X-Content-Type-Options to nosniff and I'm setting the response content type. The docs I've read lead me to believe this should stop content-sniffing.
response.setHeader("X-Content-Type-Options", "nosniff");
response.setContentType("image/jpeg"); // "image/jpeg" is the registered MIME type; "image/jpg" is not
I'm intercepting the response and these headers are present, but seem to have no effect on how the malicious content is processed...
I've also tried detecting which images are and are not malicious at the point of upload, but I'm quickly realizing this is very much non-trivial...
End goal:
Naturally - any output at all for images that aren't really images (garbled nonsense, an unhandled exception, etc) would be better than executing the text-file as HTML/javascript in the clear, but displaying any malicious HTML as escaped/CDATA'd plain-text would be ideal... though maybe a bit impractical.
So I ended up fixing this problem but forgot to answer my own question:
Step 1: blocking invalid images
To get a quick fix out, I simply added some fairly blunt code that checked if an image was actually an image - during upload and before serving it, using the imageio lib:
import javax.imageio.ImageIO;
//......
Image img = attBO.getImage(imgId);
InputStream x = new ByteArrayInputStream(img.getData());
BufferedImage s;
try {
    // ImageIO.read returns null (rather than throwing) when no registered
    // reader recognises the bytes, so the getWidth() call turns an invalid
    // image into a NullPointerException, which the catch block picks up
    s = ImageIO.read(x);
    s.getWidth();
} catch (Exception e) {
    throw new myCustomException("Invalid image");
}
Now, initially I'd hoped that would fix my problem - but in reality it wasn't that simple, and it just made generating a payload more difficult.
While this would block:
<script>alert("malicious code here!")</script>
It's very possible to generate a valid image that's also an XSS payload - just a little more effort....
Step 2: framework silliness
It turned out there was an entire post-processing workflow that I'd never touched, that did things such as append tokens to response bodies and use additional frameworks to decorate responses with CSS, headers, footers etc.
This meant that, although the controller was explicitly returning image/png, the post-processing was grabbing that bytestream and wrapping it in a header and footer to form a fully qualified 'view' - and this view would always have the content-type text/html, so the image was never displayed correctly.
The crux of this problem was that my controller was directly returning an image, in a RESTful fashion, when the rest of the framework was built to handle controllers returning full fledged views.
So I had to step through this workflow and create exceptions for the controllers in my code that returned something other than a full view.
For example, with SiteMesh it was just an exclude (as always, a simple fix once I understood the problem...):
<decorators defaultdir="/WEB-INF/decorators">
    <excludes>
        <pattern>*blah.ctl*</pattern>
    </excludes>
    <decorator name="foo" page="myDecorator.jsp">
        <pattern>*</pattern>
    </decorator>
</decorators>
and then some other bespoke post-invocation interceptors.
Step 3: Content negotiation
Now, I finally got to the stage where only image bytecode was being served and no view was being specified or explicitly generated.
A Spring feature called 'content negotiation' kicked in. It tries to reconcile the 'Accept' header of the request with the 'message converters' it has on hand to produce such responses.
Because Spring by default doesn't have a message converter that produces image/png responses, it was falling back to text/html - and I was still seeing problems.
Now, were I using Spring 4, I could've simply added the annotation:
@Produces("image/png")
to my controller - a simple fix...
Step 4: Legacy dependencies
but because I only had Spring 3.0.5 (and couldn't upgrade it) I had to try other things.
I tried registering new message converters, but that was a headache. I also tried adding a new post-method interceptor to simply change the content-type back to 'image/png', but that was a hacky headache too.
In the end I just exposed the request/response in the controller and wrote my image directly to the response body - circumventing Spring's content negotiation altogether.
....and finally my image was served as an image and displayed as an image - and no injected code was executed!
That sounds odd, because it works perfectly elsewhere. Are you sure the X-Content-Type-Options header is present in the responses?
Here is a demo I built a while back, where I have a file that is simultaneously valid HTML, GIF and JavaScript. As you can see, it first loads as HTML, but then loads itself as an image and as a script (which executes):
http://research.insecurelabs.org/content-sniffing/gifjs.html
However if you load it using the "X-Content-Type-Options: nosniff" header, the script no longer executes:
http://research.insecurelabs.org/content-sniffing/nosniff/gifjs.html
Btw, the image renders properly in FF/IE, but not in Chrome.
Here is a demo, where I attempted what you described:
http://research.insecurelabs.org/content-sniffing/stackexchange.html
The first image is without nosniff and the second is with it, and it seems to work as intended: the second one does not run the script when opened with "view image".
Edit:
Firefox doesn't seem to support X-Content-Type-Options: nosniff
So you should also add "Content-Disposition: attachment; filename=image.gif" or similar to the images. The image will load normally when referenced from an image tag, but if you open the URL directly you will get a download instead of the image being shown in the browser.
Example: http://research.insecurelabs.org/content-sniffing/attachment/
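For illustration only (the question is Java/Spring, but the header mechanics are the same on any stack), here is a minimal sketch of the nosniff-plus-attachment combination in a Node/Express handler; loadImageBytes is a hypothetical helper, not a real API:

// Illustrative Express handler, not the asker's Spring stack: serve an
// uploaded image with both defenses described above.
const express = require('express');
const app = express();

app.get('/image/:id', (req, res) => {
  const bytes = loadImageBytes(req.params.id); // hypothetical helper returning a Buffer
  res.set('Content-Type', 'image/jpeg'); // declare the real type
  res.set('X-Content-Type-Options', 'nosniff'); // stop content-sniffing where supported
  res.set('Content-Disposition', 'attachment; filename=image.jpg'); // force a download on direct navigation
  res.send(bytes);
});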
adeneo is pretty much spot-on. You should use whatever image library you want to check if the uploaded file is a valid file for the type it claims to be. Anything the client sends can be manipulated.

Get element from website with python without opening a browser

I'm trying to write a python script which parses one element from a website and simply prints it.
I couldn't figure out how to achieve this without Selenium's webdriver, which opens a browser that handles the scripts needed to properly display the website.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://groceries.asda.com/asda-webstore/pages/landing/home.shtml#!product/910000800509')
content = browser.page_source
print(content[42000:43000])
browser.close()
This is just a rough draft which will print the contents, including the element of interest <span class="prod-price-inner">£13.00</span>.
How could I get the element of interest without the browser opening, or even without a browser at all?
Edit: I've previously tried urllib and, in bash, wget; both lack the required JavaScript interpretation.
As other answers mentioned, this webpage requires JavaScript to render its content, so you can't simply get and process the page with lxml, Beautiful Soup, or a similar library. But there's a much simpler way to get the information you want.
I noticed that the link you provided fetches data from an internal API in a structured fashion. It appears that the product number is 910000800509, based on the URL. If you look at the Network tab in Chrome dev tools (or your browser's equivalent), you'll see that a GET request is being made to the following URL: http://groceries.asda.com/api/items/view?itemid=910000800509.
You can make the request with just the requests module (its json() method parses the response for you):
import requests

url = 'http://groceries.asda.com/api/items/view?itemid=910000800509'
r = requests.get(url)
price = r.json()['items'][0]['price']
print(price)
£13.00
This also gives you access to lots of other information about the product, since the request returns some JSON with product details.
How could I get the element of interest without the browser opening,
or even without a browser at all?
After inspecting the page you're trying to parse:
http://groceries.asda.com/asda-webstore/pages/landing/home.shtml#!product/910000800509
I realized that it only displays the content if JavaScript is enabled; based on that, you need to use a real browser.
Conclusion:
The way to go, if you need to automate this, is:
selenium
