I'm writing a client-side upload script that pre-chunks the input data before sending.
I'm wondering whether this could cause a huge memory spike: calling the slice method on a potentially very big File (e.g., slicing a 2GB file into 1000 × 2MB chunks) before starting the upload, as opposed to slicing just before each chunk upload starts.
My assumption is that the browser doesn't actually touch the data in the slice method; it just returns a new Blob instance with a different start and end range. But I want to be sure.
I know it's easily testable, but I'm not sure a test result on Chrome can be trusted to apply to all browsers.
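For what it's worth, the File API specifies slice() as returning a new Blob that references a byte range of the same underlying data, so pre-chunking by itself should not copy anything. A minimal sketch of the pre-chunking step:

```javascript
// Pre-chunk a File/Blob. slice() only records byte offsets on a new Blob
// handle; no bytes are read or copied until a chunk is actually read/uploaded.
function chunkBlob(blob, chunkSize) {
  const chunks = [];
  for (let start = 0; start < blob.size; start += chunkSize) {
    chunks.push(blob.slice(start, start + chunkSize)); // end is clamped to blob.size
  }
  return chunks;
}
```

Each entry is a lightweight Blob handle; the actual bytes are only pulled in when a chunk is sent (e.g. via FormData or fetch).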
All:
I wonder how I can make Chrome handle about 4GB of data loaded into it? My use case is:
The front end starts and tries to download a 3GB JSON file and perform some calculations on it. But Chrome always crashes.
Any solution for this? Thanks
When you work with large data, the typical optimization rule is:
Don't read all the data at once, and don't save all the data at once.
If your code allows calculations to be performed step by step, split your JSON into small parts (for example, 50MB each).
Of course it runs more slowly, but this approach keeps memory usage in check.
This optimization rule is useful not only for JS and the browser, but for many languages and platforms.
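For example, if the server can deliver the payload as newline-delimited JSON (an assumption about the data layout; a single monolithic 3GB JSON document cannot be split this way), the browser can consume it chunk by chunk so only one small part is ever resident in memory. A sketch:

```javascript
// Process newline-delimited JSON records one text chunk at a time
// (e.g. chunks decoded from a fetch ReadableStream), so only one chunk
// plus one partial line is ever held in memory.
function processNdjson(chunks, onRecord) {
  let tail = '';
  let count = 0;
  for (const chunk of chunks) {
    const lines = (tail + chunk).split('\n');
    tail = lines.pop(); // last piece may be an incomplete record
    for (const line of lines) {
      if (line.trim()) { onRecord(JSON.parse(line)); count++; }
    }
  }
  if (tail.trim()) { onRecord(JSON.parse(tail)); count++; } // flush the remainder
  return count;
}
```

The calculation then runs inside onRecord, record by record, instead of on one giant parsed object.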
I'm making some tests to store large files locally with IndexedDb API, and I'm using PHP with JSON (and AJAX on Javascript's side) to receive the file data.
Actually, I'm trying to get some videos and, to do so, I use the following PHP code:
$content_ar['content'] = base64_encode(file_get_contents("../video_src.mp4", false, NULL, $bytes_from, $package_size));
return json_encode($content_ar);
I know base64_encode will deliver 1/3 more of information than the original, but that's not the problem right now as it's the only way I know how to retrieve binary data without losing it on the way.
As you can see, I specify from which byte it has to start to read and how many of them I want to retrieve. So, on my JS side, I know how much of the file I have already stored and I ask the script to get me from actual_size to actual_size + $package_size bytes.
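The JS-side loop described here might look like the following sketch (getChunk stands in for the actual AJAX call; its name and shape are assumptions):

```javascript
// Request the file chunk by chunk; each response carries a base64 slice.
// In real code getChunk would be the AJAX call to the PHP script, and each
// part would be handed to IndexedDB rather than accumulated in an array.
async function downloadInChunks(getChunk, totalSize, packageSize) {
  const parts = [];
  for (let from = 0; from < totalSize; from += packageSize) {
    const { content } = await getChunk(from, packageSize);
    parts.push(content);
  }
  return parts.join('');
}
```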
What I'm seeing is that the script seems to run more slowly as time goes by and with larger files. I'm trying to understand what happens there.
I've read that file_get_contents() stores the file contents in memory, so with big files it could be a problem (that's why I'm reading it in chunks).
But seeing that it gets slower with big files (and over time), could it be that it's still loading the whole file into memory and then delivering the chunk I ask for? That is, loading everything and then returning only the part I demand?
Or is it reading and storing everything up to $bytes_from + $package_size (which would explain why it gets slower over time, as that offset increases)?
If any of the above, is there any way to get it to run more efficiently and improve performance? Maybe I have to do some operations before or after to empty memory resources?
EDIT:
I've made a screenshot showing the difference (in ms) of the moment I make the call to get the file bytes I need, and the right moment when I receive the AJAX response (before I do anything with the received data, so Javascript has no impact on the performance). Here it is:
As you can see, it's increasing with every call.
I think the problem is the time it spends getting to the initial byte I need. It does not load the whole file into memory, but it is slow until it reaches the first byte to read, so the further in the starting point is, the longer each call takes.
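If that hypothesis is right (each call must skip past $bytes_from bytes before reading), the total cost grows quadratically with the number of chunks, which matches the per-call latency growing linearly. A quick back-of-the-envelope (numbers illustrative):

```javascript
// Total bytes scanned if request k must first skip past k * chunkSize bytes:
// (0 + 1 + ... + (n-1)) * chunkSize, i.e. quadratic in the number of chunks.
function totalBytesScanned(numChunks, chunkSize) {
  let total = 0;
  for (let k = 0; k < numChunks; k++) total += k * chunkSize;
  return total;
}
```

For 1000 chunks, the skipped bytes add up to roughly 500 times the file size, even though each chunk is delivered only once.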
EDIT 2:
Could it have something to do with the fact that I'm JSON-encoding the base64 content? I've been running some performance tests, and setting $content_ar['content'] = strlen(base64_encode(file...)) takes much less time (when, theoretically, it's doing the same work).
However, if that's the case, I still can't understand why it gets slower over time. Encoding the same number of bytes should take the same amount of time on each call, shouldn't it?
Thank you so much for your help!
I have a very large array with thousands of items
I tried this solution (Create a file in memory for user to download, not through server) of creating an anchor that points to a text file
~~JSON.stringify on the array caused the tab to freeze~~ Correction: Trying to log out the result caused the tab to freeze, stringify by itself works fine
The data was originally in string form, but creating an anchor with that data resulted in a no-op. I'm assuming that's also because the data was too big, since using dummy data successfully triggered a file download.
How can I get this item onto my filesystem?
edit/clarification:
There is a very large array that I can only access via the browser inspector/console. I can't access it via any other language.
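For reference, the anchor approach mentioned above (the one that worked with dummy data) looks roughly like this sketch, meant to be pasted into the devtools console:

```javascript
// Build a Blob from the data and trigger a download via a temporary <a>.
// Building the Blob from the raw data avoids materializing one giant string.
function downloadAsFile(data, filename) {
  const blob = new Blob([data], { type: 'text/plain' });
  const url = URL.createObjectURL(blob);
  if (typeof document !== 'undefined') { // DOM part only runs in a page context
    const a = document.createElement('a');
    a.href = url;
    a.download = filename;
    document.body.appendChild(a);
    a.click(); // triggers the browser's file download
    a.remove();
  }
  return url; // caller should URL.revokeObjectURL(url) when done
}
```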
JavaScript does not allow you to read or write files, except for cookies, and I think the amount of data you are using exceeds the size limit for cookies. This is for security reasons.
However, languages such as PHP, Python, and Ruby allow reading and writing files. It appears you are working with binary data, so use binary files and write functions.
As to the choice of language: if you already know one, use that, or whichever you can get help with. Writing a file is a very basic operation, and all three languages are equally good. If you don't know any of these languages, you can literally copy and paste the code from their websites.
This may already be answered somewhere on the site, but if it is I couldn't find it. I also couldn't find an exact answer to my question (or at least couldn't make sense of how to implement a solution) based on any of the official Node.js documentation.
Question: Is it possible to customize the length (in bytes) of each disk write that occurs while piping the input of a readable stream into a file?
I will be uploading large files (~50Gb) and it's possible that there could be many clients doing so at the same time. In order to accomplish this I'll be slicing files at the client side and then uploading a chunk at a time. Ideally I want physical writes to disk on the server side to occur in 1Mb portions - but is this possible? And if it is possible then how can it be implemented?
You will probably use a WriteStream. While it is not documented in the fs api, any Writable does take a highWaterMark option for when to flush its buffer. See also the details on buffering.
So it's just
var writeToDisk = fs.createWriteStream(path, {highWaterMark: 1024*1024});
req.pipe(writeToDisk);
Disclaimer: I would not believe in cargo-cult "server-friendly" chunk sizes. I'd go with the default (which is 16kb), and when performance becomes a problem test other sizes to find the optimal value for the current setup.
To date, I've not found any suitable .torrent file parsers coded in javascript, therefore I began creating my own.
So far, I have been able to recode a PHP bdecoder in JavaScript. One issue I found is that larger .torrent files (like the second one in http://www.vuze.com/content/channel.php?id=53&name=Scam%20School%20(Quicktime%20HD)) sometimes result in Uncaught RangeError: Maximum call stack size exceeded errors in Chrome. Is there a way to make the bdecode function run less recursively?
Along with this issue, I haven't been able to accurately produce an info hash for the .torrent files that decoded successfully. I hash the info dictionary beginning right after the info name and ending at the e 'closing tag'. However, this results in hashes that don't match those of actual BitTorrent clients. Am I reading the file incorrectly?
Current code:
http://jsfiddle.net/e23YQ/
Thanks.
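On the recursion question: bdecode can be written with an explicit stack instead of recursive calls, so nesting depth can't blow the call stack. A minimal sketch over plain strings (real torrent data should be parsed byte-wise, as the accepted approach below does):

```javascript
// Iterative bencode decoder: an explicit stack of open containers replaces
// recursion. Handles integers (i42e), strings (3:foo), lists (l...e),
// and dictionaries (d...e).
function bdecode(s) {
  let i = 0;
  let root;
  const stack = []; // open lists/dicts awaiting their closing 'e'
  const put = (v) => {
    if (stack.length === 0) { root = v; return; }
    const top = stack[stack.length - 1];
    if (top.type === 'list') top.value.push(v);
    else if (top.key === null) top.key = v;            // dict key
    else { top.value[top.key] = v; top.key = null; }   // dict value
  };
  while (i < s.length) {
    const c = s[i];
    if (c === 'i') {                       // integer: i<digits>e
      const e = s.indexOf('e', i);
      put(parseInt(s.slice(i + 1, e), 10));
      i = e + 1;
    } else if (c === 'l') {
      stack.push({ type: 'list', value: [] }); i++;
    } else if (c === 'd') {
      stack.push({ type: 'dict', value: {}, key: null }); i++;
    } else if (c === 'e') {                // close the innermost container
      const done = stack.pop(); i++;
      put(done.value);
    } else {                               // string: <len>:<bytes>
      const colon = s.indexOf(':', i);
      const len = parseInt(s.slice(i, colon), 10);
      put(s.slice(colon + 1, colon + 1 + len));
      i = colon + 1 + len;
    }
    if (stack.length === 0 && root !== undefined) break;
  }
  return root;
}
```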
Reading the torrent file using readAsText or readAsBinaryString (which is deprecated) will not suffice for generating an accurate info hash. To keep things as native as possible, you must read the file as an ArrayBuffer and parse it using Uint8Arrays. While parsing, save the beginning and ending offsets of the info dictionary for generating the info hash.
To generate an accurate info hash, you must use a JavaScript implementation of SHA-1 that can hash ArrayBuffers. Rusha seemed to be a viable option. Using digestFromArrayBuffer in Rusha with a slice of the initial ArrayBuffer containing the info dictionary, we get an accurate info hash.
Using an ArrayBuffer eliminated the stack overflow issue I was having earlier.
This is the adjusted code:
http://jsfiddle.net/e23YQ/5/