Repeatedly re-writing JSON and getting undefined due to re-write - javascript

I am currently generating some data in Python and saving it to a JSON file. The data grows quite quickly, although not to the point where its size is an issue, and it is re-generated fairly fast. I am then reading this data into JavaScript at a very high rate. The problem is that, most of the time, when I request the JSON data the read returns undefined, because I re-write the JSON file each time and a request can land while the file is mid-write. I am looking for a solution that leaves the file in this undefined state for far less time; I could also switch to another method of transferring data between Python and JavaScript if any spring to mind.

Related

Store very large JSON Object in memory while parsing uploaded CSV file (approx. 8GB) on the UI

I have a feature where the user can select a CSV file of approx. 8 GB in the UI. Once the UI has the File object, I use Papa Parse to parse the CSV file, which works like a charm.
While parsing, I construct an object from each CSV record by doing some manipulation on the data. As the parsing proceeds, the size of the object keeps increasing, and ultimately parsing fails with the browser throwing an out-of-memory exception.
The constructed object will be considerably smaller, around 2 GB, after data manipulation of the CSV, but it fails even before that. Is there a way to handle such large objects in the UI?
You may not be using the parser correctly
Did I mention the file is huge?
See all configuration options: https://www.papaparse.com/docs#config
Try with worker: true
The most typical reason to use a web worker is if your web page becomes unresponsive during parsing. In other words, if it freezes and you can't click things or the scrolling becomes choppy.
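Beyond moving work off the main thread, the idea behind Papa Parse's `step` callback is to process each row as it streams in and keep only a small aggregate, instead of accumulating a multi-GB object. A minimal sketch of that pattern, with a stand-in `parseCsv` in place of the real `Papa.parse(file, { worker: true, step: ... })`:

```javascript
// Stand-in for a streaming CSV parser that invokes `step` per row,
// the same shape as Papa Parse's step callback.
function parseCsv(text, { step }) {
  for (const line of text.trim().split('\n')) {
    step({ data: line.split(',') });
  }
}

const csv = 'a,1\nb,2\nc,3';
let sum = 0;
const seen = [];
parseCsv(csv, {
  step: (row) => {
    seen.push(row.data[0]);     // keep only the fields you need per row
    sum += Number(row.data[1]); // aggregate instead of storing every row
  },
});
console.log(seen.join(''), sum); // abc 6
```

If the manipulated records themselves must survive, writing them incrementally to IndexedDB inside `step`, rather than holding them in one object, avoids the same out-of-memory failure.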

Merging millions of data using nodejs

I need help/tips.
I have a huge amount of JSON data that needs to be merged, sorted, and filtered. Right now it is separated into different folders, almost 2 GB of JSON files.
What I'm doing right now is:
reading all files inside each folder
appending the JSON-parsed data to an array variable inside my script
sorting the array variable
filtering
saving it to one file
I'm reconsidering: instead of appending parsed data to a variable, maybe I should store it inside a file? What do you think?
What approach is better when dealing with this kind of situation?
By the way, I'm running into a
JavaScript heap out of memory
error.
You could use some kind of database, e.g. MySQL with the MEMORY table engine, so the data would live in RAM only: it would be blazing fast and erased after a reboot, though you should truncate it after the operation anyway, since it's all temporary. Once the data is in a table, it is easy to filter/sort the required bits and fetch the data incrementally, say 1000 rows at a time, parsing as needed. You won't have to hold 2 GB of data inside JS.
2 GB of data would probably block your JS thread during loops and you would get a frozen app anyway.
If you use a file to hold temporary data in order to avoid a database, I recommend a temporary disk mounted in RAM, which gives much better I/O speed.
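A middle ground between "everything in one array" and a full database is an external merge: sort each file's records separately, then k-way merge the sorted chunks while filtering, so only one record per chunk needs to be resident at a time. A sketch, with plain arrays standing in for the per-file sorted chunks:

```javascript
// K-way merge of already-sorted chunks with filtering on the way out.
// In practice each chunk would be a sorted file read line by line, and
// `out` would be a write stream rather than an in-memory array.
function mergeSorted(chunks, keyFn, filterFn) {
  const idx = chunks.map(() => 0); // read position per chunk
  const out = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < chunks.length; i++) {
      if (idx[i] < chunks[i].length &&
          (best === -1 || keyFn(chunks[i][idx[i]]) < keyFn(chunks[best][idx[best]]))) {
        best = i;
      }
    }
    if (best === -1) break; // all chunks exhausted
    const rec = chunks[best][idx[best]++];
    if (filterFn(rec)) out.push(rec);
  }
  return out;
}

const merged = mergeSorted(
  [[{ id: 1 }, { id: 4 }], [{ id: 2 }, { id: 3 }]],
  (r) => r.id,
  (r) => r.id !== 3 // example filter
);
console.log(merged.map((r) => r.id).join(',')); // 1,2,4
```

This keeps peak memory proportional to the number of chunks, not the total data size, which is what makes the 2 GB case tractable in Node.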

PHP - file_get_contents() reading chunks gets slower with time

I'm running some tests to store large files locally with the IndexedDB API, and I'm using PHP with JSON (and AJAX on the JavaScript side) to receive the file data.
At the moment, I'm trying to fetch some videos and, to do so, I use the following PHP code:
$content_ar['content'] = base64_encode(file_get_contents("../video_src.mp4", false, NULL, $bytes_from, $package_size));
return json_encode($content_ar);
I know base64_encode produces about 1/3 more data than the original, but that's not the problem right now, as it's the only way I know to retrieve binary data without losing it along the way.
As you can see, I specify at which byte it has to start reading and how many bytes I want to retrieve. On my JS side, I know how much of the file I have already stored, and I ask the script for the bytes from actual_size to actual_size + $package_size.
What I'm seeing is that the script seems to run more slowly as time goes by, and depending on the file size. I'm trying to understand what happens there.
I've read that file_get_contents() stores the file contents in memory, so with big files it could be a problem (that's why I'm reading in chunks).
But seeing that it gets slower with big files (and over time), could it be that it's still loading the whole file into memory and then delivering the chunk I asked for? As in, it loads everything and then returns the part I demand?
Or is it reading everything up to $bytes_from + $package_size (which would explain why it gets slower over time, as that offset increases)?
If either of the above, is there any way to make it run more efficiently and improve performance? Maybe I need to do some operations before or after to free memory?
EDIT:
I've made a screenshot showing the difference (in ms) between the moment I make the call to get the file bytes I need and the moment I receive the AJAX response (before I do anything with the received data, so JavaScript has no impact on the timing). Here it is:
As you can see, it increases with every call.
I think the problem is the time it takes to reach the initial byte I need. It does not load the whole file into memory, but it is slow until it reaches the first byte to read, so as the starting point increases, it takes more time.
EDIT 2:
Could it have something to do with the fact that I'm JSON-encoding the base64 content? I've been running some performance tests and I've seen that setting $content_ar['content'] = strlen(base64_encode(file...)) takes much less time (when, theoretically, it's doing the same work).
However, if that's the case, I still can't understand why the slowness increases over time. Encoding the same number of bytes should take the same amount of time, shouldn't it?
Thank you so much for your help!

Read file from last document read - JSON stream

I'm receiving a stream from a server that is growing exponentially, and I need to check every minute for new data, process it, and ask for more the next minute.
The data consists of JSON documents; I receive on average ~600-700 documents per minute.
I have to avoid re-reading documents that have already been processed, for performance reasons.
Is it possible to read only the data received in the last minute?
You can use a circular buffer and fill it from a listener.
For example, store the last N documents or chunks there; the right choice mainly depends on your application's code.
This way, older data is discarded by design, and you don't have to deal with the stream's internals or with poorly designed workarounds.
It's a matter of defining the right size for the buffer, but that looks to me like a far easier problem.
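A minimal sketch of such a circular buffer (a capacity of 3 is just for illustration; in practice N would cover at least a minute's worth of documents, so ~700 or more here):

```javascript
// Fixed-capacity ring buffer: push() overwrites the oldest entry once
// full, so memory stays bounded no matter how long the stream runs.
class CircularBuffer {
  constructor(capacity) {
    this.buf = new Array(capacity);
    this.capacity = capacity;
    this.start = 0; // index of the oldest entry
    this.size = 0;
  }
  push(doc) {
    const end = (this.start + this.size) % this.capacity;
    this.buf[end] = doc;
    if (this.size < this.capacity) this.size++;
    else this.start = (this.start + 1) % this.capacity; // drop oldest
  }
  toArray() { // oldest -> newest
    return Array.from({ length: this.size },
      (_, i) => this.buf[(this.start + i) % this.capacity]);
  }
}

const ring = new CircularBuffer(3);
[1, 2, 3, 4, 5].forEach((d) => ring.push(d));
console.log(ring.toArray().join(',')); // 3,4,5
```

The stream listener calls `push` for every incoming document, and the once-a-minute job drains `toArray()` and remembers where it left off.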

What is the most efficient way in JavaScript to parse huge amounts of data from a file

What is the most efficient way in JavaScript to parse huge amounts of data from a file?
Currently I use JSON.parse to deserialize an uncompressed 250 MB file, which is really slow. Is there a simple and fast way to read a lot of data in JavaScript from a file without looping through every character? The data stored in the file is only a few floating-point arrays.
UPDATE:
The file contains a 3D mesh with 6 buffers (vertices, UVs, etc.). The buffers also need to be presented as typed arrays. Streaming is not an option, because the file has to be fully loaded before the graphics engine can continue. Maybe a better question is how to transfer huge typed arrays from a file to JavaScript in the most efficient way.
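For the typed-array case specifically, the key point is that raw bytes can be reinterpreted as a Float32Array with no parsing loop at all: in a browser you would fetch the binary file, call response.arrayBuffer(), and wrap the buffer in a typed-array view. A Node sketch of the round trip, with a tiny vertex buffer standing in for the mesh:

```javascript
// Sender side: the mesh buffer as raw floats.
const verts = new Float32Array([0.5, 1.25, -2.0]);

// Simulate the bytes arriving over the wire (e.g. the body of an
// XMLHttpRequest with responseType 'arraybuffer').
const wire = Buffer.from(verts.buffer, verts.byteOffset, verts.byteLength);

// Receiver side: reinterpret the bytes as floats. This is a view over
// the existing buffer, so no per-character parsing and no copy.
const received = new Float32Array(
  wire.buffer, wire.byteOffset,
  wire.byteLength / Float32Array.BYTES_PER_ELEMENT
);
console.log(received[2]); // -2
```

With 6 buffers, a small header recording each buffer's offset and length in the file is enough to slice out all six views from one ArrayBuffer.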
I would recommend a SAX-based or streaming parser for this kind of task in JavaScript.
DOM-style parsing would load the whole thing into memory, and that is not the way to go for large files like the one you mentioned.
For JavaScript-based SAX parsing (of XML) you might refer to
https://code.google.com/p/jssaxparser/
and for JSON you might write your own; the following link demonstrates how to write a basic SAX-style parser in JavaScript:
http://ajaxian.com/archives/javascript-sax-based-parser
Have you tried encoding it to a binary and transferring it as a blob?
https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest/Sending_and_Receiving_Binary_Data
http://www.htmlgoodies.com/html5/tutorials/working-with-binary-files-using-the-javascript-filereader-.html#fbid=LLhCrL0KEb6
There isn't a really good way of doing that, because the whole file is going to be loaded into memory, and parsing a file that big is liable to exhaust it. Could you instead add some paging for viewing the contents of that file?
Check if there are any plugins that allow you to read the file as a stream, that will improve this greatly.
UPDATE
http://www.html5rocks.com/en/tutorials/file/dndfiles/
You might want to read about the new HTML5 APIs for reading local files. You will still have the issue of downloading 250 MB of data, though.
I can think of one solution and one hack.
SOLUTION:
Extend the "split the data in chunks" idea: it boils down to the HTTP protocol. REST rests on the notion that HTTP has enough "language" for most client-server scenarios.
You can have the client set a request header (such as Range) to establish how much data it needs per request.
Then on the backend you have some options (see http://httpstatus.es):
Reply 413 if the server is simply unable to get that much data from the db.
417 if the server can reply, but not within the requested range.
206 with the provided chunk, letting the client know "there is more where that came from".
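The client side of that scheme can be sketched as computing the byte ranges up front and attaching each as a Range header on successive requests (the `bytes=from-to` header syntax and its pairing with 206 come from the HTTP spec; the fetch target would be your own endpoint):

```javascript
// Split a resource of `totalSize` bytes into Range header values of
// at most `chunkSize` bytes each, e.g. for fetch(url, { headers:
// { Range: ranges[i] } }) in a loop over the chunks.
function chunkRanges(totalSize, chunkSize) {
  const ranges = [];
  for (let from = 0; from < totalSize; from += chunkSize) {
    const to = Math.min(from + chunkSize, totalSize) - 1; // inclusive end
    ranges.push(`bytes=${from}-${to}`);
  }
  return ranges;
}

console.log(chunkRanges(10, 4)); // [ 'bytes=0-3', 'bytes=4-7', 'bytes=8-9' ]
```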
HACK:
Use a WebSocket to get the binary file, then use the HTML5 File API to load it into memory.
This is likely to fail, though, because it's not the download causing the problem but the parsing of an almost endless JS object.
You're out of luck in the browser. Not only do you have to download the file, you'll also have to parse the JSON regardless. Parse it on the server, break it into smaller chunks, store that data in a db, and query for what you need.
