Getting large AJAX responses in Chrome without crashing - javascript

I am working on a project which is aimed at the Chrome browser. Our goal which we would like to accomplish is to get a one million record array into the browser to work with the data. When I generated a test file that contained a million records it was a bit more than one gigabyte.
For reasons I will explain, I believe we can accomplish the goal if can get the browser to collect the garbage when necessary. I believe the browser holds the text of the AJAX responses when it doesn't need to and crashes for that reason.
Now, I can generate a million records within the browser and manipulate it as I need to. However, I have trouble sending the AJAX to the browser without crashing it.
Since sending one million crashes it, I tried sending batches of one hundred thousand. I can get two such batches across and parse the JSON. If I do not have a onreadystatechange on my AJAX call, I can make the call a number of times. Also, if I receive a hundred thousand records, I can go over it ten times and make the full array.
Because I seem to be able to actually hold one million records, I believe that, as I said, holding the response texts is overwhelming the browsers.
In order to try to get better memory management, I have pushed the AJAX resquests and parsing into a web worker. When the webworker gets the AJAX and makes the hundred thousand record array, it pushes it to the DOM thread. When the DOM thread has taken the data it has the web worker do another AJAX.
However, it still crashes.
I am open to using websockets or something else, if that would help somehow.
Here is the code in the DOM thread:
var iterations=3;
var url='hunthou.json';
var worker=new Worker('src/worker.js');
var count=0;
worker.addEventListener('message',function(e){
alert('count: '+count);
//bigArr=bigArr.concat(e.data);
console.log('e.data length: '+e.data.length);
bigArr[count]=e.data;
console.log('bigArr length: '+bigArr.length);
if(count<(iterations-1)){
worker.postMessage(url);
} else{
alert('done');
console.log('done');
worker.terminate();
console.log('bye');
}
count++;
});
worker.postMessage(url);
Here is the webworker:
var arr=[];
var request = new XMLHttpRequest();
request.onreadystatechange = function () {
var DONE = this.DONE || 4;
if (this.readyState === DONE){
arr=JSON.parse(request.responseText);
self.postMessage(arr);
arr.length=0;
request.responseText.length=0;
console.log('okay');
}
};
self.addEventListener('message', function(e) {
var url=e.data;
console.log('url: '+url);
request.open("GET",'../'+url,true);
request.send(null);
}, false);

Instead of sending the whole data at once.
You can create multiple requests which will retrieve chunks of data instead of retrieving whole data at once, this will prevent your browser from crashing.

Related

Call stack size exceeded on re-starting Node function

I'm trying to overcome Call stack size exceeded error but with no luck,
Goal is to re-run the GET request as long as I get music in type field.
//tech: node.js + mongoose
//import components
const https = require('https');
const options = new URL('https://www.boredapi.com/api/activity');
//obtain data using GET
https.get(options, (response) => {
//console.log('statusCode:', response.statusCode);
//console.log('headers:', response.headers);
response.on('data', (data) => {
//process.stdout.write(data);
apiResult = JSON.parse(data);
apiResultType = apiResult.type;
returnDataOutside(data);
});
})
.on('error', (error) => {
console.error(error);
});
function returnDataOutside(data){
apiResultType;
if (apiResultType == 'music') {
console.log(apiResult);
} else {
returnDataOutside(data);
console.log(apiResult); //Maximum call stack size exceeded
};
};
Your function returnDataOutside() is calling itself recursively. If it doesn't gets an apiResultType of 'music' on the first time, then it just keeps calling itself deeper and deeper until the stack overflows with no chance of ever getting the music type because you're just calling it with the same data over and over.
It appears that you want to rerun the GET request when you don't have music type, but your code is not doing that - it's just calling your response function over and over. So, instead, you need to put the code that makes the GET request into a function and call that new function that actually makes a fresh GET request when the apiResultType isn't what you want.
In addition, you shouldn't code something like this that keeping going forever hammering some server. You should have either a maximum number of times you try or a timer back-off or both.
And, you can't just assume that response.on('data', ...) contains a perfectly formed piece of JSON. If the data is anything but very small, then the data may arrive in any arbitrary sized chunks. It make take multiple data events to get your entire payload. And, this may work on fast networks, but not on slow networks or through some proxies, but not others. Instead, you have to accumulate the data from the entire response (all the data events that occur) concatenated together and then process that final result on the end event.
While, you can code the plain https.get() to collect all the results for you (there's an example of that right in the doc here), it's a lot easier to just use a higher level library that brings support for a bunch of useful things.
My favorite library to use in this regard is got(), but there's a list of alternatives here and you can find the one you like. Not only do these libraries accumulate the entire request for you with you writing any extra code, but they are promise-based which makes the asynchronous coding easier and they also automatically check status code results for you, follow redirects, etc... - many things you would want an http request library to "just handle" for you.

Non blocking Javascript and concurrency

I have code on a web-worker and because i can't post to it an object with methods(functions) , i dont know how to stop blocking the UI with this code:
if (data != 'null') {
obj['backupData'] = obj.tbl.data().toArray();
obj['backupAllData'] = data[0];
}
obj.tbl.clear();
obj.tbl.rows.add(obj['backupAllData']);
var ext = config.extension.substring(1);
$.fn.dataTable.ext.buttons[ext + 'Html5'].action(e, dt, button, config);
obj.tbl.clear();
obj.tbl.rows.add(obj['backupData'])
This code exports records from an html table. Data is an array and is returned from a web worker and sometimes can have 50k or more objects.
As obj and all the methods that it contains are not transferable to we-worker, when data length 30k ,40k or 50k or even more, the UI blocks.
which is the best way to do this?
Thanks in advance.
you could try wrapping the heavy work in an async function like a timeout to allow the engine to queue the whole logic and elaborate it as soon as it has time
setTimeout(function(){
if (data != 'null') {
obj['backupData'] = obj.tbl.data().toArray();
obj['backupAllData'] = data[0];
}
//heavy stuff
}, 0)
or , if the code is extremely long, you can try figure it out a strategy to split your code into chunk of operation and execute each chunk in a separate async function (timeout)
Best way to iterate over an array without blocking the UI
Update:
Sadly, ImmutableJS doesn't work at the moment across webworkers. You should be able to transfer the ArrayBuffer so you don't need to parse it back into an array. Also read this article. If your workload is that heavy, it would be best to actually send back one item at a time from the worker.
Previously:
The code is converting all the data into an array, which is immediately costly. Try returning an immutable data structure from web worker if possible. This will guarantee that it doesn't change when the references change and you can continue iterating over it slowly in batches.
The next thing you can do is to use requestIdleCallback to schedule small batches of items to be processed.
This way you should be able to make the UI breathe a bit.

Parsing a large JSON array in Javascript

I'm supposed to parse a very large JSON array in Javascipt. It looks like:
mydata = [
{'a':5, 'b':7, ... },
{'a':2, 'b':3, ... },
.
.
.
]
Now the thing is, if I pass this entire object to my parsing function parseJSON(), then of course it works, but it blocks the tab's process for 30-40 seconds (in case of an array with 160000 objects).
During this entire process of requesting this JSON from a server and parsing it, I'm displaying a 'loading' gif to the user. Of course, after I call the parse function, the gif freezes too, leading to bad user experience. I guess there's no way to get around this time, is there a way to somehow (at least) keep the loading gif from freezing?
Something like calling parseJSON() on chunks of my JSON every few milliseconds? I'm unable to implement that though being a noob in javascript.
Thanks a lot, I'd really appreciate if you could help me out here.
You might want to check this link. It's about multithreading.
Basically :
var url = 'http://bigcontentprovider.com/hugejsonfile';
var f = '(function() {
send = function(e) {
postMessage(e);
self.close();
};
importScripts("' + url + '?format=json&callback=send");
})();';
var _blob = new Blob([f], { type: 'text/javascript' });
_worker = new Worker(window.URL.createObjectURL(_blob));
_worker.onmessage = function(e) {
//Do what you want with your JSON
}
_worker.postMessage();
Haven't tried it myself to be honest...
EDIT about portability: Sebastien D. posted a comment with a link to mdn. I just added a ref to the compatibility section id.
I have never encountered a complete page lock down of 30-40 seconds, I'm almost impressed! Restructuring your data to be much smaller or splitting it into many files on the server side is the real answer. Do you actually need every little byte of the data?
Alternatively if you can't change the file #Cyrill_DD's answer of a worker thread will be able to able parse data for you and send it to your primary JS. This is not a perfect fix as you would guess though. Passing data between the 2 threads requires the information to be serialised and reinterpreted, so you could find a significant slow down when the data is passed between the threads and be back to square one again if you try to pass all the data across at once. Building a query system into your worker thread for requesting chunks of the data when you need them and using the message callback will prevent slow down from parsing on the main thread and allow you complete access to the data without loading it all into your main context.
I should add that worker threads are relatively new, main browser support is good but mobile is terrible... just a heads up!

Alternatives to Local Storage with JavaScript

I'm working on an app that needs to post frequently to the server using ajax, which slows down performance. As a workaround, I started testing localStorage as an alternative. The results are less than ideal, retrieving/setting localStorage frequently is super slow (something I was surprised to find), and kills performance.
On the other-hand, I don't really want to post to the server every time an event is fired, as this has pretty much the same performance as localStorage.
I'm wondering what other solutions there are?
Could I create a "data" object that gets continually updated when events are fired? With a timeout function that falls-back to post? If so how would you set that up?
What options are there for storing data on events that can be retrieved quickly? If someone could point me in the right direction I'd be super appreciative.
simple solution is to collect the information you wish to save in an array and save it once in a while
buffer = [];
// collect all mousemove events on #mydiv
$('#mydiv').mousemove(function(event) {
buffer.push(event.pageX + ", " + event.pageY)
})
createInterval(function() {
if (buffer.length > 0) { // save only if something happened
var to_transmit = buffer; // theoretically between this line and the next
// line you may miss an event, but in practice I wouldn't worry
buffer = []; // new buffer - next events will be inserted here
$.ajax({url: "/post/here/", data: to_transmit})
}
},
5000); // send ajax every 5 seconds

Socket.io: How to limit the size of emitted data from client to the websocket server

I have a node.js server with socket.io. My clients use socket.io to connect to the node.js server.
Data is transmitted from clients to server in the following way:
On the client
var Data = {'data1':'somedata1', 'data2':'somedata2'};
socket.emit('SendToServer', Data);
On the server
socket.on('SendToServer', function(Data) {
for (var key in Data) {
// Do some work with Data[key]
}
});
Suppose that somebody modifies his client and emits to the server a really big chunk of data. For example:
var Data = {'data1':'somedata1', 'data2':'somedata2', ...and so on until he reach for example 'data100000':'data100000'};
socket.emit('SendToServer', Data);
Because of this loop on the server...
for (var key in Data) {
// Do some work with Data[key]
}
... the server would take a very long time to loop through all this data.
So, what is the best solution to prevent such scenarios?
Thanks
EDIT:
I used this function to validate the object:
function ValidateObject(obj) {
var i = 0;
for(var key in obj) {
i++;
if (i > 10) { // object is too big
return false;
}
}
return false;
}
So the easiest thing to do is just check the size of the data before doing anything with it.
socket.on('someevent', function (data) {
if (JSON.stringify(data).length > 10000) //roughly 10 bytes
return;
console.log('valid data: ' + data);
});
To be honest, this is a little inefficient. Your client sends the message, socket.io parses the message into an object, and then you get the event and turn it back into a String.
If you want to be even more efficient then on the client side you should be enforcing max lengths of messages.
For even more efficiency (and to protect against malicious users), as packets come into Socket.io, if the length gets too long, then you should discard them. You'll either need to figure a way to extend the prototypes to do what you want or you'll need to pull the source and modify it yourself. Also, I haven't looked into the socket.io protocol but I'm sure you'll have to do more than just "discard" the packet. Also, some packets are ack-backs and nack-backs so you don't want to mess with those, either.
Side note: If you ONLY care about the number of keys then you can use Object.keys(obj) which returns an array of keys:
if (Object.keys(obj).length > 10)
return;
Probably you may consider switching to socket.io-stream and handle input stream directly.
This way you should join chunks and finally parse json input manually, but you have chance to close connection when input data length exceeds threshold you decide.
Otherwise (staying with socket.io approach) your callback won't be called until whole data stream were received. This doesn't stop your js main thread execution, but waste memory, cpu and bandwith.
On the other hand, if your only goal is to avoid overload of your processing algorithm you can continue limitting it by counting elements in the received object. For instance:
if (Object.keys(data).length > n) return; // Where n is your maximum acceptable number of elements.
// But, anyway, this doesn't control the actual size of each element.
EDIT: Because the question is about "how to handle server overload" You should check load balancing with gninx http://nginx.com/blog/nginx-nodejs-websockets-socketio/ - you could have additional servers in case one client is creating a bottleneck. The other servers would be available. Even if you solve this problem, there are still other problems, like client sending several small packets and so on.
The Socket.io -library seems to be a bit problematic, managing too big messages is not available at the websockets layer, there was a pull -request three years ago, which gives an idea how it might be solved:
https://github.com/Automattic/socket.io/issues/886
However, because WebSockets -protocol does have finite packet size it would allow you to stop processing of the packets if certain size has been achieved. The most effective way of doing this would be before the packet is stransformed to JavaScript heap. This means that you should handle the WebSocket transform manually - this is what the socket.io is doing for you but it does not take into account the packet size.
If you want to implement you own websocket layer, using this WebSocket -node implementation might be useful:
https://github.com/theturtle32/WebSocket-Node
If you do not need to support older browsers, using this pure websockets -approach might be suitable solution.
Well, I'll go with the Javascript side of the thing... let's say you don't want to allow users to go over a certain limit of data, you can just:
var allowedSize = 10;
Object.keys(Data).map(function( key, idx ) {
if( idx > allowedSize ) return;
// Do some work with Data[key]
});
this not only allows you to properly cycle through the elements of your object, it lets you limit easily. ( obviously this can also ruin your own pre-set requests )
Maybe destroy buffer size is what you need.
From the wiki:
destroy buffer size defaults to 10E7
Used by the HTTP transports. The Socket.IO server buffers HTTP request bodies up to this limit. This limit is not applied to websocket or flashsockets.

Categories

Resources