Background tasks in JavaScript that do not interfere with UI interaction - javascript

I am working on a mobile App implemented in JavaScript that has to do lots of background stuff -- mainly fetching data from a server (via JSONP) and writing it into a database using local storage. In the foreground the user may navigate on the locally stored data and thus wants a smoothly reacting App.
This interaction (fetching data from the server via JSONP, doing some minor work with it, and storing it in the local database) is done asynchronously, but because of the DOM manipulation required for JSONP and the database interaction, I am not able to do this work with Web Workers. Thus I run into the problem of blocking the JavaScript event queue with lots of "background processing requests", and the App reacts very slowly (or even locks up completely).
Here is a small sketch of the way I am doing the background stuff:
var pendingTasksOnNewObjects = new Object();

// add callbacks to the queue
function addTaskToQueue(id, callback) {
    if (!pendingTasksOnNewObjects[id]) {
        pendingTasksOnNewObjects[id] = new Array();
    }
    pendingTasksOnNewObjects[id].push(callback);
}
// example of such a pending task:
var myExampleTask = function () {
    this.run = function (jsonObject) {
        // do interesting stuff here as soon as the specific new object arrives,
        // e.g. update some DOM elements to reflect the newly arrived object
    };
};
// runTasks() calls .run() on the queued entry, so an instance is queued
addTaskToQueue("id12345", new myExampleTask());
// method to fetch documents from the server via JSONP
function getDocuments(idsAsCommaSeparatedString) {
    var elem = document.createElement("script");
    elem.setAttribute("type", "text/javascript");
    elem.setAttribute("src", "http://serverurl/servlet/myServlet/documents/" + idsAsCommaSeparatedString);
    document.head.appendChild(elem);
}
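For context, the servlet's JSONP response is assumed to be a script body that simply invokes the callback below with the requested documents. A sketch of the assumed wire format (the field names other than docID are placeholders, not from the original post):

locallyStoreDocuments([
    { "docID": "id12345", "title": "..." },
    { "docID": "id12346", "title": "..." }
]);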
// "callback" method that is used in the JSONP result to process the data
function locallyStoreDocuments(jsonArray) {
var i;
for (i=0; i<jsonArray.length; i++) {
var obj = jsonArray[i];
storeObjectInDatabase(obj);
runTasks(obj.docID, obj);
}
return true;
}
// process tasks when new object arrives
function runTasks(id, jsonObject) {
    if (pendingTasksOnNewObjects[id]) {
        while (pendingTasksOnNewObjects[id][0]) {
            pendingTasksOnNewObjects[id][0].run(jsonObject);
            pendingTasksOnNewObjects[id].shift();
        }
    }
    return true;
}
I have already read up on the event processing mechanisms in JavaScript (e.g. John Resig's description of how timers work), but what I really want to do is the following: check whether there is something waiting in the event queue. If so, wait 100ms and check again. If nothing is currently waiting in the queue, process a small amount of data and check again.
As far as I read, this is not possible, since JavaScript does not have direct access to the event queue.
So, are there any design ideas for restructuring my code so that the background work no longer interferes with UI events, giving the user a smooth App interaction?
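(Since the event queue itself cannot be inspected, one common workaround is to process the fetched documents in small, time-boxed chunks and yield back to the browser between chunks. A minimal sketch along those lines, not from the original post; the 10ms budget and 0ms delay are assumptions to tune:)

// Time-sliced version of locallyStoreDocuments() that yields to the UI between chunks.
function locallyStoreDocumentsChunked(jsonArray) {
    var index = 0;
    function processChunk() {
        var start = Date.now();
        while (index < jsonArray.length && Date.now() - start < 10) {
            var obj = jsonArray[index++];
            storeObjectInDatabase(obj);
            runTasks(obj.docID, obj);
        }
        if (index < jsonArray.length) {
            setTimeout(processChunk, 0); // give pending UI events a chance to run
        }
    }
    processChunk();
}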

Web Workers seem to be available in iOS 5, http://caniuse.com/webworkers

Related

How to efficiently stream a real-time chart from a local data file

Complete noob here, picking up NodeJS over the last few days, and it looks like I've gotten myself into big trouble. I've currently got a working Node.js + Express server instance running on a Raspberry Pi, acting as a web interface for a local data acquisition script ("the DAQ"). When executed, the script writes data to a local file on the Pi in .csv format, updating in real time every second.
My Node app is a simple web interface to start the data acquisition script on click, plot previously acquired data logs, and visualize the data being actively collected in real time. Plotting old logs was simple: I wrote a JS function (using Plotly + d3) to read a local csv file via an AJAX call and plot it, using this script as a starting point, but with the logs served by Express rather than an external file.
When I went to translate this into a real-time plot, I started out using the setInterval() method to update the graph periodically, based on other examples. After dealing with a few unwanted recursion issues, and adjusting the interval to a more reasonable setting, I eliminated the memory/traffic issues which were crashing the browser after a minute or two, and things are mostly stable.
However, I need help with one thing primarily:
1. Improving the efficiency of my first-attempt approach: the acquisition script absolutely needs to write to file every second, but considering that a typical run might last 1-2 weeks, the file being re-requested in full on every interval loop will quickly balloon in size. I'm completely new to Node/Express, so I'm sure there's a much better way of doing the real-time rendering aspect of this - that's the real issue here. Any pointers toward a better way to go about this would be massively helpful!
2. Right now, the killDAQ() call issued by the "Stop" button kills the underlying python process writing the data to disk. Is there a way to hook into that same button click to also terminate the setInterval() loop updating the graph? There's no need for it to keep updating after the data acquisition has stopped, so having the single click do double duty would be ideal. I think that setting up a listener or a res/req approach would be an option, but pointers in the right direction would be massively helpful.
(Edit: I solved #2, using global window. variables. It's a hack, but it seems to work:
window.refreshIntervalId = setInterval(foo);
...
clearInterval(window.refreshIntervalId);
)
Thanks so much for the help!
MWE:
html (using Pug as a template engine):
doctype html
html
  body.default
    .container-fluid
      .row
        .col-md-5
          .row.text-center
            .col-md-6
              button#start_button(type="button", onclick="makeCallToDAQ()") Start Acquisition
            .col-md-6
              button#stop_button(type="button", onclick="killDAQ()") Stop Acquisition
        .col-md-7
          #myDAQDiv(style='width: 980px; height: 500px;')
javascript (start/stop acquisition):
function makeCallToDAQ() {
    fetch('/start_daq', {
        // call to app to start the acquisition script
    })
    .then(console.log(dateTime))
    .then(function(response) {
        console.log(response)
        setInterval(function(){ callPlotly(dateTime.concat('.csv')); }, 5000);
    });
}

function killDAQ() {
    fetch('/stop_daq')
        // kills the process
        .then(function(response) {
            // Use the response sent here
            alert('DAQ has stopped!')
        })
}
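(A slightly cleaner variant of the interval handling above, as a sketch rather than a definitive fix: keep the interval id in script scope and clear it from the same killDAQ() click handler instead of hanging it off window. dateTime and callPlotly are assumed to be defined as in the MWE.)

var refreshIntervalId = null;

function makeCallToDAQ() {
    fetch('/start_daq').then(function (response) {
        refreshIntervalId = setInterval(function () {
            callPlotly(dateTime.concat('.csv'));
        }, 5000);
    });
}

function killDAQ() {
    fetch('/stop_daq').then(function (response) {
        clearInterval(refreshIntervalId); // stop polling once acquisition stops
        alert('DAQ has stopped!');
    });
}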
javascript (call to Plotly for plotting):
function callPlotly(filename) {
    var csv_filename = filename;
    console.log(csv_filename)

    function makeplot(csv_filename) {
        // Read data via AJAX call and grab header names
        var headerNames = [];
        d3.csv(csv_filename, function(error, data) {
            headerNames = d3.keys(data[0]);
            processData(data, headerNames)
        });
    };

    function processData(allRows, headerNames) {
        // Plot data from relevant columns
        var plotDiv = document.getElementById("plot");
        var traces = [{
            x: x,
            y: y
        }];
        Plotly.newPlot('myDAQDiv', traces, plotting_options);
    };

    makeplot(filename);
}
node.js (the actual Node app):
// Start the DAQ
app.use(express.json());
var isDaqRunning = true;
var pythonPID = 0;
const { spawn } = require('child_process')
var process;

app.post('/start_daq', function(req, res) {
    isDaqRunning = true;
    // Call the python script here.
    const process = spawn('python', ['../private/BIC_script.py', arg1, arg2])
    pythonPID = process.pid;
    process.stdout.on('data', (myData) => {
        res.send("Done!")
    })
    process.stderr.on('data', (myErr) => {
        // If anything gets written to stderr, it'll be in the myErr variable
    })
    res.status(200).send(); //.json(result);
})

// Stop the DAQ
app.get('/stop_daq', function(req, res) {
    isDaqRunning = false;
    process.on('close', (code, signal) => {
        console.log(
            `child process terminated due to receipt of signal ${signal}`);
    });
    // Send SIGTERM to process
    process.kill('SIGTERM');
    res.status(200).send();
})
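One possible direction for point 1 (a sketch, not from the original post): have the client remember how many bytes it has already seen, and have Express stream only the newly appended bytes on each poll. The route path '/logdata', the 'offset' query parameter, and the CSV path below are all assumptions:

const fs = require('fs');

// Hypothetical route: return only the bytes appended since the client's last read.
app.get('/logdata', function (req, res) {
    const file = '../private/current_run.csv';          // assumed file path
    const offset = parseInt(req.query.offset, 10) || 0; // byte offset the client has already seen
    fs.stat(file, function (err, stats) {
        if (err) return res.status(500).send(err.message);
        if (stats.size <= offset) return res.json({ offset: offset, rows: '' });
        let rows = '';
        fs.createReadStream(file, { start: offset, end: stats.size - 1, encoding: 'utf8' })
            .on('data', function (chunk) { rows += chunk; })
            .on('end', function () { res.json({ offset: stats.size, rows: rows }); });
    });
});

The client would then append only the returned rows to its existing trace (for instance with Plotly.extendTraces) instead of re-plotting the whole file on every interval.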

NodeJS - Memory/CPU management with MongoJS Stream

I'm parsing a fairly large dataset from MongoDB (of about 40,000 documents, each with a decent amount of data inside).
The stream is being accessed like so:
var cursor = db.domains.find({ html: { $exists: true } });

cursor.on('data', function(rec) {
    i++;
    var url = rec.domain;
    var $ = cheerio.load(rec.html);
    checkList($, rec, url, i);
    // This "checkList" function parses HTML data with Cheerio to find different
    // elements on the page. Lots of if/else statements.
});

cursor.on('end', function(){
    console.log("Streamed all objects!");
})
Each record gets parsed with Cheerio (the record contains HTML data from a page scraped earlier), I process the Cheerio data to look for various selectors, and the results are then saved back to MongoDB.
For the first ~2,000 objects the data is parsed quite quickly (in ~30 seconds). After that it becomes far slower, around 50 records being parsed per second.
Looking in my Macbook Air's activity monitor I see that it's not using a crazy amount of memory (226.5mb / 8gb ram) but it is using a whole lot of CPU (io.js is taking up 99% of my cpu).
Is this a possible memory leak? The checkList() function isn't particularly intensive (or at least, as far as I can tell - there are quite a few nested if/else statements but not much else).
Am I meant to be clearing my variables after they're used, like setting $ = '' or similar? Any other reason why Node would be using so much CPU?
You basically need to "pause" the stream or otherwise "throttle" it so that it does not execute on every data item received straight away. The code in the "data" event handler does not wait for completion before the next event is fired, unless you stop the events from emitting.
var cursor = db.domains.find({ html: { $exists: true } });

cursor.on('data', function(rec) {
    cursor.pause();    // stop processing new events
    i++;
    var url = rec.domain;
    var $ = cheerio.load(rec.html);
    checkList($, rec, url, i);
    // if checkList() is synchronous then resume here
    cursor.resume();   // start events again
});

cursor.on('end', function(){
    console.log("Streamed all objects!");
})
If checkList() contains async methods, then pass in the cursor
checkList($, rec, url, i,cursor);
And process the "resume" inside:
function checkList(data, rec, url, i, cursor) {
    somethingAsync(args, function(err, result) {
        // We're done
        cursor.resume(); // start events again
    })
}
The "pause" stops the events emitting from the stream until the "resume" is called. This means your operations don't "stack up" in memory and wait for each to complete.
You probably want more advanced flow control for some parallel processing, but this is basically how you do it with streams.
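For example, a minimal sketch of bounded parallelism (assuming checkList() is adapted to accept a completion callback; the limit of 5 is arbitrary):

var MAX_IN_FLIGHT = 5;
var inFlight = 0;
var i = 0;

var cursor = db.domains.find({ html: { $exists: true } });

cursor.on('data', function(rec) {
    i++;
    inFlight++;
    if (inFlight >= MAX_IN_FLIGHT) cursor.pause();     // apply back-pressure
    var $ = cheerio.load(rec.html);
    checkList($, rec, rec.domain, i, function() {      // done-callback (assumed signature)
        inFlight--;
        if (inFlight < MAX_IN_FLIGHT) cursor.resume(); // let more events flow
    });
});

cursor.on('end', function(){
    console.log("Streamed all objects!");
});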

Running JS in a killable 'thread'; detecting and canceling long-running processes

Summary: How can I execute a JavaScript function, but then "execute" (kill) it if it does not finish within a timeframe (e.g. 2 seconds)?
Details
I'm writing a web application for interactively writing and testing PEG grammars. Unfortunately, the JavaScript library I'm using for parsing with a PEG has a 'bug' where certain poorly-written or unfinished grammars cause infinite execution (not even detected by some browsers). You can be happily typing along, working on your grammar, when suddenly the browser locks up and you lose all your hard work.
Right now my code is (very simplified):
grammarTextarea.onchange = generateParserAndThenParseInputAndThenUpdateThePage;
I'd like to change it to something like:
grammarTextarea.onchange = function(){
    var result = runTimeLimited( generateParserAndThenParseInput, 2000 );
    if (result) updateThePage();
};
I've considered using an iframe or other tab/window to execute the content, but even this messy solution is not guaranteed to work in the latest versions of major browsers. However, I'm happy to accept a solution that works only in latest versions of Safari, Chrome, and Firefox.
Web workers provide this capability—as long as the long-running function does not require access to the window or document or closures—albeit in a somewhat-cumbersome manner. Here's the solution I ended up with:
main.js
var worker, activeMsgs, userTypingTimeout, deathRowTimer;
killWorker(); // Also creates the first one

grammarTextarea.onchange = grammarTextarea.oninput = function(){
    // Wait until the user has not typed for 500ms before parsing
    clearTimeout(userTypingTimeout);
    userTypingTimeout = setTimeout(askWorkerToParse, 500);
}

function askWorkerToParse(){
    worker.postMessage({action:'parseInput', input:grammarTextarea.value});
    activeMsgs++;                                 // Another message is in flight
    clearTimeout(deathRowTimer);                  // Restart the timer
    deathRowTimer = setTimeout(killWorker, 2000); // It must finish quickly
}

function killWorker(){
    if (worker) worker.terminate();   // This kills the thread
    worker = new Worker('worker.js'); // Create a new worker thread
    activeMsgs = 0;                   // No messages are pending on this new one
    worker.addEventListener('message', handleWorkerResponse, false);
}

function handleWorkerResponse(evt){
    // If this is the last message, it responded in time: it gets to live.
    if (--activeMsgs == 0) clearTimeout(deathRowTimer);
    // **Process the evt.data.results from the worker**
}
worker.js
importScripts('utils.js'); // Each worker is a blank slate; must load libs

self.addEventListener('message', function(evt){
    var data = evt.data;
    switch(data.action){
        case 'parseInput':
            // Actually do the work (which sometimes goes bad and locks up)
            var parseResults = parse(data.input);
            // Send the results back to the main thread.
            self.postMessage({kind:'parse-results', results:parseResults});
            break;
    }
}, false);

How to structure my code to return a callback?

So I've been stuck on this for quite a while. I asked a similar question here: How exactly does done() work and how can I loop executions inside done()?
but I guess my problem has changed a bit.
So the thing is, I'm loading a lot of tweets and it's taking a while to process them all. To make up for that, I want to at least render the tweets that have already been processed onto my webpage, while continuing to process the stream of tweets at the same time.
loadTweets: function(username) {
    $.ajax({
        url: '/api/1.0/tweetsForUsername.php?username=' + username
    }).done(function (data) {
        var json = jQuery.parseJSON(data);
        var jsonTweets = json['tweets'];
        $.Mustache.load('/mustaches.php', function() {
            for (var i = 0; i < jsonTweets.length; i++) {
                var tweet = jsonTweets[i];
                var optional_id = '_user_tweets';
                $('#all-user-tweets').mustache('tweets_tweet', { tweet: tweet, optional_id: optional_id });
                configureTweetSentiment(tweet);
                configureTweetView(tweet);
            }
        });
    });
}
This is pretty much the structure of my code right now. I guess the problem is the for loop, because nothing displays until the for loop is done. So I have two questions:
How can I get the stream of tweets to display on my website as they're processed?
How can I make sure the Mustache.load() is only executed once while doing this?
The problem is that UI manipulation and JS operations all run in the same thread. To solve it, use a timer (setTimeout/setInterval) so that the JS operations are queued behind pending UI operations. You can also pass a parameter for the time interval (around 4 ms) so that browsers with a slower JS engine can still perform smoothly.
...
var i = 0;
var timer = setInterval(function() {
    var tweet = jsonTweets[i++];
    var optional_id = '_user_tweets';
    $('#all-user-tweets').mustache('tweets_tweet', {
        tweet: tweet,
        optional_id: optional_id
    });
    configureTweetSentiment(tweet);
    configureTweetView(tweet);
    if (i === jsonTweets.length) {
        clearInterval(timer);
    }
}, 4); // Interval between loading tweets
...
NOTE
The solution is based on the following assumptions:
You are manipulating the DOM in the configureTweetSentiment and configureTweetView methods.
The solution provided above is not ideal. Instead, you should build all the HTML in JavaScript first and append the final HTML string to the container in one go; you would see a drastic change in performance (seriously!). A sketch of this follows the list below.
You don't want to use web workers because they are not supported in older browsers. If that's not the case, and you are not manipulating the DOM in the configure methods, then web workers are the way to go for data-intensive operations.
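A minimal sketch of that "build once, append once" idea (tweetToHtml() is a hypothetical helper returning the markup for one tweet; it is not part of the original code):

var html = '';
for (var i = 0; i < jsonTweets.length; i++) {
    html += tweetToHtml(jsonTweets[i], '_user_tweets'); // hypothetical helper
}
$('#all-user-tweets').append(html); // a single DOM write instead of one per tweet
for (var j = 0; j < jsonTweets.length; j++) {
    configureTweetSentiment(jsonTweets[j]);
    configureTweetView(jsonTweets[j]);
}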

Why does the Segment.io loader script push method names/args onto a queue which seemingly gets overwritten?

I've been dissecting the following code snippet, which is used to asynchronously load the Segment.io analytics wrapper script:
// Create a queue, but don't obliterate an existing one!
var analytics = analytics || [];

// Define a method that will asynchronously load analytics.js from our CDN.
analytics.load = function(apiKey) {

    // Create an async script element for analytics.js.
    var script = document.createElement('script');
    script.type = 'text/javascript';
    script.async = true;
    script.src = ('https:' === document.location.protocol ? 'https://' : 'http://') +
        'd2dq2ahtl5zl1z.cloudfront.net/analytics.js/v1/' + apiKey + '/analytics.min.js';

    // Find the first script element on the page and insert our script next to it.
    var firstScript = document.getElementsByTagName('script')[0];
    firstScript.parentNode.insertBefore(script, firstScript);

    // Define a factory that generates wrapper methods to push arrays of
    // arguments onto our `analytics` queue, where the first element of the arrays
    // is always the name of the analytics.js method itself (eg. `track`).
    var methodFactory = function (type) {
        return function () {
            analytics.push([type].concat(Array.prototype.slice.call(arguments, 0)));
        };
    };

    // Loop through analytics.js' methods and generate a wrapper method for each.
    var methods = ['identify', 'track', 'trackLink', 'trackForm', 'trackClick',
                   'trackSubmit', 'pageview', 'ab', 'alias', 'ready'];
    for (var i = 0; i < methods.length; i++) {
        analytics[methods[i]] = methodFactory(methods[i]);
    }
};

// Load analytics.js with your API key, which will automatically load all of the
// analytics integrations you've turned on for your account. Boosh!
analytics.load('MYAPIKEY');
It's well commented and I can see what it's doing, but I'm puzzled when it comes to the methodFactory function, which pushes details (method name and arguments) of any method calls made before the main analytics.js script has loaded onto the global analytics array.
This is all well and good, but then if/when the main script does load, it seemingly just overwrites the global analytics variable (see last line here), so all that data will be lost.
I see how this prevents script errors in a web page by stubbing out methods which don't exist yet, but I don't understand why the stubs can't just be empty functions:
var methods = ['identify', 'track', 'trackLink', 'trackForm', 'trackClick',
               'trackSubmit', 'pageview', 'ab', 'alias', 'ready'];
for (var i = 0; i < methods.length; i++) {
    lib[methods[i]] = function () { };
}
What am I missing? Please, help me understand!
Ian here, co-founder at Segment.io—I didn't actually write that code, Calvin did, but I can fill you in on what it's doing.
You're right, the methodFactory is stubbing out the methods so that they are available before the script loads, which means people can call analytics.track without wrapping those calls in an if or ready() call.
But the methods are actually better than "dumb" stubs, in that they save the method that was called, so we can replay the actions later. That's this part:
analytics.push([type].concat(Array.prototype.slice.call(arguments, 0)));
To make that more readable:
var methodFactory = function (method) {
    return function () {
        var args = Array.prototype.slice.call(arguments, 0);
        var newArgs = [method].concat(args);
        analytics.push(newArgs);
    };
};
It tacks on the name of the method that was called, which means if I analytics.identify('userId'), our queue actually gets an array that looks like:
['identify', 'userId']
Then, when our library loads in, it unloads all of the queued calls and replays them into the real methods (that are now available) so that all of the data recorded before load is still preserved. That's the key part, because we don't want to just throw away any calls that happen before our library has the chance to load. That looks like this:
// Loop through the interim analytics queue and reapply the calls to their
// proper analytics.js method.
while (window.analytics.length > 0) {
    var item = window.analytics.shift();
    var method = item.shift();
    if (analytics[method]) analytics[method].apply(analytics, item);
}
analytics is a local variable at that point, and after we're done replaying, we replace the global with the local analytics (which is the real deal).
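Presumably that final step looks roughly like this (a sketch, not the actual library source):

window.analytics = analytics; // swap the plain queue array for the real library object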
Hope that makes sense. We're actually going to have a series on our blog about all the little tricks for 3rd-party Javascript, so you might dig that soon!
Not very related to the question, but it may be useful to those who googled the issue "segment not sends queued events".
In my code I assigned window.analytics to another variable at page loading stage:
let CLIENT = analytics;
Then I used this variable instead of using global analytics:
CLIENT.track();
CLIENT.page();
// etc
But I encountered a problem where sometimes events were sent, and sometimes nothing was sent. That "sometimes" varied between page reloads. Sometimes it would also ignore all events fired during page load and then, without a reload, start sending events bound after page load.
Then I debugged and found that CLIENT holds all the unsent events in a queue. Obviously they were put there by methodFactory(). Then I found this SO question. So this is what I think is happening:
CLIENT holds a reference to the stub analytics object, which routes calls through methodFactory(). Once Segment has fully loaded, it replaces window.analytics with the actual code, while CLIENT still holds a reference to the old window.analytics. That's why the "sometimes" happens: sometimes window.analytics was replaced by Segment before the main script that initializes CLIENT loaded, and sometimes the main script loaded earlier than the Segment script.
New code:
let CLIENT = undefined;

if (CLIENT) {
    CLIENT.page();
} else {
    window.analytics.page();
}
I need to have this CLIENT because I'm using the same analytics code for web and mobile. On mobile, CLIENT is initialized separately, while on the web window.analytics is always available.
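An alternative sketch (an assumption, not the poster's code) is to resolve the client at call time rather than at load time, so the reference can never go stale:

// Hypothetical accessor: mobileAnalytics stands in for the separately
// initialized mobile client mentioned above.
function getClient() {
    return window.mobileAnalytics || window.analytics;
}

getClient().page();
getClient().track('Signed Up');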
