I'm seeing some strange behavior in my nodejs game server in which there appears to be concurrency. This is strange because Nodejs is supposed to run in one thread as it doesn't use any concurrency. The problem is that I have an update function that's repeatedly called using setImmediate(). In this function I am using an array in two places. However, this same array is also modified when the "disconnect" event fires (which is when the client disconnects from the server). So it so happens that when the timing aligns so that the disconnect event fires AFTER the first place in which the array is accessed in the update function but BEFORE the second place, the array is modified and so the server crashes when the array is attempted to be accessed in the second place.
Here's some code that might make this picture clear:
function update(){
for(var i = 0; i < gameWorlds.length; i++){
gameWorlds[i].update();
console.log("GAMEWORLDS LENGTH BEFORE: " + gameWorlds.length);
NetworkManager.sendToClient(gameWorlds[i].id, "gameupdate", gameWorlds[i].getState());
console.log("GAMEWORLDS LENGTH AFTER: " + gameWorlds.length);
gameWorlds[i].clearGameState();
}
}
setImmediate(update);
//in the NetworkManager module, the disconnect event handler:
socket.on("disconnect", function(){
for(var a = 0; a < sockets.length; a++){
if(sockets[a].id === socket.id){
sockets.splice(a, 1);
}
}
listenerFunction("disconnect", socket.id);
console.log("Client " + socket.id + " DISCONNECTED!");
});
//also in the NetworkManager module, the sendToClient function:
function sendToClient(clientId, messageName, data){
for(var i = 0; i < sockets.length; i++){
if(sockets[i].id === clientId){
sockets[i].emit(messageName, data);
}
}
}
//in the main module (the same one as the update function), the listener
//function that's called in the disconnect event handler:
function networkEventsListener(eventType, eventObject){
if(eventType === "disconnect"){
for(var i = 0; i < gameWorlds.length; i++){
if(gameWorlds[i].id === eventObject){
gameWorlds.splice(i, 1);
console.log("GAME WORLD DELETED");
}
}
}
}
Now, I have a socketio event listener set up for when the client disconnects in which an element in the array is deleted. When this event occurs RIGHT in between the first and second places the array is accessed (as shown above), my server crashes. Either threads are being used or my function is stopped to let the event handler execute and then my function is resumed. Either way, I don't want this to be happening. Thank you!
EDIT 1: I edited the code to incorporate the console logs I have in my code. The reason why I am saying my loop is getting interrupted is because of the fact that the second console log outputs a length of 0 while the first console log outputs it greater than 0. Also, there is another console log in the disconnect event handler which FIRES in between the two console logs in my update function. This means that my function is getting interrupted.
EDIT 2: Thank you for all your replies I really appreciate it. I think there's been some confusion regarding:
1. The fact that no one has acknowledged how the console logs are appearing. In my previous edit, I changed the code to reflect how I am logging to see the problem. The issue is that in the disconnect event handler, I have a console log which is happening in between the two console logs in the loop. I.e. the disconnect event handler executes BEFORE the second console log is reached in the loop. Unless I am confused about the implementation of the console log function, the logs should be happening in the correct order (that is that the two console logs in the loop should always occur before any other console log in the rest of the program due to the ASYNC nature as most of you have stated.) But this is not the case, which leads me to believe something strange is happening.
2. None of the code inside the loop is changing the array. In a lot of your replies, you assume that there is code which actually modifies the array INSIDE the loop, which is not the case. The only code that modifies the array is code OUTSIDE of the loop, which is why it's very strange that the first part of the loop in which the array is accessed doesn't crash but the second part does, even though the code in between DOESN'T change the array.
EDIT 3: Ok so a lot of the replies have been asking for the COMPLETE code. I have updated the code with all the relevant REAL code.
Javascript in node.js is single threaded. A given thread of execution in Javascript will NOT be interrupted by a socket.io disconnect event. That physically can't happen. node.js is event driven. When the disconnect event happens, an event will be put into the Javascript event queue and ONLY when your current thread of execution is done will Javascript grab the next event out of the event queue and call the callback associated with it.
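A minimal sketch of that queueing behavior, with setImmediate() standing in for the disconnect event (the names `log` and `seenDuringLoop` are made up for illustration and are not from the question's code):

```javascript
// A callback queued while synchronous code is running cannot run
// until that code has returned to the event loop.
const log = [];

// Stand-in for an event such as "disconnect" arriving: it is only queued here.
setImmediate(() => log.push("queued callback"));

// Synchronous work: nothing else can run in the middle of this loop.
for (let i = 0; i < 3; i++) {
  log.push("loop " + i);
}

// Snapshot taken at the end of the synchronous code.
const seenDuringLoop = log.slice();
console.log(seenDuringLoop); // the queued callback has not run yet
```

The queued callback only gets its turn after the last synchronous statement has finished.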
You don't show enough of your real code to know for sure, but what could be happening is if you have asynchronous operations, then when you start an async operation and register a callback for its completion, then you are finishing that Javascript thread of execution and it is merely a race to see which async event happens next (the completion of this specific async operation or the disconnect event from the socket.io disconnect). That is indeterminate and those events can happen in any order. So, if you have async code in the code in question, then the disconnect event can get processed while that code is waiting for a completion of an async event.
That is the type of race conditions that you have to be aware of in node.js programming. Anytime your logic goes asynchronous, then other things can get processed in node.js while your code is waiting for the asynchronous callback that signals the operation is complete.
What exactly to do about this depends entirely upon the exact situation and we would need to see and understand your real code (not pseudo code) to know which option to best recommend to you. FYI, this is one of the reasons we can always help you better if you show us your real code, not just pseudo code.
Here are some of the techniques that can be used when you are operating with async operations on a shared data structure that could be changed by other async code:
Make a copy of the data you want to process so no other code has access to your copy so it can't be modified by any other code. This might be making a copy of an array or it might be just using a closure to capture an index locally so the index can't be impacted by other code.
Use a flag to protect a data structure that is in the middle of being modified and train all other code to respect that flag. How exactly to do this depends upon the specific data. I have code in a Raspberry Pi node.js app that regularly saves data to disk and is subject to a race condition where other event driven code may want to update that data while I'm in the middle of using async I/O to write it to disk. Because the data is potentially large and the memory of the system not so large, I can't make a copy of the data as suggested in the first point. So, I used a flag to indicate that I'm in the middle of writing the data to disk and any code that wishes to modify the data while this flag is set, adds its operations to a queue rather than directly modifies the data. Then, when I'm done writing the data to disk, the code checks the queue to see if any pending operations need to be carried out to modify the data. And, since the data is represented by an object and all operations on the data are carried out by methods on the object, this is all made transparent to the code using the data or trying to modify the data.
Put the data in an actual database that has concurrency features and controls built into it so that it can make atomic changes to the data or data can be locked for brief periods of time or data can be fetched or updated in a safe way. Databases have lots of possible strategies for dealing with this since it happens with them a lot.
Make all accesses to the data be asynchronous so if some other async operation is in the middle of modifying the data, then other unsafe attempts to access the data can "block" until the original operation is done. This is one technique that databases use. You do, of course, have to watch out for deadlocks or for error paths where the flags or locks aren't cleared.
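As a rough illustration of the flag-and-queue technique described above, here is a small sketch. `GuardedList`, `begin()`, `end()` and `remove()` are hypothetical names chosen for this example; the real Raspberry Pi code is not shown in this answer and may look quite different.

```javascript
// Hypothetical wrapper: while `busy` is set, modifications are queued
// instead of being applied directly.
class GuardedList {
  constructor(items) {
    this.items = items;
    this.busy = false;  // set while an async operation is using the data
    this.pending = [];  // modifications deferred while busy
  }

  // Mark the data as in use by a long-running async operation.
  begin() {
    this.busy = true;
  }

  // Clear the flag and apply everything that was queued in the meantime.
  end() {
    this.busy = false;
    const queued = this.pending;
    this.pending = [];
    for (const op of queued) op(this.items);
  }

  // Remove an item by id, now or later depending on the flag.
  remove(id) {
    const op = (items) => {
      const index = items.findIndex((item) => item.id === id);
      if (index !== -1) items.splice(index, 1);
    };
    if (this.busy) this.pending.push(op);
    else op(this.items);
  }
}

const worlds = new GuardedList([{ id: "a" }, { id: "b" }]);
worlds.begin();                              // e.g. an async write to disk starts
worlds.remove("a");                          // arrives mid-operation, so it is queued
const lengthWhileBusy = worlds.items.length;
worlds.end();                                // operation finished, queue is drained
const lengthAfter = worlds.items.length;
console.log(lengthWhileBusy, lengthAfter);
```

Because every modification goes through a method on the object, callers do not need to know whether their change was applied immediately or deferred.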
Some new comments based on your posting of more code:
This code is just wrong:
//in the main module (the same one as the update function), the listener
//function that's called in the disconnect event handler:
function networkEventsListener(eventType, eventObject){
if(eventType === "disconnect"){
for(var i = 0; i < gameWorlds.length; i++){
if(gameWorlds[i].id === eventObject){
gameWorlds.splice(i, 1);
console.log("GAME WORLD DELETED");
}
}
}
}
When you call .splice() in the middle of a for loop on the array you are iterating, it causes you to miss an item in the array you are iterating. I don't know if this has anything to do with your issue, but it is wrong. One simple way to avoid this issue is to iterate the array backwards. Then calling .splice() will not influence the position of any of the array elements that you have not yet iterated and you won't miss anything in the array.
Same issue in the for loop in your disconnect handler. If you only ever expect one array element to match in your iteration, then you can break right after the splice() and this will avoid this issue and you won't have to iterate backwards.
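A short sketch of the backwards iteration, using a made-up `removeById` helper and sample data rather than your actual gameWorlds array:

```javascript
// Remove every element with a matching id, iterating from the end so that
// splice() never shifts elements that have not been visited yet.
function removeById(list, id) {
  for (let i = list.length - 1; i >= 0; i--) {
    if (list[i].id === id) {
      list.splice(i, 1);
    }
  }
  return list;
}

// Two adjacent matches: a forward loop would skip the second one.
const sample = [{ id: 1 }, { id: 2 }, { id: 2 }, { id: 3 }];
removeById(sample, 2);
console.log(sample.map((item) => item.id));
```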
Two things I think you should change to fix the problem.
1) Don't change the length of the array when a disconnect occurs; instead, set a falsy value for that player, such as a boolean or a 0/1 flag.
2) Add an if statement that checks whether the value is falsy for player two. That way you'll know they disconnected and don't deserve to have anything because they're lame and couldn't watch the loser screen.
That should fix the issue, and you can decide what to do if they're too lazy to stay and watch the winning/losing ceremony of your game.
var gameWorld = [ ];
function update(){ // some code } is async and is pushed to the event loop.
function disconnect(){ // some code } is also async and gets pushed to the event loop.
Even though update() is running on the call stack it's waiting for the event loop and it doesn't mean that it'll complete its execution before the next tick occurs. gameWorld is outside both scopes, so it can be modified in the middle of update(). So when update() tries to access the array again it's different than when it started.
disconnect() is called before update() finishes and modifies the array on the event loop nexttick(), thus by the time the code for update() gets to the second player, bam, the array is messed up.
Even if you have an event listener, execution should not just stop mid function. When the event occurs, node will push the event callback onto the event queue. Then when node finishes executing the current function it will start processing the other requests in the queue. You can't be sure of the order things will execute, but you can be sure that things will not get interrupted mid execution.
If your doWhatever function is async then the problem may be occurring because when node finally gets around to servicing the requests in the queue the loop has already finished, therefore every time doWhatever is called it is being called with the same index (whatever its last value was).
If you want to call async functions from a loop then you should wrap them in a function to preserve the arguments.
e.g.
function doWhateverWrapper(index){
theArray[index].doWhatever();
}
function update(){
for(var i = 0; i < theArray.length; i++){
//first place the array is accessed
doWhateverWrapper(i);
// ...more code...
//second place the array is accessed
doWhateverWrapper(i);
}
}
setImmediate(update);
I'm following a Node.js tutorial, it gave me the following code to start:
process.stdout.write("I'm thinking of a number from 1 through 10. What do you think it is? \n(Write \"quit\" to give up.)\n\nIs the number ... ");
let playGame = (userInput) => {
let input = userInput.toString().trim();
testNumber(input);
};
It asked me to finish the app, so I did this, and it worked:
process.stdin.on('data', (userInput) => {
let input = userInput.toString()
playGame(input)
//console.log(input)
});
However when I clicked the "check your work" button, it said that I did it wrong. The correct answer ended up being this:
process.stdin.on('data', (userInput) => {
let input = userInput.toString()
//console.log(input)
});
process.stdin.on('data', playGame)
I have a couple of questions about this. 1. Why does the playGame method need to be called in a listener rather than just explicitly calling it in the same method that grabs the user's data? 2. Why doesn't this example create a race condition? Don't events emitted with the 'data' name fire at the same time? Sorry if these are basic questions, just trying to understand what I'm learning. Thanks in advance for any insight.
What were the exact instructions in your tutorial?
process.stdin is a Stream, which in turn is an EventEmitter. The data event callbacks will be executed when the emitter emits the data event. Here is what the documentation says regarding emit():
Synchronously calls each of the listeners registered for the event
named eventName, in the order they were registered, passing the
supplied arguments to each.
So as far as I understand, the two solutions are functionally equivalent - more specifically, splitting the code in two separate callbacks will not allow the event loop to execute any other code in between. Note that you are still doing a redundant string conversion with your solution.
Also, there is no race condition since, again, the callbacks will be executed one after the other in the order in which they are registered. There actually is one single execution thread in the javascript execution model - so essentially only a single piece of javascript code can execute at any given point in time.
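Here is a small sketch of that ordering, using a plain EventEmitter instead of process.stdin so it runs without any input (the `calls` array is only there to record the order):

```javascript
const EventEmitter = require("events");

const emitter = new EventEmitter();
const calls = [];

// Listeners are invoked in the order they were registered.
emitter.on("data", () => calls.push("first listener"));
emitter.on("data", () => calls.push("second listener"));

calls.push("before emit");
emitter.emit("data"); // both listeners run here, before emit() returns
calls.push("after emit");

console.log(calls);
```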
I have a problem using eventemitter.emit method.
Basically this is what I want to do. I have a long-running, CPU-bound process that generates output objects, and since it is CPU-bound I run it as a separate process using fork().
class Producer extends EventEmitter {
  constructor() {
    super();
    this.on('MyEvent', this.produce);
  }
  produce(input) {
    var output = longRunningProcess();
    this.emit('MyEvent', output);
    process.send(output);
  }
}
var producer = new Producer();
producer.emit('MyEvent', 0); // To kick off the execution
And once each output is generated, I want to send it to the parent process. And also use it to emit an event to produce another object and so on.
Now, the problem is that the process.send(output) doesn't seem to be executed. I can see the outputs being printed in the console one after another. But the parent doesn't seem to be receiving anything from the child process. In my understanding, the nodejs event loop shouldn't pick up a new task until it finishes the current one and the stack is empty, but this is not the case here.
So can you guys help me with this?
Edit: Parent process code
this.producer = ChildProcess.fork('.path/to/produer.js', { silent: true });
this.producer.on('message', (data) => {
this.miningProcess.send({ type: "StopMining", body: 0 });
});
It looks to me like you may be starving the event loop (never giving it any cycles to process incoming events) which can wreck the ability to process networking, even outbound networking. I'd suggest that you start the next iteration only after the process.send() has completed.
const EventEmitter = require('events');
class Producer extends EventEmitter {
  constructor() {
    super(); // required before using `this` in a derived class
    this.on('MyEvent', this.produce.bind(this));
  }
  produce(input) {
    let output = longRunningProcess(); // your CPU-bound function
    process.send(output, () => {
      // When the send finishes, start the next iteration
      // This should allow the node.js event queue to process things
      this.emit('MyEvent', output);
    });
  }
}
var producer = new Producer();
producer.emit('MyEvent', 0); // To kick off the execution
Other comments of note:
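Using this.produce.bind(this) on your event handler instead of just this.produce makes the this value explicit when that function is called. Node's EventEmitter already calls ordinary (non-arrow) listener functions with this set to the emitter, so here the bind is a safeguard rather than a strict requirement.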
Keep in mind that eventEmitter.emit() is synchronous. It does not allow the event queue to process events and eventEmitter events do not go through the event queue.
This code assumes that the process.send() callback is called asynchronously and gives the event loop enough chances to process any events that are waiting. It also makes sure the interprocess message is completely sent before you start the next CPU intensive iteration which will temporarily block the event queue processing again. This way, you are sure the whole communication is done before blocking the event queue again.
You probably could have made things work with an appropriately placed setTimeout() to kick off the next iteration, but I think it's more reliable to make sure the interprocess messaging is done before kicking off the next iteration.
FYI, if you're not using the EventEmitter you derive from for anything other than is shown here, then it isn't really needed. You could just call methods on your object directly rather than using EventEmitter events.
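For comparison, the timer-based alternative mentioned above could look roughly like the sketch below. It uses setImmediate() rather than setTimeout(), and `runIterations` is a hypothetical helper, not part of your code. Each iteration is scheduled as its own task so that waiting events get a turn between iterations.

```javascript
// Hypothetical helper: run `work` `count` times, yielding to the event loop
// between iterations instead of looping (or emitting) synchronously.
function runIterations(count, work, done) {
  let i = 0;
  function step() {
    if (i >= count) {
      done();
      return;
    }
    work(i++);
    setImmediate(step); // let pending events run before the next iteration
  }
  setImmediate(step);
}

const order = [];
runIterations(3, (i) => order.push("work " + i), () => {
  order.push("done");
  console.log(order);
});

// Stand-in for another event (for example an IPC message) waiting its turn.
setImmediate(() => order.push("other event"));
```

Unlike the process.send() callback approach, this does not guarantee that the message has actually been sent before the next CPU-heavy iteration starts; it only gives the event loop a chance to run.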
My code includes a function with a lengthy for loop. Near the end of the loop is a code line that, with each iteration, increases the width of a div that serves as a status bar. The problem? The status bar's width only increases if there is an alert line above it to alert me of the current loop counter. When I comment out that alert, the web page freezes, I hear a lot of computer whirring, and eventually I get a "A web page is slowing down your browser. What would you like to do?" browser message. The function is outlined below:
function processData(dataArr) {
// initialize the status bar
var bar = document.createElement('div');
bar.setAttribute("style","top:7px;left:125px;height:25px;width:400px");
bar.id="StatusBar";
document.body.appendChild(bar);
// now start the loop:
for (var i=0;i<dataArr.length;i++) {
// do a lot of string manipulation and parsing on dataArr[i]
// update the status bar every 50 iterations
if (Math.ceil(i/50) == Math.floor(i/50)) {
alert('Loop # '+(i+1)); // If this line is commented out, the status bar stops updating!
}
bar.style.width = Math.round(((1+i)/dataArr.length)*400)+'px';
}
// loop has finished last iteration, remove status bar
document.body.removeChild(bar);
}
The alert line noted above is the line whose presence makes the difference between the code functioning normally (data gets processed, the status bar grows with loop counter) or the web page freezing and getting a web page is slowing down your browser warning. Can anyone explain why a seemingly benign alert line would so dramatically change the functionality of this code?
The following fact is likely irrelevant, but just in case it isn't: the above function is called from within an onload: function() code block of a xmlhttpRequest in a second function.
Updated: I just found another possibly-related stackoverflow question which suggests maybe the underlying cause is that this function is called from within the onload block of an xmlhttpRequest. The answers to that question do not provide a solution to my problem, however, and the questions do not deal with quite the same topic (I am using a userscript, and page-loading does not seem to be the problem, as I am able to generate the initial status bar without any difficulty. With the exception of updating the status bar at the end of each loop, no web page manipulation is being conducted within the loop.)
2nd update: corrected so that the brackets are correctly closed
When inside a for loop, the presence of a seemingly innocuous alert allows a CSS command to immediately change a div (the div change command immediately follows the alert inside the loop) with each loop increment, because the alert command invokes the event loop to wait for a response from the user (e.g., hitting the OK button on the alert box). During this wait, the DOM changes can be rendered. Without the alert command (or some alternative way such as timers to explicitly cause real-time DOM change rendering), changes are not rendered on the web page until all Javascript functions are complete and the code's execution has returned to the browser's main event loop. In my case, this point in the code would be when the function returned to the second function that called it, within the onload: block of an xmlhttpRequest.
An explicit illustration of this behavior: while the CSS command to increase div width was supposed to occur at the end of each for loop increment, the div (serving as a status bar) instead only grew on every 50th loop iteration, the same frequency that the alert was coded to execute. To satisfy the need for a status bar to show data processing progress, I restructured the work flow in my code: instead of waiting until all chunks of data from the second function's xmlhttpRequest had been downloaded before calling the above function, I called the above function to process each chunk of data as it was downloaded. The status bar that was working well within the xmlhttpRequest block now represents the combined download and processing of the data.
This is a very simple use case. Show an element (a loader), run some heavy calculations that eat up the thread and hide the loader when done. I am unable to get the loader to actually show up prior to starting the long running process. It ends up showing and hiding after the long running process. Is adding css classes an async process?
See my jsbin here:
http://jsbin.com/voreximapewo/12/edit?html,css,js,output
To explain what a few others have pointed out: This is due to how the browser queues the things that it needs to do (i.e. run JS, respond to UI events, update/repaint how the page looks etc.). When a JS function runs, it prevents all those other things from happening until the function returns.
Take for example:
function work() {
var arr = [];
for (var i = 0; i < 10000; i++) {
arr.push(i);
arr.join(',');
}
document.getElementsByTagName('div')[0].innerHTML = "done";
}
document.getElementsByTagName('button')[0].onclick = function() {
document.getElementsByTagName('div')[0].innerHTML = "thinking...";
work();
};
(http://jsfiddle.net/7bpzuLmp/)
Clicking the button here will change the innerHTML of the div, and then call work, which should take a second or two. And although the div's innerHTML has changed, the browser doesn't have chance to update how the actual page looks until the event handler has returned, which means waiting for work to finish. But by that time, the div's innerHTML has changed again, so that when the browser does get chance to repaint the page, it simply displays 'done' without displaying 'thinking...' at all.
We can, however, do this:
document.getElementsByTagName('button')[0].onclick = function() {
document.getElementsByTagName('div')[0].innerHTML = "thinking...";
setTimeout(work, 1);
};
(http://jsfiddle.net/7bpzuLmp/1/)
setTimeout works by putting a call to a given function at the back of the browser's queue after the given time has elapsed. The fact that it's placed at the back of the queue means that it'll be called after the browser has repainted the page (since the previous HTML changing statement would've queued up a repaint before setTimeout added work to the queue), and therefore the browser has had chance to display 'thinking...' before starting the time consuming work.
So, basically, use setTimeout.
Let the current frame render and start the process after setTimeout(1).
Alternatively, you could query a property and force a repaint like this: element.clientWidth.
More as a "what is possible" answer: you can run your calculations on a separate thread using HTML5 Web Workers.
This will not only make your loading icon appear but also keep it loading.
More info about web workers : http://www.html5rocks.com/en/tutorials/workers/basics/
I will try to explain my actual setup, the idea behind it, what breaks, what I've tried around it.
The context
I have a PHP5.3 backend feeding "events" (an event being a standard array containing some data, among which a unique sequential number) to Javascript (with jQuery 1.7.x). The events are retrieved using jsonp (on a subdomain) and long-polling on the server side. The first event has the id 1, and then it increments with each new event. The client keeps track of the "last retrieved event id", and that value starts at 0. With each long-polling request, it provides that id so the backend only returns events that occurred after that one.
Events are processed in the following manner: Upon being received (through the jsonp callback), they are stored in an eventQueue variable and "the last retrieved event id" is updated to the one of the last event received and stored in the queue. Then a function is called that processes the next queued event. That function checks whether an event is already being processed (through the means of another variable that is set whenever an event is starting to get processed); if there is, it does nothing, so the callstack brings us back to the jsonp callback where a new long-polling request is emitted. (That will repeat the process of queueing new events while the others are processed.) However, if there is no event currently being processed, it verifies if there are events left in the queue, and if so it processes the first one (the one with the lowest id). "Processing an event" can be various tasks pertinent to my application, but not to the problem I have or to the context. For example, updating a variable, a message on the page, etc. Once an event is deemed "done being processed" (some events make an ajax call to get or send data, in which case this happens in their success ajax callback), a call to another function called eventComplete is made. That function deletes the processed event from the event queue, makes sure the variable that handles whether an event is being processed is set to false, and then calls the function that processes the event queue. (So it processes the next, lowest id, event.)
The problem
This works really well, on all tested major browsers too. (Tested on Internet Explorer 8 and 9, Chrome, Opera, Firefox.) It also is very snappy due to the utilization of long polling. It's also really nice to get all the "history" (most events generate textual data that gets appended in a sort of console in the page) of what has happened and be in the exact same state of the application, even after reloading the page. However, this also becomes problematic when the number of events gets high. Based on estimates, I would need to be able to handle as many as 30,000 events. In my tests, even at 7,000 events things start to go awry. Internet Explorer 8 hits a stack overflow at around 400 events. Chrome doesn't load all events, but gets close (and breaks, not always at the same point however, unlike IE8). IE9 and FF handle everything well, and hang 2-3 seconds while all events are processed, which is tolerable. I'm thinking however that it might just be a matter of some more events before they break as well. Am I being just too demanding of current web browsers, or is there something I got wrong? Is there a way around that? Is my whole model just wrong?
Possible solutions
I fiddled around with some ideas, none of which really worked. I tried forcing the backend to not output more than 200 events at a time and adding the new poll request after all the current queue was done processing. Still got a stack overflow. I also tried deleting the eventQueue object after it's done processing (even though it is empty then) and recreating it, in the hope that maybe it would free some underlying memory or something. I'm short on ideas, so any idea, pointer or general advice would be really appreciated.
Edit:
I had an enlightenment! I think I know exactly why all of this is happening (but I'm still unsure on how to approach it and fix it), I will provide some basic code excerpts too.
var eventQueue = new Object();
var processingEvent = false;
var lastRetrievedEventId = 0;
var currentEventId = 0;
function sendPoll() {
// Standard jsonp request (to an intentionally slow backend, i.e. long-polling),
// callback set to pollCallback(). Provide currentEventId to the server to only get
// the events starting from that point.
}
function pollCallback( data ) {
// Make sure the data isn't empty, this happens if the jsonp request
// expires (30s in my case) and it didn't get any new data.
if( !jQuery.isEmptyObject( data ) )
{
// Add each new event to the event queue.
$.each(data.events, function() {
eventQueue[ this.id ] = this;
lastRetrievedEventId = this.id; // Since we just put the event in the queue, we know it is officially the last one "retrieved".
});
// Process the next event, we know there has to be at least one in queue!
processNextEvent();
}
// Go look for new events!
sendPoll();
}
function processNextEvent() {
// Do not process events if they are currently being processed, that would happen
// when an event contains an asynchronous function, like an AJAX call.
if( !processingEvent )
{
var nextEventId = currentEventId + 1;
// Before accessing it directly, make sure the "next event" is in the queue.
if( Object.prototype.hasOwnProperty.call(eventQueue, nextEventId) )
{
processingEvent = true;
processEvent( eventQueue[ nextEventId ] );
}
}
}
function processEvent( event ) {
// Do different actions based on the event type.
switch( event.eventType ) {
case SOME_TYPE:
// Do stuff pertaining to SOME_TYPE.
eventComplete( event );
break;
case SOME_OTHER_TYPE:
// Do stuff pertaining to SOME_OTHER_TYPE.
eventComplete( event );
break;
// Etc. Many more cases here. If there is an AJAX call,
// the eventComplete( event ) is placed in the success: callback
// of that AJAX call, I do not want events to be processed in the wrong order.
}
}
function eventComplete( event ) {
// The event has completed, time to process the event after it.
currentEventId = event.id; // Since it was fully processed, it is now the most current event.
delete eventQueue[ event.id ]; // It was fully processed, we don't need it anymore.
processingEvent = false;
processNextEvent(); // Process the next event in queue. Most likely the source of all my woes.
}
function myApplicationIsReady() {
// The DOM is fully loaded, my application has initiated all its data and variables,
// start the long polling.
sendPoll();
}
$(function() {
// Initializing my application.
myApplicationIsReady();
});
After looking at things, I understood why the callstack gets full with many events. For example (-> meaning calls):
myApplicationIsReady() -> sendPoll()
And then when getting the data:
pollCallback() -> [ processNextEvent() -> processEvent() -> eventComplete() -> processNextEvent() ]
The part in brackets is the one that loops and causes the callstack overflow. It doesn't happen with a low amount of events because then it does this:
pollCallback() -> processNextEvent() -> processEvent() -> eventComplete() -> sendPoll()
That would be with two events, and the first one containing an asynchronous call. (So it gets to the second event, which doesn't get processed because the first one isn't done processing, instead it calls the polling function, which then frees the whole callstack and eventually the callback from that will resume the activity)
Now it is not easy to fix and it was designed like that in the first place, because:
I do not want to lose events (As in, I want to make sure all events are processed).
I do not want to hang the browser (I can't use synchronous AJAX calls or an empty loop waiting for something to finish).
I absolutely want events to get processed in the right order.
I do not want for events to get stuck in the queue and the application not processing them anymore.
That is where I need help now! To do what I want it sounds like I need to use chaining, but that is exactly what is causing my callstack issues. Perhaps there is a better chaining structure that lets me do all that, without going infinitely deep in the callstack and I might have overlooked it. Thank you again in advance, I feel like I'm making progress!
How about instead of calling functions recursively, use setTimeout(func, 0)?
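A rough sketch of what that could look like, reduced to the two functions that form the loop. The queue here is a plain array of 50 made-up events, and `depth`/`maxDepth` exist only to show how deep the call stack gets; none of this is your actual code.

```javascript
// Sketch of a queue drained one event per timer tick. `depth` tracks how many
// processNextEvent() calls are on the stack at the same time.
const queue = [];
for (let id = 1; id <= 50; id++) queue.push({ id: id });

const processed = [];
let depth = 0;
let maxDepth = 0;

function processNextEvent() {
  depth++;
  maxDepth = Math.max(maxDepth, depth);
  const event = queue.shift();
  if (event !== undefined) {
    processed.push(event.id); // stand-in for the real event handling
    eventComplete();
  }
  depth--;
}

function eventComplete() {
  // Deferring the next call lets the current stack unwind first,
  // so the chain never nests no matter how many events are queued.
  setTimeout(processNextEvent, 0);
}

processNextEvent();
```

Events are still handled strictly in order, because each timer is only scheduled once the previous event has completed. The trade-off is a small delay per event, since every step waits for a timer.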