Is it safe to use and update global variables in NodeJS? - javascript

I have a Node server for loading scripts that can be written by anyone. I understand that when I fire up my Node server, modules load for the first time in the global scope. When someone requests a page, it gets loaded by the "start server" callback, and I can use all the already-loaded modules per request. But I haven't yet encountered a script where a global variable gets changed at request time and affects every other request in the process (maybe there is one).
My question is: how safe is it, in terms of crashing the server, to alter global data? Also, suppose that I have written a proper locking mechanism that will "pause" the server for all requests for a very short time until the proper data is loaded.

Node.js is single threaded, so it's impossible for two separate requests to alter a global variable simultaneously. In theory, then, it's safe.
However, if you do something like keep user A's data temporarily in a variable and then use that variable when user A later submits another request, be aware that user B may make a request in between, potentially altering user A's data.
For such cases, keeping global values keyed per user in objects or arrays is one way of separating user data. Another strategy is to use a closure, which is common practice in callback-intensive or event/promise-oriented libraries such as socket.io. A sketch of both follows below.
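For illustration, a minimal sketch of both approaches (the session store and userId parameter are hypothetical; assume each request carries some user identifier):

var sessions = {}; // one slot per user, so requests can't clobber each other

function handleRequest(userId, payload) {
  // Keyed object: user B's request touches sessions[B], never sessions[A].
  var session = sessions[userId] || (sessions[userId] = {});
  session.pendingData = payload;
}

function makeCounter() {
  // Closure: each call creates a private count, invisible to other requests.
  var count = 0;
  return function () { return ++count; };
}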
When it comes to multithreading or multiprocessing, a message-passing style API like Node's built-in cluster module provides the same guarantee of not clobbering globals, since each process has its own globals. There are several multithreading modules implemented similarly: one Node instance per thread. However, shared-memory style APIs can't make such guarantees, since each thread there is a real OS thread; threads may preempt each other and clobber each other's memory. So if you ever decide to try out one of the multithreading modules, be aware of this issue.
It is possible to implement fake shared memory using message passing, though, sort of like how we do it with AJAX or socket.io. So I'd personally avoid shared-memory style multithreading unless I really, really needed to cooperatively work on a very large dataset that would bog down a message-passing architecture.
Then again, remember: the web is a giant message-passing architecture, with the messages being HTML, XML, and JSON. So message passing scales to Google size.

Related

How can I sandbox code in an application with dynamically loaded, untrusted modules?

I'm making a game in Electron and I want it to support mods. Sometimes those mods will need to use custom logic, which leaves me with the issue of dynamically loading their code and I'm finding it hard to come up with a way to do that securely.
What I've considered
Ideally, I'd like to execute the mod scripts while passing just the few safe game objects they need as parameters, but that seems to be impossible (no matter the solution, that code can still access the global scope).
Most importantly, I have to prevent the untrusted code from accessing the preload global scope (with Node APIs), so require or anything else done in the preload is out the window.
Therefore that code has to be executed in the renderer.
My solution so far
I can either read the files in preload using fs or directly in the renderer using fetch. I have nodeIntegration set to false and contextIsolation set to true, and trusted code loaded by the preload script is selectively passed to the renderer through a contextBridge. The code which accesses Node APIs is properly encapsulated.
Unfortunately, that still leaves me with having to execute the unsafe code somehow, and I don't think there's any other way than to use eval or Function. Even though malicious code could not access Node APIs, it would still have full access to the renderer global scope, leaving the application vulnerable to, for example, a prototype pollution attack.
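To make that last risk concrete, here is a toy illustration (not from the question) of how evaluated mod code could pollute the renderer's shared prototypes:

new Function(
  "Object.prototype.isAdmin = true;" // pollutes every object in the realm
)();

console.log({}.isAdmin); // true: unrelated code now sees the injected property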
To sum up:
The safer place to execute untrusted code is clearly in the renderer
There is no alternative to using eval or Function
This leaves the renderer global scope vulnerable to attacks which I can try to mitigate but can never make it completely safe
My first question: are these assumptions true, or is there a better way to do it?
The risks and how to mitigate them
So the potentially malicious code has access to the renderer global scope. What's the risk?
Well, any sensitive user data will be safely stored in the preload; the same goes for access to the user's computer via Node APIs. The attacker can break the game (as in, the current 'session'), but I can catch any errors caused by that and reload the game with the malicious mod turned off. The global scope will only hold the necessary constructors and no actual instances of the game's classes. It seems somewhat safe; the worst thing that could happen is a reload of the game.
My second question: am I missing anything here regarding the risks?
My third question: are there any risks of using eval or Function that I'm not thinking of? I've sort of been bombarded with "eval bad" ever since I started getting into JS, and now I feel really dirty for even considering using it. To be exact, I'd probably be using new Function instead.
Thank you for reading this long thing!
There is no general solution for this, as this heavily depends on the structure of the project itself.
What you could try is to use espree to parse the unsafe code, and only execute it if there is no access to any global variable.
But that most likely will not prevent all attacks: you might not think of certain other attacks that are possible due to the way the program is structured, and require (or any other way to include/load other scripts) in that unsafe code could also open side channels enabling certain attacks.
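As a rough illustration of that espree idea, here is a minimal sketch (assuming the espree package; the denylist is illustrative and, as noted above, easy to bypass, e.g. via computed access like this['ev' + 'al']):

const espree = require("espree");

const FORBIDDEN = new Set(["window", "globalThis", "eval", "Function", "require"]);

function referencesForbiddenGlobal(source) {
  const ast = espree.parse(source, { ecmaVersion: "latest" });
  let found = false;
  (function walk(node) {
    if (found || !node || typeof node.type !== "string") return;
    if (node.type === "Identifier" && FORBIDDEN.has(node.name)) {
      found = true;
      return;
    }
    // Recurse into every child node or array of child nodes.
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (Array.isArray(child)) child.forEach(walk);
      else if (child && typeof child === "object") walk(child);
    }
  })(ast);
  return found;
}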
eval and new Function are not bad in general, at least not as bad as loading/including unsafe code in any other way. Many libraries use code evaluation for generated code, and that's the purpose of those functions. But eval is often misused in situations where there is no need for it, and that is something that should not be done.
The safest way is most likely to run the code in a Web Worker and define an API for the mods to communicate between the mod and the application. But that requires serializing and deserializing the data when passing it from the app to the mod and the other way round, which can be expensive (but this is what is done, e.g., with WebAssembly). So I would read a bit about how communication is solved with WebAssembly. A sketch of the worker approach follows below.
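A minimal sketch of that worker-based approach (file names, message shapes, and the helpers applyMoveIfLegal / getPublicGameState are hypothetical placeholders for your own game logic):

// renderer side
const modWorker = new Worker("mod.js");

modWorker.onmessage = (e) => {
  const { type, payload } = e.data;
  if (type === "moveEntity") {
    // The worker is untrusted: validate every request before applying it.
    applyMoveIfLegal(payload.entityId, payload.dx, payload.dy);
  }
};

// Structured cloning hands the mod a copy, never a live game object.
modWorker.postMessage({ type: "tick", state: getPublicGameState() });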

Using a global variable to prevent code executing repeatedly unnecessarily

TL;DR: Is it bad practice to use a global variable to prevent code executing unnecessarily, and if so what are the alternatives?
I have an Electron application that reads a stream of real time data from another application and outputs some elements of it on screen.
There are two types of data received: real-time (telemetry) data about what is currently going on, and more static data that updates every few seconds (sessionInfo).
The first time a sessionInfo packet is received I need to position and size some of the UI elements according to some of the data in it. This data will definitely not change during the course of the application being used, so I do not want the calculations based on it to be executed more than once*.
I need to listen for all sessionInfo packets; there are other things I do with them when received. It is just this specific part of the data that only needs to be considered once.
Given the above, is this a scenario where it would be appropriate to use a global variable to store this information (or even just a flag to say that this info had been processed) and use this to prevent the code executing multiple times? All my reading suggests that Global Variables are never a good idea, but short of allowing this code to execute repeatedly I am unsure what alternatives I have here.
*I recognise that allowing this would probably make no practical difference to my application, but this is a learning experience for me as well as producing something useful so I would like to understand the 'right' approach rather than just bodge something inefficient together and make it work.
Global variables are generally a bad practice for many reasons:
They pollute the global scope, so if you create a global i, you can't use that name anywhere else.
They make it impossible or very difficult to unit test your code.
You usually end up needing more of them than you think, to the point where many things become global.
A better practice is to create singleton services. For example, you might create a SessionService class that handles all your auth stuff, or perhaps a DataStreamService that holds the state variables of your data stream. To access that singleton, you'd import it into whatever file needs it; a sketch follows below. Some frameworks go even further and have a single global data store, like React/Redux.
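A minimal sketch of that pattern (SessionInfoService and the callback name are hypothetical; Node caches modules, so every require() of this file sees the same instance):

// sessionInfoService.js
class SessionInfoService {
  constructor() {
    this.layoutDone = false;
  }
  handleSessionInfo(packet, onFirstPacket) {
    if (!this.layoutDone) {
      this.layoutDone = true;
      onFirstPacket(packet); // one-time UI positioning/sizing work
    }
    // per-packet processing that runs every time goes here
  }
}

module.exports = new SessionInfoService(); // the shared singleton

// elsewhere in the app
const sessionInfo = require("./sessionInfoService");
sessionInfo.handleSessionInfo(packet, sizeUiFromSessionInfo);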

Automatically call a function when accessing an object in Javascript

I am currently on a big Javascript project with a lot of libraries.
I would like to have some parts of this project run on a separate thread. There is already something in JavaScript that does that: web workers.
However, web workers can't access the window object, and a lot of the libraries use it. Is there a way to automatically change a call to the window object (in the libraries used by the web workers) into a message sent to the parent thread?
Then, the parent thread would perform the action that the worker want and send back the result to the worker.
Is it possible to do that? And if yes, do you have any idea how?
Thank you !
I'm afraid there's no real solution to this. What you'd probably want is a special object in your worker, which, at every property access, passes the execution to the dispatching thread - which handles the request using the original window object.
To do this, you would need some sort of catch-all accessor method which would run whenever a property is referenced. Sadly, there's no such thing in Javascript, see this detailed discussion (especially T.J. Crowder's answer): Is it possible to implement dynamic getters/setters in JavaScript?
ECMAScript 6 introduces a new mechanism called Proxy (currently supported in FF and IE12 (go figure!)), which technically would enable you to do these dynamic property lookups. But I feel there's a more fundamental problem with your idea: you're aiming to turn a local call into a message across the boundaries of single-threaded environments.
Message passing from and to worker threads must be asynchronous (a JavaScript "thread" cannot be interrupted until it yields), which means that even if you do manage to set up such a proxy, it would effectively turn a usually synchronous operation (i.e. a property access) into an asynchronous one. That's a pretty big issue, especially if you're looking for a drop-in replacement in order to keep using existing libraries. A sketch follows below.
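To make the mismatch concrete, here is a hedged sketch (hypothetical message protocol, worker-side code) of a Proxy forwarding property reads to the main thread. Note the trap can only hand back a Promise, which is exactly why it can't be a drop-in replacement for synchronous code like window.innerWidth + 10:

// inside the worker
const windowProxy = new Proxy({}, {
  get(_target, prop) {
    return new Promise((resolve) => {
      const channel = new MessageChannel();
      channel.port1.onmessage = (e) => resolve(e.data);
      // Ask the main thread to look up window[prop] and reply on the port.
      postMessage({ type: "window-get", prop: prop }, [channel.port2]);
    });
  },
});

// usage is forced to be asynchronous, unlike the access it replaces
windowProxy.innerWidth.then((w) => console.log("width:", w));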

Node.js process.exit() does not exit cleanly, and the dangers of async fs.writeFile

tl;dr:
Calling the asynchronous fs.writeFile from asynchronous events (and perhaps even from just a plain old loop) and then calling process.exit() successfully opens the files but fails to flush the data into the files. The callbacks given to writeFile do not get a chance to run before the process exits. Is this expected behavior?
Regardless of whether process.exit() is failing to perform this cleanup, I call into question whether it should be node's duty to at least attempt to work the file writes into the schedule, because it may very well be the case that the deallocation of huge buffers depends on writing them out to disk.
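A minimal repro of the claim (the filename is illustrative; process.exit() is documented not to wait for pending asynchronous operations):

var fs = require('fs');

fs.writeFile('out.txt', 'some data', function (err) {
  console.log('write finished', err); // never runs
});

process.exit(0); // exits immediately; out.txt may be created empty, or not at all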
details
I have a conceptually basic piece of node.js code which performs a transformation on a large data file. This happens to be a LiDAR sensor's data file, which should not be relevant. It is simply a dataset that is quite large owing to the nature of its existence. It is structurally simple. The sensor sends its data over the network. My task for this script is to produce a separate file for each rotating scan. The details of this logic are irrelevant as well.
The basic idea is I use node_pcap to read a huge .pcap file using the method given to do this task by node_pcap, which is "offline mode".
What this means is that instead of asynchronously catching the network packets as they appear, a rather dense stream of asynchronous events representing the packets is "generated".
So, the main structure of the program consists of a few global state variables, and a single callback to the pcap session. I initialize globals, then assign the callback function to the pcap session. This callback to the packet event does all the work.
Part of this work is writing out a large array of data files. Once in a while a packet will indicate some condition that means I should move on to writing into the next data file. I increment the data filename index, and call fs.writeFile() again to begin writing the new file. Since I am writing only, it seems natural to let node decide when a good time is to begin writing.
Basically, both fs.writeFileSync and fs.writeFile should end up calling the OS's write() system call on their respective files in an asynchronous fashion. This does not bother me because I am only writing, so the asynchronous nature of the write which can affect certain access patterns does not matter to me since I do not do any access. The only difference is in that writeFileSync forces the node event loop to block until such time as the write() syscall completes.
As the program progresses, when I use writeFile (the js-asynchronous version), hundreds of my output files are created, but no data is written to them. Not one. The very first data file is still open when the hundredth data file is created.
This is conceptually fine. The reason is that node is busy crunching new data, and is happily holding on to the increasing number of file descriptors that it will eventually get to in order to write the files' data in. Meanwhile it also has to keep inside of memory all the eventual contents of the files. This will eventually run out, but let's ignore the RAM size limitation for a moment. Obviously a bad thing to happen here would be running out of RAM and crashing the program. Hopefully node will be smart and realize it just needs to schedule some file writes and then it can free a bunch of buffers...
If I stick a statement in the middle of all this to call process.exit(), I would expect that node will clean up and flush the pending writeFile writes before exiting.
But node does not do this.
Changing to writeFileSync fixes the problem obviously.
Changing and truncating my input data such that process.exit() is not explicitly called also results in the files eventually getting written (and the completion callbacks given to writeFile running) at the very end, once the input events are done pumping.
This seems to indicate for me that the cleanup is being improperly performed by process.exit().
Question: Is there some alternative to exiting the event loop cleanly in the middle? Note I had to manually truncate my large input file, because terminating with process.exit() caused all the file writes to not complete.
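One pattern I could imagine (a sketch of my own, not something from the docs) is to funnel all writes through a helper that counts outstanding callbacks, and only exit once they have all fired:

var fs = require('fs');

var pendingWrites = 0;
var shuttingDown = false;

function trackedWriteFile(name, data) {
  pendingWrites++;
  fs.writeFile(name, data, function (err) {
    if (err) console.error(err);
    pendingWrites--;
    if (shuttingDown && pendingWrites === 0) process.exit(0);
  });
}

function requestExit() {
  shuttingDown = true; // stop feeding new work, then exit on the last callback
  if (pendingWrites === 0) process.exit(0);
}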
This is node v0.10.26 installed a while ago on OS X with Homebrew.
Continuing with my thought process, the behavior that I am seeing here calls into question the fundamental purpose of using writeFile. It's supposed to improve things to be able to flexibly write my file whenever node deems it fit. However, apparently if node's event loop is pumped hard enough, then it will basically "get behind" on its workload.
It is like the event loop has an inbox and an outbox. In this analogy, the outbox represents the temp variables containing the data I am writing to the files. The assumption that a lazy productive programmer like me wants to make is that the inbox and outbox are interfaces that I can use and that they are flexible and that the system will manage for me. However if I feed the inbox at too high a rate, then node actually can't keep up, and it will just start piling the data into the outbox without having any time to flush it because for one reason or another, the scheduling is such that all the incoming events have to get processed first. This in turn defers all garbage collection of the outbox's contents, and quite quickly we deplete the system's RAM. This is quite easily a hard-to-find bug when this pattern is used in a complex system. I am glad I took a modular approach to this project.
I mean, yes, clearly, obviously, beyond all doubt the answer is to use writeFileSync as I do almost every single time that I write files with node.
What, then, is the value in even having writeFile? At this point I am trading a potential small increase in parallel processing for an increased possibility that, if for some reason the machine's processing capability drops (whether through thermal throttling, OS-level scheduling, my not paying my IaaS bills on time, or any other reason), it snowballs into a memory explosion?
Perhaps this is getting at the core of solving the truly rather complex problems inherent in streaming data processing systems, and that I cannot realistically expect this event-based processing model to step up and elegantly solve these problems automatically. Maybe I should be satisfied that it only gets me about half of the way to something robust. Maybe I am just projecting my wishes onto it and that it is unreasonable for me to assume that node needs to less deterministically "improve" the scheduling of its event loop.
I'm not a node expert, but it seems like your problem can be simplified using streams. Streams let you pause and resume, and they also provide other neat functionality. I suggest you take a look at Chapter 9 of Professional Node.js by Pedro Teixeira. You can find an online copy easily for reading purposes. It provides detailed, well-explained examples of how to use streams to read and write data, and of how to prevent potential memory leaks and loss of data.
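As a rough sketch of that idea (hypothetical file names): fs.createWriteStream provides backpressure through the return value of write(), and end() takes a callback that fires only after the data is flushed, which is the safe place to exit:

var fs = require('fs');

var out = fs.createWriteStream('scan-0001.dat');

function writeChunk(chunk, next) {
  if (out.write(chunk)) {
    process.nextTick(next);  // buffer has room: keep producing
  } else {
    out.once('drain', next); // pause the producer until the buffer flushes
  }
}

function finish() {
  out.end(function () {
    process.exit(0);         // exit only after the flush completes
  });
}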

Since JavaScript is single-threaded, how are web workers in HTML5 doing multi-threading?

I've been reading about web workers in HTML5, but I know JavaScript is single-threaded.
How are web workers doing multi-threaded work, then? Or how are they simulating it, if it's not truly multi-threaded?
As several comments have already pointed out, Workers really are multi-threaded.
Some points which may help clarify your thinking:
JavaScript is a language, it doesn't define a threading model, it's not necessarily single threaded
Most browsers have historically been single threaded (though that is changing rapidly: IE, Chrome, Firefox), and most JavaScript implementations occur in browsers
Web Workers are not part of JavaScript, they are a browser feature which can be accessed through JavaScript
A bit late, but I just asked myself the same question and I came up with the following answer:
JavaScript in browsers is always single-threaded, and a fundamental consequence is that "concurrent" access to variables (the principal headache of multithreaded programming) is actually not concurrent. This holds with the exception of web workers, which actually run in separate threads, where concurrent access to variables must be dealt with in a somewhat explicit way.
I am not a JavaScript ninja, but I too was convinced that JavaScript in browser is provided as a single threaded process, without paying much attention to whether it was true or to the rationale behind this belief.
A simple fact that supports this assumption is that when programming in JavaScript you don't have to care about concurrent access to shared variables. Every developer, without even thinking of the problem, writes code as if every access to a variable is consistent.
In other words, you don't need to worry about the so called Memory model.
Actually, there is no need to look at web workers to see quasi-parallel processing in JavaScript. Think of an (asynchronous) AJAX request, and think how carelessly you would handle concurrent access to variables:
var counter = 0;

function asyncAddCounter() {
  var xhttp = new XMLHttpRequest();
  xhttp.onreadystatechange = function() {
    if (this.readyState == 4) {
      counter++;
    }
  };
  xhttp.open("GET", "/a/remote/resource", true);
  xhttp.send();
}

asyncAddCounter();
counter++;
What is the value of counter at the end of the process? It is 2.
It doesn't matter that it is read and written "concurrently"; it will never end up as 1. This means that access to counter is always consistent.
If two threads were really accessing the value concurrently, they could both start off by reading 0 and both write 1 in the end.
In browsers, the actual data-fetching of a remote resource is hidden from the developer, and its inner workings are outside the scope of the JavaScript API (what the browser lets you control in terms of JavaScript instructions). As far as the developer is concerned, the result of the network request is processed by the main thread.
In short, the actual carrying out of the request is not visible, but the invocation of the callback (handling the result by custom JavaScript code) is executed by the main thread.
Possibly, if it weren't for web workers, the term "multithreading" would never have entered the JavaScript world.
The execution of the request and the asynchronous invocation of the callback are actually achieved by using event loops, not multithreading. This is true for several browsers, and obviously for Node.js. The following are some references, in some cases a bit obsolete, but I guess the main idea is still retained nowadays.
Firefox: Concurrency model and Event Loop - JavaScript | MDN
Chrome uses libevent on certain OSes.
IE: Understanding the Event Model (Internet Explorer)
This fact is the reason why JavaScript is said to be Event-driven but not multithreaded.
Notice that JavaScript thus allows for asynchronous idioms, but not parallel execution of JavaScript code (outside webworkers). The term asynchronous just denotes the fact that the result of two instructions might be processed in scrambled order.
As for WebWorkers, they are JavaScript APIs that give a developer control over a multithreaded process.
As such, they provide explicit ways to handle concurrent access to shared memory (reading and writing values in different threads), and this is done, among others, in the following ways:
you push data to a web worker (which means that the new thread reads data) by structured clone: The structured clone algorithm - Web APIs | MDN. Essentially there is no "shared" variable; instead, the new thread is given a fresh copy of the object.
you push data to a web worker by transferring ownership of the value: Transferable - Web APIs | MDN. This means that just one thread can read the value at any given time.
as for the results returned by the web workers (how they "write"), the main thread accesses the results when prompted to do so (for instance with thisWorker.onmessage = function(e) {console.log('Message ' + e.data + ' received from worker');}). This happens by means of the usual event loop, I suppose.
the main thread and the web worker can access a truly shared memory, the SharedArrayBuffer, which is accessed thread-safely using the Atomics functions. I found this clearly explained in this article: JavaScript: From Workers to Shared Memory. A sketch follows after this list.
note: web workers cannot access the DOM, which is truly shared!
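A minimal sketch of that last pattern (file names are illustrative; note that modern browsers require cross-origin isolation before exposing SharedArrayBuffer):

// main.js
var sab = new SharedArrayBuffer(4);  // one 32-bit slot
var counter = new Int32Array(sab);
var worker = new Worker('worker.js');
worker.postMessage(sab);             // the memory is shared, not copied
Atomics.add(counter, 0, 1);          // safe even if the worker writes too

// worker.js
self.onmessage = function (e) {
  var counter = new Int32Array(e.data);
  Atomics.add(counter, 0, 1);        // no lost updates, unlike counter[0]++
};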
You spawn a .js file as a "worker", and it runs in a separate thread. You can pass JSON data back and forth between it and the "main" thread. Workers don't have access to certain things like the DOM, though.
So if, say, you wanted to solve complicated math problems, you could let the user enter things into the browser, pass those variables off to the worker, let it do the computation in the background while in the main thread you let the user do other things, or show a progress bar or something, and then when the worker's done, it passes the answer back, and you print it to the page. You could even do multiple problems asynchronously and pass back the answers out of order as they finish. Pretty neat!
The browser kicks off a thread with the JavaScript you want to execute. So it's a real thread: with this web workers thing, your JS is no longer single-threaded.
The answer claiming that "JavaScript is a language, it doesn't define a threading model, it's not necessarily single-threaded" is copied directly from a Medium article, and it confuses the issue without resolving the doubt.
It's not necessarily single-threaded, like all other languages. YES... BUT
JavaScript is a LANGUAGE meant for single-threaded programming, and that is the beauty of it: it makes it simple and easy to implement.
It is designed around a single call stack.
Maybe in the future, with new implementations, it will become a Language for multi-threaded programming... but for now, Mehhhhhh.
Node's V8 is still single-threaded, yet Node achieves multi-threaded capabilities by creating worker threads on top of libuv, which is written in C.
In the same way, even though JavaScript is not meant for multithreading, you can achieve limited multithreading by using browser APIs.
Every time you open a tab in a browser, it creates a new thread, and the process is the same with web workers.
A worker runs internally BUT does not have access to any window objects.
Yes, people may call it multithreaded if it makes them happy,
but in 2021 the answer is:
"JS is meant for single-threaded programming (i.e. it is a single-threaded language), but limited multi-threading can be achieved by using browser APIs such as Web Workers."
Actually, the main confusion here, I think, comes from people finding clever ways to do things concurrently. If you think about it, JavaScript is clearly not multithreaded, and yet we have ways to do things in parallel, so what is going on?
Asking the right question is what will bring the answer here. Who is responsible for the threads? There is an answer above saying that JS is just a language, not a threading model. Completely true! JavaScript has nothing to do with it. The responsibility falls on V8. Check this link for more information -> https://v8.dev/
So V8 allows a single thread per JS context, which means that no matter how hard you try, spawning a new thread is simply impossible. Yet people spawn so-called workers, and we get confused. To answer this, ask the following: is it possible to start two V8 instances and have both of them interpret some JS code? That is precisely the solution to our problem. Workers communicate with messages because their contexts are different. They are separate instances that don't know anything about our context, and therefore they need information, which comes in the form of a message.
As we are all aware, JavaScript is single-threaded: all code is queued and executed in sequence.
Using Web Workers, we can run JavaScript processes concurrently (or at least, as close to concurrently as this language allows). The primary benefit of this approach is to handle the manipulation of data in background threads without interfering with the user-interface.
Using web workers:
Web workers allow you to run JavaScript in parallel on a web page, without blocking the user interface.
Web workers execute in a separate thread.
You need to host all the worker code in a separate file.
Workers aren't automatically garbage collected, so you need to control them.
To run a worker, use worker.postMessage("").
To stop a worker there are two methods: terminate() from the caller code, and close() from within the worker itself.
Instantiating a worker will cost some memory.
Web Workers run in an isolated thread. As a result, the code that they execute needs to be contained in a separate file. But before we do that, the first thing to do is create a new Worker object in your main page. The constructor takes the name of the worker script:
var worker = new Worker('task.js');
If the specified file exists, the browser will spawn a new worker thread, which is downloaded asynchronously. The worker will not begin until the file has completely downloaded and executed. If the path to your worker returns a 404, the worker will fail silently.
After creating the worker, start it by calling the postMessage() method:
worker.postMessage(); // Start the worker.
Communicating with a Worker via Message Passing
Communication between a worker and its parent page is done using an event model and the postMessage() method. Depending on your browser/version, postMessage() can accept either a string or a JSON object as its single argument. The latest versions of modern browsers support passing a JSON object.
Below is an example of using a string to pass 'Hello World' to a worker in doWork.js. The worker simply returns the message that is passed to it.
Main script:
var worker = new Worker('doWork.js');

worker.addEventListener('message', function(e) {
  console.log('Worker said: ', e.data);
}, false);

worker.postMessage('Hello World'); // Send data to our worker.
doWork.js (the worker):
self.addEventListener('message', function(e) {
  self.postMessage(e.data); // Send data back to main script
}, false);
When postMessage() is called from the main page, our worker handles that message by defining an onmessage handler for the message event. The message payload (in this case 'Hello World') is accessible in Event.data. This example demonstrates that postMessage() is also your means for passing data back to the main thread. Convenient!
References:
http://www.tothenew.com/blog/multi-threading-in-javascript-using-web-workers/ 
https://www.html5rocks.com/en/tutorials/workers/basics/
https://dzone.com/articles/easily-parallelize-jobs-using-0
