Is increment an atomic operation in JavaScript? If one thread is accessing
++i; and at the same time another one starts to access the operation will there be any problems?
In JavaScript, a function always runs to completion. That means if a function is running, it will finish completely before any other function is called, so there is no chance of interleaving between statements (unlike in Java, where threads can interleave).
If you are confused by asynchronous execution, remember that async means later, not parallel. So, coming to your problem: no, you will not face any problem; the increment is effectively an atomic operation.
JavaScript is single-threaded, so you do not need to worry about deadlocks or dirty reads.
Why doesn't JavaScript support multithreading?
If one thread is accessing ++i; and at the same time another one starts to access the operation will there be any problems?
That won't happen with a simple variable like i, because JavaScript is defined such that there can only be one active thread (the agent's executing thread) in a realm at any given time. ("realm" - roughly speaking, a JavaScript global environment and the things within it, such as your variables.) So the issue simply doesn't arise with normal variables or object properties. Your function can't be interrupted during synchronous operation; JavaScript defines "run to completion" semantics: whenever a "job" (like triggering an event handler) is picked up from the job queue and executed, it runs to completion before any other job can be executed. (For an async function, the logic can only be suspended at await, return, or throw, not in the middle of a synchronous compound arithmetic operation. It can be in the middle of a compound arithmetic operation involving await. More about this below.)
The only place you'd have to worry about this is if you're using shared memory, where the actual memory is shared between realms and thus could indeed be accessed by multiple threads at the same time. But if you were doing that, you'd be dealing with a SharedArrayBuffer or a typed array using a SharedArrayBuffer, not a simple variable or property. But yes, if dealing with shared memory, you're exposed to all the "glorious" fun of CPU operation reordering, stale caches, and so on. That's part of why we have the Atomics object, including Atomics.add, which atomically adds a value to an element in a typed array using shared memory. (But beware naïve usage! After all, another thread could overwrite the value just after your add finishes, before you read it...) Atomics provides the bare building blocks necessary to ensure safe access to shared memory. (More about this in Chapter 16 of my book JavaScript: The New Toys: "Shared Memory and Atomics".)
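To make that concrete, here is a minimal sketch of incrementing a counter held in shared memory, assuming an environment where SharedArrayBuffer is available (a cross-origin isolated page, or Node's worker_threads); the worker file name is illustrative, and the worker-side code that wraps the same buffer isn't shown:
const sab = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT); // room for one 32-bit counter
const counter = new Int32Array(sab);

const worker = new Worker('worker.js');   // hypothetical worker that increments the same counter
worker.postMessage(sab);                  // the buffer is shared with the worker, not copied

Atomics.add(counter, 0, 1);               // atomic read-modify-write of counter[0]
console.log(Atomics.load(counter, 0));    // atomic read; a plain counter[0] read isn't guaranteed fresh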
Note: This all applies to standard JavaScript engines that comply with the specification, such as those found in web browsers and Node.js. Non-standard JavaScript environments, such as the scripting support for JavaScript built into the Java Virtual Machine, may (of course) define alternative non-standard semantics.
Re async functions: There is no multithreading involved in async functions. But the fact the logic of the function is suspended at an await can cause some surprising behavior. That said, the place it can occur is clearly flagged with the await.
I wouldn't worry about the details below unless you have to. But for those who do...
Consider:
let a = 1;
async function one() {
  return 1;
}
async function example() {
  console.log(`Adding 1 to a`);
  a += await one();
}
console.log(`Start, a = ${a}`);
Promise.all([
  example(),
  example(),
  example(),
])
.then(() => {
  console.log(`All done, a = ${a}`);
});
(Technically, we could just use a += await 1;, because await will wrap its operand in an implied Promise.resolve(x), but I thought it would be clearer to show an actual promise.)
That outputs:
Start, a = 1
Adding 1 to a
Adding 1 to a
Adding 1 to a
All done, a = 2
But wait, we added 1 to a three times, it should have ended up being 4, not 2?!?!
The key is in the await in this statement:
a += await one();
That's processed like this:
1. Get the current value of a and set it aside; call it atemp.
2. Call one and get its promise.
3. Wait for the promise to settle and get the fulfillment value; let's call that value addend.
4. Evaluate atemp + addend.
5. Write the result to a.
Or in code:
/* 1 */ const atemp = a;
/* 2 */ const promise = one();
/* 3 */ const addend = await promise; // Logic is suspended here and resumed later
/* 4 */ const result = atemp + addend;
/* 5 */ a = result;
(You can find this detail in EvaluateStringOrNumericBinaryExpression in the spec.)
Where we used example, we called it three times without waiting for its promise to settle, so Step 1 was run three times, setting aside the value of a (1) three times. Then later, those saved values are used and a's value is overwritten.
Again, there was no multithreading involved (and run-to-completion is fully intact), it's just that when an async function's logic reaches an await, return (explicit or implicit), or throw, the function exits at that point and returns a promise. If that was because of an await, the function's logic will continue when the promise the function awaited settles, and will (in the normal course of things) eventually settle the promise the async function returned. But each of those things will happen to completion, and on a single active thread.
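If you want each increment to be based on a's value at the time of the write, one simple fix (a sketch using the same one() from above) is to finish the await first, so a is read and written with no suspension point in between:
async function example() {
  const addend = await one();  // suspend here, before touching a
  a += addend;                 // read and write a back-to-back; no stale value is set aside
}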
JavaScript does not support multithreading. It does have web workers, but your question would not apply to them, as workers do not share variables.
Yes, there will be a problem. Even though JavaScript is single-threaded, i++ is a three-step operation (read + modify + write), so when one async function has read i, another one can write a modified i back to the same variable before the first has finished. In order to solve this issue, you can use atomic operations instead of a regular variable. Whether this is single-threaded or multithreaded, the same thing happens. Everything I said here relates to async functions in Node.js.
In C programming, stack overflow errors are outside of the language spec. They represent a fundamental violation of the "contract" of what a function call means. You can overflow the stack halfway through pushing arguments to a function. Or you can overflow it mid-way through a library routine like malloc() (internally to its own implementation might make several function calls that grow the stack, any one of which could overflow). An overflow could happen halfway through the bookkeeping it is doing for the allocation...leaving the heap in a corrupted state that would crash the next malloc() or free() if you tried to keep running.
In theory, it seems JavaScript could do better. It is not running on bare metal...so it could offer some kind of guarantees about operations that would not overflow the stack in mid-run (e.g. by preallocating memory for usermode recursions, vs. making every JS call trigger some C-level recursion in the interpreter). A JS interpreter could give you enough atomic building blocks for making a kind of transaction...where all of a set of operations would run (with no stack overflow) or none of them would (pre-emptively triggering a stack overflow). These transactions could be a wedge for monitoring what you would need to roll back at the moment of catching a RangeError exception upon running out of stack.
In practice, it seems to me that you're not much better off than C. So I think this answer is correct (2 upvotes at time of writing):
"you would have no indication on where your code broke so anything you tried to run would be severely buggy at best."
...and I think this answer is--at minimum--misleading in a generic sense (6 upvotes at time of writing):
"Maximum call stack size exceeded errors can be caught just like any other errors"
As an example, imagine I have this routine, and I wish to maintain the invariant that you never end up in a situation where the two named collections don't each have a value for a given key:
function addToBothCollections(key, n) {
  collection1[key] = n
  n = n + Math.random() // what if Math.random() overflows?
  collection2[key] = n
}
If I had a "wedge" of guaranteed operations that would not overflow the stack, I could come up with a protocol where those operations were used to build some kind of transaction log. Then if an overflow occurred, I could tailor operations like this addToBothCollections() to take advantage of it.
e.g. imagine that negative numbers aren't allowed, so I could say:
function addToBothCollections(key, n) {
  collection1[key] = n
  collection2[key] = -1
  n = n + Math.random()
  collection2[key] = n
}

try {
  /* ...code that makes a bunch of addToBothCollections() calls */
}
catch (e) {
  for (let key in collection2) {
    if (collection2[key] == -1) {
      delete collection1[key]
      delete collection2[key]
    }
  }
}
But thinking along these lines can only work if you have guarantees, such as that this sequence will either atomically execute both operations or neither:
collection1[key] = n
collection2[key] = -1
If there's any way that collection2[key] = -1 might do something like trigger a GC that causes a stack overflow, you're back to square one. Some atomicity rules are needed.
So is there anything at a language level--current or future--that articulates this kind of problem? I'm actually curious about how this would apply to mixed JavaScript and WebAssembly programs...so if there's any way that you could cook up the atomicity I speak of in Wasm I'd be fine with that as well.
Due to its overly dynamic nature, with features like getters/setters, proxies, property addition/deletion, mutable globals, user-defined implicit conversions, and other features that can cause "action at a distance", almost every language construct in JavaScript can lead to the invocation of arbitrary user code. Even the statement x; can. The only construct that is guaranteed not to is ===.
For example, collection2[key] = -1 could invoke a setter on collection2 or one of its prototypes (including Array.prototype or Object.prototype), it could invoke a getter on the global object if collection2 or key isn't bound locally, and it performs string conversion on key, which in turn might invoke toString or valueOf on it, which in turn might be defined by the user or invoke getters. And so on.
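A minimal sketch of that, with an illustrative property name and key object:
const collection2 = {};

Object.defineProperty(Object.prototype, 'answer', {
  set(v) { console.log('inherited setter runs arbitrary user code:', v); }
});
collection2.answer = -1;   // finds the setter on the prototype chain and calls it; no own property is created

const key = {
  toString() { console.log('user code runs during key-to-string conversion'); return 'k'; }
};
collection2[key] = -1;     // converts key with toString() before the assignment happens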
As a consequence, it is impossible to guarantee freedom from errors for almost any function but utterly trivial ones like
function f() {}
function g(x, y) { x === y }
Since you asked about Wasm, that has no such implicit features, every call is explicit, so it is way easier to reason about.
Help! I'm learning to love Javascript after programming in C# for quite a while but I'm stuck learning to love the iterable protocol!
Why did Javascript adopt a protocol that requires creating a new object for each iteration? Why have next() return a new object with properties done and value instead of adopting a protocol like C# IEnumerable and IEnumerator which allocates no object at the expense of requiring two calls (one to moveNext to see if the iteration is done, and a second to current to get the value)?
Are there under-the-hood optimizations that skip the allocation of the object return by next()? Hard to imagine given the iterable doesn't know how the object could be used once returned...
Generators don't seem to reuse the next object as illustrated below:
function* generator() {
  yield 0;
  yield 1;
}
var iterator = generator();
var result0 = iterator.next();
var result1 = iterator.next();
console.log(result0.value) // 0
console.log(result1.value) // 1
Hm, here's a clue (thanks to Bergi!):
We will answer one important question later (in Sect. 3.2): Why can iterators (optionally) return a value after the last element? That capability is the reason for elements being wrapped. Otherwise, iterators could simply return a publicly defined sentinel (stop value) after the last element.
And in Sect. 3.2 they discuss using generators as lightweight threads. It seems to say the reason for returning an object from next is so that a value can be returned even when done is true! Whoa. Furthermore, generators can return values in addition to yield and yield*-ing values, and a value produced by return ends up in value when done is true!
And all this allows for pseudo-threading. And that feature, pseudo-threading, is worth allocating a new object for each time around the loop... Javascript. Always so unexpected!
Although, now that I think about it, allowing yield* to "return" a value to enable a pseudo-threading still doesn't justify returning an object. The IEnumerator protocol could be extended to return an object after moveNext() returns false -- just add a property hasCurrent to test after the iteration is complete that when true indicates current has a valid value...
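Purely as a hypothetical sketch (moveNext, current and hasCurrent are made-up names, not part of any standard), an extended C#-style, allocation-free protocol might look like this, hand-written here for the equivalent of yield 0; return 1;:
function makeEnumerator(values, returnValue) {
  let i = -1;
  return {
    moveNext() { i++; return i < values.length; },                              // advance; no per-step result object allocated
    get current() { return i < values.length ? values[i] : returnValue; },
    get hasCurrent() { return i >= values.length && returnValue !== undefined; } // a value exists after completion
  };
}

const e = makeEnumerator([0], 1);            // roughly: yield 0; return 1;
while (e.moveNext()) console.log(e.current); // 0
if (e.hasCurrent) console.log(e.current);    // 1 -- the "return" value, still readable after moveNext() is false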
And the compiler optimizations are non-trivial. This will result in quite wild variance in the performance of an iterator... doesn't that cause problems for library implementors?
All these points are raised in this thread discovered by the friendly SO community. Yet, those arguments didn't seem to hold the day.
However, regardless of returning an object or not, no one is going to be checking for a value after iteration is "complete", right? E.g. most everyone would think the following would log all values returned by an iterator:
function logIteratorValues(iterator) {
  var next;
  while(next = iterator.next(), !next.done)
    console.log(next.value)
}
Except it doesn't, because even when done is true the iterator might still have returned another value, which this loop never logs. Consider:
function* generator() {
  yield 0;
  return 1;
}
var iterator = generator();
var result0 = iterator.next();
var result1 = iterator.next();
console.log(`${result0.value}, ${result0.done}`) // 0, false
console.log(`${result1.value}, ${result1.done}`) // 1, true
Is an iterator that returns a value after it's "done" really an iterator? What is the sound of one hand clapping? It just seems quite odd...
And here is an in-depth post on generators I enjoyed. Much time is spent controlling the flow of an application as opposed to iterating members of a collection.
Another possible explanation is that IEnumerable/IEnumerator requires two interfaces and three methods and the JS community preferred the simplicity of a single method. That way they wouldn't have to introduce the notion of groups of symbolic methods aka interfaces...
Are there under-the-hood optimizations that skip the allocation of the object return by next()?
Yes. Those iterator result objects are small and usually short-lived. Particularly in for … of loops, the compiler can do a trivial escape analysis to see that the object doesn't face the user code at all (but only the internal loop evaluation code). They can be dealt with very efficiently by the garbage collector, or even be allocated directly on the stack.
Here are some sources:
JS inherits its functionally-minded iteration protocol from Python, but with result objects instead of the previously favoured StopIteration exceptions
Performance concerns in the spec discussion (cont'd) were shrugged off. If you implement a custom iterator and it is too slow, try using a generator function
(At least for builtin iterators) these optimisations are already implemented:
The key to great performance for iteration is to make sure that the repeated calls to iterator.next() in the loop are optimized well, and ideally completely avoid the allocation of the iterResult using advanced compiler techniques like store-load propagation, escape analysis and scalar replacement of aggregates. To really shine performance-wise, the optimizing compiler should also completely eliminate the allocation of the iterator itself - the iterable[Symbol.iterator]() call - and operate on the backing-store of the iterable directly.
Bergi answered already, and I've upvoted, I just want to add this:
Why should you even be concerned about new object being returned? It looks like:
{done: boolean, value: any}
You know you are going to use the value anyway, so it's really not an extra memory overhead. What's left? done: boolean and the object itself take only a few bytes each, and allocating and processing them costs on the order of nanoseconds (especially given the likely-existing V8 optimizations). Now, if you still care about wasting that amount of time and memory, then you really should consider switching to something like Rust+WebAssembly instead of JS.
I have this code, meant to be run in a child process (through fork, to be specific), in order to try to measure the size of an object in memory:
const syncComputationThatResultsInALargeObject = require('whatever');
let initMemory;
let finalMemory;
let obj;
process.on('message', () => {
  // global.gc();
  initMemory = process.memoryUsage().heapUsed;
  obj = syncComputationThatResultsInALargeObject();
  finalMemory = process.memoryUsage().heapUsed;
  process.send({
    memoryCost: finalMemory - initMemory,
  });
});
The reason that this is being done using a child process is to try to prevent any pollution from variables present in the parent process.
What I'm observing is, surprisingly, that sometimes the returned memoryCost is negative, implying that the heap size is smaller after creating obj.
If I however enable manual GC calls using --expose-gc in node, and call the GC before polling the heap usage, before creating the object, I never get negative values.
Can anyone give an answer as to why this could be happening? I'm using node 6.14.4 on Ubuntu 18.04.1, kernel 4.15.0-30-generic. Thanks.
EDIT: This happens even if I reference obj after the assignment to finalMemory, for instance, by putting a reference to one of its fields in the object passed to process.send.
After a few more tests, I was able to understand why this was happening.
In my particular instance of the computation function, I am able to pass several parameters that influence the resulting object size.
What I ended up observing, when repeating the computation several times for different sets of parameters, is that parameter sets which generate a larger object present a much greater dispersion between results, even when I call global.gc() at the indicated line.
Thus, I am attributing the observed behavior to the fact that sometimes the GC is being called inside the computation function, in an uncontrollable manner. Greater object sizes mean greater memory usage, which means more frequent (therefore possibly unpredictable) GC calls, which means greater dispersion in results for equal parameters.
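For reference, here is a sketch of the measurement with forced collections on both sides of the computation; it assumes the child process is started with node --expose-gc so that global.gc exists:
const syncComputationThatResultsInALargeObject = require('whatever'); // as in the question

process.on('message', () => {
  global.gc();                                        // settle the heap before taking the baseline
  const initMemory = process.memoryUsage().heapUsed;
  const obj = syncComputationThatResultsInALargeObject();
  global.gc();                                        // collect temporary garbage created during the computation
  const finalMemory = process.memoryUsage().heapUsed;
  process.send({
    memoryCost: finalMemory - initMemory,
    keys: Object.keys(obj).length,                    // keep obj reachable past the second gc() call
  });
});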
I was at a node.js meetup today, and someone I met there said that node.js has es6 generators. He said that this is a huge improvement over callback style programming, and would change the node landscape. Iirc, he said something about call stack and exceptions.
I looked them up, but haven't really found any resource that explains them in a beginner-friendly way. What's a high-level overview of generators, and how are they different (or better?) than callbacks?
PS: It'd be really helpful if you could give a snippet of code to highlight the difference in common scenarios (making an http request or a db call).
Generators, fibers and coroutines
"Generators" (besides being "generators") are also the basic buildings blocks of "fibers" or "coroutines". With fibers, you can "pause" a function waiting for an async call to return, effectively avoiding to declare a callback function "on the spot" and creating a "closure". Say goodbye to callback hell.
Closures and try-catch
...he said something about call stack and exceptions
The problem with "closures" is that even if they "magically" keep the state of the local variables for the callback, a "closure" can not keep the call stack.
At the moment of callback, normally, the calling function has returned a long time ago, so any "catch" block on the calling function cannot catch exceptions in the async function itself or the callback. This presents a big problem. Because of this, you can not combine callbacks+closures with exception catching.
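A minimal sketch of that problem (the file name is just for illustration):
const fs = require('fs');

function readConfig() {
  try {
    fs.readFile('config.json', (err, data) => {
      if (err) throw err;            // thrown long after readConfig() has already returned
      console.log(JSON.parse(data)); // a parse error here is not caught below either
    });
  } catch (e) {
    // never reached for errors thrown inside the callback
    console.error('caught:', e);
  }
}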
Wait.for
...and would change the node landscape
If you use generators to build a helper lib like Wait.for-ES6 (I'm the author), you can completely avoid the callback and the closure, and now "catch blocks" work as expected, and the code is straightforward.
It'd be really helpful if you could give a snippet of code to highlight the difference in common scenarios (making an http request or a db call).
Check Wait.for-ES6 examples, to see the same code with callbacks and with fibers based on generators.
UPDATE 2021: All of this has been superseded by javascript/ES2020 async/await. My recommendation is to use Typescript and async/await (which is based on Promises also standardized)
Generators is one of many features in upcoming ES6. So in the future it will be possible to use them in browsers (right now you can play with them in FF).
Generators are constructors for iterators. Sounds like gibberish, so in easier terms they allow you to create objects that can later be iterated, for example with for loops or by calling the .next() method.
Generators are defined in a similar way to functions, except they have * and yield in them. The * tells that this is a generator; yield is similar to return.
For example this is a generator:
function *seq(){
  var n = 0;
  while (true) yield n++;
}
Then you can use this generator with var s = seq(). But in contrast to a function it will not execute everything and give you a result, it will just instantiate the generator. Only when you run s.next() will the generator be executed. Here yield is similar to return, but when a yield runs, it pauses the generator and hands control back to the code that called next(). When s.next() is called again, the generator resumes its execution from where it paused. In this case it will keep going around the while loop forever.
So you can iterate this with
for (var i = 0; i < 5; i++){
  console.log( s.next().value )
}
or with the for...of construct, which is made for iterating generators:
for (var n of seq()){
  if (n >=5) break;
  console.log(n);
}
These are basics about generators (you can look at yield*, next(with_params), throw() and other additional constructs). Note that it is about generators in ES6 (so you can do all this in node and in browser).
But what does this infinite number sequence have to do with callbacks?
Important thing here is that yield pauses the generator. So imagine you have a very strange system which work this way:
You have a database with users and you need to find the name of a user with some ID, then you need to check in your file system the key for this user's name, and then you need to connect to some ftp with the user's id and key and do something after connection. (Sounds ridiculous, but I want to show nested callbacks.)
Previously you would write something like this:
var ID = 1;
database.find({user : ID}, function(userInfo){
  fileSystem.find(userInfo.name, function(key){
    ftp.connect(ID, key, function(o){
      console.log('Finally '+o);
    })
  })
});
Which is callback inside callback inside callback inside callback. Now you can write something like:
function *logic(ID){
  var userInfo = yield database.find({user : ID});
  var key = yield fileSystem.find(userInfo.name);
  var o = yield ftp.connect(ID, key);
  console.log('Finally '+o);
}
var s = logic(1);
And then drive it with s.next(). As you can see, there are no nested callbacks.
Because node heavily uses nested callbacks, this is the reason why the guy was telling that generators can change the landscape of node.
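To actually drive such a generator you need a small runner that feeds each asynchronously produced value back in through next(). Here is a minimal sketch, assuming database.find, fileSystem.find and ftp.connect each return a promise (libraries such as co do this more robustly, with error handling):
function run(genFn, ...args) {
  const gen = genFn(...args);
  function step(prev) {
    const { value, done } = gen.next(prev);    // resume the generator, sending the previous result back in
    if (done) return Promise.resolve(value);
    return Promise.resolve(value).then(step);  // wait for the yielded promise, then resume with its value
  }
  return step(undefined);
}

run(logic, 1); // instead of var s = logic(1); s.next();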
A generator is a combination of two things - an Iterator and an Observer.
Iterator
An iterable is something you can iterate upon; it exposes a method which, when invoked, returns an iterator. From ES6 onwards, the built-in collections (Array, Map, Set) conform to the iterable contract.
A generator(iterator) is a producer. In iteration the consumer PULLs the value from the producer.
Example:
function *gen() { yield 5; yield 6; }
let a = gen();
Whenever you call a.next(), you're essentially pull-ing value from the Iterator and pause the execution at yield. The next time you call a.next(), the execution resumes from the previously paused state.
Observer
A generator is also an observer using which you can send some values back into the generator. Explained better with examples.
function *gen() {
  document.write('<br>observer:', yield 1);
}

var a = gen();
var i = a.next();
while(!i.done) {
  document.write('<br>iterator:', i.value);
  i = a.next(100);
}
Here you can see that yield 1 is used like an expression which evaluates to some value. The value it evaluates to is the value sent as an argument to the a.next function call.
So, for the first time i.value will be the first value yielded (1), and when continuing the iteration to the next state, we send a value back to the generator using a.next(100).
Where can you use this in Node.JS?
Generators are widely used with spawn (from taskJS or co) function, where the function takes in a generator and allows us to write asynchronous code in a synchronous fashion. This does NOT mean that async code is converted to sync code / executed synchronously. It means that we can write code that looks like sync but internally it is still async.
Sync is BLOCKING; Async is WAITING. Writing code that blocks is easy. When PULLing, the value appears in the assignment position. When PUSHing, the value appears in the argument position of the callback.
When you use iterators, you PULL the value from the producer. When you use callbacks, the producer PUSHes the value to the argument position of the callback.
var i = a.next() // PULL
dosomething(..., v => {...}) // PUSH
Here, you pull the value from a.next() and in the second, v => {...} is the callback and a value is PUSHed into the argument position v of the callback function.
Using this pull-push mechanism, we can write async programming like this,
let delay = t => new Promise(r => setTimeout(r, t));
spawn(function*() {
  // wait for 100 ms and send 1
  let x = yield delay(100).then(() => 1);
  console.log(x); // 1
  // wait for 100 ms and send 2
  let y = yield delay(100).then(() => 2);
  console.log(y); // 2
});
So, looking at the above code, we are writing async code that looks like it's blocking (the yield statements wait for 100ms and then continue execution), but it's actually waiting. The pause and resume property of generator allows us to do this amazing trick.
How does it work ?
The spawn function uses yield promise to PULL the promise state from the generator, waits till the promise is resolved, and PUSHes the resolved value back to the generator so it can consume it.
Use it now
So, with generators and spawn function, you can clean up all your async code in NodeJS to look and feel like it's synchronous. This will make debugging easy. Also the code will look neat.
BTW, this is coming to JavaScript natively for ES2017 - as async...await. But you can use them today in ES2015/ES6 and ES2016 using the spawn function defined in the libraries - taskjs, co, or bluebird
Summary:
function* defines a generator function which returns a generator object. The special thing about a generator function is that it doesn't execute when it is called using the () operator. Instead an iterator object is returned.
This iterator contains a next() method. The next() method of the iterator returns an object with a value property containing the yielded value, and a done property, a boolean which is true once the generator function has run to completion.
Example:
function* IDgenerator() {
  var index = 0;
  yield index++;
  yield index++;
  yield index++;
  yield index++;
}

var gen = IDgenerator(); // generates an iterator object
console.log(gen.next().value); // 0
console.log(gen.next().value); // 1
console.log(gen.next().value); // 2
console.log(gen.next()); // { value: 3, done: false }
console.log(gen.next()); // { value: undefined, done: true }
In this example we first generate an iterator object. On this iterator object we can then call the next() method, which allows us to jump from yield to yield. We are returned an object which has both a value and a done property.
How is this useful?
Some libraries and frameworks might use this construct to wait for the completion of asynchronous code for example redux-saga
async/await, the new syntax which lets you wait for async events, uses this under the hood. Knowing how generators work will give you a better understanding of how this construct works.
To use the ES6 generators in node, you will need to either install node >= 0.11.2 or iojs.
In node, you will need to reference the harmony flag:
$ node --harmony app.js
or you can explicitly just reference the generators flag
$ node --harmony_generators app.js
If you've installed iojs, you can omit the harmony flag.
$ iojs app.js
For a high level overview on how to use generators, checkout this post.
I have a simple program in node.js, such as:
// CODE1
step1();
step2();
var o = step3();
step4(o);
step5();
step6();
this program is meant to be run in a stand-alone script (not in a web browser),
and it is a sequential program where order of execution is important (eg, step6 needs to be executed after step5).
the problem is that step3() is an async function, and the value 'o' is actually passed to a given callback,
so I would need to modify the program as follows:
// CODE2
step1();
step2();
step3( function (o) {
  step4(o);
  step5();
  step6();
})
it could make sense to call step4 from the callback function, because it depends on the 'o' value computed by step3.
But step5 and step6 functions do not depend on 'o', and I have to include them in that callback function only to preserve the order of execution: step3, then step4, then step5 then step6.
this sounds really bad to me.
this is a sequential program, and so I would like to convert step3 to a sync function.
how to do this?
I am looking for something like this (eg using Q):
// CODE3
step1();
step2();
var deferred = Q.defer();
step3(deferred.resolve);
deferred.blockUntilFulfilled() // this function does not exist; this is what i am looking for
var o = deferred.inspect().value
step4(o);
step5();
step6();
How to do this?
ps: there are advantages and disadvantages of using sync or async, and this is a very interesting discussion. however, it is not the purpose of this question. in this question, i am asking how can i block/wait until a promise (or equivalent) gets fulfilled. Please, please, please, do not start a discussion on whether sync/blocking is good or not.
It's impossible to turn an async operation into a sync operation in vanilla JavaScript. There are things like node-fibers (a C++ add-on) that will allow this, or various compile-to-JS languages that will make the async operations look sync (essentially by rewriting your first code block to the second), but it is not possible to block until an async operation completes.
One way to see this is to note that the JavaScript event loop is always "run to completion," meaning that if you have a series of statements, they will always be guaranteed to execute one after the other, with nothing in between. Thus there is no way for an outside piece of information to come in and tell the code to stop blocking. Say you tried to make it work like so:
stepA();
stepB();
while (global.operationIsStillGoing) {
  // do nothing
}
stepC();
This will never work, because due to the run-to-completion nature of JavaScript, it is not possible for anything to update global.operationIsStillGoing during the while loop, since that series of statements has not yet run to completion.
Of course, if someone writes a C++ addon that modifies the language, they can get around this. But that's not really JavaScript any more, at least in the commonly understood sense of ECMAScript + the event loop architecture.
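For completeness, here is a sketch of how the original CODE1 reads today with async/await, which keeps the sequential order without ever blocking the thread (assuming step3 can be wrapped so its callback resolves a promise):
async function main() {
  step1();
  step2();
  const o = await new Promise(resolve => step3(resolve)); // suspends here without blocking the event loop
  step4(o);
  step5();
  step6();
}
main();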