how to avoid glitches in Rx - javascript

Unlike other "FRP" libraries, Rx doesn't prevent glitches: callbacks invoked with time-mismatched data. Is there a good way to work around this?
As an example, imagine that we have a series of expensive computations derived from a single stream (e.g. instead of _.identity, below, we do a sort, or an ajax fetch). We do distinctUntilChanged to avoid recomputing the expensive things.
sub = new Rx.Subject();
a = sub.distinctUntilChanged().share();
b = a.select(_.identity).distinctUntilChanged().share();
c = b.select(_.identity).distinctUntilChanged();
d = Rx.Observable.combineLatest(a, b, c, function () { return _.toArray(arguments); });
d.subscribe(console.log.bind(console));
sub.onNext('a');
sub.onNext('b');
The second event will end up causing a number of glitchy states: we get three events out instead of one, which wastes a bunch of CPU and requires us to explicitly work around the mismatched data.
This particular example can be worked around by dropping the distinctUntilChanged, and writing some wonky scan() functions to pass through the previous result if the input hasn't changed. Then you can zip the results, instead of using combineLatest. It's clumsy, but doable.
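For instance, such a scan() might look like this sketch (assuming RxJS 4's scan(accumulator, seed) signature, matching the select/onNext API used above; expensive() stands in for the costly derivation, e.g. a sort):
var expensive = function (x) { return x; /* imagine a sort here */ };
var raw = sub.share(); // distinctUntilChanged dropped, per the workaround above
b = raw.scan(function (acc, x) {
  return (x === acc.input) ? acc : { input: x, output: expensive(x) };
}, { input: undefined, output: undefined })
  .select(function (acc) { return acc.output; });
// b emits exactly one value per input (recomputing only when the input changed),
// so the derived results can be zip'ed instead of combineLatest'ed.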
However if there is asynchrony anywhere, e.g. an ajax call, then zip doesn't work: the ajax call will complete either synchronously (if cached) or asynchronously, so you can't use zip.
Edit
Trying to clarify the desired behavior with a simpler example:
You have two streams, a and b. b depends on a. b is asynchronous, but the browser may cache it, so it can either update independently of a, or at the same time as a. So, a particular event in the browser can cause one of three things: a updates; b updates; both a and b update. The desired behavior is to have a callback (e.g. render method) invoked exactly once in all three cases.
zip does not work, because when a or b fires alone, we get no callback from zip. combineLatest does not work because when a and b fire together we get two callbacks.

The concept
both a and b update
where both a and b are observables, doesn't exist as a primitive in Rx.
There is no lossless, general operator that can be defined to decide when it receives a notification from a whether it should pass it downstream or hold off until it receives a notification from b. Notifications in Rx do not natively carry "both" semantics, or any semantics beyond the Rx Grammar for that matter.
Furthermore, Rx's serial contract prevents an operator from taking advantage of overlapping notifications in an attempt to achieve this goal. (Though I suspect that relying on race conditions isn't your desired approach anyway.)
See §§4.2, 6.7 in the Rx Design Guidelines.
Thus, what I meant above by "There is no lossless, general operator that can be defined..." is that given two observables a and b with independent notifications, any operator that attempts to decide when it receives a notification from a or b whether it must push immediately or wait for the "other" value, must rely on arbitrary timings. It's guesswork. So this hypothetical operator must either drop values (e.g., DistinctUntilChanged or Throttle), or drop time (e.g., Zip or Buffer), though probably some combination of both.
Therefore, if the agent has the ability to push a alone, or b alone, or a and b together as a notification unit, then it's the developer's responsibility to reify this concept of notification unit themselves.
A 3-state type is required: a | b | {a,b}
(Please excuse my lousy JS)
var ab = function(a, b) { this.a = a; this.b = b; }
sub.onNext(new ab('a')); // process a alone
sub.onNext(new ab('a', 'b')); // process a and b together
sub.onNext(new ab(null, 'c')); // process c alone
The shape of the observable's query no longer matters. Observers must be defined to accept this data type. It's the generator's responsibility to apply any necessary buffering or timing calculations based on the semantics of its internal state in order to produce correct notifications for its observers.
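For instance, an observer of the stream above might track the latest a and b itself (a sketch; render() is a hypothetical callback that should fire exactly once per notification unit):
var render = function (a, b) { console.log('render', a, b); };
var lastA, lastB;
sub.subscribe(function (unit) {
  if (unit.a != null) { lastA = unit.a; }
  if (unit.b != null) { lastB = unit.b; }
  render(lastA, lastB); // one callback whether a, b, or both updated
});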
By the way, thank you for providing a simple explanation in your edit (it seems clear to me anyway). I had first heard about "glitches" in this Rx forum discussion. As you can see, it was never really concluded. Now I wonder whether that OP's problem was really as simple as this, assuming that I've understood your problem correctly, of course. :-)
Update:
Here's another related discussion, including some more of my thoughts on why Rx is not FRP:
https://social.msdn.microsoft.com/Forums/en-US/bc2c4b71-c97b-428e-ad71-324055a3cd03/another-discussion-on-glitches-and-rx?forum=rx


Is the order of events when reusing a source observable in a pipeline defined?

The premise
The following rxjs code is expected to accumulate all events that happen in the same micro-task into a single array (buffer) and then release the buffer for the next micro-task to handle:
const ob = new Subject(); // the source; its declaration is implied by the calls below
ob.pipe(buffer(ob.pipe(debounceTime(0)))).subscribe(console.log);
// is expected to log [1,2,3] then [4] 100% of the time.
ob.next(1);
ob.next(2);
setTimeout(() => ob.next(4));
ob.next(3);
In theory, this works fine, but I have doubts about this being guaranteed behavior for all kinds of sources (new Observable, output of an operator, subjects, etc.).
The main question
Could timing in the events of ob cause some of the events to end up in the wrong (next) buffer?
Could implementation details of buffer or debounceTime, outside of the documented behavior, cause the same issue?
In this case, I refer to "the wrong buffer" as slippage, where we would get [1,2] then [3,4] instead of the expected value due to the order of execution of the two operators.
Additional exploration
What if ob is defined as ob = ob.pipe(share()), making sure that both the buffer and debounceTime operators get notified at the same time?
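Concretely, the shared variant being asked about might look like this sketch (RxJS 6-style imports; src stands in for the real source):
import { Subject } from 'rxjs';
import { buffer, debounceTime, share } from 'rxjs/operators';

const src = new Subject();
const ob = src.pipe(share()); // one upstream subscription feeds both operator chains
ob.pipe(buffer(ob.pipe(debounceTime(0)))).subscribe(console.log);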
What if the buffer time is increased? Is there a risk of slippage if a notification arrives too close to the debounce limit?
I have a feeling this will only work in the case where ob is published/shared. In that case:
Is this considered idiomatic rx code since the proper behavior of this chain could change depending on other operators around it?
Is there a better way to do this that would avoid those race-conditions? Maybe create my own operator?
To me, it feels very strange to see the same source observable "gating" its own values like this. I could definitely see this code break just because debounceTime's observer triggers before buffer's, or the opposite. I guess the question could also apply to cases like ob.pipe(takeUntil(ob)) or ob.pipe(bufferWhen(() => ob)), even though they make little sense at first glance.

How to prevent script injection attacks

Intro
This topic has been the bane of many questions and answers on StackOverflow, and in many other tech forums; however, most of them are specific to exact conditions, and even worse, "over-all" security in script-injection prevention via the dev-tools console, the dev-tools Elements panel, or even the address bar is said to be "impossible" to protect against. This question is meant to address these issues and serve as a current and historical reference as technology improves, or as new/better methods are discovered to address browser security issues, specifically those related to script-injection attacks.
Concerns
There are many ways to either extract or manipulate information "on the fly"; specifically, it's very easy to intercept information gathered from input, to be transmitted to the server, regardless of SSL/TLS.
intercept example
Have a look here
Regardless of how "crude" it is, one can easily use the principle to fabricate a template to just copy+paste into an eval() in the browser console to do all kinds of nasty things such as:
console.log() intercepted information in transit via XHR
manipulate POST-data, changing user-references such as UUIDs
feed the target-server alternative GET (& post) request information to either relay (or gain) info by inspecting the JS-code, cookies and headers
This kind of attack "seems" trivial to the untrained eye, but when highly dynamic interfaces are concerned, it quickly becomes a nightmare waiting to be exploited.
We all know "you can't trust the front-end" and the server should be responsible for security; however - what about the privacy/security of our beloved visitors? Many people create "some quick app" in JavaScript and either do not know (or care) about the back-end security.
Securing the front-end as well as the back-end would prove formidable against an average attacker, and also lighten the server-load (in many cases).
Efforts
Both Google and Facebook have implemented some ways of mitigating these issues, and they work; so it is NOT "impossible". However, they are very specific to their respective platforms, and implementing them requires the use of entire frameworks plus a lot of work, only to cover the basics.
Regardless of how "ugly" some of these protection mechanisms may appear; the goal is to help (mitigate/prevent) security issues to some degree, making it difficult for an attacker. As everybody knows by now: "you cannot keep a hacker out, you can only discourage their efforts".
Tools & Requirements
The goal is to have a simple set of tools (functions):
these MUST be in plain (vanilla) javascript
together they should NOT exceed a few lines of code (at most 200)
they have to be immutable, preventing "re-capture" by an attacker
these MUST NOT clash with any (popular) JS frameworks, such as React, Angular, etc
does NOT have to be "pretty", but readable at least, "one-liners" welcome
cross-browser compatible, at least to a good percentile
Runtime Reflection / Introspection
This is a way to address some of these concerns, and I don't claim it's "the best" way (at all), it's an attempt.
If one could intercept some "exploitable" functions and methods and see whether "the call" (per call) was made from the server that spawned it, or not, then this could prove useful, as we could then see if the call came "from thin air" (dev-tools).
If this approach is to be taken, then first we need a function that grabs the call-stack and discards that which is not FUBU (for us by us). If the result of this function is empty, huzzah! We did not make the call and we can proceed accordingly.
a word or two
In order to make this as short & simple as possible, the following code examples follow DRYKIS principles, which are:
don't repeat yourself, keep it simple
"less code" welcomes the adept
"too much code & comments" scare away everybody
if you can read code - go ahead and make it pretty
With that said, pardon my "short-hand"; an explanation will follow
first we need some constants and our stack-getter
const MAIN = window;
const VOID = (function(){}()); // paranoid
const HOST = `https://${location.host}`; // if not `https` then ... ?
const stak = function(x,a, e,s,r,h,o)
{
  a=(a||''); e=(new Error('.')); s=e.stack.split('\n'); s.shift(); r=[]; h=HOST; o=['_fake_'];
  s.forEach((i)=>
  {
    if(i.indexOf(h)<0){return}; let p,c,f,l,q; q=1; p=i.trim().split(h); c=p[0].split('#').join('').split('at ').join('').trim();
    c=c.split(' ')[0];if(!c){c='anon'}; o.forEach((y)=>{if(((c.indexOf(y)==0)||(c.indexOf('.'+y)>0))&&(a.indexOf(y)<0)){q=0}}); if(!q){return};
    p=p[1].split(' '); f=p[0]; if(f.indexOf(':')>0){p=f.split(':'); f=p[0]}else{p=p.pop().split(':')}; if(f=='/'){return};
    l=p[1]; r[r.length]=([c,f,l]).join(' ');
  });
  if(!isNaN(x*1)){return r[x]}; return r;
};
After cringing, bear in mind this was written "on the fly" as a "proof of concept", yet it is tested and it works. Edit as you wish.
stak() - short explanation
the only 2 relevant arguments are the first 2; the rest exist because of... laziness (short answer)
both arguments are optional
if the 1st arg x is a number, then e.g. stak(0) returns the 1st item in the log, or undefined
if the 2nd arg a is either a string or an array, then e.g. stak(undefined, "anonymous") allows "anonymous" even though it was "omitted" in o
the rest of the code just parses the stack quickly; this should work in both webkit- and gecko-based browsers (chrome & firefox)
the result is an array of strings, each string is a log-entry separated by a single space as function file line
if the domain-name is not found in a log-entry (part of filename before parsing) then it won't be in the result
by default it ignores filename / (exactly) so if you test this code, putting in a separate .js file will yield better results than in index.html (typically) -or whichever web-root mechanism is used
don't worry about _fake_ for now, it's in the jack function below
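For instance, a hypothetical session (file names and line numbers will naturally differ):
// hypothetical usage: called from one of your own scripts served from HOST,
// each entry has the form "function file line"
function probe(){ return stak(); }
console.log(probe()); // e.g. ["probe /js/app.js 3", ...]
console.log(stak(0)); // the first entry, or undefined when invoked from the dev-tools console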
now we need some tools
bore() - get/set/rip some value of an object by string reference
const bore = function(o,k,v)
{
  if(((typeof k)!='string')||(k.trim().length<1)){return}; // invalid
  if(v===VOID){return (new Function("a",`return a.${k}`))(o)}; // get
  if(v===null){(new Function("a",`delete a.${k}`))(o); return true}; // rip
  (new Function("a","z",`a.${k}=z`))(o,v); return true; // set
};
bake() - shorthand to harden existing object properties (or define new ones)
const bake = function(o,k,v)
{
  if(!o||!o.hasOwnProperty){return}; if(v==VOID){v=o[k]};
  let c={enumerable:false,configurable:false,writable:false,value:v};
  let r=true; try{Object.defineProperty(o,k,c);}catch(e){r=false};
  return r;
};
bake & bore - rundown
These are fairly self-explanatory, so some quick examples should suffice
using bore to get a property: console.log(bore(window,"XMLHttpRequest.prototype.open"))
using bore to set a property: bore(window,"XMLHttpRequest.prototype.open",function(){return "foo"})
using bore to rip (destroy carelessly): bore(window,"XMLHttpRequest.prototype.open",null)
using bake to harden an existing property: bake(XMLHttpRequest.prototype,'open')
using bake to define a new (hard) property: bake(XMLHttpRequest.prototype,'bark',function(){return "woof!"})
intercepting functions and constructions
Now we can use all the above to our advantage as we devise a simple yet effective interceptor, by no means "perfect", but it should suffice; explanation follows:
const jack = function(k,v)
{
  if(((typeof k)!='string')||!k.trim()){return}; // invalid reference
  if(!!v&&((typeof v)!='function')){return}; // invalid callback func
  if(!v){return this[k]}; // return existing definition, or undefined
  if(k in this){this[k].list[(this[k].list.length)]=v; return}; // add
  let h,n; h=k.split('.'); n=h.pop(); h=h.join('.'); // name & holder
  this[k]={func:bore(MAIN,k),list:[v]}; // define new callback object
  bore(MAIN,k,null); let f={[`_fake_${k}`]:function()
  {
    let r,j,a,z,q; j='_fake_'; r=stak(0,j); r=(r||'').split(' ')[0];
    if(!r.startsWith(j)&&(r.indexOf(`.${j}`)<0)){fail(`:(`);return};
    r=jack((r.split(j).pop())); a=([].slice.call(arguments));
    for(let p in r.list)
    {
      if(!r.list.hasOwnProperty(p)||q){continue}; let i,x;
      i=r.list[p].toString(); x=(new Function("y",`return {[y]:${i}}[y];`))(j);
      q=x.apply(r,a); if(q==VOID){return}; if(!Array.isArray(q)){q=[q]};
      z=r.func.apply(this,q);
    };
    return z;
  }}[`_fake_${k}`];
  bake(f,'name',`_fake_${k}`); bake((h?bore(MAIN,h):MAIN),n,f);
  try{bore(MAIN,k).prototype=Object.create(this[k].func.prototype)}
  catch(e){};
}.bind({});
jack() - explanation
it takes 2 arguments: the first a string (used to bore), the second a function (used as the interceptor)
the first few comments explain a bit... the "add" line simply adds another interceptor to the same reference
jack deposes an existing function, stows it away, then uses "interceptor functions" to replay arguments
the interceptors can either return undefined or a value; if no value is returned from any of them, the original function is not called
the first value returned by an interceptor is used as the argument(s) to call the original function, and its result is returned to the caller/invoker
that fail(":(") is intentional; an error will be thrown if you don't have such a function - but only if the jack() failed
Examples
Let's prevent eval from being used in the console -or address-bar
jack("eval",function(a){if(stak(0)){return a}; alert("having fun?")});
extensibility
If you want a DRY-er way to interface with jack, the following is tested and works well:
const hijack = function(l,f)
{
if(Array.isArray(l)){l.forEach((i)=>{jack(i,f)});return};
};
Now you can intercept in bulk, like this:
hijack(['eval','XMLHttpRequest.prototype.open'],function()
{if(stak(0)){return ([].slice.call(arguments))}; alert("gotcha!")});
A clever attacker may then use the Elements panel (dev-tools) to modify an attribute of some element, giving it some onclick event; our interceptor won't catch that. However, we can use a MutationObserver and, with it, spy on attribute changes. Upon an attribute change (or a new node) we can check whether the change was made FUBU (or not) with our stak() check:
const watchDog=(new MutationObserver(function(l)
{
if(!stak(0)){alert("you again! :D");return};
}));
watchDog.observe(document.documentElement,{childList:true,subtree:true,attributes:true});
Conclusion
These were but a few ways of dealing with a bad problem; though I hope someone finds this useful, and please feel free to edit this answer, or post more (or alternative/better) ways of improving front-end security.

Why is <= slower than < using this code snippet in V8?

I am reading the slides Breaking the Javascript Speed Limit with V8, and there is an example like the code below. I cannot figure out why <= is slower than < in this case, can anybody explain that? Any comments are appreciated.
Slow:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i <= this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
(Hint: primes is an array of length prime_count)
Faster:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i < this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
[More info] The speed difference is significant. In my local test environment, the results are as follows:
V8 version 7.3.0 (candidate)
Slow:
time d8 prime.js
287107
12.71 user
0.05 system
0:12.84 elapsed
Faster:
time d8 prime.js
287107
1.82 user
0.01 system
0:01.84 elapsed
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would only make a minuscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be a simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let a = [];
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]? It has no value. (It isn't even correct to say it has the value undefined - an array element containing the undefined value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1] contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
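The distinction between a missing element and one that holds undefined is observable from plain JavaScript:
// missing elements vs. elements that hold undefined
const holey  = [10, , 30];           // index 1 is missing entirely
const filled = [10, undefined, 30];  // index 1 exists and holds undefined
console.log(holey[1], filled[1]);    // undefined undefined - the reads look identical
console.log(1 in holey);             // false - there is no element at index 1
console.log(1 in filled);            // true  - the element exists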
Another reason for an array to be forced into the deoptimized format can be if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <= attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
this.primes[i] evaluates to undefined because i is past the array end.
candidate % undefined (for any value of candidate) evaluates to NaN.
NaN == 0 evaluates to false.
Therefore, the return true is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
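These semantics are easy to verify in any console:
const primes = [2, 3, 5];
console.log(primes[3]);     // undefined - a read past the end of the array
console.log(7 % primes[3]); // NaN
console.log(NaN == 0);      // false - so the `return true` is never reached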
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with < reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described in pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very Google I/O presentation and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that if you tried to even read past the end of an array, it would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
I work on V8 at Google, and wanted to provide some additional insight on top of the existing answers and comments.
For reference, here's the full code example from the slides:
var iterations = 25000;

function Primes() {
  this.prime_count = 0;
  this.primes = new Array(iterations);
  this.getPrimeCount = function() { return this.prime_count; }
  this.getPrime = function(i) { return this.primes[i]; }
  this.addPrime = function(i) {
    this.primes[this.prime_count++] = i;
  }
  this.isPrimeDivisible = function(candidate) {
    for (var i = 1; i <= this.prime_count; ++i) {
      if ((candidate % this.primes[i]) == 0) return true;
    }
    return false;
  }
};

function main() {
  var p = new Primes();
  var c = 1;
  while (p.getPrimeCount() < iterations) {
    if (!p.isPrimeDivisible(c)) {
      p.addPrime(c);
    }
    c++;
  }
  console.log(p.getPrime(p.getPrimeCount() - 1));
}

main();
First and foremost, the performance difference has nothing to do with the < and <= operators directly. So please don't jump through hoops just to avoid <= in your code because you read on Stack Overflow that it's slow - it isn't!
Second, folks pointed out that the array is "holey". This was not clear from the code snippet in OP's post, but it is clear when you look at the code that initializes this.primes:
this.primes = new Array(iterations);
This results in an array with a HOLEY elements kind in V8, even if the array ends up completely filled/packed/contiguous. In general, operations on holey arrays are slower than operations on packed arrays, but in this case the difference is negligible: it amounts to 1 additional Smi (small integer) check (to guard against holes) each time we hit this.primes[i] in the loop within isPrimeDivisible. No big deal!
TL;DR The array being HOLEY is not the problem here.
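If you want to verify the elements kind yourself, V8's natives syntax exposes it (a sketch; output detail varies by build and version):
// run with: d8 --allow-natives-syntax elements.js
// (or: node --allow-natives-syntax elements.js)
const packed = [1, 2, 3];   // PACKED_SMI_ELEMENTS
const holey = new Array(3); // HOLEY_SMI_ELEMENTS
holey[0] = 1; holey[1] = 2; holey[2] = 3; // filling it does NOT make it packed again
%DebugPrint(packed);        // look for "elements kind" in the output
%DebugPrint(holey);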
Others pointed out that the code reads out of bounds. It's generally recommended to avoid reading beyond the length of arrays, and in this case it would indeed have avoided the massive drop in performance. But why though? V8 can handle some of these out-of-bound scenarios with only a minor performance impact. What's so special about this particular case, then?
The out-of-bounds read results in this.primes[i] being undefined on this line:
if ((candidate % this.primes[i]) == 0) return true;
And that brings us to the real issue: the % operator is now being used with non-integer operands!
integer % someOtherInteger can be computed very efficiently; JavaScript engines can produce highly-optimized machine code for this case.
integer % undefined on the other hand amounts to a way less efficient Float64Mod, since undefined is represented as a double.
The code snippet can indeed be improved by changing the <= into < on this line:
for (var i = 1; i <= this.prime_count; ++i) {
...not because < is somehow a superior operator to <=, but just because this avoids the out-of-bounds read in this particular case.
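To observe the effect yourself, here is a minimal timing sketch (results vary by engine and version, and recent V8 releases may have narrowed the gap):
const primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29];
function inBounds(n) {
  for (let i = 0; i < primes.length; ++i)
    if (n % primes[i] === 0) return true;
  return false;
}
function outOfBounds(n) {
  for (let i = 0; i <= primes.length; ++i) // last read is primes[primes.length] -> undefined
    if (n % primes[i] === 0) return true;
  return false;
}
console.time('in-bounds');
for (let n = 1; n <= 1e7; ++n) inBounds(n);
console.timeEnd('in-bounds');
console.time('out-of-bounds');
for (let n = 1; n <= 1e7; ++n) outOfBounds(n);
console.timeEnd('out-of-bounds');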
TL;DR The slower loop is due to accessing the Array 'out-of-bounds', which either forces the engine to recompile the function with fewer or even no optimizations, OR to never compile the function with any of these optimizations to begin with (if the JIT compiler detected/suspected this condition before the first compiled 'version'); read on below for why.
Someone just has to say this (utterly amazed nobody already did):
There used to be a time when the OP's snippet would be a de-facto example in a beginner's programming book intended to outline/emphasize that 'arrays' in javascript are indexed starting at 0, not 1, and as such be used as an example of a common 'beginner's mistake' (don't you love how I avoided the phrase 'programming error' ;)): out-of-bounds Array access.
Example 1:
a Dense Array (contiguous, meaning no gaps between indexes, AND an actual element at each index) of 5 elements, using 0-based indexing (always, in ES262).
var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
// indexes are: 0 , 1 , 2 , 3 , 4 // there is NO index number 5
Thus we are not really talking about a performance difference between < and <= (or 'one extra iteration'); we are asking:
'why does the correct snippet (b) run faster than the erroneous snippet (a)?'
The answer is 2-fold (although from a ES262 language implementer's perspective both are forms of optimization):
Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'
Item 1 is sufficiently (and correctly IMHO) explained by the accepted answer, but that only spends 2 words ('the code') on Item 2: compilation.
More precisely: JIT-Compilation and even more importantly JIT-RE-Compilation !
The language specification is basically just a description of a set of algorithms ('steps to perform to achieve the defined end-result'). Which, as it turns out, is a very beautiful way to describe a language.
And it leaves the actual method that an engine uses to achieve specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce defined results.
A spec conforming engine should give spec conforming results for any defined input.
Now, with javascript code/libraries/usage increasing, and remembering how much resources (time/memory/etc) a 'real' compiler uses, it's clear we can't make users visiting a web-page wait that long (and require them to have that many resources available).
Imagine the following simple function:
function sum(arr){
  var r=0, i=0;
  for(;i<arr.length;) r+=arr[i++];
  return r;
}
Perfectly clear, right? Doesn't require ANY extra clarification, right? The return-type is Number, right?
Well... no, no & no... It depends on what argument you pass to the named function parameter arr...
sum('abcde'); // String('0abcde')
sum([1,2,3]); // Number(6)
sum([1,,3]); // Number(NaN)
sum(['1',,3]); // String('01undefined3')
sum([1,,'3']); // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]); // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]); // Number(8)
See the problem ? Then consider this is just barely scraping the massive possible permutations...
We don't even know what TYPE the function RETURNS until we are done...
Now imagine this same function-code actually being used on different types or even variations of input, both completely literally (in source code) described and dynamically in-program generated 'arrays'..
Thus, if you were to compile function sum JUST ONCE, the only way to always return the spec-defined result for any and all types of input is, obviously, to perform ALL spec-prescribed main AND sub steps (like an unnamed pre-y2k browser).
No optimizations (because no assumptions), and a dead-slow interpreted scripting language remains.
JIT-Compilation (JIT as in Just In Time) is the current popular solution.
So, you start to compile the function using assumptions regarding what it does, returns and accepts.
You come up with checks as simple as possible to detect if the function might start returning non-spec-conformant results (like because it receives unexpected input).
Then, toss away the previously compiled result and recompile to something more elaborate, decide what to do with the partial result you already have (is it valid to be trusted or compute again to be sure), tie the function back into the program and try again. Ultimately falling back to stepwise script-interpretation as in the spec.
All of this takes time!
All browsers work on their engines; for each and every sub-version you will see things improve and regress. Strings were at some point in history really immutable strings (hence array.join was faster than string concatenation); now we use ropes (or similar) which alleviate the problem. Both return spec-conforming results and that is what matters!
Long story short: just because javascript's language semantics often got our back (like with this silent bug in the OP's example) does not mean that 'stupid' mistakes increase our chances of the compiler spitting out fast machine-code. It assumes we wrote the 'usually' correct instructions: the current mantra we 'users' (of the programming language) must adopt is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for a basic understanding of what browsers can try to optimize and why).
Because of this, talking about performance is both important BUT ALSO a mine-field (and because of said mine-field I really want to end by pointing to, and quoting, some relevant material):
Access to nonexistent object properties and out of bounds array elements returns the undefined value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.
...
An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.
Source:
"JITProf: Pinpointing JIT-unfriendly JavaScript Code"
Berkeley publication, 2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf
ASM.JS (also doesn't like out-of-bounds array access):
Ahead-Of-Time Compilation
Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
unboxed representations of integers and floating-point numbers;
absence of runtime type checks;
absence of garbage collection; and
efficient heap loads and stores (with implementation strategies varying by platform).
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
http://asmjs.org/spec/latest/
and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/
where there is a small subsection about the engine's internal performance improvements when removing the bounds-check (whilst just lifting the bounds-check outside the loop already gave an improvement of 40%).
EDIT:
note that multiple sources talk about different levels of JIT-Recompilation down to interpretation.
Theoretical example based on above information, regarding the OP's snippet:
Call to isPrimeDivisible
Compile isPrimeDivisible using general assumptions (like no out of bounds access)
Do work
BAM, suddenly the array is accessed out of bounds (right at the end).
Crap, says the engine; let's recompile that isPrimeDivisible using different (fewer) assumptions, and this example engine doesn't try to figure out if it can reuse the current partial result, so
Recompute all work using the slower function (hopefully it finishes; otherwise repeat, and this time just interpret the code).
Return result
Hence the time spent was:
First run (failed at end) + doing all work all over again using slower machine-code for each iteration + the recompilation etc.. clearly takes >2 times longer in this theoretical example!
EDIT 2: (disclaimer: conjecture based on the facts below)
The more I think about it, the more I think that this answer might actually explain the more dominant reason for this 'penalty' on the erroneous snippet a (or the performance bonus on snippet b, depending on how you look at it), which is precisely why I'm adamant in calling it (snippet a) a programming error:
It's pretty tempting to assume that this.primes is a purely numerical 'dense array' which was either
Hard-coded as a literal in source-code (a known excellent candidate to become a 'real' array, as everything is already known to the compiler before compile-time), OR
most likely generated using a numerical function filling a pre-sized (new Array(/*size value*/)) array in ascending sequential order (another long-time-known candidate to become a 'real' array).
We also know that the primes array's length is cached as prime_count! (indicating its intent and fixed size).
We also know that most engines initially pass Arrays as copy-on-modify (when needed), which makes handling them much faster (if you don't change them).
It is therefore reasonable to assume that the Array primes is most likely already an optimized array internally which doesn't get changed after creation (simple for the compiler to know if there is no code modifying the array after creation) and is therefore already (if applicable to the engine) stored in an optimized way, pretty much as if it were a Typed Array.
As I have tried to make clear with my sum function example, the argument(s) that get passed highly influence what actually needs to happen, and as such how that particular code is being compiled to machine-code. Passing a String to the sum function shouldn't change the string, but changes how the function is JIT-compiled! Passing an Array to sum should compile a different (perhaps even an additional one for this type, or 'shape' as they call it, of object that got passed) version of machine-code.
It seems slightly bonkers to convert the Typed_Array-like primes Array on-the-fly to something_else while the compiler knows this function is not even going to modify it!
Under these assumptions that leaves 2 options:
Compile as a number-cruncher assuming no out-of-bounds access, run into the out-of-bounds problem at the end, recompile and redo the work (as outlined in the theoretical example in edit 1 above)
The compiler had already detected (or suspected?) out-of-bounds access up-front, and the function was JIT-compiled as if the argument passed was a sparse object, resulting in slower functional machine-code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligible for certain optimizations; it was compiled as if it received a 'sparse array'(-like) argument.
I now really wonder which of these 2 it is!
To add some scientificness to it, here's a jsperf
https://jsperf.com/ints-values-in-out-of-array-bounds
It tests the control case of an array filled with ints, looping and doing modular arithmetic while staying within bounds. It has 5 test cases:
1. Looping out of bounds
2. Holey arrays
3. Modular arithmetic against NaNs
4. Completely undefined values
5. Using a new Array()
It shows that the first 4 cases are really bad for performance. Looping out of bounds is a bit better than the other 3, but all 4 are roughly 98% slower than the best case.
The new Array() case is almost as good as the raw array, just a few percent slower.

Understanding Fantasyland `ap`

I'm trying to come to an understanding of ap, but I'm having trouble.
In fantasyland, James Forbes says:
First we teach a function how to interact with our type, by storing that function in a container just like any other value. ( Functions are values too ya know! )
var square = Type.of(
  a => a * a
)
//=> Type (number -> number)
Then we can apply that contained function to a contained value.
square.ap( Type.of(3) )
//=> Type(9)
ap calls map on a received type, with itself as the transform function.
function ap(type){
  // recall our value
  // is a function
  // Type ( a -> a )
  var transformer = this.__value
  return type.map(transformer)
}
So this looks like ap only works if the value in our container is a function. This already feels weird to me because I thought the whole point of The Perfect API was that these functions work on everything every time.
I also want to note that, because of the line square.ap(Type.of(3)), it looks to me like ap takes any functor (implementer of map).
Now if I jump over to the javascript fantasy-land spec, which I assume is based on the James Forbes link, the 1.i definition of the ap signature (a.ap(b)) states
If b does not represent a function, the behaviour of ap is unspecified.
So it sounds like this spec is expecting ap to take a function unlike The Perfect API.
In summary, I guess I don't understand specifications for ap or what implementing it would look like. When I try googling this, it seems like most people just want to talk about map, which is easy for me to understand already.
The FantasyLand spec pre-dates James Forbes' article by three years, and was created by Brian McKenna, so it would seem that James Forbes' article is based on the spec, not the other way around.
To answer your question, a and b must both be the same kind of "container". If a is a Maybe, then b must also be a Maybe. If a is a Task, then b must also be a Task.
This is indicated here in the FantasyLand spec:
b must be same Apply as a.
Additionally, one of them must contain a function as its inner value. Which one needs to contain a function depends on the API. In the FantasyLand spec, it's b that would contain the function:
b must be an Apply of a function
In James Forbes' article, it's the opposite. I suspect this is because he's basing his article around Ramda, which takes arguments in the opposite order of what you would typically see elsewhere in JavaScript.
In any case, the result of ap is a value with the same type of container as a and b:
The Apply returned by ap must be the same as a and b
and the result contains the result of applying the contained function to the other contained value.
So if a were some value T[x] and b were some value T[f], then a.ap(b) would be T[f(x)].
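Here is a minimal sketch of a container implementing the FantasyLand version ("Box" is a made-up name; b holds the function, per the spec):
// a minimal FantasyLand-style Apply; a.ap(b) expects b to contain a function
const Box = x => ({
  __value: x,
  map: f => Box(f(x)),
  ap: b => b.map(f => f(x)), // apply b's inner function to our inner value
  toString: () => `Box(${x})`,
});
Box.of = Box; // a minimal `of` for the example

console.log(String(Box.of(3).ap(Box.of(n => n * n)))); // Box(9)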
Hopefully that makes some sense.

Is there a name for this pattern?

I am basically quite sure this pattern must exist and possess a name... for now I will call it "gate pattern"...
Here it is:
In my webpage's javascript, I have to trigger various asynchronous processes. Let's not discuss how truly async JS is, but anyway, I have to trigger 2 or 3 AJAX calls, must be sure the UI build-up has finished, and so on.
Only then, when all these processes have finished, do I want to run a certain function. And precisely once.
Example
1: cropStore loaded()
2: resizeEvent()
3: productStore loaded()
The Pattern:
At the end of every (successful) Ajax-load callback, the end of the GUI construction routine, etc., I set a respective flag from false to true and call gatedAction():
onEvent('load', function ()
{
  // ... whatever has to happen in response to cropStored, resized, etc...
  // lastly:
  f1 = true; // resp. f2, f3, ...
  gatedAction();
});
The gate will check the flags and return if any flag is still unset, only calling the target function if all flags (or as I call them: gates) are open. If all my async pre-conditions call gatedAction() exactly once, I hope I can be sure the actual targetFunction is called exactly once.
function gatedAction()
{
  // Gate
  if (!f1) return;
  if (!f2) return;
  if (!f3) return;
  // actual Action ( <=> f1 && f2 && f3 )
  targetFunction();
}
In practice it works reliably. On a side-note: I think Java-typical (not JS-typical) synchronization/volatile concerns can be ignored, because JavaScript is not truly multithreaded. Afaik a function is never stopped in the middle just to grant another JavaScript function in the same document run-time...
So, anyone, is there a name for this? :-)
I need this pattern actually quite often, especially with complex backend UIs.. (and yes, I think, I will turn the above butt-ugly implementation into a more reusable javascript... With a gates array and a target function.)
It sounds like the Balking pattern to me.
It is similar to the Rendezvous pattern, although that pattern is generally used in the context of multithreaded real-time systems.
I have no idea if your pattern has a special name, but it seems equivalent to just using a counting semaphore, which blocks the thread that started all those other actions until they have all made a V-invocation. Of course, there are no threads and semaphores in JavaScript, but instead of using many boolean variables you could use just one integer for counting.
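A counter-based sketch of that idea (names are made up):
// a counter variant of the gate: targetFunction runs exactly once,
// after open() has been called count times
function makeGate(count, targetFunction) {
  var remaining = count;
  return function open() {
    if (--remaining === 0) targetFunction();
  };
}

var gateOpen = makeGate(3, function () { console.log('all gates open'); });
gateOpen(); gateOpen(); gateOpen(); // logs exactly once, on the third call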
In addition to the actual answer to your question, you might be interested in the Rx framework for Javascript. It's a port of the .NET version and allows you to compose events, so you don't have to work with tons of flag variables. It's meant for this sort of thing.
http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx
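With a modern RxJS (not the DevLabs-era API linked above), a sketch of the same gating might use forkJoin, which emits once when all of its sources complete; the three observables here are hypothetical stand-ins for the real preconditions:
import { forkJoin, of } from 'rxjs';

// hypothetical one-shot sources standing in for the real async preconditions
const cropStore$ = of('crops');
const resize$ = of('resized');
const productStore$ = of('products');

forkJoin([cropStore$, resize$, productStore$])
  .subscribe(() => console.log('all preconditions met')); // fires exactly once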
