javascript websockets - control initial connection / when does onOpen get bound

Two related questions that may be more rooted in my lack of knowledge of how/if browsers pre-parse javascript:
var ws = new WebSocket("ws://ws.my.url.com");
ws.onOpen = function() { ... };
There appears to be no way to directly control the initialisation of a WebSocket, beyond wrapping it in a callback, so I assume the connection is created as soon as the javascript code is loaded and execution reaches the constructor?
When does the onOpen property get attached to ws? Is there any possibility of a race condition (if for some reason you had some code in between the definition of the socket and the definition of onOpen), such that it is undecidable whether onOpen is bound before or after the connection is established? (I know you could optionally check ws.readyState.) Supplementary to this, is the WebSocket handshake blocking?
I realise it's all a draft at the moment, possibly implementation dependent, and I may have missed something blindingly obvious, but I couldn't see anything particularly pertinent in my internet searches/skim through the draft W3C spec, so any help in my understanding of websockets/JavaScript's inner workings is very much appreciated!

JavaScript is single-threaded, which means the network connection can't be established until the current scope of execution completes and the network operation gets a chance to run. The scope of execution could be the current function (the connect function in the example below). So, you could miss the onopen event if you bind to it very late on, e.g. using a setTimeout. In this example you can miss the event:
View: http://jsbin.com/ulihup/edit#javascript,html,live
Code:
var ws = null;
function connect() {
  ws = new WebSocket('ws://ws.pusherapp.com:80/app/a42751cdeb5eb77a6889?client=js&version=1.10');
  setTimeout(bindEvents, 1000);
  setReadyState();
}
function bindEvents() {
  ws.onopen = function() {
    log('onopen called');
    setReadyState();
  };
}
function setReadyState() {
  log('ws.readyState: ' + ws.readyState);
}
function log(msg) {
  if(document.body) {
    var text = document.createTextNode(msg);
    document.body.appendChild(text);
  }
}
connect();
If you run the example you may well see that the 'onopen called' log line is never output. This is because we missed the event.
However, if you keep the new WebSocket(...) and the binding to the onopen event in the same scope of execution then there's no chance you'll miss the event.
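For reference, a minimal sketch of that safe pattern (the connectSafely name is only illustrative; the URL and the log helper are taken from the example above) would be:
var ws = null;
function connectSafely() {
  ws = new WebSocket('ws://ws.pusherapp.com:80/app/a42751cdeb5eb77a6889?client=js&version=1.10');
  ws.onopen = function() { // bound in the same scope of execution as the constructor
    log('onopen called');
    log('ws.readyState: ' + ws.readyState);
  };
}
connectSafely();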
For more information on scope of execution and how these are queued, scheduled and processed take a look at John Resig's post on Timers in JavaScript.

TL;DR - The standard states that the connection can be opened "while the [JS] event loop is running" (e.g. by the browser's C++ code), but that firing the open event must be queued to the JS event loop, meaning any onOpen callback registered in the same execution block as new WebSocket(...) is guaranteed to be executed, even if the connection gets opened while the current execution block is still executing.
According to The WebSocket Interface specification in the HTML Standard (emphasis mine):
The WebSocket(url, protocols) constructor, when invoked, must run these steps:
Let urlRecord be the result of applying the URL parser to url.
If urlRecord is failure, then throw a "SyntaxError" DOMException.
If urlRecord's scheme is not "ws" or "wss", then throw a "SyntaxError" DOMException.
If urlRecord's fragment is non-null, then throw a "SyntaxError" DOMException.
If protocols is a string, set protocols to a sequence consisting of just that string.
If any of the values in protocols occur more than once or otherwise fail to match the requirements for elements that comprise the value of Sec-WebSocket-Protocol fields as defined by The WebSocket protocol, then throw a "SyntaxError" DOMException.
Run this step in parallel:
Establish a WebSocket connection given urlRecord, protocols, and the entry settings object. [FETCH]
NOTE If the establish a WebSocket connection algorithm fails, it triggers the fail the WebSocket connection algorithm, which then invokes the close the WebSocket connection algorithm, which then establishes that the WebSocket connection is closed, which fires the close event as described below.
Return a new WebSocket object whose url is urlRecord.
Note the establishment of the connection is run 'in parallel', and the specification further states that "...in parallel means those steps are to be run, one after another, at the same time as other logic in the standard (e.g., at the same time as the event loop). This standard does not define the precise mechanism by which this is achieved, be it time-sharing cooperative multitasking, fibers, threads, processes, using different hyperthreads, cores, CPUs, machines, etc."
Meaning that the connection can theoretically be opened before onOpen registration, even if onOpen(...) is the next statement after the constructor call.
However... the standard goes on to state under Feedback from the protocol:
When the WebSocket connection is established, the user agent must queue a task to run these steps:
Change the readyState attribute's value to OPEN (1).
Change the extensions attribute's value to the extensions in use, if it is not the null value. [WSP]
Change the protocol attribute's value to the subprotocol in use, if it is not the null value. [WSP]
Fire an event named open at the WebSocket object.
NOTE Since the algorithm above is queued as a task, there is no race condition between the WebSocket connection being established and the script setting up an event listener for the open event.
So in a browser or library that adheres to the HTML Standard, a callback registered to WebSocket.onOpen(...) is guaranteed to execute, if it is registered before the end of the execution block in which the constructor is called, and before any subsequent statement in the same block that releases the event loop (e.g. await).
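To make that caveat concrete, here is a small illustrative sketch (somethingAsync is a hypothetical stand-in for anything you would await):
// Safe: constructor and handler assignment share one execution block,
// so the queued 'open' task cannot run in between.
const ws = new WebSocket("ws://ws.my.url.com");
ws.onopen = () => console.log("open fired; readyState = " + ws.readyState);

// Risky: awaiting between the constructor and the handler assignment
// releases the event loop, so the queued 'open' task may run first
// and the handler may never be called.
async function connectRisky(url) {
  const socket = new WebSocket(url);
  await somethingAsync();
  socket.onopen = () => { /* may be too late */ };
  return socket;
}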

@leggetter is right, the following code does execute sequentially:
(function(){
ws = new WebSocket("ws://echo.websocket.org");
ws.addEventListener('open', function(e){
console.log('open', e);
ws.send('test');
});
ws.addEventListener('message', function(e){console.log('msg', e)});
})();
But in the W3C spec there is a curious line:
Return a new WebSocket object, and continue these steps in the background (without blocking scripts).
It was confusing for me when I was learning the browser API for this. I assume that user agents are ignoring it, or that I am misinterpreting it.

Pay attention to the fact that I/O may occur within the scope of execution.
For example, consider the following code:
var ws = new WebSocket("ws://localhost:8080/WebSockets/example");
alert("Hi");
ws.onopen = function(){
  writeToScreen("Web Socket is connected!!" + "<br>");
};
function writeToScreen(message) {
  var div = document.getElementById('test');
  div.insertAdjacentHTML( 'beforeend', message );
}
, the message "Web Socket is connected" will appear or not, depending how much time it took you to close the "Hi" alert

No actual I/O will happen until after your script finishes executing, so there should not be a race condition.

Related

XMLHttpRequest returning with status 200, but 'onreadystatechange' event not fired

We have been receiving an intermittent bug with the XMLHttpRequest object when using IE11. Our codebase is using legacy architecture, so this browser is required.
After clicking a button, the browser launches an out-of-band process by creating a new ActiveX control which integrates with a camera to capture an image. This control appears to be working fine... it allows the operator to capture the image, and the Base64 content of the image is returned out of the control back to the browser interface, so I think we can rule out a problem with this object.
Once the image is returned to the browser, the browser performs an asynchronous 'ping' to the web server to check if the IIS session is still alive or it has expired (because the out-of-band image capture process forbids control of the browser while it is open).
The ping to the server returns successfully (and running Fiddler I can see that the response has status 200), with the expected response data:
<sessionstate>ok</sessionstate>
There is a defined 'onreadystatechange' function which should be fired on this response, and the majority of the time it fires correctly. However, on the rare occasions the problem does appear, it then continues to happen every time.
Here is a snippet of the code... we expect the 'callback()' function to be called on a successful response to Timeout.asp:
XMLPoster.prototype.checkSessionAliveAsync = function(callback) {
  var checkSessionAlive = new XMLHttpRequest();
  checkSessionAlive.open("POST", "Timeout.asp?Action=ping", true);
  checkSessionAlive.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  checkSessionAlive.onreadystatechange = function() {
    if (checkSessionAlive.readyState == 4) {
      if (checkSessionAlive.responseText.indexOf("expired") != -1 || checkSessionAlive.status !== 200) {
        eTop.window.main.location = "timeout.asp";
        return;
      }
      callback(checkSessionAlive.responseText);
    }
  }
  checkSessionAlive.send();
}
Has anyone seen anything like this before? I appreciate that using legacy software is not ideal, but we are currently limited to using it.

PouchDB detect documents that aren't synced

I am trying to sync a local PouchDB instance to a remote CouchDB. Things work great, but I am not sure how to deal with the following situation:
I have added a validation rule in CouchDB to prevent updating (it will deny all updates). When I run the sync function on my local PouchDB instance after modifying a document, the "denied" event fires as I would expect. However, if I run sync a second time, the "denied" event doesn't fire again, even though the local document differs from the CouchDB version.
How can I check if the local database matches the remote database? If I miss the "denied" event the first time (lets say the user closes the browser), how can I detect on the next run that the databases are not in sync? How can I force PouchDB to try and sync the modified document again so that I can see the denied event?
Thanks!
syncPouch: function(){
  var opts = {};
  var sync = PouchDB.sync('orders', db.remoteDB, opts);
  sync.on('change', function (info) {});
  sync.on('paused', function(){
  });
  sync.on('active', function () {});
  sync.on('denied', function(err){
    // This only fires once, no matter how many times I call syncPouch
    console.log("Denied!!!!!!!!!!!!");
    debugger;
  });
  sync.on('complete', function (info) {
    // This fires every time
    console.log("complete");console.log(info);
  });
  sync.on('error', function(err){
    debugger;
  });
  return sync;
},
What I have noticed with validate_doc_update functions is that PouchDb appears to treat any "denied" document as sync-ed. So even if you then remove the validate_doc_update function, the document will not sync into the remote database on future attempts even though it is not the same.
So you can be left with an "out of sync" situation that can only be fixed by editing one of the documents again.
Perhaps you are seeing the same thing? Perhaps the "denied" event does not fire because there is no attempt by PouchDb to sync the document (as it has already attempted to sync it previously)?
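As a rough sketch of how you might detect and recover from that state yourself (the docId, the localDB/remoteDB handles and the helper names are assumptions for illustration, not part of the question's code):
// Compare the local and remote revisions of one document.
function isOutOfSync(localDB, remoteDB, docId) {
  return Promise.all([localDB.get(docId), remoteDB.get(docId)])
    .then(function (docs) {
      return docs[0]._rev !== docs[1]._rev; // different revisions => not in sync
    });
}
// "Edit the document again": writing it back creates a new revision,
// so the next sync will attempt to push it and 'denied' can fire again.
function forceRetry(localDB, docId) {
  return localDB.get(docId).then(function (doc) {
    return localDB.put(doc);
  });
}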

How can I load a shared web worker with a user-script?

I want to load a shared worker with a user-script. The problem is the user-script is free, and has no business model for hosting a file - nor would I want to use a server, even a free one, to host one tiny file. Regardless, I tried it and I (of course) get a same origin policy error:
Uncaught SecurityError: Failed to construct 'SharedWorker': Script at
'https://cdn.rawgit.com/viziionary/Nacho-Bot/master/webworker.js'
cannot be accessed from origin 'http://stackoverflow.com'.
There's another way to load a web worker, by converting the worker function to a string and then into a Blob and loading that as the worker, but I tried that too:
var sharedWorkers = {};
var startSharedWorker = function(workerFunc){
  var funcString = workerFunc.toString();
  var index = funcString.indexOf('{');
  var funcStringClean = funcString.substring(index + 1, funcString.length - 1);
  var blob = new Blob([funcStringClean], { type: "text/javascript" });
  sharedWorkers.google = new SharedWorker(window.URL.createObjectURL(blob));
  sharedWorkers.google.port.start();
};
And that doesn't work either. Why? Because shared workers are shared based on the location their worker file is loaded from. Since createObjectURL generates a unique file name for each use, the workers will never have the same URL and will therefore never be shared.
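To see why, note that every call to createObjectURL mints a distinct URL, even for the same Blob, and shared workers are keyed by the URL they were loaded from. A quick illustration:
var blob = new Blob(["onconnect = function () {};"], { type: "text/javascript" });
console.log(URL.createObjectURL(blob)); // e.g. blob:https://example.com/1c2e33f0-...
console.log(URL.createObjectURL(blob)); // a different URL, so a different worker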
How can I solve this problem?
Note: I tried asking about specific solutions, but at this point I think
the best I can do is ask in a more broad manner for any
solution to the problem, since all of my attempted solutions seem
fundamentally impossible due to same origin policies or the way
URL.createObjectURL works (from the specs, it seems impossible to
alter the resulting file URL).
That being said, if my question can somehow be improved or clarified, please leave a comment.
You can use fetch() and response.blob() to create a Blob URL of type application/javascript from the returned Blob; set the SharedWorker() parameter to the Blob URL created by URL.createObjectURL(); then use window.open() and the load event of the newly opened window to define the same SharedWorker previously defined at the original window, attaching a message event handler to the original SharedWorker at the newly opened window.
The javascript below was tried at the console at How to clear the contents of an iFrame from another iFrame, where the current Question URL is loaded in a new tab and the message posted from the opening window through worker.port.postMessage() is logged at the console of the new tab.
The opening window should likewise log a message event when one is posted from the newly opened window using worker.port.postMessage(/* message */).
window.worker = void 0, window.so = void 0;
fetch("https://cdn.rawgit.com/viziionary/Nacho-Bot/master/webworker.js")
  .then(response => response.blob())
  .then(script => {
    console.log(script);
    var url = URL.createObjectURL(script);
    window.worker = new SharedWorker(url);
    console.log(worker);
    worker.port.addEventListener("message", (e) => console.log(e.data));
    worker.port.start();
    window.so = window.open("https://stackoverflow.com/questions/"
                            + "38810002/"
                            + "how-can-i-load-a-shared-web-worker-"
                            + "with-a-user-script", "_blank");
    so.addEventListener("load", () => {
      so.worker = worker;
      so.console.log(so.worker);
      so.worker.port.addEventListener("message", (e) => so.console.log(e.data));
      so.worker.port.start();
      so.worker.port.postMessage("hi from " + so.location.href);
    });
    so.addEventListener("load", () => {
      worker.port.postMessage("hello from " + location.href)
    })
  });
At the console in either tab you can then post messages, e.g. worker.port.postMessage("hello, again") at How to clear the contents of an iFrame from another iFrame, or worker.port.postMessage("hi, again") at the new window of the current URL How can I load a shared web worker with a user-script?. With message events attached at each window, communication between the two windows can be achieved using the original SharedWorker created at the initial URL.
Precondition
As you've researched and as it has been mentioned in comments,
SharedWorker's URL is subject to the Same Origin Policy.
According to this question there's no CORS support for Worker's URL.
According to this issue GM_worker support is now a WONT_FIX, and
seems close enough to impossible to implement due to changes in Firefox.
There's also a note that sandboxed Worker (as opposed to
unsafeWindow.Worker) doesn't work either.
Design
What I suppose you want to achieve is an @include * userscript that will collect some statistics or create some global UI that will appear everywhere. And thus you want to have a worker to maintain some state or statistic aggregates at runtime (which will be easy to access from every instance of the user-script), and/or you want to do some computation-heavy routine (because otherwise it would slow target sites down).
In the way of any solution
The solution I want to propose is to replace SharedWorker design with an alternative.
If you want just to maintain a state in the shared worker, use Greasemonkey storage (GM_setValue and friends). It's shared among all userscript instances (SQLite behind the scenes); a sketch follows at the end of this list.
If you want to do some computation-heavy task, do it in unsafeWindow.Worker and put the result back in Greasemonkey storage.
If you want to do some background computation and it must be run only by a single instance, there are a number of "inter-window" synchronisation libraries (mostly they use localStorage, but Greasemonkey's storage has the same API, so it shouldn't be hard to write an adapter for it). Thus you can acquire a lock in one userscript instance and run your routines in it. For example, IWC or ByTheWay (likely used here on Stack Exchange; post about it).
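Here is a hedged sketch of the first option, shared state kept in Greasemonkey storage instead of a SharedWorker (the value name "pageViews" and the helper are illustrative only; GM_getValue/GM_setValue must be granted in the metadata block):
// Keep a counter shared by every instance of the userscript, across tabs.
function bumpSharedCounter() {
  var count = GM_getValue("pageViews", 0) + 1; // second argument is the default
  GM_setValue("pageViews", count);
  return count;
}
console.log("This userscript has run " + bumpSharedCounter() + " times across all tabs.");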
Other way
I'm not sure, but there may be some ingenious response spoofing, made from a ServiceWorker, to make SharedWorker work as you would like. A starting point is in this answer's edit.
I am pretty sure you want a different answer, but sadly this is what it boils down to.
Browsers implement same-origin-policies to protect internet users, and although your intentions are clean, no legit browser allows you to change the origin of a sharedWorker.
All browsing contexts in a sharedWorker must share the exact same origin: host, protocol, and port.
You cannot hack around this issue; I've tried using iframes in addition to your methods, but none will work.
Maybe you can put your javascript file on GitHub and use their raw service to get the file; this way you can have it running without much effort.
Update
I was reading chrome updates and I remembered you asking about this.
Cross-origin service workers arrived on chrome!
To do this, add the following to the install event for the SW:
self.addEventListener('install', event => {
  event.registerForeignFetch({
    scopes: [self.registration.scope], // or some sub-scope
    origins: ['*'] // or ['https://example.com']
  });
});
Some other considerations are needed as well, check it out:
Full link: https://developers.google.com/web/updates/2016/09/foreign-fetch?hl=en?utm_campaign=devshow_series_crossoriginserviceworkers_092316&utm_source=gdev&utm_medium=yt-desc
Yes you can! (here's how):
I don't know if it's because something has changed in the four years since this question was asked, but it is entirely possible to do exactly what the question is asking for. It's not even particularly difficult. The trick is to initialize the shared worker from a data-url that contains its code directly, rather than from a createObjectURL(blob).
This is probably most easily demonstrated by example, so here's a little userscript for stackoverflow.com that uses a shared worker to assign each stackoverflow window a unique ID number, displayed in the tab title. Note that the shared-worker code is directly included as a template string (i.e. between backtick quotes):
// ==UserScript==
// @name stackoverflow userscript shared worker example
// @namespace stackoverflow test code
// @version 1.0
// @description Demonstrate the use of shared workers created in userscript
// @icon https://stackoverflow.com/favicon.ico
// @include http*://stackoverflow.com/*
// @run-at document-start
// ==/UserScript==
(function() {
  "use strict";
  var port = (new SharedWorker('data:text/javascript;base64,' + btoa(
  // =======================================================================================================================
  // ================================================= shared worker code: =================================================
  // =======================================================================================================================
  // This very simple shared worker merely provides each window with a unique ID number, to be displayed in the title
  `
  var lastID = 0;
  onconnect = function(e)
  {
    var port = e.source;
    port.onmessage = handleMessage;
    port.postMessage(["setID",++lastID]);
  }
  function handleMessage(e) { console.log("Message Recieved by shared worker: ",e.data); }
  `
  // =======================================================================================================================
  // =======================================================================================================================
  ))).port;
  port.onmessage = function(e)
  {
    var data = e.data, msg = data[0];
    switch (msg)
    {
      case "setID": document.title = "#"+data[1]+": "+document.title; break;
    }
  }
})();
I can confirm that this is working on FireFox v79 + Tampermonkey v4.11.6117.
There are a few minor caveats:
Firstly, it might be that the page your userscript is targeting is served with a Content-Security-Policy header that explicitly restricts the sources for scripts or worker scripts (script-src or worker-src policies). In that case, the data-url with your script's content will probably be blocked, and OTOH I can't think of a way around that, unless some future GM_ function gets added to allow a userscript to override a page's CSP or change its HTTP headers, or unless the user runs their browser with an extension or browser settings to disable CSP (see e.g. Disable same origin policy in Chrome).
Secondly, userscripts can be defined to run on multiple domains, e.g. you might run the same userscript on https://amazon.com and https://amazon.co.uk. But even when created by this single userscript, shared workers obey the same-origin policy, so there should be a different instance of the shared worker that gets created for all the .com windows vs for all the .co.uk windows. Be aware of this!
Finally, some browsers may impose a size limit on how long data-urls can be, restricting the maximum length of code for the shared worker. Even if not restricted, the conversion of all the code for long, complicated shared worker to base64 and back on every window load is quite inefficient. As is the indexing of shared workers by extremely long URLs (since you connect to an existing shared worker based on matching its exact URL). So what you can do is (a) start with an initially very minimal shared worker, then use eval() to add the real (potentially much longer) code to it, in response to something like an "InitWorkerRequired" message passed to the first window that opens the worker, and (b) For added efficiency, pre-calculate the base-64 string containing the initial minimal shared-worker bootstrap code.
Here's a modified version of the above example with these two wrinkles added in (also tested and confirmed to work), that runs on both stackoverflow.com and en.wikipedia.org (just so you can verify that the different domains do indeed use separate shared worker instances):
// ==UserScript==
// @name stackoverflow & wikipedia userscript shared worker example
// @namespace stackoverflow test code
// @version 2.0
// @description Demonstrate the use of shared workers created in userscript, with code injection after creation
// @icon https://stackoverflow.com/favicon.ico
// @include http*://stackoverflow.com/*
// @include http*://en.wikipedia.org/*
// @run-at document-end
// ==/UserScript==
(function() {
  "use strict";
  // Minimal bootstrap code used to first create a shared worker (commented out because we actually use a pre-encoded base64 string created from a minified version of this code):
  /*
  // ==================================================================================================================================
  {
    let x = [];
    onconnect = function(e)
    {
      var p = e.source;
      x.push(e);
      p.postMessage(["InitWorkerRequired"]);
      p.onmessage = function(e) // Expects only 1 kind of message: the init code. So we don't actually check for any other sort of message, and page script therefore mustn't send any other sort of message until init has been confirmed.
      {
        (0,eval)(e.data[1]); // (0,eval) is an indirect call to eval(), which therefore executes in global scope (rather than the scope of this function). See http://perfectionkills.com/global-eval-what-are-the-options/ or https://stackoverflow.com/questions/19357978/indirect-eval-call-in-strict-mode
        while(e = x.shift()) onconnect(e); // This calls the NEW onconnect function, that the eval() above just (re-)defined. Note that unless windows are opened in very quick succession, x should only have one entry.
      }
    }
  }
  // ==================================================================================================================================
  */
  // Actual code that we want the shared worker to execute. Can be as long as we like!
  // Note that it must replace the onconnect handler defined by the minimal bootstrap worker code.
  var workerCode =
  // ==================================================================================================================================
  `
  "use strict"; // NOTE: because this code is evaluated by eval(), the presence of "use strict"; here will cause it to be evaluated in it's own scope just below the global scope, instead of in the global scope directly. Practically this shouldn't matter, though: it's rather like enclosing the whole code in (function(){...})();
  var lastID = 0;
  onconnect = function(e) // MUST set onconnect here; bootstrap method relies on this!
  {
    var port = e.source;
    port.onmessage = handleMessage;
    port.postMessage(["WorkerConnected",++lastID]); // As well as providing a page with it's ID, the "WorkerConnected" message indicates to a page that the worker has been initialized, so it may be posted messages other than "InitializeWorkerCode"
  }
  function handleMessage(e)
  {
    var data = e.data;
    if (data[0]==="InitializeWorkerCode") return; // If two (or more) windows are opened very quickly, "InitWorkerRequired" may get posted to BOTH, and the second response will then arrive at an already-initialized worker, so must check for and ignore it here.
    // ...
    console.log("Message Received by shared worker: ",e.data); // For this simple example worker, there's actually nothing to do here
  }
  `;
  // ==================================================================================================================================
  // Use a base64 string encoding minified version of the minimal bootstrap code in the comments above, i.e.
  // btoa('{let x=[];onconnect=function(e){var p=e.source;x.push(e);p.postMessage(["InitWorkerRequired"]);p.onmessage=function(e){(0,eval)(e.data[1]);while(e=x.shift()) onconnect(e);}}}');
  // NOTE: If there's any chance the page might be using more than one shared worker based on this "bootstrap" method, insert a comment with some identification or name for the worker into the minified, base64 code, so that different shared workers get unique data-URLs (and hence don't incorrectly share worker instances).
  var port = (new SharedWorker('data:text/javascript;base64,e2xldCB4PVtdO29uY29ubmVjdD1mdW5jdGlvbihlKXt2YXIgcD1lLnNvdXJjZTt4LnB1c2goZSk7cC5wb3N0TWVzc2FnZShbIkluaXRXb3JrZXJSZXF1aXJlZCJdKTtwLm9ubWVzc2FnZT1mdW5jdGlvbihlKXsoMCxldmFsKShlLmRhdGFbMV0pO3doaWxlKGU9eC5zaGlmdCgpKSBvbmNvbm5lY3QoZSk7fX19')).port;
  port.onmessage = function(e)
  {
    var data = e.data, msg = data[0];
    switch (msg)
    {
      case "WorkerConnected": document.title = "#"+data[1]+": "+document.title; break;
      case "InitWorkerRequired": port.postMessage(["InitializeWorkerCode",workerCode]); break;
    }
  }
})();

HTML5 Webworker Startup Synchronization Guarantees

I have a bit of javascript I want to run in a webworker, and I am having a hard time understanding the correct approach to getting them to work in lock-step. I invoke the WebWorker from the main script as in the following simplified script:
// main.js
werker = new Worker("webWorkerScaffold.js");
// #1
werker.onmessage = function(msgObj){
  console.log("Worker Reply")
  console.log(msgObj);
  doSomethingWithMsg(msgObj);
};
werker.onerror = function(err){
  console.log("Worker Error:");
  console.log(err);
};
werker.postMessage("begin");
Then the complementary worker script looks like the following:
// webWorkerScaffold.js
var doWorkerStuffs = function(msg){}; // Omitted
// #2
onmessage = function (msgObj){
  // Messages in will always be json
  if (msgObj.data.msg === "begin")
    doWorkerStuffs();
};
This code (the actual version) works as expected, but I am having a difficult time confirming it will always perform correctly. Consider the following:
The "new Worker()" call is made, spawning a new thread.
The spawned thread is slow to load (let's say it hangs at "// #2")
The parent thread does "werker.postMessage..." with no recipient
... ?
The same applies in the reverse direction, where I might change the worker script to make noise outward once it is set up internally; under that scenario the main thread could hang at "// #1" and miss the incoming message as it doesn't have its comms up.
Is there some way to guarantee that these scripts move forward in a lock-step way?
What I am really looking for is a zmq-like REP/REQ semantic, where one or the other blocks (or calls back) when 1:1 transactions can take place.
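One common pattern that approximates this (a sketch, not a guarantee from any spec; the "ready"/"begin" strings are just illustrative) is a ready-handshake: the worker announces that its handler is bound, and the main script only posts work after seeing that announcement:
// main.js
var werker = new Worker("webWorkerScaffold.js");
werker.onmessage = function (msgObj) {
  if (msgObj.data === "ready") {
    werker.postMessage("begin"); // only sent once the worker is listening
    return;
  }
  doSomethingWithMsg(msgObj);
};

// webWorkerScaffold.js
onmessage = function (msgObj) {
  if (msgObj.data === "begin") doWorkerStuffs();
};
postMessage("ready"); // signal the parent that onmessage is bound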

How synchronous AJAX call could cause memory leak?

I understand the general advice given against the use of synchronous ajax calls, because synchronous calls block UI rendering.
The other reason generally given is memory leak issues with synchronous AJAX.
From the MDN docs -
Note: You shouldn't use synchronous XMLHttpRequests because, due to
the inherently asynchronous nature of networking, there are various
ways memory and events can leak when using synchronous requests. The
only exception is that synchronous requests work well inside Workers.
How could synchronous calls cause memory leaks?
I am looking for a practical example.
Any pointers to any literature on this topic would be great.
If XHR is implemented correctly per spec, then it will not leak:
An XMLHttpRequest object must not be garbage collected if its state is
OPENED and the send() flag is set, its state is HEADERS_RECEIVED, or
its state is LOADING, and one of the following is true:
It has one or more event listeners registered whose type is
readystatechange, progress, abort, error, load, timeout, or loadend.
The upload complete flag is unset and the associated
XMLHttpRequestUpload object has one or more event listeners registered
whose type is progress, abort, error, load, timeout, or loadend.
If an XMLHttpRequest object is garbage collected while its connection
is still open, the user agent must cancel any instance of the fetch
algorithm opened by this object, discarding any tasks queued for them,
and discarding any further data received from the network for them.
So after you hit .send() the XHR object (and anything it references) becomes immune to GC. However, any error or success will put the XHR into DONE state and it becomes subject to GC again. It wouldn't matter at all if the XHR object is sync or async. In case of a long sync request again it doesn't matter because you would just be stuck on the send statement until the server responds.
However, according to this slide it was not implemented correctly at least in Chrome/Chromium in 2012. Per spec, there would be no need to call .abort() since the DONE state means that the XHR object should already be normally GCd.
I cannot find even the slightest evidence to back up the MDN statement, and I have contacted the author through twitter.
I think that memory leaks are happening mainly because the garbage collector can't do its job. I.e. you have a reference to something and the GC can not delete it. I wrote a simple example:
var getDataSync = function(url) {
  console.log("getDataSync");
  var request = new XMLHttpRequest();
  request.open('GET', url, false); // `false` makes the request synchronous
  try {
    request.send(null);
    if(request.status === 200) {
      return request.responseText;
    } else {
      return "";
    }
  } catch(e) {
    console.log("!ERROR");
  }
}
var getDataAsync = function(url, callback) {
  console.log("getDataAsync");
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onload = function (e) {
    if (xhr.readyState === 4) {
      if (xhr.status === 200) {
        callback(xhr.responseText);
      } else {
        callback("");
      }
    }
  };
  xhr.onerror = function (e) {
    callback("");
  };
  xhr.send(null);
}
var requestsMade = 0
var requests = 1;
var url = "http://missing-url";
for(var i=0; i<requests; i++, requestsMade++) {
  getDataSync(url);
  // getDataAsync(url);
}
Apart from the fact that the synchronous function blocks a lot of stuff, there is another big difference: error handling. If you use getDataSync, remove the try-catch block and refresh the page, you will see that an error is thrown. That's because the url doesn't exist, but the question now is how the garbage collector works when an error is thrown. Does it clear all the objects connected with the error, does it keep the error object, or something like that? I'll be glad if someone who knows more about that writes it here.
If the synchronous call is interrupted (i.e. by a user event re-using the XMLHttpRequest object) before it completes, then the outstanding network query can be left hanging, unable to be garbage collected.
This is because, if the object that initiated the request does not exist when the request returns, the return cannot complete, but (if the browser is imperfect) remains in memory. You can easily cause this using setTimeout to delete the request object after the request has been made but before it returns.
I remember I had a big problem with this in IE, back around 2009, but I would hope that modern browsers are not susceptible to it. Certainly, modern libraries (i.e. JQuery) prevent the situations in which it might occur, allowing requests to be made without having to think about it.
Sync XHR blocks thread execution, and blocks all objects in the function execution stack of this thread from GC.
E.g.:
function example(b) {
  var a = /* big data */;
  // ... work with a and b ...
  // synchronous XHR here
}
Variables a and b are blocked here (and the whole stack too).
So, if GC starts running while the sync XHR has the stack blocked, all stack variables will be marked as "survived GC" and be moved from the early heap to the more persistent one. And a ton of objects that should not survive even a single GC will live through many garbage collections, and even references from these objects will survive GC.
About the claims that the stack blocks GC, and that such objects are marked as long-lived objects: see the section Conservative Garbage Collection in
Clawing Our Way Back To Precision.
Also, "marked" objects are GCed after the usual heap is GCed, and usually only if there is still a need to free more memory (as collecting marked-and-swept objects takes more time).
UPDATE:
Is it really a leak, and not just an inefficient use of the early heap?
There are several things to consider.
How long will these objects be locked after the request is finished?
Sync XHR can block the stack for an unlimited amount of time, XHR has no timeout property (in all non-IE browsers), and network problems are not rare.
How many UI elements are locked? If it blocks 20 MB of memory for just 1 second, that is roughly equivalent to a 200 KB leak over 2 minutes. Consider many background tabs.
Consider the case when a single sync XHR blocks a ton of resources and the browser
goes to the swap file.
When another event tries to alter the DOM it may be blocked by the sync XHR; another thread is blocked (and its whole stack too).
If the user repeats the actions that lead to the sync XHR, the whole browser window will be locked. Browsers use at most 2 threads to handle window events.
Even without blocking, this consumes lots of OS and browser internal resources: threads, critical section resources, UI resources, DOM... Imagine that (due to memory problems) you can open only 10 tabs with sites that use sync XHR, but 100 tabs with sites that use async XHR. Is that not a memory leak?
Memory leaks using synchronous AJAX requests are often caused by:
using setInterval/setTimeout causing circular calls;
XmlHttpRequest - when the reference is removed, so the xhr becomes inaccessible.
A memory leak happens when the browser for some reason doesn't release memory from objects which are not needed any more.
This may happen because of browser bugs, browser extension problems and, much more rarely, our mistakes in the code architecture.
Here's an example of a memory leak being caused when running setInterval in a new context:
var
  Context = process.binding('evals').Context,
  Script = process.binding('evals').Script,
  total = 5000,
  result = null;

process.nextTick(function memory() {
  var mem = process.memoryUsage();
  console.log('rss:', Math.round(((mem.rss/1024)/1024)) + "MB");
  setTimeout(memory, 100);
});

console.log("STARTING");

process.nextTick(function run() {
  var context = new Context();
  context.setInterval = setInterval;
  Script.runInContext('setInterval(function() {}, 0);',
    context, 'test.js');
  total--;
  if (total) {
    process.nextTick(run);
  } else {
    console.log("COMPLETE");
  }
});
