I have a web worker running a time-consuming routine task with ajax-requests. Can I terminate them from a main thread not waiting for them to finish?
That's how I spawn and terminate it:
$("button.parse-categories").click(function() {
if (parseCategoriesActive==false) {
parseCategoriesActive = true;
parseCategoriesWorker = new Worker("parseCategories.js");
$("button.parse-categories-cancel").click(function() {
parseCategoriesWorker.terminate();
parseCategoriesActive = false;
});
}
});
This is the worker code:
function myAjax(url, async, callback) {
xmlhttp = new XMLHttpRequest();
xmlhttp.onreadystatechange=function() {
if (xmlhttp.readyState==4 && xmlhttp.status==200)
callback(xmlhttp.responseText);
if (xmlhttp.readyState==4 && xmlhttp.status!=200) {
self.postMessage("error");
throw "error in ajax: "+xmlhttp.status;
}
}
xmlhttp.open("GET", url, async);
xmlhttp.send();
}
var parseCategoriesActive = true;
var counter = 0;
do {
myAjax('parser.php', false, function(resp) {
if (resp=='success')
parseCategoriesActive = false;
else {
counter += Number(resp);
self.postMessage(counter);
}
});
} while (parseCategoriesActive==true);
You can kill any webworker using terminate().
Citing from MDN:
The Worker.terminate() method immediately terminates the Worker. This does not offer the worker an opportunity to finish its operations; it is simply stopped at once.
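As a rough sketch of how the start and cancel buttons could be wired so that the cancel handler is registered only once: `createWorkerController` and the injected `createWorker` factory are hypothetical names, not part of the question's code. The browser wiring at the bottom assumes the button selectors and the `parseCategories.js` file name from the question.

```javascript
// Hypothetical helper: owns at most one worker and can kill it at any time.
function createWorkerController(createWorker, onMessage) {
  let worker = null;

  return {
    isActive() {
      return worker !== null;
    },
    start() {
      if (worker) return false; // already running, ignore the second click
      worker = createWorker();
      worker.onmessage = (event) => onMessage(event.data);
      return true;
    },
    cancel() {
      if (!worker) return false; // nothing to stop
      worker.terminate(); // stops the worker without letting it finish
      worker = null;
      return true;
    },
  };
}

// Browser wiring (assumes the markup and file name from the question).
if (typeof document !== "undefined" && typeof Worker !== "undefined") {
  const controller = createWorkerController(
    () => new Worker("parseCategories.js"),
    (data) => console.log("progress:", data)
  );
  document.querySelector("button.parse-categories")
    .addEventListener("click", () => controller.start());
  document.querySelector("button.parse-categories-cancel")
    .addEventListener("click", () => controller.cancel());
}
```

Because the factory is passed in, the same controller can be exercised with a stub object that only has a `terminate()` method.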
I wrote this simple script, and the problem still seems to be present in Firefox and Chrome:
var myWorker = new Worker("w.js");
myWorker.onmessage = function(e) {
console.log('Message received from worker:'+e.data);
}
myWorker.postMessage(100);
setTimeout(stopWW, 500) ;
function stopWW(){
console.log('TERMINATE');
myWorker.terminate();
}
The web worker (w.js) is:
onmessage = function(e) {
var n = e.data;
console.log('Message received from main script:'+n);
for (var i=0;i<n;i++){
console.log('i='+i);
postMessage(i);
}
}
As soon as the main thread terminates the web worker, it stops receiving the worker's messages, BUT the worker keeps running behind the scenes. This is the output:
"Message received from main script:100" w.js:3:2
"i=0" w.js:5:1
"Message received from worker:0" w.html:6:2
"i=1" w.js:5:1
...
"Message received from worker:21" w.html:6:2
"i=22" w.js:5:1
"TERMINATE" w.html:13:1
"i=23" w.js:5:1
"i=24" w.js:5:1
...
"i=99" w.js:5:1
The OP clearly asked:
Can I terminate them from a main thread not waiting for them to
finish?
He is not asking how a worker should gracefully exit internally. He wants to very ungracefully kill any worker from an external thread.
You can kill any web worker by calling its terminate() method, provided the worker instance is in scope.
Web workers also have a life-cycle. They will self-terminate under specific conditions.
You can kill a dedicated worker by closing the page/frame that created it by calling window.close() from the main thread. That is what dedicated means: it is only allowed to serve a single page. Closing the page invokes the dedicated worker's self-termination sequence.
You can kill a shared worker by closing all the pages of the domain that created it. That is what shared means: it is allowed to serve multiple pages from a domain. Closing all domain pages invokes the shared worker's self-termination sequence.
Other worker types can also be killed by closing all domain pages.
Here is an example of terminating some unwelcome worker bots that steal your CPU:
// Create worker bots that consume CPU threads
var Blob = window.Blob;
var URL = window.webkitURL || window.URL;
var numbots = 4;
var bots = null;
var bot = "while(true){}"; // Nasty little botses!
bot = new Blob([bot], { type: 'text/javascript' });
bot = URL.createObjectURL(bot);
function terminator(){
if (bots!=null){
// Hasta la vista ...
for (var worker in bots) // declare the loop variable instead of leaking a global
bots[worker].terminate();
bots = null;
document.getElementById("terminator").play();
document.getElementById("button").src = "https://dcerisano.github.io/assets/img/launch.jpg"
}
else{
// Launch Intruders ...
bots = [];
for (var i=0; i<numbots; i++)
bots[i] = new Worker(bot);
document.getElementById("alert").play();
document.getElementById("button").src = "https://dcerisano.github.io/assets/img/terminate.jpg"
}
}
<img id="button" src="https://dcerisano.github.io/assets/img/launch.jpg" width="100%" onclick="terminator()">
<audio id="terminator" src="https://dcerisano.github.io/assets/audio/hastalavista.mp3">
<audio id="alert" src="https://dcerisano.github.io/assets/audio/alert.mp3">
Use your CPU monitor to confirm snippet is running.
Press button to terminate worker bots.
"Hide results" will close the snippet page/frame, also terminating
worker bots.
Hopefully this answer will be useful to those wishing to build apps that hunt and kill unwelcome bots implemented as web workers (eg. bitmining bots that steal your electricity, bandwidth, and make you an accessory to Ponzi)
For example, this Chrome extension detects and selectively blocks web workers (all sorts of bots, including miners).
I have a Worker that shares a SharedArrayBuffer with the main thread. To work correctly, I have to make sure that the worker has access to the SAB before the main thread accesses it again. (EDIT: The code creating the worker has to be in a separate function (EDIT2: which returns an array pointing to the SAB).) (Maybe this is already impossible; you'll tell me.)
The initial code looks like this:
function init() {
var code = `onmessage = function(event) {
console.log('starting');
var buffer=event.data;
var arr = new Uint32Array(buffer);// I need to have this done before accessing the buffer again from the main
//some other code, manipulating the array
}`
var buffer = new SharedArrayBuffer(BUFFER_ELEMENT_SIZE);
var blob = new Blob([code], { "type": 'application/javascript' });
var url = window.URL || window.webkitURL;
var blobUrl = url.createObjectURL(blob);
var counter = new Worker(blobUrl);
counter.postMessage(buffer);
let res = new Uint32Array(buffer);
return res;
}
function test (){
let array = init();
console.log('main');
//accessing the SAB again
};
The worker code is always executed after test(): the console always shows main, then starting.
Using timeouts does not help. Consider the following code for test:
function test (){
let array = [];
console.log('main');
setTimeout(function(){
array = init();
},0);
setTimeout(function(){
console.log('main');
//accessing the SAB again
},0);
console.log('end');
};
The console shows end first, followed by main, followed by starting.
However, assigning the buffer to a global array outside the test() function does the job, even without timeouts.
My questions are the following:
Why does the worker not start directly after the message was sent (= received?)? AFAIK, workers have their own event queue, so they should not rely on the main stack becoming empty.
Is there a specification detailing when a worker starts working after sending a message?
Is there a way to make sure the worker has started before accessing the SAB again without using global variables? (One could use busy waiting, but I am wary of it...) There is probably no way, but I want to be sure.
Edit
To be more precise:
In a completely parallel scenario, the Worker would be able to handle the message immediately after it was posted. This is obviously not the case.
Most browser APIs (and Worker is such an API) use a callback queue to handle calls to the API. But if this applied, the message would be posted/handled before the timeout callbacks were executed.
To go even further: if I try busy waiting after postMessage, by reading from the SAB until one value changes, the program blocks forever. To me, this means that the browser does not post the message until the call stack is empty. As far as I know, this behaviour is not documented, and I cannot explain it.
To summarize: I want to know how the browser determines when to post the message, and when the worker handles it, if the call to postMessage is inside a function. I already found a workaround (global variables), so I'm more interested in how it works behind the scenes. But if someone can show me a working example, I'll take it.
EDIT 2:
The code using the global variable (the code that works fine) looks like this
function init() {
//Unchanged
}
var array = init(); //global
function test (){
console.log('main');
//accessing the SAB again
};
It prints starting, then main to the console.
Also worth noting: if I debug the code in Firefox (Chrome not tested), I get the result I want without the global variable (starting before main). Can someone explain?
why does the worker does not start directly after the message was sen[t] (= received?). AFAIK, workers have their own event queue, so they should not rely on the main stack becoming empty?
First, even though your Worker object is available in the main thread synchronously, the actual worker thread has a lot to do before it can handle your message:
It has to perform a network request to retrieve the script content. Even with a blob URI, this is an async operation.
It has to initialize the whole JS context, so even if the network request were lightning fast, this would add to the parallel execution time.
It has to wait for the event loop frame following the main script execution to handle your message. Even if the initialization were lightning fast, it would still have to wait some time.
So in normal circumstances, there is very little chance that your Worker could execute your code by the time you need the data.
Now you talked about blocking the main thread.
If I try busy waiting after postMessage by reading from the SAB until it changes one value will block the program infinitely
During the initialization of your Worker, the messages are temporarily kept on the main thread, in what is called the outside port. Only after the script has been fetched is this outside port entangled with the inside port, and only then do the messages actually pass to the parallel thread.
So if you block the main thread before the ports have been entangled, it won't be able to pass the message to the worker's thread.
Is there a specification detailing when a worker starts working after sending a message?
Sure. More specifically, the port message queue is enabled at step 26, and the event loop is actually started at step 29.
Is there a way to make sure the worker has started before accessing the SAB again without using global variables? [...]
Sure: make your Worker post a message back to the main thread once it has started and done its work.
// some precautions because all browsers still haven't reenabled SharedArrayBuffers
const has_shared_array_buffer = window.SharedArrayBuffer;
function init() {
// since our worker will do only a single operation
// we can Promisify it
// if we were to use it for more than a single task,
// we could promisify each task by using a MessagePort
return new Promise((resolve, reject) => {
const code = `
onmessage = function(event) {
console.log('hi');
var buffer= event.data;
var arr = new Uint32Array(buffer);
arr.fill(255);
if(self.SharedArrayBuffer) {
postMessage("done");
}
else {
postMessage(buffer, [buffer]);
}
}`
let buffer = has_shared_array_buffer ? new SharedArrayBuffer(16) : new ArrayBuffer(16);
const blob = new Blob([code], { "type": 'application/javascript' });
const blobUrl = URL.createObjectURL(blob);
const counter = new Worker(blobUrl);
counter.onmessage = e => {
if(!has_shared_array_buffer) {
buffer = e.data;
}
const res = new Uint32Array(buffer);
resolve(res);
};
counter.onerror = reject;
if(has_shared_array_buffer) {
counter.postMessage(buffer);
}
else {
counter.postMessage(buffer, [buffer]);
}
});
};
async function test (){
let array = await init();
//accessing the SAB again
console.log(array);
};
test().catch(console.error);
According to MDN:
Data passed between the main page and workers is copied, not shared. Objects are serialized as they're handed to the worker, and subsequently, de-serialized on the other end. The page and worker do not share the same instance, so the end result is that a duplicate is created on each end. Most browsers implement this feature as structured cloning.
Read more about transferring data to and from workers
Here's a basic code that shares a buffer with a worker. It creates an array with even values (i*2) and it sends it to the worker. It uses Atomic operations to change the buffer values.
To make sure the worker has started you can just use different messages.
var code = document.querySelector('[type="javascript/worker"]').textContent;
var blob = new Blob([code], { "type": 'application/javascript' });
var blobUrl = URL.createObjectURL(blob);
var counter = new Worker(blobUrl);
var sab;
var initBuffer = function (msg) {
sab = new SharedArrayBuffer(16);
counter.postMessage({
init: true,
msg: msg,
buffer: sab
});
};
var editArray = function () {
var res = new Int32Array(sab);
for (let i = 0; i < 4; i++) {
Atomics.store(res, i, i*2);
}
console.log('Array edited', res);
};
initBuffer('Init buffer and start worker');
counter.onmessage = function(event) {
console.log(event.data.msg);
if (event.data.edit) {
editArray();
// share new buffer with worker
counter.postMessage({buffer: sab});
// end worker
counter.postMessage({end: true});
}
};
<script type="javascript/worker">
var sab;
self['onmessage'] = function(event) {
if (event.data.init) {
postMessage({msg: event.data.msg, edit: true});
}
if (event.data.buffer) {
sab = event.data.buffer;
var sharedArray = new Int32Array(sab);
postMessage({msg: 'Shared Array: '+sharedArray});
}
if (event.data.end) {
postMessage({msg: 'Time to rest'});
}
};
</script>
In my code (a monitoring application) I need to periodically call the server with an XMLHttpRequest object in the form of chained calls. Each call takes exactly 15 seconds, which is timed by the server as it delivers several partial results within that period (HTTP 100 Continue). Immediately after finishing the current call, the onreadystatechange event handler of the current XMLHttpRequest object needs to create and launch the next request (with a new instance), so the communication with the server remains almost seamless.
The way it works, each call retains the object context of the caller in the stack, so as this is a page that must remain open for days, the stack keeps growing with no chance for the garbage collector to claim the data. See the following stack trace:
I cannot use timers (setInterval or such) to launch the next request. It should be launched from inside the ending of the previous one. The data from the server must arrive as quickly as possible, and unfortunately browsers nowadays throttle timers when a page is not in focus. As I said, this is a monitoring application meant to be always on in the users' secondary monitors (rarely in focus). I also need to deal with HTTP timeouts and other kinds of errors that deviate from the 15-second sequence. There should always be one and only one channel open with the server.
My question is whether there is any way to avoid keeping the whole context in the stack when creating an XMLHttpRequest object. Even calling the click() method on a DOM object will keep the stack/context alive. Even promises seem to keep the context.
I'm also unable to use websockets, as the server does not support them.
UPDATE:
It's more complex, but in essence it's like:
var xhttpObjUrl;
var xhttpObj;
function onLoad() {
loadXMLDoc(pollURL + "first=1", true);
}
function loadXMLDoc(url, longtout) {
xhttpObjUrl = url;
xhttpObj = new XMLHttpRequest();
xhttpObj.open(method, url, true);
xhttpObj.onprogress = progress;
xhttpObj.onloadend = progress;
xhttpObj.ontimeout = progress;
if (commlog) consolelog("loadXMLDoc(): url == " + dname);
xhttpObj.send("");
}
function progress() {
if (!xhttpObj) return;
var state = xhttpObj.readyState;
var status;
var statusText;
if (state == 4 /* complete */ || state == 3 /* partial content */) {
try {
status = xhttpObj.status;
statusText = xhttpObj.statusText;
if (status == 200) parseServerData();
} catch (err) {
status = 500;
statusText = err;
}
if (state == 4 || status != 200) {
/* SERVER TERMINATES THE CONNECTION AFTER 15 SECONDS */
/* ERROR HANDLING REMOVED */
var obj = xhttpObj;
xhttpObj = undefined;
abortRequest(obj);
obj = false;
RequestEnd();
}
}
}
function RequestEnd(error) {
var now = (new Date).getTime();
var msdiff = now - lastreqstart;
var code = function () { loadXMLDoc(pollURL + 'lastpoint=' + evtprev.toString() + '&lastevent=' + evtcurrent.toString()); return false; };
if (msdiff < 1000) addTimedCheck(1, code); /** IGNORE THIS **/
else code();
}
I've solved my problem using a web worker. The worker would end the XMLHttpRequest each time and send the page a message with the collected data. Then, when the page finishes processing the data, it would send the worker a message to start a new request. Thus my page wouldn't have any unwanted delays between requests, and there's no stack constantly building up. On error I'd terminate the worker and create a new one, just in case.
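The handshake described above might look roughly like this on the page side. This is a sketch, not the poster's actual code: `createPollingLoop`, the message shapes (`{ type: "data" }` and `{ type: "next" }`) and the injected `createWorker` factory are assumptions made for illustration. The worker script itself, which would perform one XMLHttpRequest per "next" message, is not shown.

```javascript
// Hypothetical page-side driver for a worker that runs one request per "next" message.
function createPollingLoop(createWorker, processData) {
  let worker = null;

  function spawn() {
    worker = createWorker();
    worker.onmessage = (event) => {
      if (event.data.type === "data") {
        processData(event.data.payload); // the page finishes its own work first...
        worker.postMessage({ type: "next" }); // ...then asks for the next request
      }
    };
    worker.onerror = restart; // on error, discard the worker and start over
    worker.postMessage({ type: "next" }); // kick off the first request
  }

  function restart() {
    if (worker) worker.terminate();
    spawn();
  }

  function stop() {
    if (worker) worker.terminate();
    worker = null;
  }

  spawn();
  return { restart, stop };
}
```

Each message handler runs as its own task, so the next request is not started from inside the previous request's callback, and nothing accumulates across iterations.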
I'm learning about HTML5 workers from here, and the author uses self.onmessage and self.postMessage to communicate between the main thread and the worker "because the worker cannot access the DOM." But in the code below it seems like self is referring to both the main thread and the worker.
function CalculatePi(loop)
{
var c = parseInt(loop);
var f = parseFloat(loop);
var n=1;
//these errors will need more work…
if (isNaN(c) || f != c ) {
throw("errInvalidNumber");
} else if (c<=0) {
throw("errNegativeNumber");
}
for (var i=0,Pi=0;i<=c;i++) {
Pi=Pi+(4/n)-(4/(n+2));
n=n+4;
}
self.postMessage({'PiValue': Pi});
}
//wait for the start 'CalculatePi' message
//e is the event and e.data contains the JSON object
self.onmessage = function(e) {
CalculatePi(e.data.value);
}
The above code is from a separate js file containing the worker, and I understand that the self in self.onmessage refers to the worker receiving a message from the main thread to start calculating, but why would it use self.postMessage to post a message back to itself? Do the default recipients of postMessage and onmessage include both the main thread and the worker?
Later on, the author displays the calculation of pi through this function:
worker.onmessage = function(e) {
document.getElementById("PiValue").innerHTML = e.data.PiValue;
};
How does this work when the worker isn't supposed to have access to the DOM? It clearly is using document.getElementById here.
In your worker.js file, think of self.postMessage as the instruction that the worker (self) should post a message. Since it can only communicate with the mainJS that created it, the message goes there. :)
Likewise, in your mainJS, worker.onmessage should be understood as the event "a message has come from the worker". That handler runs on the main thread, which is why it may touch the DOM.
So basically you have both options in both your scripts:
In mainJS: worker.postMessage("message"); to send a message to the worker, and worker.onmessage = function(event){...} to listen to messages from the worker.
In the worker script: (self or) this.onmessage = function(event){...} to listen to messages from the mainJS, and self.postMessage("message"); to send something back to mainJS.
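The two directions can be laid out side by side as below. This is only an illustrative sketch: `setupWorkerSide`, `setupMainSide` and `createLinkedPair` are made-up names, and `createLinkedPair` is an in-memory stand-in for the browser's message delivery so that the example runs without a real Worker. In a real page, delivery is asynchronous and `scope` would be `self` inside the worker file.

```javascript
// Worker side: `scope` is `self` inside a real worker script.
function setupWorkerSide(scope) {
  scope.onmessage = function (event) {
    // a message arrived FROM the main script
    const doubled = event.data.value * 2;
    scope.postMessage({ result: doubled }); // goes TO the main script
  };
}

// Main side: `worker` is the object returned by `new Worker(...)`.
function setupMainSide(worker, onResult) {
  worker.onmessage = function (event) {
    // a message arrived FROM the worker; DOM access would be fine here
    onResult(event.data.result);
  };
  worker.postMessage({ value: 21 }); // goes TO the worker
}

// In-memory stand-in for the browser's delivery (an assumption for this demo only).
function createLinkedPair() {
  const worker = { onmessage: null, postMessage(data) { scope.onmessage({ data }); } };
  const scope = { onmessage: null, postMessage(data) { worker.onmessage({ data }); } };
  return { worker, scope };
}

const pair = createLinkedPair();
setupWorkerSide(pair.scope);
const results = [];
setupMainSide(pair.worker, (value) => results.push(value));
console.log(results); // results now holds the single value 42
```

Note that each side only ever calls `postMessage` on its own handle and listens on its own `onmessage`; neither side posts to itself.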
I'm working with HTML5 socket functions to establish a socket connection to my server. HTML5 provides the handlers below for dealing with disconnection:
Socket.onclose = function()
{
...
}
Socket.onerror = function()
{
...
}
My problem is: how do I try to reconnect after the onclose function executes? I tried putting a while loop inside it, like
ws.onclose = function()
{
While(conn==0)
{
ws = new WebSocket("ws://example.com");
}
}
and
ws.onopen = function()
{
conn=1;
...
}
But it didn't work.
Any idea?
Here's the script that comes with the Plezi websocket framework... It's fairly basic, but it works on the browsers I used it on (Safari, Chrome and FireFox).
The trick is to leverage the onclose method WITHOUT a loop.
The onclose method will be called even if the websocket never opened and the connection couldn't be established (without calling onopen).
Initiating a reconnect within an onclose is enough.
Writing a loop or a conditional review will not only fail, but will halt all the scripts on the page. Allow me to explain:
JavaScript is single-threaded. Again: it's an event/task-based, single-threaded environment.
This means that your code acts like an atomic unit: nothing happens and nothing changes until your code has finished running its course.
Because connections can take a while to establish, new WebSocket was designed (and rightly so) to work asynchronously.
This is why you can define the onopen event callback AFTER creating the socket.
The new websocket connection will be attempted only once the current task/event is finished...
...so a loop will get you stuck forever waiting for a task that can't be performed until your code stops running...
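A small demonstration of that starvation, with a zero-delay `setTimeout` standing in for the socket's onopen event (an assumption made so the sketch runs without a server). The 50 ms deadline exists only so the demo itself terminates; the question's loop has no such escape.

```javascript
// A queued callback cannot run while synchronous code is still spinning.
let opened = false;
setTimeout(function () { opened = true; }, 0); // stands in for onopen

const deadline = Date.now() + 50; // safety limit so the demo itself terminates
while (!opened && Date.now() < deadline) {
  // busy-wait: the callback above never gets a chance to run in here
}

const openedDuringLoop = opened; // still false when the loop gives up
console.log("opened during loop:", openedDuringLoop); // prints "opened during loop: false"
```

The flag only flips after the current task has returned control to the event loop, which is exactly what a `while` loop inside onclose never does.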
Back to the issue at hand, here's the code. If you have any ideas for improvements, please let me know:
// Your websocket URI should be an absolute path. The following sets the base URI.
// remember to update to the specific controller's path to your websocket URI.
var ws_controller_path = window.location.pathname; // change to '/controller/path'
var ws_uri = (window.location.protocol.match(/https/) ? 'wss' : 'ws') + '://' + window.document.location.host + ws_controller_path
// websocket variable.
var websocket = NaN
// count failed attempts
var websocket_fail_count = 0
// to limit failed reconnection attempts, set this to a number.
var websocket_fail_limit = NaN
// to offer more or less space between reconnection attempts, set this interval in milliseconds.
var websocket_reconnect_interval = 250
function init_websocket()
{
if(websocket && websocket.readyState == 1) return true; // console.log('no need to renew socket connection');
websocket = new WebSocket(ws_uri);
websocket.onopen = function(e) {
// reset the count.
websocket_fail_count = 0
// what do you want to do now?
};
websocket.onclose = function(e) {
// If the websocket has failed repeatedly and reached the limit, give up.
if(!isNaN(websocket_fail_limit) && websocket_fail_count >= websocket_fail_limit) {
// What to do if we can't reconnect so many times?
return
};
// you probably want to reopen the websocket if it closes.
if(isNaN(websocket_fail_limit) || (websocket_fail_count <= websocket_fail_limit) ) {
// update the count
websocket_fail_count += 1;
// try to reconnect
setTimeout( init_websocket, websocket_reconnect_interval);
};
};
websocket.onerror = function(e) {
// update the count.
websocket_fail_count += 1
// what do you want to do now?
};
websocket.onmessage = function(e) {
// what do you want to do now?
console.log(e.data);
// to use JSON, use:
// var msg = JSON.parse(e.data); // remember to use JSON also in your Plezi controller.
};
}
// setup the websocket connection once the page is done loading
window.addEventListener("load", init_websocket, false);
I'm writing a Firefox extension that creates a socket server which will output the active tab's URL when a client makes a connection to it. I have the following code in my javascript file:
var serverSocket;
function startServer()
{
var listener =
{
onSocketAccepted : function(socket, transport)
{
try {
var outputString = gBrowser.currentURI.spec + "\n";
var stream = transport.openOutputStream(0,0,0);
stream.write(outputString,outputString.length);
stream.close();
} catch(ex2){ dump("::"+ex2); }
},
onStopListening : function(socket, status){}
};
try {
serverSocket = Components.classes["@mozilla.org/network/server-socket;1"]
.createInstance(Components.interfaces.nsIServerSocket);
serverSocket.init(7055,true,-1);
serverSocket.asyncListen(listener);
} catch(ex){ dump(ex); }
document.getElementById("status").value = "Started";
}
function stopServer ()
{
if (serverSocket)
serverSocket.close();
}
window.addEventListener("load", function() { startServer(); }, false);
window.addEventListener("unload", function() { stopServer(); }, false);
As it is, it works for multiple tabs in a single window. If I open multiple windows, it ignores the additional windows. I think it is creating a server socket for each window, but since they are using the same port, the additional sockets fail to initialize. I need it to create a server socket when the browser launches and continue running when I close the windows (Mac OS X). As it is, when I close a window but Firefox remains running, the socket closes and I have to restart Firefox to get it up and running. How do I go about that?
Firefox extension overlays bind to window objects. One way around this is to create an XPCOM component or find one that someone else already created to allow you to build functionality without binding it to the window objects.
Of course, section #2 below on Observer Notifications may be helpful as well.
Possible workaround: #1
Instead of calling "startServer()" each time a window is opened, you could keep a counter called windowCount that you increment each time you open a new window. If windowCount is already greater than 0 when a window opens, don't call startServer().
As windows close, you could decrement the count. Once it hits 0, stop the server.
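A minimal sketch of that counting logic, assuming the counter lives somewhere shared between windows (overlay scripts run once per window, so a per-window variable would not work). `createSharedServer`, `windowOpened` and `windowClosed` are hypothetical names; `startServer` and `stopServer` are the functions from the question, passed in as arguments.

```javascript
// Hypothetical reference counter: start on the first window, stop on the last.
function createSharedServer(startServer, stopServer) {
  let windowCount = 0;

  return {
    windowOpened() {
      if (windowCount === 0) startServer(); // only the first window starts it
      windowCount += 1;
      return windowCount;
    },
    windowClosed() {
      if (windowCount === 0) return 0; // unbalanced call, nothing to do
      windowCount -= 1;
      if (windowCount === 0) stopServer(); // the last window is gone
      return windowCount;
    },
  };
}
```

Each window's load handler would call `windowOpened()` and its unload handler `windowClosed()`. With this approach the server still stops once the last window closes, even if the application itself keeps running.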
Here is information from the Mozilla forums on this problem:
http://forums.mozillazine.org/viewtopic.php?f=19&t=2030279
Possible workaround #2:
With that said, I've also found documentation for Observer Notifications, which may be helpful as there is a section on Application Startup and Shutdown:
https://developer.mozilla.org/en/Observer_Notifications
UPDATE:
Here are some resources on creating XPCOM components in JavaScript and in C++:
https://developer.mozilla.org/en/how_to_build_an_xpcom_component_in_javascript
http://www.codeproject.com/KB/miscctrl/XPCOM_Creation.aspx
https://developer.mozilla.org/en/creating_xpcom_components
You probably want to:
Move your code into a JavaScript component
Register your component as a profile-after-change observer
Whenever someone makes a connection to your socket, find the active window and return its URL.
Use something like
var wm = Components.classes["@mozilla.org/appshell/window-mediator;1"]
.getService(Components.interfaces.nsIWindowMediator);
var win = wm.getMostRecentWindow("navigator:browser");
var spec = win ? win.getBrowser().currentURI.spec : "";
var outputString = spec + "\n";
etc.