I'm working with service workers quite a bit lately and often run into a situation where I would like to process some raw fetched data before storing it in the service worker cache - an example use case would be processing a large raw text file to remove unnecessary whitespace. This way the cached response to my HTTP request would already be "optimized".
I was thinking: why not do this in a web worker? But alas, after much searching I have not found any idea of how a web worker could be made accessible inside a service worker. It's not like I can pass in the web worker context using postMessage.
Question:
How can I access Web Workers in a Service Worker?
It's currently not possible to access a web worker from within a service worker. This might change in the future, and the relevant standards issue is https://github.com/whatwg/html/issues/411
Note that it's possible to use the Cache Storage API from within a web worker that's spawned by a normal web page, so you could theoretically do what you suggest outside the context of a service worker.
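For instance, here's a minimal sketch of your whitespace-stripping use case, runnable inside a dedicated web worker spawned by a page (the URL and cache name are illustrative):
caches.open('optimized-text').then(cache =>
  fetch('/large-file.txt')
    .then(response => response.text())
    .then(text => {
      // Collapse runs of whitespace before caching the "optimized" copy.
      const optimized = text.replace(/\s+/g, ' ');
      return cache.put('/large-file.txt', new Response(optimized));
    })
);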
This is a matter of personal preference rather than a strict guideline, but I don't like the pattern of modifying the data you get back from the network and then using the Cache Storage API to persist it in a synthetic Response object. I prefer using the Cache Storage API for keeping exact copies of what you get back from the network, so that things look the same to your controlled page regardless of whether the request is fulfilled from the network or from the cache.
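For contrast, a minimal sketch of that exact-copy approach in a service worker fetch handler (the cache name is illustrative):
self.addEventListener('fetch', event => {
  event.respondWith(
    caches.open('network-copies').then(cache =>
      cache.match(event.request).then(cached =>
        cached || fetch(event.request).then(response => {
          // Store a byte-for-byte clone of what the network returned.
          cache.put(event.request, response.clone());
          return response;
        })
      )
    )
  );
});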
A pattern that I've used before, which has the added benefit of using web workers in the way you suggest, is to use IndexedDB in a similar manner. If the response is already in IndexedDB, you just use it; if it's not, you kick off a web worker to handle the network request and processing, and then store the result in IndexedDB for future use.
Here's an example of some code to do this, making use of a lot of ES2015+ features, along with the promise-worker and idb-keyval libraries for the asynchronous code.
import PromiseWorker from 'promise-worker';
import idbKeyValue from 'idb-keyval';

export default async (url, Worker) => {
  let value = await idbKeyValue.get(url);
  if (!value) {
    const promiseWorker = new PromiseWorker(new Worker());
    value = await promiseWorker.postMessage(url);
    // Don't await here, so that we can return right away.
    idbKeyValue.set(url, value);
  }
  return value;
};
And then the worker could look something like this (which converts Markdown to HTML):
import 'whatwg-fetch';
import MarkdownIt from 'markdown-it';
import registerPromiseWorker from 'promise-worker/register';

const markdown = new MarkdownIt();

registerPromiseWorker(async url => {
  const response = await fetch(url);
  const text = await response.text();
  return markdown.render(text);
});
This approach starts making less sense if you're dealing with large amounts of data, because of the serialization overhead and the lack of streaming support compared to what would be possible by just using the Cache Storage API directly.
Related
I want to create an Electron app that will use webview to display 3rd party content.
I would like to be able to intercept all requests and responses from this webview. Sometimes I would like to manipulate this content, other times I would like to log it, and other times I’d like to do nothing.
As one example for the responses, maybe a web server will respond with TypeScript code, maybe I want to take that response, and compile it to standard JavaScript.
I have looked into this page, but it looks like it is only possible to cancel requests and manipulate the headers. The WebRequest API doesn't look like it fits my use case, since it only allows very minor manipulation of requests and responses.
I have also considered setting up some type of web server that can act as a proxy, but I have concerns about that. I want to maintain user privacy, and I want to ensure that, to the web servers hosting the 3rd party content, the request looks like it is coming from a browser-like environment (e.g. an Electron webview) instead of a server. I know I can manipulate requests with the headers I send and such, but this whole solution is getting a lot more complicated than I would like, though it might be the only option.
Any better ways to achieve this, and have more control over the Electron webview?
I think you should look into the Protocol API. It works as a proxy internally.
Say you want the user, when opening http://www.google.com, to see content like you've been conned!:
const { protocol } = require("electron");
const content = Buffer.from("you've been conned!"); // Buffer.from, since new Buffer() is deprecated

protocol.interceptBufferProtocol("http", (request, result) => {
  if (request.url === "http://www.google.com") {
    return result(content);
  }
  // ... fetch other http protocol content and return it to Electron
});
There's lots of work to do, compared to the WebRequest API, but it's much simpler than an independent local proxy.
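To give a feel for that work, here's an untested sketch of the pass-through branch; it uses Node's http module (rather than Electron's net) so the re-issued request doesn't itself get intercepted again. The wiring here is my assumption, not part of the original answer:
const { protocol } = require("electron");
const http = require("http");

protocol.interceptBufferProtocol("http", (request, callback) => {
  if (request.url === "http://www.google.com") {
    return callback(Buffer.from("you've been conned!"));
  }
  // Pass everything else through and return the collected bytes.
  http.get(request.url, response => {
    const chunks = [];
    response.on("data", chunk => chunks.push(chunk));
    response.on("end", () => callback(Buffer.concat(chunks)));
  });
});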
To get the request body of any http network call made by your electron app:
// Match all URLs; narrow this filter as needed.
const filter = { urls: ['*://*/*'] };

session.defaultSession.webRequest.onBeforeSendHeaders(filter, (details, callback) => {
  if (details.uploadData) {
    const buffer = Array.from(details.uploadData)[0].bytes;
    console.log('Request body: ', buffer.toString());
  }
  callback(details);
});
I have built a portal which provides access to several features, including trouble ticket functionality.
The client has asked me to make trouble ticket functionality available offline. They want to be able to "check out" specific existing tickets while online, which are then accessible (view/edit) while the user's device is out-of-range of any internet connection. Also, they want the ability to create new tickets while offline. Then, when the connection is available, they will check in the changed/newly created tickets.
I have been tinkering with Service Workers and reviewing some good documentation on them, and I feel I have a basic understanding of how to cache the data.
However, since I only want to make the Ticketing portion of the portal available offline, I don't want the service worker caching or returning cached data when any other page of the portal is being accessed. All pages are in the same directory, so the service worker, once loaded, would by default intercept all requests from all pages in the portal.
How can I set up the service worker to only respond with cached data when the Tickets page is open?
Do I have to manually check the window.location value when fetch events occur? For example,
if (window.location == 'https://www.myurl.com/tickets') {
  // Try to get the request from the network. If successful, cache the result.
  // If not successful, try returning the request from the cache.
} else {
  // Only try the network, and don't cache the result.
}
There are many supporting files that need to be loaded for the page (e.g. CSS files, JS files, etc.), so it's not enough to simply check the request.url for the page name. Will window.location be accessible in the service worker event, and is this a reasonable way to accomplish this?
Use service worker scoping
I know that you mentioned that you currently have all pages served from the same directory... but if you have any flexibility over your web app's URL structure at all, then the cleanest approach would be to serve your ticket functionality from URLs that begin with a unique path prefix (like /tickets/) and then host your service worker from /tickets/service-worker.js. The effort to reorganize your URLs may be worthwhile if it means being able to take advantage of the default service worker scoping and just not have to worry about pages outside of /tickets/ being controlled by a service worker.
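As a minimal sketch of that registration (the path names are just examples):
// Registered from any page; the scope defaults to the directory
// containing the service worker script, i.e. '/tickets/'.
navigator.serviceWorker.register('/tickets/service-worker.js')
  .then(registration => {
    // Only pages under /tickets/ will ever be controlled.
    console.log('Service worker scope:', registration.scope);
  });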
Infer the referrer
There's information in this answer about determining what the referring window client URL is from within your service worker's fetch handler. You can combine that with an initial check in the fetch handler to see if it's a navigation request and use that to exit early.
const TICKETS = '/tickets';

self.addEventListener('fetch', event => {
  const requestUrl = new URL(event.request.url);
  if (event.request.mode === 'navigate' && requestUrl.pathname !== TICKETS) {
    return;
  }

  const referrerUrl = ...; // See https://stackoverflow.com/questions/50045641
  if (referrerUrl.pathname !== TICKETS) {
    return;
  }

  // At this point, you know that it's either a navigation for /tickets,
  // or a request for a subresource from /tickets.
});
Background
I'm new to service workers, but I'm working on a library that is intended to become "offline-first" (really, almost "offline-only"). FWIW, the intent is to allow consumers of the library to provide JSON config representing tabular multilinear texts, and to get in return an app which allows their users to browse those texts, by paragraph/verse ranges, in a highly customizable manner.
Other projects would install the library as a dependency and then supply information via our JavaScript API, such as the path of a JSON config file indicating the files that our app will consume to produce an (offline) app for them.
While I know we could do any of the following:
require users to provide a hard-coded path from which our service worker's install script could use waitUntil with its own JSON request to retrieve the user's necessary files
skip the service worker's install step for the JSON file, and rely on fetch events to update the cache, providing a fallback display if the user completed the install and went offline before the fetches could occur
post some state info from our main script to a server, which the service worker, once registered, would query before completing its install event
...but all choices seem less than ideal because, respectively:
Our library's consumers may prefer to be able to designate their own location for their JSON config.
Given that the JSON config designates files critical to showing their users anything useful, I'd rather not allow an install to complete only to tell the user they have to go back online to get the rest of the files, if they weren't able to remain online after the install event for all the required fetches to occur.
Besides wanting to avoid more trips to the server and extra code, I'd prefer for our code to be so offline-oriented as to be able to work entirely on mere static file servers.
Question:
Is there some way to pass a message or state information into a service worker before the install event occurs, whether as part of the query string of the service worker URL, or through a messaging event? The messaging event could even technically arrive after the install event begins as long as it can occur before a waitUntil within the install is complete.
I know I could test this myself, but I'd like to know what best practices might be anyway when the critical app files must themselves be dynamically obtained, as in libraries such as ours.
I'm guessing IndexedDB might be the sole alternative here (i.e., saving the config info or the path of the JSON config to IndexedDB, registering a service worker, and retrieving the IndexedDB data from within the install event)? Even this would not be ideal, as I'm letting users define a namespace for their storage, and I need a way for it, too, to be passed into the worker; otherwise, multiple such apps on the origin could clash.
Using a Query Parameter
If you find it useful, then yes, you can provide state during service worker installation by including a query parameter to your service worker when you register it, like so:
// Inside your main page:
const pathToJson = '/path/to/file.json';
const swUrl = '/sw.js?pathToJson=' + encodeURIComponent(pathToJson);
navigator.serviceWorker.register(swUrl);

// Inside your sw.js:
self.addEventListener('install', event => {
  const pathToJson = new URL(location).searchParams.get('pathToJson');
  event.waitUntil(
    fetch(pathToJson)
      .then(response => response.json())
      .then(jsonData => {/* Do something with jsonData */})
  );
});
A few things to note about this approach:
If you fetch() the JSON file in your install handler (as in the code sample), that will effectively happen once per version of your service worker script (sw.js). If the contents of the JSON file change, but everything else stays the same, the service worker won't automatically detect that and repopulate your caches.
Following from the first point, if you work around that by, e.g., including hash-based versioning in your JSON file's URL, each time you change that URL, you'll end up installing a new service worker. This isn't a bad thing, per se, but you need to keep it in mind if you have logic in your web app that listens for service worker lifecycle events.
Alternative Approaches
You also might find it easier to just add files to your caches from within the context of your main page, since browsers that support the Cache Storage API expose it via window.caches. Precaching the files within the install handler of a service worker does have the advantage of ensuring that all the files have been cached successfully before the service worker installs, though.
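A minimal sketch of that window-context approach (the cache name and URLs are illustrative):
// Run from the page, not the service worker.
if ('caches' in window) {
  caches.open('app-precache')
    .then(cache => cache.addAll(['/styles/app.css', '/scripts/app.js']))
    .catch(err => console.error('Precaching failed:', err));
}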
Another approach is to write the state information to IndexedDB from the window context, and then read from IndexedDB inside of your service worker's install handler.
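And a sketch of that IndexedDB variant, assuming the idb-keyval library from the earlier answer (its IIFE build exposes an idbKeyval global that also works inside a service worker):
// In the window context, before registering the service worker:
idbKeyval.set('pathToJson', '/path/to/file.json')
  .then(() => navigator.serviceWorker.register('/sw.js'));

// Inside sw.js:
importScripts('idb-keyval-iife.min.js');
self.addEventListener('install', event => {
  event.waitUntil(
    idbKeyval.get('pathToJson')
      .then(pathToJson => fetch(pathToJson))
      .then(response => response.json())
      .then(jsonData => {/* Populate caches based on jsonData */})
  );
});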
Update 3:
And since it is not supposed to be safe to rely on globals within the worker, my messaging solution seems even less sound. I think it has to be either Jeff Posnick's solution or, in some cases, importScripts may work.
Update 2:
Although not directly related to the topic of this thread (the install event), there are some issues with using this message-passing approach for the activate event, as per a discussion starting at https://github.com/w3c/ServiceWorker/issues/659#issuecomment-384919053 . Namely, the activate event may never fail, and thus never be retried, leaving one's application in an unstable state. (A failure of install will at least not apply the new service worker to old pages, whereas activate will keep fetches on hold until the event completes, which it may never do if it is left waiting for a message that was never received; nothing short of a new worker will correct this, since new pages won't be able to load in order to send that message again.)
Update:
Although I got the client from within the install script in Chrome, I wasn't able to receive the message back with navigator.serviceWorker.onmessage for some reason.
However, I was able to fully confirm the following approach in its place:
In the service worker:
self.addEventListener('install', e => {
  e.waitUntil(
    new Promise((resolve, reject) => {
      self.addEventListener('message', ({data: {myData}}) => {
        // Do something with `myData` here,
        // then `resolve` when ready.
        resolve();
      });
    })
  );
});
In the calling script:
navigator.serviceWorker.register('sw.js').then((r) => {
  r.installing.postMessage({myData: 100});
});
@JeffPosnick's is the best answer for the simple case I described in the OP, but I thought I'd present my discovery that one can get messages into and out of a service worker script early (tested on Chrome) with something like the following:
In the service worker:
self.addEventListener('install', e => {
  e.waitUntil(self.clients.matchAll({
    includeUncontrolled: true,
    type: 'window'
  }).then((clients) => new Promise((resolve, reject) => {
    if (clients && clients.length) {
      const client = clients.pop();
      client.postMessage('send msg to main script');
      // One should presumably be able to poll to check for a
      // variable set in the SW message listener below
      // and then `resolve` when set.
      // Despite the unreliability of setting globals in SWs,
      // I believe this could be safe here, as the `install`
      // event runs while the main script is still open.
    }
  })));
});

self.addEventListener('message', e => {
  console.log('SW receiving main script msg', e.data);
  e.ports[0].postMessage('sw response');
});
In the calling script:
navigator.serviceWorker.addEventListener('message', (e) => {
  console.log('msg recd in main script', e.data);
  e.source.postMessage('sending back to sw');
});

return navigator.serviceWorker.register('sw.js').then((r) => {
  // navigator.serviceWorker.ready.then((r) => { // This had been necessary at some
  //   point in my testing (with r.active.postMessage), but not working for me atm...

  // Sending a subsequent message
  const messageChannel = new MessageChannel();
  messageChannel.port1.onmessage = (e) => {
    if (e.data.error) {
      console.log('err', e.data.error);
    } else {
      console.log('data', e.data);
    }
  };
  navigator.serviceWorker.controller.postMessage('sending to sw', [messageChannel.port2]);
  // });
});
I have a React/Redux application that talks a lot to an API and deals with a lot of rarely changing data from a DB. In order to reduce traffic and improve UX, I now want to create a caching mechanism that stores data on the client by automatically using the best technology available (descending from IndexedDB to LocalStorage, etc.).
I created a cache object that does an initial check which determines the storage mechanism (which gets saved to an engine property, so the check just needs to run once). It also has some basic methods save(key, value) and load(key), which then call the appropriate functions for the initially determined mechanism.
The cache object and its methods do work, but I wonder: how do I create the cache only once in my main index.js when the application loads, and then use this very object in my actions without recreating another cache object every time?
BTW: It feels wrong to make the cache part of my application state as it does not really contain substantial data to run the application (if there is no caching available, it falls back to just calling the API).
Do I need to inject the cache into my actions somehow? Or do I need to create a global/static cache object in the main window object?
Thanks for clarification and thoughts on this issue.
redux-thunk middleware offers a custom argument injection feature you could use.
When creating the store
import { createStore, applyMiddleware } from 'redux'
import thunk from 'redux-thunk'

const cache = createCache()
const store = createStore(
  reducer,
  applyMiddleware(thunk.withExtraArgument(cache))
)
Then in your action creator
function getValue(id) {
  return (dispatch, getState, cache) => {
    // use cache
  }
}
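For illustration, that stub might be fleshed out to consult the injected cache before hitting the API. This sketch assumes the save(key, value)/load(key) methods described in the question, plus a hypothetical /api/values endpoint and action type:
function getValue(id) {
  return async (dispatch, getState, cache) => {
    // Try the client-side cache first.
    let value = await cache.load(id)
    if (value === undefined) {
      // Cache miss: fall back to the API and store the result.
      const response = await fetch(`/api/values/${id}`)
      value = await response.json()
      cache.save(id, value)
    }
    dispatch({ type: 'VALUE_LOADED', payload: value })
  }
}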
I want the following
During startup, the master process loads a large table from file and saves it into a shared variable. The table has 9 columns and 12 million rows, 432MB in size.
The worker processes run an HTTP server, accepting real-time queries against the large table.
Here is my code, which obviously does not achieve my goal.
var my_shared_var;
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Load a large table from file and save it into my_shared_var,
  // hoping the worker processes can access this shared variable,
  // so that the worker processes do not need to reload the table from file.
  // The loading typically takes 15 seconds.
  my_shared_var = load('path_to_my_large_table');

  // Fork worker processes
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // The following line of code actually outputs "undefined".
  // It seems each process has its own copy of my_shared_var.
  console.log(my_shared_var);

  // Then perform a query against my_shared_var.
  // The query should be performed by the worker processes,
  // otherwise the master process will become a bottleneck.
  var result = query(my_shared_var);
}
I have tried saving the large table into MongoDB so that each process can easily access the data. But the table is so huge that it takes MongoDB about 10 seconds to complete my query, even with an index. This is too slow and not acceptable for my real-time application. I have also tried Redis, which holds data in memory. But Redis is a key-value store, and my data is a table. I also wrote a C++ program to load the data into memory, where the query took less than 1 second, so I want to emulate this in node.js.
To put your question in a few words: you need to share data from the MASTER entity with the WORKER entities. It can be done very easily using events:
From Master to worker:
worker.send({ /* json data */ });          // In the master
process.on('message', yourCallbackFunc);   // In the worker: yourCallbackFunc(jsonData) runs per message
From Worker to Master:
process.send({ /* json data */ });         // In the worker
worker.on('message', yourCallbackFunc);    // In the master: yourCallbackFunc(jsonData) runs per message
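Putting those pieces together for the table in the question, a runnable sketch might look like this (load() and serveQueries() stand in for the OP's own functions):
const cluster = require('cluster');

if (cluster.isMaster) {
  const table = load('path_to_my_large_table'); // load once, in the master
  const worker = cluster.fork();
  // Push the data to the worker as soon as it is ready.
  worker.on('online', () => worker.send({ table }));
} else {
  process.on('message', ({ table }) => {
    // Note: each worker receives its own deserialized copy.
    serveQueries(table);
  });
}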
I hope this way you can send and receive data bidirectionally. Please mark this as the answer if you find it useful, so that other users can also find it. Thanks
You are looking for shared memory, which node.js just does not support. You should look for alternatives, such as querying a database or using memcached.
In node.js, fork does not work like in C++. It doesn't copy the current state of the process; it runs a new process. So, in this case, variables aren't shared. Every line of code runs in every process, but the master process has the cluster.isMaster flag set to true. You need to load your data in each worker process. Be careful if your data is really huge, because every process will have its own copy. I think you should query only the parts of the data you need at a given moment, or wait through the load time if you really need it all in memory.
If read-only access is fine for your application, try out my own shared memory module. It uses mmap under the covers, so data is loaded as it's accessed and not all at once. The memory is shared among all processes on the machine. Using it is super easy:
const Shared = require('mmap-object')
const shared_object = new Shared.Open('table_file')
console.log(shared_object.property)
It gives you a regular object interface to a key-value store of strings or numbers. It's super fast in my applications.
There is also an experimental read-write version of the module available for testing.
You can use Redis.
Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.
redis.io
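As a hedged sketch with the node-redis client (its v4 API), storing one table row per hash; the key layout is just an example:
const { createClient } = require('redis');

(async () => {
  const client = createClient();
  await client.connect();
  // Store one table row as a hash (e.g. done once by the master).
  await client.hSet('table:row:1', { col1: 'a', col2: 'b' });
  // Any worker process can then read it back.
  const row = await client.hGetAll('table:row:1');
  console.log(row); // { col1: 'a', col2: 'b' }
  await client.quit();
})();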
This way works to "share a variable"; it is a bit more fancy than the approach @Shivam presented. However, the module internally uses the same API. Therefore "shared memory" is a bit misleading, since in cluster each process is a fork of the parent process. At fork time, process memory is duplicated in OS memory, so there is no real shared memory except low-level shared memory like an shm device or a virtual shared memory page (Windows).
I did implement a native module for Node.js which makes use of native shared memory (which is real shared memory): with that technique, both processes read directly from an OS shared memory section. However, that solution doesn't really apply well here, because it is limited to scalar values. You could of course JSON.stringify and share the JSON-serialized data string, but the time it takes to parse/stringify is totally non-ideal for most use cases. (Especially for larger objects, parsing/stringifying JSON with standard library implementations becomes non-linear.)
Thus, this solution seems the most promising for now:
const cluster = require('cluster');
require('cluster-shared-memory');

if (cluster.isMaster) {
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
} else {
  const sharedMemoryController = require('cluster-shared-memory');

  // Wrapped in an async IIFE, since `await` can't be used at the top level here.
  (async () => {
    // Note: it must be a serializable object
    const obj = {
      name: 'Tom',
      age: 10,
    };
    // Set an object
    await sharedMemoryController.set('myObj', obj);
    // Get an object
    const myObj = await sharedMemoryController.get('myObj');
    // Mutually exclusive access
    await sharedMemoryController.mutex('myObj', async () => {
      const newObj = await sharedMemoryController.get('myObj');
      newObj.age = newObj.age + 1;
      await sharedMemoryController.set('myObj', newObj);
    });
  })();
}
This question was posted in 2012, exactly 10 years ago. Since no other answer has mentioned it, Node.js now supports Worker Threads that support shared memory.
Directly from the docs:
Workers (threads) are useful for performing CPU-intensive JavaScript operations.
Unlike child_process or cluster, worker_threads can share memory. They do so by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.
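A minimal sketch of that shared-memory capability (the buffer size and values are illustrative):
const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  // One buffer, visible to every thread without copying.
  const shared = new SharedArrayBuffer(4 * 1024);
  const view = new Int32Array(shared);
  view[0] = 42; // "load" a value into shared memory once

  const worker = new Worker(__filename, { workerData: shared });
  worker.on('exit', () => console.log('worker done'));
} else {
  // The worker sees the same memory, not a copy.
  const view = new Int32Array(workerData);
  console.log('value from shared memory:', Atomics.load(view, 0)); // 42
}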