I am working on a node.js project. Is it possible to read exactly n bytes asynchronously from a stream?
Usually, if I want to read a stream asynchronously, I use events. The problem is that I need to process the rest of the stream asynchronously, too.
If I listen for the data event, I can use the rest of the stream later, but I cannot control how many bytes I want to read at once. I tried to use unshift to put the unused bytes back into the buffer but this does not seem to fire the data event when another listener is added later.
This question is similar, but the only answer is a synchronous solution.
Is there an option to limit the number of bytes being passed to the data event listeners? Is it possible to somehow push the bytes back into the stream and still make them accessible through events?
As long as you're listening to the readable event and not doing a blocking loop calling stream.read(n);, that solution is asynchronous. Something like the following (untested!) should get you what you want.
function streamChunk(stream, size, dataCallback, doneCallback) {
    function getChunk() {
        var data = stream.read(size);
        if (data != null) {
            dataCallback(data);
            setImmediate(getChunk);
        }
    }

    stream.on('readable', getChunk);
    stream.on('end', doneCallback);
}
Related
I'm trying to overcome a "Maximum call stack size exceeded" error, but with no luck.
The goal is to re-run the GET request as long as I get music in the type field.
//tech: node.js + mongoose
//import components
const https = require('https');
const options = new URL('https://www.boredapi.com/api/activity');
//obtain data using GET
https.get(options, (response) => {
    //console.log('statusCode:', response.statusCode);
    //console.log('headers:', response.headers);
    response.on('data', (data) => {
        //process.stdout.write(data);
        apiResult = JSON.parse(data);
        apiResultType = apiResult.type;
        returnDataOutside(data);
    });
}).on('error', (error) => {
    console.error(error);
});

function returnDataOutside(data) {
    apiResultType;
    if (apiResultType == 'music') {
        console.log(apiResult);
    } else {
        returnDataOutside(data);
        console.log(apiResult); //Maximum call stack size exceeded
    }
}
Your function returnDataOutside() is calling itself recursively. If it doesn't get an apiResultType of 'music' the first time, it just keeps calling itself deeper and deeper until the stack overflows, with no chance of ever getting the music type because you're calling it with the same data over and over.
It appears that you want to re-run the GET request when you don't get the music type, but your code is not doing that - it's just calling your response handler over and over. Instead, you need to put the code that makes the GET request into a function and call that function so a fresh GET request is actually made when apiResultType isn't what you want.
In addition, you shouldn't code something like this so that it keeps going forever, hammering some server. You should have a maximum number of attempts, a back-off timer, or both.
And, you can't just assume that response.on('data', ...) delivers a perfectly formed piece of JSON. If the data is anything but very small, it may arrive in arbitrarily sized chunks, and it may take multiple data events to receive your entire payload. This may appear to work on fast networks but fail on slow networks or through some proxies. Instead, you have to accumulate the data from the entire response (all the data events, concatenated together) and then process that final result on the end event.
While you can code plain https.get() to collect all the results for you (there's an example of that right in the doc here), it's a lot easier to use a higher-level library that brings support for a bunch of useful things.
My favorite library in this regard is got(), but there's a list of alternatives here and you can find the one you like. Not only do these libraries accumulate the entire response for you without you writing any extra code, they are promise-based (which makes the asynchronous coding easier), and they also automatically check status codes, follow redirects, etc. - many things you would want an HTTP request library to "just handle" for you.
I am putting some code here:
const { createReadStream, ReadStream } = require('fs');
var readStream = createReadStream('./data.txt');
readStream.on('data', chunk => {
    console.log('---------------------------------');
    console.log(chunk);
    console.log('---------------------------------');
});

readStream.on('open', () => {
    console.log('Stream opened...');
});

readStream.on('end', () => {
    console.log('Stream Closed...');
});
So, a stream is the movement of data from one place to another. In this case, from the data.txt file to my eyes, since I have to read it.
I've read something like this on Google:
Typically, the movement of data is usually with the intention to
process it, or read it, and make decisions based on it. But there is a
minimum and a maximum amount of data a process could take over time.
So if the rate the data arrives is faster than the rate the process
consumes the data, the excess data need to wait somewhere for its turn
to be processed.
On the other hand, if the process is consuming the data faster than it
arrives, the few data that arrive earlier need to wait for a certain
amount of data to arrive before being sent out for processing.
My question is: which line of code is "consuming the data, processing the data"? Is it console.log(chunk)? If I had a hugely time-consuming line of code instead of console.log(chunk), how would my code avoid grabbing more data from the buffer and wait until my processing is done? In the above code, it seems like it would still come into readStream.on('data')'s callback.
My question is: which line of code is "consuming the data, processing the data"
The readStream.on('data', ...) event handler is the code that "consumes" or "processes" the data.
if I had a huge time consuming line of code instead of console.log(chunk), how would my code not grab more data from buffer and wait until my processing is done ?
If the time consuming code is synchronous (e.g. blocking), then no more data events can happen until after your synchronous code is done because only your event handler is running (in the single-threaded event loop driven architecture of node.js). No more data events will be generated until you return control back from your event handler callback function.
If the time consuming code is asynchronous (e.g. non-blocking and thus has returned control back to the event loop), then more data events certainly can happen even though a prior data event handler has not entirely finished its asynchronous work yet. It is sometimes appropriate to call readStream.pause() while doing asynchronous work to tell the readStream not to generate any more data events until you are ready for them, and then call readStream.resume().
I have code on a web worker, and because I can't post an object with methods (functions) to it, I don't know how to stop this code from blocking the UI:
if (data != 'null') {
    obj['backupData'] = obj.tbl.data().toArray();
    obj['backupAllData'] = data[0];
}

obj.tbl.clear();
obj.tbl.rows.add(obj['backupAllData']);

var ext = config.extension.substring(1);
$.fn.dataTable.ext.buttons[ext + 'Html5'].action(e, dt, button, config);

obj.tbl.clear();
obj.tbl.rows.add(obj['backupData'])
This code exports records from an HTML table. data is an array returned from a web worker and can sometimes have 50k or more objects.
As obj and all the methods it contains are not transferable to the web worker, the UI blocks when the data length is 30k, 40k, 50k or more.
What is the best way to do this?
Thanks in advance.
You could try wrapping the heavy work in an async callback such as a timeout, to allow the engine to queue the whole logic and work through it as soon as it has time:
setTimeout(function() {
    if (data != 'null') {
        obj['backupData'] = obj.tbl.data().toArray();
        obj['backupAllData'] = data[0];
    }
    //heavy stuff
}, 0)
Or, if the code is extremely long, you can try to work out a strategy to split it into chunks of operations and execute each chunk in a separate async function (timeout).
Best way to iterate over an array without blocking the UI
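That chunking strategy can be sketched as follows - processInChunks, the batch size, and the dummy records are illustrative names, not from any library:

```javascript
// Process a large array in small batches, yielding back to the event
// loop between batches so the UI stays responsive.
function processInChunks(items, chunkSize, handleItem) {
    return new Promise((resolve) => {
        let index = 0;
        function nextBatch() {
            const end = Math.min(index + chunkSize, items.length);
            for (; index < end; index++) {
                handleItem(items[index]);
            }
            if (index < items.length) setTimeout(nextBatch, 0); // yield, then continue
            else resolve();
        }
        nextBatch();
    });
}

// Dummy data standing in for the 50k exported records:
const records = Array.from({ length: 1000 }, (_, i) => i);
let sum = 0;
const finished = processInChunks(records, 100, (r) => { sum += r; });
```

Between batches the browser gets a chance to paint and handle input, which is exactly what a single long synchronous loop prevents.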
Update:
Sadly, ImmutableJS doesn't currently work across web workers. You should be able to transfer the ArrayBuffer so you don't need to parse it back into an array. Also read this article. If your workload is that heavy, it would be best to send back one item at a time from the worker.
Previously:
The code is converting all the data into an array, which is immediately costly. Try returning an immutable data structure from the web worker if possible. This guarantees that it doesn't change when the references change, and you can continue iterating over it slowly in batches.
The next thing you can do is to use requestIdleCallback to schedule small batches of items to be processed.
This way you should be able to make the UI breathe a bit.
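A sketch of that requestIdleCallback approach follows; the setTimeout fallback (and its made-up 5 ms budget) is only there so the snippet also runs outside a browser, and all function names are illustrative:

```javascript
// requestIdleCallback is a browser API; fall back to setTimeout so the
// sketch also runs under Node (the fallback's 5ms budget is made up).
const idle = typeof requestIdleCallback === 'function'
    ? requestIdleCallback
    : (cb) => setTimeout(() => cb({ timeRemaining: () => 5 }), 0);

// Process items only while the environment reports idle time left.
function processWhenIdle(items, handleItem) {
    return new Promise((resolve) => {
        let i = 0;
        function run(deadline) {
            while (i < items.length && deadline.timeRemaining() > 0) {
                handleItem(items[i++]);
            }
            if (i < items.length) idle(run);   // more to do: wait for idle time
            else resolve();
        }
        idle(run);
    });
}

const nums = Array.from({ length: 50 }, (_, i) => i);
let total = 0;
const finished = processWhenIdle(nums, (n) => { total += n; });
```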
I've been trying to create a Duplex stream that receives a number of objects, reorganizes them, and then pipes them to whatever stream is reading from my stream. The pain point is that the reading part should only begin after all objects have been received (in other words, after the finish event).
How can I do that?
My current idea is that I'd have two different streams (instead of a Duplex), and that I'd simply make it clear on my API that the Readable stream shouldn't be used before the Writable stream tells you to do so - but that seems so wrong!
So, please, is there a better way to do that?
Thanks in advance.
Found the answer!
Seems like you can tell the consumer stream to wait by simply not calling this.push() until you're ready to respond.
All I had to do was set a flag to let the duplex stream know the consumer is waiting for data, then listen for the 'end' event on the source to check this flag and call this._read() if the flag was true. The code is as follows:
constructor(source, options) {
    super(options);
    this._isReady = false;
    source.on('end', () => {
        this._isReady = true;
        if (this._isOnHold) this._read();
    });
}

_read() {
    if (!this._isReady) {
        this._isOnHold = true;
        return;
    }
    // regular push sub-routine
}
Of course, this would all be inside a Duplex Stream subclass.
I have a request that generates an XML response. Due to certain present constraints the response can be quite slow to complete but does begin to return data quickly.
Is it possible to read the response stream in Ext JS before the response is complete? My understanding is that *Readers are only given the response text once it is finished.
There doesn't seem to be an Ext.Proxy for Comet techniques, and I think one would be pretty tough to write. First of all, you'll need to write a Sax parser to make any sense of an incomplete XML document. Then, you'll need to write a proxy that repeatedly executes Operations when new data comes through. ExtJS isn't really designed for this, but it'd be an interesting thing to try. I'm not sure how practical it would be, though; any implemented Ext.Proxy would behave very differently from the other Proxies; though it might have methods with the same names, it'd be a very different interface. You'd have to extend Ext.Model and Ext.Store to understand how to populate themselves with streamed data, writing new listeners for chunk events and new data contracts for the consumers of Stores and Models. I'm not sure it'd be worth your time!
However, if all you need is an event to be thrown when a stream chunk comes through, then that's possible in Gecko and WebKit browsers. You just need to attach a handler to the XHR's readystatechange event, which will fire every time data is received.
Experimentally:
Ext.define('Ext.proxy.StreamEventedAjax', {
    extend: 'Ext.data.proxy.Ajax',

    doRequest: function(operation, callback, scope) {
        if (Ext.isIE) return null;

        // do other doRequest setup here, use this.buildRequest, etc
        var me = this,
            request = me.buildRequest(operation),
            req = new XMLHttpRequest(),
            responseLength = 0,
            newText = "";

        // readystatechange fires each time a new chunk of data arrives
        req.onreadystatechange = function(e) {
            newText = req.responseText.substring(responseLength);
            responseLength = req.responseText.length;
            operation.fireEvent('datareceived', e, newText);
        };

        req.open(me.getMethod(request), request.url);
        req.send(null);
    }
});
});