Data not being transformed with Node.js Transform streams - javascript

I'm trying to build a transform stream flow that takes data from socket.io, converts it to JSON, and then sends it to stdout. I am totally perplexed as to why the data just seems to go right through without any transformation. I'm using the through2 library. Here is my code:
getStreamNames().then(streamNames => {
    const socket = io(SOCKETIO_URL);

    socket.on('connect', () => {
        socket.emit('Subscribe', { subs: streamNames });
    });

    const stream = through2.obj(function (chunk, enc, callback) {
        callback(null, parseString(chunk));
    }).pipe(through2.obj(function (chunk, enc, callback) {
        callback(null, JSON.stringify(chunk));
    })).pipe(process.stdout);

    socket.on('m', data => stream.write(data));
});
getStreamNames returns a promise which resolves to an array of stream names (I'm calling an external socket.io API), and parseString takes a string returned from the API and converts it to a plain object so it's manageable.
What I'm looking for is for my console to print the stringified JSON: each message parsed with parseString and then made stdout-able with JSON.stringify. What actually happens is that the data goes right through the stream with no transformation applied.
For reference, the data coming from the API is in a weird format, something like
field1~field2~0x23~fieldn
and so that's why I need the parseString method.
I must be missing something. Any ideas?
EDIT:
parseString:
function (value) {
    var valuesArray = value.split("~");
    var valuesArrayLength = valuesArray.length;
    var mask = valuesArray[valuesArrayLength - 1];
    var maskInt = parseInt(mask, 16);
    var unpackedCurrent = {};
    var currentField = 0;
    for (var property in this.FIELDS) {
        if (this.FIELDS[property] === 0) {
            unpackedCurrent[property] = valuesArray[currentField];
            currentField++;
        }
        else if (maskInt & this.FIELDS[property]) {
            if (property === 'LASTMARKET') {
                unpackedCurrent[property] = valuesArray[currentField];
            }
            else {
                unpackedCurrent[property] = parseFloat(valuesArray[currentField]);
            }
            currentField++;
        }
    }
    return unpackedCurrent;
};
Thanks

The issue is that the stream you're writing to is actually process.stdout, because .pipe returns the destination stream (that's what lets you keep chaining), and in your case that destination is process.stdout.
const x = stream.pipe(stream2).pipe(stream3).pipe(process.stdout);
x === process.stdout // true
So all you were doing was process.stdout.write(data), without going through the pipeline.
What you need to do is assign your first through2 stream to the stream variable, and then call .pipe on that stream.
const stream = through2.obj((chunk, enc, callback) => {
    callback(null, parseString(chunk));
});

stream
    .pipe(through2.obj((chunk, enc, callback) => {
        callback(null, JSON.stringify(chunk));
    }))
    .pipe(process.stdout);

socket.on('m', data => stream.write(data));
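As a side note, the same chain can be wired up with stream.pipeline (available since Node 10), which forwards errors from any stage to a single callback. A minimal sketch, assuming the same through2, parseString, and socket as above; note that pipeline will also destroy process.stdout if the chain errors, which is usually acceptable for a one-off script:
const { pipeline } = require('stream');

// The first through2 stream stays the entry point we write socket messages into.
const entry = through2.obj((chunk, enc, callback) => callback(null, parseString(chunk)));

pipeline(
    entry,
    through2.obj((chunk, enc, callback) => callback(null, JSON.stringify(chunk) + '\n')), // newline per object for readable output
    process.stdout,
    err => { if (err) console.error('pipeline failed', err); }
);

socket.on('m', data => entry.write(data));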

Related

Separate readStream / writeStream vs DuplexStream differences?

Assume we're creating our own custom readable/writable stream by inheriting from the Stream class. In the first scenario, we use the Readable and Writable classes:
const Stream = require("stream");

const readableStream = new Stream.Readable();
readableStream._read = (size) => {
    for (let i = 1; i < 10; i++) {
        readableStream.push(`${i},`);
    }
    readableStream.push(null);
};

const writeableStream = new Stream.Writable();
writeableStream._write = (chunk, encoding, next) => {
    console.log("writeableStream", chunk.toString());
    next();
};

readableStream.pipe(writeableStream);
In the second scenario, we use the Duplex class (which inherits from both Readable/Writable):
const { Duplex } = require("stream");

const myDuplexStream = new Duplex();
myDuplexStream._read = (size) => {
    for (let i = 1; i < 10; i++) {
        myDuplexStream.push(`${i},`);
    }
    myDuplexStream.push(null);
};
myDuplexStream._write = (chunk, encoding, next) => {
    if (Buffer.isBuffer(chunk)) {
        chunk = chunk.toString();
    }
    console.log("write operation", chunk);
};

myDuplexStream.pipe(myDuplexStream); // that's I guess a no-go, but we could just use a plain on("readable") event.
Would these two implementations be practically equal?
No, they’re not identical.
(A) The first example shows a scenario where integers are pushed to a readable buffer that pipes its data to a writable sink. I.e.:
Integers are pushed to a readable buffer
The readable pipes its data to the writable's buffer
The writable stream will console.log the data
(B) The second example shows a quirky scenario where data piped into the duplex is logged to the console, and, unrelated to any input, some integers are pushed to the readable buffer each time something reads from the duplex. I.e.:
(Some unknown source must write/pipe data to the duplex)
The duplex console.logs the streamed input
(No data is forwarded from the duplex's writable side to its readable side; instead, a list of integers is pushed to the readable buffer whenever a downstream consumer reads from it)
When a downstream consumer reads from the duplex, it pushes the integers 1-9 to the readable buffer for that consumer and then closes the readable side of the duplex (due to push(null)).
The writable side of a duplex handles the incoming data, while the readable side provides the outgoing data. (B) seems to have mixed up the roles of the readable and writable sides of the Duplex class.
The OP's Duplex example should probably be:
const { Duplex, Readable, Writable } = require("stream");

const myDuplexStream = new Duplex();
myDuplexStream.chunkProcessingBuffer = [];

myDuplexStream._write = (chunk, encoding, next) => {
    console.log("this is my incoming chunk:", chunk.toString());
    // Typically, this chunk would be processed in some way before pushing it to the readable interface
    myDuplexStream.chunkProcessingBuffer.push(chunk);
    next();
};

myDuplexStream._read = (size) => {
    // The readable interface would typically push more than one chunk at a time for each call to read()
    if (myDuplexStream.chunkProcessingBuffer.length > 0) {
        const chunk = myDuplexStream.chunkProcessingBuffer.shift();
        console.log("this is my outgoing chunk:", chunk.toString());
        myDuplexStream.push(chunk);
    }
};

const readableStream = new Readable();
readableStream._read = (size) => {
    for (let i = 1; i < 10; i++) {
        readableStream.push(`${i},`);
    }
    readableStream.push(null);
};

const writeableStream = new Writable();
writeableStream._write = (chunk, encoding, next) => {
    console.log("writeableStream", chunk.toString());
    next();
};

readableStream.pipe(myDuplexStream).pipe(writeableStream);
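Since the write side here only hands chunks over to the read side, this is also the textbook case for a Transform stream (a Duplex whose output is derived from its input), which removes the manual buffer entirely. A minimal sketch of the same pipeline, reusing readableStream and writeableStream from above:
const { Transform } = require("stream");

const myTransformStream = new Transform({
    transform(chunk, encoding, callback) {
        // Whatever is passed to callback (or push()ed) here becomes the readable output.
        console.log("processing chunk:", chunk.toString());
        callback(null, chunk);
    }
});

readableStream.pipe(myTransformStream).pipe(writeableStream);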

How to pipe Writable Buffer to a ReadStream?

How can I take a writable stream and return a readable stream from a buffer?
I have the following to write the data that comes from an ftp server to an array of chunks:
let chunks = []
let writable = new Writable()

writable._write = (chunk, encoding, callback) => {
    chunks.push(chunk)
    callback()
}
I am then creating a new readstream:
let readable = new ReadStream()
I then tried to pipe the writable to the readable but that doesn't seem to work:
Argument of type 'ReadStream' is not assignable to parameter of type 'WritableStream'.
writable.pipe(readable)
Here is the entire method:
export class FTP {
    readStream(filePath, options = {}) {
        let conn = this.getConnection(this.name)
        if (!conn) return Buffer.from('')
        filePath = this.forceRoot(filePath)
        let chunks = []
        let writable = new Writable()
        writable._write = (chunk, encoding, callback) => {
            chunks.push(chunk)
            callback()
        }
        let readable = new ReadStream()
        conn.client.download(writable, filePath, options.start || undefined)
        writable.pipe(readable)
        return readable
    }
}
I then read from the stream and pipe the output to the response object created from http.createServer() like this:
let stream = store.readStream(file, { start, end })
    .on('open', () => stream.pipe(res))
    .on('close', () => res.end())
    .on('error', err => res.end(err))
Yep, Node.js streams are hard to grasp. Logically, you don't need two streams here. If you want to read from your FTP class as from a stream, you just need to implement a single readable stream. Check out this section of the docs to get an idea of how to implement a readable stream from scratch:
class SourceWrapper extends Readable {
    constructor(options) {
        super(options);
        this._source = getLowLevelSourceObject();

        // Every time there's data, push it into the internal buffer.
        this._source.ondata = (chunk) => {
            // If push() returns false, then stop reading from source.
            if (!this.push(chunk))
                this._source.readStop();
        };

        // When the source ends, push the EOF-signaling `null` chunk.
        this._source.onend = () => {
            this.push(null);
        };
    }

    // _read() will be called when the stream wants to pull more data in.
    // The advisory size argument is ignored in this case.
    _read(size) {
        this._source.readStart();
    }
}
However, from your example I can conclude that conn.client.download() expects a writable stream as an input parameter. In that case you can just use a standard PassThrough stream, which is a duplex stream (i.e. writable on the left side and readable on the right) with no transformation applied:
const { PassThrough } = require('stream');

export class FTP {
    readStream(filePath, options = {}) {
        let conn = this.getConnection(this.name);
        if (!conn) return Buffer.from('');
        filePath = this.forceRoot(filePath);
        const pt = new PassThrough();
        conn.client.download(pt, filePath, options.start);
        return pt;
    }
}
You can find more information on Node.js streams here and here.
UPD: Usage example:
// assume res is an [express or similar] response object.
const s = store.readStream(file, { start, end });
s.pipe(res);
Pipe works the other way round from how you're thinking. According to Node.js's documentation, pipe() is a method of Readable, and it accepts a Writable as its destination. What you were trying to do was pipe a Writable into a Readable, but it's a Readable that gets piped into a Writable, not the other way round.
Try passing a PassThrough to download() and return that same PassThrough?

How can I track write progress when piping with Node.js?

I am trying to track the progress of a pipe from a read stream to write stream so I can display the progress to the user.
My original idea was to track progress when the data event is emitted as shown here:
const fs = require('fs');

let final = fs.createWriteStream('output');
fs.createReadStream('file')
    .on('close', () => {
        console.log('done');
    })
    .on('error', (err) => {
        console.error(err);
    })
    .on('data', (data) => {
        console.log("data");
        /* Calculate progress */
    })
    .pipe(final);
However, I realized that just because it was read doesn't mean it was actually written. This can be seen if the pipe is removed, as the data event still fires.
How can I track write progress when piping with Node.js?
You can use a dummy Transform stream like this:
const stream = require('stream');
const fs = require('fs');

let totalBytes = 0;
stream.pipeline(
    fs.createReadStream(from_file),
    new stream.Transform({
        transform(chunk, encoding, callback) {
            totalBytes += chunk.length;
            console.log(totalBytes);
            this.push(chunk);
            callback();
        }
    }),
    fs.createWriteStream(to_file),
    err => {
        if (err) {
            // handle the error, e.g. log it
        }
    }
);
You can do the piping manually, and make use of the callback from writable.write()
callback: < function > Callback for when this chunk of data is flushed
const fs = require('fs');

let from_file = `<from_file>`;
let to_file = '<to_file>';

let from_stream = fs.createReadStream(from_file);
let to_stream = fs.createWriteStream(to_file);

// get total size of the file
let { size } = fs.statSync(from_file);

let written = 0;
from_stream.on('data', data => {
    // do the piping manually here.
    to_stream.write(data, () => {
        written += data.length;
        console.log(`written ${written} of ${size} bytes (${(written / size * 100).toFixed(2)}%)`);
    });
});
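One caveat with the manual approach: to_stream.write() returns false once its internal buffer is full, and ignoring that return value can buffer large files in memory. A hedged sketch of the same loop that respects backpressure by pausing the readable until 'drain' fires:
from_stream.on('data', data => {
    const ok = to_stream.write(data, () => {
        written += data.length;
        console.log(`written ${written} of ${size} bytes (${(written / size * 100).toFixed(2)}%)`);
    });
    if (!ok) {
        // The writable buffer is full: stop reading until it has drained.
        from_stream.pause();
        to_stream.once('drain', () => from_stream.resume());
    }
});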
Somehow I remember this thread being about memory efficiency; anyway, I've rigged up a small script that's very memory efficient and tracks progress very well. I tested it on a 230 MB file and the result speaks for itself: https://gist.github.com/J-Cake/78ce059972595823243526e022e327e4
The sample file I used was a bit weird, as the content-length header it reported was actually off, but the program uses no more than 4.5 MiB of memory.

How to do sequential HTTP calls?

I have a couple of APIs I need to call to collect and merge information.
I make the first API call and, based on the result, I make several calls to the second one (in a loop).
Since HTTP requests are asynchronous, I'm losing the information. By the time the second step is finished, the server (Node.js) has already sent the response back to the client.
I've already tried to use the callback functions somehow. That managed to keep the response to the client waiting, but the information from the second call was still lost. I guess the variables are somehow not being synchronized.
I also did a quick test with async/await, but my JavaScript mojo was not enough to make it run without errors.
/* pseudo code */
function getData(var1, callback) {
    url = "http://test.server/bla?param=" + var1;
    request.get(url, function (error, response, body) {
        var results = [];
        for (var item of JSON.parse(body).entity.resultArray) {
            var o = {};
            o['data1'] = item.data1;
            o['data2'] = item.data2;
            o['data3'] = item.data3;
            getSecondStep(o, function (secondStepData) {
                //console.log("Callback object");
                //console.log(o);
                o['secondStepData'] = secondStepData;
            });
            results.push(o);
        }
        callback(results);
    });
}

function getSecondStep(object, callback) {
    url = "http://othertest.server/foobar?param=" + object.data1;
    request.get(url, function (error, response, body) {
        var results = [];
        if (response.statusCode == 200) {
            for (var item of JSON.parse(body).object.array) {
                var o = {}
                o['data4'] = item.data4;
                o['data5'] = item.data5;
                results.push(o);
            }
            callback(results);
        }
    });
}
What I would like is to be able to collect all the information into one JSON object to return it back to the client.
The client will then be responsible for rendering it in a nice way.
I recommend using the async / await pattern with the request-promise-native library.
This makes API calls really easy to make and the code is cleaner when using this pattern.
In the example below I'm just calling a httpbin API to generate a UUID but the principle applies for any API.
const rp = require('request-promise-native');

async function callAPIs() {
    let firstAPIResponse = await rp("https://httpbin.org/uuid", { json: true });
    console.log("First API response: ", firstAPIResponse);

    // Call several times, we can switch on the first API response if we like.
    const callCount = 3;
    let promiseList = [...Array(callCount).keys()].map(() => rp("https://httpbin.org/uuid", { json: true }));

    let secondAPIResponses = await Promise.all(promiseList);

    return { firstAPIResponse: firstAPIResponse, secondAPIResponses: secondAPIResponses };
}

async function testAPIs() {
    let combinedResponse = await callAPIs();
    console.log("Combined response: ", combinedResponse);
}

testAPIs();
In this simple example we get a combined response like so:
{
    firstAPIResponse: { uuid: '640858f8-2e69-4c2b-8f2e-da8c68795f21' },
    secondAPIResponses: [
        { uuid: '202f9618-f646-49a2-8d30-4fe153e3c78a' },
        { uuid: '381b57db-2b7f-424a-9899-7e2f543867a8' },
        { uuid: '50facc6e-1d7c-41c6-aa0e-095915ae3070' }
    ]
}
I suggest you switch to a library that supports promises (e.g. https://github.com/request/request-promise), as the code becomes much easier to deal with than with the callback approach.
Your code would look something like:
function getData(var1) {
    var url = "http://test.server/bla?param=" + var1;
    return request.get(url).then(body => {
        var arr = JSON.parse(body).entity.resultArray;
        return Promise.all(arr.map(item =>
            request.get("http://othertest.server/foobar?param=" + item.data1)
                .then(secondBody => ({
                    data1: item.data1,
                    data2: item.data2,
                    data3: item.data3,
                    secondStepData: JSON.parse(secondBody).object.array.map(x => ({ data4: x.data4, data5: x.data5 }))
                }))
        ));
    });
}
And usage would be
getData("SomeVar1").then(result => ... );
The problem is that you are calling the callback while you still have async calls going on. Several approaches are possible, such as using async/await, or switching to Promises (which I would probably do in your case).
Or you can, well, call the callback only when you have all the information available. Pseudo code follows:
function getData(var1, callback) {
    url = "http://test.server/bla?param=" + var1;
    request.get(url, function (error, response, body) {
        var results = [];
        var items = JSON.parse(body).entity.resultArray;
        var done = 0, max = items.length;
        for (let item of items) {
            let o = {};
            o['data1'] = item.data1;
            o['data2'] = item.data2;
            o['data3'] = item.data3;
            getSecondStep(o, function (secondStepData) {
                //console.log("Callback object");
                //console.log(o);
                o['secondStepData'] = secondStepData;
                results.push(o);
                done += 1;
                if (done === max) callback(results);
            });
        }
    });
}
(note that since this is pseudo code, I am not checking for errors or handling a possible empty result from request.get(...))
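On top of that note: if resultArray can come back empty, the callback would never fire, so a small guard right after computing max covers it (sketch):
var done = 0, max = items.length;
if (max === 0) return callback(results); // nothing to fetch; hand back the empty array right away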
You need to call the first function's callback only when all of the second-step callbacks have been called. Try these changes:
function getData(var1, callback) {
    url = "http://test.server/bla?param=" + var1;
    request.get(url, function (error, response, body) {
        var results = [], count = 0;
        var arr = JSON.parse(body).entity.resultArray;
        for (let [index, value] of arr.entries()) {
            let o = {};
            o['data1'] = value.data1;
            o['data2'] = value.data2;
            o['data3'] = value.data3;
            getSecondStep(o, function (secondStepData) {
                //console.log("Callback object");
                //console.log(o);
                o['secondStepData'] = secondStepData;
                results[index] = o;
                count++;
                if (count === arr.length) {
                    callback(results);
                }
            });
        }
    });
}

Node.js Piping the same readable stream into multiple (writable) targets

I need to run two commands in series that need to read data from the same stream.
After piping a stream into another, the buffer is emptied, so I can't read data from that stream again, and this doesn't work:
var spawn = require('child_process').spawn;
var fs = require('fs');
var request = require('request');

var inputStream = request('http://placehold.it/640x360');
var identify = spawn('identify', ['-']);

inputStream.pipe(identify.stdin);

var chunks = [];
identify.stdout.on('data', function (chunk) {
    chunks.push(chunk);
});
identify.stdout.on('end', function () {
    var size = getSize(Buffer.concat(chunks)); //width
    var convert = spawn('convert', ['-', '-scale', size * 0.5, 'png:-']);
    inputStream.pipe(convert.stdin);
    convert.stdout.pipe(fs.createWriteStream('half.png'));
});

function getSize(buffer) {
    return parseInt(buffer.toString().split(' ')[2].split('x')[0]);
}
Request complains about this
Error: You cannot pipe after data has been emitted from the response.
and changing the inputStream to fs.createReadStream yields the same issue, of course.
I don't want to write to a file; I want to reuse, in some way, the stream that request produces (or any other, for that matter).
Is there a way to reuse a readable stream once it finishes piping?
What would be the best way to accomplish something like the above example?
You have to create a duplicate of the stream by piping it into two streams. You can create a simple copy with a PassThrough stream; it simply passes the input through to the output.
const spawn = require('child_process').spawn;
const PassThrough = require('stream').PassThrough;

const a = spawn('echo', ['hi user']);
const b = new PassThrough();
const c = new PassThrough();

a.stdout.pipe(b);
a.stdout.pipe(c);

let count = 0;
b.on('data', function (chunk) {
    count += chunk.length;
});
b.on('end', function () {
    console.log(count);
    c.pipe(process.stdout);
});
Output:
8
hi user
The first answer only works if the streams take roughly the same amount of time to process data. If one takes significantly longer, the faster one will request new data, consequently overwriting the data still being used by the slower one (I had this problem after trying to solve it using a duplicate stream).
The following pattern worked very well for me. It uses Streamz, a library based on Streams2, and Promises to synchronize async streams via a callback. Using the familiar example from the first answer:
const spawn = require('child_process').spawn;
const pass = require('stream').PassThrough;
const streamz = require('streamz').PassThrough;
const Promise = require('bluebird');

const a = spawn('echo', ['hi user']);
const b = new pass();
const c = new pass();

a.stdout.pipe(streamz(combineStreamOperations));

function combineStreamOperations(data, next) {
    Promise.join(b, c, function (b, c) { // perform n operations on the same data
        next(); // request more
    });
}

let count = 0;
b.on('data', function (chunk) { count += chunk.length; });
b.on('end', function () { console.log(count); c.pipe(process.stdout); });
You can use this small npm package I created:
readable-stream-clone
With this you can reuse readable streams as many times as you need
For the general problem, the following code works fine:
var PassThrough = require('stream').PassThrough;

var a = new PassThrough();
var b1 = new PassThrough();
var b2 = new PassThrough();

a.pipe(b1);
a.pipe(b2);

b1.on('data', function (data) {
    console.log('b1:', data.toString());
});
b2.on('data', function (data) {
    console.log('b2:', data.toString());
});

a.write('text');
I have a different solution for writing to two streams simultaneously. Naturally, the time to write will be the sum of the two times, but I use it to respond to a download request where I want to keep a copy of the downloaded file on my server (actually, I use an S3 backup, so I cache the most-used files locally to avoid multiple file transfers).
/**
 * A utility class made to write to a file while answering a file download request
 */
class TwoOutputStreams {
    constructor(streamOne, streamTwo) {
        this.streamOne = streamOne
        this.streamTwo = streamTwo
    }

    setHeader(header, value) {
        if (this.streamOne.setHeader)
            this.streamOne.setHeader(header, value)
        if (this.streamTwo.setHeader)
            this.streamTwo.setHeader(header, value)
    }

    write(chunk) {
        this.streamOne.write(chunk)
        this.streamTwo.write(chunk)
    }

    end() {
        this.streamOne.end()
        this.streamTwo.end()
    }
}
You can then use this as a regular OutputStream
const twoStreamsOut = new TwoOutputStreams(fileOut, responseStream)
and pass it to your method as if it were a response or a fileOutputStream.
If you have async operations on the PassThrough streams, the answers posted here won't work.
A solution that works for async operations includes buffering the stream content and then creating streams from the buffered result.
To buffer the result you can use concat-stream
const Promise = require('bluebird');
const concat = require('concat-stream');

const getBuffer = function (stream) {
    return new Promise(function (resolve, reject) {
        var gotBuffer = function (buffer) {
            resolve(buffer);
        }
        var concatStream = concat(gotBuffer);
        stream.on('error', reject);
        stream.pipe(concatStream);
    });
}
To create streams from the buffer you can use:
const { Readable } = require('stream');

const getBufferStream = function (buffer) {
    const stream = new Readable();
    stream.push(buffer);
    stream.push(null);
    return Promise.resolve(stream);
}
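Putting the two helpers together, a usage sketch that buffers the source once and hands independent copies to two consumers (source, consumerA, and consumerB are placeholders for whatever streams you actually have):
getBuffer(source)
    .then(buffer => Promise.all([getBufferStream(buffer), getBufferStream(buffer)]))
    .then(([streamA, streamB]) => {
        // Each readable holds its own copy of the data, so a slow consumer cannot affect the other.
        streamA.pipe(consumerA);
        streamB.pipe(consumerB);
    });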
What about piping into two or more streams not at the same time?
For example:
var PassThrough = require('stream').PassThrough;

var mybinaryStream = stream.start(); // never-ending audio stream
var file1 = fs.createWriteStream('file1.wav', { encoding: 'binary' });
var file2 = fs.createWriteStream('file2.wav', { encoding: 'binary' });
var mypass = new PassThrough();

mybinaryStream.pipe(mypass);
mypass.pipe(file1);

setTimeout(function () {
    mypass.pipe(file2);
}, 2000);

The above code does not produce any errors, but file2 is empty.
