How do I access the data from a stream? - javascript

I'm working with this library: mTwitter
My problem is, when I want to use the streaming function:
twit.stream.raw(
  'GET',
  'https://stream.twitter.com/1.1/statuses/sample.json',
  {delimited: 'length'},
  process.stdout
);
I don't know how to access the JSON that gets written to process.stdout.

You could use a writable stream, from stream.Writable.
var stream = require('stream');

// This is where we will be "writing" the twitter stream to.
var writable = new stream.Writable();

// We listen for when the `pipe` method is called. I'm willing to bet that
// `twit.stream.raw` pipes the stream to a writable stream.
writable.on('pipe', function (src) {
  // We listen for when data is being read.
  src.on('data', function (data) {
    // Everything should be in the `data` parameter.
  });
  // Wrap things up when the reader is done.
  src.on('end', function () {
    // Do stuff when the stream ends.
  });
});

twit.stream.raw(
  'GET',
  'https://stream.twitter.com/1.1/statuses/sample.json',
  {delimited: 'length'},
  // Instead of `process.stdout`, you would pipe to `writable`.
  writable
);

I'm not sure if you really understand what the word streaming means here. In Node.js a stream is an abstraction over a data source or sink, much like a file descriptor. The example uses process.stdout, but a TCP socket is also a stream, an open file is a stream, and a pipe is a stream.
So a streaming function is designed to pass the received data directly to a stream, without you having to manually copy the data from source to destination. Obviously this means you don't get direct access to the data. Think of streaming like pipes in Unix shells. That piece of code is basically doing this:
twit_get | cat
In fact, in Node you can create virtual streams in pure JS. So it is possible to get the data - you just have to implement a stream yourself. Look at the Node documentation for the stream API: http://nodejs.org/api/stream.html

Related

How to use GCP DLP with a file stream

I'm working with Node.js and GCP Data Loss Prevention to attempt to redact sensitive data from PDFs before I display them. GCP has great documentation on this here
Essentially you pull in the nodejs library and run this
const fileBytes = Buffer.from(fs.readFileSync(filepath)).toString('base64');

// Construct image redaction request
const request = {
  parent: `projects/${projectId}/locations/global`,
  byteItem: {
    type: fileTypeConstant,
    data: fileBytes,
  },
  inspectConfig: {
    minLikelihood: minLikelihood,
    infoTypes: infoTypes,
  },
  imageRedactionConfigs: imageRedactionConfigs,
};

// Run image redaction request
const [response] = await dlp.redactImage(request);
const image = response.redactedImage;
So normally, I'd get the file as a buffer, then pass it to the DLP function like the above. But, I'm no longer getting our files as buffers. Since many files are very large, we now get them from FilesStorage as streams, like so
return FilesStorage.getFileStream(metaFileInfo1, metaFileInfo2, metaFileInfo3, fileId)
  .then(stream => {
    return {fileInfo, stream};
  })
The question is, is it possible to perform DLP image redaction on a stream instead of a buffer? If so, how?
I've found some other questions that say you can stream with ByteContentItem, and GCP's own documentation mentions "streams". But I've tried passing the stream returned from .getFileStream into the byteItem['data'] property above, and it doesn't work.
So chunking the stream up into buffers of an appropriate size is going to work best here. There seem to be a number of approaches you can use to build buffers from a stream.
Potentially relevant: Convert stream into buffer?
(A native stream interface is a good feature request, just not yet there.)

How can I get multiple files to upload to the server from a Javascript page without skipping?

I'm working on a research experiment which uses getUserMedia, implemented in recorder.js, to record .wav files from the user's microphone and XMLHttpRequest to upload them to the server. Each file is about 3 seconds long and there are 36 files in total. The files are recorded one after another and sent to the server as soon as they are recorded.
The problem I'm experiencing is that not all of the files end up on the server. Apparently the JavaScript or the PHP script can't keep up with all the requests in a row. How can I make sure that I get all the files? These are important research data, so I need every recording.
Here's the code that sends the files to the server. The audio data is a blob:
var filename = subjectID + item__number;
var xhr = new XMLHttpRequest();
xhr.onload = function (e) {
  if (this.readyState === 4) {
    console.log("Server returned: ", e.target.responseText);
  }
};
var fd = new FormData();
fd.append("audio_data", blob, filename);
xhr.open("POST", "upload_wav.php", true);
xhr.send(fd);
And this is the php file on the server side:
print_r($_FILES);
$input = $_FILES['audio_data']['tmp_name'];
$output = "audio/" . $_FILES['audio_data']['name'] . ".wav";
move_uploaded_file($input, $output);
This way of doing things is basically copied from this website:
Using Recorder.js to capture WAV audio in HTML5 and upload it to your server or download locally
I have already tried making the XMLHttpRequest wait by using
while (xhr.readyState != 4) {
  console.log("Waiting for server...");
}
It just caused the page to hang.
Would it be better to use Ajax instead of XMLHttpRequest? Is there something I can do to make sure that all the files get uploaded? I'm pretty new to JavaScript, so code examples are appreciated.
I have no idea what your architecture looks like, but here is a potential solution to your problem.
The solution uses the Web Worker API to offload the file uploading to a sub-process, via the Worker interface of that API. This approach works because the uploads no longer contend for the main process's single thread: web workers run on their own threads.
Using this approach, we do three basic things:
create a new worker passing a script to execute
pass messages to the worker for the worker to deal with
pass messages back to the main process for status updates/replies/resolved data transformation/etc.
The code is heavily commented below to help you understand what is happening and where.
This is the main JavaScript file (script.js)
// Create a sub-process to handle the file uploads
///// STEP 1: create a worker and execute the worker.js file immediately
let worker = new Worker('worker.js');

// Fictitious upload count for demonstration
let uploadCount = 12;

// Repeatedly build and send files every 700ms.
// This is repeated until uploadCount == 0
let builder = setInterval(buildDetails, 700);

// Receive messages from the sub-process and pipe them to the view
///// STEP 2: listen for messages from the worker and do something with them
worker.onmessage = e => {
  let p = document.createElement('pre');
  // e.data represents the message data sent from the sub-process
  p.innerText = e.data;
  document.body.appendChild(p);
};

/**
 * Sort of a mock to build up your BLOB (fake here of course).
 *
 * Posts the data needed for the FormData() to the worker to handle.
 */
function buildDetails() {
  let filename = 'subject1234';
  let blob = new Blob(['1234']);
  ///// STEP 3: Send a message to the worker with file details
  worker.postMessage({
    name: "audio_data",
    blob: blob,
    filename: filename
  });
  // Decrease the count
  uploadCount--;
  // If the count is zero (== false), stop the fake process
  if (!uploadCount) clearInterval(builder);
}
This is the sub-process JavaScript file (worker.js)
// IGNORE the 'fetch_mock.js' import; it is only here to avoid having to stand up a server.
// FormDataPolyFill.js is needed in browsers that don't yet support FormData() in workers.
importScripts('FormDataPolyFill.js', 'fetch_mock.js');

// RxJS provides a full suite of asynchronous capabilities based around
// reactive programming (nothing to do with ReactJS).
// What your use case needs is the guarantee that the stream of inputs will all be processed.
importScripts('https://cdnjs.cloudflare.com/ajax/libs/rxjs/6.3.3/rxjs.umd.js');

// We create a "Subject" that acts as a vessel for our files to upload
let forms = new rxjs.Subject();

// This says "every time the forms Subject is updated, run the postFile
// function with the next item from the stream"
forms.subscribe(postFile);

// Listen for messages from the main process and run doIt each time a message is received
onmessage = doIt;

/**
 * Takes an event object containing the message.
 *
 * The message is presumably the file details.
 */
function doIt(e) {
  var fd = new FormData();
  // e.data represents our details object with three properties
  fd.append(e.data.name, e.data.blob, e.data.filename);
  // Now place this FormData object into our stream of them so it can be processed
  forms.next(fd);
}

// Instead of using XHR, this uses the newer Promise-based fetch() API:
// https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API
function postFile(fd) {
  // Post the file to the server (this is intercepted by fetch_mock.js and doesn't go anywhere)
  fetch('fake', {
    method: 'post',
    body: fd,
  })
    .then((res) => {
      // After the request completes, post a message back to the main thread (if there is a need)
      postMessage("sent: " + JSON.stringify(res));
    });
}
Since this will not run in stackoverflow, I've created a plunker so that you can run this example:
http://plnkr.co/edit/kFY6gcYq627PZOATXOnk
If all this seems complicated, you've presented a complicated problem to solve. :-)
Hope this helps.

node.js: transferring and saving files using a TCP server

I have a lot of devices sending messages to a TCP server written in Node. The main task of the TCP server is to route some of those messages to redis to be processed by another app.
I've written a simple server that does the job quite well. The structure of the code is basically this (not the actual code, details hidden):
const net = require("net");

net.createServer(socket => {
  socket.on("data", buffer => {
    const data = buffer.toString();
    if (shouldRouteMessage(data)) {
      redis.publish(data);
    }
  });
});
Most of the messages are like: {"text":"message body"}, or {"lng":32.45,"lat":12.32}. But sometimes I need to process a message like {"audio":"...encoded audio..."} that spans several "data" events.
What I need in this case is to save the encoded audio into a file and send to redis {"audio":"path/to/audio-file.mp3"} where the route is the file with the audio data received.
One simple option is to store the buffers until I detect the end of the message and then save them all to a file, but that means, among other things, that I must keep the whole file in memory before saving it to disk.
I hope there are better options using streams and pipes. Any suggestions? (Some code examples would be nice.)
Thanks
I finally solved it, so I post the solution here for documentation purposes (and, with some luck, to help others).
The solution was indeed quite simple: just open a write stream to a file and write the data packets as they are received. Something like this:
const net = require("net");
const fs = require("fs");

net.createServer(socket => {
  // These must live outside the "data" handler so they persist
  // across the multiple packets of a single audio transfer.
  let file = null;
  let filePath = null;
  socket.on("data", buffer => {
    const data = buffer.toString();
    if (shouldRouteMessage(data)) {
      // just publish the message
      redis.publish(data);
    } else if (isAudioStart(data)) {
      // create a write stream to a file and write the first data packet
      filePath = buildFilePath(data);
      file = fs.createWriteStream(filePath);
      file.write(data);
    } else if (isLastFragment(data)) {
      // if this is the last fragment, write it, close the file and publish the result
      file.write(data);
      file.close();
      redis.publish(filePath);
      file = filePath = null;
    } else if (isDataFragment(data)) {
      // just write (stream) it to the file
      file.write(data);
    }
  });
});
Note: shouldRouteMessage, buildFilePath, isAudioStart, isDataFragment, and isLastFragment are custom functions that depend on the kind of data.
In this way the incoming data is streamed directly to the file, with no need to hold the contents in memory first. Node's streams rock!
As always, the devil is in the details. Some checks are necessary to, for example, ensure there's always an open file when you want to write to it. Remember also to set the proper encoding when converting to a string (for example, buffer.toString('binary') did the trick for me). Depending on your data format, shouldRouteMessage, isAudioStart, and the other custom functions can be more or less complex.
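To make that concrete, here is an illustration of what such predicates might look like. The AUDIO_BEGIN/AUDIO_END markers are purely hypothetical, not part of the original protocol; substitute whatever framing your devices actually use:

```javascript
// Illustration only: assume (hypothetically) that each audio transfer
// is bracketed by literal AUDIO_BEGIN / AUDIO_END markers in the payload.
function isAudioStart(data) {
  return data.indexOf('AUDIO_BEGIN') !== -1;
}

function isLastFragment(data) {
  return data.indexOf('AUDIO_END') !== -1;
}

// Anything that is neither a start nor an end marker is treated as a
// middle fragment of the current transfer.
function isDataFragment(data) {
  return !isAudioStart(data) && !isLastFragment(data);
}
```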
Hope it helps.

Gulp.JS, one source to a dynamic number of destinations

I have one JavaScript source file I am processing that I want to end up in two or more destination folders. Piping to multiple destinations works if I chain the pipes, but not if I add them to the stream one at a time. This prevents me from making the number of destination folders dynamic.
For example doing the following works
var rebundle = function() {
  var stream = bundler.bundle();
  stream = stream.pipe(source("bundled.js"));
  stream.pipe(gulp.dest(dests[0]))
    .pipe(gulp.dest(dests[1]))
    .pipe(gulp.dest(dests[2]));
  return stream;
};
But the following works inconsistently. Sometimes one folder gets no output; other times it does, but some contents are missing.
var rebundle = function() {
  var stream = bundler.bundle();
  stream = stream.pipe(source("bundled.js"));
  dests.map(function (d) {
    stream.pipe(gulp.dest(d));
  });
  return stream;
};
In short, is there a way to modify this to cleanly allow for a dynamic number of destinations when starting from one source, in one file?
Versions
gulp 3.9
browserify 9
Each invocation of stream.pipe() returns a new stream. You have to apply each subsequent invocation of .pipe() to the previously returned stream.
You're doing it right when you do stream = stream.pipe(source("bundled.js")), but then in your dests.map() callback you're just adding one pipe after another to the same stream. That means you're creating lots of new streams, but those never get returned from your task, so gulp doesn't wait for them to finish.
You have to store the returned stream each time, so that it's used in the next iteration:
dests.map(function (d) {
  stream = stream.pipe(gulp.dest(d));
});

When the target stream emits error events, can we somehow reuse the source stream?

I am trying to achieve following error handling:
Say we have a readable stream.
We pipe it into a transform stream.
Somehow the transform stream emits an error.
We would like to recover the readable stream (and all of its data), and re-pipe it into another transform stream.
Step 4 appears to be difficult: I can listen to unpipe event on the target stream (transform stream) and retrieve a reference to the source stream (readable stream), but at least some chunks of its data have been lost.
Can we do this without creating a custom transform stream?
A real-world example is deflate content encoding, where in some cases, you need zlib.createInflateRaw() instead of zlib.createInflate(), but we can't decide which one would be the correct choice before looking at the response body buffers.
You do not need to introduce a stream in the middle just to read the first byte. For example:
(function readChunk() {
  var chunk = res.read();
  if (!chunk)
    return res.once('readable', readChunk);
  var inflate;
  if ((chunk[0] & 0x0F) === 0x08)
    inflate = zlib.createInflate();
  else
    inflate = zlib.createInflateRaw();
  // Put the chunk back into the stream
  res.unshift(chunk);
  // Prepare the decompression
  res.pipe(inflate);
  output = new Response(inflate, response_options);
  resolve(output);
})();
Also, var body = res.pipe(new stream.PassThrough()); is unnecessary, you can just use res where appropriate.

Categories

Resources