How to process streaming HTTP GET data?


Right now, I have a Node.js server that streams data in response to a GET request, using the stream API. The response is sent with Transfer-Encoding set to 'chunked'. The data can be on the order of 10 to 30 MB (sometimes they are 3D models).
On the browser side, I want to process the data while it is downloading: I want to draw it on a canvas as it comes in, so you can see the 3D model appear face by face. I don't need duplex communication, and I don't need a persistent connection. But I do need to process each piece of data as soon as it arrives, rather than waiting for the entire file to finish downloading. Once the browser has downloaded all the data, I can close the connection.
How do I do this?
jQuery's ajax only calls back once all the data has been received.
I also looked at portal.js (which was jquery-streaming) and socket.io, but they seem to assume a persistent connection with reconnection.
So far, I've hacked together a solution using a raw XMLHttpRequest: I run a callback when readyState >= 2 && status == 200 and keep track of the position I last read up to. However, that keeps all the downloaded data in the raw XMLHttpRequest, which I don't want.
There must be a better way to do this, but I'm not sure what it is. Does anyone have suggestions?
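A minimal sketch of the offset-tracking hack described above, assuming a text response. It uses the standard XHR progress event rather than readystatechange, and it has the same drawback the question mentions: xhr.responseText still grows to hold the whole download. makeIncrementalReader and processData are hypothetical names.

```javascript
// Hand each newly arrived slice of responseText to onChunk,
// remembering where the previous read ended.
function makeIncrementalReader(onChunk) {
  let lastIndex = 0; // characters of responseText already processed
  return function read(responseText) {
    if (responseText.length > lastIndex) {
      const fresh = responseText.slice(lastIndex);
      lastIndex = responseText.length;
      onChunk(fresh);
    }
  };
}

// Browser wiring (hypothetical; processData is your own handler):
// const xhr = new XMLHttpRequest();
// const read = makeIncrementalReader(processData);
// xhr.onprogress = () => read(xhr.responseText); // fires repeatedly while loading
// xhr.onload = () => read(xhr.responseText);     // picks up the final piece
// xhr.open("GET", "/blob");
// xhr.send();
```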

oboe.js is a library for streaming responses in the browser.
As for "that keeps all the data downloaded in the raw XMLHttpRequest, which I don't want": I suspect this may be the case with oboe.js as well, and it is potentially a limitation of XMLHttpRequest itself. I'm not sure, as I haven't worked directly on this kind of use case. I'm curious to see what you find out from your own efforts and from the other answers to this question.

So I found the answer, and it's Server-Sent Events (SSE). SSE enables one-way HTTP streams that the browser can handle one chunk at a time. It can be a little tricky, because some existing stream libraries are broken (they don't account for \n inside your stream, so you get partial data) or have little documentation. But it's not hard to roll your own once you figure it out.
You can define the SSE transform stream like this:
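To make the \n problem concrete, here is a minimal sketch (not the author's code) of SSE framing plus a simplified version of how the browser reassembles one event. The function names are hypothetical, and the parser deliberately ignores parts of the SSE spec (event/id/retry fields, comments, CR line endings). The sample text is made up.

```javascript
// Frame one chunk as an SSE message: an id line, one "data: " line per line
// of text, and a blank line to end the event. The space after "data:" matters,
// because the client strips one leading space from each value.
function toSSEEvent(id, text) {
  const dataLines = text.split("\n").map((line) => "data: " + line);
  return "id:" + id + "\n" + dataLines.join("\n") + "\n\n";
}

// Simplified client side: collect the data values and join them with "\n".
function dataFromSSEEvent(frame) {
  return frame
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).replace(/^ /, ""))
    .join("\n");
}

const chunk = "v 0 0 0\nv 1 0 0";
const frame = toSSEEvent(0, chunk);
// frame is "id:0\ndata: v 0 0 0\ndata: v 1 0 0\n\n"
// dataFromSSEEvent(frame) gives back chunk

// A naive framing that puts the raw chunk after a single "data: " loses
// everything after the first newline, which is the "partial data" problem:
// dataFromSSEEvent("data: " + chunk + "\n\n") gives "v 0 0 0"
```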
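// file sse_stream.js
var Transform = require('stream').Transform;
var util = require('util');

// Transform stream that wraps each incoming chunk in one Server-Sent Events message.
function SSEStream(option) {
  Transform.call(this, option);
  this.id = 0;
  this.retry = (option && option.retry) || 0;
}
util.inherits(SSEStream, Transform);

SSEStream.prototype._transform = function (chunk, encoding, cb) {
  var data = chunk.toString();
  if (data) {
    // Prefix every line with "data: " so embedded newlines don't cut the message short.
    // (The space matters: the client strips one leading space from each value.)
    this.push("id:" + this.id + "\n" +
      data.split("\n").map(function (e) {
        return "data: " + e;
      }).join("\n") + "\n\n");
    // Optionally also send "retry: " + this.retry to set the client's reconnect delay.
  }
  this.id++;
  cb();
};

// Tell the client the stream is finished (see the 'end' listener in the browser code).
SSEStream.prototype._flush = function (next) {
  this.push("event: end\n" + "data: end" + "\n\n");
  next();
};

module.exports = SSEStream;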
Then, on the server side (I was using Express, with CoffeeScript), you can do something like this:
SSEStream = require('./sse_stream')

app.get '/blob', (req, res, next) ->
  sse = new SSEStream()
  # It may differ here for you, but this is just a stream source.
  blobStream = repo.git.streamcmd("cat-file", { p: true }, [blob.id])
  if req.headers["accept"] is "text/event-stream"
    res.type('text/event-stream')
    blobStream.on("end", () -> res.removeAllListeners()).stdout
      .pipe(sse.on("end", () -> res.end()))
      .pipe(res)
  else
    blobStream.stdout.pipe(res)
Then, on the browser side, you can do:
source = new EventSource("/blob")

source.addEventListener('open', (event) ->
  console.log "On open..."
, false)

source.addEventListener('message', (event) ->
  processData(event.data)
, false)

source.addEventListener('end', (event) ->
  console.log "On end"
  source.close()
, false)

source.addEventListener('error', (event) ->
  console.log "On Error"
  if event.currentTarget.readyState == EventSource.CLOSED
    console.log "Connection was closed"
  source.close()
, false)
Note that you need to listen for the 'end' event, which the server sends from the transform stream's _flush() method. Otherwise, EventSource in the browser will just request the same file over and over again.
Also note that there are libraries for generating SSE on the server side, and on the browser side you can use portal.js to handle SSE. I spelled things out here so you can see how it all works.
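Because the transform emits one event per raw chunk, a chunk boundary can fall in the middle of a line, for example in the middle of one face of the model. A minimal sketch of a line buffer for the 'message' handler, assuming the model data is line-oriented text; makeLineBuffer and drawFace are hypothetical names.

```javascript
// Collects text across messages and passes only complete lines to onLine.
function makeLineBuffer(onLine) {
  let pending = ""; // partial line left over from the previous message
  return {
    push(text) {
      const lines = (pending + text).split("\n");
      pending = lines.pop(); // incomplete tail, or "" if text ended with "\n"
      for (const line of lines) onLine(line);
    },
    flush() {
      // Call on the 'end' event: whatever is left is the final line.
      if (pending !== "") onLine(pending);
      pending = "";
    },
  };
}

// Browser wiring (drawFace is a hypothetical per-line handler):
// const buffer = makeLineBuffer(drawFace);
// source.addEventListener("message", (event) => buffer.push(event.data), false);
// source.addEventListener("end", () => { buffer.flush(); source.close(); }, false);
```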

Related

How to efficiently stream a real-time chart from a local data file

Complete noob here, picking up Node.js over the last few days, and it looks like I've gotten myself into big trouble. I currently have a working Node.js + Express server running on a Raspberry Pi, acting as a web interface for a local data acquisition script ("the DAQ"). When executed, the script writes data in CSV format to a local file on the Pi, in real time, every second.
My Node app is a simple web interface to start the data acquisition script (on click), plot previously acquired data logs, and visualize the data being collected in real time. Plotting old logs was simple: I wrote a JS function (using Plotly + d3) that reads a CSV file via an AJAX call and plots it, using this script as a starting point but with the logs served by Express rather than an external file.
When I went to turn this into a real-time plot, I started out using setInterval() to update the graph periodically, based on other examples. After dealing with a few unwanted recursion issues and adjusting the interval to a more reasonable setting, I eliminated the memory/traffic issues that were crashing the browser after a minute or two, and things are mostly stable.
However, I primarily need help with one thing:
1. Improving the efficiency of my first-attempt approach: the acquisition script has to write to file every second, but since a typical run might last 1-2 weeks, the file requested on every interval loop will quickly balloon in size. I'm completely new to Node/Express, so I'm sure there's a much better way of doing the real-time rendering; that's the real issue here. Any pointers to a better approach would be massively helpful!
2. Right now, the killDAQ() call issued by the "Stop" button kills the underlying Python process that writes the data to disk. Is there a way to use that same button click to also terminate the setInterval() loop updating the graph? There's no need to keep updating it after data acquisition has stopped, so having the single click do double duty would be ideal. I think setting up a listener or a req/res approach might be an option, but pointers in the right direction would be massively helpful.
(Edit: I solved #2 using a global window variable. It's a hack, but it seems to work:
window.refreshIntervalId = setInterval(foo);
...
clearInterval(window.refreshIntervalId);
)
Thanks so much for the help!
MWE:
HTML (using Pug as the template engine):
doctype html
html
  body.default
    .container-fluid
      .row
        .col-md-5
          .row.text-center
            .col-md-6
              button#start_button(type="button", onclick="makeCallToDAQ()") Start Acquisition
            .col-md-6
              button#stop_button(type="button", onclick="killDAQ()") Stop Acquisition
        .col-md-7
          #myDAQDiv(style='width: 980px; height: 500px;')
JavaScript (start/stop acquisition):
function makeCallToDAQ() {
  // Call the app to start the acquisition script.
  // The server registers /start_daq with app.post, so this must be a POST.
  fetch('/start_daq', { method: 'POST' })
    .then(function(response) {
      // .then(console.log(dateTime)) would log immediately and pass undefined to then()
      console.log(dateTime);
      console.log(response);
      setInterval(function() { callPlotly(dateTime.concat('.csv')); }, 5000);
    });
}

function killDAQ() {
  // Kills the process
  fetch('/stop_daq')
    .then(function(response) {
      // Use the response sent here
      alert('DAQ has stopped!');
    });
}
JavaScript (call to Plotly for plotting):
function callPlotly(filename) {
  var csv_filename = filename;
  console.log(csv_filename);
  function makeplot(csv_filename) {
    // Read data via AJAX call and grab header names
    var headerNames = [];
    d3.csv(csv_filename, function(error, data) {
      headerNames = d3.keys(data[0]);
      processData(data, headerNames);
    });
  }
  function processData(allRows, headerNames) {
    // Plot data from relevant columns
    var plotDiv = document.getElementById("plot");
    var traces = [{
      x: x,
      y: y
    }];
    Plotly.newPlot('myDAQDiv', traces, plotting_options);
  }
  makeplot(filename);
}
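Node.js (the actual Node app):
// Start the DAQ
app.use(express.json());
const { spawn } = require('child_process');
var isDaqRunning = false;
var pythonPID = 0;
// Don't call this "process": that would shadow Node's global process object.
var daqProcess = null;

app.post('/start_daq', function(req, res) {
  isDaqRunning = true;
  // Call the python script here. Assign the module-level variable (no const),
  // so that /stop_daq can reach the child process later.
  daqProcess = spawn('python', ['../private/BIC_script.py', arg1, arg2]);
  pythonPID = daqProcess.pid;
  daqProcess.stdout.on('data', (myData) => {
    // A response can only be sent once, so log output instead of calling res.send() here
    console.log(myData.toString());
  });
  daqProcess.stderr.on('data', (myErr) => {
    // If anything gets written to stderr, it'll be in the myErr variable
    console.error(myErr.toString());
  });
  res.status(200).send(); //.json(result);
});

// Stop the DAQ
app.get('/stop_daq', function(req, res) {
  isDaqRunning = false;
  if (!daqProcess) {
    return res.status(400).send('DAQ is not running');
  }
  daqProcess.on('close', (code, signal) => {
    console.log(`child process terminated due to receipt of signal ${signal}`);
  });
  // Send SIGTERM to the process
  daqProcess.kill('SIGTERM');
  daqProcess = null;
  res.status(200).send();
});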
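For point 1, one option (not from the question) is to have the server return only the rows appended since the client's last poll instead of the whole, ever-growing file. A minimal sketch, assuming the CSV is only ever appended to and every row ends with \n; readNewRows and the /daq_data route are hypothetical names. The client keeps nextOffset and sends it back on the next poll, and a half-written last row is left for the following poll.

```javascript
const fs = require("fs");

// Read whatever has been appended to a growing CSV since byte `offset`.
// Only complete lines are returned; returns the text and the offset to use next time.
function readNewRows(path, offset) {
  const fd = fs.openSync(path, "r");
  try {
    const size = fs.fstatSync(fd).size;
    if (size <= offset) return { text: "", nextOffset: offset };
    const buf = Buffer.alloc(size - offset);
    fs.readSync(fd, buf, 0, buf.length, offset);
    const lastNewline = buf.lastIndexOf(0x0a); // byte value of "\n"
    if (lastNewline === -1) return { text: "", nextOffset: offset };
    return {
      text: buf.toString("utf8", 0, lastNewline + 1),
      nextOffset: offset + lastNewline + 1,
    };
  } finally {
    fs.closeSync(fd);
  }
}

// Hypothetical Express route (csvPath is wherever the DAQ writes):
// app.get('/daq_data', (req, res) => {
//   res.json(readNewRows(csvPath, Number(req.query.offset) || 0));
// });
```

On the client, Plotly.extendTraces can then append just the new points instead of redrawing everything with Plotly.newPlot.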

Weird (caching) issue with Express/Node

I've built an Angular/Express/Node app that runs on Google Cloud and currently uses a JSON file as its data source. For some reason (and this only happens in the cloud), saving data through an AJAX call and writing it to the JSON file seems to work fine. However, when I refresh the page, the server (sometimes!) sends me the version from before the edit. I can't tell whether this is an Express, Node, or even Angular problem, but I know for sure, from checking the JSON in the server's response, that it really is sometimes the modified version and sometimes not, so it most probably isn't related to Angular's cache.
The GET:
router.get('/concerts', function (request, response) {
  delete require.cache[require.resolve('../database/data.json')];
  var db = require('../database/data.json');
  response.send(db.concerts);
});
The POST:
router.post('/concerts/save', function (request, response) {
  delete require.cache[require.resolve('../database/data.json')];
  var db = require('../database/data.json');
  var concert = request.body;
  console.log('Received concert id ' + concert.id + ' for saving.');
  if (concert.id != 0) {
    var indexOfItemToSave = db.concerts.map(function (e) {
      return e.id;
    }).indexOf(concert.id);
    if (indexOfItemToSave == -1) {
      console.log('Couldn\'t find concert with id ' + concert.id + ' in database!');
      response.sendStatus(404);
      return;
    }
    db.concerts[indexOfItemToSave] = concert;
  }
  else if (concert.id == 0) {
    concert.id = db.concerts[db.concerts.length - 1].id + 1;
    console.log('Concert id was 0, adding it with id ' + concert.id + '.');
    db.concerts.push(concert);
  }
  console.log("Added stuff to temporary db");
  var error = commit(db);
  if (error)
    response.send(error);
  else
    response.status(200).send(concert.id + '');
});
This probably doesn't say much, so if anyone is interested in helping, you can see the issue live here. If you click modify on the first concert, change the programme to something like asd, and save, everything looks fine. But if you refresh the page a few times (sometimes 6-7 tries are needed), the old, unchanged programme is shown. Any clues or advice greatly appreciated, thanks.

The fix: do not use local files to store data in the cloud! This is what databases are for!
What was actually the problem?
The problem was that App Engine was running 2 VM instances of my application. The POST request went to one instance, which did its job, saved the data by modifying its local JSON file, and returned a 200. However, after a few refreshes, the load balancer sent the GET to the other machine, which has its own copy of the source code, including the original, unmodified JSON. I'm now using a MongoDB instance, and everything seems to be solved. Hopefully this discourages people from attempting what I did.
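A toy model of the failure described above, assuming two instances and strict round-robin routing (App Engine's real load balancing is not necessarily round-robin). makeInstance and makeBalancer are hypothetical names; each instance holds its own in-memory copy of the data, like each VM's local data.json.

```javascript
// Each instance starts from its own copy of the deployed data file.
function makeInstance() {
  const db = { concerts: [{ id: 1, programme: "original" }] };
  return {
    save(concert) { db.concerts[0] = concert; },
    load() { return db.concerts[0].programme; },
  };
}

// Round-robin load balancer over the instances.
function makeBalancer(instances) {
  let next = 0;
  return function route() {
    const instance = instances[next];
    next = (next + 1) % instances.length;
    return instance;
  };
}

const route = makeBalancer([makeInstance(), makeInstance()]);
route().save({ id: 1, programme: "asd" });     // the POST lands on instance 0
const seen = [route().load(), route().load()]; // two page refreshes
// seen is ["original", "asd"]: the first refresh hits instance 1, which never saw the edit
```

A shared database removes the problem because every instance reads and writes the same state.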

socket.io stop re-emitting event after x seconds/first failed attempt to get a response

I noticed that whenever my server goes offline and I switch it back online, it receives a flood of socket events that were fired while the server was down (events that are outdated by now).
Is there a way to stop socket.io from re-emitting events that haven't received a response within x seconds?

When all else fails with open source libraries, you go study the code and see what you can figure out. After spending some time doing that with the socket.io source code...
The crux of the issue seems to be this code in socket.emit():
if (this.connected) {
  this.packet(packet);
} else {
  this.sendBuffer.push(packet);
}
If the socket is not connected, all data sent via .emit() is buffered in the sendBuffer. Then, when the socket connects again, we see this:
Socket.prototype.onconnect = function(){
  this.connected = true;
  this.disconnected = false;
  this.emit('connect');
  this.emitBuffered();
};

Socket.prototype.emitBuffered = function(){
  var i;
  for (i = 0; i < this.receiveBuffer.length; i++) {
    emit.apply(this, this.receiveBuffer[i]);
  }
  this.receiveBuffer = [];
  for (i = 0; i < this.sendBuffer.length; i++) {
    this.packet(this.sendBuffer[i]);
  }
  this.sendBuffer = [];
};
So, this fully explains why it buffers all data sent while the connection is down and then sends it all upon reconnect.
Now, as to how to prevent it from sending this buffered data, here's a theory that I will try to test later tonight when I have more time.
Two things look like they present an opportunity. The socket notifies of the connect event before it sends the buffered data and the sendBuffer is a public property of the socket. So, it looks like you can just do this in the client code (clear the buffer upon connect):
// clear previously buffered data when reconnecting
socket.on('connect', function() {
  socket.sendBuffer = [];
});
I just tested it, and it works just fine. I have a client socket that sends an increasing counter message to the server every second. Before adding this code, when I take the server down for 5 seconds and then bring it back up, all the queued-up messages arrive on the server; no counts are missed.
When I then add the three lines of code above, any messages sent while the server is down are not sent to the server (technically, they are cleared from the send buffer before being sent). It works.
FYI, another possibility would be to simply not call .emit() when the socket is not connected. You could create your own function or method that only calls .emit() when the socket is actually connected, so nothing would ever get into the sendBuffer.
Socket.prototype.emitWhenConnected = function() {
  if (this.connected) {
    // Forward all arguments (event name, data, optional ack callback) to the real emit()
    return this.emit.apply(this, arguments);
  } else {
    // Not connected: drop the message instead of letting emit() buffer it
    return this;
  }
};
Or, more dangerously, you could override .emit() to make it work this way (not my recommendation).
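If you want the "after x seconds" behavior from the question rather than dropping everything, one option (not from the answer above) is to timestamp each payload on the client and have the server ignore stale ones. A minimal sketch, assuming client and server clocks are roughly in sync; stamp, isFresh, and handle are hypothetical names.

```javascript
// Client side: wrap each payload with the time it was emitted.
function stamp(payload, now = Date.now()) {
  return { sentAt: now, payload };
}

// Server side: accept a message only if it is at most maxAgeMs old.
function isFresh(message, maxAgeMs, now = Date.now()) {
  return now - message.sentAt <= maxAgeMs;
}

// Hypothetical wiring:
// client: socket.emit("counter", stamp(count));
// server: socket.on("counter", (msg) => { if (isFresh(msg, 5000)) handle(msg.payload); });
```

Buffered messages still travel over the wire after a reconnect; they are just discarded on arrival, which is the trade-off compared with clearing sendBuffer.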

Volatile events are events that will not be sent if the underlying connection is not ready (a bit like UDP, in terms of reliability).
https://socket.io/docs/v4/emitting-events/#volatile-events
socket.volatile.emit("hello", "might or might not be received");

Stop fs.createWriteStream creating writeable stream when file is deleted

Folks: I'm creating an Angular/Node app, where users download files via selecting a related thumbnail.
As files download, a small list shows the download progress (using status-bar).
When the file is downloaded a success message is shown.
Each item in the list has a delete button which removes the files when clicked. All of this works fine.
Question: similar to this post, when the delete button is clicked, the idea is to stop the download, which is why I thought I'd just delete the file.
However, I'm using fs.createWriteStream, and when the file is deleted the stream appears to continue regardless of the file not being there. This then causes the file.on('finish', function() {...}) handler to kick in and show the success message.
To tackle this, I check whether the file path exists when 'finish' fires, so that the success message is displayed correctly. This feels pretty hacky, especially when large files are downloading.
Is there a way to cancel the stream from progressing when the file is deleted?

Following your comment "yes, just like that", I have one question. You are obviously creating the file on the client system and writing to it with streams. How are you doing that from the browser? Are you using something that gives you access to Node's core modules in the browser, like browserify?
Having said that, if my understanding is correct, you can achieve it in the following way:
var http = require("http"),
    fs = require("fs"),
    stream = require("stream"),
    util = require("util"),
    abortStream = false, // When the user clicks delete, set this flag to true
    ws,
    Transform;

ws = fs.createWriteStream('./op.jpg');

// Transform streams read input, process data [n times], output processed data
// readStream ---pipe---> transformStream1 ---pipe---> ...transformStreamn ---pipe---> outputStream
// #api https://nodejs.org/api/stream.html#stream_class_stream_transform
// #exmpl https://strongloop.com/strongblog/practical-examples-of-the-new-node-js-streams-api/
Transform = stream.Transform || require("readable-stream").Transform;

function InterruptedStream(options) {
  if (!(this instanceof InterruptedStream)) {
    return new InterruptedStream(options);
  }
  Transform.call(this, options);
}
util.inherits(InterruptedStream, Transform);

InterruptedStream.prototype._transform = function (chunkdata, encoding, done) {
  // This is just for illustration, giving you the idea.
  // Do not hard code the condition here; consider passing it in through the constructor.
  if (abortStream === true) {
    // Drop the chunk rather than calling this.end(): ending would still let 'finish'
    // fire on the file stream, and any later write would fail with "write after end".
    return done();
  }
  this.push(chunkdata, encoding);
  done();
};

var is = new InterruptedStream();
is.pipe(ws);

// Download a large file
http.get("http://www.zastavki.com/pictures/1920x1200/2011/Space_Huge_explosion_031412_.jpg", function (res) {
  res.on('data', function (data) {
    is.write(data);
  }).on('end', function () {
    console.log("end");
    is.end(); // Complete download: lets ws emit 'finish'
  });
  // Simulates a click on the delete button (scheduled once, not once per chunk)
  setTimeout(function () {
    abortStream = true;
    res.destroy();
    // Close the file without emitting 'finish', so no success message is shown
    is.unpipe(ws);
    ws.destroy();
    // Delete the file here; I think you have the logic in place
  }, 2000);
});
The snippet above gives a rough idea of how it's done. You can copy, paste, and run it, then make changes.
If we are not on the same page, please let me know and I'll try to rectify my answer.

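I think you can emit an event when your file is deleted and capture that event:
var wt = fs.createWriteStream(filePath); // filePath: placeholder for the file being downloaded
wt.on('eventName', function() {
  // Emitting 'close' yourself only notifies listeners; destroy() actually closes the stream
  wt.destroy();
});
This will close your writable stream without emitting 'finish'.
The delete event should be triggered from the client side.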

Reduce Ajax requests

I'm making a chat script using jQuery and JSON, but my hosting provider suspends it for exceeding its 'resources usage limit'. I want to know whether it is possible (and how) to reduce these requests. I read a question that mentioned something about an Ajax timeout, but I'm not very good at Ajax. The code is:
function getOnJSON() {
  var from;
  var to;
  var msg_id;
  var msg_txt;
  var new_chat_string;
  // Getting the data from the JSON file
  $.getJSON("/ajax/end.emu.php", function(data) {
    $.each(data.notif, function(i, data) {
      from = data.from;
      to = data.to;
      msg_id = data.id;
      msg_txt = data.text;
      if ($("#chat_" + from).length === 0) {
        $("#boxes").append('...some stuf...');
        $('#' + from + '_form').submit(function() {
          var contactForm = $(this);
          // Search inside this form; $(this + 'input:text') builds an invalid selector string
          var valor = contactForm.find('input:text').val();
          var destinatary = contactForm.find('input[type=hidden]').val();
          var reponse_id = destinatary + "_input";
          if (!valor) {
            return false;
          }
          $.ajax({
            url: "/ajax/end.emu.php?ajax=true",
            type: contactForm.attr('method'),
            data: contactForm.serialize(),
            success: function(data) {
              var responsed = $.trim(data);
              if (responsed != "success") {
                alert("An error occurred while posting your message");
              } else {
                $('#' + reponse_id).val("");
              }
            }
          });
          return false;
        });
        $('#' + from + '_txt').jScrollPane({
          stickToBottom: true,
          maintainPosition: true
        });
        $('body').append('<embed src="http://cdn.live-pin.com/assets/pling.mp3" autostart="true" hidden="true" loop="false">');
      }
      else {
        var pane2api = $('#' + from + '_txt').data('jsp');
        var originalContent = pane2api.getContentPane().html();
        pane2api.getContentPane().append('<li id="' + msg_id + '_txt_msg" class="chat_txt_msg">' + msg_txt + '</li>');
        pane2api.reinitialise();
        pane2api.scrollToBottom();
        $('embed').remove();
        $('body').append('<embed src="http://cdn.live-pin.com/assets/pling.mp3" autostart="true" hidden="true" loop="false">');
      }
    });
  });
}
The limit is 600 requests per 5 minutes, and I need to make a request almost every second. I already paid for a year and they don't give refunds; also, I can't modify the server, I only have access to cPanel.

Well, 600 requests/5 min is pretty restrictive if you want to make one request per second for each user. At that rate, each user makes 60 requests/min, or 300 per 5 min. In other words, even if you optimize your script to combine the two requests into one, you can have at most two users on your site ;) Not much, I guess...
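The arithmetic above as a small function, so other polling rates can be checked. maxUsers is a hypothetical helper; the figures are the 600 requests per 300 seconds from the question.

```javascript
// How many users fit under a shared request quota, given each user's polling pattern.
function maxUsers(requestLimit, windowSeconds, pollIntervalSeconds, requestsPerPoll) {
  const requestsPerUser = (windowSeconds / pollIntervalSeconds) * requestsPerPoll;
  return Math.floor(requestLimit / requestsPerUser);
}

maxUsers(600, 300, 1, 1); // 2: one combined request per second
maxUsers(600, 300, 1, 2); // 1: a poll plus a send every second
maxUsers(600, 300, 3, 1); // 6: one request every 3 seconds
```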
You have two options:
1. Stick with a chat system built on Ajax requests and change hosting providers. This might actually be cheaper if you don't have the skills to do 2.
2. Forget about making an Ajax request to poll (and potentially another to push) every second. Implement something around WebSockets, long polling, or even XMPP.
If you go that route, I would look at socket.io, a library that transparently uses WebSockets where they are supported and falls back to long polling and other techniques elsewhere. For the XMPP way, there is the excellent Strophe.js. Note that both routes are much more complex than your Ajax requests and will require a lot of server-side changes.

I don't think checking every second is really a good idea; in my opinion, for online chat a check every 2-3 seconds should be enough.
To make fewer requests, you can also track user activity on the client side: if the window is inactive, you can lengthen the polling interval, going back to 2-3 seconds when the user becomes active again. That will save resources and requests per minute.
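A minimal sketch of that adaptive interval, assuming the Page Visibility API (document.hidden) as the activity signal. The doubling backoff and the 30-second cap are assumptions, not from the answer; nextPollDelay is a hypothetical name, and getOnJSON is the question's polling function.

```javascript
const ACTIVE_DELAY_MS = 3000;    // the 2-3 s suggested above while the user is active
const MAX_IDLE_DELAY_MS = 30000; // assumption: cap for an inactive window

// Lengthen the interval while the window is inactive; reset it when the user returns.
function nextPollDelay(currentDelayMs, isActive) {
  if (isActive) return ACTIVE_DELAY_MS;
  return Math.min(currentDelayMs * 2, MAX_IDLE_DELAY_MS);
}

// Browser wiring, chaining setTimeout so each delay can differ:
// let delay = ACTIVE_DELAY_MS;
// function poll() {
//   getOnJSON();
//   delay = nextPollDelay(delay, !document.hidden);
//   setTimeout(poll, delay);
// }
// poll();
```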

I'm working on a project right now that requires keeping the UI in sync with server events. I've been using long polling, which does indeed reduce the number of Ajax calls, but it puts the burden on the server to listen for the event the client is interested in, which isn't fun either.
I'm about to switch over to socket.io which I will set up as a separate push service.
existing server --> pushes to socket.io server --> pushes to subscribing client

ggozad's response is good; I also recommend WebSockets. They only work in newer browsers, so if you want to support all browsers you will need a small Flash bridge (Flash can communicate with sockets very easily, and it can call JavaScript functions and be called from JavaScript). Flash also offers P2P if you are interested: http://labs.adobe.com/technologies/cirrus/
Also, for server side you can look into Node.js if you are a JavaScript fan like me :)
To complete my response: there is no way to build an Ajax-based chat where you are limited to 600 requests/5 min (2 requests/second), want to make a request every second, and want more than two users.
Solution: switch to sockets or P2P.

I recommend calling that paid service from the server side using a single thread (as an API proxy). That thread can still poll within the 600 requests/5 min limit. Then every client makes Ajax requests (polling or long polling) to your server-side API proxy, without that limit.
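A minimal sketch of the proxy idea, assuming the proxy runs somewhere without the quota: however many clients call get(), the upstream is hit at most once per interval, and 500 ms corresponds to 600 requests per 300 seconds. makeThrottledProxy and fetchUpstream are hypothetical names; if fetchUpstream returns a Promise, the cached Promise is shared by every caller in that interval.

```javascript
// Serve many clients from one upstream poll: call fetchUpstream at most once per interval.
function makeThrottledProxy(fetchUpstream, minIntervalMs, now = () => Date.now()) {
  let lastFetchAt = -Infinity;
  let cached; // last upstream result (a value or a Promise)
  return function get() {
    const t = now();
    if (t - lastFetchAt >= minIntervalMs) {
      cached = fetchUpstream();
      lastFetchAt = t;
    }
    return cached;
  };
}

// Hypothetical usage in the proxy's request handler:
// const getMessages = makeThrottledProxy(() => fetch(upstreamUrl).then((r) => r.json()), 500);
// app.get("/messages", async (req, res) => res.json(await getMessages()));
```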
