Stop fs.createWriteStream from creating a writable stream when the file is deleted - javascript

Folks: I'm creating an Angular/Node app where users download files by selecting a related thumbnail.
As files download, a small list is shown with the download progress - using status-bar.
When the file has downloaded, a success message is shown.
Each item in the list has a delete button which removes the files when clicked. All of this works fine.
Question: Similar to this post - when the delete button is clicked, the idea is to stop the download - which is why I thought I'd just delete the file.
However, I'm using fs.createWriteStream, and when the file is deleted the stream appears to continue regardless of the file no longer being there. This then causes the file.on('finish', function() { ... }) handler to kick in and show the success message.
To tackle this, I check whether the file path exists when the finish handler fires, so the success message is only displayed when it should be. This feels pretty hacky, especially with large files downloading.
Is there a way to cancel the stream from progressing when the file is deleted?

Following your comment 'yes, just like that', I have one question. You are obviously creating the file on the client system and writing to it in streams. How are you doing that from the browser? Are you using an API that gives you access to Node's core modules in the browser, like Browserify?
Having said that, if my understanding is correct, you can achieve it in the following way:
var http = require("http"),
    fs = require("fs"),
    stream = require("stream"),
    util = require("util"),
    abortStream = false, // when the user clicks delete, set this flag to true
    ws,
    Transform;

ws = fs.createWriteStream('./op.jpg');

// Transform streams read input, process the data [n times], then output the processed data:
// readStream ---pipe---> transformStream1 ---pipe---> ...transformStreamN ---pipe---> outputStream
// #api https://nodejs.org/api/stream.html#stream_class_stream_transform
// #exmpl https://strongloop.com/strongblog/practical-examples-of-the-new-node-js-streams-api/
Transform = stream.Transform || require("readable-stream").Transform;

function InterruptedStream(options) {
    if (!(this instanceof InterruptedStream)) {
        return new InterruptedStream(options);
    }
    Transform.call(this, options);
}
util.inherits(InterruptedStream, Transform);

InterruptedStream.prototype._transform = function (chunkdata, encoding, done) {
    // This is just for illustration, to give you the idea.
    // Do not hard-code the condition here; consider passing it in
    // through the constructor instead.
    if (abortStream === true) {
        // Take care of this part: your logic might try to write to the
        // stream after it is closed. You can catch that exception, but it
        // is better not to write in the first place.
        this.end(); // stops the stream
        return done();
    }
    this.push(chunkdata, encoding);
    done();
};

var is = new InterruptedStream();
is.pipe(ws);

// Download a large file
http.get("http://www.zastavki.com/pictures/1920x1200/2011/Space_Huge_explosion_031412_.jpg", function (res) {
    res.on('data', function (data) {
        is.write(data);
    });
    res.on('end', function () {
        console.log("end");
    });
    // Simulates a click on the delete button after two seconds
    setTimeout(function () {
        abortStream = true;
        res.destroy();
        // Delete the file here; I think you already have that logic in place
    }, 2000);
});
The above code snippet gives a rough idea of how it's done. You can copy-paste it, run it (it will work) and make changes.
If we are not on the same page, please let me know and I'll try to rectify my answer.
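As a follow-up on the design choice: if the delete handler can reach the response and the write stream, you can skip the Transform entirely and abort at the source. A minimal sketch reusing the ws and res names from above; onDeleteClicked and filePath are made-up names for illustration:
function onDeleteClicked(res, ws, filePath) {
    // Abort at the source instead of filtering chunks in a Transform.
    res.destroy();              // stop receiving data from the server
    ws.end(function () {        // flush pending writes, then close the file
        fs.unlink(filePath, function (err) {
            if (err) console.error(err);
        });
    });
}
With the source destroyed, no further 'data' events arrive, so the 'finish' handler never fires for an aborted download.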

I think you can emit an event when your file is deleted and capture that event on your write stream:
var wt = fs.createWriteStream(path);
wt.on('eventName', function () {
    wt.destroy(); // actually tears the stream down; merely emitting 'close' would only notify listeners
});
This will close your writable stream.
The delete event should be fired from the client side.

Related

How to efficiently stream a real-time chart from a local data file

Complete noob picking up NodeJS over the last few days here, and I've gotten myself in big trouble, it looks like. I currently have a working Node.js + Express server instance, running on a Raspberry Pi, acting as a web interface for a local data acquisition script ("the DAQ"). When executed, the script writes data to a local file on the Pi in .csv format, once every second in real time.
My Node app is a simple web interface that starts the data acquisition script on click, plots previously acquired data logs, and visualizes the data currently being collected in real time. Plotting old logs was simple: I wrote a JS function (using Plotly + d3) to read a local csv file via an AJAX call and plot it, using this script as a starting point, but with the logs served by Express rather than an external file.
When I went to translate this into a real-time plot, I started out using the setInterval() method to update the graph periodically, based on other examples. After dealing with a few unwanted recursion issues and adjusting the interval to a more reasonable setting, I eliminated the memory/traffic issues that were crashing the browser after a minute or two, and things are mostly stable.
However, I need help with one thing primarily:
Improving the efficiency of my first-attempt approach: the acquisition script absolutely needs to write to file every second, but considering that a typical run might last 1-2 weeks, the file being requested on every interval loop will quickly balloon in size. I'm completely new to Node/Express, so I'm sure there's a much better way of doing the real-time rendering aspect of this - that's the real issue here. Any pointers on a better way to go about this would be massively helpful! (A sketch of one possible direction follows the MWE below.)
Right now, the killDAQ() call issued by the "Stop" button kills the underlying Python process writing the data to disk. Is there a way to hook into that same button click to also terminate the setInterval() loop updating the graph? There's no need for the graph to keep updating after acquisition has stopped, so having a single click do double duty would be ideal. I think a listener or req/res approach would be an option, but pointers in the right direction would be massively helpful.
(Edit: I solved #2, using global window. variables. It's a hack, but it seems to work:
window.refreshIntervalId = setInterval(foo);
...
clearInterval(window.refreshIntervalId);
)
Thanks so much for the help!
MWE:
html (using Pug as a template engine):
doctype html
html
  body.default
    .container-fluid
      .row
        .col-md-5
          .row.text-center
            .col-md-6
              button#start_button(type="button", onclick="makeCallToDAQ()") Start Acquisition
            .col-md-6
              button#stop_button(type="button", onclick="killDAQ()") Stop Acquisition
        .col-md-7
          #myDAQDiv(style='width: 980px; height: 500px;')
javascript (start/stop acquisition):
function makeCallToDAQ() {
  fetch('/start_daq', {
    // call to app to start the acquisition script
  })
    .then(console.log(dateTime))
    .then(function(response) {
      console.log(response)
      setInterval(function(){ callPlotly(dateTime.concat('.csv')); }, 5000);
    });
}
function killDAQ() {
  fetch('/stop_daq')
    // kills the process
    .then(function(response) {
      // Use the response sent here
      alert('DAQ has stopped!')
    })
}
javascript (call to Plotly for plotting):
function callPlotly(filename) {
  var csv_filename = filename;
  console.log(csv_filename)
  function makeplot(csv_filename) {
    // Read data via AJAX call and grab header names
    var headerNames = [];
    d3.csv(csv_filename, function(error, data) {
      headerNames = d3.keys(data[0]);
      processData(data, headerNames)
    });
  };
  function processData(allRows, headerNames) {
    // Plot data from relevant columns
    var plotDiv = document.getElementById("plot");
    var traces = [{
      x: x,
      y: y
    }];
    Plotly.newPlot('myDAQDiv', traces, plotting_options);
  };
  makeplot(filename);
}
node.js (the actual Node app):
// Start the DAQ
app.use(express.json());
var isDaqRunning = true;
var pythonPID = 0;
const { spawn } = require('child_process')
var process;
app.post('/start_daq', function(req, res) {
  isDaqRunning = true;
  // Call the python script here.
  const process = spawn('python', ['../private/BIC_script.py', arg1, arg2])
  pythonPID = process.pid;
  process.stdout.on('data', (myData) => {
    res.send("Done!")
  })
  process.stderr.on('data', (myErr) => {
    // If anything gets written to stderr, it'll be in the myErr variable
  })
  res.status(200).send(); //.json(result);
})
// Stop the DAQ
app.get('/stop_daq', function(req, res) {
  isDaqRunning = false;
  process.on('close', (code, signal) => {
    console.log(
      `child process terminated due to receipt of signal ${signal}`);
  });
  // Send SIGTERM to process
  process.kill('SIGTERM');
  res.status(200).send();
})
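Regarding point 1, one possible direction (a minimal sketch, not from the original post): track the file size between polls and have the client ask for just the bytes appended since its last request, so each interval transfers a second's worth of rows instead of the whole log. The /tail route, the offset query parameter and the file path are assumptions for illustration:
// Sketch: serve only newly appended bytes so each poll stays small.
// '/tail', 'offset' and the log path are made up for this example.
const fs = require('fs');

app.get('/tail', function (req, res) {
  const file = 'current_run.csv';                      // hypothetical log path
  const offset = parseInt(req.query.offset, 10) || 0;  // where the client left off
  fs.stat(file, function (err, stats) {
    if (err) return res.status(500).send(err.message);
    if (stats.size <= offset) {
      return res.json({ offset: offset, data: '' });   // nothing new yet
    }
    // stream just the new region [offset, size)
    let chunk = '';
    fs.createReadStream(file, { start: offset, end: stats.size - 1 })
      .on('data', function (d) { chunk += d; })
      .on('end', function () { res.json({ offset: stats.size, data: chunk }); });
  });
});
On the client, the rows returned by each poll could then be appended to the existing trace with Plotly.extendTraces instead of re-plotting the whole file.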

What's the correct way to handle removing a potentially busy file in NodeJS?

I have a NodeJS server managing some files. It's going to watch for a known filename from an external process and, once the file arrives, read it and then delete it. However, sometimes the read/delete is attempted before the file has been "unlocked" from its previous use, so the operation will occasionally fail.
I'd rather avoid a long sleep where possible, because this needs to be handled ASAP and every second counts.
fs.watchFile(input_json_file, {interval: 10}, function(current_stats, previous_stats) {
  var json_data = "";
  try {
    var file_cont = fs.readFileSync(input_json_file); // < TODO: async this
    json_data = JSON.parse(file_cont.toString());
    fs.unlink(input_json_file);
  } catch (error) {
    console.log("The JSON in the file could not be parsed. The file will continue to be watched.");
    console.log(error);
    return;
  }
  // Else, this has loaded properly.
  fs.unwatchFile(input_json_file);
  // ... do other things with the file's data.
});
// set a timeout for the file watching, just in case
setTimeout(fs.unwatchFile, CLEANUP_TIMEOUT, input_json_file);
I expect "EBUSY: resource busy or locked" to turn up occasionally, but fs.watchFile isn't always called when the file is unlocked.
I thought of creating a function and then calling it with a delay of 1-10ms, where it could call itself if that fails too, but that feels like a fast route to a... cough stack overflow.
I'd also like to steer clear of synchronous methods so that this scales nicely, but being relatively new to NodeJS all the callbacks are starting to turn into a maze.
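One reassurance first: a setTimeout-based retry will not overflow the stack, because each timer callback starts on a fresh call stack. A minimal sketch of that approach, fully async; readAndRemove and the retry budget are made-up names/values for illustration:
const fs = require('fs');

// Retry reading and deleting a possibly-locked file without blocking and
// without growing the call stack: each setTimeout callback starts fresh.
function readAndRemove(path, retriesLeft, cb) {
  fs.readFile(path, function (err, buf) {
    if (err) {
      if ((err.code === 'EBUSY' || err.code === 'EPERM') && retriesLeft > 0) {
        return setTimeout(readAndRemove, 10, path, retriesLeft - 1, cb);
      }
      return cb(err);
    }
    fs.unlink(path, function (unlinkErr) {
      if (unlinkErr && unlinkErr.code === 'EBUSY' && retriesLeft > 0) {
        return setTimeout(readAndRemove, 10, path, retriesLeft - 1, cb);
      }
      cb(unlinkErr, buf);
    });
  });
}

// usage: give up after ~100 attempts (about one second at 10 ms apart)
readAndRemove('input.json', 100, function (err, buf) {
  if (err) return console.error('gave up:', err);
  var json_data = JSON.parse(buf.toString());
  // ... do other things with the file's data.
});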
Maybe it is overkill for this case, but you can create your own filesystem with full control; other programs would then write their data directly to your program. Just search for the keywords fuse and fuse-bindings.
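For the curious, mounting a tiny filesystem with fuse-bindings looks roughly like the package's own hello-world (recalled from the fuse-bindings README; verify the handler signatures against the package docs before relying on them):
var fuse = require('fuse-bindings');

// Roughly the fuse-bindings hello-world: a read-only mount exposing one file.
// A real solution would also implement create/write/release handlers so the
// external process writes straight into this program.
fuse.mount('./mnt', {
  readdir: function (path, cb) {
    if (path === '/') return cb(0, ['input.json']);
    cb(0, []);
  },
  getattr: function (path, cb) {
    var now = new Date();
    if (path === '/') {
      return cb(0, { mtime: now, atime: now, ctime: now, nlink: 1,
                     size: 100, mode: 16877,          // 0o40755: directory
                     uid: process.getuid(), gid: process.getgid() });
    }
    if (path === '/input.json') {
      return cb(0, { mtime: now, atime: now, ctime: now, nlink: 1,
                     size: 2, mode: 33188,            // 0o100644: regular file
                     uid: process.getuid(), gid: process.getgid() });
    }
    cb(fuse.ENOENT);
  },
  open: function (path, flags, cb) {
    cb(0, 42); // 42 is an arbitrary file handle
  },
  read: function (path, fd, buf, len, pos, cb) {
    var str = '{}'.slice(pos, pos + len);
    if (!str) return cb(0);
    buf.write(str);
    cb(str.length);
  }
}, function (err) {
  if (err) throw err;
  console.log('filesystem mounted on ./mnt');
});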

Re-using same instance again webdriverJS

I am really new to Selenium. I managed to open a website using the Node.js code below:
var webdriver = require('selenium-webdriver');
var driver = new webdriver.Builder()
.forBrowser('chrome')
.build();
console.log(driver);
driver.get('https://web.whatsapp.com');
//perform all other operations here.
https://web.whatsapp.com is opened, and I manually scan a QR code and log in. Now I have different JavaScript files to perform actions like delete, clear chat, etc. inside web.whatsapp.com...
Now if I get an error, I debug, and when I run the script again using node test.js it takes another 2 minutes to load the page and do the steps I need. I just want to reopen the already-open tab and continue my script instead of having a new window open.
Edit, day 2: Still searching for a solution. I tried the code below to save the driver object and reuse it. Is this the correct approach? I get a JSON parse error though.
var o = new chrome.Options();
o.addArguments("user-data-dir=/Users/vishnu/Library/Application Support/Google/Chrome/Profile 2");
o.addArguments("disable-infobars");
o.addArguments("--no-first-run");
var driver = new webdriver.Builder().withCapabilities(webdriver.Capabilities.chrome()).setChromeOptions(o).build();
var savefile = fs.writeFile('data.json', JSON.stringify(util.inspect(driver)) , 'utf-8');
var parsedJSON = require('./data.json');
console.log(parsedJSON);
It took me some time and a couple of different approaches, but I managed to work up something I think solves your problem and allows you to develop tests in a rather nice way.
Because it does not directly answer the question of how to re-use a browser session in Selenium (using their JavaScript API), I will first present my proposed solution and then briefly discuss the other approaches I tried. It may give someone else an idea and help them to solve this problem in a nicer/better way. Who knows. At least my attempts will be documented.
Proposed solution (tested and works)
Because I did not manage to actually reuse a browser session (see below), I figured I could try something else. The approach will be the following.
Idea
Have a main loop in one file (say init.js) and tests in a separate file (test.js).
The main loop opens a browser instance and keeps it open. It also exposes some sort of CLI that allows one to run tests (from test.js), inspect errors as they occur and to close the browser instance and stop the main loop.
The test in test.js exports a test function that is being executed by the main loop. It is passed a driver instance to work with. Any errors that occur here are being caught by the main loop.
Because the browser instance is opened only once, we have to do the manual process of authenticating with WhatsApp (scanning a QR code) only once. After that, running a test will reload web.whatsapp.com, but it will have remembered that we authenticated and thus immediately be able to run whatever tests we define in test.js.
In order to keep the main loop alive, it is vital that we catch each and every error that might occur in our tests. I unfortunately had to resort to uncaughtException for that.
Implementation
This is the implementation of the above idea I came up with. It is possible to make this much fancier if you would want to do so. I went for simplicity here (hope I managed).
init.js
This is the main loop from the above idea.
var webdriver = require('selenium-webdriver'),
    by = webdriver.By,
    until = webdriver.until,
    driver = null,
    prompt = '> ',
    testPath = 'test.js',
    lastError = null;

function initDriver() {
    return new Promise((resolve, reject) => {
        // already opened a browser? done
        if (driver !== null) {
            resolve();
            return;
        }
        // open a new browser, let user scan QR code
        driver = new webdriver.Builder().forBrowser('chrome').build();
        driver.get('https://web.whatsapp.com');
        process.stdout.write("Please scan the QR code within 30 seconds...\n");
        driver.wait(until.elementLocated(by.className('chat')), 30000)
            .then(() => resolve())
            .catch((timeout) => {
                process.stdout.write("\b\bTimed out waiting for code to" +
                                     " be scanned.\n");
                driver.quit();
                reject();
            });
    });
}

function recordError(err) {
    process.stderr.write(err.name + ': ' + err.message + "\n");
    lastError = err;
    // let user know that test failed
    process.stdout.write("Test failed!\n");
    // indicate we are ready to read the next command
    process.stdout.write(prompt);
}

process.stdout.write(prompt);
process.stdin.setEncoding('utf8');
process.stdin.on('readable', () => {
    var chunk = process.stdin.read();
    if (chunk === null) {
        // happens on initialization, ignore
        return;
    }
    // do various different things for different commands
    var line = chunk.trim(),
        cmds = line.split(/\s+/);
    switch (cmds[0]) {
        case 'error':
            // print last error, when applicable
            if (lastError !== null) {
                console.log(lastError);
            }
            // indicate we are ready to read the next command
            process.stdout.write(prompt);
            break;
        case 'run':
            // open a browser if we didn't yet, execute tests
            initDriver().then(() => {
                // carefully load test code, report SyntaxError when applicable
                var file = (cmds.length === 1 ? testPath : cmds[1] + '.js');
                try {
                    var test = require('./' + file);
                } catch (err) {
                    recordError(err);
                    return;
                } finally {
                    // force node to read the test code again when we
                    // require it in the future
                    delete require.cache[__dirname + '/' + file];
                }
                // carefully execute tests, report errors when applicable
                test.execute(driver, by, until)
                    .then(() => {
                        // indicate we are ready to read the next command
                        process.stdout.write(prompt);
                    })
                    .catch(recordError);
            }).catch(() => process.stdin.destroy());
            break;
        case 'quit':
            // close browser if it was opened and stop this process
            if (driver !== null) {
                driver.quit();
            }
            process.stdin.destroy();
            return;
    }
});
// some errors somehow still escape all catches we have...
process.on('uncaughtException', recordError);
test.js
This is the test from the above idea. I wrote some things just to test the main loop and some WebDriver functionality. Pretty much anything is possible here. I have used promises to make test execution work nicely with the main loop.
var driver, by, until,
    timeout = 5000;

function waitAndClickElement(selector, index = 0) {
    driver.wait(until.elementLocated(by.css(selector)), timeout)
        .then(() => {
            driver.findElements(by.css(selector)).then((els) => {
                var element = els[index];
                driver.wait(until.elementIsVisible(element), timeout);
                element.click();
            });
        });
}

exports.execute = function(d, b, u) {
    // make globally accessible for ease of use
    driver = d;
    by = b;
    until = u;
    // actual test as a promise
    return new Promise((resolve, reject) => {
        // open site
        driver.get('https://web.whatsapp.com');
        // make sure it loads fine
        driver.wait(until.elementLocated(by.className('chat')), timeout);
        driver.wait(until.elementIsVisible(
            driver.findElement(by.className('chat'))), timeout);
        // open menu
        waitAndClickElement('.icon.icon-menu');
        // click profile link
        waitAndClickElement('.menu-shortcut', 1);
        // give profile time to animate
        // this prevents an error from occurring when we try to click the close
        // button while it is still being animated (workaround/hack!)
        driver.sleep(500);
        // close profile
        waitAndClickElement('.btn-close-drawer');
        driver.sleep(500); // same for hiding profile
        // click some chat
        waitAndClickElement('.chat', 3);
        // let main script know we are done successfully
        // we do so after all other webdriver promises have resolved by creating
        // another webdriver promise and hooking into its resolve
        driver.wait(until.elementLocated(by.className('chat')), timeout)
            .then(() => resolve());
    });
};
Example output
Here is some example output. The first invocation of run test will open up an instance of Chrome. Other invocations will use that same instance. When an error occurs, it can be inspected as shown. Executing quit will close the browser instance and quit the main loop.
$ node init.js
> run test
> run test
WebDriverError: unknown error: Element <div class="chat">...</div> is not clickable at point (163, 432). Other element would receive the click: <div dir="auto" contenteditable="false" class="input input-text">...</div>
(Session info: chrome=57.0.2987.133)
(Driver info: chromedriver=2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5),platform=Linux 4.9.0-2-amd64 x86_64)
Test failed!
> error
<prints complete stacktrace>
> run test
> quit
You can run tests in other files by simply calling them. Say you have a file test-foo.js, then execute run test-foo in the above prompt to run it. All tests will share the same Chrome instance.
Failed attempt #1: saving and restoring storage
When inspecting the page using my development tools, I noticed that it appears to use the localStorage. It is possible to export this as JSON and write it to a file. On a next invocation, this file can be read, parsed and written to the new browser instance storage before reloading the page.
Unfortunately, WhatsApp still required me to scan the QR code. I tried to figure out what I missed (cookies, sessionStorage, ...) but did not manage to. It is possible that WhatsApp registers the browser as disconnected after some time has passed, or that it uses other browser properties (session ID?) to recognize the browser. This is pure speculation on my side, though.
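For anyone who wants to retry this approach, the export/import itself is straightforward with executeScript. A minimal sketch, assuming a driver instance as in init.js and a storage.json scratch file (both names are just for illustration):
var fs = require('fs');

// Dump localStorage to disk after a logged-in session...
driver.executeScript('return JSON.stringify(Object.assign({}, window.localStorage));')
    .then(function (json) { fs.writeFileSync('storage.json', json); });

// ...and restore it into a fresh session. The origin has to be loaded
// before its localStorage can be written, hence the get before and after.
var saved = JSON.parse(fs.readFileSync('storage.json', 'utf8'));
driver.get('https://web.whatsapp.com');
driver.executeScript(
    'var data = arguments[0];' +
    'for (var key in data) { window.localStorage.setItem(key, data[key]); }',
    saved);
driver.get('https://web.whatsapp.com'); // reload with the storage in place
As noted above, this alone was not enough to skip the QR scan for WhatsApp.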
Failed attempt #2: switching session/window
Every browser instance started via WebDriver has a session ID. This ID can be retrieved, so I figured it might be possible to start a session and then connect to it from the test cases, which would then be run from a separate file (you can see this is the predecessor of the final solution). Unfortunately, I have not been able to figure out a way to set the session ID. This may actually be a security concern; I am not sure. People more expert in the usage of WebDriver might be able to clarify here.
I did find out that it is possible to retrieve a list of window handles and switch between them. Unfortunately, windows are only shared within a single session and not across sessions.

Make NodeJS/JSDom wait for full rendering before scraping

I'm trying to scrape data from a website that I need to log into. Unfortunately, I'm getting different results using JSDom/NodeJS than I would using a web browser such as FF. In particular, I'm not getting the login form with the username, password and submit button.
I understand that much of JavaScript, at least, is asynchronous. However, I thought the "done" function of JSDom waited synchronously for the full rendering of the page. I guess what I'd like to do is simulate an HTTPS GET and wait for the full document.ready to be done.
var jsdom = require("jsdom");
var jsdom_global = require("jsdom-global");
var fs = require("fs");
var jquery = fs.readFileSync("./jquery-3.1.1.min.js", "utf-8");

jsdom.env({
  url: "https://wemc.smarthub.coop/Login.html#login:",
  src: [jquery],
  done: function (err, window) {
    var $ = window.$;
    if ($("button#LoginSubmitButton").length) {
      console.log('Click button found');
    } else {
      console.log('Click button not found');
    }
    // The following text boxes are not coming back:
    // $("input#LoginUsernameTextBox")
    // $("input#LoginPasswordTextBox")
    // If I enable the line below, I see a lot less than I would if I
    // do a view source in any reasonable browser.
    //console.log($("body").html());
  }
});
Usually, this happens because JSDOM doesn't execute the JS when it hits the page. In that case, the only elements returned will be the server-rendered HTML.
You could try a headless browser module such as PhantomJS and see how that goes for you. There's a section about the distinction between the two at the bottom of the JSDOM GitHub page.
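Side note, not part of the original answer: newer jsdom releases (10+) can be asked to execute page scripts and fetch external resources themselves, which is sometimes enough for pages like this. A minimal sketch, with the crude timeout being an assumption of my own:
const { JSDOM } = require('jsdom');

// Ask jsdom to run the page's own scripts and load external resources,
// then give those scripts a moment to render before querying the DOM.
JSDOM.fromURL('https://wemc.smarthub.coop/Login.html#login:', {
  runScripts: 'dangerously',
  resources: 'usable'
}).then((dom) => {
  setTimeout(() => {
    const button = dom.window.document.querySelector('button#LoginSubmitButton');
    console.log(button ? 'Click button found' : 'Click button not found');
  }, 5000); // crude wait; there is no built-in "fully rendered" signal
});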

How to process streaming HTTP GET data?

Right now, I have a node.js server that's able to stream data on a GET request using the stream API, with Transfer-Encoding set to 'chunked'. The data can be on the order of 10 to 30 MB. (They are sometimes 3D models.)
On the browser side, I wish to be able to process the data as I'm downloading it--I wish to be able to display the data on Canvas as I'm downloading it. So you can see the 3D model appear, face by face, as the data is coming in. I don't need duplex communication, and I don't need a persistent connection. But I do need to process the data as soon as it's downloaded, rather than waiting for the entire file to finish downloading. Then after the browser downloads the data, I can close the connection.
How do I do this?
jQuery ajax only calls back once all the data has been received.
I also looked at portal.js (which was jquery-streaming) and socket.io, but they seem to assume a persistent connection.
So far, I've been able to hack a solution using raw XMLHttpRequest, making a callback when readyState >= 2 && status == 200 and keeping track of the place last read. However, that keeps all the downloaded data in the raw XMLHttpRequest, which I don't want.
There seems to be a better way to do this, but I'm not sure what it is. Anyone have suggestions?
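If recent browsers are acceptable, one option (a sketch, not from the original answers) is the fetch API, which exposes the response body as a ReadableStream, so each chunk can be processed and discarded as it arrives rather than accumulating as in XMLHttpRequest; processData stands in for whatever parses the model data:
// Process a chunked GET response as it arrives, without retaining the
// full payload the way XMLHttpRequest's responseText does.
fetch('/blob')
  .then((response) => {
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    function pump() {
      return reader.read().then(({ done, value }) => {
        if (done) return;                                      // download finished
        processData(decoder.decode(value, { stream: true }));  // e.g. parse faces
        return pump();                                         // read the next chunk
      });
    }
    return pump();
  });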
oboe.js is a library for streaming responses in the browser.
However, that keeps all the data downloaded in the raw XMLHttpRequest, which I don't want.
I suspect this may be the case with oboe.js as well, and potentially a limitation of XMLHttpRequest itself. I'm not sure, as I haven't directly worked on this type of use case. Curious to see what you find out with your efforts and other answers to this question.
So I found the answer, and it's Server-sent events. They basically enable one-way HTTP streams that the browser can handle one chunk at a time. It can be a little tricky because some existing stream libs are broken (they don't account for \n appearing in your stream, and hence you get partial data) or have little documentation, but it's not hard to roll your own (once you figure it out).
You can define your sse_transform like this:
// file sse_stream.js
var Transform = require('stream').Transform;
var util = require('util');

util.inherits(SSEStream, Transform);

function SSEStream(option) {
    Transform.call(this, option);
    this.id = 0;
    this.retry = (option && option.retry) || 0;
}

SSEStream.prototype._transform = function(chunk, encoding, cb) {
    var data = chunk.toString();
    if (data) {
        this.push("id:" + this.id + "\n" +
            data.split("\n").map(function (e) {
                return "data:" + e;
            }).join("\n") + "\n\n");
        //"retry: " + this.retry);
    }
    this.id++;
    cb();
};

SSEStream.prototype._flush = function(next) {
    this.push("event: end\n" + "data: end" + "\n\n");
    next();
};

module.exports = SSEStream;
Then on the server side (I was using express), you can do something like this:
sse_stream = require('sse_stream')

app.get '/blob', (req, res, next) ->
  sse = new sse_stream()
  # It may differ here for you, but this is just a stream source.
  blobStream = repo.git.streamcmd("cat-file", { p: true }, [blob.id])
  if (req.headers["accept"] is "text/event-stream")
    res.type('text/event-stream')
    blobStream.on("end", () -> res.removeAllListeners()).stdout
      .pipe(
        sse.on("end", () -> res.end())
      ).pipe(res)
  else
    blobStream.stdout.pipe(res)
Then on the browser side, you can do:
source = new EventSource("/blob")

source.addEventListener('open', (event) ->
  console.log "On open..."
, false)

source.addEventListener('message', (event) ->
  processData(event.data)
, false)

source.addEventListener('end', (event) ->
  console.log "On end"
  source.close()
, false)

source.addEventListener('error', (event) ->
  console.log "On Error"
  if event.currentTarget.readyState == EventSource.CLOSED
    console.log "Connection was closed"
    source.close()
, false)
Notice that you need to listen for the 'end' event, which is sent from the server in the transform stream's _flush() method. Otherwise, EventSource in the browser will just request the same file over and over again.
Note that you can use libraries on the server side to generate SSE. On the browser side, you can use portal.js to handle SSE. I just spelt things out, so you can see how things would work.
