I have a task to create a Node.js script that watches a directory for new archive files and processes them.
I see that I can do this with fs.watch.
The files are tar.gz archives that arrive via scp.
The problem is that the arrival of a new archive file seems to generate multiple file-system events (the exact number is unpredictable). The first is a rename event, followed by some number of change events.
I need to reliably trigger my processing logic only once, when the archive is finished being transferred. How can this be done?
Additional notes:
I am not able to make changes to the system sending the archive, only the system receiving it.
I am not considering using elapsed time to guess that the scp transfer has concluded. That is not reliable.
Using the watch library you can do:
var watch = require('watch')
watch.createMonitor('/home/path', function (monitor) {
  monitor.on("created", function (file, stat) {
    // do work with the new file
  })
})
Since you're on the Node platform, I recommend taking advantage of the rich library ecosystem available to you. Some of these problems have already been solved for you!
The problem is that file transfers are not instantaneous; the series of events you have observed makes complete sense:
Someone begins uploading a file. The scp server creates a new file. Your watcher sees the rename event.
Bytes are sent to your server. The scp server writes them to the file from step 1. Your watcher sees many change events.
The upload completes. No further events are generated because all the bytes have been written.
As far as I know (and based on skimming the source of scp), there is no way to configure the scp server to do something when an upload actually completes. This leaves you with two options:
Debounce the change events. This means setting a timer every time you get a change event and clearing the previous timer. Eventually, you'll stop getting events, the timer will fire, and you can assume that the upload is complete.
This does leave you vulnerable to acting on stalled or aborted uploads.
You could implement your own scp server. This gives a good overview of how the protocol works. The remote scp simply opens an ssh connection and runs the host's scp command, which then has a simple protocol for file transfer. You'd have to replace your server's scp with your own implementation.
Since the protocol tells you how many bytes to expect, you would know exactly when you've received the complete file and can begin your processing.
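To illustrate the first option, here is a minimal debounce sketch using fs.watch (the directory path, the 500 ms quiet period, and the processArchive handler are placeholder assumptions; tune the delay to your transfer speeds):

var fs = require('fs')

var timers = {}
var QUIET_MS = 500 // assumed quiet period; tune for your link speed

fs.watch('/home/path', function (event, filename) {
  if (!filename) return
  // reset this file's timer on every rename/change event
  clearTimeout(timers[filename])
  timers[filename] = setTimeout(function () {
    delete timers[filename]
    // no events for QUIET_MS, so assume the upload is complete
    processArchive('/home/path/' + filename) // hypothetical handler
  }, QUIET_MS)
})

Note that this inherits the caveat above: a stalled or aborted upload will still fire the timer.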
Related
I have a Python script that generates a heightmap from parameters that will be given in an HTML form. How do I display the resulting image on a website? I suppose the form's submit button will hit an endpoint with the given parameters, and the script that computes the heightmap then runs, but how do I get the resulting image and display it on the website? Also, the computation takes a few seconds, so I suppose I need some type of task queue so that the server doesn't hang in the meantime. Tell me if I'm wrong.
It's a bit of a general question because I myself don't know the specifics of what I need to use to accomplish this. I'm using Flask on the backend, but it's a framework-agnostic question.
Save the image to a file. Return a webpage that contains an <IMG SRC=...> element. The SRC should be a URL pointing at the file.
For example, suppose you save the image to a file called "temp2.png" in a subdirectory called "scratch" under your document root. Then the IMG element would be <IMG SRC="/scratch/temp2.png">.
If you create and save the image in the same program that generates the webpage that refers to it, your server won't return the page until the image has been saved. If that only takes a few seconds, the server is unlikely to hang. Many applications would take that long to calculate a result, so the people who coded the server would make sure it can handle such delays. I've done this under Apache, Tomcat, and GoServe (an OS/2 server), and never had a problem.
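A minimal sketch of that pattern (shown in Node/Express to match the rest of this page, since the asker says the question is framework-agnostic; a Flask handler would have the same shape, and makeHeightmap is a hypothetical stand-in for your generator):

var express = require('express')
var fs = require('fs')
var app = express()

app.use('/scratch', express.static('scratch')) // serve the saved images

app.post('/heightmap', function (req, res) {
  var name = 'hm_' + Date.now() + '.png'
  fs.writeFileSync('scratch/' + name, makeHeightmap(req.query)) // hypothetical generator
  // the page is not returned until the image exists on disk
  res.send('<html><body><img src="/scratch/' + name + '"></body></html>')
})

app.listen(3000)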
This method does have the disadvantage that you'll need to arrange for each temporary file to be deleted after an expiry period such as 12 hours or whenever you think the user won't need it any more. On the webpage you return, if the image is something serious that the user might want to keep, you could warn them that this will happen. They can always download it.
To delete the old files, write a script that checks when they were last updated, compares that with the current date and time, and deletes those files that are older than your expiry period.
You'll need a way to automatically run it repeatedly. On Unix systems, if you have shell access, the "cron" command is one way to do this. Googling "cron job to delete files older than 1 hour on web server" finds a lot of discussion of methods.
Be very careful when coding any automatic-deletion script, and test it thoroughly to make sure it deletes the right files! If you make your expiry period a variable, you can set it to e.g. 1 minute or 5 minutes when testing, so that you don't need to wait for ages.
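A minimal sketch of such a cleanup script (in Node here, but any language cron can run works; the scratch directory and expiry period are assumptions):

var fs = require('fs')
var path = require('path')

var SCRATCH_DIR = '/var/www/scratch'  // assumed image directory
var EXPIRY_MS = 12 * 60 * 60 * 1000   // 12-hour expiry; shorten while testing

fs.readdirSync(SCRATCH_DIR).forEach(function (name) {
  var file = path.join(SCRATCH_DIR, name)
  var age = Date.now() - fs.statSync(file).mtimeMs
  if (age > EXPIRY_MS) fs.unlinkSync(file) // older than expiry: delete it
})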
There are ways to stream your image back without saving it to a file, but what I'm recommending is (apart possibly from the file deleter) easy to code and debug. I've used it in many different projects.
On my Raspberry Pi, I have a Python script running that measures the room temperature (via a DHT22) and logs it to a CSV file every half hour.
A new CSV file is created for every day that the script is running, so the files are named temp_dd-mm-yy.csv. They are all saved in my loggings folder.
I now want to automatically add and pin the CSV files to the IPFS network, because I don't want to type ipfs add <file.csv> in the terminal every day.
In other words, is there a way to have a piece of code running that makes sure all files in the logging folder are added to the IPFS network every 24 hours?
I have experimented with the IPFS API, but I didn't manage to get anything useful out of it.
From Python directly, there are two ways to do this: either call the ipfs binary using subprocess, or use the REST API directly using something like urllib.
To use the REST API, you submit the data as a POST request, with the contents passed as multipart form data.
Here is the equivalent curl request to add two "files":
$ curl -X POST -F 'file1=somedata' -F 'file2=somemoredata' http://localhost:5001/api/v0/add
{"Name":"QmaJLd3cTDQFULC4j61nye2EryYTbFAUPKVAzrkkq9wQ98",
"Hash":"QmaJLd3cTDQFULC4j61nye2EryYTbFAUPKVAzrkkq9wQ98","Size":"16"}
{"Name":"Qman7GbdDxgT3SzkzeMinvUkaiVduzKHJGE5P2WGPqV2uq",
"Hash":"Qman7GbdDxgT3SzkzeMinvUkaiVduzKHJGE5P2WGPqV2uq","Size":"20"}
With shell, you could just do a cron job that does e.g.
ipfs add -R /logging
every day. Even though it adds the files again and again, this will be reasonably efficient until your logging directory becomes really large.
Of course, you will need to put the hashes somewhere or use IPNS so people can actually see this data.
A simple solution could be to use ipfs-sync to keep the directory in-sync on IPFS. It'd keep a directory pinned for you, as well as update an IPNS key for you if you'd like a consistent address for the data.
It can also be tuned to only update IPNS every 24 hours if you desire.
I need to add a file-generator REST API endpoint to a web app. So far I've come up with the following idea:
the client sends file parameters to the endpoint
the server receives the request and, using AMQP, sends the parameters to a dedicated service
the dedicated service creates a file, puts it into a server folder, and responds that the file was created, along with the file name
the endpoint sends a response to the client with the file
I'm not sure it's a good idea to keep the REST request open on the server for so long, but I still don't want to use email with a generated link, or sockets.
Do I need to set a timeout on the request so it will not be declined after a long wait?
As far as I know, the maximum timeout for a REST API call is 120 seconds. If it takes the service more time than that to create a file, do I need to use sockets instead, is that right?
The way I've handled similar is to do something like this:
Client sends request for file.
Server adds this to a queue with a 'requested' state, and responds (to the client) almost immediately with a response which includes a URL to retrieve the file.
Some background thread/worker/webJob/etc is running in a separate process from the actual web server and is constantly monitoring the queue - when it sees a new entry appear it updates the queue to a 'being generated' state & begins generating the file. When it finishes it updates the queue to a 'ready' state and moves on...
When the server receives a request to download the file (via the URL it gave the client), it can check the status of the file on the queue. If it's not ready, it can give a response indicating this. If it IS ready, it can simply respond with the file contents.
The client can use the response to the initial request to re-query the URL it was given after a suitable length of time, or repeatedly query it every couple of seconds, whichever is most suitable.
You need some way to store the queue that is accessible easily by both parts of the system - a database is the obvious one, but there are other things you could use...
This approach avoids either doing too much on a request thread or having the client 'hanging' on a request whilst the server is compiling the file.
That's what I've done (successfully) in these sorts of situations. It also makes it easy to add things like lifetimes to the queue, so a file can automatically 'expire' after a while too...
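Here is a minimal Express sketch of the two web-facing pieces (the in-memory jobs object is purely for illustration; in practice you'd use a database so the separate worker process can see and update it):

var express = require('express')
var crypto = require('crypto')
var app = express()

var jobs = {} // id -> { state: 'requested' | 'being generated' | 'ready', path: ... }

// 1. Client requests a file; respond almost immediately with a status URL.
app.post('/files', function (req, res) {
  var id = crypto.randomBytes(8).toString('hex')
  jobs[id] = { state: 'requested' }
  res.status(202).json({ url: '/files/' + id })
})

// 2. Client re-queries that URL until the file is ready.
app.get('/files/:id', function (req, res) {
  var job = jobs[req.params.id]
  if (!job) return res.sendStatus(404)
  if (job.state !== 'ready') return res.status(202).json({ state: job.state })
  res.download(job.path) // ready: respond with the file contents
})

app.listen(3000)

The background worker flips each entry to 'being generated', writes the file, fills in job.path, and sets the state to 'ready'.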
Users generate files on my Node.js server by pressing a button on a web page.
The server then creates a .zip file.
I want to expose this zip file so that it can be downloaded automatically on to the users' client.
Once downloaded, I want the server to detect that the download is finished and delete the zip file.
1- How do I expose the file in Node.js? Should the system put it in the public folder? That would be a security risk: anyone could read it. How can I link to a file that is not in the public folder and make it downloadable?
2- How do I detect that the download is finished? Should I run a cron job to delete the files without worrying about the download progress?
A few remarks that should help you:
If you are creating temporary files, a good practice is to create signed URLs. Those are URLs that contain a specific token valid for a limited amount of time. Implementation is trivial: generate the .zip file and a token, store a timestamp (preferably in the DB), and construct the signed link with the token. If the client does not download the file within the given amount of time, the link becomes invalid.
The zip file should have a unique, preferably random, name (if that's a problem, you can still use the Content-Disposition header to decide on the name the client sees during download). Store it in a temp dir inside your project.
After the user follows the previously generated signed link with the token that relates to that file, you start the download (streaming). After streaming is complete (refer to the Node.js streams docs), you just delete the file.
And on the client side:
You create a button that hits an endpoint on the server (triggering an AJAX call or whatever). Clicking it runs the logic described above on the server.
In return, the client gets the generated link (leading to ANOTHER endpoint that handles only those signed links) that has to be followed to download the file.
Using any kind of DOM manipulation, you create a hidden <a/> tag with its href leading to this link, and then trigger an automatic click of the link in JS code, as sketched below. Preferably, if you support newer browsers, it's a good idea to add the download attribute to it.
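A minimal sketch of that client-side step (assuming res is the parsed JSON response from the first endpoint, e.g. { url: '/download/<token>' }):

var a = document.createElement('a')
a.href = res.url
a.download = ''           // hint to download rather than navigate
a.style.display = 'none'
document.body.appendChild(a)
a.click()                 // trigger the download without user interaction
a.remove()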
DO NOT:
put the file in the public folder. Instead, create an endpoint that will stream its contents to the client, and keep the file in some temp dir.
rely on a CRON job for deleting the files. Run one only as a fallback, to clean up old files when something fails. Each file should be deleted right after it's downloaded (which you will know, because when your stream is closed, you will get a proper event).
IMPLEMENTATION SUGGESTIONS
Create two endpoints on the server (using Express or whatever framework you use for routing): one for requesting the file (which starts the generation process) and another one for downloading it.
After the generation process is finished, store the .zip inside e.g. a temp directory and create a token for it.
Store a set of data like this in the DB for EVERY download:
zip file name
token (e.g. generated random hash)
timestamp of the generation
Pass the new link to the client (for the second endpoint that is used for the download). The client should initialise the download automatically without human interaction, as suggested above.
When the link is "clicked" by the code, your server receives another request on the second endpoint and then:
checks whether the token is still valid (e.g. for 30 seconds)
if not: 403 or 404
if yes: starts streaming the data (create a file stream and pipe it to the client)
when streaming back, includes proper headers in the response, e.g. the file name the client should see, using Content-Disposition (this hides your internal file names in the temp directory)
after streaming is complete, deletes the file
Create a CRON job that runs e.g. once a day, asks the DB for ALL files with expired tokens, and tries to delete them if they still exist (this should not be a common scenario if you delete files properly when streaming finishes).
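A minimal Express sketch of the two endpoints (the in-memory token store, the createZip helper, and the 30-second validity window are assumptions for illustration; use the DB as described above in practice):

var express = require('express')
var crypto = require('crypto')
var fs = require('fs')
var path = require('path')
var app = express()

var tokens = {} // token -> { file, created }; store this in the DB in practice
var TEMP_DIR = path.join(__dirname, 'temp')
var VALID_MS = 30 * 1000 // token validity window

// Endpoint 1: generate the zip and hand back a signed link.
app.post('/generate', function (req, res) {
  var file = path.join(TEMP_DIR, crypto.randomBytes(16).toString('hex') + '.zip')
  createZip(file) // hypothetical: builds the archive
  var token = crypto.randomBytes(16).toString('hex')
  tokens[token] = { file: file, created: Date.now() }
  res.json({ url: '/download/' + token })
})

// Endpoint 2: validate the token, stream the file, then delete it.
app.get('/download/:token', function (req, res) {
  var entry = tokens[req.params.token]
  if (!entry || Date.now() - entry.created > VALID_MS) return res.sendStatus(404)
  delete tokens[req.params.token] // single use
  res.download(entry.file, 'export.zip', function () {
    fs.unlink(entry.file, function () {}) // delete once the stream has closed
  })
})

app.listen(3000)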
I have an ASP.NET page where a request is made, and after a while the server returns either a new page or just a file for download. I want to indicate on screen that the server is "Processing..." while it takes time to return the data.
Calling JavaScript when the user hits submit is easy. Also, the page reload on postback makes any "Processing..." indicators (some DIVs popping up at the top of the page) go away.
My problem is mostly the case where the data returned by the server is not a page but a file to store. How can I catch the moment the server starts to return data, and run JavaScript to remove the "Processing..." DIV? Is there even a way to do so when the reply has a different MIME type?
In which cases is it even possible?
There are a couple of ways to approximate what you're trying to do with timers and assumptions about what happened, but to really do what you're describing, you need to be polling the server for an indication that the download occurred.
What I would do is take the file, Response.WriteFile it, and then write a flag to some store (either a DB, the file system, or whatever) that uniquely identifies that the transaction has completed. On the client side, your script polls the server, and on the server side, the poll response checks the store for the flag indicating that the download has occurred.
The key here is that you have to take finer control of the download process itself...merely redirecting to the file is not going to give you the control you need. If you need more specifics on how to accomplish any of these steps, let me know.
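A minimal sketch of the client-side half (the /download-status endpoint and downloadToken are hypothetical; the matching flag write would live in your ASP.NET handler right after Response.WriteFile):

// start polling when the user triggers the download
var poll = setInterval(function () {
  fetch('/download-status?token=' + downloadToken) // hypothetical status endpoint
    .then(function (res) { return res.json() })
    .then(function (status) {
      if (status.complete) {
        clearInterval(poll)
        // server says the file went out; hide the indicator
        document.getElementById('processing').style.display = 'none'
      }
    })
}, 1000)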