Basically, I want to download a large number of images from an image service. I have a very large JSON object containing all of the URLs (~500 or so). I tried a few npm image downloader packages, as well as some other code that downloaded all of the images at the same time; however, about 50% of the downloaded images suffered data loss during the download (a large portion of the image was transparent when viewed). How can I download the images one after another (waiting until the previous one is complete before starting the next) to avoid the data loss?
Edit: here is the relevant code, using request:
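var fs = require('fs');
var request = require('request');

// Stream `url` into the file `dest`; `callback` runs once the file is closed.
var download = function (url, dest, callback) {
  request.get(url)
    .on('error', function (err) { console.log(err); })
    .pipe(fs.createWriteStream(dest))
    .on('close', callback);
};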
links.forEach( function(str) {
var filename = str[0].split('/').pop() + '.jpeg';
console.log(filename);
console.log('Downloading ' + filename);
download(str[0], filename, function(){console.log('Finished Downloading ' + filename)});
});
My links JSON looks like this:
[["link.one.com/image-jpeg"], ["link.two.com/image-jpeg"]]
Okay, first things first:
I really do not believe that starting those 500+ downloads at once is what damages the images. Node.js does not create a new thread per request: network I/O is handled asynchronously by libuv on a single event loop (V8 only executes the JavaScript), and how many requests are actually in flight at the same time is governed by the HTTP agent's maxSockets setting.
Now, even if they were all running at once, I don't think the files would get damaged. If the files were corrupt, you wouldn't have been able to open them.
So, I am pretty sure the problem with the images is not what you think.
Now, for the original question, and to test whether I am wrong, you can try downloading those files in sequence like this:
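var fs = require('fs');
var request = require('request');

// Download urlArray[i] into nameArray[i]; the next download starts only
// after the current file has been fully written and closed.
var recursiveDownload = function (urlArray, nameArray, i) {
  if (i < urlArray.length) {
    request.get(urlArray[i])
      .on('error', function (err) { console.log(err); })
      .pipe(fs.createWriteStream(nameArray[i]))
      .on('close', function () { recursiveDownload(urlArray, nameArray, i + 1); });
  }
};

// allUrlArray / allNameArray: your URLs and the matching file names
recursiveDownload(allUrlArray, allNameArray, 0);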
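On a Node.js version with async/await, the same "one after another" idea can also be written as a small promise-based queue. This is a sketch, not code from the question or the answer: `runWithLimit` and its `task` callback are hypothetical names, and each task is assumed to return a promise that resolves once its file has been completely written. With `limit = 1` the downloads run strictly in sequence; a small limit such as 5 is a middle ground if sequential turns out to be too slow.

```javascript
// Run task(item, index) for every item, keeping at most `limit` tasks in flight.
// With limit = 1 processing is strictly sequential: each task starts only after
// the previous task's promise has resolved.
// If a task rejects, the returned promise rejects (other workers may keep running).
async function runWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each worker repeatedly claims the next unclaimed index and awaits its task.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // results[i] corresponds to items[i]
}
```

To plug in the question's `download(url, dest, callback)`, a task could be `(link) => new Promise((resolve) => download(link[0], link[0].split('/').pop() + '.jpeg', resolve))`, passed together with `links`. Note that `download` only calls its callback on the write stream's `'close'`, so a failed request may leave the queue waiting unless the wrapper also handles errors.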
Since you are doing a large number of downloads, you could also try aria2c; see the aria2 documentation for further details.
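aria2c can read its URLs from a file passed with `-i` (`--input-file`), where an indented `out=` line after a URL sets the output file name, and `-j` (`--max-concurrent-downloads`) caps how many downloads run at once. A minimal sketch that turns the question's `links` array into such a file, assuming the question's `<last path segment>.jpeg` naming (`writeAria2InputFile` is a hypothetical helper):

```javascript
const fs = require('fs');

// Write an aria2c input file: one URI per line, each followed by an indented
// "out=" option line that sets the file name aria2c saves it under.
// `links` has the question's shape: [["url1"], ["url2"], ...].
function writeAria2InputFile(links, listPath) {
  const lines = [];
  for (const [url] of links) {
    lines.push(url);
    // Same naming scheme as the question: last path segment + ".jpeg"
    lines.push('  out=' + url.split('/').pop() + '.jpeg');
  }
  fs.writeFileSync(listPath, lines.join('\n') + '\n');
}
```

The generated file could then be passed to something like `aria2c -i urls.txt -j 1`, where `-j 1` downloads one file at a time.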
Related
I'm making a website, in which I want to offer the user to download the whole website (CSS and images included) for them to modify. I know I can download individual resources with
Click Me
but like I said, this only downloads one file, whereas I would like to download the entire website.
If it helps you visualise what I mean: in Chrome, IE and Firefox you can press Ctrl+S to download the entire website (make sure you save it as "Web page, Complete").
Edit: I know I can create a .zip file for users to download; however, doing so requires me to update it every time I make a change, which is something I'd rather not do, as I could potentially be making a lot of changes.
As I mentioned, it is better to have a cron job (or something similar) that periodically creates a zip file of all the desired static content.
If you insist on doing it in JavaScript on the client side, have a look at JSZip.
You will still need a way to get the list of static files on the server to save.
For instance, you can create a text file in which each line is a link to one of the site's static files.
You then iterate over that list and use $.get to fetch each file's content.
Something like this:
// Assumes jQuery, JSZip and FileSaver.js (which provides saveAs) are loaded.
// Get list of files to save (either by GET request or hardcoded).
// Each entry is "<path inside the zip> <URL to fetch it from>".
const filesList = ["f1.json /echo/jsonp?name=1", "inner/f2.json /echo/jsonp?name=2"];

function createZip() {
  const zip = new JSZip();
  // make a bunch of requests to get the files' content
  const requests = [];
  // for scoping the fileName: tags each response with the name it belongs to
  const _then = (fname) => (data) => ({ fileName: fname, data });
  for (const file of filesList) {
    const [fileName, fileUrl] = file.split(" ");
    requests.push($.get(fileUrl).then(_then(fileName)));
  }
  // When all have finished, $.when passes one argument per request
  $.when(...requests).then(function (...results) {
    // Add each result to the zip (the demo endpoints return JSON)
    for (const result of results) {
      zip.file(result.fileName, JSON.stringify(result.data));
    }
    // Save
    zip.generateAsync({ type: "blob" })
      .then(function (blob) {
        saveAs(blob, "site.zip");
      });
  });
}

$("#saver").click(() => {
  createZip();
});
Personally, I don't like this approach. But do as you prefer.
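The answer leaves open how to produce the list of static files. One option, assuming the site is served by Node.js from a local directory, is to generate the list on the server (for example from the same cron job). `buildManifest`, `rootDir` and `baseUrl` are hypothetical names; the output lines follow the `"<path inside zip> <URL>"` format of `filesList`, so paths containing spaces would need a different separator.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every file under `rootDir` and return one line per file:
// "<relative path> <baseUrl>/<relative path>", with '/' as the separator.
// `baseUrl` is the (hypothetical) URL prefix under which rootDir is served.
function buildManifest(rootDir, baseUrl) {
  const lines = [];
  function walk(dir) {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const rel = path.relative(rootDir, full).split(path.sep).join('/');
        lines.push(rel + ' ' + baseUrl + '/' + rel);
      }
    }
  }
  walk(rootDir);
  return lines.sort(); // stable, predictable order for the manifest file
}
```

Writing `buildManifest(...).join('\n')` to a text file gives the client the one-link-per-line list described in the answer.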
I am making a Node.js app that will primarily be used to download images from the web. I have created this function, which successfully downloads images. I want the function to also show a live preview of the image as it downloads, without impacting download speed. Is it possible to "tap the pipe" and draw the image to an HTML canvas or img element as it downloads? I am using Electron, so I am looking for a Chromium/Node.js-based solution.
Edit:
I've also found out you can chain pipes (r.pipe(file).pipe(canvas);) but I'm not sure if that would download the file first and then show up on the canvas or if it would update them both as the file downloads.
I've also thought of creating two separate pipes from the request (var r = request(url); r.pipe(file); r.pipe(canvas);), but I'm not sure if that would try to download the image twice.
I'm also not particularly familiar with the HTML canvas and haven't been able to test these ideas, because I don't know how to pipe an image to a canvas or img element for display in the application.
const fs = require('fs-extra');
const path = require('path');
const request = require('request');

function downloadFile(url, filename) {
  return new Promise((resolve, reject) => {
    // Create the target directory (and any parents) if it doesn't exist yet
    fs.ensureDirSync(path.dirname(filename));
    const file = fs.createWriteStream(filename);
    const r = request(url);
    // How would I also pipe this to a canvas or img element?
    r.on('error', (err) => reject(err));    // network/request errors
    file.on('error', (err) => reject(err)); // file system errors
    file.on('finish', () => resolve('done'));
    r.pipe(file);
  });
}
I ended up asking another similar question that provides the answer to this one. It turns out the solution is to pipe the image into memory, encode it as base64 and display it using a data:image URL.
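A minimal sketch of that idea on the Node.js side, assuming the response is available as a readable stream (for example the object returned by `request(url)` in the question): the stream is piped to the file as before, and a second `'data'` listener keeps a copy of each chunk so a `data:` URL can be built from the bytes received so far. A stream emits each chunk to every `'data'` listener, so the image is downloaded only once. `saveAndPreview` and its `onProgress` callback are hypothetical names, and `mimeType` must match the image (e.g. `'image/jpeg'`). Rebuilding the whole base64 string on every chunk is quadratic in the file size, so for large images you would probably want to throttle the previews.

```javascript
const fs = require('fs');

// Build a data: URL from the bytes received so far.
function toDataUrl(chunks, mimeType) {
  return 'data:' + mimeType + ';base64,' + Buffer.concat(chunks).toString('base64');
}

// Pipe `source` into `destPath` while also keeping every chunk in memory.
// `onProgress` (optional, hypothetical) receives a data: URL after each chunk.
// Resolves with the data: URL of the complete file once it has been written.
function saveAndPreview(source, destPath, mimeType, onProgress) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const file = fs.createWriteStream(destPath);
    source.on('error', reject);
    file.on('error', reject);
    file.on('finish', () => resolve(toDataUrl(chunks, mimeType)));
    source.pipe(file);
    // Second 'data' listener: each chunk goes both to the file and to `chunks`.
    source.on('data', (chunk) => {
      chunks.push(chunk);
      if (onProgress) onProgress(toDataUrl(chunks, mimeType));
    });
  });
}
```

In Electron, the resulting string could then be assigned to an `img` element's `src` in the renderer.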
I have the following code that creates a screenshot of a video I've uploaded:
var ffmpeg = require('fluent-ffmpeg');
var thumbFileName = 'tmp_file.jpg';
var ffmpegCommand = ffmpeg(videoFile)
  .on('end', function () {
    callback(null, tempUploadDir + thumbFileName);
  })
  .on('error', function (err) {
    callback(err);
  })
  .screenshots({
    timestamps: ['50%'],
    filename: thumbFileName,
    folder: tempUploadDir
  });
The code works well and the screenshot is created. The callback reads the file as a stream, stores it in the database and finally tries to delete thumbFileName from the file system.
Here is the issue I'm encountering: I'm not able to delete the file. Even when I try to delete it manually, I'm told that the file is locked by another process (Node.js), and I can't delete it until I stop the application.
In the callback I've also tried killing the command with ffmpegCommand.kill() before deleting the screenshot, but I still have the same issue. The file is removed with fs.unlink, and that works when the thumbnail is generated for an image (even one post-processed with effects using sharp), but not with ffmpeg. Apparently ffmpeg is still running, and that's why I can't delete the thumbnail.
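One thing that may be worth ruling out (an assumption, since the question doesn't show the callback): on Windows, a file that Node.js itself still has open, for example through a read stream that was never fully consumed or closed, is typically reported as locked by Node.js until that handle is released. The sketch below reads the thumbnail through a stream and calls `fs.unlink` only from the stream's `'close'` event, i.e. after its file descriptor has been released. `storeThenDelete` and `store` (a stand-in for the database write) are hypothetical names.

```javascript
const fs = require('fs');

// Read `filePath` through a stream, pass the bytes to `store(buffer, cb)`
// (a hypothetical stand-in for the database write), and delete the file only
// after the read stream has emitted 'close', i.e. released its file descriptor.
function storeThenDelete(filePath, store, callback) {
  const chunks = [];
  let failed = false;
  const stream = fs.createReadStream(filePath);
  stream.on('data', (chunk) => chunks.push(chunk));
  stream.on('error', (err) => {
    failed = true; // 'close' can still follow an error, so remember it
    callback(err);
  });
  stream.on('close', () => {
    if (failed) return;
    store(Buffer.concat(chunks), (err) => {
      if (err) return callback(err);
      fs.unlink(filePath, callback);
    });
  });
}
```

In the question's flow this would take the place of the read/store/unlink steps after ffmpeg's `'end'` event; if the file still cannot be deleted once the stream has closed, the handle is more likely held elsewhere (for example by the ffmpeg process).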
I'm kinda new to programming in general. My problem is that I want to download a file and after that do something.
Danbooru.search(tag, function (err, data) { //search for a random image with the given tag
data.random() //selects a random image with the given tag
.getLarge() //gets a link to the image
.pipe(require('fs').createWriteStream('random.jpg')); //downloads the image
});
Now I want to do a console.log after the file has been downloaded. I don't want to use setTimeout, since each file takes a different amount of time to download.
Thanks for the help.
See if this works for you: just save the stream returned by .pipe() (the file write stream) to a variable and listen for its finish event, which fires once all the data has been written to the file.
Danbooru.search(tag, function (err, data) {
var stream = data.random()
.getLarge()
.pipe(require('fs').createWriteStream('random.jpg'));
stream.on('finish', function() { console.log('file downloaded'); });
});
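On newer Node.js versions (15+, where `stream/promises` is available), `pipeline` does the same job and also reports errors from either side of the pipe, which a bare `'finish'` listener does not. A minimal sketch, assuming `getLarge()` returns a readable stream as the `.pipe(...)` call above suggests; `saveStream` is a hypothetical helper name:

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises'); // Node.js 15+

// Copy `source` into `destPath`; resolves once the file has been fully
// written, and rejects if either the source or the file stream fails.
async function saveStream(source, destPath) {
  await pipeline(source, fs.createWriteStream(destPath));
  return destPath;
}
```

Inside the `Danbooru.search` callback this could be used as `saveStream(data.random().getLarge(), 'random.jpg').then(() => console.log('file downloaded'))`.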
Background -
I'm trying to use node.js and the fs module to accomplish an end goal of monitoring a file, and detecting the lines that have been appended to it.
Current Implementation -
I'm currently using fs.watch to monitor the changes to the file persistently, and fs.readFile to read the file once the watch has been triggered.
Drawbacks -
The downside of this implementation is that it is computationally expensive and slow to derive the appended lines in this manner, especially since it requires reading in the entire file contents despite my interest in only the appended lines.
Ideal Solution -
I would like to instead use fs.createReadStream to somehow read the file up until the end, leave the file descriptor at the end, and start reading again once the file has been appended to.
I've found two ways to read the contents of a stream, readable.read() and readable.on('data', ...), but with both it seems the stream ends once there is no more data to read, even though it is not closed. I'm not sure how to keep using an ended stream, as readable.resume() does not seem to do anything.
My Question -
How do I read appended lines from a file in a way that is triggered whenever the file is modified? Is my ideal solution on the right track?
Thank you for your time and help!
This is a problem I once had, and it was quite a headache. This is the implementation that I came up with.
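var fs = require('fs');
var file = __dirname + '/file.log';

fs.stat(file, function (err, stats) {
  if (err) throw err;
  var start = stats.size;
  // read the entire file here if you need it
  watchFile(start);
});

function watchFile(start) {
  fs.watch(file, function (event, filename) {
    fs.stat(file, function (err, stats) {
      if (err) return console.log(err);
      if (stats.size < start) start = 0; // file was truncated: read from the top
      if (stats.size === start) return;  // nothing new (fs.watch may fire twice)
      var stream = fs.createReadStream(file, {
        start: start,
        end: stats.size - 1 // 'end' is inclusive
      });
      var lines = '';
      stream.on('data', function (data) {
        lines += data;
      });
      stream.on('end', function () {
        // you have the new lines
      });
      // The next read starts at the first byte not read yet
      start = stats.size;
    });
  });
}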
First I find the size of the file, and pass it to a watch function. Every time the file changes, I find out the new size of the file and read from the old position to the new position. On some systems the watch function might fire twice per change, so you might want to add checks to get rid of useless reads such as when the start and end are the same byte.
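If you only want complete lines, keep in mind that a write may end in the middle of a line. A promise-based sketch of the same offset technique (`readAppendedLines` is a hypothetical helper; it assumes UTF-8 text with `\n` line endings) reads from the last offset to the current size, returns only the complete lines, and advances the offset just past the last newline so a partial line is picked up on the next call. It also starts over from the beginning if the file has been truncated.

```javascript
const fs = require('fs');

// Read bytes appended to `file` since byte `offset` and return the complete
// lines among them, plus the offset to use next time. A trailing partial line
// (no '\n' yet) is not returned; the offset stops before it.
async function readAppendedLines(file, offset) {
  const { size } = await fs.promises.stat(file);
  if (size < offset) offset = 0; // file was truncated: start over
  if (size === offset) return { lines: [], offset };
  const fh = await fs.promises.open(file, 'r');
  try {
    const buf = Buffer.alloc(size - offset);
    const { bytesRead } = await fh.read(buf, 0, buf.length, offset);
    const text = buf.toString('utf8', 0, bytesRead);
    const lastNewline = text.lastIndexOf('\n');
    if (lastNewline === -1) return { lines: [], offset };
    return {
      lines: text.slice(0, lastNewline).split('\n'),
      // advance by the byte length (not character count) of what was consumed
      offset: offset + Buffer.byteLength(text.slice(0, lastNewline + 1))
    };
  } finally {
    await fh.close();
  }
}
```

It could be driven from fs.watch as in the answer by keeping the returned offset for the next event; since events can fire in quick succession, it is safer to skip or queue an event while a previous read is still running.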