I have a strange issue. I'm using request for file downloads in Node.js, and every time I download larger files (>250 MB) they are buffered into memory instead of being streamed directly to the filesystem. Maybe I'm doing something wrong, but I put together a test case and the file is still not being streamed.
var request = require('request');
var fs = require('fs');

var writable = fs.createWriteStream("1GB.zip");
var stream = request.get({
  uri: "http://ipv4.download.thinkbroadband.com/1GB.zip",
  encoding: null
}, function(error, response, body) {
  console.log("code:", response.statusCode);
  if (response.statusCode >= 500) {
    log.err(response.statusCode, " Servererror", file.url);
  }
}).pipe(writable);
In this test case I'm downloading a sample 1 GB file, and if you watch the node process in the task manager it grows to over 1 GB as the file is downloaded.
I want my Node application to use no more than 200 MB of RAM.
The issue is that you're passing a callback, which implicitly enables buffering inside request because one of the parameters for the callback is the entire body of the response.
If you want to know when the response is available, just listen for the response event instead:
var request = require('request');
var fs = require('fs');

var writable = fs.createWriteStream("1GB.zip");
var stream = request.get({
  uri: "http://ipv4.download.thinkbroadband.com/1GB.zip",
  encoding: null
}).on('response', function(response) {
  console.log("code:", response.statusCode);
  if (response.statusCode >= 500) {
    log.err(response.statusCode, " Servererror", file.url);
  }
}).pipe(writable);
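Since nothing in this version holds the body in memory, the process stays small. If you also want to catch network failures and know when the file has been fully written to disk, a minimal sketch along the same lines (using only standard stream events) would be:

var request = require('request');
var fs = require('fs');

var writable = fs.createWriteStream("1GB.zip");

request.get({
  uri: "http://ipv4.download.thinkbroadband.com/1GB.zip",
  encoding: null
}).on('response', function (response) {
  console.log("code:", response.statusCode);
}).on('error', function (err) {
  // network-level failure on the request itself
  console.error("request failed:", err);
}).pipe(writable).on('finish', function () {
  // the write stream has flushed everything to disk
  console.log("download complete");
});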
I need to extract text data from a web URL (http://www.africau.edu/images/default/sample.pdf).
I tried two node modules.
1) crawler-request
it('Read Pdf Data using crawler', function() {
  const crawler = require('crawler-request');

  function response_text_size(response) {
    response["size"] = response.text.length;
    return response;
  }

  crawler("http://www.africau.edu/images/default/sample.pdf", response_text_size).then(function(response) {
    // handle response
    console.log("Response = " + response.size);
  });
});
When this runs, nothing is printed to the console.
2) pfd2json/pdfparser
it('Read Data from url', function() {
  var request = require('request');
  var pdf = require('pfd2json/pdfparser');
  var fs = require('fs');

  var pdfUrl = "http://www.africau.edu/images/default/sample.pdf";
  let databuffer = fs.readFileSync(pdfUrl);

  pdf(databuffer).then(function(data) {
    var arr:Array<String> = data.text;
    var n = arr.includes('Thursday 02 May');
    console.log("Print Array " + n);
  });
});
Failed: ENOENT: no such file or directory, open 'http://www.africau.edu/images/default/sample.pdf'
I am able to read data from a local path, but I am not able to extract it from a URL.
The issue here is that you are using the fs module (File System) to read a file that lives on a remote server.
You also mistyped the pdf2json module, which should give you an error on its own.
You did require the request module, which is what makes it possible to access that remote file. Here's one way to do this:
it('Read Data from url', function () {
  var request = require('request');
  var PDFParser = require('pdf2json');

  var pdfUrl = 'http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf';
  var pdfParser = new PDFParser(this, 1);

  // executed if the parser fails for any reason
  pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
  // executed when the parser has finished
  pdfParser.on("pdfParser_dataReady", pdfData => console.log(pdfParser.getRawTextContent()));

  // request the pdf's file content, then call the pdf parser on the retrieved buffer
  request({ url: pdfUrl, encoding: null }, (error, response, body) => pdfParser.parseBuffer(body));
});
This makes it possible to load the remote .pdf file in your program.
I'd recommend looking at the pdf2json documentation if you want to do more. The snippet above simply outputs the textual content of the .pdf file once the parser has finished reading it.
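If you'd rather not hold the whole PDF in memory as a buffer, another option (a sketch, not tested here) is to pipe the download to a temporary file and let pdf2json read it from disk with its loadPDF method:

it('Read Data from url via a temp file', function () {
  var request = require('request');
  var fs = require('fs');
  var PDFParser = require('pdf2json');

  var pdfUrl = 'http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf';
  var tmpFile = './pdf-sample.pdf'; // hypothetical local path for the downloaded copy

  request({ url: pdfUrl, encoding: null })
    .pipe(fs.createWriteStream(tmpFile))
    .on('finish', function () {
      var pdfParser = new PDFParser(null, 1);
      pdfParser.on('pdfParser_dataError', errData => console.error(errData.parserError));
      pdfParser.on('pdfParser_dataReady', pdfData => console.log(pdfParser.getRawTextContent()));
      pdfParser.loadPDF(tmpFile); // parse the file we just wrote to disk
    });
});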
I have the URL to a blob which I'm trying to upload to Azure Storage. There doesn't seem to be an obvious way of doing this, as none of the APIs handle uploading a blob URL directly.
I'm trying to do something like this:
blobService.createBlockBlobFromLocalFile('taskcontainer', 'myfile.png', blobUrl, (error, result, response) => {
});
This doesn't work. I've also tried to find ways to read the blob URL into a readable stream and upload that, but haven't gotten very far either.
I basically have a file selected by the user using react-dropzone, which provides me with a blob URL (which can look like this: blob:http://localhost:3000/cd8ba70e-5877-4112-8131-91c594be8f1e) pointing to the local file. My goal is to upload that blob URL to an Azure container.
Firebase storage has a 'put' function which allows you to upload the blob from a url: https://firebase.google.com/docs/storage/web/upload-files
This is the closest I have gotten:
var blobUrl = acceptedFiles[0].preview;

var xhr = new XMLHttpRequest();
xhr.open("GET", blobUrl);
xhr.responseType = "text"; // force the HTTP response, response-type header to be blob
xhr.onload = function () {
  const Stream = require('stream')
  const readable = new Stream.Readable()
  readable.push(xhr.responseText);
  readable.push(null);
  blobService.createBlockBlobFromStream('taskcontainer', 'myblob.png', readable, xhr.responseText.length, (error, result, response) => {
    var ok = 0;
  })
}
xhr.send();
The file (or parts of it?) seems to get uploaded, but the end result is that the file type is lost and I can't view the uploaded png.
You could try the following
var azure = require('azure-storage');
var blobService = azure.createBlobService('', '');

blobService.createBlockBlobFromLocalFile('nodecontainer', 'AzureDC', 'azure_center.png', {
  publicAccessLevel: 'blob'
}, function(error, result, response) {
  if (!error) {
    console.log(response);
  } else {
    console.log(error);
  }
});
EDIT
Check this code snippet to upload blob to azure storage
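Since the blob: URL from react-dropzone points at the dropped File object you already have in acceptedFiles, another route is to hand that File straight to the SDK and set the content type explicitly so the .png type isn't lost. A rough sketch, assuming the browser build of azure-storage (azure-storage.blob.js) and a valid SAS token; accountName, sasToken and the container name are placeholders:

// assumes the azure-storage browser bundle (azure-storage.blob.js) is loaded
var blobUri = 'https://' + accountName + '.blob.core.windows.net';
var blobService = AzureStorage.Blob.createBlobServiceWithSas(blobUri, sasToken);

var file = acceptedFiles[0]; // the File object react-dropzone gives you

blobService.createBlockBlobFromBrowserFile(
  'taskcontainer',
  'myblob.png',
  file,
  { contentSettings: { contentType: file.type } }, // keep image/png
  function (error, result, response) {
    if (!error) {
      console.log('upload complete');
    } else {
      console.log(error);
    }
  }
);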
I am trying to retrieve accurate file size information for an image URL using node.js (specifically the http module). Every time I run the code below (with any image URL) I get '4061' bytes as the response. The example below should return about 3000 bytes.
I am open to corrections to my existing method of calculation, or an alternative way to handle this in Node. Thanks.
var http = require('http');

var options = {
  host: 'www.subway.com',
  path: '/menu/Images/Menu/Categories_Main/menu-category-featured-products.jpg'
};

var req = http.get(options, function(res) {
  var obj = res.headers;
  var filesize = obj['content-length'];
  console.log(filesize + " bytes");
});

req.end();
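One thing worth checking (an observation, not part of the original post) is whether the response is really the image: a content-length that never changes regardless of the URL usually means the server is returning a redirect or an error page instead. A small sketch that logs the status code and content type next to the length:

var http = require('http');

var options = {
  host: 'www.subway.com',
  path: '/menu/Images/Menu/Categories_Main/menu-category-featured-products.jpg'
};

http.get(options, function (res) {
  console.log('status:', res.statusCode);            // a 301/302/404 would explain a fixed size
  console.log('type:', res.headers['content-type']); // should be image/jpeg for the real file
  console.log('length:', res.headers['content-length'], 'bytes');
  res.resume(); // drain the response so the socket is released
});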
I want to download a file with the Request library. It's pretty straightforward:
request({
  url: url-to-file
}).pipe(fs.createWriteStream(file));
Since the URL is supplied by users (in my case) I would like to limit the maximum file size my application will download - let's say 10MB. I could rely on content-length headers like so:
request({
  url: url-to-file
}, function (err, res, body) {
  var size = parseInt(res.headers['content-length'], 10);
  if (size > 10485760) {
    // ooops - file size too large
  }
}).pipe(fs.createWriteStream(file));
The question is - how reliable is this? I guess this callback will be called after the file has been downloaded, right? But then it's too late if someone supplies the URL of a file which is 1 GB. My application would first download the whole 1 GB just to check (in the callback) that it is too big.
I was also thinking about good old Node's http.get() method. In this case I would do this:
var opts = {
  host: host,
  port: port,
  path: path
};

var file = fs.createWriteStream(fileName),
    fileLength = 0;

http.get(opts, function (res) {
  res.on('data', function (chunk) {
    fileLength += chunk.length;
    if (fileLength > 10485760) { // ooops - file size too large
      file.end();
      return res.end();
    }
    file.write(chunk);
  }).on('end', function () {
    file.end();
  });
});
What approach would you recommend to limit the maximum download size without actually downloading the whole thing and checking its size afterwards?
I would actually use both methods you've discussed: check the content-length header, and watch the data stream to make sure it doesn't exceed your limit.
To do this I'd first make a HEAD request to the URL to see if the content-length header is available. If it's larger than your limit, you can stop right there. If it doesn't exist or it's smaller than your limit, make the actual GET request. Since a HEAD request will only return the headers and no actual content, this will help weed out large files with valid content-lengths quickly.
Next, make the actual GET request and watch your incoming data size to make sure that it doesn't exceed your limit (this can be done with the request module; see below). You'll want to do this regardless of whether the HEAD request found a content-length header, as a sanity check (the server could be lying about the content-length).
Something like this:
var maxSize = 10485760;

request({
  url: url,
  method: "HEAD"
}, function(err, headRes) {
  var size = headRes.headers['content-length'];
  if (size > maxSize) {
    console.log('Resource size exceeds limit (' + size + ')');
  } else {
    var file = fs.createWriteStream(filename),
        size = 0;

    var res = request({ url: url });
    res.on('data', function(data) {
      size += data.length;
      if (size > maxSize) {
        console.log('Resource stream exceeded limit (' + size + ')');
        res.abort();         // Abort the response (close and cleanup the stream)
        fs.unlink(filename); // Delete the file we were downloading the data to
      }
    }).pipe(file);
  }
});
The trick to watching the incoming data size using the request module is to bind to the data event on the response (like you were thinking about doing using the http module) before you start piping it to your file stream. If the data size exceeds your maximum file size, call the response's abort() method.
I had a similar issue. I now use fetch to limit the download size.
const response = await fetch(url, {
  method: 'GET',
  size: 5000000, // maximum response body size in bytes, 5000000 = 5MB
}).catch(e => { throw e })
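Note that the size option here comes from node-fetch rather than the browser/native fetch. When the limit is exceeded, node-fetch rejects while the body is being read, so (as a sketch, assuming node-fetch v2) the overflow can be detected like this:

const fetch = require('node-fetch');

async function downloadLimited(url) {
  const response = await fetch(url, { size: 5000000 }); // 5 MB cap
  try {
    return await response.buffer(); // rejects if the body grows past the limit
  } catch (err) {
    if (err.type === 'max-size') {
      console.error('Resource exceeded the 5 MB limit');
      return null;
    }
    throw err;
  }
}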
Right now I'm using this script in PHP. I pass it the image and size (large/medium/small) and if it's on my server it returns the link, otherwise it copies it from a remote server then returns the local link.
function getImage ($img, $size) {
  if (@filesize("./images/".$size."/".$img.".jpg")) {
    return './images/'.$size.'/'.$img.'.jpg';
  } else {
    copy('http://www.othersite.com/images/'.$size.'/'.$img.'.jpg', './images/'.$size.'/'.$img.'.jpg');
    return './images/'.$size.'/'.$img.'.jpg';
  }
}
It works fine, but I'm trying to do the same thing in Node.js and I can't seem to figure it out. The filesystem seems to be unable to interact with any remote servers so I'm wondering if I'm just messing something up, or if it can't be done natively and a module will be required.
Anyone know of a way in Node.js?
You should check out http.Client and http.ClientResponse. Using those you can make a request to the remote server and write out the response to a local file using fs.WriteStream.
Something like this:
var http = require('http');
var fs = require('fs');

var google = http.createClient(80, 'www.google.com');
var request = google.request('GET', '/', {'host': 'www.google.com'});
request.end();

var out = fs.createWriteStream('out');

request.on('response', function (response) {
  response.setEncoding('utf8');
  response.on('data', function (chunk) {
    out.write(chunk);
  });
});
I haven't tested that, and I'm not sure it'll work out of the box. But I hope it'll guide you to what you need.
To give a more updated version (as the most recent answer is 4 years old, and http.createClient is now deprecated), here is a solution using the request module:
var fs = require('fs');
var request = require('request');

function getImage (img, size, filesize) {
  var imgPath = size + '/' + img + '.jpg';
  if (filesize) {
    return './images/' + imgPath;
  } else {
    request('http://www.othersite.com/images/' + imgPath).pipe(fs.createWriteStream('./images/' + imgPath));
    return './images/' + imgPath;
  }
}
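This version relies on the caller passing filesize in, and it returns the local path before the download has finished. A variant closer to the original PHP (a sketch; the callback only fires once the copy is complete) would check the filesystem itself:

var fs = require('fs');
var request = require('request');

function getImage(img, size, callback) {
  var localPath = './images/' + size + '/' + img + '.jpg';
  if (fs.existsSync(localPath)) {
    return callback(null, localPath);
  }
  request('http://www.othersite.com/images/' + size + '/' + img + '.jpg')
    .pipe(fs.createWriteStream(localPath))
    .on('finish', function () {
      callback(null, localPath); // file fully written, safe to use
    })
    .on('error', function (err) {
      callback(err);
    });
}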
If you can't use the remote user's password for some reason and need to use an identity key (RSA) for authentication, then programmatically executing scp with child_process is a good way to go:
const { exec } = require('child_process');

exec(`scp -i /path/to/key username@example.com:/remote/path/to/file /local/path`,
  (error, stdout, stderr) => {
    if (error) {
      console.log(`There was an error ${error}`);
    }
    console.log(`The stdout is ${stdout}`);
    console.log(`The stderr is ${stderr}`);
  });
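If the key path, remote path, or destination ever come from user input, it may be safer to pass the arguments as an array via execFile so nothing is interpreted by a shell (same placeholder paths as above):

const { execFile } = require('child_process');

execFile('scp',
  ['-i', '/path/to/key', 'username@example.com:/remote/path/to/file', '/local/path'],
  (error, stdout, stderr) => {
    if (error) {
      console.log(`There was an error ${error}`);
      return;
    }
    console.log(`The stdout is ${stdout}`);
    console.log(`The stderr is ${stderr}`);
  });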