Can't download page in NodeJS

I want to download a page (https://www.csfd.cz/tvurce/65871) in NodeJS, but I just get random data:
�}Ms�F������+i"��)�Jْ;�e���7�KM0��LƩ��]��Yg��b��
Ow7U��J�#�K�9��L
I thought it was just a wrong encoding, but even the size is wrong (the downloaded page has 44K, whereas this file has only 19K). What's more surprising is that simply downloading it with Python works fine.
Python code:
import requests
url = "https://www.csfd.cz/tvurce/65871"
r = requests.get(url)
with open('pyth.txt', 'wb') as handle:
    handle.write(r.content)
JavaScript code:
const request = require('request-promise')
const fs = require('fs')
request('https://www.csfd.cz/tvurce/65871').then((html) => {
  fs.writeFileSync('output.html', html)
})
I also tried additional methods like request.get with various parameters and so on, but I still get the same result. Can you tell me what I am doing wrong?

Use the gzip option of the request module; see the examples in the request module docs (https://github.com/request/request).
You also need the followRedirect and followAllRedirects parameters to automatically follow 301 and 302 redirects, because your request is returning a 302:
curl -X GET https://www.csfd.cz/tvurce/65871 --compressed -v -i
Response: 302
<h1>Redirect</h1>
<p><a href="https://www.csfd.cz/tvurce/65871-kit-harington/">Please
click here to continue</a>.</p>
In addition, replace your writeFileSync call with the asynchronous writeFile function:
const request = require('request')
const fs = require('fs')

request.get({
  url: 'https://www.csfd.cz/tvurce/65871',
  gzip: true,
  followRedirect: true,
  followAllRedirects: true
}, function (err, response, body) {
  if (err || !response || response.statusCode !== 200) {
    // error case, handle it here
  } else {
    fs.writeFile('output.html', body, 'utf8', function (err) {
      if (err) {
        // write error, handle it here
      } else {
        // success
      }
    })
  }
})

I tried different things, different options and encodings, and some parsers, and I couldn't get it to work with request or request-promise. Judging from the docs, I would say you aren't doing anything wrong.
I then tried a different module, unirest (npm install unirest --save), and it worked out of the box.
const unirest = require('unirest');
const fs = require('fs');

unirest.get('https://www.csfd.cz/tvurce/65871')
  .end(function (res) {
    console.log(res.body);
    fs.writeFileSync('output.html', res.body);
  });
Hope this is of help.

Read the Content-Encoding header. It's most likely compressed, which would explain the size difference.
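For example, a minimal sketch (reusing the request module from the question) that logs the header so you can confirm this:
const request = require('request')

request('https://www.csfd.cz/tvurce/65871', function (err, response) {
  if (err) return console.error(err)
  // A value such as 'gzip' means the body is compressed and must be
  // decompressed (e.g. via the gzip: true option) before it is usable.
  console.log(response.headers['content-encoding'])
})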

Related

node fs read file from given URL or GCS

When I run the code below it gives an error. Reading a file from a local directory works perfectly, but when I pass a URL it gives a file-not-found error. I've checked that fs.statSync accepts a URL.
const stat = fs.statSync('http://techslides.com/demos/sample-videos/small.mp4');
Error: ENOENT: no such file or directory, stat 'http://techslides.com/demos/sample-videos/small.mp4'
fs.statSync() can take a URL, but ONLY if that URL is a file:// URL.
It is not clear what you would want to do if the argument was actually an http:// URL. You could check whether it is not a file URL and then attempt to fetch the contents of the URL to see if it exists, using a library such as got().
But fetching data from another server over HTTP will not be synchronous, so you will have to change the design of your function to return a promise instead of offering a synchronous API.
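For example, a minimal sketch (using only the built-in http module; the function name is mine) of such a promise-based existence check:
const http = require('http');

// Resolves to true if the server answers a HEAD request with a 2xx/3xx status.
function urlExists(url) {
  return new Promise((resolve, reject) => {
    http.request(url, { method: 'HEAD' }, (res) => {
      res.resume(); // discard any body
      resolve(res.statusCode >= 200 && res.statusCode < 400);
    }).on('error', reject).end();
  });
}

urlExists('http://techslides.com/demos/sample-videos/small.mp4')
  .then((exists) => console.log(exists));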
That's because it's hosted on a web server; you need to send an HTTP GET request to fetch it locally.
Install the axios package and issue an HTTP GET request to fetch the remote resource from the web server:
npm install --save axios
Here's a program illustrating the general idea:
const fs = require('fs');
const axios = require('axios');
const { promisify } = require('util');

const writeFilePromise = promisify(fs.writeFile);

(async () => {
  const url = 'http://techslides.com/demos/sample-videos/small.mp4';
  // Ask for a Buffer; the default (a string) would corrupt binary data.
  const response = await axios.get(url, { responseType: 'arraybuffer' });
  if (response.data) {
    await writeFilePromise('small.mp4', response.data);
  }
})();
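For large files, a streaming variant (a sketch under the same assumptions) avoids buffering the whole video in memory:
const fs = require('fs');
const axios = require('axios');

(async () => {
  const url = 'http://techslides.com/demos/sample-videos/small.mp4';
  // Pipe the response body straight to disk instead of buffering it.
  const response = await axios.get(url, { responseType: 'stream' });
  response.data.pipe(fs.createWriteStream('small.mp4'));
})();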

Node gRPC: sending metadata from server to client without error

From the client side, it is easy to add metadata for the server:
const meta = new grpc.Metadata();
meta.add('xyz', 'okay');
stub.service.Rpc(request, meta, (err, response) => {
});
The above can be accessed on the server like this:
call.metadata.get('xyz');
Now, if we need to send metadata from the server to the client, we do this:
const err = { code, details };
const meta = new grpc.Metadata();
meta.add('...', '...');
callback(err, null, meta);
Note that we are passing an error, and the actual response is null.
How do I pass a null error and a non-null response, along with metadata?
If I do the following, it does not seem to work, as there is no way to access the metadata on the client without the error.
callback(null, r, meta);
// `r` is some response message
Does the gRPC spec explicitly disallow sending metadata from the server to the client when there is no error?
Also, while we're at it, I'd like someone to explain how we send trailing vs. initial metadata from the server to the client in Node.
Relevant links:
https://github.com/grpc/grpc-node
Can I send a custom Error message from server to client GRPC?
How to add metadata to nodejs grpc call
https://github.com/grpc/grpc/issues/9053
https://medium.com/compli-engineering/grpc-nodejs-using-jwt-authentication-b048fef6ecb2
ServerUnaryCall.sendMetadata(responseMetadata)
server:
const method = (call, cb) => {
  // code
  call.sendMetadata(metadata)
  // code
}
client:
const call = client.method(params, cb)
call.on('metadata', (metadata) => {
  // code
})
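Putting the two together, a minimal sketch (service, method, and field names are hypothetical) of a unary handler that returns a null error, a non-null response, and initial metadata:
const grpc = require('grpc');

function sayHello(call, callback) {
  const metadata = new grpc.Metadata();
  metadata.add('xyz', 'okay');
  call.sendMetadata(metadata);          // initial metadata, sent without any error
  callback(null, { message: 'Hello' }); // null error, non-null response
}

// On the client, listen for the 'metadata' event on the call object:
// const call = client.sayHello({ name: 'world' }, (err, response) => { /* ... */ });
// call.on('metadata', (meta) => console.log(meta.get('xyz')));
As for trailing vs. initial metadata: call.sendMetadata() sends initial metadata, while metadata passed as the third argument to callback() is sent as trailing metadata.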
It looks like you can use code like this:
client.someFunction().on('metadata', (meta) => { /* any code */ })
At least on v1.9.x, see: https://github.com/grpc/grpc-node/blob/v1.9.x/packages/grpc-native-core/src/client.js#L562

Manage API calls with JavaScript/Python/Bash

I have to choose one of these languages:
Python (with Selenium or any suggestion)
Javascript (with node with any module)
Bash (with curl for example)
To do the following:
Make a request to an API (Scrapy Cloud) and get some value; in my case I just need the id from the response:
{"count": 1, "id": "195457/7/19", "width": 32, "height": 27, "started_time": "2017-06-22T08:20:26", "total": 1, "status": "ok"}
And then make another request with that id to download a file in CSV/JSON format.
What I tried:
Python:
With Selenium (Firefox driver) I can open the page and get the id, and that works fine, but when I try to download the file with the next API request it asks me what I want to do with the file (download or open with...). As I would have to interact with that dialog, it is not viable.
Javascript:
I found a module to download files, but it is only for downloading files such as images from image URLs, not for downloading an arbitrary file (like the Linux wget command does).
Bash:
With curl it works, but I can only get the whole response and cannot extract the id value, so I can't continue with what I want. I also tried to download the file of the second step, and that works fine with a simple curl -o myfile.csv URL.
Any help would be appreciated. Thanks for reading!
Here is a Node version. It's quite broad, but the two main functions are callApi and downloadFile.
I don't know the structure of your API URLs, so for now I have mocked some simple ones; change them to what you need.
You will need to npm install request and update the variables to match your API.
index.js
const request = require('request');
const http = require('http');
//const https = require('https'); // may be required
const fs = require('fs');

const apiEndPoint = 'http://scrapycloud?someparam=';
const fileName = 'data.csv';
const assetEndPoint = 'http://assetUrl?id=';

// Calls your API, extracts the asset id from the JSON response,
// then hands it to the downloadFile function. (The parameter is named
// queryId so it does not collide with the asset id extracted below.)
function callApi(queryId, callback) {
  request(apiEndPoint + queryId, function (error, response, body) {
    if (error) {
      return callback(error);
    }
    const info = JSON.parse(body);
    const assetId = info.id;
    downloadFile(assetId, callback);
  });
}

// Creates a write stream to save a file to your local machine, performs an
// HTTP request for the asset, and pipes the response into the write stream.
function downloadFile(assetId, callback) {
  var file = fs.createWriteStream(fileName);
  // Use https.get instead if your request needs to be https:
  //https.get(assetEndPoint + assetId, function (response) {
  http.get(assetEndPoint + assetId, function (response) {
    response.pipe(file);
    file.on('finish', function () {
      file.close(callback);
    });
  }).on('error', function (err) {
    fs.unlink(fileName, function () {}); // remove the partial file
    if (callback) callback(err.message);
  });
}

// Called when everything is finished, or on error.
function complete(err) {
  if (err) {
    return console.log(err);
  }
  console.log('file downloaded');
}

// Starts the process; pass it an id and a callback.
callApi('123131', complete);

Piping zip file from SailsJS backend to React Redux Frontend

I have a SailsJS backend where I generate a zip file, which was requested by my frontend, a React app with Redux. I'm using sagas for the async calls and fetch for the request. In the backend, I tried things like:
//zipFilename is the absolute path
res.attachment(zipFilename).send();
or
res.sendfile(zipFilename).send();
or
res.download(zipFilename).send();
or pipe the stream with:
const filestream = fs.createReadStream(zipFilename);
filestream.pipe(res);
on my Frontend i try to parse it with:
parseJSON(response) => {
return response.clone().json().catch(() => response.text());
}
Everything I tried ends up with an empty zip file. Any suggestions?
There are various issues with the options that you tried out:
res.attachment will just set the Content-Type and Content-Disposition headers, but it will not actually send anything.
You can use this to set the headers properly, but you need to pipe the ZIP file into the response as well.
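For example, a minimal sketch (zipFilename is assumed to be the absolute path from the question):
const fs = require('fs');

function download(req, res) {
  res.attachment('archive.zip'); // sets Content-Type and Content-Disposition
  fs.createReadStream(zipFilename).pipe(res); // actually sends the bytes
}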
res.sendfile: You should not call .send() after this. From the official docs' examples:
app.get('/file/:name', function (req, res, next) {
  var options = { ... };
  res.sendFile(req.params.name, options, function (err) {
    if (err) {
      next(err);
    } else {
      console.log('Sent:', fileName);
    }
  });
});
If the ZIP is properly built, this should work fine and set the proper Content-Type header as long as the file has the proper extension.
res.download: Same thing, you should not call .send() after this. From the official docs' examples:
res.download('/report-12345.pdf', 'report.pdf', function(err) { ... });
res.download will use res.sendfile to send the file as an attachment, thus setting both Content-Type and Content-Disposition headers.
However, you mention that the ZIP file is being sent but it is empty, so you should probably check if you are creating the ZIP file properly. As long as they are built properly and the extension is .zip, res.download should work fine.
If you are building them on the fly, check this out:
This middleware will create a ZIP file with multiple files on the fly and send it as an attachment. It uses lazystream and archiver:
const fs = require('fs');
const lazystream = require('lazystream');
const archiver = require('archiver');

function middleware(req, res) {
  // Set the response's headers:
  // You can also use res.attachment(...) here.
  res.writeHead(200, {
    'Content-Type': 'application/zip',
    'Content-Disposition': 'attachment; filename=DOWNLOAD_NAME.zip',
  });

  // Files to add to the ZIP:
  const filesToZip = [
    'assets/file1',
    'assets/file2',
  ];

  // Create a new ZIP file:
  const zip = archiver('zip');

  // Set up some callbacks:
  zip.on('error', errorHandler); // define your own errorHandler
  zip.on('finish', function () {
    res.end(); // Send the response once the ZIP is finished.
  });

  // Pipe the ZIP output to res:
  zip.pipe(res);

  // Add files to the ZIP, wrapped in lazystream so each read stream is
  // only opened when archiver actually consumes it:
  filesToZip.forEach((filename) => {
    zip.append(new lazystream.Readable(() => fs.createReadStream(filename)), {
      name: filename,
    });
  });

  // Finalize the ZIP. Compression will start and output will
  // be piped to res. Once the ZIP is finished, res.end() will be
  // called.
  zip.finalize();
}
You can build around this to cache the built ZIPs instead of building them on the fly every time, which is time- and resource-consuming and inadvisable for most use cases.
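For instance, a minimal caching sketch (the cache path is hypothetical): serve the ZIP from disk when a cached copy exists, otherwise fall back to building it on the fly:
const fs = require('fs');

function cachedZipMiddleware(req, res) {
  const cachePath = '/tmp/zip-cache/download.zip';
  fs.access(cachePath, (err) => {
    if (!err) {
      // Cache hit: stream the existing file.
      res.attachment('DOWNLOAD_NAME.zip');
      return fs.createReadStream(cachePath).pipe(res);
    }
    // Cache miss: build the ZIP as in middleware() above, additionally
    // piping the archive to fs.createWriteStream(cachePath) next to res.
    middleware(req, res);
  });
}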

Importing a json file from a url using node js (express)

I'm a node.js beginner. I'm trying to request a json file from a url (i.e 'http://www.example.com/sample_data.json').
My goal is to download/request the file only once when the server loads and then save it on the client side so I can manipulate/change it locally.
I tried
var file = request('http//exmaple.com/sample_data.json')
but it returns an import module error.
If anyone could give me a start, that would be great!
Thanks!
To do that I would use the request module.
var request = require('request');

request('http://example.com/sample_data.json', function (error, response, body) {
  if (!error && response.statusCode == 200) {
    var importedJSON = JSON.parse(body);
    console.log(importedJSON);
  }
});
For more information about the module check this link: https://github.com/request/request
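Since the goal is to fetch the file only once when the server loads, here is a minimal sketch (the URL and variable name are placeholders) that performs the request at start-up and keeps the parsed result in memory, ready to be exposed to the client via a route:
var request = require('request');

var sampleData = null; // populated once when the server starts

request('http://example.com/sample_data.json', function (error, response, body) {
  if (!error && response.statusCode == 200) {
    sampleData = JSON.parse(body);
    // sampleData can now be read and modified locally.
  }
});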
Just some basics about Node, and some first things to try:
1) request is a good choice for getting the file, but did you do an npm install? npm install request --save
2) In order to use the module, you have to "require" it at the top of your code, like: var request = require('request');
I'd start by checking those things first.
