So I'm trying to scrape mediamarkt.es with this code:
const PORT = 8000;
const express = require("express");
const axios = require("axios").default;
const cors = require("cors")({ origin: true });
const app = express();
const ax = axios.create({
headers: {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"Accept-Encoding": "gzip, deflate",
"Accept-Language": "en-US,en;q=0.9",
},
});
const tc = () => {
ax.get("https://www.mediamarkt.es")
.then((response) => {
console.log(response.status);
})
.catch((err) => {
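// err.response is undefined for network-level failures, so guard before reading it
console.log(err.response ? err.response.status : err.message);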
});
};
app.listen(PORT, () => {
tc();
});
Every time I get a 403 error. In Chrome, with JavaScript disabled and the cache cleared, the same URL works and returns 200, but from my Node.js code I always get 403. The 403 response comes with an HTML page that mentions a captcha, but I believe I should be able to make the same request from Node.js as Chrome does. I just can't figure out what I'm missing...
Any help would be greatly appreciated.
I'm trying to get a specific token value from this response outside of the loop, so that I can then show the token value in an HTML page.
import puppeteer from 'puppeteer-extra'
var http = import('http');
puppeteer.launch({ headless: true }).then(async (browser) => {
const page = await browser.newPage();
const token=[];
await page.goto('https://orbitxch.com/customer/inplay/highlights/1');
page.on('request', req => {
const tok = req.headers()
console.log(tok)
});
});
Response:
'x-csrf-token': 'c7474114cc5e6b5e5deda9f46a9978dd69',
'sec-ch-ua-mobile': '?0',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/100.0.4889.0 Safari/537.36'
I want to get the x-csrf-token value into a variable. Right now this code prints the HTTP headers of every request the site makes, so there are lots of x-csrf-token entries, but they are all the same. I just want one token in the token variable. Any Node.js experts?
I tried this:
const token=[];
await page.goto('https://orbitxch.com/customer/inplay/highlights/1');
page.on('request', req => {
const tok = req.headers()
token.push(tok) // but this is not working, token is still []
console.log(tok)
});
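One possible way to get a single value out of the listener, as a sketch only (not tested against orbitxch.com): wrap the listener in a Promise that resolves with the first matching header and then detaches itself. The names waitForRequestHeader and timeoutMs are hypothetical, and the only assumption about the source object is that it has on/off methods and emits "request" events whose argument has a headers() method, as a Puppeteer page does. A plain EventEmitter stands in for the page here so the snippet runs without a browser.

```javascript
const { EventEmitter } = require("events");

// Resolves with the first value of `headerName` seen on a "request" event,
// then removes the listener. Rejects if nothing shows up within `timeoutMs`.
function waitForRequestHeader(source, headerName, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      source.off("request", onRequest);
      reject(new Error(`No ${headerName} header seen within ${timeoutMs} ms`));
    }, timeoutMs);

    function onRequest(req) {
      const value = req.headers()[headerName];
      if (value === undefined) return; // this request does not carry the header
      clearTimeout(timer);
      source.off("request", onRequest);
      resolve(value);
    }

    source.on("request", onRequest);
  });
}

// Stand-in for a Puppeteer page (assumption: same on/off/"request" shape).
const fakePage = new EventEmitter();
waitForRequestHeader(fakePage, "x-csrf-token", 1000).then((token) => {
  console.log("token:", token);
});
fakePage.emit("request", { headers: () => ({ "x-csrf-token": "example-token" }) });
```

With the real page, the idea would be to call waitForRequestHeader(page, 'x-csrf-token') before page.goto(...), so that no early requests are missed, and to await the returned promise after the navigation. The array in the attempt above looks empty because the listener only fills it later, after the surrounding code has already moved on.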
I'm trying to set the user agent on a request made with the npm request package. I followed the documentation, but it gives the following error:
Error: Invalid URI "/"
const request = require('async-request')
const run = async (url) => {
const {statusCode} = await request(url)
console.log(statusCode) // 200, works
const options = {
url,
headers: {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'
}
}
await request(options) // Error: Error: Invalid URI "/"
}
run('https://github.com/')
I also tried request.get, as mentioned here, but it gives a "request.get is not a function" error.
The problem is that you are looking at the documentation of request but using async-request, which doesn't support being called with a single options object the way you do. Pass the URL as the first argument and the options as the second:
const request = require('async-request')
const run = async (url) => {
const {statusCode} = await request(url)
console.log(statusCode) // 200, works
const options = {
headers: {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'
}
}
const res = await request(url, options);
console.log(res.statusCode) // 200, works
}
run('https://github.com/')
I'm having a strange issue.
My logs show that occasionally a POST endpoint returns HTTP 400 because the headers object is empty. It seems like it's always the same user that has the problem.
I am not able to reproduce the issue locally, and the endpoint works most of the time for most users.
export async function postJson<T>(
url: string,
jsonPayload: string,
caller: string
) {
const headers = new Headers();
headers.append("Content-Type", "application/json");
const payload: RequestInit = {
credentials: "same-origin",
method: "post",
headers: headers,
body: jsonPayload
};
const response = await fetch(url, payload);
if (response.ok) {
//Testing for empty response
const data = await response.text();
if (data) {
return <T>JSON.parse(data);
} else {
return null;
}
} else {
throw new Error(
`Url=${url} returned non-ok status: ${response.status}. Status text:${
response.statusText
}; caller=${caller}, payload=${JSON.stringify(payload)}`
);
}
}
The log is showing the payload object as
payload={"credentials":"same-origin","method":"post","headers":{},"body":"...some json..."}
The user agent that gets the 400 is
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Any ideas?
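One thing worth ruling out first: a Headers object has no own enumerable properties, so JSON.stringify prints it as {} even when it contains entries. The "headers":{} in the log may therefore not mean that Content-Type was missing from the request. A minimal sketch of logging the real contents, assuming a runtime with a global Headers (browsers, Node 18+); describePayload is a hypothetical helper name, not something from the code above:

```javascript
// Converts a Headers instance to a plain object before serializing,
// so the log shows what the request would actually send.
function describePayload(payload) {
  const headers = payload.headers;
  const plainHeaders =
    headers && typeof headers.entries === "function"
      ? Object.fromEntries(headers.entries())
      : headers;
  return JSON.stringify({ ...payload, headers: plainHeaders });
}

const headers = new Headers();
headers.append("Content-Type", "application/json");
const payload = { credentials: "same-origin", method: "post", headers, body: "{}" };

console.log(JSON.stringify(payload)); // headers is serialized as {}
console.log(describePayload(payload)); // headers shows the content-type entry
```

This does not explain the 400 by itself. It only means the cause probably has to be looked for elsewhere, for example in the body or on the server side for that particular user.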
I'm trying to log in to Amazon via the Node.js request module, and seem to be having difficulties.
My aim is to log in to the site via their form. Here is my code:
const request = require("request");
const rp = require("request-promise");
var querystring = require("querystring");
var cookieJar = request.jar();
var mainUrl = "https://www.amazon.com/";
var loginUrl = "https://www.amazon.co.uk/ap/signin";
let req = request.defaults({
headers: {
"User-Agent":
"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.61 Safari/537.36"
},
jar: cookieJar,
gzip: true,
followAllRedirects: true
});
var loginData =
"email=email#me.com&create=0&password=password123";
req.post(loginUrl, { data: loginData }, function(err, res, body) {
console.log(body);
});
I ran a debugger in the background and found that this seemed to be the URL called. I'm wondering if anyone knows what I may have done incorrectly.
Thank you.
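A possible thing to check, as a sketch only: request takes a URL-encoded form through its form option (or through body with a matching Content-Type header), not through data, and a hand-written string is easy to get wrong once values contain characters like @ or &. The snippet below builds the body with Node's built-in querystring module. The credentials are placeholders, buildLoginBody is a hypothetical helper, and it does not cover any hidden fields the sign-in form may also require.

```javascript
const querystring = require("querystring");

// Encodes the form fields so that special characters in the
// email or password cannot break the key=value&key=value layout.
function buildLoginBody(email, password) {
  return querystring.stringify({ email, create: 0, password });
}

const body = buildLoginBody("user@example.com", "p&ss=word");
console.log(body);
```

With request, this string would go in body together with a Content-Type: application/x-www-form-urlencoded header. Alternatively, the plain object can be passed as form, and request does the encoding itself.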
I would like to make a request to https://zomato.com/ but there is no response. I am able to connect anywhere else, but with zomato I get a timeout error every time. I tried setting the user-agent, but it didn't help. I use node 6.6.0 and request 2.79.0. Any ideas?
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');
var http = require('http');
request.get({
url: 'http://zomato.com/',
headers: {
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
}
}, function(error, response, body) {
if(error) {
console.log("Error: " + error);
return;
}
else {
console.log("Status code: " + response.statusCode);
}
});
Update:
I've noticed that this:
curl -X GET "https://zomato.com/"
returns a 301 redirect.
I had some problems trying to do something similar with some websites. Try NightmareJS instead of request.
I haven't tested it with zomato, but here is the code that I used for another website:
var Nightmare = require('nightmare');
var cheerio = require('cheerio');
var website = new Nightmare()
.useragent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36")
.goto('http://zomatoorwhateverwebsite.com/')
.evaluate(function(){
return document.documentElement.innerHTML;
})
.end()
.then(function(html) {
var $ = cheerio.load(html);
//Do what you need here
})
I hope this helps. Sometimes you need to add a wait() call; check the documentation for extra functions.
If you look at the output of curl zomato.com -v, you can see that we are being redirected:
HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
So we need to add:
followAllRedirects: true,
Here:
request.get({
url: 'http://zomato.com/',
followAllRedirects: true,
headers: {
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
}