Is it possible to change the User-Agent header in a browser? - javascript

So I have this code:
async function gjReq(endpoint, data) {
    let headers = new Headers({
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": " ",
        "Accept-Encoding": "*",
        "Accept": "*/*",
        "Access-Control-Allow-Origin": "*"
    });
    let r = await fetch(`http://www.boomlings.com/database/${endpoint}.php`, {
        method: 'POST',
        mode: "no-cors",
        headers: headers,
        body: new URLSearchParams(data)
    });
    return r.text(); // return the body so the caller's .then() receives it
}
gjReq("downloadGJLevel22", {
    secret: "Wmfd2893gb7",
    levelID: 81304994
}).then(console.log)
When I run this script in VS Code, it works perfectly. However, when I run it in a browser, the server returns a 403 error.
I inspected the request in Wireshark and found that the User-Agent value is Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36, even though I explicitly specified a User-Agent.
The thing is, when the User-Agent is an empty string or just a space, everything works; whenever it is anything other than an empty string, the server returns 403 Forbidden.
Is it possible to change the User-Agent without having to change the browser settings?
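The behavior described here matches the Fetch spec: in no-cors mode the browser only lets through CORS-safelisted request-headers, and User-Agent is not on that list, so the browser substitutes its own value. A simplified sketch of the safelist check (a hypothetical helper, not a browser API; the real spec also restricts the *values* of these headers):

```javascript
// Simplified model of the Fetch spec's CORS-safelisted request-header
// check. Real browsers additionally validate header values (e.g. only
// three Content-Type values are safelisted in no-cors mode).
const SAFELISTED = new Set([
  'accept',
  'accept-language',
  'content-language',
  'content-type',
]);

function isSafelistedRequestHeader(name) {
  return SAFELISTED.has(name.toLowerCase());
}

console.log(isSafelistedRequestHeader('Content-Type')); // true
console.log(isSafelistedRequestHeader('User-Agent'));   // false
```

Any header failing this check is silently dropped from a no-cors request, which is why the browser's own User-Agent goes out on the wire.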

Related

how to webscrape the "key statistics" from morningstar.com to python and pandas

So the link to the data I want to scrape is this one:
https://www.morningstar.com/stocks/xidx/smsm/valuation
and the data I want to scrape is the table shown below:
[screenshot of the valuation table]
Please help me :(
I would like to have the table in my Jupyter notebook so I can use pandas and Python to do my stock and investing analysis.
The page loads the data through an API endpoint that requires an API key when accessed directly, but you can simulate the request the browser sends to get the same data in Python.
Use the browser's network inspector to find the exact request, copy it as cURL, and convert it to Python using an online tool.
Here is the converted request. You can get the results as JSON with response.json().
import requests

headers = {
    'authority': 'api-global.morningstar.com',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'apikey': 'lstzFDEOhfFNMLikKa0am9mgEKLBl49T',
    'origin': 'https://www.morningstar.com',
    'referer': 'https://www.morningstar.com/stocks/xidx/smsm/valuation',
    'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'x-api-realtime-e': 'eyJlbmMiOiJBMTI4R0NNIiwiYWxnIjoiUlNBLU9BRVAifQ.X-h4zn65XpjG8cZnL3e6hj8LMbzupQBglHZce7tzu-c4utCtXQ2IYoLxdik04usYRhNo74AS_2crdjLnBc_J0lFEdAPzb_OBE7HjwfRaYeNhfXIDw74QCrFGqQ5n7AtllL-vTGnqmI1S9WJhSwnIBe_yRxuXGGbIttizI5FItYY.bB3WkiuoS1xzw78w.iTqTFVbxKo4NQQsNNlbkF4tg4GCfgqdRdQXN8zQU3QYhbHc-XDusH1jFii3-_-AIsqpHaP7ilG9aBxzoK7KPPfK3apcoMS6fDM3QLRSZzjkBoxWK75FtrQMAN5-LecdJk97xaXEciS0QqqBqNugoSPwoiZMazHX3rr7L5jPM-ecXN2uEjbSR0wfg-57iHAku8jvThz4mtGpMRAOil9iZaL6iRQ.o6tR6kuOQBhnpcsdTQeZWw',
    'x-api-requestid': 'cdbb5a73-9654-4b31-a845-32844eb44ca8',
    'x-sal-contenttype': 'e7FDDltrTy+tA2HnLovvGL0LFMwT+KkEptGju5wXVTU=',
}
params = {
    'languageId': 'en',
    'locale': 'en',
    'clientId': 'MDC',
    'component': 'sal-components-valuation',
    'version': '3.79.0',
}
response = requests.get(
    'https://api-global.morningstar.com/sal-service/v1/stock/valuation/v3/0P0000BPTU',
    params=params,
    headers=headers,
)

403 Forbidden when using Cheerio

I'm trying to webscrape a website so I can gather some information for a project. Here is my code; it returns 403 in the console. I'm using request and cheerio to do this. Why is this happening? Note: I do know what the majority of status codes mean.
const request = require('request');
const cheerio = require('cheerio');

request('http://www.realmeye.com/forum/', function(err, resp, html) {
    if (!err) {
        const gatherInformation = cheerio.load(html);
        console.log(html);
    }
})
You should add a "User-Agent" header to the request that matches some browser (e.g. Chrome); the server probably checks it to reject unfamiliar clients.
A rule of thumb for web scraping:
Use Chrome dev tools / Fiddler / another similar tool to inspect the request fired from your client (Chrome, Firefox, etc.) before trying to reproduce it in your framework (inspect headers, cookies, etc.).
The raw request I saw in Fiddler in your case (when hitting your URL in Chrome):
GET /forum/ HTTP/1.1
Host: www.realmeye.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
Sec-Fetch-Mode: same-origin
Sec-Fetch-Site: same-origin
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,he;q=0.8
Most servers check the "Accept" and "User-Agent" headers before returning a 200 OK response.
The fixed code snippet:
const request = require('request');
const cheerio = require('cheerio');

let options = {
    url: 'https://www.realmeye.com/forum/',
    headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
    }
};

request(options, function(err, resp, html) {
    if (!err) {
        const gatherInformation = cheerio.load(html);
        console.log(html);
    }
})
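As a side note, the `request` package has since been deprecated; the same idea works with Node 18+'s built-in fetch. A sketch, assuming Node 18+ and using a hypothetical `fetchForum` helper:

```javascript
// Same approach with Node's built-in fetch: send browser-like Accept
// and User-Agent headers so the server treats the client as a browser.
const browserHeaders = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
};

async function fetchForum(url) {
  const resp = await fetch(url, { headers: browserHeaders });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return resp.text();
}

// Usage (performs a network call, so commented out here):
// fetchForum('https://www.realmeye.com/forum/').then(html => console.log(html.length));
```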

Outlook login script in python3: Javascript not enabled - cannot login

I am currently trying to log into an Outlook account using only requests in Python. I did the same thing with Selenium before, but because of better performance and easier proxy support I would like to use requests now. The problem is that whenever I send a POST request to the Outlook post URL, it returns a page saying that it will not function without JavaScript.
I read on here that I would have to replicate the requests the JavaScript makes, so I used the network analysis tool in Firefox and made requests to all the URLs the browser made requests to. It still returns the same page, saying that JS is not enabled and it will not function without JS.
import json
from time import sleep

import requests

s = requests.Session()
s.headers.update({
    "Accept": "application/json",
    "Accept-Language": "en-US",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Host": "www.login.live.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
})
url = "https://login.live.com/"
r = s.get(url, allow_redirects=True)
# Simulate JS
sim0 = s.get("https://logincdn.msauth.net/16.000.28299.9
# ...requests 1-8 here
sim9 = s.get("https://logincdn.msauth.net/16.000.28299.9/images/Backgrounds/0.jpg?x=a5dbd4393ff6a725c7e62b61df7e72f0", allow_redirects=True)
post_url = "https://login.live.com/ppsecure/post.srf?contextid=EFE1326315AF30F4&bk=1567009199&uaid=880f460b700f4da9b692953f54786e1c&pid=0"
payload = {"username": username}
sleep(4)
s.cookies.update({"logonLatency": "LGN01=637025992217600244"})
s.cookies.update({"MSCC": "1567001943"})
s.cookies.update({"CkTst": "G1567004031095"})
print(s.cookies.get_dict())
print("---------------------------")
print(json.dumps(payload))
print("---------------------------")
rp = s.post(post_url, data=json.dumps(payload), allow_redirects=True)
rg = s.get("https://logincdn.msauth.net/16.000.28299.9/images/arrow_left.svg?x=a9cc2824ef3517b6c4160dcf8ff7d410")
print(rp.text)
print(rp.status_code)
If anyone could hint me in the right direction on how to fix this I would highly appreciate it! Thanks in advance

Request header not set as expected when using 'no-cors' mode with fetch API

I have a fetch where the request's content type seems to be changing, which is messing up my POST. I submit my basic form (one field only). Here is the fetch.
handleSubmit(event, data) {
    //alert('A name was submitted: ' + this.state.value);
    event.preventDefault();
    console.log("SUBMIT STATE::", this.state.value);
    return (
        fetch("//localhost:5000/api/values/dui/", {
            method: "post",
            mode: 'no-cors',
            headers: {
                'Access-Control-Allow-Origin': '*',
                'Content-Type': 'application/json',
                'Accept': 'application/json',
            },
            body: JSON.stringify({
                name: this.state.value,
            })
        }).then(response => {
            if (response.status >= 400) {
                this.setState({
                    value: 'no greeting - status > 400'
                });
                throw new Error('no greeting - throw');
            }
            return response.text()
        }).then(data => {
            var myData = JSON.parse(data);
            this.setState({
                greeting: myData.name,
                path: myData.link
            });
        }).catch(() => {
            this.setState({
                value: 'no greeting - cb catch'
            })
        })
    );
}
But when I look at this in fiddler content-type is now 'content-type: text/plain;charset=UTF-8'. Here is the raw Fiddler:
POST http://localhost:5000/api/values/dui/ HTTP/1.1
Host: localhost:5000
Connection: keep-alive
Content-Length: 16
accept: application/json
Origin: http://evil.com/
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
content-type: text/plain;charset=UTF-8
Referer: http://localhost:3000/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
{"name":"molly"}
In DOM Inspector I just see:
POST http://localhost:5000/api/values/dui/ 415 (Unsupported Media Type)
I also find it strange that 'accept' is lowercase, as is 'content-type'. Any reason why this is happening? I haven't found anything specific in my searches yet.
When no-cors mode is set for a request, browsers won’t allow you to set any request headers other than CORS-safelisted request-headers. See the spec requirements about adding headers:
To append a name/value (name/value) pair to a Headers object (headers), run these steps:
Otherwise, if guard is "request-no-cors" and name/value is not a CORS-safelisted request-header, return.
In that algorithm, return equates to “return without adding that header to the Headers object”.
And the reason it’s instead getting set to text/plain;charset=UTF-8 is because the algorithm for the request constructor calls into an extract a body algorithm which includes this step:
Switch on object’s type:
↪ USVString
Set Content-Type to text/plain;charset=UTF-8.
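This default is observable outside the browser as well; assuming Node 18+ (whose fetch implementation follows the same spec), constructing a Request with a plain string body shows it directly:

```javascript
// A string body makes the Request constructor fall back to
// text/plain;charset=UTF-8 when no Content-Type header survives
// (JSON.stringify returns a string, i.e. a USVString to the spec).
const req = new Request('http://localhost:5000/api/values/dui/', {
  method: 'POST',
  body: JSON.stringify({ name: 'molly' }),
});

console.log(req.headers.get('content-type')); // text/plain;charset=UTF-8
```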
Here is what resolved the issue: I switched 'no-cors' to 'cors'. Frankly, I thought I had flip-flopped these before because of cross-origin issues I was having between my local development workstation and the server I was deploying to, but needless to say, when I set this back to mode: 'cors', it all worked again, on both the local workstation and the server. Why that changes the actual request header, I'm not sure. If anyone has answers for that I'll gladly upvote.
Thanks.

Why can I not see my HTTP headers in Chrome?

I am performing a fetch:
fetch(url, fetchOptions);
fetchOptions is configured like so:
var fetchOptions = {
    method: options.method,
    headers: getHeaders(),
    mode: 'no-cors',
    cache: 'no-cache',
};

function getHeaders() {
    var headers = new Headers(); // Headers is part of the fetch API.
    headers.append('User-ID', 'foo');
    return headers;
}
Checking fetchOptions at runtime it looks as follows:
fetchOptions.headers.keys().next() // Object {done: false, value: "user-id"}
fetchOptions.headers.values().next() // Object {done: false, value: "foo"}
But user-id is nowhere to be found in the request headers per Chrome dev tools:
GET /whatever?a=long_name&searchTerm=g HTTP/1.1
Host: host:8787
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
accept: application/json
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Referer: http://localhost:23900/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Why can I not see my "User-ID" header in Chrome dev tools, and why does the header key appear to have been lowercased?
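The lowercasing, at least, is expected: the Fetch API treats header names case-insensitively and serializes them in lowercase, which can be seen in any environment that provides the Headers class (e.g. Node 18+):

```javascript
// Header names are case-insensitive; Headers stores and serializes
// them in lowercase, while lookups accept any casing.
const headers = new Headers();
headers.append('User-ID', 'foo');

console.log([...headers.keys()]);    // [ 'user-id' ]
console.log(headers.get('User-ID')); // foo
```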
In case someone else has a similar problem, there were two possible culprits for this:
I might not have been starting Chrome with the correct flags.
The following didn't work when run from a Windows shortcut:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" -enable-precise-memory-info -disable-web-security
The following did work when run from a Windows command prompt:
C:\whatever\48.0.2564.82\application\chrome.exe --disable-web-security --user-data-dir=C:\whatever\tmp\chrome
The no-cors mode on the request means the browser silently drops any non-safelisted header such as User-ID (note that no-cors itself never triggers a preflight; it is a custom header in cors mode that would cause an OPTIONS request to precede the GET, which the server would need to support).
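For reference, in cors mode a cross-origin request triggers an OPTIONS preflight only when it uses a non-simple method or a non-safelisted header (such as the custom User-ID above). A rough sketch of that rule (simplified; real browsers also check header values):

```javascript
// Simplified model of when a cross-origin request in cors mode
// requires an OPTIONS preflight before the actual request.
const SIMPLE_METHODS = new Set(['GET', 'HEAD', 'POST']);
const SAFELISTED_HEADERS = new Set([
  'accept', 'accept-language', 'content-language', 'content-type',
]);

function needsPreflight(method, headerNames) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  return headerNames.some(h => !SAFELISTED_HEADERS.has(h.toLowerCase()));
}

console.log(needsPreflight('GET', ['Accept']));  // false
console.log(needsPreflight('GET', ['User-ID'])); // true
```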
