Code example:
fetch('https://httpbin.org/get', {
'headers': {
'Date': (new Date()).toUTCString(),
}
})
Response:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
"Connection": "close",
"Host": "httpbin.org",
"Origin": "http://localhost:8000",
"Referer": "http://localhost:8000/",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36 OPR/58.0.3135.53"
},
"origin": "146.120.13.65",
"url": "https://httpbin.org/get"
}
Date is listed in the forbidden header names in the fetch spec.
These are forbidden so the user agent remains in full control over them.
Accept-Charset
Accept-Encoding
Access-Control-Request-Headers
Access-Control-Request-Method
Connection
Content-Length
Cookie
Cookie2
Date
DNT
Expect
Host
Keep-Alive
Origin
Referer
TE
Trailer
Transfer-Encoding
Upgrade
Via
Related
I'm trying to use UrlfetchApp to send a request to a page. Request is good. But returning
<p>Your browser is currently set to block cookies. Please enable cookies in your browser preferences and try again.</p>
in the html body.
Here is the code:
const res1 = UrlFetchApp.fetch('https://url.com', {
method: 'POST',
headers: {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Content-Type': 'application/x-www-form-urlencoded',
'Cookie': 'JSESSIONID=Dfaefaefaesfsaefgr',
'Origin': 'https://url.com',
'Referer': 'https://url.com',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-User': '?1',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"'
},
payload: 'payloaddata'});
I don't know how to turn on the cookie when using UrlFetch function. Does anyone has the same issue?
Thanks for any suggestions.
What I have tried:
Seperate Cookie property from headers into a json object.
cookies = {
'JSESSIONID': '778C494754356A23F080849C10F2A851',
'TS01ee6e39': '018c1954d58470c0adb8d6d0df850dba9363aa54c16fc4653b2178278f62a5f0dde96d5780b3338857c8ea3ff92c6e99b9bd3a5867cef327bd847ac05ab9242a7414fa0832',
'X-HR-ClientSessionId': '10_107.162.4.39_1651626597548',
'locale': 'en',
'TS0189a565': '018c1954d5ae31eb4b4d18a85d43414cdcd9158bcc6fc4653b2178278f62a5f0dde96d578016166d716abd84dc80b66117923674fd68b4dd222e33d15361b818320e9dbd03333b4682ebc552fb370cd462eb3e5d2b5cc5273e789e87f2bd772fa800fa9e77744d459169cf3d8594422a3d7ae7968ba33103e373fbcf2c83f38da92d9643e5f9ef6a925938338bef881e38e827bf0a5635126e7297731cc06d71fd1a883702',}
But this does not work, request immediately return session expired. Please try again.
Edit on 05/07/2022:
I'm trying to get the JSESSIONID from fetch request, I can find this object on the browser, But could not find it on the urlfetch return response.
Endpoint: https://quizlet.com/webapi/3.2/images/search?query=hello&perPage=2
You guys can try to access this page as Incognito, from my side It works. So I think I can fetch data from that site.
I try to copy the request and run in Javascirpt, Python. However, It doesn't work. I got 403 error.
I also try to use Burp Suite. I can't access this site through Burp's browser.
Moreover, As I tried using incognito so I don't think it is relevant to cookies.
Code sample (JS):
import fetch from "node-fetch";
const response = await fetch(
"https://quizlet.com/webapi/3.2/images/search?query=hello&perPage=2",
{
headers: {
accept:
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"accept-language": "en",
"cache-control": "no-cache",
pragma: "no-cache",
"sec-ch-ua":
'"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"Linux"',
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
},
referrerPolicy: "strict-origin-when-cross-origin",
body: null,
method: "GET",
mode: "cors",
credentials: "include",
}
);
const data = await response.status;
console.log(data);
Code Python
import requests
headers = {
'authority': 'quizlet.com',
'pragma': 'no-cache',
'cache-control': 'no-cache',
'sec-ch-ua': '"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Linux"',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-fetch-site': 'none',
'sec-fetch-mode': 'navigate',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'document',
'accept-language': 'en',
'cookie': 'qi5=i2x3g7y1z9a6%3At3vMoQQig2yLcpN.HKWn; qtkn=7gT4DE7pN9URJ2AFDYeaVe; fs=qzkse0; app_session_id=9781a407-4f37-4c09-8e97-8156f182bb45; search_session=%7B%22search_session_id%22%3A%22-2379864199063990974614477b859794%22%2C%22query%22%3A%22overrated%22%2C%22version%22%3A%221.1.1%22%2C%22platform%22%3A%22WEB%22%2C%22depth%22%3Anull%2C%22target_object_type%22%3A%22QImage%22%7D; __cf_bm=cB7hRf6JbcOFZ2kvQ3W12V4bxXiIgn_kF3n87RcI0h0-1631877048-0-Ac+Hi0pATLgW5N3JjqYa7uc5W4ZfDLOumvmCQixWJIKdcVj7stciFh8cYFVTOpr+q5pM2Q7LrXC/LsffOB6Mh2E=; __cfruid=81f16a673e6117331dd4270b3f4f29111590d7d8-1631877048',
}
params = (
('query', 'hello'),
('perPage', '2'),
)
response = requests.get(
'https://quizlet.com/webapi/3.2/images/search', headers=headers, params=params)
# NB. Original query string below. It seems impossible to parse and
# reproduce query strings 100% accurately so the one below is given
# in case the reproduced version is not "correct".
# response = requests.get('https://quizlet.com/webapi/3.2/images/search?query=hello&perPage=2', headers=headers)
print(response.status_code)
Please help me out. I don't even know how can be that? (browser works, while code doesn't). Thank anyway.
From the python side. I had a look out of interest, as I'm currently developing a REST API and was curious how they where securing it.
Using Wireshark it appears that the "requests" module in python does not handle http requests in the same manor as Chrome/Firefox, which I suspect they are using as a tell to give a captcha.
Anyway switching requests for the httpx module;
pip install httpx
https://www.python-httpx.org/
And changing the headers to replicate Firefox in full;
import httpx
headers = [
('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
('Accept-Encoding','gzip, deflate, br'),
('Accept-Language','en-GB,en;q=0.5'),
('Cache-Control','max-age=0'),
('Connection','keep-alive'),
('Host','quizlet.com'),
('Sec-Fetch-Dest','document'),
('Sec-Fetch-Mode','navigate'),
('Sec-Fetch-Site','none'),
('Sec-Fetch-User','?1'),
('TE','trailers'),
('Upgrade-Insecure-Requests','1'),
('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'),
]
params = (
('query', 'hello'),
('perPage', '2'),
)
response = httpx.get('https://quizlet.com/webapi/3.2/images/search', headers=headers, params=params,)
print(response.content)
Gives the following as appose to the captcha page for me;
{
"responses": [{
"models": {
"image": [{
"id": 18957872,
"personId": 16641862,
"timestamp": 1416579222,
"lastModified": 1416579222,
"code": "Gfg5XS88MRmYq8RS",
"license": 1,
"width": 480,
"height": 360,
"flickrId": null,
"flickrOwner": null,
"_legacyUrl": "http://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA.gif",
"_legacyUrlSquare": "http://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_s.gif",
"_legacyUrlSmall": "http://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_m.gif",
"_secureLegacyUrl": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA.gif",
"_secureLegacyUrlLarge": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_b.gif",
"_secureLegacyUrlSquare": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_s.gif",
"_secureLegacyUrlSmall": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_m.gif"
}, {
"id": 9228314,
"personId": 513525,
"timestamp": 1406222781,
"lastModified": 1406222781,
"code": "bPHbzaV7KsGWfuXJ",
"license": 1,
"width": 298,
"height": 232,
"flickrId": null,
"flickrOwner": null,
"_legacyUrl": "http://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA.jpg",
"_legacyUrlSquare": "http://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_s.jpg",
"_legacyUrlSmall": "http://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_m.jpg",
"_secureLegacyUrl": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA.jpg",
"_secureLegacyUrlLarge": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_b.jpg",
"_secureLegacyUrlSquare": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_s.jpg",
"_secureLegacyUrlSmall": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_m.jpg"
}]
},
"paging": {
"total": 50,
"page": 1,
"perPage": 2,
"token": "UuKKKAkmxv.r4YtwFDuRevZVGAHr"
}
}]
}
When making an axios call in my google content script that's being executed on http://example.com, the response status is a 200 and I'd expect a json response but instead the data is an empty string like this:
response = {data: "", status: 200, statusText: "", headers: {…}, config: {…}, …}
content.bundle.js
const url = "https://jsonplaceholder.typicode.com/todos/1";
var config = {
headers: {
'Content-Type': 'application/json',
'X-Requested-With': 'XMLHttpRequest',
},
}
axios.get(url, config).then((response) => {
debugger;
console.log(response);
});
manifest.json
"content_scripts": [
{
"matches": ["<all_urls>"],
"js": ["content.bundle.js"],
"css": ["content-styles.css"]
}
],
If you're interested in the headers
Request URL: https://jsonplaceholder.typicode.com/todos/1
Request Method: GET
Status Code: 200
Remote Address: 104.24.99.239:443
Referrer Policy: no-referrer-when-downgrade
access-control-allow-credentials: true
Provisional headers are shown
Accept: application/json, text/plain, */*
Referer: http://example.com/
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36
X-Requested-With: XMLHttpRequest
I'm able to get around the issue by sending a message to the background script, ideally could avoid this extra step.
Thoughts? Happy to provide more info if needed.
Thanks
I tried to scrape this page using beautifulsoup find method but I could not find the table value in the HTML page. I found out that the website is generating the data instantly when I load the page through an internal API. Any help??
Thanks in advance.
This works for me. I had to dig around in the dev tools but found it
import requests
geturl=r'https://www.barchart.com/futures/quotes/CLJ19/all-futures'
apiurl=r'https://www.barchart.com/proxies/core-api/v1/quotes/get'
getheaders={
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}
getpay={
'page': 'all'
}
s=requests.Session()
r=s.get(geturl,params=getpay, headers=getheaders)
headers={
'accept': 'application/json',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'referer': 'https://www.barchart.com/futures/quotes/CLJ19/all-futures?page=all',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
}
payload={
'fields': 'symbol,contractSymbol,lastPrice,priceChange,openPrice,highPrice,lowPrice,previousPrice,volume,openInterest,tradeTime,symbolCode,symbolType,hasOptions',
'list': 'futures.contractInRoot',
'root': 'CL',
'meta': 'field.shortName,field.type,field.description',
'hasOptions': 'true',
'raw': '1'
}
r=s.get(apiurl,params=payload,headers=headers)
j=r.json()
print(j)
>{'count': 108, 'total': 108, 'data': [{'symbol': 'CLY00', 'contractSymbol': 'CLY00 (Cash)', ........
I wrote something in Python and am trying to figure out why the hell the seemingly equivalent code in JS isn't working.
Working Python -
Headers used:
self.session = requests.Session()
#Set headers
self.headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Connection': 'keep-alive',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.8',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
}
Code:
link = 'https://www.kith.com/cart'
data = [
('updates'+'['+'888074764295'+']', '1'),
('updates'+'['+'888463982599'+']', '0'),
]
click = self.session.post(link, headers= self.headers, data=data, verify = False)
Not working JS -
const secondaryVar = `updates[888463982599]`;
const desiredVariant = `updates[888074764295]`;
const checkoutForm = {};
checkoutForm[desiredVariant] = '1';
checkoutForm[secondaryVar] = '0';
//Post request to cart to update it with desired product
request({
url: 'https://www.kith.com/cart',
followAllRedirects: true,
method: 'post',
formData: checkoutForm,
headers : {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'en-US,en;q=0.9',
'Cache-Control':'max-age=0',
'Connection':'keep-alive',
'Upgrade-Insecure-Requests':'1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
},
},
function(err, res, body) {
I've narrowed it down to this bit of code, but as far as I can tell there is no significant difference between the code in Python and the JS code. My guess is it has something to do with the session or headers...but again I don't know.
Thanks for any responses
I think the Python might not respect cors, which would explain the difference. I don't know what JavaScript framework you are using, but using jQuery, the following works when executing this code from the kith.com website.
To avoid any issues with CORS, I removed the headers that are automatically set by the browser, and I change the url from www.kith.com to kith.com.
jQuery.ajax("https://kith.com/cart", settings={method:"post", headers : {
'Accept':'application/json',
'Accept-Language':'en-US,en;q=0.9',
'Cache-Control':'max-age=0',
'Upgrade-Insecure-Requests':'1',
}, data:{"desiredVariant":1,"secondaryVar":0}}).error(function(err){console.log("error"+ err)}).success(function(res){console.log(res)})