I have a Node function that takes a link to a YouTube video and sends a request to https://www.convertmp3.io/ , a website that lets you download MP3s from YouTube.
It then parses the response HTML to get a direct download link from the document (using a library called "cheerio"; if you're not familiar with it, it's just for scraping the link from the HTML), and afterwards opens the link to download the MP3.
My code:
const request = require("request")
const cheerio = require("cheerio")
let link = "https://www.convertmp3.io/download/?video=https://www.youtube.com/watch?v=RRSDTE5nWnc"
const options = {
  url: link,
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
  }
};
request(options, (error, response, html) => {
  if (!error && response.statusCode == 200) {
    const $ = cheerio.load(html)
    const document = $('.infoBox')
    let href = "https://www.convertmp3.io" + document.find("#download").attr('href')
    console.log(href)
  }
})
On my machine, this code works correctly: console.log prints a link which I can open in my browser, and the MP3 actually downloads.
But when I run it on Lambda, the link I get has the correct format but simply doesn't work. It redirects to some non-existent domain.
I'm not entirely sure how this is even possible. The only thing I can think of is that the website detects that the program is a bot and returns a wrong link (which sounds pretty bizarre). That is why I also send User-Agent headers, but it didn't help.
I'm really confused about how this can be possible and don't know what else to try. Any thoughts?
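One detail worth ruling out when comparing the two environments: if the #download element is missing from the HTML that Lambda receives, attr('href') returns undefined, and the string concatenation silently produces a host such as "www.convertmp3.ioundefined". Below is a minimal sketch that resolves the href against the site origin and fails loudly instead. The names buildDownloadUrl and BASE are hypothetical, and this does not by itself explain the Lambda behaviour.

```javascript
// Origin taken from the code above; BASE is a hypothetical constant name.
const BASE = 'https://www.convertmp3.io';

// Resolve the scraped href against the site origin.
// Throws when the href is missing, instead of building "...ioundefined".
function buildDownloadUrl(href) {
  if (typeof href !== 'string' || href.length === 0) {
    throw new Error('No download link found in the response HTML');
  }
  return new URL(href, BASE).href;
}
```

Inside the request callback this would replace the concatenation, e.g. buildDownloadUrl(document.find("#download").attr('href')), so a missing element shows up as an error in the Lambda logs.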
I am running a local Spring Boot server that, when I access it locally in the browser, returns a valid, properly formatted JSON object (I verified this with a JSON formatter).
I am also locally running a React application using Node. I am attempting to use fetch() to get that JSON object back and am running into issues. I finally got around the CORS header issues, but now cannot figure out why the JSON object isn't coming back. Here's my code:
var headers = new Headers();
headers.append("Content-type", "application/json;charset=UTF-8");
var myInit = {
  method: 'GET',
  headers: headers,
  mode: 'no-cors',
  cache: 'default',
};
fetch(`http://localhost:3010/getJSON`, myInit)
  .then(function(response){
    console.log(response.data);
    console.log(response);
    console.log(JSON.parse(JSON.stringify(response)));
  },function(error){
    console.log(error);
  });
So when I run this in Chrome with the debugger, the responses to the 3 log statements are:
1st logger
undefined
2nd logger
Response {type: "opaque", url: "", redirected: false, status: 0, ok: false, …}
body: (...)
bodyUsed: false
headers: Headers {}
ok: false
redirected: false
status: 0
statusText: ""
type: "opaque"
url: ""
__proto__: Response
3rd logger
{}
I have tried many combinations of JSON.parse, JSON.stringify, etc., to no avail.
The next confusing part is that if I go to the "Network" tab in the Chrome debugger and click on the /getJSON request, it shows me the entire JSON object just fine in both the "Preview" and "Response" tabs. So clearly Chrome is connecting to it correctly. Here's Chrome's "Headers" tab within "Network":
Request URL:http://localhost:3010/getJSON
Request Method:GET
Status Code:200
Remote Address:[::1]:3010
Referrer Policy:no-referrer-when-downgrade
Response Headers
Content-Type:application/json;charset=UTF-8
Date:Thu, 12 Oct 2017 16:05:05 GMT
Transfer-Encoding:chunked
Request Headers
Accept:*/*
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Host:localhost:3010
Referer:http://localhost:3000/
User-Agent:Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
I have tried to mimic these headers in my request, but I'm not sure how mine differ. Any help would be greatly appreciated, as I am currently banging my head against the wall with this!
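You're getting an opaque response, which tells me that you haven't completely resolved the CORS headers situation: a cross-origin request made with mode: 'no-cors' returns an opaque response whose status is 0 and whose body JavaScript cannot read. If you're fetching from the client, I would suggest proxying the request through your Node.js server, so that instead of calling your Spring Boot service directly, the client calls Node, which gets rid of the CORS issues.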
EDIT
You could create something like this:
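import express from 'express';
import request from 'request';
const router = express.Router();
router.get('/proxyname', (req, res) => {
  // Replace the placeholder with the URL of your Spring Boot endpoint
  const requestUrl = "[your service's endpoint]";
  request(requestUrl, (err, apiResponse, body) => {
    // On a network error apiResponse is undefined, so report a gateway error instead
    if (err) {
      res.status(502).send(err.message);
      return;
    }
    res.status(apiResponse.statusCode);
    try {
      // Forward the body as JSON when it parses
      res.json(JSON.parse(body));
    } catch (e) {
      // Otherwise forward the raw body unchanged
      res.send(body);
    }
  });
});
export default router;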
and then on your nodejs server file, add it, like this:
import proxy from '[path to proxy file above]';
app.use('/path-to-endpoint', proxy);
and then call that from the client instead of your SpringBoot service.
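On the client side, the body of a fetch Response is read with response.json() (there is no response.data property, which is why the first log statement printed undefined). Below is a minimal sketch of that step, assuming the proxy is mounted as shown above so the request is same-origin. readJson is a hypothetical helper name.

```javascript
// Reads a fetch Response as JSON.
// Rejects opaque responses (body not readable) and non-2xx statuses.
async function readJson(response) {
  if (response.type === 'opaque') {
    throw new Error('Opaque response: the body is not readable (mode "no-cors")');
  }
  if (!response.ok) {
    throw new Error('HTTP ' + response.status);
  }
  // response.json() returns a promise for the parsed body
  return response.json();
}

// Usage from the React app, using the route names from the proxy example:
// fetch('/path-to-endpoint/proxyname')
//   .then(readJson)
//   .then(function (data) { console.log(data); })
//   .catch(function (error) { console.log(error); });
```

Note that the mode: 'no-cors' option is dropped here; with it, the response stays opaque no matter how it is parsed.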
I'm encountering a 403 status code with an UnrecognizedClientException in the x-amzn-errortype response header when making an API Gateway GET request using the generated JavaScript SDK. The resource being called uses IAM auth, which differentiates the user's role based on their user group.
Here is my API Client Initialize Function
function initializeAPIClient(accessKey, secretKey, sessionToken){
  var config = {
    region : region,
    accessKey : accessKey,
    secretKey : secretKey,
    sessionToken : sessionToken
  }
  apigClient = apigClientFactory.newClient(config);
}
Here is my GET request Function
function testCall(){
  var params = '';
  var body = '';
  var additionalParams = '';
  apigClient.testCallGet(params, body, additionalParams)
    .then(function(result){
      alert("Permissions are available to this user.");
    })
    .catch(function(result){
      alert("Permissions are NOT available to this user.");
    });
}
Here are my request headers:
:authority:[API_ENDPOINT]
:method:GET
:path:/[STAGE]/[RESOURCE]
:scheme:https
accept:application/json
accept-encoding:gzip, deflate, sdch, br
accept-language:en-US,en;q=0.8
authorization:AWS4-HMAC-SHA256 Credential=[ACCESS_KEY_ID]/20170406/[REGION]/execute-api/aws4_request, SignedHeaders=accept;host;x-amz-date, Signature=[SIGNATURE]
origin:http://localhost:8000
referer:http://localhost:8000/php/[PAGE].php/?username=[USERNAME]&sessionToken=[SESSION_TOKEN]
user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
x-amz-date:20170406T180808Z
x-amz-security-token:[SESSION_TOKEN]
I'm not sure what could be causing this. The solutions recommended when I search UnrecognizedClientException seem to suggest doing what I'm already doing.
I've solved my own issue, so here's the answer for anybody who runs into a similar logic error. Do NOT use the ID token as your session token, which is what I was doing. The ID token is what you exchange for the temporary credentials (the access key, the secret key, and the session token); it is not itself the session token. Do not confuse the two.
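A small guard can catch this mix-up before the request is signed. Below is a heuristic sketch, assuming the ID token is a JWT (three dot-separated base64url segments whose first segment starts with "eyJ"). looksLikeJwt and buildApigConfig are hypothetical helper names, not part of the generated SDK.

```javascript
// Heuristic check: JWTs (such as ID tokens) are three base64url segments
// separated by dots, and the first segment starts with "eyJ" (base64 of '{"').
function looksLikeJwt(token) {
  return /^eyJ[\w-]*\.[\w-]+\.[\w-]*$/.test(String(token));
}

// Builds the config object passed to apigClientFactory.newClient(),
// rejecting a session token that looks like an ID token.
function buildApigConfig(region, accessKey, secretKey, sessionToken) {
  if (looksLikeJwt(sessionToken)) {
    throw new Error('sessionToken looks like a JWT (ID token), not temporary AWS credentials');
  }
  return {
    region: region,
    accessKey: accessKey,
    secretKey: secretKey,
    sessionToken: sessionToken
  };
}
```

In initializeAPIClient this would be used as apigClientFactory.newClient(buildApigConfig(region, accessKey, secretKey, sessionToken)), so the mistake surfaces as a clear error rather than a 403 from API Gateway.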
I'm looking at my balance on Venmo.com but they only show you 3 months at a time and I'd like to get my entire transaction history.
Looking at the Chrome Developer Tools, under the network tab, I can see the request to https://api.venmo.com/v1/transaction-history?start_date=2017-01-01&end_date=2017-01-31 which returns JSON.
I'd like to programmatically iterate through time, make several requests, and aggregate all of the transactions. However, I keep getting 401 Unauthorized.
My initial approach was just using Node.js. I looked at the cookie in the request and copied it into a secret.txt file and then sent the request:
import fetch from 'node-fetch'
import fs from 'fs-promise'
async function main() {
  try {
    const cookie = await fs.readFile('secret.txt')
    const options = {
      headers: {
        'Cookie': cookie,
      },
    }
    try {
      const response = await fetch('https://api.venmo.com/v1/transaction-history?start_date=2016-11-08&end_date=2017-02-08', options)
      console.log(response)
    } catch(e) {
      console.error(e)
    }
  } catch(e) {
    console.error('please put your cookie in a file called `secret.txt`')
    return
  }
}
That didn't work, so I tried copying all of the headers over:
const cookie = await fs.readFile('secret.txt')
const options = {
  headers: {
    'Accept-Encoding': 'gzip, deflate, sdch, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Cookie': cookie,
    'Host': 'api.venmo.com',
    'Origin': 'https://venmo.com',
    'Pragma': 'no-cache',
    'Referer': 'https://venmo.com/account/settings/balance/statement?end=02-08-2017&start=11-08-2016',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36',
  },
}
try {
  const response = await fetch('https://api.venmo.com/v1/transaction-history?start_date=2016-11-08&end_date=2017-02-08', options)
  console.log(response)
} catch(e) {
  console.error(e)
}
This also did not work.
I even tried making the request from the console of the website and got a 401:
fetch('https://api.venmo.com/v1/transaction-history?start_date=2016-11-08&end_date=2017-02-08', {credentials: 'same-origin'}).then(console.log)
So my question here is this: I see a network request in Chrome Developer Tools. How can I make that same request programmatically? Preferably in Node.js or Python so I can write an automated script.
In the Network tab of the Chrome Developer Tools, right click the request and click "Copy" > "Copy as cURL (bash)". You can then either write a script using the curl command directly, or use https://curlconverter.com/ to convert the cURL command to Python, JavaScript, PHP, R, Go, Rust, Elixir, Java, MATLAB, Dart or JSON.
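Once a single copied request succeeds, the "iterate through time" part from the question could be sketched as below. This only generates the per-month URLs in the shape seen in the Network tab. monthlyRanges and buildUrls are hypothetical helper names, and whether the endpoint accepts these windows with your copied headers is an assumption to verify.

```javascript
// Split an inclusive range of calendar months into one window per month,
// formatted as YYYY-MM-DD strings like those in the observed request.
// Months are 1-based (1 = January).
function monthlyRanges(startYear, startMonth, endYear, endMonth) {
  const ranges = [];
  let year = startYear;
  let month = startMonth;
  while (year < endYear || (year === endYear && month <= endMonth)) {
    // Day 0 of the following month is the last day of this month.
    const lastDay = new Date(Date.UTC(year, month, 0)).getUTCDate();
    const mm = String(month).padStart(2, '0');
    ranges.push({
      start_date: `${year}-${mm}-01`,
      end_date: `${year}-${mm}-${String(lastDay).padStart(2, '0')}`,
    });
    month += 1;
    if (month > 12) {
      month = 1;
      year += 1;
    }
  }
  return ranges;
}

// Turn each window into a request URL for the endpoint from the question.
function buildUrls(ranges) {
  return ranges.map((range) => {
    const url = new URL('https://api.venmo.com/v1/transaction-history');
    url.searchParams.set('start_date', range.start_date);
    url.searchParams.set('end_date', range.end_date);
    return url.toString();
  });
}
```

Each URL can then be requested in turn with the headers from the copied cURL command, and the JSON results concatenated.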
I want to programmatically find a list of URLs for similar images given an image URL. I can't find any free image search APIs, so I'm trying to do this by scraping Google's Search by Image.
If I have an image URL, say http://i.imgur.com/oLmwq.png, then navigating to https://www.google.com/searchbyimage?&image_url=http://i.imgur.com/oLmwq.png gives related images and info.
How do I get jsdom.env to produce the HTML your browser gets from the above URL?
Here's what I've tried (CoffeeScript):
jsdom = require 'jsdom'
url = 'https://www.google.com/searchbyimage?&image_url=http://i.imgur.com/oLmwq.png'
jsdom.env
  html: url
  scripts: [ "http://code.jquery.com/jquery.js" ]
  features:
    FetchExternalResources: ['script']
    ProcessExternalResources: ['script']
  done: (errors, window) ->
    console.log window.$('body').html()
You can see the HTML doesn't match what we want. Is this an issue with Jsdom's HTTP headers?
I find request + cheerio to be easier than jsdom for tasks like this. I see that you've found an answer already, but thought I'd mention it as an alternative solution.
Example:
var request = require('request'),
    cheerio = require('cheerio');
var google = 'https://www.google.com/searchbyimage';
var image = 'http://i.imgur.com/oLmwq.png';
var options = {
  url: google,
  qs: { image_url: image },
  headers: { 'user-agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11' }
};
request(options, function (err, res, body) {
  var $ = cheerio.load(body);
  …
});
The issue is Jsdom's User-Agent HTTP header. Once that is set everything (almost) works:
jsdom = require 'jsdom'
url = 'https://www.google.com/searchbyimage?&image_url=http://i.imgur.com/oLmwq.png'
jsdom.env
  html: url
  headers:
    'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'
  scripts: [ "http://code.jquery.com/jquery.js" ]
  features:
    FetchExternalResources: ['script']
    ProcessExternalResources: ['script']
  done: (errors, window) ->
    $ = window.$
    $('#iur img').parent().each (index, elem) ->
      href = $(elem).attr 'href'
      url = href.split('?')[1].split('&')[0].split('=')[1]
      console.log url
Which gives us a nice list of visually similar images. The only problem now is Jsdom throws an error after returning the result:
timers.js:103
if (!process.listeners('uncaughtException').length) throw e;
^
TypeError: Cannot call method 'call' of undefined
at new <anonymous> (/project-root/node_modules/jsdom/lib/jsdom/browser/index.js:54:13)
at _.Zl (https://www.google.com/xjs/_/js/s/c,sb,cr,cdos,jsa,ssb,sf,tbpr,tbui,rsn,qi,ob,mb,lc,hv,cfm,klc,kat,aut,esp,bihu,amcl,kp,lu,m,rtis,shb,sfa,hsm,pcc,csi/rt=j/ver=3w99aWPP0po.en_US./d=1/sv=1/rs=AItRSTPrAylXrfkOPyRRY-YioThBMqxW2A:1238:93)
at _.jm (https://www.google.com/xjs/_/js/s/c,sb,cr,cdos,jsa,ssb,sf,tbpr,tbui,rsn,qi,ob,mb,lc,hv,cfm,klc,kat,aut,esp,bihu,amcl,kp,lu,m,rtis,shb,sfa,hsm,pcc,csi/rt=j/ver=3w99aWPP0po.en_US./d=1/sv=1/rs=AItRSTPrAylXrfkOPyRRY-YioThBMqxW2A:1239:399)
at _.km (https://www.google.com/xjs/_/js/s/c,sb,cr,cdos,jsa,ssb,sf,tbpr,tbui,rsn,qi,ob,mb,lc,hv,cfm,klc,kat,aut,esp,bihu,amcl,kp,lu,m,rtis,shb,sfa,hsm,pcc,csi/rt=j/ver=3w99aWPP0po.en_US./d=1/sv=1/rs=AItRSTPrAylXrfkOPyRRY-YioThBMqxW2A:1241:146)
at Object._onTimeout (https://www.google.com/xjs/_/js/s/c,sb,cr,cdos,jsa,ssb,sf,tbpr,tbui,rsn,qi,ob,mb,lc,hv,cfm,klc,kat,aut,esp,bihu,amcl,kp,lu,m,rtis,shb,sfa,hsm,pcc,csi/rt=j/ver=3w99aWPP0po.en_US./d=1/sv=1/rs=AItRSTPrAylXrfkOPyRRY-YioThBMqxW2A:1248:727)
at Timer.list.ontimeout (timers.js:101:19)
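As a side note on the extraction step: the chained split calls return the raw, still percent-encoded value and cut it short if the value itself contains an "=". Below is a minimal sketch of the same "first query parameter" extraction using the built-in URL class, written in JavaScript rather than CoffeeScript. firstQueryValue is a hypothetical helper name, and the example href in the usage note is an assumed shape, not taken from Google's actual markup.

```javascript
// Returns the decoded value of the first query parameter in an href,
// or null when the href has no query string.
function firstQueryValue(href) {
  // The placeholder base lets relative hrefs parse; it is never requested.
  const url = new URL(href, 'http://example.invalid');
  const first = url.searchParams.entries().next();
  return first.done ? null : first.value[1];
}
```

For an href shaped like '/imgres?imgurl=http%3A%2F%2Fexample.com%2Fa.png&other=1', this returns the decoded 'http://example.com/a.png'.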