I set up a very basic web scraper to check the stock of a specific item on Costco.com for my grandfather. It works great locally, but when I run it on Heroku it fails (seemingly 50% of the time). Here's the code for the scraper:
const request = require('request');
const cheerio = require('cheerio');

const task = () => {
  // toggle so the message isn't sent repeatedly while the item stays available
  let alreadyAvailable = false;
  let url = 'http://www.costco.com/Kirkland-Signature-Four-Piece-Urethane-Cover-Golf-Ball,-2-dozen.product.100310467.html';
  request(url, function(error, response, html) {
    // check for a network error before parsing; html is undefined in that case
    if (error) {
      throw error;
    }
    let $ = cheerio.load(html);
    if ($('#product-page #product-details #ctas #add-to-cart input[type="button"]')['0'].attribs.value === 'Out of Stock') {
      alreadyAvailable = false;
      console.log("still out of stock");
    } else {
      if (alreadyAvailable === false) {
        sendMessage();
        alreadyAvailable = true;
      }
    }
  });
};
And here are the logs:
2016-12-25T03:48:39.675549+00:00 heroku[scheduler.5440]: Starting process with command `node scraper.js`
2016-12-25T03:48:40.262503+00:00 heroku[scheduler.5440]: State changed from starting to up
2016-12-25T03:48:41.509416+00:00 app[scheduler.5440]: /app/scraper.js:34
2016-12-25T03:48:41.509432+00:00 app[scheduler.5440]: if ( $('#product-page #product-details #ctas #add-to-cart input[type="button"]')['0'].attribs.value === 'Out of Stock') {
2016-12-25T03:48:41.509433+00:00 app[scheduler.5440]: ^
2016-12-25T03:48:41.509433+00:00 app[scheduler.5440]:
2016-12-25T03:48:41.509434+00:00 app[scheduler.5440]: TypeError: Cannot read property 'attribs' of undefined
2016-12-25T03:48:41.509434+00:00 app[scheduler.5440]: at Request._callback (/app/scraper.js:34:90)
2016-12-25T03:48:41.509435+00:00 app[scheduler.5440]: at Request.self.callback (/app/node_modules/request/request.js:186:22)
2016-12-25T03:48:41.509436+00:00 app[scheduler.5440]: at emitTwo (events.js:106:13)
2016-12-25T03:48:41.509436+00:00 app[scheduler.5440]: at Request.emit (events.js:191:7)
2016-12-25T03:48:41.509436+00:00 app[scheduler.5440]: at Request.<anonymous> (/app/node_modules/request/request.js:1081:10)
2016-12-25T03:48:41.509437+00:00 app[scheduler.5440]: at emitOne (events.js:96:13)
2016-12-25T03:48:41.509437+00:00 app[scheduler.5440]: at Request.emit (events.js:188:7)
2016-12-25T03:48:41.509438+00:00 app[scheduler.5440]: at IncomingMessage.<anonymous> (/app/node_modules/request/request.js:1001:12)
2016-12-25T03:48:41.509438+00:00 app[scheduler.5440]: at IncomingMessage.g (events.js:291:16)
2016-12-25T03:48:41.509439+00:00 app[scheduler.5440]: at emitNone (events.js:91:20)
2016-12-25T03:48:41.509439+00:00 app[scheduler.5440]: at IncomingMessage.emit (events.js:185:7)
2016-12-25T03:48:41.509439+00:00 app[scheduler.5440]: at endReadableNT (_stream_readable.js:974:12)
2016-12-25T03:48:41.509440+00:00 app[scheduler.5440]: at _combinedTickCallback (internal/process/next_tick.js:74:11)
2016-12-25T03:48:41.509440+00:00 app[scheduler.5440]: at process._tickCallback (internal/process/next_tick.js:98:9)
2016-12-25T03:48:41.560539+00:00 heroku[scheduler.5440]: State changed from up to complete
2016-12-25T03:48:41.550655+00:00 heroku[scheduler.5440]: Process exited with status 1
2016-12-25T03:58:42.438807+00:00 app[api]: Starting process with command `node scraper.js` by user scheduler#addons.heroku.com
2016-12-25T03:58:43.701468+00:00 heroku[scheduler.5038]: Starting process with command `node scraper.js`
2016-12-25T03:58:44.312279+00:00 heroku[scheduler.5038]: State changed from starting to up
2016-12-25T03:58:45.769564+00:00 app[scheduler.5038]: still out of stock
2016-12-25T03:58:45.827867+00:00 heroku[scheduler.5038]: State changed from up to complete
2016-12-25T03:58:45.814921+00:00 heroku[scheduler.5038]: Process exited with status 0
You can see that sometimes I get the console log inside the if-block, and other times I get a TypeError because it's trying to read attributes from an HTML element that doesn't exist. I was thinking this might be an async issue, but I'm not sure how to go about fixing it. I assumed request wasn't running the callback until it had received all the HTML.
The problem here is what Costco's website is returning.
Your code is failing when it queries the DOM that Cheerio parsed. In your case, this means the particular HTML element you're attempting to scrape doesn't actually exist in the page you received (that's what the error is saying).
This could be caused by a few possible things:
Costco is rendering a DIFFERENT page than the one you expect (maybe it thinks you're a bot, or it's throttling you).
You are receiving a redirect or some other non-error HTTP status code, and the HTML you're looking for doesn't exist on that page.
Costco's website changes the HTML dynamically to prevent people from scraping.
What I would do if I were you is this:
Have your process log the page's full HTML each time the task runs.
The next time your process fails, copy the HTML from the Heroku logs into a local editor, and see what it returned.
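To make the failure case explicit instead of crashing on `['0'].attribs`, the callback can check the status code and whether the selector matched anything before reading attributes. This is a minimal sketch; `inspectPage` is a hypothetical helper, and it assumes `statusCode` is `response.statusCode` from request and `matches` is the array-like result of the Cheerio selector.

```javascript
// Hypothetical helper: classify a fetched page before touching its elements.
// statusCode: response.statusCode from request()
// matches: array-like result of $('...selector...') (Cheerio selections index like arrays)
function inspectPage(statusCode, matches) {
  if (statusCode !== 200) {
    return { ok: false, reason: `unexpected HTTP status ${statusCode}` };
  }
  const button = matches && matches[0];
  if (!button || !button.attribs) {
    // This is the case that produced "Cannot read property 'attribs' of undefined".
    return { ok: false, reason: 'add-to-cart button not found in page' };
  }
  return { ok: true, outOfStock: button.attribs.value === 'Out of Stock' };
}
```

Inside the request callback you would call `inspectPage(response.statusCode, $('#product-page ... input[type="button"]'))`, and when `ok` is false, `console.log` the reason together with `html` and return early, so the Heroku logs capture exactly what Costco sent.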
I'm willing to bet you will be surprised =)
Related
I tried process.kill(pid, 0) to get the status, but got:
node:internal/process/per_thread:221
throw errnoException(err, 'kill');
^
Error: kill ESRCH
at process.kill (node:internal/process/per_thread:221:13)
at Timeout._onTimeout (C:\Users\Administrator\Desktop\demo\web.js:7:43)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7) {
errno: -4040,
code: 'ESRCH',
syscall: 'kill'
}
It ran twice in total: the first call returned true, then the timer called it again, and that second call threw the error.
This method will throw an error if the target pid does not exist. As a special case, a signal of 0 can be used to test for the existence of a process. Windows platforms will throw an error if the pid is used to kill a process group.
From the above, I expected process.kill(pid, 0) to return false instead of throwing an error if the process does not exist. Could it be that signal 0 not only checks, but also kills?
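Per the documentation quoted above, signal 0 only tests for existence and nothing is delivered to the target; the method still throws when the pid is gone, and `ESRCH` means "no such process", i.e. the target had already exited by the second check. If a boolean is what you want, a minimal sketch is to catch the error; `isRunning` is a hypothetical name, and treating `EPERM` as "exists" is an assumption (the process is there, you just may not signal it).

```javascript
// Hypothetical helper: returns true if a process with this pid exists.
// Signal 0 performs only the existence/permission check; no signal is sent.
function isRunning(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    if (err.code === 'ESRCH') return false; // no such process
    if (err.code === 'EPERM') return true;  // exists, but not ours to signal
    throw err; // anything else is unexpected
  }
}
```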
I'm having trouble extracting errorMessage from this nested JSON that I get from an AWS SNS message. Part of the problem, I think, is the odd escaping and formatting it is returned in. I parsed the whole thing once to get message, from which I can read values like executionId, but JSON.parse(message.errorData) throws an error before I can continue down to Cause -> errorMessage.
{"executionId":"<<ARN REDACTED>>","stateName":"9b44bafc-d542-4533-8091-5c9a2bfd9983","startTime":"2022-04-07T16:20:59.350Z","sourceStepFunction":"<<REDACTED>>","errorData":{"Error":"Error","Cause":"{\"errorType\":\"Error\",\"errorMessage\":\"Request failed with status code 400\",\"trace\":[\"Error: Request failed with status code 400\",\" at e.exports (/var/task/index.js:2:69443)\",\" at e.exports (/var/task/index.js:2:71859)\",\" at IncomingMessage.<anonymous> (/var/task/index.js:2:62409)\",\" at IncomingMessage.emit (events.js:412:35)\",\" at IncomingMessage.emit (domain.js:475:12)\",\" at endReadableNT (internal/streams/readable.js:1334:12)\",\" at processTicksAndRejections (internal/process/task_queues.js:82:21)\"]}"}}
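In the payload above, `errorData` is already an object after the first parse, while `Cause` is a JSON document stored as a string. So `errorData` must not be parsed again (JSON.parse on an object stringifies it to "[object Object]" and throws), but `Cause` does need its own parse. A minimal sketch, assuming `message` is the object from your first `JSON.parse`; `getErrorMessage` is a hypothetical name.

```javascript
// Hypothetical helper: read errorMessage out of the nested failure payload.
// Assumes `message` is the already-parsed top-level object.
function getErrorMessage(message) {
  const { errorData } = message; // plain object: do not JSON.parse it again
  // Cause is a JSON string, so it needs one more parse.
  const cause = typeof errorData.Cause === 'string'
    ? JSON.parse(errorData.Cause)
    : errorData.Cause;
  return cause.errorMessage;
}
```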
Can anyone tell me what I should do? I get this error:
Error: 500 {"error":"[ethjs-query] while formatting outputs from RPC '{\"value\":{\"code\":-32603,\"message\":\"Too Many Requests\",\"data\":{\"originalError\":{}},\"stack\":\"Error: Too Many Requests\\n at eval (/www/node_modules/web3-provider-engine/subproviders/rpc.js:52:23)\\n at Request.eval [as _callback] (/www/node_modules/web3-provider-engine/subproviders/rpc.js:54:11)\\n at Request.self.callback (/www/node_modules/request/request.js:186:22)\\n at Request.emit (events.js:315:20)\\n at Request.eval (/www/node_modules/request/request.js:1155:10)\\n at Request.emit (events.js:315:20)\\n at IncomingMessage.eval (/www/node_modules/request/request.js:1077:12)\\n at Object.onceWrapper (events.js:421:28)\\n at IncomingMessage.emit (events.js:327:22)\\n at endReadableNT (internal/streams/readable.js:1327:12)\"}}'"}
I think you have sent too many requests in a short period of time, as the message property in the response hints: "Too Many Requests". Rate-limit cooldown times differ from service to service, but you can try waiting a couple of minutes and then retrying; if you still get the error, you might have to wait an hour, a day, and so on.
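The wait-and-retry advice can be automated with exponential backoff, so the app slows down on its own instead of failing. A minimal sketch: `withRetry`, `backoffDelay` and `isRateLimited` are hypothetical helpers (not part of web3 or ethjs), and detecting a rate limit by matching the "Too Many Requests" text is an assumption based on the error above.

```javascript
// Heuristic: the error above carries "Too Many Requests" in its message.
function isRateLimited(err) {
  return /Too Many Requests/.test(String(err && err.message));
}

// Delay doubles per attempt (base, 2*base, 4*base, ...) and is capped at maxMs.
function backoffDelay(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries fn() only on rate-limit errors, up to `retries` extra attempts.
async function withRetry(fn, { retries = 5, baseMs = 1000, maxMs = 60000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= retries) throw err;
      await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
}
```

You would wrap whichever call produces the error, e.g. `await withRetry(() => yourRpcCall())`, where `yourRpcCall` stands for your own code.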
Before posting this question, I went through a couple of SO posts with the same title. All of them suggested restarting my app. I tried that, but I am still getting the error.
The code that throws the error is:
app.post('/insert', function(request, response) {
  var connection = mySequel.createConnection(ConnectionParams);
  connection.connect();
  var UserName = request.body.UserName;
  // insert into userrecords values ("123456",DEFAULT,DEFAULT,DEFAULT,10);
  var InsertQuery = "insert into userrecords values (" + UserName + ",DEFAULT,DEFAULT,DEFAULT,-10);";
  connection.query(InsertQuery, function(error, result, fields) {
    if (error) {
      response.send("error");
      console.log(error);
      throw error;
    } else {
      // response.send("success");
      console.log(result.insertId);
      response.send(result.insertId);
    }
  });
  connection.end();
});
The error caused by the code is:
events.js:160
2017-04-26T11:42:52.410092+00:00 app[web.1]: throw er; // Unhandled 'error' event
2017-04-26T11:42:52.410093+00:00 app[web.1]: ^
2017-04-26T11:42:52.410094+00:00 app[web.1]:
2017-04-26T11:42:52.410096+00:00 app[web.1]: Error: Quit inactivity timeout
2017-04-26T11:42:52.410097+00:00 app[web.1]: at Quit.<anonymous> (/app/node_modules/mysql/lib/protocol/Protocol.js:160:17)
2017-04-26T11:42:52.410098+00:00 app[web.1]: at emitNone (events.js:86:13)
2017-04-26T11:42:52.410099+00:00 app[web.1]: at Quit.emit (events.js:185:7)
2017-04-26T11:42:52.410099+00:00 app[web.1]: at Quit._onTimeout (/app/node_modules/mysql/lib/protocol/sequences/Sequence.js:127:8)
2017-04-26T11:42:52.410100+00:00 app[web.1]: at ontimeout (timers.js:380:14)
2017-04-26T11:42:52.410101+00:00 app[web.1]: at tryOnTimeout (timers.js:244:5)
2017-04-26T11:42:52.410102+00:00 app[web.1]: at Timer.listOnTimeout (timers.js:214:5)
I am hosting my app on Heroku, and whenever I access this URL the app crashes, forcing me to restart the dyno. The data I send is successfully inserted into the DB, but the auto-incremented primary key is not returned to my app. Instead, I get a server error.
With JSON.stringify(), the correct value is returned once, but after that the app crashes, which makes subsequent requests fail.
How can I solve this problem?
Try converting your response to JSON before sending it, like this:
response.send(JSON.stringify(result.insertId));
I once had a similar problem, and I think that is how I solved it.
EDIT:
I went through the mysql npm library and found something called Pool. A pool is used when your app needs to handle more than one request at a time. I suggest you try that. There is a good example available here.
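Following the EDIT's suggestion, here is a minimal sketch of the /insert route on top of a pool. It assumes the mysql package's `mysql.createPool(ConnectionParams)` and its `pool.query(sql, values, callback)` form; `makeInsertHandler` is a hypothetical name, and the pool is passed in so the handler can be exercised without a database. It also sidesteps three problems in the original handler: the user name is concatenated into the SQL, `throw error` inside the callback crashes the dyno, and Express 4 treats a bare number passed to `res.send()` as an HTTP status code rather than a body.

```javascript
// Hypothetical handler factory; `pool` is assumed to be mysql.createPool(ConnectionParams).
function makeInsertHandler(pool) {
  return function insertHandler(request, response) {
    const userName = request.body.UserName;
    // "?" placeholder: the driver escapes the value instead of string concatenation.
    const sql = 'INSERT INTO userrecords VALUES (?, DEFAULT, DEFAULT, DEFAULT, -10)';
    // The pool checks out and releases a connection itself: no connect()/end() here.
    pool.query(sql, [userName], (error, result) => {
      if (error) {
        console.error(error);
        // Report the failure to the client instead of throwing.
        response.status(500).json({ error: 'insert failed' });
        return;
      }
      // Wrap the id in an object so it is sent as a JSON body, not a status code.
      response.json({ insertId: result.insertId });
    });
  };
}
```

It would be registered as `app.post('/insert', makeInsertHandler(pool));`.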
I had the same problem; I just downgraded to react-scripts@2.1.8.
These are the steps:
cd my-app
npm install react-scripts@2.1.8
npm start
After running node for a long time, I get this error.
My program itself works perfectly fine.
What does it actually mean? I can't find anything about it.
And most importantly, how can I prevent it?
Error: read ETIMEDOUT
at errnoException (net.js:901:11)
at TCP.onread (net.js:556:19)
--------------------
at Query.Sequence (/nodejs/node_modules/mysql/lib/protocol/sequences/Sequence.js:15:20)
at new Query (/nodejs/node_modules/mysql/lib/protocol/sequences/Query.js:12:12)
at Function.Connection.createQuery (/nodejs/node_modules/mysql/lib/Connection.js:48:10)
at Connection.query (/nodejs/node_modules/mysql/lib/Connection.js:100:26)
at Socket.<anonymous> (/nodejs/server.js:23:12)
at Socket.EventEmitter.emit [as $emit] (events.js:95:17)
at SocketNamespace.handlePacket (/nodejs/node_modules/socket.io/lib/namespace.js:335:22)
at Manager.onClientMessage (/nodejs/node_modules/socket.io/lib/manager.js:488:38)
at WebSocket.Transport.onMessage (/nodejs/node_modules/socket.io/lib/transport.js:387:20)
at Parser.<anonymous> (/nodejs/node_modules/socket.io/lib/transports/websocket/hybi-16.js:39:10)
I guess it has something to do with the mysql module I'm using, because the trace refers to Query.Sequence.
https://github.com/felixge/node-mysql
The only line I can refer to that causes this error is where I create a query:
database.query("SELECT * FROM `...