I have a macro (Macro Recorder), which runs 24/7 without pause.
I need to view an item in one tab, order a specific quantity in another tab, write a message and repeat.
Can I achieve this result with Puppeteer?
What happens if I do not the close browser (because I don't need to)?
Yes, it is possible to run puppeteer 24/7 without any problems. Simply don't close the browser and keep using it. I have run puppeteer instances myself for weeks without any issues. You can see some statistics of one of the runs here.
The only thing you need to watch out for is error and memory handling. You need to make sure your script does not crash due to some minor JavaScript error. Additionally, you need to clean up any memory (by closing the page or even the browser) that might be leaked due to other kind of errors (network errors, SSL errors, timeouts, ...). See the linked run above to see a full list of errors that might happen during crawling process.
puppeteer-cluster: Pool of puppeteer instances
Depending on the complexity of your use case you might want to use the library puppeteer-cluster (disclaimer: I'm the author), which takes care of these kind of problems and also gives you a simple monitoring overview, which is very handy when running a task for a long time.
Related
I wish to create a script that can interact with any webpage with as little delay as possible, excepting network delay. This is to be used for arbitrage trading so any tips in this direction are welcome.
My current approach has been to use Selenium with Python as I am a beginner in web scraping. However, I have investigated some approaches and from my findings it seems that Selenium has a considerable delay (See benchmark results below) I mention other things besides it in this question but the focus is on Selenium.
In general, I need to send a BUY or SELL request to whatever broker I am currently working with. Using the web browser client, I have found a few approaches to do this:
1. Actually clicking the button
a. Use Selenium to Click the button with the corresponding request
b. Use Puppeteer to Click
c. Use Pynput or other direct mouse input manipulation
2. Reverse-engineering the request and sending it directly without clicking any buttons
Now, the delay in sending this request needs to be minimized as much as possible. Network delay is out of our control for the purpose of this optimization.
I have benchmarked the approaches for 1. by opening a page in the standard way you would, either with puppeteer and selenium then waiting for a few seconds. While the script waits, I injected into the browser the following code:
$x('//*[#id="id_demo_webfront"]/iframe')[0].contentDocument.body.addEventListener('click', (data => console.log(new Date().getTime())), true);
The purpose of this code is to log in console the current time when the click is registered by the browser.
Then, I log in my python(Selenium, pynput)/javascript(Puppeteer) script the current time right before issuing the click. I am running on Ubunutu 18.04 as opposed to Windows so my system clock should have good resolution. Additional reading about this here and here.
Actual results, all were run on the same webpage multiple times, for roughly 10 clicks each run:
1. a. ~80ms
1. b. ~10-30ms
1. c. ~5-10ms
For 2., I haven't reliably tested the delay. The idea behind this approach is to inject a function that when fired will send a request that exactly resembles a request that would be sent when the button is clicked. I have not tested the delay between issuing the command to run such an injected function and it actually being run, however I expect this approach to be even faster, basically creating my own client to interact with whatever API the broker has on the backend.
From the results, it is pretty clear that issuing mouse commands seems to be the quickest, but it also is the hardest for me to implement reliably. Also, I seem to find puppeteer running faster across the board, but I prefer selenium in python for ease of development and was wondering if there are any tips and ideas to speed up those delays I am experiencing.
Summary:
Why does Selenium with Python have such a delay to issue commands and can it be improved?
Why does Puppeteer seem to have a lower delay when interacting with the same browser?
Some code snippets of my approach:
class DMMPageInterface:
# These are part of the interface script, self.page is the webpage after initialization
def __init__(self, config):
self.bid_price = self.page.get_element('xpath goes here')
...
# Mostly only the speed of this operation matters, sending the trade request
def bid_click(self):
logger.debug("Clicking bid")
if USE_MOUSE:
mouse.position = (720,390)
mouse.click(Button.left, 1)
else:
self.bid_price.click()
Quite a few questions there!
"Why does Selenium with Python have such a delay to issue commands"
I don't have any direct referencable evidence for this but the selenium delay is probably down to the amount of work it has to do. It was created for testing not for performance. The difference in that is that the most valuable part of a test is that MUST run reliably, if it's slower and does more checks in order to make it more reliable than so be it. You still get a massive performance gain over manual testing, and that's what we're replacing.
This is where you'll need the reliability - and i see you mention this need in your question too. It's great if you have a trading script that can complete an action in <1s - but what if it fails 5% of the time? How much will that cause problems?
Selenium also needs to render the GUI then, depending on how you identify your objects it needs to scan the entire DOM for what you want. Generic locators will logically take longer.
Selenium's greater power is the simplicity and the provided ability synchronisation. There's lots of talk in lots of SO articles about webdriverwait (which is great for in-page script sync), but there's also good recognition of page load wait-times. i.e. no point pushing the button before all the source is loaded.
"Why does Puppeteer seem to have a lower delay when interacting with the same browser?"
Puppeteer works with chrome devtools - selenium works with the DOM. So the two are interacting in a different way. Obviously, puppeteer is only for chrome, but selenium can go to almost any browser. It might not be a problem for you if you don't care about cross-browser capability.
However - how much faster is it really?
In your execution timings, you might also want to factor in the whole life cycle of what you aim to do - namely, include: browser load time, and page render time. From your story, you want to buy or sell something. So, you kick off the script - how long does it take from pressing GO to the end result. You most likely will need synchronisation and that's where puppeteer might not be as strong (or take you much longer to get a handle on).
You say you want to "interact with any webpage with as little delay as possible", but you have to get to the page first and get it in the right state. I wouldn't be concerned about milliseconds per action. Kicking open chrome isn't the quickest action in itself and your overall millisecond test becomes >seconds to minutes.
When you have full selenium script taking 65 seconds and a puppeteer script taking 64 seconds, the overall consideration changes :-)
Some other options to think about if you really need speed:
Try running selenium headless - no gui, less resources, runs faster. (google it) :-)
Try other browsers
If you're all about speed - Drop the GUI entirely. Create an API script or start to investigate the use of performance scripts (for example jmeter or loadrunner).
Another option that isn't selenium or puppeteer is javascript directly in the browser
Try your speed tests with other free web tools like cypress or Microsoft's playwright - they interact with JS in the browser directly and brag about some excellent inbuilt stability features.
If it was me, and it was about a single performant action - my first port of call would be doing it at the API level. There's lots of resources and practice sites out there to help get you started and there's no browser to render, there's barely anything more than the network delay. That's the real way to keep it as a sub-second script.
Final thought, when you ask "can it be improved": if you have some code, there may be other ways to make it faster. Share your script so far and i'm sure folks will love to chip in their 2 cents on how you can streamline it :-)
I'm developping a NodeJS application, and sometimes I get my CPU usage at 100%, with no memory variation (I check this with pm2 monit) and with my Event Loop blocked by this operation.
So I remember that browsers do something interesting when a script seems to have a problem, and they show a popup like: "Do you want to continue running script on this page or stop it".
Is there a way to have the same behavior with NodeJS? I want to detect an operation which takes too many time, and trigger something that I can catch by pm2 to force restart, or (better) by node-inspector where it could break on the bad script and so I could understand what is the problem.
Thanks a lot.
your can stop the node server or check the task list for windows and stop the services whatever services you want.
I have a script that has a few timeouts, draws a menu using Raphael and uses all sorts of dynamic properties to accomplish this. You can take a look at it over at jsfiddle: JSFiddle version of my code
My problem is that Firefox, just occasionally, throws the "a script is running slowly etc." error when this page is open, might be after half a minute or more. Typically I'll have hovered over one element or so, so one sub menu is open at this time. The error usually doesn't point to any of my js files, sometimes even Firefox's own files.
How do I debug this, is it possible at all? Any tips are appreciated. (I'm using chronos for the timers now, didn't seem to help)
Profile your code
You probably want to do Profiling, i.e. performance analysis of your code. As others have pointed out, Firebug is a good tool in Firefox. More specifically: In Firebug, click Profile in the Console tab. Then play with your navigation a bit and click Profile again to finish the analysis and get the results. They will tell you which functions were called how often and how long their execution took. This will help you identify performance bottlenecks in your code.
Optimize
On a quick glance I would say you could probably still optimize the DOM queries. Save references to the results of your expensive queries. And use contexts if possible, to make queries more efficient.
Also, requestAnimationFrame() seems to be the way to go for javascript animations now, instead of setTimeout().
Most browsers have a mechanism of some sort to allow users to stop "long running scripts".
How a script is deemed to be "long running" varies slightly between browsers, and is either the number of instructions executed within an 'execution context' or its elapsed duration.
JavaScript, when running in a browser, is predominantly event-driven (except for the immediate execution of JavaScript as the page is being parsed). The browser will wait until an execution context has completed before doing anything, and any display refreshing etc can be blocked whilst waiting for JavaScript to execute.
You basically need to break up your instructions into responses to events. If you have a massive loop anywhere (e.g. to do an animation), you really need to use an interval to break up the execution cycles. E.g. var interval = window.setInterval(refreshDisplay, 50); - this will call a refreshDisplay function (with no arguments) every 50 milliseconds (a crude 20 calls-per-second).
Try Firebug and use something like this as shown in screenshot --
Link to download firebug --
http://getfirebug.com/downloads/
firebug for debugging javascript (Firefox addon)
Use F12 to show firebug
This will help you !
With Firebug you can debug javascript with breakpoints and see all ajax calls !
There's probably some operation that's running slowly, or one of your loops has a bad complexity. You can use binary splitting: remove half of your code, see if there's a dramatic increase in timing, and do the same in the half of the code that's running the slowest.
I have a web portal cricandcric.com which I have done using the php, java script, and mysql.
I dont see the java script error in Firefox, but i see the error in IE,
I observe the site loads faster in Firefox than in IE.
So my question, does java script errors can slow down the website loading time, even though the java script placed at the end of the page, (Yslow Strategies)
It depends. If the error happens early on and a lot of your script code is bypassed, it could actually make it faster. But every time an error occurs, there is some overhead (the exception object has to be built and sent up the call stack to look for any catches), so if it happens at the end, the script would run slower.
But I doubt your change in load time is noticeably impacted by script errors. How long the script takes to execute on the browser's JS engine or a host of other factors will have more impact.
IE's javascript engine has always been notably behind the other common browsers in performance, so it really might come down to that more than anything else. One of the many improvements in IE9 is JS execution speed that's actually competitive.
That said, the JS error probably is worth looking into, since it's occurring every time the image slideshow advances which happens once every couple of seconds.
If you're concerned about performance in general, there are a couple of tools like YSlow and the recently-opensourced DOM Monster bookmarklet, for giving suggestions on general ways to speed up websites. You might also want to check out some of the writings of Steve Souders.
I have a page that has a byzantine amount of JavaScript code running. In Internet Explorer only, and only version 8, I get a long-script warning that I can reliably reproduce. I suspect it is event handlers triggering themselves in an infinite loop.
The developer tools are limping horribly under the weight of the script running, but I do seem to be able to get the log to tell me what line of script it was executing when I aborted, but it is inevitably some of the deep plumbing of the ExtJS code we use, and I can't tell where it is in my stack of code.
A way of seeing the call stack would work, but preferably I'd like to be able to just break into the debugger when I get the long script warning so I can just step through the stack.
There is a similar question posted, but the answers given were for a not-the-right-tool, or the not terribly helpful advice to eliminate half my code at a time on a binary hunt for the infinite loop. If my code were simple enough that I could do that, it probably wouldn't have gotten the infinite loop in the first place. If I could reproduce the problem in Firebug, I'd probably be a lot happier too.
Here is what I would do:
Go to http://www.microsoft.com/whdc/devtools/debugging/default.mspx and install the Debugging Tools for Windows. You want to run WinDBG when this is installed.
Follow the steps outlined at http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx#a to setup the symbol server connection and have the symbols automatically downloaded to your local drive (c:\websymbols -- or whatever).
Run IEXPLORE.EXE under WinDBG. The help file should give you assistance in doing this if necessary. You need a couple of commands once you get Internet Explorer running and such. First, go ahead and get that large script going.
Break into the debugger (CTRL-SCROLLLOCK to break).
a. Do a LN to "list nearest" to get the DLL files that are loaded. Hopefully, you'll have JSCRIPT.DLL loaded in memory.
b. Type .reload /f to force the reloading of all of the symbols. This will take a while. Now, after this is done, type LN again and you should see that the proper JSCRIPT.PDB has been downloaded to your system in the symbols directory you setup earlier.
Depending on what you want to do, you may need to restart the debugger, but you can do this: After the initial break on WINDBG load, you can type "sxe ld jscript.dll" and it will break when jscript.dll loads.
This is the tricky part, because once this loads, you don't have the code for jscript.dll, but you have the proper symbols (if they are not loaded, then reload them with .reload /f). You can view the functions available by typing "x!jscript" and you'll get a full list of all of the functions and variables.
Pick one, set a break point, and then you should be able to track what is happening to your script.
If nothing else is accomplished, by using the .reload /f process, you can get the appropriate jscript.pdb files loaded on your system. It's possible you could use these in conjunction with Visual Studio to do additional debugging in that manner, but I'm not so sure how well that will work.
I've run in to this before and have had some luck with enabling the developer tools along with Visual Studio. When an error is encountered the page loading is halted, and I could then load up Visual Studio to see the specific line causing the trouble.
This site has some information on using Visual Studio along with the Internet Explorer debugger: Using Visual Studio to Debug JavaScript in IE