I wish to create a script that can interact with any webpage with as little delay as possible, excepting network delay. This is to be used for arbitrage trading so any tips in this direction are welcome.
My current approach has been to use Selenium with Python as I am a beginner in web scraping. However, I have investigated some approaches and from my findings it seems that Selenium has a considerable delay (See benchmark results below) I mention other things besides it in this question but the focus is on Selenium.
In general, I need to send a BUY or SELL request to whatever broker I am currently working with. Using the web browser client, I have found a few approaches to do this:
1. Actually clicking the button
a. Use Selenium to Click the button with the corresponding request
b. Use Puppeteer to Click
c. Use Pynput or other direct mouse input manipulation
2. Reverse-engineering the request and sending it directly without clicking any buttons
Now, the delay in sending this request needs to be minimized as much as possible. Network delay is out of our control for the purpose of this optimization.
I have benchmarked the approaches for 1. by opening a page in the standard way you would, either with puppeteer and selenium then waiting for a few seconds. While the script waits, I injected into the browser the following code:
$x('//*[#id="id_demo_webfront"]/iframe')[0].contentDocument.body.addEventListener('click', (data => console.log(new Date().getTime())), true);
The purpose of this code is to log in console the current time when the click is registered by the browser.
Then, I log in my python(Selenium, pynput)/javascript(Puppeteer) script the current time right before issuing the click. I am running on Ubunutu 18.04 as opposed to Windows so my system clock should have good resolution. Additional reading about this here and here.
Actual results, all were run on the same webpage multiple times, for roughly 10 clicks each run:
1. a. ~80ms
1. b. ~10-30ms
1. c. ~5-10ms
For 2., I haven't reliably tested the delay. The idea behind this approach is to inject a function that when fired will send a request that exactly resembles a request that would be sent when the button is clicked. I have not tested the delay between issuing the command to run such an injected function and it actually being run, however I expect this approach to be even faster, basically creating my own client to interact with whatever API the broker has on the backend.
From the results, it is pretty clear that issuing mouse commands seems to be the quickest, but it also is the hardest for me to implement reliably. Also, I seem to find puppeteer running faster across the board, but I prefer selenium in python for ease of development and was wondering if there are any tips and ideas to speed up those delays I am experiencing.
Summary:
Why does Selenium with Python have such a delay to issue commands and can it be improved?
Why does Puppeteer seem to have a lower delay when interacting with the same browser?
Some code snippets of my approach:
class DMMPageInterface:
# These are part of the interface script, self.page is the webpage after initialization
def __init__(self, config):
self.bid_price = self.page.get_element('xpath goes here')
...
# Mostly only the speed of this operation matters, sending the trade request
def bid_click(self):
logger.debug("Clicking bid")
if USE_MOUSE:
mouse.position = (720,390)
mouse.click(Button.left, 1)
else:
self.bid_price.click()
Quite a few questions there!
"Why does Selenium with Python have such a delay to issue commands"
I don't have any direct referencable evidence for this but the selenium delay is probably down to the amount of work it has to do. It was created for testing not for performance. The difference in that is that the most valuable part of a test is that MUST run reliably, if it's slower and does more checks in order to make it more reliable than so be it. You still get a massive performance gain over manual testing, and that's what we're replacing.
This is where you'll need the reliability - and i see you mention this need in your question too. It's great if you have a trading script that can complete an action in <1s - but what if it fails 5% of the time? How much will that cause problems?
Selenium also needs to render the GUI then, depending on how you identify your objects it needs to scan the entire DOM for what you want. Generic locators will logically take longer.
Selenium's greater power is the simplicity and the provided ability synchronisation. There's lots of talk in lots of SO articles about webdriverwait (which is great for in-page script sync), but there's also good recognition of page load wait-times. i.e. no point pushing the button before all the source is loaded.
"Why does Puppeteer seem to have a lower delay when interacting with the same browser?"
Puppeteer works with chrome devtools - selenium works with the DOM. So the two are interacting in a different way. Obviously, puppeteer is only for chrome, but selenium can go to almost any browser. It might not be a problem for you if you don't care about cross-browser capability.
However - how much faster is it really?
In your execution timings, you might also want to factor in the whole life cycle of what you aim to do - namely, include: browser load time, and page render time. From your story, you want to buy or sell something. So, you kick off the script - how long does it take from pressing GO to the end result. You most likely will need synchronisation and that's where puppeteer might not be as strong (or take you much longer to get a handle on).
You say you want to "interact with any webpage with as little delay as possible", but you have to get to the page first and get it in the right state. I wouldn't be concerned about milliseconds per action. Kicking open chrome isn't the quickest action in itself and your overall millisecond test becomes >seconds to minutes.
When you have full selenium script taking 65 seconds and a puppeteer script taking 64 seconds, the overall consideration changes :-)
Some other options to think about if you really need speed:
Try running selenium headless - no gui, less resources, runs faster. (google it) :-)
Try other browsers
If you're all about speed - Drop the GUI entirely. Create an API script or start to investigate the use of performance scripts (for example jmeter or loadrunner).
Another option that isn't selenium or puppeteer is javascript directly in the browser
Try your speed tests with other free web tools like cypress or Microsoft's playwright - they interact with JS in the browser directly and brag about some excellent inbuilt stability features.
If it was me, and it was about a single performant action - my first port of call would be doing it at the API level. There's lots of resources and practice sites out there to help get you started and there's no browser to render, there's barely anything more than the network delay. That's the real way to keep it as a sub-second script.
Final thought, when you ask "can it be improved": if you have some code, there may be other ways to make it faster. Share your script so far and i'm sure folks will love to chip in their 2 cents on how you can streamline it :-)
Related
I have a web app that basically can be looked at as a messaging system - people can submit a message, and someone else can receive it. This all works via AJAX and the Javascript front end interacts with a PHP backend. All of this works completely fine and there's no issue.
I have also implemented the Notification system that sends the desktop or android app a push notification when a new message is received. This also works completely fine.
The notification system works using setTimeout to periodically check the PHP AJAX system. But this is where the deal breaking issues arise.
When out of focus on Android, settimeout becomes completely unreliable - sometimes it will work, sometimes it will not work at all, sometimes it is very late.
To fix this, I then moved everything into a support worker as I thought that would work independent of the browser being focused, but this is even worse - seems it is even less consistent than just running settimeout on the browser.
So is there some way to rectify this? Is there some special directive within the supportworker that I can put so that it does not sleep?
thank you.
This API does not guarantee that timers will run exactly on schedule. Delays due to CPU load, other tasks, etc, are to be expected which may result in the actual delay may be longer than intended.
You can read more about the setTimeout delay and reasons it may be delayed longer here on MDN.
If you need immediate messaging like capabilities you should look into an architecture and protocol meant for such. Something like WebSockets with events would better suffice for this use-case.
I have a macro (Macro Recorder), which runs 24/7 without pause.
I need to view an item in one tab, order a specific quantity in another tab, write a message and repeat.
Can I achieve this result with Puppeteer?
What happens if I do not the close browser (because I don't need to)?
Yes, it is possible to run puppeteer 24/7 without any problems. Simply don't close the browser and keep using it. I have run puppeteer instances myself for weeks without any issues. You can see some statistics of one of the runs here.
The only thing you need to watch out for is error and memory handling. You need to make sure your script does not crash due to some minor JavaScript error. Additionally, you need to clean up any memory (by closing the page or even the browser) that might be leaked due to other kind of errors (network errors, SSL errors, timeouts, ...). See the linked run above to see a full list of errors that might happen during crawling process.
puppeteer-cluster: Pool of puppeteer instances
Depending on the complexity of your use case you might want to use the library puppeteer-cluster (disclaimer: I'm the author), which takes care of these kind of problems and also gives you a simple monitoring overview, which is very handy when running a task for a long time.
If I use Chrome Dev Tools I can do the following:
Open chrome dev tools (right click on the page in chrome => inspect)
Navigate to the "performance" tab
Click the record button
Click on a button in my web app
Stop performance recording
Then i get a nice little pie in the "Summary" tab of chrome:
My question is:
How can i start recording, stop recording and get those summary values (Loading, Scripting etc.) in javascript?
It would be really nice if someone could give me a little code example.
My question is not on how I can handle page navigation, cause for this I am using C# selenium. What I want to do is start performance recording, execute some steps with the webdriver, stop recording and measure the performance.
There are two ways you could do it:
First one:
I would recommend looking into puppeteer.
It's a project done by the guys from google chrome and it has support for tracing. As you can see here https://pptr.dev/#?product=Puppeteer&version=v1.13.0&show=api-class-tracing they have a way to retrieve the generated trace, and you should just write it to your computer to be able to use it later.
The call of tracing.start({}) uses a path which specifies the file to write the trace to.
The call of tracing.stop() can be very easily integrated with the fs library to convert the Buffer output to a file that later you can read with the chrome dev tools in case you wouldn't want to use the start function with the path parameter.
The only downside, is that you can't really reuse your Selenium script and you would have to start more or less from the scratch, even thought Puppeteer claims to be easier.
Second one (a little more difficult):
Use something similar to this library. https://github.com/paulirish/automated-chrome-profiling
It's written in JS, and it works perfectly as it's expected with the example, if you follow the installation steps of the package and then run the command node get-timeline-trace.js and load the file generated (profile-XXXXXXXX.devtools.trace) to the chrome profiler you will have a very nice report.
The only problem I see is that you will have to find a way to execute your selenium scripts passing it the chrome instance to it, and I don't know how easy that could be (maybe the PID might do?)
We use PhantomJs 2.0 to take screenshots of web pages. We've found that one particular page takes several minutes to process. This page does not appear to have this issue (or at least not of any comparable magnitude) when loaded in Chrome.
I believe that this is because the javascript is hanging/running very slowly. During the hang, Phantom is using a lot of CPU (although only one core). It does not appear to be taking up an abnormal amount of memory. I am fairly confident that javascript is the culprit because I can see from logging that all requests complete quickly, but then after the page loads Phantom hangs for awhile and won't run anything (I think this is because Phantom is all single-threaded so if the page is still running javascript my Phantom script won't run anything).
I'd like to debug and try to understand what part of the JS is taking so long, but I can't figure out how to get at this in Phantom. For example, I can't seem to collect any output from console.profile/console.profileEnd. How can I profile the javascript running in Phantom to find the bottleneck?
I use Phantomas, via grunt-phantomas. It's a tool that integrates with PhantomJS to profile a wide variety of performance-related metrics. Definitely worth checking out. If it doesn't give you exactly what you need, you can look at the source and see how they integrate with PhantomJS and get data out.
I have a script that has a few timeouts, draws a menu using Raphael and uses all sorts of dynamic properties to accomplish this. You can take a look at it over at jsfiddle: JSFiddle version of my code
My problem is that Firefox, just occasionally, throws the "a script is running slowly etc." error when this page is open, might be after half a minute or more. Typically I'll have hovered over one element or so, so one sub menu is open at this time. The error usually doesn't point to any of my js files, sometimes even Firefox's own files.
How do I debug this, is it possible at all? Any tips are appreciated. (I'm using chronos for the timers now, didn't seem to help)
Profile your code
You probably want to do Profiling, i.e. performance analysis of your code. As others have pointed out, Firebug is a good tool in Firefox. More specifically: In Firebug, click Profile in the Console tab. Then play with your navigation a bit and click Profile again to finish the analysis and get the results. They will tell you which functions were called how often and how long their execution took. This will help you identify performance bottlenecks in your code.
Optimize
On a quick glance I would say you could probably still optimize the DOM queries. Save references to the results of your expensive queries. And use contexts if possible, to make queries more efficient.
Also, requestAnimationFrame() seems to be the way to go for javascript animations now, instead of setTimeout().
Most browsers have a mechanism of some sort to allow users to stop "long running scripts".
How a script is deemed to be "long running" varies slightly between browsers, and is either the number of instructions executed within an 'execution context' or its elapsed duration.
JavaScript, when running in a browser, is predominantly event-driven (except for the immediate execution of JavaScript as the page is being parsed). The browser will wait until an execution context has completed before doing anything, and any display refreshing etc can be blocked whilst waiting for JavaScript to execute.
You basically need to break up your instructions into responses to events. If you have a massive loop anywhere (e.g. to do an animation), you really need to use an interval to break up the execution cycles. E.g. var interval = window.setInterval(refreshDisplay, 50); - this will call a refreshDisplay function (with no arguments) every 50 milliseconds (a crude 20 calls-per-second).
Try Firebug and use something like this as shown in screenshot --
Link to download firebug --
http://getfirebug.com/downloads/
firebug for debugging javascript (Firefox addon)
Use F12 to show firebug
This will help you !
With Firebug you can debug javascript with breakpoints and see all ajax calls !
There's probably some operation that's running slowly, or one of your loops has a bad complexity. You can use binary splitting: remove half of your code, see if there's a dramatic increase in timing, and do the same in the half of the code that's running the slowest.