We are looking for a JavaScript API to screen-scrape a page flow, including button clicks. If it were on the server side, Selenium WebDriver would have been a great choice, but we want the screen scraping to run in the client browser. The screens to be scraped form a transaction in themselves (login to a third-party website, transaction step 1, step 2 and then final confirmation). Is any JavaScript API available?
AFAIK, neither Node.js nor PhantomJS has the capability to click a button on the scraped page.
thanks in advance,
abbas
WebDriver is an HTTP-based protocol that every browser speaks, so it is possible to control one browser from another. I wrote a tutorial on that topic a few weeks ago here
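For illustration, here is a rough sketch of what driving a second browser over the raw WebDriver HTTP protocol looks like from JavaScript. The port assumes a chromedriver instance already running on localhost:9515, and the URL and selector are placeholders; note that a page loaded in a normal browser will usually be blocked by CORS from making these calls, which is why this is typically driven from Node or an extension instead.

// Rough sketch: raw W3C WebDriver calls from JavaScript (fetch is available
// in modern browsers and Node 18+). Assumes chromedriver on localhost:9515.
const BASE = 'http://localhost:9515';

async function wd(method, path, body) {
  const res = await fetch(BASE + path, {
    method,
    headers: { 'Content-Type': 'application/json' },
    body: body ? JSON.stringify(body) : undefined,
  });
  return (await res.json()).value;
}

async function clickThrough() {
  // Start a session in the controlled browser.
  const session = await wd('POST', '/session', {
    capabilities: { alwaysMatch: { browserName: 'chrome' } },
  });
  const sid = session.sessionId;

  // Placeholder URL and selector for the login step of the transaction.
  await wd('POST', `/session/${sid}/url`, { url: 'https://example.com/login' });
  const found = await wd('POST', `/session/${sid}/element`, {
    using: 'css selector',
    value: '#submit',
  });
  const elementId = Object.values(found)[0]; // W3C element reference
  await wd('POST', `/session/${sid}/element/${elementId}/click`, {});

  await wd('DELETE', `/session/${sid}`);
}

clickThrough().catch(console.error);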
I recently ran into DalekJS (http://dalekjs.com/docs/actions.html), which allows taking screenshots of pages and clicking on elements as well. I think it even supports multiple browsers -- although they have to be installed (http://dalekjs.com/pages/documentation.html#browsers).
Here is the sample code directly from the link:
test.open('http://doctorwhotv.co.uk/')
    .assert.text('#nav').is('Navigation')
    .assert.visible('#nav')
    .assert.attr('#nav', 'data-nav', 'true')
    .click('#nav')
    .done();
I want to take a screenshot of the webpage so that it can be shared on various social media platforms. The webpage has visual elements displayed in an iframe that come from a different domain (not same-origin).
So far I have checked:
html2canvas, but it looks like there is a limitation: the screenshot would not be generated properly if there are visual elements on the page loaded from a different origin.
getUserMedia, but it looks like this triggers a capture of the user's camera rather than of the webpage. Here is a link to a CodePen I found: https://codepen.io/jgalazm/pen/bGEgEGW
Here are a couple of questions:
Is my understanding correct that getUserMedia cannot be used to capture a webpage screenshot?
What other alternatives can I use to capture the webpage screenshot?
Thanks.
What I've found works best is to handle this server-side.
You use Selenium with Chrome/Firefox/PhantomJS to render the page and then take the screenshot through the Selenium API.
Selenium is primarily Java or Python:
https://pythonspot.com/selenium-take-screenshot/
If JavaScript is more your thing, then opt for Puppeteer:
https://buddy.works/guides/how-take-screenshots-with-puppeteer-headless-chrome
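As a rough sketch of the Puppeteer route (the URL and output path are placeholders; install with npm install puppeteer first):

// Minimal Puppeteer screenshot sketch (Node.js).
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 800 });
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  // The capture happens at the browser level, so cross-origin iframes are
  // rendered too, unlike html2canvas running inside the page.
  await page.screenshot({ path: 'screenshot.png', fullPage: true });
  await browser.close();
})();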
Deployment
If I knew more about your backend then I could answer more thoroughly.
If you are using a cloud provider, I'd recommend a serverless function.
If you have a Node server, then Puppeteer.
If you have a Python server, then Selenium.
I have six devices on/above my desk:
A 4k TV
An Ultrawide monitor
Two laptop computers
An iPad
An Android phone
All are connected to the internet and browsing mydomain.org/page, an HTML page that I am editing on one of the laptops, which has a full page of code. When I press a button on my Wacom tablet that I rerouted through Wacom smart actions to run a script, my code is uploaded over FTP to my server. Right now, I have to reload every page on every device to see the updated results. I want to write a JavaScript file to run on my site that exposes a global LiveLoad() function. When I execute this function from the DevTools command line, the script stores a cookie marking the device as a debug device.
All pages with the script will then show a small icon over the page; when it is set to 'live', the script opens a connection over secure WebSockets to an update server that holds up to 10 connections (perhaps only from approved IPs, so that only I can use this on my live site), each registered to the page it is browsing. When I update mydomain.org/page or other pages, my upload script securely opens a connection and POSTs a secret code to the update server, telling it to send a message over WebSockets to all connected debug devices with that page open. On fast internet, this truly will be the ultimate website-building setup if I can just overcome this design hurdle.
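To make the client half concrete, here is a hypothetical sketch of what LiveLoad() and the page script could look like; the wss:// endpoint and the message shapes are assumptions for illustration, not an existing API.

// Hypothetical client-side sketch. Marks the device as a debug device and
// listens on a WebSocket for "update" messages for the current page.
function LiveLoad() {
  document.cookie = 'liveload=1; path=/; max-age=31536000';
  connectLiveLoad();
}

function connectLiveLoad() {
  // Assumed endpoint exposed by the single server-side script.
  const socket = new WebSocket('wss://mydomain.org/liveload');

  socket.addEventListener('open', () => {
    // Register which page this debug device is viewing.
    socket.send(JSON.stringify({ type: 'register', page: location.pathname }));
  });

  socket.addEventListener('message', (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'update' && msg.page === location.pathname) {
      location.reload();
    }
  });
}

// Reconnect automatically on devices already marked as debug devices.
if (document.cookie.includes('liveload=1')) {
  connectLiveLoad();
}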
What I don't want:
Anything more than a single PHP script for the server-side implementation. Shared hosting. No root access. No Node. No fancy crap. Single PHP files are the only option.
Bloated JavaScript add-ons. The bare minimum code for receiving a single type of message (an update is needed) is all I need. Perhaps later I can make it more robust with a second type of update (a hard update), where a PHP script processes all the fonts, scripts, images, etc. on a page and adds a random query string after them to force a hard reload of all page resources.
How can I achieve this? Where do I start with PHP WebSockets? The internet has proven to be a cynical wasteland, ranging from bloated PHP libraries that require installation in a Node environment to freelancers struggling to write scripts off the 2013 WebSockets API documentation, with no good, simple solutions around.
I have to scrape a website which I've reviewed, and I realised that I don't need to submit any form: I already have the URLs needed to get the data.
I'm using Node.js and Phantom.
The source of my problem is something related to the session or cookies (I think).
In my web browser I can open this link https://www.infosubvenciones.es/bdnstrans/GE/es/convocatorias and hit the blue form button labelled "Procesar consulta". The table below then gets filled. In dev tools, on the Network tab, you can see an XHR request to a URL similar to https://www.infosubvenciones.es/bdnstrans/busqueda?type=convs&_search=false&nd=1594848133517&rows=50&page=1&sidx=4&sord=desc; if you open it in a new tab, the data is displayed. But if you open that link in another web browser, you get 0 results.
That's exactly what is happening to me with Node.js and Phantom, and I don't know how to fix it.
If you want to give Scrapy a try, https://docs.scrapy.org/en/latest/topics/dynamic-content.html explains how to deal with this type of scenario, and I would suggest reading it after completing the tutorial.
The page can also be handy if you use another scraping framework, as there's not much there that is Scrapy-specific, and for the Python-specific bits I'm sure there are JavaScript counterparts.
As for Cheerio and Phantom, I’m not familiar with them, but it is most likely doable with them as well.
It’s doable with any web client, it’s just a matter of knowing how to use the tool for this purpose. Most of the work involves using your web browser tools to understand how the website works underneath.
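Along those lines, here is a minimal sketch of the session/cookie idea in Node.js (18+, built-in fetch), assuming the busqueda endpoint simply needs the session cookie that is set when the search page is first visited; the extra headers and the JSON response format are guesses you would confirm in the Network tab.

// Sketch: visit the search page to obtain the session cookie, then replay the
// XHR with that cookie attached.
const PAGE_URL = 'https://www.infosubvenciones.es/bdnstrans/GE/es/convocatorias';
const DATA_URL = 'https://www.infosubvenciones.es/bdnstrans/busqueda' +
  `?type=convs&_search=false&nd=${Date.now()}&rows=50&page=1&sidx=4&sord=desc`;

async function fetchResults() {
  // 1. Load the page so the server issues its session cookie(s).
  const pageRes = await fetch(PAGE_URL);
  const cookies = pageRes.headers.getSetCookie()      // Node >= 18.14
    .map((c) => c.split(';')[0])
    .join('; ');

  // 2. Replay the XHR with the same cookie and XHR-style headers.
  const dataRes = await fetch(DATA_URL, {
    headers: {
      Cookie: cookies,
      'X-Requested-With': 'XMLHttpRequest',
      Referer: PAGE_URL,
    },
  });
  return dataRes.json();
}

fetchResults().then((data) => console.log(data)).catch(console.error);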
I'm creating a personal home page because iGoogle is going to be discontinued. One of the things I'm trying to create is a speed-dial-type interface with website thumbnails as links, and I'd like to automate this process.
I attempted screenshot automation a few years back with Linux and the WebKit engine, and that works fine. But my problem is that I want the screenshots to come from my browser, i.e. my Gmail inbox, not the login page I'd get if attempting a remote screenshot.
I thought of using html2canvas, but again I'd have to load the source of the webpages remotely using a proxy, and that's not what I want. Another attempt of mine was to load the website in an iframe, extract the source, and pass it on to html2canvas. Unfortunately most websites like Google, Facebook, etc. don't allow embedding their sites in iframes, so I'm still stuck.
How do plugins like FoxTab and SpeedDial take the screenshots from within the browser without popups etc.? They do it "browser-side" silently; is it possible to duplicate this using just JavaScript? Or could I accomplish the same in another way, perhaps with a custom add-on or something?
Have you considered using a service like http://webthumbnail.org/ ?
http://phantomjs.org/ is also a great tool for that if you want to do it yourself.
Take a look at PhantomJS. We use it to take screenshots of all our hosted sites periodically. PhantomJS is a headless WebKit implementation.
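A minimal PhantomJS script for that looks roughly like this (the URL, viewport and output filename are placeholders; run it with phantomjs thumbnail.js). Note that PhantomJS runs outside your browser profile, so it will see the logged-out version of sites like Gmail unless you feed it your cookies.

// thumbnail.js -- PhantomJS screenshot sketch.
var page = require('webpage').create();
page.viewportSize = { width: 1280, height: 800 };

page.open('https://example.com', function (status) {
  if (status !== 'success') {
    console.log('Failed to load page');
    phantom.exit(1);
  } else {
    page.render('thumbnail.png'); // writes a PNG of the rendered page
    phantom.exit();
  }
});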
I'm putting together a C# WinForms application, and it would be good for someone to be able to click a webpage link that automatically brings up my C# application and passes some data to it. Pretty much like some webpages that have a song link which automatically opens iTunes, and then iTunes searches for the song details you passed.
Q1 - How does one do this in HTML/JavaScript?
Q2 - Does this approach only work on certain browsers?
Q3 - Would this only work on Windows? (I just need it for windows myself)
Thanks
You can register a URL protocol handler (see), which allows you to specify a unique URL protocol, and you can make clickable links in web pages which will spawn a new application, passing it the complete URL. Be careful, though, because this mechanism has been mis-implemented a number of times, which can expose you to exploitation.
Also browsers will normally warn you if you are trying to use one of these odd URLs. And this will only work on Windows (but there are alternatives on other OSes).
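For the web side, the link itself is plain HTML/JavaScript; "myapp" below is a hypothetical scheme that you would register on Windows (under HKEY_CLASSES_ROOT) to point at your executable, which then receives the full URL as a command-line argument.

// Hypothetical example of launching a registered protocol handler from a page.
function openInMyApp(songTitle) {
  // Navigating to the custom scheme triggers the registered application.
  window.location.href = 'myapp://search?song=' + encodeURIComponent(songTitle);
}

// Or as a plain link in the markup:
// <a href="myapp://search?song=Yesterday">Open in MyApp</a>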
You would have to associate a new file type with your C# application. The web page could "launch" such a file by downloading it.
I believe you'd have to pass parameters by writing them to the file to be downloaded.
It's true that there would be a "run or save" prompt, but aside from that, I think this would be the simplest method, and the one that would be easiest to maintain.
My first reaction would be that you'd have to create some kind of browser plugin first that would act as the middleman between your JavaScript and your C# application. This is because website JavaScript and other code runs in a limited security context and cannot access privileged resources such as other applications, named pipes, TCP ports, etc.