PyQt4: Trigger click on a javascript link - javascript

I am trying to scrape some web pages in Python where the information is generated by JavaScript.
I managed to retrieve the information generated on page load by using a headless browser with PyQt4 (example here: http://blog.motane.lu/2009/07/07/downloading-a-pages-content-with-python-and-webkit/).
But now, I'm trying to retrieve some information that is generated by having the user click on a Javascript link.
How can I do that?
Thanks

I guess you need the Form Extractor example. The trick is that you can expose any Python object to JavaScript and call its methods. A Pythonic version of this example can be found in the PyQt distribution.
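As a rough illustration of triggering the click from Python, the sketch below takes a simpler route than the Form Extractor example: instead of exposing a Python object to the page, it builds a small JavaScript snippet that dispatches a synthetic click event and hands it to the frame's evaluateJavaScript method. The names build_click_script and click_link and the selector a#more are made up for illustration; frame is assumed to be a QWebFrame, for example page.mainFrame() from the headless-browser setup described in the question.

```python
import json

# JavaScript template: find the first element matching a CSS selector and
# dispatch a synthetic, bubbling, cancelable "click" event on it.
_CLICK_TEMPLATE = """(function () {
    var el = document.querySelector(%s);
    if (!el) { return false; }
    var ev = document.createEvent('MouseEvents');
    ev.initEvent('click', true, true);
    el.dispatchEvent(ev);
    return true;
})()"""


def build_click_script(css_selector):
    """Return JavaScript source that clicks the first element matching css_selector."""
    # json.dumps yields a correctly quoted and escaped JavaScript string literal.
    return _CLICK_TEMPLATE % json.dumps(css_selector)


def click_link(frame, css_selector):
    """Run the click script in a frame.

    Assumption: `frame` is a QWebFrame (e.g. page.mainFrame()); only its
    evaluateJavaScript method is used here.
    """
    return frame.evaluateJavaScript(build_click_script(css_selector))
```

Content generated by the click usually arrives asynchronously, so the page would typically have to keep running in the Qt event loop for a moment before frame.toHtml() reflects the change.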

Related

Wordpress Custom JS Location

We decided to move to another provider and rebuild the website. Accidentally, we moved the old domain as well, and now we can't access the WordPress backend or the frontend anymore. For the new site, I need the custom JavaScript code I entered on a specific page with Elementor. Where can I find my old code without accessing the site? I have FTP and database access.
Where did you put the code? Depending on the answer, you need to look in different locations.
Afaik Elementor doesn't support custom JS natively. You can work around that with an HTML widget or a custom plugin. If you used Elementor's GUI for that, it's probably stored like all other Elementor-related content, in the wp_postmeta table of your database.
If you don't know the page where it was used, I would suggest exporting the database as SQL and using an editor to search for a text string related to the code (for example, the URL of the AJAX request).
If you know the page, you can do the same but faster. Look up the ID of the post with the code, search for it in wp_postmeta, and continue as described above.
If you haven't used Elementor's GUI, the script could be in a directory on the server, probably inside your theme.
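For a large export, the text search can also be scripted instead of done in an editor. This is only a sketch: the function name find_in_dump and its context parameter are invented for illustration, and the dump is assumed to be a plain-text .sql file. Because the export may escape quotes and slashes inside stored content, a short distinctive token without such characters (a function or variable name from the script, for instance) tends to be a safer search string than a full URL.

```python
def find_in_dump(dump_path, needle, context=80):
    """Scan an SQL dump line by line and return (line_number, excerpt) pairs.

    `context` is how many characters to keep on each side of a match, since
    a single INSERT line in a dump can be extremely long.
    """
    hits = []
    # errors="replace" keeps the scan going even if the dump has odd bytes.
    with open(dump_path, "r", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            pos = line.find(needle)
            while pos != -1:
                start = max(0, pos - context)
                end = pos + len(needle) + context
                hits.append((lineno, line[start:end]))
                pos = line.find(needle, pos + 1)
    return hits
```

Each returned excerpt shows the surrounding text, which is usually enough to tell which row and column the code was stored in.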

Webpage with Javascript forwarding rendering/simulating in Java

There is a web service that provides a link to a given DOI (https://en.wikipedia.org/wiki/Digital_object_identifier), which can be used to access the PDF of the associated document. The link has the following structure, see: https://libkey.io/libraries/1420/articles/362897792/full-text-file?utm_source=api_50
If you access the link, you will be redirected to the PDF document. This works fine if I want to access the document with the browser. But if I want to download the PDF document programmatically or via Java, I need the direct link to the PDF.
My question: How can I get direct access to the PDF? Is there a library that can simulate the browser in Java? Do you know other ways to get to the PDF?
If my problem is not understandable enough, ask me specific questions!
Thanks a lot

Iron Python script in spotfire web player

I have an HTML page in which I display a Spotfire report using the Web Player. I would like to have the option of refreshing the data by pressing a button I create, but I couldn't find a way to do that using the Web Player. I know there is a way to do this with an IronPython script, but I don't understand exactly how it works: when I click the button, I would like the script to run. Where do I write it? How do I call it? My HTML page controller (I'm using Angular) is in JavaScript.
Thanks :)
As far as I know, you can't create Python scripts in the Web Player. You need the client. There you can create a Text Area and add an Action Control. Select 'Script' (this requires scripting privileges to be able to write code in Python) and add code to refresh the data table:
myDataTable.Refresh()
myDataTable is a script parameter that points to your visualization data table.

I need help creating a Windows 7 Gadget

I need to create a Windows 7 Gadget (or Widget) as a mini project. I know how to create a basic HelloWorld gadget (including the xml manifest and the html page), but I do not know how to make a complex one.
My company uses a bug tracking software (say, XYZ). My widget needs to be able to access and display data from XYZ regarding bugs, given a bug ID, or other search criteria.
I currently have the APPGUID and server name for XYZ.
Please help. I do not know where to start.
If your bug tracking software (XYZ) is a web application then you need to use its web service or you need to scrape the site to access the data regarding the bugs. You can simply scrape the site using the Simple HTML DOM.
An example can be seen in "PHP Simple HTML DOM Scrape External URL".
The library can be downloaded from http://sourceforge.net/projects/simplehtmldom/files/
Then you can scrape and display the data as the normal HTML code.
OR you have to use the web service provided by the XYZ application.

How to extract the dynamically generated HTML from a website

Is it possible to extract the HTML of a page as it shows in the HTML panel of Firebug or the Chrome DevTools?
I have to crawl a lot of websites, but sometimes the information is not in the static source code: JavaScript runs after the page is loaded and creates some new HTML content dynamically. If I then extract the source code, this content is not there.
I have a web crawler built in Java to do this, but it's using a lot of old libraries. Therefore, I want to move to a Rails/Ruby solution for learning purposes. I already played a bit with Nokogiri and Mechanize.
If the crawler is able to execute JavaScript, you can simply get the dynamically created HTML structure using document.firstElementChild.outerHTML.
Nokogiri and Mechanize are currently not able to parse JavaScript. See "Ruby Nokogiri Javascript Parsing" and "How do I use Mechanize to process JavaScript?" for this.
You will need another tool like WATIR or Selenium. Those drive a real web browser, and can thus handle any JavaScript.
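A minimal sketch of that idea, written in Python rather than Ruby purely for illustration: once a browser-driving tool has loaded the page, the rendered DOM can be read back by evaluating the outerHTML expression mentioned above. Here driver is assumed to be a Selenium WebDriver instance, whose execute_script method needs an explicit return in the script to hand a value back; rendered_html is a made-up helper name.

```python
def rendered_html(driver):
    """Return the page's HTML as the browser currently sees it.

    Assumption: `driver` is a Selenium WebDriver that has already loaded the
    page; only its execute_script method is used. The explicit `return` is
    what passes the JavaScript value back to Python.
    """
    return driver.execute_script("return document.firstElementChild.outerHTML;")
```

If the content is created some time after load, the call would need to happen after waiting for the relevant element to appear, otherwise the snapshot may still be missing it.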
You can't fetch the records coming from the database side. You can only fetch the HTML code, which is static.
JavaScript must be requesting the records from the database using a query request, which can't be fetched by the crawler.
