How to execute a page's JavaScript function in Perl?

I am trying to extract data from a website using Perl. Below is the description of the site:
the site displays data depending on a date
a calendar is displayed that is used to change the date
clicking a date in the calendar calls a JavaScript function that passes in the date and refreshes the part of the page that displays the data
My question is, how do I execute that JS function so that I could loop through the dates that I need data from?
Thanks in Advance

It's much easier to make the same HTTP request from your script and get all the data you need directly.
You can record all of your browser's HTTP requests/responses using the HttpFox extension (for Firefox).
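For example, once HttpFox shows you the request that a calendar click triggers, you can replay it yourself with whatever dates you need. Here is a minimal sketch of that idea (shown in Python with the requests library purely for illustration; in Perl the same thing is usually done with LWP::UserAgent, and the URL and parameter name below are made up):

import requests  # illustrative; any HTTP client (e.g. LWP::UserAgent in Perl) works the same way

# Hypothetical endpoint and parameter name - substitute whatever HttpFox
# records when you click a date in the calendar.
DATA_URL = "http://example.com/showdata.php"

for date in ("2023-01-01", "2023-01-02", "2023-01-03"):
    resp = requests.get(DATA_URL, params={"date": date})
    resp.raise_for_status()
    print(resp.text)  # parse and store the returned fragment as needed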

Edit
There is a CPAN module:
JavaScript - Perl extension for executing embedded JavaScript
But I haven't tested it yet.
Original post:
Take a look at smjs, SpiderMonkey's JS shell.
You could pass JavaScript to it like this:
# Pipe the JavaScript source through the smjs shell and read its output.
open my $jsout, '-|', "echo '$javascript' | smjs" or die $!;
print <$jsout>;
...
But be careful! This throws security considerations away: $javascript is interpolated directly into a shell command.

Related

Find the way to generate the sha256Hash argument in the AJAX request on this website - https://dutchie.com/dispensaries/taste-buds/menu

Open Chrome, open Developer Tools, switch to the Network tab, and select the XHR sub-tab.
Enter this URL https://dutchie.com/dispensaries/taste-buds/menu and refresh the page.
You will see an AJAX call (GraphQL) that loads the products - https://dutchie.com/graphql?operationName=DispensaryQuery&variables=%7B%22dispensaryFilter%22%3A%7B%22type%22%3A%22Dispensary%22%2C%22activeOnly%22%3Afalse%2C%22cNameOrID%22%3A%22taste-buds%22%7D%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%22bf3bcd4d9eed1d4a889037a5f2ff50332dd61b3979b900008e9af8ddbbf62f85%22%7D%7D
Scroll down to the end to see the Query String Parameters (view parsed). You will see there is a sha256Hash argument. This value changes every 2 or 3 days.
I found out that the hash value is generated in this JS code:
https://assets2.dutchie.com/vendors~main-0214ed32ab1236096fdb.bundle.js
I don't fully understand JS. Can anyone help find out how the hash value is generated, so I can write Python and C# code to generate the same value?
Great thanks!
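For reference, the captured call itself can be replayed once you have a current hash. Below is a rough Python sketch of the request shown above (the sha256Hash still has to be copied from the bundle or from a fresh Network-tab capture every few days, and the site may additionally require browser-like headers or cookies):

import json
import requests  # assumes the requests library; the same request can be built in C# with HttpClient

GRAPHQL_URL = "https://dutchie.com/graphql"

# Values taken from the request captured in the Network tab. The sha256Hash
# rotates every 2-3 days, so refresh it from the current bundle.js or a new
# capture before running this.
params = {
    "operationName": "DispensaryQuery",
    "variables": json.dumps({
        "dispensaryFilter": {
            "type": "Dispensary",
            "activeOnly": False,
            "cNameOrID": "taste-buds",
        }
    }),
    "extensions": json.dumps({
        "persistedQuery": {
            "version": 1,
            "sha256Hash": "bf3bcd4d9eed1d4a889037a5f2ff50332dd61b3979b900008e9af8ddbbf62f85",
        }
    }),
}

resp = requests.get(GRAPHQL_URL, params=params)
resp.raise_for_status()
print(resp.json())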

PowerBi: Query HTML table

What I need
I need to retrieve data from this source. Let's assume I must use only PowerBi for this.
What I did so far
If I use the basic web source option, then the query is basically just HTML parsing, with which I can easily get the data found in the HTML scope of the page, for example:
Source:
The steps I'm following through the Web source option:
Query:
(to simplify the example, assume we don't need the dates)
You can download that example .pbix file here.
The problem
The problem is that I need more data, which can't be accessed through the HTML preview. For example, let's imagine I need to retrieve the data from January 2010 to April 2020. Those kinds of queries can only be done via this button located on the webpage (which exports the requested data to an Excel workbook):
The idea is to get this process automated, so going to the source and exporting the Excel file every time is not an option.
Inspecting the element, I realized that what it does is execute a JavaScript function:
The question
As a PowerBi/PowerQuery noob I wonder: is there any way I can get that data directly with PowerBi (maybe by calling the JS function somehow)? If so, then how?
Thank you in advance.
The solution in my case was to use URL parameters to retrieve the data without parsing the HTML table.
❌Original URL I was using:
https://gee.bccr.fi.cr/indicadoreseconomicos/Cuadros/frmVerCatCuadro.aspx?idioma=1&CodCuadro=%20400
✔️New URL for the query, adding some parameters:
https://gee.bccr.fi.cr/indicadoreseconomicos/Cuadros/frmVerCatCuadro.aspx?idioma=1&CodCuadro=%20400&Idioma=1&FecInicial=2010/01/01&FecFinal=2040/01/01&Filtro=0&Exportar=True
This procedure only works in this case, because obviously the parameters will not be the same on other web pages.
However, I post this answer to keep the main idea for those who are in a similar situation: first try the appropriate URL parameters to get the data in a different format. Of course, you first have to know which parameters are available, which is a limitation.
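If it helps, the parametrized URL can be sanity-checked outside Power BI before wiring it into the query. A minimal sketch (Python with the requests library, purely illustrative; the parameter names are the ones from the URL above):

import requests  # illustrative only; the actual query stays in Power BI

BASE = "https://gee.bccr.fi.cr/indicadoreseconomicos/Cuadros/frmVerCatCuadro.aspx"
params = {
    "idioma": 1,
    "CodCuadro": " 400",         # %20400 in the original URL (leading space kept)
    "Idioma": 1,
    "FecInicial": "2010/01/01",  # start of the requested range
    "FecFinal": "2040/01/01",    # end of the requested range
    "Filtro": 0,
    "Exportar": True,
}

resp = requests.get(BASE, params=params)
resp.raise_for_status()
print(resp.headers.get("Content-Type"), len(resp.content))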

Extract html sourcecode from a javascript generated output

I am currently working on a project to find empty classrooms in our school in real time. For that purpose, I need to extract the substitutions published on our school page (https://ssnovohradska.edupage.org/substitution/?), since there might be additional changes.
But when I try to extract the HTML source code and parse it with bs4, it cannot find the divs (class: "section print-nobreak") that contain the substitution text. When I took a look at the page source code (Ctrl+U), I found that there is only JavaScript that prints it all directly.
Is there any way to extract the HTML after the JavaScript output has already been rendered?
Thanks for help!
Parsing HTML is unfortunately necessary to solve your problem. But I will explain how to find ways to avoid that in your future projects (not based on this website).
You've correctly noticed that the text is created by JavaScript code running on the page. This could also indicate that the data is either loaded from another resource (an XHR/fetch call getting a response from an API) or stored as JSON/JS inside the website's code. (Or it is generated by an algorithm, but this is unlikely to be the case on such websites.)
The website actually uses both methods (the initial render gets data stored inside the website's code, but when you switch dates on the calendar it makes AJAX requests). You can see this by searching for ReactDOM.render(React.createElement( in the code. They're providing an HTML string to the createElement call, so I would suggest looking into the AJAX way of doing things.
Now, to check where the resource is located, all you need to do is open Developer Tools in your favorite browser (usually Control+Shift+I) and navigate to the Network tab. Once the Network tab is open, you need to cause the website to load external data, for example by pressing a date on the "calendar bar".
Here you will notice many external requests, but we're actually looking only for XHR calls. Click on the XHR button next to the "Filter" text field. That should result in only one request being shown:
Unfortunately for us, the response only contains HTML. Also, the API calls are protected: they require a PHP session ID and some sort of token (__gsh), or they fail. So, going back to step 1, it seems our only solution is to use regular expressions to find the text between "report_html":"<div class and </div></div></div> in the source code, if you're interested in today's date only. If you want the contents for tomorrow or any other date, you will need to either fetch the page, save the cookies and find the token to supply to the request and then make that request, or use something like puppeteer or pyppeteer (since you've mentioned BS4) and load the webpage in that. If you aren't fetching the data that often, you should be fine overall.
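As a rough sketch of that regex route for today's date only (Python with requests and BeautifulSoup, since BS4 was already mentioned; the un-escaping of the embedded HTML string is simplified here and may need tweaking against the live page):

import re
import requests
from bs4 import BeautifulSoup

URL = "https://ssnovohradska.edupage.org/substitution/?"

page = requests.get(URL).text

# Today's report is embedded in the page source as an escaped HTML string:
# ... "report_html":"<div class ... </div></div></div>" ...
# The forward slashes may or may not be escaped as \/ , so the pattern allows both.
match = re.search(r'"report_html":"(<div class.*?<\\?/div><\\?/div><\\?/div>)', page)
if match:
    # Undo the most common JSON string escapes before handing the HTML to BS4.
    html = match.group(1).replace('\\"', '"').replace("\\/", "/")
    soup = BeautifulSoup(html, "html.parser")
    for section in soup.find_all("div", class_="section print-nobreak"):
        print(section.get_text(" ", strip=True))
else:
    print("report_html not found - the page structure may have changed")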

Classic ASP and Javascript Integration

I'm currently using Classic ASP and the YouTube JavaScript API to pull information about videos and store it in a database. However, I need to know whether some of the next steps are possible, or if I would have to convert to another language.
The information I am trying to download into my SQL 2012 database currently exceeds the maximum space allowed, meaning I can only send about 50 of my 1700 results (and growing) each time. Prior to the space cap, I would simply keep running the next-page function until there were no more page tokens and upload all the data; now, however, I must do it in small steps.
My application currently works like this: JavaScript creates hidden forms -> the forms are submitted -> Classic ASP reads the forms and moves the information into the database.
By directly editing the code I can modify which 50 results I send to the classic ASP, but I'd like to be able to do this without modifying code.
So my question is this: is it possible to send a URL query of sorts to the JavaScript so that I know which results I have already sent? Or is there a better way to circumvent the space issue aside from rerunning the JavaScript each time?
The error I get when attempting to send too much information is:
Request object error 'ASP 0104 : 80004005'
Operation not Allowed
I apologize if this question seems a little vague as I'm not entirely sure how to word this without writing a 5 paragraph essay.
You could add a redirect in the ASP page doing the downloading. The redirect can go back to the JavaScript page and include the number of results processed in the URL, like so:
Response.Redirect "javascript.asp?numResults=" & numberOfResultsSentSoFar
Then, on the JavaScript page, include some ASP to extract the number of results processed:
Dim resultsProcessed
resultsProcessed = Request.QueryString("numResults")
Then you can feed it into JavaScript like so:
var currentResultIndex = <%=resultsProcessed%>;
However, a better way might be to use AJAX to send the first 50 results, wait for a response from the ASP, and then send the next 50.

Pull an external page 10 seconds after the request using PHP

I have two web pages that I'll call domain.com/Alvin and domain.com/Bert for this example.
Alvin displays search results based on a query string variable, but it loads the results using JavaScript approximately two seconds after the page loads.
Bert needs to use these results for occasional ad-hoc reporting, but due to the way the company is set up, I can't link directly into the database that Alvin is pulling from. A different team manages the Alvin page, so I won't have access to change their existing code.
While I think I could do this with .NET, I'm unsure how to do the request with PHP, which is highly preferred for this page.
Is anybody aware of how I could use file_get_contents, file_get_html, or any other PHP function to get the HTML of another page, but only pull the HTML five seconds after the initial request, to allow the JavaScript to update the results?
Credit to mplungjan - not sure why I didn't think of this earlier, but I was able to replicate the AJAX call and make the same request myself. Thanks!
Since they are on the same domain, one page can ajax the other page in – mplungjan
