First, let me say that I am not familiar with the terminology so, if you see something, by all means, help me improve the wording.
What I want to do is retrieve a CSV file that is generated by a website, apparently based on a table.
The site in question has two drop-down boxes from which one can select the query values; then, triggered by "onchange", a search runs and a table is filled.
With the table filled, a button appears; pressing it offers the CSV file, containing the fields, for download.
After poking around in the page, I was able to find and construct the URL that retrieves the CSV file. It is something like:
http://www.example.com/exportCSV.action?field1=3&field2=5
The problem is that if I try to curl it, an empty CSV file is retrieved, with just the headers. So I think the actual content must come from the table that is filled through the normal web interface.
The last call from the javascript function that generates the CSV is:
window.open("exportCSV.action?"+fields)
Is there a way to satisfy the initial search so that, when I curl the CSV URL, I get a filled CSV and not an empty one?
This sounds rather like the website is not accepting your cURL request. Some sites try to limit their services to “real” browsers only.
Try using a debugging tool like Firebug to have a look at the actual data that the JavaScript sends and receives over the network.
I assume you are making your cURL call correctly? Passing parameters, especially on the command line, can be a bit tricky. Make sure you quote the URL correctly, for example with single quotes:
curl 'http://www.example.com/exportCSV.action?field1=3&field2=5'
Otherwise, the & character, and possibly the question mark as well, might get interpreted by your shell.
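If the request is made from a script instead of a shell, the quoting problem goes away, because no shell ever sees the & or the ?. A minimal sketch in Node.js, assuming the example URL and field names from the question; the helper name buildCsvUrl is made up for illustration:

```javascript
// Build the export URL from its parameters. No shell is involved,
// so "&" and "?" need no quoting or escaping by hand.
function buildCsvUrl(base, fields) {
  const url = new URL(base);
  for (const [name, value] of Object.entries(fields)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

const csvUrl = buildCsvUrl("http://www.example.com/exportCSV.action", {
  field1: 3,
  field2: 5,
});
console.log(csvUrl);
```

This only takes care of building the URL correctly. If the site still returns just the headers, the cause lies elsewhere, and the browser's network panel is the place to look.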
What I need
I need to retrieve data from this source. Let's assume I must use only Power BI for this.
What I did so far
If I use the basic Web source option, the query is basically just HTML parsing, with which I can easily get the data found in the HTML of the page. Example:
Source:
The steps I'm following through Web source option:
Query:
(to simplify the example, assume we don't need the dates)
You can download that example .pbix file here.
The problem
The problem is that I need more data, which can't be accessed through the HTML preview. For example, let's imagine I need to retrieve the data from January 2010 to April 2020. Those kinds of queries can only be done via this button located on the web page (which exports the requested data to an Excel workbook):
The idea is to automate this process, so going to the source and exporting the Excel file every time is not an option.
Inspecting the element, I realized that what it does is execute a JavaScript function:
The question
As a Power BI/Power Query noob I wonder: is there any way I can get that data directly with Power BI (maybe by calling the JS function somehow)? If so, how?
Thank you in advance.
The solution in my case was to use URL parameters to retrieve the data without parsing the HTML table.
❌Original URL I was using:
https://gee.bccr.fi.cr/indicadoreseconomicos/Cuadros/frmVerCatCuadro.aspx?idioma=1&CodCuadro=%20400
✔️New URL for the query, adding some parameters:
https://gee.bccr.fi.cr/indicadoreseconomicos/Cuadros/frmVerCatCuadro.aspx?idioma=1&CodCuadro=%20400&Idioma=1&FecInicial=2010/01/01&FecFinal=2040/01/01&Filtro=0&Exportar=True
This procedure only works in this case, because obviously the parameters will not be the same on other web pages.
However, I am posting this answer to preserve the main idea for those in a similar situation: first try appropriate URL parameters to get the data in a different format. Of course, you must first know which parameters are available, which is a limitation.
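To make the date range adjustable rather than hard-coded, the same URL can be assembled from two dates. This is only a sketch in JavaScript: the parameter names are copied from the URL above, but reading FecInicial and FecFinal as start and end dates in yyyy/MM/dd form is an assumption based on that one example, and the function names are made up.

```javascript
// Base address and fixed parameters, copied verbatim from the working URL.
const BASE =
  "https://gee.bccr.fi.cr/indicadoreseconomicos/Cuadros/frmVerCatCuadro.aspx";

// Format a Date as yyyy/MM/dd, the shape used in the example URL.
function formatDate(date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${year}/${month}/${day}`;
}

// Assumption: FecInicial/FecFinal are the start and end of the range.
function buildBccrExportUrl(start, end) {
  return (
    BASE +
    "?idioma=1&CodCuadro=%20400&Idioma=1" +
    `&FecInicial=${formatDate(start)}&FecFinal=${formatDate(end)}` +
    "&Filtro=0&Exportar=True"
  );
}

console.log(
  buildBccrExportUrl(new Date(Date.UTC(2010, 0, 1)), new Date(Date.UTC(2040, 0, 1)))
);
```

The string is concatenated by hand on purpose, so that the encoding of CodCuadro and the slashes in the dates stay exactly as they appear in the URL that is known to work.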
I am currently working on a project to find empty classrooms in our school in real time. For that purpose, I need to extract the substitutions published on our school page (https://ssnovohradska.edupage.org/substitution/?), since there may be additional changes.
But when I try to extract the HTML source and parse it with bs4, it cannot find the divs (class: "section print-nobreak") that contain the substitution text. When I took a look at the page source (Ctrl+U), I found that there is only JavaScript that prints it all directly.
Is there any way to extract the HTML after the JavaScript output has been rendered?
Thanks for the help!
Parsing HTML is unfortunately necessary to solve your problem. But I will explain how to find ways to avoid that in your future projects (not based on this website).
You've correctly noticed that the text is created by JavaScript code running on the page. This could also indicate that the data is either loaded from another resource (an XHR/fetch call getting a response from an API) or stored as JSON/JS inside the website's code. (Or it is generated by an algorithm, but this is unlikely to be the case on such websites.)
The website actually uses both methods (the initial render gets data stored inside the website's code, but when you switch dates on the calendar it makes AJAX requests). You can see this by searching for ReactDOM.render(React.createElement( in the code. They're providing an HTML string to the createElement call, so I would suggest looking into the AJAX way of doing things.
Now, to check where the resource is located, all you need to do is open Developer Tools in your favorite browser (usually Control+Shift+I) and navigate to the Network tab. Once the Network tab is open, you need to cause the website to load external data, for example by pressing a date on the "calendar bar".
Here you will notice many external requests, but we're actually looking only for XHR calls. Click on the XHR button next to the "Filter" text field. That should result in only one request being shown:
Unfortunately for us, the response only contains HTML. Also, the API calls are protected: they require a PHP session ID and some sort of token (__gsh), and fail without them. So, going back to step 1, it seems the only solution is to use regular expressions to find the text between "report_html":"<div class and </div></div></div> in the source code, if you're interested in today's date only.
If you want the contents for tomorrow or any other date, you will need to either fetch the page, save the cookies, find the token and supply both with the request, or use something like puppeteer or pyppeteer (since you've mentioned BS4) and load the web page in that. If you aren't fetching the data very often, you should be fine overall.
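The regular-expression approach for today's date can be sketched roughly like this, in JavaScript. The two markers are the ones named above; the sample string is made up, and the real page source may escape characters differently (for example <\/div>), in which case the end marker would need adjusting.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the text between the first startMarker and the next endMarker,
// or null when the markers are not found.
function extractBetween(source, startMarker, endMarker) {
  const pattern = new RegExp(
    escapeRegExp(startMarker) + "([\\s\\S]*?)" + escapeRegExp(endMarker)
  );
  const match = pattern.exec(source);
  return match ? match[1] : null;
}

// Made-up sample standing in for the downloaded page source.
const sample =
  '{"report_html":"<div class=section>Room 12 is free</div></div></div>","other":1}';

console.log(
  extractBetween(sample, '"report_html":"<div class', "</div></div></div>")
);
```

The match is non-greedy, so it stops at the first closing marker after the start marker. The extracted fragment is still HTML and would need to go through a parser to get at the text itself.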
I'm kind of new to JavaScript, but I'm currently working on my website:
When I press a button, javascript generates a random number (for example: Your Coins: 25) and then I need to connect to my 'members' table and add 25 to the 'coins' field. (I'm already connected with mysql in the php code if this matters.)
Could anyone help me?
If it's "coins", then you probably don't want to generate the number client-side; otherwise someone could call your JavaScript function with any number they like and add millions of coins!
The alternative is to have PHP generate the number for you.
You can use something like jQuery's $.get function to call a PHP script whose action is "add a random number of coins"; the PHP script can then return the random number to JavaScript as JSON for display.
First the browser sends a request to the server, which in turn parses it. This is when your PHP runs and your HTML is generated. Once the HTML is ready, it is sent to the web browser. The HTML might contain script tags that point to JS files via the src attribute, or script tags that contain JavaScript code, but until the server has sent the response, no JavaScript files are loaded and no JavaScript code is executed. When the response arrives at the web browser, the browser parses the HTML, loads the external JS and CSS files and images, and executes the JavaScript code.
So, when your JavaScript generates the value, it is running in the user's web browser, remote from the server. From this point, therefore, the JavaScript code should send an AJAX request to the server, which in turn will receive and parse it. You can pass parameters when you send an AJAX request; jQuery has an easy-to-use helper for this. Your server will receive the request with your parameters, and you will be able to read them via $_GET or $_POST, which are associative arrays indexed by parameter name. You can use those to write your query.
All of this is well documented; if you watch a few tutorials, you should be able to solve the problem. On the other hand, the commenters and the other answerer are right when they tell you that you should never trust the browser to generate sensitive data, as hackers could easily see what requests the web browser sends and then send similar requests in which they happen to be "lucky".
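As a rough illustration of the client side, here is a sketch in which the browser only asks the server to award coins and displays whatever number comes back. It uses the standard fetch function rather than jQuery. The endpoint name add_coins.php and the response shape {"coins": 25} are hypothetical, not taken from the question.

```javascript
// Ask the server to generate the number and add it to the member's coins.
// The browser never chooses the amount; it only shows what the server returns.
// fetchImpl is injectable so the function can be tried without a real server.
async function requestCoins(endpoint, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(endpoint, { method: "POST" });
  if (!response.ok) {
    throw new Error(`server responded with status ${response.status}`);
  }
  const data = await response.json(); // assumed shape: { "coins": <number> }
  return data.coins;
}

// Stand-in for the PHP script, only so the example runs on its own.
const fakeServer = async () => ({
  ok: true,
  status: 200,
  json: async () => ({ coins: 25 }),
});

requestCoins("add_coins.php", fakeServer).then((coins) => {
  console.log(`Your Coins: ${coins}`);
});
```

On the real page the second argument would be omitted, and the PHP script behind the endpoint would generate the random number, update the 'members' table, and return the number as JSON.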
I'm currently using Classic ASP and the YouTube JavaScript API to pull information about videos and store it in a database; however, I need to know whether some of the next steps are possible or whether I would have to switch to another language.
The information I am seeking to load into my SQL 2012 database currently exceeds the maximum space allowed, meaning I can only send about 50 of my 1700 results (and growing) each time. Before hitting the space cap, I would simply keep running the next-page function until there were no more page tokens and upload all the data; now, however, I must do it in small steps.
My application currently works like this: Javascript creates hidden forms->Forms are submitted->classic ASP queries form and moves information to database
By directly editing the code I can modify which 50 results I send to the classic ASP, but I'd like to be able to do this without modifying code.
So my question is this: Is it possible to send a URL query of sorts to the JavaScript so that I know which results I have already sent? Or is there a better way to get around the space issue, aside from rerunning the JavaScript each time?
The error I get when attempting to send too much information is:
Request object error 'ASP 0104 : 80004005'
Operation not Allowed
I apologize if this question seems a little vague as I'm not entirely sure how to word this without writing a 5 paragraph essay.
You could add a redirect on the ASP doing the downloading. The redirect can go back to the javascript page and include the number of results processed in the url like so:
Response.Redirect "javascript.asp?numResults=" & numberOfResultsSentSoFar
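Then, on the JavaScript page, include some ASP to extract the number of results processed (VBScript does not allow Dim with an initial value, so declare and assign separately):
<%
Dim resultsProcessed
resultsProcessed = Request.QueryString("numResults")
%>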
Then you can feed it into javascript like so:
var currentResultIndex = <%=resultsProcessed%>;
However, a better way might be to use AJAX to send the first 50 results and wait for a response from the ASP and then send the next 50.
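The AJAX variant could look roughly like the following sketch: split the results into groups of 50 and send each group only after the previous one has been acknowledged. The sendBatch callback stands in for whatever request posts one group to the ASP page; it and the function names are made up for illustration.

```javascript
// Split a list into consecutive groups of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send the groups one at a time, waiting for each reply before continuing.
// Returns how many items were sent in total.
async function sendInBatches(items, sendBatch, size = 50) {
  let sent = 0;
  for (const batch of chunk(items, size)) {
    await sendBatch(batch); // hypothetical: posts one group to the ASP page
    sent += batch.length;
  }
  return sent;
}

// Example with a stand-in sender that just records the group sizes.
const results = Array.from({ length: 120 }, (_, i) => ({ videoId: i }));
const sizesSeen = [];
sendInBatches(results, async (batch) => { sizesSeen.push(batch.length); }).then(
  (total) => console.log(total, sizesSeen)
);
```

Because each request stays under the size limit, the page never has to be rerun or edited to pick the next 50 results.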
I am currently designing a Node.js web server that will save requests as JSON objects in a text file.
An example of said text file is this:
{"elements":[
{"email":"test#test.com","timestamp":"22:10:54"},
{"email":"foo#foobar.com","timestamp":"09:56:49"}
]}
What I want to do is append a given JSON object to this text file. This is more complicated than a simple fs.appendFile() call, because I first have to get rid of the
\n]}
that closes the JSON, and then stick on
,{"email":"INPUT EMAIL HERE","timestamp":"INPUT TIMESTAMP HERE"}\n]}
to form the new completed JSON,
{"elements":[
{"email":"test#test.com","timestamp":"22:10:54"},
{"email":"foo#foobar.com","timestamp":"09:56:49"},
{"email":"INPUT EMAIL HERE","timestamp":"INPUT TIMESTAMP HERE"}
]}
I want to do all of this without having to load the text file with fs.readFileSync(path), because that would become more difficult with each new entry. So, my ultimate issue is that I need a way to strip off (effectively reverse-append) a few characters from the end of the file so that I can append on the newly inputted elements. I've already looked over the doc (http://nodejs.org/docs/latest/api/fs.html#fs_fs_createwritestream_path_options) and I saw no function for it, but I figured there has to be a way to do it.
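One way the steps described above could be sketched is with fs.ftruncateSync, which shortens an open file to a given length, followed by a write at that position. Only the last three bytes are read, to check that the file really ends with the closing characters. This assumes the file ends exactly with "\n]}" and already holds at least one element; the function name and the sample entries are made up.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// The characters that close the file, as described in the question.
const CLOSING = "\n]}";

// Append one element by cutting off the closing characters and rewriting them.
// Assumes the array already holds at least one element, so the comma is valid.
function appendElement(filePath, element) {
  const closingBytes = Buffer.byteLength(CLOSING);
  const fd = fs.openSync(filePath, "r+");
  try {
    const size = fs.fstatSync(fd).size;
    if (size < closingBytes) {
      throw new Error("file is too short to contain the closing characters");
    }
    // Read only the tail to check that the file ends as expected.
    const tail = Buffer.alloc(closingBytes);
    fs.readSync(fd, tail, 0, closingBytes, size - closingBytes);
    if (tail.toString("utf8") !== CLOSING) {
      throw new Error("file does not end with the expected closing characters");
    }
    const cutAt = size - closingBytes;
    fs.ftruncateSync(fd, cutAt); // strip the closing characters
    fs.writeSync(fd, ",\n" + JSON.stringify(element) + CLOSING, cutAt);
  } finally {
    fs.closeSync(fd);
  }
}

// Usage with a throwaway file; the whole file is read back only to print it.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "elements-"));
const demoPath = path.join(demoDir, "elements.json");
fs.writeFileSync(
  demoPath,
  '{"elements":[\n{"email":"a@example.com","timestamp":"22:10:54"}\n]}'
);
appendElement(demoPath, { email: "b@example.com", timestamp: "09:56:49" });
console.log(fs.readFileSync(demoPath, "utf8"));
```

The sketch uses the synchronous calls for brevity and does nothing about two requests arriving at the same moment; a server handling concurrent writes would need to serialize them.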