I want to create a PHP app that checks ticket status from a FIFA page that is built with AngularJS.
I tried to get the data using PHP cURL, PHP file_get_contents(), jQuery, and plain JavaScript, but every time I got an empty array.
I suspect there are some restrictions from AngularJS or the server. The link is given below; please help me get the data from the website.
https://tickets.fifa.com/Services/ADService.html?lang=en
You are talking about screen-scraping. Screen-scraping is a fragile solution because if they change the HTML for their page then your application will break.
That said, in your case the reason you got an empty array is that the site's web server prevents screen-scraping. If you'd checked your PHP error log you would have seen a 403 Forbidden error.
Simply put, FIFA does not want their data to be taken and used for purposes other than what they intended it for.
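To make that kind of failure visible instead of ending up with an empty array, it helps to look at the HTTP status code of the response. Below is a minimal Python sketch of the idea; the function name fetch_status and the User-Agent string are made up for illustration, and whether the site really answers with 403 is the claim made above, not something this code establishes.

```python
import urllib.error
import urllib.request
from typing import Tuple


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status code, body text) without raising on 4xx/5xx."""
    # The User-Agent value is an arbitrary placeholder for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "status-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx answers such as 403 Forbidden;
        # the error object still carries the status code and the body.
        return error.code, error.read().decode("utf-8", errors="replace")
```

Calling fetch_status on the page URL and printing the first element of the result shows whether the server refused the request or simply returned a page without the data in it.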
I am currently working on a project for finding empty classrooms in our school in real time. For that purpose, I need to extract the substitutions published on our school page (https://ssnovohradska.edupage.org/substitution/?), since there might be additional changes.
But when I try to extract the HTML source code and parse it with bs4, it cannot find the divs (class: "section print-nobreak") that contain the substitution text. When I took a look at the page source (Ctrl+U) I found that there is only JavaScript that prints it all directly.
Is there any way to extract the html after the javascript output has been already rendered?
Thanks for help!
Parsing HTML is unfortunately necessary to solve your problem. But I will explain how to find ways to avoid that in your future projects (not based on this website).
You've correctly noticed that the text is created by JavaScript code running on the page. This could also indicate that the data is either loaded from another resource (XHR/fetch call getting a response from an API) or is stored as a JSON/JS inside of the website's code. (Or is generated from an algorithm, but this is unlikely to be the case in such websites.)
The website actually uses both methods (the initial render gets data stored inside the website's code, but when you switch dates on the calendar it makes AJAX requests). You can see this by searching for ReactDOM.render(React.createElement( in the code. They're providing an HTML string to the createElement call, so I would suggest looking into the AJAX way of doing things.
Now, to check where the resource is located, all you need to do is open Developer Tools in your favorite browser (usually Control+Shift+I) and navigate to the Network tab. With the Network tab open, you need to cause the website to load external data, for example by pressing a date on the "calendar bar".
Here you will notice many external requests, but we're actually looking only for XHR calls. Click on the XHR button next to the "Filter" text field. That should result in only one request being shown.
Unfortunately for us, the response only contains HTML. Also, the API calls are protected: they require a PHP session ID and some sort of token (__gsh), and fail without them. So, going back to step 1, it seems our only option is to use a regular expression to find the text between "report_html":"<div class and </div></div></div> in the page source, if you're interested in today's date only. If you want the contents for tomorrow or any other date, you will need to either fetch the page, save the cookies, find the token, and supply both with the request, or use something like puppeteer or pyppeteer (since you've mentioned BS4) and load the webpage in that. If you aren't fetching the data that often, you should be fine overall.
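The regular-expression approach for today's date can be sketched roughly as follows. This is only an illustration under assumptions: the two markers are the ones quoted above, the sample string is invented, and the real page may escape characters inside the embedded string (for example slashes), in which case the end marker would need adjusting.

```python
import re
from typing import Optional

# Markers as quoted in the answer; adjust them if the real source differs.
START_MARKER = '"report_html":"'
END_MARKER = "</div></div></div>"


def extract_report_html(page_source: str) -> Optional[str]:
    """Return the HTML fragment that follows "report_html", or None if absent."""
    pattern = (
        re.escape(START_MARKER)
        + r"(<div class.*?"          # non-greedy: stop at the first end marker
        + re.escape(END_MARKER)
        + r")"
    )
    match = re.search(pattern, page_source, flags=re.DOTALL)
    return match.group(1) if match else None


# Invented sample that only mimics the shape described above.
sample = (
    'ReactDOM.render(React.createElement(X, {"report_html":"'
    "<div class='section'><div><div>Room 12 is free</div></div></div>"
    '","other":1}));'
)
print(extract_report_html(sample))
```

The extracted fragment can then be handed to bs4 like any other HTML string.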
I have seen these requests:
https://somesite.com/path/script.js?shop=xyz.com
https://somesite.com/path/script.js?shop=abc.com
The output of the two requests is different.
Earlier I thought that a .js file can't be treated like PHP and that dynamic data can't be passed to it, but I was wrong. An expert advised me that a router passes this data and the output is served according to the shop name.
I have a PHP Apache + nginx server.
Can anyone tell me how to make this work? I don't want to go with Node.js, as it won't allow me to run PHP code.
Just as a test, I want to call the links above and print this:
hi i am shop xyz
or hi i am shop abc
on respective calls.
Can anyone share some detailed guidelines on how to make such a router? Any link, guide, or advice will serve the purpose.
I have never dealt with this kind of JavaScript. It is really easy with a .php extension, but I want to learn how to do it with a .js file.
If I have somesite.com/thisiswhatiwant, how can I transform that into a variable and process it, without using GET vars? What should I google in the first place?
The idea is to create a dynamic page structure where that part of the URL will populate variables in the page and be used to return dynamic, page-specific queries.
Is there a framework I can use that has a way to handle this easily?
If I use JavaScript for this, how should I handle it so that it does not return any 404 errors but rather just pulls a template page and then uses that part of the URL to build the page?
Thank you!
Here is how you parse the path of a URL in PHP:
$url = "http://somesite.com/thisiswhatiwant";
// Extract only the path component of the URL.
var_dump(parse_url($url, PHP_URL_PATH)); // string(16) "/thisiswhatiwant"
If I have somesite.com/thisiswhatiwant, how can I transform that into a variable and process it, without using GET vars? What should I google in the first place?
That's simply getting the current URL and parsing it. (Both are pretty well covered in the linked questions.)
You do need to get the server to execute the PHP first. This question about the front controller pattern explains that.
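As a rough illustration of the front controller idea, here is a minimal Python sketch in which every request is handed to one function that turns the URL path into variables. The name handle_request and the "home" default are hypothetical; in PHP the same role would be played by a single entry script that the server routes all requests to.

```python
from urllib.parse import urlsplit


def handle_request(url: str) -> dict:
    """Front-controller sketch: one entry point that turns the path into variables."""
    path = urlsplit(url).path                      # e.g. "/thisiswhatiwant"
    segments = [part for part in path.split("/") if part]
    # The first segment becomes the page variable; "home" is a made-up default.
    page = segments[0] if segments else "home"
    return {"page": page, "extra": segments[1:]}


print(handle_request("http://somesite.com/thisiswhatiwant"))
```

Because the server always runs the same entry point, an unknown path does not have to produce a 404 by itself; the handler decides what to render for it.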
If I use JavaScript for this, how should I handle it so that it does not return any 404 errors but rather just pulls a template page and then uses that part of the URL to build the page?
Assuming you mean client-side JavaScript: You can't.
JavaScript runs in the context of a webpage.
1. Get page from server
2. Parse HTML document
3. Run JavaScript that page says to run
If you 404 at step 1 then everything stops and no JS runs.
The correct terminology is vanity URLs. They are static URLs that behave like dynamic URLs. Dynamic URLs are URLs with queries, which is not what we want here.
This tutorial will help.
The solution is through .htaccess rules, as I expected.
The rest is basic PHP/DB queries.
I still do not know of a web app framework that makes this trivial to implement, but there must be one.
Here is the tutorial
http://culttt.com/2011/11/16/how-to-make-vanity-urls-using-php-htaccess-and-mysql/
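Once a rewrite rule has handed the path segment to a script, the remaining database step is a single parameterized lookup. The sketch below shows that step in Python with an in-memory SQLite database; the table name profiles, its columns, and the sample row are invented for illustration and are not taken from the tutorial.

```python
import sqlite3
from typing import Optional


def find_profile(connection: sqlite3.Connection, slug: str) -> Optional[str]:
    """Look up the page that belongs to a vanity slug; None means no such page."""
    row = connection.execute(
        "SELECT display_name FROM profiles WHERE slug = ?",  # parameterized query
        (slug,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema and data, only to make the sketch runnable.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE profiles (slug TEXT PRIMARY KEY, display_name TEXT)")
connection.execute("INSERT INTO profiles VALUES (?, ?)", ("thisiswhatiwant", "Example page"))

print(find_profile(connection, "thisiswhatiwant"))
print(find_profile(connection, "missing"))
```

A None result is the point where the application would render its own "not found" page.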
I'm a relative newbie when it comes to coding, especially JavaScript. I am currently trying to populate a table from a Google spreadsheet, so that the table updates whenever the spreadsheet does.
I followed this tutorial word for word (basically all you need to do is replace the key with your own to specify your spreadsheet, and make sure it's both published and public, which I've done)
http://dataforradicals.com/the-absurdly-illustrated-guide-to-sortable-searchable-online-data-tables/
I just get a bad request 400 error referring to my spreadsheet. If I visit the spreadsheet generated directly I just get the words...
"Invalid query parameter value for sq."
https://spreadsheets.google.com/feeds/list/1UcfO9GHePQrcixZB_R9uVXr1vHVqVTDg7DdsOjpm-K0/od6/public/values?alt=json-in-script&sq=&callback=Tabletop.callbacks.tt140241226993949106
I can visit my spreadsheet with the link I was given when I published it here..
[maximum links reached but the structure is different]
As you can see, the domain structure is different. I fear that "Tabletop to Datatables" is adding an outdated URL to the start of that link, but I can't find where it actually applies it.
The only reason I think that's not what is happening is that the example in the tutorial still works, and the link it refers to is the old-style URL too.
I'm baffled, please help if you can. All suggestions appreciated.
The query string includes a parameter without a value, &sq=.
https://spreadsheets.google.com/feeds/list/1UcfO9GHePQrcixZB_R9uVXr1vHVqVTDg7DdsOjpm-K0/od6/public/values?alt=json-in-script&sq=&callback=Tabletop.callbacks.tt140241226993949106
^^^^
Try this, with that parameter completely removed...
https://spreadsheets.google.com/feeds/list/1UcfO9GHePQrcixZB_R9uVXr1vHVqVTDg7DdsOjpm-K0/od6/public/values?alt=json-in-script&callback=Tabletop.callbacks.tt140241226993949106
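Dropping the empty parameter can also be done programmatically. The following Python sketch uses only the standard urllib.parse module; the function name drop_empty_params is made up, and it only illustrates the URL clean-up, not where Tabletop builds the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_empty_params(url: str) -> str:
    """Return the URL with every query parameter that has an empty value removed."""
    parts = urlsplit(url)
    # keep_blank_values=True so that "sq=" is seen and can be filtered explicitly.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if value != ""]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


original = (
    "https://spreadsheets.google.com/feeds/list/"
    "1UcfO9GHePQrcixZB_R9uVXr1vHVqVTDg7DdsOjpm-K0/od6/public/values"
    "?alt=json-in-script&sq=&callback=Tabletop.callbacks.tt140241226993949106"
)
print(drop_empty_params(original))
```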
There is an updated version of this project. Any necessary updates are included here:
https://github.com/scottpham/tabletop-to-datatables
Try with the updated versions of all js libraries.
Check the link and remove the extra string after pub. That part of link is not necessary and may cause issues.
According to Google:
The 400 Bad Request error is an HTTP status code that means that the request you sent to the website server, often something simple like a request to load a web page, was somehow incorrect or corrupted and the server couldn't understand it.
Good luck
I want to log in to a Hotmail email account with HttpClient. For that I need to pass 21 parameters to the HTTP POST method, including the username and password. I found this out through the Tamper Data add-on in Firefox. I also found that a few of them are generated dynamically (i.e. their values change each time the page is reloaded). My problem is how to find the dynamically generated values of these parameters, which have to be passed in the HTTP POST. I tried to find them with the Firebug add-on, but it did not help. I think the values are generated by JavaScript. If so, how do I parse them? I have used an HTML parser before, but it seems that it does not support JavaScript parsing. I would appreciate any ideas regarding this.
Thank you.
This might be some form of CSRF anti-forgery token, in which case I don't think you're going to come right. The values might be stored in a cookie or in a hidden element in the page.
More info: http://blog.stevensanderson.com/2008/09/01/prevent-cross-site-request-forgery-csrf-using-aspnet-mvcs-antiforgerytoken-helper/
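If the values do sit in hidden elements of the login page, they can be collected before building the POST. Here is a minimal Python sketch using the standard html.parser module; the field name token and the sample form are invented, and this will not find values that JavaScript computes after the page has loaded.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html: str) -> dict:
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Invented sample form; real pages will have different field names.
sample_form = (
    '<form><input type="hidden" name="token" value="abc123">'
    '<input type="text" name="login"></form>'
)
print(hidden_fields(sample_form))
```

The collected pairs would then be sent back together with the username and password, along with any cookies the server set when the page was fetched.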