Python, Scraping Data from javascript website - javascript

So, i am kinda of a new python programmer, if i can call myself like that, and i am trying to learn through the "picking new projects" procedure.
What i want to do now is :
enter a website that gives live score results as http://www.livescore.com for example.
somehow scrape all the teams that playing against each other and manipulate those data.
Then i want to build an app that takes those data, arrange them nicely in a table format (let's say) and then update it every time a team scores a goal (possibly through scrapping again ? i don't know..). So i want to project them as my own data.
As i am new to python, i don't even know if that's possible to be done.
If so, can you help me ? Point me to some directions maybe, point me specific chapters of python to read, specific modules etc etc ?
I really need any help i can get because i am really lost on the matter.
I don't know where to begin with.
Thanks in advance

For web scraping i would recommand using regular requests library of python + the BeautifulSoup library for parsing HTML. This way you can have a look at the content of the website.
The problem starts with dynamically added data, which is probably the case for you. the actual live data is probably coming from XHR requests the site is making to the server, so there lies the data you are really interested in.
In order to get the data you can try looking at those XHR requests and maybe try to mimic them.
Another platform from extracting data from sites is the Selenium project. Its more of an automated web browser, which gives you the access to all the data , even the dynamically loaded ones.

Related

Basic WebApp Questions

I am currently thinking about coding my first webapp.
I would say i have beginner to intermediate coding skills in html,css,js,jquery node,sql and monogodb.
The problem is that i do not really know how to achieve my goal.
My goal is to built a responsive single page website, which gets the stock price from an api and display it on one card. Furthermore the user should be able to click on the plus button and add another card with an stock index he chose from a choosing form option.
Now what want to know is:
Is this an example for choosing a js framework like react, vue etc. and how can I accomplish my goal ?
I coded the api get request etc. in node and was able to print everything I want into the console log. How do I do the same thing but displaying it on my html page ?
How can I create these cards which are automatically getting added to the homepage ?
How can I save the data for each individual card? (especially without a login procedure...)?
I know these are quite easy questions but I really want to learn how to do it.
Please check the images below.
(check:
I would say try to use your existing skills to achieve your goals. When you run into problems you cannot solve or problems that could be solved by a specific framework such as react, try using them. It's better to understand what these frameworks are doing before you rely on them for every development.
You need to use an http framework like express to return your api values from node js to a web client.
Needs more info but as you're already talking about component based libraries, a good approach would be to keep the cards in a json array and render your page from that. If you are persisting the card data in a database you will also may want to render them server side as the page is loaded using a server side technology like node.
To keep things simple, you can store the info client side using something like localStorage or indexeddb. If you want to store something on the server side, you will need to introduce a way of identifying the user.

Is it possible to retrieve data from parse.com using objective-c and show it in website?

I have an iOS app in which I use parse.com as backend service. Now, I hired someone to do a website interface using HTML and CSS. I want to share the same data between iOS app and website, I know parse.com offers me a few ways to do this, including creating a javaScriptapplication. The problem is, my programmer doesn't have any experience in JavaScript, nor do I.
My question is: Is it possible to use what I have (objective-c, xcode) as far as retrieving data from parse.com and showing on website? Even if I need to code something new, is it possible to use objective-c together with HTML and CSS?
Thanks.
Parse has several APIs, one of which is REST. Your web developer should use the REST API to get data from Parse
https://www.parse.com/docs/rest
If there is will there is way, but you'll be making something really specific to your use and will be non standard and will be immediately hard to maintain, I recommend that you hire another developer and do things properly using the technologies given to you by parse !. if the cost will be high now I can promise you it'll be much higher if you went the path you're going to now.
So my answer is:
Yes, everything is possible and no, don't do it ! :)
Edit: Added an example to a possible way to do it to actually answer OP's question.
Example case:
1-Create a simple Mac Application in Xcode that fetches data exactly like you do it on iOS, and store the needed data into a database of your choice on your server
2-You now have access to the data you needed from parse, but on a local mirror. you will need some tool to fetch that data though, I recommend a simple PHP script.
Note that this will require an OSX server to always be running to fetch that data, you'll also need of find a way to fetch data on demand when a user needs it Vs. polling at specified intervals, this will hardly scale and will be costly as I said.

Unable to see complete scraped web page in Google Apps Script logs

A few weeks ago I started learning Javascript and the Google Apps Script API, specifically in regard to spreadsheets. I have been trying to make a spreadsheet that fetches web pages and pulls stats about my friends for the game League of Legends. However, I have been running into a problem with the site I want to use, which is basically the only free LoL stats site that updates frequently. I'm not familiar at all with web development, but it seems when I try to access a page on lolking.net, for example http://www.lolking.net/summoner/na/60783 with Google's UrlFetchApp.fetch() it does not load the dynamic page. So instead of the final source, I get this which doesn't help me. Is there an easy way around this or would I simply have to use another website?
Thanks for thie info! Although it turns out I was mistaken. The UrlFetchApp was indeed returning the full source code, but I was using GAS's Logger to view the text. It seems the Logger has a length limit, so when I searched for the stats I wanted they weren't there simply because the source code got truncated. So, due to an oversight on my part, I never had a problem in the first place. For other people reading this question, in the end I have no idea how UrlFetchApp works with dynamic pages using clientside js (you'd probably want to talk to the poster below or post a new question).
You are getting fhe raw html page with clientside js included. That wont work from any system not just gas. You need to debug that page js and find where it does an ajax call to get the data you want.
Then do the same from your gas. Might not work if the call is authenticated etc.

print html page to PDF on a schedule

I have a HTML page that uses javascript to generate dynamic images using a graph handler on a different server. The images will contain the same data for 1 week but will change when the 1 week window expires.
I am trying to come up with a way to automatically save the contents of the page to either a local file on the server or write to a PDF file.
I tried to use a 'web downloader' like HTTTrack, but it does not get the dynamic images...
I am running the html page off IIS.
I have no experience with IIS or ASP.
Thanks!
I'm not sure that I see any way to do this directly off the front end in an automatic manner. The challenge is that any "screen scraper" you have go out and grab the site with would need to be running javascript to get the tables, which isn't how I see many such systems operating. It's partially why you see strangeness on Archive.org when you have a site that's heavily augmented with javascript or flash.
An untested concept you might attempt was posted in this Stack Question
I could see some sort of a system that you rig together with another computer that schedules an browser load then prints to .pdf in some fashion. I've been unable to find any specific software that would automate that process, so you'd be left cobbling such a system together on your own.
Clearly you have the data available to make your dynamic images. The most feature-rich way I could think of would be to use a system like Jasper Reports or Crystal Reports, which you could feed your data, replicate the report, and easily output via pdf, a built-in export in both systems.
Perhaps its worth questioning your end purpose. To me, creating a "snapshot" of the relevant data in another table and using another system to render your graphs from that snapshot data seems far more valuable than just a print of the screen. You can then go back and adjust data as needed, or use it for other reporting purposes, exporting in any number of tools that are even as simple as Access. Heck, 10 years down the road you may want the data to look better than the graph system you're currently using, and you'd have the data to render it any way you want. When the VP of marketing comes looking for his numbers, a simple click would output those numbers that could be manipulated as needed from there.
I was able to accomplish what I wanted to do using wkhtmltopdf to convert my HTML page with Javascript to PDF. I ran the job via a task scheduler to supply my website url and output file name as parameters.
I then used a windows batch file to check if the file was created and then rename/email it to interested parties.
This of course requires that you have the ability to install wkhtmltopdf on your server.

MVC: Pass HTML template in JSON response rather than separate resource?

I'm planning to write a web app using either Backbone or Angular. We want to push "widgets" from the server to the client (i.e. semi-complicated, dynamic but largely autonomous UI elements... something maybe like the popular TodoMVC app). So we'll need to send over a template, some javascript (controllers etc.), probably CSS, and JSON data (models).
We're debating how to send over all the information. How much can be, and what should be encapsulated in the JSON?
Is it possible to create files out of data passed over? i.e. could we pull out CSS and apply the rules to our document? I believe it would be easier to run javascript passed over this way.
I'm under the impression that being able to cache the template is important... does that require loading it as a (separate) resource rather than as part of some huge JSON object?
As for CSS, it needs to be loaded before inserting into the DOM (so we don't want just a promise). Would it make sense - or even be possible - to pass our CSS rules in the JSON and extract them somehow?
EDIT: to more fully describe what I'm working with, I'm focusing only on the front-end. The back-end can be customized to send over resources however we want - they'll optimize that depending on what the front-end needs. Our backend stack includes MongoDB, Tapestry, ActiveMQ.
The payload that needs to get sent over will be all the resources necessary to push something like a Mac Dashboard Widget or Windows Gadget into the browser. So HTML, CSS, Javascript, data will all get sent over. We want things to be snappy and minimize server requests as much as possible, as some of these payloads may get to be somewhat large.
Your questions are rather vague and the answers can change by coding on a different build platform. Your chosen application design can also impact the best practices to follow for implementing X type of application to X device(s). There's more than one way to skin a cat ya know.
What information are you sending over? Is it a lot of raw data arrays? Is it more document based where XML could be more beneficial? You'll need to work with your DBA if the data is stored in a repository and you'll be querying it. Will you need to write code to format your data or will the DBMS output it in the format you need?
What do you mean the CSS needs to be loaded before being inserted? Everything loads in the order you specify so making sure your CSS loads at the proper time shouldn't be an issue. If you're using ASP.NET, you can also use a SSM (style-sheet manager) to serve your CSS and only load whats required. SSM's are great if you have a lot of style-sheets to serve.
A lot of the questions you asked will be answered as you get further into the design phase. In order to get some solid answers and not just speculation on what is the best method, you will need to publish a lot more details than this. Any answers at this point shouldn't carry much weight in your decisions. They certainly wouldn't mine.

Categories

Resources