I'm looking to scrape a currency exchange for time-series data so I can do some analysis. I know I can use JavaScript libraries to scrape a website once, but I want time-series data. So would I essentially create a JSON object with the format:
{
  "ExchangeRateTimeseries": [
    { "timestamp": xxxxxx, "symbols": { "symbol1": symbol1rate, "symbol2": symbol2rate, "symbol3": symbol3rate, ... } }
  ]
}
So I could basically execute a scrape request every minute or so and then append an entry with the respective timestamp and exchange rates? Does this make sense? I'm open to any input. Thank you.
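For illustration, a rough sketch of the loop I have in mind (assuming Node.js; fetchRates() is just a placeholder for whatever actually scrapes the exchange, and rates.json is an arbitrary output file):

const fs = require('fs');

// placeholder: replace with the actual scraping logic / library of your choice
async function fetchRates() {
  return { symbol1: 1.0, symbol2: 0.92, symbol3: 0.79 };
}

async function appendSample() {
  const series = fs.existsSync('rates.json')
    ? JSON.parse(fs.readFileSync('rates.json', 'utf8'))
    : { ExchangeRateTimeseries: [] };

  series.ExchangeRateTimeseries.push({
    timestamp: Date.now(),
    symbols: await fetchRates()
  });

  fs.writeFileSync('rates.json', JSON.stringify(series, null, 2));
}

// scrape roughly once a minute
setInterval(appendSample, 60 * 1000);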
My Community Connector fetches these 2 fields (Subscription Date and Clicks).
I want to be able to filter by date so my table only shows, for example, data from the last 7 days. This works using the Date Filter that Data Studio provides; however, I notice that this date filter triggers another fetch request with the date range I selected.
I don't want this to happen. I want to filter by date USING MY EXISTING DATA. Is there any way to do this? To filter only using my cached data, and not send a new GET request?
While this is not doable from Data Studio side, you can implement your own cache in Apps Script. You can evaluate each getData request and return data from the cache if needed. This will avoid sending new GET requests to your API endpoint.
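A minimal sketch of that approach (fetchRowsFromApi, filterByDateRange and buildGetDataResponse are hypothetical helpers standing in for the connector's existing code; note that CacheService values are limited to roughly 100KB each):

function getData(request) {
  var cache = CacheService.getScriptCache();
  var cached = cache.get('connectorRows');          // hypothetical cache key

  var rows;
  if (cached) {
    // serve from the cache instead of hitting the API endpoint again
    rows = JSON.parse(cached);
  } else {
    rows = fetchRowsFromApi(request);               // placeholder for your existing UrlFetchApp call
    cache.put('connectorRows', JSON.stringify(rows), 21600);  // keep for up to 6 hours
  }

  // filter the cached rows by the requested date range instead of refetching
  var filtered = filterByDateRange(rows, request.dateRange.startDate, request.dateRange.endDate);
  return buildGetDataResponse(request, filtered);   // build the usual schema/rows response
}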
While this may not be the best option for everyone, I found a quick temporary solution by loading my own community connector through Google's own community connector, Extract Data.
This way my data loads only once, and I can filter it instantly the way I want.
If you want to refresh the data, you 'edit' the data source and save.
I have a device that hosts a simple web server and records data on a job site in .csv format. We would like to display this data in a simple line graph to make it easier to interpret when logged in remotely. The device currently records the data as follows:
YYYY/MM/DD,HH:MM:SS,Data1,Data2,Data3
When using Dygraph I know that the date and time need to be in the first column in order for it to parse the data correctly. There is no way to change the format that the device uses to save the data, but is there a way to make Dygraphs use both columns as date and time?
How about replacing the first comma on every line with a space?
data = data.split('\n').map(line => line.replace(',', ' ')).join('\n');
You'll have to either change the format coming from the .csv file, or issue the XHR yourself (jQuery and other libraries can help) and transform the data before handing it off to dygraphs.
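For example, a sketch assuming the CSV is reachable at data.csv and the chart div has id "graph" (fetch is used here, but an XMLHttpRequest or $.ajax call works the same way):

fetch('data.csv')
  .then(function (response) { return response.text(); })
  .then(function (csv) {
    // turn "YYYY/MM/DD,HH:MM:SS,..." into "YYYY/MM/DD HH:MM:SS,..." on every line
    var fixed = csv.split('\n').map(function (line) { return line.replace(',', ' '); }).join('\n');
    new Dygraph(document.getElementById('graph'), fixed, {
      labels: ['Date', 'Data1', 'Data2', 'Data3']   // no header row in the device's CSV
    });
  });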
A year ago, I learned JSON and AJAX in a training program. Since I will be starting a similar job next month, I want to practice what I learned, so I am making a football page for various FIFA teams. On my home page there are various links like "Teams", "Players", "Clubs", "History", all of which are interrelated. I have 4 JSON files: teams.json, players.json, clubs.json, history.json. I am not able to understand how to perform multiple AJAX calls to retrieve the information. For example, when I click on "Players", each player's club information should come from clubs.json. I hope I have made my problem clear.
Please ask if anything is unclear. I can also post the code I have written so far.
Well, I don't have much experience, but I don't think you would usually get the data from JSON files like that if you were building a real website or app.
Instead, there would probably be a backend service, i.e. an API.
The API can be built with PHP, Ruby on Rails, etc. Its purpose is to gather all the data you need, say for the Players link, from the different tables you have in the database (Teams, Clubs and so on) and group it into a nice JSON string (which you will need to parse) that it sends back to you.
Because of that, you won't need to make a lot of AJAX calls, which is slow and heavy; you leave that work to the backend logic and make only a single AJAX call.
So what you are trying to do is not quite right for your case, because for an app like that the data would be stored in a database, not in JSON files.
You can search Google and read more about APIs if you are interested.
This is my unprofessional opinion. Hope it helped you or motivated you to learn something new :)
Assuming you want to achieve the following:
You have 4 pages: "Teams", "Players", "Clubs", "History".
When you click on Teams, you want to get data from teams.json.
When you click on Players, you want to get data from players.json.
You can write 4 different functions:
function getTeams() {
  // AJAX call to get Teams
  fetch('teams.json')
    .then(function (res) { return res.json(); })
    .then(function (teams) { /* render the teams here */ });
}
function getPlayers() {
  // AJAX call to get Players
  fetch('players.json')
    .then(function (res) { return res.json(); })
    .then(function (players) { /* render the players here */ });
}
// ...and likewise getClubs() and getHistory()
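If the Players view also needs club details from clubs.json, the two requests can be combined; a rough sketch, assuming both files contain arrays of objects and that each player record has some club reference (clubId here is made up for illustration):

function getPlayersWithClubs() {
  // fetch both files in parallel, then join them in memory
  Promise.all([
    fetch('players.json').then(function (res) { return res.json(); }),
    fetch('clubs.json').then(function (res) { return res.json(); })
  ]).then(function (results) {
    var players = results[0];
    var clubs = results[1];
    players.forEach(function (player) {
      // clubId is an assumed field; adjust to however your JSON links the two files
      player.club = clubs.find(function (club) { return club.id === player.clubId; });
    });
    // render the players, now with their club info attached
  });
}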
Our website provides various data services to our clients, one of which is gauge data. Some gauges log information every 15 minutes, some every minute. This data is sent to our SQL database.
All of this data is displayed via a graph (generated server-side via PHP and JpGraph), with each individual log entry displayed as a row in a collapsible table (jQuery 1.10.2).
When a client wants to view the data, they select a date range and which gauges they would like to view. If they want to view the last 3 days of a gauge that logs every minute, it loads pretty quickly. If they want to view 2 of those, it takes around 15-30 seconds to load. The real problem comes when they want to view a month's worth of data, especially for more than one gauge. This can take upwards of 15-20 minutes to load, and the browser repeatedly asks if we want to stop the script from populating the collapsible table rows (jQuery).
Obviously this is a problem since clients want a relatively fast response (1-5 min max). Ideally, we would also like to be able to pull gauge data from several months at a time. The only way we can do that now is to pull data 2 weeks at a time and compile the total manually.
For reference: if I wanted to pull a month's data for 2 of our once-a-minute-logging gauges, there would be 86,400 rows added via jQuery to a collapsible table. The page takes approx. 5 minutes to load and the browser is terribly slow during this time.
My question is: what is the best way to pull/graph/populate data using a PHP-based server (Symfony 1.4 framework) and JavaScript?
Should we look into upgrading our allotted processing power/RAM (we are hosted by GoDaddy)? Is there a faster way to populate collapsibles than with jQuery? All of our calculations are done server-side. Should we just pull the raw data and let the client side do the data processing? Should we split the data processing between client and server?
Here's a screenshot of the web page. It's cropped so that client-sensitive information is not displayed.
In response to my comment.
Since you need the entire data-set only on the server side (you create your graph on the server), this means that you don't actually need to send the entire data-set to the client.
Instead, send a small portion to the client, say the first 200 results. Then cache the rest of the result-set in a JSON file (or a lite database, whatever you want really). Then create an interface where the user can request more data. Infinite scroll is nice but has its own problems; maybe just a button that says "load more data". As people have said, anything more than a few hundred data points in a table at one time is overkill, because nobody will look at it all anyway. When they hit the button to get more data, you send an AJAX request to the server with the correct parameters for the data you want.
For example, the first time they click getMoreData() you want to get the next 200 data points, so you send getMoreData(start=200, length=200). Your server picks up the AJAX request and finds the correct data in the JSON file or the lite database, wherever you have cached the results. The user can keep requesting more data (making sure you update your start parameter), and you only ever return a small subset. The user doesn't even realize they don't have the whole data-set in front of them, because it looks like they do.
One thing that is complicated about this is sorting and searching. If you want to implement those, you need to make sure the sorting/searching goes through the cached results on the server side.
So basically you have a system where you can create the entire graph on the server side, which shouldn't take long. What does take long is loading the entire data-set on the client side, so you break that up into small chunks. You can even easily add pagination and the like with this method.
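A rough client-side sketch of that "load more" call (the /gauge-data URL, element ids, and row fields are placeholders for whatever your Symfony action and table actually use):

var start = 200;   // the first 200 rows came down with the initial page load
var length = 200;

$('#load-more').on('click', function () {
  // ask the server for the next slice of the cached result-set
  $.getJSON('/gauge-data', { start: start, length: length }, function (rows) {
    var html = '';
    $.each(rows, function (i, row) {
      // placeholder markup: build whatever cells your collapsible table uses
      html += '<tr><td>' + row.timestamp + '</td><td>' + row.value + '</td></tr>';
    });
    $('#gauge-table tbody').append(html);   // one append keeps the DOM work cheap
    start += rows.length;                   // keep the start parameter up to date
  });
});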
I'm currently working on developing a web-based music player. The issue that I'm having is pulling a list of all the songs from the database and sending it to the client. The client has the ability to dynamically create playlists, and therefore they must have access to a list of the entire library. This library can range upwards of 20,000 unique songs. I'm preparing the data on the server side using Django and this tentative scheme:
{
id: "1",
cover: "http://example.com/AlbumArt.jpg",
name: "Track Name",
time: "3:15",
album: "Album Name",
disc: (1, 2),
year: "1969",
mp3: "http://example.com/Mp3Stream.mp3"
},
{
id: "2",
...
}
What is the best method of DYNAMICALLY sending this information to the client? Should I be using JSON? Could JSON effectively send this text file consisting of 20,000 entries? Is it possible to cache this playlist on the client side so this huge request doesn't have to happen every time the user logs in, but only when there has been a change in the database?
Basically, what I need at this point is a dependable method of transmitting a text-based playlist consisting of around 20,000 objects, each with their own attributes (name, size, etc...), in a timely manner. Sort of like Google Music: when you log in, you are presented with all the songs in your library. How are they sending this list?
Another minor question that comes to mind is, can the browser (mainly Chrome) handle this amount of data without sacrificing usability?
Thank you so much for all your help!
I just took a look at the network traffic for Google Play, and they transmit the initial library screen (around 50 tracks) via JSON, with the bare minimum of metadata (name, track ID, and album art ID). When you load the main library page, it makes a request to an extremely basic HTML page that appears to insert items from an inline JS object (Gist Sample). The total file was around 6MB, but it was cached and nothing needed to be transferred.
I would suggest doing a paginated JSON request to pull down the data, and using ETags and caching to ensure it isn't retransmitted unless it absolutely needs to be. Instead of a normal pagination of ?page=5&count=1000, try ?from=1&to=1000, so that deleting item 995 purges ?from=1&to=1000 from the cache, but not ?from=1001&to=2000 (whereas ?page=2&count=1000 would be purged).
Google Play Music does not appear to use Local Storage, IndexedDB, or Web SQL, and loads everything from the cached file and parses it into a JS object.
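A sketch of that kind of chunked pull (the /library endpoint, chunk size, and response shape are assumptions; the ETag/If-None-Match handling is left to the browser's HTTP cache):

var library = [];

function loadChunk(from, to) {
  // ?from=&to= style pagination, so an edit near the end of the library
  // only invalidates the last cached chunk
  return fetch('/library?from=' + from + '&to=' + to)
    .then(function (res) { return res.json(); })
    .then(function (tracks) {
      library = library.concat(tracks);
      if (tracks.length === (to - from + 1)) {
        // a full chunk came back, so there is probably more to fetch
        return loadChunk(to + 1, to + 1000);
      }
    });
}

loadChunk(1, 1000).then(function () {
  // library now holds the full track metadata; render from it as needed
});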
Have you seen this http://code.flickr.net/2009/03/18/building-fast-client-side-searches/ ?
I've been using this array system myself lately (for 35K objects) and it is fast (assuming you don't want to render them all on screen).
Basically the server builds a long string in the form
1|2|3$cat|dog|horse$red|blue|green
which is sent back as a single string in response to an HTTP request. Take the responseText field and convert it into arrays using:
var arr = request.responseText.split('$');
var ids = arr[0].split('|');
var names = arr[1].split('|');
Clearly, you end up with arrays of strings at the end, not objects, but arrays are fast for many operations. I've used $ and | as delimiters in this example, but for live use I use something more obscure. My 35K objects are completely handled in less than 0.5 sec (iPad client).
You can save the strings to localStorage, but watch the 5MB limit, or use a shim such as Lawnchair. (NB: I also like SpenserJ's answer, which may be easier to implement depending on your environment.)
This method doesn't easily work for all JSON data types; they need to be quite flat. I've also found these big arrays to perform well, even on smartphones, iPod Touch, etc. (see jsperf.com for several tests around string.split and array searching).
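If you do persist that string to localStorage as suggested above, it's worth guarding the write, since exceeding the quota throws an exception; a small sketch (the 'libraryCache' key is arbitrary):

function cacheLibraryString(str) {
  try {
    localStorage.setItem('libraryCache', str);  // one big delimited string, cheap to re-split
    return true;
  } catch (e) {
    // quota exceeded (roughly the 5MB limit mentioned above) - fall back to refetching
    return false;
  }
}

var cached = localStorage.getItem('libraryCache');
if (cached) {
  var arr = cached.split('$');                  // same split as in the snippet above
}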
You could implement a file-like object that wraps the JSON file and spits out proper chunks.
For instance, since you know that your JSON file is a single array of music objects, you could create a generator that wraps the file and returns chunks of the array.
You would have to do some string parsing to get the chunking of the JSON file right.
I don't know what generates your JSON content. If possible, I would consider generating a number of manageable files instead of one huge file.
I would test the performance of sending the complete JSON in a single request. Chances are that the slowest part will be rendering the UI and not the response time of the JSON request. I recommend storing the JSON in a JavaScript object on the page, and only rendering UI dynamically as needed based on scrolling. The JavaScript object can serve as the data source for the client-side scrolling. Should the JSON be too large, you may want to consider server-backed scrolling.
This solution will also be browser agnostic (HTML < 5).
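A bare-bones sketch of that scroll-driven rendering, keeping the parsed JSON in memory as the data source (the track-list element id and the row markup are placeholders):

var tracks = [];        // the parsed JSON response, kept in memory as the data source
var rendered = 0;
var BATCH = 100;

function renderMore() {
  var list = document.getElementById('track-list');   // placeholder element id
  var end = Math.min(rendered + BATCH, tracks.length);
  for (var i = rendered; i < end; i++) {
    var li = document.createElement('li');
    li.textContent = tracks[i].name + ' - ' + tracks[i].album;
    list.appendChild(li);
  }
  rendered = end;
}

window.onscroll = function () {
  // when the user nears the bottom of the page, append the next batch of rows
  if (window.innerHeight + window.pageYOffset >= document.body.offsetHeight - 200) {
    renderMore();
  }
};

renderMore();   // initial batch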