Use BigQuery in Node.js to import JSON as string - javascript

I'm using @google-cloud/bigquery and trying to import data that I have as JSON into a table.
I see the table.createWriteStream() method, but I believe that since it streams data it costs money, whereas a bq load in the console is free.
So my two questions are:
1: Is using table.import() the equivalent free way to load data into a table?
2: How can I import data that I have in a variable without having to save it to a .json file first?

If you want to avoid streaming inserts, you should know that load jobs have a daily limit of 1,000 load jobs per table per day. Streaming inserts don't have this limit.
Streaming inserts are extremely cheap: $0.05 per GB, which is $50 for 1 TB. Not sure how much volume you have, but people usually don't build workarounds to avoid streaming inserts, because they're better suited for this kind of ingestion.
Streaming insert is the recommended way to import data: it's scalable, and it gives per-row error messages, so you can retry individual rows rather than the full file.
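For question 2, if streaming inserts are acceptable, here is a minimal sketch of inserting rows held in a variable with @google-cloud/bigquery (the dataset and table names are placeholders, and the exact constructor import differs between library versions):

const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

async function insertRows() {
  // Rows you already have in memory -- no .json file on disk needed.
  const rows = [
    {name: 'alice', score: 12},
    {name: 'bob', score: 7},
  ];

  try {
    // Streaming insert: each array element becomes one row in the table.
    await bigquery.dataset('my_dataset').table('my_table').insert(rows);
    console.log('Inserted ' + rows.length + ' rows');
  } catch (err) {
    // Partial failures report per-row errors in err.errors, so only the failed rows need retrying.
    console.error(JSON.stringify(err.errors || err, null, 2));
  }
}

insertRows();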

Related

Delete a very large number of entries in Azure Table Storage

Here's my setup:
I'm running a Node.js Web App in Azure, which is using Azure Table Storage (Non-SQL). To work with table storage I'm using the azure-storage npm module.
What I'm trying to do:
So I have a system that's tracking events for devices. In storage I'm setting my PartitionKey to be the deviceId and I'm setting the RowKey to be the eventId.
Adding events is straightforward; add them one at a time as they occur.
Retrieving them is easy using the query structure.
However, deleting large quantities of entries seems to be a pain. It appears you can only delete one entity at a time; there doesn't seem to be a query-based implementation.
There is the option to use batches to create a large batch of delete operations; but I've just found that there is a cap of 100 operations per batch.
So I'm trying to delete all events for a single device; in my current case I have about 5000 events. To achieve this I first have to query all my events with a GET request (following continuation tokens and concatenating the results), then separate them into batches of 100, and then send 50 batch requests in order to delete all the entries...
The same thing in SQL would be DELETE * WHERE deviceId='xxxxxxxx'
Surely there must be a better way than this!
Sadly, there isn't :). You must fetch the entities based on your requirement and then delete them (either in batches or individually).
You can, however, optimize the fetching process by requesting only PartitionKey and RowKey from your table instead of all attributes, since those two are all you need to delete an entity.
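A rough sketch of that fetch-then-batch-delete loop with the azure-storage module (the table name is a placeholder and error handling is simplified):

const azure = require('azure-storage');

const tableService = azure.createTableService(); // reads AZURE_STORAGE_CONNECTION_STRING
const tableName = 'events';

function deleteDeviceEvents(deviceId, token, done) {
  const query = new azure.TableQuery()
    .select('PartitionKey', 'RowKey')
    .where('PartitionKey eq ?', deviceId);

  tableService.queryEntities(tableName, query, token, (err, result) => {
    if (err) return done(err);

    // Split this page of entities into batches of at most 100 (the per-batch cap).
    const batches = [];
    for (let i = 0; i < result.entries.length; i += 100) {
      const batch = new azure.TableBatch();
      result.entries.slice(i, i + 100).forEach((entity) => batch.deleteEntity(entity));
      batches.push(batch);
    }

    let pending = batches.length;
    if (pending === 0) return nextPage();

    batches.forEach((batch) => {
      tableService.executeBatch(tableName, batch, (batchErr) => {
        if (batchErr) return done(batchErr);
        if (--pending === 0) nextPage();
      });
    });

    function nextPage() {
      // Keep following continuation tokens until the partition is empty.
      if (result.continuationToken) {
        deleteDeviceEvents(deviceId, result.continuationToken, done);
      } else {
        done(null);
      }
    }
  });
}

deleteDeviceEvents('xxxxxxxx', null, (err) => {
  if (err) console.error(err);
  else console.log('All events for the device deleted');
});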

Where to put "a lot" of data, array / file / somewhere else, in JS on node.js

This may be a "stupid" question to ask, but I am working with "a lot" of data for the first time.
What I want to do: query the World Bank API
Problem: The API is very inflexible when it comes to searching/filtering... I could query every country/indicator by itself, but that would generate a lot of calls. So I wanted to download all information about a country or indicator at once and then sort it on the machine.
My question: Where/how should I store the data? Can I simply put it into an array, and do I have to worry about size? Should I write it to a temporary JSON file? Or do you have another idea?
Thanks for your time!
Example:
20 countries, 15 indicators
If I queried every country by itself I would generate 20*15 API calls; if I called ALL countries for one indicator it would take only 15 API calls, but I would get a lot of "junk" data :/
You can keep the data in RAM in an appropriate data structure (array or object) if the following are true:
The data is only needed temporarily (during one particular operation) or can easily be retrieved again if your server restarts.
If you have enough available RAM for your node.js process to store the data in RAM. In a typical server environment, there might be more than a GB of RAM available. I wouldn't recommend using all of that, but you could easily use 100MB of that for data storage.
Keeping it in RAM will likely make it faster and easier to interact with than storing it on disk. The data will, obviously, not be persistent across server restarts if it is in RAM.
If the data is needed long term and you only want to fetch it once and then have access to it over and over again even if your server restarts, or if the data is more than hundreds of MBs, or if your server environment does not have a lot of RAM, then you will want to write the data to an appropriate database where it will persist and where you can query it as needed.
If you don't know how large your data will be, you can write code to temporarily put it in an array/object and observe the memory usage of your node.js process after the data has been loaded.
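A minimal sketch of that check (assuming you collect the World Bank responses into an array):

// Collect the responses in memory, then see how much the node.js process is actually using.
const data = [];

function addCountryData(countryJson) {
  data.push(countryJson);
}

// ...after all World Bank responses have been collected:
const used = process.memoryUsage();
console.log('heapUsed: ' + (used.heapUsed / 1024 / 1024).toFixed(1) + ' MB');
console.log('rss:      ' + (used.rss / 1024 / 1024).toFixed(1) + ' MB');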
I would suggest storing it in a nosql database, since you'll be working with JSON, and querying from there.
mongodb is very 'node friendly' - there's the native driver - https://github.com/mongodb/node-mongodb-native
or mongoose
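A minimal sketch with the native driver (the connection string, database/collection names, and the countryCode field are placeholders, not anything dictated by the World Bank data):

const {MongoClient} = require('mongodb');

async function saveIndicators(docs) {
  // Connect, store each World Bank response as a document, then query locally later.
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const collection = client.db('worldbank').collection('indicators');
    await collection.insertMany(docs);

    // Later: filter on the machine instead of hitting the API again.
    const germany = await collection.find({countryCode: 'DE'}).toArray();
    console.log(germany.length);
  } finally {
    await client.close();
  }
}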
Storing data from an external source you don't control brings with it the complexity of keeping the data in sync if the data happens to change. Without knowing your use case or the API it's hard to make recommendations. For example, are you sure you need the entire data set? Is there a way to filter down the data based on information you already have (user input, etc)?

How to read a large file(>1GB) in javascript?

I use an ajax $.get to read a file from the local server. However, the page crashes since my file is too large (> 1GB). How can I solve this problem? Are there other solutions or alternatives?
$.get("./data/TRACKING_LOG/GENERAL_REPORT/" + file, function(data){
    console.log(data);
});
A solution, assuming that you don't have control over the report generator, would be to download the file in multiple smaller pieces, using range headers, process the piece, extract what's needed from it (I assume you'll be building some html components based on the report), and move to the next piece.
You can tweak the piece size until you find a reasonable value for it, a value that doesn't make the browser crash, but also doesn't result in a large number of http requests.
If you can control the report generator, you can configure it to generate multiple smaller reports instead of a huge one.
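A rough sketch of the range-based approach described above, built on the $.get call from the question (it assumes the server honours Range headers; processChunk is a hypothetical helper for whatever extraction/rendering you do):

var CHUNK_SIZE = 5 * 1024 * 1024; // tune this until the browser stays responsive

function fetchChunk(file, start) {
  $.ajax({
    url: "./data/TRACKING_LOG/GENERAL_REPORT/" + file,
    headers: { Range: "bytes=" + start + "-" + (start + CHUNK_SIZE - 1) },
    success: function (data, textStatus, xhr) {
      processChunk(data); // hypothetical helper: extract what you need, then discard the piece

      // Content-Range looks like "bytes 0-5242879/1073741824"; the part after "/" is the total size.
      var range = xhr.getResponseHeader("Content-Range");
      var total = range ? parseInt(range.split("/")[1], 10) : 0;
      if (start + CHUNK_SIZE < total) {
        fetchChunk(file, start + CHUNK_SIZE);
      }
    }
  });
}

fetchChunk(file, 0);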
Split the file into many smaller files, or give selected users FTP access. I doubt you'd want too many people downloading a gig each off your web server.

What is the most efficient way of sending data for a very large playlist over http?

I'm currently working on developing a web-based music player. The issue that I'm having is pulling a list of all the songs from the database and sending it to the client. The client has the ability to dynamically create playlists, and therefore they must have access to a list of the entire library. This library can range upwards of 20,000 unique songs. I'm preparing the data on the server-side using django and this tentative scheme:
{
    id: "1",
    cover: "http://example.com/AlbumArt.jpg",
    name: "Track Name",
    time: "3:15",
    album: "Album Name",
    disc: (1, 2),
    year: "1969",
    mp3: "http://example.com/Mp3Stream.mp3"
},
{
    id: "2",
    ...
}
What is the best method of DYNAMICALLY sending this information to the client? Should I be using JSON? Could JSON effectively send this text file consisting of 20,000 entries? Is it possible to cache this playlist on the client side so this huge request doesn't have to happen every time the user logs in, but only when there has been a change in the database?
Basically, what I need at this point is a dependable method of transmitting a text-based playlist consisting of around 20,000 objects, each with their own attributes (name, size, etc...), in a timely manner. Sort of like Google Music. When you log in, you are presented with all the songs in your library. How are they sending this list?
Another minor question that comes to mind is, can the browser (mainly Chrome) handle this amount of data without sacrificing usability?
Thank you so much for all your help!
I just took a look at the network traffic for Google Play, and they transmit the initial library screen (around 50 tracks) via JSON, with the bare minimum of metadata (name, track ID, and album art ID). When you load the main library page, it makes a request to an extremely basic HTML page that appears to insert items from an inline JS object (see the Gist sample). The total file was around 6MB, but it was cached and nothing needed to be transferred.
I would suggest doing a paginated JSON request to pull down the data, and using ETags and caching to ensure it isn't retransmitted unless it absolutely needs to be. And instead of a normal pagination of ?page=5&count=1000, try ?from=1&to=1000, so that deleting item 995 will purge ?from=1&to=1000 from the cache but not ?from=1001&to=2000 (whereas with page-based pagination ?page=2&count=1000 would be purged too).
Google Play Music does not appear to use Local Storage, IndexedDB, or Web SQL, and loads everything from the cached file and parses it into a JS object.
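A sketch of the paginated request suggested above, from the client side (the /api/library endpoint, PAGE_SIZE, and renderLibrary are assumptions; the ETag and cache handling happens in the server and the browser):

var PAGE_SIZE = 1000;
var library = [];

function loadLibrary(from) {
  $.getJSON("/api/library", { from: from, to: from + PAGE_SIZE - 1 }, function (tracks) {
    library = library.concat(tracks);

    if (tracks.length === PAGE_SIZE) {
      loadLibrary(from + PAGE_SIZE);  // there may be more pages to fetch
    } else {
      renderLibrary(library);         // hypothetical UI function
    }
  });
}

loadLibrary(1);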
Have you seen this http://code.flickr.net/2009/03/18/building-fast-client-side-searches/ ?
I've been using this array system myself lately (for 35K objects) and it is fast (assuming you don't want to render them all on screen).
Basically the server builds a long string in the form
1|2|3$cat|dog|horse$red|blue|green
which is sent as a single string in response to an http request. Take the responseText field and convert it to arrays using

var arr = request.responseText.split('$');
var ids = arr[0].split('|');
var names = arr[1].split('|');
Clearly, you end up with arrays of strings at the end, not objects, but arrays are fast for many operations. I've used $ and | as delimiters in this example, but for live use I use something more obscure. My 35k objects are completely handled in less than 0.5 sec (iPad client).
You can save the strings to localStorage, but watch the 5MB limit, or use a shim such as lawnchair. (NB: I also like SpenserJ's answer, which may be easier to implement depending on your environment.)
This method doesn't easily work for all JSON data types; they need to be quite flat. I've also found these big arrays to behave well performance-wise, even on smartphones, iPod Touch, etc. (see jsperf.com for several tests around string.split and array searching).
You could implement a file-like object that wraps the json file and spits out proper chunks.
For instance, you know that your json file is a single array of music objects, you could create a generator that wraps the json file and returns chunks of the array.
You would have to do some string content parsing to get the chunking of the json file right.
I don't know what generates your json content. If possible, I would consider generating a number of manageable files instead of one huge file.
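In Node terms, a minimal sketch of that chunking generator (the file name and chunk size are placeholders; for a really large file you would stream-parse rather than use readFileSync):

const fs = require('fs');

// Generator that wraps the parsed JSON array and hands it out in manageable chunks.
function* chunkedTracks(path, chunkSize) {
  const tracks = JSON.parse(fs.readFileSync(path, 'utf8'));
  for (let i = 0; i < tracks.length; i += chunkSize) {
    yield tracks.slice(i, i + chunkSize);
  }
}

for (const chunk of chunkedTracks('./playlist.json', 1000)) {
  // Send each chunk as its own response, or write it out as its own smaller file.
  console.log(chunk.length);
}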
I would test performance of sending the complete JSON in a single request. Chances are that the slowest part will be rendering the UI and not the response time of the JSON request. I recommend storing the JSON in a JavaScript object on the page, and only render UI dynamically as needed based on scrolling. The JavaScript object can serve as a data source for the client side scrolling. Should the JSON be too large, you may want to consider server backed scrolling.
This solution is also browser agnostic (it doesn't rely on HTML5 features).

Ajax issue: delay in getting data from web service using innerHTML, please guide

I am working on an ajax application which will display about a million records in an HTML table. The web service returns records from the server; I build a long string by concatenating data and tags and then put this string into the page using innerHTML (not using the DOM, for better performance).
For testing I have put 6,000 records in the database (the stored procedure takes about 4 seconds to execute).
While testing on the local system (database and application on the same machine) it took about 5 minutes to display the records on the page. After deploying to the web server it did not respond even after waiting longer. That looks like very poor performance. I put the records in a CSV file and it weighed less than 2 MB. I couldn't understand why the string concatenation to build the HTML table and assigning the string to innerHTML take such a huge amount of time (if that is the issue). The requirement is to show about a million records in the web page, but the performance on just 6,000 records is disappointing. I don't know what to do to increase performance.
Kindly guide me and help me.
You're trying to display a million records on a single page? No matter how you optimize your server code, that's a LOT of html to parse/render, especially if it's in a table.
Even using .innerHTML isn't going to "save" you any time. The rendering engine is still going to have to parse/style/render/position many millions of table rows/cells and you WILL have to wait while it's working.
If you absolutely HAVE to show all those records on a single page, try to break things up into manageable chunks. Have the AJAX call return (say) 100 records at a time, put those into the table, then fetch another 100 records, etc... At least that way you'll see the content of the page growing, rather than having to sit there and wait for 1,000,000 table rows to get displayed in a single shot.
A better option would be to do pagination, where only 100 records are shown at a time and you present a standard navigation with << first / prev / next / last >> buttons to step through "pages" of data.
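A sketch of the incremental approach: fetch 100 records at a time and append them to the table as they arrive (the /records endpoint, the row fields, and the resultsBody tbody id are assumptions):

var PAGE = 100;
var tableBody = document.getElementById("resultsBody");

function loadPage(offset) {
  $.getJSON("/records", { offset: offset, limit: PAGE }, function (rows) {
    var html = "";
    for (var i = 0; i < rows.length; i++) {
      html += "<tr><td>" + rows[i].id + "</td><td>" + rows[i].name + "</td></tr>";
    }
    // insertAdjacentHTML appends without re-parsing the rows already in the table.
    tableBody.insertAdjacentHTML("beforeend", html);

    if (rows.length === PAGE) {
      loadPage(offset + PAGE); // keep fetching; the page stays responsive between batches
    }
  });
}

loadPage(0);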
As Marc stated, you need pagination. See if this helps - How do I do pagination in ASP.NET MVC?
In addition to this you could optimize the result by employing the master-detail pattern: fetch only the summary of each record (master) and, on some action on the master, fetch the details and display them on the screen. This will reduce the amount of data being transferred from the server.
