Delete a very large number of entries in Azure Table Storage

Here's my setup:
I'm running a Node.js Web App in Azure, which uses Azure Table Storage (NoSQL). To work with table storage I'm using the azure-storage npm module.
What I'm trying to do:
So I have a system that's tracking events for devices. In storage I'm setting my PartitionKey to be the deviceId and I'm setting the RowKey to be the eventId.
Adding events is straightforward: I add them one at a time as they occur.
Retrieving them is easy using the query structure.
However, deleting large quantities of entries seems to be a pain. It appears you can only delete one entity at a time; there doesn't seem to be a query-based implementation.
There is the option of grouping delete operations into batches, but I've just found that there is a cap of 100 operations per batch.
So I'm trying to delete all events for a single device; in my current case I have about 5000 events. To achieve this, I first have to query all my events with a GET request (concatenating the results using continuation tokens), then split them into batches of 100, and then send 50 large requests to delete all the entries...
The same thing in SQL would be DELETE * WHERE deviceId='xxxxxxxx'
Surely there must be a better way than this!

The same thing in SQL would be DELETE * WHERE deviceId='xxxxxxxx'
Surely there must be a better way than this!
Sadly, there isn't :). You must fetch the entities that match your criteria and then delete them (either in batches or individually). Keep in mind that a batch can contain at most 100 operations and that all entities in a batch must share the same PartitionKey.
You can, however, optimize the fetching process by retrieving only PartitionKey and RowKey from your table instead of all attributes, since those two attributes are all you need to delete an entity.
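A minimal sketch of the batching step in plain JavaScript, with no SDK calls. It assumes the keys have already been fetched (ideally with a projection on PartitionKey and RowKey) and unwrapped into plain `{ PartitionKey, RowKey }` objects with string values; `toDeleteBatches` is a hypothetical helper name. It respects the two batch rules mentioned above: at most 100 operations per batch, and one PartitionKey per batch. Each resulting batch would then be sent through the module's batch API (in azure-storage, a `TableBatch` filled with `deleteEntity` calls and passed to `executeBatch`), which is not shown here.

```javascript
// Azure Table Storage batch limit: at most 100 operations per batch.
const MAX_BATCH_SIZE = 100;

// Hypothetical helper: split entity keys into delete batches that each
// touch a single partition and contain at most maxSize entities.
function toDeleteBatches(entities, maxSize = MAX_BATCH_SIZE) {
  // Group by PartitionKey first, because a batch may only span one partition.
  const byPartition = new Map();
  for (const e of entities) {
    if (!byPartition.has(e.PartitionKey)) byPartition.set(e.PartitionKey, []);
    // Keep only the two keys a delete needs.
    byPartition.get(e.PartitionKey).push({ PartitionKey: e.PartitionKey, RowKey: e.RowKey });
  }

  // Then cut each partition's list into chunks of at most maxSize.
  const batches = [];
  for (const keys of byPartition.values()) {
    for (let i = 0; i < keys.length; i += maxSize) {
      batches.push(keys.slice(i, i + maxSize));
    }
  }
  return batches;
}

// Example: 5000 events for one device (PartitionKey = deviceId).
const events = Array.from({ length: 5000 }, (_, i) => ({
  PartitionKey: "device-1",
  RowKey: `event-${i}`,
}));
const batches = toDeleteBatches(events);
console.log(batches.length); // 50
```

This reproduces the 5000 / 100 = 50 requests from the question; the number of round trips cannot go below that, but the batches can be sent in parallel if throughput matters.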

Related

How to save report filters and making it available for executing and scheduling

I have a use case where I have to add query parameters to API calls and save them accordingly.
That is, when a user adds a report, they can pick multiple filters (some predefined, others custom) that need to be saved, so that the report can later be run to generate output or be scheduled.
I am using PostgreSQL as the database. How can I handle this scenario? If I save the filters in the database, what is the best way to do it? I have a table where I store each report's name and description, but how should I add the filters and save them, so that next time the user can just schedule or generate the report with the saved filters?
Screenshots (not included): the list of saved reports; the form where the user adds a custom report using the available filters.
I'm not sure of the best way to handle a scenario where multiple filters have to be added and saved.
Can anyone help with an approach for this? For example, saving the filters in the database, or whatever works best.
Thanks
Edit: something like JIRA, where you can save filters and apply them directly.

Assuming that you have control over your web application and that it will translate what users input into SQL queries, just create a data structure that both your UI and your data tier can understand and store it in a table using the JSON or JSONB datatype.
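A minimal sketch of that idea, assuming a hypothetical filter format of `{ field, op, value }` objects stored in a `jsonb` column (the table name `saved_reports` and the field names are illustrative). When the report is generated or scheduled, the saved JSON is turned into a parameterized WHERE clause. Field names and operators are checked against a whitelist so user input never ends up in the SQL text. The `$1, $2` placeholders follow node-postgres (`pg`) conventions, and `= ANY($n)` with an array parameter is PostgreSQL syntax.

```javascript
// Hypothetical table for saved reports (names are illustrative):
//   CREATE TABLE saved_reports (
//     id serial PRIMARY KEY,
//     name text NOT NULL,
//     description text,
//     filters jsonb NOT NULL DEFAULT '[]'
//   );
// Each saved filter is an object like { field, op, value }.

// Whitelists: only these columns and operators may appear in generated SQL.
const ALLOWED_FIELDS = new Set(["status", "created_at", "assignee"]);
const SQL_OPERATORS = new Map([
  ["eq", "="],
  ["lt", "<"],
  ["gt", ">"],
]);

function buildWhereClause(filters) {
  const clauses = [];
  const params = [];
  for (const { field, op, value } of filters) {
    if (!ALLOWED_FIELDS.has(field)) throw new Error(`Unknown field: ${field}`);
    params.push(value);
    const placeholder = `$${params.length}`; // node-postgres style: $1, $2, ...
    if (op === "in") {
      clauses.push(`${field} = ANY(${placeholder})`); // value is expected to be an array
    } else if (SQL_OPERATORS.has(op)) {
      clauses.push(`${field} ${SQL_OPERATORS.get(op)} ${placeholder}`);
    } else {
      throw new Error(`Unknown operator: ${op}`);
    }
  }
  return { where: clauses.length ? `WHERE ${clauses.join(" AND ")}` : "", params };
}

// Example: filters as they might come back from the jsonb column.
const saved = JSON.parse(
  '[{"field":"status","op":"eq","value":"open"},{"field":"assignee","op":"in","value":["ann","bob"]}]'
);
const { where, params } = buildWhereClause(saved);
console.log(where); // WHERE status = $1 AND assignee = ANY($2)
```

The same JSON can be sent to the UI to pre-fill the filter form, which is what makes it a structure "both your UI and your data tier can understand".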

Creating temp URLs in single page applications

In my React-based single-page application, the page is divided into two panes:
Left pane: filter panel.
Right pane: grid (a table containing the data that passes the applied filters).
In short, the application looks very similar to amazon.com. By default, when a user opens the application's root endpoint (/) in the browser, I fetch the last 7 days of data from the server and show it in the grid.
The filter panel has a couple of filters (e.g. a time filter to fetch data within a specified interval, an ID filter to search for data with specific IDs, etc.) and a search button in its header. Clicking the search button makes a POST call to the server with the selected filters in the form body; the server returns the data matching those filters, and the frontend displays it in the grid.
Now, when someone clicks the search button, I want the selected filters to be reflected in the URL's query parameters, so that I can share these URLs with other users of my website and they see the filters I applied and only the data matching them.
The problem is that if I use an HTTP GET with query parameters when the search button is clicked, I will end up breaking the application because of the URL length limits imposed by different browsers.
Please suggest a correct way to create such URLs, so that they restore the selected filters in the filter panel without causing side effects in my application.
Possible solution: since we cannot put plain strings directly into query parameters because of browsers' URL length limits (note: the HTTP specification does not limit the length of a GET request, but browsers impose their own limits), we could use something like a message digest or hash (which converts input of arbitrary length into output of fixed length) and save it in the database, so the server can understand the request and serve the content back. This is just a thought; I am not sure whether it is an ideal solution to this problem.
Behavior of other heavily used websites:
amazon.com, newegg.com -> use hashed URLs.
kayak.com -> since they have very well-defined keywords, they use short forms like IN for India, BLR for Bangalore, etc., and combine this with negation logic to further reduce URL length. I haven't checked, but this should break after a large selection of filters.
flipkart.com -> appends strings directly to query parameters and breaks once the limit is exceeded. I verified this.

In response to cauchy's answer, we need to distinguish between hashing and encryption.
Hashing
Hashes are by necessity irreversible. To map a hash back to the specific filter combination, you would need to either:
1) hash each permutation of filters on the server for every request and try to match the requested hash (computationally intensive), or
2) store a map of hash to filter combination on the server (memory intensive).
For the vast majority of cases, option 1 is going to be too slow. Depending on the number of filters and options, option 2 may require a sizable map, but it's still your best option.
Encryption
In this scheme, the server would send its public key to the client, then the client could use that to encrypt its filter options. The server would then decrypt the encrypted data with its private key. This is good, but your encrypted data will not be fixed length. So, as more options are selected, you run into the same problem of indeterminate parameter length.
Thus, in order to ensure your URL is short for any number of filters and options, you will need to maintain a mapping of hash->selection on the server.
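A minimal Node.js sketch of the server-side hash -> selection map. As a simplification, it stores each selection the first time it is shared instead of precomputing every permutation at build time; that is a swapped-in approach, not the one this answer goes on to describe. Object keys are sorted before hashing so the same selection always yields the same hash, and the SHA-256 digest is truncated to 16 hex characters (64 bits) to keep URLs short, which leaves a small collision risk. The in-memory `Map` stands in for a real database table; `canonicalize`, `saveSelection`, `resolveSelection`, and the `/?f=<hash>` URL shape are hypothetical.

```javascript
const crypto = require("crypto");

// Serialize with object keys sorted, so the same selection always yields the
// same string no matter in which order the UI assembled it.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// In-memory stand-in for the server-side table that maps hash -> selection.
const selections = new Map();

// Store a selection and return a short, fixed-length key for the URL.
function saveSelection(filters) {
  const hash = crypto
    .createHash("sha256")
    .update(canonicalize(filters))
    .digest("hex")
    .slice(0, 16); // 64 bits: short URLs, small but non-zero collision risk
  selections.set(hash, filters);
  return hash;
}

// Look up the filters behind a shared URL such as /?f=<hash>.
function resolveSelection(hash) {
  return selections.has(hash) ? selections.get(hash) : null;
}

const key = saveSelection({ time: { from: "2020-01-01", to: "2020-01-07" }, ids: [17, 42] });
console.log(key.length); // 16
console.log(resolveSelection(key).ids); // [ 17, 42 ]
```

Whatever the number of filters, the URL only ever carries the 16-character key, which is the fixed-length property the answer is after.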
How should we handle permanent vs temporary links?
You mentioned in your comment above:
"If we use some persistent store to save the mapping between this hash and the actual filters, we would ideally want to segregate long-lived 'permalinks' from short-lived ephemeral URLs, and use that understanding to efficiently expire the short-lived hashes."
You likely have a service on the server that handles all of the filters that you support in your application. The trick here is letting that service also manage the hashmap. As more filters and options are added/removed, the service will need to re-hash each permutation of filter selections.
If you need strong support for permalinks, then whenever you remove filters or options, you'll want to maintain the "expired" hashes and change their mapping to point to a reasonable alternative hash.
When do we update hashes in our DB?
There are lots of options, but I would generally prefer build time. If you're using a CI solution like Jenkins, Travis, AWS CodePipeline, etc., then you can add a build step to update your DB. Basically, you're going to...
1) Keep a persistent record of all the existing supported filters.
2) On build, check to see if there are any new filters. If so:
   a) Add those filters to the record from step 1.
   b) Hash all new filter permutations (just those that include your new filters) and store them in the hash DB.
3) Check to see if any filters have been removed. If so:
   a) Remove those filters from the record from step 1.
   b) Find all the hashes for permutations that include those filters and either:
      i) remove those hashes from the DB (weak permalinks), or
      ii) point each such hash to a reasonable alternative hash in the DB (strong permalinks).
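A rough sketch of the filter bookkeeping in that build step, assuming the persisted record of supported filters is a list of filter names and the hash DB is represented by an in-memory `Map` from hash to `{ filters }`. `diffFilters`, `handleRemovedFilters`, and the `pickAlternative` callback are hypothetical names. Only the detection of added filters and the removal handling (step 3) are shown; hashing the new permutations (step 2b) depends on how selections are hashed and is left out.

```javascript
// Compare the persisted record of supported filters with the current build's filters.
function diffFilters(previous, current) {
  const prev = new Set(previous);
  const curr = new Set(current);
  return {
    added: [...curr].filter((name) => !prev.has(name)),
    removed: [...prev].filter((name) => !curr.has(name)),
  };
}

// hashDb: Map<hash, { filters: object, redirectTo?: string }>
// pickAlternative(entry) may return a replacement hash (strong permalinks);
// if it is omitted or returns nothing, the hash is deleted (weak permalinks).
function handleRemovedFilters(hashDb, removed, pickAlternative) {
  const removedSet = new Set(removed);
  for (const [hash, entry] of hashDb) {
    const usesRemoved = Object.keys(entry.filters).some((name) => removedSet.has(name));
    if (!usesRemoved) continue;
    const alternative = pickAlternative ? pickAlternative(entry) : undefined;
    if (alternative) {
      hashDb.set(hash, { ...entry, redirectTo: alternative }); // keep the link, point it elsewhere
    } else {
      hashDb.delete(hash); // the link simply stops resolving
    }
  }
}

// Example build step: the "id" filter was dropped and "region" was added.
const { added, removed } = diffFilters(["time", "id"], ["time", "region"]);
const hashDb = new Map([
  ["a1", { filters: { time: "7d", id: "42" } }],
  ["b2", { filters: { time: "30d" } }],
]);
handleRemovedFilters(hashDb, removed); // weak permalinks: "a1" is deleted
console.log(added, removed, [...hashDb.keys()]);
```

Deleting or overwriting entries of a `Map` while iterating it with `for...of` is safe in JavaScript, which keeps the removal pass to a single loop.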

Let's analyse your problem and the possible solutions.
Problem: you want a URL that carries information about the applied filters, so that when you share it, the recipient doesn't land on an arbitrary page.
Solutions:
1) Append the applied filters to the URL. To do this, you need to shorten both the filter keys and the filter values so that each filter adds as little as possible to the URL length.
Drawback: this is not the most reliable solution, because as the number of filters increases, the URL length inevitably increases too.
2) Append a unique key (hash) of the applied filters to the URL. This requires changes on both the server and the client. On the client side you need an encoding algorithm that converts the applied filters into a unique hash; on the server side you need a decoding step that converts the hash back into the applied filters. Then, whenever such a URL is hit, the client can make a POST API call that takes the hash and returns the array of applied filters, or you can put the logic for converting the hash on the client side only.
Do all this in componentWillMount to avoid any side effects (note that componentWillMount has since been deprecated in React; componentDidMount is the recommended place for this kind of work).
I think the second solution is scalable and efficient in almost all cases.
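A minimal sketch of solution 1 (shortened keys), assuming a hypothetical dictionary of filter names (`startTime`, `endTime`, `deviceIds`) and their short forms, and assuming list values contain no commas. It uses the standard `URLSearchParams` API, available in browsers and as a global in Node.js. The output still grows with the number and size of filter values, which is the drawback described for this solution.

```javascript
// Hypothetical mapping from full filter names to short query-parameter keys.
const SHORT_KEYS = new Map([
  ["startTime", "st"],
  ["endTime", "et"],
  ["deviceIds", "ids"],
]);
const LONG_KEYS = new Map([...SHORT_KEYS].map(([long, short]) => [short, long]));

// Encode filters into a compact query string; list values become comma-joined.
function filtersToQuery(filters) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(filters)) {
    const key = SHORT_KEYS.get(name);
    if (key === undefined) continue; // ignore filters without a short key
    params.set(key, Array.isArray(value) ? value.join(",") : String(value));
  }
  return params.toString();
}

// Decode a query string (e.g. location.search) back into filters for the panel.
function queryToFilters(query) {
  const filters = {};
  for (const [key, value] of new URLSearchParams(query)) {
    const name = LONG_KEYS.get(key);
    if (name === undefined) continue;
    filters[name] = name === "deviceIds" ? value.split(",") : value;
  }
  return filters;
}

const query = filtersToQuery({ startTime: "2020-01-01", deviceIds: ["a", "b"] });
console.log(query); // st=2020-01-01&ids=a%2Cb
console.log(queryToFilters(query)); // { startTime: '2020-01-01', deviceIds: [ 'a', 'b' ] }
```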

Full Text Indexing After Modifying a Record

I have an application that uses view panels to display data. One view panel displays unprocessed records and the other displays processed records. The user chooses an unprocessed record (using the "show values in this column as links" option) and is directed to a page where they input information. They then click a button that updates the documents using doc.replaceItemValue statements in JavaScript. The user is then directed back to the view panel that displays the unprocessed records. To keep the just-processed record from showing up among the unprocessed records, I have to reindex the database. I am using database.updateFTIndex(false) to accomplish this.
Is there a better way to accomplish this? If two or more users are submitting records, will their index updates step on each other?
I never had to worry about this when using MySQL.
Thanks for any advice.

I've used that technique for a while in production and not been notified of any issues. Updating an index via the Database Properties or a View gives the message that it has been queued for update on the server, but I'm not sure if the same happens with the programmatic call. It may well do.
In my scenario, I'm consolidating a lot of data into individual documents, so although usage is intensive at times, not many documents are updated at any one time.
I'm also running the index update via sessionAsSigner; I had assumed that would be needed for authority purposes.

Where to put "a lot" of data, array / file / somewhere else, in JS on node.js

This may be a "stupid" question to ask, but I am working with "a lot" of data for the first time.
What I want to do: query the World Bank API.
Problem: the API is very inflexible when it comes to searching/filtering. I could query every country/indicator by itself, but that would generate a lot of calls. So I wanted to download all the information about a country or indicator at once and then sort it locally.
My question: where/how should I store the data? Can I simply put it into an array, or do I have to worry about size? Should I write it to a temporary JSON file? Or do you have another idea?
Thanks for your time!
Example:
20 countries, 15 indicators
If I queried every country/indicator pair separately, I would generate 20 * 15 = 300 API calls; if I requested ALL countries for one indicator at a time, it would take 15 API calls, but I would get a lot of "junk" data :/

You can keep the data in RAM in an appropriate data structure (array or object) if the following are true:
The data is only needed temporarily (during one particular operation) or can easily be retrieved again if your server restarts.
You have enough available RAM for your node.js process to hold the data. In a typical server environment, there might be more than a GB of RAM available. I wouldn't recommend using all of that, but you could easily use 100 MB of it for data storage.
Keeping it in RAM will likely make it faster and easier to interact with than storing it on disk. The data will, obviously, not be persistent across server restarts if it is in RAM.
If the data is needed long term and you only want to fetch it once and then access it over and over again even across server restarts, or if the data is more than hundreds of MBs, or if your server environment does not have a lot of RAM, then you will want to write the data to an appropriate database where it will persist and you can query it as needed.
If you don't know how large your data will be, you can write code to temporarily put it in an array/object and observe the memory usage of your node.js process after the data has been loaded.
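A rough sketch of that measurement, using Node's built-in `process.memoryUsage()`. The records are generated stand-ins with a hypothetical `{ country, indicator, year, value }` shape (20 countries x 15 indicators x 60 years), not the real World Bank response format. The sketch also builds a `Map` index by country and indicator, a common way to make the in-RAM data fast to query after a bulk download. Heap figures are approximate because garbage collection can run at any time.

```javascript
// Current heap usage of this Node.js process, in MB.
function heapUsedMB() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

const before = heapUsedMB();

// Stand-in for downloaded data: hypothetical records shaped like
// { country, indicator, year, value } (20 countries x 15 indicators x 60 years).
const records = [];
for (let c = 0; c < 20; c++) {
  for (let i = 0; i < 15; i++) {
    for (let year = 1960; year < 2020; year++) {
      records.push({ country: `C${c}`, indicator: `I${i}`, year, value: Math.random() });
    }
  }
}

// Index by country and indicator so later lookups don't rescan the whole array.
const byCountryIndicator = new Map();
for (const r of records) {
  const key = `${r.country}|${r.indicator}`;
  if (!byCountryIndicator.has(key)) byCountryIndicator.set(key, []);
  byCountryIndicator.get(key).push(r);
}

const after = heapUsedMB();
console.log(`${records.length} records, roughly ${(after - before).toFixed(1)} MB of heap`);
console.log(byCountryIndicator.get("C3|I7").length); // 60 (one record per year)
```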

I would suggest storing it in a NoSQL database, since you'll be working with JSON, and querying it from there.
MongoDB is very 'node friendly': there's the native driver (https://github.com/mongodb/node-mongodb-native)
or Mongoose.

Storing data from an external source you don't control brings with it the complexity of keeping the data in sync if the data happens to change. Without knowing your use case or the API it's hard to make recommendations. For example, are you sure you need the entire data set? Is there a way to filter down the data based on information you already have (user input, etc)?

Way to access huge content from DB

I need to fetch a large amount of data (maybe 10K records) from the database and show it as a report (I use DataTable) with filtering/search and pagination.
Question: which of the options below is the best/recommended way?
1) Fetch all the records at once and store them in the front end (as an object). When a filter is applied, filter the object and display the result. Likewise, pagination would not touch the database, since all the records are already on the client.
2) Contact the database every time a filter/search is applied, and likewise for pagination. For example, if I select page 5, I send a query to the database for only that page's data and display it. Note: the number of records per page is also selectable.
If there is a better way, please let me know.
Thanks,

I am not familiar with DataTable, but it appears to be similar to jqGrid, which I'm familiar with.
I prefer your proposed solution #2. You are better off fetching only what you need: if you're only displaying, say, 100 rows, it's wasteful (in terms of both bandwidth and local memory usage) to fetch 10k rows at once.
Use LIMIT on the MySQL side to fetch only the records you need. If you want, say, records 201 through 300 for page 3 (at 100 rows per page), you'd add LIMIT 200, 100 to the end of your query (the first parameter to LIMIT says "skip the first 200 rows" and the second says "fetch 100 rows"). If DataTable works like jqGrid, you should be able to re-query the database and repopulate your table when the user changes pages, and this fetch will be done in the background with AJAX, which conserves bandwidth. Your query will be identical except for the range specified by the LIMIT at the end.
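A small sketch of the page-to-LIMIT arithmetic described above, assuming 1-based page numbers. `pageToRange`, `pagedQuery`, and the `records` table are hypothetical names. It uses MySQL's `LIMIT offset, count` form with `?` placeholders, assuming a driver that binds positional parameters. The example query includes an `ORDER BY`, because without a stable sort order the rows on each page are not guaranteed to be consistent between requests.

```javascript
// Convert a 1-based page number and a page size into MySQL LIMIT arguments.
function pageToRange(page, perPage) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(perPage) || perPage < 1) {
    throw new RangeError("perPage must be an integer >= 1");
  }
  return { offset: (page - 1) * perPage, count: perPage };
}

// Append the range to a base query (which must not already contain LIMIT).
// The ? placeholders assume a driver that binds positional parameters.
function pagedQuery(baseQuery, page, perPage) {
  const { offset, count } = pageToRange(page, perPage);
  return { sql: `${baseQuery} LIMIT ?, ?`, params: [offset, count] };
}

// Page 3 at 100 rows per page: skip 200 rows, return the next 100 (rows 201-300).
const q = pagedQuery("SELECT * FROM records ORDER BY id", 3, 100);
console.log(q.sql); // SELECT * FROM records ORDER BY id LIMIT ?, ?
console.log(q.params); // [ 200, 100 ]
```

Because the page size is a parameter, the "records per page" option from the question only changes `perPage`; the query text stays the same.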
Think of it this way: say you use GMail and you never archive your messages, so your inbox contains 20,000 emails, but only shows 100 per page. Do you think Google has designed the GMail front-end so that all 20k subject and from lines are fetched at once and stored locally, or is the server queried again when the user changes pages? (It's the latter.)
