Working with large JSON data sets in memory using Node - javascript

I am pulling JSON data from Salesforce. I can have roughly 10,000 records, but never more. In order to avoid API limits and having to hit Salesforce for every request, I thought I could query the data every hour and then store it in memory. Obviously this will be much, much quicker, and much less error prone.
A JSON object would have about 10 properties and maybe one other nested JSON object with two or three properties.
I am using methods similar to below to query the records.
getUniqueProperty: function (data, property) {
    // Sort by the property, pull out its values, and de-duplicate them.
    return _.chain(data)
        .sortBy(function (item) { return item[property]; })
        .pluck(property)
        .uniq()
        .value();
}
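For illustration, a call with a couple of made-up records would look something like this (the field names and the helpers object are not my real schema, just placeholders):

var records = [
    { Id: "001", Status: "Open",   Owner: { Name: "Ann" } },
    { Id: "002", Status: "Closed", Owner: { Name: "Bob" } },
    { Id: "003", Status: "Open",   Owner: { Name: "Cat" } }
];

// Sorted, de-duplicated values of one property, e.g. ["Closed", "Open"].
var statuses = helpers.getUniqueProperty(records, "Status");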
My questions are
What would the ramifications be of storing the data in memory and working with it there? I obviously don't want to block the server by running heavy filtering on the data.
I have never used redis before, but would something like a caching db help?
Would it be best to query the data every hour and store the JSON response in something like Mongo? I would then do all my querying against Mongo as opposed to in memory. Every hour I query Salesforce, I would just flush the database and reinsert the data.

Storing your data in memory has a couple of disadvantages:
non-scalable — when you decide to use more processes, each process will need to make the same API request;
fragile — if your process crashes you will lose the data.
Also, working with a large amount of data can block the process for longer than you would like.
Solution:
- use external storage! It can be Redis, MongoDB, or an RDBMS;
- update the data in a separate process, triggered with cron;
- don't drop the whole database: there is a chance that someone will make a request right after you drop it (if your storage doesn't support transactions, of course); update the existing records instead.
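A minimal sketch of that setup, assuming Node with the node-redis client and node-cron for scheduling (fetchFromSalesforce is a placeholder for your existing API call):

// refresh-cache.js - runs as its own process, separate from the web server
const { createClient } = require("redis");   // assuming the node-redis client
const cron = require("node-cron");           // assuming node-cron for scheduling

const redis = createClient();

// Placeholder: your existing Salesforce query goes here.
async function fetchFromSalesforce() {
    return [];
}

async function refresh() {
    const records = await fetchFromSalesforce();
    // Overwrite the cached value in one step instead of deleting and reinserting,
    // so readers never see an empty cache in between.
    await redis.set("salesforce:records", JSON.stringify(records));
}

async function main() {
    await redis.connect();
    await refresh();                       // warm the cache on startup
    cron.schedule("0 * * * *", refresh);   // then refresh at the top of every hour
}

main().catch(console.error);

Your web process then just reads and parses the cached string (JSON.parse(await redis.get("salesforce:records"))) instead of touching Salesforce on every request.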

Related

Stuck on approach for real-time data with custom filters

I've been scratching my head and trying this for about a week now. So I hope I can find some help here.
I'm making an application that provides real-time data to the client, I've thought about Server-Sent-Events but that doesn't allow per-user responses AFAIK.
WebSocket is also an option but I'm not convinced about it, let me sketch my scenario which I did with WS:
Server fetches 20 records every second, and pushes these to an array
This array gets sent to all websocket connections every second, see this pseudo below:
let items = [ /* ...some data... */ ];

io.on("connection", socket => {
    // Per-connection timer, cleared on disconnect so intervals don't pile up.
    const timer = setInterval(() => {
        socket.emit("all_items", items);
    }, 1000);
    socket.on("disconnect", () => clearInterval(timer));
});
The user can select some items in the front end, the websocket receives this per connection
However, I'm convinced the way I'm taking this on is not a good way and is enormously inefficient. Let me sketch the scenario of the program of what I want to achieve:
There is a database with, let's say, 1,000 records
User connects to the back-end from a (React) Front-end, gets connected to the main "stream" with about 20 fetched records (without filters), which the server fetches every second. SELECT * FROM Items LIMIT 20
Here comes the complex part:
The user clicks some checkboxes with custom filters (in the front-end) e.g. location = Shelf 2. Now, what's supposed to happen is that the websocket ALWAYS shows 20 records for that user, no matter what the filters are
I've imagined having a custom query for each user with custom options, but I think that's bad and will absolutely destroy the server if you have, say, 10,000 users
How would I be able to take this on? Please, everything helps a little, thank you in advance.
I have to do some guessing about your app. Let me try to spell it out while talking just about the server's functionality, without mentioning MySQL or any other database.
I guess your server maintains about 1k datapoints with volatile values. (It may use a DBMS to maintain those values, but let's ignore that mechanism for the moment.) I guess some process within your application changes those values based on some kind of external stimulus.
Your clients, upon first connecting to your server, start receiving a subset of twenty of those values once a second. You did not specify how to choose that initial subset. All newly-connected clients get the same twenty values.
Clients may, while connected, apply a filter. When they do that, they start getting a different, filtered, subset from among all the values you have. They still get twenty values. Some or all the values may still be in the initial set, and some may not be.
I guess the clients get updated values each second for the same twenty datapoints.
You envision running the application at scale, with many connected clients.
Here are some thoughts on system design.
Keep your datapoints in RAM in a suitable data structure.
Write JS code to apply the client-specified filters to that data structure (a sketch follows below). If that code is efficient you can handle millions of data points this way.
Back up that RAM data structure to a DBMS of your choice; MySQL is fine.
When your server first launches, load the data structure from the database.
To get to the scale you mention you'll need to load-balance all this across at least five servers. You didn't mention the process for updating your datapoints, but it will have to fan out to multiple servers, somehow. You need to keep that in mind. It's impossible to advise you about that with the information you gave us.
But, YAGNI. Get things working, then figure out how to scale them up. (It's REALLY hard work to get to 10K users; spend your time making your app excellent for your first 10, then 100 users, then scale it up.)
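Here is a rough sketch of that in-memory filtering step (the datapoint shape and filter fields are invented for illustration):

// All datapoints live in RAM; some other part of the app updates their values.
const datapoints = [
    // { id: 1, location: "Shelf 2", value: 42 }, ...
];

// Turn the client's filter selection into a predicate and take 20 matches.
function selectForClient(filters) {
    return datapoints
        .filter(dp => !filters.location || dp.location === filters.location)
        .slice(0, 20);
}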
Your server's interaction with clients goes like this (ignoring authentication, etc).
A client connects, implicitly requesting the "no-filtering" filter.
The client gets twenty values pushed once each second.
A client may implicitly request a different filter at any time.
Then the client continues to get twenty values, chosen by the selected filter.
So, most client communication is pushed out, with an occasional incoming filter request.
This lots-of-downbound-traffic little-bit-of-upbound-traffic is an ideal scenario for Server Sent Events. Websockets or socket.io are also fine. You could structure it like this.
New clients connect to the SSE endpoint at https://example.com/stream
When applying a filter they reconnect to another SSE endpoint at https://example.com/stream?filter1=a&filter2=b&filter3=b
The server sends data each second to each open SSE connection, applying the filter. (Streams work very well for this in Node.js; take a look at the server-side code for the signalhub package for an example.)
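A minimal Express-flavoured sketch of that SSE side (the endpoint, the query parameters, and the selectForClient helper from the earlier sketch are assumptions, not a drop-in implementation):

const express = require("express");   // assuming Express
const app = express();

app.get("/stream", (req, res) => {
    // Standard SSE headers.
    res.set({
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        "Connection": "keep-alive"
    });

    // Whatever arrived in the query string is the filter, e.g. ?location=Shelf%202
    const filters = req.query;

    // Push the filtered twenty values once a second.
    const timer = setInterval(() => {
        const rows = selectForClient(filters);
        res.write(`event: items\ndata: ${JSON.stringify(rows)}\n\n`);
    }, 1000);

    req.on("close", () => clearInterval(timer));
});

app.listen(3000);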

What is the best way to manipulate an API AJAX JSON response and pass it to another page?

I think I have a tough one for you guys. Or at least it's been tough for me. I've been searching for the best way to do this in Stack Overflow and everyone that has asked has been given a different response.
I have this code that is accessing an API and calling a maintenance list of all the vehicles in a fleet.
function getMaintenanceList() {
    var settings = {
        "url": "API URL HERE",
        "method": "GET",
        "timeout": 0,
        "headers": {
            "Authorization": "Bearer token here"
        }
    };

    $.ajax(settings).done(function (response) {
        // The response the API sends is a JSON object.
        // It is an array.
        var jsonMaintenance = response;
        var parsedJson = JSON.stringify(jsonMaintenance);

        // Left over code from when I was trying to
        // pass the data directly into the other page;
        // I was unable to do so.
        //return jsonMaintenance;

        // Left over code from when this was in a PHP file
        // and I was posting the stringified response to the page
        // for testing purposes.
        // I had to disable CORS in Google Chrome to test the response out.
        //console.log(jsonMaintenance);
        //document.getElementById("main").innerHTML = parsedJson;
    });
}
The code above works well. What I was attempting to do here was write the stringified response to a file, save that file on the server, call it from another page using JavaScript, save it as an object in JavaScript, parse it using JSON.parse(), and then pull the required information.
Here's an explanation as to why I'm trying to do it this way. When I call the maintenance list from the API, I'm getting the entire maintenance list from the API, but I need to be able to display only parts of the information from the list.
On one page, we'll call it vehicle-list.php, on it I have a list of all the vehicles in our fleet. They all have unit numbers assigned to them. When I click on a unit number on this page it'll take me to another page which has more information on the vehicle such as the VIN number, license plate, etc. we'll call this page vehicle-info.php. We're using this page for all the vehicles' information, in other words, when we click on different unit numbers on vehicle-list.php it'll always take us to vehicle-info.php. We're only updating the DOM when we go to the page.
I only want to include the information specific to each vehicle unit in the page along with the other info in the DOM. And I only want to call the info from the API once as I am limited to a certain amount of calls for that API. This is why I am attempting to do it this way.
I will say that what I originally wanted to do was get this JSON response once every 24 hours by using a function in vehicle-list.php, save the response as a variable as seen above (var jsonMaintenance = response;), and then just access certain parts of the array every time a unit number is clicked. However, I have been unable to access the variable in any other page. I've written many test files attempting to call jsonMaintenance without success, so I've been trying to just save it as a text file to the server, and I haven't been able to figure that out either.
After explaining all of the above. My questions are these:
How do I best manipulate this data to accomplish what I want to accomplish? What would be the best standard? Is the above code even the right way to call the data for what I'm trying to do?
There doesn't seem to be a set standard on accomplishing any of this when I search on Stack Overflow. I'd like to be as efficient as possible.
Thank you for your time.
There are a lot of ways to pass your data around your website after getting it from an API call. The best approach is to store the information in a database and call it back however you want; since you are using PHP, you can store it in SQL or Access. If you don't want to store the information in a database like SQL or Access, then the best way is to store it in localStorage and call it back whenever you want.
I will show you briefly how you can do that; if you want a better explanation, post an example of your returned data.
to store an item in localstorage use,
localStorage.setItem('key', 'value');
to call an item back from localstorage use,
var somevar = localStorage.getItem('key')
to remove specific item from localstorage use,
localStorage.removeItem('key')
to clear all items saved to localstorage use,
localStorage.clear()
Be aware that data stored in localStorage is only available in the browser and machine it was saved on.
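One thing worth noting: localStorage only stores strings, so a JSON response has to go through JSON.stringify on the way in and JSON.parse on the way out. A small sketch (the key name is arbitrary):

// After the AJAX call succeeds:
localStorage.setItem("maintenanceList", JSON.stringify(response));

// On another page of the same origin:
var jsonMaintenance = JSON.parse(localStorage.getItem("maintenanceList") || "[]");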
I would do it somehow like this.
Call the maintenance list from the API with the server-side language of your choice, which seems to be PHP in your case. Let's say the script is called get-list.php. This can be triggered by a cron job running get-list.php at intervals limited to the certain amount of calls you are allowed to make against that API. Or, if you are not able to create cron jobs, trigger the same get-list.php with an AJAX call (e.g. jQuery.get('sld.tld/get-list.php') - in this case get-list.php has to figure out whether it's the right time to call the API or not).
Now that you have the data you can prepare it as you want and store it as a JSON string in a text file or database of your choice. If I get you right you have a specific dataset for each vehicle, which has to be identified by an id (you named it "unit number"), so your JSON would look something like: {"unit1": { property1: "val1", property2: "val2" }, "unit2": { property1: "valXYZ", property2: "valABC" }} or alike.
Now when you link to vehicle-info.php from vehicle-list.php, you do it with an anchor that carries the unit number, or something similar. Of course you can also grab the data with AJAX; it's just important to hand vehicle-info.php the corresponding unit number (or id, better said) and you are good to go.
vehicle-info.php now has everything it needs to render the page: the complete data set stored in the text file or database, and the id (unit number) that tells it which part of the whole dataset to extract.
I wanted to give you this different approach because in my experience it should work out just so much better. If you are working server-side (e.g. PHP) you have write permissions, which is not the case for client-side JavaScript. And performance is also not so much of an issue. For instance, it's not a problem if there is heavy manipulation of the data set at the get-list.php level; it can run for minutes, and once it's done it stores the ready-to-use data, making it statically available without any further impact on performance.
Hope it helps!
If I ran into a similar problem I would just store the data in a database of my own and call it from there. Considering you are only willing/able/allowed to request the data from the API very rarely, but need to operate on the data quite frequently (whenever someone clicks on a specific vehicle in your application), this seems like the best course of action.
So rather than querying the data on the client side, I'd call it from the server, store it on the server, and have the client operate on that data.
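On the client side that could look something like this (the get-vehicle.php endpoint and the field names are invented for illustration):

// vehicle-info page: ask the server (which holds the cached API data)
// for just the one unit we care about.
var unitNumber = new URLSearchParams(window.location.search).get("unit");

$.getJSON("get-vehicle.php", { unit: unitNumber }, function (vehicle) {
    // Only the fields we actually display end up in the DOM.
    document.getElementById("vin").textContent = vehicle.vin;
    document.getElementById("plate").textContent = vehicle.licensePlate;
});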

Best way to pull static data

Consider I have a zoo app that shows all the zoos for each city. Each city is a page with a list of zoos.
In my current solution, on each page, I have ajax call to the server that pulls the list of the zoos for that particular city.
The performance is extremely important for me and my thought was to remove the ajax call and replace it with a JSON object that will live in the app. That way I will save a call to the server and I believe the data will arrive faster.
Does this solution make sense? There are around 40 cities with ~50 zoos each.
Consider the data is static and will never change.
Since 900 records is not much**, you can get all the records at once during the initial load and filter the all-records array by city. That way your user experience will be much smoother, since client-side JS processing is far faster than a network round trip.
** - note: strictly considering the data set size of ~900
Another solution can be to cache the data in the session scope and, whenever there is a specific request for a city, check for availability in the session scope first; if it's not there, make a network call.
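As a rough illustration of the first option, with the full data set bundled into the app (the record shape is assumed):

// ~900 records shipped with the app, e.g. as a bundled JSON file or JS module.
const zoos = [
    // { city: "Amsterdam", name: "Artis" }, ...
];

// Filtering ~900 items in memory is effectively instant compared to a network call.
function zoosForCity(city) {
    return zoos.filter(zoo => zoo.city === city);
}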
I think the correct question is: what are my performance requirements?
You can put all your data in a JSON object and do everything on the client side without any AJAX call, but in that case every client that visits your page will download all the data - and that is another question mark.

PHP: Request 50MB-100MB json - browser crash / do not display any result

Huge json server requests: around 50MB - 100MB for example.
From what I know, the browser might crash when loading huge amounts of data into a table (I usually use DataTables); the result: memory usage reaches almost 8 GB and the browser crashes. Chrome might not return a result; Firefox will usually ask if I want to wait or kill the process.
I'm going to start working on a project which will send requests for huge JSONs, all compressed (done server-side by PHP). The purpose of my report is to fetch data and display it all in a table - made easy to filter and order - so I can't see the use of "lazy loading" for this specific case.
I might use a vue-js datatable library this time (not sure which specifically).
What exactly is using so much of my memory? I know for sure that the JSON result is received. Is it the rendering/parsing of the JSON into the DOM? (I'm referring to the DataTables example for now: https://datatables.net/examples/data_sources/ajax)
What are the best practices in this kind of situation?
I started researching this issue and noticed that there are posts from 2010 that seem like they're not relevant at all.
There is no limit on the size of an HTTP response. There is a limit on other things, such as:
local storage
session storage
cache
cookies
query string length
memory (per your CPU limitations or browser allocation)
Instead, the problem is most likely with your implementation of your datatable. You can't just insert 100,000 nodes into the DOM and not expect some type of performance impact. Furthermore, if the datatable is performing logic against each of those records as they come in, and processing them before the node insertion, that's also going to be a big no-no.
What you've done here is essentially pass the leg work of performing pagination from the server to the client, and with dire impacts.
If you must return a response that big, consider using one of the storage options that browsers provide (a few mentioned above). Then paginate off of the stored JSON response.
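A rough sketch of that idea: keep the big parsed response in a variable (or in one of the storage options above) and only ever put one page of rows into the DOM. The endpoint, row fields, and page size are placeholders:

const PAGE_SIZE = 100;
let allRows = [];   // the parsed 50-100 MB response lives here, not in the DOM

async function load() {
    const res = await fetch("/huge-report.json");   // placeholder endpoint
    allRows = await res.json();
    renderPage(0);
}

function renderPage(pageIndex) {
    const start = pageIndex * PAGE_SIZE;
    const rows = allRows.slice(start, start + PAGE_SIZE);
    // Build only PAGE_SIZE table rows; the rest of the data never touches the DOM.
    document.querySelector("#report tbody").innerHTML = rows
        .map(r => `<tr><td>${r.id}</td><td>${r.name}</td></tr>`)
        .join("");
}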

Where to put "a lot" of data, array / file / somewhere else, in JS on node.js

This may be a "stupid" question to ask, but I am working with "a lot" of data for the first time.
What I want to do: Querying the World Bank API
Problem: The API is very inflexible when it comes to searching/filtering... I could query every country/indicator by itself, but I would generate a lot of calls. So I wanted to download all information about a country or indicator at once and then sort it on the machine.
My question: Where/how should I store the data? Can I simply put it into an array, or do I have to worry about size? Should I write it to a temporary JSON file? Or do you have another idea?
Thanks for your time!
Example:
20 Countries, 15 Indicators
If I queried every country by itself I would generate 20*15 API calls; if I called ALL countries for 1 indicator it would result in 15 API calls, but I would get a lot of "junk" data :/
You can keep the data in RAM in an appropriate data structure (array or object) if the following are true:
The data is only needed temporarily (during one particular operation) or can easily be retrieved again if your server restarts.
You have enough available RAM for your node.js process to store the data in RAM. In a typical server environment, there might be more than a GB of RAM available. I wouldn't recommend using all of that, but you could easily use 100MB of it for data storage.
Keeping it in RAM will likely make it faster and easier to interact with than storing it on disk. The data will, obviously, not be persistent across server restarts if it is in RAM.
If the data is needed long term and you only want to fetch it once and then have access to it over and over again even if your server restarts, or if the data is more than hundreds of MBs, or if your server environment does not have a lot of RAM, then you will want to write the data to an appropriate database where it will persist and you can query it as needed.
If you don't know how large your data will be, you can write code to temporarily put it in an array/object and observe the memory usage of your node.js process after the data has been loaded.
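For that last point, Node exposes process.memoryUsage(), so a quick check after loading the data might look like this (the dump file is a placeholder):

// Load the full data set into memory, then see what it costs.
const data = require("./worldbank-dump.json");   // placeholder file

const used = process.memoryUsage();
console.log(`records:  ${data.length}`);
console.log(`heapUsed: ${(used.heapUsed / 1024 / 1024).toFixed(1)} MB`);
console.log(`rss:      ${(used.rss / 1024 / 1024).toFixed(1)} MB`);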
I would suggest storing it in a nosql database, since you'll be working with JSON, and querying from there.
mongodb is very 'node friendly' - there's the native driver - https://github.com/mongodb/node-mongodb-native
or mongoose
Storing data from an external source you don't control brings with it the complexity of keeping the data in sync if the data happens to change. Without knowing your use case or the API it's hard to make recommendations. For example, are you sure you need the entire data set? Is there a way to filter down the data based on information you already have (user input, etc)?
