How to perform fast search on JSON file? - javascript

I have a JSON file that contains many objects and options, each of this kind:
{"item": "name", "itemId": 78, "data": "Some data", ..., "option": number or string}
There are about 10,000 objects in the file.
When part of an item value is entered ("ame", "nam", "na", etc.), it should display all the objects, and their options, that match that part.
RegExp is the only thing that comes to my mind, but with a 200 MB+ file the search takes a long time (2+ seconds).
That's how I'm getting the object right now:
let reg = new RegExp(enteredName, 'gi'), // enteredName, for example "nam"
    data = await fetch("myFile.json"),
    jsonData = await data.json();

let results = jsonData.filter(jsonObj => {
  let item = jsonObj.item,
      itemId = String(jsonObj.itemId);
  return reg.test(item) || reg.test(itemId);
});
But that option is too slow for me.
What faster method can I use to perform such a search in JS?

Looking up items by item number should be easy enough by creating a hash table, which others have already suggested. The big problem here is searching for items by name. You could burn a ton of RAM by creating a tree, but I'm going to go out on a limb and guess that you're not necessarily looking for raw lookup speed. Instead, I'm assuming that you just want something that'll update a list on-the-fly as you type, without actually interrupting your typing, is that correct?
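For the by-ID part, a Map index is enough. A minimal sketch (the field names come from the question's sample object; the records here are made up for illustration):

```javascript
// Build the index once after loading the JSON
const jsonData = [
  { item: "name", itemId: 78, data: "Some data" },
  { item: "other", itemId: 79, data: "More data" }
];

const byId = new Map();
for (const obj of jsonData) byId.set(obj.itemId, obj);

// Lookups are then O(1) instead of a full scan
byId.get(78); // → { item: "name", itemId: 78, data: "Some data" }
```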
To that end, what you need is a search function that won't lock up the main thread, allowing the DOM to be updated between returned results. Interval timers are one way to tackle this: they can be set up to iterate through large, time-consuming volumes of data while allowing other functions (such as DOM updates) to execute between iterations.
I've created a Fiddle that does just that:
// Create a big array of items with randomly generated names, for testing purposes
let jsonData = [];
for (let i = 0; i < 10000; i++) {
  jsonData.push({ item: Math.random().toString(36).substring(2, 15) + Math.random().toString(36).substring(2, 15) });
}
// Now on to the actual search part
let returnLimit = 1000; // Maximum number of results to return
let intervalItr = null; // Handle used to iterate through the array with an interval timer

function nameInput (e) {
  document.getElementById('output').innerHTML = '';
  if (intervalItr) clearInterval(intervalItr); // If we were iterating through a previous search, stop it
  if (e.value.length > 0) search(e.value);
}

let reg, idx, found;

function search (enteredName) {
  reg = new RegExp(enteredName, 'i');
  idx = 0;
  found = 0;
  // Kick off the search by creating an interval that'll call searchNext() with a 0ms delay.
  // This will prevent the search function from locking the main thread while it's working,
  // allowing the DOM to be updated as you type.
  intervalItr = setInterval(searchNext, 0);
}

function searchNext() {
  if (idx >= jsonData.length || found >= returnLimit) { // Stop at the end of the data or at the result limit
    clearInterval(intervalItr);
    return;
  }
  let item = jsonData[idx].item;
  if (reg.test(item)) {
    document.getElementById('output').innerHTML += '<br>' + item;
    found++;
  }
  idx++;
}
https://jsfiddle.net/FlimFlamboyant/we4r36tp/26/
Note that this could also be handled with a Web Worker, but I'm not sure it's strictly necessary.
Additionally, this could be further optimized by utilizing a secondary array that is filled as the search takes place. When you enter an additional character and a new search is started, the new search could begin with this secondary array, switching to the original if it runs out of data.

Related

Is there a way to stop a COUNTIF function after the value changes from TRUE to FALSE in google spreadsheets?

I have a row of tick boxes and I need to count the most recent successive TRUE values. So it should start counting when the first TRUE appears and stop when it changes to FALSE, ignoring anything that comes after that.
Right now I have a script doing that, but since I have a lot of entries, it takes a long time to run and stops after 6min without finishing everything.
for (var j = 3; j < lastRow; j++) {
  count = 0;
  for (var k = stupac - 1; k > 2; k--) {
    if (range.getCell(j, k).getValue() == true) {
      count++;
    } else if ((range.getCell(j, k).isChecked() == false) && (count > 0)) {
      break;
    }
  }
  range.getCell(j, stupac).setValue(count);
}
I thought the best way would be to stop the COUNTIF when the value changes, but have had no luck trying to get that working.
I came up with this solution, which seems pretty fast. Your approach iterates over each cell, which is not efficient, especially when the search space is large. The following approach iterates over the rows in the specified data range instead.
function myFunction() {
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName('Sheet1');
  const data = sh.getRange('A3:G' + sh.getLastRow()).getValues();
  const counts = [];
  data.forEach(row => {
    let t_index = row.indexOf(true);
    let cutRow = t_index > -1 ? row.slice(t_index) : [];
    let f_index = cutRow.indexOf(false);
    let ct = f_index > -1 ? f_index : cutRow.length;
    counts.push([ct]);
  });
  sh.getRange(3, 8, counts.length, counts[0].length).setValues(counts);
}
Please adjust the data ranges according to your needs.
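The per-row logic above can be sketched in plain JavaScript on a sample row (the sample data here is made up for illustration):

```javascript
// Count the run of `true` values starting at the first `true` in a row
function countRun(row) {
  const tIndex = row.indexOf(true);                    // first true, -1 if none
  const cutRow = tIndex > -1 ? row.slice(tIndex) : []; // drop everything before it
  const fIndex = cutRow.indexOf(false);                // first false after the run
  return fIndex > -1 ? fIndex : cutRow.length;         // run length
}

countRun([false, true, true, false, true]); // → 2
countRun([false, false, false]);            // → 0
```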
Please make sure that you have enabled V8 Runtime.

Looping through a series of GET requests

I have a get request that looks for a specific ID within a tree, and then pulls back the value from that ID. I need to loop through a series of these get requests, each with a similar ID (each ID increases in value by one).
I have created a standard loop using hard coded values but I'm struggling to set the variable based on dynamic values coming out of the tree.
For example, I can set a variable like this:
var cars = [entry.get('sub_menu.sub_menu_link.0.title'), entry.get('sub_menu.sub_menu_link.1.title'), entry.get('sub_menu.sub_menu_link.2.title')];
This grabs all the values from these areas of the tree.
But I don't know how many of these there will be so I can't hard code it in this way. I need to be able to replace 0, 1 and 2 in those values with a loop that adds a new get request and increases the integer between "link." and ".title" each time.
Expected result would be to add as many get requests in to the variable as it finds, with the integer increased for each request, until it finds no more.
Full example code with hard-coded get requests is below (it won't actually run because the tree isn't being pulled in; it's for example purposes only):
Query.fetch()
  .then(function success(entry) {
    var subMenu = [
      entry.get('sub_menu.sub_menu_link.0.title'),
      entry.get('sub_menu.sub_menu_link.1.title'),
      entry.get('sub_menu.sub_menu_link.2.title')
    ];
    var text = "";
    var i;
    for (i = 0; i < subMenu.length; i++) {
      text += subMenu[i] + "<br>";
    }
    document.getElementById("subMenu-container").innerHTML = text;
  },
  function error(err) {
    // err object
  });
I'm answering based on three assumptions:
entry.get is synchronous
entry.get returns null or undefined if it couldn't get the string matching the argument we pass it
Although your code is using promises, you want to keep it to ES5-level language features (not ES2015+)
See inline comments:
Query.fetch()
  .then(function success(entry) {
    // Loop getting titles until we run out of them, putting them in `titles`
    var titles = [];
    var title;
    while (true) { // Loop until `break;`
      // Use string concatenation to include the value of `titles.length` (0, 1, ...) in the string
      title = entry.get('sub_menu.sub_menu_link.' + titles.length + '.title');
      if (title === null || title === undefined) {
        break;
      }
      titles.push(title);
    }
    // Use `Array.prototype.join` to join the titles together with <br> in between
    document.getElementById("subMenu-container").innerHTML = titles.join("<br>");
  }, function error(err) {
    // Handle/report error
  });
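For comparison, if ES2015+ features are available, the same loop reads a little cleaner with a template literal. A sketch under the same assumptions about `entry.get` (the `entry` stub below stands in for the real one from `Query.fetch()`):

```javascript
// Stub of entry.get for illustration only; the real one comes from Query.fetch()
const entry = {
  data: ['Home', 'About', 'Contact'],
  get(path) {
    const match = /^sub_menu\.sub_menu_link\.(\d+)\.title$/.exec(path);
    return match ? this.data[Number(match[1])] : undefined;
  }
};

const titles = [];
for (let i = 0; ; i++) {
  const title = entry.get(`sub_menu.sub_menu_link.${i}.title`);
  if (title === null || title === undefined) break;
  titles.push(title);
}
// titles.join("<br>") → "Home<br>About<br>Contact"
```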

node.js: expensive request

Hi, everyone! I need some help with my first app:
I'm creating an application with Express + Node.js as the backend. There is no database; I'm using a 3rd-party solution with functions that do the calculations instead.
Front
50 objects. Every object has one unique value: a random number. At the start I have all these objects; I need to calculate some values for each one and position it on the form based on the calculated results.
Each object sends axios.get('/calculations?value=uniqueValue') and I accumulate the results in an array. When array.length equals 50, I compare the array elements to each other and determine the (x, y) coordinates of each object. After that, the objects appear on the form.
Back
let value = uniqueValue; // a unique value received from an object
let requests = [];
for (let i = 0; i < 1500; i++) { // this loop is necessary due to the application's design
  requests.push(calculateData(value)); // 3rd-party function
  value += 1250;
}
let result = await Promise.all(requests);
let newData = transform(result); // transform the calculated result, then return it
return newData;
Calculations for one object take 700 ms; all calculations for all 50 objects take ≈10 seconds. The 3rd-party function receives only one value at a time, but works very quickly. The loop for (let i = 0; i < 1500; i++) {…}, however, is very expensive.
Issues
10 seconds is not a good result; users can't wait that long. Maybe I should change my approach to the calculations?
The server is very busy while calculating, and other requests (e.g. axios.get('/getSomething?params=something')) are pending.
Any advice will be much appreciated!
You can make the calls in chunks of data using async.mapLimit (unlike eachLimit, it collects the results of each iteratee):
var values = [];
for (let i = 0; i < 1500; i++) { // this loop is necessary due to application idea
  values.push(value);
  value += 1250;
}
var arrayOfItemArrays = _.chunk(values, 50); // lodash

// Process at most 5 chunks concurrently; mapLimit gathers the per-chunk results
async.mapLimit(arrayOfItemArrays, 5, eachUpdate, function (err, results) {
  if (err) return handleError(err);
  let newData = transform([].concat.apply([], results)); // flatten the chunk results
  // ... use newData ...
});

function eachUpdate(req_arr, cb) {
  // Wait for every calculation in this chunk before signalling completion
  Promise.all(req_arr.map(calculateData))
    .then(function (result) { cb(null, result); })
    .catch(cb);
}

how to work with a large array in javascript [duplicate]

This question already has answers here:
Best way to iterate over an array without blocking the UI
(4 answers)
Closed 6 years ago.
In my application I have a very big array (around 60k records). Using a for loop I am doing some operations on it as shown below.
var allPoints = [];
for (var i = 0, cLength = this._clusterData.length; i < cLength; i+=1) {
if (allPoints.indexOf(this._clusterData[i].attributes.PropertyAddress) == -1) {
allPoints.push(this._clusterData[i].attributes.PropertyAddress);
this._DistClusterData.push(this._clusterData[i])
}
}
When I run this loop the browser hangs because the array is so big, and Firefox shows a popup saying "A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete". What can I do so the browser doesn't hang?
You need to return control back to the browser in order to keep it responsive. That means you need to use setTimeout to end your current processing and schedule it for resumption sometime later. E.g.:
function processData(i) {
  var data = clusterData[i];
  // ... do the work for this element ...
  if (i + 1 < clusterData.length) {
    setTimeout(processData, 0, i + 1);
  }
}
processData(0);
This would be the simplest thing to do from where you currently are.
Alternatively, if it fits what you want to do, Web Workers would be a great solution, since they actually shunt the work into a separate thread.
Having said this, what you're currently doing is extremely inefficient. You push values into an array, and consequently keep checking the ever longer array over and over for the values it contains. You should be using object keys for the purpose of de-duplication instead:
var allPoints = {};
// for (...) ...
if (!allPoints[address]) { // you can even omit this check entirely
  allPoints[address] = true;
}
// later:
allPoints = Object.keys(allPoints);
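If ES2015 is available, a Set does the same de-duplication without the object-key workaround. A sketch (`clusterDataSample` is hypothetical sample data in the question's shape):

```javascript
const clusterDataSample = [
  { attributes: { PropertyAddress: "1 Main St" } },
  { attributes: { PropertyAddress: "2 Oak Ave" } },
  { attributes: { PropertyAddress: "1 Main St" } } // duplicate
];

const seen = new Set();
const distinct = [];
for (const item of clusterDataSample) {
  if (!seen.has(item.attributes.PropertyAddress)) {
    seen.add(item.attributes.PropertyAddress); // Set.has is O(1), unlike Array.indexOf
    distinct.push(item);                       // keep only the first record per address
  }
}
// distinct.length → 2
```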
First of all, avoid the repeated this._clusterData[i] lookups. Extract it into a variable like so:
var allPoints = [];
var current;
for (var i = 0, cLength = this._clusterData.length; i < cLength; i += 1) {
  current = this._clusterData[i];
  if (allPoints.indexOf(current.attributes.PropertyAddress) == -1) {
    allPoints.push(current.attributes.PropertyAddress);
    this._DistClusterData.push(current);
  }
}
This should boost your performance quite a bit :-)
As others already pointed out, you can do this asynchronously, so the browser remains responsive.
It should be noted however that the indexOf operation you do can become very costly. It would be better if you would create a Map keyed by the PropertyAddress value. That will take care of the duplicates.
(function (clusterData, batchSize, done) {
  var myMap = new Map();
  var i = 0;
  (function nextBatch() {
    for (const data of clusterData.slice(i, i + batchSize)) {
      myMap.set(data.attributes.PropertyAddress, data);
    }
    i += batchSize;
    if (i < clusterData.length) {
      setTimeout(nextBatch, 0);
    } else {
      done(myMap);
    }
  })();
})(this._clusterData, 1000, function (result) {
  // All done
  this._DistClusterData = result;
  // continue here with other stuff you want to do with it
}.bind(this));
Consider adding to the array asynchronously in batches, say 1000 records at a time, or whatever size provides the best performance. This frees up your application while each batch is being added.
Here is some additional information: async and await while adding elements to List<T>

Is this as inefficient as it feels?

I have a webpage with two lists. A source list (represented by availableThings) populated by a search, and items that the user has selected (selectedThings). I want to maintain a unique list of selectedThings, so I want to remove already selected things from the list of available things. In my code snippet below, data.AvailableThings is populated from the server and has no knowledge of user-selected things. The user can select up to 3 items, ergo selectedThings.items will contain no more than 3 items. availableThings.items can potentially be a few thousand.
After availableThings.items gets populated, I feed it into ICanHaz for the HTML generation. FWIW, I'm using jQuery for drag behavior between the lists, but the question is jQuery-agnostic.
[... jQuery AJAX call snipped ...]
success: function (data) {
  availableThings.items = [];
  for (var thing of data.AvailableThings) {
    var addToList = true;
    for (var existing of selectedThings.items) {
      if (existing.Id === thing.Id) {
        addToList = false;
        break;
      }
    }
    if (addToList) {
      availableThings.items.push(thing);
    }
  }
}
If n is the count of available things and m is the count of selected things, then this is O(n * m) whereas if you hashed by ID, you could turn this into O(n + m).
var existingIds = {};
for (var existing of selectedThings.items) {
  existingIds[existing.Id] = existingIds; // use the object itself as a collision-proof sentinel value
}
availableThings.items = [];
for (var thing of data.AvailableThings) {
  if (existingIds[thing.Id] !== existingIds) {
    availableThings.items.push(thing);
  }
}
If there is some sort of order (ordered by ID, name, or any field) to the data coming from the server, you could just do a binary search for each of the items in the selected set, and remove them if they are found. This would reduce it to O(m log n) for a dataset of n items where selection of m items is allowed. Since you've got it fixed at 3, it would essentially be O(log n).
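The binary search described above could be sketched like this (a minimal sketch, assuming the available things arrive sorted ascending by the `Id` field from the question):

```javascript
// Return the index of the item with the given id, or -1 if absent.
// Assumes `items` is sorted ascending by the numeric `Id` field.
function binarySearchById(items, id) {
  let lo = 0, hi = items.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (items[mid].Id === id) return mid;
    if (items[mid].Id < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// Remove each selected thing from a sorted copy of the available list
const sorted = [{ Id: 1 }, { Id: 3 }, { Id: 7 }, { Id: 9 }];
const idx = binarySearchById(sorted, 7); // → 2
if (idx > -1) sorted.splice(idx, 1);
```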
