I'm using .splice() to remove objects from an array of objects containing a timestamp, based on whether userDates contains either a matching timestamp or a timestamp within 45 minutes before or after the object's timestamp. Essentially, I'm removing all objects whose date values overlap with the date values in the userDates array.
When you run this code, you'll notice that some of the objects get removed, but others don't.
JSFiddle: https://jsfiddle.net/jb2t3Lr9/1/ and code to reproduce the problem:
let userDates = ["2020-11-20T22:00:00.000Z","2020-11-20T23:00:00.000Z","2020-11-21T00:00:00.000Z","2020-11-21T01:00:00.000Z","2020-11-22T02:15:00.000Z","2020-11-22T03:15:00.000Z","2020-11-22T01:00:00.000Z","2020-11-22T00:00:00.000Z","2020-11-21T23:00:00.000Z","2020-12-13T22:00:00.000Z","2020-12-14T22:00:00.000Z","2020-12-15T22:00:00.000Z","2020-12-16T22:00:00.000Z","2020-12-13T23:00:00.000Z","2020-12-14T23:00:00.000Z","2020-11-21T20:00:00.000Z","2020-11-22T20:00:00.000Z","2020-11-22T19:00:00.000Z","2020-11-21T19:00:00.000Z"];
let datesToUpdate = [
  { sessionInterval: 50, dateTime: '2020-11-22T20:00:00.000Z' },
  { sessionInterval: 50, dateTime: '2020-11-21T20:00:00.000Z' },
  { sessionInterval: 50, dateTime: '2020-11-21T19:00:00.000Z' },
  { sessionInterval: 50, dateTime: '2020-11-22T19:00:00.000Z' },
  { sessionInterval: 50, dateTime: '2020-11-22T17:30:00.000Z' },
  { sessionInterval: 50, dateTime: '2020-11-21T17:00:00.000Z' }
];
function removeOverlappingDates(userDates, datesToUpdate) {
  const FIFTEEN_MINUTES = 15 * 60 * 1000; // milliseconds
  datesToUpdate.forEach((toUpdate, index) => {
    userDates.forEach((date) => {
      let dateInMS = new Date("" + date).valueOf();
      const fifteenBefore = dateInMS - FIFTEEN_MINUTES;
      const thirtyBefore = dateInMS - FIFTEEN_MINUTES * 2;
      const fortyFiveBefore = dateInMS - FIFTEEN_MINUTES * 3;
      const fifteenAfter = dateInMS + FIFTEEN_MINUTES;
      const thirtyAfter = dateInMS + FIFTEEN_MINUTES * 2;
      const fortyFiveAfter = dateInMS + FIFTEEN_MINUTES * 3;
      let toUpdateInMS = new Date("" + toUpdate.dateTime).valueOf();
      if (
        toUpdateInMS == fifteenBefore ||
        toUpdateInMS == thirtyBefore ||
        toUpdateInMS == fortyFiveBefore ||
        toUpdateInMS == fifteenAfter ||
        toUpdateInMS == thirtyAfter ||
        toUpdateInMS == fortyFiveAfter ||
        toUpdateInMS == dateInMS
      ) {
        datesToUpdate.splice(index, 1);
      }
    });
  });
  return datesToUpdate;
}
console.log("datesToUpdate 1", datesToUpdate);
datesToUpdate = removeOverlappingDates(userDates, datesToUpdate);
console.log("datesToUpdate 2", datesToUpdate);
What's even stranger to me is that if I just compare two arrays of datetime values against each other (with the same datetime values as the array of objects contains), then everything gets removed properly. Fiddle: https://jsfiddle.net/rc1mvzLq/
I'll try to write this with minimal new methods, but as mentioned in the comments, filter is ideal for this use case.
First and foremost, never change the array you're iterating over; that causes behaviour that is hard to debug. However, simply copying the array at the beginning would not help you in this case, because you're using splice to remove an element from the middle of the array, which causes the elements after it to reindex.
For example, if you have [A,B,C,D,E] and you remove the element at index 1, you now have [A,C,D,E] and (since the iteration for that index is done) your index gets incremented to 2, so you will never test value C in your logic. A few more situations like this one and you can see how multiple elements get left unchecked. The only way this can work is if two elements that need to be deleted are never next to one another.
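Here's a minimal demo of that skipping behaviour (letters instead of dates, purely for illustration):
const letters = ['A', 'B', 'C', 'D', 'E'];
letters.forEach((letter, index) => {
  // try to remove both 'B' and 'C' while iterating
  if (letter === 'B' || letter === 'C') letters.splice(index, 1);
});
// 'C' slid into B's old slot right after the splice, so it was never visited
console.log(letters); // ['A', 'C', 'D', 'E']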
To keep this simple and as close to your original logic as possible, what I advise is that you count how many elements you've deleted and offset the index by that amount. However also copy the array so that you don't mutate the one you're iterating over (NOTE: a shallow copy is enough, and you can simply use the spread syntax to create a new array with the same objects).
function removeOverlappingDates(userDates, datesToUpdate) {
  const FIFTEEN_MINUTES = 15 * 60 * 1000; // milliseconds
  const newDatesToUpdate = [...datesToUpdate];
  let deletedElementsCount = 0;
  datesToUpdate.forEach((toUpdate, index) => {
    userDates.forEach((date) => {
      // same date calculations as before
      if (
        // check if this condition should maybe be toUpdateInMS >= fortyFiveBefore && toUpdateInMS <= fortyFiveAfter
      ) {
        const newIndex = index - deletedElementsCount;
        newDatesToUpdate.splice(newIndex, 1);
        deletedElementsCount = deletedElementsCount + 1;
      }
    });
  });
  return newDatesToUpdate;
}
As a side note, there is a reason why the copy is the one I'm changing, and that's because datesToUpdate is passed as a function argument. If the datesToUpdate array gets changed inside the function, it will remain changed after it. You shouldn't change argument objects (this includes arrays) inside functions unless you want them to remain like that. A function that does this is considered to have side effects, and that can also be hard to debug if not used cautiously. In your case there is no need for side effects because you're returning the result.
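And since filter was mentioned as the ideal tool for this, here is a rough sketch of the same idea written with filter instead of splice. Note it uses a +/- 45 minute range check (as hinted at in the code comment above) rather than your exact 15/30/45-minute comparisons, so adjust the condition to whichever you actually need:
function filterOverlappingDates(userDates, datesToUpdate) {
  const FORTY_FIVE_MINUTES = 45 * 60 * 1000; // milliseconds
  const userDatesInMS = userDates.map((date) => new Date(date).valueOf());
  // keep an entry only if NO user date falls within 45 minutes of it
  return datesToUpdate.filter((toUpdate) => {
    const toUpdateInMS = new Date(toUpdate.dateTime).valueOf();
    return !userDatesInMS.some(
      (dateInMS) => Math.abs(dateInMS - toUpdateInMS) <= FORTY_FIVE_MINUTES
    );
  });
}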
I am attempting to iterate over a very large 2D array in JavaScript within an ionic application, but it is majorly bogging down my app.
A little background: I created a custom search component with StencilJS that provides suggestions on keyup. You feed the component an array of strings (search suggestions). Each individual string is tokenized word by word, split into an array, and lowercased.
For example, "Red-Winged Blackbird" becomes
['red','winged','blackbird']
So, tokenizing an array of strings looks like this:
[['red','winged','blackbird'],['bald','eagle'], ...]
Now, I have 10,000+ of these smaller arrays within one large array.
Then, I tokenize the search terms the user inputs upon each keyup.
Afterwards, I am comparing each tokenized search term array to each tokenized suggestion array within the larger array.
Therefore, I have 2 nested for-of loops.
In addition, I am using Levenshtein distance to compare each search term to each element of each suggestion array.
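For reference, the comparison I'm doing looks roughly like this (levenshtein(a, b) stands in for my edit-distance function and maxDistance for my threshold; the names are only illustrative):
// suggestions: [['red','winged','blackbird'], ['bald','eagle'], ...]
// searchTokens: e.g. ['red', 'wing']
function findMatches(suggestions, searchTokens, maxDistance) {
  const matches = [];
  for (const suggestion of suggestions) {      // 10,000+ of these
    for (const term of searchTokens) {         // every search token...
      // ...gets compared against every word of the suggestion
      if (suggestion.some((word) => levenshtein(term, word) <= maxDistance)) {
        matches.push(suggestion);
        break;
      }
    }
  }
  return matches;
}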
I had a couple drinks, so please be patient while I stumble through this.
To start, I'd do something like a reverse index (also known as an inverted index). It's pretty close to what you are already doing, but with a couple of extra steps.
First, go through all your results and tokenize, stem, remove stop words, lowercase, coalesce, etc. It looks like you've already done this, but I'm adding an example for completeness.
const tokenize = (string) => {
  const tokens = string
    .split(' ') // just split on words, but maybe rep
    .filter((v) => v.trim() !== '');
  return new Set(tokens);
};
Next, what we want to do is generate a map that takes a word as a key and returns a list of the document indexes that word appears in.
const documents = ['12312 taco', 'taco mmm'];
const index = {
  '12312': [0],
  'taco': [0, 1],
  'mmm': [1]
};
I think you can see where this is taking us... We can tokenize our search term, find all the documents each token belongs to, work some ranking magic, take the top 5, blah blah blah, and have our results. This is typically the way Google and other search giants do their searches. They spend a ton of time on precomputation so that their search engines can slice the candidates down by orders of magnitude and work their magic.
Below is an example snippet. This needs a ton of work (please remember, I've been drinking), but it's running through a million records in roughly 0.3 ms. I'm cheating a bit by generating 2-letter words and phrases, only so that I can demonstrate queries that sometimes produce collisions. This doesn't really matter, since the query time is on average proportional to the number of matching records. Please be aware that this solution gives you back records that contain all the search terms. It doesn't care about context or whatever. You will have to figure out the ranking (if you care at this point) to achieve the results you want.
const tokenize = (string) => {
  const tokens = string.split(' ')
    .filter((v) => v.trim() !== '');
  return new Set(tokens);
};

const ri = (documents) => {
  const index = new Map();
  for (let i = 0; i < documents.length; i++) {
    const document = documents[i];
    const tokens = tokenize(document);
    for (let token of tokens) {
      if (!index.has(token)) {
        index.set(token, new Set());
      }
      index.get(token).add(i);
    }
  }
  return index;
};

const intersect = (sets) => {
  const [head, ...rest] = sets;
  return rest.reduce((r, set) => {
    return new Set([...r].filter((n) => set.has(n)));
  }, new Set(head));
};

const search = (index, query) => {
  const tokens = tokenize(query);
  const candidates = [];
  for (let token of tokens) {
    const keys = index.get(token);
    if (keys != null) {
      candidates.push(keys);
    }
  }
  return intersect(candidates);
};
const word = () => Math.random().toString(36).substring(2, 4);
const terms = Array.from({ length: 255 }, () => word());
const documents = Array.from({ length: 1000000 }, () => {
  const sb = [];
  for (let i = 0; i < 2; i++) {
    sb.push(word());
  }
  return sb.join(' ');
});

const index = ri(documents);
const st = performance.now();
const query = 'bb iz';
const results = search(index, query);
const et = performance.now();
console.log(query, Array.from(results).slice(0, 10).map((i) => documents[i]));
console.log(et - st);
There are some improvements you can make if you want. Like... ranking! The whole purpose of this example is to show how we can cut 1M results down to maybe a hundred or so candidates. The search function has some post-filtering via intersection, which probably isn't exactly what you want, but at this point it doesn't really matter what you do, since the result set is so small.
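If you wanted a super naive ranking on top of this, you could count how many of the query tokens each candidate document matched and sort by that count. A sketch only, reusing the tokenize from above:
const rankedSearch = (index, query, limit = 5) => {
  const tokens = tokenize(query);
  const hits = new Map(); // document index -> number of matched tokens
  for (const token of tokens) {
    const keys = index.get(token);
    if (keys == null) continue;
    for (const docId of keys) {
      hits.set(docId, (hits.get(docId) || 0) + 1);
    }
  }
  return [...hits.entries()]
    .sort((a, b) => b[1] - a[1]) // most matched tokens first
    .slice(0, limit)
    .map(([docId]) => docId);
};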
Question
I am using tf.Tensor and tf.concat() to handle large training data, and I found that using tf.concat() repeatedly gets slow.
What is the best way to load large data from a file into a tf.Tensor?
Background
I think handling data with an array is the common way in JavaScript. To achieve that, here are the rough steps to follow.
steps to load data from file to Array
read line from file
parse line to Javascript's Object
add that object to array by Array.push()
after finishing reading the lines to the end, we can use that array with a for loop.
So I think I can use tf.concat() in a similar way to the above.
steps to load data from file to tf.Tensor
read line from file
parse line to Javascript's Object
parse object to tf.Tensor
add tensor to original tensor by tf.concat()
after finishing reading the lines to the end, we can use that tf.Tensor (a rough sketch of this approach is shown below).
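In code, the tensor version of those steps would look roughly like this (readLines() and parseLine() are placeholders for my own file handling, not real APIs):
// readLines() / parseLine() are placeholders, not real APIs
let t = tf.tensor1d([])                        // start with an empty tensor
for (const line of readLines(file)) {
  const obj = parseLine(line)                  // line -> JavaScript object
  const lineTensor = tf.tensor1d([obj.value])  // object -> tf.Tensor
  t = tf.tidy(() => t.concat(lineTensor))      // grow the tensor with tf.concat()
  lineTensor.dispose()
}
// after the last line, t holds all the data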
Some code
Here is some code to measure the speed of both Array.push() and tf.concat():
import * as tf from "@tensorflow/tfjs"

let t = tf.tensor1d([1])
let addT = tf.tensor1d([2])
console.time()
for (let idx = 0; idx < 50000; idx++) {
  if (idx % 1000 == 0) {
    console.timeEnd()
    console.time()
    console.log(idx)
  }
  t = tf.tidy(() => t.concat(addT))
}

let arr = []
let addA = 1
console.time()
for (let idx = 0; idx < 50000; idx++) {
  if (idx % 1000 == 0) {
    console.timeEnd()
    console.time()
    console.log(idx)
  }
  arr.push(addA)
}
Measurement
We can see stable timings for Array.push(), but tf.concat() gets slower and slower.
For tf.concat()
default: 0.150ms
0
default: 68.725ms
1000
default: 62.922ms
2000
default: 23.199ms
3000
default: 21.093ms
4000
default: 27.808ms
5000
default: 39.689ms
6000
default: 34.798ms
7000
default: 45.502ms
8000
default: 94.526ms
9000
default: 51.996ms
10000
default: 76.529ms
11000
default: 83.662ms
12000
default: 45.730ms
13000
default: 89.119ms
14000
default: 49.171ms
15000
default: 48.555ms
16000
default: 55.686ms
17000
default: 54.857ms
18000
default: 54.801ms
19000
default: 55.312ms
20000
default: 65.760ms
For Array.push()
default: 0.009ms
0
default: 0.388ms
1000
default: 0.340ms
2000
default: 0.333ms
3000
default: 0.317ms
4000
default: 0.330ms
5000
default: 0.289ms
6000
default: 0.299ms
7000
default: 0.291ms
8000
default: 0.320ms
9000
default: 0.284ms
10000
default: 0.343ms
11000
default: 0.327ms
12000
default: 0.317ms
13000
default: 0.329ms
14000
default: 0.307ms
15000
default: 0.218ms
16000
default: 0.193ms
17000
default: 0.234ms
18000
default: 1.943ms
19000
default: 0.164ms
20000
default: 0.148ms
While the tf.concat and Array.push functions look and behave similarly, there is one big difference:
tf.concat creates a new tensor from the input
Array.push adds the input to the first array
Examples
tf.concat
const a = tf.tensor1d([1, 2]);
const b = tf.tensor1d([3]);
const c = tf.concat([a, b]);
a.print(); // Result: Tensor [1, 2]
b.print(); // Result: Tensor [3]
c.print(); // Result: Tensor [1, 2, 3]
The resulting variable c is a new Tensor while a and b are not changed.
Array.push
const a = [1,2];
a.push(3);
console.log(a); // Result: [1,2,3]
Here, the variable a is directly changed.
Impact on the runtime
For the runtime speed, this means that tf.concat copies all tensor values to a new tensor before adding the input. This obviously takes more time the bigger the array is that needs to be copied. In contrast to that, Array.push does not create a copy of the array and therefore the runtime will be more or less the same no matter how big the array is.
Note that this is "by design", as tensors are immutable, so every operation on an existing tensor always creates a new tensor. Quote from the docs:
Tensors are immutable, so all operations always return new Tensors and never modify input Tensors.
Therefore, if you need to create a large tensor from input data it is advisable to first read all data from your file and merge it with "vanilla" JavaScript functions before creating a tensor from it.
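A minimal sketch of that approach (assuming each parsed line yields one number; fileLines stands in for however you read the file):
// collect everything in a plain JavaScript array first...
const values = [];
for (const line of fileLines) {   // fileLines: your file's lines (assumption)
  values.push(parseFloat(line));
}
// ...then create the tensor once at the end
const data = tf.tensor1d(values);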
Handling data too big for memory
In case you have a dataset so big that you need to handle it in chunks because of memory restrictions, you have two options:
Use the trainOnBatch function
Use a dataset generator
Option 1: trainOnBatch
The trainOnBatch function allows you to train on a single batch of data instead of passing the full dataset to fit. Therefore, you can split your data into reasonable batches before training on them, so you don't have to merge your data together all at once.
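A rough sketch of that idea (loadBatch() is a placeholder for your own chunked file reading, not a TensorFlow.js API; trainOnBatch itself is part of the Layers model API):
// inside an async function; model is an already compiled tf.LayersModel
for (let i = 0; i < numBatches; i++) {
  const {xs, ys} = loadBatch(i);                  // read and convert one chunk (placeholder)
  const loss = await model.trainOnBatch(xs, ys);  // train on just this batch
  xs.dispose();
  ys.dispose();
  console.log(`batch ${i}: loss = ${loss}`);
}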
Option 2: Dataset generator
The other answer already went over the basics. This will allow you to use a JavaScript generator function to prepare the data. I recommend using the generator syntax instead of an iterator factory (used in the other answer), as it is the more modern JavaScript syntax.
Example (taken from the docs):
function* dataGenerator() {
  const numElements = 10;
  let index = 0;
  while (index < numElements) {
    const x = index;
    index++;
    yield x;
  }
}
const ds = tf.data.generator(dataGenerator);
You can then use the fitDataset function to train your model.
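For example, a sketch combining the two (note that for fitDataset the generator should yield {xs, ys} pairs rather than plain numbers; makeBatch() is a placeholder for your own data loading, and model is assumed to be an already compiled model):
function* batchGenerator() {
  const numBatches = 100;               // however many chunks your file splits into
  for (let i = 0; i < numBatches; i++) {
    const {xs, ys} = makeBatch(i);      // placeholder: read/parse one chunk into tensors
    yield {xs, ys};
  }
}
const ds = tf.data.generator(batchGenerator);
// inside an async function
await model.fitDataset(ds, {epochs: 5});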
Though there is not a single way of creating a tensor, the answer to the question lies in what is done with the tensors once they are created.
Performance
Tensors are immutable; therefore, each time tf.concat is called, a new tensor is created.
let x = tf.tensor1d([2]);
console.log(tf.memory()) // "numTensors": 1
const y = tf.tensor1d([3])
x = tf.concat([x, y])
console.log(tf.memory()) // "numTensors": 3,
<html>
  <head>
    <!-- Load TensorFlow.js -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@0.14.1"> </script>
  </head>
  <body>
  </body>
</html>
As we can see from the snippet above, the number of tensors after tf.concat is called is 3 and not 2. It is true that tf.tidy will dispose of unused tensors, but this operation of creating and disposing of tensors becomes more and more costly as the created tensor gets bigger and bigger. This is both an issue of memory consumption and of computation, since creating a new tensor will always delegate to a backend.
Creating a tensor from large data
Now that the issue of performance is understood, what is the best way to proceed ?
Create the whole array in JS and, when the whole array is completed, create the tensor.
const x = []
for (let i = 0; i < data.length; i++) {
  // fill array x
  x.push(dataValue)
}
// create the tensor
tf.tensor(x)
Though it is the trivial solution, it is not always possible, because creating an array will keep all the data in memory, and we can easily run out of memory with big data entries. Therefore, sometimes it might be best, instead of creating the whole JavaScript array, to create chunks of arrays, create a tensor from each chunk, and start processing those tensors as soon as they are created. The chunk tensors can be merged using tf.concat again if necessary, but that might not always be required.
For instance, we can call model.fit() repeatedly using chunks of tensors instead of calling it once with a big tensor that might take a long time to create. In this case, there is no need to concatenate the chunk tensors (a rough sketch follows).
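A rough sketch of that, with loadChunk() standing in for however you read and convert one chunk of your data:
// inside an async function: train on one chunk at a time instead of building one huge tensor
for (let i = 0; i < numberOfChunks; i++) {
  const {xs, ys} = loadChunk(i)        // placeholder: chunk -> tensors
  await model.fit(xs, ys, {epochs: 1}) // fit on just this chunk
  xs.dispose()
  ys.dispose()
}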
If possible, create a dataset using tf.data. This is the ideal solution if we are next going to fit a model with the data.
function makeIterator() {
  let index = 0;
  const iterator = {
    next: () => {
      let result;
      if (index < data.length) {
        result = {value: dataValue, done: false};
        index++;
        return result;
      }
      return {value: dataValue, done: true};
    }
  };
  return iterator;
}
const ds = tf.data.generator(makeIterator);
The advantage of using tf.data is that the dataset is created in batches, as needed, during the model.fit call.
I'm starting out with react-native, building an app to track lap times from my RC cars. I have an Arduino with a TCP connection (server), and for each lap this Arduino sends the current time/lap to all connected clients like this:
{"tx_id":33,"last_time":123456,"lap":612}
In my program (in react-native), I have one state called dados with this struct:
dados[tx_id] = {
  tx_id: <tx_id>,
  last_time: <last_time>,
  best_lap: 0,
  best_time: 0,
  diff: 0,
  laps: []
};
This program connects to the Arduino and, when it receives some data, just pushes it to this state, more specifically into the laps array of each transponder. Finally, I get something like this:
dados[33] = {
  tx_id: 33,
  last_time: 456,
  best_lap: 3455,
  best_time: 32432,
  diff: 32,
  laps: [{lap:1,time:1234},{lap:2,time:32323},{lap:3,time:3242332}]
}
dados[34] = {
  tx_id: 34,
  last_time: 123,
  best_lap: 32234,
  best_time: 335343,
  diff: 10,
  laps: [{lap:1,time:1234},{lap:2,time:32323},{lap:3,time:3242332}]
}
dados[35] = {
  tx_id: 35,
  last_time: 789,
  best_lap: 32234,
  best_time: 335343,
  diff: 8,
  laps: [{lap:1,time:1234},{lap:2,time:32323},{lap:3,time:3242332},{lap:4,time:343232}]
}
This data is rendered to Views using the map function (not a FlatList).
My problem now is that I need to order this before printing on screen.
Now, with this code, the data is printed in tx_id order, since that's the key of the main array. Is there a way to order this array by the number of elements in the laps property and, as a second sort criterion, by the last_time property of each element?
In this case, the last tx in my example (35) would be first in the list because it has one more lap than the other elements. The second item would be 34 (because of last_time). And the third would be tx 33.
Is there any way to do this in JavaScript, or do I need to create custom functions and check every item recursively?!
Thanks @crackhead420
While waiting for a reply to this question, I just found what you said... :)
This is my final test/solution that worked:
var t_teste = this.state.teste;
t_teste[33] = {tx_id: 33, last_time:998,best_lap:2,best_time:123,diff:0,laps:[{lap:1,time:123},{lap:2,time:456}]};
t_teste[34] = {tx_id: 34, last_time:123,best_lap:2,best_time:123,diff:0,laps:[{lap:1,time:123},{lap:2,time:456}]};
t_teste[35] = {tx_id: 35, last_time:456,best_lap:2,best_time:123,diff:0,laps:[{lap:1,time:123},{lap:2,time:456},{lap:3,time:423}]};
t_teste[36] = {tx_id: 36, last_time:789,best_lap:2,best_time:123,diff:0,laps:[{lap:1,time:123},{lap:2,time:456}]};
console.log('Teste original: ',JSON.stringify(t_teste));
var saida = t_teste.sort(function(a, b) {
  if (a.laps.length > b.laps.length) {
    return -1;
  }
  if (a.laps.length < b.laps.length) {
    return 1;
  }
  // In this case, the laps are equal... so let's check last_time
  if (a.last_time < b.last_time) {
    return -1; // fastest lap (less time) first!
  }
  if (a.last_time > b.last_time) {
    return 1;
  }
  // Return the same
  return 0;
});
console.log('Teste novo: ', JSON.stringify(saida));
Using some simple helper functions, this is definitely possible:
const data = [{tx_id:33,last_time:456,best_lap:3455,best_time:32432,diff:32,laps:[{lap:1,time:1234},{lap:2,time:32323},{lap:3,time:3242332}]},{tx_id:34,last_time:123,best_lap:32234,best_time:335343,diff:10,laps:[{lap:1,time:1234},{lap:2,time:32323},{lap:3,time:3242332}]},{tx_id:35,last_time:789,best_lap:32234,best_time:335343,diff:8,laps:[{lap:1,time:1234},{lap:2,time:32323},{lap:3,time:3242332},{lap:4,time:343232}]}]
const sortBy = fn => (a, b) => -(fn(a) < fn(b)) || +(fn(a) > fn(b))
const sortByLapsLength = sortBy(o => o.laps.length)
const sortByLastTime = sortBy(o => o.last_time)
const sortFn = (a, b) => -sortByLapsLength(a, b) || sortByLastTime(a, b)
data.sort(sortFn)
// show new order of `tx_id`s
console.log(data.map(o => o.tx_id))
sortBy() accepts a function that selects a value as the sorting criterion for a given object. This value must be a string or a number. sortBy() then returns a comparator that, given two objects, will sort them in ascending order when passed to Array.prototype.sort(). sortFn() uses two of these comparators with a logical OR || operator to employ short-circuiting behavior and sort first by laps.length (in descending order, hence the negation -), and then by last_time if two objects' laps.length are equal.
It's possible to sort an object array by its values:
dados.sort(function(a, b) {
  return a.last_time - b.last_time;
});
I'm struggling to get the sum of the subfiles. The code below currently returns the sum of a.txt and all its subfiles, supposing that the contents of a.txt are
1
b.txt
the contents of b.txt are
2
c.txt
and the contents of c.txt are
3
I'd like to also get the sum of b.txt and all of its subfiles, the sum of c.txt and all of its subfiles, and so on and so forth for all the files that exist. So the output would be: the sum of a.txt and its subfiles is sum, the sum of b.txt and its subfiles is sum, the sum of c.txt and its subfiles is sum, and so on...
My code below:
const fs = require('fs')
const file = 'a.txt'

let output = (file) => {
  let data = fs.readFileSync(file, 'utf8')
    .split('\n')
    .reduce((array, i) => {
      if (i.match(/.txt$/)) {
        let intArr = array.concat(output(i))
        return intArr
      } else if (i.match(/^\d+$/)) {
        array.push(parseInt(i, 10));
      }
      return array;
    }, [])
  return data
}

console.log(output(file))
const sum = output(file)
console.log(sum.reduce((a, b) => a + b, 0))
Also, any suggestions for improving this code are welcome.
This can be viewed as a pretty standard graph search. Your code starts to do that, but there are a few places it can be changed to make it a little easier.
Below is a depth-first search starting with a particular file and keeping track of a counts object. The function parses the file just like yours and adds the numbers to the counts object. Then it recurses. When the recursion unwinds, it adds the resulting child's counts to the parent's. In the end it returns the counts object, which should have the total (file + subfiles) for every file. It doesn't do any error checking for simplicity, and it's not clear what should happen if two children both reference the same grandchild - should it be counted twice? Either way it should be easy to adjust.
I made a mocked version of fs.readFileSync so the code would run in the snippet and be easier to see:
// fake fs for readFileSync
let fs = {
  files: {
    "a.txt": "1\nb.txt",
    "b.txt": "2\nc.txt",
    "c.txt": "3",
    "d.txt": "2\n10\ne.txt\nf.txt",
    "e.txt": "1",
    "f.txt": "5\n7\ng.txt",
    "g.txt": "1\na.txt"
  },
  readFileSync(file) { return this.files[file] }
}

function dfs(file, counts = {}) {
  // parse a single file into an object:
  // {total: sum_of_all_the_numbers, files: array_of_files}
  let data = fs.readFileSync(file, 'utf8').split('\n')
  let {total, files} = data.reduce((a, c) => {
    if (c.match(/^\d+$/)) a.total += parseInt(c)
    else if (c.match(/.txt$/)) a.files.push(c)
    return a
  }, {total: 0, files: []})
  // add the total counts for this file
  counts[file] = total
  // recurse on children files
  for (let f of files) {
    if (counts.hasOwnProperty(f)) continue // don't look at files twice if there are cycles
    let c = dfs(f, counts)
    counts[file] += c[f] // children return the counts object, add the child's count to the parent
  }
  // return counts object
  return counts
}
console.log("original files starting with a.txt")
console.log(dfs('a.txt'))
console.log("more involved graph starts with d.txt")
console.log(dfs('d.txt'))