More efficient way to build/transfer a large data structure? - javascript

I'm building a mobile Boggle-type web app with node.js. I'm trying to find a more efficient way to load/build a massive dictionary (180,000+ words). I currently have it working, but the load time is quite long: users have to wait about 15 seconds for the entire thing to build, and some users time out before it has finished loading. I was wondering if anyone has any tips to improve the speed.
The way I'm currently doing this (which is probably completely inefficient):
I broke down the list into 26 arrays, one for each letter, and stuck each array in its own JavaScript file.
When the app loads, it runs a recursive function which fetches the next js file and loads in its array, overwriting the previous one. It then loops through the entire array and inserts each new word into my Trie data structure.
The files with the arrays in them come to around 2mb combined. Once built, the data structure itself clocks in at around 12mb, which isn't so bad on a desktop computer, but does weigh down a couple of my users' smartphones.
This needs to be built on the client side to allow instant lookups. The way I'm doing it currently works but I know there has to be a better way.
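Roughly speaking, the Trie I'm building looks something like this (a simplified sketch, not my exact code):

    // Simplified sketch of the client-side Trie (names are illustrative).
    function TrieNode() {
        this.children = {};   // letter -> TrieNode
        this.isWord = false;  // true if a word ends at this node
    }

    function trieInsert(root, word) {
        var node = root;
        for (var i = 0; i < word.length; i++) {
            var ch = word.charAt(i);
            if (!node.children[ch]) node.children[ch] = new TrieNode();
            node = node.children[ch];
        }
        node.isWord = true;
    }

    function trieHas(root, word) {
        var node = root;
        for (var i = 0; i < word.length; i++) {
            node = node.children[word.charAt(i)];
            if (!node) return false;
        }
        return node.isWord;
    }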

The other tactic is to convert your recursive code into non-recursive code that uses an explicit stack, saving only the objects you actually need.
Have you tried profiling your code?

To answer the question about the fastest loading time: are you doing it in this fashion? (Without seeing more code, we can't possibly know.)
function LoadFiles(fileArray) {
    var file = fileArray.shift(); // take the first file off the front of the queue
    $.ajax(file).done(function (data) {
        /* yes, my object is a little funky, I'm focused on writing pseudocode */
        wordLibraryAdd(data); // merge this file's words into the library
        if (fileArray.length) // on a zero length, quit processing
            setTimeout(function () { LoadFiles(fileArray); }, 50); // a 50 ms buffer between each load isn't bad
    });
}
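For what it's worth, the call site would just pass the array of files to pull down (the file names here are hypothetical):

    // Hypothetical file names for the 26 per-letter word lists; kicks off the sequential load.
    LoadFiles(['words-a.json', 'words-b.json', /* one per letter */ 'words-z.json']);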

Related

Run Quicksort and Mergesort in parallel to measure time - JavaScript?

I saw these two JS code blocks for Quicksort and Merge sort, and I would like to practise some parallel execution to measure the time each one needs to sort the bars. I know JS works in a single thread and a single process, but I also think there is a good way to measure the time.
How can I do that? Is it with some special library or some trick in JavaScript?
I know JS works in a single thread and a single process...
Nope. Not even on browsers. The language has nothing to say about threading. It's a matter of environment. For instance, on browsers, there's one main UI thread and as many web worker threads as you want to create.
But I wouldn't use multi-threading to compare the time two different sorting algorithms take. Instead, just test them separately, ensuring nothing else is going on in the background. Tool recommendations are off-topic for SO, but you don't need any particular tool here. Do repeated tests, with warmups, and average the results. Try to come up with ways that you could make the data suit one kind of sort or the other, and test both ways on that data as well as random data. Ensure that measuring the times isn't interfering with the sorting itself. Ensure that the sorted results are correct (otherwise it doesn't matter how fast it runs). Etc.
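For example, something along these lines (quickSort/mergeSort and the data size are placeholders, and it assumes each sort function returns the sorted array rather than sorting in place):

    // Rough benchmarking sketch: warm up, then time repeated runs and average.
    function benchmark(sortFn, makeData, runs, warmups) {
        // Warmup runs so the engine has optimized the function before we measure.
        for (var i = 0; i < warmups; i++) sortFn(makeData());
        var total = 0;
        for (var j = 0; j < runs; j++) {
            var data = makeData();              // fresh copy each run
            var start = performance.now();
            var result = sortFn(data);
            total += performance.now() - start;
            // Sanity check: a fast but wrong sort is worthless.
            for (var k = 1; k < result.length; k++) {
                if (result[k - 1] > result[k]) throw new Error('not sorted');
            }
        }
        return total / runs; // average ms per run
    }

    // Example usage with placeholder quickSort/mergeSort implementations:
    // function randomData() {
    //     var a = [];
    //     for (var i = 0; i < 10000; i++) a.push(Math.random());
    //     return a;
    // }
    // console.log('quicksort:', benchmark(quickSort, randomData, 20, 5));
    // console.log('mergesort:', benchmark(mergeSort, randomData, 20, 5));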

Most efficient way to populate/distribute indexedDB [datastore] with 350000 records

So, I have a main indexedDB object store with around 30,000 records on which I have to run full-text search queries. Doing this with the ydn fts plugin generates a second object store with around 300,000 records. Now, as generating this 'index' datastore takes quite long, I figured it would be faster to distribute the content of the index as well. This in turn generated a zip file of around 7MB which, after decompressing on the client side, gives more than 40MB of data. Currently I loop over this data inserting it one by one (async, so callback time is used to parse the next lines), which takes around 20 minutes. As I am doing this in the background of my application through a web worker, this is not entirely unacceptable, but it still feels hugely inefficient. Once it has been populated, the database is actually fast enough to be used even on mid to high end mobile devices; however, a population time of 20 minutes to one hour on mobile devices is just crazy. Any suggestions on how this could be improved? Or is the only option minimizing the number of records? (Which would mean writing my own full-text search... not something I would look forward to.)
Your data size is quite large for a mobile browser. Unless users are constantly using your app, it is not worth sending all the data to the client. You should do full-text search on the server side, while caching opportunistically as illustrated by this example app. That way, users don't have to wait for full-text search indexing.
Full-text search requires indexing all tokens (words) except some stemming words. Stemming is activated only when lang is set to en. You should profile your app to see which parts are taking time. I guess the browser is taking most of the time, in which case you cannot do much optimization other than parallelization. Sending indexed data (as you described above) will not help much (but please confirm by comparing). A web worker will not help either. I assume your app has no problem with slow responsiveness due to indexing.
Do you have any other complaint besides the slow indexing time?
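If you do end up keeping client-side population, one generic IndexedDB pattern that may be worth comparing (this is not something from the ydn plugin, and the object store name below is made up) is grouping many put() calls into a single transaction instead of paying transaction overhead per record:

    // Sketch: insert records in batches, one transaction per batch.
    function populateInBatches(db, records, batchSize, done) {
        var offset = 0;
        function nextBatch() {
            if (offset >= records.length) { done(); return; }
            var tx = db.transaction('ft-index', 'readwrite'); // hypothetical store name
            var store = tx.objectStore('ft-index');
            var end = Math.min(offset + batchSize, records.length);
            for (var i = offset; i < end; i++) store.put(records[i]);
            tx.oncomplete = function () {
                offset = end;
                setTimeout(nextBatch, 0); // yield so the (worker) event loop can breathe
            };
            tx.onerror = function () { done(tx.error); };
        }
        nextBatch();
    }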

HTML canvas performance when drawing lots of lines

I'm currently writing an application that displays a lot, and I mean, a lot of 2D paths (made of hundreds, thousands of tiny segments) on an HTML5 canvas. Typically, a few million points. These points are downloaded from a server into a binary ArrayBuffer.
I probably won't be using that many points in the real world, but I'm kinda interested in how I could improve the performance. You can call it curiosity if you want ;)
Anyway, I've tested the following solutions:
Using gl.LINES or gl.LINE_STRIP with WebGL, and compute everything in shaders on the GPU. Currently the fastest, can display up to 10M segments without flinching on my Macbook Air. But there are very strict constraints for the binary format if you want to avoid processing things in JavaScript, which is slow.
Using Canvas2D, draw a huge path with all the segments in one stroke() call. When I'm getting past 100k points, the page freezes for a few seconds before the canvas is updated. So, not working here.
Using Canvas2D, but draw each path with its own stroke() call. Despite what others have been saying on the internet, this is much faster than drawing everything in one call, but still a lot slower than WebGL. Things start to get bad when I reach about 500k segments.
The two Canvas2D solutions require looping through all the points of all the paths in JavaScript, so this is quite slow. Do you know of any method(s) that could improve JavaScript's iteration speed in an ArrayBuffer, or processing speed in general?
But, what's strange is, the screen isn't updated immediately after all the canvas draw calls have finished. When I start getting to the performance limit, there is a noticeable delay between the end of the draw calls and the update of the canvas. Do you have any idea where that comes from, and is there a way to reduce it?
First, WebGL was a nice and hyped idea, but the amount of processing required to decode and display the binary data simply doesn't work in shaders, so I ruled it out.
Here are the main bottlenecks I've encountered. Some of them are quite common in general programming, but it's a good idea to remember them:
It's best to use multiple, small for loops
Create variables and closures at the highest level possible, don't create them inside the for loops
Render your data in chunks, and use setTimeout to schedule the next chunk after a few milliseconds: that way, the user will still be able to use the UI (see the sketch after this list)
JavaScript objects and arrays are fast and cheap, use them. It's best to read/write them in sequential order, from the beginning to the end.
If you don't write data sequentially in an array, use objects (because non-sequential read-writes are cheap for objects) and push the indexes into an index array. I used a SortedList implementation to keep the indexes sorted, which I found here. Overhead was minimal (about 10-20% of the rendering time), and in the end it was well worth it.
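A rough sketch of that chunked rendering idea (the drawSegment call and the chunk size are placeholders, not my actual rendering code):

    // Sketch: render the points in chunks so the UI thread gets a chance to run between chunks.
    function renderInChunks(points, drawSegment, chunkSize) {
        var i = 0;
        function renderChunk() {
            var end = Math.min(i + chunkSize, points.length);
            for (; i < end; i++) {
                drawSegment(points[i]); // placeholder for the actual canvas drawing
            }
            if (i < points.length) {
                setTimeout(renderChunk, 5); // a few ms so input events can be processed
            }
        }
        renderChunk();
    }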
That's about everything I remember. If I do find something else, I'll update this answer!

Mass astar pathfinding

I'm trying to create a tower defence game in Javascript.
It's all going well apart from the pathfinding..
I'm using the astar code from this website: http://www.briangrinstead.com/blog/astar-search-algorithm-in-javascript which uses a binary heap (which I believe is fairly optimal)
The problem I'm having is that I want to allow people to block the path of the "attackers". This means that each "attacker" needs to be able to find its way to the exit on its own (as someone could just cut off a single "attacker" and it would need to find its own way to the exit). Now 5 or 6 attackers can pathfind at any one time with no issue. But say the path is blocked for 10+ attackers: all 10 of them will need to fire their pathfinding scripts at the same time, which just drops the FPS to about 1/2 per sec.
This must be a common problem for anyone who has a lot of entities pathfinding at any one time, so I imagine there must be a better way than my approach.
So my question is: what is the best way to implement a mass pathfinding algorithm for multiple "bots" in the most efficient way?
Thanks,
James
Use anti-objects; this is the only way to get cheap pathfinding, afaik:
http://www.cs.colorado.edu/~ralex/papers/PDF/OOPSLA06antiobjects.pdf
Anti-objects basically mean that instead of bots having individual AI, you have one "swarm AI", which is bound to your game map (rough sketch below).
p.s.: Here is another link about pathfinding in general (possibly the best online reference available):
http://theory.stanford.edu/~amitp/GameProgramming/index.html
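As a very rough illustration of the collaborative-diffusion idea from that paper (this is my own sketch, not code from the paper): the exit "emits" a value that diffuses across walkable tiles each tick, walls absorb it, and every attacker just steps onto its highest-valued neighbour, so the per-bot work is trivial.

    // Sketch of collaborative diffusion: the map does the pathfinding, not the bots.
    // walkable[y][x] is false for blocked cells; values[y][x] holds the diffused "scent".
    function diffuse(values, walkable, exitX, exitY, iterations) {
        for (var it = 0; it < iterations; it++) {
            values[exitY][exitX] = 1000; // the exit constantly re-emits its scent
            for (var y = 0; y < values.length; y++) {
                for (var x = 0; x < values[y].length; x++) {
                    if (x === exitX && y === exitY) continue;            // keep the source fixed
                    if (!walkable[y][x]) { values[y][x] = 0; continue; } // walls absorb the scent
                    // Each walkable cell takes a fraction of its neighbours' average value.
                    var sum = 0, count = 0;
                    if (y > 0)                    { sum += values[y - 1][x]; count++; }
                    if (y < values.length - 1)    { sum += values[y + 1][x]; count++; }
                    if (x > 0)                    { sum += values[y][x - 1]; count++; }
                    if (x < values[y].length - 1) { sum += values[y][x + 1]; count++; }
                    values[y][x] = 0.5 * (sum / count);
                }
            }
        }
        // Each attacker then just steps onto whichever adjacent walkable cell has the highest value.
    }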
Just cache the result.
Store the path as the value in a hash table (object), give each node a UUID, concatenate the UUIDs to form a unique hash table key and insert the path into it.
When you retrieve the path back out of the hash table, walk the path, and see if it's still valid, if not, recalculate and insert the new one back in.
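Something along these lines (astar.search, graph, and node.uuid are assumptions about your setup, not actual API names):

    // Sketch of a path cache keyed by "startUUID|endUUID".
    var pathCache = {};

    function getPath(graph, start, end, isStillWalkable) {
        var key = start.uuid + '|' + end.uuid;          // assumes each node carries a uuid
        var cached = pathCache[key];
        if (cached && cached.every(isStillWalkable)) {  // walk the cached path, check it's still open
            return cached;
        }
        var path = astar.search(graph, start, end);     // placeholder call to the A* implementation
        pathCache[key] = path;
        return path;
    }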
There are many optimizations that you can do :)
Like c69 said, swarm AI or hive mind come to mind :P

Creating and parsing huge strings with javascript?

I have a simple piece of data that I'm storing on a server, as a plain string. It is kind of ridiculous, but it looks like this:
name|date|grade|description|name|date|grade|description|repeat for a long time
This string can be up to 1.4mb in size. The idea is that it's a bunch of student records, just strung together with a simple pipe delimiter. It's a very poor serialization method.
Once this massive string is pushed to the client, it is split along the pipes into student records again, using javascript.
I've been timing how long it takes to create, and split, these strings on the client side. The times are actually quite good, the slowest run I've seen on a few different machines is 0.2 seconds for 10,000 'student records', which has a final string size of ~1.4mb.
I realize this is quite bizarre, just wondering if there are any inherent problems with creating and splitting such large strings using javascript? I don't know how different browsers implement their javascript engines. I've tried this on the 'major' browsers, but don't know how this would perform on earlier versions of each.
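For reference, the client-side split looks roughly like this (the helper name is made up; the field layout is the one shown above):

    // Sketch: split the pipe-delimited blob back into student record objects.
    function parseRecords(blob) {
        var fields = blob.split('|');
        var records = [];
        for (var i = 0; i + 3 < fields.length; i += 4) {
            records.push({
                name: fields[i],
                date: fields[i + 1],
                grade: fields[i + 2],
                description: fields[i + 3]
            });
        }
        return records;
    }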
Yeah looking for any comments on this, this is more for fun than anything else!
Thanks
String splitting of 1.4mb of data is not a problem on decent machines; instead, you should worry about the internet connection speed of your users. I've tried to do spell check with an 800 kb dictionary (which is half of your data), and the main issue was loading time.
But it looks like your student records data could be put in a database, and you might not need to load everything at load time. So how about doing pagination to show user records, or using ajax to search for certain user names?
If it's a really large string, it may pay to continuously slice the string with 'string'.slice(from, to) to only process a smaller subset; appending all of the individual items to the end of the output with list.push() or something similar might work.
String split methods are probably the most efficient way of doing this though, even in IE. Processing individual characters using string.charAt(x) is extremely slow and will often show a security error as it stalls the browser. Using string split methods would certainly be much faster than splitting using regular expressions.
It may also be possible to encode the data as a JSON array; some newer browsers such as IE8/WebKit/FF3.5 have fast JSON parsing built in via JSON.parse(data). But using eval(json) may overflow the browser if there's enough data, so it is probably a bad idea. It may pay to compare for performance though.
A much better approach in a lot of cases is to use AJAX and only load some of the data at once from the server, which would also save download time.
Besides S. Mark's excellent comments about local vs. transfer speed and the tip to re-encode using AJAX, I suggest a (long-term) move away from JavaScript in the browser (assuming that's where it runs) to either a non-browser implementation of JS (or possibly another language).
Browser-based JS seems a weak link in a data-transfer chain and nothing I would want to run unmonitored, since browsers are upgraded from time to time, and breaking your JS transfer might be an unanticipated side effect!
