I have a decent-sized data grid (basically an interactive table) of around 600 rows.
I noticed that binding KO to this grid takes a substantial amount of time, especially during the databind. On older browsers the situation is even worse: the processor is pegged for almost a minute.
The biggest performance hit seems to come from the line that performs the databind. Note: this is the initial databind, so many of the replies about handling large updates do not seem to apply.
The mapping plugin was also used to convert JSON objects into view models on the fly. However, the line that performs the mapping did not seem to take nearly as much time as the line that databinds.
Unfortunately paging is out of the question due to the requirements. Are there any general tips/pointers on optimizing larger view models with KO?
I had a similar issue and posted in the Knockout Google group. Michael Best recommended trying some custom bindings.
Since you're doing edits, his knockout-table binding won't work for you. But you might try the knockout-repeat binding. It's supposed to be faster than Knockout's native foreach (at the cost of some additional complexity in your HTML). The final option is to create your own binding that builds your grid all in one go. In theory, building the entire grid in memory and inserting it into the DOM in one operation will be faster than modifying the DOM in discrete bits.
KoGrid is probably not what you want, but there are probably some hints and tips embedded in the source.
One piece of advice when using the mapping plugin is to map only the properties that you need. Mapping every property to an observable in a large data set can be a real performance killer.
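For example, the mapping plugin lets you whitelist which properties become observables. This is a minimal sketch assuming only "name" and "price" actually need to be editable (the property names and jsonRows are made up):

// Only turn the editable fields into observables; everything else is copied
// as a plain value, which is much cheaper for large data sets.
var mappingOptions = {
    observe: ["name", "price"]
};
var rows = ko.mapping.fromJS(jsonRows, mappingOptions);
ko.applyBindings({ rows: rows }, document.getElementById("grid"));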
Due to the reasons outlined in this question I am building my own client-side search engine rather than using the ydn-full-text library, which is based on fullproof. What it boils down to is that fullproof spawns "too freaking many records", on the order of 300,000, whilst (after stemming) there are only about 7,700 unique words. So my 'theory' is that fullproof is based on traditional assumptions which only apply to the server side:
Huge indices are fine
Processor power is expensive
(and the assumption of dealing with longer records, which is just not applicable to my case, as my records are on average only 24 words [1])
Whereas on the client side:
Huge indices take ages to populate
Processing power is still limited, but relatively cheaper than on the server side
Based on these assumptions I started off with an elementary inverted index (giving just 7,700 records, as IndexedDB is a document/NoSQL database). This inverted index has been stemmed using the Lancaster stemmer (the most aggressive of the two or three popular ones), and during a search I retrieve the index entry for each of the words, then assign a score based on the overlap of the different indices and on the similarity of the typed word vs. the original (Jaro-Winkler distance).
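In rough JavaScript, the scoring step I have in mind looks something like this (a sketch only; stem, getPostings and jaroWinkler stand in for the stemmer, the IndexedDB lookup and the string-distance function):

// For each typed word: look up its stemmed posting list, weight it by how
// closely the typed word matches the stored original (Jaro-Winkler), and let
// documents hit by several words accumulate a higher overlap score.
function scoreQuery(typedWords, stem, getPostings, jaroWinkler) {
    var scores = {};                                 // docId -> accumulated score
    typedWords.forEach(function (typed) {
        var entry = getPostings(stem(typed));        // e.g. { original: "running", docs: [17, 42, ...] }
        if (!entry) { return; }
        var similarity = jaroWinkler(typed, entry.original);   // 0..1
        entry.docs.forEach(function (docId) {
            scores[docId] = (scores[docId] || 0) + similarity;
        });
    });
    return scores;
}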
The problem with this approach:
A combination of "popular_word + popular_word" is extremely expensive
So, finally getting to my question: how can I alleviate the above problem with minimal growth of the index? I do understand that my approach will be CPU-intensive, but since a traditional full-text search index seems unusably big, this appears to be the only reasonable road to go down. (Pointing me to good resources or prior work is also appreciated.)
[1] This is a more or less artificial splitting of unstructured texts into small segments; however, this artificial splitting is standardized in the relevant field, so it has been used here as well. I have not studied the effect on the index size of keeping these 'snippets' together versus throwing huge chunks of text at fullproof. I assume that this would not make a huge difference, but if I am mistaken then please do point this out.
This is a great question, thanks for bringing some quality to the IndexedDB tag.
While this answer isn't quite production ready, I wanted to let you know that if you launch Chrome with --enable-experimental-web-platform-features then there should be a couple features available that might help you achieve what you're looking to do.
IDBObjectStore.openKeyCursor() - value-free cursors, in case you can get away with the stem only
IDBCursor.continuePrimaryKey(key, primaryKey) - allows you to skip over items with the same key
I was informed of these by an IDB developer on the Chrome team, and while I've yet to experiment with them myself, this seems like the perfect use case.
My thought is that if you approach this problem with two different indexes on the same column, you might be able to get that join-like behavior you're looking for without bloating your stores with gratuitous indexes.
While consecutive writes are pretty terrible in IDB, reads are great. Good performance across 7700 entries should be quite tenable.
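To give a feel for how those features fit together, here is a minimal sketch of walking a key-only cursor over an index (the "postings" store and "stem" index are made-up names, and as noted above these calls may still require the experimental flag):

// Iterate the "stem" index without ever loading the record values from disk;
// the key cursor only exposes keys and primary keys.
var tx = db.transaction("postings", "readonly");
var index = tx.objectStore("postings").index("stem");
var request = index.openKeyCursor(IDBKeyRange.only("run"));   // all postings stemmed to "run"
request.onsuccess = function (event) {
    var cursor = event.target.result;
    if (!cursor) { return; }                  // finished
    collect(cursor.primaryKey);               // assumed callback that gathers document ids
    cursor.continue();                        // or cursor.continuePrimaryKey(key, primaryKey) to skip ahead
};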
The project requirements are odd for this one, but I'm looking to get some insight...
I have a CSV file with about 12,000 rows of data and approximately 12-15 columns. I'm converting that to a JSON array and loading it via JSONP (it has to run client-side). It takes many seconds to do any kind of querying on the data set to return a smaller, filtered data set. I'm currently using JLINQ to do the filtering, but I'm essentially just looping through the array and returning a smaller set based on conditions.
Would WebDB or IndexedDB allow me to do this filtering significantly faster? Are there any tutorials/articles out there that you know of that tackle this particular type of issue?
http://square.github.com/crossfilter/ (no longer maintained; see https://github.com/crossfilter/crossfilter for a newer fork).
Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records...
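To give a rough idea of what using it looks like for this kind of filtering (a sketch; the rows variable and the price field are invented):

// Build a crossfilter over the parsed rows and filter on one column.
var cf = crossfilter(rows);                              // rows: array of plain objects
var byPrice = cf.dimension(function (d) { return d.price; });
byPrice.filterRange([10, 100]);                          // keep rows with 10 <= price < 100
var topFifty = byPrice.top(50);                          // the 50 highest-priced matching rows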
This reminds me of an article John Resig wrote about dictionary lookups (a real dictionary, not a programming construct).
http://ejohn.org/blog/dictionary-lookups-in-javascript/
He starts with server side implementations, and then works on a client side solution. It should give you some ideas for ways to improve what you are doing right now:
Caching
Local Storage
Memory Considerations
If you require loading an entire data object into memory before you apply some transform to it, I would leave IndexedDB and WebSQL out of the mix, as they typically both add complexity and reduce the performance of apps.
For this type of filtering, a library like Crossfilter will go a long way.
Where IndexedDB and WebSQL can come into play in terms of filtering is when you don't need to load, or don't want to load, an entire dataset into memory. These databases are best utilized for their ability to index rows (WebSQL) and attributes (IndexedDB).
With in-browser databases, you can stream data into a database one record at a time and then cursor through it, one record at a time. The benefit here for filtering is that this means you can leave your data on "disk" (a .leveldb in Chrome and a .sqlite database in Firefox) and filter out unnecessary records either as a pre-filter step or as the filter itself.
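A minimal sketch of that cursor-based filtering, assuming an already-open database with a "records" store (matches() and render() stand in for your filter predicate and display code):

// Stream through the store one record at a time, keeping only what matches,
// so the full dataset never has to sit in memory at once.
var results = [];
var store = db.transaction("records", "readonly").objectStore("records");
store.openCursor().onsuccess = function (event) {
    var cursor = event.target.result;
    if (!cursor) {
        render(results);                     // done: hand the filtered rows to the UI
        return;
    }
    if (matches(cursor.value)) {             // predicate driven by the filter UI
        results.push(cursor.value);
    }
    cursor.continue();
};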
I found some interesting posts about memory usage and CPU usage here on Stack Overflow; however, none of them took a direct approach to the apparently simple question:
As a generic strategy in a JavaScript app, is it better in terms of performance to use memory (storing data) or CPU (recalculating data each time)?
I am referring to JavaScript usage in common browser environments (FF, Chrome, IE > 8).
Does anybody have a more direct and documented answer to this?
--- EDIT ---
OK, I understand the question is very generic. I'll try to reduce the scope.
Reading your answers I realized that the real question is: "how do I understand the memory limit below which my JavaScript code still performs well?"
Environment: common browser environments (FF, Chrome, IE > 8)
The functions I use are not very complex math functions, but they can produce quite a large amount of data (300-400 KB), and I wanted to understand whether it is better to recalculate the results every time or just store them in variables.
Vaguely related - JS in browsers is extremely memory hungry when you start using large objects / arrays. If you think about binary data produced by canvas elements, or other rich media APIs, then clearly you do not want to be storing this data in traditional ways - disregarding performance issues, which are also important.
From the MDN article talking about JS Typed Arrays:
As web applications become more and more powerful, adding features such as audio and video manipulation, access to raw data using WebSockets, and so forth, it has become clear that there are times when it would be helpful for JavaScript code to be able to quickly and easily manipulate raw binary data.
Here's a JS Perf comparison of arrays, and another looking at canvas in particular, so you can get some direct examples on how they work. Hope this is useful.
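As a quick illustration of the difference (a sketch, not a benchmark): a typed array commits to a fixed, compact binary layout, while a plain Array leaves the representation entirely up to the engine:

// One million 32-bit floats in a Float32Array occupy exactly 4,000,000 bytes
// of contiguous ArrayBuffer memory; a plain Array of the same values gives the
// engine far more bookkeeping to do.
var plain = new Array(1000000);
var typed = new Float32Array(1000000);
for (var i = 0; i < typed.length; i++) {
    var v = Math.random();
    typed[i] = v;
    plain[i] = v;
}
// typed.buffer.byteLength === 4000000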
It's just another variation on the size/performance tradeoff. Storing values increases size, recalculating decreases performance.
In some cases, the calculation is complex, and the memory usage is small. This is particularly true for maths functions.
In other cases, the memory needed would be huge, and calculations are simple. This is particularly true when the output would be a large data structure, and you can calculate an element in the structure easily.
Other factors you will need to take into account are what resources are available. If you have very limited memory then you may have no choice, and if it is a background process then perhaps using lots of memory is not desirable. If the calculation needs to be done very often then you are more likely to store the value than if it's done once a month...
There are a lot of factors in the tradeoff, and so there is no "generic" answer, only a set of guidelines you can follow as each case arises.
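When a case does land on the "store it" side, a simple memoization wrapper is usually all that's needed (a sketch; keyFn is an assumed way of turning the arguments into a cache key):

// Cache results keyed by their inputs; recompute only on a cache miss.
function memoize(fn, keyFn) {
    var cache = {};
    return function () {
        var key = keyFn.apply(null, arguments);
        if (!(key in cache)) {
            cache[key] = fn.apply(null, arguments);   // pay the CPU cost once
        }
        return cache[key];                            // pay the memory cost from then on
    };
}
// Usage: var cachedBuild = memoize(buildLargeDataset, function (id) { return id; });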
What are some of the better solutions to handling large datasets (100K rows) on the client with JavaScript? In particular, if you have multi-column sort and search capabilities, how do you handle fetching (and pre-fetching) the data, client-side model binding (for display), and caching the data?
I would imagine a good solution would do some thoughtful work in the background. For instance, if the table initially displays N items, it might fetch 2N items, return the data to the user, and then fetch the next 2N items in the background (even if the user hasn't requested them). As the user makes search/sort changes, it would throw out the prefetched data (or maybe even keep the initial base case cached) and repeat the same approach, as sketched below.
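Something like this (a sketch; fetchRows(offset, count) is an assumed server call returning a promise, and error handling and cache invalidation on sort/filter changes are left out):

// Show the requested page immediately, then warm the cache with the next page
// so paging forward feels instant.
var cache = {};                                        // offset -> rows
function loadPage(offset, pageSize, render) {
    if (cache[offset]) {
        render(cache[offset]);                         // already prefetched
    } else {
        fetchRows(offset, pageSize).then(function (rows) {
            cache[offset] = rows;
            render(rows);
        });
    }
    var next = offset + pageSize;                      // background prefetch of the next page
    if (!cache[next]) {
        fetchRows(next, pageSize).then(function (rows) { cache[next] = rows; });
    }
}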
Can you share the best solutions you have seen?
Thanks
Use a jQuery table plugin like DataTables: http://datatables.net/
It supports server-side processing for sorting, filtering, and paging. And it includes pipelining support to prefetch the next x pages of records: http://www.datatables.net/examples/server_side/pipeline.html
Actually, the DataTables plugin works in 4 different ways:
1. With an HTML table, so you could send down a bunch of HTML and then have all the sorting, filtering, and paging work client-side.
2. With a JavaScript array, so you could send down a 2D array and let it create the table from there.
3. Ajax source - which is not really applicable to you.
4. Server-side, where you send data in JSON format to an empty table and let DataTables take it from there.
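For the server-side case, the setup is mostly a matter of pointing the plugin at a JSON endpoint (a sketch using current DataTables option names; the selector and URL are made up):

// Empty table in the HTML; data is fetched page-by-page from the server,
// which does the sorting, filtering, and paging.
$('#grid').DataTable({
    serverSide: true,
    processing: true,            // show a "processing" indicator during requests
    ajax: '/api/rows',           // assumed endpoint returning DataTables' expected JSON
    pageLength: 25
});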
SlickGrid does exactly what you're looking for. (Demo)
Using the AJAX data store, SlickGrid can handle millions of rows without flinching.
Since you tagged this with Ext JS, I'll point you to Ext.ux.LiveGrid if you haven't already seen it. The source is available, so you might have a look and see how they've addressed this issue. This is a popular and widely-used extension in the Ext JS world.
With that said, I personally think (virtually) loading that much data is useless as a user experience. Manually pulling a scrollbar around (jumping hundreds of records per pixel) is a far inferior experience to simply typing what you want. I'd much prefer some robust filtering/searching instead of presenting that much data to the user.
What if you went to Google and instead of a search box, it just loaded the entire internet into one long virtual list that you had to scroll through to find your site... :)
It depends on how the data will be used.
For a large dataset, where the browser's Find function was adequate, just returning a straight HTML table was effective. It takes a while to load, but the display is responsive on older, slower clients, and you never have to worry about it breaking.
When the client did the sorting and searching, and the entire table wasn't shown at once, I had the server send tab-delimited tables through XMLHttpRequest, parsed them in the browser with list = responseText.split('\n'), and updated the display with repeated calls to $('node').innerHTML = 'blah'. The JS engine can store long strings pretty efficiently. That ran a lot faster on the client than showing, hiding, and rearranging DOM nodes; creating and destroying DOM nodes on the fly turned out to be really slow. Splitting each line into fields on demand seems to work; I haven't experimented with that degree of freedom.
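A compressed sketch of that pattern (xhr, the row-count variables, and the row-id scheme are assumptions about the original setup):

// Parse the tab-delimited response and rewrite the visible rows' HTML directly,
// instead of creating and destroying DOM nodes for every update.
var lines = xhr.responseText.split('\n');                  // one record per line
for (var i = 0; i < visibleRowCount; i++) {
    var fields = lines[firstVisibleIndex + i].split('\t'); // split a line into columns on demand
    document.getElementById('row' + i).innerHTML =
        '<td>' + fields.join('</td><td>') + '</td>';
}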
I've never tried the obvious pre-fetch & background trick, because these other methods worked well enough.
Check out this comprehensive list of data grids and spreadsheets.
For filtering/sorting/pagination purposes you may be interested in the great Handsontable, or in DataTables as a free alternative.
If you simply need to display a huge list without any additional features, Clusterize.js should be sufficient.
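For example, with Clusterize the markup for off-screen rows never enters the DOM (a sketch based on Clusterize's documented options; the element ids and row-building code are made up):

// Clusterize keeps only the visible slice of rows mounted in the DOM.
var rowHtml = data.map(function (d) { return '<tr><td>' + d.name + '</td></tr>'; });
var clusterize = new Clusterize({
    rows: rowHtml,              // array of row-HTML strings
    scrollId: 'scrollArea',     // the scrollable container element's id
    contentId: 'contentArea'    // the tbody that receives the visible rows
});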
I have a big set of data that should be shown in a table.
I use JavaScript to fill the table instead of printing it in HTML.
Here is a sample of the data I use:
var aUsersData = [[1, "John Smith", "...."],[...],.......];
The problem is that Firefox warns me: "There is a heavy script running, should I continue or stop?"
I don't want my visitors to see the warning. How can I make performance better? jQuery? Pure script? Or another library you suggest?
You can use the method here to show a progress bar and not have the browser lock up on you:
http://www.kryogenix.org/days/2009/07/03/not-blocking-the-ui-in-tight-javascript-loops
I am using almost that method on this page:
http://www.bacontea.com/bb/
to get the browser not to hang and show feedback while loading.
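The core of that technique is to process the rows in small chunks and yield back to the browser between chunks (a sketch; buildRow and showProgress are assumed helpers):

// Insert chunkSize rows at a time, yielding with setTimeout(..., 0) so the page
// stays responsive and can repaint the progress indicator between chunks.
function populateTable(data, tbody, chunkSize) {
    var i = 0;
    (function nextChunk() {
        var end = Math.min(i + chunkSize, data.length);
        for (; i < end; i++) {
            tbody.insertAdjacentHTML('beforeend', buildRow(data[i]));
        }
        showProgress(i / data.length);           // e.g. update a progress bar
        if (i < data.length) {
            setTimeout(nextChunk, 0);            // give the browser a chance to breathe
        }
    })();
}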
jQuery doesn't usually make things faster, just easier. I use jQuery to populate tables, but they're pretty small (at most 2 columns by 40 rows). How much data are you populating into the table? This could be the limiting factor.
If you post some of your table-populating code we can see if it's possible to improve performance in any way.
My suspicion is that it won't make much difference either way, although sometimes adding a layer of abstraction like jQuery can impact performance. Alternatively, the jQuery team may have found an obscure, really fast way of doing something that you would have done in a more obvious, but slower, way if you weren't using it. It all depends.
Two suggestions that apply regardless:
First, since you're already relying on your users having JavaScript enabled, I'd use paging and possibly filtering as well. My suspicion is that it's building the table that takes the time. Users don't like to scroll through really long tables anyway; adding some paging and filtering features so the page only shows them the entries from the array they really want to see may help quite a lot.
Second, when building the table, the way you do it can have a big impact on performance. It almost seems a bit counter-intuitive, but with most browsers, building up a big string and then setting the innerHTML property of a container is usually faster than using the DOM createElement function over and over to create each row and cell. The fastest overall tends to be to push strings onto an array (rather than using repeated concatenation) and then join the array:
var markup, rowString, index, length;
markup = [];
markup.push("<table><tbody>");
for (index = 0, length = array.length; index < length; ++index) {
    rowString = /* ...code here to build a row as a string... */;
    markup.push(rowString);
}
markup.push("</tbody></table>");
document.getElementById('containerId').innerHTML = markup.join("");
(That's raw JavaScript/DOM; if you're already using jQuery and prefer it, the last line can be rewritten as $('#containerId').html(markup.join(""));)
This is faster than using createElement all over the place because it allows the browser to process the HTML by directly manipulating its internal structures, rather than responding to the DOM API methods layered on top of them. And the join thing helps because the strings are not constantly being reallocated, and the JavaScript engine can optimize the join operation at the end.
Naturally, if you can use a pre-built, pre-tested, and pre-optimised grid, you may want to use that instead -- as it can provide the paging, filters, and fast-building required if it's a good one.
You can try a JS templating engine like PURE (I'm the main contributor)
It is very fast on all browsers, and keeps the HTML totally clean and separated from the JS logic.
If you prefer the <%...%> type of syntax, there are plenty of other JS template engines available.