I'm comparing performance between a few frameworks (namely ReactJS and AngularJS) and a plain "vanilla HTML + JS" approach. While doing this I came across absolutely abysmal performance in Internet Explorer (I've tested IE9 and IE11, and both exhibit performance issues, though in different ways).
The original code is an HTML file but I've moved it to JSFiddle for the sake of sharing it here. If you'd like, I can post it as a GitHub Gist, instead.
Anyway, the goal is to render a table with 5,000 items in it (representing files and folders). On my test machine, IE11 takes around 30 seconds for the initial rendering, while Chrome/Safari/Firefox are in the 1.5–3 second range. If I look at just how long it takes to generate the HTML string (so not even DOM manipulation), that alone is about 15 seconds in IE11, plus another 15 for the actual rendering.
Any thoughts as to what I'm doing wrong? Make sure you change the sampleSize from 100 to 5,000 when you want to see the actual results:
var sampleSize = 100;
to
var sampleSize = 5000;
Note: here's what I've already done to improve performance:
Changed the per-row string concatenation to pushing onto an array and calling .join('') at the end, since repeated string concatenation is a known performance issue in IE
Only a single DOM access with $(tblBody).html(nodes.join('')); rather than appending one row at a time
The above two enhancements brought the initial rendering from 36s down to 30s.
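For reference, here's roughly what those two changes look like together; this is just a minimal sketch assuming jQuery, with a placeholder items array and field names rather than the actual fiddle code:

// build every row as a string pushed onto an array (placeholder fields)
var nodes = [];
for (var i = 0; i < items.length; i++) {
    nodes.push('<tr><td>' + items[i].name + '</td><td>' + items[i].size + '</td></tr>');
}
// single DOM write at the end instead of appending row by row
$(tblBody).html(nodes.join(''));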
Note 2: the code isn't that f*ed up, since it's still faster than either my ReactJS- or AngularJS-based solutions. So the main question is: what in the world is IE doing?
Due to the reasons outlined in this question I am building my own client-side search engine rather than using the ydn-full-text library, which is based on fullproof. What it boils down to is that fullproof spawns "too freaking many records", on the order of 300,000 records, whilst (after stemming) there are only about 7,700 unique words. So my 'theory' is that fullproof is based on traditional assumptions which only apply to the server side:
Huge indices are fine
Processor power is expensive
(and the assumption of dealing with longer records, which is just not applicable to my case as my records are on average only 24 words1)
Whereas on the client side:
Huge indices take ages to populate
Processing power is still limited, but relatively cheaper than on the server side
Based on these assumptions I started off with an elementary inverted index (giving just 7,700 records, as IndexedDB is a document/NoSQL database). This inverted index has been stemmed using the Lancaster stemmer (the most aggressive of the two or three popular ones), and during a search I would retrieve the index entry for each of the words, then assign a score based on the overlap of the different indices and on the similarity of the typed word vs the original (Jaro-Winkler distance).
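To make that concrete, here is a very rough sketch of the lookup, not my actual code; stem(), getIndexEntry() and similarity() are hypothetical stand-ins for the Lancaster stemmer, the IndexedDB read and the Jaro-Winkler function:

function search(query, done) {
    var terms = query.split(/\s+/);
    var scores = {};                    // docId -> accumulated score
    var pending = terms.length;

    terms.forEach(function (word) {
        getIndexEntry(stem(word), function (entry) {   // one inverted-index record per stem
            if (entry) {
                entry.docIds.forEach(function (docId) {
                    // overlap across terms, weighted by how close the typed
                    // word is to the indexed one
                    scores[docId] = (scores[docId] || 0) + similarity(word, entry.word);
                });
            }
            if (--pending === 0) done(scores);          // all index reads finished
        });
    });
}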
Problem of this approach:
Combination of "popular_word + popular_word" is extremely expensive
So, finally getting to my question: how can I alleviate the above problem with a minimal growth of the index? I do understand that my approach will be CPU intensive, but as a traditional full-text search index seems unusably big, this seems to be the only reasonable road to go down. (Pointing me to good resources or prior work is also appreciated.)
1 This is a more or less artificial splitting of unstructured texts into small segments, however this artificial splitting is standardized in the relevant field so has been used here as well. I have not studied the effect on the index size of keeping these 'snippets' together and throwing huge chunks of texts at fullproof. I assume that this would not make a huge difference, but if I am mistaken then please do point this out.
This is a great question, thanks for bringing some quality to the IndexedDB tag.
While this answer isn't quite production ready, I wanted to let you know that if you launch Chrome with --enable-experimental-web-platform-features then there should be a couple features available that might help you achieve what you're looking to do.
IDBObjectStore.openKeyCursor() - value-free cursors, in case you can get away with the stem only
IDBCursor.continuePrimaryKey(key, primaryKey) - allows you to skip over items with the same key
I was informed of these via an IDB developer on the Chrome team and while I've yet to experiment with them myself this seems like the perfect use case.
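I haven't run this myself, but a sketch of how those two calls might look; the store and index names ("postings", "byStem") and handleMatch are made up for illustration, not your schema:

// value-free scan over a hypothetical "byStem" index
var tx = db.transaction('postings', 'readonly');
var idx = tx.objectStore('postings').index('byStem');

idx.openKeyCursor(IDBKeyRange.only('comput')).onsuccess = function (e) {
    var cursor = e.target.result;
    if (!cursor) return;                 // scan finished
    handleMatch(cursor.primaryKey);      // hypothetical: record the matching doc id
    cursor.continue();
    // cursor.continuePrimaryKey(key, primaryKey) would instead let you resume
    // at an exact (key, primaryKey) position, e.g. while intersecting two indexes
};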
My thought is that if you approach this problem with two different indexes on the same column, you might be able to get that join-like behavior you're looking for without bloating your stores with gratuitous indexes.
While consecutive writes are pretty terrible in IDB, reads are great. Good performance across 7,700 entries should be quite attainable.
I'm building a mobile Boggle-type web app with node.js. I'm trying to find a more efficient way to load/build a massive dictionary (180,000+ words). I currently have it working, but the load time is slightly long: users have to wait about 15 seconds for the entire thing to build, and some users time out before it has finished loading. I was wondering if anyone has any tips to improve the speed.
The way I'm currently doing this (which is probably completely inefficient):
I broke down the list into 26 arrays, one for each letter, and stuck each array in its own JavaScript file.
When the app loads, it runs a recursive function which gets the next JS file and loads in its array, overwriting the previous one. It then loops through the entire array and loads each new word into my Trie data structure.
The files with the arrays in them are around 2 MB combined. Once built, the data structure itself clocks in at around 12 MB, which isn't so bad on a desktop computer, but does weigh down a couple of my users' smartphones.
This needs to be built on the client side to allow instant lookups. The way I'm doing it currently works but I know there has to be a better way.
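For reference, a minimal sketch of the kind of Trie insert described above; the real structure in the app may well differ:

function TrieNode() {
    this.children = {};   // letter -> child node
    this.isWord = false;  // true if a complete word ends here
}

function insertWord(root, word) {
    var node = root;
    for (var i = 0; i < word.length; i++) {
        var ch = word.charAt(i);
        if (!node.children[ch]) node.children[ch] = new TrieNode();
        node = node.children[ch];
    }
    node.isWord = true;   // mark the end of a complete word
}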
The other tactic is to convert your recursive code into non-recursive code that uses an explicit stack, saving only the objects you actually need.
Have you tried profiling your code?
To answer the question about the fastest loading time: are you doing it in roughly this fashion? (Without more code, we can't possibly know.)
function LoadFiles(fileArray) {
    var file = fileArray.shift(); // remove and return the first file's URL
    $.ajax(file).done(function (data) {
        /* yes, my object is a little funky, I'm focused on writing pseudocode */
        wordLibraryAdd(data); // add this file's words to the library
        if (fileArray.length) // on zero length, stop processing
            setTimeout(function () { LoadFiles(fileArray); }, 50); // a 50 ms buffer between each load isn't bad
    });
}
I've actually obtained a job testing a website that is struggling with its performance. In detail: I should pick out different parts of the document and check out their waiting -> load -> finished states. Since I'm familiar with Firebug, I've tested many sites as a whole, but now I need to know when the rendering of a specific DIV starts, when it is finished, and how long it waited beforehand. The goal is to find out which part of the website took how long until it was painted.
I doubt you'll be able to measure individual parts of a page they way you want. I would approach this by removing parts of the page, measuring the subsetted page, and inferring from those measurements which parts are slowest.
Keep in mind that this sort of logic may not be correct. For example, you may have a page with two parts. You may measure the two parts independently by creating subsetted pages. The times of the two parts added together will not equal the time for the total. And one part seeming slower than the other doesn't mean that, when combined, the "slow" part is responsible for the bulk of the time. Browsers are very complicated machines, and they don't always operate the way you imagine.
AFAIK, the speed of rendering a div is not something you should worry about. If there is some server-side language involved, then I would suggest assigning the current time to a variable before a portion starts and comparing it to the time right after the portion ends. You can subtract the two to get the time it took to work that portion out.
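A sketch of that timing idea, written here in JS just for illustration; renderPortion is a hypothetical stand-in for whatever block you're measuring:

var start = new Date().getTime();
renderPortion();                              // hypothetical: the part being measured
var elapsed = new Date().getTime() - start;
console.log('portion took ' + elapsed + ' ms');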
If there is JavaScript involved, then I would suggest Chrome dev tools' Timeline panel. It shows everything, from CSS recalculation and painting of the style/div to AJAX and (if you're using them) DB queries.
As you are familiar with Firebug, you can use the HttpWatch tool to record the exact request and response times of all the specific HTTP requests made by your browser.
So when the rendering of a specific DIV starts, this tool will capture the request and response times for it.
http://www.httpwatch.com/
All the best!
I have a simple piece of data that I'm storing on a server, as a plain string. It is kind of ridiculous, but it looks like this:
name|date|grade|description|name|date|grade|description|repeat for a long time
this string can be up to 1.4 MB in size. The idea is that it's a bunch of student records, just strung together with a simple pipe delimiter. It's a very poor serialization method.
Once this massive string is pushed to the client, it is split along the pipes into student records again, using javascript.
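Roughly, the client-side split looks like this; just a sketch, with field names taken from the example format above and payload standing in for the downloaded string:

var fields = payload.split('|');
var records = [];
for (var i = 0; i + 3 < fields.length; i += 4) {   // four fields per record
    records.push({
        name: fields[i],
        date: fields[i + 1],
        grade: fields[i + 2],
        description: fields[i + 3]
    });
}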
I've been timing how long it takes to create and split these strings on the client side. The times are actually quite good: the slowest run I've seen on a few different machines is 0.2 seconds for 10,000 'student records', which has a final string size of ~1.4 MB.
I realize this is quite bizarre, just wondering if there are any inherent problems with creating and splitting such large strings using javascript? I don't know how different browsers implement their javascript engines. I've tried this on the 'major' browsers, but don't know how this would perform on earlier versions of each.
Yeah, I'm looking for any comments on this; it's more for fun than anything else!
Thanks
String splitting of 1.4 MB of data is not a problem for decent machines; instead you should worry about the internet connection speed of your users. I've tried to do spell checking with an 800 KB dictionary (about half of your data), and the main issue was loading time.
But it looks like your student records could be put in a database, and you might not need to load everything up front. So how about paginating the records shown to the user, or using AJAX requests to search for specific names?
If it's a really large string, it may pay to repeatedly slice the string with string.slice(from, to) so that you only process a smaller subset at a time, appending the individual items to the end of the output with list.push() or something similar.
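Something along these lines, as a sketch; the chunk size is arbitrary, and real code would also have to carry over the tail of a chunk that ends mid-record:

function processInChunks(bigString, chunkSize, handleChunk, done) {
    var pos = 0;
    (function next() {
        if (pos >= bigString.length) return done();
        handleChunk(bigString.slice(pos, pos + chunkSize));
        pos += chunkSize;
        setTimeout(next, 0);             // yield to the browser between chunks
    })();
}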
String split methods are probably the most efficient way of doing this, though, even in IE. Processing individual characters using string.charAt(x) is extremely slow and will often trigger the browser's slow-script warning as it stalls the page. Using string split methods would certainly be much faster than splitting using regular expressions.
It may also be possible to encode the data as a JSON array; some newer browsers such as IE8/WebKit/FF3.5 have fast JSON parsing built in via JSON.parse(data). Using eval(JSON) may overflow the browser if there's enough data, though, so it's probably a bad idea. It may pay to compare the two for performance.
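For example, if the server sent a JSON array instead of the pipe string, the parse step would just be (with payload standing in for the downloaded text):

var records = JSON.parse(payload);   // native and fast in IE8+, WebKit, FF3.5+
// older browsers would need a shim such as Crockford's json2.js; the eval()
// route mentioned above is best avoided for payloads this large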
A much better approach in a lot of cases is to use AJAX and only load some of the data at once from the server, which would also save download time.
Besides S. Mark's excellent comments about local vs. transfer speed and the tip to re-encode using AJAX, I suggest a (long-term) move away from JavaScript in the browser (assuming that's where it runs) to either a non-browser implementation of JS (or possibly another language).
Browser-based JS seems a weak link in a data-transfer chain and nothing I would want to run unmonitored, since browsers are upgraded from time to time and breaking your JS transfer might be an unanticipated side effect!
I have a form with a textarea that can contain large amounts of content (say, articles for a blog) edited using one of a number of third party rich text editors. I'm trying to implement something like an autosave feature, which should submit the content through ajax if it's changed. However, I have to work around the fact that some of the editors I have as options don't support an "isdirty" flag, or an "onchange" event which I can use to see if the content has changed since the last save.
So, as a workaround, what I'd like to do is keep a copy of the content in a variable (let's call it lastSaveContent), as of the last save, and compare it with the current text when the "autosave" function fires (on a timer) to see if it's different. However, I'm worried about how much memory that could take up with very large documents.
Would it be more efficient to store some sort of hash in the lastSaveContent variable, instead of the entire string, and then compare the hash values? If so, can you recommend a good javascript library/jquery plugin that implements an appropriate hash for this requirement?
In short, you're better off just storing and comparing the two strings.
Computing a proper hash is not cheap. For example, check out the pseudo code or an actual JavaScript implementation for computing the MD5 hash of a string. Furthermore, all proper hash implementations will require enumerating the characters of the string anyway.
Also, in the context of modern computing, a string has to be really, really long before comparing it against another string is slow. What you're doing here is effectively a micro-optimization. Memory won't be an issue, nor will the CPU cycles needed to compare the two strings.
As with all cases of optimizing: check that this is actually a problem before you solve it. In a quick test I did, computing and comparing 2 MD5 sums took 382ms. Comparing the two strings directly took 0ms. This was using a string that was 10000 words long. See http://jsfiddle.net/DjM8S.
If you really see this as an issue, I would also strongly consider using a poor man's comparison and just comparing the lengths of the two strings to see whether anything has changed, rather than doing an actual string comparison.
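A sketch of what that check could look like on the autosave timer, combining the cheap length short-circuit with the full comparison; getEditorContent and saveDraft are hypothetical stand-ins for the editor API and the ajax call:

var lastSaveContent = '';

setInterval(function () {
    var current = getEditorContent();
    // cheap length check first, full comparison only when the lengths match
    if (current.length !== lastSaveContent.length || current !== lastSaveContent) {
        saveDraft(current);              // ajax submit
        lastSaveContent = current;
    }
}, 30000);                               // autosave every 30 seconds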
An MD5 hash is often used to verify the integrity of a file or document; it should work for your purposes. Here's a good article on generating an MD5 hash in Javascript.
I made a JSperf rev that might be useful here for performance measuring. Please add different revisions and different types of checks to the ones I made!
http://jsperf.com/long-string-comparison/2
I found two major results:
When the strings are identical, performance is murdered: from ~9,000,000 ops/sec down to ~250 ops/sec (Chrome)
The 64-bit version of IE9 is much slower on my PC; results from the same tests:
+------------+------------+
| IE9 64bit | IE9 32bit |
+------------+------------+
| 4,270,414 | 8,667,472 |
| 2,270,234 | 8,682,461 |
+------------+------------+
Sadly, jsperf logged both results as simply "IE 9".
Even a cursory look at JS MD5 performance tells me that it is very, very slow (at least for large strings; see http://jsperf.com/md5-shootout/18, which peaks at 70 ops/sec). I would want to go as far as trying to AJAX the hash calculation or the comparison to the backend, but I don't have time to test, sorry!