WebGL Buffer Size Limit - javascript

I just ran into a small issue with WebGL today, while doing a project on point-set visualisation. I understand there is an index limit in drawElements, due to the indices being 16-bit integers. According to this post, however, there isn't one for drawArrays, which I confirmed by successfully sending some 400k points to the GPU.
The thing is, once 400k worked, I wanted to explore the possibilities of WebGL, so I tried a model with 3M vertices. Bang! Nothing gets displayed, and the WebGL inspector shows no drawArrays call.
Are you aware of some kind of limit for direct drawArrays calls?

It looks like the same question is already discussed/answered here: Is there a limit of vertices in WebGL?. In that thread, the post by brainjam says that he discovered that drawArrays was not limited to 65k.

It sounds like you've got an outdated driver. The definition of drawArrays() is:
void drawArrays(enum mode, int first, long count)
The count parameter is a long integer, which would mean at least 2^32 elements on 32-bit architectures and 2^64 on 64-bit ones.
Remember that, contrary to what one might presume, both Chrome/Chromium and Firefox use Direct3D as the underlying technology for WebGL on Windows.
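For completeness, here is a minimal sketch of a large point-set draw with drawArrays; the gl context, the compiled program with an a_position attribute, and pointCount are assumptions standing in for whatever the project actually uses:

const positions = new Float32Array(pointCount * 2); // x/y pairs, e.g. 3M points
// ... fill positions with point coordinates ...
const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);
const loc = gl.getAttribLocation(program, 'a_position');
gl.enableVertexAttribArray(loc);
gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);
// drawArrays takes a plain vertex count, so the 16-bit index limit of drawElements does not apply here.
gl.drawArrays(gl.POINTS, 0, pointCount);

If nothing shows up at 3M vertices, checking gl.getError() right after the call is a cheap way to rule out an out-of-memory or invalid-operation error.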

Related

Why does WebAssembly.Memory take `initial` and `maximum` in units of pages rather than bytes?

I will almost always be computing my memory needs in bytes. I didn't know how large a "page" is and had to look it up (MDN says it's 64 KB). It's not obvious that the size of a page should even be a fixed, documentable, platform-independent constant. I'm unable to find a reason, or even an attestation, for that page size in the spec link MDN gives.
I can only think of really bad reasons for it to be this way and I want to figure out whether it really is that bad.
The page size for wasm is documented in the spec: https://webassembly.github.io/spec/core/exec/runtime.html#memory-instances
The value is 64 KiB (65,536 bytes) per page.
WebAssembly itself is designed to be a compiler target rather than hand-written, so normally the toolchain sets this value rather than a human. As it happens, if you use clang or Emscripten, the memory size specified on the command line is actually in bytes rather than pages: https://lld.llvm.org/WebAssembly.html#cmdoption-initial-memory
Wasm memory size can only be chosen in steps of pages (64 KiB), because that generally makes memory bounds checks using hardware virtual memory techniques more feasible.
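To illustrate the unit conversion, a small sketch of turning a byte budget into page counts for WebAssembly.Memory (the byte figures here are made-up examples):

const PAGE_SIZE = 64 * 1024; // 65,536 bytes per wasm page
const initialBytes = 1 * 1024 * 1024;  // want 1 MiB up front
const maximumBytes = 16 * 1024 * 1024; // allow growth up to 16 MiB
const memory = new WebAssembly.Memory({
  initial: Math.ceil(initialBytes / PAGE_SIZE),  // 16 pages
  maximum: Math.ceil(maximumBytes / PAGE_SIZE),  // 256 pages
});
console.log(memory.buffer.byteLength); // 1048576, i.e. 16 pages
memory.grow(1); // growth also happens in whole pages; returns the previous size in pages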

Compare sound between source and microphone in JavaScript

I'm working with audio, but I'm a newbie in this area. I would like to match sound from the microphone against my source audio (just one sound), like the Coke ad that uses Shazam. Example video (0:45). However, I want to do it on a website with JavaScript. Thank you.
Building something similar to the backend of Shazam is not an easy task. We need to:
Acquire audio from the user's microphone (easy)
Compare it to the source and identify a match (hmm... how do... )
How can we perform each step?
Acquire Audio
This one is a definite no biggie. We can use the Web Audio API for this. You can google around for good tutorials on how to use it. This link provides some good fundamental knowledge that you may want to understand when using it.
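As a rough sketch of this step (assuming a browser where AudioContext and getUserMedia are available, and a user who grants the permission prompt):

async function captureMicrophone() {
  const audioContext = new AudioContext();
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 4096; // larger FFT size = finer frequency resolution
  source.connect(analyser);
  return analyser;
}
// Later, on each tick, read the current frequency-domain data:
// const frequencyData = new Float32Array(analyser.frequencyBinCount);
// analyser.getFloatFrequencyData(frequencyData); // dB magnitude per frequency bin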
Compare Samples to Audio Source File
Clearly this piece is going to be an algorithmic challenge in a project like this. There are probably various ways to approach this part, and not enough time to describe them all here, but one feasible technique (which happens to be what Shazam actually uses), and which is also described in greater detail here, is to create and compare against a sort of fingerprint for smaller pieces of your source material, which you can generate using FFT analysis.
This works as follows:
Look at small sections of a sample, no more than a few seconds long, at a time (note that this is done using a sliding window, not discrete partitioning)
Calculate the Fourier Transform of the audio selection. This decomposes our selection into many signals of different frequencies. We can analyze the frequency domain of our sample to draw useful conclusions about what we are hearing.
Create a fingerprint for the selection by identifying critical values in the FFT, such as peak frequencies or magnitudes
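A hypothetical sketch of that fingerprinting step, assuming frequencyData is a Float32Array filled by AnalyserNode.getFloatFrequencyData() for one window; the peak count and the "strongest bins win" rule are simplifications you would tune:

function fingerprintWindow(frequencyData, numPeaks) {
  // Pair each FFT bin with its magnitude and keep the strongest bins as the fingerprint.
  const bins = Array.from(frequencyData, (magnitude, bin) => ({ bin, magnitude }));
  bins.sort((a, b) => b.magnitude - a.magnitude);
  return bins
    .slice(0, numPeaks)
    .map((peak) => peak.bin)
    .sort((a, b) => a - b); // e.g. [12, 87, 190, 310]
}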
If you want to be able to match multiple samples like Shazam does, you should maintain a dictionary of fingerprints, but since you only need to match one source material, you can just keep them in a list. Since your keys are going to be arrays of numerical values, another data structure that could quickly query your dataset is a k-d tree. I don't think Shazam uses one, but the more I think about it, the closer their system seems to an n-dimensional nearest neighbor search, provided you can keep the number of critical points consistent. For now though, just keep it simple and use a list.
Now we have a database of fingerprints primed and ready for use. We need to compare them against our microphone input now.
Sample our microphone input in small segments with a sliding window, the same way we did our sources.
For each segment, calculate the fingerprint, and see if it matches close to any from storage. You can look for a partial match here and there are lots of tweaks and optimizations you could try.
This is going to be a noisy and inaccurate signal, so don't expect every segment to get a match. If lots of them match (you will have to figure out what "lots" means experimentally), then assume you have one. If there are relatively few matches, then figure you don't.
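Putting the last two steps together, a hedged sketch of that matching loop, reusing the hypothetical fingerprintWindow() output from above; the tolerance and the 75% threshold are placeholders to determine experimentally:

function fingerprintsMatch(a, b, tolerance) {
  // Count peaks of `a` that land within `tolerance` bins of some peak in `b`.
  const hits = a.filter((bin) => b.some((other) => Math.abs(bin - other) <= tolerance));
  return hits.length >= Math.ceil(a.length * 0.75);
}

function countMatches(micFingerprints, sourceFingerprints, tolerance) {
  let matches = 0;
  for (const mic of micFingerprints) {
    if (sourceFingerprints.some((src) => fingerprintsMatch(mic, src, tolerance))) {
      matches += 1;
    }
  }
  return matches; // compare this against your experimentally chosen "lots" threshold
}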
Conclusions
This is not going to be a super easy project to do well. The amount of tuning and optimization required will prove to be a challenge. Some microphones are inaccurate, and most environments have other sounds, and all of that will mess with your results, but it's also probably not as bad as it sounds. I mean, this is a system that from the outside seems unapproachably complex, and we just broke it down into some relatively simple steps.
Also, as a final note: you mention JavaScript several times in your post, and you may notice that I mentioned it zero times up until now in my answer. That's because the language of implementation is not an important factor. This system is complex enough that the hardest pieces of the puzzle are going to be the ones you solve on paper, so you don't need to think in terms of "how can I do X in Y"; just figure out an algorithm for X, and the Y will come naturally.

Is there a way to find the maximum size of localStorage for Chrome Extension

I'm writing a Google Chrome extension. I know that Chrome currently sets a limit of 5 MB on the maximum allowed size of localStorage, but I'm curious whether there's any way to get this from Chrome itself, anything like a JS constant or global variable?
PS. I just hate to hard-code this value in case they change it in the future.
We tend to assume that one character equals one byte, but this is not a safe assumption. Strings in JavaScript are UTF-16, so each character requires two bytes of memory. This means that while many browsers have a 5 MB limit, you can only store 2.5 million characters.
It is quite difficult to predict how much is left for the domain, even if the quota is set to 5 MB.
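To see this in practice, here is a rough sketch that estimates the bytes already used, counting two bytes per UTF-16 code unit for both keys and values (it ignores any per-entry overhead the browser may add):

function estimateLocalStorageBytes() {
  let bytes = 0;
  for (let i = 0; i < localStorage.length; i++) {
    const key = localStorage.key(i);
    const value = localStorage.getItem(key) || '';
    bytes += (key.length + value.length) * 2; // UTF-16: 2 bytes per code unit
  }
  return bytes;
}
console.log(estimateLocalStorageBytes(), 'bytes used (approximate)');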
After reading through the HTML5 Storage documentation, it is also quite possible to look into the unlimited storage option.
https://developers.google.com/chrome/whitepapers/storage#unlimited
In this documentation they suggest a manifest entry like:
"storage": {
"managed_schema": "schema.json"
},
I have not tested this myself, but it is worth giving it a try. If it works, then please let me know as well.
Update (2021):
https://developer.chrome.com/docs/extensions/reference/storage/#property-local
Use chrome.storage.local.QUOTA_BYTES.
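For example, a small sketch using the chrome.storage API (this requires the "storage" permission in the manifest):

// QUOTA_BYTES is the total quota for chrome.storage.local (5,242,880 bytes by default).
console.log(chrome.storage.local.QUOTA_BYTES);
// getBytesInUse(null, ...) reports how much of that quota is currently consumed.
chrome.storage.local.getBytesInUse(null, (bytesInUse) => {
  console.log(bytesInUse + ' of ' + chrome.storage.local.QUOTA_BYTES + ' bytes used');
});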

Client side search engine optimization

Due to the reasons outlined in this question, I am building my own client-side search engine rather than using the ydn-full-text library, which is based on fullproof. What it boils down to is that fullproof spawns "too freaking many records", on the order of 300,000 records, whilst (after stemming) there are only about 7,700 unique words. So my 'theory' is that fullproof is based on traditional assumptions which only apply to the server side:
Huge indices are fine
Processor power is expensive
(and the assumption of dealing with longer records, which is just not applicable to my case, as my records are on average only 24 words1)
Whereas on the client side:
Huge indices take ages to populate
Processing power is still limited, but relatively cheaper than on the server side
Based on these assumptions I started off with an elementary inverted index (giving just 7,700 records, as IndexedDB is a document/NoSQL database). This inverted index has been stemmed using the Lancaster stemmer (the most aggressive of the two or three popular ones), and during a search I retrieve the index entry for each of the words, then assign a score based on the overlap of the different entries and on the similarity of the typed word vs. the original (Jaro-Winkler distance).
Problem of this approach:
Combination of "popular_word + popular_word" is extremely expensive
So, finally getting to my question: how can I alleviate the above problem with minimal growth of the index? I do understand that my approach will be CPU intensive, but as a traditional full-text search index seems unusably big, this appears to be the only reasonable road to go down. (Pointers to good resources or relevant work are also appreciated.)
1 This is a more or less artificial splitting of unstructured texts into small segments; however, this artificial splitting is standardized in the relevant field, so it has been used here as well. I have not studied the effect on index size of keeping these 'snippets' together and throwing huge chunks of text at fullproof. I assume that this would not make a huge difference, but if I am mistaken then please do point this out.
This is a great question, thanks for bringing some quality to the IndexedDB tag.
While this answer isn't quite production-ready, I wanted to let you know that if you launch Chrome with --enable-experimental-web-platform-features, there should be a couple of features available that might help you achieve what you're looking to do.
IDBObjectStore.openKeyCursor() - value-free cursors, in case you can get away with the stem only
IDBCursor.continuePrimaryKey(key, primaryKey) - allows you to skip over items with the same key
I was informed of these via an IDB developer on the Chrome team and while I've yet to experiment with them myself this seems like the perfect use case.
My thought is that if you approach this problem with two different indexes on the same column, you might be able to get that join-like behavior you're looking for without bloating your stores with gratuitous indexes.
While consecutive writes are pretty terrible in IDB, reads are great. Good performance across 7700 entries should be quite tenable.
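As a hedged sketch of what the value-free cursor could look like once those features are available (the 'postings' store and 'stem' index names are made up for illustration, and error handling is omitted):

// Collect the primary keys of all snippets containing a given stem,
// without deserializing the record values themselves.
function keysForStem(db, stem, callback) {
  const tx = db.transaction('postings', 'readonly');
  const index = tx.objectStore('postings').index('stem');
  const keys = [];
  const request = index.openKeyCursor(IDBKeyRange.only(stem)); // keys only, no values
  request.onsuccess = () => {
    const cursor = request.result;
    if (!cursor) { callback(keys); return; }
    keys.push(cursor.primaryKey);
    cursor.continue();
  };
}
// continuePrimaryKey(key, primaryKey) could then let a second cursor jump straight
// to a candidate primary key, giving the join-like behavior across two indexes.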

Information heap size

What information can I obtain from the performance.memory object in Chrome?
What do these numbers mean? (Are they in KB or characters?)
What can I learn from these numbers?
Example values of performance.memory
MemoryInfo {
jsHeapSizeLimit: 793000000,
usedJSHeapSize: 10000000,
totalJSHeapSize: 31200000
}
What information can I obtain from the performance.memory object in Chrome?
The property names should be pretty descriptive.
What do these numbers mean? (Are they in KB or characters?)
The docs state:
The values are quantized as to not expose private information to attackers.
See the WebKit Patch for how the quantized values are exposed. The tests in particular help explain how it works.
What can I learn from these numbers?
You can identify problems with memory management. See http://www.html5rocks.com/en/tutorials/memory/effectivemanagement/ for how the performance.memory API was used in gmail.
The related API documentation does not say, but judging by the numbers you shared and what I see on my machine, my read is that the values are in bytes.
A quick review of the code Bergi linked to, regarding the values being quantized, seems to support this, e.g. float sizeOfNextBucket = 10000000.0; // First bucket size is roughly 10M.
The quantized MemoryInfo properties are mostly useful for monitoring, rather than for determining the precise impact of operations on memory. A comment in the aforementioned code explains this well, I think:
// We quantize the sizes to make it more difficult for an attacker to see precise
// impact of operations on memory. The values are used for performance tuning,
// and hence don't need to be as refined when the value is large, so we threshold
// at a list of exponentially separated buckets.
Basically the values get less precise as they get bigger but are still sufficiently precise for monitoring memory usage.
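For monitoring in practice, a small sketch (performance.memory is non-standard and Chrome-only, so the property may simply be absent elsewhere):

if (performance.memory) {
  const { usedJSHeapSize, totalJSHeapSize, jsHeapSizeLimit } = performance.memory;
  const toMiB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);
  console.log('heap: ' + toMiB(usedJSHeapSize) + ' MiB used of ' + toMiB(totalJSHeapSize) + ' MiB allocated');
  console.log('limit: ' + toMiB(jsHeapSizeLimit) + ' MiB');
}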
