Finding local maxima - javascript

I'd like to find the local maxima for a set of data.
I have a log of flight data from a sounding rocket payload, and I'd like to find the approximate times of staging based on accelerometer data. I should be able to get the times I want from a visual inspection of the data on a graph, but how would I go about finding the points programmatically in JavaScript?

If it's only necessary to know approximate times, a heuristic is probably good enough: run the data through a smoothing filter and then look for jumps.
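As a rough illustration of that heuristic (the function names, window size, and threshold below are made up and would need tuning against your own data): smooth the samples with a moving average, then flag indices that are larger than both neighbours and above a threshold.

// Smooth the accelerometer samples with a simple moving average.
function movingAverage(values, window = 5) {
  return values.map((_, i) => {
    const start = Math.max(0, i - Math.floor(window / 2));
    const end = Math.min(values.length, start + window);
    const slice = values.slice(start, end);
    return slice.reduce((a, b) => a + b, 0) / slice.length;
  });
}

// Report indices where a sample exceeds both neighbours and a threshold.
function localMaxima(values, threshold = 0) {
  const peaks = [];
  for (let i = 1; i < values.length - 1; i++) {
    if (values[i] > values[i - 1] && values[i] > values[i + 1] && values[i] > threshold) {
      peaks.push(i);
    }
  }
  return peaks;
}

// Usage: const peaks = localMaxima(movingAverage(accel), someThreshold);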
If it's important to find the staging times accurately, my advice is to construct a piecewise continuous model, fit that to the data, and then derive the staging times from it. For example, a one-stage model might be: for 0 < t < t_1, acceleration is f(t) - g; for t > t_1, acceleration is -g, where g is gravitational acceleration. I don't know what f(t) might be here, but presumably it's well known in rocket engineering. The difficulty in fitting such a model comes from the cut-off point t_1, which makes the model nondifferentiable, but it's not really too difficult: in a relatively simple case like this, you can loop over the possible cut-off points, compute the least-squares solution for the remaining parameters at each one, and then take the cut-off point (or points) with the least error.
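Here is a minimal sketch of that loop, under the simplifying assumption (purely for illustration) that f(t) is roughly constant during the burn, so the only free parameter before the cut-off is a constant acceleration; the real f(t) would replace the constant fit.

// Loop over candidate cut-off indices; for each, the least-squares fit of a
// constant to the pre-cut-off samples is just their mean, and the post-cut-off
// model is fixed at -g (free fall). Keep the cut-off with the smallest error.
function fitCutoff(accel, g = 9.81) {
  let best = { index: 1, error: Infinity, burnAccel: 0 };
  for (let cut = 1; cut < accel.length; cut++) {
    const before = accel.slice(0, cut);
    const after = accel.slice(cut);
    const burnAccel = before.reduce((s, x) => s + x, 0) / before.length;
    const errBefore = before.reduce((s, x) => s + (x - burnAccel) ** 2, 0);
    const errAfter = after.reduce((s, x) => s + (x + g) ** 2, 0);
    const error = errBefore + errAfter;
    if (error < best.error) best = { index: cut, error, burnAccel };
  }
  return best; // best.index is the sample where the stage cuts off
}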
See Seber and Wild, "Nonlinear Regression"; there is a chapter about such models.

Related

Downsampling Time Series: Average vs Largest-Triangle-Three-Buckets

I'm programming line charts with Flot Charts to display time series.
To reduce the number of points to display, I downsample by averaging all data points that fall within the same hour.
Recently, I however discovered the Largest-Triangle-Three-Buckets algorithm:
http://flot.base.is/
What are the differences between using such an algorithm and using a simple function like an average (per minute, per hour, per day, ...)?
To speed up long-period queries, does it make sense to pre-calculate an SQL table on the server side by applying LTTB to each month of data, and then let the client side apply another LTTB pass on the aggregated data?
1: The problem with averages, for my purposes, is that they nuke large differences between samples - my peaks and valleys were more important than what was happening between them. The point of the Largest-Triangle-Three-Buckets algorithm is to preserve those inflection points (peaks/valleys) while not worrying about showing all the times the data was similar or the same.
So in my case, where the data was generally all the same (or close enough - temperature data) until sample X, at which point a small percentage change was important to show in the graph, the buckets algorithm was perfect.
Also, since the buckets algorithm is parameterized, you can change the values (how much data to keep), see which values nuke the most data while still looking visually nearly identical, and decide how much data you can dispense with before too much has been removed from your graph.
The naive approach would be decimation (removing X out of every N samples), but what happens if it's the outliers you care about and the algorithm nukes an outlier? So you change your decimation so that if the difference is too great, it doesn't nuke that sample. LTTB is a more sophisticated version of that concept; a rough sketch of the algorithm follows.
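For reference, a minimal LTTB sketch (my own simplification, assuming points of the form {x, y} sorted by x, and threshold being the number of points to keep) looks roughly like this:

function largestTriangleThreeBuckets(data, threshold) {
  const n = data.length;
  if (threshold >= n || threshold < 3) return data.slice();

  const bucketSize = (n - 2) / (threshold - 2);
  const sampled = [data[0]];                     // always keep the first point
  let prev = data[0];

  for (let i = 0; i < threshold - 2; i++) {
    // Current bucket and the bucket after it (the last "next bucket" is just the final point).
    const start = Math.floor(i * bucketSize) + 1;
    const end = Math.floor((i + 1) * bucketSize) + 1;
    const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, n);

    // Average point of the next bucket.
    let avgX = 0, avgY = 0;
    for (let j = end; j < nextEnd; j++) { avgX += data[j].x; avgY += data[j].y; }
    avgX /= nextEnd - end;
    avgY /= nextEnd - end;

    // Pick the point in the current bucket that forms the largest triangle with
    // the previously kept point and the next bucket's average point.
    let best = data[start], bestArea = -1;
    for (let j = start; j < end; j++) {
      const area = Math.abs(
        (prev.x - avgX) * (data[j].y - prev.y) -
        (prev.x - data[j].x) * (avgY - prev.y)
      ) / 2;
      if (area > bestArea) { bestArea = area; best = data[j]; }
    }
    sampled.push(best);
    prev = best;
  }

  sampled.push(data[n - 1]);                     // always keep the last point
  return sampled;
}

The first and last points are always kept, which is part of why peaks and valleys survive better than with plain per-bucket averaging.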
2: It depends on how quickly you can compute it all, whether the data ever changes, and various other factors. That's up to you. From my perspective, once my data was in the past and a sample was 'chosen' to represent the bucket's value, it won't change, so I can save it and never recalculate it again.
Since your question is a bit old, what'd you end up doing?

Compare sound between source and microphone in JavaScript

I'm working with audio but I'm a newbie in this area. I would like to match sound from the microphone against my source audio (just one sound), like the Coke ad from Shazam. Example video (at 0:45). However, I want to do it on a website in JavaScript. Thank you.
Building something similar to the backend of Shazam is not an easy task. We need to:
Acquire audio from the user's microphone (easy)
Compare it to the source and identify a match (hmm... how do... )
How can we perform each step?
Acquire Audio
This one is a definite no biggie. We can use the Web Audio API for this. You can google around for good tutorials on how to use it. This link provides some good fundamental knowledge that you may want to understand when using it.
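A minimal sketch of the acquisition step using standard Web Audio API calls (the helper names here are just illustrative, not from any library):

// Capture microphone audio with getUserMedia and expose its frequency content
// through an AnalyserNode. Nothing here is Shazam-specific; it is just the
// plumbing that the later fingerprinting steps would read FFT frames from.
async function captureMicrophone() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);

  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 2048;               // 1024 frequency bins per frame
  source.connect(analyser);

  return analyser;
}

// Read one FFT frame (magnitudes in dB per frequency bin) from the analyser.
function readFrame(analyser) {
  const bins = new Float32Array(analyser.frequencyBinCount);
  analyser.getFloatFrequencyData(bins);
  return bins;
}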
Compare Samples to Audio Source File
Clearly this piece is going to be an algorithmic challenge in a project like this. There are probably various ways to approach this part, and not enough time to describe them all here, but one feasible technique (which happens to be what Shazam actually uses), and which is also described in greater detail here, is to create and compare against a sort of fingerprint for smaller pieces of your source material, which you can generate using FFT analysis.
This works as follows:
Look at small sections of the sample, no more than a few seconds long, at a time (note that this is done using a sliding window, not discrete partitioning)
Calculate the Fourier transform of the audio selection. This decomposes the selection into many signals of different frequencies, and we can analyze the frequency domain of our sample to draw useful conclusions about what we are hearing.
Create a fingerprint for the selection by identifying critical values in the FFT, such as peak frequencies or magnitudes (a rough sketch follows these steps)
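As a deliberately simplified sketch of what such a fingerprint might look like (a real system would also track frequency bands and time offsets), you could keep the bin indices of the strongest spectral peaks in each FFT frame:

// Take an FFT frame (an array of magnitudes per frequency bin, e.g. from the
// AnalyserNode above) and keep the indices of its strongest local peaks.
function fingerprint(bins, peakCount = 5) {
  const peaks = [];
  for (let i = 1; i < bins.length - 1; i++) {
    if (bins[i] > bins[i - 1] && bins[i] > bins[i + 1]) {
      peaks.push({ bin: i, magnitude: bins[i] });
    }
  }
  // Keep the strongest peaks and return just their bin indices, sorted.
  return peaks
    .sort((a, b) => b.magnitude - a.magnitude)
    .slice(0, peakCount)
    .map((p) => p.bin)
    .sort((a, b) => a - b);
}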
If you want to be able to match multiple samples like Shazam does, you should maintain a dictionary of fingerprints, but since you only need to match one source, you can just keep them in a list. Since your keys are going to be arrays of numerical values, a k-d tree is another data structure that could query your dataset quickly. I don't think Shazam uses one, but the more I think about it, the closer their system seems to an n-dimensional nearest-neighbor search, provided you keep the number of critical points consistent. For now though, keep it simple and use a list.
Now we have a database of fingerprints primed and ready for use. We need to compare them against our microphone input now.
Sample our microphone input in small segments with a sliding window, the same way we did our sources.
For each segment, calculate the fingerprint and see whether it closely matches any fingerprint in storage (see the sketch below). You can look for partial matches, and there are lots of tweaks and optimizations you could try.
This is going to be a noisy and inaccurate signal, so don't expect every segment to get a match. If lots of them match (you will have to figure out what "lots" means experimentally), assume you have a hit; if there are relatively few matches, figure you don't.
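One possible (and very simplified) comparison, assuming the fingerprints are arrays of peak bin indices as in the earlier sketch; the 0.6 overlap ratio is an arbitrary placeholder you would tune experimentally:

// Compare a microphone fingerprint against the stored source fingerprints by
// counting shared peak bins; call it a match if enough bins overlap.
function matchesSource(micPrint, sourcePrints, minOverlap = 0.6) {
  return sourcePrints.some((srcPrint) => {
    const shared = srcPrint.filter((bin) => micPrint.includes(bin)).length;
    return shared / srcPrint.length >= minOverlap;
  });
}

// Usage: count how many recent segments matched, and decide "heard it" once
// that count exceeds an experimentally chosen number.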
Conclusions
This is not going to be a super easy project to do well. The amount of tuning and optimization required will prove to be a challenge. Some microphones are inaccurate, and most environments have other sounds, and all of that will mess with your results, but it's also probably not as bad as it sounds. I mean, this is a system that from the outside seems unapproachably complex, and we just broke it down into some relatively simple steps.
Also, as a final note: you mention JavaScript several times in your post, and you may notice that I mentioned it zero times up until now in my answer. That's because the language of implementation is not an important factor. This system is complex enough that the hardest pieces of the puzzle are going to be the ones you solve on paper, so you don't need to think in terms of "how can I do X in Y" - just figure out an algorithm for X, and the Y should come naturally.

How to approximately align imprecise time series data / curves?

There are two similar time series data sets that share a common measurement, that however comes from two completely different sources:
one is a classic GPS receiver (position, accurate time, somewhat accurate speed once per second) on a tractor
the other is a logger device on the same tractor that has an internal real time clock and measures speed/distance by measuring wheel rotation. That device also tracks other sensor data and stores these values once per second.
I'd like to combine these two data sources so that I can (as accurately as possible) combine GPS position and the additional logger sensor data.
One aspect that might help here is that there will be some significant speed variations during the measurement, as the tractor usually has to do 180-degree turns after 100-200 meters (i.e. good detail for better matching).
By plotting the speed data into two charts respectively (X-axis is time, Y-axis is speed), I think a human could pretty easily align the charts in a way so that they match nicely.
How can this be done in software? I expect there are known algorithms that solve this but it's hard to search for it if you don't know the right terms...
Anyway, the algorithm probably has to deal with these problems:
the length of the data won't be equal (the two devices won't start/stop exactly at the same time - I expect at least 80% to overlap)
the GPS clock is of course perfectly accurate, but the real-time clock of the logger may be way off (it has no synchronized time source), so I can't simply match on time
there might be slight variations in the measured speed due to the different measurement methods
A simple solution would probably find two matching extremes (allowing the data in between to be interpolated); a better solution might be more flexible and even correct some drift. One way to find a constant offset is sketched below.
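For what it's worth, one standard first step (my suggestion, not from the question) is cross-correlation of the two speed series: resample both to a common 1 Hz grid, slide one against the other, and keep the offset with the best agreement. A rough sketch, assuming plain arrays of speeds at 1-second intervals:

// Find the constant time offset (in seconds) that best aligns the two series,
// i.e. gpsSpeed[i] lines up with loggerSpeed[i + offset].
function bestOffset(gpsSpeed, loggerSpeed, maxOffset = 600) {
  const minOverlap = Math.min(gpsSpeed.length, loggerSpeed.length) / 2;
  let best = { offset: 0, score: -Infinity };
  for (let offset = -maxOffset; offset <= maxOffset; offset++) {
    let score = 0, count = 0;
    for (let i = 0; i < gpsSpeed.length; i++) {
      const j = i + offset;
      if (j >= 0 && j < loggerSpeed.length) {
        score += gpsSpeed[i] * loggerSpeed[j];
        count++;
      }
    }
    // Normalize by the overlap length and ignore offsets with too little overlap.
    if (count >= minOverlap && score / count > best.score) {
      best = { offset, score: score / count };
    }
  }
  return best.offset;
}

This only finds a constant offset; handling clock drift or stretching would need something more flexible, such as dynamic time warping.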

What are good strategies for graphing large datasets (1M +)?

I'm just starting to approach this problem. I want to allow users to arbitrarily select ranges and filters that let them graph large data sets (realistically it should never be more than 10 million data points) on a web page. I use Elasticsearch for storing and aggregating the data, along with Redis for keeping track of summary data, and d3.js is my graphing library.
My thoughts on the best solution is to have precalculated summaries in different groupings that can be used to graph from. So if the data points exist over several years, I can have groupings by month and day (which I would be doing anyway), but then by groupings of say half day, quarter day, hour, half hour, etc. And then before I query for graph data I do a quick calculation to see which of these groupings will give me some ideal number of data points (say 1000).
Is this a reasonable way to approach the problem? Is there a better way?
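A minimal sketch of the grouping-selection step described in the question (the bucket sizes and the 1000-point target are illustrative placeholders, not a recommendation):

// Precomputed bucket sizes, finest first, in milliseconds.
const BUCKETS_MS = [
  30 * 60 * 1000,           // half hour
  60 * 60 * 1000,           // hour
  6 * 60 * 60 * 1000,       // quarter day
  12 * 60 * 60 * 1000,      // half day
  24 * 60 * 60 * 1000,      // day
  30 * 24 * 60 * 60 * 1000, // month (approximate)
];

// Pick the finest bucket size that keeps the point count at or below the target.
function pickBucket(rangeStartMs, rangeEndMs, targetPoints = 1000) {
  const span = rangeEndMs - rangeStartMs;
  for (const bucket of BUCKETS_MS) {
    if (span / bucket <= targetPoints) return bucket;
  }
  return BUCKETS_MS[BUCKETS_MS.length - 1]; // fall back to the coarsest grouping
}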
You should reconsider the amount of data...
Even in desktop plotting apps it is uncommon to show that many points per plot - e.g. Origin prints a warning that it will show only a subset for performance reasons. You could, for example, throw away every third point to reduce the count.
You should give the user the ability to zoom in or navigate around to explore the data, in a pagination-like style...
Grouping, or faceting as it is called in the Lucene community, is of course possible with that many documents, but be sure you have enough RAM and CPU.
You can't (typically) graph more points than you have pixels on your screen, so to graph 1M points you'd need a really good monitor.

Using Dijkstra's algorithm to find a path that can carry the most weight

I have a graph with X nodes and Y weighted edges. The point is to start at one node and stop at another node, which is the final location. Now here comes the problem:
Visualize the problem: the edges are roads, and the edge weights are the maximum weight limits for vehicles driving on those roads. We would like to drive the biggest truck possible from A to F, so I want the path from A to F that allows the largest vehicle weight.
Can I use some sort of Dijkstra's algorithm for this problem? I'm not sure how to express it in the form of an algorithm that I can implement. Any help is much appreciated. I'm confused because Dijkstra's algorithm only deals with shortest paths.
If I understand correctly, you want to find the path between some nodes that has the maximum bottleneck edge. That is, you want the path whose smallest edge is as large as possible. If this is what you want to solve, then there is a very straightforward modification of Dijkstra's algorithm that can be used to solve the problem.
The idea behind the algorithm is to run Dijkstra's algorithm with a twist. Normally, when running Dijkstra's algorithm, you keep track of the length of the shortest path to each node. In the modified Dijkstra's algorithm, you instead store, for each node, the maximum possible value of the minimum-weight edge on any path that reaches the node. In other words, normally in Dijkstra's algorithm you determine which edge to expand by finding the edge that minimizes the quantity
d(s, u) + l(u, v)
where s is the start node, u is some node you've explored so far, and (u, v) is an edge. In the modified Dijkstra's, you instead find the edge maximizing
min(bottleneck(s, u), l(u, v))
That is, you consider the bottleneck edge on the path from the source node to any node you've seen so far and consider what bottleneck path would be formed if you left that node and went some place else. This is the best bottleneck path to the target node, and you can repeat this process.
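A rough JavaScript sketch of this modification (the adjacency-list format is an assumption, and a plain linear scan is used instead of a priority queue for brevity, so this runs in O(n^2 + m) rather than O(m + n log n)):

// graph is assumed to be an adjacency list: { A: [{ to: 'B', weight: 7 }, ...], ... }
// Returns the largest possible minimum edge weight on a path from start to target.
function widestPath(graph, start, target) {
  const bottleneck = {};
  const visited = new Set();
  for (const node of Object.keys(graph)) bottleneck[node] = -Infinity;
  bottleneck[start] = Infinity;          // no edge restricts the start node

  while (true) {
    // Pick the unvisited node with the largest bottleneck value so far.
    let current = null;
    for (const node of Object.keys(graph)) {
      if (!visited.has(node) && (current === null || bottleneck[node] > bottleneck[current])) {
        current = node;
      }
    }
    if (current === null || bottleneck[current] === -Infinity) return -Infinity; // unreachable
    if (current === target) return bottleneck[target];
    visited.add(current);

    // Relax: the bottleneck through current is min(bottleneck so far, edge weight).
    for (const { to, weight } of graph[current]) {
      bottleneck[to] = Math.max(bottleneck[to], Math.min(bottleneck[current], weight));
    }
  }
}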
This variant of Dijkstra's algorithm also runs in O(m + n log n) time using a good priority queue. For more information, consider looking into these lecture slides that have a brief discussion of the algorithm.
Interestingly, this is a well-known problem that's used as a subroutine in many algorithms. For example, one of the early polynomial-time algorithms for solving the maximum flow problem uses this algorithm as a subroutine. For details about how, check out these lecture notes.
Hope this helps! And if I've misinterpreted your question, please let me know so I can delete/update this answer.
No Dijkstra, no flow problem. It's a lot easier: Just use your favorite graph search (BFS or DFS).
Instead of computing & tracking the cost associated with reaching a certain node in the graph, just compute the 'size' of the biggest truck that is allowed to use this path (the minimum of weights of all edges in the path). When multiple search paths meet in a node throw away the path that has the lower 'truck weight limit'.
Here is an easy and efficient way:
Let MAX be the maximum edge weight in the graph. Binary search 0 <= k <= MAX such that you can get from A to F using only edges with weights >= k. You can use a breadth first search to see if this is possible (don't take an edge if its weight is too small).
This gives an O((X + Y) log V) algorithm, where V is the range of your weights.
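A sketch of that idea in JavaScript (the edge-list format is an assumption; binary searching over the distinct edge weights rather than the full range 0..MAX gives O((X + Y) log Y)):

// BFS using only edges with weight >= k, on an undirected graph given as a
// list of { from, to, weight } objects plus the list of node names.
function canReach(edges, nodes, start, target, k) {
  const adj = new Map(nodes.map((n) => [n, []]));
  for (const { from, to, weight } of edges) {
    if (weight >= k) {
      adj.get(from).push(to);
      adj.get(to).push(from);
    }
  }
  const queue = [start];
  const seen = new Set([start]);
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === target) return true;
    for (const next of adj.get(node)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return false;
}

// Binary search for the largest weight threshold that still connects start and target.
function maxBottleneck(edges, nodes, start, target) {
  const weights = [...new Set(edges.map((e) => e.weight))].sort((a, b) => a - b);
  let lo = 0, hi = weights.length - 1, best = -Infinity;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (canReach(edges, nodes, start, target, weights[mid])) {
      best = weights[mid];
      lo = mid + 1;       // try a larger minimum weight
    } else {
      hi = mid - 1;
    }
  }
  return best;
}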
What a Dijkstra-like algorithm requires is optimal substructure and a way to quickly compute the objective value for a one-edge extension of a path with a known objective value. Here, optimal substructure means that if you have an optimal path from a vertex x to a different vertex y, then the subpath from x to the second-to-last vertex is optimal.
(IVlad, I can only get O(X + Y) with randomization.)
