Simple Linear Regression Prediction Algorithm in JavaScript - javascript

I am trying to do a simple profit prediction for an organization based on its past profit, in JavaScript. My dataset will have the date on the x-axis and the profit on the y-axis. I am new to data analytics, I basically have zero knowledge in it, and I am not sure which prediction algorithm will be the most suitable.
I have done some research here and here and found out that I can actually use the linear regression prediction algorithm. However, from those examples, I only saw that the algorithm simply plots a straight line through the data to find the regression values; it does not predict any value for the future at all.
I wonder if the algorithm mentioned above is applicable for my case?
Thanks!

It depends a lot on the business and how much data you have. Does the past history follow a regular linear progression? If so, then a linear model would make sense. Are there ups and downs? What explains those? Is it seasonal, or some other cycle? If so, you need to take that into account. Are there specific periods with huge outliers that are very uncommon? Perhaps correcting for (removing) those would yield better results.
There is no one-size-fits-all solution.

The prediction has nothing to do with JavaScript or HTML; it's just a matter of using a regression function. How to fit a function to the data you are given is the realm of regression analysis. You can read up on the least squares method to clarify your understanding.
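As an illustration only (not taken from any particular library), a minimal least-squares fit of a straight line to date/profit pairs, followed by an extrapolation to a future date, could look like this; the function name and the sample numbers are made up:

// Fit y = slope * x + intercept by ordinary least squares.
function leastSquares(xs, ys) {
    var n = xs.length, sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (var i = 0; i < n; i++) {
        sumX += xs[i];
        sumY += ys[i];
        sumXY += xs[i] * ys[i];
        sumXX += xs[i] * xs[i];
    }
    var slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    return { slope: slope, intercept: (sumY - slope * sumX) / n };
}

// Dates as millisecond timestamps on the x-axis, profit on the y-axis (made-up numbers).
var dates = [Date.UTC(2015, 0, 1), Date.UTC(2015, 1, 1), Date.UTC(2015, 2, 1)];
var profits = [1000, 1200, 1150];
var fit = leastSquares(dates, profits);
// Extrapolate: plug a future date into the fitted line.
var predicted = fit.slope * Date.UTC(2015, 5, 1) + fit.intercept;
console.log('Predicted profit: ' + predicted.toFixed(2));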
Choosing the regression function is another issue. It depends on the domain your data come from. You have to be aware of restrictions on your output so you can pick a function that fits the business logic (for example, if your data are cyclic by year or by day or whatever, you may wrap a sin() or cos() function around another one).
There is one more method for prediction. It's related to machine learning and based on artificial neural networks. If you're interested in doing this with JS, I can suggest brain.js - the simplest library for dealing with neural networks in JS.

Related

Compare sound between source and microphone in JavaScript

I'm working with audio, but I'm a newbie in this area. I would like to match sound from the microphone against my source audio (just one sound), like the Coke ads from Shazam. Example Video (0.45 minute). However, I want to make it on a website with JavaScript. Thank you.
Building something similar to the backend of Shazam is not an easy task. We need to:
Acquire audio from the user's microphone (easy)
Compare it to the source and identify a match (hmm... how do... )
How can we perform each step?
Acquire Audio
This one is a definite no biggie. We can use the Web Audio API for this. You can google around for good tutorials on how to use it. This link provides some good fundamental knowledge that you may want to understand when using it.
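A minimal sketch of the acquisition step, assuming a browser that supports getUserMedia and the Web Audio API (variable names are my own):

navigator.mediaDevices.getUserMedia({ audio: true }).then(function (stream) {
    var audioCtx = new AudioContext();
    var source = audioCtx.createMediaStreamSource(stream); // microphone -> audio graph
    var analyser = audioCtx.createAnalyser();
    analyser.fftSize = 2048;                                // size of the FFT window
    source.connect(analyser);
    var bins = new Uint8Array(analyser.frequencyBinCount);
    (function sample() {
        analyser.getByteFrequencyData(bins);                // magnitude per frequency bin
        // ...hand "bins" to the fingerprinting step described below...
        requestAnimationFrame(sample);
    })();
});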
Compare Samples to Audio Source File
Clearly this piece is going to be an algorithmic challenge in a project like this. There are probably various ways to approach this part, and not enough time to describe them all here, but one feasible technique (which happens to be what Shazam actually uses), and which is also described in greater detail here, is to create and compare against a sort of fingerprint for smaller pieces of your source material, which you can generate using FFT analysis.
This works as follows:
Look at small sections of a sample, no more than a few seconds long, at a time (note that this is done using a sliding window, not discrete partitioning)
Calculate the Fourier Transform of the audio selection. This decomposes our selection into many signals of different frequencies. We can analyze the frequency domain of our sample to draw useful conclusions about what we are hearing.
Create a fingerprint for the selection by identifying critical values in the FFT, such as peak frequencies or magnitudes
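Purely to make the idea concrete, and not how Shazam actually builds its fingerprints, here is a toy fingerprint that just keeps the indices of the strongest frequency bins from one FFT frame (the AnalyserNode output from the sketch above):

// Toy fingerprint: the indices of the topK strongest frequency bins in one frame.
function fingerprint(bins, topK) {
    var indexed = [];
    for (var i = 0; i < bins.length; i++) indexed.push([i, bins[i]]);
    indexed.sort(function (a, b) { return b[1] - a[1]; });     // strongest magnitudes first
    return indexed.slice(0, topK)
                  .map(function (pair) { return pair[0]; })
                  .sort(function (a, b) { return a - b; });
}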
If you want to be able to match multiple samples like Shazam does, you should maintain a dictionary of fingerprints, but since you only need to match one source material, you can just maintain them in a list. Since your keys are going to be arrays of numerical values, I propose that another possible data structure to quickly query your dataset would be a k-d tree. I don't think Shazam uses one, but the more I think about it, the closer their system seems to an n-dimensional nearest neighbor search, if you can keep the number of critical points consistent. For now though, just keep it simple, use a list.
Now we have a database of fingerprints primed and ready for use. We need to compare them against our microphone input now.
Sample our microphone input in small segments with a sliding window, the same way we did our sources.
For each segment, calculate the fingerprint, and see if it matches close to any from storage. You can look for a partial match here and there are lots of tweaks and optimizations you could try.
This is going to be a noisy and inaccurate signal, so don't expect every segment to get a match. If lots of them are getting a match (you will have to figure out what "lots" means experimentally), then assume you have one. If there are relatively few matches, then figure you don't.
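A matching step in the same toy spirit (the threshold is something you would tune experimentally, as noted above):

// Count how many peak bins two fingerprints share; call it a match past a threshold.
function isMatch(fpA, fpB, threshold) {
    var inB = {};
    fpB.forEach(function (bin) { inB[bin] = true; });
    var shared = fpA.filter(function (bin) { return inB[bin]; }).length;
    return shared >= threshold;
}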
Conclusions
This is not going to be a super easy project to do well. The amount of tuning and optimization required will prove to be a challenge. Some microphones are inaccurate, and most environments have other sounds, and all of that will mess with your results, but it's also probably not as bad as it sounds. I mean, this is a system that from the outside seems unapproachably complex, and we just broke it down into some relatively simple steps.
Also, as a final note, you mention JavaScript several times in your post, and you may notice that I mentioned it zero times up until now in my answer. That's because the language of implementation is not an important factor. This system is complex enough that the hardest pieces of the puzzle are going to be the ones you solve on paper, so you don't need to think in terms of "how can I do X in Y"; just figure out an algorithm for X, and the Y should come naturally.

How to get Render Performance for Javascript based Charting Libraries?

To preface, I am pretty new to programming in JavaScript, but I have been working with various libraries for a while now. I've been tasked with getting performance metrics for various charting libraries to find the fastest and most flexible among some of the libraries available (e.g. AmCharts, HighCharts, SyncFusion, etc.). I've tried JSPerf, and it seems like I am getting performance metrics for the code execution and not the actual rendered chart, which is the metric we want (i.e. what the user experience will be). I've tried using performance.now() within the JavaScript code in the header and also wrapped around the tags where the charts are displayed, but neither method is working.
What is the best way to get these performance metrics based on rendering?
Short Answer :
Either :
Start your timing right before the chart code executes, set up a MutationObserver to watch the DOM, and stop the timer when all mutation ends (a sketch follows below).
Find out if the charting library has a done() event. (But be cautious, as this can be inaccurate depending on the implementation/library: "done()" could mean visually done while background work is still being performed, which could make interactivity jumpy until the chart is completely ready.)
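A rough sketch of the MutationObserver approach; the container id and the renderChart() call are placeholders for whatever the library under test actually uses:

var container = document.getElementById('chart');   // placeholder container element
var start = performance.now();
var quietTimer;
var observer = new MutationObserver(function () {
    clearTimeout(quietTimer);
    quietTimer = setTimeout(function () {            // no DOM mutations for 200 ms
        console.log('Rendered in ' + (performance.now() - start).toFixed(1) + ' ms');
        observer.disconnect();
    }, 200);
});
observer.observe(container, { childList: true, subtree: true, attributes: true });
renderChart(container);                              // placeholder for the library's render call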
Long Answer :
I'm assuming your test data is quite large, since most libraries can handle a couple thousand points without any noticeable degradation. Measuring performance for client-side charting libraries is actually a two-sided issue: rendering times and usability.
Rendering time can be measured as the duration from when a library starts interpreting the dataset to when the visual representation of the chart appears. Depending on each library's interpretation algorithm, your mileage will vary with the data size. Let's say library X uses an aggressive sampling algorithm and only has to draw a small percentage of the dataset. Performance will be extremely fast, but it may or may not be an accurate representation of your data set. Even more so, interactivity at finer-grained detail could be limited.
Which leads me to the usability and interactivity aspect of performance. We're using a computer and not a chart on a piece of paper; it should be as interactive as possible.
As the amount of interactivity goes up, though, your browser could be susceptible to slowdown depending on the library's implementation. What if each of your million data points were an interactive DOM node? One million data points would surely crash the browser.
Most of the charting libraries out there handle the tradeoff between performance, accuracy, and usability differently. As for which is best, it all depends on the implementation.
Plug/Source: I am a developer at ZingChart, and we deal with customers with large datasets all the time. We also built this, which is pretty relevant to your tests: http://www.zingchart.com/demos/zingchart-vs/
My method is really basic. I create a var with the current time, then at the end of my code block call console.log() with the difference.
var start = +new Date();
//do lots of cool stuff
console.log('Rendered in ' + (new Date() - start) + ' ms');
Very generic and does what it says on the tin. If you want to measure each section of code you would have to create new time slots. Yes, the calculation takes time, but it is minuscule compared to what the code I want to measure is doing. Example in action at the jsFiddle.

Pattern recognition and prediction in C# or JavaScript

I started working on a small project - at least I thought so.
The idea is simple - I have a list of boolean values, which were randomly generated (provably fair but not completely random engine). Due to the engine's non-randomness, there are emerging patterns in the boolean outcome.
What I wish to do is quite simple in theory: recognize the pattern early on, and predict the next one based on all the previous data sets. I do realize that an actual implementation will be extremely frustrating to work with.
Could anyone give me an idea where to start with this?

Client side search engine optimization

Due to the reasons outlined in this question, I am building my own client-side search engine rather than using the ydn-full-text library, which is based on fullproof. What it boils down to is that fullproof spawns "too freaking many records", on the order of 300,000 records, whilst (after stemming) there are only about 7,700 unique words. So my 'theory' is that fullproof is based on traditional assumptions which only apply to the server side:
Huge indices are fine
Processor power is expensive
(and the assumption of dealing with longer records, which is just not applicable to my case as my records are on average only 24 words1)
Whereas on the client side:
Huge indices take ages to populate
Processing power is still limited, but relatively cheaper than on the server side
Based on these assumptions I started off with an elementary inverted index (giving just 7,700 records, as IndexedDB is a document/NoSQL database). This inverted index has been stemmed using the Lancaster stemmer (the most aggressive of the two or three popular ones), and during a search I retrieve the index for each of the words, assigning a score based on the overlap of the different indices and on the similarity of the typed word vs. the original (Jaro-Winkler distance).
The problem with this approach:
A combination of "popular_word + popular_word" is extremely expensive.
So, finally getting to my question: how can I alleviate the above problem with minimal growth of the index? I do understand that my approach will be CPU intensive, but as a traditional full-text search index seems unusably big, this seems to be the only reasonable road to go down. (Pointing me to good resources or works is also appreciated.)
1 This is a more or less artificial splitting of unstructured texts into small segments; however, this artificial splitting is standardized in the relevant field, so it has been used here as well. I have not studied the effect on the index size of keeping these 'snippets' together and throwing huge chunks of text at fullproof. I assume that this would not make a huge difference, but if I am mistaken then please do point it out.
This is a great question, thanks for bringing some quality to the IndexedDB tag.
While this answer isn't quite production ready, I wanted to let you know that if you launch Chrome with --enable-experimental-web-platform-features, then there should be a couple of features available that might help you achieve what you're looking to do.
IDBObjectStore.openKeyCursor() - value-free cursors, in case you can get away with the stem only
IDBCursor.continuePrimaryKey(key, primaryKey) - allows you to skip over items with the same key
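To make that concrete, here is a hedged sketch of walking an index with openKeyCursor() so only keys are touched; the "postings"/"term" store and index names are invented for illustration, and these features may still sit behind the experimental flag:

function docsForTerm(db, term, callback) {
    var ids = [];
    var index = db.transaction('postings').objectStore('postings').index('term');
    index.openKeyCursor(IDBKeyRange.only(term)).onsuccess = function (e) {
        var cursor = e.target.result;
        if (!cursor) return callback(ids);   // done: all matching primary keys collected
        ids.push(cursor.primaryKey);         // record values are never deserialized
        // cursor.continuePrimaryKey(key, primaryKey) could skip ahead when intersecting two terms
        cursor.continue();
    };
}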
I was informed of these features by an IDB developer on the Chrome team, and while I've yet to experiment with them myself, this seems like the perfect use case.
My thought is that if you approach this problem with two different indexes on the same column, you might be able to get that join-like behavior you're looking for without bloating your stores with gratuitous indexes.
While consecutive writes are pretty terrible in IDB, reads are great. Good performance across 7700 entries should be quite tenable.

Thought Process for Solving Algebra Equations?

I'm working on a graphing application that basically graphs equations on an HTML5 canvas. I had no problem graphing equations that were along the lines of y=3x^(2), etc. That was as easy as plugging in a given x value, swapping exponent notation for native functions, and voila!
Ideally, however, I'd like to graph equations for circles and other equations that don't necessarily start with y=... This would require actually doing algebra, which, unfortunately, is not so easy. My question is: what is the most logical way to solve a problem such as 3x+3y=15? Let's assume that I'm given an x and I'm solving for y. How would you go about creating a function that solves it?
Obviously, I could choose to be extremely inefficient and loop through y values until I find one that satisfies the equation, but let's try to avoid that.
I'm not asking for you to write the script for me, I'm just asking for the best/most efficient thought-process to get started.
Currently, this project is being written in Javascript.
Thanks!
One (approximate numerical) way is to fix x, re-write your equation as P(y) = 0 [for a circle such as 3x^(2) + 3y^(2) = 15 that would be P(y) = 3x^(2) + 3y^(2) - 15], and then use a numerical technique such as Newton-Raphson to find the roots of P(y).
If you want to solve symbolically, then a Computer Algebra System (CAS) is required (non-trivial).
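To illustrate the numerical route above (my own sketch, using a circle as the example since that is where plain substitution breaks down): fix x, treat the equation as P(y) = 0, and iterate Newton-Raphson using the derivative dP/dy.

// Newton-Raphson: solve P(y) = 0 for y at a fixed x, given P and its derivative dP/dy.
function solveForY(P, dPdy, yGuess) {
    var y = yGuess;
    for (var i = 0; i < 50; i++) {
        var step = P(y) / dPdy(y);
        y -= step;
        if (Math.abs(step) < 1e-9) break;   // converged
    }
    return y;
}

// Example: the circle 3x^2 + 3y^2 = 15 at x = 1, so P(y) = 3 + 3y^2 - 15 and dP/dy = 6y.
var x = 1;
var upperY = solveForY(function (y) { return 3 * x * x + 3 * y * y - 15; },
                       function (y) { return 6 * y; },
                       1);                  // positive guess -> the upper branch (y = 2)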
Usually you would express the equation with one variable on one side of the equals sign and the other variable on the other.
If you want to rewrite equations from random user input, you will need some kind of parsing engine.
Look here for a discussion.
y=3x^(2) is not linear, it's quadratic; 3x+3y=15 is in fact linear.
It depends on how complex you want to go. It's not that challenging to write something to rearrange a linear equation like 3x+3y=15 into its standard linear form (y=5-x), but it gets harder fast, and while there are probably server-side libraries for it, I'm not sure about JS.
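For the linear case the rearrangement itself is trivial; a sketch (the helper name is my own):

// For a*x + b*y = c, solving for y is just a rearrangement (assumes b !== 0).
function linearY(a, b, c, x) {
    return (c - a * x) / b;
}
linearY(3, 3, 15, 2); // 3, i.e. from 3x + 3y = 15 with x = 2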
The proper name for what you are looking for: http://en.wikipedia.org/wiki/Computer_algebra_system
