How to do online k-means clustering in JavaScript for many dimensions

I found many examples of JavaScript online k-means clustering, but all of them are for 2 dimensions.
If I have 56 dimensions (for example), how can I do the clustering?
Bonus question:
Given some new data, could I predict a value by looking at the clusters (e.g., 76% likely to belong to cluster x, so the value should be y)?

The k-means algorithm should be easy to port to any number of dimensions. It looks like this:
Randomly choose the cluster centers.
For each point, find the nearest cluster center.
Compute each new cluster center as the average of all points assigned to it.
Repeat until the cluster centers don't change.
In 2D, you compute the squared distance between the points (x1, y1) and (x2, y2) as (x1-x2)^2 + (y1-y2)^2 (you don't need the square root if you only use the distance to compare it with other distances). In 56 dimensions, the sum simply has 56 terms, one per component.
Likewise, in 2D you compute a cluster center by averaging all of its points. In 56 dimensions, take the first component of all points in the cluster and average them to get avg1, do the same with the second component to get avg2, and so on up to 56; your new cluster center is (avg1, avg2, avg3 ... avg56).
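The steps above can be sketched in plain JavaScript for points of any dimension (arrays of numbers of equal length). This is a minimal batch k-means under my own assumptions, not a library API: the helper names (`squaredDistance`, `nearestCenter`, `randomCenters`, `kMeans`) are hypothetical, the caller passes the starting centers explicitly so results are reproducible, and the loop stops as soon as no point changes cluster.

```javascript
// Squared Euclidean distance between two points with any number of components.
// The square root is skipped because distances are only compared with each other.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the center closest to point p (ties go to the lowest index).
function nearestCenter(p, centers) {
  let best = 0;
  let bestDist = Infinity;
  for (let c = 0; c < centers.length; c++) {
    const dist = squaredDistance(p, centers[c]);
    if (dist < bestDist) {
      bestDist = dist;
      best = c;
    }
  }
  return best;
}

// Pick k distinct data points as starting centers (partial Fisher-Yates shuffle).
// Assumes k <= points.length.
function randomCenters(points, k) {
  const idx = points.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(Math.random() * (idx.length - i));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  return idx.slice(0, k).map(i => points[i].slice());
}

// Batch k-means: alternate the assignment step and the per-dimension averaging
// until no point changes cluster (then the centers no longer change either).
function kMeans(points, initialCenters, maxIterations = 100) {
  const dim = points[0].length;
  let centers = initialCenters.map(c => c.slice());
  const labels = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIterations; iter++) {
    let changed = false;
    for (let i = 0; i < points.length; i++) {
      const c = nearestCenter(points[i], centers);
      if (c !== labels[i]) {
        labels[i] = c;
        changed = true;
      }
    }
    if (!changed) break;
    // New center = average of each component over the points in that cluster.
    const sums = centers.map(() => new Array(dim).fill(0));
    const counts = new Array(centers.length).fill(0);
    for (let i = 0; i < points.length; i++) {
      counts[labels[i]]++;
      for (let d = 0; d < dim; d++) sums[labels[i]][d] += points[i][d];
    }
    // An empty cluster keeps its previous center.
    centers = centers.map((old, c) => (counts[c] === 0 ? old : sums[c].map(s => s / counts[c])));
  }
  return { centers, labels };
}
```

Call it as `kMeans(points, randomCenters(points, k))`. `nearestCenter(newPoint, result.centers)` also tells you which cluster a new data point falls into, but it is a hard assignment, not a percentage.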
What is not easy is the cost: every iteration compares every point with every center across all dimensions, so it gets expensive. Check out dimensionality reduction (feature extraction) algorithms such as PCA.
Also make sure that all features are normalized, for example that they all share the same range, such as (-100, 100).
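One common way to do that normalization is min-max scaling, which maps every feature onto the same range. A minimal sketch, assuming the data is an array of equal-length numeric arrays; `minMaxScale` is a hypothetical helper name, and z-score standardization would be an equally valid choice.

```javascript
// Min-max scaling: map every feature (column) linearly onto [lo, hi].
// A feature with a constant value is mapped to lo.
function minMaxScale(points, lo = 0, hi = 1) {
  const dim = points[0].length;
  const mins = new Array(dim).fill(Infinity);
  const maxs = new Array(dim).fill(-Infinity);
  for (const p of points) {
    for (let d = 0; d < dim; d++) {
      mins[d] = Math.min(mins[d], p[d]);
      maxs[d] = Math.max(maxs[d], p[d]);
    }
  }
  return points.map(p =>
    p.map((v, d) => {
      const range = maxs[d] - mins[d];
      return range === 0 ? lo : lo + ((v - mins[d]) / range) * (hi - lo);
    })
  );
}
```

`minMaxScale(points, -100, 100)` gives every feature the (-100, 100) range used as the example above.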
If you need more information, check out the Machine Learning course on Coursera.
Week 8 is all about clustering and its traps.

Related

Find the largest rectangle that fits inside a polygon

I need to find the largest rectangle that can fit inside any polygon.
What I tried is dividing the SVG into a 2D grid and looping over the grid to check whether each cell intersects the polygon, producing a new 2D binary array where intersection is 1 and everything else is 0.
Now I need to find the largest rectangle in that 2D array and, more importantly, its location.
For example, given such a 2D array, I need to find the largest rectangle in it and its x1,y1 (start i,j) and x2,y2 (end i,j).

Well, you can brute-force the location and scan for the size, which will be O(n^6) if n is the average side length of your map in pixels...
The location search might be sped up with a search that accepts not strictly sorted data, for example like this:
How approximation search works
which would lead to ~O(n^4 log^2(n)). But beware: the search must be configured properly so that it does not skip the solution... The size search can be improved too, using a technique similar to the one I used here:
2D OBB
Just use a different metric: I would create LUTs (lookup tables) of start and end positions for each x and y (4 LUTs), which speeds up the search to ~O(n^2 log^2(n)), while creating the LUTs is O(n^2). By the way, I sometimes use the same LUTs in OCR, as here (last 2 images):
OCR and character similarity
The problem with this approach is that it cannot handle concave polygons correctly, as there might be more than 2 edges per x or y. To remedy that, you would need more LUTs and use them based on the position in the polygon (dividing the polygon into "convex" areas).
Putting all of this together would look something like this:
approx loop (center x)                                                   // ~O(log(n))
 approx loop (center y)                                                  // ~O(log(n))
  grow loop (square size to max using LUT)                               // O(n)
   {
   grow loop (x size to max while decreasing original square y size)     // O(n)
   grow loop (y size to max while decreasing original square x size)     // O(n)
   use bigger from the above 2 rectangles
   }
Just do not forget to use area of polygon / area of rectangle as the approximation error value. This algorithm runs in ~O(n^2 log^2(n)), which is not great but still doable.
Another option is to convert your polygon into squares and use bin-packing and/or graph and/or backtracking techniques to grow the biggest rectangle... but those are not my cup of tea, so I am not confident enough to write an answer about them.
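For the grid formulation in the question (a 0/1 array where 1 means the cell is inside the polygon), a different and well-known technique, not the one described in the answer, finds the largest axis-aligned all-1 rectangle in O(rows × cols): treat each row as the base of a histogram of consecutive 1s above it and solve "largest rectangle in a histogram" with a stack. A minimal sketch; the function name and the returned `{ area, i1, j1, i2, j2 }` shape (inclusive row/column indices) are my own assumptions.

```javascript
// Largest axis-aligned rectangle of 1s in a binary grid (grid[i][j], i = row).
// Returns { area, i1, j1, i2, j2 } with inclusive indices; area is 0 if there are no 1s.
function largestRectangle(grid) {
  const rows = grid.length;
  const cols = rows ? grid[0].length : 0;
  const heights = new Array(cols).fill(0); // consecutive 1s ending at the current row
  let best = { area: 0, i1: -1, j1: -1, i2: -1, j2: -1 };
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) {
      heights[j] = grid[i][j] ? heights[j] + 1 : 0;
    }
    // Largest rectangle in the histogram `heights`, using a stack of column
    // indices whose heights increase from bottom to top.
    const stack = [];
    for (let j = 0; j <= cols; j++) {
      const h = j < cols ? heights[j] : 0; // height 0 at the end flushes the stack
      while (stack.length && heights[stack[stack.length - 1]] >= h) {
        const height = heights[stack.pop()];
        const left = stack.length ? stack[stack.length - 1] + 1 : 0;
        const area = height * (j - left);
        if (area > best.area) {
          best = { area, i1: i - height + 1, j1: left, i2: i, j2: j - 1 };
        }
      }
      stack.push(j);
    }
  }
  return best;
}
```

Each row is processed in O(cols) because every column index is pushed and popped at most once.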

CesiumJS - Distance Between Two Points

My goal is to calculate the distance between two Cesium entities in kilometers. As a bonus, I eventually want to be able to measure their distance in pixels.
I have a bunch of placemarks in KML format like this:
<Placemark>
<name>Place</name>
<Point><coordinates>48.655,-31.175</coordinates></Point>
<styleUrl>#style</styleUrl>
<ExtendedData>
...
</ExtendedData>
</Placemark>
I am importing them into Cesium like so:
viewer.dataSources.add(Cesium.KmlDataSource.load('./data.kml', options)).then(function(dataSource) {
    // _entities._array is private; dataSource.entities.values is the public equivalent
    var entities = dataSource.entities._entities._array;
});
I have attempted to create new Cartesian3 objects of entities I care about, but the x, y, and z values I get from the entity object are in the hundreds of thousands. The latitude and longitude from my KML are nowhere to be found in the entity objects.
If I do create Cartesian3 objects and compute the distance like so:
var distance = Cesium.Cartesian3.distance(firstPoint, secondPoint);
it returns numbers in the millions. I have evaluated the distance between multiple points this way and when I compare those values returned to the result of an online calculator which returns the actual value in kilometers, the differences in the distances are not linear (some of the distances returned by Cesium are 900 times the actual distance and some are 700 times the actual distance).
I hope that is enough to receive help. I am not sure where to start fixing this. Any help would be appreciated. Thank you.

A couple of things are going on here. The Cesium.Cartesian3 class holds meters, so it is correct to divide by 1000 to get km, but that's not the full story. Cartesian3s are positions on a 3D globe, and if you compute a simple Cartesian3.distance between two of them on opposite sides of that globe, you'll get the straight-line Cartesian distance, i.e. the length of a line that cuts through the middle of the globe from one to the other, rather than the distance traveled around the surface of the globe to the far side.
To get the distance you actually want -- the distance of a line that follows the curvature of the surface of the Earth -- check out the answer to Cesium JS Line Length on GIS SE.
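To illustrate the difference the answer describes, here is a small self-contained sketch that computes both distances for two lat/lon points. It assumes a spherical Earth with a mean radius of 6371 km (Cesium itself uses the WGS84 ellipsoid, so its numbers will differ slightly), and the function names are hypothetical. Inside Cesium, the ellipsoidal surface distance is available from `Cesium.EllipsoidGeodesic` (constructed from two `Cartographic` positions, e.g. via `Cesium.Cartographic.fromCartesian`) through its `surfaceDistance` property, in meters.

```javascript
const EARTH_RADIUS_KM = 6371; // mean radius; spherical approximation

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

// Central angle (radians) between two lat/lon points in degrees, via the haversine formula.
function centralAngle(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Distance along the surface (what an online distance calculator reports).
function surfaceDistanceKm(lat1, lon1, lat2, lon2) {
  return EARTH_RADIUS_KM * centralAngle(lat1, lon1, lat2, lon2);
}

// Straight-line (chord) distance through the Earth, the analogue of Cartesian3.distance / 1000.
function chordDistanceKm(lat1, lon1, lat2, lon2) {
  return 2 * EARTH_RADIUS_KM * Math.sin(centralAngle(lat1, lon1, lat2, lon2) / 2);
}
```

The chord is always shorter than the surface distance, and the gap grows with separation (for antipodal points the ratio is 2/π, about 0.64), which is why the ratio between Cesium's raw numbers and the online calculator is not constant.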

If I have one lat/lng which I assume is at 0,0 then how do I calculate the x, y coordinates of another lat/lng pair?

I've seen many variations of this question asked but am having trouble relating their answers to my specific need.
I have several sets of 3 lat/lng coordinate pairs. The coordinates in any set are within a few km of each other.
For each set I would like to convert the coordinates to x/y values so that I can plot them.
I would like to assign 1 of the coordinates to 0,0 and then compute the relative x/y values of the other two coordinates.
This site does what I want but unfortunately doesn't share the algorithm:
http://www.whoi.edu/marine/ndsf/cgi-bin/NDSFutility.cgi?form=0&from=LatLon&to=XY

First some definitions just to be clear
let a be latitude <-pi/2,+pi/2>
let b be longitude <0,+2*pi>
let re,rp be the equatorial and polar radii of the Earth
a0,b0, a1,b1, a2,b2 are your points in spherical coordinates
and x0,y0, x1,y1, x2,y2 are your wanted cartesian coordinates
convert the coordinates so they are relative to (a0,b0)
Let's assume East is aligned with your X axis and North with your Y axis (all angles in radians)
x0=0.0;
y0=0.0;
r1=re*cos(a1); // radius of the circle of latitude through point 1 (spherical approximation)
r2=re*cos(a2); // radius of the circle of latitude through point 2
x1=x0+((b1-b0)*r1);
x2=x0+((b2-b0)*r2);
y1=y0+((a1-a0)*re); // on an ellipsoid, replace (a1-a0)*re with the meridian arc length from a0 to a1
y2=y0+((a2-a0)*re); // on an ellipsoid, replace (a2-a0)*re with the meridian arc length from a0 to a2
If re!=rp, the y1,y2 coordinates will be less accurate.
To correct that, replace ((a1-a0)*re) with the proper formula
for the arc length of an ellipse (the one used here is for a circle).
I am too lazy to compute that integral;
anyway, even this is good enough (the Earth's eccentricity is small).
You can also normalize the angle differences (a1-a0, b1-b0, ...) after subtraction, just to be safe:
while (a<-pi) a+=2.0*pi;
while (a>+pi) a-=2.0*pi;
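A runnable version of this local approximation, as a hedged sketch: it takes degrees and converts them to radians, treats the Earth as a sphere with radius re (the WGS84 equatorial radius, 6378.137 km, is my choice here), scales longitude differences by the radius of the circle of latitude through each point, re·cos(a), and wraps angle differences into <-pi,+pi> with the loops shown above. `toLocalXY` is a hypothetical name; for points within a few km of each other this is the usual equirectangular approximation.

```javascript
const RE_KM = 6378.137; // WGS84 equatorial radius in km, used as a sphere radius here

const toRad = deg => (deg * Math.PI) / 180;

// Wrap an angle difference into [-pi, pi] so longitudes across +-180 deg behave.
function wrapAngle(a) {
  while (a < -Math.PI) a += 2 * Math.PI;
  while (a > Math.PI) a -= 2 * Math.PI;
  return a;
}

// Local x/y in km of (lat, lon) relative to the origin (lat0, lon0); x = East, y = North.
function toLocalXY(lat0, lon0, lat, lon) {
  const a0 = toRad(lat0);
  const b0 = toRad(lon0);
  const a = toRad(lat);
  const b = toRad(lon);
  const r = RE_KM * Math.cos(a); // radius of the circle of latitude through the point
  return { x: wrapAngle(b - b0) * r, y: wrapAngle(a - a0) * RE_KM };
}
```

For the question's use case, call `toLocalXY(lat0, lon0, lat1, lon1)` and `toLocalXY(lat0, lon0, lat2, lon2)` with the first coordinate of each set as the origin.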

Actually, that's not entirely true. The site does share the algorithm, just not in the way one would expect to.
See http://www.whoi.edu/marine/ndsf/utility/NDSFutility.js .
Hope that helps.

Determine if a 2D point is within a quadrilateral

I'm working on a JS program that needs to determine whether points lie within four corners in a coordinate system.
Could somebody point me in the direction of an answer?
I'm looking at what I think is called a convex quadrilateral. That is, four pretty randomly chosen corner positions with all angles smaller than 180°.
Thanks.

There are two relatively simple approaches. The first approach is to draw a ray from the point to "infinity" (actually, to any point outside the polygon) and count how many sides of the polygon the ray intersects. The point is inside the polygon if and only if the count is odd.
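A minimal sketch of the first approach (ray casting with the even-odd rule), casting the ray horizontally toward +x. The polygon is assumed to be an array of `[x, y]` vertices in order; points exactly on an edge can go either way, as the paragraphs below warn. The function name is hypothetical.

```javascript
// Even-odd rule: cast a ray from p toward +x and toggle on every edge it crosses.
function insideRayCast(p, polygon) {
  const [x, y] = p;
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // Does edge j -> i straddle the horizontal line through p?
    if ((yi > y) !== (yj > y)) {
      // x coordinate where the edge crosses that line (yi !== yj here)
      const xCross = xj + ((y - yj) * (xi - xj)) / (yi - yj);
      if (x < xCross) inside = !inside;
    }
  }
  return inside;
}
```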
The second approach is to go around the polygon in order and, for every pair of consecutive vertices $v_i$ and $v_{i+1}$ (wrapping around to the first vertex if necessary), compute the quantity $(x - x_i)(y_{i+1} - y_i) - (x_{i+1} - x_i)(y - y_i)$. If these quantities all have the same sign, the point is inside the polygon. (These quantities are the Z component of the cross product $(p - v_i) \times (v_{i+1} - v_i)$. The condition that they all have the same sign is the same as the condition that $p$ is on the same side (left or right) of every edge.) This test only works for convex polygons, which your quadrilateral is.
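The sign test above, sketched directly from that formula. It assumes a convex polygon given as an array of `[x, y]` vertices in order (either orientation) and, as one possible choice for the edge case discussed next, treats a zero value (point on an edge's line) as inside. `insideConvex` is a hypothetical name.

```javascript
// Convex polygon test: (x - xi)(yn - yi) - (xn - xi)(y - yi) must have one sign for every edge.
function insideConvex(p, polygon) {
  const [x, y] = p;
  let sign = 0; // sign of the first nonzero value seen
  for (let i = 0; i < polygon.length; i++) {
    const [xi, yi] = polygon[i];
    const [xn, yn] = polygon[(i + 1) % polygon.length]; // wraps to the first vertex
    const q = (x - xi) * (yn - yi) - (xn - xi) * (y - yi);
    if (q === 0) continue; // on this edge's line: counted as inside (a choice)
    const s = q > 0 ? 1 : -1;
    if (sign === 0) sign = s;
    else if (s !== sign) return false;
  }
  return true;
}
```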
Both approaches need to deal with the case that the point is exactly on an edge or on a vertex. You first need to decide whether you want to count such points as being inside the polygon or not. Then you need to adjust the tests accordingly. Be aware that slight numerical rounding errors can give a false answer either way. It's just something you'll have to live with.
Since you have a convex quadrilateral, there's another approach. Pick any three vertices; in a quadrilateral they are always consecutive, say A, B and C, so the diagonal AC splits the quadrilateral into triangles ABC and ACD. Compute the barycentric coordinates of the point with respect to each triangle: if all three coordinates are non-negative for either triangle, the point is inside the quadrilateral.
P.S. Just found a nice page here that lists quite a number of strategies. Some of them are very interesting.

You need to use the winding number or the ray casting method.
With winding, you can determine whether any point is inside any shape built with line segments.
Basically, for each line segment you take the signed angle it subtends at the point (the cross product of the vectors from the point to the segment's two ends gives the sign), then add up all the results: a total of 0 means outside, a nonzero multiple of 2π means inside. That's the way I did it to decide if a star was in a constellation, given a set of constellation lines. There are other ways, too.
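A minimal winding-number sketch: for each edge, `Math.atan2(cross, dot)` of the vectors from the point to the edge's endpoints gives the signed angle the edge subtends, and these angles sum to about ±2π for a point inside and 0 for a point outside. The polygon format (`[x, y]` vertices in order) and the function names are assumptions; the total is rounded to an integer winding count.

```javascript
// Winding number of `polygon` around point p.
function windingNumber(p, polygon) {
  const [x, y] = p;
  let total = 0;
  for (let i = 0; i < polygon.length; i++) {
    const [ax, ay] = polygon[i];
    const [bx, by] = polygon[(i + 1) % polygon.length];
    const ux = ax - x, uy = ay - y; // vector p -> a
    const vx = bx - x, vy = by - y; // vector p -> b
    const cross = ux * vy - uy * vx; // its sign gives the turning direction
    const dot = ux * vx + uy * vy;
    total += Math.atan2(cross, dot); // signed angle from p->a to p->b, in [-pi, pi]
  }
  // total is (up to rounding) 2*pi times an integer
  return Math.round(total / (2 * Math.PI));
}

function insideWinding(p, polygon) {
  return windingNumber(p, polygon) !== 0;
}
```

For a counter-clockwise polygon an inside point gets winding number 1; a clockwise one gives -1.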
http://en.wikipedia.org/wiki/Point_in_polygon
There must be some code for this in a few places.

It is MUCH easier to see if a point lies within a triangle.
Any quadrilateral can be divided into two triangles.
If the point is in any of the two triangles that comprise the quadrilateral, then the point is inside the quadrilateral.
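A sketch of the two-triangle idea, assuming the quadrilateral's vertices `a, b, c, d` are given in order so that the diagonal a-c splits it into triangles abc and acd (for a convex quadrilateral either diagonal works). Each triangle test checks that the point is not on opposite sides of two of the triangle's edges; points on the boundary count as inside here, which is my choice. Names are hypothetical.

```javascript
// Point-in-triangle test via the signs of three cross products.
function insideTriangle(p, a, b, c) {
  // Z component of (v - u) x (p - u): which side of edge u -> v the point is on
  const side = (u, v) => (v[0] - u[0]) * (p[1] - u[1]) - (v[1] - u[1]) * (p[0] - u[0]);
  const d1 = side(a, b);
  const d2 = side(b, c);
  const d3 = side(c, a);
  const hasNeg = d1 < 0 || d2 < 0 || d3 < 0;
  const hasPos = d1 > 0 || d2 > 0 || d3 > 0;
  return !(hasNeg && hasPos); // boundary counts as inside
}

// Split quadrilateral a-b-c-d along diagonal a-c into triangles abc and acd.
function insideQuad(p, [a, b, c, d]) {
  return insideTriangle(p, a, b, c) || insideTriangle(p, a, c, d);
}
```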

In a spherical condition, given 3 points and their respective distances to a 4th point, how do I find its geolocation? [duplicate]

Closed 10 years ago.
Possible Duplicate:
Trilateration using 3 latitude and longitude points, and 3 distances
This is more of a math question than a programming question. Basically, I have P1:(lat1, lon1), P2:(lat2, lon2), P3:(lat3, lon3) and D1, D2, D3, and a 4th unknown point Px:(latx, lonx). P1, P2, P3 do not lie on the same great circle; D1 is the distance between P1 and Px, D2 is the distance between P2 and Px, etc.
How do I figure out the coordinates of Px?
=edited based on reply=
Thanks a lot!
PS. If you are going to point to any API, I would like it to be in JavaScript.

You have to understand that there will be multiple points that satisfy the mathematical constraint here. Think clearly, if you have two points on a sphere (ignore the geodesic form for now), P1 and P2, and you have another point T1, at distance x from P1 and distance y from P2, then you will have another symmetric (mirrored) point T1' which will satisfy the same distance conditions, on the other side, so to speak.
Even worse: consider the case of a sphere with a diameter D. Your P1 is at the North Pole, and your P2 is at the South Pole. Do you see that all points on the equator will satisfy your condition?
Apply this to your example: consider P1 at the North Pole and P2 at the South Pole, with distances to Px of D1 = D2 = 2πr/4. See the problem? All points on the equator satisfy this, not a single unique point. In fact, for this case, even if D1 != D2 (but D1 + D2 = πr), the points satisfying these constraints form a smaller circle parallel to the equator.
Too many Px's in your case, not one. To arrive at a single point on a spherical surface, the constraints would have to be more specific.
Lastly, establishing correctness of the context is important. Should your algorithm support all points that meet the criteria? Or should your criteria be altered such that the algorithm evaluates to a singular point, always. Be careful.
Some links to help you:
Wikipedia: http://en.wikipedia.org/wiki/Spherical_coordinates
SO: Plotting a point on the edge of a sphere
Updates, based on your three point example:
Again, there can be multiple points satisfying your criteria. What if P1, P2, P3 lie on the same arc? See the diagram below. Even with three points, there is no guarantee that there will be a single fourth point satisfying the distance criteria. Even with n points, there is no such guarantee.
In mathematical language, for a set of n random points and a set of distances from these individual points, the set of resulting points that satisfy the distance criteria MAY have more than one element.
You may be fooled into thinking: Oh, this guy is always assuming points lying on the same arc. Well, you are not making a special algorithm, are you? Your algo will be a generalized solution, won't it?
You need to guarantee that the points are not on the same arc (in a set of n points, I think at least 1 point cannot be on the same arc).
To keep the source points to a bare minimum: you need to establish triangular relations between points, because then, using ONLY two points, the triangle relation will yield exactly one point.
What triangle? Visualize this: you have two points and a third unknown point. All the distances you mention are SPHERICAL, i.e. curved-surface distances. Do you see that there are also flat (straight-line) distances between these points? Can you visualize that there is a plane passing through these points, slicing the sphere? I say this to emphasize that you do not need to worry about surface curvature (hence 3D steradian angles): you can see the underlying 2D triangle, whose unknown vertex is also the third point on the sphere's surface.
I know this may be very hard for you to visualize; I'll try making a diagram for this. (Can't find any good online tools!)
Lastly, this will be of significant help: Please read carefully.
Taken from Wikipedia: http://en.wikipedia.org/wiki/Great-circle_distance
The great-circle or orthodromic distance is the shortest distance between any two points on the surface of a sphere measured along a path on the surface of the sphere (as opposed to going through the sphere's interior). Because spherical geometry is different from ordinary Euclidean geometry, the equations for distance take on a different form. The distance between two points in Euclidean space is the length of a straight line from one point to the other. On the sphere, however, there are no straight lines. In non-Euclidean geometry, straight lines are replaced with geodesics. Geodesics on the sphere are the great circles (circles on the sphere whose centers are coincident with the center of the sphere).
Some more updates:
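Conversion from longitude/latitude to Cartesian coordinates is plain trigonometry (on a sphere of radius R: x = R cos(lat) cos(lon), y = R cos(lat) sin(lon), z = R sin(lat)); the Haversine formula is what gives the great-circle distance between two lat/lon points. See here: Converting from longitude\latitude to Cartesian coordinates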
and here: http://en.wikipedia.org/wiki/World_Geodetic_System
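One way to make the plane-slicing picture concrete, as a sketch under assumptions that are mine rather than the answer's: model the Earth as a sphere of radius 6371 km and turn each point into an Earth-centered unit vector. A surface distance d_i from P_i then puts Px on the plane X·P_i = cos(d_i/R), i.e. on a circle of the sphere, and three such planes meet in exactly one point when P1, P2, P3 are not on one great circle (the asker's condition). When they are on one great circle the 3×3 system is singular, which matches the ambiguity described above, and the sketch returns `null`. Function names are hypothetical, and inconsistent (noisy) distances only give an approximate point.

```javascript
const R_KM = 6371; // mean Earth radius; spherical model assumed

const rad = d => (d * Math.PI) / 180;
const deg = r => (r * 180) / Math.PI;

// Earth-centered unit vector for a lat/lon given in degrees.
function toUnit(lat, lon) {
  const a = rad(lat);
  const b = rad(lon);
  return [Math.cos(a) * Math.cos(b), Math.cos(a) * Math.sin(b), Math.sin(a)];
}

const dot = (u, v) => u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
const cross = (u, v) => [
  u[1] * v[2] - u[2] * v[1],
  u[2] * v[0] - u[0] * v[2],
  u[0] * v[1] - u[1] * v[0],
];

// Solve dot(X, Pi) = cos(Di / R) for i = 1..3 with Cramer's rule
// (the inverse of the matrix with rows P1, P2, P3 has columns P2xP3, P3xP1, P1xP2 over det).
function trilaterate(points, distancesKm) {
  const [p1, p2, p3] = points.map(([lat, lon]) => toUnit(lat, lon));
  const [c1, c2, c3] = distancesKm.map(d => Math.cos(d / R_KM));
  const n23 = cross(p2, p3);
  const n31 = cross(p3, p1);
  const n12 = cross(p1, p2);
  const det = dot(p1, n23);
  if (Math.abs(det) < 1e-12) return null; // P1, P2, P3 (nearly) on one great circle
  const x = [0, 1, 2].map(k => (c1 * n23[k] + c2 * n31[k] + c3 * n12[k]) / det);
  const len = Math.hypot(x[0], x[1], x[2]); // close to 1 when the distances are consistent
  const sinLat = Math.max(-1, Math.min(1, x[2] / len));
  return [deg(Math.asin(sinLat)), deg(Math.atan2(x[1], x[0]))]; // [lat, lon] in degrees
}
```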
