How to make k-means algorithm functional - javascript

I have a very basic implementation of k-means in JavaScript (I know, but it needs to run in the browser). What I would like to understand is: how could one make this more functional?
It is currently full of loops and extremely difficult to follow and reason about. Code below:
export default class KMeans {
constructor(vectors, k) {
this.vectors = vectors;
this.numOfVectors = vectors.length;
this.k = k || bestGuessK(this.numOfVectors);
this.centroids = randomCentroids(this.vectors, this.k);
}
classify(vector, distance) {
let min = Infinity;
let index = 0;
for (let i = 0; i < this.centroids.length; i++) {
const dist = distance(vector, this.centroids[i]);
if (dist < min) {
min = dist;
index = i;
}
}
return index;
}
cluster() {
const assigment = new Array(this.numOfVectors);
const clusters = new Array(this.k);
let movement = true;
while (movement) {
// update vector to centroid assignments
for (let i = 0; i < this.numOfVectors; i++) {
assigment[i] = this.classify(this.vectors[i], euclidean);
}
// update location of each centroid
movement = false;
for (let j = 0; j < this.k; j++) {
const assigned = [];
for (let i = 0; i < assigment.length; i++) {
if (assigment[i] === j) assigned.push(this.vectors[i]);
}
if (!assigned.length) continue;
const centroid = this.centroids[j];
const newCentroid = new Array(centroid.length);
for (let g = 0; g < centroid.length; g++) {
let sum = 0;
for (let i = 0; i < assigned.length; i++) {
sum += assigned[i][g];
}
newCentroid[g] = sum / assigned.length;
if (newCentroid[g] !== centroid[g]) {
movement = true;
}
}
this.centroids[j] = newCentroid;
clusters[j] = assigned;
}
}
return clusters;
}
}

It certainly can.
You could start with this:
classify(vector, distance) {
let min = Infinity;
let index = 0;
for (let i = 0; i < this.centroids.length; i++) {
const dist = distance(vector, this.centroids[i]);
if (dist < min) {
min = dist;
index = i;
}
}
return index;
}
Why is this a member function? Wouldn't a pure function const classify = (centroids, vector, distance) => {...} be cleaner?
Then for an implementation, let's change the distance signature a bit. If we curry it to const distance = (vector) => (centroid) => {...}, we can then write
const classify = (centroids, vector, distance) =>
minIndex (centroids .map (distance (vector)))
And if that distance API is out of our control, it's not much harder:
const classify = (centroids, vector, distance) =>
minIndex (centroids .map (centroid => distance (vector, centroid)))
Granted, we haven't written minIndex yet, but we've already broken the problem down to use a more meaningful abstraction. And minIndex isn't hard to write. You can do it imperatively as the original classify function did, or with something like this:
const minIndex = (xs) => xs.indexOf (Math.min (...xs))
Note that distance is a slightly misleading name here. I had to read it more carefully because I assumed a name like that would represent..., well a distance. Instead it's a function used to calculate distance. Perhaps the name metric or something like distanceFunction, distanceFn, or distanceImpl would be more obvious.
Now let's move on to this bit:
const newCentroid = new Array(centroid.length);
for (let g = 0; g < centroid.length; g++) {
let sum = 0;
for (let i = 0; i < assigned.length; i++) {
sum += assigned[i][g];
}
newCentroid[g] = sum / assigned.length;
if (newCentroid[g] !== centroid[g]) {
movement = true;
}
}
This code has two responsibilities: creating the newCentroid array, and updating the value of movement if any value has changed.
Let's separate those two.
First, creating the new centroid. We can clean up that nested for-loop to something like this:
const makeNewCentroid = (centroid, assigned) =>
centroid .map ((c, g) => mean (assigned .map ((a) => a[g])))
This depends on a mean function, which we'll write along with its required sum function like this:
const sum = (ns) => ns .reduce ((t, n) => t + n, 0)
const mean = xs => sum (xs) / xs.length
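Then we need to update movement. We can do that based on centroids and newCentroids, comparing coordinate by coordinate, since two distinct arrays are never === to each other:
movement = centroids.some((c, i) => c.some((x, g) => x !== newCentroids[i][g]))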
Obviously, you can continue in this manner. Each for loop should have a fundamental purpose. Find that purpose and see if one of the Array.prototype methods could better express it. For the second section we worked with above, we found two purposes, and just split them into two separate blocks.
This should give you a good start on making this more functional. There is no magic bullet. But if you think in terms of pure functions on immutable data, and on strong separation of concerns, you can usually move in a functional direction.
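To see how these pieces might fit together, here is a minimal sketch of a loop-free version. It is only one possible arrangement, not the asker's final code: the names squaredEuclidean, step, moved, and kMeans are hypothetical, the initial centroids are passed in rather than chosen randomly, and maxIterations is an added safety cap that the original does not have.

```javascript
// Helpers mirroring the ones discussed above.
const sum = (ns) => ns.reduce((t, n) => t + n, 0);
const mean = (xs) => sum(xs) / xs.length;
const minIndex = (xs) => xs.indexOf(Math.min(...xs));

// Assumption: squared Euclidean distance, which is enough to find the nearest centroid.
const squaredEuclidean = (a, b) => sum(a.map((x, i) => (x - b[i]) ** 2));

const classify = (centroids, vector, metric) =>
  minIndex(centroids.map((centroid) => metric(vector, centroid)));

// Mean of the assigned vectors, one dimension at a time.
const makeNewCentroid = (centroid, assigned) =>
  centroid.map((_, g) => mean(assigned.map((a) => a[g])));

// One iteration: assign every vector, then recompute every centroid.
// A centroid with no assigned vectors is left where it is.
const step = (vectors, centroids, metric) => {
  const assignment = vectors.map((v) => classify(centroids, v, metric));
  const clusters = centroids.map((_, j) =>
    vectors.filter((_, i) => assignment[i] === j));
  const newCentroids = centroids.map((c, j) =>
    clusters[j].length ? makeNewCentroid(c, clusters[j]) : c);
  return { clusters, centroids: newCentroids };
};

// True if any coordinate of any centroid changed.
const moved = (oldCs, newCs) =>
  oldCs.some((c, j) => c.some((x, g) => x !== newCs[j][g]));

// Repeat until nothing moves (or the hypothetical iteration cap is reached).
const kMeans = (vectors, centroids, metric = squaredEuclidean, maxIterations = 100) => {
  const next = step(vectors, centroids, metric);
  return maxIterations <= 1 || !moved(centroids, next.centroids)
    ? next
    : kMeans(vectors, next.centroids, metric, maxIterations - 1);
};

const result = kMeans([[0, 0], [0, 2], [10, 10], [10, 12]], [[0, 0], [10, 10]]);
console.log(result.centroids); // [[0, 1], [10, 11]]
```

Nothing here mutates its inputs, so each function can be tested on its own, which is most of what "more functional" buys you in this case.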


GPU-JS: Error: too many arguments for kernel

I was trying to make a simple script that uses the GPU to multiply arrays, but when I run the code, it shows the error in the title. I don't know whether it's my fault because I didn't install every library, or whether it's a bug.
Code is from gpu-js github example:
const { GPU } = require('gpu.js');
const gpu = new GPU();
const multiplyMatrix = gpu.createKernel(function(a, b) {
let sum = 0;
for (let i = 0; i < 512; i++) {
sum += a[this.thread.y][i] * b[i][this.thread.x];
}
return sum;
}).setOutput([512, 512]);
const c = multiplyMatrix(a, b);
Thanks in advance.
a and b are not defined; you need to define your matrices first and then call the function. Here's the full example from their website, comments mine:
// Function to create the 512x512 matrix
const generateMatrices = () => {
const matrices = [[], []]
for (let y = 0; y < 512; y++){
matrices[0].push([])
matrices[1].push([])
for (let x = 0; x < 512; x++){
matrices[0][y].push(Math.random())
matrices[1][y].push(Math.random())
}
}
return matrices
}
// Define the function to be run on the GPU
const { GPU } = require('gpu.js');
const gpu = new GPU();
const multiplyMatrix = gpu.createKernel(function(a, b) {
let sum = 0;
for (let i = 0; i < 512; i++) {
sum += a[this.thread.y][i] * b[i][this.thread.x];
}
return sum;
}).setOutput([512, 512])
// Create the matrices
const matrices = generateMatrices()
// Run multiplyMatrix using the 2 matrices created.
const out = multiplyMatrix(matrices[0], matrices[1])
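If you want to check that the kernel returns what you expect, one option is to compare it against a plain CPU version on small inputs. The sketch below is only such a reference: multiplyOnCpu is a hypothetical helper, not part of gpu.js, and it assumes square matrices stored as arrays of rows, as in the example above. GPU results are typically single precision, so compare with a tolerance rather than with ===.

```javascript
// CPU reference for a matrix product, mirroring the kernel body:
// out[y][x] = sum over i of a[y][i] * b[i][x].
// multiplyOnCpu is a hypothetical helper, not part of gpu.js.
const multiplyOnCpu = (a, b) =>
  a.map((row) =>
    b[0].map((_, x) =>
      row.reduce((sum, value, i) => sum + value * b[i][x], 0)));

const a = [[1, 2], [3, 4]];
const b = [[5, 6], [7, 8]];
console.log(multiplyOnCpu(a, b)); // [[19, 22], [43, 50]]
```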

matrix with different colors javascript

I need to build a 50x50 2D matrix of boxes with random colors. If two adjacent boxes end up with the same color, one of them should be given a new random color until they differ, and then the building should continue.
Here is my matrix with the boxes inside. It works fine, but the colors sometimes match:
function onLoad(evt)
{
var matrix = [];
for (var i = 0; i < 50; i++) {
var row = [];
for (var j = 0; j < 50; j++) {
var randColor = Math.floor(Math.random()*16777215).toString(16);
row.push(MyComponent(randColor));
}
matrix.push(row);
}
var newData = matrix.map(function(row) {
return row.map(function(x) {
return x;
})})
}
You need a way to determine whether a particular color is too close to another. One way to do this is with rgb-lab (or, less accurately, euclidean distance). Say you use rgb-lab's deltaE function, which takes two arguments, where each argument is a 3-item array of RGB amounts.
Generate your random colors such that you can get their components' decimal values easily, and so that you can also get their hex string representation easily. Then iterate over the filled adjacent indices and compare the colors. If they're too similar, try again. Something along the lines of:
const MINIMUM_DISTANCE = 25;
// Zero-pad each channel so the hex string always has six digits
const toHex = (n) => n.toString(16).padStart(2, '0');
const getColor = () => {
const r = Math.floor(Math.random() * 256);
const g = Math.floor(Math.random() * 256);
const b = Math.floor(Math.random() * 256);
const str = toHex(r) + toHex(g) + toHex(b);
return { rgb: [r, g, b], str };
};
const matrix = [];
for (let i = 0; i < 50; i++) {
const row = [];
for (let j = 0; j < 50; j++) {
let color;
let tooClose = false;
do {
color = getColor();
// Compare against the left neighbor and the neighbor above, when they exist
tooClose =
(row[j - 1] && deltaE(color.rgb, row[j - 1].rgb) < MINIMUM_DISTANCE) ||
(matrix[i - 1] && deltaE(color.rgb, matrix[i - 1][j].rgb) < MINIMUM_DISTANCE);
} while (tooClose);
row.push(color);
}
// Store the finished row so the next row can look at it
matrix.push(row);
}
Change the MINIMUM_DISTANCE as desired. See here for an explanation of the numbers.
Then you'll need to turn the color objects into an array of components with color strings at the end.
const arrayOfComponents = matrix.map(
row => row.map(
({ str }) => MyComponent(str)
)
);
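If you would rather avoid a dependency, the less accurate Euclidean option mentioned above can be sketched as follows. This is only an illustration: euclideanRgb, isTooClose, and the threshold MIN_RGB_DISTANCE are hypothetical names and values, and the threshold lives on a different scale (0 to roughly 441) than a deltaE threshold, so it would need tuning by eye.

```javascript
// Straight-line distance between two [r, g, b] arrays.
const euclideanRgb = ([r1, g1, b1], [r2, g2, b2]) =>
  Math.hypot(r1 - r2, g1 - g2, b1 - b2);

// Hypothetical threshold; not comparable to a deltaE threshold.
const MIN_RGB_DISTANCE = 60;

// True if the candidate is too close to the left or upper neighbor (when they exist).
const isTooClose = (candidate, left, above) =>
  [left, above].some(
    (neighbor) =>
      neighbor !== undefined &&
      euclideanRgb(candidate, neighbor) < MIN_RGB_DISTANCE
  );

console.log(euclideanRgb([0, 0, 0], [255, 255, 255]).toFixed(2)); // "441.67"
console.log(isTooClose([10, 10, 10], [12, 11, 10], undefined)); // true
console.log(isTooClose([10, 10, 10], undefined, [200, 10, 10])); // false
```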

Majority Element Leetcode with Hash Javascript Passing 1 test not the other

Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times.
You may assume that the array is non-empty and the majority element always exists in the array.
Example 1:
Input: [3,2,3]
Output: 3
Example 2:
Input: [6,5,5]
Output: 5
My code:
let newHash = {};
let local = 0;
let global = 0;
for (let num of nums){
if (newHash[num]){
newHash[num] += 1
} else {
newHash[num] = 1
}
for (let num in newHash){
console.log(num)
local = Math.max(newHash[num], newHash[num] + local)
if (local > global){
global = local
console.log(global)
}
console.log(local)
}
return num
}
};
I pass the first example [3,2,3] but fail on [6,5,5]. I know I am super close but cannot figure out how to pass.
Since the majority element will always exist, you can just find the element that has the highest frequency.
function solve(nums){
let newHash = {};
let max = 0, res;
for (let num of nums){
if (newHash[num]){
newHash[num] += 1;
} else {
newHash[num] = 1;
}
if(newHash[num] > max){
max = newHash[num];
res = num;
}
}
return res;
}
console.log(solve([6,5,5]));
I guess you're solving LeetCode's 169, for which this'll pass through:
const majorityElement = function(nums) {
const map = {};
for (let i = 0; i < nums.length; i++) {
map[nums[i]] = map[nums[i]] + 1 || 1;
if (map[nums[i]] > nums.length >> 1) {
return nums[i];
}
}
};
Or a bit easier to understand:
const majorityElement = function(nums) {
const map = {};
for (let i = 0; i < nums.length; i++) {
map[nums[i]] = map[nums[i]] + 1 || 1;
if (map[nums[i]] > nums.length / 2) {
return nums[i];
}
}
};
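Since the problem guarantees that a majority element exists, a different technique also works without any hash at all: the Boyer-Moore majority vote. This is a swapped-in alternative to the counting approaches above, shown as a minimal sketch; majorityByVote is a hypothetical name, and the result is only meaningful when a majority element really is present.

```javascript
// Boyer-Moore majority vote: one pass, constant extra space.
// Assumes a majority element exists, as the problem statement guarantees.
const majorityByVote = (nums) => {
  let candidate;
  let count = 0;
  for (const num of nums) {
    if (count === 0) candidate = num; // adopt a new candidate
    count += num === candidate ? 1 : -1; // matching values vote for, others against
  }
  return candidate;
};

console.log(majorityByVote([6, 5, 5])); // 5
console.log(majorityByVote([3, 2, 3])); // 3
```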
References
For additional details, you can see the Discussion Board. There are plenty of accepted solutions in a variety of languages, with explanations, efficient algorithms, and asymptotic time/space complexity analysis in there.
Array Length in JavaScript

How to code all for all cases of Two Sum javascript problem

I have been working on the two sum problem for the past few hours and can't seem to account for the case where there are only two numbers and their sum is the same as the first number doubled.
The result should be [0,1], but I'm getting [0,0].
let nums = [3,3];
let targetNum = 6;
function twoSum(nums, target) {
for (let i = 0; i < nums.length; i++) {
for (let b = i+1; b < nums.length; b++) {
if ((nums[i] + nums[b]) == target) {
return [nums.indexOf(nums[i]), nums.indexOf(nums[b])];
}
}
}
}
console.log(twoSum(nums, targetNum))
Two Sum
My approach uses a javascript object and completes the algorithm in O(n) time complexity.
const twoSum = (nums, target) => {
let hash = {}
for (let i = 0; i < nums.length; i++) {
if (hash[nums[i]]!==undefined) {
return [hash[nums[i]], i];
}
hash[target-nums[i]] = i;
}
};
console.log(twoSum([2,7,11,15], 9)); // example
This is not the way to solve the problem. Step through the array and save the complement of the target with respect to each number in the array. This will also solve your corner case.
Keep in mind that indexOf searches from the first element and returns the index of the first match. In your code, nums[i] and nums[b] are both 3, so nums.indexOf(nums[i]) and nums.indexOf(nums[b]) both return 0, because 3 is the first element of the array.
Instead of doing this, return the indices themselves.
let nums = [3,3];
let targetNum = 6;
function twoSum(nums, target) {
for (let i = 0; i < nums.length; i++) {
for (let b = i+1; b < nums.length; b++) {
if ((nums[i] + nums[b]) == target) {
return [i, b];
}
}
}
}
console.log(twoSum(nums, targetNum))
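The indexOf behavior is easy to confirm in isolation, and the single-pass idea can be written with a Map instead of a plain object. The following is a minimal sketch; twoSumWithMap and seen are hypothetical names, and returning null when no pair exists is an assumption, since the original problem always has a solution.

```javascript
const nums = [3, 3];
// indexOf always reports the first match, so both lookups land on index 0.
console.log(nums.indexOf(nums[0]), nums.indexOf(nums[1])); // 0 0

// Single pass: remember the index of each value seen so far.
const twoSumWithMap = (values, target) => {
  const seen = new Map(); // value -> index where it was seen
  for (let i = 0; i < values.length; i++) {
    const complement = target - values[i];
    if (seen.has(complement)) return [seen.get(complement), i];
    seen.set(values[i], i);
  }
  return null; // assumption: signal "no pair" with null
};

console.log(twoSumWithMap(nums, 6)); // [0, 1]
console.log(twoSumWithMap([2, 7, 11, 15], 9)); // [0, 1]
```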

Simple Feedforward Neural Network in JavaScript

I'm new to this site, so I apologize in advance if I'm doing anything wrong in this post.
I'm currently trying out machine learning, and I'm learning neural networks. I'm currently using http://neuralnetworksanddeeplearning.com/. However, I don't fully understand everything, and all of the code is written in Python (I'm more comfortable with JavaScript).
I've created a program that works for simple data. However, for more complicated data (handwritten digits recognition with MNIST data), the accuracy rate isn't nearly as high as the website above says it will be, by using a neural network of 784 input neurons, 10-400 hidden neurons in the hidden layer (only one hidden layer and tried several possible number of neurons), and 10 output neurons with hundreds of iterations. I think that there is an error with my back propagation step (i.e. the train step, I'm including the other functions here as reference) that prevents it from learning fast enough (BTW, I'm using the cross-entropy as my cost function). I would really appreciate if anyone can help me find the error. Thanks in advance.
Below is the code. The weights are arranged in an array of arrays of arrays (weights[i][j][k] is the weight between the jth neuron in the ith layer and the kth neuron in the (i+1)th layer). Similarly, biases[i][j] is the bias of the jth neuron in the (i+1)th layer. The training data is formatted as an array of objects with the keys input and output (see example below).
class NeuralNetwork {
constructor(layers) {
// Check if layers is a valid argument
// Initialize neural network
if (!Array.isArray(layers) || layers.length < 2) {
throw Error("Layers must be specified as an array of length at least 2");
}
this.weights = [];
this.biases = [];
for (let i = 0, l = layers.length; i < l; ++i) {
let currentLayer = layers[i];
if (typeof currentLayer === "number" && Number.isInteger(currentLayer) && currentLayer > 0) {
let numWeights = layers[i + 1];
if (i < l - 1) {
this.weights.push([]);
}
if (i) {
this.biases.push([]);
}
// Seed weights and biases
for (let j = 0; j < currentLayer; ++j) {
if (i < l - 1) {
let weights = [];
for (let k = 0; k < numWeights; ++k) {
weights.push(Math.random() * 2 - 1);
}
this.weights[i].push(weights);
}
if (i) {
this.biases[i - 1].push(Math.random() * 2 - 1);
}
}
} else {
throw Error("Array used to specify NeuralNetwork layers must consist solely of positive integers");
}
}
this.activation = (x) => 1 / (1 + Math.exp(-x));
this.activationDerivative = (x) => this.activation(x) * (1 - this.activation(x));
Object.freeze(this);
console.log("Successfully initialized NeuralNetwork");
return this;
}
run(input, training) {
// Forward propagation
let currentInput;
if (training) {
currentInput = [input.map((a) => {return {before: a, after: a}})];
} else {
currentInput = [...input];
}
for (let i = 0, l = this.weights.length; i < l; ++i) {
let newInput = [];
for (let j = 0, m = this.weights[i][0].length, n = (training ? currentInput[i] : currentInput).length; j < m; ++j) {
let sum = this.biases[i][j];
for (let k = 0; k < n; ++k) {
sum += (training ? currentInput[i][k].after : currentInput[k]) * this.weights[i][k][j];
}
if (training) {
newInput.push({
before: sum,
after: this.activation(sum)
});
} else {
newInput.push(this.activation(sum));
}
}
if (training) {
currentInput.push(newInput);
} else {
currentInput = newInput;
}
}
return currentInput;
}
train(data, learningRate = 0.1, batch = 50, iterations = 10000) {
// Backward propagation
console.log("Initialized training");
let length = data.length,
startTime = Date.now(), // referenced by the final log message
totalCost = 0,
learningRateFunction = typeof learningRate === "function",
batchCount = 0,
weightChanges = [],
biasChanges = [];
for (let i = 0; i < iterations; ++i) {
let rate = learningRateFunction ? learningRate(i, totalCost) : learningRate;
totalCost = 0;
for (let j = 0, l = length; j < l; ++j) {
let currentData = data[j],
result = this.run(currentData.input, true),
outputLayer = result[result.length - 1],
outputLayerError = [],
errors = [];
for (let k = 0, m = outputLayer.length; k < m; ++k) {
let currentOutputNeuron = outputLayer[k];
outputLayerError.push(currentOutputNeuron.after - currentData.output[k]);
totalCost -= Math.log(currentOutputNeuron.after) * currentData.output[k] + Math.log(1 - currentOutputNeuron.after) * (1 - currentData.output[k]);
}
errors.push(outputLayerError);
for (let k = result.length - 1; k > 1; --k) {
let previousErrors = errors[0],
newErrors = [],
currentLayerWeights = this.weights[k - 1],
previousResult = result[k - 1];
for (let i = 0, n = currentLayerWeights.length; i < n; ++i) {
let sum = 0,
currentNeuronWeights = currentLayerWeights[i];
for (let j = 0, o = currentNeuronWeights.length; j < o; ++j) {
sum += currentNeuronWeights[j] * previousErrors[j];
}
newErrors.push(sum * this.activationDerivative(previousResult[i].before));
}
errors.unshift(newErrors);
}
for (let k = 0, n = this.biases.length; k < n; ++k) {
if (!weightChanges[k]) weightChanges[k] = [];
if (!biasChanges[k]) biasChanges[k] = [];
let currentLayerWeights = this.weights[k],
currentLayerBiases = this.biases[k],
currentLayerErrors = errors[k],
currentLayerResults = result[k],
currentLayerWeightChanges = weightChanges[k],
currentLayerBiasChanges = biasChanges[k];
for (let i = 0, o = currentLayerBiases.length; i < o; ++i) {
let change = rate * currentLayerErrors[i];
for (let j = 0, p = currentLayerWeights.length; j < p; ++j) {
if (!currentLayerWeightChanges[j]) currentLayerWeightChanges[j] = [];
currentLayerWeightChanges[j][i] = (currentLayerWeightChanges[j][i] || 0) - change * currentLayerResults[j].after;
}
currentLayerBiasChanges[i] = (currentLayerBiasChanges[i] || 0) - change;
}
}
++batchCount;
if (batchCount % batch === 0 || i === iterations - 1 && j === l - 1) {
for (let k = 0, n = this.weights.length; k < n; ++k) {
let currentLayerWeights = this.weights[k],
currentLayerBiases = this.biases[k],
currentLayerWeightChanges = weightChanges[k],
currentLayerBiasChanges = biasChanges[k];
for (let i = 0, o = currentLayerWeights.length; i < o; ++i) {
let currentNeuronWeights = currentLayerWeights[i],
currentNeuronWeightChanges = currentLayerWeightChanges[i];
for (let j = 0, p = currentNeuronWeights.length; j < p; ++j) {
currentNeuronWeights[j] += currentNeuronWeightChanges[j] / batch;
}
currentLayerBiases[i] += currentLayerBiasChanges[i] / batch;
}
}
weightChanges = [];
biasChanges = [];
}
}
totalCost /= length;
}
console.log(`Training ended due to iterations reached\nIterations: ${iterations} times\nTime spent: ${(new Date).getTime() - startTime} ms`);
return this;
}
}
Example
Tests if a point is inside a circle. For this example, the neural network performs well. However, for more complicated examples such as handwriting recognition, the neural network performs really badly (best I can get for a single neural network is 70% accuracy, compared to the 96% accuracy stated in the website even when using similar parameters).
let trainingData = [];
for (let i = 0; i < 1000; ++i) {
let [x, y] = [Math.random(), Math.random()];
trainingData.push({input: [x, y], output: [Number(Math.hypot(x,y) < 1)]});
}
let brain = new NeuralNetwork([2, 5, 5, 1]);
brain.train(trainingData.slice(0,700), 0.1, 10, 500); // Accuracy rate 95.33% on the remaining 300 entries in trainingData
Ok, I guess I'm going to answer my own question. So, I don't think there is an error in my code and it's perfectly fine to use (albeit really, really inefficient) if anyone wants to.
The reason why my runs on the MNIST data did not give accurate answers is that I did not preprocess the data at first. The raw data gives the darkness of the 28*28 pixels in the range [0, 255], which I used directly as the input for each training example. The correct procedure here would be to convert this into the range [0, 1] or [-1, 1].
The reason that the [0, 255] range does not work as well is that the neurons in the hidden layer (the second layer of the network) will receive very large positive or negative inputs.
When the backpropagation algorithm computes the gradient, the change computed for each weight will be really small, as it is proportional to the slope of the activation function at the input to the neuron (the derivative of the logistic function is exp(-x)/(1+exp(-x))^2, which is close to 0 for very large positive and negative values of x). Thus, the neural network will take really long to train and, in my case, was not able to learn the data well.
With the correct method, I am able to achieve around 90% accuracy for a 784*200*10 neural network in a fairly short time, though it is still not nearly as accurate as what the author says he is able to achieve using an even simpler algorithm in the link mentioned in the question.
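For anyone who wants to see the effect in numbers, here is a minimal sketch of the preprocessing step and of how flat the logistic function gets for large inputs. The helper name normalizePixels is hypothetical, and it assumes the raw pixels arrive as a plain array of numbers in [0, 255].

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));
// Equivalent to exp(-x) / (1 + exp(-x))^2
const sigmoidDerivative = (x) => sigmoid(x) * (1 - sigmoid(x));

// Scale raw 0-255 pixel values into [0, 1] before feeding them to the network.
const normalizePixels = (pixels) => pixels.map((p) => p / 255);

console.log(normalizePixels([0, 51, 255])); // [0, 0.2, 1]
console.log(sigmoidDerivative(0)); // 0.25
console.log(sigmoidDerivative(30) < 1e-12); // true: the slope has all but vanished
```

With unscaled inputs, a weighted sum over hundreds of pixels can easily reach magnitudes like the 30 used here, which is why the updates become so small.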
