Longest Common Subsequence, required length on contiguous parts - javascript

I think I have a good enough grasp of the LCS algorithm from this page, specifically this pseudo-code implementation (m and n are the lengths of A and B):
int lcs_length(char * A, char * B) {
    allocate storage for array L;
    for (i = m; i >= 0; i--)
        for (j = n; j >= 0; j--) {
            if (A[i] == '\0' || B[j] == '\0') L[i,j] = 0;
            else if (A[i] == B[j]) L[i,j] = 1 + L[i+1, j+1];
            else L[i,j] = max(L[i+1, j], L[i, j+1]);
        }
    return L[0,0];
}
The L array is later backtracked to find the specific subsequence like so:
sequence S = empty;
i = 0;
j = 0;
while (i < m && j < n) {
    if (A[i]==B[j]) {
        add A[i] to end of S;
        i++; j++;
    }
    else if (L[i+1,j] >= L[i,j+1]) i++;
    else j++;
}
I have yet to rewrite this into JavaScript, but for now I know that the implementation at Rosetta Code works just fine. So to my questions:
1. How do I modify the algorithm to only return the longest common subsequence where the parts of the sequence are of a given minimum length?
For example, "thisisatest" and "thimplestesting" returns "thistest", with the contiguous parts "thi", "s" and "test". Let's define 'limit' as a minimum requirement of contiguous characters for it to be added to the result. With a limit of 3 the result would be "thitest" and with a limit of 4 the result would be "test". For my uses I would like to not only get the length, but the actual sequence and its indices in the first string. It doesn't matter if that needs to be backtracked later or not.
2. Would such a modification reduce the complexity or increase it?
From what I understand, analysing the entire suffix tree might be a way to find a subsequence that fits a limit. If that is correct, is it significantly more complex than the original algorithm?
3. Can you optimize the LCS algorithm, modified or not, with the knowledge that the same source string is compared to a huge amount of target strings?
Currently I'm just iterating through the target strings finding the LCS and selecting the string with the longest subsequence. Is there any significant preprocessing that could be done on the source string to reduce the time?
Answers to any of my questions are welcome, or just hints on where to research further.
Thank you for your time! :)
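One possible approach to question 1, offered as a sketch rather than a definitive answer: extend the LCS table with a second table holding the length of the common run that starts at each pair of positions, and only allow a match to contribute when it is part of a run of at least `limit` characters. The function name `lcsWithMinRun` and the shape of the returned object are assumptions made for illustration. The inner loop over run lengths makes this slower than plain LCS, roughly O(m * n * r) where r is the longest common run, so it increases rather than reduces the complexity.

```javascript
// Sketch: longest common subsequence in which every contiguous part
// has at least `limit` characters. Names here are hypothetical.
function lcsWithMinRun(a, b, limit) {
  const minRun = Math.max(1, limit); // guard: a part must have at least 1 char
  const m = a.length;
  const n = b.length;
  const grid = () =>
    Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  const run = grid();  // run[i][j]: length of the common run starting at a[i], b[j]
  const best = grid(); // best[i][j]: best total length using a[i..] and b[j..]

  // Fill both tables from the end of the strings, like the pseudo-code above.
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      if (a[i] === b[j]) run[i][j] = run[i + 1][j + 1] + 1;
      let value = Math.max(best[i + 1][j], best[i][j + 1]);
      // Try taking a contiguous part of every allowed length here.
      for (let len = minRun; len <= run[i][j]; len++) {
        value = Math.max(value, len + best[i + len][j + len]);
      }
      best[i][j] = value;
    }
  }

  // Backtrack to recover the parts and their indices in the first string.
  const parts = [];
  let i = 0;
  let j = 0;
  while (i < m && j < n && best[i][j] > 0) {
    if (best[i][j] === best[i + 1][j]) { i++; continue; }
    if (best[i][j] === best[i][j + 1]) { j++; continue; }
    // Neither skip keeps the optimum, so some part must start exactly here.
    for (let len = minRun; len <= run[i][j]; len++) {
      if (best[i][j] === len + best[i + len][j + len]) {
        parts.push({ index: i, text: a.slice(i, i + len) });
        i += len;
        j += len;
        break;
      }
    }
  }
  return { text: parts.map((p) => p.text).join(''), parts };
}

const withLimit3 = lcsWithMinRun('thisisatest', 'thimplestesting', 3);
console.log(withLimit3.text);  // "thitest"
console.log(withLimit3.parts); // parts start at indices 0 and 7 of the first string
console.log(lcsWithMinRun('thisisatest', 'thimplestesting', 4).text); // "test"
```

With a limit of 1 every single matching character is an allowed part, so the function should behave like the ordinary LCS.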

Related

How to make this javascript ordering consecutive integers ascending by swaps function more efficient?

The objective is to sort an unordered list of consecutive integers 1, 2, 3, ..., n into ascending order by swapping numbers (say, swapping the values at indices 3 and 6), and to find the minimum number of swaps needed. My code works, but times out (10 second limit) when given the edge case of 100000 integers. How can I streamline this code? To me it feels fairly minimal already, with no nested loops or anything. I'm not very experienced with efficiency evaluation, so any help would be appreciated, thanks.
function minimumSwaps(arr) {
var swaps = 0;
for (var i = 0; i < arr.length; i++) {
if (arr[i] !== i+1) {
var tempIndex = arr.indexOf(i+1);
var tempVal = arr[i];
arr[i] = i+1;
arr[tempIndex] = tempVal;
swaps += 1;
}
}
//console.log(swaps)
return swaps;
}
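Although the code above has no visible nested loop, `arr.indexOf(i+1)` is itself a linear scan, so the function as a whole is O(n^2). A minimal sketch of one way around that, assuming the input really is a permutation of 1..n: keep a lookup table from each value to its current index and update it on every swap. The `position` array is an addition for illustration and is not part of the original code.

```javascript
function minimumSwaps(arr) {
  // position[v] holds the current index of value v.
  // Assumes arr contains each of 1..n exactly once.
  const position = new Array(arr.length + 1);
  for (let i = 0; i < arr.length; i++) {
    position[arr[i]] = i;
  }

  let swaps = 0;
  for (let i = 0; i < arr.length; i++) {
    const wanted = i + 1;
    if (arr[i] !== wanted) {
      const from = position[wanted]; // O(1) lookup instead of indexOf
      const displaced = arr[i];
      // Swap the two values and keep the lookup table in sync.
      arr[from] = displaced;
      position[displaced] = from;
      arr[i] = wanted;
      position[wanted] = i;
      swaps += 1;
    }
  }
  return swaps;
}

console.log(minimumSwaps([4, 3, 1, 2])); // 3
```

Like the original, this sketch sorts the input array in place as a side effect.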

How does indexOf method on string work in JavaScript

I was doing a question on codility and came across this problem for which I wrote something like this:
function impact(s) {
let imp = 4; // max possible impact
for (let i = 0; i < s.length; i++) {
if (s[i] === 'A') return 1;
else if (s[i] === 'C') imp = Math.min(imp, 2);
else if (s[i] === 'G') imp = Math.min(imp, 3);
else if (s[i] === 'T') imp = Math.min(imp, 4);
}
return imp;
}
function solution(S, P, Q) {
const A = new Array(P.length);
for (let i = 0; i < P.length; i++) {
const s = S.slice(P[i], Q[i] + 1);
A[i] = impact(s);
}
return A;
}
And it failed all the performance tests.
Now I changed it to the following code which I thought would be slower but to my surprise it scored 100%:
function solution(S, P, Q) {
let A = []
for (let i = 0; i < P.length; i++) {
let s = S.slice(P[i], Q[i] + 1)
if (s.indexOf('A') > -1) A.push(1)
else if (s.indexOf('C') > -1) A.push(2)
else if (s.indexOf('G') > -1) A.push(3)
else if (s.indexOf('T') > -1) A.push(4)
}
return A
}
Which to me made no sense, because I was using 4 indexOf which should be slower than 1 linear iteration of the same string. But it's not.
So, how does String.indexOf() work and why are 4 .indexOf so much faster than 1 iteration?
In your first solution you have two loops. The second loop is in impact. That second loop corresponds roughly to the four indexOf you have in the second solution.
One iteration of the second loop will do at most 4 comparisons, and there will be at most n iterations. So this makes at most 4n comparisons. The same can be said of the indexOf solution. Each of these four indexOf calls may need to scan the whole string, which represents n comparisons. And so that also amounts to a worst case of 4n comparisons.
The main difference, however, is that the scanning that indexOf performs is not implemented in JavaScript, but in highly efficient pre-compiled code, while the first solution does this scanning with (slower) JavaScript code. As a rule of thumb, it is usually more efficient to use native String/Array methods (such as indexOf, slice, includes, ...) than to implement similar functionality with an explicit for loop.
Another thing to consider is that if there is an "A" in the data at position i, then the second solution will find it after i comparisons (internal to the indexOf implementation), while the first solution may need up to 4i comparisons, because it also makes the comparisons for the other three letters during the same iterations in which it looks for an "A". This extra cost decreases when there is no "A" but a "C" somewhere, and so on.
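To make the comparison counts concrete, here is a small model of both strategies. It only counts character comparisons and says nothing about real engine timings; both function names are made up for this illustration, and indexOf is modelled as a simple left-to-right scan.

```javascript
// Counts the comparisons made by the if / else-if chain in the explicit loop.
function comparisonsInLoop(s) {
  let count = 0;
  for (const ch of s) {
    for (const letter of 'ACGT') {
      count++;
      if (ch === letter) break; // the else-if chain stops at the first match
    }
    if (ch === 'A') return count; // the loop returns as soon as it sees an "A"
  }
  return count;
}

// Counts the comparisons made by the chained indexOf calls,
// modelling each indexOf as a scan for one letter at a time.
function comparisonsWithIndexOf(s) {
  let count = 0;
  for (const letter of 'ACGT') {
    const pos = s.indexOf(letter);
    count += pos === -1 ? s.length : pos + 1;
    if (pos !== -1) return count; // later indexOf calls are skipped
  }
  return count;
}

console.log(comparisonsInLoop('TTTA'));      // 13
console.log(comparisonsWithIndexOf('TTTA')); // 4
```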

How to improve performance of this Javascript/Cracking the code algorithm?

Here is the question below, with my answer to it. I know that because of the doubly nested for loop, the complexity is O(n^2), so I was wondering if there is a way to improve the big O of my algorithm/function.
// Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.
function removeDuplicates(str) {
let arrayString = str.split("");
let alphabetArray = [["a", 0],["b",0],["c",0],["d",0],["e",0],["f",0],["g",0],["h",0],["i",0],["j",0],["k",0],["l",0],["m",0],["n",0],["o",0],["p",0],["q",0],["r",0],["s",0],["t",0],["u",0],["v",0],["w",0],["x",0],["y",0],["z",0]]
for (let i=0; i<arrayString.length; i++) {
findCharacter(arrayString[i].toLowerCase(), alphabetArray);
}
removeCharacter(arrayString, alphabetArray);
};
function findCharacter(character, array) {
for (let i=0; i<array.length; i++) {
if (array[i][0] === character) {
array[i][1]++;
}
}
}
function removeCharacter(arrString, arrAlphabet) {
let finalString = "";
for (let i=0; i<arrString.length; i++) {
for (let j=0; j<arrAlphabet.length; j++) {
if (arrAlphabet[j][1] < 2 && arrString[i].toLowerCase() == arrAlphabet[j][0]) {
finalString += arrString[i]
}
}
}
console.log("The string with removed duplicates is:", finalString)
}
removeDuplicates("Hippotamuus")
The ASCII/Unicode character codes of all letters of the same case are consecutive. This allows for an important optimization: You can find the index of a character in the character count array from its ASCII/Unicode character code. Specifically, the index of the character c in the character count array will be c.charCodeAt(0) - 'a'.charCodeAt(0). This allows you to look up and modify the character count in the array in O(1) time, which brings the algorithm run-time down to O(n).
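A minimal sketch of that idea, keeping the behaviour of the question's code (letters that occur more than once are dropped entirely, case is ignored when counting, and characters outside a-z are dropped). The decision to return the string instead of logging it is a choice made here for illustration.

```javascript
function removeDuplicates(str) {
  const base = 'a'.charCodeAt(0);
  const counts = new Array(26).fill(0); // one counter per letter a-z

  // First pass: count each letter using its character code as the index.
  for (const ch of str.toLowerCase()) {
    const idx = ch.charCodeAt(0) - base;
    if (idx >= 0 && idx < 26) counts[idx]++;
  }

  // Second pass: keep only letters that occurred exactly once.
  let result = '';
  for (const ch of str) {
    const idx = ch.toLowerCase().charCodeAt(0) - base;
    if (idx >= 0 && idx < 26 && counts[idx] < 2) result += ch;
  }
  return result;
}

console.log(removeDuplicates('Hippotamuus')); // "Hiotams"
```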
There's a little trick to "without using any additional buffer," although I don't see a way to improve on O(n^2) complexity without using a hash map to determine if a particular character has been seen. The trick is to traverse the input string buffer (assume it is a JavaScript array since strings in JavaScript are immutable) and overwrite the current character with the next unique character if the current character is a duplicate. Finally, mark the end of the resultant string with a null character.
Pseudocode:
i = 1
pointer = 1
while string[i]:
    if not seen(string[i]):
        string[pointer] = string[i]
        pointer = pointer + 1
    i = i + 1
mark string end at pointer
The function seen could either take O(n) time and O(1) space or O(1) time and O(|alphabet|) space if we use a hash map.
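The pseudocode above could be rendered in JavaScript roughly as follows. This is a sketch under two assumptions: the input is already an array of characters (since strings are immutable), and `seen` is implemented with a Set, which is the O(1) time, O(|alphabet|) space variant. The function name `dedupeInPlace` is hypothetical, and truncating the array's `length` stands in for "mark string end".

```javascript
function dedupeInPlace(chars) {
  const seen = new Set(); // characters already written to the front
  let write = 0;          // next free slot for a unique character

  for (let read = 0; read < chars.length; read++) {
    if (!seen.has(chars[read])) {
      seen.add(chars[read]);
      chars[write] = chars[read]; // overwrite in place, no second array
      write++;
    }
  }
  chars.length = write; // "mark string end" by truncating the array
  return chars;
}

console.log(dedupeInPlace('bcabcc'.split('')).join('')); // "bca"
```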
Based on your description, I'm assuming the input is a string (which is immutable in JavaScript). I'm not sure what exactly "one or two additional variables" means, so based on your implementation I'm going to assume it's OK to use O(N) space. To improve time complexity, I think implementations differ according to the requirements for the output string.
Assumption1: the order of the outputted string is in the order that it appears the first time. eg. "bcabcc" -> "bca"
Suppose the length of s is N, the following implementation uses O(N) space and O(N) time.
function removeDuplicates(s) {
const set = new Set(); // use a set so that insertion and lookup time is O(1)
let res = "";
for (let i = 0; i < s.length; i++) {
if (!set.has(s[i])) {
set.add(s[i]);
res += s[i];
}
}
return res;
}
Assumption2: the outputted string has to be of ascending order.
You may use quick-sort to sort in place and then loop through the sorted array, adding each element to the result only when it differs from the last-seen element. Note that you may need to split the string into an array first. So the implementation would use O(N) space and the average time complexity would be O(NlogN).
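A short sketch of Assumption 2. It uses the built-in `Array.prototype.sort` rather than a hand-written quick-sort (the engine chooses the actual sorting algorithm), and the function name `removeDuplicatesSorted` is made up for this example. The default sort orders characters by UTF-16 code unit, so uppercase letters come before lowercase ones.

```javascript
function removeDuplicatesSorted(s) {
  const chars = s.split('').sort(); // default sort: ascending code-unit order
  let res = '';
  for (let i = 0; i < chars.length; i++) {
    // After sorting, duplicates are adjacent, so compare with the previous one.
    if (i === 0 || chars[i] !== chars[i - 1]) {
      res += chars[i];
    }
  }
  return res;
}

console.log(removeDuplicatesSorted('bcabcc')); // "abc"
```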
Assumption3: the result is the smallest in lexicographical order among all possible results. eg. "bcabcc" -> "abc"
The following implementation uses O(N) space and O(N) time.
const removeDuplicates = function(s) {
const stack = []; // stack and set are in sync
const set = new Set(); // use set to make lookup faster
const lastPos = getLastPos(s);
let curVal;
let lastOnStack;
for (let i = 0; i < s.length; i++) {
curVal = s[i];
if (!set.has(curVal)) {
while(stack.length > 0 && stack[stack.length - 1] > curVal && lastPos[stack[stack.length - 1]] > i) {
set.delete(stack[stack.length - 1]);
stack.pop();
}
set.add(curVal);
stack.push(curVal);
}
}
return stack.join('');
};
const getLastPos = (s) => {
// get the last index of each unique character
const lastPosMap = {};
for (let i = 0; i < s.length; i++) {
lastPosMap[s[i]] = i;
}
return lastPosMap;
}
I was unsure what was meant by:
...without using any additional buffer.
So I thought I would have a go at doing this in one loop, and let you tell me if it's wrong.
I have worked on the basis that the function you have provided gives the correct output, and that you were just looking for it to run faster. The function below gives the correct output and runs a lot faster with any large string with lots of duplication that I throw at it.
function removeDuplicates(originalString) {
let outputString = '';
let lastChar = '';
let lastCharOccurences = 1;
for (let char = 0; char < originalString.length; char++) {
outputString += originalString[char];
if (lastChar === originalString[char]) {
lastCharOccurences++;
continue;
}
if (lastCharOccurences > 1) {
outputString = outputString.slice(0, outputString.length - (lastCharOccurences + 1)) + originalString[char];
lastCharOccurences = 1;
}
lastChar = originalString[char];
}
console.log("The string with removed duplicates is:", outputString)
}
removeDuplicates("Hippotamuus")
Again, sorry if I have misunderstood the post...

find sequence of numbers in array and alert highest number

I have an array that looks like this:
[1,2,3,4,5,6,8,10,12,13,14,15,20,21,22,23,24,25,26,27,28,29,30]
the highest count on one of the found sequences would be: 10
My goal is to loop through the array and identify the sequences of numbers, then find the length of the highest sequence that exists.
So, based on the array above, the length of the longest sequence would be "10"
Does anyone know of quick and easy script to find this?
OK, I think I found a very short way of doing this (only 1 line for the for loop):
var arr = [1,2,3,4,5,6,8,10,12,13,14,15,20,21,22,23,24,25,26,27,28,29,30];
var res = new Array();
res[0] = 0;
for(var i=1;i<arr.length;i++) res[i] = (arr[i] == arr[i-1] + 1) ? (res[i-1] + 1) : 0;
var maxLength = Math.max.apply({},res);
This gives you (10) as the result. If you need (11) (which makes more sense), change the 0 to 1 in the for loop.
jsFiddle link: http://jsfiddle.net/gEzzA/8/
You don't need jQuery for this.
function longestSeq(arr) {
  var len = 0, longestLen = -1, prev = null;
  for (var i = 0; i < arr.length; ++i) {
    if (prev == null || arr[i] - 1 === prev)
      ++len;
    else {
      if (len > longestLen) longestLen = len;
      len = 1;
    }
    prev = arr[i]; // remember this value so the next iteration can detect a break
  }
  return longestLen > len ? longestLen : len;
}
What that does is keep track of how long it's been since a "break" has been seen. Each time a break is seen, it checks whether the longest so far is shorter than the last good run.
Here's the solution in pseudo code...
First, setup another array with the same number of elements and initialised to zero, to use as counters...
Array01:=[1,2,3,4,5,6,8,10,12,13,14,15,20,21,22,23,24,25,26,27,28,29,30]
Array02:=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Now the logic for filling in the counters...
FOR i:=0 TO LastElement DO BEGIN
j:=i;
WHILE (j<LastElement) AND (Array01[j+1]-Array01[j]=1) DO BEGIN Inc(Array02[i]); Inc(j); END;
END;
Now to scan who's got the highest sequence score...
which:=0; Value:=Array02[0];
FOR i:=0 TO LastElement DO
IF Array02[i]>Value THEN BEGIN Value:=Array02[i]; Which:=i; END;
So, at the end of this the highest sequence is held by Array element "Which" and the count is "Value"!
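The same result can be obtained in JavaScript in a single pass without the second array. The sketch below is one possible rendering, not a translation of the pseudo code line by line; the name `longestRun` and the returned `{ start, length }` object are assumptions. Note that it reports the number of elements in the run (11 for the sample array), whereas the counter approach above reports the number of steps (10).

```javascript
function longestRun(arr) {
  if (arr.length === 0) return { start: -1, length: 0 };

  let bestStart = 0;
  let bestLen = 1;
  let curStart = 0; // index where the current consecutive run began

  for (let i = 1; i < arr.length; i++) {
    if (arr[i] !== arr[i - 1] + 1) {
      curStart = i; // a break: the run restarts at this element
    }
    const curLen = i - curStart + 1;
    if (curLen > bestLen) {
      bestLen = curLen;
      bestStart = curStart;
    }
  }
  return { start: bestStart, length: bestLen };
}

const sample = [1,2,3,4,5,6,8,10,12,13,14,15,20,21,22,23,24,25,26,27,28,29,30];
console.log(longestRun(sample)); // { start: 12, length: 11 }
```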

Javascript regular expressions problem

I am creating a small Yahtzee game and I have run into some regex problems. I need to verify whether certain criteria are met. The fields one to six are very straightforward; the problem comes after that, for example when trying to create a regex that matches the ladder (the straight). The straight should contain each of the characters 1-5. It must contain exactly one of each to pass, but I can't figure out how to check for that. I was thinking /1{1}2{1}3{1}4{1}5{1}/g; but that only matches if they come in order. How can I check for them when they don't come in the correct order?
If I understood you right, you want to check if a string contains the numbers from 1 to 5 in random order. If that is correct, then you can use:
var s = '25143';
var valid = s.match(/^[1-5]{5}$/);
for (var i=1; i<=5; i++) {
if (!s.match(i.toString())) valid = false;
}
Or:
var s = '25143';
var valid = s.split('').sort().join('').match(/^12345$/);
Although this definitely can be solved with regular expressions, I find it quite interesting and educative to provide a "pure" solution, based on simple arithmetic. It goes like this:
function yahtzee(comb) {
if(comb.length != 5) return null;
var map = [0, 0, 0, 0, 0, 0];
for(var i = 0; i < comb.length; i++) {
var digit = comb.charCodeAt(i) - 48;
if(digit < 1 || digit > 6) return null;
map[digit - 1]++;
}
var sum = 0, p = 0, seq = 0;
for(var i = 0; i < map.length; i++) {
if(map[i] == 2) sum += 20;
if(map[i] >= 3) sum += map[i];
p = map[i] ? p + 1 : 0;
if(p > seq) seq = p;
}
if(sum == 5) return "Yahtzee";
if(sum == 23) return "Full House";
if(sum == 3) return "Three-Of-A-Kind";
if(sum == 4) return "Four-Of-A-Kind";
if(seq == 5) return "Large Straight";
if(seq == 4) return "Small Straight";
return "Chance";
}
for reference, Yahtzee rules
For simplicity and easiness, I'd go with indexOf.
string.indexOf(searchstring, start)
Loop 1 to 5 like Max but just check indexOf i, break out for any false.
This also will help for the small straight, which is only 4 out of 5 in order(12345 or 23456).
Edit: Woops. 1234, 2345, 3456. Sorry.
You could even have a generic function to check for straights of an arbitrary length, passing in the maximum loop index as well as the string to check.
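The generic check described above might look like the following sketch. It assumes the dice are given as a string of digit characters 1-6, and the function name `hasStraight` and its parameters are invented for this example.

```javascript
// Returns true if `dice` contains `runLength` consecutive face values.
function hasStraight(dice, runLength) {
  // Try every possible starting face: 1..(6 - runLength + 1).
  for (let start = 1; start + runLength - 1 <= 6; start++) {
    let complete = true;
    for (let face = start; face < start + runLength; face++) {
      if (dice.indexOf(String(face)) === -1) {
        complete = false; // this face is missing, so this straight fails
        break;
      }
    }
    if (complete) return true;
  }
  return false;
}

console.log(hasStraight('25143', 5)); // true  (1-2-3-4-5)
console.log(hasStraight('13426', 4)); // true  (1-2-3-4)
console.log(hasStraight('11256', 4)); // false
```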
"12543".split('').sort().join('') == '12345'
With regex:
return /^([1-5])(?!\1)([1-5])(?!\1|\2)([1-5])(?!\1|\2|\3)([1-5])(?!\1|\2|\3|\4)[1-5]$/.test("15243");
(Not that it's recommended...)
A regexp is likely not the best solution for this problem, but for fun:
/^(?=.*1)(?=.*2)(?=.*3)(?=.*4)(?=.*5).{5}$/.test("12354")
That matches every string that contains exactly five characters, being the numbers 1-5, with one of each.
(?=.*1) is a positive lookahead, essentially saying "to the very right of here, there should be whatever or nothing followed by 1".
Lookaheads don't "consume" any part of the input string, so each number check starts off at the beginning of the string.
Then there's .{5} to actually consume the five characters, to make sure there's the right number of them.
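A few quick checks of the lookahead pattern, wrapped in a helper whose name (`isOneToFiveStraight`) is made up for this example:

```javascript
// Each lookahead asserts that one digit occurs somewhere in the string;
// .{5} then requires the string to be exactly five characters long.
function isOneToFiveStraight(s) {
  return /^(?=.*1)(?=.*2)(?=.*3)(?=.*4)(?=.*5).{5}$/.test(s);
}

console.log(isOneToFiveStraight('12354'));  // true
console.log(isOneToFiveStraight('11234'));  // false: no 5
console.log(isOneToFiveStraight('123456')); // false: six characters
```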
