Determine the shortest possible match of a regular expression

I have a randomly ordered array of regular expressions like this:
let patterns = [
/foo+ba+r/,
/foo/,
/foo+bar/,
/foobar/,
/m[eo]{4,}w/,
/boo/,
/fooo*/,
/meow/
]
I'm not sure if this is possible but I would like to write an algorithm which sorts the regular expressions from least greedy to most greedy, like this:
[
/foo/,
/boo/,
/fooo*/,
/meow/,
/foobar/,
/foo+bar/,
/m[eo]{4,}w/,
/foo+ba+r/
]
I would imagine such sorting could be achieved like so:
patterns.sort((p1, p2) => p1.greediness() - p2.greediness());
But there is no method called greediness in the RegExp class.
Ideally, the greediness method would return the number of characters which could be possibly matched at minimum. i.e:
/foo/.greediness() == 3
/boo/.greediness() == 3
/fooo*/.greediness() == 3
/meow/.greediness() == 4
/foobar/.greediness() == 6
/foo+bar/.greediness() == 6
/m[eo]{4,}w/.greediness() == 6
/foo+ba+r/.greediness() == 6
What would your solution be to this problem?

This is indeed a very difficult problem, which requires the ability to parse a regular expression. (If the rules for regular expressions were reduced so that the only allowed input were ordinary characters and the special regex characters ()[]|*+?, constructing such a parser would not be too difficult.) Given such a parser, this would be my approach:
Convert the regular expression to a Nondeterministic Finite Automaton (NFA). This step is what requires a good regex parser, but once you have that, the NFA construction is quite straightforward. Of course, if you can find a ready-made regular-expression-to-NFA implementation, that would be ideal.
Construct a directed, weighted graph representation of the NFA giving weight 1 to edges that represented a character transition and 0 to edges that represented an epsilon transition.
Using Dijkstra's algorithm find the shortest path from the initial state of the NFA to the final state.
Let's take as an example the regex m[eo]{2,}w. Converted to a NFA with the appropriate edges marked with the weight above the edge and the character causing the state transition marked below the edge we get:
If an edge was defined by a length-3 array of elements consisting of [from-state, to-state, weight], the array of edges for the above digraph would be:
const edges = [
[0, 1, 1],
[1, 2, 0],
[1, 3, 0],
[2, 4, 1],
[3, 5, 1],
[4, 6, 0],
[5, 6, 0],
[6, 7, 0],
[6, 8, 0],
[7, 9, 1],
[8, 10, 1],
[9, 11, 0],
[10, 11, 0],
[11, 6, 0],
[11, 12, 1]
];
Applying Dijkstra's algorithm to get the shortest path from state 0 to state 12 produces a length of 4 with the following path:
0 -> 1 -> 3 -> 5 -> 6 -> 8 -> 10 -> 11 -> 12
And thus the length of the shortest string recognized by the regex is 4.
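The shortest-path step can be sketched as follows. This is a minimal illustration, not a tuned implementation: the names nfaEdges and shortestPathLength are hypothetical, and the next state is chosen with a plain linear scan rather than a priority queue, which is enough for a graph this small.

```javascript
// Edges are [from, to, weight]: weight 1 consumes a character, weight 0 is an epsilon move.
const nfaEdges = [
  [0, 1, 1], [1, 2, 0], [1, 3, 0], [2, 4, 1], [3, 5, 1],
  [4, 6, 0], [5, 6, 0], [6, 7, 0], [6, 8, 0], [7, 9, 1],
  [8, 10, 1], [9, 11, 0], [10, 11, 0], [11, 6, 0], [11, 12, 1]
];

// Dijkstra's algorithm with a linear scan for the closest unvisited state.
function shortestPathLength(edges, start, goal) {
  const adjacency = new Map();
  for (const [from, to, weight] of edges) {
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push([to, weight]);
  }
  const dist = new Map([[start, 0]]);
  const done = new Set();
  while (true) {
    let current = null;
    for (const [state, d] of dist) {
      if (!done.has(state) && (current === null || d < dist.get(current))) {
        current = state;
      }
    }
    if (current === null) return Infinity; // goal is unreachable
    if (current === goal) return dist.get(current);
    done.add(current);
    for (const [to, weight] of adjacency.get(current) || []) {
      const candidate = dist.get(current) + weight;
      if (!dist.has(to) || candidate < dist.get(to)) dist.set(to, candidate);
    }
  }
}

console.log(shortestPathLength(nfaEdges, 0, 12)); // 4
```

Because every weight is 0 or 1, a 0-1 breadth-first search would also work here; Dijkstra is used only because it is the algorithm named in the answer.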
So now all you need to do is find or code a JavaScript regular expression to NFA algorithm and a Dijkstra algorithm.
Update
If you are creating your own regex parser, then you can actually bypass creating the NFA and Dijkstra algorithm and compute the length directly. The following does not purport to be a full parser. For example, it does not support named groups and it only recognizes the basic "stuff."
/*
Grammar (my own extended BNF notation) where [token(s)]? denotes an optional token or tokens
and [token(s)]+ denotes one or more of these tokens.
E -> F E'
E' -> '|' E
F -> [SINGLE_CHAR FOLLOWER | '(' ['?:']? E ')' FOLLOWER]+ | epsilon
SINGLE_CHAR -> CHAR | '[' BRACKET_CHARS ']'
FOLLOWER -> EXPRESSION_FOLLOWER NON_GREEDY
BRACKET_CHARS -> CHAR BRACKET_CHARS | epsilon
EXPRESSION_FOLLOWER -> '*' | '+' | '?' | '{' number [',' [number]?]? '}' | epsilon
NON_GREEDY -> '?' | epsilon
*/
const EOF = 0;
const CHAR = 1;
let current_char;
let current_token = null;
let tokenizer;
function* lexer(s) {
// Produce next token:
const single_character_tokens = '?*+{}[](),|';
const l = s.length;
let i = 0;
while (i < l) {
current_char = s[i++];
if (single_character_tokens.indexOf(current_char) != -1) {
// the current character is the token to yield:
yield current_char;
}
else {
if (current_char == '\\') {
if (i < l) {
current_char = s[i++];
if (current_char >= '0' && current_char <= '9') {
throw 'unsupported back reference';
}
if (current_char == 'b' || current_char == 'B') {
continue; // does not contribute to the length
}
}
else {
throw 'invalid escape sequence';
}
}
else if (current_char == '^' || current_char == '$') {
continue; // does not contribute to length
}
yield CHAR; // the actual character is current_char
}
}
yield EOF;
}
function FOLLOWER() {
// return a multiplier
if (current_token === '?' || current_token === '*' || current_token === '+') {
const l = current_token === '+' ? 1 : 0;
current_token = tokenizer.next().value;
if (current_token === '?') { // non-greedy
current_token = tokenizer.next().value;
}
return l;
}
if (current_token === '{') {
current_token = tokenizer.next().value;
let s = '';
while (current_token !== '}' && current_token !== EOF) {
if (current_token === EOF) {
throw 'syntax error';
}
s += current_char;
current_token = tokenizer.next().value;
}
current_token = tokenizer.next().value;
const matches = s.match(/^(\d)+(,\d*)?$/);
if (matches === null) {
throw 'syntax error';
}
return parseInt(matches[0]);
}
return 1;
}
function F() {
let l = 0;
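// test the token, not the raw character: after an escaped \( or \[ the last character read is still '(' or '['
while (current_token === CHAR || current_token === '(' || current_token === '[') {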
if (current_token === CHAR || current_token === '[') {
if (current_token == CHAR) {
current_token = tokenizer.next().value;
}
else {
current_token = tokenizer.next().value;
if (current_token == ']') {
current_token = tokenizer.next().value;
// empty []
FOLLOWER();
continue;
}
while (current_token != ']' && current_token != EOF) {
current_token = tokenizer.next().value;
}
if (current_token !== ']') {
throw 'syntax error';
}
current_token = tokenizer.next().value;
}
const multiplier = FOLLOWER();
l += multiplier;
}
else if (current_token === '(') {
current_token = tokenizer.next().value;
if (current_token === '?') { // non-capturing group
current_token = tokenizer.next().value;
if (current_token !== CHAR || current_char !== ':') {
throw 'syntax error';
}
current_token = tokenizer.next().value;
}
const this_l = E();
if (current_token !== ')') {
throw 'syntax error';
}
current_token = tokenizer.next().value;
const multiplier = FOLLOWER();
l += this_l * multiplier;
}
else {
throw 'syntax error';
}
}
return l;
}
function E() {
let min_l = F();
while (current_token === '|') {
current_token = tokenizer.next().value;
const l = F();
if (l < min_l) {
min_l = l;
}
}
return min_l;
}
function parse(s) {
tokenizer = lexer(s);
current_token = tokenizer.next().value;
const l = E();
if (current_token !== EOF) {
throw 'syntax error';
}
return l;
}
let patterns = [
new RegExp(''),
/(?:)[]()/,
/abc|$/,
/^foo+ba+r$/,
/foo+ba+r/,
/foo/,
/foo+bar/,
/foobar/,
/m([eo]{4,})w/,
/m(?:[eo]{4,})w/,
/boo/,
/fooo*/,
/meow/,
/\b\d+\b/
];
RegExp.prototype.greediness = function () {
return parse(this.source);
};
let prompt_msg = 'Try your own regex without the / delimiters:';
while (true) {
const regex = prompt(prompt_msg);
try {
console.log(`You entered '${regex}' and its greediness is ${new RegExp(regex).greediness()}.`);
break;
}
catch (error) {
prompt_msg = `Your input resulted in ${error}. Try again:`;
}
}
console.log('\nSome tests:\n\n');
for (const pattern of patterns) {
console.log(`pattern = ${pattern.source}, greediness = ${pattern.greediness()}`);
}

As Pointy said in the comments, this is a hard problem.
Here is the beginning of a solution:
const greediness = (s) =>
s .toString () .slice (1, -1)
.replace (/\[[^\]]+]/g, 'X')
.replace (/.\{((\d+)(,\d*))\}/g, (s, a, ds, _) => 'X' .repeat (Number (ds)))
.replace (/.\+/g, 'X')
.replace (/.\?/g, '')
.replace (/.\*/g, '')
.length
const sortByGreediness = (patterns) =>
[...patterns] .sort ((a, b) => greediness (a) - greediness (b))
// .map (s => [s, greediness (s)]) // to show sizes
const patterns = [/foo+ba+r/, /foo/, /foo+bar/, /foobar/, /m[eo]{4,}w/, /boo/, /fooo*/, /meow/]
console .log (sortByGreediness (patterns))
We simply take the text of the regex and replace quantifiers and their preceding characters with the smallest number of characters that might match. We do something similar for blocks like [eo] and X{4,}.
Thus we might go through steps like this:
m[eo]{4,}wp+u+r?r*
mX{4,}wp+u+r?r*
mXXXXwp+u+r?r*
mXXXXwXXr?r*
mXXXXwXXr*
mXXXXwXX
- length 8
But this doesn't touch on the complexities that can be inside a regex, and doesn't even try to handle capturing groups. I think it would be next to impossible to do for the full regex spec, but perhaps this can be expanded toward what you need.
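To watch the replacement chain work on a concrete input, here is a small sketch that records the string after each step. The names replacementSteps and traceGreediness are hypothetical; the regular expressions are the ones from the answer, with the unused capture groups dropped.

```javascript
// Each entry pairs a pattern with a function producing its minimal-length stand-in.
const replacementSteps = [
  [/\[[^\]]+]/g, () => 'X'],                                          // [eo]  -> X
  [/.\{(\d+)(,\d*)\}/g, (_, digits) => 'X'.repeat(Number(digits))],   // X{4,} -> XXXX
  [/.\+/g, () => 'X'],                                                // p+    -> X
  [/.\?/g, () => ''],                                                 // r?    -> (nothing)
  [/.\*/g, () => '']                                                  // r*    -> (nothing)
];

// Returns the source followed by the result of every replacement step.
function traceGreediness(source) {
  const trace = [source];
  let current = source;
  for (const [pattern, replacer] of replacementSteps) {
    current = current.replace(pattern, replacer);
    trace.push(current);
  }
  return trace;
}

const trace = traceGreediness('m[eo]{4,}wp+u+r?r*');
console.log(trace.join('\n'));
console.log(trace[trace.length - 1].length); // 8
```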
If you're getting more complex, you might want to do this repeatedly with code something like the following, or with a while loop in place of its recursion.
const greediness = (s, next = s .toString () .slice (1, -1), prev = null) =>
next === prev
? next .length
: greediness (s, next
.replace (/\[[^\]]+]/g, 'X')
.replace (/.\{((\d+)(,\d*))\}/g, (s, a, ds, _) => 'X' .repeat (Number (ds)))
.replace (/.\+/g, 'X')
.replace (/.\?/g, '')
.replace (/.\*/g, '')
, next)

Related

Improving combinations from abc[d[e,f],gh] pattern algorithm

I wrote an algorithm that is inadequate, namely because it does not handle [,abc] cases (see the string variations and conditions below), and would like to know how it can be improved so it covers those cases:
Given
Pattern, that describes strings variations: abc[de[f,g],hk], which gives
abcdef
abcdeg
abchk
A pattern consists of "arrays" preceded by strings: abc[...], and plain strings: adj, kg, q.
Another possible more complex example: utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]].
Conditions
Strings themselves can contain only letters and numbers. There cannot be abc[h\,k,b] or abc[h\[k,b], which would give abch,k or abch[k.
"Arrays" are never empty and always have at least 2 elements.
"Array" and "string only" values can appear in any order, e.g. abc[a,b[c,d]] or abc[a[b,c],d]. The order is strictly left to right: the pattern abc[d,e] cannot produce the combinations eabc or dabc.
abc[d,e] does not give abcde or abced, only abcd and abce.
A pattern always starts with a string followed by an array: something[...].
There can be a string without an array: abc[a,bc[d,f]], but an array without a string is not allowed: abc[a,[d,f]].
There can be an empty string, e.g. a[,b], which gives a and ab.
My solution
function getStrings(pat) {
if(pat.indexOf('[') == -1)
return pat;
String.prototype.insert = function(index, string) {
if (index > 0) {
return this.substring(0, index) + string + this.substr(index);
}
return string + this;
};
function getArray(str, start, isSource = false) {
if (start < 0) return null;
var n = 0;
var ret = "";
var i = start;
for (; i < str.length; i++) {
if (str[i] == "[") n++;
else if (str[i] == "]") n--;
if (n == 0) break;
}
var ret = {
str: "",
arr: "",
end: 0,
};
ret.arr = str.slice(start, i) + "]";
ret.end = i;
start--;
var end = start;
for (
;
start > 0 &&
str[start] != "," &&
str[start] != "]" &&
str[start] != "[";
start--
) {}
if(!isSource)
start++;
end++;
ret.str = str.slice(start, end);
return ret;
}
function getElement(source, start) {
var ret = [];
start++;
for (
;
start < source.length && source[start] != "," && source[start] != "]";
start++
)
ret[ret.length] = source[start];
return ret;
}
var source = getArray(pat, pat.indexOf("["), true); // parsing
var ar = source.arr;
source.arrs = getArrays(source); // parsing
source.source = true;
var fi = "";
var temp_out = [];
var tol = 0;
return getVariations(source); // getting variations of parsed
function getVariations(source) {
if (source.arrs == undefined) {
} else
for (var i = 0; i < source.arrs.length; i++) {
if (source.source) fi = source.str;
if (!source.arrs[i].arrs) {
temp_out[tol] = fi + source.arrs[i].str;
tol++;
} else {
var lafi = fi;
fi += source.arrs[i].str;
getVariations(source.arrs[i]);
if(i != source.arrs.length - 1)
fi = lafi;
}
if (source.source && i == source.arrs.length - 1) {
var temp = temp_out;
temp_out = [];
tol = 0;
return temp;
}
}
}
function getArrays(source) {
var la = 1;
var start = 0;
var arrs = [];
if (!source.arr) return;
while (start != -1) {
start = source.arr.indexOf("[", la);
var qstart = source.arr.indexOf(",", la);
if(source.arr[la] == ',')
qstart = source.arr.indexOf(",", la+1);
var pu = false;
if(qstart != la && qstart != -1 && qstart < start && start != -1)
{
pu = true;
var str = source.arr;
var buf = [];
qstart--;
var i = -1;
for(i = qstart; i > 0 && str[i] != '[' && str[i] != ','; i--)
{}
i++;
for(; i < str.length && str[i]!= ','; i++)
{
buf[buf.length] = str[i];
}
if(buf.length == 0)
{
la = start;
alert("1!")
}
else
{
buf = buf.join('');
arrs[arrs.length] = {str:buf};
la += buf.length+1;
}
}
else
if (start != -1) {
arrs[arrs.length] = getArray(source.arr, start);
la = arrs[arrs.length - 1].end + 1;
} else {
start = source.arr.indexOf(",", la);
if (start != -1) {
var ret = getElement(source.arr, start);
arrs[arrs.length] = ret;
la += ret.length;
}
}
}
for (var i = 0; i < arrs.length; i++)
if (typeof arrs[i] != "string" && arrs[i].arr) {
arrs[i].arrs = getArrays(arrs[i]);
var st = arrs[i].arr;
if (occ(arrs[i].arr, "[") == 1 && occ(arrs[i].arr, "]") == 1) {
st = st.replaceAll("[", '["');
st = st.replaceAll("]", '"]');
st = st.replaceAll(",", '","');
st = JSON.parse(st);
for (var j = 0; j < st.length; j++) st[j] = { str: st[j] };
arrs[i].arrs = st;
}
} else if (typeof arrs[i] == "string") {
arrs[i] = { str: arrs[i] };
}
RecursArrs(arrs);
return arrs;
}
function RecursArrs(arrs) {
for (var i = 0; i < arrs.length; i++) {
if (!arrs[i].source)
if (arrs[i].arr) {
delete arrs[i].arr;
delete arrs[i].end;
}
if (!arrs[i].str) {
try{
arrs[i] = { str: arrs[i].join("") };
}catch(er)
{
arrs[i] = {str:''};
}
if (i && arrs[i - 1].str == arrs[i].str) {
arrs.splice(i, 1);
i--;
}
} else if (arrs[i].arrs) RecursArrs(arrs[i].arrs);
}
}
function occ(string, word) {
return string.split(word).length - 1;
}
}
// getStrings('IE5E[COR[R[,G[A,E,I]],S,T,U,V,W,X,Y,Z],EORRG[I,M]]')

I would use a regular expression to break up the input into tokens. In this case I chose to take pairs of (letters, delimiter), where the delimiter is one of "[", "]", ",". The letters part could be empty.
Then I would use a recursive function like you did, but I went for a recursive generator function.
Here is the suggested implementation:
function* getStrings(pattern) {
const tokens = pattern.matchAll(/([^[\],]*)([[\],])?/g);
function* dfs(recur=false) {
let expectToken = true;
while (true) {
const [, token, delim] = tokens.next().value;
if (delim === "[") {
for (const deep of dfs(true)) yield token + deep;
} else {
if (token || expectToken) yield token;
if (delim === "]" && !recur) throw "Invalid pattern: too many `]`";
if (!delim && recur) throw "Invalid pattern: missing `]`";
if (delim !== ",") return;
}
expectToken = delim !== "["; // After [...] we don't expect a letter
}
}
yield* dfs();
}
const input = 'IE5E[COR[R[,G[A,E,I]],S,T,U,V,W,X,Y,Z],EORRG[I,M]]';
for (const s of getStrings(input))
console.log(s);
This implementation should match the patterns according to the given restrictions, but it will also allow the following:
An "array" can start without a prefix of letters. So [a,b] is allowed and will produce the same output as a,b.
An "array" may be followed immediately by letters or a new "array", but this will be interpreted as if they were separated by a comma. So x[a,b]c will be interpreted as x[a,b],c
An "array" can be empty. In that case the array is ignored. So x[] is the same as x.
There is some basic error checking: an error will be generated when the brackets are not balanced.
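To make the tokenizing step concrete, the following sketch lists the (letters, delimiter) pairs that the regular expression used in getStrings produces for a small pattern. The helper name tokenize is hypothetical and is not part of the answer's code.

```javascript
// Same regular expression as in getStrings: optional letters, then an optional delimiter.
const tokenPattern = /([^[\],]*)([[\],])?/g;

// Collect every match as a [letters, delimiter] pair.
function tokenize(pattern) {
  return [...pattern.matchAll(tokenPattern)].map(([, letters, delim]) => [letters, delim]);
}

const tokens = tokenize('abc[de[f,g],hk]');
console.log(tokens);
// The pairs are: abc + "[", de + "[", f + ",", g + "]", "" + ",", hk + "]",
// followed by a final empty match with no delimiter at the end of the input.
```

The trailing empty match is what lets the generator notice the end of the input: its delimiter is undefined, so the top-level call returns.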

We can do this in an inside-out fashion. If we replace the innermost group (e.g. 'de[f,g]') with its expansion, 'def,deg', and recur until no groups remain, we will have created a comma-separated list of final strings, which we can simply split apart and return.
const _expand = (
s,
match = s .match (/(.*?)(\w*)\[([^\[\]]+)\](.*)/),
[_, a, b, c, d] = match || []
) => match ? _expand (a + c .split (',') .map (x => b + x) .join (',') + d) : s
const expand = (s) => _expand (s) .split (',')
console .log (expand ('abc[de[f,g],hk]'))
console .log (expand ('utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]]'))
Our main recursive function -- _expand -- uses a regular expression that extracts the first group, and breaks it into constituent parts, and puts it back together by mapping over the parts of the array. Then our public function, expand simply calls the recursive one and splits the result into an array.
For example, this is how the recursive calls would be handled for the string, 'utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]]':
'utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]]' //-->
// ^^^^^^^^^
'utvk[fvu,gnu,gnk,gnr,nl,q[t[ij,lo[z,x]],bm]]' //-->
// ^^^^^^^
'utvk[fvu,gnu,gnk,gnr,nl,q[t[ij,loz,lox],bm]]' //-->
// ^^^^^^^^^^^^^^^^^
'utvk[fvu,gnu,gnk,gnr,nl,q[tij,tloz,tlox,bm]]' //-->
// ^^^^^^^^^^^^^^^^^^^
'utvk[fvu,gnu,gnk,gnr,nl,qtij,qtloz,qtlox,qbm]' //-->
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
'utvkfvu,utvkgnu,utvkgnk,utvkgnr,utvknl,utvkqtij,utvkqtloz,utvkqtlox,utvkqbm'
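The intermediate strings in this trace can be reproduced with a loop. The sketch below uses the same regular expression as _expand but records every step; the names groupPattern and expansionSteps are hypothetical.

```javascript
// Same regular expression as _expand: lazy prefix, word, [items without brackets], rest.
const groupPattern = /(.*?)(\w*)\[([^\[\]]+)\](.*)/;

// Returns the input followed by the string after each single-group expansion.
function expansionSteps(s) {
  const steps = [s];
  let match;
  while ((match = steps[steps.length - 1].match(groupPattern))) {
    const [, before, prefix, items, after] = match;
    const expanded = items.split(',').map((item) => prefix + item).join(',');
    steps.push(before + expanded + after);
  }
  return steps;
}

const smallSteps = expansionSteps('abc[de[f,g],hk]');
console.log(smallSteps.join('\n'));
// abc[de[f,g],hk]
// abc[def,deg,hk]
// abcdef,abcdeg,abchk
```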
Update: Regex explanation:
The regex used here can be broken down into six sections:
(.*?): captures (non-greedy) an initial set of characters, stored as a
(\w*): captures our letters before an opening brace, stored as b
\[: matches an opening brace ([)
([^\[\]]+): captures everything but braces ([ or ]), stored as c
\]: matches a closing brace (])
(.*): captures everything after the closing brace, stored as d
The point is for the group inside the braces to include no other braces. An example might look like this:
utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]]
`---+---'\/|`-+-'|`----------+-----------'
    a    b [  c  ]           d

a: (.*?)
b: (\w*)
[: \[
c: ([^\[\]]+)
]: \]
d: (.*)

Vanilla solution without recursion:
const expander = /([^,[\]]*?)\[([^[\]]*?)]/;
const parse = (fields) => {
let result = fields;
while (result.match(expander)) {
result = result.replace(expander, (m, p1, p2) => p2.split(',').map((e) => `${p1}${e}`).join(','));
}
return result.split(',');
};
console.log(parse('abc[de[f,g],hk]'));
// => [ 'abcdef', 'abcdeg', 'abchk' ]
console.log(parse('utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]]'));
// => [ 'utvkfvu', 'utvkgnu', 'utvkgnk', 'utvkgnr', 'utvknl', 'utvkqtij', 'utvkqtloz', 'utvkqtlox', 'utvkqbm' ]
Basically I just took the code from object-fields, which one could use as follows
// const objectFields = require('object-fields');
const parse = (input) => objectFields.split(input.replace(/\[/g, '(').replace(/]/g, ')')).map((e) => e.replace(/\./g, ''));
console.log(parse('abc[de[f,g],hk]'));
// => [ 'abcdef', 'abcdeg', 'abchk' ]
console.log(parse('utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]]'));
// => [ 'utvkfvu', 'utvkgnu', 'utvkgnk', 'utvkgnr', 'utvknl', 'utvkqtij', 'utvkqtloz', 'utvkqtlox', 'utvkqbm' ]
<script src="https://bundle.run/object-fields@3.0.1"></script>
Disclaimer: I'm the author of object-fields

Let's describe an algorithm in words. Let's define word as a group of consecutive letters without a comma or bracket, which can also be an empty string. Then one way to think about this process is as a stack with two types of entries:
A word.
An opening bracket, [.
As we traverse the string,
(1) push words and opening brackets onto the stack, not commas.
(2a) when we reach a closing bracket, ], we start a list and keep popping the stack, adding words to that list until we pop an opening bracket from the stack. We then (2b) pop the next entry in the stack, which is the prefix for our current list, and (2c) push each entry from the list onto the stack with the prefix prepended.
Finally, return the stack.
Here's an implementation of the algorithm described above.
function f(s) {
if (s.length == 0) {
return [];
}
const stack = [""];
let i = 0;
while (i < s.length) {
if (s[i] == "[") {
i += 1;
stack.push("[", "");
} else if (s[i] == "]") {
i += 1;
const suffixes = [];
while (true) {
const word = stack.pop();
if (word == "[") {
const prefix = stack.pop();
for (let j = suffixes.length - 1; j >= 0; j--) {
stack.push(prefix + suffixes[j]);
}
break;
} else {
suffixes.push(word);
}
}
} else if (s[i] == ",") {
i += 1;
stack.push("");
} else {
stack[stack.length - 1] += s[i];
i += 1;
}
}
return stack;
}
// Output
var s = "a[bp,c[,d]],b[yx,]"
console.log(s);
for (const w of f(s)) {
console.log(w);
}
console.log("");
s = "abc[de[f,g],hk]"
console.log(s);
for (const w of f(s)) {
console.log(w);
}

Here is a recursion free solution using object-scan.
This solution is probably more of academic interest, since it uses library internals and I wrote it to satisfy my curiosity about whether it could be done this way. It also serves as a head scratcher for @ScottSauyet - payback for his answer, which took me a while to figure out =)
Anyways, enjoy!
<script type="module">
import objectScan from 'https://cdn.jsdelivr.net/npm/object-scan@18.4.0/lib/index.min.js';
import { compile } from 'https://cdn.jsdelivr.net/npm/object-scan@18.4.0/lib/core/compiler.js';
const parse = (input) => {
const compiled = compile([input.replace(/\[/g, '.{').replace(/]/g, '}')], {});
return objectScan(['++{children[*]}.value'], {
filterFn: ({ parent }) => parent.children.length === 0,
rtn: ({ parents }) => parents.filter((e) => !Array.isArray(e)).map(({ value }) => value).reverse().slice(1).join('')
})(compiled);
};
console.log(parse('abc[de[f,g],hk]'));
// => [ 'abcdef', 'abcdeg', 'abchk' ]
console.log(parse('utvk[fvu,gn[u,k,r],nl,q[t[ij,lo[z,x]],bm]]'));
// => [ 'utvkfvu', 'utvkgnu', 'utvkgnk', 'utvkgnr', 'utvknl', 'utvkqtij', 'utvkqtloz', 'utvkqtlox', 'utvkqbm' ]
</script>
Disclaimer: I'm the author of object-scan

find multiple (2) elements in a string in javascript

I want to return whether one of the conditions exist within findAB(str) as boolean
A 5 letter long string which starts with 'a' ends with 'b'
A 5 letter long string which starts with 'b' ends with 'a'
function findAB(str) {
let lowerSTR = str.toLowerCase()
let indexOFa = lowerSTR.indexOf('a')
let indexOFb = lowerSTR.indexOf('b')
if (lowerSTR[indexOFa + 4] === 'b' || lowerSTR[indexOFb + 4] === 'a') {
return true;
}
return false;
}
I first changed strings into lower case using .toLowerCase() method and defined indexOFa and indexOFb.
I thought simply by doing index of a + 4 will turn out true but in fact it doesn't and cannot figure out what I did wrong.
I also know some method to find elements in an array such as find(), includes(), map() or filter but not sure if I can use it since it is not an array.

If you want to change your approach you can use regex like this
function findAB(str) {
  // match() returns null when nothing matches, so fall back to an empty array
  const case1 = str.match(/a[a-z]*b/g) || []; // all substrings starting with a and ending with b
  const case2 = str.match(/b[a-z]*a/g) || []; // all substrings starting with b and ending with a
  // a `return` inside forEach only leaves the callback, so use some() to get a boolean
  const hasFiveLetters = (matchedString) => matchedString.length === 5;
  return case1.some(hasFiveLetters) || case2.some(hasFiveLetters);
}
If you need to tell the difference between which case was triggered this could be a cleaner solution to read.

If you want to search a string of any length for a 5-character pattern that starts and ends with (a and b) or (b and a), irrespective of case, you can do something like this.
The regex assumes that the three characters in "a _ _ _ b" or "b _ _ _ a" are letters of either case and nothing else; if you accept other characters, update the [a-zA-Z] part of the regex.
function findAB(str) {
const regexp = /(.?(((a|A)[a-zA-Z]{3}(b|B))|((b|B)[a-zA-Z]{3}(a|A))).?)/;
return regexp.test(str);
}
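As a possible simplification of the same check, the i flag makes the whole pattern case-insensitive, so the (a|A) style groups are not needed. This is only a sketch; the name findABInsensitive is hypothetical.

```javascript
// 'a', three letters, 'b' or 'b', three letters, 'a', in any letter case.
function findABInsensitive(str) {
  return /a[a-z]{3}b|b[a-z]{3}a/i.test(str);
}

console.log(findABInsensitive('xxAbcdBxx')); // true
console.log(findABInsensitive('Bxyza'));     // true
console.log(findABInsensitive('abcda'));     // false
```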
If it is fixed that length of string should be 5 and you are just dealing with first and last character you can simply use the following function
function findAB(str) {
const lengthOfString = str.length;
if (lengthOfString === 5) {
const firstCharacter = str[0].toLowerCase()
const lastCharacter = str[lengthOfString-1].toLowerCase()
return (firstCharacter === 'a' && lastCharacter === 'b') || (firstCharacter === 'b' && lastCharacter === 'a');
}
// return false if the string does not contain exactly 5 characters
return false;
}

Correcting your solution.
The problem with your solution is that indexOf returns -1 if no match is found. So findAB("bbbbb") will return true, because the indexOFa variable will be equal to -1, so lowerSTR[indexOFa + 4] checks the second-to-last character and the condition evaluates to true. The following code solves that problem.
function findAB(str) {
let lowerSTR = str.toLowerCase()
let indexOFa = lowerSTR.indexOf('a')
let indexOFb = lowerSTR.indexOf('b')
if ((indexOFa >= 0 && lowerSTR[indexOFa + 4] === 'b') || (indexOFb >= 0 && lowerSTR[indexOFb + 4] === 'a')) {
return true;
}
return false;
}
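The pitfall with a missing character can be seen in isolation with a few lines. The variable names mirror the ones in the question; the sample input 'bbbbb' is the one discussed above.

```javascript
const lowerSTR = 'bbbbb';
const indexOFa = lowerSTR.indexOf('a'); // no 'a' in the string
const probed = lowerSTR[indexOFa + 4];  // reads index 3, not "four after an a"

console.log(indexOFa); // -1
console.log(probed);   // b

// Without the indexOFa >= 0 guard, this comparison succeeds by accident.
console.log(probed === 'b'); // true
```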
Also, rather than hard-coding 4, I would use str.length - 1.
You could also just simply do it like this
function findAB(str) {
  let lowerSTR = str.toLowerCase();
  // compare the end characters directly: indexOf only reports the first occurrence,
  // so a string such as 'ababb' would otherwise be rejected
  let first = lowerSTR[0];
  let last = lowerSTR[lowerSTR.length - 1];
  return (first === 'a' && last === 'b') || (first === 'b' && last === 'a');
}

Return the first non-repeating character of a string

In the first chunk of my code I have an ' if ' statement that is not working as it should, and I can't figure out why.
When using the argument 'hous', it should enter the first ' if ' statement and return 0. It returns -1 instead.
var firstUniqChar = function(s) {
for (let i = 0; i < s.length; i++){
let letter = s[i];
// console.log('s[i]: ' + letter);
// console.log(s.slice(1));
// console.log( 'i: ' + i);
if ((i = 0) && !(s.slice(1).includes(letter))) {
return 0;
}
if ((i = s.length - 1) && !(s.slice(0, i).includes(letter))) {
return 1;
}
if(!(s.slice(0, i).includes(letter)) && !(s.slice(i + 1).includes(letter))) {
return 2;
}
}
return -1;
};
console.log(firstUniqChar("hous"));

This is another way you can write your function:
const firstUniqChar = s => [...s].filter(c=>!(s.split(c).length-2))[0] || -1;
console.log(firstUniqChar("hous"));
console.log(firstUniqChar("hhoous"));
console.log(firstUniqChar("hhoouuss"));
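As for why the code in the question returns -1: (i = 0) is an assignment, not a comparison. It evaluates to 0, which is falsy, so the first if is skipped, and (i = s.length - 1) then moves i to the end of the string, so the loop finishes after one pass. The sketch below shows the difference and a corrected loop. It assumes the goal is the index of the first non-repeating character rather than the 0/1/2 markers used in the question, and the name firstUniqCharIndex is hypothetical.

```javascript
// Assignment versus comparison inside a condition.
let i = 3;
const assignmentResult = (i = 0);   // assigns, then evaluates to 0 (falsy)
const comparisonResult = (i === 0); // compares, evaluates to true

console.log(assignmentResult, comparisonResult); // 0 true

// One check covers the first, last, and middle positions,
// because slicing past either end simply gives an empty string.
function firstUniqCharIndex(s) {
  for (let k = 0; k < s.length; k++) {
    const letter = s[k];
    if (!s.slice(0, k).includes(letter) && !s.slice(k + 1).includes(letter)) {
      return k;
    }
  }
  return -1;
}

console.log(firstUniqCharIndex('hous'));     // 0
console.log(firstUniqCharIndex('hhoous'));   // 4
console.log(firstUniqCharIndex('hhoouuss')); // -1
```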

Look up method for scattered repeated characters and functional find()-based approach
You may break your input string into an array of characters (e.g. using the spread syntax ...) and make use of Array.prototype.find() (to get the character itself) or Array.prototype.findIndex() (to get the position of the non-repeating character) by finding the character that is different from its neighbors:
const src = 'hhoous',
getFirstNonRepeating = str =>
[...str].find((c,i,s) =>
(!i && c != s[i+1]) ||
(c != s[i-1] && (!s[i+1] || c != s[i+1])))
console.log(getFirstNonRepeating(src))
The above works when your repeating characters are grouped together. If they are not, I would recommend two passes over the array of characters: one to count the occurrences of each character, and one more to find the first unique one:
const src = 'hohuso',
getFirstUnique = str => {
const hashMap = [...str].reduce((r,c,i) =>
(r[c]=r[c]||{position:i, count:0}, r[c].count++, r), {})
return Object
.entries(hashMap)
.reduce((r,[char,{position,count}]) =>
((count == 1 && (!r.char || position < r.position)) &&
(r = {char, position}),
r), {})
}
console.log(getFirstUnique(src))
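The same two-pass idea can also be written with a Map and a second scan over the characters, which some readers may find easier to follow. This is only a sketch; the name firstUniqueIndex is hypothetical, and it returns the position (or -1) rather than an object.

```javascript
function firstUniqueIndex(str) {
  const chars = [...str];
  // Pass 1: count how often each character occurs.
  const counts = new Map();
  for (const char of chars) {
    counts.set(char, (counts.get(char) || 0) + 1);
  }
  // Pass 2: the first character with a count of 1 is the answer.
  return chars.findIndex((char) => counts.get(char) === 1);
}

console.log(firstUniqueIndex('hohuso'));   // 3 (the 'u')
console.log(firstUniqueIndex('hous'));     // 0
console.log(firstUniqueIndex('hhoouuss')); // -1
```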

function nonRepeat(str) {
return Array
.from(str)
.find((char) => str.match(new RegExp(char, 'g')).length === 1);
}
console.log(nonRepeat('abacddbec')); // e
console.log(nonRepeat('1691992933')); // 6
console.log(nonRepeat('thhinkninw')); // t

How to find which group was captured in Javascript?

I have a regular expression /(q)|([zZ])|(E)/.
My question is, how to get WHICH group was matched.
So, if I do
"ZqE".replace(/(q)|([zZ])|(E)/g, /* ??? */)
How do I get the output "213"?

You can do something like this: the group that matched holds the value, and the remaining ones are undefined.
let mapper = {
'g1': 1,
'g2': 2,
'g3': 3
}
let final = "ZqE".replace(/(q)|([zZ])|(E)/g, (m, g1, g2, g3) => {
return g1 !== undefined && mapper['g1'] || g2 !== undefined && mapper['g2'] || g3 !== undefined && mapper['g3']
})
console.log(final)

You can use the result array from exec to figure out which group was matched:
let re = /(q)|([zZ])|(E)/g;
let result; // declare it so the loop also works in strict mode and modules
while ((result = re.exec('ZqE'))) {
console.log(result.findIndex((v, i) => i && typeof(v) !== 'undefined'));
}
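The same index lookup can be done inside replace, which produces the "213" string directly and works for any number of groups. This is a sketch under the assumption that the pattern has no nested groups; the name groupNumbers is hypothetical. It relies on the replace callback receiving the match, then one argument per capturing group, then the numeric offset.

```javascript
function groupNumbers(re, input) {
  return input.replace(re, (match, ...rest) => {
    // Captured groups are strings or undefined, so the first number is the offset.
    const offsetPosition = rest.findIndex((value) => typeof value === 'number');
    const groups = rest.slice(0, offsetPosition);
    // Groups are numbered from 1, hence the + 1.
    return String(groups.findIndex((group) => group !== undefined) + 1);
  });
}

console.log(groupNumbers(/(q)|([zZ])|(E)/g, 'ZqE')); // 213
```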

In the pattern you specify 3 capturing groups which are numbered from 1 to 3.
If you want to get "213" and you know that you want to convert q to 1, zZ to 2 and E to 3, as an alternative you could do that by checking the values of the match using replace.
let result = "ZqE".replace(/[ZzEq]/g, function(m) {
if (m.toLowerCase() === 'z') return 2;
if (m === 'q') return 1;
if (m === 'E') return 3;
});
console.log(result);

Javascript | Dynamic array of Characters based on consecutive letters conditions

I was doing a Codewars challenge and couldn't find a solution, but I really want to know how this problem can be solved.
We are given two integers, say N and D, and should return a string containing exactly N letters 'n' and exactly D letters 'd', with no three consecutive letters being the same.
For example, if we get N=5 and D=3, we should return "nndnndnd" or "ndnnddnn" or any other correct answer.
Another example: if we get N=1 and D=4, the only accepted answer is "ddndd".
What I did was making a helper function like this :
function generateArray (char,q){
let arr= []
for(let i=0; i<q; i++){
arr.push(char)
}
return arr
}
and inside the main function :
function solution(N, D) {
let arrayOfchar = generateArray('n',N)
arrayOfchar.reduce((prev,current,index) => {
for(let i=0; i<D; i++) {
if(prev===current) {
arrayOfchar.splice(index, 0, "d")
}
}
})
}
But I don't know how I should put the "d" only after two or fewer consecutive "n"s.
Any clue?

Rather than creating an entire array of the same character at the very start, I think it would make more sense to create the array piece-by-piece, until N and D come out to 0.
Here's one possible implementation. The general idea is to try to push whichever character count is larger, or if that's not possible due to 3-in-a-row, push the other character, and subtract the appropriate character count by one. Repeat until both counts are 0:
function solution(n, d) {
const arr = [];
function canPush(char) {
const { length } = arr;
return (arr[length - 1] !== char || arr[length - 2] !== char);
}
function push(char) {
arr.push(char);
if (char === 'n') n--;
else if (char === 'd') d--;
}
while (n > 0 || d > 0) {
if (n > d) {
if (canPush('n')) push('n');
else if (d === 0) return console.log('Impossible');
else push('d');
} else if (d >= n) {
if (canPush('d')) push('d');
else if (n === 0) return console.log('Impossible');
else push('n');
}
}
console.log(JSON.stringify(arr));
// return arr;
}
solution(5, 3);
solution(1, 4);
solution(1, 5);
solution(5, 1);
solution(2, 5);
solution(2, 6);
solution(2, 7);
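Since many different strings are acceptable, it can help to check a candidate against the stated conditions instead of comparing it with one expected answer. The following is a small sketch of such a check; the name isValidAnswer is hypothetical.

```javascript
// A candidate is valid if it has exactly n 'n's, exactly d 'd's,
// nothing else, and no character repeated three times in a row.
function isValidAnswer(candidate, n, d) {
  const count = (char) => candidate.split(char).length - 1;
  const hasTriple = /(.)\1\1/.test(candidate);
  return candidate.length === n + d &&
    count('n') === n &&
    count('d') === d &&
    !hasTriple;
}

console.log(isValidAnswer('nndnndnd', 5, 3)); // true
console.log(isValidAnswer('ddndd', 1, 4));    // true
console.log(isValidAnswer('nnndd', 3, 2));    // false (three n in a row)
```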

Here is another solution to this interesting problem. The idea is not to go one letter at a time, but to work out which count is larger, build an array of pairs of that letter and a simple array of the smaller one, and then interleave them. So for 5 and 3 you get nn + d + nn + d + n + d: pairs of the bigger letter, each followed by one of the smaller.
const fillArray = (length, letter, bigNumber) => {
var arr = []
for(var index=0; index < length; index++) {
arr.push([letter, bigNumber%2 && index+1 === length ? null : letter])
}
return arr;
}
const getString = (n, d) => {
var swtch = d > n, arr = [],
bigger = {number: swtch ? d : n, letter: swtch ? 'd' : 'n', ceil: Math.ceil((swtch ? d : n)/2)},
smaller = {number: swtch ? n : d, letter: swtch ? 'n' : 'd', ceil: Math.ceil((swtch ? n : d)/2)}
if((bigger.number/2) - smaller.number >= 1.5) {
return 'Not possible with given parameters!'
}
var bigWorkArray = fillArray(bigger.ceil, bigger.letter, bigger.number)
var smallWorkArray = n === d ? fillArray(smaller.ceil, smaller.letter, smaller.number) : Array(smaller.number).fill(smaller.letter)
for(var i=0; i < bigWorkArray.length; i++) {
arr.push(...bigWorkArray[i],...smallWorkArray[i] || '')
}
return arr.join('');
}
console.log(getString(5,3))
console.log(getString(1,4))
console.log(getString(1,5))
console.log(getString(5,1))
console.log(getString(2,5))
console.log(getString(2,6))
console.log(getString(2,7))
