Comparing two RegEx objects in Node.js

I'm using NodeRED to perform some logic on a string which has been created from image analysis (OCR) on Microsoft Azure Cognitive Services. The image analysis doesn't allow for any pattern matching / input pattern.
The resulting string (let's call it 'A') sometimes contains slightly misread characters, typically things like 'l' for '1' or 's' for '5'.
The resulting string can be in one of only a few different formats; for argument's sake, let's say:
1. [a-z]{4,5}
2. [a-g]{3}[0-9]{1,2}
3. [0-9][a-z]{4}
What I need to do is determine which format the interpreted string ('A') most closely aligns to ('1', '2' or '3'). Once I establish this, I plan to correct the misinterpreted characters and hopefully be left with a string that is (near) perfect.
My initial plan was to convert 'A' into a RegEx: if 'A' came back as "12345", I would change it to the RegEx object [1l][2z]34[5s], compare this object to the RegEx objects for the formats, and hopefully one would come back as a match.
In reality, the interpreted string is more like 8 alphanumeric characters and there are five different (fairly complex) RegEx possibilities, but I've tried to simplify the problem for the purposes of this question.
So the question: is it possible to compare RegEx objects in this way? Does anyone have any other suggestions on how this image analysis could be improved?
Thanks

Here is a solution using a Cartesian product to compare a string for possible matches. Test string is 'abclz', which could match pattern1 or pattern2:
const cartesian = (...a) => a.reduce((a, b) => a.flatMap(d => b.map(e => [d, e].flat())));
// Characters that OCR tends to confuse, mapped to every plausible reading
const charMapping = {
  '1': ['1','l'],
  'l': ['1','l'],
  '2': ['2','z'],
  'z': ['2','z'],
  '5': ['5','s'],
  's': ['5','s']
};
// The formats the string is allowed to take
const buckets = {
  pattern1: /^[a-z]{4,5}$/,
  pattern2: /^[a-g]{3}[0-9]{1,2}$/,
  pattern3: /^[0-9][a-z]{4}$/
};
const input = 'abclz';
console.log('input:', input);
// One array of alternatives per input character
let params = input.split('').map(c => charMapping[c] || [c]);
// Every combination of alternatives, joined back into strings
let toCompare = cartesian(...params).map(arr => arr.join(''));
console.log('toCompare:', toCompare);
// Test every candidate against every format and keep the ones that match
let potentialMatches = toCompare.flatMap(str => {
  return Object.keys(buckets).map(pattern => {
    let match = buckets[pattern].test(str);
    console.log(str, pattern + ':', match);
    return match ? str : null;
  }).filter(Boolean);
});
console.log('potentialMatches:', potentialMatches);
Output:
input: abclz
toCompare: [
"abc12",
"abc1z",
"abcl2",
"abclz"
]
abc12 pattern1: false
abc12 pattern2: true
abc12 pattern3: false
abc1z pattern1: false
abc1z pattern2: false
abc1z pattern3: false
abcl2 pattern1: false
abcl2 pattern2: false
abcl2 pattern3: false
abclz pattern1: true
abclz pattern2: false
abclz pattern3: false
potentialMatches: [
"abc12",
"abclz"
]
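The answer above returns the candidate strings but not which pattern each one satisfied, which is what the question ultimately needs. Here is a minimal sketch of a variant that records the pattern name as well. The names `expand` and `classify` are hypothetical; the mapping and patterns are copied from the answer. Building the candidates from `['']` also keeps it working for one-character inputs.

```javascript
// Characters that OCR commonly confuses (same mapping as in the answer above).
const charMapping = {
  '1': ['1', 'l'], 'l': ['1', 'l'],
  '2': ['2', 'z'], 'z': ['2', 'z'],
  '5': ['5', 's'], 's': ['5', 's']
};

const buckets = {
  pattern1: /^[a-z]{4,5}$/,
  pattern2: /^[a-g]{3}[0-9]{1,2}$/,
  pattern3: /^[0-9][a-z]{4}$/
};

// Build every possible reading of the input, one character at a time.
function expand(input) {
  return [...input].reduce(
    (prefixes, ch) =>
      prefixes.flatMap(p => (charMapping[ch] || [ch]).map(c => p + c)),
    ['']
  );
}

// Return { candidate, pattern } for every reading that matches a pattern.
function classify(input) {
  return expand(input).flatMap(candidate =>
    Object.entries(buckets)
      .filter(([, regex]) => regex.test(candidate))
      .map(([pattern]) => ({ candidate, pattern }))
  );
}

console.log(classify('abclz'));
// [ { candidate: 'abc12', pattern: 'pattern2' },
//   { candidate: 'abclz', pattern: 'pattern1' } ]
```

Note that the number of candidates doubles with every ambiguous character, so for long strings with many confusable characters it may be worth pruning candidates early.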

Related

adding indicators into a string according to different case

I will receive an array of strings like the ones below.
Each string may contain up to three signs: $, % and *.
For example:
“I would $rather %be $happy, %if working in a chocolate factory”
“It is ok to play tennis”
“Tennis $is a good sport”
“AO is really *good sport”
A string may contain no signs at all, or only one kind of sign.
There are only five cases:
1. no sign at all,
2. having both $ and %,
3. having only $,
4. having only %,
5. having only *.
If there is no sign, I don't need to process the string.
Otherwise, I need to process it and add an indicator to the left of the first sign that occurs in the sentence.
For example:
“I would ---dollarAndPerSign---$rather %be $happy, %if working in a chocolate factory”
“Tennis ---dollarSign---$is a good sport”
This is my draft code.
First, I need to decide whether the string contains any sign. If there is no sign, I don't need to process it.
texts.map((text) => {
  if (text.includes("$") || text.includes("%") || text.includes("*")) {
    //to get the index of signs
    let indexOfdollar, indexOfper, indexOfStar;
    indexOfdollar = text.indexOf("$");
    indexOfper = text.indexOf("%");
    indexOfStar = text.indexOf("*");
    //return a completed process text
  }
});
Question:
How do I find out which index is the smallest, so that I can locate the first sign occurring in the text? Simply taking the smallest value may not be correct, because indexOf() returns -1 for any sign that is absent.

I focused only on the "get the smallest index" part of your question, since you will be able to do what you want with it afterwards.
You can put the indexOf() results in an array, filter it to remove the -1 values and then use Math.min() to get the smallest one.
Edited to output an object instead, which includes the first index and some booleans for the presence of each char.
const texts = [
"I would $rather %be $happy, %if working in a chocolate factory",
"It is ok to play tennis",
"Tennis $is a good sport",
"AO is really *good sport"
]
const minIndexes = texts.map((text) => {
  //to get the signs
  const hasDollar = text.indexOf("$") >= 0
  const hasPercent = text.indexOf("%") >= 0
  const hasStar = text.indexOf("*") >= 0
  //to get the first index
  const indexes = [text.indexOf("$"), text.indexOf("%"), text.indexOf("*")].filter((index) => index >= 0)
  if (!indexes.length) {
    return null
  }
  return {
    index: Math.min(...indexes),
    hasDollar,
    hasPercent,
    hasStar
  }
});
console.log(minIndexes)
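Since all that is needed is the position of the first sign, `String.prototype.search` with a character class returns that index directly, or -1 when no sign is present. Below is a minimal sketch of the whole task under that approach. The function names `labelFor` and `addIndicator` and the label wording are hypothetical, loosely following the indicator format shown in the question.

```javascript
// Label for the combination of signs present in the text.
// The wording is illustrative, based on the examples in the question.
function labelFor(text) {
  const hasDollar = text.includes('$');
  const hasPercent = text.includes('%');
  const hasStar = text.includes('*');
  if (hasDollar && hasPercent) return 'dollarAndPerSign';
  if (hasDollar) return 'dollarSign';
  if (hasPercent) return 'perSign';
  if (hasStar) return 'starSign';
  return null;
}

function addIndicator(text) {
  // search() returns the index of the first match, or -1 if there is none.
  const first = text.search(/[$%*]/);
  if (first === -1) return text; // no sign: leave the text untouched
  return `${text.slice(0, first)}---${labelFor(text)}---${text.slice(first)}`;
}

console.log(addIndicator('Tennis $is a good sport'));
// Tennis ---dollarSign---$is a good sport
```

Inside a character class, `$`, `%` and `*` are all literal, so no escaping is needed.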

const texts = [
"I would $rather %be $happy, %if working in a chocolate factory",
"It is ok to play tennis",
"Tennis $is a good sport",
"AO is really *good sport"
]
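texts.forEach(text => {
  const signs = ["%", "$", "*"];
  // First character of the text that is one of the signs (undefined if none)
  const chr = text.split('').find(t => signs.includes(t));
  if (!chr)
    return;
  // replace() with a string pattern only replaces the first occurrence
  text = text.replace(chr, "---some text---" + chr);
  console.log(text);
})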

const data = ['I would $rather %be $happy, %if working in chocolate factory', 'It is ok to play tennis', 'Tennis $is a good sport', 'AO is really *good sport']
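const replace = s => {
  const signs = { $: 'dollar', '%': 'per', '*': 'star' };
  // Every sign character that occurs in the string, in order of appearance
  const characters = Array.from(s).filter(c => '$%*'.includes(c));
  // De-duplicate the signs and build the label, e.g. "dollar|per"
  const headText = [...new Set(characters)].map(c => signs[c]).join('|');
  // Insert the label before the first sign only ($& is the matched sign);
  // strings without any sign are returned unchanged
  return s.replace(/[$%*]/, `--${headText}--$&`);
};
const result = data.map(replace);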

Why does RegExp give different results in different scenarios?

Simply speaking, in either node.js or in the browser, run the code below:
const sep = '\\';
const regExpression = `/b\\${sep}|a\\${sep}/`;
const testCases = ['a\\abb\\abc','b\\'];
const regTest = new RegExp(regExpression);
console.log(`Result for ${testCases[0]} is ${regTest.test(testCases[0])}`)
console.log(`Result for ${testCases[1]} is ${regTest.test(testCases[1])}`)
Both outputs are false.
However, if I change the expression to this:
const regExpression = `/c|b\\${sep}|a\\${sep}/`;
both results will be true. Why?
Another interesting thing: the alternative that matches can never be the first one. Taking `/c|b\\${sep}|a\\${sep}/` as an example, 'c' will NOT match.

It is because of how the regex itself is built. When you pass a string to new RegExp(), the surrounding slashes are not delimiters; they become part of the pattern:
const regExpression = "/test/";
const regTest = new RegExp(regExpression);
console.log(regTest); // /\/test\//
console.log(regTest.test("test")) // false
console.log(regTest.test("/test/")) // true
In the first case, the string "/b\\\\|a\\\\/" becomes the regex /\/b\\|a\\\//. The regex will try to find /b\ or a\/ (slashes included), so it fails for both values:
'a\\abb\\abc' => FALSE
'b\\' => FALSE
'a\\/abb\\abc' => TRUE (matches a\/)
'/b\\' => TRUE (matches /b\)
In the second case, the string "/c|b\\\\|a\\\\/" becomes the regex /\/c|b\\|a\\\//. The regex will try to find /c or b\ or a\/:
'a\\abb\\abc' => TRUE (matches b\)
'b\\' => TRUE (matches b\)
This is also why 'c' on its own does not match: the first alternative is /c, not c.
So, in conclusion, you could solve your problem by dropping the slashes:
const regExpression = `b\\${sep}|a\\${sep}`;
This will try to find b\ or a\. I don't know whether it applies to your case, but remember the ^ and $ anchors too. You can try out your patterns on regex101.
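To make the difference concrete, here is a small sketch comparing the pattern string with and without the surrounding slashes. The `escapeRegExp` helper is a hypothetical addition, not part of the original code; it escapes the separator so that it is treated literally whatever character it happens to be.

```javascript
const sep = '\\'; // a single backslash character

// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// With slashes: they become part of the pattern and must match literally.
const withSlashes = new RegExp(`/b\\${sep}|a\\${sep}/`);

// Without slashes: matches b or a followed by the separator, anywhere in the input.
const withoutSlashes = new RegExp(`b${escapeRegExp(sep)}|a${escapeRegExp(sep)}`);

for (const input of ['a\\abb\\abc', 'b\\']) {
  console.log(input, withSlashes.test(input), withoutSlashes.test(input));
}
// For both inputs, withSlashes gives false and withoutSlashes gives true.
```

When the pattern is fixed rather than built from variables, a regex literal avoids the double escaping altogether.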

How to safely convert a comma-separated string to an array of numbers

Edit:
Why this is not a duplicate
This question specifically addresses the problem that empty strings as well as any number of space characters are converted to 0 using Number(str).
Take a look at the following snippet:
convertBtn.addEventListener('click', () => console.log(toArrayOfNumber(foo.value)))
const toArrayOfNumber = str => str.split(',').map(Number);
<input type="text" id="foo" value="1,2,4,-1" />
<button type="button" id="convertBtn">convert to array of numbers</button>
As long as the input has a proper value everything works fine. Now I want to make it failsafe for the following values:
,
, ,,
What caught me off guard here is that Number("") and Number(" ") both return 0, which I don't want: for my use case, neither "" nor any run of spaces should be considered a number.
So I came up with this:
convertBtn.addEventListener('click', () => console.log(toArrayOfNumber(foo.value)))
const toArrayOfNumber = str => str.split(',').filter(x => x.trim() !== "").map(Number);
<input type="text" id="foo" value="1,2,4,-1,,, ,, 11" />
<button type="button" id="convertBtn">convert to array of numbers</button>
This feels awkward. I'm hoping there is an obvious, better solution that I'm not seeing.

Answer from @KendallFrey (who is refusing to post so I'm stealing his solution)
'1,2,4,-1,,, ,, 11,0,0'.split(/[, ]+/).map(x=>+x)
You can still use .map(Number) but x=>+x is 1 byte shorter.
Results in the console: (7) [1, 2, 4, -1, 11, 0, 0]
Another regex solution (that doesn't allow decimals): /-?\d+/g
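One caveat with splitting on `/[, ]+/`: a leading or trailing separator leaves an empty string at the edge of the array, which then converts to 0. Matching the numbers instead of splitting on the separators avoids that. Here is a minimal sketch built on the `/-?\d+/g` pattern mentioned above, so it handles integers only; `toArrayOfNumber` is the function name from the question.

```javascript
// Pull out every run of digits (with an optional leading minus sign).
// match() returns null when nothing matches, hence the fallback to [].
const toArrayOfNumber = str => (str.match(/-?\d+/g) || []).map(Number);

console.log(toArrayOfNumber('1,2,4,-1,,, ,, 11,0,0')); // [1, 2, 4, -1, 11, 0, 0]
console.log(toArrayOfNumber(','));                     // []
console.log(toArrayOfNumber(', ,,'));                  // []

// For comparison, the split-based conversion turns edge separators into 0:
console.log(',1,'.split(/[, ]+/).map(Number));         // [0, 1, 0]
```

The trade-off is that this approach extracts digits from anything: an input such as "1.5" yields two numbers, so it only suits input that is known to contain integers.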

If you're only dealing with integers the following will also work. 0's will be kept but empty values will be removed.
split(',').filter(num => !isNaN(parseInt(num))).map(Number);
Example
const str = '1,2,4,-1,,, ,0, 11';
console.log(str.split(',').filter(num => !isNaN(parseInt(num))).map(Number));

Try this:
const value="-99,1,2,4,-1,,, ,0,, 11"
const toArrayOfNumber = str => str.split(',').map(num => num.trim() && Number(num)).filter(num => !Number.isNaN(num) && typeof num != 'string');
const nums = toArrayOfNumber(value);
nums.forEach(num => console.log(typeof num, num));
We use the results of trim to determine if we should process it like a number. If not then we just have a string and that is filtered out.

You could also do this via Array.reduce where in each iteration you check with isNaN for the parseInt:
let data = "1,2,4,-1,,, ,0, 11"
let r = data.split(',').reduce((r,c) => isNaN(parseInt(c)) ? r : [...r, +c],[])
console.log(r)
This way you only iterate over the array once post splitting. Also this would keep zeros intact and just drop any non numbers.

In Node/Javascript, how do I map cardinality from string to int?

For example, I have user input any string: "1st", "2nd", "third", "fourth", "fifth", "9999th", etc. These are just examples, the user can input any string.
I want to map this to integer cardinality:
"1st" -> 0
"2nd" -> 1
"third" -> 2
"fourth" -> 3
"fifth" -> 4
"9999th" -> 9998
So I need some kind of function where:
function mapCardinality(input: string): number {
  let numberResult: number = ??
  return numberResult;
}
and I can call it like this:
console.log(
mapCardinality("1st"), // print 0
mapCardinality("2nd"), // print 1
mapCardinality("third"), // print 2
mapCardinality("fourth"), // print 3
mapCardinality("fifth"), // print 4
mapCardinality("9999th") // print 9998
);

Just look it up in an array or parse it as number:
const mapCardinality = c => {
const pos = ["1st", "2nd", "third", "fourth", "fifth"].indexOf(c);
return pos === -1 ? parseInt(c, 10) - 1 : pos;
};

I'd first ask: what are the suffixes for all of the inputs?
'nd', 'rd', 'st', 'th' (most numbers)
If they enter an integer with one of the above suffixes, then you could write the following function:
const getInteger = input => input.slice(0, -2);
const num = getInteger('999th');
console.log(num); // prints "999"
If they enter the spelled-out variant, it becomes much more complex, especially when it comes to typos, missing spaces, etc. One way could be to map single-digit words ('one', 'two', etc.), tens ('ten', 'twenty', etc.), hundreds, thousands, and so on, instead of every number imaginable. I would then parse the input and look for matching words to build a result. That being said, it is still limiting. I would strongly suggest restricting the user input format. Why can't the user input an integer?
const cardinalDictionary = {
  'zero': 0,
  'one': 1,
  // ...
  'twenty': 20,
  // ...
  'hundred': 100,
  'thousand': 1000,
};
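The two ideas from the answers can be combined into a small sketch: strip the suffix when the input starts with digits, and fall back to a lookup table for spelled-out ordinals. The table `ordinalWords` is a hypothetical, deliberately short list; anything outside it returns `null` rather than a guess.

```javascript
// Spelled-out ordinals and their 1-based value. Illustrative, not exhaustive.
const ordinalWords = {
  first: 1, second: 2, third: 3, fourth: 4, fifth: 5,
  sixth: 6, seventh: 7, eighth: 8, ninth: 9, tenth: 10
};

function mapCardinality(input) {
  const text = input.trim().toLowerCase();

  // "1st", "2nd", "9999th": digits followed by a two-letter suffix.
  const numeric = /^(\d+)(st|nd|rd|th)$/.exec(text);
  if (numeric) return parseInt(numeric[1], 10) - 1;

  // "third", "fourth", ...: look the word up in the table.
  if (Object.prototype.hasOwnProperty.call(ordinalWords, text)) {
    return ordinalWords[text] - 1;
  }

  return null; // unrecognized input
}

console.log(mapCardinality('1st'), mapCardinality('third'), mapCardinality('9999th'));
// 0 2 9998
```

Returning `null` for unknown input keeps the caller in charge of what to do with typos or unsupported words.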

JavaScript - Matching alphanumeric patterns with RegExp

I'm new to RegExp and to JS in general (Coming from Python), so this might be an easy question:
I'm trying to code an algebraic calculator in Javascript that receives an algebraic equation as a string, e.g.,
string = 'x^2 + 30x -12 = 4x^2 - 12x + 30';
The algorithm is already able to break the string into a single list, with all values on the right side multiplied by -1 so that I can equate everything to 0. However, one of the steps to solve the equation involves creating a hash table/dictionary with the variable as the key.
The string above results in a list eq:
eq = ['x^2', '+30x', '-12', '-4x^2', '+12x', '-30'];
I'm currently planning on iterating through this list, and using RegExp to identify both variables and the respective multiplier, so I can create a hashTable/Dictionary that will allow me to simplify the equation, such as this one:
hashTable = {
  'x^2': [1, -4],
  'x': [30, 12],
  ' ': [-12, -30]
}
I plan on using some kind of for loop to iterate through the array and applying a match on each string to get the values I need, but I'm, quite frankly, stumped.
I have already used RegExp to separate the string into the individual parts of the equation and to remove any spaces, but I can't imagine a way to separate -4 from x^2 in '-4x^2'.

You can try this:
(-?\d+)x\^\d+
When you execute the match function:
var res = "-4x^2".match(/(-?\d+)x\^\d+/)
you will get res as an array: [ "-4x^2", "-4" ]
Your '-4' is in res[1].
By adding another group around the second \d+ (the digits of the exponent), you can also retrieve the power of x.
var res = "-4x^2".match(/(-?\d+)x\^(\d+)/) // res = [ "-4x^2", "-4", "2" ]
Hope it helps

If you know that the key of the hash table (the variable part) is always at the end of the term, for example x at the end of '4x' or x^2 at the end of '-4x^2', then you can get the coefficient by splitting on it:
var exp = '-4x^2'
exp.split('x^2')[0] // returns the string "-4"
I hope this is what you were looking for.

function splitTerm(term) {
  // Groups: 1 = sign, 2 = digits of the coefficient, 3 = variable with optional power
  var regex = /([+-]?)([0-9]*)?([a-z](\^[0-9]+)?)?/
  var match = regex.exec(term);
  return {
    // A missing coefficient defaults to 1, e.g. "x^2"
    constant: parseInt((match[1] || '') + (match[2] || 1), 10),
    variable: match[3]
  }
}
splitTerm('x^2'); // => {constant: 1, variable: "x^2"}
splitTerm('+30x'); // => {constant: 30, variable: "x"}
splitTerm('-12'); // => {constant: -12, variable: undefined}
Additionally, these tools may help you analyze and understand regular expressions:
https://regexper.com/
https://regex101.com/
http://rick.measham.id.au/paste/explain.pl
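To connect this back to the question, here is a sketch of how the term list could be grouped into the lookup table. It uses a lightly adapted, anchored version of the `splitTerm` regex above; `groupTerms` is a hypothetical name, and storing constants under the empty-string key is an assumption made for the example.

```javascript
// Split a term such as "-4x^2" into its coefficient and its variable part.
function splitTerm(term) {
  const match = /^([+-]?)(\d*)([a-z](?:\^\d+)?)?$/.exec(term);
  if (!match) throw new Error(`Unrecognized term: ${term}`);
  const [, sign, digits, variable = ''] = match;
  // A missing coefficient means 1, e.g. "x^2" or "-x".
  return { coefficient: parseInt(sign + (digits || '1'), 10), variable };
}

// Group coefficients by variable part; constants go under the key ''.
function groupTerms(terms) {
  const table = {};
  for (const term of terms) {
    const { coefficient, variable } = splitTerm(term);
    if (!table[variable]) table[variable] = [];
    table[variable].push(coefficient);
  }
  return table;
}

const eq = ['x^2', '+30x', '-12', '-4x^2', '+12x', '-30'];
console.log(groupTerms(eq));
// { 'x^2': [ 1, -4 ], x: [ 30, 12 ], '': [ -12, -30 ] }
```

Summing each array then gives the simplified coefficient for that variable.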
