JavaScript regular expression to capture every possible mathematical operation between parentheses - javascript

I am trying to capture mathematical expressions between parentheses in a string with JavaScript. I need to capture parentheses that ONLY include numbers and mathematical operators: [0-9], +, -, *, /, % and the decimal dot. The examples below demonstrate what I am after. I managed to get close to the desired result, but the nested parentheses always break my regex, so I need help! I also need it to match globally, not only the first occurrence.
let string = "If(2>1,if(a>100, (-2*(3-5)(8-2)), (1+2)), (3(1+2)) )";
What I want to do if possible is manage to transform this syntax
if(condition, iftrue, iffalse)
to this syntax
if(condition) { iftrue } else { iffalse }
so that it can be evaluated by JavaScript and previewed in the browser. I have done it so far, but if the iftrue or iffalse blocks contain parentheses, everything blows up! So I'm trying to capture those parentheses and calculate them before transforming the syntax. Any advice is appreciated.
The closest I got was /[\d()+-*/.]/g, which gets what I want, but in this example
(1+2) (1 < 1) sdasdasd (1*(2+3))
instead of dismissing the (1 < 1) group entirely, it matches (1 and 1). My ideal scenario would be
(1+2) (1<1) sdasdasd (1*(2+3))
Another example:
let codeToEval = "if(a>10, 2, 2*(b+4))";
codeToEval is then passed to a function that replaces a and b with the correct values, so it ends up like this:
codeToEvalAfterReplacement = "if(5>10,2,2*(5+4))";
And now I want to transform this into
if(5>10) {
2
} else {
2*(5+4)
}
so it can be evaluated by javascript eval() and eventually previewed to the users.

Your current regex /[\d()+-*/.]/g matches one character from the class at a time, and the g flag repeats that match across the whole string; this is why (1 and 1) are still matched in (1 < 1).
Based on your pattern requirements I would change it to /\([-+*/%.0-9()]+\)/g.
This will match parentheses containing one or more of the characters you describe within them.
Note that your current pattern has a - in the middle of the class, which can lead to weird behaviour because regex engines treat +-* within a class as a range (plus through asterisk, which is a strange range). In JavaScript that range is out of order, since + has a higher code point than *, so the class as written throws a SyntaxError instead of matching. Notice I put - at the start of the class in the new pattern so it matches a literal -.
I've assumed there will be no empty parentheses (). If there are, you can change the + (one or more) after ] to * (zero or more).
The g flag is still added so you match every one of such expressions.
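As a quick check of the suggested pattern, the sketch below runs it against the second example string from the question. The variable names are only illustrative.

```javascript
// Pattern from the answer: an opening parenthesis, one or more allowed
// characters (digits, operators, dot, nested parentheses), a closing parenthesis.
const pattern = /\([-+*/%.0-9()]+\)/g;

const sample = "(1+2) (1 < 1) sdasdasd (1*(2+3))";
const matches = sample.match(pattern);

console.log(matches); // [ '(1+2)', '(1*(2+3))' ]
```

Note that the character class does not check that parentheses are balanced, so an input such as "(1+2))" is matched in full, stray bracket included.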
I can't say with 100% certainty that the new regex will allow you to robustly transform the syntax you state, as it depends on the complexity of the 'iftrue' and 'iffalse' code blocks. See if you can make it work with the new pattern, otherwise you may want to look into other solutions for parsing code.
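One such alternative, sketched here under the assumption that the whole string is a single if(condition, iftrue, iffalse) call, is to split the arguments on commas while tracking parenthesis depth instead of relying on a regex. The helper names splitTopLevel and transformIf are hypothetical and not part of any library.

```javascript
// Split a string on commas that are not nested inside parentheses.
function splitTopLevel(text) {
  const parts = [];
  let depth = 0;
  let current = "";
  for (const ch of text) {
    if (ch === "(") depth++;
    if (ch === ")") depth--;
    if (ch === "," && depth === 0) {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current.trim());
  return parts;
}

// Rewrite "if(condition, iftrue, iffalse)" as
// "if(condition) { iftrue } else { iffalse }", recursing into both branches.
// Assumption: the input is exactly one if(...) call, or a plain expression.
function transformIf(expr) {
  const source = expr.trim();
  const m = /^if\s*\(([\s\S]*)\)$/i.exec(source);
  if (!m) return source; // not an if(...) call: leave unchanged
  const parts = splitTopLevel(m[1]);
  if (parts.length !== 3) return source; // unexpected shape: leave unchanged
  const [condition, whenTrue, whenFalse] = parts;
  return `if(${condition}) { ${transformIf(whenTrue)} } else { ${transformIf(whenFalse)} }`;
}

console.log(transformIf("if(5>10,2,2*(5+4))"));
// if(5>10) { 2 } else { 2*(5+4) }
```

Because commas inside nested parentheses are never treated as separators, branches such as 2*(5+4) or a nested if(...) survive intact.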

Call a function inside the if parentheses and put all the conditions in that function.
if (test()) {
  // if code
} else {
  // else code
}
function test() {
  // check both cases here; condition1 and condition2 are placeholders for your own checks
  if (condition1 && condition2) {
    return true;
  }
  return false;
}

Related

JavaScript regex inline validation for basic calculation string with one operator

I've written a basic 2 operand calculator app (+ - * /) that uses a couple of inline regex validations to filter away invalid characters as they are typed.
An example looks like:
//check if operator is present
if(/[+\-*\/]/.test(display1.textContent)){
//validate the string each time a new character is added
if(!/^\d+\.?\d*[+\-*\/]?\d*\.?\d*$/.test(display1.textContent)){
console.log('invalid')
return false
}
//validate the string character by character before operator
} else {
if(!/^\d+\.?\d*$/.test(display1.textContent)){
console.log('invalid')
return false
}
}
In the above, a valid character doesn't return false:
23.4x0.00025 (no false returned and hence the string is typed out)
But, if an invalid character is typed the function returns false and the input is filtered away:
23.4x0.(x) x at the end returns a false so is filtered (only one operator allowed per calculation)
23.4x0. is typed
It works pretty well but allows for the following which I would like to deal with:
2.+.1
I would prefer 2.0+0.1
My regex would need an if-then-else conditional stating that if the current character is '.' then the next character must be a number else the next char can be number|.|operator. Or if the current character is [+-*/] then the next character must be a number, else the next char can be any char (while following the overall logic).
The tricky part is that the logic must process the string as it is typed character by character and validate at each addition (and be accurate), not at the end when the string is complete.
if-then-else regex is not supported in JavaScript (which I think would satisfy my needs) so I need to use another approach whilst remaining within the JS domain.
Any suggestions about this specific problem would be really helpful.
Thanks
https://github.com/jdineley/Project-calculator
Thanks @trincot for the tips on using capturing groups and lookaround. This helped me write what I needed:
https://regex101.com/r/khUd8H/1
The GitHub app is updated and works as desired. Now I just need to make it pretty!
To ensure that an operator is not allowed when the preceding number ended in a point, you can insert a positive lookbehind in your regex that requires the character before an operator to always be a digit: (?<=\d)
Demo:
const validate = s => /^(\d+(\.\d*)?((?<=\d)[+*/-]|$))*$/.test(s);
document.querySelector("input").addEventListener("input", function () {
this.style.backgroundColor = validate(this.value) ? "" : "orange";
});
Input: <input>
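To see how the lookbehind behaves on partially typed input, here is a small sketch that runs the same validate function outside the browser. The sample strings are my own and are not taken from the original app.

```javascript
// Same validator as in the demo above: digits with an optional fraction,
// followed either by an operator that is directly preceded by a digit,
// or by the end of the string.
const validate = s => /^(\d+(\.\d*)?((?<=\d)[+*/-]|$))*$/.test(s);

// Intermediate states while typing "2.0+0.1" stay valid,
// while an operator typed right after a point is rejected.
const samples = ["2", "2.", "2.0+", "2.0+0.1", "2.+", "2.+.1", "+1"];
for (const s of samples) {
  console.log(JSON.stringify(s), validate(s));
}
```

The first four samples validate and the last three do not, so "2.+.1" can no longer be typed, whereas "2.0+0.1" can.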

Mathematical Formula - Variable Substitution with Regex

I'm working on a Javascript function that evaluates a user-entered string as a mathematical formula.
For example, the user may type in 1 + 1, and the function will evaluate it to 2. I'm using a library to do this, so the math and syntax is handled for that already. However, I have variables that the user can reference. The user can create a number variable, give it a name (of their choosing), and reference it in the equation. Assume the user writes 1 + counter, the math eval library obviously doesn't know what counter is, so I am using regular expressions to pre-process the formula. The preprocessing function will see counter, lookup its value, and replace it with the literal. So if the user had set counter to 3 elsewhere, my function will take 1 + counter, replace counter with 3 to get 1 + 3, and then send the formula to the math evaluation library.
The issue I'm having is writing a function that processes this using regular expressions.
I'm starting with the regular expression ([^A-Za-z0-9])counter($|[^A-Za-z0-9]), which matches counter only if there is NOT another alphanumeric character on either side of it. For example, the user may type in counter2 at some point, and I want to make sure that counter2 is looked up, but that counter would not match. Behind the scenes, to improve performance, I actually loop through the variables, generate a regular expression for each, and match them that way. Some may not match at all, but it runs in O(n) rather than having to search through a list of variables for every reference in the formula. In other words, I don't build a syntax tree or anything, so if I had the variables counter and counter2, I would generate a regex for each and try to match them. Hence, if the formula was counter2, the function still tries a match for counter and counter2, but only counter2 should match.
The code I'm using is as follows:
var re = new RegExp(`(^|[^A-Za-z0-9])${variableName}($|[^A-Za-z0-9])`, "g");
let match = re.exec(formula);
while (match !== null) {
// If "+counter+" is matched, I have to make sure that the +'s remain, hence replacing on the match
var sub = match[0].replace(`${variableName}`, `{${variableValue}}`);
formula = formula.replace(match[0], sub)
re.lastIndex = 0; // just reset to the start for now
match = re.exec(formula);
}
// Pass to math library next
This works in most cases but I have the following issue:
For the formula counter+counter, only the first counter+ matches, when both should match.
So, what I need is basically regular expression/function that does the following:
Take a variable name
Replace all occurrences of it as long as the occurrences don't have an alphanumeric character in front or behind. So if I'm matching counter against a formula, +counter+ would match (+ isn't alphanumeric), + counter would match (space isn't alphanumeric), but counter2 wouldn't match, because it's a different variable name entirely, and 2 is alphanumeric.
Any ideas? I'm trying to do this the right way, I imagine there can be many unknown side effects if I don't do this correctly.
Thanks for the help!
You may use a lookahead at the end, (?=$|[^A-Za-z0-9]) instead of a ($|[^A-Za-z0-9]) capturing group, and shrink the code to a greater extent if you just use replace:
var re = new RegExp(`(^|[^A-Za-z0-9])${variableName}(?=\$|[^A-Za-z0-9])`, "g");
formula = formula.replace(re, "$1"+variableValue)
Note the $1 in the replacement part is the backreference to the value stored in Group 1, that is, start of string or any char but an ASCII alphanumeric (captured with (^|[^A-Za-z0-9])).
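A self-contained sketch of this lookahead approach follows. The function names substituteVariable and escapeRegExp are hypothetical. The sketch also makes two changes that are my own assumptions and not part of the answer: it escapes the variable name, and it uses a replacer function so that a $ in the value is not interpreted as a replacement pattern.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so that a variable name such as "a.b" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every standalone occurrence of variableName in formula.
// The trailing boundary is a lookahead, so it is not consumed and can
// serve as the leading boundary of the next occurrence.
function substituteVariable(formula, variableName, variableValue) {
  const name = escapeRegExp(variableName);
  const re = new RegExp(`(^|[^A-Za-z0-9])${name}(?=$|[^A-Za-z0-9])`, "g");
  return formula.replace(re, (_, prefix) => prefix + String(variableValue));
}

console.log(substituteVariable("counter+counter", "counter", 3));    // 3+3
console.log(substituteVariable("counter2 + counter", "counter", 3)); // counter2 + 3
```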

Is there an efficient way to test whether a string contains non-overlapping substrings to match the array of regular expressions?

All I want is to test whether a string contains non-overlapping substrings to match the array of regexes in the following way: if a substring matches some item of the array, remove the corresponding regex from the array, and continue. I will need a function func(arg1, arg2) that will take two arguments: the first one is the string itself, and the second one is an array of regular expressions to test.
I've read some explanations (such as Regular Expressions: Is there an AND operator?), but they do not answer this specific question. For example, in Javascript, the following three code snippets will return true:
/(?=ab)(?=abc)(?=abcd)/gi.test("eabzzzabcde");
/(?=.*ab)(?=.*abc)(?=.*abcd)/gi.test("eabzzzabcde");
/(?=.*?ab)(?=.*?abc)(?=.*?abcd)/gi.test("eabzzzabcde");
which is, obviously, not what I want (because "abc" and "abcd" in "eabzzzabcde" are just mixed together in an overlapping way). So, func("eabzzzabcde", [/ab/gi, /abc/gi, /abcd/gi]) should return false.
But, func("Fhh, fabcw wxabcdy yz... zab.", [/ab/gi, /abc/gi, /abcd/gi]) should return true because none of "ab", "abc" and "abcd" substrings overlap each other. The logic is the following. We have an array of regexes: [/ab/gi, /abc/gi, /abcd/gi], and some possible combination of three (where 3 is equal to the length of that array) non-overlapping, separate substrings of the original string: fabcw, xabcdy and zab. Does fabcw match /abc/gi? Yes. Okay, we remove /abc/gi from the array, and we have [/ab/gi, /abcd/gi] for xabcdy and zab. Does xabcdy match /abcd/gi? Yes. Okay, we remove /abcd/gi from the current array, and we have [/ab/gi] for zab. Does zab. match /ab/gi? Yes. No more regexes left in the current array, and we always answered "yes", so — return true.
The tricky part here is to find an efficient (such that performance is not too terrible) way to get at least one possible “good” combination of non-overlapping substrings.
The more complex case is e.g. func("acdxbaab ababaacb", [/.*?a.*?b.*?c/gi, /.*?c.*?b.*?a/gi]). Using the logic described above, we can see that if we take two non-overlapping parts of the original string — "acdxba" (or "cdxba") and "abaac" (or "abaacb", "babaac" etc.) — the first one matches /.*?c.*?b.*?a/gi, and the second one matches /.*?a.*?b.*?c/gi. So, func("acdxbaab ababaacb", [/.*?a.*?b.*?c/gi, /.*?c.*?b.*?a/gi]) should return true.
Is there any efficient way to solve such a problem?
Assuming each pattern should match exactly once, then we can construct a regexp of all of their permutations:
const patterns= ['ab', 'abc', 'abcd'];
const input = "Fhh, fabcw wxabcdy yz... zab.";
// Create a regexp of the form
// (.*?ab.*?abc.*?abcd.*?)
function build(patterns) {
return `(${['', ...patterns, ''].join('.*?')})`;
}
function match(input, patterns) {
const regexps = [...permute(patterns)].map(build);
// Create a regexp of the form
// /(.*?ab.*?abc.*?abcd.*?)|(.*?ab.*?abcd.*?abc.*?)|.../
const regexp = new RegExp(regexps.join('|'));
return regexp.test(input);
}
// Simple permutation generator.
function *permute(a, n = a.length) {
if (n <= 1) yield a.slice();
else for (let i = 0; i < n; i++) {
yield *permute(a, n - 1);
const j = n % 2 ? 0 : i;
[a[n-1], a[j]] = [a[j], a[n-1]];
}
}
console.log(match(input, patterns));
This will result in a very long regexp if there are more than a half-dozen or so patterns. To deal with this, we can test each permutation one at a time:
function match(input, patterns) {
return Array.from(permute(patterns))
.some(perm => input.match(build(perm)));
}
If there are ten patterns, there are 10! = 3,628,800 permutations, so in the worst case we will end up doing a few million tests.
Disclaimers
This uses ES6 features. Fall back to equivalent ES5 syntax if you need to.
The input patterns here are strings. To handle regexps instead would require a little bit of logic to extract the pattern from the regexp, and also escape any special regexp characters appearing in it.
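The last disclaimer can be sketched as follows, assuming the inputs are RegExp objects that all share the same case sensitivity. Each regex's source property is wrapped in a non-capturing group so that alternations inside a pattern cannot leak into its neighbours. The function names are hypothetical, and the permutation generator is a simple recursive one, not the generator used above.

```javascript
// Yield every ordering of the given items.
function* permutations(items) {
  if (items.length <= 1) {
    yield items.slice();
    return;
  }
  for (let i = 0; i < items.length; i++) {
    const rest = [...items.slice(0, i), ...items.slice(i + 1)];
    for (const tail of permutations(rest)) {
      yield [items[i], ...tail];
    }
  }
}

// True if the input contains non-overlapping matches for all regexps in some order.
// Assumption: the "i" flag is applied only when every input regexp has it.
function matchesAllNonOverlapping(input, regexps) {
  const sources = regexps.map(re => `(?:${re.source})`);
  const flags = regexps.every(re => re.ignoreCase) ? "i" : "";
  for (const perm of permutations(sources)) {
    // Test one ordering at a time to keep each regexp short.
    if (new RegExp(perm.join(".*?"), flags).test(input)) return true;
  }
  return false;
}

console.log(matchesAllNonOverlapping("Fhh, fabcw wxabcdy yz... zab.", [/ab/gi, /abc/gi, /abcd/gi])); // true
console.log(matchesAllNonOverlapping("eabzzzabcde", [/ab/gi, /abc/gi, /abcd/gi])); // false
```

The worst-case cost is still factorial in the number of patterns, as with the string-based version.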
Is there an efficient way to test whether a string contains non-overlapping substrings to match the array of regular expressions?
I doubt that you would call the above solution "efficient", but I don't know if there is a more efficient one. As far as I can see, any approach to this problem is going to involve backtracking. You could match the first nine of ten patterns, and then discover that the last one won't match because one of the earlier nine greedily ate up part of what the tenth needed, even though it could have matched itself somewhere later in the string. Therefore, I will go out on a limb and say that this problem is intrinsically of order O(n!).

How to look for a pattern that might be missing some characters, but following a certain order?

I am trying to write a validation for "KQkq" <or> "-". In the first case, any of the letters can be missing (except all of them, in which case it should be "-"). The order of the characters is also important.
Some quick examples of legal values are:
-
Kkq
q
This is for a chess FEN validation. I have validated the first two parts using:
var fen_parts = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1";
fen_parts = fen_parts.split(" ");
if(!fen_parts[0].replace(/[1-8/pnbrqk]/gi,"").length
&& !fen_parts[1].replace(/[wb]/,"").length
&& !fen_parts[2].replace(/[kq-]/gi,"").length /*not working, allows KKKKKQkq to be valid*/
){
//...
}
But simply using /[kq-]/gi to validate the third part lets too many things through. Here are some quick examples of illegal values:
KKKKQkq (there is more than one K)
QK (order is incorrect)
You can do
-|K?Q?k?q?
though you will need to do a second test to ensure that the input is not empty. Alternatively, using only regex:
KQ?k?q?|Qk?q?|kq?|q|-
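A minimal sketch of the regex-only variant follows. It assumes the pattern should match the whole castling field, so it adds ^ and $ anchors and a non-capturing group around the alternation. The names castlingPattern and isValidCastling are illustrative.

```javascript
// Each alternative starts with a different first letter, so the
// empty string can never match and the letter order is enforced.
const castlingPattern = /^(?:KQ?k?q?|Qk?q?|kq?|q|-)$/;

function isValidCastling(field) {
  return castlingPattern.test(field);
}

for (const field of ["-", "Kkq", "q", "KQkq", "KKKKQkq", "QK", ""]) {
  console.log(JSON.stringify(field), isValidCastling(field));
}
```

The first four fields are accepted. "KKKKQkq", "QK" and the empty string are rejected.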
This seems to work for me...
^(-|(K)?((?!\2)Q)?((?!\2\3)k)?((?!\2\3\4)q)?)$
A .match() returns null if the expression did not match. In that case you can use the logical OR to default to an array with an empty string (a structure similar to the one returned by .match() on a successful match), which will allow you to check the length of the matched expression. The length will be 0 if the expression did not match, or if K?Q?k?q? matched the empty string. If the pattern matches a non-empty string, the length will be > 0. In code:
("KQkq".match(/^(?:K?Q?k?q?|-)$/) || [""])[0].length
Because | is "stronger" than you'd expect, it is necessary to wrap your actual expression in a non-capturing group (?:).
Having answered the question, let's have a look at the rest of your code:
if (!fen_parts[0].replace(/[1-8/pnbrqk]/gi,"").length)
is, from JavaScript's perspective, equivalent to
if (!fen_parts[0].match(/[^1-8/pnbrqk]/gi))
which translates to "false if there is any character other than 1-8/pnbrqk". This notation is not only simpler to read, it also executes faster, as no new string has to be built (replace) and no length has to be computed.

Using Regular Expressions with Javascript replace method

Friends,
I'm new to both Javascript and Regular Expressions and hope you can help!
Within a Javascript function I need to check to see if a comma(,) appears 1 or more times. If it does then there should be one or more numbers either side of it.
e.g.
1,000.00 is ok
1,000,00 is ok
,000.00 is not ok
1,,000.00 is not ok
If these conditions are met I want the comma to be removed so 1,000.00 becomes 1000.00
What I have tried so far is:
var x = '1,000.00';
var regex = new RegExp("[0-9]+,[0-9]+", "g");
var y = x.replace(regex,"");
alert(y);
When run, the alert shows ".00", which is not what I was expecting or want!
Thanks in advance for any help provided.
Edit
Thanks all for the input so far and the 3 answers given. Unfortunately I don't think I explained my question well enough.
What I am trying to achieve is:
If there is a comma in the text and there are one or more numbers either side of it then remove the comma but leave the rest of the string as is.
If there is a comma in the text and there is not at least one number either side of it then do nothing.
So using my examples from above:
1,000.00 becomes 1000.00
1,000,00 becomes 100000
,000.00 is left as ,000.00
1,,000.00 is left as 1,,000.00
Apologies for the confusion!
Your regex isn't going to be very flexible with higher orders than 1000 and it has a problem with inputs which don't have the comma. More problematically you're also matching and replacing the part of the data you're interested in!
Better to have a regex which matches the forms which are a problem and remove them.
The following matches (in order) commas at the beginning of the input, at the end of the input, preceded by a number of non digits, or followed by a number of non digits.
var y = x.replace(/^,|,$|[^0-9]+,|,[^0-9]+/g,'');
As an aside, all of this is much easier if you are able to use lookbehind, but at the time of writing almost no JS implementation supported it (lookbehind assertions were only added to the language in ES2018).
Edit based on question update:
Ok, I won't attempt to understand why your rules are as they are, but the regex gets simpler to solve it:
var y = x.replace(/(\d),(\d)/g, '$1$2');
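One caveat with consuming both digits: in an input like "1,2,3" the 2 is used up by the first match, so the second comma is left in place. A hedged variant, sketched below, uses a lookahead for the digit after the comma. The function name stripDigitCommas is my own.

```javascript
// Remove a comma only when it sits between two digits.
// The digit after the comma is checked with a lookahead, so it is not
// consumed and can act as the "digit before" for the next comma.
function stripDigitCommas(text) {
  return text.replace(/(\d),(?=\d)/g, "$1");
}

console.log(stripDigitCommas("1,000.00"));  // 1000.00
console.log(stripDigitCommas("1,000,00"));  // 100000
console.log(stripDigitCommas(",000.00"));   // ,000.00
console.log(stripDigitCommas("1,,000.00")); // 1,,000.00
console.log(stripDigitCommas("1,2,3"));     // 123
```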
I would use something like the following:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$
[0-9]{1,3}: 1 to 3 digits
(,[0-9]{3})*: [Optional] More digit triplets separated by a comma
(\.[0-9]+)?: [Optional] Dot + more digits
If this regex matches, you know that your number is valid. Just replace all commas with the empty string afterwards.
It seems to me you have three error conditions
",1000"
"1000,"
"1,,000"
If any one of these is true, then you should reject the field. If they are all false, then you can strip the commas in the normal way and move on. This can be a simple alternation:
^,|,,|,$
I would just remove anything except digits and the decimal separator ([^0-9.]) and send the output through parseFloat():
var y = parseFloat(x.replace(/[^0-9.]+/g, ""));
// invalid cases:
// - standalone comma at the beginning of the string
// - comma next to another comma
// - standalone comma at the end of the string
var i,
inputs = ['1,000.00', '1,000,00', ',000.00', '1,,000.00'],
invalid_cases = /(^,)|(,,)|(,$)/;
for (i = 0; i < inputs.length; i++) {
if (inputs[i].match(invalid_cases) === null) {
// wipe out everything but digits and the decimal dot
inputs[i] = inputs[i].replace(/[^\d.]+/g, '');
}
}
console.log(inputs); // ["1000.00", "100000", ",000.00", "1,,000.00"]
