How do I combine 2 regex patterns into 1 and use it within a function - javascript

I have a regex for checking that a number has no more than 15 significant figures, borrowed from this SO answer:
/^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=.{1,16}0*$)(?:\d+[.,]\d+))).+$/
The other is used to check that the same number has up to 2 decimal places (truncating the rest):
/^-?(\d*\.?\d{0,2}).*/
I have almost zero regex skill.
Question: How do I combine the 2 regexes so that both conditions must hold (AND), rather than either one (OR, which the | character gives)? I am not sure whether | achieves the same thing as combining both.
Something like:
/^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=.{1,16}0*$)(?:\d+[.,]\d+))).+$ <AND&&NOTOR>(\d*\.?\d{0,2}).*/
Thanks in advance.
EDIT: edit moved to a separate SO question

If you only need to add the condition of a maximum of 2 decimal places to the first regex, try this:
^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).+$
Demo, in which I only changed the original \d+ to \d{1,2}$.
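To use the pattern inside a function, a minimal sketch might look like the following. The pattern is the first one from this answer, copied unchanged; the names PRECISION_PATTERN and hasValidPrecision are hypothetical and only serve the illustration.

```javascript
// Pattern from the answer above, unchanged. It is intended to allow at most
// 15 significant figures and, when a fraction is present, at most 2 decimals.
const PRECISION_PATTERN =
  /^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).+$/;

// Hypothetical helper: accepts a number or a numeric string.
function hasValidPrecision(value) {
  return PRECISION_PATTERN.test(String(value));
}

console.log(hasValidPrecision("123.45"));  // true
console.log(hasValidPrecision("123.456")); // false (three decimals)
console.log(hasValidPrecision("123"));     // true
```

Because both conditions live in one lookahead-guarded pattern, a single test() call gives the AND behaviour the question asks for. Running the two original patterns separately and combining the results with && would be an alternative if readability matters more than having one expression.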
Edited for the request to extract 15 significant figures into capture group 1 ($1). Try this, which wraps the match in capture group 1 ($1) and limits its length so that 15 significant figures can be extracted easily:
^(-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).{1,16}).*$
Demo, in which .+$ was changed to .{1,16}.
If the number matches, it can be replaced with $1; if it does not, nothing is replaced and the original unmatched number remains.
Therefore, if you want to extract 15 significant figures by replacing with $1 only when your condition is satisfied, try this regex in your function:
^(-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).{1,16}).*$|^.*$
Demo, in which all numbers are matched, but only the numbers satisfying your condition are captured in $1, formatted to 15 significant figures.

Related

Regex for custom decimal and thousand separator

I am using the regex below to handle a custom thousand separator, which can be any of the characters , or . or space. It works for the thousand separator but not for the decimal indicator.
I am trying to add a new capturing group to handle the decimal indicator (, or .) with a maximum of 2 decimals, but the regex then breaks for the thousand separator.
^[+]?(?:\d{1,3}(?:(,|.| )\d{3})*|\d+)?,?$
How to add a capturing group to handle decimal with custom character? Any Ideas?
Valid Inputs:
1234
123.45
123,45
1234.56
1234,56
123
1,234
12,345
1,234,567
12,345,678
123,456,789
12
1.234
12.345
1.234.567
12.345.678
123.456.789
123
1 234
12 345
123 456
1 234 567
12 345 678
123 456 789
123.4567
123,4567
1,345.67
1.345,67
1 345.67
12,345.67
12.345,67
12 345.67
123,456,789.34
123.456.789,34
123 456 789.34
Not Valid:
12.345.67
12,345,67
12 345 67
123 456 789 34
Well, your specification is ambiguous: by accepting ',' as the decimal indicator, you allow 123,456 to be parsed either as the number 123456 or as the number 123.456 (one thousandth of it). If you resolve this by disallowing only numbers with exactly three decimals, you remove the ambiguity, but at a high cost: the user has to understand that if he/she makes the mistake of using three decimals, he/she will get weird results under strange conditions (123,456 will be parsed as 123456.0 while 123,4560 will be parsed as 123.456). This is hard for a user to accept. It's more interesting to use the rule that a single , or . means a decimal point, while if both indicators are present, the first is a group separator and the second is a decimal point.
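The rule in the last sentence can be sketched in JavaScript as follows. The function name parseLooseNumber is hypothetical, and the handling of a single separator repeated several times (treated as grouping) is an assumption added for the example, not something stated above. The sketch does not check that groups have three digits.

```javascript
// Hypothetical helper: a lone "," or "." is the decimal point; when both
// appear, the last one is the decimal point and the other one is grouping.
function parseLooseNumber(text) {
  const s = text.trim();
  const seps = s.match(/[.,]/g) || [];
  if (seps.length === 0) return Number(s);

  const last = seps[seps.length - 1];      // candidate decimal point
  const other = last === "." ? "," : "."; // candidate group separator
  const lastCount = seps.filter((c) => c === last).length;
  const otherCount = seps.length - lastCount;

  // Assumption: one kind of separator repeated (1,234,567) can only be grouping.
  if (otherCount === 0 && lastCount > 1) {
    return Number(s.split(last).join(""));
  }
  // The decimal point may appear only once.
  if (lastCount > 1) return NaN;

  const [intPart, fracPart] = s.split(last);
  return Number(intPart.split(other).join("") + "." + fracPart);
}

console.log(parseLooseNumber("123,456"));  // 123.456
console.log(parseLooseNumber("1.234,56")); // 1234.56
console.log(parseLooseNumber("1,234.56")); // 1234.56
```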
IMHO I would never use the space as a decimal indicator (if you use it as a group separator, use it as the only digit group separator; some programming languages, e.g. Java, allow _ to be used as a digit group separator); nobody uses it. It's preferable to use no decimal indicator at all (making the number an integer, scaled 10, 100, or 1000 times; this has long been used in desktop calculators), as people doing quick data input prefer to key the extra zeros rather than move a finger to locate the decimal point and then type two more digits most of the time. Even more so if they have to go to the letter keys to find the space bar. (Of course, it is even harder to go there to find the underscore _ character, but quick typists don't use group separators.)
On the other hand, people normally don't key thousands separators; they exist just for readability (computers add them when printing, but never when reading). In this scenario, people sometimes don't want the rigid rule of groups of three digits, but want to group digits arbitrarily. This leads to situations where the user wants to separate digits in groups of three to the left of the decimal point, while using groups of five or ten on the right (which is something you don't contemplate at all), making e.g. PI appear as:
3.14159 26535 89793 23846 26433 83
I agree that using the alternate decimal point as group separator could be interesting, but at both sides of the actual decimal point, and never forcing groups of three.
Anyway, just to fit on your specs, I've written the following lex(1) specification to parse your input.
%{
#include <stdio.h>
%}
pfx [1-9][0-9]?[0-9]?
grp [0-9][0-9][0-9]
dec [0-9]*
e1 [+-]?{pfx}([.]{grp})*([,]{dec})?
e2 [+-]?{pfx}([,]{grp})*([.]{dec})?
e3 [+-]?{pfx}([ ]{grp})*([.,]{dec})?
e4 [+-]?[1-9][0-9]*([,.]{dec})?
e5 [+-]?0?([,.]{dec})?
%%
{e1}|{e2}|{e3}|{e4}|{e5} printf("\033[32m[%s]\033[m\n", yytext);
[0-9., +-]* printf("\033[31m[%s]\033[m\n", yytext);
. |
\n |
\t ;
%%
int main()
{
    yylex();
    return 0;
}
int yywrap()
{
    return 1;
}
Your regular expression, complete, should be something like:
[+-]?[0-9]{1,3}([ ][0-9]{3})*([,.]([0-9]{3}[ ])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([ ][0-9]{3})*([,.][0-9]{0,2})?|[+-]?[0-9]{0,2}[,.]([0-9]{3}[ ])*[0-9]{1,3}|[+-]?[0-9]{1,3}([,][0-9]{3})*([.]([0-9]{3}[,])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([,][0-9]{3})*([.][0-9]{0,2})?|[+-]?[0-9]{0,2}[.]([0-9]{3}[,])*[0-9]{1,3}|[+-]?[0-9]{1,3}([.][0-9]{3})*([,]([0-9]{3}[.])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([.][0-9]{3})*([,][0-9]{0,2})?|[+-]?[0-9]{0,2}[,]([0-9]{3}[.])*[0-9]{1,3}|[+-]?[0-9]*[,.][0-9]+|[+-]?[0-9]+[,.][0-9]*|[+-]?[0-9]+
Note
Some regexp libraries don't implement the | operator correctly, making it not actually commutative as it should be (the worst case I know is regex101.com, see below), and forcing you to put the operands in some particular order to match some strings (this is a bug in the library, but unfortunately it is widespread). Below is the above regexp (which works fine with sed(1)), and you'll see how it doesn't match correctly in regex101 (there should be far fewer matches).
I've also written a bash script (shown below) to use sed(1) with the above regexp, so you can see how it works on your side:
dig="[0-9]"
af0="${dig}{0,2}"
af1="${dig}{1,3}"
grp="${dig}{3}"
t01="[+-]?${af1}([ ]${grp})*([,.](${grp}[ ])*${af1})?"
t02="[+-]?${af1}([ ]${grp})*([,.]${af0})?"
t03="[+-]?${af0}[,.](${grp}[ ])*${af1}"
t04="[+-]?${af1}([,]${grp})*([.](${grp}[,])*${af1})?"
t05="[+-]?${af1}([,]${grp})*([.]${af0})?"
t06="[+-]?${af0}[.](${grp}[,])*${af1}"
t07="[+-]?${af1}([.]${grp})*([,](${grp}[.])*${af1})?"
t08="[+-]?${af1}([.]${grp})*([,]${af0})?"
t09="[+-]?${af0}[,](${grp}[.])*${af1}"
t10="[+-]?${dig}*[,.]${dig}+"
t11="[+-]?${dig}+[,.]${dig}*"
t12="[+-]?${dig}+"
s01="${t01}|${t02}|${t03}"
s02="${t04}|${t05}|${t06}"
s03="${t07}|${t08}|${t09}"
s04="${t10}|${t11}|${t12}"
reg="${s01}|${s02}|${s03}|${s04}"
echo "$reg"
sed -E -e "s/${reg}/<&>/g"
You can find all this code (and updates) here.
The following regex will match all the cases from your example:
^[+]?(?:\d{1,3}(?:([,. ])\d{3})*|\d+)?(?:[,.]\d+?){0,1}$
The last part (?:[,.]?\d+?){0,1}, makes the matching of the decimal part optional.
There you go:
^[+]?(?:\d{1,3}(?:(,|.| )\d{3})*|\d+)?((?<!,\d{3})(,\d+)|(?<!\.\d{3})(\.\d+))?$
Regex 101 demo
Assuming
123.4567
123,4567
123 4567
are not valid, you can use:
^[+-]?(?:(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d\d)?|(?:\d{1,3}(?:\.\d{3})*|\d+)(?:,\d\d)?|(?:\d{1,3}(?: \d{3})*|\d+)(?:[,.]\d\d)?)$
Demo & explanation
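As a quick check of the last pattern, the sketch below runs it against a few of the inputs listed in the question. The constant name NUMBER_PATTERN and the choice of samples are mine; the pattern itself is copied unchanged from this answer.

```javascript
// Pattern from the answer above: comma, dot, or space as group separator,
// with an optional two-digit fraction using a different character.
const NUMBER_PATTERN =
  /^[+-]?(?:(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d\d)?|(?:\d{1,3}(?:\.\d{3})*|\d+)(?:,\d\d)?|(?:\d{1,3}(?: \d{3})*|\d+)(?:[,.]\d\d)?)$/;

const samples = [
  "1234", "123.45", "1,234,567", "1.345,67", "12 345.67", // expected to match
  "12.345.67", "123 456 789 34", "123.4567",              // expected to fail
];

for (const s of samples) {
  console.log(s.padEnd(16), NUMBER_PATTERN.test(s));
}
```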

Grab multiple numbers 1-10 from string

I am parsing a string of multiple numbers between 1 and 10 with the eventual goal of adding them to a set.
There will be multiple concatenated numbers after a text identifier such as {text}12345678910.
I am currently using match(/\d/g) to grab the numbers but it separates 1 and 0 in 10. I then look for 0 in my String Array, see if there's a 1 in the element before it, turn it into a 10 and delete the other entry. Not very elegant.
How can I clean up my matching code? I definitely don't need to use regex for this, but it makes grabbing the numbers fairly easy.
You could just match with this regex:
/10|\d/g
(instead of the one you use currently, not additionally)
Alternatives are tried left to right at each position, so 10 is tried before a single digit. With the order reversed (for example /\d|10/g or even /\d|(10)/g) the \d alternative always succeeds first, so 10 is never matched as a whole.
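A short sketch of the difference, using the sample string from the question (the variable names are only for illustration):

```javascript
const text = "{text}12345678910";

// "10" is listed first, so it wins wherever a 1 is followed by a 0.
const numbers = text.match(/10|\d/g).map(Number);
console.log(numbers); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

// With the alternatives swapped, \d always matches first and 10 is split.
const split = text.match(/\d|10/g).map(Number);
console.log(split); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 0]

// The eventual goal from the question: collect the values into a Set.
const unique = new Set(numbers);
console.log(unique.has(10)); // true
```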

Regex for validating currency number format

I've got following formats, that are acceptable
1200000,00
1200000.00
1,200,000.00
1 200 000.00
1 200 000,00
1 200 000,0000
-1 200 000.00
At the moment I am able to verify only ^-?\\d+$, ^-?\\d+[\\,\\.]\\d{2}$ and ^-?\\d+[\\,\\.]\\d{2,}$. The last two formats are kept separate so that I know whether rounding is needed. All three use the gm flags to check the string from start ^ to end $.
Those regular expressions cover only the first two elements in the list. The other elements, which use commas and spaces for thousand separation, are not verified yet and I'm not sure how to achieve that.
There is also a "beautifier" expression, (\\d)(?=(\\d{3})+(?!\\d)), that takes 1200000,00 and turns it into 1 200 000,00 when used as '1200000,00'.replace(new RegExp('(\\d)(?=(\\d{3})+(?!\\d))', 'g'), '$1 ').
So the question is: what would be a correct regular expression to validate a format such as 1 200 000.00 or 1,200,000.00? I assume the difference between \s and , can easily be handled in the same expression.
Thank you.
For validating the last two numbers, you can use the following:
^-?\d{1,3}(?:[\s,]\d{3})*(?:\.\d+)?$
Reading the pattern from left to right:
Optional minus sign
1..3 digits
Zero or more fragments, each consisting of a comma or space followed by 3 digits
Optional fraction part consisting of a dot followed by 1 or more digits
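A small sketch of how this pattern behaves on the formats from the question (the constant name GROUPED_AMOUNT is hypothetical). Note that it only accepts a dot as the decimal mark, it requires grouping once there are more than three integer digits, and because the separator is a character class it also accepts a mix of spaces and commas.

```javascript
// Pattern from the answer above, unchanged.
const GROUPED_AMOUNT = /^-?\d{1,3}(?:[\s,]\d{3})*(?:\.\d+)?$/;

console.log(GROUPED_AMOUNT.test("1,200,000.00"));  // true
console.log(GROUPED_AMOUNT.test("1 200 000.00"));  // true
console.log(GROUPED_AMOUNT.test("-1 200 000.00")); // true
console.log(GROUPED_AMOUNT.test("1 200 000,00"));  // false (comma as decimal mark)
console.log(GROUPED_AMOUNT.test("1200000.00"));    // false (no grouping)
console.log(GROUPED_AMOUNT.test("1,200 000.00"));  // true (mixed separators pass)
```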
This doesn't directly solve the problem due to me misreading. But it might still be useful to someone so I'll let it stay.
Stop trying to solve every problem with regex. Regex is great when you have one or two very well defined strings. Not a million formats.
This can be solved with minimal regex. Magic is in the bold part.
var numbers = [
"1200000,00",
"1200000.00",
"1,200,000.00",
"1 200 000.00",
"1 200 000,00",
"1 200 000,0000",
"-1 200 000.00"
];
var parseWeirdNumber = function(numberString) {
//Split numbers to parts. , . and space are all valid delimiters.
var numberParts = numberString.split(/[.,\s]/);
//Remove the last part. **This means that all input must have fraction!!**
var fraction = numberParts.pop();
//Rejoin back without delimiters, and reapply the fraction.
//parseFloat to convert to a number
var number = parseFloat(numberParts.join('') + "." + fraction);
return number;
}
numbers = numbers.map(parseWeirdNumber);
console.log(numbers);

Expression regular for check phone numbers at word level

I'm trying to write a RegEx to test if a number is valid and for valid I mean any number that matches country calling codes but also where the format of telephone numbers is standardized by ITU-T in the recommendation E.164. This specifies that the entire number should be 15 digits or shorter, and begin with a country prefix as said here so I did this:
^\+\d{2}|\d{3}([0-9])\d{7}$
But it's not working. In my case (VE numbers can't match the regex, since they are validated in another way) these inputs are valid:
+1420XXXXXXXXXXX // Slovakia - X is a digit and there could be more, though, 5 minimum
001420XXXXXXXXXX // Slovakia - I've changed from + to 00
420XXXXXXXXXXXXX // Slovakia - I've removed the 00 or + but the number is still valid
+40XXXXXXXXXXXXX // Romania
Invalid numbers are those that don't match the regex and those starting with +58, since they are from VE. So, to sum up, a valid number should have:
+XX|+XXX plus 12|11 digits (5 minimum), where XX|XXX is the country code; since the maximum is 15 digits, the rest should be 12 or 11 digits depending on the country format
Can anyone help me with this? It's one I'd call complex.
A few strange things are going on with your regexp:
\d is shorthand for [0-9] - fine to use both, but I'm wondering why they're mixed
what you are searching for with your OR (|) is "something that starts with +XX", i.e. a plus and two digits (^\+\d{2}), OR "something that ends with XXXXXXXXXXX", i.e. 11 digits (\d{3}([0-9])\d{7}$)
You need to group (with brackets) the OR choices, otherwise it is everything to the left or everything to the right (simplistically)
^\+(\d{2}|\d{3})([0-9])\d{7}$
There is, however, another way of giving the number of occurrences : {m,n} means occurs between m and n times. So you could say ^\+\d{7,15}$ (where 7 is your minimum 5 + the minimum country code of 2).
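The effect of the missing group, and of the {m,n} form, can be seen in a short sketch. The three patterns are the ones discussed above; the variable names and sample strings are made up for the illustration.

```javascript
const ungrouped = /^\+\d{2}|\d{3}([0-9])\d{7}$/;   // original: "starts with +XX" OR "ends with 11 digits"
const grouped = /^\+(\d{2}|\d{3})([0-9])\d{7}$/;   // OR limited to the country code
const byLength = /^\+\d{7,15}$/;                   // plus sign followed by 7 to 15 digits

console.log(ungrouped.test("+12 not a number")); // true: only the left alternative had to match
console.log(grouped.test("+12 not a number"));   // false
console.log(grouped.test("+4012345678"));        // true
console.log(byLength.test("+401234567890"));     // true
console.log(byLength.test("+123456"));           // false: only 6 digits
```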
To really do this, however, you might want to take a look at libphonenumber (https://code.google.com/p/libphonenumber/), which provides complete validation and formatting for all phone numbers and is available as JavaScript.

Performance of regex within jQuery data selector: dependence on certain string length

The setup: I have a div with a bunch of radio buttons, each of which has been associated with a custom attribute and value using $(element).data(attr_name,attr_value);. When an underlying data structure is changed, I iterate over the fields and set the appropriate buttons to checked:true by using the ':data' selector found here: https://stackoverflow.com/a/2895933/1214731
$($('#style-options').find(':radio').filter(':data('+key+'=='+value+')'))
.prop('checked',true).button('refresh');
This works great: it finds the appropriate elements, even with floating-point values.
Performance depends on value:
I noticed that when I clicked on certain buttons, the page took fractionally longer to respond (for most buttons there was no noticeable delay). Digging a little deeper, this seems to be occurring when certain floating point values are being searched for.
Using chrome dev tools, I logged the following:
> key='fill-opacity';
"fill-opacity"
> value=.2*2;
0.4
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 43.352ms undefined
> value=.2*3;
0.6000000000000001
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 10322.866ms undefined
The difference in speed is a factor of >200!
Next, I tried typing the number in manually (e.g. decimal place, six, 14x zeros, one) - same speed. All numbers with the same number of digits were the same speed. I then reduced the number of digits progressively:
# of digits time (ms)
16 10300
15 5185
14 2665
13 1314
12 673
11 359
10 202
9 116
8 77
7 60
6 50
5 41
4 39
I quickly ruled out the equality check between numeric and string - no dependence on string length there.
The regex execution is strongly dependent on string length
In the linked answer above, the regex that parses the data string is this:
var matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
The string passed in is of the form [name operator value]. The length of name doesn't seem to make much difference; the length of value has a big impact on speed however.
Specific questions:
1) Why does the length of name have minimal effect on performance, while the length of value has a large effect?
2) Doubling the execution time with each additional character in value seems excessive - is this just a characteristic of the particular regex the linked solution uses, or is it a more general feature?
3) How can I improve performance without sacrificing a lot of flexibility? I'd like to still be able to pass arguments as a single string to a jQuery selector so type checking up front seems difficult, though I'm open to suggestions.
Basic test code for regex matching speeds:
matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.1111111111111)}; console.timeEnd('regex')
regex: 538.018ms
//add an extra digit - doubles duration of test
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.11111111111111)}; console.timeEnd('regex')
regex: 1078.742ms
//add a bunch to the length of 'name' - minimal effect
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('xxxxxxxxxxxxxxxxxxxx=='+.11111111111111)}; console.timeEnd('regex')
regex: 1084.367ms
A characteristic of regexp quantifiers is that they are greedy by default. If you try to match the expression a.*b against the string abcd, it happens in these steps:
the first "a" will match
the .* will match the second char, then the third, till the end of the string
on reaching the end of the string, there is still a "b" to be matched, so the match fails
the regexp processing starts to backtrack
the last char will be "unmatched" and it will try to match "b" to "d". Fails again. More backtracking
tries to match "b" to "c". Fail. Backtrack.
match "b" to "b". Success. Matching ends.
Although you matched just a few chars, you iterated over the whole string. If you have more than one greedy operator, you can easily get an input string that takes exponential time to match.
Understanding backtracking will prevent a lot of errors and performance problems. For example, 'a.*b' will match all the string 'abbbbbbb', instead of just the first 'ab'.
The easiest way to prevent this kind of error in modern regexp engines is to use the non-greedy version of the operators * and +. They are usually represented by the same operators followed by a question mark: *? and +?.
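The greedy and non-greedy behaviour described above can be checked directly in JavaScript; a minimal sketch with made-up variable names:

```javascript
// Greedy: .* grabs as much as possible, then backtracks to the last "b".
const greedy = "abbbbbbb".match(/a.*b/)[0];
console.log(greedy); // "abbbbbbb"

// Non-greedy: .*? grabs as little as possible and stops at the first "b".
const lazy = "abbbbbbb".match(/a.*?b/)[0];
console.log(lazy); // "ab"

// The walkthrough above: .* first runs to the end of "abcd", then gives
// characters back until "b" can match.
const walkthrough = "abcd".match(/a.*b/)[0];
console.log(walkthrough); // "ab"
```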
I confess that I really didn't stop to debug the complicated regexp that you posted, but I believe that the problem is before matching the '=' symbol. The greedy operator is in this subexpression:
(?:(?:\\\.|[^.,])+\.?)+
I'd try to change it to a non-greedy version:
(?:(?:\\\.|[^.,])+\.?)+?
but this is really just a wild guess. I'm using pattern recognition to solve the problem :-) It makes sense because it is backtracking for each character of the "value" till matching the operator. The name is matched linearly.
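As a different direction for question 3, here is a minimal sketch of a simpler parser that avoids nested quantifiers such as (x+)+, which are a well-known source of exponential backtracking. It is not the linked answer's matcher: it assumes each string holds a single name-operator-value condition with no quotes, escaped dots, or comma-separated lists (all of which the original matcher supports), and the names CONDITION and parseCondition are hypothetical.

```javascript
// Assumption: one condition per string, e.g. "fill-opacity==0.4".
// The name is any run of characters that cannot start an operator,
// so the engine never has to choose between several ways to split it.
const CONDITION = /^\s*([^!~><=\s]+)\s*([!~><=]=|[><])\s*(.*?)\s*$/;

// Hypothetical helper returning the three parts, or null if nothing matches.
function parseCondition(text) {
  const m = CONDITION.exec(text);
  return m ? { name: m[1], operator: m[2], value: m[3] } : null;
}

console.log(parseCondition("fill-opacity==" + 0.2 * 3));
// { name: "fill-opacity", operator: "==", value: "0.6000000000000001" }
```

Whether this is acceptable depends on how much of the original selector syntax is actually used; if quoting or multiple comma-separated conditions are needed, the pattern would have to be extended with care.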
This regular expression is just too complex for my taste. I love regular expressions, but it looks like this one matches this famous quotation:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
