Can this numeric range regex be refactored? - javascript

I need to match a number range:
-9223372036854775808 to 9223372036854775807
^(?:922337203685477580[0-7]|9223372036854775[0-7]\d{2}|922337203685477[0-4]\d{3}|92233720368547[0-6]\d{4}|9223372036854[0-6]\d{5}|922337203685[0-3]\d{6}|92233720368[0-4]\d{7}|9223372036[0-7]\d{8}|922337203[0-5]\d{9}|92233720[0-2]\d{10}|922337[0-1]\d{12}|92233[0-6]\d{13}|9223[0-2]\d{14}|922[0-2]\d{15}|92[0-1]\d{16}|9[01]\d{17}|[1-8]\d{18}|\d{0,18}|-(?:922337203685477580[0-8]|9223372036854775[0-7]\d{2}|922337203685477[0-4]\d{3}|92233720368547[0-6]\d{4}|9223372036854[0-6]\d{5}|922337203685[0-3]\d{6}|92233720368[0-4]\d{7}|9223372036[0-7]\d{8}|922337203[0-5]\d{9}|92233720[0-2]\d{10}|922337[0-1]\d{12}|92233[0-6]\d{13}|9223[0-2]\d{14}|922[0-2]\d{15}|92[0-1]\d{16}|9[01]\d{17}|[1-8]\d{18}|\d{0,18}))?$
// space for easier copy and paste
Yes, I know it sounds crazy, but there's a long story behind this. I can't figure out how to do this in JavaScript by just checking a range, because of the size of the number, and this must be accurate.
Here's the thought process in breaking this thing down. I just started with the max number and worked my way down, then worked on the negative by just adding the - in the regex. You'll obviously have to copy and paste this thing somewhere to see it all. Also, could be mistakes. Made my head nearly explode.
9,223,372,036,854,775,807
922337203685477580[0-7]
9223372036854775[0-7][0-9]{2}
922337203685477[0-4][0-9]{3}
92233720368547[0-6][0-9]{4}
9223372036854[0-6][0-9]{5}
922337203685[0-3][0-9]{6}
92233720368[0-4][0-9]{7}
9223372036[0-7][0-9]{8}
922337203[0-5][0-9]{9}
92233720[0-2][0-9]{10}
922337[0-1][0-9]{12}
92233[0-6][0-9]{13}
9223[0-2][0-9]{14}
922[0-2][0-9]{15}
92[0-1][0-9]{16}
9[01][0-9]{17}
[1-8][0-9]{18}
[0-9]{0,18}
There's a single digit of difference between the negative and the positive limit, so you'll see where I had to basically duplicate most of this.
So a few questions:
Did I do this right?
If not, what's a better way?
Can this be done without regular expressions considering the size of the number? I need to validate client-side.
Can it be refactored and still retain strict rules?
Suggestions appreciated :)

Can this be done without regular expressions considering the size of the number?
It can be done in a series of if statements using only string operations (no need to convert to numbers).
all strings that don't match [0-9]{1,19} are out
all candidates that are of length 18 or less are good
for length 19 you can work with string comparison to see if they are numerically less than your upper limit
tweak the above to take care of negative numbers
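The steps above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the function name isInt64String and the two constants are made up for the example, and it assumes the input is a plain decimal string with an optional leading minus sign.

```javascript
// Hypothetical helper: checks a decimal string against the signed 64-bit range
// using only string operations, so no precision is lost to Number.
const MAX_POSITIVE = "9223372036854775807"; // upper limit from the question
const MAX_NEGATIVE = "9223372036854775808"; // magnitude of the lower limit

function isInt64String(input) {
  // Anything that is not an optional minus sign plus 1 to 19 digits is out.
  const match = /^(-?)([0-9]{1,19})$/.exec(input);
  if (match === null) return false;

  const negative = match[1] === "-";
  const digits = match[2];

  // Candidates of 18 digits or fewer always fit.
  if (digits.length < 19) return true;

  // Digit strings of equal length sort lexicographically in numeric order,
  // so a plain string comparison against the limit is enough.
  const limit = negative ? MAX_NEGATIVE : MAX_POSITIVE;
  return digits <= limit;
}

console.log(isInt64String("9223372036854775807"));  // true
console.log(isInt64String("9223372036854775808"));  // false
console.log(isInt64String("-9223372036854775808")); // true
```

Unlike the regex in the question, this sketch rejects the empty string and a lone minus sign; relax the first pattern if those should pass.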

Your regex is correct.
This is a shorter version
^(?:-9223372036854775808|-?(?:\d{0,18}|(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}))$
Regex demo
How to generate that regex without mistake:
Input max number:
9223372036854775807
Output:
9223372036854775807
922337203685477580
92233720368547758
9223372036854775
922337203685477
92233720368547
9223372036854
922337203685
92233720368
9223372036
922337203
92233720
9223372
922337
92233
9223
922
92
9
Replace the last digit of each line:
9->remove the whole line
8->9
7->[8-9]
6->[7-9]
5->[6-9]
4->[5-9]
3->[4-9]
2->[3-9]
1->[2-9]
0->[1-9]
Output:
922337203685477580[8-9]
92233720368547758[1-9]
92233720368547759
922337203685477[6-9]
92233720368547[8-9]
9223372036854[8-9]
922337203685[5-9]
92233720368[6-9]
92233720369
922337203[7-9]
92233720[4-9]
9223372[1-9]
922337[3-9]
92233[8-9]
9223[4-9]
922[4-9]
92[3-9]
9[3-9]
Regex [Output]
922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9]
Add these[output] to regex
(?!output)\d{19}
Will become [output2]
(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}
Matches \d{19} <= 9223372036854775807
Add
^(?:-9223372036854775808|-?(?:\d{0,18}|[output2]))$
^(?:-9223372036854775808|-?(?:\d{0,18}|(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}))$
Will match
-9223372036854775808 or
+/- \d{0,18} or
+/- \d{19} <= 9223372036854775807
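The manual procedure above could also be automated. The following is a rough sketch under stated assumptions: the helper names buildTooLargeAlternatives and buildRangeRegex are invented for illustration, the limits are passed as decimal strings, and the minimum is assumed to be the maximum plus one with a minus sign, as in the question.

```javascript
// Hypothetical helper: builds the alternatives for the negative lookahead
// by walking the prefixes of max from longest to shortest.
function buildTooLargeAlternatives(max) {
  const alternatives = [];
  for (let len = max.length; len >= 1; len--) {
    const prefix = max.slice(0, len - 1);
    const lastDigit = Number(max[len - 1]);
    if (lastDigit === 9) continue;                 // 9 -> remove the whole line
    if (lastDigit === 8) {
      alternatives.push(prefix + "9");             // 8 -> 9
    } else {
      alternatives.push(prefix + "[" + (lastDigit + 1) + "-9]"); // d -> [d+1-9]
    }
  }
  return alternatives.join("|");
}

// Hypothetical helper: wraps the alternatives in the full pattern.
// Assumes min is "-" followed by a value exactly one larger than max.
function buildRangeRegex(max, min) {
  const lookahead = buildTooLargeAlternatives(max);
  return new RegExp(
    "^(?:" + min + "|-?(?:\\d{0," + (max.length - 1) + "}|(?!" +
      lookahead + ")\\d{" + max.length + "}))$"
  );
}

const int64 = buildRangeRegex("9223372036854775807", "-9223372036854775808");
console.log(int64.test("9223372036854775807"));  // true
console.log(int64.test("9223372036854775808"));  // false
console.log(int64.test("-9223372036854775808")); // true
```

Generating the pattern from the limit avoids hand-copying nineteen prefixes, which is where mistakes are most likely to creep in.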
Demo

Related

Grab multiple numbers 1-10 from string

I am parsing a string of multiple numbers between 1 and 10 with the eventual goal of adding them to a set.
There will be multiple concatenated numbers after a text identifier such as {text}12345678910.
I am currently using match(/\d/g) to grab the numbers but it separates 1 and 0 in 10. I then look for 0 in my String Array, see if there's a 1 in the element before it, turn it into a 10 and delete the other entry. Not very elegant.
How can I clean up my matching code? I definitely don't need to use regex for this, but it makes grabbing the numbers fairly easy.
You could just match with this regex:
/10|\d/g
(instead of the one you use currently, not additionally)
A regex tries its alternatives left to right at each position, so 10 is tried before a single digit. Putting \d first, as in /\d|10/g or even /\d|(10)/g, won't work, because \d always matches the 1 before 10 gets a chance.
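A small sketch of the difference, using the sample string from the question. The variable names are illustrative only, and it assumes the text identifier itself contains no digits.

```javascript
const input = "{text}12345678910";

// Alternation order matters: "10" is tried before a single digit.
const numbers = input.match(/10|\d/g).map(Number);
console.log(numbers); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

// With the alternatives swapped, \d wins and 10 is split into 1 and 0.
const split = input.match(/\d|10/g).map(Number);
console.log(split.slice(-2)); // [1, 0]

// The eventual goal from the question: collect the values in a set.
const unique = new Set(numbers);
console.log(unique.size); // 10
```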

Getting the numeric value after the hyphen in a string

How can I extract and get just the numeric value after the hyphen in a string?
Here is the input string:
var x = "-2147467259"
After some processing.... return:
alert(2147467259)
How do I accomplish this?
You could replace away the hyphen:
alert(+x.replace("-", ""));
And yes, the + is important. It converts a string to a number; so you're removing the hyphen by replacing it with nothing, and then essentially casting the result of that operation into a number. This operation will also work if no hyphen is present.
You could also use substr to achieve this:
alert(+x.substr(1));
You could also use parseInt to convert the string to a number (which will end up negative if a hyphen is present), and then find its absolute value:
alert(Math.abs(parseInt(x, 10)));
As Bergi notes, if you can be sure that the first character in the string is always a hyphen, you can simply return its negative, which will by default cast the value into a number and then perform the negative operation on it:
alert(-x);
You could also check whether the number is negative or positive via a ternary operator and then perform the respective operation on it to ensure that it is a positive Number:
x = x >= 0 ? +x : -x;
This may be cheaper in terms of performance than using Math.abs, but the difference will be minuscule either way.
As you can see, there really are a variety of ways to achieve this. I'd recommend reading up on JavaScript string functions and number manipulation in general, as well as examining JavaScript's Math object to get a feel for what tools are available to you when you go to solve a problem.
How about:
Math.abs(parseInt("-2147467259"))
Or
"-2147467259".replace('-','')
or
"-2147467259".replace(/\-/,'')
Option #1 converts the string to a number. Option #2 removes the first - from the string. Option #3 does the same with a regular expression; it isn't necessary in this example, but I wanted to show that replace also accepts a RegEx.
If you need a number as the final value, #1 is your choice; if you need a string, #2 is your choice.
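A short sketch of what each option returns. The variable names and the "1-2-3" sample are made up for illustration; the last two lines show a detail that is easy to miss, namely that a string pattern only replaces the first occurrence.

```javascript
const x = "-2147467259";

const asNumber = Math.abs(parseInt(x, 10)); // 2147467259 (a Number)
const asString = x.replace("-", "");        // "2147467259" (a String)

// A string pattern removes only the first hyphen;
// a regex with the g flag removes every hyphen.
const firstOnly = "1-2-3".replace("-", ""); // "12-3"
const allGone = "1-2-3".replace(/-/g, "");  // "123"

console.log(typeof asNumber, typeof asString); // number string
```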

Performance of regex within jQuery data selector: dependence on certain string length

The setup: I have a div with a bunch of radio buttons, each of which has been associated with a custom attribute and value using $(element).data(attr_name,attr_value);. When an underlying data structure is changed, I iterate over the fields and set the appropriate buttons to checked:true by using the ':data' selector found here: https://stackoverflow.com/a/2895933/1214731
$($('#style-options').find(':radio').filter(':data('+key+'=='+value+')'))
.prop('checked',true).button('refresh');
This works great: it finds the appropriate elements, even with floating-point values.
Performance depends on value:
I noticed that when I clicked on certain buttons, the page took fractionally longer to respond (for most buttons there was no noticeable delay). Digging a little deeper, this seems to be occurring when certain floating point values are being searched for.
Using chrome dev tools, I logged the following:
> key='fill-opacity';
"fill-opacity"
> value=.2*2;
0.4
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 43.352ms undefined
> value=.2*3;
0.6000000000000001
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 10322.866ms undefined
The difference in speed is a factor of >200!
Next, I tried typing the number in manually (e.g. decimal place, six, 14x zeros, one) - same speed. All numbers with the same number of digits were the same speed. I then reduced the number of digits progressively:
# of digits    time (ms)
16             10300
15             5185
14             2665
13             1314
12             673
11             359
10             202
9              116
8              77
7              60
6              50
5              41
4              39
I quickly ruled out the equality check between numeric and string - no dependence on string length there.
The regex execution is strongly dependent on string length
In the linked answer above, the regex that parses the data string is this:
var matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
The string passed in is of the form [name operator value]. The length of name doesn't seem to make much difference; the length of value has a big impact on speed however.
Specific questions:
1) Why does the length of name have minimal effect on performance, while the length of value has a large effect?
2) Doubling the execution time with each additional character in value seems excessive - is this just a characteristic of the particular regex the linked solution uses, or is it a more general feature?
3) How can I improve performance without sacrificing a lot of flexibility? I'd like to still be able to pass arguments as a single string to a jQuery selector so type checking up front seems difficult, though I'm open to suggestions.
Basic test code for regex matching speeds:
matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.1111111111111)}; console.timeEnd('regex')
regex: 538.018ms
//add an extra digit - doubles duration of test
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.11111111111111)}; console.timeEnd('regex')
regex: 1078.742ms
//add a bunch to the length of 'name' - minimal effect
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('xxxxxxxxxxxxxxxxxxxx=='+.11111111111111)}; console.timeEnd('regex')
regex: 1084.367ms
A characteristic of regexp matching is that they are greedy. If you try to match the expression a.*b to the string abcd, it will happen in these steps:
the first "a" will match
the .* will match the second char, then the third, till the end of the string
reaching the end of the string, there is still a "b" to be matched, so the matching will fail
the regexp processing starts to backtrack
the last char will be "unmatched" and it will try to match "b" to "d". Fails again. More backtracking
tries to match "b" to "c". Fail. Backtrack.
match "b" to "b". Success. Matching ends.
Although you matched just a few chars, you iterated over the whole string. If you have more than one greedy operator, you can easily get an input string that will match with exponential complexity.
Understanding backtracking will prevent a lot of errors and performance problems. For example, 'a.*b' will match the whole string 'abbbbbbb', instead of just the first 'ab'.
The easiest way to prevent this kind of error in modern regexp engines is to use the non-greedy versions of the operators * and +. They are usually written as the same operators followed by a question mark: *? and +?.
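A minimal sketch of the greedy versus non-greedy behaviour described above, using the same 'abbbbbbb' string. The variable names are illustrative.

```javascript
const text = "abbbbbbb";

// Greedy: .* grabs as much as it can, then backtracks to find the last "b".
const greedy = text.match(/a.*b/)[0];
console.log(greedy); // "abbbbbbb"

// Non-greedy: .*? takes as little as possible, stopping at the first "b".
const lazy = text.match(/a.*?b/)[0];
console.log(lazy); // "ab"
```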
I confess that I really didn't stop to debug the complicated regexp that you posted, but I believe that the problem is before matching the '=' symbol. The greedy operator is in this subexpression:
(?:(?:\\\.|[^.,])+\.?)+
I'd try to change it to a non-greedy version:
(?:(?:\\\.|[^.,])+\.?)+?
but this is really just a wild guess. I'm using pattern recognition to solve the problem :-) It makes sense because it is backtracking for each character of the "value" till matching the operator. The name is matched linearly.
This regular expression is just too complex for my taste. I love regular expressions, but it looks like this one matches this famous quotation:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Match 0 to 100 or 0% to 100% using regex

I'm trying to match the following, and I'm having a difficult time doing so.
I want to allow 0 to 100, or 0% to 100%. In my textfield, I strip the % out, so if they put in 100, it won't fail at the regex and skip the strip.
Therefore, I'd need a regex to allow 0 to 100 or 0% to 100%. 101 or 101% is invalid.
Presently, i have the following
(?:^((\\%)?100(\\%)?$)|(?:^(\\%)?[0-9]{1,2}(\\%)?)((\\.|,)?[0-9]+)?$)
But that allows 101 but not 101%
Please help! Any help would be greatly appreciated.
You're trying to use regex to do something that regex isn't meant to do. You have the javascript and python tags; you should use regex to ensure that the input is a number, and then use javascript or python to determine whether the number is too big or too small. It will be much easier that way.
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. - Jamie Zawinski, 1997
In all seriousness, you are trying to do something that regular expressions aren't intended to do, which is perform logic. You are asking two things: does this string of characters contain a number, which is fine for a regex, and if it does, is it >= 0 && <= 100, which isn't something a regular expression can or should be doing.
The expression ^(\d{1,2}|100)%?$ will tell you if the input matches the pattern, and let you pull the number out using the group. But the fact that the result falls into the >= 0 && <= 100 range is a side effect. This apparent range checking behavior won't work for any other arbitrary range of numbers. Side effects should be avoided for maintainable code.
Is it the optimal solution? Is the intention obvious from just looking at the expression? I would argue no, not without some comments describing the intent.
JavaScript
I think a better more maintainable solution would be to use the parseInt() function and then explicitly compare the result to >= 0 and <= 100. Explicit is Better than Implicit and is more self documenting.
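One way this could look, as a sketch only. The function name isPercent0to100 is hypothetical, and it assumes whole numbers with an optional trailing percent sign, which is what the question describes.

```javascript
// Hypothetical helper: the regex only checks the format,
// and the range check is written out explicitly.
function isPercent0to100(input) {
  // Format check: digits with an optional trailing percent sign.
  const match = /^(\d+)%?$/.exec(input);
  if (match === null) return false;

  // Explicit range check on the parsed value.
  const n = parseInt(match[1], 10);
  return n >= 0 && n <= 100;
}

console.log(isPercent0to100("100"));  // true
console.log(isPercent0to100("100%")); // true
console.log(isPercent0to100("101%")); // false
console.log(isPercent0to100("10k"));  // false
```

If the allowed range changes later, only the two numbers in the comparison need to change; the format check stays the same.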
Python
You will still have to resort to a regular expression to validate the format and extract only the digits, then convert them using int(). Testing against the valid range afterwards would be redundant, but also more explicit. Using the regular expression might not be such a bad option in this case, as long as you comment its intention.
/^(100|[0-9]{1,2})%?$/
either '100' or any number consisting of one or two digits
possibly followed by a percent sign.
Test it here: http://jsbin.com/azeya3.
Oh yes, and the first capture contains the number.
In JavaScript, use parseInt.
parseInt does the following:
"100" -> 100
"10k" -> 10
"100%" -> 100
after this, check if your number is smaller than 101.
example...
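function parsePerc(value) {
  // parseInt ignores trailing characters such as "%" and returns NaN for non-numeric input
  var n = parseInt(value, 10);
  return n >= 0 && n <= 100;
}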
The question is tagged "python", so I'll answer in Python:
def is0to100(num):
    try:
        return 0 <= int(num.rstrip('%')) <= 100
    except ValueError:
        return False
This returns True if the passed-in value, minus the possible trailing '%', is a valid integer between 0 and 100. I didn't benchmark it, but I'd put money that it's way faster than a regexp.
In the spirit of @eykanal's answer, here's a regex:
(\d{1,3})%? works for me in Python with this.
My test text was:
101
100
100%
101%
0%
0
50%
That will get you just the number, which you can then parse and decide whether or not it's in your acceptable range.
At first, I agreed with eykanal and Jarrod Roberson.
But it's so simple with a regex that I no longer agree; in my opinion the regex is the better solution:
'(\d\d?|100)(?!\d)\Z' and use of match()
I think that putting \d\d? first, to detect numbers < 100, and the number 100 alone second, is the more natural order.
Also, \d\d? is shorter and more quickly understood than \d{1,2}.
EDIT
better :
'(\d\d?|100)\D?\Z'
But these RE are for detection. Since the aim is to verify, this one is enough:
'(\d\d?|100)\Z'
Match 0 to 100
^100$|^[123456789][0-9]$|^[0-9]$
Match 0% to 100%
^100[%]$|^[123456789][0-9][%]$|^[0-9][%]$

Javascript percentage validation

I am after a regular expression that validates a percentage from 0 to 100 and allows two decimal places.
Does anyone know how to do this, or know of a good web site that has examples of common regular expressions used for client-side validation in JavaScript?
@Tom - Thanks for the questions. Ideally there would be no leading 0's or other trailing characters.
Thanks to all those who have replied so far. I have found the comments really interesting.
Rather than using regular expressions for this, I would simply convert the user's entered number to a floating point value, and then check for the range you want (0 to 100). Trying to do numeric range validation with regular expressions is almost always the wrong tool for the job.
var x = parseFloat(str);
if (isNaN(x) || x < 0 || x > 100) {
// value is out of range
}
I propose this one:
(^100(\.0{1,2})?$)|(^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$)
It matches 100, 100.0 and 100.00 using this part
^100(\.0{1,2})?$
and numbers like 0, 15, 99, 3.1, 21.67 using
^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$
Note that leading zeros are prohibited, but trailing zeros are allowed (though no more than two decimal places).
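A quick way to check the proposed pattern against a few inputs. The sample lists below are illustrative and were chosen for this sketch; they are not from the original answer.

```javascript
const percent = /(^100(\.0{1,2})?$)|(^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$)/;

// Inputs the pattern is expected to accept.
const accepted = ["100", "100.0", "100.00", "0", "15", "99", "3.1", "21.67"];
// Inputs the pattern is expected to reject: out of range, leading zeros,
// too many decimals, a dangling dot, a sign, and the empty string.
const rejected = ["101", "100.01", "007", "3.141", "99.", "-1", ""];

console.log(accepted.every((s) => percent.test(s)));  // true
console.log(rejected.every((s) => !percent.test(s))); // true
```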
This reminds me of an old blog entry by Alex Papadimoulis (of The Daily WTF fame) where he tells the following story:
"A client has asked me to build and install a custom shelving system. I'm at the point where I need to nail it, but I'm not sure what to use to pound the nails in. Should I use an old shoe or a glass bottle?"
How would you answer the question?
It depends. If you are looking to pound a small (20lb) nail in something like drywall, you'll find it much easier to use the bottle, especially if the shoe is dirty. However, if you are trying to drive a heavy nail into some wood, go with the shoe: the bottle will shatter in your hand.
There is something fundamentally wrong with the way you are building; you need to use real tools. Yes, it may involve a trip to the toolbox (or even to the hardware store), but doing it the right way is going to save a lot of time, money, and aggravation through the lifecycle of your product. You need to stop building things for money until you understand the basics of construction.
This is one of those questions that most people see as a challenge to come up with the correct regular expression, but it would be much better to just say that a regular expression is the wrong tool for the job.
The problem with using a regex to validate numeric ranges is that it is hard to change when the requirements for the allowed range change. Today the requirement may be to validate numbers between 0 and 100, and it is possible to write a regex for that which doesn't make your eyes bleed. But next week the requirement may change so that values between 0 and 315 are allowed. Good luck altering your regex.
The solution given by Greg Hewgill is probably better - even though it would validate "99fxx" as "99". But given the circumstances that might actually be ok.
Given that your value is in str
str.match(/^(100(\.0{1,2})?|([0-9]?[0-9](\.[0-9]{1,2})?))$/)
^100(\.(0){0,2})?$|^([1-9]?[0-9])(\.(\d{0,2}))?\%$
This would match:
100.00
optional "1-9" followed by a digit (this makes the int part), optionally followed by a dot and two digits
From what I see, Greg Hewgill's example doesn't really work that well because parseFloat('15x') would simply return 15 which would match the 0<x<100 condition. Using parseFloat is clearly wrong because it doesn't validate the percentage value, it tries to force a validation. Some people around here are complaining about leading zeroes and some are ignoring trailing invalid characters. Maybe the author of the question should edit it and make clear what he needs.
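To illustrate the point about trailing characters, here is a sketch that checks the format before converting. The function name isPercentStrict is hypothetical, and the format rule (plain decimal notation, no sign, no exponent, no whitespace) is an assumption made for this example, not something the question specifies.

```javascript
// parseFloat stops at the first character it cannot parse.
console.log(parseFloat("15x")); // 15

// Hypothetical helper: reject malformed input first, then check the range.
function isPercentStrict(str) {
  // Assumption: plain decimal notation only.
  if (!/^\d+(\.\d+)?$/.test(str)) return false;
  const x = parseFloat(str);
  return x >= 0 && x <= 100;
}

console.log(isPercentStrict("15"));    // true
console.log(isPercentStrict("15x"));   // false
console.log(isPercentStrict("99fxx")); // false
console.log(isPercentStrict("100.5")); // false
```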
I recommend this, if you are not exclusively developing for English-speaking users:
[0-9]{1,2}((,|\.)[0-9]{1,10})?%?
You can simply replace the 10 by a 2 to get two decimal places.
My example will match:
15.5
5.4366%
1,43
50,55%
34
45%
Of course the output of this one is harder to cast, but something like this will do (Java code):
private static Double getMyVal(String myVal) {
    if (myVal.contains("%")) {
        myVal = myVal.replace("%", "");
    }
    if (myVal.contains(",")) {
        myVal = myVal.replace(',', '.');
    }
    return Double.valueOf(myVal);
}
None of the above solutions worked for me, as I needed my regex to allow for values with numbers and a decimal while the user is typing ex: '18.'
This solution allows for an empty string so the user can delete their entire input, and accounts for the other rules articulated above.
/(^$)|(^100(\.0{1,2})?$)|(^([1-9]([0-9])?|0)\.$)|(^([1-9]([0-9])?|0)(\.[0-9]{1,2})?$)/
(100|[0-9]{1,2})(\.[0-9]{1,2})?
That should be the regex you want. I suggest you read Mastering Regular Expressions and download RegexBuddy or The Regex Coach.
@mlarsen:
It's not that a regex won't do the job better here.
Remember that validation must be done on both the client and the server side, so something like:
100|(([1-9][0-9])|[0-9])(\.(([0-9][1-9])|[1-9]))?
would be a cross-language check, just beware of checking the input length with the output match length.
(100(\.(0){1,2})?|([1-9]{1}|[0-9]{2})(\.[0-9]{1,2})?)
