Julian Day Regex Without Leading Zeros - javascript

I am really struggling with creating a Reg Ex for Julian Day that does not allow leading zeros.
I am using it for Input Validation using JavaScript Reg Ex. It needs to match 1 through 366.
Example Matches:
1
99
366
159
Example Match Failures:
01
001
099
367
999
0
I tried this on regex101:
^[1-9]|[1-9][0-9]|[1-3][0-5][0-9]|36[0-6]$
But for I am not getting the optional parts down right. So when I put in 266, I get a match on 2 and 66. (This issue is translating to my input validation control.)
I thought about trying to use + for one or more, but I need to not allow leading zeros, so that does not work.
I have read the guidance on asking a RegEx question and tried to follow it, but if I missed something, please let me know and I will update my question.

The main issues are two: 1) the alternatives should have been grouped so that ^ and $ anchors could be applied to all of them, 2) the [1-3][0-5][0-9] part did not match 160 to 199 and 260 to 299, this part should have been split into two separate branches, [12][0-9]{2}|3[0-5][0-9].
You may use
^(?:[1-9]|[1-9][0-9]|[12][0-9]{2}|3[0-5][0-9]|36[0-6])$
See the regex demo.
Details
^ - start of string
(?: - group of alternatives:
[1-9] - 1 to 9
| - or
[1-9][0-9] - 10 to 99
| - or
[12][0-9]{2} - 100 to 299
| - or
3[0-5][0-9] - 300 to 359
| - or
36[0-6] - 360 to 366
) - end of the alternation group
$ - and the end of the string.

Related

Regex for custom decimal and thousand separator

I am using the below regex the handle the custom thousand separator which could be any of the , or . or space character which works for the thousand separator and not for the decimal indicator.
I am trying to add a new capturing group to handle decimal indicator (, or .) with maximum 2 decimals but the regex breaks for thousand separator with it.
^[+]?(?:\d{1,3}(?:(,|.| )\d{3})*|\d+)?,?$
How to add a capturing group to handle decimal with custom character? Any Ideas?
Valid Inputs:
1234
123.45
123,45
1234.56
1234,56
123
1,234
12,345
1,234,567
12,345,678
123,456,789
12
1.234
12.345
1.234.567
12.345.678
123.456.789
123
1 234
12 345
123 456
1 234 567
12 345 678
123 456 789
123.4567
123,4567
1,345.67
1.345,67
1 345.67
12,345.67
12.345,67
12 345.67
123,456,789.34
123.456.789,34
123 456 789.34
Not Valid:
12.345.67
12,345,67
12 345 67
123 456 789 34
Well, your specification is ambiguous, as accepting the decimal indicator as ',' you are allowing to parse 123,456 as the number 123456 or as the number 123.456 (one thousandth of it)? If you fix the ambiguity disallowing only a number of three decimals, you solve the ambiguity, but at a high cost, you need the user to understand that if he makes the mistake of using three decimals, he/she will obtain weird results under strange conditions (123,456 will be parsed as 123456.0 while 123,4560will do as 123.456) This is weird for a user to accept. It's more interesting to use the condition that a single , or . means a decimal point, while if you have both indicators, the first will be a group separator, while the second will be a decimal point.
IMHO I should never use the space as a decimal indicator (if using it as a group separator, just use it as the only digit group separator ---some programming languages e.g. Java, allow for _ to be used as a digit group separator), just nobody uses it. It's preferable to use no decimal indicator at all (making the number an integer, scaled 10, 100, or 1000 times, this has been used for long in desktop calculators) as quick data input people prefer to key the extra zeros, than to move the finger to locate de decimal point and then type two more digits for the most of the times. Don't say then if he has to go to the letters keyboard to find the space bar. (well, of course it is more difficult to go there to find the underscore _ char, but quick typers don't use group separators)
In other side, people normally don't key the thousands separators, but just for readability (the computers do it in printing, but never on reading). In this scenario, sometimes they want not the rigid situation of having groups of three digits, but to use them arbitrarily. This leads to some situations where the user wants to separate digits in groups of three left of the decimal point, while using groups of five or ten one the right (which is something you don't contemplate at all) making, e.g. PI to appear as:
3.14159 26535 89793 23846 264338 3
I agree that using the alternate decimal point as group separator could be interesting, but at both sides of the actual decimal point, and never forcing groups of three.
Anyway, just to fit on your specs, I've written the following lex(1) specification to parse your input.
pfx [1-9][0-9]?[0-9]?
grp [0-9][0-9][0-9]
dec [0-9]*
e1 [+-]?{pfx}([.]{grp})*([,]{dec})?
e2 [+-]?{pfx}([,]{grp})*([.]{dec})?
e3 [+-]?{pfx}([ ]{grp})*([.,]{dec})?
e4 [+-]?[1-9][0-9]*([,.]{dec})?
e5 [+-]?0?([,.]{dec})?
%%
{e1}|{e2}|{e3}|{e4}|{e5} printf("\033[32m[%s]\033[m\n", yytext);
[0-9., +-]* printf("\033[31m[%s]\033[m\n", yytext);
. |
\n |
\t ;
%%
int main()
{
yylex();
}
int yywrap()
{
return 1;
}
Your regular expression, complete, should be something like:
[+-]?[0-9]{1,3}([ ][0-9]{3})*([,.]([0-9]{3}[ ])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([ ][0-9]{3})*([,.][0-9]{0,2})?|[+-]?[0-9]{0,2}[,.]([0-9]{3}[ ])*[0-9]{1,3}|[+-]?[0-9]{1,3}([,][0-9]{3})*([.]([0-9]{3}[,])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([,][0-9]{3})*([.][0-9]{0,2})?|[+-]?[0-9]{0,2}[.]([0-9]{3}[,])*[0-9]{1,3}|[+-]?[0-9]{1,3}([.][0-9]{3})*([,]([0-9]{3}[.])*[0-9]{1,3})?|[+-]?[0-9]{1,3}([.][0-9]{3})*([,][0-9]{0,2})?|[+-]?[0-9]{0,2}[,]([0-9]{3}[.])*[0-9]{1,3}|[+-]?[0-9]*[,.][0-9]+|[+-]?[0-9]+[,.][0-9]*|[+-]?[0-9]+
Note
Some regexp libraries, don't implement correctly the | operator, making it not actually conmutative as it should be (the worst case I know is regex101.com, see below), and forcing you to put the operands in some particular order to match some strings (this is a bug in the library, but unfortunately, this is spread) Below is the above (which works fine with sed(1)) and you'll see how it doesn't match correctly in reg101 (There should be far less matches).
I've written also a bash script (shown below) to use sed(1) with the above regexp, so you can see how it works at your site:
dig="[0-9]"
af0="${dig}{0,2}"
af1="${dig}{1,3}"
grp="${dig}{3}"
t01="[+-]?${af1}([ ]${grp})*([,.](${grp}[ ])*${af1})?"
t02="[+-]?${af1}([ ]${grp})*([,.]${af0})?"
t03="[+-]?${af0}[,.](${grp}[ ])*${af1}"
t04="[+-]?${af1}([,]${grp})*([.](${grp}[,])*${af1})?"
t05="[+-]?${af1}([,]${grp})*([.]${af0})?"
t06="[+-]?${af0}[.](${grp}[,])*${af1}"
t07="[+-]?${af1}([.]${grp})*([,](${grp}[.])*${af1})?"
t08="[+-]?${af1}([.]${grp})*([,]${af0})?"
t09="[+-]?${af0}[,](${grp}[.])*${af1}"
t10="[+-]?${dig}*[,.]${dig}+"
t11="[+-]?${dig}+[,.]${dig}*"
t12="[+-]?${dig}+"
s01="${t01}|${t02}|${t03}"
s02="${t04}|${t05}|${t06}"
s03="${t07}|${t08}|${t09}"
s04="${t10}|${t11}|${t12}"
reg="${s01}|${s02}|${s03}|${s04}"
echo "$reg"
sed -E -e "s/${reg}/<&>/g"
You can find all this code (and updates) here.
The following regex will match all the cases from your example:
^[+]?(?:\d{1,3}(?:([,. ])\d{3})*|\d+)?(?:[,.]\d+?){0,1}$
The last part (?:[,.]?\d+?){0,1}, makes the matching of the decimal part optional.
There you go:
^[+]?(?:\d{1,3}(?:(,|.| )\d{3})*|\d+)?((?<!,\d{3})(,\d+)|(?<!\.\d{3})(\.\d+))?$
Regex 101 demo
Assuming
123.4567
123,4567
123 4567
are not valid, you can use:
^[+-]?(?:(?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d\d)?|(?:\d{1,3}(?:\.\d{3})*|\d+)(?:,\d\d)?|(?:\d{1,3}(?: \d{3})*|\d+)(?:[,.]\d\d)?)$
Demo & explanation

Allow integers from 0 to 59 after ':' in a string [duplicate]

I am trying to validate minutes:seconds input where minutes can be 07 or 7.
I can get 07:35 validated using below but not 7:35. When I process the input I can append a zero if values is less than 9 but want to be able to let users type 7:35 as well.
^([0-5]\d:[0-5]\d$)
You may make the first munite digit optional:
^[0-5]?\d:[0-5]\d$
^
See the regex demo
Details
^ - start of string
[0-5]? - an optional (1 or 0 repetitions) of a 0 to 5 digit
\d - any 1 digit
: - a : char
[0-5] - a digit from 0 to 5
\d - any 1 digit
$ - end of string.

Regex to select all characters that do not match a pattern

I'm weak with regexes but have put together the following regex which selects when my pattern is met, the problem is that i need to select any characters that do not fit the pattern.
/^\d{1,2}[ ]\d{1,2}[ ]\d{1,2}[ ][AB]/i
Correct pattern is:
## ## ## A|B aka [0 < x <= 90]*space*[0 < x <= 90] [0 < x <= 90] [A|B]
EG:
12 34 56 A → good
12 34 56 B → good
12 34 5.6 A → bad - select .
12 34 5.6 C → bad - select . and C
1A 23 45 6 → bad - select A and 6
Edit:
As my impression was that regex is used to perform validation of both characters and pattern/sequence at the same time. The simple question is how to select characters that do not fit the category of non negative numbers, spaces and distinct characters.
Answer 1
Brief
This isn't really realizable with 1 regex due to the nature of the regex. This answer provides a regex that will capture the last incorrect entry. For multiple incorrect entries, a loop must be used. You can correct the incorrect entries by running some code logic on the resulting captured groups to determine why it isn't valid.
My ultimate suggestion would be to split the string by a known delimiter (in this case the space character and then using some logic (or even a small regex) to determine why it's incorrect and how to fix it, as seen in Answer 2.
Non-matches
The following logic is applied in my second answer.
For any users wondering what I did to catch incorrect matches: At the most basic level, all this regex is doing is adding |(.*) to every subsection of the regex. Some sections required additional changes for catching specific invalid string formats, but the |(.*) or slight modifications of this will likely solve anyone else's issues.
Other modifications include:
Using opposite tokens
For example: Matching a digit
Original regex: \d
Opposite regex \D
For example: Matching a digit or whitepace
Original regex: [\d\s]
Opposite regex: [^\d\s]
Note [\D\S] is incorrect as it matches both sets of characters, thus, any non-whitespace or non-digit character (since non-whitespace includes digits and non-digits include whitespace, both will be matched)
Negative lookaheads
For example: Catching up to 31 days in a month
Original regex \b(?:[0-2]?\d|3[01])\b
Opposite regex: \b(?![0-2]?\d\b|3[01]\b)\d+\b
Code
First, creating a more correct regex that also ensures 0 < x <= 90 as per the OP's question.
^(?:(?:[0-8]?\d|90) ){3}[AB]$
See regex in use here
^(?:(?:(?:[0-8]?\d|90) |(\S*) ?)){3}(?:[AB]|(.*))$
Note: This regex uses the mi flags (multiline - assuming input is in that format, and case-insensitive)
Other Formats
Realistically, this following regex would be ideal. Unfortunately, JavaScript doesn't support some of the tokens used in the regex, but I feel it may be useful to the OP or other users that see this question.
See regex in use here
^(?:(?:(?:[0-8]?\d|90) |(?<n>\S*?) |(?<n>\S*?) ?)){3}(?:(?<n>\S*) )?(?:[AB]|(.*))$
Results
Input
The first section (sections separated by the extra newline/break) shows valid strings, while the second shows invalid strings.
0 45 90 A
0 45 90 B
-1 45 90 A
0 45 91 A
12 34 5.6 A
12 34 56 C
1A 23 45 6
11 1A 12 12 A
12 12 A
12 12 A
Output
0 45 90 A VALID
0 45 90 B VALID
-1 45 90 A INVALID: -1
0 45 91 A INVALID: 91
12 34 5.6 A INVALID: 5.6
12 34 56 C INVALID: C
1A 23 45 6 INVALID: 1A, 6
11 1A 12 12 A INVALID: 12 A
12 12 A INVALID: (missing value)
12 12 A INVALID: A, (missing value)
Note: The last entry shows an odd output, but that's due to a limitation with JavaScript's regex engine. The Other Formats section describes this and another method to use to properly catch these cases (using a different regex engine)
Explanation
This uses a simple | (OR) and captures the incorrect matches into a capture group.
^ Assert position at the start of the line
(?:(?:(?:[0-8]?\d|90) |(\S*) ?)){3} Match the following exactly 3 times
(?:(?:[0-8]?\d|90) |(.+)) Match either of the following
(?:[0-8]?\d|90) Match either of the following, followed by a space character literally
[0-8]?\d Match between zero and one of the characters in the set 0-8 (a digit between 0 and 8), followed by any digit
90 Match 90 literally
(\S*) ? Capture any non-whitespace character one or more times into capture group 1, followed by zero or one space character literally
(?:[AB]|(.*)) Match either of the following
[AB] Match any character present in the set (A or B)
(.*) Capture any character any number of times into capture group 2
$ Assert position at the end of the line
Answer 2
Brief
This method splits the string on the given delimiter and tests each section for the proper set of characters. It outputs a message if the value is incorrect. You would likely replace the console outputs with whatever logic you want use.
Code
var arr = [
"0 45 90 A",
"0 45 90 B",
"-1 45 90 A",
"0 45 91 A",
"12 34 5.6 A",
"12 34 56 C",
"1A 23 45 6",
"11 1A 12 12 A",
"12 12 A",
"12 12 A"
];
arr.forEach(function(e) {
var s = e.split(" ");
var l = s.pop();
var numElements = 3;
var maxNum = 90;
var syntaxErrors = [];
if(s.length != numElements) {
syntaxErrors.push(`Invalid number of elements: Number = ${numElements}, Given = ${s.length}`);
}
s.forEach(function(v) {
if(v.match(/\D/)) {
syntaxErrors.push(`Invalid value "${v}" exists`);
} else if(!v.length) {
syntaxErrors.push(`An empty value or double space exists`);
} else if(Number(v) > maxNum) {
syntaxErrors.push(`Value greater than ${maxNum} exists: ${v}`);
}
});
if(l.match(/[^AB]/)) {
syntaxErrors.push(`Last element ${l} in "${e}" is invalid`);
}
if(syntaxErrors.length) {
console.log(`"${e}" [\n\t${syntaxErrors.join('\n\t')}\n]`);
} else {
console.log(`No errors found in "${e}"`);
}
});

Regex match number between 1 and 31 with or without leading 0

I want to setup some validation on an <input> to prevent the user from entering wrong characters. For this I am using ng-pattern. It currently disables the user from entering wrong characters, but I also noticed this is not the expected behavior so I am also planning on creating a directive.
I am using
AngularJS: 1.6.1
What should the regex match
Below are the requirements for the regex string:
Number 0x to xx (example 01 to 93)
Number x to xx (example 9 to 60)
Characters are not allowed
Special characters are not allowed
Notice:
the 'x' is variable and could be any number between 0 and 100.
The number on the place of 'x' is variable so if it is possible to create a string that is easily changeable that would be appreciated!
What I tried
A few regex strings I tried where:
1) ^0*([0-9]\d{1,2})$
--> Does match 01 but not 1
--> Does match 32 where it shouldn't
2) ^[1-9][0-9]?$|^31$
--> Does match 1 but not 01
--> Does match 32 where it shouldn't
For testing I am using https://regex101.com/tests.
What am I missing in my attempts?
If your aim is to match 0 to 100, here's a way, based on the previous solution.
\b(0?[1-9]|[1-9][0-9]|100)\b
Basically, there's 3 parts to that match...
0?[1-9] Addresses numbers 1 to 9, by mentionning that 0 migh be present
[1-9][0-9] covers number 10 to 99, the [1-9] representing the tens
100 covers for 100
Here's an example of it
Where you to require to set the higher boundary to 42, the middle part of the expression would become [1-3][0-9] (covering 10 to 39) and the last part would become 4[0-2] (covering 40 to 42) like so:
\b(0?[1-9]|[1-3][0-9]|4[0-2])\b
This should work:
^(0?[1-9]|[12][0-9]|3[01])$
https://regex101.com/r/BYSDwz/1

Regular Expression Phone Number Validation

I wrote this regular expression for the Lebanese phone number basically it should start with
00961 or +961 which is the international code then the area code which
could be either any digit from 0 to 9 or cellular code "70" or "76" or
"79" then a 6 digit number exactly
I have coded the following reg ex without the 6 digit part :
^(([0][0]|[+])([9][6][1])([0-9]{1}|[7][0]|[7][1]|[7][6]|[7][8]))$
when i want to add code to ensure only 6 digits more are allowed to the expression:
^(([0][0]|[+])([9][6][1])([0-9]{1}|[7][0]|[7][1]|[7][6]|[7][8])([0-9]{6}))$
It Seems to accept 5 or 6 digits not 6 digits exactly
i am having difficulty finding whats wrong
use this regex ((00)|(\+))961((\d)|(7[0168]))\d{6}
Ths is what I would use.
/^(00|\+)961(\d|7[069])\d{6}$/
00 or +
961
a 1-digit number or 70 or 76 or 79
a 6-digit number
The [0-9]{1} will match also the cellular codes 7x since 7 is between 0 and 9. This means that a "5 digit cellular number" will match on a 7 and six more digits.
Try
/^(00961|\+961)([0-9]|70|76|79)\d{6}$/.test( phonenumber );
//^ start of string
// ^^^^^^^^^^^^^ 00961 or +0961
// ^^^^^^^^^^^^^^^^ a digit 0 to 9 or 70 or 76 or 79
// ^^^^^ 6 digits
// ^ end of string
The cellar code is forming a trap, as #ellak points out:
/^((00)|(\+))961((\d)|(7[0168]))\d{6}$/.test("009617612345"); // true
Here the code should breaks like this: 00 961 76 12345,
but the RegEx practically breaks it like this: 00 961 7 612345, because 7 is matched in \d, and the rest is combined, exactly in 6 digits, and matched.
I'm not sure if this is actually valid, but I guess this is not what you want, otherwise the RegEx in your question should work.
Here's a kinda long RegEx that avoids the trap:
/^(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})$/
A few test result:
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("009617012345")
false
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("009618012345")
true
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("009617612345")
false
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("0096176123456")
true
Just recently, the Lebanese Ministry of Telecommunication has changed area codes on the IMS. So the current Regex matcher becomes:
^(00|\+)961[ -]?(2[1245789]|7[0168]|8[16]|\d)[ -]?\d{6}$
Prefix: 00 OR +
Country code: 961
Area code: 1-digit or 2-digits; including 2*, 7*, 8*..., OR a single digit for Ogero numbers on the old IMS network starting with 0*, and finally older mobile lines starting with 03.
The 6-digit number
News on the961.com

Categories

Resources