Regex to select all characters that do not match a pattern

Regex to select all characters that do not match a pattern - javascript

I'm weak with regexes but have put together the following regex which selects when my pattern is met, the problem is that i need to select any characters that do not fit the pattern.
/^\d{1,2}[ ]\d{1,2}[ ]\d{1,2}[ ][AB]/i
Correct pattern is:
## ## ## A|B aka [0 < x <= 90]*space*[0 < x <= 90] [0 < x <= 90] [A|B]
EG:
12 34 56 A → good
12 34 56 B → good
12 34 5.6 A → bad - select .
12 34 5.6 C → bad - select . and C
1A 23 45 6 → bad - select A and 6
Edit:
As my impression was that regex is used to perform validation of both characters and pattern/sequence at the same time. The simple question is how to select characters that do not fit the category of non negative numbers, spaces and distinct characters.

Answer 1
Brief
This isn't really realizable with 1 regex due to the nature of the regex. This answer provides a regex that will capture the last incorrect entry. For multiple incorrect entries, a loop must be used. You can correct the incorrect entries by running some code logic on the resulting captured groups to determine why it isn't valid.
My ultimate suggestion would be to split the string by a known delimiter (in this case the space character and then using some logic (or even a small regex) to determine why it's incorrect and how to fix it, as seen in Answer 2.
Non-matches
The following logic is applied in my second answer.
For any users wondering what I did to catch incorrect matches: At the most basic level, all this regex is doing is adding |(.*) to every subsection of the regex. Some sections required additional changes for catching specific invalid string formats, but the |(.*) or slight modifications of this will likely solve anyone else's issues.
Other modifications include:
Using opposite tokens
For example: Matching a digit
Original regex: \d
Opposite regex \D
For example: Matching a digit or whitepace
Original regex: [\d\s]
Opposite regex: [^\d\s]
Note [\D\S] is incorrect as it matches both sets of characters, thus, any non-whitespace or non-digit character (since non-whitespace includes digits and non-digits include whitespace, both will be matched)
Negative lookaheads
For example: Catching up to 31 days in a month
Original regex \b(?:[0-2]?\d|3[01])\b
Opposite regex: \b(?![0-2]?\d\b|3[01]\b)\d+\b
Code
First, creating a more correct regex that also ensures 0 < x <= 90 as per the OP's question.
^(?:(?:[0-8]?\d|90) ){3}[AB]$
See regex in use here
^(?:(?:(?:[0-8]?\d|90) |(\S*) ?)){3}(?:[AB]|(.*))$
Note: This regex uses the mi flags (multiline - assuming input is in that format, and case-insensitive)
Other Formats
Realistically, this following regex would be ideal. Unfortunately, JavaScript doesn't support some of the tokens used in the regex, but I feel it may be useful to the OP or other users that see this question.
See regex in use here
^(?:(?:(?:[0-8]?\d|90) |(?<n>\S*?) |(?<n>\S*?) ?)){3}(?:(?<n>\S*) )?(?:[AB]|(.*))$
Results
Input
The first section (sections separated by the extra newline/break) shows valid strings, while the second shows invalid strings.
0 45 90 A
0 45 90 B
-1 45 90 A
0 45 91 A
12 34 5.6 A
12 34 56 C
1A 23 45 6
11 1A 12 12 A
12 12 A
12 12 A
Output
0 45 90 A VALID
0 45 90 B VALID
-1 45 90 A INVALID: -1
0 45 91 A INVALID: 91
12 34 5.6 A INVALID: 5.6
12 34 56 C INVALID: C
1A 23 45 6 INVALID: 1A, 6
11 1A 12 12 A INVALID: 12 A
12 12 A INVALID: (missing value)
12 12 A INVALID: A, (missing value)
Note: The last entry shows an odd output, but that's due to a limitation with JavaScript's regex engine. The Other Formats section describes this and another method to use to properly catch these cases (using a different regex engine)
Explanation
This uses a simple | (OR) and captures the incorrect matches into a capture group.
^ Assert position at the start of the line
(?:(?:(?:[0-8]?\d|90) |(\S*) ?)){3} Match the following exactly 3 times
(?:(?:[0-8]?\d|90) |(.+)) Match either of the following
(?:[0-8]?\d|90) Match either of the following, followed by a space character literally
[0-8]?\d Match between zero and one of the characters in the set 0-8 (a digit between 0 and 8), followed by any digit
90 Match 90 literally
(\S*) ? Capture any non-whitespace character one or more times into capture group 1, followed by zero or one space character literally
(?:[AB]|(.*)) Match either of the following
[AB] Match any character present in the set (A or B)
(.*) Capture any character any number of times into capture group 2
$ Assert position at the end of the line
Answer 2
Brief
This method splits the string on the given delimiter and tests each section for the proper set of characters. It outputs a message if the value is incorrect. You would likely replace the console outputs with whatever logic you want use.
Code
var arr = [
"0 45 90 A",
"0 45 90 B",
"-1 45 90 A",
"0 45 91 A",
"12 34 5.6 A",
"12 34 56 C",
"1A 23 45 6",
"11 1A 12 12 A",
"12 12 A",
"12 12 A"
];
arr.forEach(function(e) {
var s = e.split(" ");
var l = s.pop();
var numElements = 3;
var maxNum = 90;
var syntaxErrors = [];
if(s.length != numElements) {
syntaxErrors.push(`Invalid number of elements: Number = ${numElements}, Given = ${s.length}`);
}
s.forEach(function(v) {
if(v.match(/\D/)) {
syntaxErrors.push(`Invalid value "${v}" exists`);
} else if(!v.length) {
syntaxErrors.push(`An empty value or double space exists`);
} else if(Number(v) > maxNum) {
syntaxErrors.push(`Value greater than ${maxNum} exists: ${v}`);
}
});
if(l.match(/[^AB]/)) {
syntaxErrors.push(`Last element ${l} in "${e}" is invalid`);
}
if(syntaxErrors.length) {
console.log(`"${e}" [\n\t${syntaxErrors.join('\n\t')}\n]`);
} else {
console.log(`No errors found in "${e}"`);
}
});

Related

Regex to match numbers from large document in Javascript

Trying to create a regex that could match numbers from large document.
Find at least 10 continuous digits (which can go to maximum 15 digits) that could be separated by one or multiple
-
_
\s
(
)
[
]
Tried-
/(?:((\d([ \-_\s]+?)){5,8}))/
Eg:
1-2-3-4-5-6-7-8-9-0-12-34
1 2 3 4 5 6 7 8 9 0
123-456-789-0
123---456---789---987
12 34 56 78 90
12_ -34_-56--78__90

You may use
/\d(?:[-_\][()\s]*\d){9,14}/g
See the regex demo
Details
\d - a digit
(?:[-_\][()\s]*\d){9,14} - 9 to 14 repetitions of
[-_\][()\s]* - 0 or more repetitions of -, _, ], [, (, ) or whitespace
\d - a digit.
Note you do not need to escape [ inside a character class, it is parsed as a literal [ in a JS regex. However, ] must be escaped there, otherwise, it will close the character class prematurely.

What is the regex that properly splits SVG 'd' attributes into tokens?

I am trying to split the d attribute on a path tag in an svg file into tokens.
This one is relatively easy:
d = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7"
tokens = d.split(/[\s,]/)
But this is also a valid d attribute:
d = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
The tricky parts are letters and numbers are no longer separated and negative numbers use only the negative sign as the separator. How can I create a regex that handles this?
The rules seem to be:
split wherever there is white space or a comma
split numerics from letters (and keep "-" with the numeric)
I know I can use lookaround, for example:
tokens = pathdef.split(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/)
I'm having trouble forming a single regex that also splits on the minus signs and keeps the minus sign with the numbers.
The above code should tokenize as follows:
[ 'M', '2', '-12', 'C', '5', '15', '21', '19', '27', '-2', 'C', '17', '12', '-3', '40', '5', '7' ]

Brief
Unfortunately, JavaScript doesn't allow lookbehinds, so your options are fairly limited and the regex in the Other Regex Engines section below will not work for you (albeit it will with some other regex engines).
Other Regex Engines
Note: The regex in this section (Other Regex Engines) will not work in Javascript. See the JavaScript solution in the Code section instead.
I think with your original regex you were trying to get to:
[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])
This regex allows you to split on those matches (, or , or locations that are followed by -, or locations where a letter precedes a digit or locations where a digit precedes a letter).
Code
var a = [
"M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7",
"M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
]
var r = /-?(?:\d*\.)?\d+|[a-z]/gi
a.forEach(function(s){
console.log(s.match(r));
});
Explanation
-?\d+(?:\.\d+)?|[a-z] Match either of the following
-?\d+(?:\.\d+)?
-? Match - literally zero or one time
(?:\d*\.)? Match the following zero or one time
\d* Match any number of digits
\. Match a literal dot
\d+ Match one or more digits
[a-z] Match any character in the range from a-z (any lowercase alpha character - since i modifier is used this also matches uppercase variants of those letters)
I added (?:\d*\.)? because (to the best of my knowledge) you can have decimal number values in SVG d attributes.
Note: Changed the original regex portion of \d+(?:\.\d+)? to (?:\d*\.)?\d+ in order to catch numbers that don't have the whole number part such as .5 as per #Thomas (see comments below question).

You could go for
-?\d+|[A-Z]
See a demo on regex101.com.
Here, instead of splitting, you could very well just match them:
matches = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7".match(/-?\d+|[A-Z]/g)
# matches holds the different tokens

Julian Day Regex Without Leading Zeros

I am really struggling with creating a Reg Ex for Julian Day that does not allow leading zeros.
I am using it for Input Validation using JavaScript Reg Ex. It needs to match 1 through 366.
Example Matches:
1
99
366
159
Example Match Failures:
01
001
099
367
999
0
I tried this on regex101:
^[1-9]|[1-9][0-9]|[1-3][0-5][0-9]|36[0-6]$
But for I am not getting the optional parts down right. So when I put in 266, I get a match on 2 and 66. (This issue is translating to my input validation control.)
I thought about trying to use + for one or more, but I need to not allow leading zeros, so that does not work.
I have read the guidance on asking a RegEx question and tried to follow it, but if I missed something, please let me know and I will update my question.

The main issues are two: 1) the alternatives should have been grouped so that ^ and $ anchors could be applied to all of them, 2) the [1-3][0-5][0-9] part did not match 160 to 199 and 260 to 299, this part should have been split into two separate branches, [12][0-9]{2}|3[0-5][0-9].
You may use
^(?:[1-9]|[1-9][0-9]|[12][0-9]{2}|3[0-5][0-9]|36[0-6])$
See the regex demo.
Details
^ - start of string
(?: - group of alternatives:
[1-9] - 1 to 9
| - or
[1-9][0-9] - 10 to 99
| - or
[12][0-9]{2} - 100 to 299
| - or
3[0-5][0-9] - 300 to 359
| - or
36[0-6] - 360 to 366
) - end of the alternation group
$ - and the end of the string.

Regex match number between 1 and 31 with or without leading 0

I want to setup some validation on an <input> to prevent the user from entering wrong characters. For this I am using ng-pattern. It currently disables the user from entering wrong characters, but I also noticed this is not the expected behavior so I am also planning on creating a directive.
I am using
AngularJS: 1.6.1
What should the regex match
Below are the requirements for the regex string:
Number 0x to xx (example 01 to 93)
Number x to xx (example 9 to 60)
Characters are not allowed
Special characters are not allowed
Notice:
the 'x' is variable and could be any number between 0 and 100.
The number on the place of 'x' is variable so if it is possible to create a string that is easily changeable that would be appreciated!
What I tried
A few regex strings I tried where:
1) ^0*([0-9]\d{1,2})$
--> Does match 01 but not 1
--> Does match 32 where it shouldn't
2) ^[1-9][0-9]?$|^31$
--> Does match 1 but not 01
--> Does match 32 where it shouldn't
For testing I am using https://regex101.com/tests.
What am I missing in my attempts?

If your aim is to match 0 to 100, here's a way, based on the previous solution.
\b(0?[1-9]|[1-9][0-9]|100)\b
Basically, there's 3 parts to that match...
0?[1-9] Addresses numbers 1 to 9, by mentionning that 0 migh be present
[1-9][0-9] covers number 10 to 99, the [1-9] representing the tens
100 covers for 100
Here's an example of it
Where you to require to set the higher boundary to 42, the middle part of the expression would become [1-3][0-9] (covering 10 to 39) and the last part would become 4[0-2] (covering 40 to 42) like so:
\b(0?[1-9]|[1-3][0-9]|4[0-2])\b

This should work:
^(0?[1-9]|[12][0-9]|3[01])$
https://regex101.com/r/BYSDwz/1

Regular Expression Phone Number Validation

I wrote this regular expression for the Lebanese phone number basically it should start with
00961 or +961 which is the international code then the area code which
could be either any digit from 0 to 9 or cellular code "70" or "76" or
"79" then a 6 digit number exactly
I have coded the following reg ex without the 6 digit part :
^(([0][0]|[+])([9][6][1])([0-9]{1}|[7][0]|[7][1]|[7][6]|[7][8]))$
when i want to add code to ensure only 6 digits more are allowed to the expression:
^(([0][0]|[+])([9][6][1])([0-9]{1}|[7][0]|[7][1]|[7][6]|[7][8])([0-9]{6}))$
It Seems to accept 5 or 6 digits not 6 digits exactly
i am having difficulty finding whats wrong

use this regex ((00)|(\+))961((\d)|(7[0168]))\d{6}

Ths is what I would use.
/^(00|\+)961(\d|7[069])\d{6}$/
00 or +
961
a 1-digit number or 70 or 76 or 79
a 6-digit number

The [0-9]{1} will match also the cellular codes 7x since 7 is between 0 and 9. This means that a "5 digit cellular number" will match on a 7 and six more digits.

Try
/^(00961|\+961)([0-9]|70|76|79)\d{6}$/.test( phonenumber );
//^ start of string
// ^^^^^^^^^^^^^ 00961 or +0961
// ^^^^^^^^^^^^^^^^ a digit 0 to 9 or 70 or 76 or 79
// ^^^^^ 6 digits
// ^ end of string

The cellar code is forming a trap, as #ellak points out:
/^((00)|(\+))961((\d)|(7[0168]))\d{6}$/.test("009617612345"); // true
Here the code should breaks like this: 00 961 76 12345,
but the RegEx practically breaks it like this: 00 961 7 612345, because 7 is matched in \d, and the rest is combined, exactly in 6 digits, and matched.
I'm not sure if this is actually valid, but I guess this is not what you want, otherwise the RegEx in your question should work.
Here's a kinda long RegEx that avoids the trap:
/^(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})$/
A few test result:
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("009617012345")
false
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("009618012345")
true
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("009617612345")
false
/(00|\+)961([0-68-9]\d{6}|7[234579]\d{5}|7[0168]\d{6})/.test("0096176123456")
true

Just recently, the Lebanese Ministry of Telecommunication has changed area codes on the IMS. So the current Regex matcher becomes:
^(00|\+)961[ -]?(2[1245789]|7[0168]|8[16]|\d)[ -]?\d{6}$
Prefix: 00 OR +
Country code: 961
Area code: 1-digit or 2-digits; including 2*, 7*, 8*..., OR a single digit for Ogero numbers on the old IMS network starting with 0*, and finally older mobile lines starting with 03.
The 6-digit number
News on the961.com

Develop Reference

JavaScript is the programming language of the Web.

Regex to select all characters that do not match a pattern - javascript

Related

Regex to match numbers from large document in Javascript

What is the regex that properly splits SVG 'd' attributes into tokens?

Julian Day Regex Without Leading Zeros

Regex match number between 1 and 31 with or without leading 0

Regular Expression Phone Number Validation

Categories

Resources