Regex to match numbers from large document in Javascript


I'm trying to create a regex that can match numbers in a large document.
It should find at least 10 digits (and at most 15 digits) in sequence, where the digits can be separated by one or more of the following characters: -, _, whitespace (\s), parentheses ( ) and square brackets [ ].
What I tried:
/(?:((\d([ \-_\s]+?)){5,8}))/
Examples:
1-2-3-4-5-6-7-8-9-0-12-34
1 2 3 4 5 6 7 8 9 0
123-456-789-0
123---456---789---987
12 34 56 78 90
12_ -34_-56--78__90

You may use
/\d(?:[-_\][()\s]*\d){9,14}/g
Details
\d - a digit
(?:[-_\][()\s]*\d){9,14} - 9 to 14 repetitions of
[-_\][()\s]* - 0 or more repetitions of -, _, ], [, (, ) or whitespace
\d - a digit.
Note that you do not need to escape [ inside a character class: it is parsed as a literal [ in a JS regex (unless the v flag is used). However, ] must be escaped there; otherwise, it will close the character class prematurely.
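To check the regex against the question's examples, a short script like the following can be used. It is a sketch; the 9-digit string at the end is a hypothetical negative case added for illustration. Each sample is tested on its own because \s also matches line breaks, so over a whole multi-line document the regex can join digits from adjacent lines into a single match.

```javascript
// Regex from the answer: one digit, then 9 to 14 more digits, each optionally
// preceded by separators (-, _, ], [, (, ) or whitespace): 10 to 15 digits total.
const numberRe = /\d(?:[-_\][()\s]*\d){9,14}/g;

// The examples from the question.
const samples = [
  "1-2-3-4-5-6-7-8-9-0-12-34",
  "1 2 3 4 5 6 7 8 9 0",
  "123-456-789-0",
  "123---456---789---987",
  "12 34 56 78 90",
  "12_ -34_-56--78__90",
];

for (const s of samples) {
  const match = s.match(numberRe);            // array of matches, or null
  const digits = match[0].replace(/\D/g, ""); // drop the separators
  console.log(`${s} -> ${digits} (${digits.length} digits)`);
}

// Hypothetical negative case: only 9 digits, so there is no match.
console.log("123-456-789".match(numberRe)); // null
```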

Related

HTML pattern for input that allows only numbers with maximum of 1 dot or (1 or 2 spaces)

I am trying to create a pattern in Angular for an input that allows only numbers with at most 1 dot or (1 or 2 spaces), i.e. it should allow:
Decimal Degrees like 45 or 34.1234 or -91.5
Degrees Minutes like 12 13
Degrees Minutes Seconds like 12 13 5
<input type="text" required pattern="/^[\d](\.| )?[\d]( )?[\d]">
I am looking for the pattern to use in the code above.
Thanks in advance.

As per the documentation for the HTML pattern attribute, you shouldn't use the usual JavaScript/PHP/etc. /^$/ delimiters, as pattern assumes ^(?: at the beginning and )$ at the end.
I.e. pattern="[a-z]" accepts only a single lowercase a-z character, because the whole value has to match, while pattern="[a-z]+" accepts any input consisting only of lowercase a-z characters.
So here's a full pattern regexp that will do what you ask:
pattern="(-)?\d{1,}|(-)?\d{1,}(\.)\d{1,}|(-)?\d{1,}(\s)(-)?\d{1,}|(-)?\d{1,}(\s)(-)?\d{1,}(\s)(-)?\d{1,}"
explanation:
(-)? optional negative sign
\d{1,} matches one or more digits (i.e. 3 or 235, but not 3.7)
| or
\d{1,} matches one or more digits
(\.)\d{1,} followed by a single period followed by a number (i.e. 4.34 or 5.98, but not 3.1.4 or 43..2)
| or
\d{1,} matches one or more digits
(\s)\d{1,} followed by a single space followed by a number (i.e. 3 5 or 654 34, but not 3 1 4, or 43 and 2 separated by two spaces)
| or
\d{1,} matches one or more digits
(\s)\d{1,} followed by a single space followed by a number
(\s)\d{1,} followed by a single space followed by a number (i.e. 5 2 65 or 543 23 1, but not 23 32 45 654, or numbers separated by more than one space)
// (-)? optional negative sign, allowed before each number
// {1,} values can be more than 1 character (required, or it will only accept single digits, i.e. 3 or 5.6 or 7 5 or 8 4 6, but not 43 or 23.46 or 12 76 or 35 65 78)
<form>
<label for="numbers">number or number with decimals or 2-3 numbers separated by 1 space</label>
<input id="numbers" name="numbers" required pattern="(-)?\d{1,}|(-)?\d{1,}(\.)\d{1,}|(-)?\d{1,}(\s)(-)?\d{1,}|(-)?\d{1,}(\s)(-)?\d{1,}(\s)(-)?\d{1,}">
<button>Submit</button>
</form>
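The way the browser applies the pattern can be approximated in JavaScript: the attribute value is wrapped as ^(?:pattern)$, so the whole input has to match. This is a sketch rather than the exact browser algorithm; current browsers compile the pattern with the v flag (older ones used u), and u is used here because this particular pattern behaves the same under both. The rejected values are illustrative, taken from the explanation above.

```javascript
// Value of the pattern attribute from the answer (no /.../ delimiters).
const pattern = String.raw`(-)?\d{1,}|(-)?\d{1,}(\.)\d{1,}|(-)?\d{1,}(\s)(-)?\d{1,}|(-)?\d{1,}(\s)(-)?\d{1,}(\s)(-)?\d{1,}`;

// Approximation of the browser's check: the whole value must match.
const inputRe = new RegExp(`^(?:${pattern})$`, "u");

const shouldPass = ["45", "34.1234", "-91.5", "12 13", "12 13 5"];
const shouldFail = ["3.1.4", "43..2", "43  2", "1 2 3 4", "12."];

for (const v of [...shouldPass, ...shouldFail]) {
  console.log(JSON.stringify(v), inputRe.test(v));
}
```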

Regex for only positive decimals from 0 to 1 up to 2 decimal digits only that works without sign in JavaScript? [duplicate]

This question already has answers here:
Regex decimal values between 0 and 1 up to 4 decimal places
(2 answers)
Closed 2 years ago.
I basically want a regex for the alpha value in RGBA, which is always a number between 0 and 1. However, I want it to allow only up to 2 decimal places, like 0.53, and not more, like 0.536.
Allowed
Anything between 0 to 1 but only upto 2 decimal places
0
0.0
0.00
0.1
0.12
0.34
1
1.0
1.00
Not Allowed
Anything outside 0 to 1, anything between 0 and 1 with more than 2 decimal places, and any sign (+ or -)
0.123
90
3
-1
+1
I noticed other similar questions but they allow signs or they allow more than 2 decimal places.
Currently, I have a regex like /^(0+\.?|0*\.\d+|0*1(\.0*)?)$/ which allows for more than 2 decimal places. How do I solve it?

You may use:
^(?:0(?:\.[0-9]{1,2})?|1(?:\.00?)?)$
RegEx Details:
^: Start
(?:: Start a non-capture group
0: Match 0
(?:\.[0-9]{1,2})?: Match optional dot followed by 1 or 2 digits
|: OR
1: Match 1
(?:\.00?)?: Optionally match a dot followed by 1 or 2 zeros
): End non-capture group
$: End

Alternatively, try:
^(?!1..?[1-9])[01](?:\.\d\d?)?$
^ - Start string anchor.
(?! - Open negative lookahead:
1..? - A literal "1" followed by any character other than newline and an optional one.
[1-9] - Match a digit ranging from 1-9.
) - Close negative lookahead.
[01] - Match a zero or one.
(?: - Open non-capture group:
\.\d\d? - Match a literal dot, a single digit and an optional one.
)? - Close non-capturing group and make it optional.
$ - End string anchor.
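Both answers can be checked against the question's Allowed and Not Allowed lists with a small script. A sketch; the extra out-of-range values 1.5 and 1.01 are added for illustration and are not from the question.

```javascript
// Regexes from the two answers above.
const answer1 = /^(?:0(?:\.[0-9]{1,2})?|1(?:\.00?)?)$/;
const answer2 = /^(?!1..?[1-9])[01](?:\.\d\d?)?$/;

// Values from the question's lists, plus two extra out-of-range cases.
const allowed = ["0", "0.0", "0.00", "0.1", "0.12", "0.34", "1", "1.0", "1.00"];
const notAllowed = ["0.123", "90", "3", "-1", "+1", "1.5", "1.01"];

for (const [name, re] of Object.entries({ answer1, answer2 })) {
  // Collect every value the regex classifies incorrectly.
  const wrong = [
    ...allowed.filter((v) => !re.test(v)),
    ...notAllowed.filter((v) => re.test(v)),
  ];
  console.log(name, wrong.length === 0 ? "all cases OK" : `wrong on: ${wrong}`);
}
```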

What is the regex that properly splits SVG 'd' attributes into tokens?

I am trying to split the d attribute on a path tag in an svg file into tokens.
This one is relatively easy:
d = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7"
tokens = d.split(/[\s,]/)
But this is also a valid d attribute:
d = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
The tricky parts are that letters and numbers are no longer separated, and negative numbers use only the minus sign as the separator. How can I create a regex that handles this?
The rules seem to be:
split wherever there is white space or a comma
split numerics from letters (and keep "-" with the numeric)
I know I can use lookaround, for example:
tokens = pathdef.split(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/)
I'm having trouble forming a single regex that also splits on the minus signs and keeps the minus sign with the numbers.
The above code should tokenize as follows:
[ 'M', '2', '-12', 'C', '5', '15', '21', '19', '27', '-2', 'C', '17', '12', '-3', '40', '5', '7' ]

Brief
Unfortunately, JavaScript didn't support lookbehinds when this answer was written (they were added in ES2018), so the options were fairly limited, and the regex in the Other Regex Engines section below won't work in older JavaScript engines (although it will in regex engines that support lookbehind).
Other Regex Engines
Note: The regex in this section (Other Regex Engines) requires lookbehind support, so it won't work in older JavaScript engines. See the JavaScript solution in the Code section instead.
I think with your original regex you were trying to get to:
[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])
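This regex lets you split on a comma or a space, or on any position not preceded by a comma or space that is followed by -, or where a letter precedes a digit, or where a digit precedes a letter. Since [a-z] is lowercase only, it needs the case-insensitive (i) flag to handle commands such as M and C.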
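In JavaScript engines that support lookbehind (ES2018 and later, which includes current browsers and Node.js), the regex above can be passed straight to split. A minimal sketch; the i flag is added here as an assumption so that [a-z] also matches the uppercase commands M and C.

```javascript
// The lookbehind-based separator from the answer, with the i flag added.
const separator = /[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])/i;

const paths = [
  "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7",
  "M2-12C5,15,21,19,27-2C17,12-3,40,5,7",
];

for (const d of paths) {
  // Separators (comma/space) are consumed; zero-width positions split
  // before "-" and at letter/digit boundaries.
  console.log(d.split(separator));
}
```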
Code
var a = [
"M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7",
"M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
]
var r = /-?(?:\d*\.)?\d+|[a-z]/gi
a.forEach(function(s){
console.log(s.match(r));
});
Explanation
-?(?:\d*\.)?\d+|[a-z] Match either of the following
-?(?:\d*\.)?\d+
-? Match - literally zero or one time
(?:\d*\.)? Match the following zero or one time
\d* Match any number of digits
\. Match a literal dot
\d+ Match one or more digits
[a-z] Match any character in the range from a-z (any lowercase alpha character; since the i modifier is used, this also matches uppercase variants of those letters)
I added (?:\d*\.)? because (to the best of my knowledge) you can have decimal number values in SVG d attributes.
Note: Changed the original regex portion \d+(?:\.\d+)? to (?:\d*\.)?\d+ in order to catch numbers that don't have the whole-number part, such as .5, as per @Thomas (see comments below question).
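The effect of that change is visible on path data with leading-dot and negative decimals. SVG path data lets a second decimal point start a new number, so .5.5 is two numbers. A sketch comparing both versions on a made-up path string:

```javascript
const sample = "M.5.5L-.25,10.75"; // hypothetical path data

// Original number pattern vs. the revised one from the answer.
const before = /-?\d+(?:\.\d+)?|[a-z]/gi;
const after = /-?(?:\d*\.)?\d+|[a-z]/gi;

console.log(JSON.stringify(sample.match(before))); // ["M","5.5","L","25","10.75"]
console.log(JSON.stringify(sample.match(after)));  // ["M",".5",".5","L","-.25","10.75"]
```

The original version reads .5.5 as 5.5 and loses the sign of -.25, while the revised one keeps both tokens intact.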

You could go for
-?\d+|[A-Z]
Here, instead of splitting, you could very well just match them:
const matches = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7".match(/-?\d+|[A-Z]/g);
// matches holds the different tokens
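One thing to watch with [A-Z]: SVG path commands also come in lowercase forms (used for relative coordinates, e.g. m and c), which this pattern skips. Adding the i flag picks them up; a sketch with a made-up path string:

```javascript
const relative = "m2-12c5,15,21,19"; // hypothetical path using lowercase commands

// Uppercase-only class: the lowercase commands are silently dropped.
console.log(JSON.stringify(relative.match(/-?\d+|[A-Z]/g)));
// ["2","-12","5","15","21","19"]

// Case-insensitive: commands are kept as tokens.
console.log(JSON.stringify(relative.match(/-?\d+|[A-Z]/gi)));
// ["m","2","-12","c","5","15","21","19"]
```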

Regex to select all characters that do not match a pattern

I'm weak with regexes but have put together the following regex, which matches when my pattern is met. The problem is that I need to select any characters that do not fit the pattern.
/^\d{1,2}[ ]\d{1,2}[ ]\d{1,2}[ ][AB]/i
Correct pattern is:
## ## ## A|B, i.e. [0 < x <= 90] [0 < x <= 90] [0 < x <= 90] [A|B], separated by single spaces
EG:
12 34 56 A → good
12 34 56 B → good
12 34 5.6 A → bad - select .
12 34 5.6 C → bad - select . and C
1A 23 45 6 → bad - select A and 6
Edit:
My impression was that a regex is used to validate both the characters and the pattern/sequence at the same time. The simple question is: how do I select characters that do not fit the category of non-negative numbers, spaces and the distinct letters (A or B)?

Answer 1
Brief
This isn't fully achievable with one regex: a capture group inside a repeated group holds only one value after the match, not one per repetition, so the regex in this answer captures only the last incorrect entry. For multiple incorrect entries, a loop must be used. You can then run some code logic on the captured groups to determine why an entry isn't valid and how to correct it.
My ultimate suggestion would be to split the string by a known delimiter (in this case the space character) and then use some logic (or even a small regex) to determine why each part is incorrect and how to fix it, as seen in Answer 2.
Non-matches
The following logic is applied in my second answer.
For any users wondering what I did to catch incorrect matches: At the most basic level, all this regex is doing is adding |(.*) to every subsection of the regex. Some sections required additional changes for catching specific invalid string formats, but the |(.*) or slight modifications of this will likely solve anyone else's issues.
Other modifications include:
Using opposite tokens
For example: Matching a digit
Original regex: \d
Opposite regex \D
For example: Matching a digit or whitespace
Original regex: [\d\s]
Opposite regex: [^\d\s]
Note that [\D\S] is incorrect: it matches any character that is either a non-digit or a non-whitespace character, which is every character (non-whitespace includes digits, and non-digits include whitespace).
Negative lookaheads
For example: Catching up to 31 days in a month
Original regex \b(?:[0-2]?\d|3[01])\b
Opposite regex: \b(?![0-2]?\d\b|3[01]\b)\d+\b
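Both kinds of opposite regex above can be tried out quickly. A sketch; the input strings are made up for illustration.

```javascript
// Opposite tokens: [^\d\s] finds characters that are neither digits nor
// whitespace, while [\D\S] matches every character.
const text = "a1 ";
console.log(JSON.stringify(text.match(/[^\d\s]/g))); // ["a"]
console.log(JSON.stringify(text.match(/[\D\S]/g)));  // ["a","1"," "]

// Negative lookahead: valid day numbers vs. the "opposite" (invalid) ones.
const days = "5 31 32 99 7 100";
const validDays = /\b(?:[0-2]?\d|3[01])\b/g;
const invalidDays = /\b(?![0-2]?\d\b|3[01]\b)\d+\b/g;
console.log(JSON.stringify(days.match(validDays)));   // ["5","31","7"]
console.log(JSON.stringify(days.match(invalidDays))); // ["32","99","100"]
```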
Code
First, here is a more correct regex that also checks each number's range, per the OP's question. Note that it accepts 0 to 90 inclusive, so 0 is allowed even though the question states 0 < x.
^(?:(?:[0-8]?\d|90) ){3}[AB]$
The same regex, extended to also capture the invalid parts:
^(?:(?:(?:[0-8]?\d|90) |(\S*) ?)){3}(?:[AB]|(.*))$
Note: This regex uses the mi flags (multiline - assuming input is in that format, and case-insensitive)
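The stricter validation-only regex can be checked against the sample inputs listed under Results below. A sketch: each line is tested on its own, so only the i flag is used here (the m flag matters only when the input is one multi-line string).

```javascript
// Validation-only regex from above: three numbers in 0-90, then A or B.
const valid = /^(?:(?:[0-8]?\d|90) ){3}[AB]$/i;

const inputs = [
  "0 45 90 A",
  "0 45 90 B",
  "-1 45 90 A",
  "0 45 91 A",
  "12 34 5.6 A",
  "12 34 56 C",
  "1A 23 45 6",
  "11 1A 12 12 A",
  "12 12 A",
];

for (const line of inputs) {
  console.log(line, valid.test(line) ? "VALID" : "INVALID");
}
```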
Other Formats
Realistically, the following regex would be ideal. Unfortunately, JavaScript doesn't support some of the tokens used in it, but I feel it may be useful to the OP or other users that see this question.
^(?:(?:(?:[0-8]?\d|90) |(?<n>\S*?) |(?<n>\S*?) ?)){3}(?:(?<n>\S*) )?(?:[AB]|(.*))$
Results
Input
The first section (sections separated by the extra newline/break) shows valid strings, while the second shows invalid strings.
0 45 90 A
0 45 90 B
-1 45 90 A
0 45 91 A
12 34 5.6 A
12 34 56 C
1A 23 45 6
11 1A 12 12 A
12 12 A
12 12 A
Output
0 45 90 A VALID
0 45 90 B VALID
-1 45 90 A INVALID: -1
0 45 91 A INVALID: 91
12 34 5.6 A INVALID: 5.6
12 34 56 C INVALID: C
1A 23 45 6 INVALID: 1A, 6
11 1A 12 12 A INVALID: 12 A
12 12 A INVALID: (missing value)
12 12 A INVALID: A, (missing value)
Note: The last entry shows an odd output, but that's due to a limitation of JavaScript's regex engine. The Other Formats section describes this and another method to properly catch these cases (using a different regex engine).
Explanation
This uses a simple | (OR) and captures the incorrect matches into a capture group.
^ Assert position at the start of the line
(?:(?:(?:[0-8]?\d|90) |(\S*) ?)){3} Match the following exactly 3 times
(?:(?:[0-8]?\d|90) |(\S*) ?) Match either of the following
(?:[0-8]?\d|90) Match either of the following, followed by a space character literally
[0-8]?\d Match between zero and one of the characters in the set 0-8 (a digit between 0 and 8), followed by any digit
90 Match 90 literally
(\S*) ? Capture any non-whitespace character zero or more times into capture group 1, followed by zero or one space character literally
(?:[AB]|(.*)) Match either of the following
[AB] Match any character present in the set (A or B)
(.*) Capture any character any number of times into capture group 2
$ Assert position at the end of the line
Answer 2
Brief
This method splits the string on the given delimiter and tests each section for the proper set of characters. It outputs a message if a value is incorrect. You would likely replace the console outputs with whatever logic you want to use.
Code
var arr = [
"0 45 90 A",
"0 45 90 B",
"-1 45 90 A",
"0 45 91 A",
"12 34 5.6 A",
"12 34 56 C",
"1A 23 45 6",
"11 1A 12 12 A",
"12 12 A",
"12 12 A"
];
arr.forEach(function(e) {
var s = e.split(" ");
var l = s.pop();
var numElements = 3;
var maxNum = 90;
var syntaxErrors = [];
if(s.length != numElements) {
syntaxErrors.push(`Invalid number of elements: Number = ${numElements}, Given = ${s.length}`);
}
s.forEach(function(v) {
if(v.match(/\D/)) {
syntaxErrors.push(`Invalid value "${v}" exists`);
} else if(!v.length) {
syntaxErrors.push(`An empty value or double space exists`);
} else if(Number(v) > maxNum) {
syntaxErrors.push(`Value greater than ${maxNum} exists: ${v}`);
}
});
if(l.match(/[^AB]/)) {
syntaxErrors.push(`Last element ${l} in "${e}" is invalid`);
}
if(syntaxErrors.length) {
console.log(`"${e}" [\n\t${syntaxErrors.join('\n\t')}\n]`);
} else {
console.log(`No errors found in "${e}"`);
}
});

Specific Phone number regex

I'm wondering if there is a way to have a regex for a phone number that takes the following forms using Javascript:
0610101010 => 10 digits that start with 06 or 05.
+212565656566 => (+) followed by 212 then another 9 digits.
Thank You.

This should work:
/^(0(6|5)\d{8}|\+212\d{9})$/

Try something like this:
/0(5|6)[0-9]{8}|\+212[0-9]{9}/
Explained:
/ - The start of the regex
0 - Matches a zero
(5|6) - Matches a five or six
[0-9]{8} - Matches eight characters in the range zero to nine
| - Second expression starts here
\+212 - Matches a plus followed by 212
[0-9]{9} - Matches nine characters in the range zero to nine
/ - End of regex
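The two answers differ in anchoring: without ^ and $, the second regex also accepts strings that merely contain a valid number somewhere inside them. A sketch comparing them; the last two inputs are made-up invalid values.

```javascript
const anchored = /^(0(6|5)\d{8}|\+212\d{9})$/;     // first answer
const unanchored = /0(5|6)[0-9]{8}|\+212[0-9]{9}/; // second answer

const numbers = ["0610101010", "+212565656566", "061010101099", "abc0510101010"];

for (const n of numbers) {
  console.log(n, "anchored:", anchored.test(n), "unanchored:", unanchored.test(n));
}
// The last two inputs are rejected by the anchored regex but accepted by
// the unanchored one, which finds a valid number inside each of them.
```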
