Regex matching multiple numbers in a string - javascript

I would like to extract numbers from a string such as
There are 1,000 people in those 3 towns.
and get an array like ["1,000", "3"].
I got the following number matching Regex from Justin in this question
^[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b$
This works great for checking if it is a number but to make it work on a sentence you need to remove the "^" and "$".
regex101 with start/end defined
regex101 without start/end defined
Without the start and end anchors you get a bunch of zero-length matches. These can easily be discarded, but the regex also now splits any number with a comma in it.
How do I make that regex (or a new regex) work on sentences and still find numbers with commas in them?
A bonus would be not having all the zero-length matches as well.

The expression /-?\d(?:[,\d]*\.\d+|[,\d]*)/g should do it, if you're okay with allowing different groups such as 1,00,000 (which isn't unknown in some locales). I feel like I should be able to simplify that further, but when I try the example "333.33" gets broken up into "333" and "33" as separate numbers. With the above it's kept together.
Live Example:
const str = "There are 10,000 people in those 3 towns. That's 3,333.33 people per town, roughly. Which is about -67.33 from last year.";
const rex = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;
let match;
while ((match = rex.exec(str)) !== null) {
    console.log(match[0]);
}
Breaking /-?\d(?:[,\d]*\.\d+|[,\d]*)/g down:
-? - an optional minus sign (thank you to x15 for flagging that up in his/her answer!)
\d - a digit
(?:...|...) - a non-capturing group containing an alternation between
[,\d]*\.\d+ - zero or more commas and digits followed by a . and one or more digits, e.g. 3,333.33; or
[,\d]* - zero or more commas and digits
The first alternative will match greedily, falling back to the second alternative if there's no decimal point.
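The question asked for an array rather than logged lines. As a minimal sketch (the helper name extractNumbers is an assumption, not part of the answer), the same expression can be passed to String.prototype.match, which with the g flag returns all full matches at once. Because [,\d]* also accepts a trailing comma, as in "3, 4", the sketch trims it off:

```javascript
// Hypothetical helper (not from the answer): collect every match into an array.
function extractNumbers(text) {
  const rex = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;
  // match() with the g flag returns all full matches, or null when there are none.
  const found = text.match(rex) || [];
  // The pattern can swallow a trailing comma ("3, 4" yields "3,"), so trim it.
  return found.map(s => s.replace(/,+$/, ""));
}

console.log(extractNumbers("There are 1,000 people in those 3 towns."));
// → ["1,000", "3"]
```

Whether to trim the trailing comma or tighten the pattern itself is a design choice; trimming keeps the original expression untouched.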

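An alternative approach is to split on whitespace and keep the tokens that parse as a number:
// Separators are stripped only for the parse check; the original token is kept.
// Testing for NaN (instead of truthiness) keeps a plain "0" in the result.
const numberExtractor = str => str.split(/\s+/)
    .filter(v => v && !Number.isNaN(parseFloat(v.replace(/[.,]/g, ''))));
console.log(numberExtractor('There are 1,000 people in those 3 towns. some more numbers -23.012 1,00,000,00'));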

To match integer and decimal numbers where the whole part may contain commas
between digits, but the decimal part may not, use:
/[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/
https://regex101.com/r/yOuBPx/1
The input sample does not reflect all the boundary conditions this regex handles.
Best to experiment to see its full effect.
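A short sketch of how this pattern behaves when run globally over a sentence. The g flag and the sample string are assumptions added for illustration; the pattern as posted has no flags.

```javascript
// Pattern from the answer, with the g flag added (assumption) so that
// match() returns every occurrence instead of only the first one.
const numberRe = /[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/g;

// Illustrative input, not taken from the original post.
const sample = "1,000 people, 3 towns, .5 and -67.33";

// A comma is only consumed when a digit follows it, so "3," stays "3".
const numbers = sample.match(numberRe) || [];
console.log(numbers);
// → ["1,000", "3", ".5", "-67.33"]
```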

Related

How can I write the Javascript Regular Expression pattern to handle these conditions

In My exercise, I'm given a task to write a regular expression pattern that validates username input for storage into database.
Usernames can only use alpha-numeric characters.
The only numbers in the username have to be at the end. There can be zero or more of them at the end. Username cannot start with the number.
Username letters can be lowercase and uppercase.
Usernames have to be at least two characters long. A two-character username can only use alphabet letters as characters.
I passed all the tests except one:
A1 is not supposed to match the pattern.
let userCheck = /^[A-Za-z]+\w\d*$/;
let result = userCheck.test(username);
After the first letter ^[a-z], you can use an alternation that requires either [a-z]+\d* (one or more letters followed by any number of digits) or \d{2,} (two or more digits), up to the end of the string $.
let userCheck = /^[a-z](?:[a-z]+\d*|\d{2,})$/i;
See this demo at regex101 - Used with the i-flag (ignore case) to shorten [A-Za-z] to [a-z].
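A quick way to check the alternation against the rules is to run it over a handful of names. The sample list below is an assumption chosen for illustration, not the exercise's official test set.

```javascript
// Pattern from the answer above; the i flag makes [a-z] case-insensitive.
const userCheck = /^[a-z](?:[a-z]+\d*|\d{2,})$/i;

// Illustrative usernames (assumed): the first four should pass, the rest fail.
const samples = ["JACK", "Jo", "Z97", "Oceans11", "A1", "J", "007", "c57bT3"];

for (const name of samples) {
  console.log(name, userCheck.test(name));
}
```

The pattern has no g flag, so repeated test() calls do not carry state from one name to the next.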
PS: Just updated my answer at some late cup of coffee ☕🌙. Had previously misread the question and removed my answer in meanwhile. I would also have missed the part with e.g. Z97 which I just read at the other answers comments. It's much more of a challenge than at first glance... obviously :)
Edit:
My first answer did not fully solve the task. This regex handles the failing A1 case:
^([A-Za-z]{2}|[A-Za-z]\w{2,})$
It matches either two letters, or one letter followed by at least two word characters. Note that \w is [A-Za-z0-9_], so it also admits underscores and does not force digits to the end (A1b matches). See the demo here: https://regex101.com/r/sh6UpX/1
First answer (incorrect)
This works for your description:
let userCheck = /^[A-Za-z]{2,}\d*$/;
let result = userCheck.test(username);
Let me explain what went wrong in your regex:
/^[A-Za-z]+\w\d*$/
You correctly require the first character to be a letter. The '+', however, only ensures that it is matched at least once. If you want to match something an exact number of times, you can append '{x}'. If you want to match between a minimum and a maximum number of times, you can append '{min,max}' (without a space after the comma). In your case there is only a lower limit (2 times), so the max stays empty and means unlimited: {2,}
After your [2 - unlimited] letters, you want to have [0 - unlimited] digits. '\w' also matches letters, so we can just remove it. The end of your regex was correct, as '\d' matches any digit, and '*' quantifies the previous match for the range [0 - unlimited].
I recommend using regex101.com for testing and developing regex patterns. You can test your strings and get very good documentation and explanation about all the tags. I added my regex with some example strings to this link:
https://regex101.com/r/qPmwhG/1
The strings that match will be highlighted, the others stay without highlighting.
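To make the difference between the quantifiers concrete, here is a small sketch comparing the question's pattern with the {2,} version. The sample inputs are assumptions chosen to show each case, including Z97, which the other answers treat as valid and which is why this first answer is marked incorrect.

```javascript
const original = /^[A-Za-z]+\w\d*$/;   // pattern from the question
const firstTry = /^[A-Za-z]{2,}\d*$/;  // pattern from this first answer

// '+' needs only one letter, and \w then accepts the digit, so A1 slips through.
console.log(original.test("A1"));   // true
// {2,} demands at least two letters before any digits.
console.log(firstTry.test("A1"));   // false
console.log(firstTry.test("Jo12")); // true
// One letter followed by two digits is rejected by the {2,} version.
console.log(firstTry.test("Z97"));  // false
```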

Match floats but not dates separated by dot

I'm trying to build a regex which can find floats (using a dot or comma as decimal separator) in a string. I ended up with the following:
/([0-9]+[.,]{1}[0-9]+)/g
This seems to work fine, except that it matches dates separated by . as well:
02.01.2000 // Matches 02.01
I've tried ([0-9]+[.,]{1}[0-9]+)(?![.,]) but that does not work as I expected :)
How would I omit the date case, but still pass the following scenarios:
I tried some stuff 12.23
D12.34
12.34USD
2.3%
12,2
\n12.1
You can use this regex using alternation:
(?:\d+\.){2}\d+|(\d+[.,]\d+)
and extract your matches from captured group #1.
This regex basically matches and discards date strings on LHS of alternation and then matches and captures floating numbers on RHS of alternation.
RegEx Demo
Code:
const regex = /(?:\d+\.){2}\d+|(\d+[.,]\d+)/gm;
const str = `I tried some stuff 12.23
D12.34
12.34USD 02.01.2000
2.3%
12,2 02.01.2000
\\n12.1`;
let m;
let results = [];
while ((m = regex.exec(str)) !== null) {
    // Only the float alternative fills group 1; date matches leave it undefined.
    if (m[1]) {
        results.push(m[1]);
    }
}
console.log(results);
You want to make sure that it isn't surrounded with more ",." or numbers. Is that right?
/[^,.0-9]([0-9]+[.,]{1}[0-9]+)[^,.0-9]/g
Given the following:
hi 1.3$ is another 1.2.3
This is a date 02.01.2000
But this 5.30USD is a number.
But a number at the end of a sentance 1.7.
Or a number comma number in a list 4,3,4,5.
It will match "1.3" and "5.30"
Using @anubhava's regex (?:\d+\.){2}\d+|(\d+[.,]\d+) on the same input, you get the following result:
It will match "1.3", "5.30", "1.7", "4,3", and "4,5"
The moral of the story is that you need to think through all the possible scenarios and decide how you want to treat each one. The last line should be a list of 4 numbers. But is "4,3" by itself two separate numbers, or one number from a country that uses a comma as the decimal separator?
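One limitation of the character-class version above is that it needs an actual character on both sides of the number, so a float at the very start or end of the input is missed. A hedged sketch of the same idea using lookarounds instead, assuming an engine with lookbehind support (ES2018 or later); the variable names and the sample text are illustrative:

```javascript
// Same intent as /[^,.0-9](...)[^,.0-9]/, but the neighbours are only
// inspected, not consumed, so the start and end of the string also qualify.
// Assumes lookbehind support (ES2018+).
const floatRe = /(?<![.,\d])\d+[.,]\d+(?![.,\d])/g;

const text = [
  "hi 1.3$ is another 1.2.3",
  "This is a date 02.01.2000",
  "But this 5.30USD is a number.",
  "12,2",
].join("\n");

console.log(text.match(floatRe));
// → ["1.3", "5.30", "12,2"]
```

Like the original, this still skips a number followed by a sentence-ending period such as "1.7.", so the trade-offs discussed above apply unchanged.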

Regex exact match on number, not digit

I have a scenario where I need to find and replace a number in a large string using javascript. Let's say I have the number 2 and I want to replace it with 3 - it sounds pretty straightforward until I get occurrences like 22, 32, etc.
The string may look like this:
"note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2."
I want to turn it into this:
"note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3."
Obviously this means .replace('2','3') is out of the picture so I went to regex. I find it easy to get an exact match when I am dealing with string start to end ie: /^2$/g. But that is not what I have. I tried grouping, digit only, wildcards, etc and I can't get this to match correctly.
Any help on how to exactly match a number (where 0 <= number <= 500 is possible, but no constraints needed in regex for range) would be greatly appreciated.
The task is to find (and replace) "single" digit 2, not embedded in
a number composed of multiple digits.
In regex terms, this can be expressed as:
Match digit 2.
Previous char (if any) can not be a digit.
Next char (if any) can not be a digit.
The regex for the first condition is straightforward - just 2.
In other flavours of regex, e.g. PCRE, to forbid the previous
char you could use negative lookbehind, but unfortunately older Javascript
engines do not support it (lookbehind was only added in ES2018).
So, to circumvent this, we must:
Put a capturing group matching either start of text or something
other than a digit: (^|\D).
Then put regex matching just 2: 2.
The last condition, fortunately, can be expressed as a negative lookahead,
because Javascript regex does support it: (?!\d).
So the whole regex is:
(^|\D)2(?!\d)
Having found such a match, you have to replace it with the content
of the first capturing group and 3 (the replacement digit).
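The steps above can be sketched as a small function. The name replaceExactNumber and its parameters are hypothetical, and the sketch assumes `from` is a non-negative integer, so it needs no regex escaping.

```javascript
// Hypothetical helper: replace the whole number `from` with `to`,
// leaving longer numbers that merely contain `from` untouched.
function replaceExactNumber(text, from, to) {
  // (^|\D) captures the start of the text or one non-digit before the number;
  // (?!\d) rejects matches that are followed by another digit.
  const re = new RegExp(`(^|\\D)${from}(?!\\d)`, "g");
  // A replacer function sidesteps any ambiguity of writing "$1" next to digits.
  return text.replace(re, (match, before) => before + to);
}

const input = "note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.";
console.log(replaceExactNumber(input, 2, 3));
// → "note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3."
```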
You can use negative look-ahead:
(\D|^)2(?!\d)
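Replace with: ${1}3 (in JavaScript's String.prototype.replace, write $13, which is read as $1 followed by a literal 3 when the pattern has fewer than 13 groups)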
If look behind is supported:
(?<!\d)2(?!\d)
Replace with: 3
See regex in use here
(\D|\b)2(?!\d)
(\D|\b) Capture either a non-digit character or a position that matches a word boundary
(?!\d) Negative lookahead ensuring what follows is not a digit
Alternations:
(^|\D)2(?!\d) # Thanks to #Wiktor in the comments below
(?<!\d)2(?!\d) # At the time of writing works in Chrome 62+
const regex = /(\D|\b)2(?!\d)/g
const str = `note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.`
const subst = "$13"
console.log(str.replace(regex, subst))

Regex to count the number of capturing groups in a regex

I need a regex that examines arbitrary regex (as a string), returning the number of capturing groups. So far I have...
arbitrary_regex.toString().match(/\((|[^?].*?)\)/g).length
This works for some cases, under the assumption that any group starting with a question mark is non-capturing. It also counts empty groups.
It does not work for brackets included in character classes, or escaped brackets, and possibly some other scenarios.
Modify your regex so that it will match an empty string, then match an empty string and see how many groups it returns:
var num_groups = (new RegExp(regex.toString() + '|')).exec('').length - 1;
Example: http://jsfiddle.net/EEn6G/
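A minimal sketch of the same trick wrapped in a function. The name countCapturingGroups is hypothetical, and accepting a RegExp object as well as a string is an assumption beyond what the answer shows.

```javascript
// Hypothetical helper wrapping the trick above. Accepts a pattern string
// or a RegExp (an assumption; the answer only shows a string-like input).
function countCapturingGroups(pattern) {
  const source = pattern instanceof RegExp ? pattern.source : String(pattern);
  // Appending "|" adds an empty alternative, so exec("") always succeeds.
  // The result array has one slot for the whole match plus one per group.
  return new RegExp(source + "|").exec("").length - 1;
}

console.log(countCapturingGroups("(a)(?:b)(c(d))")); // 3
console.log(countCapturingGroups(/\(a\)[(]/));        // 0
```

Because the regex engine itself does the parsing, escaped parentheses and parentheses inside character classes are handled without any special cases.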
The accepted answer is what you should use in any production system. However, if you wanted to solve it using a regex for fun, you can do that as shown below. It assumes the regex you want the number of groups in is correct.
Note that the number of groups is just the number of non-literal ( characters in the regex. The strategy we're going to take is that, instead of matching all the correct ( characters, we split on all the incorrect stuff in between them.
re.toString().split(/(\(\?|\\\[|\[(?:\\\]|.)*?\]|\\\(|[^(])+/g).length - 1
You can see how it works on www.debuggex.com.

Javascript - Matching a hyphen in regex

I'm trying to match a string using regex (which I am new to) but I can't get it to match.
These should be accepted:
GT-00-TRE
KK-10-HUH
JU-05-OPR
These should not:
HTH-00-AS
HM-99-ASD
NM-05-AK
So the pattern goes 2 letters, hyphen, 2 digits (between 00 and 11 inclusive), hyphen, 3 letters.
So far the best I can come up with is:
var thePattern = /^[a-z]{2}[-][00-11][-][a-z]{3}$/gi;
I can't help but feel that I'm pretty close.
Can anyone give me any pointers?
Thanks.
This should be what you need:
var thePattern = /^[a-z]{2}[-](0\d|1[0-1])[-][a-z]{3}$/gi;
In order to do a range 00-11, you have to say "(0 followed by 0-9) or (1 followed by 0 or 1)". This is because specifying a range within [] only works for single digits. Luckily your case is pretty simple, otherwise it could get quite complex to work around that.
Your regex is OK, but for one thing: the digits matching is a bit more complex
(0\d|10|11)
you want to match a zero followed by a digit (\d) OR (|) a ten OR an eleven.
Something in square brackets represents just a single character in a range. [0-5] means any single digit between 0 and 5, [a-q] means any lowercase letter from a to q. There's no such thing as [00-11] because it would have to work on more than one character at a time.
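A short sketch that checks both points against the question's own examples. Two things here are changes from the posted answers and should be read as assumptions: the group is written as non-capturing, and the g flag is dropped.

```javascript
// [00-11] is a character class: "0", the range "0-1", and "1".
// It therefore matches exactly one character, either 0 or 1.
console.log(/^[00-11]$/.test("0"));  // true
console.log(/^[00-11]$/.test("11")); // false

// The g flag is left out here (a deliberate change): with g, test() keeps
// state in lastIndex between calls, which can give surprising results.
const thePattern = /^[a-z]{2}-(?:0\d|1[01])-[a-z]{3}$/i;

const accepted = ["GT-00-TRE", "KK-10-HUH", "JU-05-OPR"];
const rejected = ["HTH-00-AS", "HM-99-ASD", "NM-05-AK"];

console.log(accepted.every(s => thePattern.test(s)));  // true
console.log(rejected.some(s => thePattern.test(s)));   // false
```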
