Match floats but not dates separated by dot - javascript

I'm trying to build a regex which can find floats (using a dot or comma as decimal separator) in a string. I ended up with the following:
/([0-9]+[.,]{1}[0-9]+)/g
This seems to work fine, except that it matches dates separated by . as well:
02.01.2000 // Matches 12.34
I've tried ([0-9]+[.,]{1}[0-9]+)(?![.,]) but that does not work as I expected it :)
How would I omit the date case, but still pass the following scenarios:
I tried some stuff 12.23
D12.34
12.34USD
2.3%
12,2
\n12.1

You can use this regex using alternation:
(?:\d+\.){2}\d+|(\d+[.,]\d+)
and extract your matches from captured group #1.
This regex basically matches and discards date strings on LHS of alternation and then matches and captures floating numbers on RHS of alternation.
Code:
const regex = /(?:\d+\.){2}\d+|(\d+[.,]\d+)/gm;
const str = `I tried some stuff 12.23
D12.34
12.34USD 02.01.2000
2.3%
12,2 02.01.2000
\\n12.1`;
let m;
let results = [];
while ((m = regex.exec(str)) !== null) {
  // m[1] is only set when the float alternative matched, not the date one
  if (m[1]) {
    results.push(m[1]);
  }
}
console.log(results);

You want to make sure that it isn't surrounded with more ",." or numbers. Is that right?
/[^,.0-9]([0-9]+[.,]{1}[0-9]+)[^,.0-9]/g
Given the following:
hi 1.3$ is another 1.2.3
This is a date 02.01.2000
But this 5.30USD is a number.
But a number at the end of a sentence 1.7.
Or a number comma number in a list 4,3,4,5.
It will match "1.3" and "5.30".
Using @anubhava's example (?:\d+\.){2}\d+|(\d+[.,]\d+) you get the following result:
It will match "1.3", "5.30", "1.7", "4,3", and "4,5".
The moral of the story is that you need to think through all the possible scenarios and decide how you want to treat each one. The last line should be a list of 4 numbers. But is "4,3" by itself two separate numbers, or a single number from a country where commas denote the decimal point?
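To make the comparison concrete, here is a minimal sketch that runs both patterns over the sample text above. The helper name captures is hypothetical and only there for illustration; the two patterns are the ones quoted in this thread.

```javascript
// Sample text from the answer above, one sentence per line.
const sample = [
  "hi 1.3$ is another 1.2.3",
  "This is a date 02.01.2000",
  "But this 5.30USD is a number.",
  "But a number at the end of a sentence 1.7.",
  "Or a number comma number in a list 4,3,4,5.",
].join("\n");

// Strict pattern: the number must be surrounded by characters that are
// not digits, dots, or commas.
const strict = /[^,.0-9]([0-9]+[.,][0-9]+)[^,.0-9]/g;

// Alternation pattern: dotted dates are matched and discarded, floats are
// captured in group 1.
const lenient = /(?:\d+\.){2}\d+|(\d+[.,]\d+)/g;

// Hypothetical helper: collect group 1 of every match, skipping matches
// where group 1 did not participate (the discarded dates).
function captures(pattern, text) {
  return [...text.matchAll(pattern)]
    .map((m) => m[1])
    .filter((value) => value !== undefined);
}

const strictMatches = captures(strict, sample);
const lenientMatches = captures(lenient, sample);

console.log(strictMatches);  // [ '1.3', '5.30' ]
console.log(lenientMatches); // [ '1.3', '5.30', '1.7', '4,3', '4,5' ]
```

Note that the strict pattern consumes one character on each side of the number, so a number at the very start or very end of the input cannot match it.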

Related

Regex matching multiple numbers in a string

I would like to extract numbers from a string such as
There are 1,000 people in those 3 towns.
and get an array like ["1,000", "3"].
I got the following number matching Regex from Justin in this question
^[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b$
This works great for checking if it is a number but to make it work on a sentence you need to remove the "^" and "$".
Without the start and end anchors you get a bunch of zero-length matches. These can easily be discarded, but the pattern now also splits any number that has a comma in it.
How do I make that regex (or a new regex) work on sentences and still find numbers with commas in them?
A bonus would be not having all the zero-length matches as well.
The expression /-?\d(?:[,\d]*\.\d+|[,\d]*)/g should do it, if you're okay with allowing different groups such as 1,00,000 (which isn't unknown in some locales). I feel like I should be able to simplify that further, but when I try the example "333.33" gets broken up into "333" and "33" as separate numbers. With the above it's kept together.
Live Example:
const str = "There are 10,000 people in those 3 towns. That's 3,333.33 people per town, roughly. Which is about -67.33 from last year.";
const rex = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;
let match;
while ((match = rex.exec(str)) !== null) {
console.log(match[0]);
}
Breaking /-?\d(?:[,\d]*\.\d+|[,\d]*)/g down:
-? - an optional minus sign (thank you to x15 for flagging that up in his/her answer!)
\d - a digit
(?:...|...) - a non-capturing group containing an alternation between
[,\d]*\.\d+ - zero or more commas and digits followed by a . and one or more digits, e.g. 3,333.33; or
[,\d]* - zero or more commas and digits
The first alternative will match greedily, falling back to the second alternative if there's no decimal point.
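One consequence of allowing [,\d]* after the first digit is that a comma directly after a number is absorbed into the match. The sketch below shows that behaviour and one possible workaround; the function names findNumbers and findNumbersTrimmed are hypothetical, and trimming trailing commas is an assumption about what the caller wants, not part of the answer above.

```javascript
// The pattern from the answer above.
const rex = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;

// Hypothetical helper: return every match as a string.
function findNumbers(text) {
  return text.match(rex) || [];
}

// Hypothetical variant: drop commas left dangling at the end of a match,
// e.g. the list separator in "1,000, then".
function findNumbersTrimmed(text) {
  return findNumbers(text).map((n) => n.replace(/,+$/, ""));
}

console.log(findNumbers("There are 1,000 people in those 3 towns."));
// [ '1,000', '3' ]
console.log(findNumbers("We had 1,000, then 2."));
// [ '1,000,', '2' ]
console.log(findNumbersTrimmed("We had 1,000, then 2."));
// [ '1,000', '2' ]
```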
One alternative approach is to split on whitespace and check whether each value can be parsed as a number:
// Number.isNaN (rather than a truthiness test) keeps tokens that parse to 0
let numberExtractor = str => str.split(/\s+/)
  .filter(v => !Number.isNaN(parseFloat(v.replace(/[.,]/g, ''))))
console.log(numberExtractor('There are 1,000 people in those 3 towns. some more numbers -23.012 1,00,000,00'))
Matching integer and decimal numbers, where the whole part can have optional
commas between digits but the decimal part cannot, is done like this:
/[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/
https://regex101.com/r/yOuBPx/1
The input sample does not reflect all the boundary conditions this regex handles.
Best to experiment to see its full effect.
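As a starting point for that experimentation, here is a small sketch that applies the pattern with the g flag added (the flag and the helper name extractNumbers are assumptions for illustration, since the pattern above is shown without flags).

```javascript
// The pattern from the answer above, with the g flag added so that
// String.prototype.match returns every match.
const numberPattern = /[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/g;

// Hypothetical helper: return every match as a string.
function extractNumbers(text) {
  return text.match(numberPattern) || [];
}

console.log(extractNumbers("There are 1,000 people in those 3 towns."));
// [ '1,000', '3' ]
console.log(extractNumbers("We had 1,000, then .5 more and -67.33 less"));
// [ '1,000', '.5', '-67.33' ]
console.log(extractNumbers("ends in 3."));
// [ '3.' ]
```

The lookahead ,(?=\d) keeps a list comma out of the match, but because the decimal part is written as \.\d*, a sentence-ending period directly after a number is included, as the last call shows.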

Regex exact match on number, not digit

I have a scenario where I need to find and replace a number in a large string using javascript. Let's say I have the number 2 and I want to replace it with 3. It sounds pretty straightforward until I get occurrences like 22, 32, etc.
The string may look like this:
"note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2."
I want to turn it into this:
"note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3."
Obviously this means .replace('2','3') is out of the picture so I went to regex. I find it easy to get an exact match when I am dealing with string start to end ie: /^2$/g. But that is not what I have. I tried grouping, digit only, wildcards, etc and I can't get this to match correctly.
Any help on how to exactly match a number (where 0 <= number <= 500 is possible, but no constraints needed in regex for range) would be greatly appreciated.
The task is to find (and replace) "single" digit 2, not embedded in
a number composed of multiple digits.
In regex terms, this can be expressed as:
Match digit 2.
Previous char (if any) can not be a digit.
Next char (if any) can not be a digit.
The regex for the first condition is straightforward - just 2.
In other flavours of regex, e.g. PCRE, to forbid the previous
char you could use negative lookbehind, but at the time of writing
Javascript regex did not support it (lookbehind was added in ES2018).
So, to circumvent this, we must:
Put a capturing group matching either start of text or something
other than a digit: (^|\D).
Then put regex matching just 2: 2.
The last condition, fortunately, can be expressed as negative lookahead,
because even Javascript regex supports it: (?!\d).
So the whole regex is:
(^|\D)2(?!\d)
Having found such a match, you have to replace it with the content
of the first capturing group and 3 (the replacement digit).
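The steps above can be sketched as a small function. The name replaceExactNumber and its parameters are hypothetical; the pattern is the (^|\D)2(?!\d) expression from this answer with the digit made configurable, and a replacer callback is used so that the captured prefix and the new number are joined unambiguously.

```javascript
// Hypothetical helper: replace the standalone number `from` with `to`,
// leaving longer numbers that merely contain `from` untouched.
function replaceExactNumber(text, from, to) {
  if (!Number.isInteger(from) || from < 0) {
    throw new TypeError("from must be a non-negative integer");
  }
  // (^|\D)  start of text or a non-digit, captured so it can be put back
  // from    the number to find
  // (?!\d)  the next character, if any, must not be a digit
  const pattern = new RegExp(`(^|\\D)${from}(?!\\d)`, "g");
  return text.replace(pattern, (match, prefix) => prefix + String(to));
}

const input =
  "note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.";
const output = replaceExactNumber(input, 2, 3);
console.log(output);
// note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3.
```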
You can use negative look-ahead:
(\D|^)2(?!\d)
Replace with: ${1}3
If look behind is supported:
(?<!\d)2(?!\d)
Replace with: 3
(\D|\b)2(?!\d)
(\D|\b) Capture either a non-digit character or a position that matches a word boundary
(?!\d) Negative lookahead ensuring what follows is not a digit
Alternatives:
(^|\D)2(?!\d) # Thanks to @Wiktor in the comments below
(?<!\d)2(?!\d) # At the time of writing works in Chrome 62+
const regex = /(\D|\b)2(?!\d)/g
const str = `note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.`
const subst = "$13"
console.log(str.replace(regex, subst))

Trying to use RegEx (in JS) to get all values delimited by a special character, between two terminators

I'm trying to use regex to find multiple instances of a custom formatted string (for the sake of this post, let's call them macros) in a larger string. The macro is basically a string that starts with { and ends with }, and contains lower case alphabetical characters, numerical values, hyphens, and (maybe) periods, all delimited by a colon (:)
The first segment of the macro (and sometimes the only part) can be either a numerical value, or lower case alpha characters, between 1 and 5 characters in length. Examples:
{foo}
{barbaz}
{1}
{1234}
But then, just to make it more complicated, these macros may have "modifiers", which are all separated by colons. These modifiers can be:
Alpha characters one or two characters long (EG: a, ab)
Numerical values (EG: 12, 1, 1123123)
Numerical values with a hyphen somewhere in the middle of the numerical values, or before it (EG: 1-2, -12)
Example Macros
Here's a short list of possible macros that will/can be used, and the Regex array result I'm looking for
Macro: {foo} Regex Match: ["foo"]
Macro: {foo:ab:cd:e:f:g} Regex Match: ["foo","ab","cd","e", "f","g"]
Macro: {bar:1-3} Regex Match: ["bar","1-3"]
Macro: {baz:r:uc} Regex Match: ["baz","r","uc"]
Macro: {quux:1:2:uc} Regex Match: ["quux","1","2","uc"]
Example Paragraph
I need this query to be able to find multiple macros in a larger paragraph, for example:
My name is {namel:uf}, {namef:uf}, I go to {highschool:uw}. My computer username is {namef:1:l}{namel:l}
Test string: {foo}
Test string: ucfirst: {foo:uf}
Test string reversed/uppercase: {foo:u:r}
First 3 chars of test string: {foo:3}
Last 2 chars of test string: {foo:-2}
And I'm looking for a regex pattern that would return:
[
['namel','uf'],
['namef','uf'],
['highschool','uw'],
['namef','l',0],
['namel','l'],
['foo'],
['foo','uf'],
['foo','u','r'],
['foo',3],
['foo','-2']
]
Progress Thus Far
I've been working on this for a bit, and I'm pretty sure I'm somewhat close. The pattern I have right now is:
/\{(([a-z]{1,10}|\d+)+)(\:([a-z]{1,2}|\-?\d+|\d+\-\d+)*)*\}/gm
Here's the regex101.com instance.
As you can see, it matches the macros just fine, but I'm running into two problems:
Problems
It will match the : character that separates the modifiers
It doesn't seem to match all of the modifiers. Take a look at #2 in the Regex101 test strings, which is {foo:ab:cd:e:f:g}. I would expect the result to be: ["foo","ab","cd","e","f","g"], but instead, it matches ["foo","foo",":g","g"].
Any help would be appreciated! Thank you!
-J
Update
I think I fixed problem #1 listed above, where it was returning some of the : delimiters. All I did was add a ?: to the group that starts by looking for the colon, making it a non-capturing group. (I also changed how the numerical values are processed, but that's not relevant.)
Here's the new pattern
/\{(([a-z]{1,10}|\d+)+)(?:\:([a-z]{1,2}|\d*\-?\d+)*)*\}/gm
Here's the updated regex101.com example. You can still see that problem #2 persists, meaning it doesn't match EVERY macro modifier; it looks like it just matches the first and the last.
Thanks!
You can use .match() with RegExp /\{\w+\}|\{\w+:(\w+|\d+)\}|\{\w+:(\w+|\d+):(\w+|\d+)\}/g, .map(), .replace() with RegExp /\{|\}/g, .split() with RegExp /:/
var str = `My name is {namel:uf}, {namef:uf}, I go to {highschool:uw}. My computer username is {namef:1:l}{namel:l}
Test string: {foo}
Test string: ucfirst: {foo:uf}
Test string reversed/uppercase: {foo:u:r}
First 3 chars of test string: {foo:3}
Last 2 chars of test string: {foo:-2}`;
var res = str.match(/\{\w+\}|\{\w+:(\w+|\d+)\}|\{\w+:(\w+|\d+):(\w+|\d+)\}/g);
res = res.map(s => s.replace(/\{|\}/g, "").split(/:/));
console.log(res);
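The pattern above enumerates zero, one, or two modifiers and uses \w, which does not include the hyphen, so a macro such as {foo:-2} or {foo:ab:cd:e:f:g} falls outside it. A possible variation on the same match-then-split idea is sketched below. The pattern and the name parseMacros are assumptions for illustration rather than part of the answer, and every segment is returned as a string.

```javascript
// Assumed grammar: a name made of lower case letters or digits, followed by
// any number of ":modifier" parts made of lower case letters, digits, or
// hyphens. Group 1 holds everything between the braces.
const MACRO = /\{([a-z0-9]+(?::[a-z0-9-]+)*)\}/g;

// Hypothetical helper: find every macro and split its body on colons.
function parseMacros(text) {
  return [...text.matchAll(MACRO)].map((m) => m[1].split(":"));
}

const paragraph = [
  "My name is {namel:uf}, {namef:uf}, I go to {highschool:uw}. My computer username is {namef:1:l}{namel:l}",
  "Test string: {foo}",
  "Test string: ucfirst: {foo:uf}",
  "Test string reversed/uppercase: {foo:u:r}",
  "First 3 chars of test string: {foo:3}",
  "Last 2 chars of test string: {foo:-2}",
].join("\n");

const macros = parseMacros(paragraph);
console.log(macros);
```

Because the repetition lives inside a single capturing group, the regex only has to find the macro; splitting on : afterwards recovers every modifier, which avoids the "only the last repetition is captured" behaviour described in the question.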

Regex to count the number of capturing groups in a regex

I need a regex that examines arbitrary regex (as a string), returning the number of capturing groups. So far I have...
arbitrary_regex.toString().match(/\((|[^?].*?)\)/g).length
This works for some cases, on the assumption that any group starting with a question mark is non-capturing. It also counts empty groups.
It does not work for brackets included in character classes, or escaped brackets, and possibly some other scenarios.
Modify your regex so that it will match an empty string, then match an empty string and see how many groups it returns:
var num_groups = (new RegExp(regex.toString() + '|')).exec('').length - 1;
Example: http://jsfiddle.net/EEn6G/
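A slightly more defensive sketch of the same trick is shown below. The function name countCapturingGroups is hypothetical. It uses the source property instead of toString(), so the surrounding slashes and flags of a RegExp object do not become part of the new pattern, and it also accepts a plain pattern string.

```javascript
// Hypothetical helper: count capturing groups by appending an empty
// alternative, matching the empty string, and counting the result slots.
function countCapturingGroups(regex) {
  const source = typeof regex === "string" ? regex : regex.source;
  const flags = typeof regex === "string" ? "" : regex.flags;
  // "pattern|" always matches "", and the match array has one slot for
  // the whole match plus one slot per capturing group.
  const probe = new RegExp(source + "|", flags);
  return probe.exec("").length - 1;
}

console.log(countCapturingGroups(/(a)(b)/));                // 2
console.log(countCapturingGroups(/(?:a)(b)|[(]\(/));        // 1
console.log(countCapturingGroups(/(?<year>\d{4})-(\d\d)/)); // 2
console.log(countCapturingGroups("abc"));                   // 0
```

Since the regex engine does the parsing, parentheses inside character classes and escaped parentheses are handled without any special cases.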
The accepted answer is what you should use in any production system. However, if you wanted to solve it using a regex for fun, you can do that as shown below. It assumes the regex you want the number of groups in is correct.
Note that the number of groups is just the number of non-literal (s in the regex. The strategy we're going to take is instead of matching all the correct (, we're going to split on all the incorrect stuff in between them.
re.toString().split(/(\(\?|\\\[|\[(?:\\\]|.)*?\]|\\\(|[^(])+/g).length - 1
You can see how it works on www.debuggex.com.

Regex for the following string I'd like to split: 1.5 cc or 1.5cc

I have the following string input that I need to split apart for output. Example: 1.5 cc or 1.5cc
I am doing this right now based on the space separating the input string using the following code:
var immunization = msg['dose_size'].toString().trim();
immunization = immunization.split(' ');
var amount = immunization[0];
var unit = immunization[1];
In testing, however, I found that the input is not always formatted that way; sometimes the space is missing. So for the input with no space, my first output is 1.5cc and the second output is empty.
I tried using a regular expression like: \d*.?\d+ and that works, but since it finds a match on the 1.5 I get an empty value for the first output and the "cc" for the second output, I need both.
Any suggestions?
Thanks.
You can use this code to grab the data:
/(\d+(?:\.\d*)?|\.\d+)\s*(\S+)/.exec(inputString)
(\d+(?:\.\d*)?|\.\d+) will match decimal number, where it will accept integer 34, regular decimal numbers 34.555, and also allow .5555 and 4533. to pass.
Followed by \s* matching optional spaces.
Followed by (\S+) matching and capturing the next token (without space character).
If there is a match, it will return an array, where index 1 of the array is the decimal number, and index 2 of the array is the unit.
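Wrapped as a function, this could look like the sketch below. The name splitDose and the returned object shape are hypothetical; the pattern is the one from this answer.

```javascript
// Hypothetical helper: split a dose string such as "1.5 cc" or "1.5cc"
// into its numeric amount and its unit. Returns null when nothing matches.
function splitDose(input) {
  const match = /(\d+(?:\.\d*)?|\.\d+)\s*(\S+)/.exec(String(input).trim());
  if (match === null) {
    return null;
  }
  return { amount: match[1], unit: match[2] };
}

console.log(splitDose("1.5 cc")); // { amount: '1.5', unit: 'cc' }
console.log(splitDose("1.5cc"));  // { amount: '1.5', unit: 'cc' }
console.log(splitDose(".5ml"));   // { amount: '.5', unit: 'ml' }
console.log(splitDose("cc"));     // null
```

One caveat: because (\S+) needs at least one character, a bare number with no unit is still split. For example, "15" comes back as amount "1" and unit "5", so the input should be validated if unitless values are possible.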
