Regex matches numbers in date but shouldn't - javascript

Why does my regex pattern match the date part of the string? It seems like I'm not accounting for the / (slash) correctly with [^\/], which I added to keep the pattern from matching date strings.
const reg = new RegExp(
/(USD|\$|EUR|€|USDC|USDT)?\s?(\d+[^\/]|\d{1,3}(,\d{3})*)(\.\d+)?(k|K|m|M)?\b/,
"i"
);
const str = "02/22/2021 $50k";
console.log(reg.exec(str));
// result: ['02', undefined, '02', undefined, undefined, undefined, undefined, index: 0, input: '02/22/2021 $50k', groups: undefined]
// was expecting: [$50k,...]

You get the match in the date part, and the undefined entries, because the pattern is built from optional parts and alternations (|).
In your pattern there is this part (\d+[^\/]|\d{1,3}(,\d{3})*). The first alternative \d+[^\/] matches 1+ digits followed by any char except a / (which can also be a digit), so it needs at least 2 characters. That part will match 02, 22 and 2021 in the date part.
If there is 1 digit, the second part of the alternation will match it.
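The backtracking described above can be seen by running the first alternative on its own. This is a minimal sketch; the variable names are illustrative and not part of the original pattern.

```javascript
// The first alternative from the question, run on its own with the g flag.
const firstAlternative = /\d+[^\/]/g;
const sample = "02/22/2021 $50k";

// \d+ first takes "02", then gives one digit back so that [^\/] can
// consume the "2" instead of the following slash.
const pieces = sample.match(firstAlternative);
console.log(pieces); // [ '02', '22', '2021 ', '50k' ]
```

Note that [^\/] also consumes the space after 2021, because a space is "any char except /".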
If you want to match bare numbers as well, you can assert that there is no / to the left and to the right, and make the whole currency part (the alternatives like USD together with the optional whitespace char) optional as a unit, so that the whitespace is not matched on its own before a bare number.
The last alternation can be shortened to a character class [km]? with a case insensitive flag.
Check the lookbehind support for JavaScript in the environments you target.
(?:(?:USD|\$|EUR|€|USDC|USDT)\s?)?(?<!\/)\b(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+)(?!\/)[KkMm]?\b
const reg = /(?:(?:USD|\$|EUR|€|USDC|USDT)\s?)?(?<!\/)\b(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+)(?!\/)[KkMm]?\b/gi;
const str = "02/22/2021 $50k 1,213.3 11111111 $50,000 $50000"
const res = Array.from(str.matchAll(reg), m => m[0]);
console.log(res); // ["$50k", "1,213.3", "11111111", "$50,000", "$50000"]
If the currency is not optional:
(?:USD|\$|EUR|€|USDC|USDT)\s?(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+)[KkMm]?\b
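Because the optional-currency pattern relies on a lookbehind, it can be useful to check for support before compiling it. The sketch below is one way to do that, not the only one. supportsLookbehind and amountPattern are hypothetical names, and the approach assumes that an engine without lookbehind throws a SyntaxError when asked to construct such a pattern.

```javascript
// Returns true when the running engine can compile a lookbehind assertion.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (err) {
    return false; // engines without lookbehind reject the pattern
  }
}

// The lookbehind version is built from a string so that an engine without
// support does not fail while parsing the script itself.
const amountPattern = supportsLookbehind()
  ? new RegExp(
      "(?:(?:USD|\\$|EUR|€|USDC|USDT)\\s?)?(?<!\\/)\\b(?:\\d{1,3}(?:,\\d{3})*(?:\\.\\d+)?|\\d+)(?!\\/)[km]?\\b",
      "gi"
    )
  : /(?:USD|\$|EUR|€|USDC|USDT)\s?(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+)[km]?\b/gi;

console.log("02/22/2021 $50k".match(amountPattern)); // [ '$50k' ]
```

The fallback is the pattern with a required currency, so it is stricter: it will not pick up bare numbers.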

I don't fully understand your regex, so I tried to work out what result you expect. Check this: the groups hold each part of your string.
const regex = /(\d{2})*\/?(\d{2})\/(\d{2,4})?\s*(USD|\$|EUR|€|USDC|USDT)?(\d*)(k|K|m|M)?\b/i
const regexNamed = /(?<day>\d{2})*\/?(?<month>\d{2})\/(?<year>\d{2,4})?\s*(?<currency>USD|\$|EUR|€|USDC|USDT)?(?<value>\d*)(?<unit>k|K|m|M)?\b/i
const str1 = '02/22/2021 $50k'
const str2 = '02/2021 €50m'
const m1 = str1.match(regex)
const m2 = str2.match(regexNamed)
console.log(m1)
console.log(m2.groups)

Related

How do I replace the last character of the selected regex?

I want this string {Rotation:[45f,90f],lvl:10s} to turn into {Rotation:[45,90],lvl:10}.
I've tried this:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d)\w+/g
console.log(bar.replace(regex, '$&'.substring(0, -1)))
I've also tried to just select the letter at the end using $ but I can't seem to get it right.
You can use
bar.replace(/(\d+)[a-z]\b/gi, '$1')
Here,
(\d+) - captures one or more digits into Group 1
[a-z] - matches any letter
\b - at the word boundary, i.e. at the end of the word
gi - all occurrences, case insensitive
The replacement is Group 1 value, $1.
See the JavaScript demo:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[a-z]\b/gi
console.log(bar.replace(regex, '$1'))
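The attempt in the question fails because '$&'.substring(0, -1) is evaluated before replace runs, and it evaluates to an empty string, so every match is simply deleted. If trimming the last character of each match is the goal, a replacer function can do it. This is a sketch with illustrative variable names:

```javascript
const input = "{Rotation:[45f,90f],lvl:10s}";

// A replacer function receives each match and returns its replacement,
// so the last character can be dropped per match.
const stripped = input.replace(/\d+[a-z]\b/gi, (match) => match.slice(0, -1));

console.log(stripped); // {Rotation:[45,90],lvl:10}
```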
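Check this out (note that it only removes the character right before the closing brace):
const str = `{Rotation:[45f,90f],lvl:10s}`.split('');
const x = str.splice(str.length - 2, 1)
console.log(str.join('')); // {Rotation:[45f,90f],lvl:10}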
You can use a positive lookahead to assert the closing brace without consuming it, so that only the single character before it is replaced with an empty string. Note that this strips just the trailing s and leaves the f suffixes in place.
const bar= '{Rotation:[45f,90f],lvl:10s}'
const regex = /.(?=})/g
console.log(bar.replace(regex, ''))
// Output: {Rotation:[45f,90f],lvl:10}
The following regex will match each group of one or more digits followed by f or s.
$1 represents the contents captured by the capture group (\d+).
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[fs]/g
console.log(bar.replace(regex, '$1'))

JavaScript Regex split at first letter?

Since regex solutions differ from case to case, depending on the format of the string, I'm having a hard time finding a solution to my problem.
I have an array containing strings in the format, as an example:
"XX:XX - XX:XX Algorithm and Data Structures"
Where "XX:XX - XX:XX" is the timespan for a lecture, and each X is a digit.
I'm new to Regex and trying to split the string at the first letter occurring, like so:
let str = "08:15 - 12:50 Algorithm and Data Structures";
let re = //Some regex expression
let result = str.split(re); // Output: ["08:15 - 12:50", "Algorithm and Data Structures"]
I'm thinking it should be something like /[a-Z]/ but I'm not sure at all...
Thanks in advance!
The easiest way is probably to "mark" where you want to split and then split:
const str = '12 34 abcde 45 abcde'.replace(/^([^a-z]+)([a-z])/i, '$1,$2');
// '12 34 ,abcde 45 abcde'
str.split(',')
// [ '12 34 ', 'abcde 45 abcde' ]
This anchors at the start of the string, matches a run of non a-z characters followed by one a-z character, and puts a comma right in between the two. Then you split by the comma.
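One caveat of the marker approach is that it assumes the marker (a comma here) does not already occur in the string. A sketch that avoids a marker altogether is to locate the first letter with String.prototype.search and slice around it. splitAtFirstLetter is a hypothetical helper name, and trimming the trailing space from the first part is an assumption about the desired output.

```javascript
// Splits a string into [everything before the first letter, the rest].
function splitAtFirstLetter(input) {
  const index = input.search(/[a-z]/i); // -1 when there is no letter
  if (index === -1) return [input];
  return [input.slice(0, index).trimEnd(), input.slice(index)];
}

console.log(splitAtFirstLetter("08:15 - 12:50 Algorithm and Data Structures"));
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```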
You can also split directly with a capture group (split keeps the captured text in the result), but it might make the regex a bit less readable.
console.log(
"08:15 - 12:50 Algorithm and Data Structures".split(/ ([A-Za-z].*)/).filter(Boolean)
)
or, if it's really always XX:XX - XX:XX, easier to just do:
const splitTimeAndCourse = (input) => {
  return [
    input.slice(0, "XX:XX - XX:XX".length),
    input.slice("XX:XX - XX:XX".length + 1)
  ]
}
console.log(splitTimeAndCourse("08:15 - 12:50 Algorithm and Data Structures"))
If you have a fixed length of the string where the time is, you can use this regex for example
(^.{0,13})(.*)
Check this here https://regex101.com/r/ANMHy5/1
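As a sketch of how that fixed-length pattern could be applied in JavaScript (the variable names are illustrative): the second group starts with the separating space, so it is trimmed here, which is an assumption about the desired output.

```javascript
const lecture = "08:15 - 12:50 Algorithm and Data Structures";

// Group 1 takes up to the first 13 characters, group 2 takes the rest.
const [, timespan, rest] = lecture.match(/(^.{0,13})(.*)/);
const title = rest.trimStart(); // drop the space that separated the two parts

console.log([timespan, title]);
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```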
I know you asked about regex in particular, but here is a way to do this without regex...
Provided your time span is always at the beginning of your string and is always formatted with white space between the numbers, as XX:XX - XX:XX, you could use a function that splits the string at the white space and reconstructs the first three pieces into one chunk (the time span) and the remaining pieces into a second chunk (the lecture title), then returns the two chunks as an array.
let str = "08:15 - 12:50 Algorithm and Data Structures";
const splitString = (str) => {
  // split the string at the white spaces
  const strings = str.split(' ')
  // define variables
  let lecture = '',
    timespan = '';
  // loop over the strings
  strings.forEach((str, i) => {
    // structure the timespan
    timespan = `${strings[0]} ${strings[1]} ${strings[2]}`;
    // conditional to get the remaining strings and concatenate them into a new string
    i > 2 && i < strings.length ? lecture += `${str} ` : '';
  })
  // place them into an array and remove white space from end of second string
  return [timespan, lecture.trimEnd()]
}
console.log(splitString(str))
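The same idea can be written more compactly with slice and join instead of a loop. This is a sketch under the same assumption (the time span is always the first three space-separated pieces); splitBySpaces is a hypothetical name.

```javascript
// Rebuilds the first three space-separated pieces as the time span
// and the remaining pieces as the lecture title.
function splitBySpaces(input) {
  const words = input.split(" ");
  return [words.slice(0, 3).join(" "), words.slice(3).join(" ")];
}

console.log(splitBySpaces("08:15 - 12:50 Algorithm and Data Structures"));
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```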
For that format, you might also use 2 capture groups instead of using split.
^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)
The pattern matches:
^ Start of string
(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2}) Capture group 1, match a timespan-like pattern
\s+ Match 1+ whitespace chars
([A-Za-z].*) Capture group 2, start with a char A-Za-z and match the rest of the line.
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)/;
let [, ...groups] = str.match(regex);
console.log(groups);
Another option using split is a lookbehind (check its support in the environments you target) asserting that, from the start of the string, only chars other than a-zA-Z occur to the left. The pattern then matches 1+ whitespace chars and asserts a char a-zA-Z to the right.
(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])/;
console.log(str.split(regex))

Reg Exp for finding hashtag words

I have the following sentence as a test:
This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here
I have figured out most of the Reg Exp I need. Here's what I have so far: /((?<=#)([A-Z]*))/gi
This matches every tag but also matches the shouldnotshow portion. I want to not match words that are prefixed by anything but # (excluding whitespace & \n).
So the only matched words I should get are: shouldshow show yes.
Note: after #show is a newline
You just need to check whether the hash is preceded by whitespace or starts a line:
https://regex101.com/r/JDuGvr/1
/(\s|^)#(\w+)/gm
Or with a positive lookbehind, as the OP used:
https://regex101.com/r/06X3ZX/1
/(?<=(\s|^)#)(\w+)/gm;
Use [a-zA-Z0-9] instead of \w if you do not want to match an underscore.
const re1 = /(\s|^)#(\w+)/gm;
const re2 = /(?<=(\s|^)#)(\w+)/gm;
const str = `This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here`;
const res1 = [...str.matchAll(re1)].map(match => match[2]); // here the match is the third item
console.log(res1)
const res2 = [...str.matchAll(re2)].map(match => match[0]); // match is the first item
console.log(res2)
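As a sketch of the [a-zA-Z0-9] variant mentioned above (the sample text and variable names are made up for illustration): the tag stops at the first character outside the class, so an underscore ends the match.

```javascript
// Same idea as re1, but the tag may only contain ASCII letters and digits.
const tagPattern = /(?:\s|^)#([a-zA-Z0-9]+)/gm;
const text = "#snake_case #ok this#no";

// Group 1 holds the tag without the leading # and whitespace.
const tags = Array.from(text.matchAll(tagPattern), (m) => m[1]);
console.log(tags); // [ 'snake', 'ok' ]
```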
Another option is to keep your pattern and assert a # on the left that is not preceded by a non-whitespace char, using (?<!\S)#. That way you get the match only, without capture groups.
Match a char A-Z 1+ times to prevent matching an empty string.
(?<=(?<!\S)#)[A-Z]+
const regex = /(?<=(?<!\S)#)[A-Z]+/gi;
const str = `This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here`;
console.log(str.match(regex));

Javascript replace regex to accept only numbers, including negative ones, two decimals, replace 0s in the beginning, except if number is 0

The question became a bit long, but it explains the expected behaviour.
let regex = undefined;
const format = (string) => string.replace(regex, '');
format('0')
//0
format('00')
//0
format('02')
//2
format('-03')
//-3
format('023.2323')
//23.23
format('00023.2.3.2.3')
//23.23
In the above example you can see the expected results in comments.
To summarize, I'm looking for a regex to use with replace (not with test) that formats a string as follows:
removes 0s from the beginning if it's followed by any numbers
allows decimal digits, but just 2
allows negative numbers
allows decimal points, but just one (followed by min 1, max 2 decimal digits)
The last one is a bit difficult to handle, as the user can't type the period and the decimal digits at the same time, so I'll have two formatter functions: one for the value shown in the input field, and one for the closest valid value at the moment (for example, '2.' will show '2.' in the input field, but the handler will receive the value '2').
If it's not too much to ask, I'd like to see an explanation of the solution: why it works, and what the purpose of each part is.
Right now I have string.replace(/[^\d]+(\.\[^\d{1,2}])+|^0+(?!$)/g, ''), but it doesn't fulfill all the requirements.
You may use this code:
const arr = ['0', '00', '02', '-03', '023.2323', '00023.2.3.2.3', '-23.2.3.2.3']
var narr = []
// to remove leading zeroes
const re1 = /^([+-]?)0+(?=\d)/ // greedy 0+ so that all leading zeroes go, e.g. '002' gives '2'
// to remove multiple decimals
const re2 = /^([+-]?\d*\.\d+)\.(\d+).*/
arr.forEach( el => {
  el = el.replace(re1, '$1').replace(re2, '$1$2')
  if (el.indexOf('.') >= 0)
    el = Number(el).toFixed(2)
  narr.push(el)
})
console.log(narr)
//=> ["0", "0", "2", "-3", "23.23", "23.23", "-23.23"]
If you aren't bound to the String#replace method, you can try this regex:
/^([+-])?0*(?=\d+$|\d+\.)(\d+)(?:\.(\d{1,2}))?$/
It collects the parts of the number into capturing groups, as follows:
Sign: the sign of the number, +, - or undefined
Integer: the integer part of the number, without leading zeros
Decimal: the decimal part of the number, undefined if absent
This regex won't match if more than 2 decimal places are present. To strip the extra digits instead, use this:
/^([+-])?0*(?=\d+$|\d+\.)(\d+)(?:\.(\d{1,2})\d*)?$/
To format a number using one of the above, you can use something like:
let regex = /^([+-])?0*(?=\d+$|\d+\.)(\d+)(?:\.(\d{1,2}))?$/
const format = string => {
  try {
    const [, sign, integer, decimal = ''] = string.match(regex)
    return `${(sign !== '-' ? '' : '-')}${integer}${(decimal && `.${decimal}`)}`
  } catch (e) {
    // Invalid format, do something
    return
  }
}
console.log(format('0'))
//0
console.log(format('00'))
//0
console.log(format('02'))
//2
console.log(format('-03'))
//-3
console.log(format('023.23'))
//23.23
console.log(format('023.2323'))
//undefined (invalid format)
console.log(format('00023.2.3.2.3'))
//undefined (invalid format)
//Using the 2nd regex
regex = /^([+-])?0*(?=\d+$|\d+\.)(\d+)(?:\.(\d{1,2})\d*)?$/
console.log(format('0'))
//0
console.log(format('00'))
//0
console.log(format('02'))
//2
console.log(format('-03'))
//-3
console.log(format('023.23'))
//23.23
console.log(format('023.2323'))
//23.23
console.log(format('00023.2.3.2.3'))
//undefined (invalid format)
Another option is to use a pattern with 3 capture groups and to use all 3 groups in the replacement: "$1$2$3".
If the string after the replacement is empty, return a single zero.
If the string is not empty, concatenate group 1, group 2 and group 3, where for group 3 you remove all the dots except the first one (to keep it as the decimal separator) and take the first 3 characters (the dot and 2 digits).
^([-+]?)0*([1-9]\d*)((?:\.\d+)*)|0+$
In parts
^ Start of string
( Capture group 1
[-+]? Match an optional + or -
) Close group
0* Match 0+ times a zero
( Capture group 2
[1-9]\d* Match a digit 1-9 followed by optional digits 0-9
) Close group
( Capture group 3
(?:\.\d+)* Repeat 0+ times matching a dot and 1+ digits
) Close group
| Or
0+ Match 1+ times a zero
$ End of string
const strings = ['0', '00', '02', '-03', '023.2323', '00023.2.3.2.3', '-23.2.3.2.3', '00001234', '+0000100005.0001']
let pattern = /^([-+]?)0*([1-9]\d*)((?:\.\d+)*)|0+$/;
let format = s => {
  s = s.replace(pattern, "$1$2$3");
  return s === "" ? '0' : s.replace(pattern, (_, g1, g2, g3) =>
    g1 + g2 + g3.replace(/(?!^)\./g, '').substring(0, 3)
  );
};
strings.forEach(s => console.log(format(s)));
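The clean-up applied to group 3 is easier to follow in isolation. The sketch below uses the value that group 3 would capture for '00023.2.3.2.3'; the variable names are illustrative.

```javascript
const rawDecimals = ".2.3.2.3"; // what group 3 captures for '00023.2.3.2.3'

// (?!^)\. matches every dot except one at the very start of the string,
// so only the first dot survives as the decimal separator.
const withoutExtraDots = rawDecimals.replace(/(?!^)\./g, ""); // ".2323"

// Keep the dot plus at most two digits.
const twoDecimals = withoutExtraDots.substring(0, 3);

console.log(twoDecimals); // .23
```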

Making Regex more safe

I'm trying to make a bunch of regexes safer; by safer, I mean more accurate.
I'm very new to RegExp, and I want to know if I'm doing this right (not this regex in particular, but the approach to making it safer).
This is the first RegExp that I want to change. I want to extract the 01/2011.
Past RegExp:
var text = 'INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150';
var reg = /COMP.*?(\d\S*)/;
var match = reg.exec(text);
console.log(match[1]);
New RegExp:
var text = 'INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150';
var reg = /COMP:\s([0-9]{0,2}\/[0-9]{0,4})/;
var match = reg.exec(text);
console.log(match[1]);
Why this? This text is just a part of a huge text, so I need accuracy.
My other question is about making the match optional, so that if nothing matches, the result is undefined.
Thanks.
According to your feedback:
I specifically want to get the value with two digits, one / and four digits
You can use
/\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/g
The \b is a word boundary, thus 5COMP won't be matched.
The \s* will match 0 or more whitespace (if there must be whitespace, use + quantifier instead).
The \d{2} will match exactly 2 digits.
The \d{4} will match 4 digits and no more because of the look-ahead (?!\d). This look-ahead just makes sure there is no digit after the 4 previous digits. You may use \b here as well to ensure matching a word boundary.
var arr = [];
var re = /\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/g;
var str = 'COMP:10/9995, COMP: 21/1234, COMP: 21/123434, REGCOMP: 21/1234';
var m;
while ((m = re.exec(str)) !== null) {
  arr.push(m[1]);
}
console.log(arr); // ["10/9995", "21/1234"]
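For the second part of the question (getting undefined when nothing matches), one possible sketch is to wrap a single-match version of the pattern in a small function. extractComp is a hypothetical name, and the pattern is used without the g flag here because only the first value is wanted.

```javascript
// Returns the MM/YYYY value after "COMP:", or undefined when there is none.
function extractComp(text) {
  const match = /\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/.exec(text);
  return match ? match[1] : undefined; // exec returns null when nothing matches
}

console.log(extractComp("INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150")); // 01/2011
console.log(extractComp("REGCOMP: 21/1234")); // undefined
```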
