Use regex to split a string at the first letter

I have a regex that can split a string at the first digit
var name = 'Achill Road Boy 1:30 Ayr'
var horsedata = name.match(/^(\D+)(.*)$/);
var horse = horsedata[1]; // "Achill Road Boy "
var meeting = horsedata[2]; // "1:30 Ayr"
However, I now need to split the second part further:
var meetingdata = meeting.match(/* what is the regex? */);
var racetime = meetingdata[1]; // "1:30 "
var course = meetingdata[2]; // "Ayr"
What is the regex to split the string at the first letter?

You can use a single regex to do that:
^([^\d]+) +(\d+):(\d+) (.*)$
It captures the horse name, the hour, the minute and the track name separately, in groups 1, 2, 3 and 4.
Note that I have added ^ and $ to the expression, meaning that it must match the given string completely, from start to finish. I think these are useful safeguards against matching something inside the string that you didn't expect. They may, however, interfere with your task, so you can remove them if you don't need them.
When tinkering with regular expressions I always use this nifty tool, http://regex101.com. It has a very useful interface for debugging regular expressions and also shows their execution time. Here's a link to the regular expression above: https://regex101.com/r/jYgc9K/1. It also gives you a nice, clear breakdown of the match:
Full match 0-24 `Achill Road Boy 1:30 Ayr`
Group 1. 0-15 `Achill Road Boy`
Group 2. 16-17 `1`
Group 3. 18-20 `30`
Group 4. 21-24 `Ayr`
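In JavaScript, the single regex above could be used like this. This is a minimal sketch: the variable names are illustrative, and the hour and minute groups are simply rejoined to rebuild the race time.

```javascript
const entry = 'Achill Road Boy 1:30 Ayr';
// Groups: 1 = horse name, 2 = hour, 3 = minute, 4 = course.
const parts = entry.match(/^([^\d]+) +(\d+):(\d+) (.*)$/);
const horse = parts[1];                     // "Achill Road Boy"
const racetime = parts[2] + ':' + parts[3]; // "1:30"
const course = parts[4];                    // "Ayr"
console.log(horse, racetime, course);
```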
Lastly, a word of advice: there's a famous saying by Jamie Zawinski, a very smart guy. It goes like this:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
There's a lot of truth in this saying.

Given the string:
1:30 Ayr
The capture group from this regex will give you Ayr:
^[^a-zA-Z]*([a-zA-Z\s]+)$
Regular Expression Key:
^ - start of match
[^a-zA-Z]* - zero or more non-letter characters
([a-zA-Z\s]+) - capture group: one or more letters or whitespace characters
$ - end of match
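The question also wants the race time in group 1 and the course in group 2. A small adaptation of this answer's pattern (not part of the answer itself) wraps the leading non-letter part in its own capture group:

```javascript
const meeting = '1:30 Ayr';
// Group 1: everything before the first letter; group 2: the letters (and spaces) after it.
const meetingdata = meeting.match(/^([^a-zA-Z]*)([a-zA-Z\s]+)$/);
const racetime = meetingdata[1]; // "1:30 "
const course = meetingdata[2];   // "Ayr"
console.log(JSON.stringify(racetime), JSON.stringify(course));
```

Because [^a-zA-Z]* is greedy and a space is not a letter, the trailing space ends up in the race time, matching the "1:30 " shown in the question.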

Related

Regex exact match on number, not digit

I have a scenario where I need to find and replace a number in a large string using JavaScript. Let's say I have the number 2 and I want to replace it with 3. It sounds pretty straightforward until I run into occurrences like 22, 32, etc.
The string may look like this:
"note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2."
I want to turn it into this:
"note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3."
Obviously this means .replace('2','3') is out of the picture, so I turned to regex. I find it easy to get an exact match when the string contains just the number from start to end, i.e. /^2$/g. But that is not what I have. I tried grouping, digit-only classes, wildcards, etc., and I can't get this to match correctly.
Any help on how to match a number exactly (where 0 <= number <= 500 is possible, but the regex needs no range constraints) would be greatly appreciated.

The task is to find (and replace) a "single" digit 2 that is not embedded in a number composed of multiple digits.
In regex terms, this can be expressed as:
Match digit 2.
Previous char (if any) can not be a digit.
Next char (if any) can not be a digit.
The regex for the first condition is straightforward - just 2.
In other flavours of regex, e.g. PCRE, you could forbid the previous char with a negative lookbehind, but unfortunately JavaScript regex did not support it when this answer was written (lookbehind was added in ES2018).
So, to circumvent this, we must:
Put a capturing group matching either start of text or something other than a digit: (^|\D).
Then put regex matching just 2: 2.
The last condition, fortunately, can be expressed as a negative lookahead, because even JavaScript regex supports it: (?!\d).
So the whole regex is:
(^|\D)2(?!\d)
Having found such a match, you have to replace it with the content of the first capturing group followed by 3 (the replacement digit).
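A minimal JavaScript sketch of that approach, using the sample string from the question. It passes a replacement function instead of a "$1"-style string, so the captured prefix and the new digit cannot be misread as a two-digit group reference:

```javascript
const input = 'note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.';
// (^|\D) captures the start of text or a non-digit before the 2; (?!\d) rejects a digit after it.
const pattern = /(^|\D)2(?!\d)/g;
// Put the captured prefix back and append the replacement digit.
const output = input.replace(pattern, (match, prefix) => prefix + '3');
console.log(output);
// note[3] 3 3_ someothertext_3 note[32] 3finally_2222 but how about mymomsays3.
```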

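You can use a negative lookahead:
(\D|^)2(?!\d)
Replace with: $13 in JavaScript (group 1 followed by a literal 3, since the pattern has only one group), or ${1}3 in flavors such as PHP or .NET that support that syntax.
If lookbehind is supported (ES2018 and later in JavaScript):
(?<!\d)2(?!\d)
Replace with: 3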

You can also use this regex:
(\D|\b)2(?!\d)
(\D|\b) Capture either a non-digit character or a position that matches a word boundary
(?!\d) Negative lookahead ensuring what follows is not a digit
Alternatives:
(^|\D)2(?!\d) # Thanks to @Wiktor in the comments
(?<!\d)2(?!\d) # At the time of writing, this worked only in Chrome 62+; lookbehind has since been standardized in ES2018
const regex = /(\D|\b)2(?!\d)/g
const str = `note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.`
// "$1" re-inserts the captured prefix; with only one group, "$13" is read as "$1" followed by "3"
const subst = "$13"
console.log(str.replace(regex, subst))
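The question also allows numbers other than 2 (anything from 0 to 500). A minimal sketch of generalizing the lookbehind variant, assuming an ES2018+ engine and a non-negative integer to search for; replaceStandaloneNumber is a hypothetical helper name, not an existing API:

```javascript
// Hypothetical helper: replaces every standalone occurrence of the non-negative
// integer `from` (not preceded or followed by another digit) with `to`.
function replaceStandaloneNumber(text, from, to) {
  // Digits need no escaping, so String(from) can be embedded directly.
  // In the template literal, \\d yields the regex token \d.
  // (?<!\d) is a negative lookbehind (ES2018+), (?!\d) a negative lookahead.
  const pattern = new RegExp(`(?<!\\d)${String(from)}(?!\\d)`, 'g');
  // A replacer function avoids any "$" handling in the replacement string.
  return text.replace(pattern, () => String(to));
}

const sample = 'note[2] 2 2_ someothertext_2 note[32] 2finally_2222 but how about mymomsays2.';
console.log(replaceStandaloneNumber(sample, 2, 3));
console.log(replaceStandaloneNumber('note[12] 12 112 12x x12', 12, 500));
// note[500] 500 112 500x x500
```

Building the pattern with the RegExp constructor keeps the digit boundaries in one place, so the same function works for 2, 32 or 500 without writing a new regex each time.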

Javascript Regex 1-10

I am trying to write a regex that matches the numbers 1-10, and I have been having trouble.
I have tried:
/^[1-9]|10$/ // this matches 1-9 but not 10
/^10|[1-9]$/ // this matches 10 but not the digits 1-9
I feel weird because I have seen this question before and people said either of these expressions should work. Any ideas for another way to match any number from 1-10?

Assuming you want to match a string containing a number 1–10 and nothing else, you were close with /^[1-9]|10$/. The problem is that the alternation | has the lowest precedence, so it splits the whole expression, anchors included, into two halves: this expression matches either ^[1-9] (any string beginning with 1–9) or 10$ (any string ending with 10). Parentheses solve this neatly:
/^([1-9]|10)$/
See it in action below:
const regex = /^([1-9]|10)$/gm; // `gm` flags for demo only; read below
const str = `1
2
3
4
foo
5
6
7
8
9
10`;
let m;
while (m = regex.exec(str)) {
console.log('Found match:', m[0]);
}
The snippet uses the g and m flags to find all of the matches across multiple lines. m makes ^ and $ match the beginning and end of each line rather than the beginning and end of the string. For your use case you probably don't want either flag.
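To see the precedence problem described above, compare the unparenthesized and parenthesized patterns on a few strings. This is a quick sketch; the test strings are made up for illustration:

```javascript
const loose = /^[1-9]|10$/;    // parsed as (^[1-9]) OR (10$)
const strict = /^([1-9]|10)$/; // the anchors apply to both alternatives
for (const s of ['7', '10', '11', '7abc', 'abc10']) {
  // loose accepts all five strings; strict accepts only '7' and '10'
  console.log(JSON.stringify(s), loose.test(s), strict.test(s));
}
```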

Without knowing your specific use case, I don't see a reason to include ^ and $ here, which require the number to span the whole string (or line). Even if your inner regex were correct, it wouldn't match the '6' in: 'I have a 6 year old son.'
Instead, surround the inner regex with \b (a word boundary). So, to use Jordan's example, \b([1-9]|10)\b would match any standalone number from 1-10. To use a regex in JavaScript, pass it to one of the string methods that accept it as an argument, or call one of its own methods:
let regex = /\b([1-9]|10)\b/;
console.log('I have a 6 year old son.'.match(regex)[0]);        // "6"
console.log('I have a six year old son.'.match(regex));         // null
console.log('I have a 6 year old son.'.replace(regex, 'six'));  // "I have a six year old son."
console.log(regex.test('I have a 6 year old son.'));            // true
console.log(regex.test('I have a six year old son.'));          // false
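Adding the g flag makes the same word-boundary pattern collect every 1–10 number in a sentence. A small sketch; the sentence is invented for illustration:

```javascript
const numberPattern = /\b([1-9]|10)\b/g;
const sentence = 'I have 3 cats, 10 fish and 12 dogs.';
// With g, match returns all full matches; 12 is skipped because no boundary splits it.
const matches = sentence.match(numberPattern);
console.log(matches); // ["3", "10"]
```

Note that \b also sees a boundary at characters such as . or -, so a string like '1.5' would yield both '1' and '5'.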

Trying to use RegEx (in JS) to get all values delimited by a special character, between two terminators

I'm trying to use regex to find multiple instances of a custom-formatted string (for the sake of this post, let's call them macros) in a larger string. A macro is basically a string that starts with { and ends with }, and in between contains lowercase alphabetical characters, numerical values, hyphens and (maybe) periods, all delimited by colons (:).
The first segment of the macro (and sometimes the only part) can be either a numerical value or lowercase alpha characters, between 1 and 5 characters in length. Examples:
{foo}
{barbaz}
{1}
{1234}
But then, just to make it more complicated, these macros may have "modifiers", which are all separated by colons. These modifiers can be:
Alpha characters one or two characters long (EG: a, ab)
Numerical values (EG: 12, 1, 1123123)
Numerical values with a hyphen somewhere in the middle of the numerical values, or before it (EG: 1-2, -12)
Example Macros
Here's a short list of possible macros that will/can be used, and the regex array result I'm looking for:
Macro: {foo} Regex Match: ["foo"]
Macro: {foo:ab:cd:e:f:g} Regex Match: ["foo","ab","cd","e", "f","g"]
Macro: {bar:1-3} Regex Match: ["bar","1-3"]
Macro: {baz:r:uc} Regex Match: ["baz","r","uc"]
Macro: {quux:1:2:uc} Regex Match: ["quux","1","2","uc"]
Example Paragraph
I need this query to be able to find multiple macros in a larger paragraph, for example:
My name is {namel:uf}, {namef:uf}, I go to {highschool:uw}. My computer username is {namef:1:l}{namel:l}
Test string: {foo}
Test string: ucfirst: {foo:uf}
Test string reversed/uppercase: {foo:u:r}
First 3 chars of test string: {foo:3}
Last 2 chars of test string: {foo:-2}
And I'm looking for a regex pattern that would return:
[
['namel','uf'],
['namef','uf'],
['highschool','uw'],
['namef','1','l'],
['namel','l'],
['foo'],
['foo','uf'],
['foo','u','r'],
['foo',3],
['foo','-2']
]
Progress Thus Far
I've been working on this for a bit, and I'm pretty sure I'm somewhat close. The pattern I have right now is:
/\{(([a-z]{1,10}|\d+)+)(\:([a-z]{1,2}|\-?\d+|\d+\-\d+)*)*\}/gm
Here's the regex101.com instance.
As you can see, it matches the macros just fine, but I'm running into two problems:
Problems
It also captures the : characters that separate the modifiers.
It doesn't seem to match all of the modifiers. Take a look at #2 in the Regex101 test strings, which is {foo:ab:cd:e:f:g}. I would expect the result to be ["foo","ab","cd","e","f","g"], but instead it matches ["foo","foo",":g","g"].
Any help would be appreciated! Thank you!
-J
Update
I think I fixed problem #1 listed above, where it was returning some of the : delimiters. All I did was add ?: to the group that begins by looking for the colon, making it a non-capturing group. (I also changed how the numerical values are processed, but that's not relevant.)
Here's the new pattern:
/\{(([a-z]{1,10}|\d+)+)(?:\:([a-z]{1,2}|\d*\-?\d+)*)*\}/gm
Here's the updated regex101.com example. You can see that problem #2 persists: it doesn't match EVERY macro modifier; it looks like it only matches the first and the last.
Thanks!

You can use .match() with the RegExp /\{\w+\}|\{\w+:(\w+|\d+)\}|\{\w+:(\w+|\d+):(\w+|\d+)\}/g, then .map() over the matches, using .replace() with the RegExp /\{|\}/g to strip the braces and .split() with the RegExp /:/ to separate the segments:
var str = `My name is {namel:uf}, {namef:uf}, I go to {highschool:uw}. My computer username is {namef:1:l}{namel:l}
Test string: {foo}
Test string: ucfirst: {foo:uf}
Test string reversed/uppercase: {foo:u:r}
First 3 chars of test string: {foo:3}
Last 2 chars of test string: {foo:-2}`;
var res = str.match(/\{\w+\}|\{\w+:(\w+|\d+)\}|\{\w+:(\w+|\d+):(\w+|\d+)\}/g);
res = res.map(s => s.replace(/\{|\}/g, "").split(/:/));
console.log(res);
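Two details are worth knowing here. In JavaScript, a capturing group inside a repeated quantifier keeps only the text from its last iteration, which is why the pattern in the question reports just one modifier per group. And the pattern in this answer allows at most two modifiers, while \w does not match a hyphen, so {foo:-2}, {bar:1-3} and {foo:ab:cd:e:f:g} are not matched at all. A sketch that matches each whole macro and then splits it, assuming modifiers are any run of lowercase letters, digits and hyphens (it does not enforce the exact modifier rules from the question, and it does not allow periods); parseMacros is a hypothetical name:

```javascript
// Whole macro: an identifier of lowercase letters/digits, followed by any number of
// ":modifier" segments made of lowercase letters, digits and hyphens.
// Group 1 captures everything between the braces.
const macroPattern = /\{([a-z\d]+(?::[a-z\d-]+)*)\}/g;

function parseMacros(text) {
  // matchAll (ES2020) yields one match per macro; splitting group 1 on ":" gives the segments.
  return Array.from(text.matchAll(macroPattern), (m) => m[1].split(':'));
}

const paragraph = `My name is {namel:uf}, {namef:uf}, I go to {highschool:uw}. My computer username is {namef:1:l}{namel:l}
Test string: {foo}
Test string: ucfirst: {foo:uf}
Test string reversed/uppercase: {foo:u:r}
First 3 chars of test string: {foo:3}
Last 2 chars of test string: {foo:-2}`;

console.log(parseMacros(paragraph));
console.log(parseMacros('{foo:ab:cd:e:f:g} {bar:1-3}'));
```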

JavaScript and regular expressions: get the number of parenthesized subpattern

I have to get the number of parenthesized substring matches in a regular expression:
var reg=/([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g,
nbr=0;
//Some code
alert(nbr); //2
In the above example, the total is 2: only the first and the last pairs of parentheses create capturing groups.
How can I find this number for any regular expression?
My first idea was to check the values of RegExp.$1 to RegExp.$9, but even when there are no corresponding parentheses, these values are not null but empty strings...
I've also seen the RegExp.lastMatch property, but this one represents only the value of the last matched characters, not the corresponding number.
So, I've tried to build another regular expression to scan any RegExp and count this number, but it's quite difficult...
Do you have a better solution to do that?
Thanks in advance!

JavaScript's String.prototype.match() method returns an array of matches (or null if there are none). You might just want to check the length of that result array.
var mystr = "Hello 42 world. This 11 is a string 105 with some 2 numbers 55";
var res = mystr.match(/\d+/g);
console.log( res.length );

Well, judging from the code snippet, we can assume that the input pattern is always a valid regular expression, because otherwise it would fail before the //Some code part, right? That makes the task much easier!
We just need to count how many opening capturing parentheses there are!
var reg = /([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g;
var nbr = (' '+reg.source).match(/[^\\](\\\\)*(?=\([^?])/g);
nbr = nbr ? nbr.length : 0;
alert(nbr); // 2
And here is a breakdown:
[^\\] Make sure we don't start the match with an escaping backslash.
(\\\\)* Allow any number of escaped backslashes before the opening parenthesis.
(?= Lookahead. More on this later.
\( The opening parenthesis we are looking for.
[^?] Make sure it is not followed by a question mark, i.e. it starts a capturing group.
) End of lookahead
Why match with a lookahead? To check that the parenthesis is not escaped, we need to match the character before it. No big deal here; JS didn't have lookbehind at the time (it was added in ES2018).
The problem is that if two opening parentheses are adjacent and the match consumed the first one, the second would have nothing left to back it up: its preceding character would already have been consumed!
So to make sure a parenthesis can serve as the preceding character for the next one, we need to exclude it from the match.
And the space added to the source? It provides a preceding character for the first character of the source, in case that is an opening parenthesis.
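A different technique, not from the answers above, lets the regex engine do the counting instead of parsing the source text. Appending an empty alternative guarantees a match against the empty string, and the resulting array always has one slot per capturing group plus the full match. This also handles cases the source-scanning regex gets wrong, such as a ( inside a character class (/[(]x/ has no groups) or named groups like (?<year>...), which start with (? but do capture. countCapturingGroups is a hypothetical name:

```javascript
// Hypothetical helper: counts the capturing groups in a RegExp.
function countCapturingGroups(re) {
  // The trailing "|" adds an empty alternative, so the pattern can always match "".
  // exec then returns [fullMatch, group1, ..., groupN], with unmatched groups as undefined.
  const probe = new RegExp(re.source + '|', re.flags);
  return probe.exec('').length - 1;
}

console.log(countCapturingGroups(/([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g));
console.log(countCapturingGroups(/[(]x/));
console.log(countCapturingGroups(/(?<year>\d{4})-(\d{2})/));
```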

Regular Expression for date validation - Explain

I was searching online for date validation, but I didn't fully understand the regex. Can anyone explain it? I'm confused by ?, {} and $. Why do we need them?
dateReg = /^[0,1]?\d{1}\/(([0-2]?\d{1})|([3][0,1]{1}))\/(([1]{1}[9]{1}[9]{1}\d{1})|([2-9]{1}\d{3}))$/;

? means “zero or one occurrence”.
{x} (where x is a number) means “exactly x occurrences”.
$ means “end of input” (or end of line with the m flag).
These are very basic regex features; I recommend reading some documentation.

^ = beginning of the string
[0,1]? = optional zero, one or comma (the comma is probably an error)
\d{1} = exactly one digit (the {1} is redundant)
\/ = a forward slash
[0-2]? = optional zero, one or two (range character class), followed by exactly one digit (\d{1})
OR [3] = three (character class redundant here), followed by exactly one zero, one or comma
\/ = forward slash
[1]{1}[9]{1}[9]{1}\d{1} = 199 followed by any digit
OR [2-9]{1}\d{3} = a digit from 2-9 followed by any 3 digits
$ = end of the string
Overall, that's a really poorly written expression. I'd suggest finding a better one, or using a real date parser.
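To see why the breakdown calls it poorly written, here is the question's expression, unchanged, run against a few made-up inputs. It accepts a 13th month, February 30 and a leading comma, and rejects every year before 1990:

```javascript
// The expression from the question, unchanged.
const dateReg = /^[0,1]?\d{1}\/(([0-2]?\d{1})|([3][0,1]{1}))\/(([1]{1}[9]{1}[9]{1}\d{1})|([2-9]{1}\d{3}))$/;
for (const s of ['12/31/1999', '1/1/2000', '13/01/2000', '02/30/2000', ',1/01/2000', '12/31/1899']) {
  // Only '12/31/1899' fails: the year must be 199x or 2000-9999.
  console.log(s, dateReg.test(s));
}
```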

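In JavaScript you could try to validate a date by passing it to the Date.parse() function, which returns a timestamp, or NaN if the string cannot be parsed. Be aware that parsing of non-ISO formats is implementation-dependent, and some engines roll an out-of-range day such as 2/30 over into the next month instead of rejecting it.
I wouldn't recommend using regex for this: there are too many edge cases, and the code gets hard to maintain.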
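One way to follow this advice while still checking the format strictly is to use a simple regex only for the shape of the string and let Date check the values. This is a sketch, assuming an M/D/YYYY layout (the layout the question's regex appears to target); isValidUSDate is a hypothetical name. It relies on the Date constructor rolling out-of-range parts over (month 13 becomes January of the next year, February 30 becomes March 1), so a date that does not survive the round trip is rejected:

```javascript
// Hypothetical helper: validates an M/D/YYYY (or MM/DD/YYYY) string.
function isValidUSDate(text) {
  // The regex checks only the shape; the values are checked below.
  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text);
  if (!match) return false;
  const month = Number(match[1]);
  const day = Number(match[2]);
  const year = Number(match[3]);
  // Months are 0-based in the Date API; out-of-range parts roll over.
  const date = new Date(year, month - 1, day);
  // Accept only if nothing rolled over.
  return date.getFullYear() === year && date.getMonth() === month - 1 && date.getDate() === day;
}

for (const s of ['02/29/2000', '02/29/1900', '02/30/2000', '13/01/2000', ',1/01/2000', '1/1/2000']) {
  console.log(s, isValidUSDate(s));
}
```

Years below 100 are also rejected, because new Date maps them to 1900-1999.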

? means "Zero or one of the aforementioned"
{n} means "exactly n of the aforementioned"
$ is the end of the string (thanks @Andy E)

To summarize briefly:
`?` will match the preceding pattern group 0 or 1 times. In this case it's possibly being misused and should be left out, but it all depends on just what you want to match.
`{x}` tells the regex to match the preceding pattern group exactly x times.
`$` means to match the end of the input (or of the line, with the m flag).

Well:
^ // start of the text
$ // end of the text
X{n} // exactly n occurrences of X
X{m,n} // between m and n occurrences of X
X? // 0 or 1 occurrence of X
\d // any single digit 0-9
For more help with JavaScript date validation, please see: Regular Expression to only grab date
