Regex match only first or only second occurrence


I need to find a one- or two-digit number in a string, but all I know is whether I need the first or the second such number; I don't know exactly where it is in the string or what surrounds it. The strings are dates, but the format is, for all intents and purposes, random.
I came up with this so far:
(?<=\D)\d{1,2}(?=\D)
It matches what I need, but I need the matching to stop after either the first or the second occurrence, so that it doesn't match almost every number in these examples:
2019-01-05 23:59:59
2019 01 05 23:59:59
2019. 01. 05. 23:59:59
2019.01.05. 23:59:59
05-01-2019 23:59:59
5-1-2019 23:59:59
05/01/2019 23:59:59
5/1/2019 23:59:59
5 1 2019 23:59:59
05 1 2019 23:59:59
05. 1. 2019 23:59:59
5. 1. 2019 23:59:59
Basically, I want to match either the "1"/"01" or the "5"/"05" in every line.
I have already searched a lot of forums but can't find a solution that helps; everywhere the solution seems to depend on the specific string rather than being a true "find only the nth occurrence". In my case that really is the only thing that would solve the problem; at least I couldn't come up with any pattern that would reliably match every time. The examples above are not even the only possibilities: any way you can format a date is one of them. The only things I know for a fact are that the format is consistent across all documents I want to search, that the date always has separators in it, and that it comes before the time.

This expression might help you get only the first occurrence of the month and day:
[-\s.]+(\d{2})[-\s.]+(\d{2})[\s\S]*
It is not the best expression for the job, but it should give you the general idea: the greedy [\s\S]* consumes the rest of the string, so nothing after the first two captured numbers can match again.
You can change the initial boundaries as you wish, or use your original expression with minor changes, followed by [\s\S]*.
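A minimal sketch of how the two capturing groups of that expression could be read in JavaScript. The helper name firstTwoPairs is hypothetical and only used for illustration.

```javascript
// Hypothetical helper: apply the expression above and return its two captured
// digit pairs, or null when the string contains no
// "separator, 2 digits, separator, 2 digits" run.
function firstTwoPairs(text) {
  const pattern = /[-\s.]+(\d{2})[-\s.]+(\d{2})[\s\S]*/;
  const m = text.match(pattern);
  return m ? [m[1], m[2]] : null;
}

console.log(firstTwoPairs("2019-01-05 23:59:59"));    // ["01", "05"]
console.log(firstTwoPairs("2019. 01. 05. 23:59:59")); // ["01", "05"]
console.log(firstTwoPairs("05-01-2019 23:59:59"));    // ["01", "20"]
console.log(firstTwoPairs("5-1-2019 23:59:59"));      // null
```

Because the expression requires a separator before the first pair and exactly two digits per pair, day-first lines such as 05-01-2019 lose their leading number and single-digit values such as 5-1-2019 are not matched at all, so the boundaries would need adjusting for those formats.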

You could match a 'date like' pattern and use capturing groups to extract the month and day part. The month and day part is in either the first or the third capturing group, and to require consistent delimiters you could use backreferences to the delimiter groups (2 and 4).
To get the separate values, you can split on non-digits (\D).
(?:\d{4}(([- .]|\. )\d{2}\2\d{2})\.? |(\d{1,2}([-\/ ]|\.? )\d{1,2})\4\d{4} )\d{2}:\d{2}:\d{2}
(?: Non capturing group
\d{4} Match 4 digits
( Capture group 1
( Capture group 2
[- .]|\. Match either -, space, dot OR dot and space
) Close capture group 2
\d{2}\2\d{2} Match 2 digits, backreference to group 2, 2 digits
) Close group 1
\.? Match an optional dot followed by a space
| Or
( Capture group 3
\d{1,2} Match 1-2 digits
([-\/ ]|\.? ) Group 4, match either -, /, space OR an optional dot and a space
\d{1,2} Match 1-2 digits
) Close group 3
\4\d{4} Backreference to group 4, 4 digits and a space
) Close non capturing group
\d{2}:\d{2}:\d{2} Match the 'time like' part
For example:
let pattern = /(?:\d{4}(([- .]|\. )\d{2}\2\d{2})\.? |(\d{1,2}([-\/ ]|\.? )\d{1,2})\4\d{4} )\d{2}:\d{2}:\d{2}/;
[
"2019-01-05 23:59:59",
"2019 01 05 23:59:59",
"2019. 01. 05. 23:59:59",
"2019.01.05. 23:59:59",
"05-01-2019 23:59:59",
"5-1-2019 23:59:59",
"05/01/2019 23:59:59",
"5/1/2019 23:59:59",
"5 1 2019 23:59:59",
"05 1 2019 23:59:59",
"05. 1. 2019 23:59:59",
"5. 1. 2019 23:59:59"
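].forEach(s => {
  const m = s.match(pattern);
  // Skip strings that do not look like a date followed by a time.
  if (!m) return;
  // Group 1 is set for year-first dates, group 3 for day-first dates.
  const res = m[1] || m[3];
  // Split on non-digits and drop the empty string left by a leading separator.
  console.log(res.split(/\D+/).filter(Boolean));
});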
As an alternative based on your pattern, instead of the positive lookbehind (?<= and positive lookahead (?= you could use a negative lookbehind (?<! and a negative lookahead (?! to assert that what is on the left and on the right is not a digit; unlike the positive versions, these also match at the very start or end of the string. This will not take the date like pattern into account, though.
To get only a single match, leave out the /g global flag:
/(?<!\d)\d{1,2}(?!\d)/
Note that lookbehind was only added to JavaScript in ES2018, so it is not supported by older engines; it works in Chrome.
[
"2019-01-05 23:59:59",
"2019 01 05 23:59:59",
"2019. 01. 05. 23:59:59",
"2019.01.05. 23:59:59",
"05-01-2019 23:59:59",
"5-1-2019 23:59:59",
"05/01/2019 23:59:59",
"5/1/2019 23:59:59",
"5 1 2019 23:59:59",
"05 1 2019 23:59:59",
"05. 1. 2019 23:59:59",
"5. 1. 2019 23:59:59"
].forEach(s => console.log(s.match(/(?<!\d)\d{1,2}(?!\d)/)[0]));
Without the lookbehind you might use a capturing group and start the match with the start of the string ^ or not a digit \D:
(?:^|\D)(\d{1,2})(?!\d)
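To pick "only the nth occurrence" as asked in the question, one option is to add the global flag to this lookbehind-free pattern and index into the results. This is a sketch under that assumption; the helper name nthShortNumber is hypothetical, and String.prototype.matchAll requires an ES2020 engine.

```javascript
// Hypothetical helper: return the n-th (1-based) standalone 1-2 digit number
// in the text, or null when there are fewer than n of them.
function nthShortNumber(text, n) {
  // (?:^|\D) : start of string or a non-digit on the left
  // (\d{1,2}): the number we want (capture group 1)
  // (?!\d)   : no digit on the right, so longer numbers like 2019 are skipped
  const pattern = /(?:^|\D)(\d{1,2})(?!\d)/g;
  const matches = [...text.matchAll(pattern)];
  return n >= 1 && n <= matches.length ? matches[n - 1][1] : null;
}

console.log(nthShortNumber("2019-01-05 23:59:59", 1)); // "01"
console.log(nthShortNumber("2019-01-05 23:59:59", 2)); // "05"
console.log(nthShortNumber("5. 1. 2019 23:59:59", 1)); // "5"
console.log(nthShortNumber("5. 1. 2019 23:59:59", 2)); // "1"
```

Like the single-match version, this ignores the date like pattern: the third and later results are the numbers of the time part.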

Related

Regex to Identify Month from the first 3 letter

I have a date format of
12March2018
I am trying to write a regular expression to identify this pattern. The first character must be numeric, followed by the first 3 letters of a month. If the word that follows the number starts with the first 3 letters of any month, it should return true. How do I do that?

Can you try using this regex:
/(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)/g
In your case, to identify March in 12March2018, you can use:
(?:Mar(?:ch)?)
Here Mar matches the 3-letter abbreviation of the month, and the optional (?:ch)? matches the rest of the full month name when it is present.
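To return true or false as the question asks, the month alternation can be combined with a leading number and RegExp.prototype.test. This is a minimal sketch; the helper name startsWithDayAndMonth and the assumption that the number has one or two digits are mine, not part of the question.

```javascript
// The first 3 letters of every month, as one non-capturing alternation.
const monthAbbreviation = "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";

// Assumption: the string starts with a 1-2 digit day, then the month letters.
const dayThenMonth = new RegExp("^\\d{1,2}" + monthAbbreviation);

// Hypothetical helper: true when the text starts with digits and a month abbreviation.
function startsWithDayAndMonth(text) {
  return dayThenMonth.test(text);
}

console.log(startsWithDayAndMonth("12March2018")); // true
console.log(startsWithDayAndMonth("12Xyz2018"));   // false
console.log(startsWithDayAndMonth("March2018"));   // false
```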

if you wanna match groups of months, say, those with 31-days, with all input being first 3 letters, properly cased, here's the most succinct regex I could conjure up to match all 7 of months using just 4 letters :
{m,g,n}awk '/[acgl]/'
Jan J [a] n
Mar M [a] r
May M [a] y
Jul J u [l]
Aug A u [g]
Oct O [c] t
Dec D e [c]
if they're already all lowercased, then
/[^a]a|[cgl]/
/a[^p]|[cgl]/
jan
mar
may
jul
aug
oct
dec
for the 30-day ones, regardless of proper or lowercase,
/p|v|un/
apr a [p] r
jun j [u][n]
sep s e [p]
nov n o [v]
if you wanna match what people frequently describe as the "summer months", despite that not being officially defined :
/u/
Jun J [u] n
Jul J [u] l
Aug A [u] g
just the 4th-quarter months :
/[cv]/
Oct
Nov
Dec
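The patterns above are written for awk, but they use only syntax that JavaScript regular expressions share, so they can be checked quickly against the twelve abbreviations. A small sketch; the helper name pick is hypothetical.

```javascript
const months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
const lower = months.map(m => m.toLowerCase());

// Hypothetical helper: keep the entries of list that the pattern matches.
const pick = (re, list) => list.filter(m => re.test(m));

console.log(pick(/[acgl]/, months).join(" "));     // Jan Mar May Jul Aug Oct Dec
console.log(pick(/[^a]a|[cgl]/, lower).join(" ")); // jan mar may jul aug oct dec
console.log(pick(/a[^p]|[cgl]/, lower).join(" ")); // jan mar may jul aug oct dec
console.log(pick(/p|v|un/, lower).join(" "));      // apr jun sep nov
console.log(pick(/u/, months).join(" "));          // Jun Jul Aug
console.log(pick(/[cv]/, months).join(" "));       // Oct Nov Dec
```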
depending on whether you want to maximize the amount of months captured, or require uniqueness, here's a quick reference table as to uniqueness of the first 3 letters, when properly cased :
repeated : unique DFNOS bglotvy
repeated : 2-times AM cnpr
repeated : 3-times J aeu
when they're all uni-cased :
repeated : unique BDFGLSTVY repeated : 3-times EJNU
repeated : 2-times CMOPR repeated : 5-times A
when you include every letter of their full english names, uni-cased:
repeated : unique FGHIV repeated : 6-times U
repeated : 2-times DLPS repeated : 8-times A
repeated : 3-times CJOT repeated : 9-times R
repeated : 4-times NY repeated : 11-times E
repeated : 5-times BM

Is there a way to have a regExp that cover two different kind of Dates?

I found several answers here that covers similar kind of dates but not different ones.
For example I would like to have a RegExp that matches both cases:
2017-07-28T06:33:45.206Z
Tue Jul 25 2017 00:00:00 GMT+0200 (CEST)
28-7-2017
Not as strings and numbers, of course, but as dates.
I'm using Reactive Forms, so I'm using Validators.pattern(myRegExp), which would be very convenient.

Use an alternation (|):
/optiona|optionb/
From MDN:
x|y
Matches either 'x' or 'y'.
For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple."
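The | operator has the lowest precedence, so each alternative spans everything up to the next | or the enclosing group. You may well need a group (probably a non-capturing one) around the whole alternation when you combine it with anchors or other tokens, e.g.:
/^(?:optiona|optionb)$/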
...depending on how you write it.
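As a sketch of how the alternation could look for two of the formats in the question, the snippet below builds one anchored pattern from two sub-patterns. The sub-patterns and the names isoDateTime, dayMonthYear and datePattern are assumptions for illustration; they only check the shape of the text, not whether the date is valid (99-99-2017 would pass).

```javascript
// Assumed shape of "2017-07-28T06:33:45.206Z".
const isoDateTime = String.raw`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z`;
// Assumed shape of "28-7-2017".
const dayMonthYear = String.raw`\d{1,2}-\d{1,2}-\d{4}`;

// One group around the whole alternation, so ^ and $ apply to both options.
const datePattern = new RegExp(`^(?:${isoDateTime}|${dayMonthYear})$`);

console.log(datePattern.test("2017-07-28T06:33:45.206Z")); // true
console.log(datePattern.test("28-7-2017"));                // true
console.log(datePattern.test("2017-07-28"));               // false
```

Further formats can be added as more alternatives inside the same group.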

Regex for timestamp string

I'm looking to target the following string using jQuery. Specifically, I need to wrap it in a strong tag for styling purposes. I can't change the source data. My regex-fu is pathetic. Any suggestions?
Nov 18, 2013, 4pm CST:
Thanks guys - these are excellent answers. I should have been slightly more specific - I need to match all occurrences of this format within a collection, e.g.:
$('.admin-comments').match(/[A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[a|p]m\s[A-Z]{3}/)
(I have a log of comments and I'm trying to wrap the timestamp in a strong element.)
Edit: Final Working Solution
var adminComment = $('.admin-comments');
if (adminComment.length) {
  var adminCommentTxt = adminComment.text();
  var formatCommentTimestamp = adminCommentTxt.replace(/([A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9\s]{1,2}[ap]m\s[A-Z]{3}\:)/g, "<strong>$1</strong>");
  adminComment.html(formatCommentTimestamp);
}

Here you go: /^[A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[ap]m\s[A-Z]{3}\:$/
'Nov 18, 2013, 4pm CST:'.match(/^[A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[ap]m\s[A-Z]{3}\:$/)
["Nov 18, 2013, 4pm CST:"]
Keep in mind that this regex expects the line to start and end with this date. If the date is contained within other text, remove the ^ from the start and the $ from the end.
Hope this helps.
To further explain the regex and hopefully ++ your "regex-fu"
[A-Z]{1} - match one upper case letter
[a-z]{2} - match two lower case letters
So far we are at Nov, Oct, Jan, etc.
\s - space
[0-9]{1,2} - a 1 (min) or 2 (max) digit number
, - literal comma
\s - space
[0-9]{4} - a 4 digit number (the year)
So now we have matched: Nov 18, 2013
, - literal comma
\s - space
[0-9]{1,2} - just like before, a one or two digit number
[ap]m - 'a' or 'p' followed by an 'm' (inside a character class | is a literal character, so [a|p] would also accept '|m')
Now we've matched: Nov 18, 2013, 4pm
\s - space
[A-Z]{3} - an upper case three character string
\: - literal colon
That is the entire string.
Putting ^ at the beginning of the regex means the text we are matching against MUST begin with the pattern; similarly, the $ states that the text we are matching MUST end with the pattern.
Good luck!
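For matching every occurrence inside a longer text, as the question's edit asks, the unanchored pattern can be used with the g flag and String.prototype.replace. This is a sketch on a plain string without jQuery; the helper name emphasizeTimestamps is hypothetical.

```javascript
// Unanchored, global version of the pattern explained above.
const timestampPattern =
  /[A-Z][a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[ap]m\s[A-Z]{3}:/g;

// Hypothetical helper: wrap every timestamp in a <strong> element.
// "$&" in the replacement string stands for the whole matched text.
function emphasizeTimestamps(text) {
  return text.replace(timestampPattern, "<strong>$&</strong>");
}

console.log(emphasizeTimestamps("Nov 18, 2013, 4pm CST: first note"));
// <strong>Nov 18, 2013, 4pm CST:</strong> first note
```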

Rob M's answer is perfectly valid and more generic than mine, and it may be exactly what you're looking for. However, if you want to be more specific about the months and time zones, this may be useful:
(?:Oct|Nov)\s\d{1,2},\s\d{4},\s\d{1,2}(?:am|pm)\s(?:CST|PST):
This will match all of:
Nov 18, 2013, 4pm CST:
Oct 8, 2011, 11am PST:
Nov 02, 1981, 2am PST:
Oct 31, 1843, 12pm CST:
If you need more months, add each one to the first group, separated by |:
(?:Oct|Nov|Mmm) where Mmm corresponds to the month you want to match.
Similarly, if you need more time zones, add them to the last group:
(?:CST|PST|ZZZ) where ZZZ corresponds to the timezone you want to match.
Similar to Rob's answer, if your date string is the only thing on the line, you could add the /^ prefix and the $/ suffix.

Need a regex that makes sure the number starts with 01 02 or 08 and is 10 or 11 digits long

I am having a hard time creating a regex to check if a number starts with 01, 02 or 08 and is ten or eleven digits long.
For example I want numbers formatted like this to pass:
01614125745
02074125475
0845895412
08004569321
and numbers in other forms to fail, numbers like:
0798224141
441544122444
0925456754
and so on. Basically, any number that does not start with 01 02 or 08.

The following pattern follows the criteria:
/^(?=\d{10,11}$)(01|02|08)\d+/ # This pattern can easily be extended
/^(?=\d{10,11}$)0[128]\d{8,9}/ # Same effect
Explanation:
^ Begin of string
(?=\d{10,11}$) Followed by 10 or 11 digits, followed by the end of the string
(01|02|08) 01, 02 or 08
\d+ The remaining digits

Rob W's answer is correct. But I'd use a simpler expression like this: (No need for lookahead)
/^0[128][0-9]{8,9}$/
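A quick check of this simpler expression against the numbers from the question; the helper name hasAllowedPrefix is hypothetical.

```javascript
// 0, then 1, 2 or 8, then 8 or 9 more digits: 10 or 11 digits in total.
const allowedNumber = /^0[128][0-9]{8,9}$/;

// Hypothetical helper: true when the number has an allowed prefix and length.
function hasAllowedPrefix(number) {
  return allowedNumber.test(number);
}

["01614125745", "02074125475", "0845895412", "08004569321"]
  .forEach(n => console.log(n, hasAllowedPrefix(n))); // all true

["0798224141", "441544122444", "0925456754"]
  .forEach(n => console.log(n, hasAllowedPrefix(n))); // all false
```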

How can I recognize these "default-more-or-less" type of strings?

I have some strings like these :
00. 00:00:00 -
00. - 00:00:00 -
00. 00:00:00
00 - 00:00:00
00) 00:00:00
So, as you can see, they are similar (but not equal). I need to extract the inner block (of the form 00:00:00) and remove the rest of the characters.
Every 0 in the examples can be any digit from 0 to 9 or the character ?.
How can I do this with jQuery? With a regex?
As starting point to check this I've made a Fiddle

var result = strings.match(/[\d?]{2}:[\d?]{2}:[\d?]{2}/g);
result will be an array with all the matches of NN:NN:NN, ??:??:??, or any combined version of N and ?.
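A small sketch of that call wrapped in a function, with a guard for the case where nothing matches (match returns null then). The helper name extractTimeBlocks is hypothetical.

```javascript
// Hypothetical helper: return every NN:NN:NN block, where N is a digit or "?".
function extractTimeBlocks(text) {
  // With the g flag, match returns all matches, or null when there are none.
  return text.match(/[\d?]{2}:[\d?]{2}:[\d?]{2}/g) || [];
}

console.log(extractTimeBlocks("00. - 00:00:00 -"));               // ["00:00:00"]
console.log(extractTimeBlocks("12) 0?:4?:56 and 07 - ??:??:??")); // ["0?:4?:56", "??:??:??"]
console.log(extractTimeBlocks("no time here"));                   // []
```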
