Regex to identify a month from its first 3 letters - javascript

I have a date in the format
12March2018
I am trying to write a regular expression that identifies this pattern. The first character must be a digit, and the number must be followed by the first 3 letters of a month name. If the word that follows the number starts with the first 3 letters of any month, it should return true. How do I do this?

You can try this regex:
/(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)/g
In your case, to identify March in 12March2018, you can use:
(?:Mar(?:ch)?)
Here Mar matches the 3-letter abbreviation of the month, and the optional group (?:ch)? matches the rest of the full month name when it is present.
You can test this here.
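A minimal sketch of how the question's check (digits first, then a month abbreviation) could be wired up. The helper name startsWithDayAndMonth and the MONTH_ABBR constant are made up for illustration, and the pattern assumes properly cased English abbreviations:

```javascript
// Hypothetical pattern: one or more digits at the start of the string,
// followed by the first 3 letters of an English month name.
const MONTH_ABBR = /^\d+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/;

// Hypothetical helper: returns true when the input starts with
// digits followed by a month abbreviation.
function startsWithDayAndMonth(input) {
  return MONTH_ABBR.test(input);
}

console.log(startsWithDayAndMonth("12March2018")); // true
console.log(startsWithDayAndMonth("12Foo2018"));   // false
console.log(startsWithDayAndMonth("March2018"));   // false (no leading digit)
```

The regex has no g flag, so repeated calls to test() do not carry state between inputs.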

If you want to match groups of months, say those with 31 days, and every input is a properly cased 3-letter abbreviation, here is the most succinct regex I could come up with. It matches all 7 of those months using just 4 letters:
{m,g,n}awk '/[acgl]/'
Jan J [a] n
Mar M [a] r
May M [a] y
Jul J u [l]
Aug A u [g]
Oct O [c] t
Dec D e [c]
if they're already all lowercased, then
/[^a]a|[cgl]/
/a[^p]|[cgl]/
jan
mar
may
jul
aug
oct
dec
for the 30-day ones, regardless of proper or lowercase,
/p|v|un/
apr a [p] r
jun j [u][n]
sep s e [p]
nov n o [v]
if you want to match what people frequently describe as the "summer months", despite that not being officially defined :
/u/
Jun J [u] n
Jul J [u] l
Aug A [u] g
just the 4th-quarter months :
/[cv]/
Oct
Nov
Dec
depending on whether you want to maximize the number of months captured or require uniqueness, here's a quick reference table of how often each letter occurs in the 3-letter abbreviations, when properly cased :
repeated : unique DFNOS bglotvy
repeated : 2-times AM cnpr
repeated : 3-times J aeu
when they're all uni-cased :
repeated : unique BDFGLSTVY
repeated : 2-times CMOPR
repeated : 3-times EJNU
repeated : 5-times A
when you include every letter of their full English names, uni-cased :
repeated : unique DFGHIV
repeated : 2-times LPS
repeated : 3-times CJNOT
repeated : 4-times Y
repeated : 5-times BM
repeated : 6-times U
repeated : 7-times A
repeated : 9-times R
repeated : 11-times E
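To check patterns like the ones above against the twelve abbreviations, something along these lines can be used. It is only a sketch: MONTHS, monthsMatching and letterCounts are names invented here, and plain JavaScript stands in for the awk one-liner.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Hypothetical helper: list the abbreviations a pattern matches.
function monthsMatching(pattern) {
  return MONTHS.filter(m => pattern.test(m));
}

console.log(monthsMatching(/[acgl]/)); // the 31-day months
console.log(monthsMatching(/p|v|un/)); // the 30-day months
console.log(monthsMatching(/u/));      // Jun, Jul, Aug
console.log(monthsMatching(/[cv]/));   // Oct, Nov, Dec

// Hypothetical helper: count how often each letter occurs, ignoring case.
function letterCounts(words) {
  const counts = {};
  for (const ch of words.join("").toUpperCase()) {
    counts[ch] = (counts[ch] || 0) + 1;
  }
  return counts;
}

console.log(letterCounts(MONTHS).A); // 5
```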

Related

Regex match only first or only second occurrence

I need to find two digits in a string, but I only know whether the first two digits or the second two digits are needed, don't know where they are in the string exactly and what surrounds them. The strings are dates but the format is, for all intents and purposes, random.
I came up with this so far:
(?<=\D)\d{1,2}(?=\D)
It matches what I need, but I need to stop the matching either after the first or the second occurrence, so it won't find almost everything in these examples:
2019-01-05 23:59:59
2019 01 05 23:59:59
2019. 01. 05. 23:59:59
2019.01.05. 23:59:59
05-01-2019 23:59:59
5-1-2019 23:59:59
05/01/2019 23:59:59
5/1/2019 23:59:59
5 1 2019 23:59:59
05 1 2019 23:59:59
05. 1. 2019 23:59:59
5. 1. 2019 23:59:59
Here basically I want to match either the "1"/"01" or the "5"/"05" in every line.
I have already searched a lot of forums but can't find a solution that helps. Everywhere, the solution depends on the particular string and is not really a "find only the nth occurrence". In my case that would be the only thing that solves the problem; at least I couldn't come up with any pattern that would definitely match every time. The examples above are not even the only possibilities: any way you can format a date is one of them. The only things I know for a fact are that the format is consistent across all documents I want to search, that the date always has separators in it, and that it comes before the time.
This expression might help you get only the first occurrence of the month and day you want:
[-\s.]+(\d{2})[-\s.]+(\d{2})[\s\S]*
It is not the best expression for this, but it may give you a general idea of how the greedy [\s\S]* consumes the rest of the unwanted characters.
You can change my initial boundaries as you wish, or use your original expression with minor changes, followed by [\s\S]*.
You could match a 'date like' pattern and use capturing groups to extract the month and day part. The month and day part is in either the first or the third capturing group, and to require consistent delimiters you can use backreferences to the delimiter groups (2 and 4).
To get the separate values, you can split on non-digits \D.
(?:\d{4}(([- .]|\. )\d{2}\2\d{2})\.? |(\d{1,2}([-\/ ]|\.? )\d{1,2})\4\d{4} )\d{2}:\d{2}:\d{2}
(?: Non capturing group
\d{4} Match 4 digits
( Capture group 1
( Capture group 2
[- .]|\. Match either -, space, dot OR dot and space
) Close capture group 2
\d{2}\2\d{2} Match 2 digits, backreference to group 2, 2 digits
) Close group 1
\.? Match an optional dot followed by a space
| Or
( Capture group 3
\d{1,2} match 1-2 digits
([-\/ ]|\.? ) Group 4, match either -, /, space OR dot and space
\d{1,2})\4\d{4} Match 1-2 digits, backreference to group 4 and 4 digits
) Close non capturing group
\d{2}:\d{2}:\d{2} Match the 'time like' part
For example:
let pattern = /(?:\d{4}(([- .]|\. )\d{2}\2\d{2})\.? |(\d{1,2}([-\/ ]|\.? )\d{1,2})\4\d{4} )\d{2}:\d{2}:\d{2}/;
[
"2019-01-05 23:59:59",
"2019 01 05 23:59:59",
"2019. 01. 05. 23:59:59",
"2019.01.05. 23:59:59",
"05-01-2019 23:59:59",
"5-1-2019 23:59:59",
"05/01/2019 23:59:59",
"5/1/2019 23:59:59",
"5 1 2019 23:59:59",
"05 1 2019 23:59:59",
"05. 1. 2019 23:59:59",
"5. 1. 2019 23:59:59"
].forEach(s => {
  let m = s.match(pattern);
  let res = m[1] || m[3];
  console.log(res.split(/\D+/).filter(Boolean));
});
As an alternative based on your pattern, instead of the positive lookbehind (?<= and positive lookahead (?=, you could use a negative lookbehind (?<! and a negative lookahead (?! to assert that what is on the left and on the right is not a digit. Unlike your pattern, this also matches a number at the very start or end of the string, but it does not take the date-like pattern into account.
To get only a single match, leave out the /g global flag:
/(?<!\d)\d{1,2}(?!\d)/
Note that lookbehind is an ES2018 feature that older JavaScript engines do not support; it works in Chrome.
[
"2019-01-05 23:59:59",
"2019 01 05 23:59:59",
"2019. 01. 05. 23:59:59",
"2019.01.05. 23:59:59",
"05-01-2019 23:59:59",
"5-1-2019 23:59:59",
"05/01/2019 23:59:59",
"5/1/2019 23:59:59",
"5 1 2019 23:59:59",
"05 1 2019 23:59:59",
"05. 1. 2019 23:59:59",
"5. 1. 2019 23:59:59"
].forEach(s => console.log(s.match(/(?<!\d)\d{1,2}(?!\d)/)[0]));
Without the lookbehind you might use a capturing group and start the match with the start of the string ^ or not a digit \D:
(?:^|\D)(\d{1,2})(?!\d)
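For the "only the nth occurrence" part of the question, one possible approach is to collect all matches of the pattern above with the g flag and pick one by index. This is a sketch rather than a definitive solution; nthShortNumber is a hypothetical helper name, and String.prototype.matchAll needs an ES2020 environment.

```javascript
// Hypothetical helper: return the n-th (1-based) standalone 1-2 digit
// number in the text, or null when there are fewer than n of them.
function nthShortNumber(text, n) {
  // Same pattern as above, with the g flag that matchAll requires.
  const pattern = /(?:^|\D)(\d{1,2})(?!\d)/g;
  const matches = [...text.matchAll(pattern)];
  // Group 1 holds the digits without the leading non-digit.
  return matches.length >= n ? matches[n - 1][1] : null;
}

console.log(nthShortNumber("2019-01-05 23:59:59", 1)); // "01"
console.log(nthShortNumber("2019-01-05 23:59:59", 2)); // "05"
console.log(nthShortNumber("5/1/2019 23:59:59", 1));   // "5"
console.log(nthShortNumber("5/1/2019 23:59:59", 2));   // "1"
```

The 4-digit year is skipped because no run of 1 or 2 digits inside it is followed by a non-digit.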

Regex for timestamp string

I'm looking to target the following string using jQuery. Specifically, I need to wrap it in a strong tag for styling purposes. I can't change the source data. My regex-fu is pathetic. Any suggestions?
Nov 18, 2013, 4pm CST:
Thanks guys - these are excellent answers. I should have been slightly more specific - I need to match all occurrences of this format within a collection, e.g.:
$('.admin-comments').match(/[A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[a|p]m\s[A-Z]{3}/)
(I have a log of comments and I'm trying to wrap the timestamp in a strong element.)
Edit: Final Working Solution
var adminComment = $('.admin-comments');
if (adminComment.length) {
  var adminCommentTxt = adminComment.text();
  var formatCommentTimestamp = adminCommentTxt.replace(/([A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9\s]{1,2}[ap]m\s[A-Z]{3}\:)/g, "<strong>$1</strong>");
  adminComment.html(formatCommentTimestamp);
}
Here you go: /^[A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[a|p]m\s[A-Z]{3}\:$/
'Nov 18, 2013, 4pm CST:'.match(/^[A-Z]{1}[a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[a|p]m\s[A-Z]{3}\:$/)
["Nov 18, 2013, 4pm CST:"]
Keep in mind that this regex is expecting the line to start and end with this date, if the date is contained within other text, remove the ^ from the start and the $ from the end.
Hope this helps.
To further explain the regex and hopefully ++ your "regex-fu"
[A-Z]{1} - match one upper case letter
[a-z]{2} - match two lower case letters
So far we are at Nov, Oct, Jan, etc.
\s - space
[0-9]{1,2} - a 1 (min) or 2 (max) digit number
, - literal comma
\s - space
[0-9]{4} - a 4 digit number (the year)
So now we have matched: Nov 18, 2013
, - literal comma
\s - space
[0-9]{1,2} - just like before, a one or two digit number
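[a|p]m - 'a' or 'p' followed by an 'm' (inside a character class the | is a literal character, so [ap]m is the stricter form)
Now we've matched: Nov 18, 2013, 4pm
\s - space
[A-Z]{3} - an upper case three character string (the time zone)
\: - literal colon
That is the entire string.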
Putting ^ at the beginning of the regex means the text we are matching against MUST begin with the pattern; similarly, the $ states that the text we are matching MUST end with the pattern.
Good luck!
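As a rough illustration of the unanchored form applied to a longer piece of text, here is a sketch that works on a plain string instead of a jQuery collection. TIMESTAMP, boldTimestamps and the sample log text are made up for the example, and the pattern uses [ap] rather than [a|p].

```javascript
// Unanchored, global variant of the pattern so it finds every
// timestamp inside a longer text.
const TIMESTAMP = /[A-Z][a-z]{2}\s[0-9]{1,2},\s[0-9]{4},\s[0-9]{1,2}[ap]m\s[A-Z]{3}:/g;

// Hypothetical helper: wrap every timestamp in <strong> tags.
// $& in the replacement string stands for the whole match.
function boldTimestamps(text) {
  return text.replace(TIMESTAMP, "<strong>$&</strong>");
}

const log = "Nov 18, 2013, 4pm CST: first note. Oct 8, 2011, 11am PST: second note.";
console.log(boldTimestamps(log));
// <strong>Nov 18, 2013, 4pm CST:</strong> first note. <strong>Oct 8, 2011, 11am PST:</strong> second note.
```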
Rob M's answer is perfectly valid; it is more generic than mine and may be exactly what you're looking for. However, if you want to be more specific about your months and time zones, this may be useful:
(Oct|Nov)\s\d{1,2},\s\d{4},\s\d{1,2}(am|pm)\s(CST|PST):
This will match all of:
Nov 18, 2013, 4pm CST:
Oct 8, 2011, 11am PST:
Nov 02, 1981, 2am PST:
Oct 31, 1843, 12pm CST:
If you need more months, add each one to the first group like so:
(Oct|Nov|Mmm) where Mmm corresponds to the month you want to match.
Similarly, if you need more time zones, add them to the last group like so:
(CST|PST|ZZZ) where ZZZ corresponds to the time zone you want to match.
As in Rob's answer, if your date string is the only thing on the line, you can add the /^ prefix and the $/ suffix.
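A quick way to try a month-specific and zone-specific pattern against the sample strings is sketched below. It uses alternation groups, so exactly one month, one am/pm marker and one time zone must be present; the names SPECIFIC and samples are only for illustration.

```javascript
// Alternation groups: one of the listed months, am or pm, and one of
// the listed time zones are each required.
const SPECIFIC = /(?:Oct|Nov)\s\d{1,2},\s\d{4},\s\d{1,2}(?:am|pm)\s(?:CST|PST):/;

const samples = [
  "Nov 18, 2013, 4pm CST:",
  "Oct 8, 2011, 11am PST:",
  "Nov 02, 1981, 2am PST:",
  "Oct 31, 1843, 12pm CST:"
];

samples.forEach(s => console.log(s, SPECIFIC.test(s))); // true for each sample

console.log(SPECIFIC.test("Dec 1, 2013, 4pm CST:")); // false, Dec is not in the list
```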

Need a regex that makes sure the number starts with 01 02 or 08 and is 10 or 11 digits long

I am having a hard time creating a regex to check if a number starts with 01, 02 or 08 and is ten or eleven digits long.
For example I want numbers formatted like this to pass:
01614125745
02074125475
0845895412
08004569321
and numbers in other forms to fail, numbers like:
0798224141
441544122444
0925456754
and so on. Basically, any number that does not start with 01 02 or 08.
The following pattern follows the criteria:
/^(?=\d{10,11}$)(01|02|08)\d+/ // This pattern can easily be extended
/^(?=\d{10,11}$)0[128]\d{8,9}/ // Same effect
Explanation:
^ Begin of string
(?=\d{10,11}$) Followed by 10 or 11 digits, followed by the end of the string
(01|02|08) 01, 02 or 08
\d+ The remaining digits
Rob W's answer is correct. But I'd use a simpler expression like this: (No need for lookahead)
/^0[128][0-9]{8,9}$/
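The simpler expression can be checked against the numbers from the question like this. The names PHONE, shouldPass and shouldFail are only for illustration.

```javascript
// 0, then 1, 2 or 8, then 8 or 9 more digits: 10 or 11 digits in total.
const PHONE = /^0[128][0-9]{8,9}$/;

const shouldPass = ["01614125745", "02074125475", "0845895412", "08004569321"];
const shouldFail = ["0798224141", "441544122444", "0925456754"];

shouldPass.forEach(n => console.log(n, PHONE.test(n))); // true for each
shouldFail.forEach(n => console.log(n, PHONE.test(n))); // false for each
```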

How can I recognize these "default-more-or-less" type of strings?

I have some strings like these :
00. 00:00:00 -
00. - 00:00:00 -
00. 00:00:00
00 - 00:00:00
00) 00:00:00
So, as you can see, they are similar (not equal). I need to extract the internal block (formed by 00:00:00) and remove the rest of the characters.
Every 0 in the example can be a digit from 0 to 9 or the character ?.
How can I do this with jQuery? With a regex?
As starting point to check this I've made a Fiddle
var result = strings.match(/[\d?]{2}:[\d?]{2}:[\d?]{2}/g);
result will be an array with all the matches of NN:NN:NN, ??:??:??, or any combined version of N and ?.
Here is a demo for a string with a single match & a string with multiple matches.
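A small sketch of the same match on a few made-up inputs. extractTimeBlocks is a hypothetical wrapper that returns an empty array instead of null when nothing matches.

```javascript
// Two characters that are each a digit or ?, three times, separated by colons.
const TIME_BLOCK = /[\d?]{2}:[\d?]{2}:[\d?]{2}/g;

// Hypothetical helper: String.prototype.match returns null when a
// global regex finds nothing, so fall back to an empty array.
function extractTimeBlocks(text) {
  return text.match(TIME_BLOCK) || [];
}

console.log(extractTimeBlocks("01. 12:34:56 -")); // ["12:34:56"]
console.log(extractTimeBlocks("02 - ??:4?:09"));  // ["??:4?:09"]
console.log(extractTimeBlocks("no time here"));   // []
```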

Javascript function

The function in the tutorial copied below returns Thu Apr 27 2006 00:00:00 GMT+0900 (Japan Standard Time)
Can someone explain what the number pairs 11,4, 8,2 and 5,2 do, and why one of the pairs is followed by -1? I assume those pairs are passed into the function numberAt as the values for start and length. Is that correct? But why those specific numbers, and what's with the -1?
function extractDate(paragraph) {
  function numberAt(start, length) {
    return Number(paragraph.slice(start, start + length));
  }
  return new Date(numberAt(11, 4), numberAt(8, 2) - 1,
                  numberAt(5, 2));
}
show(extractDate("died 27-04-2006: Black Leclère"));
It's telling your method where to start cutting off a piece of the paragraph and how many characters it should cut.
11,4 means that it should start at the 11th character and chop off 4 characters from there. Keep in mind you start counting from 0 (_ marks a space below).
d  i  e  d  _  2  7  -  0  4  -  2  0  0  6  :  _  B  l  a  c  k  _  L  e  c  l  è  r  e
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
11, 4 - 11th character = 2. Then 4 characters from there = 6 | So 2006
8, 2 - 8th char = 0. 2 from there = 4 | so 04
5, 2 - 5th char = 2. 2 from there = 7 | so 27
Basically, your method cuts those pieces and then creates a date with that.
EDIT
By cut, I mean that it returns a part of that string, but doesn't do anything to the original string.
See here for reference : http://www.w3schools.com/jsref/jsref_slice_array.asp
The pairs correspond to start location and length of substrings (e.g. 11,4 means the substring starting at the 12th character and ending on the 15th inclusive).
11,4 is the year. 8,2 is the month. 5,2 is the day. The -1 is there because as JohnP mentions, months for the Date() function start at 0 for January (starting index) and not 1 (popular vernacular).
EDIT: Cleared up some wording.
extractDate gets a "paragraph" string, which in this case is
"died 27-04-2006: Black Leclère"
numberAt just gets a number starting from the nth character of the input string. In the first call, numberAt(11,4), it starts at index 11 and takes 4 characters, which in the string "died 27-04-2006: Black Leclère" is '2006'. numberAt(8,2) starts at index 8, which is 0, and takes only 2 characters, so it reads '04' and returns 4. You subtract 1 from it, which gives 3. Then numberAt(5,2) gives 27.
The function numberAt() returns a number. It does this by extracting a string from the paragraph, and converting that string to a number. The function's parameters start, and length specify what part of the paragraph should be extracted.
So numberAt(11,4) would extract a 4-digit string, starting with the 12th character in the paragraph. Assuming that this string contains only digits, it will be converted to a 4-digit number and returned.
The - 1 is part of an arithmetic expression: numberAt(8,2) - 1. The result will be one less than whatever number is returned by numberAt(8,2).
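To see the intermediate values, the slices can be logged one by one. This is a small sketch outside the tutorial's environment, so console.log replaces show, and the variable names are made up.

```javascript
const paragraph = "died 27-04-2006: Black Leclère";

// slice(start, end) copies the characters from index start up to,
// but not including, index end. The original string is unchanged.
const yearText = paragraph.slice(11, 11 + 4); // "2006"
const monthText = paragraph.slice(8, 8 + 2);  // "04"
const dayText = paragraph.slice(5, 5 + 2);    // "27"
console.log(yearText, monthText, dayText);

// Date months are zero-based (0 = January), hence the - 1.
const date = new Date(Number(yearText), Number(monthText) - 1, Number(dayText));
console.log(date.getFullYear(), date.getMonth(), date.getDate()); // 2006 3 27
```

getMonth() returns 3 here, which is April in the zero-based numbering that Date uses.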
