Regex to match a rather complex string - javascript

Could any regex experts help me find a pattern for this data? I'm looking for one that matches exactly, down to the spaces, commas and dashes. Here is sample data of what I need to match:
word word, alphanumeric-PRT-word-number
word word, alphanumeric-PRT-number
-word: any size word
-alphanumeric: 3 letters and up to 2 numbers, so XXX# or XXX##
-number: up to 3 digits, so # or ## or ###
-PRT: is the only static value here
NOTE: no other punctuation other than the spaces, comma and dashes where they are.
So far I have something close to it, but it is rather clunky and doesn't cover all bases. I built it at http://buildregex.com/ using their logic, and it kind of works:
/(?:[^_\ ]+)(?:\ )(?:[^_\ ]+), (?:[^_\ ]+)-PRT-(?:[^_\ ]*)/gi
If anyone can assist in refining this, that would be welcome.
https://regex101.com/r/8cc52u/2
Thanks a lot

Here's one way to do it:
/^[a-z]+\s[a-z]+,\s[a-z]{3}\d{1,2}-prt-([a-z]+-){0,1}\d{1,3}$/gi
^: start of line
[a-z]+: one or more letters
\s: any space character
[a-z]+: one or more letters
,: ,
\s: any space character
[a-z]{3}: three letters
\d{1,2}: one or two digits
-prt-: -prt-
([a-z]+-){0,1}: one or more letters followed by -, zero or one time
\d{1,3}: one, two or three digits
$: end of line
Example: https://regex101.com/r/BhS8kM/5
Or, as suggested by revo:
/^[a-z]+ [a-z]+, [a-z]{3}\d{1,2}-prt-([a-z]+-)?\d{1,3}$/gi
Example: https://regex101.com/r/BhS8kM/7
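As a quick sanity check, here is a minimal sketch that runs the second pattern against a few strings. The sample strings are made up for illustration and are not from the question. Note that the g flag is dropped here as an assumption on my part: with g, repeated .test() calls on the same regex object carry lastIndex over between calls, which is not what you want when validating whole strings.

```javascript
// Pattern from the answer above, without the g flag so .test() is stateless.
const partPattern = /^[a-z]+ [a-z]+, [a-z]{3}\d{1,2}-prt-([a-z]+-)?\d{1,3}$/i;

// Hypothetical sample strings, invented for this sketch.
const partSamples = [
  "John Smith, ABC12-PRT-widget-7", // with the optional word segment
  "John Smith, ABC1-PRT-123",       // without the optional word segment
  "John Smith, AB12-PRT-123",       // only two letters before the digits
  "John Smith, ABC12-PRT-1234",     // four digits at the end
  "John  Smith, ABC12-PRT-1",       // two spaces between the words
];

const partResults = partSamples.map((s) => partPattern.test(s));
console.log(partResults); // [true, true, false, false, false]
```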

Related

Regex for a valid hashtag

I need a regular expression for validating a hashtag. Each hashtag should start with a hash sign ("#").
Valid inputs:
1. #hashtag_abc
2. #simpleHashtag
3. #hashtag123
Invalid inputs:
1. #hashtag#
2. #hashtag#hashtag
I have been trying with this regex /#[a-zA-z0-9]/ but it also accepts the invalid inputs.
Any suggestions for how to do it?
The current accepted answer fails in a few places:
It accepts hashtags that have no letters in them (i.e. "#11111", "#___" both pass).
It will exclude hashtags that are separated by spaces ("hey there #friend" fails to match "#friend").
It doesn't allow you to place a min/max length on the hashtag.
It doesn't offer a lot of flexibility if you decide to add other symbols/characters to your valid input list.
Try the following regex:
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g
It'll close up the above edge cases, and furthermore:
You can change {1,30} to your desired min/max
You can add other symbols to the [0-9_] and [a-zA-Z0-9_] blocks if you wish to later
Here's a link to the demo.
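For what it's worth, a minimal sketch of how that pattern could be used to pull hashtags out of a sentence. The input text is invented for illustration; capture group 2 holds the tag without the # sign.

```javascript
// Pattern from the answer above; group 2 is the hashtag body.
const hashtagPattern = /(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g;

// Hypothetical input text, invented for this sketch.
const hashtagText = "hey there #friend, see #11111 and #___ or #tag_2";

// matchAll needs the g flag and yields one match object per hit.
const hashtagBodies = Array.from(hashtagText.matchAll(hashtagPattern), (m) => m[2]);
console.log(hashtagBodies); // ["friend", "tag_2"]
```

The digits-only and underscores-only tags are skipped because of the negative lookahead (?![0-9_]+\b).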
To answer the current question...
There are 2 issues:
[A-z] allows more than just letters: the range also covers [, \, ], ^, _ and `
There is no quantifier after the character class, so it only matches 1 char
Since you are validating the whole string, you also need anchors (^ and $) to ensure a full string match:
/^#\w+$/
See the regex demo.
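A small sketch of both points, using the valid and invalid inputs from the question. The helper name isValidHashtag is made up for this example.

```javascript
// Hypothetical helper wrapping the anchored pattern from above.
const isValidHashtag = (s) => /^#\w+$/.test(s);

const hashtagChecks = [
  "#hashtag_abc",
  "#simpleHashtag",
  "#hashtag123",
  "#hashtag#",
  "#hashtag#hashtag",
].map(isValidHashtag);
console.log(hashtagChecks); // [true, true, true, false, false]

// The [A-z] range spans the ASCII characters between "Z" and "a" too,
// so a caret is accepted even though it is not a letter.
const caretAccepted = /^[A-z]$/.test("^");
console.log(caretAccepted); // true
```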
If you want to extract specific valid hashtags from longer texts...
This is a bonus section as a lot of people seek to extract (not validate) hashtags, so here are a couple of solutions for you. Just mind that \w in JavaScript (and in a lot of other regex libraries) is equal to [a-zA-Z0-9_]:
#\w{1,30}\b - a # char followed with one to thirty word chars followed with a word boundary
\B#\w{1,30}\b - a # char that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars followed with a word boundary
\B#(?![\d_]+\b)(\w{1,30})\b - # that is either at the start of string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars (that cannot be just digits/underscores) followed with a word boundary
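To see what the leading \B changes in practice, here is a short sketch on a made-up string where one # is glued to a preceding word:

```javascript
// Hypothetical input: "#def" directly follows a word char, "#ghi" follows a space.
const boundarySample = "abc#def #ghi";

// Without \B, a # in the middle of a word still starts a match.
const looseTags = boundarySample.match(/#\w{1,30}\b/g);

// With \B, the # must sit at the start of the string or after a non-word char.
const strictTags = boundarySample.match(/\B#\w{1,30}\b/g);

console.log(looseTags);  // ["#def", "#ghi"]
console.log(strictTags); // ["#ghi"]
```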
And last but not least, here is a Twitter hashtag regex from https://github.com/twitter/twitter-text/tree/master/js... Sorry, too long to paste in the SO post, here it is: https://gist.github.com/stribizhev/715ee1ee2dc1439ffd464d81d22f80d1.
You could try this: /#[a-zA-Z0-9_]+/
This will only include letters, numbers & underscores.
A regex code that matches any hashtag.
In this approach any character is accepted in hashtags except main signs !@#$%^&*()
(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))
Usage Notes
Turn on "g" and "m" flags when using!
It is tested for Java and JavaScript languages via https://regex101.com and VSCode tools.
It is available on this repo.
Unicode general categories can help with that task:
/^#[\p{L}\p{Nd}_]+$/gu
I use \p{L} and \p{Nd} unicode categories to match any letter or decimal digit number. You can add any necessary category for your regex. The complete list of categories can be found here: https://unicode.org/reports/tr18/#General_Category_Property
Regex live demo:
https://regexr.com/5tvmo
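A brief sketch of the Unicode version in JavaScript. The u flag is required for \p{...} to work; the g flag is left out here as my own choice, since the pattern is only used with .test(). The sample strings are invented.

```javascript
// Same character class as above; the u flag enables \p{...} property escapes.
const unicodeHashtag = /^#[\p{L}\p{Nd}_]+$/u;

const unicodeChecks = [
  "#caf\u00e9",   // Latin letter with an accent (precomposed é)
  "#привет2024",  // Cyrillic letters and decimal digits
  "#hash-tag",    // the dash is not in the class
  "#",            // at least one character is required after #
].map((s) => unicodeHashtag.test(s));

console.log(unicodeChecks); // [true, true, false, false]
```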
useful and tested regex for detecting hashtags in the text
/(^|\s)(#[a-zA-Z\d_]+)/ig
examples of valid matching hashtag:
#abc
#ab_c
#ABC
#aBC
/\B(?:#|#)((?![\p{N}_]+(?:$|\b|\s))(?:[\p{L}\p{M}\p{N}_]{1,60}))/ug
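It allows characters from any language, optionally mixed with numbers or _.
Numbers alone, or numbers with only _, are not allowed.
It's a Unicode regex, so if you are using Python, you may need to install the third-party regex module, since the built-in re module does not support \p{...}.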
to test it https://regex101.com/r/NLHUQh/1

How to write repeating Regex pattern in JavaScript

How to write a regex for the following pattern in JavaScript:
1|dc35_custom|3;od;CZY;GL|2;ob;BNP;MT|4;sd;ABC;MT|5;ih;DFT;FR|6;oh;AQW;MT|7;ip;CAN;MT|8;op;CAR;MT|9;ec;SMO;GL|10;do;CZT;KU|
where
the first part 1|dc35_custom| is fixed.
From the second part onwards, the pattern repeats 9 times (i.e. 3;od;CZY;GL| 2;ob;BNP;MT| and so on).
The 1st number in each repetition ranges from 2 to 11 and should not repeat. For example, 3 appears in the first repetition, so it should not appear again.
I'm making a lot of assumptions with this, but here's a crack at it:
1\|dc35_custom\|(([2-9]|10|11);[a-z]{2};[A-Z]{3};[A-Z]{2}\|){9}
How it works
1\|dc35_custom\| is just literal text, escaping the vertical bar operators
([2-9]|10|11) will match any number from 2 to 11.
[a-z]{2} will match two lowercase letters
[A-Z]{3} will match three uppercase letters
[A-Z]{2} will match two uppercase letters
{9} looks for nine consecutive matches of the entire sequence enclosed in parentheses
It will not, as Amadan points out, check for uniqueness, because that's a bit beyond what regex is for.
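Since the regex cannot check uniqueness, one way to cover that requirement is to validate the shape with the regex and then check the leading numbers in plain JavaScript. This is only a sketch under a few assumptions: the function name isValidRecord is made up, and I added ^ and $ anchors (not in the pattern above) so that the whole string has to match.

```javascript
// Shape check: same pattern as above, anchored to the whole string (my addition).
const recordShape = /^1\|dc35_custom\|((?:[2-9]|10|11);[a-z]{2};[A-Z]{3};[A-Z]{2}\|){9}$/;

// Hypothetical helper: shape check first, then uniqueness of the leading numbers.
function isValidRecord(input) {
  if (!recordShape.test(input)) return false;
  // Drop the two fixed fields and the empty string after the trailing "|".
  const ids = input.split("|").slice(2, -1).map((segment) => segment.split(";")[0]);
  // A Set discards duplicates, so equal sizes mean every number is unique.
  return new Set(ids).size === ids.length;
}

const sampleRecord =
  "1|dc35_custom|3;od;CZY;GL|2;ob;BNP;MT|4;sd;ABC;MT|5;ih;DFT;FR|6;oh;AQW;MT|" +
  "7;ip;CAN;MT|8;op;CAR;MT|9;ec;SMO;GL|10;do;CZT;KU|";
// Same record, but the second segment reuses the number 3.
const duplicateRecord = sampleRecord.replace("|2;ob;", "|3;ob;");

console.log(isValidRecord(sampleRecord));    // true
console.log(isValidRecord(duplicateRecord)); // false
```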
A bit tricky, but here you go
The regex: /^1\|dc35_custom(?:\|([2-9]|1[01]);[a-z]{2};[A-Z]{1,3};[A-Z]{1,2}){9}\|$/
and the Unit tests: https://regex101.com/r/lU6sJ6/2 (hit 'Unit Tests' on the left)
I assume the following:
the first group WILL ALWAYS BE THE SAME
The first part of the pattern 3;od;CZY;GL is a number between 2-11 and NO NUMBER CAN REPEAT
The second part is lowercase letters a-z, exactly two of them
The third part is uppercase letters A-Z, between 1 and 3 ( the {1,3} thing, you can change it to {3} if it's exact)
The fourth and last part is between 1 and 2 uppercase letters A-Z

Regular expression to include numeric only or character only or ignore first two conditions if alpha numeric

I wrote Regular expression for the below cases :
only numbers(length:4)
only alphabets(should contain vowel)
([0-9]{1,4})|((?=[a-z]*[aeiou])[a-z]*)
eg: 9987, tyde
How to add the below condition?
Ignore the first two cases if the string contains alphanumeric characters.
eg: 9ty87
If I decipher your question correctly, I think you are looking for one of these:
a string with only digits and between one and four characters
a string with only letters with at least a vowel
a string with only letters and digits with at least one letter and one digit.
pattern:
/^(?:[0-9]{1,4}|[bcdfghj-np-tv-z]*[aeiou][a-z]*|[a-z]+[0-9][a-z0-9]*|[0-9]+[a-z][a-z0-9]*)$/i
or more factorized
/^(?:[0-9]{1,4}(?:[0-9]*[a-z][a-z0-9]*)?|[bcdfghj-np-tv-z]*(?:[aeiou][a-z]*|[a-z]+[0-9][a-z0-9]*))$/i
It is a simple alternation (I don't think you need something more complicated). So only one of the branches will succeed.
Note that anchors ^ and $ are essential for this kind of task to ensure that whole string is taken in account.
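A short sketch of the first pattern applied to the examples from the question plus a few extra strings I made up to show the rejected cases:

```javascript
// First pattern from the answer above: digits only, letters with a vowel, or mixed.
const mixedPattern =
  /^(?:[0-9]{1,4}|[bcdfghj-np-tv-z]*[aeiou][a-z]*|[a-z]+[0-9][a-z0-9]*|[0-9]+[a-z][a-z0-9]*)$/i;

const mixedChecks = [
  "9987",  // digits only, at most four
  "tyde",  // letters only, contains a vowel
  "9ty87", // letters and digits mixed
  "12345", // digits only, but five of them
  "tyd",   // letters only, no vowel
].map((s) => mixedPattern.test(s));

console.log(mixedChecks); // [true, true, true, false, false]
```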

Match backwards from a given word with javascript

Using JavaScript, I need to find an occurrence of a phrase in some text, then match everything from it back to the last occurrence of a 5 digit number (or at least that's the best way I know how to describe what I need).
Consider the following text:
24854
Random words
Ending Words
34975
Random words
Ending Words
47593
Random words
Ending Words
Target Word
32302
Random words
Ending Words
Given the above, I'd like my regex to match everything from 47593 to Target Word.
Each match should include both 47593 and Target Word
It needs to be greedy in that there will be multiple matches in my actual text and I need them all returned in an array.
This is what I've tried: .match(/[0-9]{5}[\s\S]+?Target Word/g)
My problem (as always with these) is the new lines. In order to match across multiple lines, I'm using [\s\S] but doing so makes the regex match everything from the first 5 digit number to the first occurrence of Target Word
How can I change this to achieve the desired result? I'm thinking I need to use lookbehind but most examples I've found have been very confusing for me.
You could use negative lookahead,
[0-9]{5}(?:(?![0-9]{5})[\S\s])*?Target\s*Word
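The group (?:(?![0-9]{5})[\S\s])* matches any character, space or non-space, zero or more times, but the negative lookahead (?![0-9]{5}) stops it at any position where a 5 digit number begins. The match therefore cannot run across another 5 digit number, so it starts at the last one before Target Word.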
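Here is a minimal sketch that runs both the attempt from the question and the lookahead version against the sample text, so the difference is visible:

```javascript
// Sample text from the question, joined with newlines.
const lookaheadText = [
  "24854", "Random words", "Ending Words",
  "34975", "Random words", "Ending Words",
  "47593", "Random words", "Ending Words", "Target Word",
  "32302", "Random words", "Ending Words",
].join("\n");

// Attempt from the question: starts at the FIRST 5 digit number.
const naiveMatches = lookaheadText.match(/[0-9]{5}[\s\S]+?Target Word/g);

// Lookahead version: cannot step over another 5 digit number,
// so the match starts at the last one before "Target Word".
const temperedMatches = lookaheadText.match(
  /[0-9]{5}(?:(?![0-9]{5})[\S\s])*?Target\s*Word/g
);

console.log(naiveMatches[0].slice(0, 5)); // "24854"
console.log(temperedMatches);
// ["47593\nRandom words\nEnding Words\nTarget Word"]
```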
If there are no digits at all in the random words, you could perhaps use
/\d{5}[^\d]+?Target Word/g

simple regex to matching multiple word with spaces/multiple space or no spaces

I am trying to match all words with single or multiple spaces. my expression
(\w+\s*)* is not working
edit 1:
Let say i have a sentence in this form
[[do "hi i am bob"]]
[[do "hi i am Bob"]]
now I have to replace this with
cool("hi i am bob") or
cool("hi i am Bob")
I do not care about replacing multiple spaces with a single one.
I can achieve this for a single word like
\[\[do\"(\w+)\"\]\] and replacing regex cool\(\"$1\") but this does not look like an effective solution and does not match multiple words ....
I apologize for the incomplete question.
Any help will be appreciated.
Find this Regular Expression:
/\[\[do\s+("[\w\s]+")\s*\]\]/
And do the following replacement:
'cool($1)'
The only special thing that's being done here is using character classes to our advantage with
[\w\s]+
Matches one or more word or space characters (a-z, A-Z, 0-9, _, and whitespace). That'll eat up your internal stuff no problem.
'[[do "hi i am Bob"]]'.replace(/\[\[do\s+("[\w\s]+")\s*\]\]/, 'cool($1)')
Spits out
cool("hi i am Bob")
Though - if you want to add punctuation (which you probably will), you should do it like this:
/\[\[do\s+("[^"]+")\s*\]\]/
Which will match any character that's not a double quote, preserving your substring. There are more complicated ones to allow you to deal with escaped quotation marks, but I think that's outside the scope of this question.
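To illustrate the difference between the two patterns, here is a quick sketch with an input that contains punctuation. The input string is made up, and the g flag is my addition so that every occurrence in a longer text would be replaced.

```javascript
// Hypothetical input with punctuation and repeated spaces inside the quotes.
const doInput = '[[do "hi, i am   Bob!"]]';

// [\w\s]+ does not cover "," or "!", so nothing matches and the input is returned as is.
const wordOnlyResult = doInput.replace(/\[\[do\s+("[\w\s]+")\s*\]\]/g, "cool($1)");

// [^"]+ accepts anything except a double quote, so the replacement goes through.
const anyCharResult = doInput.replace(/\[\[do\s+("[^"]+")\s*\]\]/g, "cool($1)");

console.log(wordOnlyResult); // [[do "hi, i am   Bob!"]]
console.log(anyCharResult);  // cool("hi, i am   Bob!")
```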
To match "all words with single or multiple spaces", you cannot use \s*, as it will match even no spaces.
On the other hand, it looks like you want to match even "hi", which is one word with no spaces.
You probably want to match one or more words separated by spaces. If so, use regex pattern
(\w+(?:$|\s+))+
or
\w+(\s+\w+)*
I'm not sure, but maybe this is what you're trying to get:
"Hi I am bob".match(/\b\w+\b/g); // ["Hi", "I", "am", "bob"]
Use regex pattern \w+(\s+\w+)* as follows:
m = s.match(/\w+(\s+\w+)*/g);
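A small sketch of what that call returns on a made-up string. I used a non-capturing group here, which does not change what .match() returns with the g flag. Each run of words separated only by whitespace comes back as one match, and any other character (here the double quotes) ends the run.

```javascript
// Hypothetical input: the double quotes split the text into three word runs.
const wordsInput = 'say "hi   i am Bob" now';

// One or more words, separated by one or more whitespace characters.
const wordRuns = wordsInput.match(/\w+(?:\s+\w+)*/g);

console.log(wordRuns); // ["say", "hi   i am Bob", "now"]
```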
Simple. Match all groups of characters that are not white spaces
var str = "Hi I am Bob";
var matches = str.match(/[^ ]+/g); // => ["Hi", "I", "am", "Bob"]
What your regex is doing is:
/([a-zA-Z0-9_]{1,}[ \r\v\n\t\f]{0,}){0,}/
That is, find the first match of one or more of A through Z, both lower and upper case, along with digits and underscore, followed by zero or more whitespace characters, which are:
A space character
A carriage return character
A vertical tab character
A new line character
A tab character
A form feed character
The whole group is then repeated zero or more times, so the pattern can also match an empty string.
\s matches more than just simple spaces. If you only want spaces, you can put in a literal space instead, and it will work.
I believe you want:
/(\w+ +\w+)/g
Which finds all matches of one or more of A through Z, both lower and upper case, along with digits and underscore, followed by one or more spaces, then followed by one or more of A through Z, both lower and upper case, along with digits and underscore.
This will match all word-characters separated by spaces.
If you just want to find all clusters of word characters, without punctuation or spaces, then, you would use:
/(\w+)/g
Which will find all word-characters that are grouped together.
var regex=/\w+\s+/g;
Live demo: http://jsfiddle.net/GngWn/
[Update] I was just answering the question, but based on the comments this is more likely what you're looking for:
var regex=/\b\w+\b/g;
\b are word boundaries.
Demo: http://jsfiddle.net/GngWn/2/
[Update2] Your edit makes it a completely different question:
string.replace(/\[\[do "([\s\S]+)"\]\]/,'cool("$1")');
Demo: http://jsfiddle.net/GngWn/3/
