Match backwards from a given word with JavaScript

Using JavaScript, I need to find an occurrence of a phrase in some text, then match everything from it back to the last occurrence of a 5-digit number (or at least that's the best way I know to describe what I need).
Consider the following text:
24854
Random words
Ending Words
34975
Random words
Ending Words
47593
Random words
Ending Words
Target Word
32302
Random words
Ending Words
Given the above, I'd like my regex to match everything from 47593 to Target Word.
Each match should include both 47593 and Target Word
It needs to be global, in that there will be multiple matches in my actual text and I need them all returned in an array.
This is what I've tried: .match(/[0-9]{5}[\s\S]+?Target Word/g)
My problem (as always with these) is the newlines. In order to match across multiple lines, I'm using [\s\S], but doing so makes the regex match everything from the first 5-digit number to the first occurrence of Target Word.
How can I change this to achieve the desired result? I'm thinking I need to use lookbehind but most examples I've found have been very confusing for me.

You could use negative lookahead,
[0-9]{5}(?:(?![0-9]{5})[\S\s])*?Target\s*Word
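The group (?:(?![0-9]{5})[\S\s])*? consumes any character, whitespace or not, zero or more times, but the negative lookahead (?![0-9]{5}) refuses to consume a character at any position where another 5-digit number begins. The match therefore cannot run past a later 5-digit number, so it can only start at the last one before Target Word.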
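As a rough sketch of how this behaves on the sample text from the question (the names text, pattern and matches are only illustrative), the pattern can be run with String.prototype.match and the g flag so that every match comes back in an array:

```javascript
// Sample text from the question, joined with newlines.
const text = [
  "24854", "Random words", "Ending Words",
  "34975", "Random words", "Ending Words",
  "47593", "Random words", "Ending Words", "Target Word",
  "32302", "Random words", "Ending Words",
].join("\n");

// Each step of the group consumes one character, but only when no
// 5-digit run starts at that position, so a match cannot cross a
// later 5-digit number.
const pattern = /[0-9]{5}(?:(?![0-9]{5})[\s\S])*?Target\s*Word/g;

// With the g flag, match() returns all matches (or null if none).
const matches = text.match(pattern) || [];

console.log(matches.length); // 1
console.log(matches[0]);     // "47593\nRandom words\nEnding Words\nTarget Word"
```

Attempts that start at 24854 or 34975 fail because the group is not allowed to step over the next 5-digit number, which leaves 47593 as the only possible starting point.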

If the text between the number and Target Word contains no digits at all, you could perhaps use
/\d{5}[^\d]+?Target Word/g

Related

Regex finding second string

I'm attempting to get the last word in the following strings.
After about 45 minutes I can't seem to find the right combination of slashes, dashes and brackets.
The closest I've got is
/(?![survey])[a-z]+/gi
It matches the following strings, except that for "required" it returns the match "quired". I'm assuming that's because the letters r and e are in the word survey.
survey[1][title]
survey[1][required]
survey[2][anotherString]
You're using a character set inside the lookahead, so (?![survey]) only prevents the match from starting with any one of the characters s, u, r, v, e or y, which isn't what you want. Using plain negative lookahead would be a start:
(?!survey)[a-z]+
But you also want to match only the final word, which can be done by matching letters that are followed by \]$, that is, by a ] and the end of the string:
[a-z]+(?=\]$)
https://regex101.com/r/rLvsY5/1
If you want to be more efficient, match the whole string, but capture what comes between the square brackets in a capturing group. When a capturing group is repeated, it keeps only its last iteration, so the final word ends up in the result:
survey(?:\[(\w+)\])+
https://regex101.com/r/rLvsY5/2
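A small sketch comparing the lookahead approach and the repeated-group approach on the three strings from the question. The names inputs, viaLookahead and viaCapture are illustrative, and the i flag on the first pattern is an assumption added so that the capital S in anotherString is matched:

```javascript
const inputs = [
  "survey[1][title]",
  "survey[1][required]",
  "survey[2][anotherString]",
];

// Approach 1: letters that are immediately followed by "]" at the
// very end of the string.
const viaLookahead = inputs.map((s) => {
  const m = s.match(/[a-z]+(?=\]$)/i);
  return m ? m[0] : null;
});

// Approach 2: the group (\w+) is repeated once per bracket pair and
// keeps only the text from its last iteration.
const viaCapture = inputs.map((s) => {
  const m = s.match(/survey(?:\[(\w+)\])+/);
  return m ? m[1] : null;
});

console.log(viaLookahead); // [ 'title', 'required', 'anotherString' ]
console.log(viaCapture);   // [ 'title', 'required', 'anotherString' ]
```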
One way to solve this is to match the full line and only capture the part you need.
survey\[\d+\]\[([a-z]+)\]

Regex to match a rather complex string

Could any regex experts help find a pattern for this data? I'm looking for one that will match exactly, down to the spaces, commas and dashes. Here is sample data of what I need to match:
word word, alphanumeric-PRT-word-number
word word, alphanumeric-PRT-number
-word: any size word
-alphanumeric: 3 letters and up to 2 numbers, so XXX# or XXX##
-number: up to 3 digits, so # or ## or ###
-PRT: is the only static value here
NOTE: no other punctuation other than the spaces, comma and dashes where they are.
So far I have something close to it, but it's rather clunky and doesn't cover all bases. I built it at http://buildregex.com/ using their logic and it kind of works:
/(?:[^_\ ]+)(?:\ )(?:[^_\ ]+), (?:[^_\ ]+)-PRT-(?:[^_\ ]*)/gi
If anyone can assist in refining this, that would be welcome.
https://regex101.com/r/8cc52u/2
Thanks a lot
Here's one way to do it:
/^[a-z]+\s[a-z]+,\s[a-z]{3}\d{1,2}-prt-([a-z]+-){0,1}\d{1,3}$/gi
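^: start of the string (add the m flag to anchor at the start of each line instead)
[a-z]+: one or more letters
\s: any whitespace character
[a-z]+: one or more letters
,: a literal comma
\s: any whitespace character
[a-z]{3}: three letters
\d{1,2}: one or two digits
-prt-: the literal text -prt- (matches -PRT- because of the i flag)
([a-z]+-){0,1}: one or more letters followed by -, zero or one time
\d{1,3}: one, two or three digits
$: end of the string (or of each line with the m flag)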
Example: https://regex101.com/r/BhS8kM/5
Or, as suggested by revo:
/^[a-z]+ [a-z]+, [a-z]{3}\d{1,2}-prt-([a-z]+-)?\d{1,3}$/gi
Example: https://regex101.com/r/BhS8kM/7
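A minimal sketch of using the second pattern as a validator. The function name isValidLine and the sample strings are made up for illustration. The g flag is left out here on purpose, since a global regex keeps state in lastIndex between test() calls, which can give surprising results when the same regex object is reused:

```javascript
// Same pattern as above, without the g flag (see the note above).
const linePattern = /^[a-z]+ [a-z]+, [a-z]{3}\d{1,2}-prt-([a-z]+-)?\d{1,3}$/i;

// Hypothetical helper: returns true when the whole string fits the format.
function isValidLine(line) {
  return linePattern.test(line);
}

console.log(isValidLine("John Smith, ABC12-PRT-widget-123")); // true
console.log(isValidLine("John Smith, ABC1-PRT-7"));           // true
console.log(isValidLine("John Smith, ABCD1-PRT-7"));          // false (four letters)
console.log(isValidLine("John Smith, ABC123-PRT-7"));         // false (three digits after the letters)
console.log(isValidLine("John Smith, ABC1-PRT-1234"));        // false (four-digit number)
console.log(isValidLine("John Smith,ABC1-PRT-7"));            // false (missing space after the comma)
```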

Specific Length Regular Expression With Padding

Goal: to make a generalized Regular Expression for a fixed length string inside a larger string. This string has a specified padding character, followed by an integer counter that increments. Ideally, there would be some way to say, "I want this group to be of length 10 and contain only one type of character followed by a different character."
I am trying to match this within variable data (could be numbers could be letters could be symbols):
The padding characters plus the numbers add up to a specified length, which here would be 5.
These are the allowed padding + number combinations.
$$$$1
$$$12
$$123
$1234
Here is an example:
<variable-data> <padding-characters> <numbers> <variable-data>
............... .................... ddddddddd ...............
(where periods are any characters and 'd' is any digit)
Example Data:
ABC $$$$ 1 $!#
Example Regex:
ABC\$*\d+\$!#
Match:
ABC$$$$1$!#
ABC$$$12$!#
ABC$$123$!#
ABC$1234$!#
ABC12345$!#
No Match:
ABC$$123456789$!#
ABC1$2$34$!#
What I've Tried:
ABC(?=.{5})\$*\d+\$!#
This does not work because it still matches into the next digits because of \d+. Another thing I tried was
ABC(?=[\$\d]{5}[^\$\d])(\$*\d+)\$!#
Which aims to stop looking after it encounters a non-digit or non $, but that's not helpful since the next part of the string COULD start with a $ or a digit.
The easiest Regex to solve this:
(\$\$\$\$\d|\$\$\$\d\d|\$\$\d\d\d|\$\d\d\d\d|\d\d\d\d\d)
But I am trying to make this more generalized, and there can be a variable amount of padding, e.g.
$$$$$$$$$1
$$$$$$$$12
...
You could look ahead to check that you don't have an inverted sequence of padding character and digit within the scope of the next 5 characters, and then require and capture 5 characters that are only digits and padding characters:
ABC(?!.{0,3}\d\$)([\$\d]{5})\$!#
If you need at least one digit, then:
ABC(?!.{0,3}\d\$)([\$\d]{4}\d)\$!#
ABC(?=.{5}\$!#)\$*\d+\$!#
This is very similar to your first attempt, but with the slight difference that the lookahead also contains the terminating string. This gives it something to anchor to, to make sure the regex doesn't match anything more.
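To generalize the anchored-lookahead idea to other widths and padding characters, the pattern could be assembled with the RegExp constructor. This is only a sketch: buildPaddedCounterRegex and its parameters (prefix, suffix, padChar, width) are hypothetical names, and the lookahead here checks for padding characters or digits rather than ., a small tightening that is an assumption on my part and not part of the answer above.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a regex for prefix + padded counter + suffix.
function buildPaddedCounterRegex(prefix, suffix, padChar, width) {
  const pre = escapeRegExp(prefix);
  const suf = escapeRegExp(suffix);
  const pad = escapeRegExp(padChar);
  // The lookahead pins the field to exactly `width` padding/digit
  // characters followed by the suffix; the group then enforces the
  // order "padding first, digits last" and captures the field.
  return new RegExp(
    `${pre}(?=(?:${pad}|\\d){${width}}${suf})(${pad}*\\d+)${suf}`
  );
}

const re5 = buildPaddedCounterRegex("ABC", "$!#", "$", 5);
console.log(re5.test("ABC$$$$1$!#"));       // true
console.log(re5.test("ABC12345$!#"));       // true
console.log(re5.test("ABC$$123456789$!#")); // false
console.log(re5.test("ABC1$2$34$!#"));      // false

const re10 = buildPaddedCounterRegex("ABC", "$!#", "$", 10);
console.log(re10.exec("ABC$$$$$$$$12$!#")[1]); // "$$$$$$$$12"
```

Because the prefix, suffix and padding character are escaped before being interpolated, characters such as $ are treated literally.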

How to find all words with x (and one or more) occurrences of a letter?

I have an answer to my second question right here:
To find words with one or more occurrences of the letter 'a' in it
var re = /(\w+a)/;
With regards to the above, how does it work? For example,
var re = /(\w+a)/g;
var str = "gamma";
console.log(re.exec(str));
Output:
[ 'gamma', 'gamma', index: 0, input: 'gamma' ]
However, these are not the results I expected (although it IS what I want). That is to say, re should have found any number of occurrences of \w, then the first occurrence of the letter 'a', and then stopped.
I.e. I expected: ga.
Then mma
Next, how do I look for words with a pre-defined number of occurrences (call it x) of the letter 'a'. Such that f(x)=gamma iff x=2.
Repetition in regex is greedy. That is, it takes as much as possible. You happen to get the full word because it ends in an a. To make it ungreedy (stop at the first a), you'd use:
\w+?a
But to actually get the full word, I'd rather use
\w*a\w*
Note the *, otherwise you'll get problems with words that have an a only as the first or last letter.
To get words with exactly two a's, you need to exclude a from the repeated letters. This is best done with a negated character class that disallows both non-word characters and the letter a. In addition, you need to make sure that you get full words. This is easily done with the word boundary \b:
\b[^\Wa]*a[^\Wa]*a[^\Wa]*\b
For more flexibility in terms of the number of repetitions, this can be rewritten as
\b[^\Wa]*(?:a[^\Wa]*){2}\b
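A quick sketch of the parameterized form, building the pattern from a count with the RegExp constructor. The function name wordsWithExactlyNAs and the sample sentence are illustrative only:

```javascript
// Hypothetical helper: returns the words in `text` that contain
// exactly `count` occurrences of a lowercase "a".
function wordsWithExactlyNAs(text, count) {
  // [^\Wa] means "a word character other than a"; the \b on each side
  // forces the match to cover a whole word.
  const re = new RegExp(`\\b[^\\Wa]*(?:a[^\\Wa]*){${count}}\\b`, "g");
  return text.match(re) || [];
}

const sentence = "gamma banana alpha cat data";
console.log(wordsWithExactlyNAs(sentence, 2)); // [ 'gamma', 'alpha', 'data' ]
console.log(wordsWithExactlyNAs(sentence, 3)); // [ 'banana' ]
```

Note that the class is case-sensitive as written, so an uppercase A is treated like any other non-a word character.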
Regular expression quantifiers are greedy by default. That means that if they can grab more characters, they will. You need to consider greed when using quantifiers like + and *.
To make a quantifier lazy (not greedy), suffix it with a ?.
/(\w+?a)/
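The difference can be seen on the word from the question. A brief sketch, with illustrative variable names:

```javascript
const word = "gamma";

// Greedy: \w+ first takes the whole word, then backs off just far
// enough for the final "a" to match.
const greedy = word.match(/\w+a/)[0];

// Lazy: \w+? takes as little as possible, so it stops at the first "a".
const lazy = word.match(/\w+?a/)[0];

// With the g flag the lazy version yields one piece per "a".
const lazyAll = word.match(/\w+?a/g);

console.log(greedy);  // "gamma"
console.log(lazy);    // "ga"
console.log(lazyAll); // [ 'ga', 'mma' ]
```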
You can use a regex for some of this, such as
/\b\w*a\w*\b/ - find a word with at least 1 a (can match the word 'a')
/\b\w*(?:a\w*){2}\b/ - find a word with at least 2 as
But it gets tricky when the count must be exact, because you must replace \w with a class that matches every word character except a. A negated class does that, thus
/\b[^\Wa]*(?:a[^\Wa]*){2}\b/ - matches a word with exactly 2 as
To match the part of a word up to and including an a, you can use
/\b(?:[^\Wa]*a)/ - matches ga alone and in gamma
/\b(?:[^\Wa]*a){1,4}/ - matches the start of a word containing 1-4 a's, ending in a.
The easiest way to achieve something like this, however, is to match all words with /\w+/ and filter them in JavaScript.
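A minimal sketch of that match-then-filter approach. The helper names countLetter and wordsWithCount, and the sample sentence, are made up for illustration:

```javascript
// Count how often `letter` occurs in `word`: splitting on the letter
// yields one more piece than there are occurrences.
function countLetter(word, letter) {
  return word.split(letter).length - 1;
}

// Hypothetical helper: all words in `text` with exactly `count`
// occurrences of `letter`.
function wordsWithCount(text, letter, count) {
  const words = text.match(/\w+/g) || [];
  return words.filter((w) => countLetter(w, letter) === count);
}

const sample = "gamma banana alpha cat data";
console.log(wordsWithCount(sample, "a", 2)); // [ 'gamma', 'alpha', 'data' ]
console.log(wordsWithCount(sample, "a", 1)); // [ 'cat' ]
```

This keeps the regex trivial and moves the counting logic into ordinary code, where changing the letter or the count needs no new pattern.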

RegEx in JS to find No 3 Identical consecutive characters

How do I check for a sequence of 3 identical consecutive characters in JS using a regex, so that 'abb' is valid while 'abbb' is not? The characters could be letters, digits or non-alphanumerics.
This question is a variation of one that I asked here: How to combine these regex for javascript.
This is wrong: /(^([0-9a-zA-Z]|[^0-9a-zA-Z]))\1\1/, so what is the right way to do it?
This depends on what you actually mean. If you only want to match three non-identical characters (that is, if abb is valid for you), you can use this negative lookahead:
(?!(.)\1\1).{3}
It first asserts that the current position is not followed by the same character three times in a row. Then it matches those three characters.
If you really want to match 3 different characters (only stuff like abc), it gets a bit more complicated. Use these two negative lookaheads instead:
(.)(?!\1)(.)(?!\1|\2).
First we match one character. Then we assert that this is not followed by the same character. If that holds, we match another character. Then we assert that these are followed by neither the first nor the second character. Then we match a third character.
Note that those negative lookaheads ((?!...)) do not consume any characters. That is why they are called lookaheads. They just check what is coming next (or in this case what is not coming next) and then the regex continues from where it left off.
Note also that . matches anything but line breaks, or really anything if you use the DOTALL or SINGLELINE option. In JavaScript engines that support ES2018 or later, you can activate this option by appending s after the regex's closing delimiter. If (for some reason) you can't or don't want to use this option, replace each . with [\s\S] (this always matches any character).
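The two patterns can be compared side by side with a short sketch. The names notAllSame, allDifferent and firstMatch are illustrative:

```javascript
// Pattern 1: any three characters that are not all identical.
const notAllSame = /(?!(.)\1\1).{3}/;
// Pattern 2: three pairwise different characters.
const allDifferent = /(.)(?!\1)(.)(?!\1|\2)./;

// Hypothetical helper: first match as a string, or null if there is none.
function firstMatch(re, s) {
  const m = s.match(re);
  return m ? m[0] : null;
}

console.log(firstMatch(notAllSame, "abbb"));    // "abb"
console.log(firstMatch(notAllSame, "aaab"));    // "aab"
console.log(firstMatch(notAllSame, "aaa"));     // null
console.log(firstMatch(allDifferent, "aabcd")); // "abc"
console.log(firstMatch(allDifferent, "abab"));  // null
```

In "aaab" the first pattern skips position 0, where the lookahead sees "aaa", and matches "aab" starting at position 1.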
Update:
After clarification in the comments, I realised that you do not want to find three non-identical characters, but instead you want to assert that your string does not contain three identical (and consecutive) characters.
This is a bit easier, and closer to your former question, since it only requires one negative lookahead. What we do is this: we search the string from the beginning for three consecutive identical characters. But since we want to assert that these do not exist we wrap this in a negative lookahead:
^(?!.*(.)\1\1)
The lookahead is anchored to the beginning of the string, so this is the only place where we will look. The pattern in the lookahead then tries to find three identical characters from any position in the string (because of the .*; the identical characters are matched in the same way as in your previous question). If the pattern finds these, the negative lookahead will fail, and so the string will be invalid. If no three identical characters can be found, the inner pattern will never match, so the negative lookahead will succeed.
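Used as a validator, this could look roughly like the following. The function name hasNoTripleRun and the test strings other than abb and abbb are made up for illustration:

```javascript
// Succeeds only if no character is repeated three times in a row
// anywhere in the string. Note that "." does not match line breaks,
// so this sketch assumes single-line input.
const noTripleRun = /^(?!.*(.)\1\1)/;

// Hypothetical helper wrapping the regex.
function hasNoTripleRun(s) {
  return noTripleRun.test(s);
}

console.log(hasNoTripleRun("abb"));      // true
console.log(hasNoTripleRun("abbb"));     // false
console.log(hasNoTripleRun("aabbcc!!")); // true
console.log(hasNoTripleRun("ab111c"));   // false
console.log(hasNoTripleRun("x!!!y"));    // false
```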
To find three consecutive characters that are not all identical, use the regex pattern
([\s\S])(?!\1\1)[\s\S]{2}
