Regex (regular expressions), replace the second occurrence in JavaScript

This is an example of the string that's being worked with:
xxxxxx[xxxxxx][7][xxxxxx][9][xxxxxx]
I'm having a little trouble matching the second occurrence of a match: I want to return the second pair of square brackets with a number inside. I have a regex that finds the first square brackets containing a number in the string:
\[+[0-9]+\]
This returns [7], however I want to return [9].
I'm using JavaScript's replace function. The following regex matches the second occurrence (the [9]) in regex testing apps, but the replacement doesn't work correctly in JavaScript's replace function:
(?:.*?(\[+[0-9]+\])){2}
My question is: how do I use the above regex to replace the [9] in JavaScript, or is there another regex that matches the second occurrence of a number in square brackets?
Cheers!

If xxx is just any string, and not necessarily a number, then this might be what you want:
(\[[0-9]+\]\[.*?\])\[([0-9]+)\]
This looks for the second number in []. Replace it with $1[<replacement>]. Play with it on rubular.
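Your regular expression fails to work as intended because a repeated group only ends up holding the last [xxx] it matched, while the overall match (which is what replace substitutes) spans everything from the start of the string up to the [9].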

Try
result = subject.replace(/(\[\d\]\[[^\]]+\])\[\d\]/, "$1[replace]");
As a commented regex:
( # capture the following in backref 1:
\[\d\] # first occurrence of [digit]
\[ # [
[^\]]+ # any sequence of characters except ]
\] # ]
) # end of capturing group
\[\d\] # match the second occurrence of [digit]
If the number of [xxx] groups between the first and second [digit] group is variable, then use
result = subject.replace(/(\[\d\](?:\[[^\]]+\])*?)\[\d\]/, "$1[replace]");
By surrounding the part that matches the [xxx] groups with (non-capturing) parentheses and the lazy quantifier *? I'm asking the regex engine to match as few of those groups as possible, but as many as necessary so the next group is a [digit] group.
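As a quick check of the lazy version, here is a minimal sketch run against the string from the question. The helper name replaceSecondBracketedNumber is hypothetical, and \d+ is used instead of \d on the assumption that the numbers may have more than one digit.

```javascript
// Hypothetical helper: replaces the second [number] group in a string.
// Assumption: numbers may have several digits, so \d+ is used instead of \d.
function replaceSecondBracketedNumber(subject, replacement) {
  // Group 1 keeps everything from the first [number] up to the second one.
  const pattern = /(\[\d+\](?:\[[^\]]+\])*?)\[\d+\]/;
  // A replacer function keeps any "$" in the replacement from being read as a group reference.
  return subject.replace(pattern, (match, before) => before + "[" + replacement + "]");
}

const replaced = replaceSecondBracketedNumber("xxxxxx[xxxxxx][7][xxxxxx][9][xxxxxx]", "15");
console.log(replaced); // xxxxxx[xxxxxx][7][xxxxxx][15][xxxxxx]
```

If the string has no second [number] group, the pattern does not match and replace returns the string unchanged.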

console.log( "xxxxxx[xxxxxx][7][xxxxxx][9][xxxxxx]".replace(
/^(.*\[[0-9]+\].*)(\[[0-9]+\])(.*)$/,
'$1[15]$3')); // replace with matches before ($1) and after ($3) your match ($2)
returns:
// xxxxxx[xxxxxx][7][xxxxxx][15][xxxxxx]
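It matches an [n] that is preceded by another set of brackets with a number inside. Because the leading .* is greedy, it picks the last such [n] when the string contains more than two.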

Related

Regex for non-repeating letters, comma separated

I have a requirement where I need a regex for a string that
should not repeat a letter
should contain only letters and commas
should not start or end with a comma
can contain more than 2 letters
Examples:
A,B --- correct
A,B,C,D,E,F --- correct
D,D,A --- wrong
,B,C --- wrong
B,C, --- wrong
A,,B,C --- wrong
Can anyone help ?
Another idea with capturing and checking by use of a lookahead:
^(?:([A-Z])(?!.*?\1),?\b)+$
You can test here at regex101 if it meets your requirements.
If you don't want to match single characters, e.g. A, change the + quantifier to {2,}.
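To see how the lookahead pattern behaves on the examples from the question, here is a small sketch. The variable names are made up for illustration; only the regex comes from the answer.

```javascript
// Pattern from the answer: each captured letter must not appear again later in the string.
const uniqueLetters = /^(?:([A-Z])(?!.*?\1),?\b)+$/;

// The six examples listed in the question.
const samples = ["A,B", "A,B,C,D,E,F", "D,D,A", ",B,C", "B,C,", "A,,B,C"];
const verdicts = samples.map((s) => uniqueLetters.test(s));

samples.forEach((s, i) => {
  console.log(s + " => " + (verdicts[i] ? "correct" : "wrong"));
});
// A,B => correct
// A,B,C,D,E,F => correct
// D,D,A => wrong
// ,B,C => wrong
// B,C, => wrong
// A,,B,C => wrong
```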
The statement of the question is incomplete in several respects. I have made the following assumptions:
1. Considering that D,D,A is incorrect, I assume that a letter cannot be followed by a comma followed by the same letter.
2. The string may contain the same letter more than once as long as #1 is satisfied.
3. Considering that A,,B,C is incorrect, I assume a comma cannot follow a comma.
4. Since the examples contain only capital letters, I will assume that lower-case letters are not permitted (though one need only set the case-insensitive flag (i) to permit either case).
We observe that the requirements are satisfied if and only if the string begins with a capital letter and is followed by a sequence of comma-capital letter pairs, provided that no capital letter is followed by a comma followed by the same letter. We therefore can attempt to match the following regular expression.
^(?:([A-Z]),(?!\1))*[A-Z]$
The elements of the expression are as follows.
^ # match beginning of string
(?: # begin a non-capture group
([A-Z]) # match a capital letter and save to capture group 1
, # match a comma
(?!\1) # use negative lookahead to assert next character is not equal
# to the content of capture group 1
)* # end non-capture group and execute it zero or more times
[A-Z] # match a capital letter
$ # match end of string
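A short sketch of this expression in use may help. Note that under this answer's assumptions only adjacent repeats are rejected, so A,B,A is accepted; the input A,B,A and the variable names are additions for illustration and not part of the question.

```javascript
// Pattern from this answer: a letter may not be followed by ",<same letter>".
const noAdjacentRepeat = /^(?:([A-Z]),(?!\1))*[A-Z]$/;

const inputs = ["A,B", "D,D,A", "A,B,A", ",B,C", "B,C,", "A,,B,C"];
const results = inputs.map((s) => noAdjacentRepeat.test(s));

inputs.forEach((s, i) => console.log(s + " => " + results[i]));
// A,B => true
// D,D,A => false
// A,B,A => true   (a non-adjacent repeat is allowed by assumption 2)
// ,B,C => false
// B,C, => false
// A,,B,C => false
```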
Here is a big ugly regex solution:
var inputs = ['A,B', 'D,D,D', ',B,C', 'B,C,', 'A,,B'];
for (var i = 0; i < inputs.length; ++i) {
    if (/^(?!.*?([^,]+).*,\1(?:,|$))[^,]+(?:,[^,]+)*$/.test(inputs[i])) {
        console.log(inputs[i] + " => VALID");
    }
    else {
        console.log(inputs[i] + " => INVALID");
    }
}
The regex has two parts to it. It uses a negative lookahead to assert that no two CSV entries ever repeat in the input. Then, it uses a straightforward pattern to match any proper CSV delimited input. Here is an explanation:
^ from the start of the input
(?!.*?([^,]+).*,\1(?:,|$)) assert that no CSV element ever repeats
[^,]+ then match a CSV element
(?:,[^,]+)* followed by comma and another element, 0 or more times
$ end of the input
This one could suit your needs:
^(?!,)(?!.*,,)(?!.*(\b[A-Z]+\b).*\1)[A-Z,]+(?<!,)$
^: the start of the string
(?!,): should not be directly followed by a comma
(?!.*,,): should not be followed by two commas
(?!.*(\b[A-Z]+\b).*\1): should not be followed by a value found twice
[A-Z,]+: should contain letters and commas only
$: the end of the string
(?<!,): should not be directly preceded by a comma
See https://regex101.com/r/1kGVSB/1

Exclude full word with JavaScript regex word boundary

I'm looking to exclude matches that contain a specific word or phrase. For example, how could I match only the lines marked "match" below? The \b word boundary does not work the way I intuitively expected.
foo.js # match
foo_test.js # do not match
foo.ts # match
fun_tset.js # match
fun_tset_test.ts # do not match
UPDATE
What I want to exclude is strings ending explicitly with _test before the extension. At first I had something like [^_test], but that also excludes any combination of those characters (like line 3).
Regex: ^(?!.*_test\.).*$
Working examples: https://regex101.com/r/HdGom7/1
Why it works: uses negative lookahead to check if _test. exists somewhere in the string, and if so doesn't match it.
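As an illustration, the pattern can be used to filter the file names from the question. This is only a sketch; the variable names are invented for the example.

```javascript
// Pattern from the answer: reject any string that contains "_test." anywhere.
const notTestFile = /^(?!.*_test\.).*$/;

const files = ["foo.js", "foo_test.js", "foo.ts", "fun_tset.js", "fun_tset_test.ts"];
const kept = files.filter((name) => notTestFile.test(name));

console.log(kept); // [ 'foo.js', 'foo.ts', 'fun_tset.js' ]
```

Note that fun_tset.js is kept: the lookahead checks for the literal sequence _test. and not for any combination of those characters.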
Adding to @pretzelhammer's answer, it looks like you want to grab strings that are file names ending in ts or js:
^(?!.*_test)(.*\.[jt]s)
The expression in the first set of parentheses is a negative lookahead that excludes any string containing _test; the second set of parentheses matches any string that ends in a period, followed by [jt] (j or t), followed by s.

Why isn't this group capturing all items that appear in parentheses?

I'm trying to create a regex that will capture a string not enclosed by parentheses in the first group, followed by any number of strings enclosed by parentheses.
e.g.
2(3)(4)(5)
Should be: 2 - first group, 3 - second group, and so on.
What I came up with is this regex: (I'm using JavaScript)
([^()]*)(?:\((([^)]*))\))*
However, when I enter a string like A(B)(C)(D), I only get the A and D captured.
https://regex101.com/r/HQC0ib/1
Can anyone help me out on this, and possibly explain where the error is?
Since you cannot use a \G anchor in JS regex (to match consecutive matches), and there is no stack for each capturing group as in the .NET / PyPI regex libraries, you need to use a two-step approach: 1) match the strings as whole streaks of text, and then 2) post-process to get the values required.
var s = "2(3)(4)(5) A(B)(C)(D)";
var rx = /[^()\s]+(?:\([^)]*\))*/g;
var res = [], m;
while (m = rx.exec(s)) {
    // split each whole match on runs of parentheses and drop empty items
    res.push(m[0].split(/[()]+/).filter(Boolean));
}
console.log(res);
I added \s to the negated character class [^()] since I added the examples as a single string.
Pattern details
[^()\s]+ - 1 or more chars other than (, ) and whitespace
(?:\([^)]*\))* - 0 or more sequences of:
\( - a (
[^)]* - 0+ chars other than )
\) - a )
The splitting regex is [()]+ that matches 1 or more ) or ( chars, and filter(Boolean) removes empty items.
You cannot have an undetermined number of capture groups. The number of capture groups you get is determined by the regular expression, not by the input it parses. A capture group that occurs within another repetition will indeed only retain the last of those repetitions.
If you know the maximum number of repetitions you can encounter, then just repeat the pattern that many times, and make each of them optional with a ?. For instance, this will capture up to 4 items within parentheses:
([^()]*)(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?
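A minimal sketch of how the optional groups behave, assuming at most four parenthesised items. The helper name captureUpToFour is hypothetical; groups that did not take part in the match come back as undefined and are filtered out here.

```javascript
// Pattern from this answer: one leading group plus four optional parenthesised groups.
const upToFour = /([^()]*)(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?/;

// Hypothetical helper: returns only the groups that actually matched.
function captureUpToFour(input) {
  const match = input.match(upToFour);
  // match[0] is the whole match; unused optional groups are undefined.
  return match.slice(1).filter((group) => group !== undefined);
}

console.log(captureUpToFour("A(B)(C)(D)")); // [ 'A', 'B', 'C', 'D' ]
console.log(captureUpToFour("2(3)(4)(5)")); // [ '2', '3', '4', '5' ]
```

A fifth parenthesised item would simply be left out of the match, which is the limitation of fixing the number of groups in advance.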
It's not an error. It's just that in regex, when you repeat a capture group (...)*, only the last occurrence is put in the backreference.
For example:
On a string "a,b,c,d", if you match /(,[a-z])+/ then the back reference of capture group 1 (\1) will give ",d".
If you want it to return more, then you could surround it in another capture group.
--> With /((?:,[a-z])+)/ then \1 will give ",b,c,d".
To get those numbers between the parentheses you could also just try to match the word characters.
For example:
var str = "2(3)(14)(B)";
var matches = str.match(/\w+/g);
console.log(matches);
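The repeated-group behaviour described in this answer can be checked directly. The sketch below uses the same "a,b,c,d" string; the last part, which uses String.prototype.matchAll to collect every item, is an addition and not something the answer itself proposes.

```javascript
const text = "a,b,c,d";

// A repeated capture group keeps only its last iteration.
const lastOnly = text.match(/(,[a-z])+/);
console.log(lastOnly[1]); // ,d

// Wrapping the repetition in an outer group captures the whole run.
const wholeRun = text.match(/((?:,[a-z])+)/);
console.log(wholeRun[1]); // ,b,c,d

// To collect each item separately, match one item at a time with the g flag.
const items = Array.from(text.matchAll(/,([a-z])/g), (m) => m[1]);
console.log(items); // [ 'b', 'c', 'd' ]
```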

Why does the regex return false?

The code is shown below:
alert(/symbol([.\n]+?)symbol/gi.test('symbolbbbbsymbol'));
or
alert(/#([.\n]+?)#/gi.test('#bbbb#'));
Because the character class between < and > only looks for literal dots (and newlines). Remove the character class:
/<(.+?)>/
Clarification after question edit:
First code block should be using this pattern: /symbol(.+?)symbol/
Second code block should be using this pattern: /#(.+?)#/
The regex returns false because a dot loses its special power to match any character (but newlines) when placed within a character class [] - it only matches a simple ".".
To match and capture the substring delimited at either end by the same single character, the most efficient pattern to use is
/#([^#]+)#/
To match and capture the substring delimited at either end by the same sequence of characters, the pattern to use is
/symbol(.+?)symbol/
or, if you want to match across newlines
/symbol([\s\S]+?)symbol/
where [\s\S] matches any space or non-space character, which equates to any character.
The ? is included to make the pattern match lazily, i.e. to make sure the match ends on the first occurrence of "symbol".
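The difference between a dot inside and outside a character class can be seen in a short sketch. The test strings other than '#bbbb#' and 'symbolbbbbsymbol' are invented for illustration.

```javascript
// Inside a character class the dot is literal, so [.\n] only matches "." or a newline.
const dotInClass = /#([.\n]+?)#/;
console.log(dotInClass.test("#bbbb#")); // false
console.log(dotInClass.test("#...#"));  // true

// Outside a class the dot matches any character except line terminators.
const lazyDot = /symbol(.+?)symbol/;
console.log(lazyDot.exec("symbolbbbbsymbol")[1]); // bbbb

// [\s\S] also crosses newlines; the lazy +? stops at the first closing "symbol".
const multiLine = /symbol([\s\S]+?)symbol/;
const crossLineMatch = multiLine.exec("symbol one\ntwo symbol three symbol");
console.log(JSON.stringify(crossLineMatch[1])); // " one\ntwo "
```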

How to match with an exact string using regular expression

I have a small requirement: I want to search a string for an exact match.
Suppose I want to search for None_1. I am searching for 'None_1' using /None_1/, but it is matching even "xxxNone"; my requirement is that it should match only None_[any digit].
Here is my code
/^None_+[0-9]{?}/
So it should match only None_1, None_2, and so on.
You should also anchor the expression at the end of the line. But that alone will not make it work. Your expression is wrong. I think it should be:
/^None_[0-9]+$/
^ matches the beginning of a line
None_ matches None_
[0-9]+ matches one or more digits
$ matches the end of a line
If you only want to match one digit, remove the +.
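A brief sketch of the anchored expression /^None_[0-9]+$/ in use. The candidate strings are made up for illustration.

```javascript
// Anchored at both ends: the whole string must be "None_" followed by digits.
const exactNone = /^None_[0-9]+$/;

const candidates = ["None_1", "None_2", "None_12", "xxxNone_1", "None_", "None_1x"];
const outcome = candidates.map((s) => exactNone.test(s));

candidates.forEach((s, i) => console.log(s + " => " + outcome[i]));
// None_1 => true
// None_2 => true
// None_12 => true
// xxxNone_1 => false
// None_ => false
// None_1x => false
```

With the + removed, as suggested for the single-digit case, None_12 would no longer match.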
Your original expression /^None_+[0-9]{?}/ worked like this:
^ matches the beginning of a line
None matches None
_+ matches one or more underscores
[0-9] matches one digit
{? matches an optional opening brace {
} matches a literal }, so the expression requires a closing brace after the digit
Try this, which anchors both ends and allows exactly one digit:
/^None_[0-9]$/
