Setting the end of the match - javascript

I have the following string:
[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
And I would like to get the value of the prefix of TITLE, which is a.
I have tried it with (?<=TITLE|)(?<=prefix=).*?(?=]|\|) and that seems to work, but it also gives me the prefix of STORENAME (b). So if [TITLE|prefix=a] is missing from the string, I get the wrong value.
So I need to end the match at the ] that belongs to [TITLE. Please note that this string is dynamic, so it could be [TITLE|suffix=x|prefix=y] as well.
const regex = "[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]".match(/(?<=TITLE|)(?<=prefix=).*?(?=]|\|)/);
console.log(regex);

You can use
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+
See the regex demo. Details:
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=) - a location in the string immediately preceded by TITLE|prefix= or TITLE|suffix=...|prefix=
[^\]|]+ - one or more chars other than ] and |.
See JavaScript demo:
const texts = ['[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]', '[TITLE|suffix=s|prefix=a]'];
for (let s of texts) {
console.log(s, '=>', s.match(/(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+/)[0]);
}

You could also use a capturing group
\[TITLE\|(?:\w+=\w+\|)*prefix=(\w+)[^\]]*]
Explanation
\[TITLE\| Match [TITLE|
(?:\w+=\w+\|)* Repeat 0+ occurrences of word chars, =, word chars and |
prefix= Match literally
(\w+) Capture group 1, match 1+ word chars
[^\]]* Match any char except ]
] Match the closing ]
Regex demo
const regex = /\[TITLE\|(?:\w+=\w+\|)*prefix=(\w+)[^\]]*\]/g;
const str = `[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
[TITLE|suffix=x|prefix=y]`;
let m;
while ((m = regex.exec(str)) !== null) {
console.log(m[1]);
}
Or with a negated character class instead of \w
\[TITLE\|(?:[^|=\]]*=[^|=\]]*\|)*prefix=([^|=\]]*)[^\]]*]
Regex demo
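If the attribute order is not known in advance, another option is to keep the order out of the pattern entirely: match only the [TITLE...] block up to its own closing ], then split its contents on |. The following is a minimal sketch, assuming attribute values never contain | or ] and that the tag name contains no regex metacharacters. getTagAttribute is a hypothetical helper name used for illustration only.

```javascript
// Hypothetical helper: return the value of `key` inside the [tag|k=v|...] block,
// or undefined when the tag or the key is absent.
// Assumes `tag` has no regex metacharacters and values contain neither | nor ].
function getTagAttribute(text, tag, key) {
  // Capture everything between the tag name and the closing ] of the same block.
  const block = text.match(new RegExp('\\[' + tag + '((?:\\|[^\\]|]*)*)\\]'));
  if (!block) return undefined;
  for (const pair of block[1].split('|')) {
    const eq = pair.indexOf('=');
    if (eq !== -1 && pair.slice(0, eq) === key) return pair.slice(eq + 1);
  }
  return undefined;
}

const s1 = '[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]';
const s2 = '[TITLE|suffix=x|prefix=y]';
const s3 = '[STORENAME|prefix=b|suffix=c]';

console.log(getTagAttribute(s1, 'TITLE', 'prefix')); // a
console.log(getTagAttribute(s2, 'TITLE', 'prefix')); // y
console.log(getTagAttribute(s3, 'TITLE', 'prefix')); // undefined
```

Because the search is limited to the TITLE block, a missing [TITLE...] yields undefined instead of the prefix of another tag.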

Related

How do I replace the last character of the selected regex?

I want this string {Rotation:[45f,90f],lvl:10s} to turn into {Rotation:[45,90],lvl:10}.
I've tried this:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d)\w+/g
console.log(bar.replace(regex, '$&'.substring(0, -1)))
I've also tried to just select the letter at the end using $ but I can't seem to get it right.
You can use
bar.replace(/(\d+)[a-z]\b/gi, '$1')
See the regex demo.
Here,
(\d+) - captures one or more digits into Group 1
[a-z] - matches any letter
\b - at the word boundary, i.e. at the end of the word
gi - all occurrences, case insensitive
The replacement is Group 1 value, $1.
See the JavaScript demo:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[a-z]\b/gi
console.log(bar.replace(regex, '$1'))
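As a side note on why the attempt in the question removes too much: '$&'.substring(0, -1) is evaluated before replace is called, and substring treats a negative end as 0, so the replacement string is simply empty. To trim each match at replacement time, a replacer function can be passed instead. A short sketch (the variable names are only illustrative):

```javascript
const input = '{Rotation:[45f,90f],lvl:10s}';

// The replacement argument is computed first: substring(0, -1) is substring(0, 0).
const replacement = '$&'.substring(0, -1);
console.log(JSON.stringify(replacement)); // ""

// So the original attempt deletes every match completely.
const attempt = input.replace(/(\d)\w+/g, replacement);
console.log(attempt); // {Rotation:[,],lvl:}

// A replacer function receives each match and can drop its last character.
const trimmed = input.replace(/\d+[a-z]\b/gi, match => match.slice(0, -1));
console.log(trimmed); // {Rotation:[45,90],lvl:10}
```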
Check this out :
const str = `{Rotation:[45f,90f],lvl:10s}`.split('');
const x = str.splice(str.length - 2, 1)
console.log(str.join(''));
You can use positive lookahead to match the closing brace, but not capture it. Then the single character can be replaced with a blank string.
const bar= '{Rotation:[45f,90f],lvl:10s}'
const regex = /.(?=})/g
console.log(bar.replace(regex, ''))
{Rotation:[45f,90f],lvl:10}
The following regex will match each group of one or more digits followed by f or s.
$1 represents the contents captured by the capture group (\d+).
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[fs]/g
console.log(bar.replace(regex, '$1'))

Multiple OR conditions for words in JavaScript regular expression

I'm trying to write a regular expression that finds the text between two words, but the second word is not always the same.
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303
This is my text. I'm trying to find the word between Soyadı and Sınıfı, in this case ERTANĞA, but the word Sınıfı also can be no, numara or any number. This is what I did.
soyad[ıi](.*)S[ıi]n[ıi]f[ıi]|no|numara|[0-9]
[ıi] is for Turkish character issue, don't mind that.
You can use something like the following:
/.*Soyad(ı|i)|S(ı|i)n(ı|i)f(ı|i).*|no.*|numara.*|[0-9]/gmi
Here is the link I worked on: https://regex101.com/r/QXLjLF/1
In JS code:
const regex = /.*Soyad(ı|i)|S(ı|i)n(ı|i)f(ı|i).*|no.*|numara.*|[0-9]/gmi;
var str = `2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303`;
var newStr = str.replace(regex, '');
console.log(newStr);
You can use a single capture group to get the word ERTANĞA, keep the character class [ıi] instead of using an alternation for (ı|i) and group the alternatives at the end of the pattern using a non capture group (?:
soyad[ıi](.+?)(?:S[ıi]n[ıi]f[ıi]|n(?:o|umara)|[0-9])
soyad[ıi] Match soyadı or soyadi
(.+?) Capture group 1, match 1 or more chars, as few as possible
(?: Non capture group
S[ıi]n[ıi]f[ıi] Match S and then ı or i etc..
| Or
n(?:o|umara) Match either no or numara
| Or
[0-9] Match a digit 0-9
) Close non capture group
Note that you don't need the /m flag as there are no anchors in the pattern.
Regex demo
const regex = /soyad[ıi](.+?)(?:S[ıi]n[ıi]f[ıi]|n(?:o|umara)|[0-9])/gi;
const str = "2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303\n";
console.log(Array.from(str.matchAll(regex), m => m[1]));
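If the list of closing words is likely to grow, the alternation can be assembled from an array rather than typed by hand. This is only a sketch: between and surname are hypothetical helper names, and the array entries are assumed to be regex source fragments that are already valid (they are not escaped).

```javascript
// Hypothetical helper: "start, lazily captured text, then any one of the terminators".
// Terminators are regex source fragments and are joined with | inside a non-capturing group.
function between(start, terminators) {
  return new RegExp(start + '(.+?)(?:' + terminators.join('|') + ')', 'i');
}

const surnameRe = between('soyad[ıi]', ['S[ıi]n[ıi]f[ıi]', 'numara', 'no', '[0-9]']);

// Return the captured text, or null when the pattern does not match.
function surname(line) {
  const m = surnameRe.exec(line);
  return m ? m[1] : null;
}

console.log(surname('2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303')); // ERTANĞA
console.log(surname('AdıMEVLÜTSoyadıERTANĞAnumaraE10/ENo303')); // ERTANĞA
console.log(surname('AdıMEVLÜTSoyadıERTANĞA303')); // ERTANĞA
```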
This might do it
const str = `2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnumaraE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnoE10/ENo303`
const re = /(?:Soyad(ı|i))(.*?)(?:S(ı|i)n(ı|i)f(ı|i)|no|numara)/gmi
console.log([...str.matchAll(re)].map(x => x[2]))
For environments without matchAll (loop with exec):
const str = `2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnumaraE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnoE10/ENo303`
const re = /(?:Soyad(ı|i))(.*?)(?:S(ı|i)n(ı|i)f(ı|i)|no|numara)/gmi
const res = []
let match;
while ((match = re.exec(str)) !== null) res.push(match[2])
console.log(res)

Reg Exp for finding hashtag words

I have the following sentence as a test:
This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here
I have figured out most of the Reg Exp I need. Here's what I have so far: /((?<=#)([A-Z]*))/gi
This matches every tag but also matches the shouldnotshow portion. I want to not match words that are prefixed by anything but # (excluding whitespace & \n).
So the only matched words I should get are: shouldshow show yes.
Note: after #show is a newline
You just need to check whether the hash is preceded by whitespace or starts the string:
https://regex101.com/r/JDuGvr/1
/(\s|^)#(\w+)/gm
Or with a positive lookbehind, as the OP used:
https://regex101.com/r/06X3ZX/1
/(?<=(\s|^)#)(\w+)/gm;
Use [a-zA-Z0-9] instead of \w if you do not want to match an underscore.
const re1 = /(\s|^)#(\w+)/gm;
const re2 = /(?<=(\s|^)#)(\w+)/gm;
const str = `This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here`;
const res1 = [...str.matchAll(re1)].map(match => match[2]); // here the match is the third item
console.log(res1)
const res2 = [...str.matchAll(re2)].map(match => match[0]); // match is the first item
console.log(res2)
Another option could be using your pattern asserting a # on the left that does not have a non whitespace char before it using (?<!\S)# and get the match only without capture groups.
Match at least 1+ times a char A-Z to prevent matching an empty string.
(?<=(?<!\S)#)[A-Z]+
Regex demo
const regex = /(?<=(?<!\S)#)[A-Z]+/gi;
const str = `This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here`;
console.log(str.match(regex));
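Where lookbehind is not available (it was added to JavaScript in ES2018), the same rule can be applied without it by splitting on whitespace first and keeping only the tokens that start with #. This is a different technique from the patterns above, sketched here with a hypothetical hashtags helper:

```javascript
// Hypothetical helper: a tag counts only when # is the first character of a
// whitespace-separated token, so "this#shouldnotshow" is skipped.
function hashtags(text) {
  const tags = [];
  for (const token of text.split(/\s+/)) {
    const m = /^#(\w+)/.exec(token);
    if (m) tags.push(m[1]);
  }
  return tags;
}

const sample = 'This is a test with #shouldshow and to see if there #show\n#yes this#shouldnotshow what is going on here';
console.log(hashtags(sample)); // [ 'shouldshow', 'show', 'yes' ]
```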

Trying to matchAll a regex on a JavaScript string

I'm trying to string.matchAll the following string:
const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'
I want to match the following:
1st: "textA [aaa](bbb)"
2nd: " textB [ccc](ddd)"
3rd: " textC"
NOTE: The capturing groups are already present in the regex. That's what I need.
It's almost working, but so far I couldn't think of a way to match the last part of the string, which is just " textC", and doesn't have the [*](*) pattern.
What am I doing wrong?
const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /(.*?)\[(.+?)\]\((.+?)\)/g;
const array = Array.from(text.matchAll(regexp));
console.log(JSON.stringify(array[0][0]));
console.log(JSON.stringify(array[1][0]));
console.log(JSON.stringify(array[2][0]));
UPDATE:
Besides the good solutions provided in the answers below, this is also an option:
const text= 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /(?!$)([^[]*)(?:\[(.*?)\]\((.*?)\))?/gm;
const array = Array.from(text.matchAll(regexp));
console.log(array);
It's because there is no third match. After the first two matches, the only thing left in the string is " textC":
https://regex101.com/r/H9Kn0G/1/
To fix this, make the whole second part optional (also note the initial \w instead of . to prevent that dot from eating the whole string, as well as the "grouping only" parens used to surround the optional part, which keeps your match groups the same):
(\w+)(?:\s\[(.+?)\]\((.+?)\))?
https://regex101.com/r/Smo1y1/2/
Solution 1: Splitting through matching
You may split by matching the pattern and getting substrings from the previous index up to the end of the match:
const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /\[[^\][]*\]\([^()]*\)/g;
let m, idx = 0, result=[];
while(m=regexp.exec(text)) {
result.push(text.substring(idx, m.index + m[0].length).trim());
idx = m.index + m[0].length;
}
if (idx < text.length) {
result.push(text.substring(idx, text.length).trim())
}
console.log(result);
Note:
\[[^\][]*\]\([^()]*\) matches [, any 0+ chars other than [ and ] (with [^\][]*), then ](, then 0+ chars other than ( and ) (with [^()]*) and then a ) (see the regex demo)
The capturing groups are removed, but you may restore them and save in the resulting array separately (or in another array) if needed
.trim() is added to get rid of the leading/trailing whitespace (remove if not necessary).
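The note about restoring the capturing groups could look roughly like the sketch below. It keeps the splitting idea from Solution 1 but stores the two captured values next to each chunk. splitLinks and the property names chunk, label and target are hypothetical, and .trim() is left out so the chunks keep their leading whitespace as in the question.

```javascript
// Hypothetical helper: split at the end of every [..](..) occurrence and keep
// the two captured values alongside each chunk.
function splitLinks(text) {
  const re = /\[([^\][]*)\]\(([^()]*)\)/g;
  const parts = [];
  let idx = 0;
  for (const m of text.matchAll(re)) {
    const end = m.index + m[0].length;
    parts.push({ chunk: text.slice(idx, end), label: m[1], target: m[2] });
    idx = end;
  }
  // Trailing text that has no [..](..) part gets null for both captures.
  if (idx < text.length) {
    parts.push({ chunk: text.slice(idx), label: null, target: null });
  }
  return parts;
}

const parts = splitLinks('textA [aaa](bbb) textB [ccc](ddd) textC');
console.log(parts.map(p => JSON.stringify(p.chunk)).join('\n'));
// "textA [aaa](bbb)"
// " textB [ccc](ddd)"
// " textC"
```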
Solution 2: Matching optional pattern
The idea is to match any chars before the pattern you have and then match either your pattern or end of string:
let result = text.match(/(?!$)(.*?)(?:\[(.*?)\]\((.*?)\)|$)/g);
If the string can have line breaks, replace . with [\s\S], or consider this pattern:
let result = text.match(/(?!$)([\s\S]*?)(?:\[([^\][]*)\]\(([^()]*)\)|$)/g);
See the regex demo.
JS demo:
const text = 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /(?!$)(.*?)(?:\[(.*?)\]\((.*?)\)|$)/g;
const array = Array.from(text.matchAll(regexp));
console.log(JSON.stringify(array[0][0]));
console.log(JSON.stringify(array[1][0]));
console.log(JSON.stringify(array[2][0]));
Regex details
(?!$) - not at the end of string
(.*?) - Group 1: any 0+ chars other than line break chars as few as possible (change to [\s\S]*? if there can be line breaks or add s modifier since you target ECMAScript 2018)
(?:\[(.*?)\]\((.*?)\)|$) - either of the two alternatives:
\[(.*?)\]\((.*?)\) - [, Group 2: any 0+ chars other than line break chars as few as possible, ](, Group 3: any 0+ chars other than line break chars as few as possible, and a )
| - or
$ - end of string.
That is what I've ended up using:
const text= 'textA [aaa](bbb) textB [ccc](ddd) textC'
const regexp = /(?!$)([^[]*)(?:\[(.*?)\]\((.*?)\))?/gm;
const array = Array.from(text.matchAll(regexp));
console.log(array);

Regex - ignoring text between quotes / HTML(5) attribute filtering

So I have this regular expression, which is supposed to filter a given string into a list of HTML(5) attributes. It currently doesn't do what I need, but that's about to change! (I hope so)
I'm trying to achieve that whenever an occurrence is found, it selects the text until the next occurrence OR the end of the string, as the second match. So if you'd take a look at the current regular expression:
/([a-zA-Z]+|[a-zA-Z]+-[a-zA-Z0-9]+)=["']/g
A string like this: hey="hey world" hey-heyhhhhh3123="Hello world" data-goed="hey"
Would be filtered / matched out like this:
MATCH 1. [0-3] `hey`
MATCH 2. [16-32] `hey-heyhhhhh3123`
MATCH 3. [47-56] `data-goed`
This has to be seen as the attribute-name(s), and now.. we just have to fetch the attribute's value(s). So the mentioned string has to have an outcome like this:
MATCH 1.
1 [0-3] `hey`
2 [6-14] `hey world`
MATCH 2.
1 [16-32] `hey-heyhhhhh3123`
2 [35-45] `Hello world`
MATCH 3.
1 [47-56] `data-goed`
2 [59-61] `hey`
Could anyone help me get this working? It would be appreciated a lot!
You can use
/([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g
See regex demo
Pattern details:
([^\s=]+) - Group 1 capturing 1 or more characters other than whitespace and = symbol
= - an equal sign
(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+)) - a non-capturing group of 2 alternatives (one more '([^'\\]*(?:\\.[^'\\]*)*)' alternative can be added to account for single quoted string literals)
"([^"\\]*(?:\\.[^"\\]*)*)" - a double quoted string literal pattern:
" - a double quote
([^"\\]*(?:\\.[^"\\]*)*) - Group 2 capturing 0+ characters other than \ and ", followed with 0+ sequences of any escaped symbol followed with 0+ characters other than \ and "
" - a closing double quote
| - or
(\S+) - Group 3 capturing one or more non-whitespace characters
JS demo (no single quoted support):
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g;
var str = 'hey="hey world" hey-heyhhhhh3123="Hello \\"world\\"" data-goed="hey" more=here';
var res = [];
var m; // declare the match variable instead of creating an implicit global
while ((m = re.exec(str)) !== null) {
if (m[3]) {
res.push([m[1], m[3]]);
} else {
res.push([m[1], m[2]]);
}
}
console.log(res);
JS demo (with single quoted literal support)
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
var str = 'pseudoprefix-before=\'hey1"\' data-hey="hey\'hey" more=data and="more \\"here\\""';
var res = [];
var m; // declare the match variable instead of creating an implicit global
while ((m = re.exec(str)) !== null) {
if (m[2]) {
res.push([m[1], m[2]])
} else if (m[3]) {
res.push([m[1], m[3]])
} else if (m[4]) {
res.push([m[1], m[4]])
}
}
console.log(res);
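One detail of the demos above: the if (m[2]) style checks treat an empty value such as a="" as "no value", because an empty string is falsy. A possible variation, sketched here with a hypothetical parseAttributes helper, compares against undefined instead and collects the result into a plain object. It reuses the same pattern and, like the demos, leaves escape sequences inside the values untouched.

```javascript
// Hypothetical helper: collect name/value pairs into an object.
// Exactly one of groups 2-4 takes part in each match; the others are undefined.
function parseAttributes(source) {
  const re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
  const attrs = {};
  for (const m of source.matchAll(re)) {
    // Compare with undefined so that an empty quoted value ("") is kept.
    attrs[m[1]] = [m[2], m[3], m[4]].find(v => v !== undefined);
  }
  return attrs;
}

const attrs = parseAttributes('hey="hey world" hey-heyhhhhh3123="Hello world" data-goed=\'hey\' more=here empty=""');
console.log(attrs);
```

If the same attribute name appears twice, the later value overwrites the earlier one in this sketch.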
