Regex match any characters in string, up until next match - javascript

I have this string: title: one description: two
and want to split it into groups like [title: one, description: two]
options.match(/(title|description):.+?/gi)
This was my attempt, but it only captures up to the : and the one space after it. It does not include the text that follows, which I want to include in full, up until the next match.

Split on a lookahead for title or description:
const str = 'title: one description: two';
console.log(
str.split(/ (?=title|description)/)
);
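The attempt in the question stops early because a lazy quantifier gives up as soon as the rest of the pattern can match. Since nothing follows .+? in that pattern, it takes exactly one character (here, the space). A quick check, using the string from the question:

```javascript
const str = "title: one description: two";

// .+? is the last token, so each match ends one character after the colon.
const lazyMatches = str.match(/(title|description):.+?/gi);
console.log(lazyMatches); // [ 'title: ', 'description: ' ]
```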

You could also get the matches with a capture group, and match the whitespace in between outside of the group:
(\b(?:title|description):.+?)\s*(?=\b(?:title|description):|$)
The pattern matches:
( Capture group 1
\b(?:title|description): Match either title or description, followed by :
.+? Match any char 1+ times, non greedy (lazy)
) Close group 1
\s* Match optional whitespace chars
(?= Positive lookahead, assert that what is directly to the right is
\b(?:title|description):|$ Either title: or description:, or the end of the string (for the last item)
) Close lookahead
const regex = /(\b(?:title|description):.+?)\s*(?=\b(?:title|description):|$)/gi;
let s = "title: one description: two";
console.log(Array.from(s.matchAll(regex), m => m[1]));
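If the list of keys may grow, the lookahead split from the first answer could be built from an array. A minimal sketch; splitOnKeys is a hypothetical helper name, and it assumes a key is always followed directly by a colon and never appears inside a value:

```javascript
// Hypothetical helper: split "key: value key: value" on any of the given keys.
function splitOnKeys(str, keys) {
  // Escape regex metacharacters so each key is matched literally.
  const alternation = keys
    .map(k => k.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");
  // Split on whitespace that is followed by one of the keys and a colon.
  const re = new RegExp(`\\s+(?=\\b(?:${alternation}):)`, "i");
  return str.split(re);
}

const parts = splitOnKeys("title: one two description: three", ["title", "description"]);
console.log(parts); // [ 'title: one two', 'description: three' ]
```

Because the split only happens in front of a known key, values may contain spaces.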

const str = "title: one description: two";
/* Split with a positive lookbehind: split on a whitespace char that is not preceded by ":" */
const bySplit = str.split(/(?<=[^:])\s/);
console.log(bySplit);
/* Match the general rule: word, colon, space, word (values must be single words) */
const byGeneral = str.match(/[^:\s]+:\s[^:\s]+/g);
console.log(byGeneral);
/* Match with specific keys */
const bySpecific = str.match(/(?:title|description):\s[^:\s]+/gi);
console.log(bySpecific);
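The lookbehind split and the general match above assume every value is a single word. A small comparison with a made-up two-word value shows where they differ from the lookahead split in the first answer:

```javascript
// Made-up input: the title value contains a space.
const input = "title: one two description: three";

// Splits on every space not preceded by ":", so "one two" is broken apart.
const lookbehindSplit = input.split(/(?<=[^:])\s/);
// Only matches single-word values, so "two" is silently dropped.
const generalMatch = input.match(/[^:\s]+:\s[^:\s]+/g);
// Splits only on the space right before a known key.
const lookaheadSplit = input.split(/ (?=title|description)/);

console.log(lookbehindSplit); // [ 'title: one', 'two', 'description: three' ]
console.log(generalMatch);    // [ 'title: one', 'description: three' ]
console.log(lookaheadSplit);  // [ 'title: one two', 'description: three' ]
```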

Related

How do I replace the last character of the selected regex?

I want this string {Rotation:[45f,90f],lvl:10s} to turn into {Rotation:[45,90],lvl:10}.
I've tried this:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d)\w+/g
console.log(bar.replace(regex, '$&'.substring(0, -1)))
I've also tried to just select the letter at the end using $ but I can't seem to get it right.
You can use
bar.replace(/(\d+)[a-z]\b/gi, '$1')
Here,
(\d+) - captures one or more digits into Group 1
[a-z] - matches any letter
\b - a word boundary, i.e. the end of the word
gi - all occurrences, case insensitive
The replacement is Group 1 value, $1.
See the JavaScript demo:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[a-z]\b/gi
console.log(bar.replace(regex, '$1'))
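The attempt in the question fails because '$&'.substring(0, -1) is evaluated once, before replace is even called: substring treats -1 as 0, so the replacement string is empty and every match is simply deleted. A minimal sketch that keeps the original regex but passes a replacer function, so slice runs on each matched text (it assumes every number carries exactly one trailing letter, since it always drops the last character of the match):

```javascript
const bar = `{Rotation:[45f,90f],lvl:10s}`;

// The string argument is computed up front and is just "".
console.log('$&'.substring(0, -1) === ""); // true

// A replacer function is called once per match with the matched text.
const fixed = bar.replace(/(\d)\w+/g, match => match.slice(0, -1));
console.log(fixed); // {Rotation:[45,90],lvl:10}
```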
Check this out:
const str = `{Rotation:[45f,90f],lvl:10s}`.split('');
// Removes only the character just before the closing brace (the "s" in 10s)
str.splice(str.length - 2, 1);
console.log(str.join(''));
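You can use a positive lookahead to match the single character right before the closing brace, without including the brace in the match. That character can then be replaced with an empty string. Note that this only removes the last letter (the s in 10s), so 45f and 90f are left unchanged, as the output below shows.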
const bar= '{Rotation:[45f,90f],lvl:10s}'
const regex = /.(?=})/g
console.log(bar.replace(regex, ''))
{Rotation:[45f,90f],lvl:10}
The following regex will match each group of one or more digits followed by f or s.
$1 represents the contents captured by the capture group (\d+).
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[fs]/g
console.log(bar.replace(regex, '$1'))

Break down a string with regex

I have some example strings I need to process
string1 = "_Wondrous item, common (requires attunement by a wizard or cleric)_"
string2 = "_Weapon (glaive), rare (requires attunement)_"
string3 = "_Wondrous item, common_"
I want to break them down into the following:
group1 = {
type: "Wondrous item",
rarity: "common",
attune: true,
class: "wizard or cleric"
}
group2 = {
type: "Weapon (glaive)",
rarity: "rare",
attune: true
}
group3 = {
type: "Wondrous item",
rarity: "common",
attune: false
}
The regex I currently have is messy and probably inefficient, and it only breaks down the first one:
regex = /_(?<type>[^:]*),\s(?<rarity>[^:]*)\s\((?<attune>[^:]+)by a(?<class>[^:]*)\)_/U
Added details:
This will be used when processing text documents one by one.
The string will occur once in each document.
I am using this in Obsidian.MD with Templater, if anyone is curious.
And yes, this is to process D&D magic items captured from Reddit.
To get all groups for the 3 lines using your pattern:
_(?<type>[^:]*?),\s+(?<rarity>[^:]*?)(?:\s+\((?<attune>[^:]+?)\s*(?:by\s+a\s+(?<class>[^:]*?))?\))?_
_(?<type>[^:]*?) Match _, then group type matches any char except : non greedy
,\s+ Match , and 1+ whitespace chars
(?<rarity>[^:]*?) Group rarity matches any char except : non greedy
(?: Non capture group
\s+\( Match 1+ whitespace chars and (
(?<attune>[^:]+?)\s* Group attune matches any char except : non greedy, followed by optional whitespace chars
(?:by\s+a\s+(?<class>[^:]*?))? Optionally match by a, then group class, which matches any char except : non greedy
\) Match )
)?_ Close the non capture group, make it optional, and match _
Using the groups property if supported, you can check for the values and update the object accordingly.
const regex = /_(?<type>[^:]*?),\s+(?<rarity>[^:]*?)(?:\s+\((?<attune>[^:]+?)\s*(?:by\s+a\s+(?<class>[^:]*?))?\))?_/;
[
"_Wondrous item, common (requires attunement by a wizard or cleric)_",
"_Weapon (glaive), rare (requires attunement)_",
"_Wondrous item, common_"
].forEach(s => {
const m = s.match(regex);
if (m) {
if (m.groups.class === undefined) {
delete m.groups.class;
}
m.groups.attune = m.groups.attune !== undefined;
console.log(m.groups)
}
});
Note that your pattern uses negated character classes to prevent matching :, but there is no : in the example data.
For the first negated character class you can exclude the comma instead, and for the others exclude the parentheses, to get the same result.
That way not all quantifiers have to be non greedy, which can prevent some unnecessary backtracking.
_(?<type>[^,]*),\s(?<rarity>[^:()]*)(?:\s\((?<attune>[^()]+?)\s*(?:by a\s+(?<class>[^()]*))?\))?_
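To produce the objects described in the question directly, the second pattern could be wrapped in a function. A minimal sketch; parseItem and itemRegex are hypothetical names, and the output shape (attune as a boolean, class only when present) is taken from the question:

```javascript
// The second pattern from this answer, with named groups.
const itemRegex =
  /_(?<type>[^,]*),\s(?<rarity>[^:()]*)(?:\s\((?<attune>[^()]+?)\s*(?:by a\s+(?<class>[^()]*))?\))?_/;

// Hypothetical helper: returns a plain object, or null if the line does not match.
function parseItem(line) {
  const m = line.match(itemRegex);
  if (!m) return null;
  const { type, rarity, attune, class: cls } = m.groups;
  const item = { type, rarity, attune: attune !== undefined };
  if (cls !== undefined) item.class = cls; // only add class when it was captured
  return item;
}

[
  "_Wondrous item, common (requires attunement by a wizard or cleric)_",
  "_Weapon (glaive), rare (requires attunement)_",
  "_Wondrous item, common_",
].forEach(s => console.log(parseItem(s)));
```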

Is there a regex to remove everything after comma in a string except first letter

I am trying to remove all the characters after the comma in a string, except the first letter. The string is basically in the format last name,first name.
For example:
Smith,John
I tried the following, but it removes the comma and everything after it:
let str = "Smith,John";
str = str.replace(/\s/g, ""); // to remove all whitespace if there is any at the beginning, in the middle and at the end
str = str.split(',')[0];
Expected output: Smith,J
Thank you!
Or try (,\w).* with replace:
let str = "Smith,John";
str = str.replace(/(,\w).*/, '$1');
console.log(str);
Try this regex out:
\w+,\w
This matches one or more word characters before the comma, the comma itself, and then exactly one word character after it.
Here is the demo: https://regex101.com/r/bKpWt7/1
Note: \w matches any character from [a-zA-Z0-9_].
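In JavaScript, this pattern is used with match rather than replace, and the result is the whole first match. A short sketch (match returns null when there is no match, so the [0] would throw on a string without a comma):

```javascript
const name = "Smith,John";
const short = name.match(/\w+,\w/)[0];
console.log(short); // Smith,J

// \w does not match spaces, so only the last word before the comma is kept.
const twoWords = "Van Dyke,John".match(/\w+,\w/)[0];
console.log(twoWords); // Dyke,J
```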
Taking optional spaces around the comma into account, and perhaps multiple "names" before the comma:
/ *([^\s,][^,\n]*?) *, *([^\s,]).*/
" *" Match optional spaces
( Capture group 1
[^\s,] Match a single char other than a whitespace char or a ,
[^,\n]*? Match any char except a , or a newline, non greedy
) Close group 1
" *, *" Match a comma between optional spaces
([^\s,]) Capture group 2, match a single char other than , or a whitespace char
.* Match the rest of the line
In the replacement, use group 1 and group 2 with a comma in between: $1,$2
const regex = / *([^\s,][^,\n]*?) *, *([^\s,]).*/;
[
"Smith,John Jack",
"Smith Lastname , Jack John",
"Smith , John",
" ,Jack"
].forEach(s => console.log(s.replace(regex, "$1,$2")));

Setting the end of the match

I have the following string:
[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
And I would like to get the value of the prefix of TITLE, which is a.
I have tried it with (?<=TITLE|)(?<=prefix=).*?(?=]|\|) and that seems to work, but it also gives me the prefix of STORENAME (b). So if [TITLE|prefix=a] is missing from the string, I'll get the wrong value.
So I need the match to end at the ] that belongs to [TITLE. Please note that this string is dynamic, so it could be [TITLE|suffix=x|prefix=y] as well.
const regex = "[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]".match(/(?<=TITLE|)(?<=prefix=).*?(?=]|\|)/);
console.log(regex);
You can use
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+
Details:
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=) - a location in the string immediately preceded by TITLE|prefix= or TITLE|suffix=...|prefix=
[^\]|]+ - one or more chars other than ] and |.
See JavaScript demo:
const texts = ['[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]', '[TITLE|suffix=s|prefix=a]'];
for (let s of texts) {
console.log(s, '=>', s.match(/(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+/)[0]);
}
You could also use a capturing group
\[TITLE\|(?:\w+=\w+\|)*prefix=(\w+)[^\]]*]
Explanation
\[TITLE\| Match [TITLE|
(?:\w+=\w+\|)* Repeat 0+ times: word chars, =, word chars and |
prefix= Match literally
(\w+) Capture group 1, match 1+ word chars
[^\]]* Match 0+ chars other than ]
] Match the closing ]
const regex = /\[TITLE\|(?:\w+=\w+\|)*prefix=(\w+)[^\]]*\]/g;
const str = `[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
[TITLE|suffix=x|prefix=y]`;
let m;
while ((m = regex.exec(str)) !== null) {
console.log(m[1]);
}
Or with a negated character class instead of \w
\[TITLE\|(?:[^|=\]]*=[^|=\]]*\|)*prefix=([^|=\]]*)[^\]]*]
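If the options inside [TITLE|...] can appear in any order, a different technique (not a single regex) is to capture the whole TITLE block and split it into key/value pairs. A minimal sketch; getTitleOption is a hypothetical helper, and it assumes keys and values never contain |, = or ]:

```javascript
// Hypothetical helper: look up one option of the [TITLE|...] block.
function getTitleOption(str, key) {
  const block = str.match(/\[TITLE\|([^\]]*)\]/);
  if (!block) return undefined; // no [TITLE|...] block in the string
  // "suffix=x|prefix=y" -> { suffix: "x", prefix: "y" }
  const options = Object.fromEntries(
    block[1].split("|").map(pair => pair.split("="))
  );
  return options[key];
}

const s1 = "[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]";
console.log(getTitleOption(s1, "prefix"));                          // a
console.log(getTitleOption("[TITLE|suffix=x|prefix=y]", "prefix")); // y
console.log(getTitleOption("[STORENAME|prefix=b]", "prefix"));      // undefined
```

Because the TITLE block is located first, a prefix from another block is never returned.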

Match words that consist of specific characters, excluding between special brackets

I'm trying to match words that consist only of characters in this character class: [A-z'\\/%], excluding cases where:
they are between < and >
they are between [ and ]
they are between { and }
So, say I've got this funny string:
[beginning]<start>How's {the} /weather (\\today%?)[end]
I need to match the following strings:
[ "How's", "/weather", "\\today%" ]
I've tried using this pattern:
/[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/gm
But for some reason, it matches:
[ "[beginning]", "", "How's", "", "", "", "/weather", "", "", "\\today%", "", "", "[end]", "" ]
I'm not sure why my pattern allows stuff between [ and ], since I used (?![^\[]*\]), and a similar approach seems to work for not matching {these cases} and <these cases>. I'm also not sure why it matches all the empty strings.
Any wisdom? :)
There are essentially two problems with your pattern:
Never use A-z in a character class if you intend to match only letters, because it matches more than just letters (see note 1 below). Instead, use a-zA-Z (or A-Za-z).
Using the * quantifier after the character class will allow empty matches. Use the + quantifier instead.
So, the fixed pattern should be:
[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)
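Running the fixed pattern on the string from the question, with the backslash written as \\ inside the JavaScript string literal:

```javascript
const data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
// Fixed pattern: A-Za-z instead of A-z, and + instead of * to avoid empty matches.
const words = data.match(/[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)/g);
console.log(words); // words is ["How's", "/weather", "\\today%"]
```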
Note 1: The [A-z] character class means "match any character with an ASCII code between 65 and 122". The problem is that codes 91 through 96 are not letters (they are [ \ ] ^ _ and `), which is why the original pattern matches characters like '[' and ']'.
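The characters that sit between Z (code 90) and a (code 97) can be checked directly: all of them match [A-z], and none of them match [A-Za-z].

```javascript
const between = ["[", "\\", "]", "^", "_", "`"];
const codes = between.map(c => c.charCodeAt(0));
console.log(codes);                                 // [ 91, 92, 93, 94, 95, 96 ]
console.log(between.every(c => /[A-z]/.test(c)));   // true
console.log(between.some(c => /[A-Za-z]/.test(c))); // false
```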
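Split it with a regular expression, then filter out the empty strings. Note that the ? stays in the last piece (\today%?), because it is not one of the separators: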
let data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
let matches = data.split(/\s*(?:<[^>]+>|\[[^\]]+\]|\{[^\}]+\}|[()])\s*/);
console.log(matches.filter(v => "" !== v));
You can match all the cases that you don't want using an alternation and place the character class in a capturing group to capture what you want to keep.
The [^ is a negated character class that matches any character except what is specified.
(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)
Explanation
(?: Non capture group
\[[^\][]*] Match from opening till closing []
| Or
<[^<>]*> Match from opening till closing <>
| Or
{[^{}]*} Match from opening till closing {}
) Close non capture group
| Or
([A-Za-z'/\\%]+) Repeat the character class 1+ times to prevent empty matches and capture in group 1
const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = `[beginning]<start>How's {the} /weather (\\\\today%?)[end]`;
let m;
while ((m = regex.exec(str)) !== null) {
if (m[1] !== undefined) console.log(m[1]);
}
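The same loop can also be written with matchAll, keeping only the matches where group 1 took part (the bracketed alternatives leave it undefined). A sketch using the pattern above:

```javascript
const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = "[beginning]<start>How's {the} /weather (\\today%?)[end]";

// Collect group 1 from every match, then drop the bracketed (undefined) ones.
const kept = Array.from(str.matchAll(regex), m => m[1]).filter(w => w !== undefined);
console.log(kept); // kept is ["How's", "/weather", "\\today%"]
```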
