Break down a string with regex

I have some example strings I need to process
string1 = "_Wondrous item, common (requires attunement by a wizard or cleric)_"
string2 = "_Weapon (glaive), rare (requires attunement)_"
string3 = "_Wondrous item, common_"
I want to break them down into the following:
group1 = {
  type: "Wondrous item",
  rarity: "common",
  attune: true,
  class: "wizard or cleric"
}
group2 = {
  type: "Weapon (glaive)",
  rarity: "rare",
  attune: true
}
group3 = {
  type: "Wondrous item",
  rarity: "common",
  attune: false
}
The regex I currently have is messy and probably inefficient, and it only breaks down the first one.
regex = /_(?<type>[^:]*),\s(?<rarity>[^:]*)\s\((?<attune>[^:]+)by a(?<class>[^:]*)\)_/U
Added details:
This will be used when processing text documents one by one.
The string will occur once in each document.
I am using this in Obsidian.MD with Templater, if anyone is curious.
And yes, this is to process D&D magic items captured from Reddit.

To get all groups for the 3 lines using your pattern:
_(?<type>[^:]*?),\s+(?<rarity>[^:]*?)(?:\s+\((?<attune>[^:]+?)\s*(?:by\s+a\s+(?<class>[^:]*?))?\))?_
_(?<type>[^:]*?) Match _, then group type matches any char except :, non-greedy
,\s+ Match , and 1+ whitespace chars
(?<rarity>[^:]*?) Group rarity matches any char except :, non-greedy
(?: Non-capture group
\s+\( Match 1+ whitespace chars and (
(?<attune>[^:]+?)\s* Group attune matches 1+ chars except :, non-greedy, followed by optional whitespace chars
(?:by\s+a\s+(?<class>[^:]*?))? Optionally match by a and group class, which matches any char except :, non-greedy
\) Match )
)?_ Make the outer group optional and match _
See a regex demo.
Using the groups property if supported, you can check for the values and update the object accordingly.
const regex = /_(?<type>[^:]*?),\s+(?<rarity>[^:]*?)(?:\s+\((?<attune>[^:]+?)\s*(?:by\s+a\s+(?<class>[^:]*?))?\))?_/;
[
"_Wondrous item, common (requires attunement by a wizard or cleric)_",
"_Weapon (glaive), rare (requires attunement)_",
"_Wondrous item, common_"
].forEach(s => {
const m = s.match(regex);
if (m) {
if (m.groups.class === undefined) {
delete m.groups.class;
}
m.groups.attune = m.groups.attune === undefined ? false : true;
console.log(m.groups)
}
});
Note that your pattern prevents matching : in the negated character classes, but there is no : in the example data.
For the first negated character class you can exclude the comma instead, and for the others exclude the parentheses, to get the same result.
That way not all quantifiers have to be non-greedy, which can prevent some unnecessary backtracking.
_(?<type>[^,]*),\s(?<rarity>[^:()]*)(?:\s\((?<attune>[^()]+?)\s*(?:by a\s+(?<class>[^()]*))?\))?_
See another regex demo.
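As a minimal sketch, the refined pattern can be wrapped in a small function that returns the object shape asked for in the question. The name parseItemLine is hypothetical and not part of the original answer; the function assumes each input holds one item line in the format shown above.

```javascript
// Sketch only: parseItemLine is a hypothetical helper name.
// The pattern is the refined one above (negated classes exclude , and parentheses).
const ITEM_RE = /_(?<type>[^,]*),\s(?<rarity>[^:()]*)(?:\s\((?<attune>[^()]+?)\s*(?:by a\s+(?<class>[^()]*))?\))?_/;

function parseItemLine(line) {
  const m = line.match(ITEM_RE);
  if (!m) return null; // no item line found

  // "class" is a reserved word, so rename it while destructuring.
  const { type, rarity, attune, class: cls } = m.groups;

  // attune becomes a boolean: true when the parenthesised part was present.
  const item = { type, rarity, attune: attune !== undefined };

  // Only add class when the "by a ..." part was matched.
  if (cls !== undefined) item.class = cls;
  return item;
}

console.log(parseItemLine("_Wondrous item, common (requires attunement by a wizard or cleric)_"));
console.log(parseItemLine("_Weapon (glaive), rare (requires attunement)_"));
console.log(parseItemLine("_Wondrous item, common_"));
```

Returning null for a non-matching line is a design choice made here so that callers can tell "no item found" apart from an item without attunement.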

Related

regex for ignoring character if inside () parenthesis?

I was writing a regex and ran into this bug:
I have a string such as "+1/(1/10)+(1/30)+1/50" and I used the regex /\+.[^\+]*/g
which works fine, since it gives me ['+1/(1/10)', '+(1/30)', '+1/50'].
The real problem is when a + is inside the parentheses (),
like this: "+1/(1+10)+(1/30)+1/50"
because it gives ['+1/(1', '+10)', '+(1/30)', '+1/50'],
which isn't what I want. What I want is ['+1/(1+10)', '+(1/30)', '+1/50'],
so when the regex sees \(.*\) it should skip it as if it weren't there.
How can I make the regex ignore it?
My code (JS):
const tests = {
  correct: "1/(1/10)+(1/30)+1/50",
  wrong: "1/(1+10)+(1/30)+1/50"
};
function getAdditionArray(string) {
  const REGEX = /\+.[^\+]*/g; // change this to ignore the () even if they have the + sign
  const firstChar = string[0];
  if (firstChar !== "-") string = "+" + string;
  return string.match(REGEX);
}
console.log(
  getAdditionArray(tests.correct),
  getAdditionArray(tests.wrong)
);

You can exclude parentheses from the character class, and then optionally match (...)
\+[^+()]*(?:\([^()]*\))?
The pattern matches:
\+ Match a +
[^+()]* Match optional chars other than + ( )
(?: Non capture group to match as a whole part
\([^()]*\) Match from (...)
)? Close the non capture group and make it optional
See a regex101 demo.
Another option could be to be more specific about the digits and the + and / and use a character class to list the allowed characters.
\+(?:\d+[+/])?(?:\(\d+[/+]\d+\)|\d+)
See another regex101 demo.
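A minimal sketch of the question's function using the first pattern above. One assumption that differs from the question's code: the + is only prepended when the input starts with neither + nor -, so inputs that already begin with + are not doubled.

```javascript
// Sketch: split an expression into "+term" chunks, treating a single
// (non-nested) parenthesised part as belonging to the term before it.
function getAdditionArray(expr) {
  const TERM_RE = /\+[^+()]*(?:\([^()]*\))?/g;
  // Assumption: prepend "+" only when there is no leading sign.
  const input = /^[+-]/.test(expr) ? expr : "+" + expr;
  return input.match(TERM_RE) || [];
}

console.log(getAdditionArray("1/(1/10)+(1/30)+1/50"));
// [ '+1/(1/10)', '+(1/30)', '+1/50' ]
console.log(getAdditionArray("1/(1+10)+(1/30)+1/50"));
// [ '+1/(1+10)', '+(1/30)', '+1/50' ]
```

This pattern does not handle nested parentheses, and anything that follows the closing parenthesis within the same term (for example the /3 in +(1+2)/3) is not included in the match.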

How can I Regex filename with exactly 1 underscores in javascript?

I need to check whether a filename has exactly one underscore. For example:
Prof. Leonel Messi_300001.pdf -> true
Christiano Ronaldo_200031.xlsx -> true
Eden Hazard_3322.pdf -> true
John Terry.pdf -> false
100023.xlsx -> false
300022_Fernando Torres.pdf -> false
So the format is: name_id.extension
Note: name is a string and id is a number.
I tried this: [a-zA-Z\d]+_[0-9\d]
Is my regex correct?

The filename has the form name_id.extension: the name is letters or spaces, [a-z\s]+?; then an underscore, _; then the id, which is a number, [0-9]+? (written \d+? in the code); then the dot, which is a special character and must be escaped with a backslash, \.; then the extension, [a-z]+
const checkFileName = (fileName) => {
const result = /[a-z\s]+?_\d+?\.[a-z]+/i.test(fileName);
console.log(result);
return result;
}
checkFileName('Prof. Leonel Messi_300001.pdf')
checkFileName('Christiano Ronaldo_200031.xlsx')
checkFileName('Eden Hazard_3322.pdf')
checkFileName('John Terry.pdf')
checkFileName('100023.xlsx')
checkFileName('300022_Fernando Torres.pdf')

[a-zA-Z]+_[0-9\d]+
or
[a-zA-Z]+_[\d]+

Use ^...$ to match the whole line. Then match a group before the _ that contains no _, and a group after it that also contains no _.
^(?<before>[^_]*)_(?<after>[^_]*)\.\w+$
https://regex101.com/r/ZrA7B1/1
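A minimal sketch that combines the ideas from the answers above: anchors so the whole filename must match, a name part that cannot contain _, and a numeric id. The pattern and the name hasNameIdFormat are assumptions for illustration. The name part is only required to be free of underscores, so it may contain dots or digits (which is what lets "Prof. Leonel Messi" pass).

```javascript
// Sketch only: hasNameIdFormat and FILE_RE are hypothetical names.
// ^[^_]+   name: one or more chars, none of them an underscore
// _\d+     the single underscore, then a numeric id
// \.[a-z]+$  a dot and a letters-only extension, up to the end
const FILE_RE = /^[^_]+_\d+\.[a-z]+$/i;

function hasNameIdFormat(fileName) {
  return FILE_RE.test(fileName);
}

const samples = [
  "Prof. Leonel Messi_300001.pdf",
  "Christiano Ronaldo_200031.xlsx",
  "Eden Hazard_3322.pdf",
  "John Terry.pdf",
  "100023.xlsx",
  "300022_Fernando Torres.pdf",
];
for (const s of samples) {
  console.log(s, "->", hasNameIdFormat(s));
}
```

Because the pattern is anchored and the name excludes _, a filename with two underscores such as a_b_1.pdf is rejected, which an unanchored pattern would not guarantee.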

Regex
My attempt, with separate groups for:
name: Can contain anything. Ends at the last _.
id: Can contain only digits. Starts after the last _.
ext: Follows the last . and can contain only a-z, at least one character.
/^(?<name>.+)\_(?<id>\d+)\.(?<ext>[a-z]+)/g
Regex 101 Demo
JS
const fileName = "Lionel Messi_300001.pdf"
const r = /^(?<name>.+)\_(?<id>\d+)\.(?<ext>[a-z]+)/g
const fileNameMatch = r.test(fileName)
if (fileNameMatch) {
r.lastIndex = 0
console.log(r.exec(fileName).groups)
}
See CodePen

Regex match any characters in string, up until next match

I have this string: title: one description: two
and want to split it into groups like [title: one, description: two]
options.match(/(title|description):.+?/gi)
this was my attempt, but it only captures up to the : and 1 space after, it does not include the text after it, which I want to include all of, up until the second match.

Split on a lookahead for title or description:
const str = 'title: one description: two';
console.log(
str.split(/ (?=title|description)/)
);
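Building on the lookahead split, here is a minimal sketch that turns the string into an object keyed by field name. The function name parseFields and its keys parameter are hypothetical; the sketch assumes the keys are plain words that need no regex escaping.

```javascript
// Sketch only: parseFields is a hypothetical helper.
// keys are assumed to be plain words (no regex metacharacters).
function parseFields(input, keys) {
  const alternation = keys.join("|");
  // Split on whitespace that is directly followed by "<key>:".
  const splitter = new RegExp(`\\s+(?=(?:${alternation}):)`, "i");
  const result = {};
  for (const part of input.split(splitter)) {
    const idx = part.indexOf(":");
    if (idx === -1) continue; // skip chunks without a key
    const key = part.slice(0, idx).trim().toLowerCase();
    result[key] = part.slice(idx + 1).trim();
  }
  return result;
}

console.log(parseFields("title: one description: two", ["title", "description"]));
// { title: 'one', description: 'two' }
```

Because the split only happens before a known key, values that contain spaces (for example "title: one big thing description: two") stay intact.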

You could also get the matches with a capture group and match the whitespace in between
(\b(?:title|description):.+?)\s*(?=\b(?:title|description):|$)
The pattern matches:
( Capture group 1
\b(?:title|description): Match either title or description, followed by :
.+? Match 1+ times any char, non-greedy (lazy)
) Close group 1
\s* Match optional whitespace chars
(?= Positive lookahead, assert what is at the right is
\b(?:title|description):|$ Match either title: or description: or assert the end of the string for the last item
) Close lookahead
Regex demo
const regex = /(\b(?:title|description):.+?)\s*(?=\b(?:title|description):|$)/gi;
let s = "title: one description: two";
console.log(Array.from(s.matchAll(regex), m => m[1]));

var str = "title: one description: two";
/* split with a positive lookbehind: the whitespace must be preceded by any char except : */
var res=str.split(/(?<=[^:])\s/);
console.log(res);
/* match with a general rule */
var res=str.match(/([^:\s]+:\s[^:\s]+)/gi);
console.log(res);
/* match with specific words */
var res=str.match(/((title|description)+:\s[^:\s]+)/gi);
console.log(res);

Match words that consist of specific characters, excluding between special brackets

I'm trying to match words that consist only of characters in this character class: [A-z'\\/%], excluding cases where:
they are between < and >
they are between [ and ]
they are between { and }
So, say I've got this funny string:
[beginning]<start>How's {the} /weather (\\today%?)[end]
I need to match the following strings:
[ "How's", "/weather", "\\today%" ]
I've tried using this pattern:
/[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/gm
But for some reason, it matches:
[ "[beginning]", "", "How's", "", "", "", "/weather", "", "", "\\today%", "", "", "[end]", "" ]
I'm not sure why my pattern allows stuff between [ and ], since I used (?![^\[]*\]), and a similar approach seems to work for not matching {these cases} and <these cases>. I'm also not sure why it matches all the empty strings.
Any wisdom? :)

There are essentially two problems with your pattern:
Never use A-z in a character class if you intend to match only letters (because it will match more than just letters [1]). Instead, use a-zA-Z (or A-Za-z).
Using the * quantifier after the character class will allow empty matches. Use the + quantifier instead.
So, the fixed pattern should be:
[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)
Demo.
[1] The [A-z] character class means "match any character with an ASCII code between 65 and 122". The problem with that is that codes 91 through 96 ([, \, ], ^, _ and `) are not letters (and that's why the original pattern matches characters like '[' and ']').
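The point about A-z can be checked directly. This short sketch loops over the ASCII codes 65 to 122 and collects the characters that [A-z] accepts but [A-Za-z] does not; the variable names are only illustrative.

```javascript
// Collect the characters in the range A..z that are not letters.
const nonLetters = [];
for (let code = 65; code <= 122; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-z]/.test(ch) && !/[A-Za-z]/.test(ch)) {
    nonLetters.push(ch);
  }
}
console.log(nonLetters.join(" ")); // [ \ ] ^ _ `
```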

Split it with regular expression:
let data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
let matches = data.split(/\s*(?:<[^>]+>|\[[^\]]+\]|\{[^\}]+\}|[()])\s*/);
console.log(matches.filter(v => "" !== v));

You can match all the cases that you don't want using an alternation and place the character class in a capturing group to capture what you want to keep.
The [^ is a negated character class that matches any character except what is specified.
(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)
Explanation
(?: Non capture group
\[[^\][]*] Match from opening till closing []
| Or
<[^<>]*> Match from opening till closing <>
| Or
{[^{}]*} Match from opening till closing {}
) Close non capture group
| Or
([A-Za-z'/\\%]+) Repeat the character class 1+ times to prevent empty matches and capture in group 1
Regex demo
const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = `[beginning]<start>How's {the} /weather (\\\\today%?)[end]`;
let m;
while ((m = regex.exec(str)) !== null) {
if (m[1] !== undefined) console.log(m[1]);
}

Javascript validation regex for names

I want my app to accept names made of letters and hyphens (dashes). I based my code on an answer I found here
and wrote this:
function validName(n){
var nameRegex = /^[a-zA-Z\-]+$/;
if(n.match(nameRegex) == null){
return "Wrong";
}
else{
return "Right";
}
}
The only problem is that it accepts a hyphen as the first character (even several of them), which I don't want.
Thanks

Use a negative lookahead assertion to avoid matching a string that starts with a hyphen. There is no need to escape - in a character class when it is placed at the end of the class. To also reject a hyphen at the end, either finish with a character class that does not contain - or add a second lookahead assertion.
var nameRegex = /^(?!-)[a-zA-Z-]*[a-zA-Z]$/;
// or
var nameRegex1 = /^(?!-)(?!.*-$)[a-zA-Z-]+$/;
function validName(n) {
if (n.match(nameRegex) == null) {
return "Wrong";
} else {
return "Right";
}
}
function validName1(n) {
if (n.match(nameRegex1) == null) {
return "Wrong";
} else {
return "Right";
}
}
console.log(validName('abc'));
console.log(validName('abc-'));
console.log(validName('-abc'));
console.log(validName('-abc-'));
console.log(validName('a-b-c'));
console.log(validName1('abc'));
console.log(validName1('abc-'));
console.log(validName1('-abc'));
console.log(validName1('-abc-'));
console.log(validName1('a-b-c'));
FYI: You can use the RegExp#test method to check for a regex match; it returns a boolean.
if(nameRegex.test(n)){
return "Right";
}
else{
return "Wrong";
}
UPDATE: If you want only a single optional - between words, use a group repeated 0 or more times that starts with -, as in Wiktor Stribiżew's answer.
var nameRegex = /^[a-zA-Z]+(?:-[a-zA-Z]+)*$/;

You need to split your single character class into two parts, moving the hyphen outside of it, and use a grouping construct to match sequences of a hyphen followed by letters:
var nameRegex = /^[a-zA-Z]+(?:-[a-zA-Z]+)*$/;
See the regex demo
This matches one or more letters at the start of the string, then 0 or more occurrences of - followed by one or more letters, up to the end of the string.
If there can be only 1 hyphen in the string, replace * at the end with ? (see the regex demo).
If you also want to allow whitespace between the letters, replace the - with [\s-] (demo).
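The three variants described above can be compared side by side. This is a minimal sketch; the constant names and the isValidName helper are hypothetical, and the patterns are the ones from this answer with the stated substitutions applied.

```javascript
// Letters separated by single hyphens, any number of parts.
const NAME_RE = /^[a-zA-Z]+(?:-[a-zA-Z]+)*$/;
// Variant: at most one hyphen (* replaced with ?).
const SINGLE_HYPHEN_RE = /^[a-zA-Z]+(?:-[a-zA-Z]+)?$/;
// Variant: a whitespace char or a hyphen as the separator (- replaced with [\s-]).
const SPACE_OR_HYPHEN_RE = /^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$/;

// Hypothetical helper: returns a boolean via RegExp#test.
const isValidName = (name, re = NAME_RE) => re.test(name);

console.log(isValidName("a-b-c"));                             // true
console.log(isValidName("-abc"));                              // false
console.log(isValidName("abc-"));                              // false
console.log(isValidName("a--b"));                              // false
console.log(isValidName("a-b-c", SINGLE_HYPHEN_RE));           // false
console.log(isValidName("Mary Ann-Smith", SPACE_OR_HYPHEN_RE)); // true
```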

You can either use a negative lookahead like Pranav C Balan proposed or just use this simple expression:
^[a-zA-Z]+[a-zA-Z-]*$
Live example: https://regex101.com/r/Dj0eTH/1

The below regex is useful for surnames if one wants to forbid leading or trailing non-alphabetic characters, while permitting a small set of common word-joining characters in between two names.
^[a-zA-Z]+[- ']{0,1}[a-zA-Z]+$
Explanation
^[a-zA-Z]+ must begin with at least one letter
[- ']{0,1} allow zero or one of -, a space, or '
[a-zA-Z]+$ must end with at least one letter
Test cases
(The double-quotes have been added purely to illustrate the presence of whitespace.)
"Blair" => match
" Blair" => no match
"Blair " => no match
"-Blair" => no match
"- Blair" => no match
"Blair-" => no match
"Blair -" => no match
"Blair-Nangle" => match
"Blair--Nangle" => no match
"Blair Nangle" => match
"Blair -Nangle" => no match
"O'Nangle" => match
"BN" => match
"BN " => no match
" O'Nangle" => no match
"B" => no match
"3Blair" => no match
"!Blair" => no match
"van Nangle" => match
"Blair'" => no match
"'Blair" => no match
Limitations include:
No single-character surnames
No surnames composed of more than two words
Check it out on regex101.
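A few of the test cases listed above can be re-checked with a small table-driven sketch. The cases array is a subset of the list in this answer, and the variable names are only illustrative.

```javascript
const SURNAME_RE = /^[a-zA-Z]+[- ']{0,1}[a-zA-Z]+$/;

// [input, expected result], taken from the test cases listed above.
const cases = [
  ["Blair", true],
  ["Blair-Nangle", true],
  ["Blair--Nangle", false],
  ["O'Nangle", true],
  ["van Nangle", true],
  ["Blair ", false],
  ["B", false],
];

// Keep only the cases where the regex disagrees with the expectation.
const failures = cases.filter(
  ([input, expected]) => SURNAME_RE.test(input) !== expected
);
console.log(failures.length === 0 ? "all cases pass" : failures);
```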
