How to extract content in between an opening and a closing bracket? - javascript

I am trying to split a string into an array of the text contents that appear between the [# and ] delimiters. Only characters in between [# and ] are allowed to match. Given a string like ...
const stringA = '[#Mary James], [#Jennifer John] and [#Johnny Lever[#Patricia Robert] are present in the meeting and [#Jerry[#Jeffery Roger] is absent.'
... the following result is expected ...
[
'Mary James',
'Jennifer John',
'Patricia Robert',
'Jeffery Roger'
]
Any logic which leads to the expected outcome can be used.
My own search for a solution turned up the following regex ...
stringA.match(/(?<=\[#)[^\]]*(?=\])/g);
But the result doesn't fulfill the requirements because the array features the following items ...
[
'Mary James',
'Jennifer John',
'Johnny Lever[#Patricia Robert',
'Jerry[#Jeffery Roger'
]

The OP's regex does not include the opening bracket in the negated character class. Changing the OP's /(?<=\[#)[^\]]*(?=\])/g to /(?<=\[#)[^\[\]]*(?=\])/g already solves the problem in most environments. The exception is any engine without lookbehind support, which at the time of writing included Safari.
Solution based on a regex ... /\[#(?<content>[^\[\]]+)\]/g ... with a named capture group ...
const sampleText = '[#Mary James], [#Jennifer John] and [#Johnny Lever[#Patricia Robert] are present in the meeting and [#Jerry[#Jeffery Roger] is absent.'
// see ... [https://regex101.com/r/v234aT/1]
const regXCapture = /\[#(?<content>[^\[\]]+)\]/g;
console.log(
Array.from(
sampleText.matchAll(regXCapture)
).map(
({ groups: { content } }) => content
)
);

Close, just missing the exclusion of [:
stringA.match(/(?<=\[#)[^\[\]]*(?=\])/g);
// ^^ exclude '[' as well as ']'
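The effect of the one-character fix can be checked against the question's input. This is a minimal sketch that reuses stringA from the question; the names originalPattern, fixedPattern, before and after are only illustrative. It relies on lookbehind, so it assumes an engine that supports it (for example a current Node.js).

```javascript
const stringA = '[#Mary James], [#Jennifer John] and [#Johnny Lever[#Patricia Robert] are present in the meeting and [#Jerry[#Jeffery Roger] is absent.';

// Pattern from the question: only ']' is excluded, so a nested '[#' is swallowed.
const originalPattern = /(?<=\[#)[^\]]*(?=\])/g;
// Fixed pattern: '[' is excluded too, so a match cannot span a nested '[#'.
const fixedPattern = /(?<=\[#)[^\[\]]*(?=\])/g;

const before = stringA.match(originalPattern);
const after = stringA.match(fixedPattern);

console.log(before); // still contains 'Johnny Lever[#Patricia Robert'
console.log(after);  // [ 'Mary James', 'Jennifer John', 'Patricia Robert', 'Jeffery Roger' ]
```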

Related

Regex solution for matching groups does not work

Imagine a text like in this example:
some unimportant content
some unimportant content [["string1",1,2,5,"string2"]] some unimportant content
some unimportant content
I need a REGEX pattern which will match the parts in [[ ]] and I need to match each part individually separated by commas.
I already tried
const regex = /\[\[(([^,]*),?)*\]\]/g
const found = result.match(regex)
but it doesn't work as expected. It matches only the full string and has no group matches. Also, according to regex101.com, it suffers from catastrophic backtracking if the sample text is larger.
Output should be a JS array ["string1", 1, 2, 5, "string2"]
Thank you for your suggestions.
What about going with a simple pattern like /\[\[(.*)\]\]/g and then you'd just have to split the result (and apparently strip those extra quotation marks):
const result = `some unimportant content
some unimportant content [["string1",1,2,5,"string2"]] some unimportant content
some unimportant content`;
// const found = /\[\[(.*)\]\]/g.exec(result);
const found = /\[\[(.*?)\]\]/g.exec(result); // As suggested by MikeM
const arr_from_found = found[1].replace(/\"/g, '').split(',');
console.log(arr_from_found); // [ 'string1', '1', '2', '5', 'string2' ]
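The split-based approach above yields strings only, while the question asks for numbers to stay numbers. A minimal sketch of one way to get there, assuming the text between [[ and ]] is always a valid JSON list body (that assumption is mine, not the asker's); the name parsed is illustrative.

```javascript
const result = `some unimportant content
some unimportant content [["string1",1,2,5,"string2"]] some unimportant content
some unimportant content`;

// Capture everything between the first '[[' and the next ']]' (lazy match).
const found = /\[\[(.*?)\]\]/.exec(result);

// Assumption: the captured text is a valid JSON list body, so wrapping it
// in a single pair of brackets makes it parseable as a JSON array.
const parsed = found ? JSON.parse(`[${found[1]}]`) : [];

console.log(parsed); // [ 'string1', 1, 2, 5, 'string2' ]
```

If the content can contain anything that is not valid JSON, JSON.parse throws, so the call would need a try/catch in that case.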
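Try the replace method. Note that a string pattern replaces only the first occurrence, so use a global regex to strip every bracket:
let cleantext = result.replace(/\[/g, "")
then
let more_cleantext = cleantext.replace(/\]/g, "")
but if your result variable is an array then just use
result[0]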

Regex to accept only 5 numbers and then a dash or a letter on typescript

I am dealing with an issue with Regex.
I have an input which has maxLength 10.
So far I have achieved that the value can start with digits, for example 12345, but then the regex expects a dash, after which you can write letters or digits up to maxLength=10. For example, 12345-a121 is allowed, and that works with the current regex.
But I want letters or a dash to be allowed after the 5 digits, because at the moment this regex allows only a dash after 5 digits.
For example, 12345a or 12345- should be allowed.
This is the actual regex I am using.
Valid/Matches: 12345a235, 123a, 12345-aa1, 12345a, 12345-a.
Not valid/Does not match: -11, 111111, aaaa
(?=^[^\W_]{1,5}-[^\W_]{1,8}$)^.{1,10}$|^[^\W_]{1,5}$
I am debugging on regex101.com but cannot find a way to allow, for example, 12345a.
This is the condition to check if it matches or not.
if (!this.value.toString().match('^\d{1,5}(?!\d+)[-\p{L}\d]+$') && this.value.toString()) {
return ValidationInfo.errorCode("You need to do something");
Thank you for the help
Edit: the patterns of the first approach can be simplified, and they were also missing a limit on the length of the ending sequence.
for matching only with Letter unicode property escapes
/^\d{1,5}[-\p{L}][-\p{L}\d]{0,9}$/u
matching and capturing with Letter unicode property escapes
/^(?<digitsOnly>\p{N}{1,5})(?<miscellaneous>[-\p{L}][-\p{L}\p{N}]{0,9})$/u
Example code ...
const multilineSample = `12345a235
123a
12345-aa1
12345a
12345-a
12-a235dkfsf
12-a235dkfsfs
123a-dssava-y
123a-dssava-1a
12345-aa1--asd-
12345-aa1--asd-s
-11
111111
aaaa`;
// see ... [https://regex101.com/r/zPkcwv/3]
const regXJustMatch = /^\d{1,5}[-\p{L}][-\p{L}\d]{0,9}$/gmu;
// see ... [https://regex101.com/r/zPkcwv/4]
const regXNamedGroups =
/^(?<digitsOnly>\p{N}{1,5})(?<miscellaneous>[-\p{L}][-\p{L}\p{N}]{0,9})$/gmu;
console.log(
'matches only ...',
multilineSample.match(regXJustMatch)
);
console.log(
'matches and captures ...', [
...multilineSample.matchAll(regXNamedGroups)
]
.map(({ 0: match, groups }) => ({ match, ...groups }))
);
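For the validation condition in the question, a single value can be tested with RegExp.prototype.test. The following is a minimal sketch; isValidInput and regXValidInput are hypothetical names. It uses a regex literal on purpose: in a plain string such as '^\d{1,5}...' the backslashes are dropped by the string literal, and match() would build the RegExp without the u flag that \p{L} needs.

```javascript
// Final pattern from above, without the g and m flags because a single
// value is tested rather than a multiline sample.
const regXValidInput = /^\d{1,5}[-\p{L}][-\p{L}\d]{0,9}$/u;

// Hypothetical helper: true when the value has the expected shape.
function isValidInput(value) {
  return regXValidInput.test(String(value));
}

const valid = ['12345a235', '123a', '12345-aa1', '12345a', '12345-a'];
const invalid = ['-11', '111111', 'aaaa'];

console.log(valid.every(isValidInput));  // true
console.log(invalid.some(isValidInput)); // false
```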
1st approach
straightforward and plain ... /^\d{1,5}(?!\d+)[-\p{L}\d]+$/u
with named capture groups ... /^(?<digitsOnly>\p{N}{1,5}(?!\p{N}+))(?<miscellaneous>[-\p{L}\p{N}]+)$/u
For both variants it is obvious to start with ...
a digit sequence of at least 1 and up to 5 digits ...
traditional ... ^\d{1,5}
unicode property escapes ... ^\p{N}{1,5}
It's also clear that one wants to end with a sequence of dashes and/or word characters. Because _ has to be excluded, one cannot just use the \w escape for letters and digits, since \w includes _ as well. But one could use unicode property escapes, thus ...
a regex covering the end of a line with a valid character class is ...
already mixed ... [-\p{L}\d]+$
mostly unicode escapes ... [-\p{L}\p{N}]+$
A combined regex like ... /^\d{1,5}[-\p{L}\d]+$/u ... almost covers the requirements but fails for 111111, which of course gets matched even though it shouldn't according to the requirements.
A negative lookahead ... (?!\d+) respectively (?!\p{N}+) ... which follows the starting digit sequence prevents any other (terminating) digit-only sequence, thus a digit-only value like 111111 does not get matched anymore.
Example code ...
const multilineSample = `12345a235
123a
12345-aa1
12345a
12345-a
-11
111111
aaaa`;
// see ... [https://regex101.com/r/zPkcwv/1]
const regXJustMatch = /^\d{1,5}(?!\d+)[-\p{L}\d]+$/gmu;
// see ... [https://regex101.com/r/zPkcwv/2]
const regXNamedGroups =
/^(?<digitsOnly>\p{N}{1,5}(?!\p{N}+))(?<miscellaneous>[-\p{L}\p{N}]+)$/gmu;
console.log(
'matches only ...',
multilineSample.match(regXJustMatch)
);
console.log(
'matches and captures ...', [
...multilineSample.matchAll(regXNamedGroups)
]
.map(({ 0: match, groups }) => ({ match, ...groups }))
);
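To see what the negative lookahead contributes, the two variants from the explanation can be compared side by side. A small sketch; the names withoutLookahead and withLookahead are illustrative.

```javascript
// Same pattern with and without the negative lookahead after the digits.
const withoutLookahead = /^\d{1,5}[-\p{L}\d]+$/u;
const withLookahead = /^\d{1,5}(?!\d+)[-\p{L}\d]+$/u;

for (const sample of ['111111', '12345a', '12345-a']) {
  // Prints the sample, then the result without and with the lookahead.
  console.log(sample, withoutLookahead.test(sample), withLookahead.test(sample));
}
// 111111 true false
// 12345a true true
// 12345-a true true
```

Since the lookahead only needs to see one digit to fail, (?!\d) would behave the same as (?!\d+) here.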

regex exclude matches that don't meet one of two patterns separated by delimiter

In Javascript using string.match():
I have a string like: foo_2:asc,foo2:desc,foo3,foo4:wrong
the matches should look like ["foo_2:asc", "foo2:desc", "foo3"]
but instead the best I have managed so far is a match returning ["foo_2:asc", "foo2:desc", "foo3", "wrong"]
the regex that I'm using currently for the above wrong match is: /([a-z0-9_]+?[:asc|:desc]*?)(?=,|$)/gi
I also need a regex that returns the opposite, i.e. matches every item between the delimiters that does not follow the pattern rules of thing_1:asc, thing_1:desc, or thing_1. This one would be used to validate the string, while the other would be used to gather the values (instead of splitting the string manually). For the original string the result would be ["foo4:wrong"], the part that doesn't meet the pattern.
Assuming that the only valid forms are words followed by one of :asc, :desc or nothing, you can do what you want by splitting the string, first on , and then on : and checking whether there are two values as a result of the last split and the second is not one of asc or desc:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.split(',').filter(v => v.split(':').length == 2 && ['asc', 'desc'].indexOf(v.split(':')[1]) == -1);
console.log(errs);
If you must use regex, you can split on , and then filter based on the value not matching ^\w+(:(?:asc|desc))?$:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.split(',').filter(v => !v.match(/^\w+(:(?:asc|desc))?$/));
console.log(errs);
If the format of the string is guaranteed to be \w+(:\w+)?(,\w+(:\w+)?)* you can simplify to this:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.match(/\w+:(?!(?:asc|desc)\b)\w+/g);
console.log(errs);
If you'd like a regex for this purpose, you can probably just require each match to start at a comma or at the start of the string.
/(^|\,)([a-z0-9_]+?(:asc|:desc)*?)(?=,|$)/gi
Also note that [:asc|:desc] was changed to (:asc|:desc) to avoid false positives such as:
foo5:aaa,foo6:d,foo7:,foo8|,et:c
Square brackets form a character class, so the original matched any single one of the listed characters.
Regarding the opposite, try something like:
/(^|\,)(?!([a-z0-9_]+?(:asc|:desc)*?)(?=,|$))[^,$]+/gi
which seems to do the job.
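One detail worth checking with the two patterns above: because (^|\,) consumes the delimiter, plain match() results can carry a leading comma. A minimal sketch that reads the second capture group for the valid items and strips the delimiter from the invalid ones; validItems and invalidItems are illustrative names.

```javascript
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';

const validPattern = /(^|\,)([a-z0-9_]+?(:asc|:desc)*?)(?=,|$)/gi;
const invalidPattern = /(^|\,)(?!([a-z0-9_]+?(:asc|:desc)*?)(?=,|$))[^,$]+/gi;

// Group 2 holds the item itself, without the delimiter matched by group 1.
const validItems = Array.from(str.matchAll(validPattern), m => m[2]);

// The whole match starts with the delimiter (if any), so drop a leading comma.
const invalidItems = Array.from(
  str.matchAll(invalidPattern),
  m => m[0].replace(/^,/, '')
);

console.log(validItems);   // [ 'foo_2:asc', 'foo2:desc', 'foo3' ]
console.log(invalidItems); // [ 'foo4:wrong' ]
```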
For the match I came up with
/(?<=(^|,))((\w+(?!:)|\w+(:asc|:desc)))(?=($|,))/g
Example: https://regex101.com/r/QLJeDV/3/
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".match(/(?<=(^|,))((\w+(?!:)|\w+(:asc|:desc)))(?=($|,))/g)
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Or even
/(?<=(^|,))\w+(:asc|:desc)?(?=($|,))/g
should work. Example: https://regex101.com/r/QLJeDV/6/
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".match(/(?<=(^|,))\w+(:asc|:desc)?(?=($|,))/g)
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Both patterns use lookahead and lookbehind.
For the "opposite", I don't know how to match something and then "negate" a later pattern, but only know how to negate the result of whether it is a complete match, so I had to split it. The "opposite":
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => !/^((\w+(?!:)|\w+(:asc|:desc)))$/.test(s))
[ 'foo4:wrong' ]
and the "original":
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => /^((\w+(?!:)|\w+(:asc|:desc)))$/.test(s))
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Or it can be simplified as:
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => !/^\w+(:asc|:desc)?$/.test(s))
[ 'foo4:wrong' ]
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => /^\w+(:asc|:desc)?$/.test(s))
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
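Since both the "original" and the "opposite" come from the same split and the same test, they can be collected in one pass. A minimal sketch, assuming the simplified pattern above is the full validity rule; partitionSortKeys, validKeys and invalidKeys are hypothetical names.

```javascript
// Hypothetical helper: split once, then sort every item into one of two lists.
function partitionSortKeys(input) {
  const pattern = /^\w+(?::(?:asc|desc))?$/;
  const result = { valid: [], invalid: [] };
  for (const item of input.split(',')) {
    (pattern.test(item) ? result.valid : result.invalid).push(item);
  }
  return result;
}

const { valid: validKeys, invalid: invalidKeys } =
  partitionSortKeys('foo_2:asc,foo2:desc,foo3,foo4:wrong');

console.log(validKeys);   // [ 'foo_2:asc', 'foo2:desc', 'foo3' ]
console.log(invalidKeys); // [ 'foo4:wrong' ]
```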

Regular expression to match environment

I'm using JavaScript and I'm looking for a regex to match the placeholder "environment", which will be a different value like "production" or "development" in "real" strings.
The regex should match "environment" in both strings:
https://company-application-environment.company.local
https://application-environment.company.local
I have tried:
[^-]+$ which matches environment.company.local
\.[^-]+$ which matches .company.local
How do I get environment?
You may use this regex based on a positive lookahead:
/[^.-]+(?=\.[^-]+$)/
Details:
[^.-]+: Match 1+ characters that are neither - nor .
(?=\.[^-]+$): Lookahead to assert that we have a dot and 1+ of non-hyphen characters till end.
Code:
const urls = [
"https://company-application-environment.company.local",
"https://application-environment.company.local",
"https://application-production.any.thing",
"https://foo-bar-baz-development.any.thing"
]
const regex = /[^.-]+(?=\.[^-]+$)/;
urls.forEach(url =>
console.log(url.match(regex)[0])
)
Not the fanciest reg exp, but gets the job done.
const urls = [
"https://company-application-environment.company.local",
"https://application-environment.company.local",
"https://a-b-c-d-e-f.foo.bar"
]
urls.forEach(url =>
console.log(url.match(/-([^-.]+)\./)[1])
)
As an alternative you might use URL, split on - and get the last item from the array. Then split on a dot and get the first item.
[
"https://company-application-environment.company.local",
"https://application-environment.company.local"
].forEach(s => {
let env = new URL(s).host.split('-').pop().split('.')[0];
console.log(env);
})
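The snippets above index straight into the match result, which throws when a URL has no environment part. A minimal sketch of a null-safe variant that combines URL parsing with the capture-group pattern shown earlier; extractEnvironment is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the environment segment or null.
function extractEnvironment(url) {
  // Restrict the search to the host name so paths and queries cannot interfere.
  const { hostname } = new URL(url);
  // A hyphen, then a run without '-' or '.', directly followed by a dot.
  const match = hostname.match(/-([^-.]+)\./);
  return match ? match[1] : null;
}

console.log(extractEnvironment('https://company-application-environment.company.local')); // environment
console.log(extractEnvironment('https://application-environment.company.local'));         // environment
console.log(extractEnvironment('https://localhost'));                                     // null
```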
Match for known environments
var tests = [
'https://company-application-development.company.local',
'https://application-production.company.local',
'https://appdev.company.local',
'https://appprod.company.local'
];
tests.forEach(test => {
var pattern = /(development|dev|production|prod)/g;
var match = test.match(pattern);
console.log(`environment = ${match}`);
});
In this case, the best way to match is to literally use the word you are looking for.
And if you need to match multiple values in the environment position, use regex alternation (|). See MDN.
(production|development)
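A plain alternation also matches inside longer words, for example prod inside product. The sketch below tightens it by requiring the name to sit directly before a dot; the third URL is a made-up example and knownEnvironment is an illustrative name.

```javascript
// Longer names come first so that 'development' is preferred over 'dev'.
// The lookahead requires the name to sit directly before a dot.
const knownEnvironment = /(development|production|dev|prod)(?=\.)/;

const tests = [
  'https://company-application-development.company.local',
  'https://appprod.company.local',
  // Hypothetical URL: 'prod' inside 'product' must not be reported.
  'https://product-catalog-development.company.local'
];

const environments = tests.map(test => {
  const match = test.match(knownEnvironment);
  return match ? match[1] : null;
});

console.log(environments); // [ 'development', 'prod', 'development' ]
```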

Multiple conditions regex JavaScript split

I'd like to split a string if:
It doesn't start with a quatre or a soixante AND
It doesn't end with a dix or a vingt
For example:
'deux-cent-quatre-vingt-trois'.split(/**/);
> ['deux', 'cent', 'quatre-vingt', 'trois' ]
I've had a few tries and failures, for example:
'deux-cent-quatre-vingt-dix-trois'
.split(/^(?![quatre|soixante]-[dix|vingt])(\w*)-(\w*)/);
> [ '', 'deux', 'cent', '-quatre-vingt-trois' ]
or:
'deux-cent-quatre-vingt-dix-trois'.split(/(?!quatre|soixante)-(?!vingt|dix)/);
> [ 'deux', 'cent', 'quatre-vingt', 'trois' ]
which works, but this does not:
'cent-vingt'.split(/(?!quatre|soixante)-(?!vingt|dix)/);
> [ 'cent-vingt' ]
I know using a matcher or a find would be so easy, but it would be great to do it in a single split...
You can do it like this:
var text = "deux-cent-quatre-vingt-trois";
// filter(Boolean) drops the empty strings that split leaves between matches
console.log(text.split(/(?:^|-)(quatre-vingt(?:-dix|s$)?|soixante-dix|[^-]+)/).filter(Boolean));
The idea is to add a capturing group, because the content of a capturing group is added to the split result.
The capturing group lists the particular cases first and the most general case last, described with [^-]+ (anything that is not a -).
Note: since quatre-vingt is written with an s when it is not followed by another number, I added s$ as a possibility.
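The capture-group split can be checked against the other inputs from the question. A minimal sketch that wraps it in a hypothetical helper, splitFrenchNumber, and filters out the empty strings that split() produces between consecutive matches.

```javascript
// Hypothetical helper around the capture-group split.
function splitFrenchNumber(text) {
  return text
    .split(/(?:^|-)(quatre-vingt(?:-dix|s$)?|soixante-dix|[^-]+)/)
    // The pieces between two consecutive matches are empty strings; drop them.
    .filter(Boolean);
}

console.log(splitFrenchNumber('deux-cent-quatre-vingt-trois')); // [ 'deux', 'cent', 'quatre-vingt', 'trois' ]
console.log(splitFrenchNumber('cent-vingt'));                   // [ 'cent', 'vingt' ]
console.log(splitFrenchNumber('soixante-dix-huit'));            // [ 'soixante-dix', 'huit' ]
console.log(splitFrenchNumber('quatre-vingts'));                // [ 'quatre-vingts' ]
```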
