javascript regexp split including delimiter


I want to split '9088{2}12{1}729' into [ "9088", "{2}12", "{1}729" ]
or even more useful to me: [ "9088", "2-12", "1-729" ]
tried:
'9088{2}12{1}729'.split(/\{[0-9]+\}/); => ["9088", "12", "729"]
also tried:
'9088{2}12{1}729'.match(/\{[0-9]+\}/); => ["{2}"]
I know it probably involves some other regexp that splits while keeping the delimiters.
Tried it in php, I guess you can do it in one line also.
preg_split( '/{/', preg_replace( '/}/', '-', "9088{2}12{1}729" ) )
Array ( [0] => 9088 [1] => 2-12 [2] => 1-729 )
Just have to wrap the replace function with split to get the preference order correct.
I think I like js more :)

even more useful to me: [ "9088", "2-12", "1-729" ]
It can be done using simple tricks!
"9088{2}12{1}729".replace(/\}/g,'-').split(/\{/g)
// ["9088", "2-12", "1-729"]

You can use a simple zero-width positive lookahead, /(?=\{)/:
'9088{2}12{1}729'.split(/(?=\{)/); // => ["9088","{2}12","{1}729"]
The "zero-width" part means that the actual matched text is the empty string so the split throws away nothing, and the lookahead means it matches just before the contained pattern, so /(?=\{)/ matches the empty strings between characters where indicated by an arrow:
9 0 8 8 { 2 } 1 2 { 1 } 7 2 9
       ↑         ↑
You can then use Array.prototype.map to convert from {1}2 form to 1-2 form.
'9088{2}12{1}729'.split(/(?=\{)/)
.map(function (x) { return x.replace('{', '').replace('}', '-'); });
yields
["9088","2-12","1-729"]
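Another option, sketched here under the assumption that every delimiter has the {digits} form, is to put a capture group inside the separator: split() then keeps the captured text in the result array, so the pieces can be re-joined in pairs. The helper name splitKeepingDelimiter is hypothetical and not part of the answers above.

```javascript
// Hypothetical helper: split on {n} and keep n by capturing it in the separator.
function splitKeepingDelimiter(text) {
  // With a capture group, split() returns [head, n1, chunk1, n2, chunk2, ...].
  const parts = text.split(/\{(\d+)\}/);
  const result = [parts[0]];
  // Walk the (captured number, following chunk) pairs.
  for (let i = 1; i < parts.length; i += 2) {
    result.push(`${parts[i]}-${parts[i + 1]}`);
  }
  return result;
}

console.log(splitKeepingDelimiter("9088{2}12{1}729"));
// => ["9088", "2-12", "1-729"]
```

This avoids the second pass with map, at the cost of relying on the alternating layout that split() produces when the separator contains exactly one capture group.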

Related

Regex to accept only 5 numbers and then a dash or a letter on typescript

I am dealing with an issue with Regex.
I have an input which has maxLength 10.
So far, the first part of the value can be digits, for example 12345, but then a dash is required, and after it you can write letters or digits up to maxLength=10. For example, 12345-a121 is allowed, and that works with the current regex.
But I want letters or a dash to be allowed after the 5 digits; at the moment this regex only allows a dash after the 5 digits.
For example, 12345a or 12345- should be allowed.
This is the actual regex that I am using.
Valid/Matches: 12345a235, 123a, 12345-aa1, 12345a, 12345-a.
Not Valid/Does not matches: -11, 111111, aaaa,
(?=^[^W_]{1,5}-[^W_]{1,8}$)^.{1,10}$|^[^W_]{1,5}$
I am debugging on the regex101.com but I am not finding a way for that to allow.
12345a for example
This is the condition to check if it matches or not.
if (!this.value.toString().match('^\d{1,5}(?!\d+)[-\p{L}\d]+$') && this.value.toString()) {
return ValidationInfo.errorCode("You need to do something");
Thank you for the help

Edit since the patterns of the first approach can be simplified and also were missing the limitations of the ending sequence's length.
for matching only with Letter unicode property escapes
/^\d{1,5}[-\p{L}][-\p{L}\d]{0,9}$/u
matching and capturing with Letter unicode property escapes
/^(?<digitsOnly>\p{N}{1,5})(?<miscellaneous>[-\p{L}][-\p{L}\p{N}]{0,9})$/u
Example code ...
const multilineSample = `12345a235
123a
12345-aa1
12345a
12345-a
12-a235dkfsf
12-a235dkfsfs
123a-dssava-y
123a-dssava-1a
12345-aa1--asd-
12345-aa1--asd-s
-11
111111
aaaa`;
// see ... [https://regex101.com/r/zPkcwv/3]
const regXJustMatch = /^\d{1,5}[-\p{L}][-\p{L}\d]{0,9}$/gmu;
// see ... [https://regex101.com/r/zPkcwv/4]
const regXNamedGroups =
/^(?<digitsOnly>\p{N}{1,5})(?<miscellaneous>[-\p{L}][-\p{L}\p{N}]{0,9})$/gmu;
console.log(
'matches only ...',
multilineSample.match(regXJustMatch)
);
console.log(
'matches and captures ...', [
...multilineSample.matchAll(regXNamedGroups)
]
.map(({ 0: match, groups }) => ({ match, ...groups }))
);
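For validating a single input value rather than a multi-line sample, a minimal sketch could look like the following. The names codePattern and isValidCode are assumptions made for illustration; the pattern itself is the "match only" variant from above with the g and m flags dropped, because a global regex keeps state in lastIndex between test() calls.

```javascript
// Same pattern as the "match only" variant, without the g and m flags,
// so repeated test() calls do not depend on lastIndex.
const codePattern = /^\d{1,5}[-\p{L}][-\p{L}\d]{0,9}$/u;

// Hypothetical helper: true when the whole value fits the pattern.
function isValidCode(value) {
  return codePattern.test(String(value));
}

for (const sample of ["12345a235", "123a", "12345-a", "-11", "111111", "aaaa"]) {
  console.log(sample, isValidCode(sample));
}
// 12345a235 true, 123a true, 12345-a true, -11 false, 111111 false, aaaa false
```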
1st approach
straightforward and plain ... /^\d{1,5}(?!\d+)[-\p{L}\d]+$/u
with named capture groups ... /^(?<digitsOnly>\p{N}{1,5}(?!\p{N}+))(?<miscellaneous>[-\p{L}\p{N}]+)$/u
For both variants it is obvious to start with ...
a digit sequence of at least 1 and up to 5 digits ...
traditional ... ^\d{1,5}
unicode property escapes ... ^\p{N}{1,5}
It's also clear, one wants to end with a character sequence of any of dash and/or word. Due to having to exclude _ one can not just use the \w escape for letters and digits since \w covers/includes _ as well. But one could use unicode property escapes, thus ...
a regex covering the end of a line with a valid character class is ...
already mixed ... [-\p{L}\d]+$
mostly unicode escapes ... [-\p{L}\p{N}]+$
A combined regex like ... /^\d{1,5}[-\p{L}\d]+$/u ... almost covers the requirements but fails for 111111, which of course gets matched even though it shouldn't according to the requirements.
A negative lookahead ... (?!\d+) respectively (?!\p{N}+) ... which follows the starting digit sequence does prevent any other (terminating) digit-only sequence, thus 123456 does not get matched anymore.
Example code ...
const multilineSample = `12345a235
123a
12345-aa1
12345a
12345-a
-11
111111
aaaa`;
// see ... [https://regex101.com/r/zPkcwv/1]
const regXJustMatch = /^\d{1,5}(?!\d+)[-\p{L}\d]+$/gmu;
// see ... [https://regex101.com/r/zPkcwv/2]
const regXNamedGroups =
/^(?<digitsOnly>\p{N}{1,5}(?!\p{N}+))(?<miscellaneous>[-\p{L}\p{N}]+)$/gmu;
console.log(
'matches only ...',
multilineSample.match(regXJustMatch)
);
console.log(
'matches and captures ...', [
...multilineSample.matchAll(regXNamedGroups)
]
.map(({ 0: match, groups }) => ({ match, ...groups }))
);

How do you access the groups of match/matchAll like an array?

Here's what I would like to be able to do:
function convertVersionToNumber(line) {
const groups = line.matchAll(/^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
return parseInt(groups[1] + groups[2] + groups[3]);
}
convertVersionToNumber("# 1.03.00")
This doesn't work because groups is an IterableIterator<RegExpMatchArray>, not an array. Array.from doesn't seem to turn it into an array of groups either. Is there an easy way (ideally something that can fit on a single line) that can convert groups into an array?
The API of that IterableIterator<RegExpMatchArray> is a little inconvenient, and I don't know how to skip the first element in a for...of. I mean, I do know how to use both of these, it just seems like it's going to add 4+ lines so I'd like to know if there is a more concise way.
I am using typescript, so if it has any syntactic sugar to do this, I'd be happy to use that.

1) matchAll returns an iterator (a RegExp String Iterator), not an array.
Spreading that iterator gives you all matches. Since there is only one match here, the resulting array contains a single element:
[ '# 1.03.00', '1', '03', '00', index: 0, input: '# 1.03.00', groups: undefined ]
So we use the spread operator to collect all the matches into an array:
[...result]
function convertVersionToNumber(line) {
const result = line.matchAll(/^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
const groups = [...result][0];
return parseInt(groups[1] + groups[2] + groups[3]);
}
console.log(convertVersionToNumber("# 1.03.00"));
Since the regex /^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/ is anchored with ^, it can match at most once.
2) If there are multiple matches, then you can spread the results into an array and use for...of to loop over the matches
function convertVersionToNumber(line) {
const iterator = line.matchAll(/# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
const results = [...iterator];
for (let arr of results) {
const [match, g1, g2, g3] = arr;
console.log(match, g1, g2, g3);
}
}
convertVersionToNumber("# 1.03.00 # 1.03.00");
Alternate solution: You can also get the same result using simple match also
function convertVersionToNumber(line) {
const result = line.match(/\d/g);
return +result.join("");
}
console.log(convertVersionToNumber("# 1.03.00"));
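As a one-line way to get "an array of groups" for every match, a sketch using the mapping argument of Array.from is shown below. The helper name captureGroups is hypothetical, and the dots in the pattern are escaped here, which differs from the regex in the question.

```javascript
// Hypothetical helper: one array of capture groups per match.
// matchAll requires the g flag on the pattern, otherwise it throws a TypeError.
function captureGroups(text, pattern) {
  // m[0] is the full match, so slice(1) keeps only the groups.
  return Array.from(text.matchAll(pattern), (m) => m.slice(1));
}

const groups = captureGroups("# 1.03.00 # 2.10.07", /# (\d)\.(\d{2})\.(\d{2})/g);
console.log(groups);
// => [["1", "03", "00"], ["2", "10", "07"]]
```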

You do not need .matchAll in this concrete case. You simply want to match a string in a specific format and re-format it by only keeping the three captured substrings.
You may do it with .replace:
function convertVersionToNumber(line) {
return parseInt(line.replace(/^# (\d)\.(\d{2})\.(\d{2})[\s\S]*/, '$1$2$3'));
}
console.log( convertVersionToNumber("# 1.03.00") );
You may check if the string before replacing is equal to the new string if you need to check if there was a match at all.
Note you need to escape dots to match them as literal chars.
The ^# (\d)\.(\d{2})\.(\d{2})[\s\S]* pattern matches
^ - start of string
#  - a # followed by a space
(\d) - Group 1: a digit
\. - a dot
(\d{2}) - Group 2: two digits
\. - a dot
(\d{2}) - Group 3: two digits
[\s\S]* - the rest of the string (zero or more chars, as many as possible).
The $1$2$3 replacement pattern is the concatenated Group 1, 2 and 3 values.
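If only a single match is expected, a non-global String.prototype.match combined with array destructuring is another compact option. The sketch below assumes that returning NaN for a line that does not match is acceptable; that choice is not taken from the question.

```javascript
function convertVersionToNumber(line) {
  // Without the g flag, match() returns [fullMatch, group1, group2, ...] or null.
  const match = line.match(/^# (\d)\.(\d{2})\.(\d{2})/);
  if (match === null) return NaN; // assumption: signal "no version found" with NaN
  // The leading comma skips the full match at index 0.
  const [, major, minor, patch] = match;
  return parseInt(major + minor + patch, 10);
}

console.log(convertVersionToNumber("# 1.03.00")); // => 10300
```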

Complex assignments with comma separator

I have a serie of string that will be pass to a function, and that function must return an array. The string is a serie of vars to be export on bash, and some of that vars may be a json. This is the possible list of string as example and the expected result:
string | return | desc
ONE=one | [ "ONE=one" ] | Array of one element
ONE="{}" | [ 'ONE="{}"' ] | Array of one element with quoted value.
ONE='{}' | [ "ONE='{}'" ] | Array of one element with simple quoted value
ONE='{attr: \"value\"}' | [ "ONE='{attr: \\"value\\"}'" ] | Array of one element
ONE='{attr1: \"value\", attr2:\"value attr 2\"}' | [ "ONE='{attr1: \\"value\\", attr2:\\"value attr 2\\"}'" ] | Array of one element and json inside with multiple values
ONE=one,TWO=two | [ "ONE=one", "TWO=two" ] | Array of two elements
ONE=one, TWO=two | [ "ONE=one", "TWO=two" ] | Array of two elements (Ignoring space after comma)
ONE='{}', TWO=two | [ "ONE='{}'", "TWO=two" ] | Array of two elements, one quoted
ONE='{}',TWO='{}',THREE='{}' | [ "ONE='{}'", "TWO='{}'", "THREE='{}'" ] | Array of three elements
ONE='{}', TWO=two, THREE=three | [ "ONE='{}'", "TWO=two", "THREE=three" ] | Array of three elements, one quoted
How can i get the correct regex or process to get the expected result on each one?
This is what i have:
function parseVars(envString) {
let matches = envString.matchAll(/([A-Za-z][A-Za-z0-9]+=(["']?)((?:\\\2|(?:(?!\2)).)*)(\2))(\,\s?)?/g);
let ret = [];
for (const match of matches) {
ret.push(match[1].trim())
}
return ret;
}
And tests:
describe("parseVars function", () => {
it("should be one simple variable", () => {
expect(parseVars("ONE=one")).toMatchObject([
"ONE=one"
]);
});
it("should be two simple variable", () => {
expect(parseVars("ONE=one,TWO=two")).toMatchObject([
"ONE=one",
"TWO=two"
]);
});
it("should be two simple variable (Trim space)", () => {
expect(parseVars("ONE=one, TWO=two")).toMatchObject([
"ONE=one",
"TWO=two"
]);
});
it("should be simple json", () => {
expect(parseVars("ONE='{}'")).toMatchObject([
"ONE='{}'",
]);
});
it("should be three simple json", () => {
expect(parseVars("ONE='{}',TWO='{}',THREE='{}'")).toMatchObject([
"ONE='{}'",
"TWO='{}'",
"THREE='{}'",
]);
});
it("should be three simple json (Simple quote)", () => {
expect(parseVars("ONE='{}'")).toMatchObject([
"ONE='{}'",
]);
});
it("should be three simple json with attribute", () => {
expect(parseVars("ONE='{attr: \"value\"}'")).toMatchObject([
"ONE='{attr: \"value\"}'",
]);
});
it("should be complex json with multiple attributes", () => {
expect(parseVars("ONE='{attr1: \"value\", attr2:\"value attr 2\"}'")).toMatchObject([
"ONE='{attr1: \"value\", attr2:\"value attr 2\"}'",
]);
});
it("should be one json and one simple var", () => {
expect(parseVars("ONE='{}', TWO=two")).toMatchObject([
"ONE='{}'",
"TWO=two",
]);
});
it("should be one json and two simple vars", () => {
expect(parseVars("ONE='{}', TWO=two, THREE=three")).toMatchObject([
"ONE='{}'",
"TWO=two",
"THREE=three",
]);
});
});
And the results:
parseVars function
✕ should be one simple variable (4ms)
✕ should be two simple variable (1ms)
✕ should be two simple variable (Trim space)
✓ should be simple json (1ms)
✓ should be three simple json
✓ should be three simple json (Simple quote)
✓ should be three simple json with attribute
✓ should be complex json with multiple attributes
✕ should be one json and one simple var (1ms)
✕ should be one json and two simple vars (1ms)

The issue with your regex is you're only testing the quote enclosures like ONE='{attr: \"value\"}', but not allowing ONE=one.
When you use a capture group with an optional match, (['"]?), and there is no quote, the group still participates and captures an empty string. When you combine it with a negative lookahead (?!\2), everything fails, because an empty string matches in front of any character.
You just need to combine the quote enclosure test with |[^,]*, so it works for both scenarios.
Here's a simplified version of your concept:
/(?=\b[a-z])\w+=(?:(['"])(?:\\\1|(?!\1).)*\1|[^,]*)/gi
Explanation
(?=\b[a-z])\w+ any word characters, but must start with an alphabetic character
= equal sign
(?: non-capturing group
(['"])(?:\\\1|(?!\1).)*\1 a quote enclosure
|[^,]* or any run of characters that contains no comma
)
See the proof
const texts = [
`ONE=one`,
`ONE="{}"`,
`ONE='{}'`,
`ONE='{attr: \"value\"}'`,
`ONE='{attr1: \"value\", attr2:\"value attr 2\"}'`,
`ONE=one,TWO=two`,
`ONE=one, TWO=two`,
`ONE='{}', TWO=two`,
`ONE='{}',TWO='{}',THREE='{}'`,
`ONE='{}', TWO=two, THREE=three`
];
const regex = /(?=\b[a-z])\w+=(?:(['"])(?:\\\1|(?!\1).)*\1|[^,]*)/gi;
texts.forEach(text => {
console.log(text, '=>', text.match(regex));
})
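Plugged back into the function from the question, the pattern could be used roughly as follows. This is a sketch, not the asker's final code: the constant name ASSIGNMENT is made up, and the trim() call is an added assumption to drop whitespace that an unquoted value may carry before a comma.

```javascript
// Pattern from the answer above: NAME=value, where value is either a quoted
// string (honouring backslash-escaped quotes) or a run without commas.
const ASSIGNMENT = /(?=\b[a-z])\w+=(?:(['"])(?:\\\1|(?!\1).)*\1|[^,]*)/gi;

function parseVars(envString) {
  // match() with the g flag returns all full matches, or null when there are none.
  const matches = envString.match(ASSIGNMENT) ?? [];
  return matches.map((entry) => entry.trim());
}

console.log(parseVars("ONE='{}', TWO=two, THREE=three"));
// => ["ONE='{}'", "TWO=two", "THREE=three"]
```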

You might also start the match with a char a-z followed by optional word chars. Then match either from an opening till closing " or ', or match all except a whitespace or comma without using lookarounds or capture groups.
Using a case insensitive match using /i
\b[a-z]\w*=(?:"[^"\\]*(?:\\.[^"\\]*)*"|\'[^\'\\]*(?:\\.[^\'\\]*)*\'|[^\s,]+)
The pattern matches:
\b A word boundary to prevent a partial match
[a-z]\w*= Match a char a-z, optional word chars and =
(?: Non capture group
"[^"\\]*(?:\\.[^"\\]*)*" Match from " till " not stopping at an escaped one
| Or
\'[^\'\\]*(?:\\.[^\'\\]*)*\' Match from ' till ' not stopping at an escaped one
| Or
[^\s,]+ Match 1+ times any char except a whitespace char or ,
) Close non capture group
See a Regex demo
const regex = /\b[a-z]\w*=(?:"[^"\\]*(?:\\.[^"\\]*)*"|\'[^\'\\]*(?:\\.[^\'\\]*)*\'|[^\s,]+)/gi;
[
`ONE=one`,
`ONE="{}"`,
`ONE='{}'`,
`ONE='{attr: \"value\"}'`,
`ONE="{attr: \"value\"}"`,
`ONE='{attr1: \"value\", attr2:\"value attr 2\"}'`,
`ONE=one,TWO=two`,
`ONE=one, TWO=two`,
`ONE='{}', TWO=two`,
`ONE='{}',TWO='{}',THREE='{}'`,
`ONE='{}', TWO=two, THREE=three`
].forEach(s => console.log(s.match(regex)))

regex exclude matches that don't meet one of two patterns separated by delimiter

In Javascript using string.match():
I have a string like: foo_2:asc,foo2:desc,foo3,foo4:wrong
the matches should look like ["foo_2:asc", "foo2:desc", "foo3"]
but instead the best I can get it to so far is a match returning ["foo_2:asc", "foo2:desc", "foo3", "wrong"]
the regex that I'm using currently for the above wrong match is: /([a-z0-9_]+?[:asc|:desc]*?)(?=,|$)/gi
I also need a regex that will return the opposite, i.e. find a match for all patterns between the delimiter that doesn't match the pattern rules of thing_1:asc, thing_1:desc, or thing_1 i.e. this would be used to validate the string, while the other would be used to gather the values (i.e. instead of splitting the string manually). So the result of the original would be ["foo4:wrong"] as the part of that string that doesn't meet the pattern.

Assuming that the only valid forms are words followed by one of :asc, :desc or nothing, you can do what you want by splitting the string, first on , and then on : and checking whether there are two values as a result of the last split and the second is not one of asc or desc:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.split(',').filter(v => v.split(':').length == 2 && ['asc', 'desc'].indexOf(v.split(':')[1]) == -1);
console.log(errs);
If you must use regex, you can split on , and then filter based on the value not matching ^\w+(:(?:asc|desc))?$:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.split(',').filter(v => !v.match(/^\w+(:(?:asc|desc))?$/));
console.log(errs);
If the format of the string is guaranteed to be \w+(:\w+)?(,\w+(:\w+)?)* you can simplify to this:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.match(/\w+:(?!(?:asc|desc)\b)\w+/g);
console.log(errs);

If you'd like a regex for this purpose, you can probably just require each match to start at a comma or at the start of the string.
/(^|\,)([a-z0-9_]+?(:asc|:desc)*?)(?=,|$)/gi
Also note that [:asc|:desc] was changed to (:asc|:desc), to avoid false positives like:
foo5:aaa,foo6:d,foo7:,foo8|,et:c
because a character class in square brackets just matches any single one of the listed characters.
Regarding opposite, try something like:
/(^|\,)(?!([a-z0-9_]+?(:asc|:desc)*?)(?=,|$))[^,$]+/gi
seems to do the job.

For the match I came up with
/(?<=(^|,))((\w+(?!:)|\w+(:asc|:desc)))(?=($|,))/g
Example: https://regex101.com/r/QLJeDV/3/
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".match(/(?<=(^|,))((\w+(?!:)|\w+(:asc|:desc)))(?=($|,))/g)
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Or even
/(?<=(^|,))\w+(:asc|:desc)?(?=($|,))/g
should work. Example: https://regex101.com/r/QLJeDV/6/
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".match(/(?<=(^|,))\w+(:asc|:desc)?(?=($|,))/g)
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
They are using lookahead and lookbehind.
For the "opposite", I don't know how to match something and then "negate" a later pattern, but only know how to negate the result of whether it is a complete match, so I had to split it. The "opposite":
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => !/^((\w+(?!:)|\w+(:asc|:desc)))$/.test(s))
[ 'foo4:wrong' ]
and the "original":
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => /^((\w+(?!:)|\w+(:asc|:desc)))$/.test(s))
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Or it can be simplified as:
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => !/^\w+(:asc|:desc)?$/.test(s))
[ 'foo4:wrong' ]
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => /^\w+(:asc|:desc)?$/.test(s))
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
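Since both the "original" and the "opposite" lists come from the same test, they can also be collected in a single pass. The sketch below assumes the simplified pattern from the last answer; the function name partitionSortKeys is hypothetical, and matching is case-sensitive here, unlike the /gi regex in the question.

```javascript
// Hypothetical helper: split on commas and sort each item into valid or invalid.
function partitionSortKeys(input) {
  // A word, optionally followed by :asc or :desc, and nothing else.
  const pattern = /^\w+(?::(?:asc|desc))?$/;
  const valid = [];
  const invalid = [];
  for (const part of input.split(",")) {
    (pattern.test(part) ? valid : invalid).push(part);
  }
  return { valid, invalid };
}

console.log(partitionSortKeys("foo_2:asc,foo2:desc,foo3,foo4:wrong"));
// => { valid: ["foo_2:asc", "foo2:desc", "foo3"], invalid: ["foo4:wrong"] }
```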

Splitting a string into multiple arrays based on length (javascript)

I have a string which can be of variable length; for this question I will keep it simple and assume a small subset of items in the list.
The objective is to split the string to create multiple string values where the length is greater than 11; however, I need to preserve the comma-separated values (e.g. I can't just split every 11 characters, I must split at the last comma before the 11th character):
test1,test2,test3,test4,test5
For arguments sake, lets propose the max length of the string can be 10 characters, so in this example the above would be converted to three separate strings:
test1,test2
test3,test4
test5
To clarify there is a maximum allowed character limit of 11 characters per split value, but we want to use these as efficiently as possible.

If you want to treat 10 as the minimum length and continue up to the next , or the end of the string, you can use:
(.{10,}?)(?:,|$)
const input = 'test1,test2,test3,test4,test5';
console.log(
input.split(/(.{10,}?)(?:,|$)/g).filter(Boolean)
);
Update:- Since you want the value in between a range you can use this
(.{1,22})(?:,|$)

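I am not sure if you are looking for something like this, but this code gives the output according to your example:
// Try editing msg
const msg = 'test1,test2,test3,test4,test5';
const msgs = msg.split(',');
const final = [];
let str = '';
msgs.forEach(m => {
  // Append the next item, adding a comma only between items
  str += str === '' ? m : ',' + m;
  // Once the chunk is longer than 10 characters, emit it and start a new one
  if (str.length > 10) {
    final.push(str);
    str = '';
  }
});
// Emit the remainder, if any
if (str !== '') final.push(str);
console.log(final);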
OUTPUT:
[
"test1,test2" ,
"test3,test4" ,
"test5"
]
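The loop above emits a chunk only after it has grown past 10 characters, so with longer items a chunk can exceed the limit. A greedy variant that checks the length before appending might look like the sketch below. It assumes the 11-character maximum stated in the question, and that an item longer than the limit is kept as its own oversized chunk; the function name packByLength is hypothetical.

```javascript
// Hypothetical helper: pack comma-separated items into chunks of at most maxLength.
function packByLength(input, maxLength) {
  const chunks = [];
  let current = "";
  for (const item of input.split(",")) {
    const candidate = current === "" ? item : current + "," + item;
    if (candidate.length <= maxLength || current === "") {
      // Either the item still fits, or it starts a new chunk on its own.
      current = candidate;
    } else {
      // Adding the item would exceed the limit: close the chunk first.
      chunks.push(current);
      current = item;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

console.log(packByLength("test1,test2,test3,test4,test5", 11));
// => ["test1,test2", "test3,test4", "test5"]
```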

Use a regular expression to match a non-comma, followed by up to 8 characters, followed by another non-comma, and lookahead for either a comma or the end of the string:
const input = 'test1,test2,test3,test4,test5';
console.log(
input.match(/[^,].{0,8}[^,](?=,|$)/g)
);
Because in the given input, test1,test2 (and any other combination of multiple items) will be length 11 or more, they won't be included.
If you want to allow length 11 as well, then change {0,8} to {0,9}:
const input = 'test1,test2,test3,test4,test5';
console.log(
input.match(/[^,].{0,9}[^,](?=,|$)/g)
);
If there might be items of length 1, make everything matched after the first non-comma optional in a non-capturing group:
const input = 'test1,test2,test3,test4,t';
console.log(
input.match(/[^,](?:.{0,9}[^,])?(?=,|$)/g)
);
