JS regexp: match repeated pattern


I wonder why these regexps aren't equivalent:
/(a)(a)(a)/.exec ("aaa").toString () => "aaa,a,a,a" , as expected
/(a){3}/.exec ("aaa").toString () => "aaa,a" :(
/(a)*/.exec ("aaa").toString () => "aaa,a" :(
How must the last two be reformulated so that they behave like the first? The important thing is that I want arbitrary multiples matched and remembered.
The following line
/([abc])*/.exec ("abc").toString () => "abc,c"
suggests that only one character is saved per parenthesis - the last match.

You probably are looking for this:
var re = /([abc])/g,
matches = [],
input = "abc",
match;
while (match = re.exec(input)) matches.push(match[1]);
console.log(matches);
//=> ["a", "b", "c"]
Remember that a repeated capturing group gives you only the last matched iteration, not all of them.

RegExBuddy describes it very well:
Note: you repeated the capturing group itself. The group will capture only the last iteration.
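On engines that support String.prototype.matchAll (ES2020), the same idea can be sketched without the manual exec loop. This is only an illustration, and the variable names are made up here:

```javascript
// A repeated capturing group such as /([abc])*/ keeps only its last iteration,
// so we match one iteration at a time with the g flag and collect each capture.
const input = "abc";
const perIteration = [...input.matchAll(/([abc])/g)].map((m) => m[1]);
console.log(perIteration); // ["a", "b", "c"]

// For comparison: the repeated group remembers only the final iteration.
const repeated = /([abc])*/.exec(input);
console.log(repeated[1]); // "c"
```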

Related

How to add access property and wrap the arguments with curly brackets with regex in js?

I have an array of strings. Each string might contain function calls.
I made this regular expression to match the word between $ctrl. and (, as long as it is immediately after $ctrl. and it will also match the parameters inside the parentheses if they exist.
/(\$ctrl\.)(\w+)(\(([^)]+)\))?/g;
Once there is a match, I add emit before the function call ($ctrl.foo() becomes $ctrl.foo.emit()), and if there are arguments I wrap them in curly brackets ($ctrl.foo(account, user) becomes $ctrl.foo.emit({ account, user })).
The problem is this regex doesn't work for some cases.
const inputs = [
'$ctrl.foo(account)',
'$ctrl.foo(account, bla)',
'$ctrl.foo(account, bla); $ctrl.some()',
'$ctrl.foo(account, bla);$ctrl.some(you)',
'$ctrl.foo.some(account, bla);$ctrl.fn.some(you)',
'$ctrl.gog',
];
const regex = /(\$ctrl\.)(\w+)(\(([^)]+)\))?/g;
inputs.forEach((input) => {
let output = input.replace(regex, '$1$2.emit({$4})');
console.log(output);
});
The results:
$ctrl.foo.emit({account})
$ctrl.foo.emit({account, bla})
$ctrl.foo.emit({account, bla}); $ctrl.some.emit({})()
$ctrl.foo.emit({account, bla});$ctrl.some.emit({you})
$ctrl.foo.emit({}).some(account, bla);$ctrl.fn.emit({}).some(you)
$ctrl.gog.emit({})
The first two results are excellent. The regex adds emit and wraps the arguments with {..}.
But the regex does not work if there are no arguments, or if there is another property access before the function call: $ctrl.foo.bar() (this case should not match).
What is missing in my regex to get those results?
$ctrl.foo.emit({account})
$ctrl.foo.emit({account, bla})
$ctrl.foo.emit({account, bla}); $ctrl.some.emit()
$ctrl.foo.emit({account, bla});$ctrl.some.emit({you})
$ctrl.foo.some(account, bla);$ctrl.fn.some(you)
$ctrl.gog

Maybe this modified regexp works better for you?
const inputs = [
'$ctrl.foo(account)',
'$ctrl.foo(account, bla)',
'$ctrl.foo(account, bla); $ctrl.some()',
'$ctrl.foo(account, bla);$ctrl.some(you)',
'$ctrl.foo.some(account, bla);$ctrl.fn.some(you)',
'$ctrl.gog',
];
const regex = /(\$ctrl\.)(\w+)(\(([^)]*)\))/g;
inputs.forEach((input) => {
let output = input.replace(regex, '$1$2.emit({$4})');
console.log(output);
});
I changed two quantifiers in the regexp:
[^)]+ to [^)]*: this also allows zero-length matches, so empty parentheses are accepted.
)?/g to )/g: this makes the \( ... \) group at the end of the pattern compulsory instead of optional.
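Note that with this pattern a call without arguments, such as $ctrl.some(), becomes $ctrl.some.emit({}) rather than the $ctrl.some.emit() the question asks for. One possible way to cover that case, sketched here under the assumption that a replacer function is acceptable (the name toEmit is hypothetical), is:

```javascript
// Hypothetical helper: rewrites "$ctrl.name(args)" to "$ctrl.name.emit({args})"
// and "$ctrl.name()" to "$ctrl.name.emit()". Calls on nested properties such as
// "$ctrl.foo.some(...)" are left alone because \w+ must be followed directly by "(".
function toEmit(source) {
  return source.replace(/(\$ctrl\.)(\w+)\(([^)]*)\)/g, (whole, prefix, name, args) => {
    const trimmed = args.trim();
    return trimmed === ""
      ? `${prefix}${name}.emit()`
      : `${prefix}${name}.emit({${trimmed}})`;
  });
}

console.log(toEmit("$ctrl.foo(account, bla); $ctrl.some()"));
console.log(toEmit("$ctrl.foo.some(account, bla);$ctrl.fn.some(you)"));
console.log(toEmit("$ctrl.gog"));
```

A replacer function is used instead of a replacement string because a plain '$1$2.emit({$4})' template cannot drop the braces when the argument list is empty.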

Regular expression to match environment

I'm using JavaScript and I'm looking for a regex to match the placeholder "environment", which will be a different value like "production" or "development" in "real" strings.
The regex should match "environment" in both strings:
https://company-application-environment.company.local
https://application-environment.company.local
I have tried:
[^-]+$ which matches environment.company.local
\.[^-]+$ which matches .company.local
How do I get environment?

You may use this regex based on a positive lookahead:
/[^.-]+(?=\.[^-]+$)/
Details:
[^.-]+: Match 1+ of any char that is not - and .
(?=\.[^-]+$): Lookahead to assert that we have a dot and 1+ of non-hyphen characters till end.
Code:
const urls = [
"https://company-application-environment.company.local",
"https://application-environment.company.local",
"https://application-production.any.thing",
"https://foo-bar-baz-development.any.thing"
]
const regex = /[^.-]+(?=\.[^-]+$)/;
urls.forEach(url =>
console.log(url.match(regex)[0])
)

Not the fanciest reg exp, but gets the job done.
const urls = [
"https://company-application-environment.company.local",
"https://application-environment.company.local",
"https://a-b-c-d-e-f.foo.bar"
]
urls.forEach(url =>
console.log(url.match(/-([^-.]+)\./)[1])
)

As an alternative you might use URL, split on - and get the last item from the array. Then split on a dot and get the first item.
[
"https://company-application-environment.company.local",
"https://application-environment.company.local"
].forEach(s => {
let env = new URL(s).host.split('-').pop().split('.')[0];
console.log(env);
})

Match for known environments
var tests = [
'https://company-application-development.company.local',
'https://application-production.company.local',
'https://appdev.company.local',
'https://appprod.company.local'
];
tests.forEach(test => {
var pattern = /(development|dev|production|prod)/g;
var match = test.match(pattern);
console.log(`environment = ${match}`);
});
In this case, the best way to match is to literally use the word you are looking for.
And if you need to match multiple values in the environment position, use regex alternation (the | operator). See MDN.
(production|development)
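If short forms such as dev could also appear elsewhere in the URL, a slightly stricter sketch is to require the alternation to sit directly before a dot. The helper name findEnvironment and the exact list of names are assumptions for illustration:

```javascript
// Hypothetical helper: returns the first listed environment name that is
// directly followed by a dot, or null when none is found.
function findEnvironment(url) {
  const match = /(development|dev|production|prod)(?=\.)/.exec(url);
  return match ? match[1] : null;
}

console.log(findEnvironment('https://company-application-development.company.local'));
console.log(findEnvironment('https://appprod.company.local'));
console.log(findEnvironment('https://example.com'));
```

The longer names are listed before their short forms so that "development" is preferred over "dev" when both could match at the same position.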

Extract word between '=' and '('

I have the following string
234234=AWORDHERE('sdf.'aa')
where I need to extract AWORDHERE.
Sometimes there can be space in between.
234234= AWORDHERE('sdf.'aa')
Can I do this with a regular expression?
Or should I do it manually by finding indexes?
The datasets are huge, so it's important to do it as fast as possible.

Try this regex:
\d+=\s?(\w+)\(
In JavaScript it would look like this:
var myString = "234234=AWORDHERE('sdf.'aa')";// or 234234= AWORDHERE('sdf.'aa')
var myRegexp = /\d+=\s?(\w+)\(/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // AWORDHERE

You could do this at least three ways. You need to benchmark to see what's fastest.
Substring w/ indexes
function extract(from) {
var ixEq = from.indexOf("=");
var ixParen = from.indexOf("(");
return from.substring(ixEq + 1, ixParen).trim(); // trim() drops the optional space after the =
}
Splits
function extract(from) {
var spEq = from.split("=");
var spParen = spEq[1].split("(");
return spParen[0].trim(); // trim() drops the optional space after the =
}
Regex
Here is some sample regex you could use
/[^=]+=([^(]+).*/g
This says
[^=]+ - One or more character which is not an =
= - The = itself
( - creates a matching group so you can access your match in code
[^(]+ - One or more character which is not a (
) - closes the matching group
.* - Matches the rest of the line
The g flag on the end makes the regex global: it finds every match in the input instead of stopping at the first one.
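As a rough usage sketch for this pattern (the helper name extractWord is made up here, and trimming the captured text is an assumption to cope with the optional space after the =):

```javascript
// Hypothetical helper built around the pattern above. The g flag is dropped
// because only the first match per string is wanted.
function extractWord(line) {
  const match = /[^=]+=([^(]+).*/.exec(line);
  // The capture may start with a space ("234234= AWORDHERE(..."), so trim it.
  return match ? match[1].trim() : null;
}

console.log(extractWord("234234=AWORDHERE('sdf.'aa')"));  // "AWORDHERE"
console.log(extractWord("234234= AWORDHERE('sdf.'aa')")); // "AWORDHERE"
```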

Using look around you can search for string preceded by = and followed by ( as following.
Regex: (?<==)[A-Z ]+(?=\()
Explanation:
(?<==) checks if [A-Z ] is preceded by an =.
[A-Z ]+ matches your pattern: one or more uppercase letters or spaces, so a space after the = is part of the match.
(?=\() checks if matched pattern is followed by a (.
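A short usage sketch, assuming an engine with lookbehind support (ES2018 and later, for example current Node.js and Chrome). Because the character class includes a space, the match is trimmed here:

```javascript
const pattern = /(?<==)[A-Z ]+(?=\()/;
const samples = ["234234=AWORDHERE('sdf.'aa')", "234234= AWORDHERE('sdf.'aa')"];

// match() without the g flag returns the first match (or null).
const words = samples.map((s) => {
  const m = s.match(pattern);
  return m ? m[0].trim() : null;
});
console.log(words); // ["AWORDHERE", "AWORDHERE"]
```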

var str = "234234= AWORDHERE('sdf.'aa')";
var regexp = /.*=\s*(\w+)\(.*\)/g; // \s* also accepts the form without a space
var match = regexp.exec(str);
alert( match[1] );

I made my solution for this just a little more general than you asked for, but I don't think it takes much more time to execute. I didn't measure. If you need greater efficiency than this provides, comment and I or someone else can help you with that.
Here's what I did, using the command prompt of node:
> var s = "234234= AWORDHERE('sdf.'aa')"
undefined
> var a = s.match(/(\w+)=\s*(\w+)\s*\(.*/)
undefined
> a
[ '234234= AWORDHERE(\'sdf.\'aa\')',
'234234',
'AWORDHERE',
index: 0,
input: '234234= AWORDHERE(\'sdf.\'aa\')' ]
>
As you can see, this matches the number before the = in a[1], and it matches the AWORDHERE name as you requested in a[2]. This will work with any number of spaces (including zero) after the = and before the (.

How to extract two strings from url using regex?

I've matched a string successfully, but I need to split it and add some new segments to URL. If it is possible by regex, How to match url and extract two strings like in the example below?
Current result:
["domain.com/collection/430000000000000"]
Desired result:
["domain.com/collection/", "430000000000000"]
Current code:
var reg = new RegExp('domain.com\/collection\/[0-9]+');
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
console.log(str.match(reg));

You want Regex Capture Groups.
Put the parts you want to extract into parentheses like this, each part forming a capturing group:
new RegExp('(domain.com\/collection\/)([0-9]+)')
Then after matching, you can extract each group's content by index, with index 0 being the whole match, 1 the first group, 2 the second etc. (thanks for the addendum, jcubic!).
This is done by calling exec() on the regex object, as in this example:
/\d(\d)\d/.exec("123");
// → ["123", "2"]
First comes the whole match, then the group matches in the sequence they appear in the pattern.
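Applied to the URL from the question, this could look roughly like the following sketch. It uses a regex literal with the dot escaped, which is an assumption about the intent (the unescaped . in the original pattern matches any character):

```javascript
const str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
const match = /(domain\.com\/collection\/)([0-9]+)/.exec(str);

// match[0] is the whole match; match[1] and match[2] are the two groups.
const parts = match ? [match[1], match[2]] : [];
console.log(parts); // ["domain.com/collection/", "430000000000000"]
```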

You can declare an array and then fill it with the required values that you can capture with parentheses (thus, making use of capturing groups):
var reg = /(domain.com\/collection)\/([0-9]+)/g;
// The two pairs of parentheses above are capturing groups 1 and 2.
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
var arr = [], m;
while ((m = reg.exec(str)) !== null) {
arr.push(m[1]);
arr.push(m[2]);
}
console.log(arr);
Output: ["domain.com/collection", "430000000000000"]

split string only on first instance of specified character

In my code I split a string based on _ and grab the second item in the array.
var element = $(this).attr('class');
var field = element.split('_')[1];
Takes good_luck and provides me with luck. Works great!
But, now I have a class that looks like good_luck_buddy. How do I get my javascript to ignore the second _ and give me luck_buddy?
I found this var field = element.split(new char [] {'_'}, 2); in a c# stackoverflow answer but it doesn't work. I tried it over at jsFiddle...

Use capturing parentheses:
'good_luck_buddy'.split(/_(.*)/s)
['good', 'luck_buddy', ''] // ignore the third element
They are defined as
If separator contains capturing parentheses, matched results are returned in the array.
So in this case we want to split at _.* (i.e. split separator being a sub string starting with _) but also let the result contain some part of our separator (i.e. everything after _).
In this example our separator (matching _(.*)) is _luck_buddy and the captured group (within the separator) is luck_buddy. Without the capturing parentheses, luck_buddy (matching .*) would not have been included in the result array, because with a simple split the separators are not included in the result.
We use the s regex flag to make . match on newline (\n) characters as well, otherwise it would only split to the first newline.
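To make the effect of the s flag concrete, here is a small sketch with a made-up input that contains a newline after the first underscore (the s flag needs an ES2018-capable engine):

```javascript
const text = "good_luck\nbuddy";

// Without s, "." stops at the newline, so the separator is only "_luck"
// and the rest of the string ends up in a third element.
const withoutS = text.split(/_(.*)/);
console.log(withoutS); // ["good", "luck", "\nbuddy"]

// With s, "." also matches the newline, so everything after the first "_"
// is captured in one piece.
const withS = text.split(/_(.*)/s);
console.log(withS); // ["good", "luck\nbuddy", ""]
```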

What do you need regular expressions and arrays for?
myString = myString.substring(myString.indexOf('_')+1)
var myString= "hello_there_how_are_you"
myString = myString.substring(myString.indexOf('_')+1)
console.log(myString)

I avoid RegExp at all costs. Here is another thing you can do:
"good_luck_buddy".split('_').slice(1).join('_')

With help of destructuring assignment it can be more readable:
let [first, ...rest] = "good_luck_buddy".split('_')
rest = rest.join('_')

A simple ES6 way to get both the first key and remaining parts in a string would be:
const [key, ...rest] = "good_luck_buddy".split('_')
const value = rest.join('_')
console.log(key, value) // good, luck_buddy

Nowadays String.prototype.split does indeed allow you to limit the number of splits.
str.split([separator[, limit]])
...
limit Optional
A non-negative integer limiting the number of splits. If provided, splits the string at each occurrence of the specified separator, but stops when limit entries have been placed in the array. Any leftover text is not included in the array at all.
The array may contain fewer entries than limit if the end of the string is reached before the limit is reached.
If limit is 0, no splitting is performed.
caveat
It might not work the way you expect. I was hoping it would just ignore the rest of the delimiters, but instead, when it reaches the limit, it splits the remaining string again, omitting the part after the split from the return results.
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C"]
I was hoping for:
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B_C_D_E"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C_D_E"]
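If the behaviour hoped for above is what you need, one way to sketch it is a small helper that stops splitting after limit - 1 separators and keeps the remainder in the last entry. The name splitKeepRest is hypothetical, and the sketch assumes a non-empty string separator and a positive integer limit:

```javascript
// Hypothetical helper: like split(separator, limit), but the final entry
// holds the unsplit remainder instead of being cut off.
function splitKeepRest(str, separator, limit) {
  const parts = [];
  let start = 0;
  while (parts.length < limit - 1) {
    const found = str.indexOf(separator, start);
    if (found === -1) break; // no more separators
    parts.push(str.slice(start, found));
    start = found + separator.length;
  }
  parts.push(str.slice(start)); // whatever is left, separators included
  return parts;
}

console.log(splitKeepRest('A_B_C_D_E', '_', 2)); // ["A", "B_C_D_E"]
console.log(splitKeepRest('A_B_C_D_E', '_', 3)); // ["A", "B", "C_D_E"]
```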

This solution worked for me
var str = "good_luck_buddy";
var index = str.indexOf('_');
var arr = [str.slice(0, index), str.slice(index + 1)];
//arr[0] = "good"
//arr[1] = "luck_buddy"
OR
var str = "good_luck_buddy";
var index = str.indexOf('_');
var [first, second] = [str.slice(0, index), str.slice(index + 1)];
//first = "good"
//second = "luck_buddy"

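You can use a regular expression like this:
var arr = element.split(/_(.*)/)
Note that the second parameter of split, which specifies the limit, does not help here: it only caps how many items are returned, so the remainder is discarded.
i.e.:
var field = element.split('_', 2)[1]; // "luck" for "good_luck_buddy", not "luck_buddy"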

Replace the first instance with a unique placeholder then split from there.
"good_luck_buddy".replace(/\_/,'&').split('&')
["good","luck_buddy"]
This is more useful when both sides of the split are needed.

I needed both parts of the string, so a regex lookbehind helped me with this.
const full_name = 'Maria do Bairro';
const [first_name, last_name] = full_name.split(/(?<=^[^ ]+) /);
console.log(first_name);
console.log(last_name);

Non-regex solution
I ran some benchmarks, and this solution won hugely:
str.slice(str.indexOf(delim) + delim.length)
// as function
function gobbleStart(str, delim) {
return str.slice(str.indexOf(delim) + delim.length);
}
// as polyfill
String.prototype.gobbleStart = function(delim) {
return this.slice(this.indexOf(delim) + delim.length);
};
Performance comparison with other solutions
The only close contender was the same line of code, except using substr instead of slice.
Other solutions I tried involving split or RegExps took a big performance hit and were about 2 orders of magnitude slower. Using join on the results of split, of course, adds an additional performance penalty.
Why are they slower? Any time a new object or array has to be created, the JS engine has to allocate memory for it and later garbage-collect it. This is slow compared with working on the existing string.
Here are some general guidelines, in case you are chasing benchmarks:
New dynamic memory allocations for objects {} or arrays [] (like the one that split creates) will cost a lot in performance.
RegExp searches are more complicated and therefore slower than string searches.
If you already have an array, destructuring arrays is about as fast as explicitly indexing them, and looks awesome.
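If you want to check these guidelines on your own engine, a very rough timing sketch could look like the following. It assumes a global performance.now() (browsers and recent Node.js versions); the iteration count and candidate names are arbitrary, and the numbers it prints will vary between runs and engines:

```javascript
// Rough timing sketch; treat the results as indicative only.
const sample = "good_luck_buddy";
const candidates = {
  sliceIndexOf: (s) => s.slice(s.indexOf("_") + 1),
  splitJoin: (s) => s.split("_").slice(1).join("_"),
  regexReplace: (s) => s.replace(/.*?_/, ""),
};

for (const [name, fn] of Object.entries(candidates)) {
  const started = performance.now();
  let last = "";
  for (let i = 0; i < 100000; i++) last = fn(sample);
  const elapsed = performance.now() - started;
  console.log(`${name}: ${elapsed.toFixed(1)} ms -> ${last}`);
}
```

A dedicated benchmarking tool gives more trustworthy numbers, since a simple loop like this can be distorted by JIT warm-up and optimisation.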
Removing beyond the first instance
Here's a solution that will slice up to and including the nth instance. It's not quite as fast, but on the OP's question, gobble(element, '_', 1) is still >2x faster than a RegExp or split solution and can do more:
/*
`gobble`, given a positive, non-zero `limit`, deletes
characters from the beginning of `haystack` until `needle` has
been encountered and deleted `limit` times or no more instances
of `needle` exist; then it returns what remains. If `limit` is
zero or negative, delete from the beginning only until `-(limit)`
occurrences or less of `needle` remain.
*/
function gobble(haystack, needle, limit = 0) {
let remain = limit;
if (limit <= 0) { // set remain to count of delim - num to leave
let i = 0;
while (i < haystack.length) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain++;
i = found + needle.length;
}
}
let i = 0;
while (remain > 0) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain--;
i = found + needle.length;
}
return haystack.slice(i);
}
With the above definition, gobble('path/to/file.txt', '/') would give the name of the file, and gobble('prefix_category_item', '_', 1) would remove the prefix like the first solution in this answer.
Tests were run in Chrome 70.0.3538.110 on macOSX 10.14.

Use the string replace() method with a regex:
var result = "good_luck_buddy".replace(/.*?_/, "");
console.log(result);
This regex matches 0 or more characters before the first _, and the _ itself. The match is then replaced by an empty string.

JavaScript's String.split unfortunately has no way of limiting the actual number of splits. It has a second argument that specifies how many of the split items are returned, which isn't useful in your case. The solution would be to split the string, shift the first item off, then rejoin the remaining items:
var element = $(this).attr('class');
var parts = element.split('_');
parts.shift(); // removes the first item from the array
var field = parts.join('_');

Here's one RegExp that does the trick.
'good_luck_buddy'.split(/^.*?_/)[1]
First, the '^' forces the match to start at the beginning of the string. Then '.*?' matches as few characters as possible before a '_', in other words all characters before the first '_'. The '?' makes the quantifier lazy: it matches the minimal number of characters that lets the whole pattern match, and because it is followed by '_', that underscore is included in the match as its last character.
split() therefore uses this matched part as its 'splitter' and removes it from the results. So it removes everything up to and including the first '_' and gives you the rest as the second element of the result. The first element is "", representing the part before the matched part; it is "" because the match starts at the beginning.
There are other RegExps that work as well, like /_(.*)/ given by Chandu in a previous answer.
The /^.*?_/ has the benefit that you can understand what it does without having to know about the special role capturing groups play with split().
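The resulting array can be inspected directly. A small sketch, including the case where the string has no underscore at all:

```javascript
const pieces = 'good_luck_buddy'.split(/^.*?_/);
console.log(pieces);    // ["", "luck_buddy"]
console.log(pieces[1]); // "luck_buddy"

// Without an underscore nothing matches, so the array has a single element
// and index 1 is undefined.
const noUnderscore = 'good'.split(/^.*?_/);
console.log(noUnderscore[1]); // undefined
```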

if you are looking for a more modern way of doing this:
let raw = "good_luck_buddy"
raw.split("_")
.filter((part, index) => index !== 0)
.join("_")

Mark F's solution is awesome but it's not supported by old browsers. Kennebec's solution is awesome and supported by old browsers but doesn't support regex.
So, if you're looking for a solution that splits your string only once, that is supported by old browsers and supports regex, here's my solution:
String.prototype.splitOnce = function(regex)
{
var match = this.match(regex);
if(match)
{
var match_i = this.indexOf(match[0]);
return [this.substring(0, match_i),
this.substring(match_i + match[0].length)];
}
else
{ return [this, ""]; }
}
var str = "something/////another thing///again";
alert(str.splitOnce(/\/+/)[1]);

For beginners like me who are not used to regular expressions, this workaround worked:
var field = "Good_Luck_Buddy";
var newString = field.slice( field.indexOf("_")+1 );
The slice() method extracts a part of a string and returns it as a new string, and the indexOf() method returns the position of the first occurrence of a specified value in a string.
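One detail worth knowing: if the string contains no underscore, indexOf returns -1, so the expression becomes slice(0) and hands back the whole string unchanged. A small sketch that makes this case explicit (the function name afterFirst and the choice to return an empty string are assumptions for illustration):

```javascript
// Hypothetical helper: returns the text after the first occurrence of
// separator, or "" when the separator does not occur at all.
function afterFirst(text, separator) {
  const index = text.indexOf(separator);
  // -1 means "not found"; without this check slice(0) would return the whole input.
  return index === -1 ? "" : text.slice(index + separator.length);
}

console.log(afterFirst("Good_Luck_Buddy", "_")); // "Luck_Buddy"
console.log(afterFirst("Good", "_"));            // ""
```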

This should be quite fast
function splitOnFirst (str, sep) {
const index = str.indexOf(sep);
return index < 0 ? [str] : [str.slice(0, index), str.slice(index + sep.length)];
}
console.log(splitOnFirst('good_luck', '_')[1])
console.log(splitOnFirst('good_luck_buddy', '_')[1])

This worked for me on Chrome + FF:
"foo=bar=beer".split(/^[^=]+=/)[1] // "bar=beer"
"foo==".split(/^[^=]+=/)[1] // "="
"foo=".split(/^[^=]+=/)[1] // ""
"foo".split(/^[^=]+=/)[1] // undefined
If you also need the key try this:
"foo=bar=beer".split(/^([^=]+)=/) // Array [ "", "foo", "bar=beer" ]
"foo==".split(/^([^=]+)=/) // [ "", "foo", "=" ]
"foo=".split(/^([^=]+)=/) // [ "", "foo", "" ]
"foo".split(/^([^=]+)=/) // [ "foo" ]
//[0] = ignored (holds the string when there's no =, empty otherwise)
//[1] = hold the key (if any)
//[2] = hold the value (if any)

a simple es6 one statement solution to get the first key and remaining parts
let raw = 'good_luck_buddy'
raw.split('_')
.reduce((p, c, i) => i === 0 ? [c] : [p[0], [...p.slice(1), c].join('_')], [])

You could also use non-greedy match, it's just a single, simple line:
a = "good_luck_buddy"
const [,g,b] = a.match(/(.*?)_(.*)/)
console.log(g,"and also",b)
