javascript regex capturing parentheses - javascript

I don't really get the concept of capturing parentheses when dealing with JavaScript regex. I don't understand why we need parentheses in the following example:
var x = "{xxx} blah blah blah {yyy} and {111}";
x.replace( /{([^{}]*)}/g ,
function(match,content) {
console.log(match,content);
return "whatever";
});
//it will print
{xxx} xxx
{yyy} yyy
{111} 111
When I drop the parentheses from the pattern, the results are different:
x.replace( /{[^{}]*}/g ,
function(match,content) {
console.log(match,content);
return "whatever";
});
//it will print
{xxx} 0
{yyy} 21
{111} 31
So the content values now become numbers, and I have no idea why. Can someone explain what's going on behind the scenes?

According to the MDN documentation, the parameters to the function will be, in order:
The matched substring.
Any groups that are defined, if there are any.
The index in the original string where the match was found.
The original string.
So in the first example, content will be the string which was captured in group 1. But when you remove the group in the second example, content is actually the index where the match was found.
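As a minimal sketch of the parameter order described above (the names withGroup and withoutGroup are made up for illustration), the callback can simply record everything it receives. With one capturing group, the second argument is the captured text and the offset comes third; without a group, the offset moves up into the second position:

```javascript
const x = "{xxx} blah blah blah {yyy} and {111}";

// One capturing group: the callback receives (match, group1, offset, wholeString).
const withGroup = [];
x.replace(/\{([^{}]*)\}/g, (match, content, offset) => {
  withGroup.push([match, content, offset]);
  return match; // return the match unchanged; we only want to inspect the arguments
});

// No capturing group: the callback receives (match, offset, wholeString).
const withoutGroup = [];
x.replace(/\{[^{}]*\}/g, (match, offset) => {
  withoutGroup.push([match, offset]);
  return match;
});

console.log(withGroup);    // [["{xxx}", "xxx", 0], ["{yyy}", "yyy", 21], ["{111}", "111", 31]]
console.log(withoutGroup); // [["{xxx}", 0], ["{yyy}", 21], ["{111}", 31]]
```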

Capturing groups are also useful when replacing text.
For example, I have the string "one two three four" that I want to reverse into "four three two one". To achieve that I use this line of code:
var reversed = "one two three four".replace(/(one) (two) (three) (four)/, "$4 $3 $2 $1");
Note how each $n refers to the text captured by the nth pair of parentheses.
Another example: I have the same string "one two three four" and I want to print each word twice:
var eachWordTwice = "one two three four".replace(/(one) (two) (three) (four)/, "$1 $1 $2 $2 $3 $3 $4 $4");
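The same reordering can also be written with a replacement function instead of a "$n" string, which ties this back to the callback form in the question. This is only a sketch: the (\w+) groups and the parameter names are assumptions chosen so that any four space-separated words work, not just these.

```javascript
// Each capturing group arrives as its own argument after the full match.
const reversed = "one two three four".replace(
  /(\w+) (\w+) (\w+) (\w+)/,
  (match, first, second, third, fourth) => [fourth, third, second, first].join(" ")
);

console.log(reversed); // "four three two one"
```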

The numbers:
The offset of the matched substring within the total string being
examined. (For example, if the total string was "abcd", and the
matched substring was "bc", then this argument will be 1.)
Source:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
"Specifying a function as a parameter" section

Parentheses are used to capture, and reuse in the replacement, only a portion of the match. For instance, I use them to match phone numbers that may or may not have extensions. The function below matches the whole string (if the pattern fits), so the entire string is replaced, but only specific kinds of characters in a specific order are kept, with whitespace or other separator characters ("() -x") allowed in the input.
It will always output a string formatted as (651) 258-9631 x1234 if given 6512589631x1234 or 1 651 258 9631 1234. It also doesn't match (and therefore doesn't format) toll-free numbers, as they aren't allowed in my field.
function phoneNumber(v) {
// take in a string, return a formatted string (651) 651-6511 x1234
// groups: area code, prefix, line number, extension
var withExt = /^[1]{0,1}[-(\s.]{0,1}(?!800|888|877|866|855|900)([2-9][0-9]{2})[-)\s.]{0,2}([2-9][0-9]{2})[-.\s]{0,2}([0-9]{4})[\s]*[x]{0,1}([0-9]{1,5}){1}$/i;
// groups: area code, prefix, line number (no extension)
var noExt = /^[1]{0,1}[-(\s.]{0,1}(?!800|888|877|866|855|900)([2-9][0-9]{2})[-)\s.]{0,1}([2-9][0-9]{2})[-.\s]{0,2}([0-9]{4})$/i;
if (withExt.test(v)) { return v.replace(withExt, "($1) $2-$3 x$4"); }
if (noExt.test(v)) { return v.replace(noExt, "($1) $2-$3"); }
// anything else (including toll-free numbers) is returned untouched
return v;
}
What this allows me to do is gather the area code, prefix, line number, and an optional extension, and format it the way I need it (for users who can't follow directions, for instance).
So if you input 6516516511x1234 or "(651) 651-6511 x1234", it will match one regex or the other in this example.
Now, what is happening in your code is as @amine-hajyoussef said: the index of the start of each match is being passed as the second argument. Your code would be better served by match for example one (to get the matched text), or by search for the index, as in example two. p.s.w.g's answer expands on this.
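A rough sketch of those alternatives, reusing the string from the question (the variable names are illustrative, and matchAll is an extra suggestion not mentioned in the answers; it needs a reasonably recent JavaScript engine):

```javascript
const x = "{xxx} blah blah blah {yyy} and {111}";

// match with the g flag returns every matched substring (no groups, no indexes).
const texts = x.match(/\{[^{}]*\}/g);

// search returns the index of the first match, or -1 if there is none.
const firstIndex = x.search(/\{yyy\}/);

// matchAll yields one match object per hit, carrying both the groups and the index.
const details = Array.from(x.matchAll(/\{([^{}]*)\}/g), m => ({
  content: m[1],
  index: m.index,
}));

console.log(texts);      // ["{xxx}", "{yyy}", "{111}"]
console.log(firstIndex); // 21
console.log(details);    // content "xxx"/"yyy"/"111" at indexes 0, 21, 31
```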

Related

Find how many times a char is repeated in a string and remove those repeated chars by a dynamic number

I would like to write a function that receives two parameters: String and Number.
The function will return another string that is similar to the input string, but with certain characters
removed.
The function will remove characters from consecutive runs of the same
character, where the length of the run is greater than the input parameter.
for example:
"aaab", 2 => "aab"
"aabb", 1 => "ab"
"aabbaa", 1 => "aba"
What I did:
function doSomething(string,number) {
let repeatCount = 0
debugger;
for (let i = 0; i < string.length; i++) {
if(string[i] == string[i+1]){
repeatCount++
}
if(repeatCount > number ){
string.replace(string[i],'')
}
}
console.log(string)
}
doSomething('aaab',2)
The console.log(string) prints 'aaab' but I want it to print 'aab' because the number is 2 and the char 'a' is repeated 3 times.
If there is another better way to do it, I will be happy to learn.
You could go with a .replace() approach and a regular expression with a backreference to match consecutive letters. Then you can use .slice() to remove the additional letters to get it to the defined length like so:
function shorten(string,number) {
return string.replace(/(.)\1+/g, m => m.slice(0, number))
}
console.log(shorten("aaab", 2))// => "aab"
console.log(shorten("aabb", 1))// => "ab"
console.log(shorten("aabbaa", 1))// => "aba"
The regular expression above matches any character and captures it with (.). It then checks whether that same character is repeated one or more times by using the backreference \1+. The replacement function is invoked for each consecutive run of characters, which you can trim down to your desired length by using .slice().
For example, take the string aabbaa (with number set to 1). The regular expression tries to find consecutive runs of characters. The (.) matches any character; in this case it finds "a" and puts it into capture group 1. The regular expression then checks whether that "a" is followed by one or more further "a" characters, which is what \1+ does. The first section of aabbaa that matches is therefore "aa".
When a match is found, the function m => m.slice(0, number) is run. It takes the match (m), which in this case is "aa", and returns the sliced version of it, giving "a". That return value replaces the "aa" we matched, so "aa" becomes "a" (note that this doesn't modify the original string; the change occurs in the new string that replace returns).
The /g at the end of the regular expression means repeat this for the entire string. As a result, the regular expression moves on and finds "bb". The function gets called again, this time with m set to "bb", so "bb" becomes "b". Lastly, we match the final "aa", which becomes "a". Once replace has gone through the entire string, it returns the result built from the returned values (plus any parts of the original string it didn't match), which gives "aba".
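To make the walkthrough above concrete, here is a small sketch that records each call to the replacement function (the runs array and the parameter names are illustrative additions). It shows that the callback fires once per run and receives the whole run, the captured character, and the offset:

```javascript
const runs = [];

const result = "aabbaa".replace(/(.)\1+/g, (run, char, offset) => {
  runs.push({ run, char, offset }); // record what the callback was given
  return run.slice(0, 1);           // keep at most one character of the run
});

console.log(runs);
// [{ run: "aa", char: "a", offset: 0 },
//  { run: "bb", char: "b", offset: 2 },
//  { run: "aa", char: "a", offset: 4 }]
console.log(result); // "aba"
```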
Not that the rest of your code is correct, but one fundamental mistake you have made is that strings in JavaScript are immutable. You cannot change a string in place like that.
string.replace(string[i],'')
This won't change 'string'. You have to create another string from it:
let str = string.replace(string[i],'')
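A tiny sketch of that point (the variable names original and updated are only for illustration): calling replace without using its return value leaves the string as it was.

```javascript
const original = "aaab";

original.replace("a", "");                 // return value discarded: nothing changes
const updated = original.replace("a", ""); // keep the returned string instead

console.log(original); // "aaab"
console.log(updated);  // "aab"
```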
function doSomething(string,number) {
let repeatCount = 0
debugger
let sameletter=string[0]
for (let i = 0; i < string.length;i++) {
if(string[i] == sameletter){
repeatCount++
if(repeatCount>number){
var result = string.split('')
result.splice(i, 1)
string = result.join('')
i--
}
}
else{
sameletter=string[i];
repeatCount=1;
}
}
console.log(string)
}
doSomething('aaaabbbbeeeffffgggggggggg',2)
Try this
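For comparison, here is a sketch of a loop-based variant that builds a new string in a single pass instead of splitting and splicing the input on every removal. The function name limitRuns is hypothetical; it is one possible way to write it, not a correction of the code above.

```javascript
function limitRuns(string, number) {
  let result = "";
  let previous = null; // last character seen in the input
  let runLength = 0;   // length of the current run in the input

  for (const ch of string) {
    runLength = ch === previous ? runLength + 1 : 1;
    previous = ch;
    // Copy the character only while the run is still within the limit.
    if (runLength <= number) {
      result += ch;
    }
  }
  return result;
}

console.log(limitRuns("aaab", 2));   // "aab"
console.log(limitRuns("aabb", 1));   // "ab"
console.log(limitRuns("aabbaa", 1)); // "aba"
```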

Javascript regex pattern match multiple strings ( AND, OR, NEAR/n, P/n )

I need to filter a collection of strings based on a rather complex query
I have query input as a string
var query1 ='Abbott near/10 (assay* OR test* ) AND BLOOD near/10 (Point P/1 Care)';
From this query INPUT string I want to collect just the important words:
var words= 'Abbott assay* test* BLOOD Point care';
The query can change for example:
var query2='(assay* OR test* OR analy* OR array) OR (Abbott p/1 Point P/1 Care)';
From this query I need to collect:
var words='assay* test* analy* array Abbott Point Care';
I'm looking for your suggestion.
Thanks.
You may just use | in your regex to capture the words and/or special characters that you want to remove:
([()]|AND|OR|(NEAR|P)\/\d+) ?
DEMO: https://regex101.com/r/rqpmXr/2
Note the /gi in the regex options, with i meaning that it's case insensitive.
EXPLANATION:
([()]|AND|OR|(NEAR|P)\/\d+) - This is a capture group containing all the words you specified in your title, plus the parentheses.
(NEAR|P)\/\d+ - Just to clarify this part, \d+ means that one or more digits follow the words NEAR or P.
 ? - This matches the optional trailing space after the captured word.
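As an alternative sketch, the same operator list can be used to filter tokens rather than to delete text in place. This is a different technique from the regex above: the query is split on whitespace and parentheses, and any token that is an operator is dropped. It avoids two things that in-place removal can run into: neighbouring words being glued together when a parenthesis directly follows a word (as in "array) OR (Abbott"), and operators being matched inside longer words. The names OPERATOR and extractWords are hypothetical.

```javascript
// A token is an operator if it is exactly AND, OR, NEAR/n or P/n (case-insensitive).
const OPERATOR = /^(?:AND|OR|(?:NEAR|P)\/\d+)$/i;

function extractWords(query) {
  return query
    .split(/[\s()]+/)                                        // break on spaces and parentheses
    .filter(token => token !== "" && !OPERATOR.test(token))  // drop empties and operators
    .join(" ");
}

const query1 = "Abbott near/10 (assay* OR test* ) AND BLOOD near/10 (Point P/1 Care)";
const query2 = "(assay* OR test* OR analy* OR array) OR (Abbott p/1 Point P/1 Care)";

console.log(extractWords(query1)); // "Abbott assay* test* BLOOD Point Care"
console.log(extractWords(query2)); // "assay* test* analy* array Abbott Point Care"
```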

Unable to craft dynamically growing regex

I'm trying to build a regex in JavaScript that will match parts of an arithmetic operation. For instance, here are a few inputs and expected outputs:
What is 7 minus 5? >> ['7','minus','5']
What is 6 multiplied by -3? >> ['6','multiplied by', '-3']
I have this working regex: /^What is (-?\d+) (minus|plus|multiplied by|divided by) (-?\d+)\?$/
Now I want to expand things to capture additional operations. For instance:
What is 7 minus 5 plus 3? >> ['7','minus','5','plus','3']
So I used: ^What is (-?\d+)(?: (minus|plus|multiplied by|divided by) (-?\d+))+\?$. But it yields:
What is 7 minus 5 plus 3? >> ['7','plus','3']
Why is the minus 5 skipped? And how do I include it in results as I'd like? (here is my sample)
The problem you are facing comes from the fact that a capturing group can only hold one value. If the same capturing group matches more than once (as it does in your case), it keeps only the last value.
I like how it is explained at http://www.rexegg.com/regex-capture.html#spawn_groups
The capturing parentheses you see in a pattern only capture a single
group. So in (\d)+, capture groups do not magically mushroom as you
travel down the string. Rather, they repeatedly refer to Group 1,
Group 1, Group 1… If you try this regex on 1234 (assuming your regex
flavor even allows it), Group 1 will contain 4—i.e. the last capture.
In essence, Group 1 gets overwritten every time the regex iterates
through the capturing parentheses.
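The behaviour described in the quote can be checked directly in JavaScript. A short sketch, using the quote's (\d)+ example and the pattern from the question (the variable names are illustrative):

```javascript
// The repeated group is overwritten on every iteration, so only "4" survives.
const repeated = /(\d)+/.exec("1234");
console.log(repeated[0]); // "1234"
console.log(repeated[1]); // "4"

// The same effect in the question's pattern: "minus 5" is matched but then overwritten.
const question = "What is 7 minus 5 plus 3?";
const pattern = /^What is (-?\d+)(?: (minus|plus|multiplied by|divided by) (-?\d+))+\?$/;
const groups = pattern.exec(question).slice(1);
console.log(groups); // ["7", "plus", "3"]
```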
So the trick is to use a regex with the global flag (g) and execute the expression more than once; with the g flag, each execution starts where the previous one ended.
I've made a regex to show you the strategy: isolate the formula, then iterate until you have found everything.
var formula = "What is 2 minus 1 minus 1";
var regex = /^What is ((?:-?\d+)(?: (?:minus|plus|multiplied by|divided by) (?:-?\d+))+)$/
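// exec returns null when nothing matches, so test the result before indexing into it
var formula_match = regex.exec(formula);
if (formula_match) {
var math_string = formula_match[1];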
console.log(math_string);
var math_regex = /(-?\d+)? (minus|plus|multiplied by|divided by) (-?\d+)/g
var operation;
var result = [];
while (operation = math_regex.exec(math_string)) {
if (operation[1]) {
result.push(operation[1]);
}
result.push(operation[2], operation[3]);
}
console.log(result);
}
Another solution, if you aren't requiring anything fancy would be to remove the "What is", replace multiplied by with multiplied_by (same for divided) and split the string on spaces.
var formula = "What is 2 multiplied by 1 divided by 1";
var regex = /^What is ((?:-?\d+)(?: (?:minus|plus|multiplied by|divided by) (?:-?\d+))+)$/
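// exec returns null when nothing matches, so test the result before indexing into it
var formula_match = regex.exec(formula);
if (formula_match) {
// global regexes replace every occurrence; a string pattern would only replace the first
var math_string = formula_match[1].replace(/multiplied by/g, 'multiplied_by').replace(/divided by/g, 'divided_by');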
console.log(math_string.split(" "));
}
Each capturing group in a regex can only hold a single value. So, if you have a repetition on a group, you're only going to get one result for that group (in JavaScript, the last one). In your case it's the following:
(?: (minus|plus|multiplied by|divided by) (-?\d+))+
You're repeating the enclosing non-capturing group, which will match repeatedly. But the capturing groups inside it can, in the end, only hold a single match each, which is the result of the last repetition.
You should probably switch to matching tokens instead of having a single regex that tries to match the whole phrase and dissects it via capturing groups. Something like a two-step process where you first verify that the whole phrase is constructed correctly (starts with »What is«, ends with »?«, etc.) and then a pass that extracts the individual tokens, e.g. something like
-?\d+|minus|plus|multiplied by|divided by
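The two-step process could be sketched roughly like this. The function name parseQuestion and the exact validation pattern are assumptions made for illustration; the token pattern is the one given above.

```javascript
function parseQuestion(question) {
  // Step 1: verify the overall shape ("What is", a number,
  // then zero or more "operation number" pairs, then "?").
  const shape = /^What is -?\d+(?: (?:minus|plus|multiplied by|divided by) -?\d+)*\?$/;
  if (!shape.test(question)) {
    return null; // not a question this sketch understands
  }
  // Step 2: extract every token; with the g flag, match returns all of them.
  return question.match(/-?\d+|minus|plus|multiplied by|divided by/g);
}

console.log(parseQuestion("What is 7 minus 5 plus 3?"));   // ["7", "minus", "5", "plus", "3"]
console.log(parseQuestion("What is 6 multiplied by -3?")); // ["6", "multiplied by", "-3"]
console.log(parseQuestion("Who is 7 minus 5?"));           // null
```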

Extract specific chars from a string using a regex

I need to split an email address and take out the first character and the first character after the '#'
I can do this as follows:
'bar#foo'.split('#').map(function(a){ return a.charAt(0); }).join('')
--> bf
Now I was wondering if it can be done using a regex match, something like this
'bar#foo'.match(/^(\w).*?#(\w)/).join('')
--> bar#fbf
Not really what I want, but I'm sure I miss something here! Any suggestions ?
Why use a regex for this? just use indexOf to get the char at any given position:
var addr = 'foo#bar';
console.log(addr[0], addr[addr.indexOf('#')+1])
To ensure your code works on all browsers, you might want to use charAt instead of []:
console.log(addr.charAt(0), addr.charAt(addr.indexOf('#')+1));
Either way, it'll work just fine, and this is most likely the fastest approach.
If you are going to persist, and choose a regex, then you should realize that the match method returns an array containing 3 strings, in your case:
/^(\w).*?#(\w)/
["the whole match",//start of string + first char + .*?# + first char after #
"group 1 \w",//first char
"group 2 \w"//first char after #
]
So addr.match(/^(\w).*?#(\w)/).slice(1).join('') is probably what you want.
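One caveat worth sketching: match returns null when the pattern does not match at all, so calling .slice on the result would throw. A small wrapper (the name firstChars is hypothetical, and returning an empty string for non-matching input is an assumption) guards against that:

```javascript
function firstChars(addr) {
  const m = addr.match(/^(\w).*?#(\w)/);
  // m is null when there is no match; otherwise drop the full match and join the groups.
  return m === null ? "" : m.slice(1).join("");
}

console.log(firstChars("bar#foo"));      // "bf"
console.log(firstChars("no separator")); // ""
```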
If I understand correctly, you are quite close. Just don't join everything returned by match because the first element is the entire matched string.
'bar#foo'.match(/^(\w).*?#(\w)/).splice(1).join('')
--> bf
Using regex:
var matched = "";
'abc#xyz'.replace(/(?:^|#)(\w)/g, function($0, $1) { matched += $1; return $0; });
console.log(matched);
// ax
Without the g flag, the match function returns an array whose first element is the 'full text' of the match, followed by the text captured by each sub-group. In your case, it returns this:
bar#f
b
f
To get rid of the first item (the full match), use slice:
'bar#foo'.match(/^(\w).*?#(\w)/).slice(1).join('\r')
Use String.prototype.replace with regular expression:
'bar#foo'.replace(/^(\w).*#(\w).*$/, '$1$2'); // "bf"
Or using RegEx
^([a-zA-Z0-9])[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+#([a-zA-Z0-9-])[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$

getting contents of string between digits

have a regex problem :(
what i would like to do is to find out the contents between two or more numbers.
var string = "90+*-+80-+/*70"
im trying to edit the symbols in between so it only shows up the last symbol and not the ones before it. so trying to get the above variable to be turned into 90+80*70. although this is just an example i have no idea how to do this. the length of the numbers, how many "sets" of numbers and the length of the symbols in between could be anything.
many thanks,
Steve,
The trick is in matching '90+*-+' and '80-+/*' separately, and selecting only the number and the last symbol.
The expression for finding a number followed by one or more non-numbers would be
\d+[^\d]+
To select the number and the last non-number, add parens:
(\d+)[^\d]*([^\d])
Finally add a /g to repeat the procedure for each match, and replace it with the 2 matched groups for each match:
js> '90+*-+80-+/*70'.replace(/(\d+)[^\d]*([^\d])/g, '$1$2');
90+80*70
Or you can use lookahead assertion and simply remove all non-numerical characters which are not last: "90+*-+80-+/*70".replace(/[^0-9]+(?=[^0-9])/g,'');
You can use a regular expression to match the non-digits and a callback function to process the match and decide what to replace:
var test = "90+*-+80-+/*70";
var out = test.replace(/[^\d]+/g, function(str) {
return str.slice(-1); // keep only the last symbol of the run
})
alert(out);
See it work here: http://jsfiddle.net/jfriend00/Tncya/
This works by using a regular expression to match sequences of non-digits and then replacing that sequence of non-digits with the last character in the matched sequence.
i would use this tutorial, first, then review this for javascript-specific regex questions.
This should do it -
var string = "90+*-+80-+/*70"
var result = '';
var arr = string.split(/(\d+)/)
for (var i = 0; i < arr.length; i++) {
if (!isNaN(arr[i])) result = result + arr[i];
else result = result + arr[i].slice(arr[i].length - 1, arr[i].length);
}
alert(result);
Working demo - http://jsfiddle.net/ipr101/SA2pR/
Similar to @Arnout Engelen
var string = "90+*-+80-+/*70";
string = string.replace(/(\d+)[^\d]*([^\d])(?=\d+)/g, '$1$2');
This was my first thought on how the regex should work. It also looks ahead to make sure the non-digit pattern is followed by another digit, which is what the question asked for (between two numbers).
Similar to @jfriend00
var string = "90+*-+80-+/*70";
string = string.replace( /(\d+?)([^\d]+?)(?=\d+)/g
, function(){
return arguments[1] + arguments[2].slice(-1);
});
Instead of only matching on non-digits, it matches on non-digits between two numbers, which is what the question asked
Why would this be any better?
If your equation was embedded in a paragraph or string of text. Like:
This is a test where I want to clean up something like 90+*-+80-+/*70 and don't want to scrap the whole paragraph.
Result (Expected) :
This is a test where I want to clean up something like 90+80*70 and don't want to scrap the whole paragraph.
Why would this not be any better?
There is more pattern matching, which makes it theoretically slower (negligible)
It would fail if your paragraph had embedded numbers. Like:
This is a paragraph where Sally bought 4 eggs from the supermarket, but only 3 of them made it back in one piece.
Result (Unexpected):
This is a paragraph where Sally bought 4 3 of them made it back in one piece.
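One possible way around that failure, sketched under the assumption that only the four symbols + - * / can appear between the numbers (the question says the symbols "could be anything", so this is a narrowing, and the function name collapseOperators is hypothetical): match operator characters explicitly instead of any non-digit, so ordinary words between two numbers are left alone.

```javascript
function collapseOperators(text) {
  // digits, then any run of operators, keeping only the last operator,
  // and only when another digit follows
  return text.replace(/(\d+)[+\-*\/]*([+\-*\/])(?=\d)/g, "$1$2");
}

const equation = "This is a test where I want to clean up something like 90+*-+80-+/*70 and don't want to scrap the whole paragraph.";
const prose = "This is a paragraph where Sally bought 4 eggs from the supermarket, but only 3 of them made it back in one piece.";

console.log(collapseOperators(equation)); // "... something like 90+80*70 and don't ..."
console.log(collapseOperators(prose) === prose); // true: the prose is left untouched
```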
