Why is this regex matching also words within a non-capturing group? - javascript

I have this string (notice the multi-line syntax):
var str = ` Number One: Get this
Number Two: And this`;
And I want a regex that returns (with match):
[str, 'Get this', 'And this']
So I tried str.match(/Number (?:One|Two): (.*)/g);, but that's returning:
["Number One: Get this", "Number Two: And this"]
There can be any whitespace/line-breaks before any "Number" word.
Why doesn't it return only what is inside the capturing group? Am I misunderstanding something? And how can I achieve the desired result?

Per the MDN documentation for String.match:
If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.
(emphasis mine).
So, what you want is not possible with match() and the g flag.
The same page adds:
if you want to obtain capture groups and the global flag is set, you need to use RegExp.exec() instead.
So if you're willing to give up on using match, you can write your own function that repeatedly applies the regex with exec(), collects the captured substrings, and builds an array.
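A minimal sketch of such a function, assuming the regex has the g flag and a single capturing group. The helper name matchWithGroups is hypothetical. String.prototype.matchAll (ES2020) is shown as a shorter equivalent where it is available.

```javascript
// Hypothetical helper: returns [input, group1OfMatch1, group1OfMatch2, ...]
function matchWithGroups(input, regex) {
  if (!regex.global) {
    throw new Error("the regex needs the g flag, or exec() never advances");
  }
  const result = [input];
  regex.lastIndex = 0; // start from the beginning even if the regex was used before
  let m;
  while ((m = regex.exec(input)) !== null) {
    result.push(m[1]); // m[0] is the whole match, m[1] the first capturing group
    if (m[0] === "") {
      regex.lastIndex++; // step past empty matches so the loop cannot get stuck
    }
  }
  return result;
}

const str = " Number One: Get this\n  Number Two: And this";
const re = /Number (?:One|Two): (.*)/g;

const viaExec = matchWithGroups(str, re);
// ES2020 alternative: matchAll yields the same match objects exec() returns
const viaMatchAll = [str, ...Array.from(str.matchAll(re), (m) => m[1])];

console.log(viaExec);     // [str, "Get this", "And this"]
console.log(viaMatchAll); // same array
```

Because . does not match a line break, each (.*) stops at the end of its line.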
Or, for your specific case, you could write something like this:
var these = str.split(/(?:^|\n)\s*Number (?:One|Two): /);
these[0] = str;

Replace and store the result in a new string, like this:
var str = ` Number One: Get this
Number Two: And this`;
var output = str.replace(/Number (?:One|Two): (.*)/g, "$1");
console.log(output);
which outputs:
Get this
And this
If you want the match array like you requested, you can try this:
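var getMatch = function(string, split, regex) {
  // Replace every match with its first captured group followed by the separator
  var match = string.replace(regex, "$1" + split);
  match = match.split(split);
  // Drop the text after the last separator (an empty string in this example)
  match.pop();
  // Put the full input string first, as in the requested result
  match.unshift(string);
  return match;
};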
var str = ` Number One: Get this
Number Two: And this`;
var regex = /Number (?:One|Two): (.*)/g;
var match = getMatch(str, "#!SPLIT!#", regex);
console.log(match);
which displays the array as desired:
[ ' Number One: Get this\n Number Two: And this',
' Get this',
'\n And this' ]
Here split (#!SPLIT!#) must be a string that never occurs in the input, because it is used to separate the matches. Note that this only works with a single capturing group. For multiple groups, add a parameter giving the number of groups and use a for loop to build "$1 $2 $3 $4 ..." + split.
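A sketch of a multi-group variant, under stated assumptions: the name getMatchN and its groupCount parameter are hypothetical, and it appends the separator after every group (rather than joining the groups with spaces) so that each group becomes its own array element. The separator must not contain $, which replace() treats as special in the replacement string.

```javascript
// Hypothetical multi-group version of getMatch.
function getMatchN(string, split, regex, groupCount) {
  // Build the replacement "$1<split>$2<split>...$n<split>"
  let template = "";
  for (let i = 1; i <= groupCount; i++) {
    template += "$" + i + split;
  }
  const parts = string.replace(regex, template).split(split);
  parts.pop();           // drop whatever follows the last separator
  parts.unshift(string); // put the full input first, as getMatch does
  return parts;
}

const pairs = getMatchN("a=1;b=2", "#!SPLIT!#", /(\w)=(\d);?/g, 2);
console.log(pairs); // ["a=1;b=2", "a", "1", "b", "2"]
```

As with getMatch, any text between matches that the regex does not consume stays attached to the neighbouring element.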

Try
var str = " Number One: Get this\nNumber Two: And this";
// `/\w+\s+\w+(?=\s|$)/g` matches one or more word characters (letters, digits, _),
// followed by one or more whitespace characters,
// followed by one or more word characters,
// but only if whitespace or the end of the input comes next; the `g` flag returns every match.
// Note: this relies on each value being exactly two words and ignores the "Number One:"/"Number Two:" labels.
// `res` is the array `["Get this", "And this"]`
var res = str.match(/\w+\s+\w+(?=\s|$)/g);
console.log(JSON.stringify(res));

Related

How do I correctly apply regex so array does not have empty string at the start

I'm struggling with a regex.
I am able to split the string at the required location, but when it is added to an array, the array has an empty string at the start.
// This is the string I am wanting to split.
// I want the first 4 words to be separated from the remainder of the string
const chatMessage = "This is a string that I want to split";
// I am using this regex
const r = /(^(?:\S+\s+\n?){4})/;
const chatMessageArr = chatMessage.split(r);
console.log(chatMessageArr);
It returns:
[ '', 'This is a string ', 'that I want to split' ]
But need it to return:
[ 'This is a string ', 'that I want to split' ]
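The empty string appears because the separator matches at index 0: split() first emits the (empty) text before the separator, then the captured separator itself, then the rest of the string. One way to keep split() is to drop empty strings afterwards; a minimal sketch:

```javascript
const chatMessage = "This is a string that I want to split";
const r = /(^(?:\S+\s+\n?){4})/;

// split() yields ['', 'This is a string ', 'that I want to split'];
// filter out the empty leading element
const chatMessageArr = chatMessage.split(r).filter((part) => part !== "");
console.log(chatMessageArr); // [ 'This is a string ', 'that I want to split' ]
```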
I wouldn't use split here; I would use a regex replacement:
var chatMessage = "This is a string that I want to split";
var first = chatMessage.replace(/^\s*(\S+(?:\s+\S+){3}).*$/, "$1");
var last = chatMessage.replace(/^\s*\S+(?:\s+\S+){3}\s+(.*$)/, "$1");
console.log(chatMessage);
console.log(first);
console.log(last);
Add a second capture group to the regexp and use .match() instead of .split().
// This is the string I am wanting to split.
// I want the first 4 words to be separated from the remainder of the string
const chatMessage = "This is a string that I want to split";
// I am using this regex
const r = /(^(?:\S+\s+\n?){4})(.*)/;
const chatMessageArr = chatMessage.match(r);
chatMessageArr.shift(); // remove the full match
console.log(chatMessageArr);

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
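For the question's cookie.match( REGEX ) form, the same idea can be sketched with a capturing group that reads only the facility value. The test string stands in for document.cookie, and the pattern assumes the cookie name appears at the start or right after a ;.

```javascript
// Stand-in for document.cookie
const cookie = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";

// Capture everything after "facility=" up to the next ";"
const m = cookie.match(/(?:^|;\s*)facility=([^;]*)/);

let facilityName = null;
if (m) {
  const decoded = decodeURIComponent(m[1]); // "34222|Conner-Department-Store"
  facilityName = decoded.split("|")[1].replace(/-/g, " ");
}
console.log(facilityName); // "Conner Department Store"
```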
If I understood correctly, you want to build a phrase from all the words between the hyphens, without picking up the C of %7C (a word must not start with two successive uppercase letters), so I'd use a regex in that case.
Here is a regex solution that works with any cookie in the same format and extracts the wanted phrase from it:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use the regex /([A-Z][a-z]+)-?/g to match each capitalized word together with an optional trailing -.
Remove the - from each matched word.
Then join the resulting array with spaces.
OK, first you should decode the string:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring():
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length);
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // a regex with the g flag replaces every hyphen, not just the first
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string starting at index arg1 and ending just before index arg2 (arg2 itself is excluded).
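A quick check that the end index is exclusive, using a made-up string:

```javascript
const s = "abcdef";
// Characters at indexes 1, 2 and 3; index 4 ("e") is not included
const part = s.substring(1, 4);
console.log(part); // "bcd"
```

This is why passing decoded.length - 1 as the end index drops exactly the last character.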
To cut off the trailing ;, just pass decoded.length - 1 as arg2 in the snippet above:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
Or, all of the above in one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
If you still want to use a regular expression to retrieve (possibly more) data from the string, you could use something like this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;",
length:4] */
solution2 = solution2[3].replace(/-/g, ' ');
// "Conner Department Store"
}
I have assumed some rules for the regex to work; feel free to modify them according to your needs:
facility can be any word made only of lowercase and uppercase letters (no other characters), of any length
= must be the character =
34222 can be any sequence of digits, but no other characters
| must be the character |
Conner-Department-Store can be any characters except the following reserved delimiters: :/?#[];,'
Hope this helps :)
Edit: to find only the facility=34222%7CConner-Department-Store; part, modify the regex to match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
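You can use cookies.js, a mini framework from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application and read the cookie by its name:
docCookies.getItem("facility"); // the decoded value, "34222|Conner-Department-Store"
Then take the part after | and replace the hyphens with spaces as shown above.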

How to extract two strings from url using regex?

I've matched a string successfully, but I need to split it and add some new segments to the URL. If it is possible with a regex, how can I match the URL and extract two strings as in the example below?
Current result:
["domain.com/collection/430000000000000"]
Desired result:
["domain.com/collection/", "430000000000000"]
Current code:
var reg = new RegExp('domain.com\/collection\/[0-9]+');
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
console.log(str.match(reg));
You want Regex Capture Groups.
Put the parts you want to extract into parentheses like this, each part forming a capturing group:
new RegExp('(domain.com\/collection\/)([0-9]+)')
Then after matching, you can extract each group content by index, with index 0 being the whole string match, 1 the first group, 2 the second etc. (thanks for the addendum, jcubic!).
This is done by calling exec() on the regex, like this:
/\d(\d)\d/.exec("123");
// → ["123", "2"]
First comes the whole match, then the group matches in the sequence they appear in the pattern.
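Applied to the question's URL, a minimal sketch. Escaping the dot as domain\.com is a small change from the answer's pattern so that it only matches a literal dot.

```javascript
const str = "http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid";
const reg = /(domain\.com\/collection\/)([0-9]+)/;

const m = reg.exec(str);
// m[0] is the whole match; slice(1) keeps only the two groups
const parts = m ? m.slice(1) : [];
console.log(parts); // ["domain.com/collection/", "430000000000000"]
```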
You can declare an array and then fill it with the required values that you can capture with parentheses (thus, making use of capturing groups):
var reg = /(domain.com\/collection)\/([0-9]+)/g;
// group 1: (domain.com\/collection), group 2: ([0-9]+)
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
var arr = [];
var m; // declare m so the loop also works in strict mode
while ((m = reg.exec(str)) !== null) {
  arr.push(m[1]);
  arr.push(m[2]);
}
console.log(arr);
Output: ["domain.com/collection", "430000000000000"]

Search and replace and remember with which word the string has been replaced in regex

How do I find which substrings were replaced when a regex replace was applied in JavaScript?
Main string : abcSSSdeSSfghEEEijSSSkEEElmSSSnSSSEEEopEEE
I want to replace every minimal-length substring that starts with 'SSS' and ends with 'EEE' with .*
After applying the desired function, I should get the modified string
abc.*ij.*lmSSSn.*opEEE
and also the array for replaced strings as follows :
[ SSSdeSSfghEEE, SSSkEEE, SSSEEE ]
How can I implement this function efficiently?
Pass a function as the second argument to String.replace:
var replacements = [];
var newString = oldString.replace(/*regex pattern here*/, function(match){
replacements.push(match);
return 'some replacement string';
});
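JavaScript's replace lets us pass a function, so we can collect the matched values:
var arr = [];
var result = string.replace(/SSS.*?EEE/g, function (match) {
  arr.push(match); // remember each replaced substring
  return '.*';     // the replacement text
});
The lazy .*? stops at the first EEE after an SSS, so a match won't run from the first SSS to the last EEE. It still starts at the leftmost SSS, though, so in lmSSSnSSSEEE it matches SSSnSSSEEE rather than SSSEEE.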
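To get exactly the substrings the question lists (SSSEEE rather than SSSnSSSEEE), one option is a tempered token that refuses to cross another SSS. This is a different pattern from the answer's, sketched under the assumption that an SSS inside a candidate match should always restart it:

```javascript
const main = "abcSSSdeSSfghEEEijSSSkEEElmSSSnSSSEEEopEEE";
const replaced = [];

// (?:(?!SSS).)*? : any characters, lazily, but never the start of another "SSS"
const modified = main.replace(/SSS(?:(?!SSS).)*?EEE/g, (match) => {
  replaced.push(match);
  return ".*";
});

console.log(modified); // "abc.*ij.*lmSSSn.*opEEE"
console.log(replaced); // ["SSSdeSSfghEEE", "SSSkEEE", "SSSEEE"]
```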

How to Split string with multiple rules in javascript

I have this string for example:
str = "my name is john#doe oh.yeh";
the end result I am seeking is this Array:
strArr = ['my','name','is','john','&#doe','oh','&yeh'];
which means 2 rules apply:
split after each space " " (I know how)
if there are special characters ("." or "#"), then also split, but add the character "&" before the word with the special character.
I know I can use strArr = str.split(" ") for the first rule, but how do I do the other part?
thanks,
Alon
Assuming the result should be '&doe' and not '&#doe', a simple solution is to replace every . and # with ' &' and then split on whitespace:
strArr = str.replace(/[.#]/g, ' &').split(/\s+/)
/\s+/ matches consecutive white spaces instead of just one.
If the result should be '&#doe' and '&.yeh', use the same regex with a capturing group:
strArr = str.replace(/([.#])/g, ' &$1').split(/\s+/)
You can use a regular expression to match all special characters at once. By "special", I assume you mean "not a letter".
var pattern = /([^ a-z]?)[a-z]+/gi; // Pattern
var str = "my name is john#doe oh.yeh"; // Input string
var strArr = [], match; // output array, temporary var
while ((match = pattern.exec(str)) !== null) { // <-- For each match
strArr.push( (match[1]?'&':'') + match[0]); // <-- Add to array
}
// strArr is now:
// strArr = ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
It does not handle consecutive special characters; the pattern has to be modified for that. E.g., to include all consecutive special characters, use ([^ a-z]*).
Also, it ignores a special character at the very end of the string. Using [a-z]* instead of [a-z]+ includes it, but then the pattern can match an empty string, and the loop must be kept from returning the same empty match forever.
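A sketch of the [a-z]* variant. Because the pattern can then match an empty string (for example at each space), the loop skips empty matches and advances lastIndex by hand. The trailing # in the input is added for illustration.

```javascript
const pattern = /([^ a-z]?)[a-z]*/gi;
const str = "my name is john#doe oh.yeh#";
const strArr = [];
let match;
while ((match = pattern.exec(str)) !== null) {
  if (match[0] === "") {
    pattern.lastIndex++; // empty match: step past it, or exec() returns it again
    continue;
  }
  strArr.push((match[1] ? "&" : "") + match[0]);
}
console.log(strArr); // ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh', '&#']
```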
Use the split() method; that's what you need:
http://www.w3schools.com/jsref/jsref_split.asp
OK, I see you found it. I think you should:
1) first split on whitespace
2) iterate through your array and split again any member that contains # or .
3) iterate through your array again and apply str.replace("#", "&#") and str.replace(".", "&.") to the members that contain them
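The three steps above can be sketched as follows. Splitting with the lookahead /(?=[#.])/ in step 2, so that # and . stay at the front of the following piece, is an assumption of this sketch; flatMap needs ES2019.

```javascript
const str = "my name is john#doe oh.yeh";

// 1) split on whitespace
const words = str.split(/\s+/);

// 2) split each word again right before every # or . (the lookahead keeps the character)
const pieces = words.flatMap((word) => word.split(/(?=[#.])/));

// 3) prefix the pieces that start with # or . with &
const strArr = pieces.map((piece) => (/^[#.]/.test(piece) ? "&" + piece : piece));

console.log(strArr); // ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
```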
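I would think a combination of replace() and split() is what you are looking for:
str = "my name is john#doe oh.yeh";
// /[^\w\s]/g matches every character that is neither a word character nor whitespace (here # and .);
// '\W' as a plain string only matches a literal "W", and the regex /\W/ would also match the spaces
strArr = str.replace(/([^\w\s])/g, ' &$1');
strArr = strArr.split(' ');
That should be close to what you asked for.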
This works:
array = string.replace(/#|\./g, ' &$&').split(' ');
Take a look at the demo here: http://jsfiddle.net/M6fQ7/1/
