JavaScript regular expression calculating wrongly - javascript

I'm running a local JavaScript file in the Chrome browser.
I really don't understand why this is providing the wrong result:
Script:
var str = "hello 1 test test hello 2";
var patt = /(hello \S+)/g;
var res = str.split(patt);
//var res = str.search(patt);
if(res!=null) {
for(var i=0;i<res.length;i++) console.log(i+res[i]);
}
Output:
0
1hello 1
2 test test
3hello 2
4
Expected Result:
0hello 1
1hello 2
What am I doing wrong?!

Looks like you're looking for matches rather than splitting the string
Use str.match(patt)
Instead, your code is splitting the string at every place the regular expression matches. Splitting on a regex with a capturing group gives 3 parts per match: the text before the match, the match itself, and the text after the match.
Your regex matches twice, and the text between the two matches is shared, so you end up with 5 parts, which is the result shown (two of the parts are empty).

split ... splits the string.
Just as splitting x,y,z by /(,)/ will give you ["x", ",", "y", ",", "z"], you get the result seen here.
What you wanted to do was iterate over the matches:
str.match(/(hello \S+)/g)
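To see the difference side by side, here is a small sketch (the variable names are illustrative, not from the question) that runs the same input through split with a capturing group, split without one, and match:

```javascript
const input = "hello 1 test test hello 2";

// With a capturing group, split keeps the separators in the result.
const withGroup = input.split(/(hello \S+)/g);
// -> ["", "hello 1", " test test ", "hello 2", ""]

// Without the group, only the text between the separators remains.
const withoutGroup = input.split(/hello \S+/g);
// -> ["", " test test ", ""]

// match with the g flag returns just the matched substrings.
const matched = input.match(/hello \S+/g);
// -> ["hello 1", "hello 2"]

console.log(withGroup, withoutGroup, matched);
```

Only the last call produces the two-element array the question expects.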

Match approach
With match, it is much simpler, just use /hello\s+\S+/g:
var str = "hello 1 test test hello 2";
var patt = /hello\s+\S+/g;
var res = str.match(patt);
if(res!=null) {
for(var i=0;i<res.length;i++)
console.log(i+res[i]);
}
Note that you do not need any capturing groups in this case as you are not using the captured text, you need the whole matched text. Besides, \s+ will match any whitespace there can be between hello and a sequence of non-whitespace characters.
Split approach
You need to match the rest of the string that is after hello \S+ and remove blank entries before outputting them:
var str = "hello 1 test test hello 2";
var patt = /(hello \S+).*?(?=$|hello \S)/g;
var res = str.split(patt);
//var res = str.search(patt);
if(res!=null) {
res = res.filter(Boolean);
for(var i=0;i<res.length;i++)
if (res[i]) {
console.log(i+res[i]);
}
}
Result:
0hello 1
1hello 2
The regex - (hello \S+).*?(?=$|hello \S) - matches and captures the hello + a sequence of non-whitespace symbols, and then any characters but a newline up to the end of string or next hello + non-whitespace characters.
I have used res.filter(Boolean); to remove empty elements in the resulting array (that almost always are present when splitting with regex).

You used split, then you have an array with all values before and after each match of /(hello \S+)/g.
You want to use match:
"hello 1 test test hello 2".match(/(hello \S+)/g);
// ["hello 1", "hello 2"]

Related

How to slice optional arguments in RegEx?

I currently have the following RegExp expression:
/^(?:(?:\,([A-Za-z]{5}))?)+$/g
So the accepted input should be something like ,IGORA, but ,IGORA,GIANC,LOLLI is valid too, and in that case I would like to slice the string into 3 groups. In general, the number of groups should equal the number of items in the user input that passes the RegExp test.
I was trying to do something like this in JavaScript, but it returns only the last value:
var str = ',GIANC,IGORA';
var arr = str.match(/^(?:(?:\,([A-Za-z]{5}))?)+$/).slice(1);
alert(arr);
So the output is 'IGORA', while I would like it to be 'GIANC' 'IGORA'.
Here is another example
/^([A-Z]{5})(?:(?:\,([A-Za-z]{2}))?)+$/g
An input that passes the regexp test must contain at least one 5-character string, but it can also contain further 5-character strings separated by commas, so from the input
IGORA,CIAOA,POPOP
I would have an array of ["IGORA","CIAOA","POPOP"]
You can capture the words in a capturing group surrounded by an optional preceding comma and an optional trailing comma.
You can test the regex here: ,?([A-Za-z]+),?
const pattern = /,?([A-Za-z]+),?/gm;
const str = `,IGORA,GIANC,LOLLI`;
let matches = [];
let m;
// Iterate until no match found
while ((m = pattern.exec(str))) {
// The first captured group is the match
matches.push(m[1]);
}
console.log(matches);
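If the input should also be validated as a whole, one option is to test it against an anchored pattern first and only then extract the items. The sketch below assumes at least one ",XXXXX" item is required; parseCodes is a hypothetical helper name, and String.prototype.matchAll needs an ES2020 environment:

```javascript
// Hypothetical helper: returns the five-letter codes, or null if the input is malformed.
function parseCodes(input) {
  // Validate the whole string first: one or more ",XXXXX" items and nothing else.
  if (!/^(?:,[A-Za-z]{5})+$/.test(input)) return null;
  // Then collect capture group 1 from every individual match.
  return Array.from(input.matchAll(/,([A-Za-z]{5})/g), (m) => m[1]);
}

console.log(parseCodes(",IGORA,GIANC,LOLLI")); // ["IGORA", "GIANC", "LOLLI"]
console.log(parseCodes(",IGORA,TOOLONGX"));    // null
```

Splitting validation from extraction avoids the problem from the question, where a repeated capturing group only remembers its last iteration.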
There are other ways to do this, but I found that one of the simple ways is by using the replace method, as it can replace all instances that match that regex.
For example:
var regex = /,[A-Za-z]{5}/g; // unanchored, so each ",XXXXX" item is matched separately
var str = ',GIANC,IGORA';
var arr = [];
str.replace(regex, function(match) {
arr[arr.length] = match;
return match;
});
console.log(arr);
Also, in my code snippet you can see that there is an extra comma at the start of each string. You can solve that by changing line 5 to arr[arr.length] = match.replace(/^,/, '').
Is this what you're looking for?
Explanation:
\b word boundary (starting or ending a word)
\w a word character ([A-Za-z0-9_])
{5} exactly 5 of the preceding token
So it matches all 5-character words but not NANANANA
var str = 'IGORA,CIAOA,POPOP,NANANANA';
var arr = str.match(/\b\w{5}\b/g);
console.log(arr); //['IGORA', 'CIAOA', 'POPOP']
If you only wish to select words separated by commas and nothing else, you can test for them like so:
(?<=,\s*|^) preceded by , and any number of spaces, OR is the first word in the list.
(?=,\s*|$) followed by , and any number of spaces, OR is the last word in the list.
In the following code, POPOP and MOMMA are rejected because they are not separated by a comma, and NANANANA fails because it is not 5 characters long.
var str = 'IGORA, CIAOA, POPOP MOMMA, NANANANA, MEOWI';
var arr = str.match(/(?<=,\s*|^)\b\w{5}\b(?=,\s*|$)/g);
console.log(arr); //['IGORA', 'CIAOA', 'MEOWI']
If you can't have any trailing spaces after the comma, just leave out the \s* from both (?<=,\s*|^) and (?=,\s*|$).

Separating words with Regex

I am trying to get this result: 'Summer-is-here'. Why does the code below generate extra spaces? (Current result: '-Summer--Is- -Here-').
function spinalCase(str) {
var newA = str.split(/([A-Z][a-z]*)/).join("-");
return newA;
}
spinalCase("SummerIs Here");
You are using a variety of split where the regexp contains a capturing group (inside parentheses), which has a specific meaning, namely to include all the splitting strings in the result. So your result becomes:
["", "Summer", "", "Is", " ", "Here", ""]
Joining that with - gives you the result you see. But you can't just remove the unnecessary capture group from the regexp, because then the split would give you
["", "", " ", ""]
because the capitalized words themselves are now the separators and are discarded, leaving only what lies between them. So this doesn't really work.
If you want to use split, try splitting on zero-width or space-only matches looking ahead to a uppercase letter:
> "SummerIs Here".split(/\s*(?=[A-Z])/)
                            ^^^^^^^^^ LOOK-AHEAD
< ["Summer", "Is", "Here"]
Now you can join that to get the result you want, but without the lowercase mapping, which you could do with:
"SummerIs Here" .
split(/\s*(?=[A-Z])/) .
map(function(elt, i) { return i ? elt.toLowerCase() : elt; }) .
join('-')
which gives you what you want.
Using replace as suggested in another answer is also a perfectly viable solution. In terms of best practices, consider the following code from Ember:
var DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
var DASHERIZE_REGEXP = /[ _]/g;
function decamelize(str) {
return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}
function dasherize(str) {
return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}
First, decamelize puts an underscore _ in between two-character sequences of lower-case letter (or digit) and upper-case letter. Then, dasherize replaces the underscore with a dash. This works perfectly except that it lower-cases the first word in the string. You can sort of combine decamelize and dasherize here with
var SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;
function spinalCase(str) {
return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}
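As a quick check of the combined function, here is a small usage sketch. The second and third inputs are made up for illustration; the third shows a limitation, since the pattern only acts where a lower-case letter or digit is followed by an upper-case letter:

```javascript
const SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;

function spinalCase(str) {
  // Insert a dash at each lower-to-upper boundary, swallowing any spaces there.
  return str.replace(SPINALIZE_REGEXP, "$1-$2").toLowerCase();
}

console.log(spinalCase("SummerIs Here"));   // "summer-is-here"
console.log(spinalCase("thisIsSpinalTap")); // "this-is-spinal-tap"
// No upper-case letters, so nothing is replaced:
console.log(spinalCase("summer is here"));  // "summer is here"
```

Note that this version lower-cases the whole result, including the first word.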
You want to separate capitalized words, but you are splitting the string on the capitalized words themselves. That's why you get those empty strings and spaces.
I think you are looking for this :
var newA = str.match(/[A-Z][a-z]*/g).join("-");
([A-Z][a-z]*) *(?!$|[a-z])
You can simply do a replace by $1-. See demo:
https://regex101.com/r/nL7aZ2/1
var re = /([A-Z][a-z]*) *(?!$|[a-z])/g;
var str = 'SummerIs Here';
var subst = '$1-';
var result = str.replace(re, subst);
var newA = str.split(/ |(?=[A-Z])/).join("-");
You can change the regex like:
/ |(?=[A-Z])/ or /\s*(?=[A-Z])/
Result:
Summer-Is-Here

Why is this regex matching also words within a non-capturing group?

I have this string (notice the multi-line syntax):
var str = ` Number One: Get this
Number Two: And this`;
And I want a regex that returns (with match):
[str, 'Get this', 'And this']
So I tried str.match(/Number (?:One|Two): (.*)/g);, but that's returning:
["Number One: Get this", "Number Two: And this"]
There can be any whitespace/line-breaks before any "Number" word.
Why doesn't it return only what is inside of the capturing group? Am I misunderstanding something? And how can I achieve the desired result?
Per the MDN documentation for String.match:
If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.
(emphasis mine).
So, what you want is not possible.
The same page adds:
if you want to obtain capture groups and the global flag is set, you need to use RegExp.exec() instead.
so if you're willing to give up on using match, you can write your own function that repeatedly applies the regex, gets the captured substrings, and builds an array.
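A minimal sketch of such a function follows. The name captureAll is hypothetical, it only collects the first capture group, and it assumes the regex carries the g flag (without it, exec would never advance):

```javascript
// Hypothetical helper: collects capture group 1 from every match of a global regex.
function captureAll(str, regex) {
  if (!regex.global) throw new Error("regex must have the g flag");
  const results = [];
  let m;
  regex.lastIndex = 0; // start from the beginning of the string
  // exec advances regex.lastIndex on each call, so the loop walks all matches.
  while ((m = regex.exec(str)) !== null) {
    results.push(m[1]);
    // Guard against an infinite loop on zero-length matches.
    if (m[0] === "") regex.lastIndex++;
  }
  return results;
}

const str = " Number One: Get this\n Number Two: And this";
const found = captureAll(str, /Number (?:One|Two): (.*)/g);
console.log([str, ...found]);
// -> [str, "Get this", "And this"]
```

Prepending the original string reproduces the array shape asked for in the question.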
Or, for your specific case, you could write something like this:
var these = str.split(/(?:^|\n)\s*Number (?:One|Two): /);
these[0] = str;
Replace and store the result in a new string, like this:
var str = ` Number One: Get this
Number Two: And this`;
var output = str.replace(/Number (?:One|Two): (.*)/g, "$1");
console.log(output);
which outputs:
Get this
And this
If you want the match array like you requested, you can try this:
var getMatch = function(string, split, regex) {
var match = string.replace(regex, "$1" + split);
match = match.split(split);
match = match.reverse();
match.push(string);
match = match.reverse();
match.pop();
return match;
}
var str = ` Number One: Get this
Number Two: And this`;
var regex = /Number (?:One|Two): (.*)/g;
var match = getMatch(str, "#!SPLIT!#", regex);
console.log(match);
which displays the array as desired:
[ ' Number One: Get this\n Number Two: And this',
' Get this',
'\n And this' ]
Where split (here #!SPLIT!#) should be a unique string to split the matches. Note that this only works for single groups. For multi groups add a variable indicating the number of groups and add a for loop constructing "$1 $2 $3 $4 ..." + split.
Try
var str = " Number One: Get this\n Number Two: And this";
// `/\w+\s+\w+(?=\s|$)/g` match one or more alphanumeric characters ,
// followed by one or more space characters ,
// followed by one or more alphanumeric characters ,
// if following space or end of input , set `g` flag
// return `res` array `["Get this", "And this"]`
var res = str.match(/\w+\s+\w+(?=\s|$)/g);
document.write(JSON.stringify(res));

Javascript split by spaces but not those in quotes

The goal is to split a string at the spaces but not split the text data that is in quotes or separate that from the adjacent text.
The input is effectively a string that contains a list of value pairs. If the value contains a space, it is enclosed in quotes. I need a function that returns an array of value-pair elements as per the example below:
Example Input:
'a:0 b:1 moo:"foo bar" c:2'
Expected result:
a:0,b:1,moo:foo bar,c:2 (An array of length 4)
I have checked through a load of other questions but none of them (I found) seem to cope with my issue. Most seem to split at the space within the quotes or they split the 'moo:' and 'foo bar' into separate parts.
Any assistance would be greatly appreciated,
Craig
You can use this regex for split:
var s = 'a:0 b:1 moo:"foo bar" c:2';
var m = s.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/g);
//=> [a:0, b:1, moo:"foo bar", c:2]
It splits on spaces only if they are outside quotes, by using a positive lookahead that makes sure there is an even number of quotes after the space.
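The split leaves the quote characters in place, while the expected result in the question has them removed. A short sketch of the full round trip, assuming quotes are balanced and never escaped (the names parts and cleaned are illustrative):

```javascript
const s = 'a:0 b:1 moo:"foo bar" c:2';

// Split on runs of spaces that are followed by an even number of quotes,
// i.e. spaces that sit outside any quoted value.
const parts = s.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/);
// -> ["a:0", "b:1", 'moo:"foo bar"', "c:2"]

// Drop the quote characters to get the form shown in the question.
const cleaned = parts.map((p) => p.replace(/"/g, ""));

console.log(cleaned); // ["a:0", "b:1", "moo:foo bar", "c:2"]
```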
You could approach it slightly differently and use a Regular Expression to split where spaces are followed by word characters and a colon (rather than a space that's not in a quoted part):
var str = 'a:0 b:1 moo:"foo bar" c:2',
arr = str.split(/ +(?=[\w]+\:)/g);
/* [a:0, b:1, moo:"foo bar", c:2] */
What's this Regex doing?
It looks for a literal match on the space character, then uses a Positive Lookahead to assert that the next part can be matched:
[\w]+ = match any word character [a-zA-Z0-9_] between one and unlimited times.
\: = match the : character once (backslash escaped).
g = global modifier - don't return on first match.
Any special reason it has to be a regexp?
var str = 'a:0 b:1 moo:"foo bar" c:2';
var parts = [];
var currentPart = "";
var isInQuotes= false;
for (var i = 0; i < str.length; i++) {
var char = str.charAt(i);
if (char === " " && !isInQuotes) {
parts.push(currentPart);
currentPart = "";
} else {
currentPart += char;
}
if (char === '"') {
isInQuotes = !isInQuotes;
}
}
if (currentPart) parts.push(currentPart);

How can I remove all characters up to and including the 3rd slash in a string?

I'm having trouble with removing all characters up to and including the third slash in JavaScript. This is my string:
http://blablab/test
The result should be:
test
Does anybody know the correct solution?
To get the last item in a path, you can split the string on / and then pop():
var url = "http://blablab/test";
alert(url.split("/").pop());
//-> "test"
To specify an individual part of a path, split on / and use bracket notation to access the item:
var url = "http://blablab/test/page.php";
alert(url.split("/")[3]);
//-> "test"
Or, if you want everything after the third slash, split(), slice() and join():
var url = "http://blablab/test/page.php";
alert(url.split("/").slice(3).join("/"));
//-> "test/page.php"
var string = 'http://blablab/test'
string = string.replace(/[\s\S]*?\//,'').replace(/[\s\S]*?\//,'').replace(/[\s\S]*?\//,'')
alert(string)
This uses a regular expression, applied three times so that each replace removes everything up to and including the next slash. I will explain below.
The regex is /[\s\S]*?\//
The first / is the start of the regex
[\s\S] means whitespace or non-whitespace (anything), not to be confused with . which does not match line terminators such as \n and \r.
*? means that we match zero or more of [\s\S], but as few as possible (lazy), so the match stops at the first slash. A plain * would be greedy and run to the last slash in the string
\/ means match a slash character
The last / is the end of the regex
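How far each replace reaches depends on whether the quantifier is greedy (*) or lazy (*?). A small sketch comparing the two on a longer, made-up URL:

```javascript
const url = "http://blablab/test/page.php";

// Greedy: [\s\S]* runs to the LAST slash, so a single replace strips the whole path.
const greedy = url.replace(/[\s\S]*\//, "");
// -> "page.php"

// Lazy: [\s\S]*? stops at the FIRST slash, so three replaces strip exactly three slashes.
const lazy = url
  .replace(/[\s\S]*?\//, "")
  .replace(/[\s\S]*?\//, "")
  .replace(/[\s\S]*?\//, "");
// -> "test/page.php"

console.log(greedy, lazy);
```

With the short URL from the question both forms happen to give "test", which is why the difference only shows up on deeper paths.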
var str = "http://blablab/test";
var index = 0;
for(var i = 0; i < 3; i++){
index = str.indexOf("/",index)+1;
}
str = str.substr(index);
To make it a one liner you could make the following:
str = str.substr(str.indexOf("/",str.indexOf("/",str.indexOf("/")+1)+1)+1);
You can use split to split the string into parts and use slice to return all parts after the third slash.
var str = "http://blablab/test",
arr = str.split("/");
arr = arr.slice(3);
console.log(arr.join("/")); // "test"
// A longer string:
var str = "http://blablab/test/test"; // "test/test";
You could use a regular expression like this one:
'http://blablab/test'.match(/^(?:[^/]*\/){3}(.*)$/);
// -> ['http://blablab/test', 'test']
A string's match method gives you either an array (the whole match, which in this case is the whole input, followed by any capture groups; we want the first capture group) or null. So, for general use you need to pull out the element at index 1 of the array, or get null if a match wasn't found:
var input = 'http://blablab/test',
re = /^(?:[^/]*\/){3}(.*)$/,
match = input.match(re),
result = match && match[1]; // With this input, result contains "test"
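Wrapped in a function, the same idea can be tried on a few inputs, including one with fewer than three slashes. This is only a sketch; afterThirdSlash is a hypothetical name:

```javascript
// Hypothetical helper: returns everything after the third slash, or null.
function afterThirdSlash(input) {
  // Three "anything but a slash, then a slash" groups, then capture the rest.
  const match = input.match(/^(?:[^/]*\/){3}(.*)$/);
  return match && match[1];
}

console.log(afterThirdSlash("http://blablab/test"));          // "test"
console.log(afterThirdSlash("http://blablab/test/page.php")); // "test/page.php"
console.log(afterThirdSlash("no-slashes-here"));              // null
```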
let str = "http://blablab/test";
let data = new URL(str).pathname.split("/").pop();
console.log(data);
