Javascript RegExp matching returning too many - javascript

I need to take a string and get some values from it. I have this string:
'tab/tab2/tab3'
The '/tab3' is optional so this string should also work:
'tab/tab2'
I am currently trying this, which works for the most part:
'tab/tab2/tab3'.match(new RegExp('^tab/([%a-zA-Z0-9\-\_\s,]+)(/([%a-zA-Z0-9-_s,]+)?)$'));
This will return:
["tab/tab2/tab3", "tab2", "/tab3", "tab3"]
but I want it to return
["tab/tab2/tab3", "tab2", "tab3"]
So I need to get rid of the third item ("/tab3") and also get it to work with just the 'tab/tab2' string.
To complicate it even more, I only have control over the /([%a-zA-Z0-9-_s,]+)? part in the last grouping, meaning it will always be wrapped in an outer group.

You don't need a regex for this; just use the split() method:
var str = 'tab/tab2/tab3';
var arr = str.split('/');
console.log(arr[0]); // tab
console.log(arr[1]); // tab2
console.log(arr[2]); // tab3 (undefined for 'tab/tab2')

I used this regexp to do this:
'tab/tab2/tab3'.match(new RegExp('^tab/([%a-zA-Z0-9\-\_\s,]+)(?:/)([%a-zA-Z0-9-_s,]+)$'));
Now I get this return
["tab/tab2/tab3", "tab2", "tab3"]
Now I just need to allow 'tab/tab2' to be accepted as well...
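One way to accept both strings is to make the trailing "/segment" part optional inside a non-capturing group. The following is only a sketch under a few assumptions: the variable names (segment, inner, pattern) are made up for illustration, the \s in the character class is assumed to mean whitespace (the backslashes are lost in the question's string literal), and the outer parentheses stand in for the wrapper the asker says cannot be changed.

```javascript
// Character class taken from the question, with "-" escaped and "\s" kept as whitespace.
var segment = '[%a-zA-Z0-9\\-_\\s,]+';

// The outer "(" ... ")" mirrors the fixed wrapper; starting the inner fragment
// with "?:" turns that wrapper into a non-capturing group, and the nested
// "(?: ... )?" makes the "/segment" part optional.
var inner = '?:(?:/(' + segment + '))?';
var pattern = new RegExp('^tab/(' + segment + ')(' + inner + ')$');

var withThird = 'tab/tab2/tab3'.match(pattern);
var withoutThird = 'tab/tab2'.match(pattern);

console.log(withThird.slice());    // ["tab/tab2/tab3", "tab2", "tab3"]
console.log(withoutThird.slice()); // ["tab/tab2", "tab2", undefined]
```

When the third segment is absent, its capture group is still present in the result array but holds undefined.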

Use a regex literal instead of wrapping the pattern in " or ', and add the g flag for a global search; otherwise only the first occurrence is returned.
"tab/tab2/tab3".match(/tab[0-9]/g)

Related

Extract specific chars from a string using a regex

I need to split an email address and take out the first character and the first character after the '#'
I can do this as follows:
'bar#foo'.split('#').map(function(a){ return a.charAt(0); }).join('')
--> bf
Now I was wondering if it can be done using a regex match, something like this
'bar#foo'.match(/^(\w).*?#(\w)/).join('')
--> bar#fbf
Not really what I want, but I'm sure I'm missing something here! Any suggestions?
Why use a regex for this? Just use indexOf to get the char at any given position:
var addr = 'foo#bar';
console.log(addr[0], addr[addr.indexOf('#')+1])
To ensure your code works on all browsers, you might want to use charAt instead of []:
console.log(addr.charAt(0), addr.charAt(addr.indexOf('#')+1));
Either way, it'll work just fine, and it is very likely the fastest approach.
If you are going to persist, and choose a regex, then you should realize that the match method returns an array containing 3 strings, in your case:
/^(\w).*?#(\w)/
["the whole match",//start of string + first char + .*?# + first char after #
"group 1 \w",//first char
"group 2 \w"//first char after #
]
So addr.match(/^(\w).*?#(\w)/).slice(1).join('') is probably what you want.
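A minimal sketch of that slicing step, with a guard added because match returns null when the input does not match at all. The helper name firstChars is hypothetical, and returning an empty string on failure is just one possible choice.

```javascript
function firstChars(addr) {
  var m = addr.match(/^(\w).*?#(\w)/);
  // m is null when there is no match; otherwise m[0] is the full match
  // and m[1], m[2] are the two captured characters.
  return m ? m.slice(1).join('') : '';
}

console.log(firstChars('bar#foo')); // bf
console.log(firstChars('nohash'));  // (empty string)
```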
If I understand correctly, you are quite close. Just don't join everything returned by match because the first element is the entire matched string.
'bar#foo'.match(/^(\w).*?#(\w)/).splice(1).join('')
--> bf
Using regex:
var matched = "";
'abc#xyz'.replace(/(?:^|#)(\w)/g, function($0, $1) { matched += $1; return $0; });
console.log(matched);
// ax
For a non-global regex, the match function returns an array whose first element is the full text of the match, followed by each captured sub-group. In your case, it returns this:
bar#f
b
f
To get rid of the first item (the full match), use slice:
'bar#foo'.match(/^(\w).*?#(\w)/).slice(1).join('')
Use String.prototype.replace with regular expression:
'bar#foo'.replace(/^(\w).*#(\w).*$/, '$1$2'); // "bf"
Or using RegEx
^([a-zA-Z0-9])[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+#([a-zA-Z0-9-])[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$

JS / RegEx to remove characters grouped within square braces

I hope I can explain myself clearly here and that this is not too much of a specific issue.
I am working on some javascript that needs to take a string, find instances of chars between square brackets, store any returned results and then remove them from the original string.
My code so far is as follows:
parseLine : function(raw)
{
var arr = [];
var regex = /\[(.*?)]/g;
var arr;
while((arr = regex.exec(raw)) !== null)
{
console.log(" ", arr);
arr.push(arr[1]);
raw = raw.replace(/\[(.*?)]/, "");
console.log(" ", raw);
}
return {results:arr, text:raw};
}
This seems to work in most cases. If I pass in the string [id1]It [someChar]found [a#]an [id2]excellent [aa]match then it returns all the chars from within the square brackets and the original string with the bracketed groups removed.
The problem arises when I use the string [id1]It [someChar]found [a#]a [aa]match.
It seems to fail when only a single letter (and space?) follows a bracketed group and starts missing groups, as you can see in the log if you try it out. It also freaks out if I use groups back to back like [a][b], which I will need to do.
I'm guessing this is my RegEx - begged and borrowed from various posts here as I know nothing about it really - but I've had no luck fixing it and could use some help if anyone has any to offer. A fix would be great but more than that an explanation of what is actually going on behind the scenes would be awesome.
Thanks in advance all.
You could use the replace method with a function to simplify the code and run the regexp only once:
function parseLine(raw) {
var results = [];
var parsed = raw.replace(/\[(.*?)\]/g, function(match,capture) {
results.push(capture);
return '';
});
return { results : results, text : parsed };
}
The problem is due to the lastIndex property of the regex /\[(.*?)]/g; not resetting, since the regex is declared as global. When the regex has global flag g on, lastIndex property of RegExp is used to mark the position to start the next attempt to search for a match, and it is expected that the same string is fed to the RegExp.exec() function (explicitly, or implicitly via RegExp.test() for example) until no more match can be found. Either that, or you reset the lastIndex to 0 before feeding in a new input.
Since your code is reassigning the variable raw on every loop, you are using the wrong lastIndex to attempt the next match.
The problem will be solved if you remove the g flag from your regex. Alternatively, you could use the solution proposed by Tibos, where you pass a function to String.replace() to do the replacement and extract the capturing group at the same time.
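The lastIndex behaviour described above can be reproduced with a short sketch. The input string and the function name parseLineWithReset are made up for illustration; resetting lastIndex inside the loop is one of the two fixes mentioned, shown here only as an example.

```javascript
var regex = /\[(.*?)\]/g;
var raw = '[a][b]text';

var first = regex.exec(raw);
console.log(first[1], regex.lastIndex); // a 3

// The string shrinks, but the regex still remembers position 3.
raw = raw.replace(/\[(.*?)\]/, '');      // raw is now "[b]text"
var second = regex.exec(raw);            // search starts after "[b]"
console.log(second);                     // null, so "[b]" was skipped

// Same loop as in the question, with lastIndex reset after each edit.
function parseLineWithReset(line) {
  var re = /\[(.*?)\]/g;
  var results = [];
  var m;
  while ((m = re.exec(line)) !== null) {
    results.push(m[1]);
    line = line.replace(/\[(.*?)\]/, '');
    re.lastIndex = 0; // the string changed, so restart from the beginning
  }
  return { results: results, text: line };
}

console.log(parseLineWithReset('[a][b]text')); // { results: ['a', 'b'], text: 'text' }
```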
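You should also escape the closing bracket: \[(.*?)\]. JavaScript accepts an unescaped ] outside a character class (except with the u flag), so this improves readability rather than fixing the bug.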

Is it possible to cut off the beginning of a string using regex?

I have a string which contains a path, such as
/foo/bar/baz/hello/world/bla.html
Now, I'd like to get everything from the second-last /, i.e. the result shall be
/world/bla.html
Is this possible using a regex? If so, how?
My current solution is to split the string into an array, and join its last two members again, but I'm sure that there is a better solution than this.
For example:
> '/foo/bar/baz/hello/world/bla.html'.replace(/.*(\/.*\/.*)/, "$1")
/world/bla.html
You can also do
str.split(/(?=\/)/g).slice(-2).join('')
> '/foo/bar/baz/hello/world/bla.html'.match(/(?:\/[^/]+){2}$/)[0]
"/world/bla.html"
Without regular expression:
> var s = '/foo/bar/baz/hello/world/bla.html';
> s.substr(s.lastIndexOf('/', s.lastIndexOf('/')-1))
"/world/bla.html"
I think this will work:
var str = "/foo/bar/baz/hello/world/bla.html";
alert( str.replace( /^.*?(\/[^/]*(?:\/[^/]*)?)$/, "$1") );
This will allow for there being possibly only one last part (like, "foo/bar").
You can use /(\/[^\/]*){2}$/ which selects a slash and some content twice followed by the end of the string.
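Because match returns null when the path has fewer than two segments, a small wrapper can make the behaviour explicit. This is a sketch only: the helper name lastTwoSegments is hypothetical, and falling back to the whole input is an assumption about what the caller wants.

```javascript
// Returns the last two "/segment" parts of a path,
// or the unchanged input when fewer than two are present.
function lastTwoSegments(path) {
  var m = path.match(/(?:\/[^\/]+){2}$/);
  return m ? m[0] : path;
}

console.log(lastTwoSegments('/foo/bar/baz/hello/world/bla.html')); // /world/bla.html
console.log(lastTwoSegments('/bla.html'));                         // /bla.html
```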

javascript regex to extract the first character after the last specified character

I am trying to extract the first character after the last underscore in a string with an unknown number of '_' in the string but in my case there will always be one, because I added it in another step of the process.
What I tried is this. I also tried the regex by itself to extract from the name, but my result was empty.
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var string = match(/[^_]*$/)[1]
string.charAt(0)
So the final desired result is 'D'. If the RegEx can only get me what is behind the last '_' that is fine because I know I can use the charAt like currently shown. However, if the regex can do the whole thing, even better.
If you know there will always be at least one underscore you can do this:
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var firstCharAfterUnderscore = s.charAt(s.lastIndexOf("_") + 1);
// OR, with regex
var firstCharAfterUnderscore = s.match(/_([^_])[^_]*$/)[1]
With the regex, you can extract just the one letter by using parentheses to capture that part of the match. But I think the .lastIndexOf() version is easier to read.
Either way if there's a possibility of no underscores in the input you'd need to add some additional logic.
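The additional logic could look like the sketch below. Without a guard, lastIndexOf returns -1 for a string with no underscore, so charAt(-1 + 1) silently returns the first character of the string, while the regex version throws because match returns null. The function name and the choice of null as the "not found" value are assumptions for illustration.

```javascript
function firstCharAfterLastUnderscore(s) {
  var idx = s.lastIndexOf('_');
  // No underscore at all, or nothing after the last one.
  if (idx === -1 || idx === s.length - 1) {
    return null;
  }
  return s.charAt(idx + 1);
}

console.log(firstCharAfterLastUnderscore('XXXX-XXXX_XX_DigitalF.pdf')); // D
console.log(firstCharAfterLastUnderscore('nounderscore.pdf'));          // null
```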

How to read all string inside parentheses using regex

I want to get all the strings inside pairs of parentheses. For example, after applying a regex to
"fun('xyz'); fun('abcd'); fun('abcd.ef') { temp('no'); "
output should be
['xyz','abcd', 'abcd.ef'].
I tried many options but was not able to get the desired result.
One option is
/fun\((.*?)\)/gi.exec("fun('xyz'); fun('abcd'); fun('abcd.ef')").
Store the regex in a variable, and run it in a loop...
var re = /fun\((.*?)\)/gi,
string = "fun('xyz'); fun('abcd'); fun('abcd.ef')",
matches = [],
match;
while(match = re.exec(string))
matches.push(match[1]);
Note that this only works for a global regex. If you omit the g, you'll have an infinite loop.
Also note that it'll give an undesired result if there is a ) between the quotation marks.
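If a ) can appear between the quotation marks, one option is to match on the quotes rather than on the parentheses. The sketch below assumes the values are always single-quoted and never contain a single quote themselves; the sample input with 'a)b' is made up, and String.prototype.matchAll requires an ES2020 environment.

```javascript
var input = "fun('xyz'); fun('a)b'); fun('abcd.ef') { temp('no'); ";

// [^']* stops at the closing quote, so a ")" inside the quotes is harmless.
// Requiring the literal "fun(" prefix keeps temp('no') out of the result.
var quoted = /fun\('([^']*)'\)/g;
var values = Array.from(input.matchAll(quoted), function (m) { return m[1]; });

console.log(values); // ["xyz", "a)b", "abcd.ef"]
```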
This code will almost do the job:
"fun('xyz'); fun('abcd'); fun('abcd.ef')".match(/'.*?'/gi);
You'll get ["'xyz'", "'abcd'", "'abcd.ef'"] which contains extra ' around the string.
The easiest way to find what you need is to use this RegExp: /[\w.]+(?=')/g
var string = "fun('xyz'); fun('abcd'); fun('abcd.ef')";
string.match(/[\w.]+(?=')/g); // ['xyz','abcd', 'abcd.ef']
It works with alphanumeric characters, underscores, and periods; to allow more symbols, extend [\w.]+.
