Regex keeps trailing slash in capture group - javascript

So, I've tried making a little regex to fetch the requested URL, minus the starting and trailing slash.
One little catch: the trailing slash won't always be there. For instance, if the user requests "/test/example/", they can also request "/test/example". So I tried to make a method to handle that:
req.url.match(/^(?:\/)(.+)(?:[\/])?$/i)[1]
Although, if I request a path like "/test/example/", it keeps the trailing slash, and returns "test/example/" in the capture group...? Basically what I wanted to avoid. (So, all it's doing is removing the starting slash)
Now, I tried removing the ? that's next to the $ symbol. But this just causes an error when requesting "/test/example" (something without the trailing slash), because match() then returns null and [1] cannot be read from it.
I made an example on regex101, which you can view here. As you can see, the capture group includes the ending slash, even though in my expression, I thought I told it to not do that.
TL;DR: Regex is still capturing the trailing slash, even though I don't want it to (and keep in mind that the trailing slash won't always be present).
To clarify, I want the regex to do this:
"/test/example/" to "test/example"
and
"/test/example" to "test/example"
(So, removing the starting and trailing slash, but the trailing slash is optional)

You need to make the regex less greedy. Add 2 ?s:
^(?:\/)?(.*?)(?:[\/])?$
See updated regex here.
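A rough sketch of why the lazy version behaves differently, using plain strings instead of req.url. The helper name stripSlashes is made up for illustration. With (.+) the greedy group swallows the trailing slash before the optional group gets a chance at it; with (.*?) the group stops as soon as the optional slash plus $ can match.

```javascript
// Hypothetical helper, not from the original answer.
function stripSlashes(url) {
  // Lazy group: expands only as far as needed for \/?$ to succeed.
  const match = url.match(/^\/?(.*?)\/?$/);
  return match ? match[1] : url;
}

// Greedy group from the question, for comparison.
const greedy = "/test/example/".match(/^\/(.+)\/?$/)[1];

const lazyWithSlash = stripSlashes("/test/example/");
const lazyWithoutSlash = stripSlashes("/test/example");

console.log(greedy);           // test/example/
console.log(lazyWithSlash);    // test/example
console.log(lazyWithoutSlash); // test/example
```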

Use this pattern with the g flag and replace the matches with nothing:
^\/|\/$
Demo
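A minimal sketch of the replace approach, assuming a plain string input (trimSlashes is a hypothetical name). The g flag matters here: without it only the first alternative that matches, the leading slash, would be removed.

```javascript
// Hypothetical helper: removes one leading and one trailing slash, if present.
function trimSlashes(url) {
  return url.replace(/^\/|\/$/g, "");
}

const trimmedBoth = trimSlashes("/test/example/");
const trimmedLeading = trimSlashes("/test/example");

console.log(trimmedBoth);    // test/example
console.log(trimmedLeading); // test/example
```

Slashes in the middle of the path are left alone because both alternatives are anchored.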

Javascript's regex engine is not the strongest.
I had an issue just like this and I ended up doing the regex in two steps for better readability and predictability.
var matches = req.url.match(/^(?:\/)(.+)(?:[\/])?$/i) // match() returns null when nothing matches
var my_url
if (matches) {
  my_url = matches[1].replace(/(^\/|\/$)/g, '') // Removes start and ending slashes
} else {
  my_url = 'something_else'
}

Related

JavaScript regex to capture everything after the second to last backslash

I have a regex that looks like the following currently:
/^.*[\\\/]/
This will strip every single backslash from a string. The problem I'm facing is I have to now be able to capture everything from the second to last backslash.
Example:
/Users/foo/a/b/c would return b/c
/Another/example/ would return Another/example
So I need to capture everything after the second to last backslash. How would the regex above do that?
Try with this simple solution:
s = "aaaa/bbbb/cccc/dddd";
s.split("/").slice(-2).join("/"); /* It will return "cccc/dddd" */
I assume that you mean forward slash, not backslash.
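One thing to watch with split: a leading or trailing slash produces empty strings in the array, so "/Another/example/" would come back as "example/". A small sketch that drops the empty segments first (lastTwoSegments is a made-up name):

```javascript
// Hypothetical helper: returns the last two non-empty path segments.
function lastTwoSegments(path) {
  return path
    .split("/")
    .filter(Boolean) // drop "" entries caused by leading/trailing slashes
    .slice(-2)
    .join("/");
}

const fromDeepPath = lastTwoSegments("/Users/foo/a/b/c");
const fromTrailingSlash = lastTwoSegments("/Another/example/");

console.log(fromDeepPath);      // b/c
console.log(fromTrailingSlash); // Another/example
```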
Here is a regex alternative to pierlauro's answer.
/([^\/]+\/[^\/]+)\/?$/
Regex101
As pierlauro's answer shows, split, join, and slice are probably the best options for this. But if you MUST use a regex (not sure why), you could employ something like the following:
\/?([^\/]+?\/[^\/]+)\/?$
This regex accommodates for optional trailing slashes and for urls shorter than 2 /s. It leverages the $ character to focus our search scope on the end of the string.
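For comparison, here is a sketch of the regex route applied to the two examples from the question. The wrapper lastTwoByRegex is hypothetical; it returns null when the string has fewer than two segments.

```javascript
// Two slash-free segments, an optional trailing slash, then end of string.
const lastTwoPattern = /([^\/]+\/[^\/]+)\/?$/;

// Hypothetical wrapper around the pattern.
function lastTwoByRegex(path) {
  const match = path.match(lastTwoPattern);
  return match ? match[1] : null;
}

const deepResult = lastTwoByRegex("/Users/foo/a/b/c");
const trailingResult = lastTwoByRegex("/Another/example/");
const tooShort = lastTwoByRegex("single");

console.log(deepResult);     // b/c
console.log(trailingResult); // Another/example
console.log(tooShort);       // null
```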

regex for ng-pattern for filepath

I have arrived at a regex for file path that has these conditions,
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
How can I get a single regex to use in angular.js that meets these conditions?
Your current regex doesn't seem to match what you want. But assuming it does what you want, this will add the negation:
^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?://[^/]+)
Here we added a negative lookahead that fails the match if any of the invalid characters appears anywhere in the string. If none is found, the rest of the regular expression continues as before.
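A quick way to check the lookahead version outside Angular, as a sketch only. The sample strings are invented, and the forward slashes are escaped because this uses a JavaScript regex literal.

```javascript
// Negative lookahead rejects the whole string if a space, ", <, > or | occurs anywhere.
const pathPattern = /^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?:\/\/[^\/]+)/;

// String.raw keeps the backslashes literal, so these read like real UNC paths.
const samples = [
  String.raw`\\server\share`,
  String.raw`\\server\sh<are`,
  "https://example.com",
  'https://exa"mple.com',
];

const results = samples.map((sample) => pathPattern.test(sample));
console.log(results); // [ true, false, true, false ]
```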
If I understand your requirements correctly, you could probably do this :
^(?!.*[ "<>|])(\\\\|https?://).*$
This will still not match any invalid characters defined in the negative lookahead, and also meets your criteria of matching one or more path segments, as well as http(s) and is much simpler.
The caveat is that if you require 2 or more path segments, or a trailing slash on the URL, then this will not work. This is what your regex seems to suggest.
So in that case this is still somewhat cleaner than the original
^(?!.*[ "<>|])(\\\\[^\\]+\\.|https?://[^/]+/).*$
One more point. You ask to match \server\share, yet your regex opens with \\\\. I have assumed that \server\share should be \\server\share and wrote the regexes accordingly. If this is not the case, then all instances of \\\\ in the examples I gave should be changed to \\
OK, first the regex, then the explanation:
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Your first condition is to match a folder name which must not contain any character from ",<>|" nor whitespace. This is written as:
[^\s",<>|] # the caret negates the character class, meaning this must not be matched
Additionally, we want to match a folder name optionally followed by another (sub)folder, so we have to add a backslash to the character class:
[^\\\s",<>|] # added backslash
Now we want to match as many characters as possible but at minimum one, this is what the plus sign is for (+). With this in mind, consider the following string:
\server\folder
At the moment, only "server" is matched, so we need to prepend a backslash, thus "\server" will be matched. Now, if you break a filepath down, it always consists of a backslash + somefoldername, so we need to match backslash + somefoldername unlimited times (but minimum one):
(\\[^\\\s",<>|]+)+
As this is getting somewhat unreadable, I have used a named capturing group ((?<folder>)):
(?<folder>(\\[^\\\s",<>|]+)+)
This will match everything like \server or \server\folder\subfolder\subfolder and store it in the group called folder.
Now comes the URL part. A URL consists of http or https followed by a colon, two forward slashes and "something afterwards":
https?:\/\/[^\s]+ # something afterwards = .+, but no whitespaces
Following the explanation above, this is stored in a named group called "url":
(?<url>https?:\/\/[^\s]+)
Bear in mind, though, that this will match even invalid URL strings (e.g. https://www.google.com.256357216423727...). If this is OK for you, leave it; if not, you may want to have a look at this question here on SO.
Now, last but not least, let's combine the two elements with an or, store it in another named group (folderorurl) and we are done. Simple, right?
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Now the folder or the URL can be found in the folderorurl group, while the individual parts are still saved in url or folder. Unfortunately, I know nothing about angular.js, but the regex will get you started. Additionally, see this regex101 demo for a working fiddle.
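For what it's worth, here is a sketch of reading those named groups in plain JavaScript. It assumes an engine that supports named capture groups (ES2018 or later), and the sample strings are invented.

```javascript
const pattern = /(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))/;

// String.raw keeps the backslashes literal.
const folderMatch = String.raw`\server\folder\subfolder`.match(pattern);
const urlMatch = "see https://example.com/docs now".match(pattern);

// Only the alternative that took part in the match is defined.
console.log(folderMatch.groups.folder); // \server\folder\subfolder
console.log(folderMatch.groups.url);    // undefined
console.log(urlMatch.groups.url);       // https://example.com/docs
```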
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \\server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
To introduce the second condition in your regex, you mainly just have to include the invalid characters in the negated character sets, e.g. instead of [^/] use [^/"<>|].
Here's a working example with a slightly rearranged regex:
const paths = [
  '\\server\\share',
  '\\\\server\\share',
  '\\\\server\\share\\folder',
  'http://www.invalid.de',
  'https://example.com',
  '\\\\<server\\share',
  'https://"host.com',
  '\\\\server"\\share',
]
// Print each path followed by true/false, depending on whether the regex accepts it
for (const path of paths) {
  const isValid = /^\\(\\[^\\"<>|]+){2,}$|^https?:\/\/[^/"<>|]+$/.test(path)
  document.body.appendChild(document.createTextNode(path + ' ' + isValid))
  document.body.appendChild(document.createElement('br'))
}

Javascript Regex: how to simulate "match without capture" behavior of positive lookbehind?

I have a relatively simple regex problem - I need to match specific words in a string, if they are entire words or a prefix. With word boundaries, it would look something like this:
\b(word1|word2|prefix1|prefix2)
However, I can't use the word boundary condition because some words may start with odd characters, e.g. .999
My solution was to look for whitespace or starting token for these odd cases.
(\b|^|\s)(word1|word2|prefix1|prefix2)
Now words like .999 will still get matched correctly, BUT it also captures the whitespace preceding the matched words/prefixes. For my purposes, I can't have it capture the whitespace.
Positive lookbehinds seem to solve this, but javascript doesn't support them. Is there some other way I can get the same behavior to solve this problem?
You can use a non-capturing group using (?:):
/(?:\b|^|\s)(word1|word2|prefix1|prefix2)/
UPDATE:
Based on what you want to replace it with (and @AlanMoore's good point about the \b), you probably want to go with this:
var regex = /(^|\s)(word1|word2|prefix1|prefix2)/g;
myString.replace(regex,"$1<span>$2</span>");
Note that I changed the first group back to a capturing one since it'll be part of the match but you want to keep it in the replacement string (right?). I also added the g modifier so that this happens for all occurrences in the string (assuming that's what you wanted).
Let's get the terminology straight first. A regex normally consumes everything it matches. When you do a replace(), everything that was consumed is overwritten. You can also capture parts of the matched text separately and plug them back in using $1, $2, etc.
When you were using the word boundary you didn't have to worry about this, because \b doesn't consume anything. But now you're consuming the leading whitespace character if there is one, so you have to plug it back in. I don't know what you're replacing the match with, so I'll just replace them with nothing for this demonstration.
result = subject.replace(/(^|\s)(word1|word2|prefix1|prefix2)/g, "$1");
Note that the \b isn't needed any more. In fact, you must remove it, or it will match things like .999 in xyz.999, because \b matches between z and .. I'm pretty sure you don't want that.
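To make the "capture it and plug it back in" idea concrete, here is a small sketch with invented words and an invented subject string. The second variant assumes an engine with ES2018 lookbehind support, where the leading whitespace is asserted but never consumed, so nothing has to be put back.

```javascript
const subject = "foo xyz.999 .999 prefix";

// Capture the start-of-string or whitespace as $1 and re-insert it.
const withCapture = subject.replace(
  /(^|\s)(\.999|foo|pre)/g,
  "$1<b>$2</b>"
);

// Assumption: lookbehind is available (ES2018). The whitespace is not part of the match.
const withLookbehind = subject.replace(
  /(?<=^|\s)(\.999|foo|pre)/g,
  "<b>$1</b>"
);

console.log(withCapture);    // <b>foo</b> xyz.999 <b>.999</b> <b>pre</b>fix
console.log(withLookbehind); // <b>foo</b> xyz.999 <b>.999</b> <b>pre</b>fix
```

Note that the .999 inside xyz.999 is left alone in both variants, which is the behavior \b could not give.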

Javascript regular Expressions cut off last parameters between slashes

Hi I'm new to regex and can't find a proper solution for this:
I want to cut off the last parameter of the URL, the part between the last two slashes ("/1032/"), from the URL "http://www.blablabla.com/test.php/addpage/1032/"
so that I have just http://www.blablabla.com/test.php/addpage/ ... though it is not important that this part is matched ... I just want to cut off the parameter between the last slashes...
What i did was:
curr_url= "http://www.blablabla.com/test.php/addpage/1032/";
expression =/.*\/./;
alert(expression.exec(curr_url));
Result is "http://www.blablabla.com/test.php/addpage/1"
Now I could cut off the last parameter with a slice, but that's not reasonable, I guess.
Any better solutions? Thanks a lot!
curr_url.match(/.+(\/.+\/)$/)
["http://www.blablabla.com/test.php/addpage/1032/", "/1032/"]
The first greedy .+ captures everything up to the last slash and then backs off to the next-to-last one, because the rest of the pattern would fail otherwise. The () capture everything between this slash and the last one, and the \/$ at the end says that the string should end after the last slash. Move the slashes outside the parentheses if you only want the number itself.
It seems, though, that I misinterpreted your intent. If you need the first part of the string, you can use the regexp suggested below with a little change:
curr_url.match(/(.+)\/.+/)[1]
It captures "everything until slash, and then some more". It will first reach last slash in string, but then back off to previous slash, because there's not "then some more", thus leaving exactly the part of string you want.
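Both patterns side by side, as a sketch on the URL from the question. Note that the second one returns the prefix without its trailing slash, because that slash is consumed by the \/ after the group.

```javascript
const currUrl = "http://www.blablabla.com/test.php/addpage/1032/";

// Greedy .+ backs off to the next-to-last slash; the group holds the last parameter.
const lastParam = currUrl.match(/.+(\/.+\/)$/)[1];

// The group holds everything before the slash that starts the last parameter.
const prefix = currUrl.match(/(.+)\/.+/)[1];

console.log(lastParam); // /1032/
console.log(prefix);    // http://www.blablabla.com/test.php/addpage
```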
var parts=curr_url.split('/');
After that you can get any part from the array.
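A sketch of how the split version could be finished, assuming the URL always ends with a slash (so the last array entry is an empty string):

```javascript
const url = "http://www.blablabla.com/test.php/addpage/1032/";
const urlParts = url.split("/");

// urlParts ends with ["addpage", "1032", ""], so drop the last two entries
// and restore the trailing slash.
const withoutLastParam = urlParts.slice(0, -2).join("/") + "/";
const lastValue = urlParts[urlParts.length - 2];

console.log(withoutLastParam); // http://www.blablabla.com/test.php/addpage/
console.log(lastValue);        // 1032
```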
In your expression
/.*\/./
you need the last dot to tell the pattern that there is a character following the slash, but this character is also matched and therefore ends up in your result. If you remove it, the pattern matches up to the last slash, so that is no solution.
But there is a way to define what should come after the pattern without matching it. It's called a lookahead assertion.
So you could do
/.*\/(?!$)/
and (?!$) is a negative lookahead assertion that means "Match the previous slash only if the end of the string (the $) is not following".
curr_url= "http://www.blablabla.com/test.php/addpage/1032/";
expression =/.*\/(?!$)/;
alert(expression.exec(curr_url));
This would also return what you want.

How can I shorten this regex for JavaScript?

Basically I just want it to match anything inside (). I tried the . and * but they don't seem to work. Right now my regex looks like:
\(([\\\[\]\-\d\w\s/*\.])+\)
The strings it's going to match are URL routes like:
#!/foo/bar/([a-z])/([\d\w])/(*)
In this example, my regex above matches:
([a-z])
([\d\w])
(*)
BONUS:
How can I make it so that it only matches when it starts with a ( and ends with a )? I thought I used the ^ at the front where it's \( and the $ at the end where it's \) but no luck.
Disregard this bonus. I didn't realize it didn't matter...
Are you worried about nested parentheses? If not, you could set it up to match all characters that aren't a closing paren:
\(([^)]*)\)
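Run against the route from the question, that looks roughly like this (a sketch; String.raw is only used so the backslashes in the route stay literal):

```javascript
const route = String.raw`#!/foo/bar/([a-z])/([\d\w])/(*)`;

// With the g flag, match() returns every full match and ignores the capture group.
const parenGroups = route.match(/\(([^)]*)\)/g);

console.log(parenGroups); // [ '([a-z])', '([\\d\\w])', '(*)' ]
```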
Basically I just want it to match anything inside ().
BONUS: How can I make it so that it only matches when it starts with a ( and ends with a )?
Easy peasy.
var re1 = /^\(.*\)$/
// or
var re2 = new RegExp('^\\(.*\\)$');
Edit
Re: @Mike Samuel's comments
Does not match newlines between the parentheses which were explicitly matched by \s in the original.
...
Maybe you should use [\s\S] instead of .
...
If you're going to exclude newlines you should do so intentionally or explicitly.
Note that . matches any single character except the newline character. If you also want to match newlines as part of the "anything" between parentheses, use the [\s\S] character class:
var re3 = /^\([\s\S]*\)$/
// or
var re4 = new RegExp('^\\([\\s\\S]*\\)$');
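The difference only shows up when the text between the parentheses contains a line break. A short sketch with a made-up input:

```javascript
const multiline = "(a\nb)";

// . does not match "\n", so the dot version fails on this input.
const dotResult = /^\(.*\)$/.test(multiline);

// [\s\S] matches any character, including newlines.
const anyCharResult = /^\([\s\S]*\)$/.test(multiline);

console.log(dotResult);     // false
console.log(anyCharResult); // true
```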
To negate a match, you use the [^...] construct. Thus, to match anything within parentheses, you would use:
\([^)]+\)
which says "match any string that starts with an open parenthesis, contains one or more characters that are not closing parentheses, and ends with a closing parenthesis".
To match entire lines that match the above construct, just wrap it with ^ and $:
^\([^)]+\)$
I'm not completely sure I understand what you're doing, but try this:
var re = /\/(\([^()]+\))(?=\/|$)/;
Matching the leading slash in addition to the opening paren ensures that the paren is indeed at the beginning. You can't do the same thing at the end because you don't know there will be a trailing slash. And if there is one, you don't want to consume it because it's also the leading slash for the next match attempt.
Instead, you use the lookahead - (?=\/|$) - to match the trailing slash without consuming it. If there is no slash, I assume no other character should be present either--hence the anchor: $.
@patorjk brought up a good point, though: can there be more parentheses between the outermost pair? If there are, the problem is much more complicated. I won't bother trying to expand my regex to deal with nested parens; some regex flavors can handle such things, but not JavaScript. Instead I'll recommend this sloppier regex:
\/(\([\s\S]+?\))(?=\/|$)
I say "sloppy" because it relies on the assumption that the sequences /( and )/ will never appear inside a valid match. As with my first regex, the text that you're interested in (i.e., everything but the leading and trailing slashes) will be captured in group #1.
Notice the non-greedy quantifier, too. With a regular greedy quantifier it will match everything from the first ( to the last ) in one shot. In other words, it'll match ([a-z])/([\d\w])/(*) instead of ([a-z]), ([\d\w]) and (*) as you wanted.
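A sketch of the lazy and greedy versions on the sample route, collecting group #1 from every match. It assumes String.prototype.matchAll is available.

```javascript
const routeText = String.raw`#!/foo/bar/([a-z])/([\d\w])/(*)`;

// Lazy quantifier: stops at the first ) that is followed by / or end of string.
const lazyMatches = [...routeText.matchAll(/\/(\([\s\S]+?\))(?=\/|$)/g)]
  .map((match) => match[1]);

// Greedy quantifier: runs to the last ) in one shot.
const greedyMatches = [...routeText.matchAll(/\/(\([\s\S]+\))(?=\/|$)/g)]
  .map((match) => match[1]);

console.log(lazyMatches);   // [ '([a-z])', '([\\d\\w])', '(*)' ]
console.log(greedyMatches); // [ '([a-z])/([\\d\\w])/(*)' ]
```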
