Regex to match specific path with specific query param - javascript

I'm struggling with creating regex to match URL path with query param that could be in any place.
For example URLs could be:
/page?foo=bar&target=1&test=1 <- should match
/page?target=1&test=2 <- should match
/page/nested?foo=bar&target=1&test=1 <- should NOT match
/page/nested?target=1&test=2 <- should NOT match
/another-page?foo=bar&target=1&test=1 <- should NOT match
/another-page?target=1&test=2 <- should NOT match
I need to match the target param specifically on /page.
This regex only finds the param: \A?target=[^&]+&*.
Thanks!
UPDATE:
It is needed for a third-party tool that decides which page to run an experiment on. Its dashboard only accepts a regular expression for this setup, so I cannot use code tools such as a URL parser.

The general rule is that if you want to parse params, use a URL parser, not a custom regex.
In this case you can use, for instance:
// http://a.b/ is only added to make URL parsing work
url = new URL("http://a.b/page?foo=bar&target=1&test=1")
url.searchParams.get("target")
// => '1'
url.pathname
// => '/page'
And then check those values in an if:
url = new URL("http://a.b/page?foo=bar&target=1&test=1")
if (url.searchParams.get("target") && url.pathname == '/page') {
// ...
}
See also:
https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams
https://developer.mozilla.org/en-US/docs/Web/API/URL
EDIT
If you have to use regex try this one:
\/page(?=\?).*[?&]target=[^&\s]*(&|$)
Explanation:
\/page(?=\?) - matches path (starts with / then page then lookahead for ?)
.*[?&]target=[^&\s]*($|&) matches param name target:
located anywhere (preceded by anything .*)
[?&] preceded with ? or &
followed by its value (=[^&\s]*)
ending with end of params ($) or another param (&)
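As a quick check, the pattern above can be run against the six URLs from the question. This is only a sketch; the names pagePattern, samples and results are made up for illustration.

```javascript
// Pattern from the answer above, unchanged.
const pagePattern = /\/page(?=\?).*[?&]target=[^&\s]*(&|$)/;

// URLs from the question, paired with the outcome the asker wants.
const samples = [
  ['/page?foo=bar&target=1&test=1', true],
  ['/page?target=1&test=2', true],
  ['/page/nested?foo=bar&target=1&test=1', false],
  ['/page/nested?target=1&test=2', false],
  ['/another-page?foo=bar&target=1&test=1', false],
  ['/another-page?target=1&test=2', false],
];

// Run the pattern against each URL and keep the boolean result.
const results = samples.map(([url]) => pagePattern.test(url));

samples.forEach(([url, expected], i) => {
  console.log(url, '->', results[i], '(wanted ' + expected + ')');
});
```

Note that the pattern is not anchored, so a path such as /x/page?target=1 would match as well. If the tool matches against the path rather than the full URL, prefixing the pattern with ^ is one way to tighten it.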

If you're looking for a regex then you may use:
/\/page\?(?:.*&)?target=[^&]*/i
RegEx Details:
\/page\?: Match the text /page?
(?:.*&)?: Match optional text of any length followed by &
target=[^&]*: Match text target= followed by 0 or more characters that are not &
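To see how the optional (?:.*&)? part behaves, here is a small sketch that tries the pattern on a few edge cases. The sample URLs other than those from the question, and the names targetOnPage and checks, are assumptions made for illustration.

```javascript
// Pattern from the answer above, unchanged.
const targetOnPage = /\/page\?(?:.*&)?target=[^&]*/i;

const checks = {
  // target is the first param, so (?:.*&)? matches nothing
  first: targetOnPage.test('/page?target=1&test=2'),
  // target comes later, so (?:.*&)? consumes "foo=bar&"
  later: targetOnPage.test('/page?foo=bar&target=1'),
  // a param that merely ends in "target" must not count
  lookalike: targetOnPage.test('/page?mytarget=1'),
  // nested path: "/page" is not directly followed by "?"
  nested: targetOnPage.test('/page/nested?target=1'),
  // the i flag makes the match case-insensitive
  upper: targetOnPage.test('/PAGE?target=1'),
};

console.log(checks);
```

The lookalike case fails because target= has to come either directly after the ? or directly after an &.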

Related

javascript regex insert new element into expression

I am passing a URL to a block of code in which I need to insert a new element into the regex. Pretty sure the regex is valid and the code seems right but no matter what I can't seem to execute the match for regex!
//** Incoming url's
//** url e.g. api/223344
//** api/11aa/page/2017
//** Need to match to the following
//** dir/api/12ab/page/1999
//** Hence the need to add dir at the front
var url = req.url;
//** pass in: /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var re = myregex.toString();
//** Insert dir into regex: /^dir\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var regVar = re.substr(0, 2) + 'dir' + re.substr(2);
var matchedData = url.match(regVar);
matchedData === null ? console.log('NO') : console.log('Yay');
I hope I am just missing the obvious but can anyone see why I can't match and always returns NO?
Thanks
Let's break down your regex:
^\/api\/ matches the beginning of the string, followed by exactly the string "/api/".
([a-zA-Z0-9-_~ %]+) is a capturing group: it captures any of the characters listed inside the brackets, with the + meaning one or more, so for example this section will match abAB25-_ %
(?:\/page\/([a-zA-Z0-9-_~ %]+)) groups multiple tokens together as well, but does not create a capturing group like the one above (the ?: makes it non-capturing). It first matches exactly the string "/page/", followed by a capturing group like the one described in the paragraph above (matching a-z, A-Z, 0-9, etc.).
?$ is at the end: the ? makes the preceding group optional (zero or one occurrence), and the $ matches the end of the string.
This regex will match this string, for example: /api/abAB25-_ %/page/abAB25-_ %
You may be able to take advantage of capturing groups, however, and use something like this instead to get similar results: ^\/api\/([a-zA-Z0-9-_~ %]+)\/page\/\1?$. Here, we are using \1 to reference that first capturing group and match exactly the same tokens it is matching. EDIT: actually, this probably won't work, since the text after /api/ and the text after /page/ will most likely be different, carrying on...
Afterwards, you are adding "dir" to the beginning of the pattern, so you can now match something like this: dir/api/abAB25-_ %/page/abAB25-_ %
You have also converted the regex to a string (delimiters included), so, as Crayon Violent pointed out in their comment, this breaks the expected functionality. You can fix this by using .source on your original regex, which gives the pattern without the surrounding slashes, and building a new RegExp from it: var regVar = new RegExp('^dir' + myregex.source.substr(1)); var matchedData = url.match(regVar); https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source
Now you can properly match a string like this: dir/api/11aa/page/2017 see this example: https://repl.it/Mj8h
As mentioned by Crayon Violent in the comments, it seems you're passing a string rather than a regular expression to .match(). Maybe try the following:
url.match(new RegExp(regVar.slice(1, -1), "i"));
to convert the string to a regular expression (the slice strips the leading and trailing / that toString() adds). The "i" is for ignore case; I don't know whether that's what you want. Learn more here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
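Putting both answers together, here is a minimal sketch of building the prefixed regex from .source instead of toString(). The helper name withPrefix is hypothetical, and the sketch assumes the prefix contains no regex metacharacters (otherwise it would need escaping first).

```javascript
// The regex that is passed in, as shown in the question.
const myregex = /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/;

// Hypothetical helper: returns a new RegExp with `prefix` inserted
// right after the leading ^ (or at the start if there is no ^).
function withPrefix(re, prefix) {
  const body = re.source.startsWith('^') ? re.source.slice(1) : re.source;
  return new RegExp('^' + prefix + body, re.flags);
}

const prefixed = withPrefix(myregex, 'dir');

const matchedData = 'dir/api/11aa/page/2017'.match(prefixed);
console.log(matchedData === null ? 'NO' : 'Yay');
console.log(matchedData && matchedData.slice(1)); // the two captured parts
```

Because .source never includes the / delimiters or the flags, there is nothing to strip before handing the string to new RegExp.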

Get everything except match in javascript regular expression

I have the following regex to get the first part after a url:
^http[s]?:\/\/.*?\/([a-zA-Z-_.%]+).*$
It matches test in the below urls:
foo.com
http://foo.com
http://foo.com/test
http://foo.com/test/
http://foo.com/test?bar
What I'm now trying to do is recreate the same url, but replace test with a different value. Either by taking the parts before and after the match or reversing the result.
I'm sure there's a regexy way of doing this, but I'm unable to find out how to do so.
You can use a capturing group for part before /test and use it as back-reference in replacement:
var re = /^(https?:\/\/[^\/]+\/)[^?\/]+/gmi;
var subst = '$1foobar';
var result = str.replace(re, subst);
[^?\/]+ will match the text before the next / or ? after the domain name in the URL. Like your original regex, it also assumes that URLs start with http:// or https://.
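For reference, here is the same idea as a small self-contained sketch applied to the URLs from the question. The function name replaceFirstSegment is made up, and the g and m flags are dropped on the assumption that each call handles a single URL.

```javascript
// Group 1 keeps "scheme://host/"; the rest of the match is the first path segment.
const firstSegment = /^(https?:\/\/[^\/]+\/)[^?\/]+/i;

// Hypothetical helper: swaps the first path segment for `replacement`.
// Assumes `replacement` contains no "$" sequences, which replace() treats specially.
function replaceFirstSegment(url, replacement) {
  return url.replace(firstSegment, '$1' + replacement);
}

[
  'http://foo.com',
  'http://foo.com/test',
  'http://foo.com/test/',
  'http://foo.com/test?bar',
].forEach((u) => console.log(u, '->', replaceFirstSegment(u, 'foobar')));
```

URLs without a first path segment, such as http://foo.com, are returned unchanged because the pattern does not match them.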

What RegEx would clean up this set of inputs?

I'm trying to figure out a RegEx that would match the following:
.../string-with-no-spaces -> string-with-no-spaces
or
string-with-no-spaces:... -> string-with-no-spaces
or
.../string-with-no-spaces:... -> string-with-no-spaces
where ... can be anything in these example strings:
example.com:8080/string-with-no-spaces:latest
string-with-no-spaces:latest
example.com:8080/string-with-no-spaces
string-with-no-spaces
and a bonus would be
http://example.com:8080/string-with-no-spaces:latest
and all would match string-with-no-spaces.
Is it possible for a single RegEx to cover all those cases?
So far I've gotten as far as /\/.+(?=:)/ but that not only includes the slash, it also only works for case 3. Any ideas?
Edit: Also I should mention that I'm using Node.js, so ideally the solution should pass all of these: https://jsfiddle.net/ys0znLef/
How about:
(?:.*/)?([^/:\s]+)(?::.*|$)
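A quick sketch of how that pattern could be used in Node.js, run against the inputs listed in the question. The name extractName is made up for illustration; the slashes are escaped only because the pattern is written as a regex literal.

```javascript
// Same pattern as above; group 1 holds the wanted name.
const namePattern = /(?:.*\/)?([^\/:\s]+)(?::.*|$)/;

// Hypothetical helper: returns the captured name, or null if nothing matches.
function extractName(input) {
  const m = input.match(namePattern);
  return m ? m[1] : null;
}

const inputs = [
  'example.com:8080/string-with-no-spaces:latest',
  'string-with-no-spaces:latest',
  'example.com:8080/string-with-no-spaces',
  'string-with-no-spaces',
  'http://example.com:8080/string-with-no-spaces:latest',
];

inputs.forEach((s) => console.log(s, '->', extractName(s)));
```

The greedy (?:.*\/)? swallows everything up to the last slash, which is why the host and port never end up in the capture.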
Consider the following solution using specific regex pattern and String.match function:
var re = /(?:[/]|^)([^/:.]+?)(?:[:][^/]|$)/,
// (?:[/]|^) - passive group, checks if the needed string is preceded by '/' or is at start of the text
// (?:[:][^/]|$) - passive group, checks if the needed string is followed by ':' or is at the end of the text
searchString = function(str){
var result = str.match(re);
return result[1];
};
console.log(searchString("example.com:8080/string-with-no-spaces"));
console.log(searchString("string-with-no-spaces:latest"));
console.log(searchString("string-with-no-spaces"));
console.log(searchString("http://example.com:8080/string-with-no-spaces:latest"));
The output for all the cases above will be string-with-no-spaces
Here's the expression I've got... just trying to tweak to use the slash but not include it.
Updated result works in JS
\S([a-zA-Z0-9.:/\-]+)\S
//works on regexr, regex storm, & regex101 - tested with a local html file to confirm JS matches strings
var re = /\S([a-zA-Z0-9.:/\-]+)\S/;

Regex - Extract digits from a url

I have this url:
http://example.com/things/stuff/532453?morethings&stuff=things&ver=1
I need just that number in the middle there. Closest I got was
(\d*)?\?
but this includes the question mark. Basically, I want all the digits that come before the ?, back to the slash, so the output is 532453.
Try the following regex (?!\/)\d+(?=\?):
url = "http://example.com/things/stuff/532453?morethings&stuff=things"
url.match(/(?!\/)\d+(?=\?)/) // matches 532453
This regex matches a series of digits that is immediately followed by a ?, using a positive lookahead so that the ? is not returned as part of the match. Note that the negative lookahead (?!\/) does not actually require a preceding /: it only asserts that the next character is not a /, which is always true for a digit.
A quick test within developer tools:
// create a list of example urls to test against (only one should match the regex)
urls = ["http://example.com/things/stuff/532453?morethings&stuff=things",
"http://example.com/things/stuff?morethings&stuff=things",
"http://example.com/things/stuff/123a?morethings&stuff=things"]
urls.forEach(function(value) {
console.log(value.match(/(?!\/)\d+(?=\?)/));
})
// logs the following:
// ["532453"]
// null
// null
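If the digits really must sit directly after a slash, a lookbehind is one way to express that. The sketch below compares the regex above with two variants; it assumes an engine with lookbehind support (ES2018, for example current Node.js), and the names and the second sample URL are made up for illustration.

```javascript
const sampleUrl = 'http://example.com/things/stuff/532453?morethings&stuff=things&ver=1';
// Assumed counter-example: digits before the ? but not directly after a slash.
const trickyUrl = 'http://example.com/things/stuff5?morethings';

const original = /(?!\/)\d+(?=\?)/;     // regex from the answer above
const withoutNoop = /\d+(?=\?)/;        // same regex without the leading lookahead
const afterSlash = /(?<=\/)\d+(?=\?)/;  // lookbehind: digits must directly follow a "/"

// Returns the matched text, or null when there is no match.
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

[original, withoutNoop, afterSlash].forEach((re) => {
  console.log(String(re), firstMatch(re, sampleUrl), firstMatch(re, trickyUrl));
});
```

On the URL from the question all three give the same result; they only differ on inputs like the second one.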
Just use this:
([\d]+)
You can check this link out: https://regex101.com/r/hR2eY7/1
If you use JavaScript:
/([\d]+)/g
Try this:
url = "http://example.com/things/stuff/532453?morethings&stuff=things"
number = url.match(/(\d+)\?/g)[0].slice(0,-1)
Though the approach is slightly naive, it works. It grabs numbers with ? at the end then removes the ? from the end using slice.

Make AngularJS routes named groups non-greedy

Given the following route:
$routeProvider.when('/users/:userId-:userEncodedName', {
...
})
When hitting the URL /users/42-johndoe, the $routeParams are initialized as expected:
$routeParams.userId // is 42
$routeParams.userEncodedName // is johndoe
But when hitting the URL /users/42-john-doe, the $routeParams are initialized as follows:
$routeParams.userId // is 42-john
$routeParams.userEncodedName // is doe
Is there any way to make the named groups non-greedy, i.e. to obtain the following $routeParams:
$routeParams.userId // is 42
$routeParams.userEncodedName // is john-doe
?
You can change the path
from
$routeProvider.when('/users/:userId-:userEncodedName', {});
to
$routeProvider.when('/users/:userId*-:userEncodedName', {})
As stated in the AngularJS documentation for $routeProvider, regarding the path property:
path can contain named groups starting with a colon and ending with a
star: e.g. :name*. All characters are eagerly stored in $routeParams
under the given name when the route matches.
Oddly enough Ryeballar's answer does indeed work (as is demonstrated in this short demo). I say "oddly enough", because based on the docs ("[...] characters are eagerly stored [...]"), I would expect it to work exactly the opposite way.
So, out of curiosity, I did some digging into the source code (v1.2.16) and it turns out that by a strange coincidence it indeed works. (Actually, this looks more like an inconsistency in the way route-paths are parsed).
The pathRegExp() function is responsible for converting the route path template into a regular expression, which is later used to match against the actual route paths.
The code that converts the route path template string into a RegExp pattern is the following:
path = path
.replace(/([().])/g, '\\$1')
.replace(/(\/)?:(\w+)([\?\*])?/g, function(_, slash, key, option){
var optional = option === '?' ? option : null;
var star = option === '*' ? option : null;
...
slash = slash || '';
return ''
+ (optional ? '' : slash)
+ '(?:'
+ (optional ? slash : '')
+ (star && '(.+?)' || '([^/]+)')
+ (optional || '')
+ ')'
+ (optional || '');
})
.replace(/([\/$\*])/g, '\\$1');
Based on the code above, the two route path templates (with and without *) end up in the following (totally different) regular expressions:
'/test/:param1-:param2' ==> '\/test\/(?:([^\/]+))-(?:([^\/]+))'
'/test/:param1*-:param2' ==> '\/test\/(?:(.+?))-(?:([^\/]+))'
So, what does each RegExp mean ?
/test/(?:([^/]+))-(?:([^/]+))
Let's break this up:
\/test\/: Match the string '/test/'.
(?:([^\/]+)) is equivalent to ([^\/]+): the outer (?:...) is a non-capturing group that only wraps the inner one, while the inner parentheses still capture the value for the param.
([^\/]+): Match any sequence of 1 or more characters that does not contain /. By default, the RegExp engine will try to match as many characters as possible, as long as the rest of the string can match the remaining pattern (-(?:([^\/]+))).
Since the minimum substring that matches -(?:([^\/]+)) is -doe, :param2 will be matched to doe and :param1 to 42-john.
/test/(?:(.+?))-(?:([^/]+))
Let's break this up:
\/test\/: Match the string '/test/'.
(?:(.+?)) is equivalent to (.+?): the outer (?:...) is a non-capturing group that only wraps the inner one, while the inner parentheses still capture the value for the param.
(.+?): Non-greedily match any sequence of 1 or more characters (any characters), as long as the rest of the string can match the remaining pattern (-(?:([^\/]+))). The key here is the ? following .+, which adds the non-greedy behaviour.
Since the minimum substring that matches (.+?) (and at the same time lets the rest of the string match -(?:([^\/]+))) is 42, :param1 will be matched to 42 and :param2 to john-doe.
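The difference can be reproduced with plain JavaScript regular expressions, outside of AngularJS. In this sketch the two patterns are the ones derived above, with /users/ in place of /test/; the ^ and $ anchors and the variable names are additions made here for illustration.

```javascript
// Pattern generated for '/users/:userId-:userEncodedName' (greedy first group).
const greedy = /^\/users\/(?:([^\/]+))-(?:([^\/]+))$/;
// Pattern generated for '/users/:userId*-:userEncodedName' (lazy first group).
const lazy = /^\/users\/(?:(.+?))-(?:([^\/]+))$/;

const path = '/users/42-john-doe';

// slice(1) drops the full match and keeps only the two captured params.
const greedyParams = path.match(greedy).slice(1);
const lazyParams = path.match(lazy).slice(1);

console.log(greedyParams); // userId, userEncodedName with the greedy group
console.log(lazyParams);   // userId, userEncodedName with the lazy group
```

The greedy group backtracks only as far as the last hyphen, while the lazy group stops at the first hyphen that still lets the rest of the pattern match.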
I hope this makes sense. Feel free to leave a comment if it doesn't :)
