Make AngularJS routes named groups non-greedy - javascript

Given the following route:
$routeProvider.when('/users/:userId-:userEncodedName', {
...
})
When hitting the URL /users/42-johndoe, the $routeParams are initialized as expected:
$routeParams.userId // is 42
$routeParams.userEncodedName // is johndoe
But when hitting the URL /users/42-john-doe, the $routeParams are initialized as follows:
$routeParams.userId // is 42-john
$routeParams.userEncodedName // is doe
Is there any way to make the named groups non-greedy, i.e. to obtain the following $routeParams:
$routeParams.userId // is 42
$routeParams.userEncodedName // is john-doe
?

You can change the path
from
$routeProvider.when('/users/:userId-:userEncodedName', {});
to
$routeProvider.when('/users/:userId*-:userEncodedName', {})
As stated in the AngularJS documentation for $routeProvider's path property:
path can contain named groups starting with a colon and ending with a
star: e.g.:name*. All characters are eagerly stored in $routeParams
under the given name when the route matches.

Oddly enough Ryeballar's answer does indeed work (as is demonstrated in this short demo). I say "oddly enough", because based on the docs ("[...] characters are eagerly stored [...]"), I would expect it to work exactly the opposite way.
So, out of curiosity, I did some digging into the source code (v1.2.16) and it turns out that by a strange coincidence it indeed works. (Actually, this looks more like an inconsistency in the way route-paths are parsed).
The pathRegExp() function is responsible for converting the route path template into a regular expression, which is later used to match against the actual route paths.
The code that converts the route path template string into a RegExp pattern is the following:
path = path
  .replace(/([().])/g, '\\$1')
  .replace(/(\/)?:(\w+)([\?\*])?/g, function(_, slash, key, option){
    var optional = option === '?' ? option : null;
    var star = option === '*' ? option : null;
    ...
    slash = slash || '';
    return ''
      + (optional ? '' : slash)
      + '(?:'
      + (optional ? slash : '')
      + (star && '(.+?)' || '([^/]+)')
      + (optional || '')
      + ')'
      + (optional || '');
  })
  .replace(/([\/$\*])/g, '\\$1');
Based on the code above, the two route path templates (with and without *) end up in the following (totally different) regular expressions:
'/test/:param1-:param2' ==> '\/test\/(?:([^\/]+))-(?:([^\/]+))'
'/test/:param1*-:param2' ==> '\/test\/(?:(.+?))-(?:([^\/]+))'
So, what does each RegExp mean ?
/test/(?:([^/]+))-(?:([^/]+))
Let's break this up:
\/test\/: Match the string '/test/'.
(?:([^\/]+)) matches the same text as ([^\/]+): the outer (?:...) is a non-capturing group that merely wraps the inner capturing group, so the parameter value is still captured by the inner parentheses.
([^\/]+): Match any sequence of 1 or more characters that does not contain /. By default, the RegExp engine will try to match as many characters as possible, as long as the rest of the string can match the remaining pattern (-(?:([^\/]+))).
Since the minimum substring that matches -(?:([^\/]+)) is -doe, :param2 will be matched to doe and :param1 to 42-john.
/test/(?:(.+?))-(?:([^/]+))
Let's break this up:
\/test\/: Match the string '/test/'.
(?:(.+?)) matches the same text as (.+?): the outer (?:...) is a non-capturing group that merely wraps the inner capturing group, so the parameter value is still captured by the inner parentheses.
(.+?): Non-greedily match any sequence of 1 or more characters (any characters), as long as the rest of the string can match the remaining pattern (-(?:([^\/]+))). The key here is the ? following .+ which adds the non-greedy behaviour.
Since the minimum substring that matches (.+?) (while at the same time letting the rest of the string match -(?:([^\/]+))) is 42, :param1 will be matched to 42 and :param2 to john-doe.
I hope this makes sense. Feel free to leave a comment if it doesn't :)
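As a quick check of the two expressions above, here is a minimal sketch that runs both against the URL from the question. It assumes the generated patterns are anchored with ^ and $ and uses /users/ instead of /test/; the variable names are made up for illustration.

```javascript
// The two generated patterns from above, anchored for a full-path match (assumption).
const greedyRoute = /^\/users\/(?:([^\/]+))-(?:([^\/]+))$/; // '/users/:userId-:userEncodedName'
const lazyRoute = /^\/users\/(?:(.+?))-(?:([^\/]+))$/;      // '/users/:userId*-:userEncodedName'

const routeUrl = '/users/42-john-doe';

// Index 0 is the whole match; the captured parameters start at index 1.
const [, g1, g2] = greedyRoute.exec(routeUrl);
const [, l1, l2] = lazyRoute.exec(routeUrl);

console.log(g1, g2); // 42-john doe
console.log(l1, l2); // 42 john-doe
```

The only difference between the two patterns is the first group: [^/]+ backtracks from the longest candidate, while .+? grows from the shortest one.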

Related

Tolerate certain characters in RegEx

I am writing a message formatting parser that has the capability (among others) to parse links. This specific case requires parsing a link in the form of <url|linkname> and replacing that text with just the linkname. The issue here is that both url and linkname may or may not contain \1 or \2 characters anywhere in any order (at most one of each though). I want to match the pattern but keep the "invalid" characters. This problem solves itself for linkname as that part of the pattern is just ([^\n]+), but the url fragment is matched by a much more complicated pattern, more specifically the URL validation pattern from is.js. It would not be trivial to modify the whole pattern manually to tolerate [\1\2] everywhere, and I need the pattern to preserve those characters as they are used for tracking purposes (so I can't simply just .replace(/\1|\2/g, "") before matching).
If this kind of matching is not possible, is there some automated way to reliably modify the RegExp to add [\1\2]{0,2} between every character match, add \1\2 to all [chars] matches, etc.
This is the url pattern taken from is.js:
/(?:(?:https?|ftp):\/\/)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/\S*)?/i
This pattern was adapted for my purposes and for the <url|linkname> format as follows:
let namedUrlRegex = /<((?:(?:https?|ftp):\/\/)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/\S*)?)\|([^\n]+)>/ig;
The code where this is used is here: JSFiddle
Examples for clarification (... represents the namedUrlRegex variable from above, and $2 is the capture group that captures linkname):
Current behavior:
"<googl\1e.com|Google>".replace(..., "$2") // "<googl\1e.com|Google>" WRONG
"<google.com|Goo\1gle>".replace(..., "$2") // "Goo\1gle" CORRECT
"<not_\1a_url|Google>".replace(..., "$2") // "<not_\1a_url|Google>" CORRECT
Expected behavior:
"<googl\1e.com|Google>".replace(..., "$2") // "Google" (note there is no \1)
"<google.com|Goo\1gle>".replace(..., "$2") // "Goo\1gle"
"<not_\1a_url|Google>".replace(..., "$2") // "<not_\1a_url|Google>"
Note the same rules for \1 apply to \2, \1\2, \1...\2, \2...\1 etc
Context: This is used to normalize a string from a WYSIWYG editor to the length/content that it will display as, preserving the location of the current selection (denoted by \1 and \2 so it can be restored after parsing). If the "caret" is removed completely (e.g. if the cursor was in the URL of a link), it will select the whole string instead. Everything works as expected, except for when the selection starts or ends in the url fragment.
Edit for clarification: I only want to change a segment in a string if it follows the format of <url|linkname> where url matches the URL pattern (tolerating \1, \2) and linkname consists of non-\n characters. If this condition is not met within a <...|...> string, it should be left unaltered as per the not_a_url example above.
I ended up making a RegEx that matches all "symbols" in the expression. One quirk of this is that it expects :, =, ! characters to be escaped, even outside of a (?:...), (?=...), (?!...) expression. This is addressed by escaping them before processing.
Fiddle
let r = /(\\.|\[.+?\]|\w|[^\\\/\[\]\^\$\(\)\?\*\+\{\}\|\+\:\=\!]|(\{.+?\}))(?:((?:\{.+?\}|\+|\*)\??)|\??)/g;
let url = /((?:(?:https?|ftp):\/\/)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/\S*)?)/
function tolerate(regex, insert) {
  let first = true;
  // convert to string
  return regex.toString().replace(/\/(.+)\//, "$1").
    // escape :=!
    replace(/((?:^|[^\\])\\(?:\\)*\(\?|[^?])([:=!]+)/g, (m, g1, g2) => g1 + (g2.split("").join("\\"))).
    // substitute string
    replace(r, function(m, g1, g2, g3, g4) {
      // g2 = {...} multiplier (to prevent matching digits as symbols)
      if (g2) return m;
      // g3 = multiplier after symbol (must wrap in parenthesis to preserve behavior)
      if (g3) return "(?:" + insert + g1 + ")" + g3;
      // prevent matching tolerated characters at beginning, remove to change this behavior
      if (first) {
        first = false;
        return m;
      }
      // insert the insert
      return insert + m;
    });
}
alert(tolerate(url, "\1?\2?"));
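A different technique from the regex rewriting above, sketched here only for comparison: match any <left|right> candidate with a loose pattern, strip the markers from a copy of the left part, and validate that copy against the unmodified URL pattern inside a replace callback. The names replaceNamedLinks and simpleUrl are hypothetical, and simpleUrl is a deliberately simplified stand-in for the is.js pattern, not a replacement for it.

```javascript
// Simplified stand-in for the is.js URL pattern (assumption, for illustration only).
const simpleUrl = /^(?:(?:https?|ftp):\/\/)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}(?::\d{2,5})?(?:\/\S*)?$/i;

// Hypothetical helper: replaces <url|linkname> with linkname when url is valid.
function replaceNamedLinks(text, urlPattern) {
  // Loose candidate match; the strict URL check happens in the callback.
  return text.replace(/<([^|<>\n]+)\|([^\n>]+)>/g, (whole, url, name) => {
    // Remove the \1 and \2 markers from a copy used only for validation.
    const cleaned = url.replace(/[\x01\x02]/g, '');
    // Valid URL: keep the link name (markers in it survive); otherwise leave the text untouched.
    return urlPattern.test(cleaned) ? name : whole;
  });
}

console.log(replaceNamedLinks('<googl\x01e.com|Google>', simpleUrl) === 'Google'); // true
console.log(replaceNamedLinks('<not_\x01a_url|Google>', simpleUrl) === '<not_\x01a_url|Google>'); // true
```

The URL pattern passed in must not carry the g flag, since test() on a global regex keeps state between calls.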

How to allow only certain words consecutively with Regex in javascript

I'm trying to write a regex that will return true if it matches the format below, otherwise, it should return false. It should only allow words as below:
Positive match (return true)
UA-1234-1,UA-12345-2,UA-34578-2
Negative match (return false or null)
Note: A is missing after U
UA-1234-1,U-12345-2
It should always give me true when the string passed to regex is
UA-1234-1,UA-12345-2,UA-34578-2,...........
Below is what I am trying to do but it is matching only the first element and not returning null.
var pattern=/^UA-[0-9]+(-[0-9]+)?/g;
pattern.match("UA-1234-1,UA-12345-2,UA-34578-2");
pattern.exec("UA-1234-1,UA-12345-2,UA-34578-2");
Thanks in advance. Help is greatly appreciated.
The pattern you need is a pattern enclosed with anchors (^ - start of string and $ - end of string) that matches your pattern at first (the initial "block") and then matches 0 or more occurrences of a , followed with the block pattern.
It looks like /^BLOCK(?:,BLOCK)*$/. You may allow optional whitespace in between, e.g. /^BLOCK(?:,\s*BLOCK)*$/.
In the end, the pattern looks like ^UA-[0-9]+(?:-[0-9]+)?(?:,UA-[0-9]+(?:-[0-9]+)?)*$. It is best to build it dynamically to keep it readable and easy to maintain:
const block = "UA-[0-9]+(?:-[0-9]+)?";
let rx = new RegExp(`^${block}(?:,${block})*$`); // RegExp("^" + block + "(?:," + block + ")*$") // for non-ES6
let tests = ['UA-1234-1,UA-12345-2,UA-34578-2', 'UA-1234-1,U-12345-2'];
for (var s of tests) {
  console.log(s, "=>", rx.test(s));
}
Alternatively, split the string by commas and test each element instead.
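A minimal sketch of the split-and-test idea, assuming the same block format as in the question. The names blockRx and allBlocksValid are made up for illustration, and the trim() call (which tolerates spaces around commas) is an added assumption.

```javascript
// One block, anchored on both ends so partial matches are rejected.
const blockRx = /^UA-[0-9]+(?:-[0-9]+)?$/;

// Hypothetical helper: true only if every comma-separated item is a valid block.
function allBlocksValid(list) {
  return list.split(',').every(item => blockRx.test(item.trim()));
}

console.log(allBlocksValid('UA-1234-1,UA-12345-2,UA-34578-2')); // true
console.log(allBlocksValid('UA-1234-1,U-12345-2'));             // false
```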

javascript regex insert new element into expression

I am passing a URL to a block of code in which I need to insert a new element into the regex. Pretty sure the regex is valid and the code seems right but no matter what I can't seem to execute the match for regex!
//** Incoming url's
//** url e.g. api/223344
//** api/11aa/page/2017
//** Need to match to the following
//** dir/api/12ab/page/1999
//** Hence the need to add dir at the front
var url = req.url;
//** pass in: /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var re = myregex.toString();
//** Insert dir into regex: /^dir\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var regVar = re.substr(0, 2) + 'dir' + re.substr(2);
var matchedData = url.match(regVar);
matchedData === null ? console.log('NO') : console.log('Yay');
I hope I am just missing the obvious but can anyone see why I can't match and always returns NO?
Thanks
Let's break down your regex
^\/api\/ this matches the beginning of a string, followed by exactly the string "/api/"
([a-zA-Z0-9-_~ %]+) this is a capturing group: it captures one or more (the +) of the characters listed inside the brackets, so for example, this section will match abAB25-_ %
(?:\/page\/([a-zA-Z0-9-_~ %]+)) this groups multiple tokens together as well, but does not create a capturing group like above (the ?: makes it non-capturing). You are first matching a string exactly like "/page/" followed by a capturing group exactly like the one in the paragraph above (that matches a-z, A-Z, 0-9, etc.).
?$ is at the end: the ? means match 0 or 1 of the preceding group (making it optional), and the $ matches the end of the string
This regex will match this string, for example: /api/abAB25-_ %/page/abAB25-_ %
You may be able to take advantage of capturing groups, however, and use something like this instead to get similar results: ^\/api\/([a-zA-Z0-9-_~ %]+)\/page\/\1?$. Here, we are using \1 to reference that first capturing group and match exactly the same tokens it is matching. EDIT: actually, this probably won't work, since the text after /api/ and the text after /page/ will most likely be different, carrying on...
Afterwards, you are adding "dir" to the beginning of your search, so you can now match something like this: dir/api/abAB25-_ %/page/abAB25-_ %
You have also now converted the regex to a string with toString(), which keeps the enclosing slashes, so as Crayon Violent pointed out in their comment, this will break your expected functionality. You can fix this by using .source on your regex, which returns the pattern without the enclosing slashes, and inserting dir into that: var regVar = myregex.source.replace('^', '^dir'); var matchedData = url.match(regVar); https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source
Now you can properly match a string like this: dir/api/11aa/page/2017 see this example: https://repl.it/Mj8h
As mentioned by Crayon Violent in the comments, it seems you're passing a String rather than a regular expression to the .match() function. Maybe try the following:
url.match(new RegExp(regVar, "i"));
to convert the string to a regular expression (regVar must then hold only the pattern text, without the enclosing slashes). The "i" is for ignore case; I don't know if that's what you want. Learn more here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
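Putting the two suggestions together, a minimal sketch might look like the following. It assumes the incoming pattern starts with ^ as in the question; apiRegex and dirRegex are illustrative names.

```javascript
// The pattern passed in, as shown in the question.
const apiRegex = /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/;

// .source is the pattern text without the enclosing slashes, so "dir" can be
// inserted right after the leading ^ before rebuilding a RegExp object.
const dirRegex = new RegExp(apiRegex.source.replace('^', '^dir'), apiRegex.flags);

const matched = 'dir/api/11aa/page/2017'.match(dirRegex);
console.log(matched === null ? 'NO' : 'Yay'); // Yay
console.log(matched[1], matched[2]);          // 11aa 2017
console.log(dirRegex.test('api/223344'));     // false (no leading "dir")
```

By contrast, toString() returns the text including the delimiting slashes, which then become literal characters of the new pattern and prevent it from matching.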

Match filename and file extension from single Regex

I'm sure this must be easy enough, but I'm struggling...
var regexFileName = /[^\\]*$/; // match filename
var regexFileExtension = /(\w+)$/; // match file extension
function displayUpload() {
  var path = $el.val(); //This is a file input
  var filename = path.match(regexFileName); // returns file name
  var extension = filename[0].match(regexFileExtension); // returns extension
  console.log("The filename is " + filename[0]);
  console.log("The extension is " + extension[0]);
}
The function above works fine, but I'm sure it must be possible to achieve with a single regex, by referencing different parts of the array returned with the .match() method. I've tried combining these regex but without success.
Also, I'm not using a string to test it on in the example, as console.log() escapes the backslashes in a filepath and it was starting to confuse me :)
Assuming that all files do have an extension, you could use
var regexAll = /[^\\]*\.(\w+)$/;
Then you can do
var total = path.match(regexAll);
var filename = total[0];
var extension = total[1];
/^.*\/(.*)\.(.*)$/g after this, the first group is your file name and the second group is the extension.
var myString = "filePath/long/path/myfile.even.with.dotes.TXT";
var myRegexp = /^.*\/(.*)\.(.*)$/g;
var match = myRegexp.exec(myString);
alert(match[1]); // myfile.even.with.dotes
alert(match[2]); // TXT
This works even if your filename contains more than one dot. Note that the pattern requires a dot, so a name without an extension will not be split correctly.
EDIT:
This is for Linux; for Windows use /^.*\\(.*)\.(.*)$/g (on Linux the directory separator is /, on Windows it is \).
You can use groups in your regular expression for this:
var regex = /^([^\\]*)\.(\w+)$/;
var matches = filename.match(regex);
if (matches) {
  var filename = matches[1];
  var extension = matches[2];
}
I know this is an old question, but here's another solution that can handle multiple dots in the name and also when there's no extension at all (or an extension of just '.'):
/^(.*?)(\.[^.]*)?$/
Taking it a piece at a time:
^
Anchor to the start of the string (to avoid partial matches)
(.*?)
Match any character ., 0 or more times *, lazily ? (don't just grab them all if the later optional extension can match), and put them in the first capture group ( ).
(\.
Start a 2nd capture group for the extension using (. This group starts with the literal . character (which we escape with \ so that . isn't interpreted as "match any character").
[^.]*
Define a character set []. Match characters not in the set by specifying this is an inverted character set ^. Match 0 or more non-. chars to get the rest of the file extension *. We specify it this way so that it doesn't match early on filenames like foo.bar.baz, incorrectly giving an extension with more than one dot in it of .bar.baz instead of just .baz.
. doesn't need to be escaped inside [], since it has no special meaning inside a character set (only a few characters such as ^, ], \ and - do).
)?
End the 2nd capture group ) and indicate that the whole group is optional ?, since it may not have an extension.
$
Anchor to the end of the string (again, to avoid partial matches)
If you're using ES6 you can even use destructuring to grab the results in 1 line:
const [, filename, extension] = /^(.*?)(\.[^.]*)?$/.exec('foo.bar.baz');
which gives the filename as 'foo.bar' and the extension as '.baz'.
'foo' gives 'foo' and undefined (the optional extension group does not take part in the match)
'foo.' gives 'foo' and '.'
'.js' gives '' and '.js'
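The cases above can be checked with a small wrapper. This is only a sketch: splitFilename is a made-up name, and the destructuring default is an added choice that turns a missing extension into '' instead of undefined.

```javascript
// Hypothetical helper around the pattern explained above.
function splitFilename(name) {
  const match = /^(.*?)(\.[^.]*)?$/.exec(name);
  if (match === null) return null; // e.g. the name contains a line break
  // A group that did not participate is undefined, so default it to ''.
  const [, base, ext = ''] = match;
  return { base, ext };
}

console.log(splitFilename('foo.bar.baz')); // { base: 'foo.bar', ext: '.baz' }
console.log(splitFilename('foo'));         // { base: 'foo', ext: '' }
console.log(splitFilename('foo.'));        // { base: 'foo', ext: '.' }
console.log(splitFilename('.js'));         // { base: '', ext: '.js' }
```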
This will recognize even /home/someUser/.aaa/.bb.c:
function splitPathFileExtension(path){
  var parsed = path.match(/^(.*\/)(.*)\.(.*)$/);
  return [parsed[1], parsed[2], parsed[3]];
}
I think this is a better approach, as it matches only valid directory names, file names and extensions, and also groups the path, filename and file extension. It also works with an empty path, i.e. only a filename.
^([\w\/]*?)([\w\.]*)\.(\w+)$
Test cases
the/p0090Aath/fav.min.icon.png
the/p0090Aath/fav.min.icon.html
the/p009_0Aath/fav.m45in.icon.css
fav.m45in.icon.css
favicon.ico
Output
[the/p0090Aath/][fav.min.icon][png]
[the/p0090Aath/][fav.min.icon][html]
[the/p009_0Aath/][fav.m45in.icon][css]
[][fav.m45in.icon][css]
[][favicon][ico]
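The listed output can be reproduced with a short sketch, assuming the last group is meant to be (\w+) so that multi-character extensions such as png match. The names pathRx and formatParts are illustrative.

```javascript
// Path, file name and extension in three groups; the final group uses \w+ (assumption).
const pathRx = /^([\w\/]*?)([\w\.]*)\.(\w+)$/;

// Hypothetical helper that prints the groups in the bracketed form used above.
function formatParts(path) {
  const m = pathRx.exec(path);
  return m ? `[${m[1]}][${m[2]}][${m[3]}]` : null;
}

console.log(formatParts('the/p0090Aath/fav.min.icon.png')); // [the/p0090Aath/][fav.min.icon][png]
console.log(formatParts('fav.m45in.icon.css'));             // [][fav.m45in.icon][css]
console.log(formatParts('favicon.ico'));                    // [][favicon][ico]
```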
(?!\w+).(\w+)(\s)
Find one or more word (s) \w+, negate (?! ) so that the word (s) are not shown on the result, specify the delimiter ., find the first word (\w+) and ignore the words that are after a possible blank space (\s)

Better RegEx to extract GoogleVideo ID from URL

HI!
I use the following regex with JS to extract the ID 6321890784249785097 from this URL
http://video.google.com/googleplayer.swf?docId=6321890784249785097
url.replace(/^[^\$]+.(.{19}).*/,"$1");
But this only cuts the last 19 chars from the tail. How can I make it more bullet-proof? Maybe with an explanation so that I learn something?
This should work a bit better:
/^.*docId=(\d+)$/
This matches all characters up to the 'docId=', then gives you all digits after that up to the end of the url.
video[.]google[.]com/googleplayer[.]swf[?]docId=(\d+)
The ID will be captured in reference #1. If you just want to match 19 digits you can change it to this:
video[.]google[.]com/googleplayer[.]swf[?]docId=(\d{19})
url.replace(/.*docId=(\d{19}).*/i,"$1");
this cuts 19 digits that follow docId=.
Here is the function I use in our app to read url parameters. So far it didn't let me down ;)
urlParam:function(name, w){
  w = w || window;
  var rx = new RegExp('[\&|\?]'+name+'=([^\&\#]+)'),
      val = w.location.href.match(rx);
  return !val ? '':val[1];
}
For the explanation of the regexp:
[\&|\?] take either the start of the query string '?' or the separation between parameters '&'
'name' will be the name of the parameter 'docId' in your case
([^\&#]+) take any characters that are not & and #. The hash key is often used in one page apps. And the parenthesis keep the reference of the content.
val will be an array or null/undefined and val[1] the value you are looking for
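For trying this outside a browser page, here is a standalone sketch of the same idea that takes the URL as a string instead of reading window.location. The name getParam is made up, and the sketch assumes the parameter name contains no regex metacharacters. The last line uses the built-in URL class as a cross-check; that is a different technique from the regex above.

```javascript
// Hypothetical standalone variant of urlParam: the URL is passed in as a string.
function getParam(url, name) {
  // '?' starts the query string, '&' separates parameters;
  // the value runs up to the next '&' or '#'.
  const rx = new RegExp('[&?]' + name + '=([^&#]+)');
  const val = url.match(rx);
  return val ? val[1] : '';
}

const videoUrl = 'http://video.google.com/googleplayer.swf?docId=6321890784249785097';

console.log(getParam(videoUrl, 'docId'));   // 6321890784249785097
console.log(getParam(videoUrl, 'missing')); // (empty string)

// Cross-check with the standard URL API.
console.log(new URL(videoUrl).searchParams.get('docId')); // 6321890784249785097
```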
