Regex for url resource part - javascript

Context
I have code that takes a URL path and replaces path params with '*'.
All my URLs follow the JSON API naming convention.
All valid URL resource parts follow these rules:
Member names SHOULD start and end with the characters “a-z” (U+0061
to U+007A)
Member names SHOULD contain only the characters “a-z”
(U+0061 to U+007A), “0-9” (U+0030 to U+0039), and the hyphen minus
(U+002D HYPHEN-MINUS, “-“) as separator between multiple words.
The path param is usually an id (number, uuid, guid, etc.).
Here are several examples of transformations:
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info -> /user/*/info
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f -> /user/*
/user/1 -> /user/*
What I have
/^[a-z][a-z0-9-]*[a-z]$/
The issue is that it doesn't handle a UUID as a path param.
Here is my function that parses the url (sorry don't have time to create a jsfiddle):
const escapeResourcePathParameters = resource => resource
.substr(resource.startsWith('/') ? 1 : 0)
.split('/')
.reduce((url, member) => {
const match = member.match(REGEX.JSONAPI_RESOURCE_MEMBER);
const part = match
? member
: '*';
return `${url}/${part}`;
}, '');
Questions
I need a regex that follows the rules above and works for the examples above.
UPD:
I've added my function that I use to parse urls. To test your regex, just replace it with REGEX.JSONAPI_RESOURCE_MEMBER and pass the url like
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info, it should return /user/*/info

I am guessing you are looking for a regex to capture the UUID.
This should work in JavaScript:
/[a-z][a-z0-9]+[-]+[a-z0-9-]+[a-z]/
I suppose a UUID should have at least two words, so at least one "-".
let a = "/user/e09e4f9f-cfcd-4a23-a88a/info"
const match = a.match(/[a-z][a-z0-9]+[-]+[a-z0-9-]+[a-z]/)
console.log(match[0])
So for your code, it should be something like
const escapeResourcePathParameters = resource => resource
.substr(resource.startsWith('/') ? 1 : 0)
.split('/')
.reduce((url, member) => {
// with REGEX.JSONAPI_RESOURCE_MEMBER = /[a-z][a-z0-9]*[-]+[a-z0-9-]+[a-z]/
return `${url}/${member.replace(REGEX.JSONAPI_RESOURCE_MEMBER, '*')}`;
}, '');

You could use look-arounds:
(?<=\/)[a-z][a-z0-9-]*[a-z](?=\/)
As noted in my comment, there is no need to use the anchors ^ or $. Also escape the slash as \/. The regex will match the pattern [a-z][a-z0-9-]*[a-z] only when surrounded by slashes.
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info # result: /*/*/info
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f # result: /*/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f
/user/1 # result: /*/1
To match UUIDs use:
(?<=\/)[a-z0-9-]{8}-(?:[a-z0-9-]{4}-){3}[a-z0-9-]{12}(?=\/)
The UUID format is described here: https://en.wikipedia.org/wiki/Universally_unique_identifier#Format
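One way to combine both ideas is to classify each path segment as a whole instead of searching inside the full URL. The following is a minimal sketch, not the asker's actual code: the names MEMBER, UUID and escapePathParams are hypothetical, and the UUID pattern assumes the canonical 8-4-4-4-12 hexadecimal layout. A UUID needs its own check because it can start and end with a letter and therefore also satisfies the member-name rule.

```javascript
// Hypothetical constants for illustration; not the asker's REGEX object.
// A member name starts and ends with a-z; a single letter is also accepted here.
const MEMBER = /^[a-z](?:[a-z0-9-]*[a-z])?$/;
// Canonical 8-4-4-4-12 hexadecimal UUID layout (assumption).
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

const escapePathParams = (resource) =>
  resource
    .split('/')
    .map((member) =>
      // Keep empty segments (leading slash) and real member names; mask everything else.
      member === '' || (MEMBER.test(member) && !UUID.test(member)) ? member : '*'
    )
    .join('/');

console.log(escapePathParams('/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info')); // /user/*/info
console.log(escapePathParams('/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f')); // /user/*
console.log(escapePathParams('/user/1')); // /user/*
```

Ids that happen to look like a valid member name (for example a short lowercase hex string such as deadbeef) would still be kept as-is, so this only works if the id formats are known in advance.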

Related

How to use regex to match an IPFS URL?

I have the following IPFS URL.
https://example.com:2053/ipfs/QmPQeMz2vzeLin5HcNYinVoSggPsaXh5QiKDBFtxMREgLf/images/0000000000000000000000000000000000000000000000000000000000000001.png
I want to use regex to match this file, but instead of writing the full URL, I want to just match something like https*00000001.png.
The problem is that when I use
paddedHex = '00000001';
let tmpSearchQuery = `https*${paddedHex}.png`;
It doesn't really match anything. Why?
You are repeating an s char zero or more times using s* and to create a dynamic regex you have to use the RegExp constructor.
You can repeat optional non whitespace chars instead using \S* and if you want to match http and https make the s optional using s?
const s = `https://ipfs.moralis.io:2053/ipfs/QmPQeMz2vzeLin5HcNYinVoSggPsaXh5QiKDBFtxMREgLf/images/0000000000000000000000000000000000000000000000000000000000000001.png`;
const paddedHex = '00000001';
const tmpSearchQuery = new RegExp(`https?\\S*${paddedHex}\\.png`);
const m = s.match(tmpSearchQuery);
if (m) {
console.log(m[0]);
}

How can I cut the string after a second underscore?

I'm receiving a list of files in an object and I just need to display a file name and its type in a table.
All files come back from a server in such format: timestamp_id_filename.
Example: 1568223848_12345678_some_document.pdf
I wrote a helper function which cuts the string.
At first, I did it with String.prototype.split() method, I used regex, but then again - there was a problem. Files can have underscores in their names so that didn't work, so I needed something else. I couldn't come up with a better idea. I think it looks really dumb and it's been haunting me the whole day.
The function looks like this:
const shortenString = (attachmentName) => {
const file = attachmentName
.slice(attachmentName.indexOf('_') + 1)
.slice(attachmentName.slice(attachmentName.indexOf('_') + 1).indexOf('_') + 1);
const fileName = file.slice(0, file.lastIndexOf('.'));
const fileType = file.slice(file.lastIndexOf('.'));
return [fileName, fileType];
};
I wonder if there is a more elegant way to solve the problem without using loops.
You can use replace and split: the pattern removes everything up to the second _ from the start of the string, and then we split on . to get the name and type.
let nameAndType = (str) => {
let replaced = str.replace(/^(?:[^_]*_){2}/g, '')
let splited = replaced.split('.')
let type = splited.pop()
let name = splited.join('.')
return {name,type}
}
console.log(nameAndType("1568223848_12345678_some_document.pdf"))
console.log(nameAndType("1568223848_12345678_some_document.xyz.pdf"))
function splitString(val){
// drop the timestamp and id, then restore underscores inside the file name
return val.split('_').slice(2).join('_');
}
const getShortString = (str) => str.replace(/^(?:[^_]*_){2}/g, '')
For input like
1568223848_12345678_some_document.pdf, it should give you something like some_document.pdf
const re = /(.*?)_(.*?)_(.*)/;
const name = "1568223848_12345678_some_document.pdf";
const [, date, id, filename] = re.exec(name);
console.log(date);
console.log(id);
console.log(filename);
some notes:
you want to make the regular expression 1 time. If you do this
function getParts(str) {
const re = /expression/;
...
}
Then you're making a new regular expression object every time you call getParts.
.*? is faster than .*
This is because .* is greedy, so the moment the regular expression engine sees it, it puts the entire rest of the string into that slot and then checks if it can continue the expression. If it fails, it backs off one character. If that fails, it backs off another character, and so on. .*? on the other hand is satisfied as soon as possible. So it adds one character and then sees if the next part of the expression works; if not, it adds one more character and sees if the expression works, and so on.
splitting on '_' works but it could potentially make many temporary strings
for example if the filename is 1234_1343_a________________________.pdf
you'd have to test to see if using a regular expression is faster or slower than splitting, assuming speed matters.
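Beyond speed, greedy and lazy quantifiers also capture different parts when the file name itself contains underscores. The sketch below illustrates this with both variants created once at module scope, as the first note suggests. The variable names are made up for the example.

```javascript
// Both expressions are created once and reused.
const lazy = /(.*?)_(.*?)_(.*)/;
const greedy = /(.*)_(.*)_(.*)/;

const fileName = '1568223848_12345678_some_document.pdf';

// slice(1) drops the full match and keeps only the captured groups.
const lazyParts = lazy.exec(fileName).slice(1);
const greedyParts = greedy.exec(fileName).slice(1);

console.log(lazyParts);   // [ '1568223848', '12345678', 'some_document.pdf' ]
console.log(greedyParts); // [ '1568223848_12345678', 'some', 'document.pdf' ]
```

The greedy version splits on the last two underscores rather than the first two, so for this input format the lazy form is needed for correctness and not only for performance.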
You can kinda chain .indexOf to get second offset and any further, although more than two would look ugly. The reason is that indexOf takes start index as second argument, so passing index of the first occurrence will help you find the second one:
var secondUnderscoreIndex = name.indexOf("_",name.indexOf("_")+1);
So my solution would be:
var index = name.indexOf("_",name.indexOf("_")+1);
var [timestamp, name] = [name.substring(0, index), name.substr(index+1)];
Alternatively, using regular expression:
var [,number1, number2, filename, extension] = /([0-9]+)_([0-9]+)_(.*?)\.([0-9a-z]+)/i.exec(name)
// Prints: "1568223848 12345678 some_document pdf"
console.log(number1, number2, filename, extension);
I like simplicity...
If you ever need the timestamp and the id, they're in [1] and [2].
var getFilename = function(str) {
return str.match(/(\d+)_(\d+)_(.*)/)[3];
}
var f = getFilename("1568223848_12345678_some_document.pdf");
console.log(f)
If file names always come in the format timestamp_id_filename, you can use a regular expression that skips the first two '_' and captures the rest.
test:
var filename = '1568223848_12345678_some_document.pdf';
console.log(filename.match(/[^_]+_[^_]+_(.*)/)[1]); // result: 'some_document.pdf'
Explanation:
/[^_]+_[^_]+_(.*)/
[^_]+ : take characters other than '_'
_ : take the '_' character
Repeat so that two '_' are skipped
(.*) : save the remaining characters in a group
match method: returns an array; its first element is the full match, and the following elements are the captured groups.
Split the file name string into an array on underscores.
Discard the first two elements of the array.
Join the rest of the array with underscores.
Now you have your file name.
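The steps above can be sketched as follows, extended to return the name and the extension separately like the asker's original helper. The function name splitAttachmentName is hypothetical, and the sketch assumes the input always has the form timestamp_id_filename.

```javascript
// Hypothetical helper; assumes input of the form timestamp_id_filename.
const splitAttachmentName = (attachmentName) => {
  // Split on underscores, discard timestamp and id, re-join the rest.
  const file = attachmentName.split('_').slice(2).join('_');
  const dot = file.lastIndexOf('.');
  // No dot, or only a leading dot: treat the whole string as the name.
  if (dot <= 0) return [file, ''];
  return [file.slice(0, dot), file.slice(dot)];
};

console.log(splitAttachmentName('1568223848_12345678_some_document.pdf'));
// [ 'some_document', '.pdf' ]
```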

Regex to match specific path with specific query param

I'm struggling with creating regex to match URL path with query param that could be in any place.
For example URLs could be:
/page?foo=bar&target=1&test=1 <- should match
/page?target=1&test=2 <- should match
/page/nested?foo=bar&target=1&test=1 <- should NOT match
/page/nested?target=1&test=2 <- should NOT match
/another-page?foo=bar&target=1&test=1 <- should NOT match
/another-page?target=1&test=2 <- should NOT match
where I need to target param target specifically on /page
This regex works only to find the param \A?target=[^&]+&*.
Thanks!
UPDATE:
It is needed for a third-party tool that will decide on which page to run an experiment. It only accepts setup on their dashboard with a regular expression, so I cannot use code tools like a URL parser.
General rule is that if you want to parse params, use URL parser, not a custom regex.
In this case you can use for instance:
// http://a.b/ is just added to make URL parsing work
const url = new URL("http://a.b/page?foo=bar&target=1&test=1")
url.searchParams.get("target")
// => '1'
url.pathname
// => '/page'
And then check those values in ifs:
const url = new URL("http://a.b/page?foo=bar&target=1&test=1")
if (url.searchParams.get("target") && url.pathname === '/page') {
// ...
}
See also:
https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams
https://developer.mozilla.org/en-US/docs/Web/API/URL
EDIT
If you have to use regex try this one:
\/page(?=\?).*[?&]target=[^&\s]*(&|$)
Explanation:
\/page(?=\?) - matches path (starts with / then page then lookahead for ?)
.*[?&]target=[^&\s]*($|&) matches param name target:
located anywhere (preceded by anything .*)
[?&] preceded with ? or &
followed by its value (=[^&\s]*)
ending with end of params ($) or another param (&)
If you're looking for a regex then you may use:
/\/page\?(?:.*&)?target=[^&]*/i
RegEx Details:
\/page\?: Match text /page?:
(?:.*&)?: Match optional text of any length followed by &
target=[^&]*: Match text target= followed by 0 or more characters that are not &
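A quick way to check such a pattern is to run it against the six example URLs from the question. The sketch below does that with a variant of the expression above. The leading ^ anchor is an addition made here (an assumption that the tool matches against the path starting at its first character), so that a path such as /foo/page?target=1 would not match.

```javascript
// Variant of the answer's pattern; the ^ anchor is an assumption added for this sketch.
const TARGET_ON_PAGE = /^\/page\?(?:.*&)?target=[^&]*/;

// [url, shouldMatch] pairs taken from the question.
const cases = [
  ['/page?foo=bar&target=1&test=1', true],
  ['/page?target=1&test=2', true],
  ['/page/nested?foo=bar&target=1&test=1', false],
  ['/page/nested?target=1&test=2', false],
  ['/another-page?foo=bar&target=1&test=1', false],
  ['/another-page?target=1&test=2', false],
];

// Each result is true when the pattern behaves as the question expects.
const results = cases.map(([url, expected]) => TARGET_ON_PAGE.test(url) === expected);
console.log(results.every(Boolean)); // true
```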

Regex matching for nock tests failing

I am trying to match URLs.
lab.before(async () => {
nock('https://dev.azure.com')
.get(centosAzureUri)
.times(5)
.reply(201, [
...
If I use a string, it is working just fine. An example is below:
const centosAzureUri = `/${conf.org}/${conf.buildProject}/_apis/build/builds?api-version=4.1&branchName=${conf.buildBranch}`
However, I want to use a RegEx as below:
const centosAzureUri = new RegExp(`/${conf.org}/${conf.buildProject}/_apis/build/builds?api-version=4.1.*`, 'g')
That is not working.
According to the documentation, nock should accept regular expressions and .* should match any symbol [because of the .] and allow those matched characters to be repeated any number of times. Hence, I am assuming this should accept any string ending, including &branchName=${conf.buildBranch}.
What I am doing wrong?
I think nock only uses regex literal vs. regex object which will return a new object. eg.
nock('http://example.com')
.get(/harry\/[^\/]+$/)
.query({param1: 'value'})
.reply(200, "OK");
See related
How to build nock regex for dynamic urls
Please note that RegExp only needs the pattern up to "4.1" to perform a match. The rest of the string will be ignored if the match occurs. For example:
const centosAzureUri = new RegExp(`/${conf.org}/${conf.buildProject}/_apis/build/builds?api-version=4.1`, 'g')
Further, you may want to try escapements, since slashes require those:
const centosAzureUri = new RegExp(`\/${conf.org}\/${conf.buildProject}\/_apis\/build\/builds?api-version=4.1`, 'g')
HTH!
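Independently of nock, the pattern in the question contains regex metacharacters: in builds?api-version the ? makes the preceding s optional instead of matching a literal question mark, and the dots in 4.1 match any character. The following sketch shows the effect on a plain string and one way to escape the literal part. The escapeRegExp helper and the conf values are hypothetical, and whether nock then matches also depends on how it handles the query string, which this sketch does not cover.

```javascript
// Hypothetical helper: escapes characters that are special in a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical configuration values for illustration only.
const conf = { org: 'my-org', buildProject: 'my-project', buildBranch: 'master' };

const requestPath =
  `/${conf.org}/${conf.buildProject}/_apis/build/builds?api-version=4.1&branchName=${conf.buildBranch}`;

// Pattern as written in the question: "?" is a quantifier here, not a literal.
const unescaped = new RegExp(`/${conf.org}/${conf.buildProject}/_apis/build/builds?api-version=4.1.*`);

// Escape the fixed prefix, then allow any suffix.
const prefix = `/${conf.org}/${conf.buildProject}/_apis/build/builds?api-version=4.1`;
const escaped = new RegExp(`^${escapeRegExp(prefix)}.*`);

console.log(unescaped.test(requestPath)); // false
console.log(escaped.test(requestPath));   // true
```

The 'g' flag is left out on purpose: a global RegExp keeps state in lastIndex between test calls, which can produce alternating results when the same object is reused.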

Make AngularJS routes named groups non-greedy

Given the following route:
$routeProvider.when('/users/:userId-:userEncodedName', {
...
})
When hitting the URL /users/42-johndoe, the $routeParams are initialized as expected:
$routeParams.userId // is 42
$routeParams.userEncodedName // is johndoe
But when hitting the URL /users/42-john-doe, the $routeParams are initialized as follows:
$routeParams.userId // is 42-john
$routeParams.userEncodedName // is doe
Is there any way to make the named groups non-greedy, i.e. to obtain the following $routeParams:
$routeParams.userId // is 42
$routeParams.userEncodedName // is john-doe
?
You can change the path
from
$routeProvider.when('/users/:userId-:userEncodedName', {});
to
$routeProvider.when('/users/:userId*-:userEncodedName', {})
As stated in the AngularJS Documentation regarding $routeProviders, path property:
path can contain named groups starting with a colon and ending with a
star: e.g.:name*. All characters are eagerly stored in $routeParams
under the given name when the route matches.
Oddly enough Ryeballar's answer does indeed work (as is demonstrated in this short demo). I say "oddly enough", because based on the docs ("[...] characters are eagerly stored [...]"), I would expect it to work exactly the opposite way.
So, out of curiosity, I did some digging into the source code (v1.2.16) and it turns out that by a strange coincidence it indeed works. (Actually, this looks more like an inconsistency in the way route-paths are parsed).
The pathRegExp() function is responsible for converting the route path template into a regular expression, which is later used to match against the actual route paths.
The code that converts the route path template string into a RegExp pattern is the following:
path = path
.replace(/([().])/g, '\\$1')
.replace(/(\/)?:(\w+)([\?\*])?/g, function(_, slash, key, option){
var optional = option === '?' ? option : null;
var star = option === '*' ? option : null;
...
slash = slash || '';
return ''
+ (optional ? '' : slash)
+ '(?:'
+ (optional ? slash : '')
+ (star && '(.+?)' || '([^/]+)')
+ (optional || '')
+ ')'
+ (optional || '');
})
.replace(/([\/$\*])/g, '\\$1');
Based on the code above, the two route path templates (with and without *) end up in the following (totally different) regular expressions:
'/test/:param1-:param2' ==> '\/test\/(?:([^\/]+))-(?:([^\/]+))'
'/test/:param1*-:param2' ==> '\/test\/(?:(.+?))-(?:([^\/]+))'
So, what does each RegExp mean ?
/test/(?:([^/]+))-(?:([^/]+))
Let's break this up:
\/test\/: Match the string '/test/'.
(?:([^\/]+)) matches the same text as ([^\/]+); the outer (?: ... ) is a non-capturing wrapper, so only the inner group's match is stored.
([^\/]+): Match any sequence of 1 or more characters that does not contain /. By default, the RegExp engine will try to match as many characters as possible, as long as the rest of the string can match the remaining pattern (-(?:([^\/]+))).
Since the minimum substring that matches -(?:([^\/]+)) is -doe, :param2 will be matched to doe and :param1 to 42-john.
/test/(?:(.+?))-(?:([^/]+))
Let's break this up:
\/test\/: Match the string '/test/'.
(?:(.+?)) matches the same text as (.+?); the outer (?: ... ) is a non-capturing wrapper, so only the inner group's match is stored.
(.+?): Non-greedily match any sequence of 1 or more characters (any characters), as long as the rest of the string can match the remaining pattern (-(?:([^\/]+))). The key here is the ? following .+ which adds the non-greedy behaviour.
Since the minimum substring that matches (.+?) (and at the same time lets the rest of the string match -(?:([^\/]+))) is 42, :param1 will be matched to 42 and :param2 to john-doe.
I hope this makes sense. Feel free to leave a comment if it doesn't :)
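The difference between the two generated patterns can be reproduced directly. This is a small sketch that uses the two expressions from above; the ^ and $ anchors are added here as an assumption so that each pattern has to cover the whole path.

```javascript
// Patterns derived from the two route templates; ^ and $ are added for this sketch.
const withoutStar = /^\/test\/(?:([^\/]+))-(?:([^\/]+))$/; // '/test/:param1-:param2'
const withStar = /^\/test\/(?:(.+?))-(?:([^\/]+))$/;       // '/test/:param1*-:param2'

const path = '/test/42-john-doe';

// slice(1) keeps only the captured groups (param1, param2).
const greedyParams = withoutStar.exec(path).slice(1);
const lazyParams = withStar.exec(path).slice(1);

console.log(greedyParams); // [ '42-john', 'doe' ]
console.log(lazyParams);   // [ '42', 'john-doe' ]
```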
