regex to capture just filename (no url path, no extension) - javascript

In JavaScript, I can use this regex ([^\/]+)(\.[^\.\/]+)$ to capture just the filename in a URL. It works well in the following cases:
http://a.com/b/file.name.ext
http://a.com/b/file.name.ext#hash
http://a.com/b/file.name.ext?query
However it fails to match if there is no extension:
No match
http://a.com/b/filename
http://a.com/b/filename#hash
http://a.com/b/filename?query
This is normal. The second capturing group expects there to be a .ext chunk at the end.
If I make the second capturing group optional...
`([^\/]+)(\.[^\.\/]+)?$`
... then the first capturing group becomes greedy, and includes the .ext ending, which I don't want. How is the regex engine thinking about the optional second group? How can I make the existence of an extension optional?
NOTE: This regex is not intended for use with URLs with the following structure:
http://a.com/b/filename?query=a.b
http://a.com/b/filename.ext?query=a.b
In my case, dots will never appear later in the URL.

If you want pure regex (= nice and clean regular language expression from theoretical computer science, plus capturing groups), then you can do it with alternative groups:
([^\/.]+)$|([^\/]+)(\.[^\/.]+)$
and treat groups 1 and 2 as the same thing: whichever of them matched holds the filename. Group 3 is the optional extension.
Another possibility:
([^\/.]+)(([^\/]*)(\.[^\/.]+))?$
Here you'd use group 4 as the extension, and the concatenation of groups 1 and 3 as the filename. Group 2 is only used to make the compound of 3 and 4 optional.
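A minimal sketch of how the two patterns above could be applied in JavaScript. The helper names splitWithAlternation and splitWithNestedGroups are made up for illustration; both return the filename and the extension (an empty string when there is none), or null when nothing matches.

```javascript
// Pattern 1: alternation. Either group 1 matched (no extension),
// or groups 2 and 3 matched (name plus extension).
const ALTERNATION = /([^\/.]+)$|([^\/]+)(\.[^\/.]+)$/;

// Pattern 2: nested optional group. Group 1 + group 3 form the name,
// group 4 is the extension; group 2 only makes 3 and 4 optional together.
const NESTED = /([^\/.]+)(([^\/]*)(\.[^\/.]+))?$/;

// Hypothetical helper built on pattern 1.
function splitWithAlternation(url) {
  const m = ALTERNATION.exec(url);
  if (!m) return null;
  return { name: m[1] !== undefined ? m[1] : m[2], ext: m[3] || "" };
}

// Hypothetical helper built on pattern 2.
function splitWithNestedGroups(url) {
  const m = NESTED.exec(url);
  if (!m) return null;
  return { name: m[1] + (m[3] || ""), ext: m[4] || "" };
}

console.log(splitWithAlternation("http://a.com/b/file.name.ext"));
// name: "file.name", ext: ".ext"
console.log(splitWithNestedGroups("http://a.com/b/filename"));
// name: "filename", ext: ""
```

Groups that do not take part in a match come back as undefined in JavaScript, which is why both helpers fall back to an empty string.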

Tested with:
http://a.com/b/file.name.ext
http://a.com/b/filename
http://a.com/b/filename#hash
http://a.com/b/filename?query
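var file = "http://a.com/b/filename#hash";
function getFileName(url) {
  // Keep everything after the last slash.
  var index = url.lastIndexOf("/") + 1;
  var filenameWithExtension = url.substr(index);
  // Drop a trailing #hash or ?query.
  filenameWithExtension = filenameWithExtension.replace(/(#|\?).*$/, "");
  // Remove only the last dot-separated chunk, so "file.name.ext" gives "file.name".
  var parts = filenameWithExtension.split(".");
  if (parts.length > 1) parts.pop();
  return parts.join(".");
}
alert(getFileName(file));
//filename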
References:
lastindexof
split
substr
replace

Related

Getting element from filename using continous split or regex

I currently have the following string :
AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv
But I would like to split it to only get the following result (removing all tree directories + removing timestamp before the file):
1564416946615-file-test.dsv
I currently have the following code, but it's not working when the filename itself contains a '-' like in the example.
getFilename(str){
return(str.split('\\').pop().split('/').pop().split('-')[1]);
}
I don't want to use a loop for performance reasons (I may have lots of files to work with...). So is there another solution (maybe a regex?)
We can try doing a regex replacement with the following pattern:
.*\/\d+-\b
Replacing the match with empty string should leave you with the result you want.
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var output = filename.replace(/.*\/\d+-\b/, "");
console.log(output);
The pattern works by using .*\/ to first consume everything up to and including the final path separator. Then, \d+- consumes the timestamp as well as the dash that follows, leaving only the portion you want.
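As a rough sketch, the same replacement can be wrapped in a small helper (the name stripPathAndTimestamp is made up here) to see what happens when the file name does not start with digits and a dash:

```javascript
// Hypothetical helper: drops the directories and the leading timestamp.
function stripPathAndTimestamp(path) {
  return path.replace(/.*\/\d+-\b/, "");
}

console.log(stripPathAndTimestamp("AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv"));
// → 1564416946615-file-test.dsv

// No digits-and-dash prefix after the last slash: nothing matches,
// so the input comes back unchanged, directories included.
console.log(stripPathAndTimestamp("AAAAA/BBBBB/file-test.dsv"));
// → AAAAA/BBBBB/file-test.dsv
```

So the pattern relies on every file name starting with a timestamp; if that is not guaranteed, the result needs an extra check.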
You may use this regex and get captured group #1:
/[^\/-]+-(.+)$/
RegEx Details:
[^\/-]+: Match 1+ of any characters that are not / and not -
-: Match literal -
(.+): Match 1+ of any characters
$: End
Code:
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var m = filename.match(/[^\/-]+-(.+)$/);
console.log(m[1]);
//=> 1564416946615-file-test.dsv

Extracting a complicated part of the string with plain Javascript

I have a following string:
Text
I want to extract from this string, with the use of JavaScript 'pl' or 'pl_company_com'
There are a few variables:
jan_kowalski is a name and surname it can change, and sometimes even have 3 elements
the country code (in this example 'pl') will change to other en / de / fr (this is that part of the string i want to get)
the rest of the string remains the same for every case (beginning + everything after starting with _company_com ...
P.S. I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.
An alternative to Randy Casburn's solution using regex
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or if you want to just get that string with those country codes you specified
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept that this solution also works for other combinations
let urls = [
new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))
I have been very successful before with this kind of problem using regular expressions:
var string = 'https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx';
var regExp = /([\w]{2})_company_com/;
var find = string.match(regExp);
console.log(find); // array with found matches
console.log(find[1]); // first group of regexp = country code
First you have your given string. Second you have a regular expression, which is delimited by a slash at the beginning and at the end. Regular expressions are mostly used for string searches (you can even replace complicated text with them in all major editors, which can be VERY useful).
In this case it matches exactly two word characters, [\w]{2}, followed directly by _company_com (\w indicates a word character, the [] groups all wanted character types, here only word characters, and the {} indicates the number of characters to be matched). To find the wanted part, call string.match(regExp). It returns an array with the whole matched string followed by all capture groups within the regExp (which are denoted by ()), or null if nothing matches. So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
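Since match returns null when nothing is found, a slightly more defensive version may be useful. This is only a sketch: the function name extractCountryCode is invented, and the URLs are the ones used in the other answers.

```javascript
// Hypothetical helper: returns the two-character code in front of
// "_company_com", or null when the pattern is not found.
function extractCountryCode(url) {
  const found = url.match(/(\w{2})_company_com/);
  return found ? found[1] : null;
}

const urls = [
  "https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx",
  "https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx",
  "https://my.domain.com/personal/no_code_here/Documents/Forms/All.aspx",
];

urls.forEach((url) => console.log(extractCountryCode(url)));
// → pl, pl, null
```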

javascript regex insert new element into expression

I am passing a URL to a block of code in which I need to insert a new element into the regex. Pretty sure the regex is valid and the code seems right but no matter what I can't seem to execute the match for regex!
//** Incoming url's
//** url e.g. api/223344
//** api/11aa/page/2017
//** Need to match to the following
//** dir/api/12ab/page/1999
//** Hence the need to add dir at the front
var url = req.url;
//** pass in: /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var re = myregex.toString();
//** Insert dir into regex: /^dir\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var regVar = re.substr(0, 2) + 'dir' + re.substr(2);
var matchedData = url.match(regVar);
matchedData === null ? console.log('NO') : console.log('Yay');
I hope I am just missing the obvious but can anyone see why I can't match and always returns NO?
Thanks
Let's break down your regex
^\/api\/ anchors at the beginning of the string and matches exactly the string "/api/".
([a-zA-Z0-9-_~ %]+) is a capturing group. It captures one or more (the +) of the characters listed inside the brackets; for example, this section will match abAB25-_ %
(?:\/page\/([a-zA-Z0-9-_~ %]+)) groups multiple tokens together as well, but does not create a capturing group like the one above (the ?: makes it non-capturing). It first matches the exact string "/page/", followed by a capturing group like the one described in the previous paragraph (a-z, A-Z, 0-9, etc.).
?$ is at the end: the ? makes the preceding group optional (zero or one occurrence), and the $ matches the end of the string.
This regex will match this string, for example: /api/abAB25-_ %/page/abAB25-_ %
You might be tempted to use a backreference and write something like this instead: ^\/api\/([a-zA-Z0-9-_~ %]+)\/page\/\1?$. Here, \1 refers back to the first capturing group. EDIT: actually, this probably won't work, since a backreference matches the exact text the group captured, and the text after /api/ and the text after /page/ will most likely be different. Carrying on...
Afterwards, you are adding "dir" to the beginning of the pattern, so it can now match something like this: dir/api/abAB25-_ %/page/abAB25-_ %
You have also converted the regex to a string with toString(), which keeps the enclosing slashes. As Crayon Violent pointed out in their comment, this breaks the expected functionality: .match() turns the string back into a regex in which those slashes are literal characters. You can fix this by reading the pattern from .source on your regex (myregex.source has no enclosing slashes), inserting dir after the ^, and passing the result to new RegExp(): var matchedData = url.match(new RegExp('^dir' + myregex.source.substr(1))); https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source
Now you can properly match a string like this: dir/api/11aa/page/2017 see this example: https://repl.it/Mj8h
As mentioned by Crayon Violent in the comments, it seems you're passing a string rather than a regular expression to the .match() function. Since regVar still contains the enclosing slashes from toString(), strip them before converting, for example:
url.match(new RegExp(regVar.slice(1, -1), "i"));
This converts the string to a regular expression. The "i" flag ignores case; I don't know whether that's what you want. Learn more here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
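Putting the pieces from both answers together, a minimal sketch could look like this. The helper name prefixPattern is made up, and it assumes the prefix is already valid regex text (special characters in it would need escaping).

```javascript
const original = /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/;

// Hypothetical helper: builds a new RegExp with `prefix` inserted right
// after a leading ^ (or at the very start when there is no ^).
function prefixPattern(regex, prefix) {
  // regex.source is the pattern text without enclosing slashes or flags.
  const source = regex.source;
  const anchored = source.startsWith("^");
  const body = anchored ? source.slice(1) : source;
  return new RegExp((anchored ? "^" : "") + prefix + body, regex.flags);
}

const withDir = prefixPattern(original, "dir");

console.log(withDir.test("dir/api/12ab/page/1999")); // → true
console.log(withDir.test("api/11aa/page/2017"));     // → false

const m = "dir/api/12ab/page/1999".match(withDir);
console.log(m[1], m[2]); // → 12ab 1999
```

Working from .source and .flags avoids the string returned by toString(), whose enclosing slashes would otherwise end up inside the new pattern.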

How would I write a Regular Expression to capture the value between Last Slash and Query String?

Problem:
Extract image file name from CDN address similar to the following:
https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba
Two-stage Solution:
I am using two regular expressions to retrieve the file name:
var postLastSlashRegEx = /[^\/]+$/,
preQueryRegEx = /^([^?]+)/;
var fileFromURL = urlString.match(postLastSlashRegEx)[0].match(preQueryRegEx)[0];
// fileFromURL = "photo%2FB%_2.jpeg"
Question:
Is there a way I can combine both regular expressions?
I've tried using capture groups, but haven't been able to produce a working solution.
From my comment
You can use a lookahead to find the "?" and use [^/] to match any non-slash characters.
/[^/]+(?=\?)/
To remove the dependency on the URL needing a "?", you can make the lookahead match either a question mark or the end of the string (represented by $), but make sure the quantifier is non-greedy so the match stops at the first "?".
/[^/]+?(?=\?|$)/
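A quick sketch of that second pattern in use, with and without a query string. The helper name fileFromUrl is invented for the example, and the second URL is just a placeholder.

```javascript
// Lazily match non-slash characters until a "?" or the end of the string.
const FILE_SEGMENT = /[^/]+?(?=\?|$)/;

// Hypothetical helper: returns the last path segment, or null if none.
function fileFromUrl(urlString) {
  const m = urlString.match(FILE_SEGMENT);
  return m ? m[0] : null;
}

console.log(fileFromUrl(
  "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba"
));
// → photo%2FB%_2.jpeg

console.log(fileFromUrl("https://example.com/images/logo.png"));
// → logo.png
```

Earlier segments such as "cdnstorage.api.com" are skipped because they are followed by a slash, so the lookahead can never succeed for them.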
You don't have to use regex, you can just use split and substr.
var str = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba".split("?")[0];
var fileName = str.substr(str.lastIndexOf('/')+1);
but if regex is important to you, then:
str.match(/[^?]*\/([^?]+)/)[1]
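The code using the substring method would look like the following (it assumes the URL contains a "?"; without one, lastIndexOf('?') returns -1 and the result is wrong) -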
var fileFromURL = urlString.substring(urlString.lastIndexOf('/') + 1, urlString.lastIndexOf('?'))
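If the query string is optional, a guarded variant without regex could look like the sketch below. The function name fileNameFromUrl is made up, and it uses indexOf and slice rather than the substring call above.

```javascript
// Hypothetical helper: cut the query string off first (if any),
// then keep everything after the last slash of what remains.
function fileNameFromUrl(urlString) {
  const queryStart = urlString.indexOf("?");
  const path = queryStart === -1 ? urlString : urlString.slice(0, queryStart);
  return path.slice(path.lastIndexOf("/") + 1);
}

console.log(fileNameFromUrl(
  "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba"
));
// → photo%2FB%_2.jpeg
```

Cutting at the first "?" before looking for the last slash also keeps a slash inside the query string from being mistaken for a path separator.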

Match filename and file extension from single Regex

I'm sure this must be easy enough, but I'm struggling...
var regexFileName = /[^\\]*$/; // match filename
var regexFileExtension = /(\w+)$/; // match file extension
function displayUpload() {
var path = $el.val(); //This is a file input
var filename = path.match(regexFileName); // returns file name
var extension = filename[0].match(regexFileExtension); // returns extension
console.log("The filename is " + filename[0]);
console.log("The extension is " + extension[0]);
}
The function above works fine, but I'm sure it must be possible to achieve with a single regex, by referencing different parts of the array returned with the .match() method. I've tried combining these regex but without success.
Also, I'm not using a string to test it on in the example, as console.log() escapes the backslashes in a filepath and it was starting to confuse me :)
Assuming that all files do have an extension, you could use
var regexAll = /[^\\]*\.(\w+)$/;
Then you can do
var total = path.match(regexAll);
var filename = total[0];
var extension = total[1];
/^.*\/(.*)\.(.*)$/ With this regex, the first group is your file name and the second group is the extension.
var myString = "filePath/long/path/myfile.even.with.dotes.TXT";
var myRegexp = /^.*\/(.*)\.(.*)$/;
var match = myRegexp.exec(myString);
alert(match[1]); // myfile.even.with.dotes
alert(match[2]); // TXT
This works even if your file name contains more than one dot. A name with no dot at all (no extension) does not match, so exec returns null; making the dot optional with \.? does not help, because the greedy first group then swallows the extension.
EDIT:
This is for Linux; for Windows use /^.*\\(.*)\.(.*)$/ (the directory separator is / on Linux and \ on Windows).
You can use groups in your regular expression for this:
var regex = /^([^\\]*)\.(\w+)$/;
var matches = filename.match(regex);
if (matches) {
var filename = matches[1];
var extension = matches[2];
}
I know this is an old question, but here's another solution that can handle multiple dots in the name and also when there's no extension at all (or an extension of just '.'):
/^(.*?)(\.[^.]*)?$/
Taking it a piece at a time:
^
Anchor to the start of the string (to avoid partial matches)
(.*?)
Match any character ., 0 or more times *, lazily ? (don't just grab them all if the later optional extension can match), and put them in the first capture group ( ).
(\.
Start a 2nd capture group for the extension using (. This group starts with the literal . character (which we escape with \ so that . isn't interpreted as "match any character").
[^.]*
Define a character set []. Match characters not in the set by specifying this is an inverted character set ^. Match 0 or more non-. chars to get the rest of the file extension *. We specify it this way so that it doesn't match early on filenames like foo.bar.baz, incorrectly giving an extension with more than one dot in it of .bar.baz instead of just .baz.
. doesn't need to be escaped inside [], since most characters (the exceptions are ^, -, ] and \) are literals in a character set.
)?
End the 2nd capture group ) and indicate that the whole group is optional ?, since it may not have an extension.
$
Anchor to the end of the string (again, to avoid partial matches)
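If you're using ES6 you can even use destructuring to grab the results in 1 line:
const [, filename, extension = ''] = /^(.*?)(\.[^.]*)?$/.exec('foo.bar.baz');
which gives the filename as 'foo.bar' and the extension as '.baz'. The = '' default covers the case where the optional group does not take part in the match and comes back as undefined.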
'foo' gives 'foo' and ''
'foo.' gives 'foo' and '.'
'.js' gives '' and '.js'
This will recognize even /home/someUser/.aaa/.bb.c:
function splitPathFileExtension(path){
var parsed = path.match(/^(.*\/)(.*)\.(.*)$/);
return [parsed[1], parsed[2], parsed[3]];
}
I think this is a better approach, as it matches only valid directory names, file names and extensions, and it groups the path, file name and file extension. It also works with an empty path, i.e. when there is only a file name.
^([\w\/]*?)([\w\.]*)\.(\w+)$
Test cases
the/p0090Aath/fav.min.icon.png
the/p0090Aath/fav.min.icon.html
the/p009_0Aath/fav.m45in.icon.css
fav.m45in.icon.css
favicon.ico
Output
[the/p0090Aath/][fav.min.icon][png]
[the/p0090Aath/][fav.min.icon][html]
[the/p009_0Aath/][fav.m45in.icon][css]
[][fav.m45in.icon][css]
[][favicon][ico]
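The output above can be reproduced with a short sketch like the following. The helper name splitPath is invented, and the extension group is written as (\w+) so that multi-character extensions such as png match.

```javascript
const PATH_PARTS = /^([\w\/]*?)([\w\.]*)\.(\w+)$/;

// Hypothetical helper: returns the three groups, or null when the input
// contains characters the pattern does not allow (for example "-" or a space).
function splitPath(path) {
  const m = PATH_PARTS.exec(path);
  return m ? { dir: m[1], name: m[2], ext: m[3] } : null;
}

[
  "the/p0090Aath/fav.min.icon.png",
  "the/p009_0Aath/fav.m45in.icon.css",
  "favicon.ico",
].forEach((p) => {
  const r = splitPath(p);
  console.log("[" + r.dir + "][" + r.name + "][" + r.ext + "]");
});
// → [the/p0090Aath/][fav.min.icon][png]
// → [the/p009_0Aath/][fav.m45in.icon][css]
// → [][favicon][ico]
```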
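(?!\w+).(\w+)(\s)
The negative lookahead (?!\w+) asserts that the next character is not a word character, so the word before the delimiter is not part of the result. The . then matches the delimiter (unescaped, it matches any single character; write \. to match only a literal dot), (\w+) captures the first word after it, and (\s) requires that word to be followed by a whitespace character, so anything after the blank is ignored.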
