Match filename and file extension from single Regex - javascript

I'm sure this must be easy enough, but I'm struggling...
var regexFileName = /[^\\]*$/; // match filename
var regexFileExtension = /(\w+)$/; // match file extension
function displayUpload() {
var path = $el.val(); //This is a file input
var filename = path.match(regexFileName); // returns file name
var extension = filename[0].match(regexFileExtension); // returns extension
console.log("The filename is " + filename[0]);
console.log("The extension is " + extension[0]);
}
The function above works fine, but I'm sure it must be possible to achieve with a single regex, by referencing different parts of the array returned with the .match() method. I've tried combining these regex but without success.
Also, I'm not using a string to test it on in the example, as console.log() escapes the backslashes in a filepath and it was starting to confuse me :)

Assuming that all files do have an extension, you could use
var regexAll = /[^\\]*\.(\w+)$/;
Then you can do
var total = path.match(regexAll);
var filename = total[0];
var extension = total[1];

Use /^.*\/(.*)\.(.*)$/; the first group is then your file name and the second group is the extension.
var myString = "filePath/long/path/myfile.even.with.dotes.TXT";
var myRegexp = /^.*\/(.*)\.(.*)$/g;
var match = myRegexp.exec(myString);
alert(match[1]); // myfile.even.with.dotes
alert(match[2]); // TXT
This works even if your filename contains more than one dot. It does require at least one dot, so a filename with no extension will not match.
EDIT:
This is for Linux; for Windows use /^.*\\(.*)\.(.*)$/ (the directory separator is / on Linux and \ on Windows).
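A minimal sketch that accepts either separator in one pattern, assuming the directory part is optional. The helper name splitFileName and the exact character classes are my own choices for illustration, not part of the answer above.

```javascript
// Hypothetical helper: accepts / or \ as separator; the directory part is optional.
function splitFileName(path) {
  // (?:.*[\\\/])?  optional directory part, up to the last separator
  // ([^\\\/]*)     file name without separators
  // \.([^\\\/.]*)  last dot followed by the extension
  const match = /^(?:.*[\\\/])?([^\\\/]*)\.([^\\\/.]*)$/.exec(path);
  // null when the last path segment contains no dot
  return match ? { name: match[1], ext: match[2] } : null;
}

console.log(splitFileName('filePath/long/path/myfile.even.with.dotes.TXT'));
// { name: 'myfile.even.with.dotes', ext: 'TXT' }
console.log(splitFileName('C:\\fakepath\\photo.jpg'));
// { name: 'photo', ext: 'jpg' }
console.log(splitFileName('a.b/file'));
// null
```

Excluding separators from both groups keeps a dot in a directory name from being mistaken for the extension dot.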

You can use groups in your regular expression for this:
var regex = /^([^\\]*)\.(\w+)$/;
var matches = filename.match(regex);
if (matches) {
var filename = matches[1];
var extension = matches[2];
}

I know this is an old question, but here's another solution that can handle multiple dots in the name and also when there's no extension at all (or an extension of just '.'):
/^(.*?)(\.[^.]*)?$/
Taking it a piece at a time:
^
Anchor to the start of the string (to avoid partial matches)
(.*?)
Match any character ., 0 or more times *, lazily ? (don't just grab them all if the later optional extension can match), and put them in the first capture group ( ).
(\.
Start a 2nd capture group for the extension using (. This group starts with the literal . character (which we escape with \ so that . isn't interpreted as "match any character").
[^.]*
Define a character set []. Match characters not in the set by specifying this is an inverted character set ^. Match 0 or more non-. chars to get the rest of the file extension *. We specify it this way so that it doesn't match early on filenames like foo.bar.baz, incorrectly giving an extension with more than one dot in it of .bar.baz instead of just .baz.
. doesn't need to be escaped inside [], since most characters (the exceptions being ^, -, ] and \) are literals in a character set.
)?
End the 2nd capture group ) and indicate that the whole group is optional ?, since it may not have an extension.
$
Anchor to the end of the string (again, to avoid partial matches)
If you're using ES6 you can even use destructuring to grab the results in 1 line:
const [, filename, extension] = /^(.*?)(\.[^.]*)?$/.exec('foo.bar.baz');
which gives the filename as 'foo.bar' and the extension as '.baz'.
'foo' gives 'foo' and undefined (the optional group does not take part in the match)
'foo.' gives 'foo' and '.'
'.js' gives '' and '.js'
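A minimal sketch wrapping the pattern above in a helper; the name splitName is hypothetical. In JavaScript an optional group that does not take part in the match comes back as undefined, so the sketch defaults the extension to an empty string.

```javascript
// Hypothetical helper built on the regex explained above.
function splitName(name) {
  // The pattern matches any string without line breaks, so exec()
  // only returns null when the name contains a line terminator.
  const match = /^(.*?)(\.[^.]*)?$/.exec(name);
  if (!match) return null;
  // An unmatched optional group is undefined; default it to ''.
  const [, base, ext = ''] = match;
  return { base, ext };
}

console.log(splitName('foo.bar.baz')); // { base: 'foo.bar', ext: '.baz' }
console.log(splitName('foo'));         // { base: 'foo', ext: '' }
console.log(splitName('foo.'));        // { base: 'foo', ext: '.' }
console.log(splitName('.js'));         // { base: '', ext: '.js' }
```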

This will recognize even /home/someUser/.aaa/.bb.c:
function splitPathFileExtension(path){
var parsed = path.match(/^(.*\/)(.*)\.(.*)$/);
return [parsed[1], parsed[2], parsed[3]];
}

I think this is a better approach, as it matches only valid directory names, file names and extensions, and it groups the path, file name and extension separately. It also works when there is no path, only a file name.
^([\w\/]*?)([\w\.]*)\.(\w+)$
Test cases
the/p0090Aath/fav.min.icon.png
the/p0090Aath/fav.min.icon.html
the/p009_0Aath/fav.m45in.icon.css
fav.m45in.icon.css
favicon.ico
Output
[the/p0090Aath/][fav.min.icon][png]
[the/p0090Aath/][fav.min.icon][html]
[the/p009_0Aath/][fav.m45in.icon][css]
[][fav.m45in.icon][css]
[][favicon][ico]
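A small sketch that runs the test cases above, assuming the extension group is written (\w+) so that extensions longer than one character can match. The variable names are only for illustration.

```javascript
// Assumes the extension group is (\w+), i.e. one or more word characters.
const pathRegex = /^([\w\/]*?)([\w\.]*)\.(\w+)$/;

const samples = [
  'the/p0090Aath/fav.min.icon.png',
  'the/p0090Aath/fav.min.icon.html',
  'the/p009_0Aath/fav.m45in.icon.css',
  'fav.m45in.icon.css',
  'favicon.ico',
];

for (const sample of samples) {
  const match = pathRegex.exec(sample);
  if (match) {
    // Skip match[0] (the whole match) and print the three groups.
    const [, dir, name, ext] = match;
    console.log(`[${dir}][${name}][${ext}]`);
  } else {
    console.log(`no match: ${sample}`);
  }
}
```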

(?!\w+).(\w+)(\s)
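(?!\w+) is a negative lookahead: it asserts that the next character is not a word character, without including anything in the result. The . then matches the delimiter (the dot is unescaped, so it matches any single character), (\w+) captures the first word after it, and (\s) requires whitespace after that word, so anything following a blank space is ignored.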

Related

RegEx for detecting a string and a path in one go

Here is an example of the regex I need.
I have many of these lines in a file
build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2
I need to detect the string between CXX_COMPILER__ and _Debug, which here is testfoo2.
At the same time, I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp, which comes always after the first match.
I could not figure out a regex for this. So far I have .*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+ and I am using it in typescript like so:
const fileAndTargetRegExp = new RegExp('.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+', 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
//do something
}
But I get no matches. Is there an easy way to do this?
Will it always have the || <stuff here> at the end? If so, this regex based on the one you provided should work:
/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/g
As the regex101 breakdown shows, the first capturing group should contain the string between CXX_COMPILER__ and _Debug, while the second should contain the path, using the space and pipes to detect where the latter ends.
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const matches = line.match(/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/).slice(1); //slice(1) just to not include the first complete match returned by match!
for (let match of matches) {
console.log(match);
}
If the pipes won't always be there, then this version should work instead (regex101):
.*CXX_COMPILER__(\w+)_.+?((?:\/(?:\w|\.|-)+)+).*
But it requires you to add all of the valid path characters individually every time you realize a new one might be there, and you'll need to make sure the paths don't have spaces because adding space to the regex would make it detect the stuff after the path too.
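To process a whole file rather than a single line, the first pattern above (the one relying on the trailing ||) can be combined with the g flag and matchAll. This is only a sketch: the second input line is made up for illustration, and the names fileContents, lineRegex and results are mine.

```javascript
// Two sample lines; the second one is hypothetical test data.
const fileContents = [
  'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2',
  'build test/testbar/CMakeFiles/testbar1.dir/testbar1.cpp.o: CXX_COMPILER__testbar1_Debug /home/juxeii/projects/gtest-cmake-example/test/testbar/testbar1.cpp || cmake_object_order_depends_target_testbar1',
].join('\n');

// "." never matches a newline, so each match stays within one line.
const lineRegex = /.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/g;

const results = [];
for (const match of fileContents.matchAll(lineRegex)) {
  results.push({ target: match[1], path: match[2] });
}
console.log(results);
```

matchAll requires the g flag and returns an iterator, which avoids the manual exec loop and its lastIndex bookkeeping.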
Looks good, but you need delimiters. Add "/" before and after your Regex - no quotation marks.
let fileContents = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const fileAndTargetRegExp = new RegExp(/.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+/, 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
console.log(match);
}
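A short sketch of why the string form in the question finds nothing: inside a string literal the backslash is consumed by the string before the RegExp constructor sees it, so '\w' arrives as a plain w. The simplified pattern here is only for illustration.

```javascript
// In a string literal, '\w' is just 'w'; the backslash is lost.
const fromString = new RegExp('CXX_COMPILER__(\w+)_Debug');
// Doubling the backslash keeps it for the regex engine.
const fromEscapedString = new RegExp('CXX_COMPILER__(\\w+)_Debug');
// A regex literal needs no doubling.
const fromLiteral = /CXX_COMPILER__(\w+)_Debug/;

console.log(fromString.source);        // CXX_COMPILER__(w+)_Debug
console.log(fromEscapedString.source); // CXX_COMPILER__(\w+)_Debug

const sample = 'CXX_COMPILER__testfoo2_Debug';
console.log(fromString.test(sample));        // false
console.log(fromEscapedString.test(sample)); // true
console.log(fromLiteral.test(sample));       // true
```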
Here's my way of doing it with replace:
I need to detect the string between CXX_COMPILER__ and _Debug, which is here testfoo2.
Try to replace all characters of the string with just the first captured group $1 which is between CXX_COMPILER__ and _Debug:
/.*CXX_COMPILER__(\w+)_Debug.*/
^^^^<--testfoo2
I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp
The same again, but this time replace everything with the second captured group, which is whatever comes after the first captured group, up to the ||:
/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)\s*(?=\|\|).*/
^^^<-- /home/.../testfoo2.cpp
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2'
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug.*/gm,'$1'))
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)\s*(?=\|\|).*/gm,'$2'))

Getting element from filename using continuous split or regex

I currently have the following string :
AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv
But I would like to split it to only get the following result (removing all tree directories + removing timestamp before the file):
1564416946615-file-test.dsv
I currently have the following code, but it's not working when the filename itself contains a '-' like in the example.
getFilename(str){
return(str.split('\\').pop().split('/').pop().split('-')[1]);
}
I don't want to use a loop, for performance reasons (I may have lots of files to work with...). So is there another solution (maybe regex?)
We can try doing a regex replacement with the following pattern:
.*\/\d+-\b
Replacing the match with empty string should leave you with the result you want.
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var output = filename.replace(/.*\/\d+-\b/, "");
console.log(output);
The pattern works by using .*\/ to first consume everything up to, and including, the final path separator. Then, \d+- consumes the timestamp as well as the dash that follows, leaving only the portion you want.
You may use this regex and get captured group #1:
/[^\/-]+-(.+)$/
RegEx Demo
RegEx Details:
[^\/-]+: Match 1+ of any character that is not / and not -
-: Match literal -
(.+): Match 1+ of any characters
$: End
Code:
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var m = filename.match(/[^\/-]+-(.+)$/);
console.log(m[1]);
//=> 1564416946615-file-test.dsv
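For comparison, here is a sketch that swaps the regex for plain index lookups; this is a different technique from the answers above, not a restatement of them. It assumes the timestamp is everything before the first '-' in the last path segment, and it also copes with directory names that contain a '-'.

```javascript
// Sketch without regex or explicit loops; assumes "<timestamp>-<rest>" naming.
function getFilename(str) {
  // Cut after the last / or \ (whichever comes later).
  const lastSep = Math.max(str.lastIndexOf('/'), str.lastIndexOf('\\'));
  const base = str.slice(lastSep + 1);
  // Drop everything up to and including the first dash, if any.
  const dash = base.indexOf('-');
  return dash === -1 ? base : base.slice(dash + 1);
}

console.log(getFilename('AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv'));
// 1564416946615-file-test.dsv
console.log(getFilename('my-dir/123-a-b.txt'));
// a-b.txt
```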

regex to capture just filename (no url path, no extension)

In JavaScript, I can use this regex ([^\/]+)(\.[^\.\/]+)$ to capture just the filename in a URL. It works well in the following cases:
http://a.com/b/file.name.ext
http://a.com/b/file.name.ext#hash
http://a.com/b/file.name.ext?query
However it fails to match if there is no extension:
No match
http://a.com/b/filename
http://a.com/b/filename#hash
http://a.com/b/filename?query
This is normal. The second capturing group expects there to be a .ext chunk at the end.
If I make the second capturing group optional...
`([^\/]+)(\.[^\.\/]+)?$`
... then the first capturing group becomes greedy, and includes the .ext ending, which I don't want. How is the regex engine thinking about the optional second group? How can I make the existence of an extension optional?
NOTE: This regex is not intended for use with URLs with the following structure:
http://a.com/b/filename?query=a.b
http://a.com/b/filename.ext?query=a.b
In my case, dots will never appear later in the URL.
If you want pure regex (= nice and clean regular language expression from theoretical computer science, plus capturing groups), then you can do it with alternative groups:
([^\/.]+)$|([^\/]+)(\.[^\/.]+)$
and identify groups 1 and 2. Group 3 is the optional extension.
Another possibility:
([^\/.]+)(([^\/]*)(\.[^\/.]+))?$
Here you'd use group 4 as the extension, and the concatenation of groups 1 and 3 as the filename. Group 2 is only used to make the compound of 3 and 4 optional.
Tested with:
http://a.com/b/file.name.ext
http://a.com/b/filename
http://a.com/b/filename#hash
http://a.com/b/filename?query
var file = "http://a.com/b/filename#hash";
function getFileName(url) {
// Everything after the last slash
var index = url.lastIndexOf("/") + 1;
var filenameWithExtension = url.substr(index);
// Keep only the text before the first dot, so "file.name.ext" yields "file"
var filename = filenameWithExtension.split(".")[0];
// Strip a trailing #hash or ?query
filename = filename.replace(/(#|\?).*?$/, "");
return filename;
}
alert(getFileName(file));
//filename
References: lastIndexOf, split, substr, replace
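A sketch of another non-regex route, assuming the input is always an absolute URL: the built-in URL class separates the path from the query and hash, after which only the last dot needs to be found. The helper name getBaseName is hypothetical.

```javascript
// Hypothetical helper; assumes an absolute URL (new URL throws otherwise).
function getBaseName(url) {
  // pathname excludes the ?query and #hash parts
  const pathname = new URL(url).pathname;
  const last = pathname.slice(pathname.lastIndexOf('/') + 1);
  const dot = last.lastIndexOf('.');
  // dot > 0 keeps names that start with a dot whole
  return dot > 0 ? last.slice(0, dot) : last;
}

console.log(getBaseName('http://a.com/b/file.name.ext#hash')); // file.name
console.log(getBaseName('http://a.com/b/filename?query'));     // filename
console.log(getBaseName('http://a.com/b/filename.ext?query=a.b')); // filename
```

Because the query string is removed before the dot search, this version also handles dots that appear later in the URL.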

File path validation in javascript

I am trying to validate XML file path in javascript. My REGEX is:
var isValid = /^([a-zA-Z]:)?(\\{2}|\/)?([a-zA-Z0-9\\s_#-^!#$%&+={}\[\]]+(\\{2}|\/)?)+(\.xml+)?$/.test(str);
It returns true even when path is wrong.
These are valid paths
D:/test.xml
D:\\folder\\test.xml
D:/folder/test.xml
D:\\folder/test.xml
D:\\test.xml
At first the obvious errors:
+ is a quantifier meaning "one or more".
So (\.xml+) matches .xm followed by one or more l (it would also match .xmlllll). The ? means optional, so (\.xml+)? says the path may end in .xml, but it is not required.
The same goes for ([a-zA-Z]:)?, which makes the drive letter optional.
Now the not so obvious errors
[a-zA-Z0-9\\s_#-^!#$%&+={}\[\]] defines the list of allowed characters. You have \\s, and I assume you want to allow whitespace, but this allows \ and s instead, so you need to change it to \s. Then there is #-^: I assume you want to allow #, - and ^, but - has a special meaning inside [ ], where it defines a range, so you allow every character in the range from # to ^. If you want to allow a literal -, escape it and write #\-^. You also need to take care with ^: placed right after the [, it would have a special meaning too (it negates the set).
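The two character-class pitfalls described above can be checked directly. A small sketch, with variable names chosen only for illustration:

```javascript
// '#' is 0x23 and '^' is 0x5E, so [#-^] is a range that covers
// digits, uppercase letters and much punctuation.
const unescapedRange = /^[#-^]$/;
// With the dash escaped, only '#', '-' and '^' are allowed.
const escapedDash = /^[#\-^]$/;

console.log(unescapedRange.test('A')); // true
console.log(unescapedRange.test(':')); // true
console.log(escapedDash.test('A'));    // false
console.log(escapedDash.test('-'));    // true

// In a regex literal, [\\s] means "a backslash or the letter s".
console.log(/^[\\s]$/.test(' ')); // false
console.log(/^[\\s]$/.test('s')); // true
console.log(/^[\s]$/.test(' '));  // true
```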
your Regex should contain the following parts:
^[a-z]: start with (^) drive letter
((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+ followed by one or more path parts that start with either \ or / and having a path name containing one or more of your defined letters (a-z0-9\s_#\-^!#$%&+={}\[\])
\.xml$ ends with ($) the .xml
therefore your final regex should look like this
/^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(str)
(under the assumption you do a case-insensitive match using the i flag)
EDIT:
var path1 = "D:/test.xml"; // D:/test.xml
var path2 = "D:\\folder\\test.xml"; // D:\folder\test.xml
var path3 = "D:/folder/test.xml"; // D:/folder/test.xml
var path4 = "D:\\folder/test.xml"; // D:\folder/test.xml
var path5 = "D:\\test.xml"; // D:\test.xml
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path1) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path2) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path3) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path4) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path5) );
UPDATE:
Take care with / and \: whether you need to escape them depends on whether you build the regex from a string, with new RegExp(' ... the regex ... ',"i") or new RegExp(" ... the regex ... ","i"), or write it as a literal, / ... the regex ... /i.
For further information about regular expressions, take a look at e.g. www.regular-expressions.info
This could work out for you
var str = 'D:/test.xml';
var str2 = 'D:\\folder\\test.xml';
var str3 = 'D:/folder/test.xml';
var str4 = 'D:\\folder/test.xml';
var str5 = 'D:\\test\\test\\test\\test.xml';
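var regex = new RegExp('^[a-z]:((\\\\|\/)[a-zA-Z0-9_ \-]+)+\\.xml$', 'i');
regex.test(str5);
The reason for having \\\\ in the RegExp string to match a single \ in the path is that JavaScript string literals use \ to escape special characters, e.g. \n for a new line, so a literal \ has to be written as \\, once for the string and once more for the regex. For the same reason the dot before xml is written \\. (a plain \. in a string literal collapses to ., which matches any character). It also allows you to have different rules for file name and folder name.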
Update
[a-zA-Z0-9_\-]+ is the part of the regexp that actually matches a file/folder name. So to allow more characters in file/folder names, just add them to this class, e.g., to allow a * in file/folder names make it [a-zA-Z0-9_\-\*]+
Update 2
To add to the answer, the following RegExp adds another check to the validation: it rejects paths that mix / and \\.
var str6 = 'D:/This is folder/test # file.xml';
var str7 = 'D:/This is invalid\\path.xml'
var regex2 = new RegExp('^[a-z]:(\/|\\\\)([a-zA-Z0-9_ \-]+\\1)*[a-zA-Z0-9_ #\-]+\.xml?', 'gi');
regex2 will match all paths but str7
Update
My apologies for mistyping a ? instead of $ in regex2. Below is the corrected and intended version
var regex2 = new RegExp('^[a-z]:(\/|\\\\)([a-zA-Z0-9_ \-]+\\1)*[a-zA-Z0-9_ #\-]+\\.xml$', 'i');
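The same check can be sketched as a regex literal, which avoids the double escaping that the string form needs. The character classes are taken from regex2 above; the name consistentSeparators is my own.

```javascript
// \1 refers back to whichever separator followed the drive letter,
// so every later separator must be the same character.
const consistentSeparators =
  /^[a-z]:(\/|\\)([a-zA-Z0-9_ \-]+\1)*[a-zA-Z0-9_ #\-]+\.xml$/i;

console.log(consistentSeparators.test('D:/This is folder/test # file.xml')); // true
console.log(consistentSeparators.test('D:\\folder\\test.xml'));              // true
console.log(consistentSeparators.test('D:/This is invalid\\path.xml'));      // false
console.log(consistentSeparators.test('D:\\folder/test.xml'));               // false
console.log(consistentSeparators.test('D:/testaxml'));                       // false
```

The last case shows that the dot is matched literally: a name that merely ends in "xml" without a preceding dot is rejected.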
Tested using Scratchpad.
var regex = /^[a-z]:((\/|(\\?))[\w .]+)+\.xml$/i;
Prints true in Web Console: (Ctrl+Shift+K on Firefox)
console.log(regex.test("D:/test.xml"));
console.log(regex.test("D:\\folder\\test.xml"));
console.log(regex.test("D:/folder/test.xml"));
console.log(regex.test("D:\\folder/test.xml"));
console.log(regex.test("D:\\test.xml"));
console.log(regex.test("D:\\te st_1.3.xml")); // spaces, dots allowed
Or, using Alert boxes:
alert(regex.test("D:/test.xml"));
alert(regex.test("D:\\folder\\test.xml"));
alert(regex.test("D:/folder/test.xml"));
alert(regex.test("D:\\folder/test.xml"));
alert(regex.test("D:\\test.xml"));
alert(regex.test("D:\\te st_1.3.xml"));
Invalid file paths:
alert(regex.test("AD:/test.xml")); // invalid drive letter
alert(regex.test("D:\\\\\\folder\\test.xml")); // three backslashes
alert(regex.test("/folder/test.xml")); // drive letter missing
alert(regex.test("D:\\folder/test.xmlfile")); // invalid extension

Put the filename and the filetype in an array

I want to put the filename and the filetype in an array. I know the answer is split, but I don't know how to find the last dot, the one just before the extension begins.
Examples: Funny - SMS 02.jpg should give Funny - SMS 02 in one element and jpg in another. But when I try to split the name of a file that already contains dots, the trouble begins. Funny - When you see it....jpg prints Funny - When you see it in, for example, fname[0] and jpg in fname[1].
How can I make it print Funny - When you see it... as fname[0] and jpg as fname[1]?
Thanks in advance.
function getFnameExt(filename) {
var parts = filename.split('.'), ext = parts.pop(), fname = parts.join('.');
return [ fname, ext ];
}
console.log( getFnameExt("Funny - When you see it....jpg") );
var array = [];
var s = "Funny - When you see it....jpg";
var lastDot = s.lastIndexOf(".");
array[0] = s.substring(0, lastDot);
array[1] = s.substring(lastDot + 1);
alert(array[0] + "---" + array[1]);
For these kinds of tasks splitting is usually cumbersome. Regular expressions are more powerful:
var matches = /^(.*)\.([^.]*)$/g.exec("Funny - When you see it....jpg");
matches.shift();
// matches:
// ["Funny - When you see it...", "jpg"]
This matches the string against the regexp, which results in an array with three elements. The first is the full match which is not needed, so shift it.
^ begin of string
.* any amount of any character
\. a dot
[^.]* any amount of any character except a dot
$ end of string
With the begin/end of string anchors, .* must contain all characters before the last dot.
( and ) denote a group, which adds the matched substring to the array.
fname = "Funny - When you see it....jpg"
parts = fname.split(/\.(?=[^.]*$)/)
// parts=["Funny - When you see it...", "jpg"]
?= is called 'lookahead' and basically means "followed by". So, the above reads: "split by a dot if there are no dots after it".
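A sketch wrapping the lookahead split in a helper that always returns two elements, assuming a name without any dot should get an empty extension. The name splitLastDot is hypothetical.

```javascript
// Hypothetical helper: split on the last dot only.
function splitLastDot(name) {
  // Only the final dot is followed by non-dot characters up to the end,
  // so split() yields at most two parts.
  const parts = name.split(/\.(?=[^.]*$)/);
  // With no dot at all, split() returns a one-element array.
  return parts.length === 2 ? parts : [name, ''];
}

console.log(splitLastDot('Funny - When you see it....jpg'));
// [ 'Funny - When you see it...', 'jpg' ]
console.log(splitLastDot('Funny - SMS 02.jpg'));
// [ 'Funny - SMS 02', 'jpg' ]
console.log(splitLastDot('README'));
// [ 'README', '' ]
```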
