Regular expressions ignoring a file extension - javascript

I need some help with a regular expression. I have the following 4 file names
heapdump.20160406.214053.18914.0013.phd
heapdump.20160406.214053.18914.0013.phd.gz
javacore.20160406.214053.18914.0002.txt
javacore.20160406.214053.18914.0002.txt.gz
Basically, I need my regular expression to ignore the files that end in .gz. I tried the following, but none of them work:
/heapdump.*.phd|javacore.*.txt/i
/heapdump*.phd|javacore*.txt/i
/heapdump.\d+.\d+.\d+.\d+.phd|javacore.\d+.\d+.\d+.\d+.txt/i
Thanks

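This will work:
(?!.*\.gz$)(^.*$)
The negative lookahead (?!.*\.gz$) rejects any line that ends in .gz, and (^.*$) captures every other line in full.
JS code: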
var re = /(?!.*\.gz$)(^.*$)/gm;
var str = 'heapdump.20160406.214053.18914.0013.phd\nheapdump.20160406.214053.18914.0013.phd.gz\njavacore.20160406.214053.18914.0002.txt\njavacore.20160406.214053.18914.0002.txt.gz';
var result = str.match(re);
document.writeln(result)

It depends on how precise you need the solution to be. If the only extensions are phd and txt, this will work:
/^(heapdump|javacore).*\.(phd|txt)$/
Which means: a string starting with heapdump or javacore, followed by anything, then a dot, then phd or txt, then the end of the string.
Or you can simply test for a string that ends with .gz and negate the result:
/.*\.gz$/
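As a minimal sketch of the "negate the match" idea, the snippet below filters the four names from the question with Array.prototype.filter. The variable names are illustrative and not part of the original answer.

```javascript
var fileNames = [
  'heapdump.20160406.214053.18914.0013.phd',
  'heapdump.20160406.214053.18914.0013.phd.gz',
  'javacore.20160406.214053.18914.0002.txt',
  'javacore.20160406.214053.18914.0002.txt.gz'
];

// Keep a name only when it does NOT end in ".gz".
var uncompressed = fileNames.filter(function (name) {
  return !/\.gz$/i.test(name);
});

console.log(uncompressed);
```

Testing for the unwanted suffix and negating the result tends to be easier to read than packing the exclusion into a lookahead.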

One option which does not require a regular expression is to split the filename on the period (.) into an array, and then check whether the last element of the array is the extension gz:
var filename = "heapdump.20160406.214053.18914.0013.phd.gz";
var parts = filename.split(".");
if (parts[parts.length - 1] === 'gz') {
  alert("ignore this file");
}
else {
  alert("pay attention to this file");
}

Related

How would I write a Regular Expression to capture the value between Last Slash and Query String?

Problem:
Extract image file name from CDN address similar to the following:
https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba
Two-stage Solution:
I am using two regular expressions to retrieve the file name:
var postLastSlashRegEx = /[^\/]+$/,
preQueryRegEx = /^([^?]+)/;
var fileFromURL = urlString.match(postLastSlashRegEx)[0].match(preQueryRegEx)[0];
// fileFromURL = "photo%2FB%_2.jpeg"
Question:
Is there a way I can combine both regular expressions?
I've tried using capture groups, but haven't been able to produce a working solution.
From my comment
You can use a lookahead to find the "?" and use [^/] to match any non-slash characters.
/[^/]+(?=\?)/
To remove the dependency on the URL containing a "?", make the lookahead match either a question mark or the end of the string (represented by $), and make the quantifier lazy (+?) so the match stops at the first "?" instead of running to the end of the string.
/[^/]+?(?=\?|$)/
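A short sketch of how the combined pattern might be wrapped and checked, assuming the URL arrives as a plain string. The helper name fileNameFromUrl is hypothetical.

```javascript
// Hypothetical helper: returns the last path segment without the query string,
// or null when the URL ends in a slash and has no file segment.
function fileNameFromUrl(urlString) {
  var match = urlString.match(/[^/]+?(?=\?|$)/);
  return match ? match[0] : null;
}

var cdnUrl = 'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg' +
  '?alt=media&token=4e32-a1a2-c48e6c91a2ba';

console.log(fileNameFromUrl(cdnUrl));
console.log(fileNameFromUrl('https://example.com/a/b/file.txt'));
```

The second call shows the case without a query string, which is the one the $ alternative in the lookahead is there for.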
You don't have to use a regex; you can just use split and substr.
var str = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba".split("?")[0];
var fileName = str.substr(str.lastIndexOf('/') + 1);
but if regex is important to you, then:
str.match(/[^?]*\/([^?]+)/)[1]
The code using the substring method would look like the following. It assumes the URL always contains a "?" and that the query string contains no "/":
var fileFromURL = urlString.substring(urlString.lastIndexOf('/') + 1, urlString.lastIndexOf('?'))
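If the "?" may be missing, a slightly more defensive variant is sketched below. It cuts off the query first and only then looks for the last slash. The function name fileNameBySubstring is an assumption made for illustration.

```javascript
// Hypothetical helper: a substring-only version that tolerates a missing "?".
function fileNameBySubstring(urlString) {
  var queryStart = urlString.indexOf('?');
  // Drop the query string first so a "/" inside it cannot confuse lastIndexOf.
  var withoutQuery = queryStart === -1
    ? urlString
    : urlString.substring(0, queryStart);
  return withoutQuery.substring(withoutQuery.lastIndexOf('/') + 1);
}

console.log(fileNameBySubstring(
  'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media'
));
console.log(fileNameBySubstring('https://example.com/img/a.png'));
```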

Regex to detect a string that contains a URL or file extension

I'm trying to create a small script that detects whether the string input is either:
1) a URL (which will hold a filename): 'http://ajax.googleapis.com/html5shiv.js'
2) just a filename: 'html5shiv.js'
So far I've found this but I think it just checks the URL and file extension. Is there an easy way to make it so it uses an 'or' check? I'm not very experienced with RegExp.
var myRegExp = /[^\\]*\.(\w+)$/i;
Thank you in advance.
How about this regex?
(\.js)$
It checks whether the line ends with .js; $ denotes the end of the line.
To use 'OR' in a regex, use the pipe character (|), which is the alternation operator.
(aaa|bbb)
will match
aaa
or
bbb
For a regex to match a URL, I'd suggest the following:
\w+://[\w.~:/?#\[\]@!$&'()*+,;=%-]*
This is based on the character set allowed in a URL.
For the file, what's your definition of a filename?
If you want to match strings that consist of one or more non-full-stop characters, followed by a full stop, followed by one or more non-full-stop characters, I'd suggest the following regex:
[^.]+\.[^.]+
And altogether:
(\w+://[\w.~:/?#\[\]@!$&'()*+,;=%-]*|[^.]+\.[^.]+)
Here's a working example (in JavaScript): jsfiddle
You can test the regex online here: http://gskinner.com/RegExr/
If it is for the purpose of flow control you can do the following:
var test = "http://ajax.googleapis.com/html5shiv.js";
// to recognize http & https
var regex = /^https?:\/\/.*/i;
var result = regex.exec(test);
if (result == null){
// no URL found code
} else {
// URL found code
}
To capture the file name you could use:
var test = "http://ajax.googleapis.com/html5shiv.js";
var regex = /(\w+\.\w+)$/i;
var match = regex.exec(test); // null when the string does not end in name.ext
var filename = match ? match[1] : null; // "html5shiv.js"
Yes, you can use the alternation operator |. Be careful, though, because its precedence is very low, lower than sequencing: /^cat|dog$/ means "starts with cat, or ends with dog". Group the alternatives when they are part of a larger pattern, as in /^(cat|dog)$/.
It's hard to tell exactly what you want from so few test cases, but
(http://[a-zA-Z0-9\./]+)|([a-zA-Z0-9\.]+)
should give you a starting point.
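To make the precedence point concrete, here is a small sketch comparing an ungrouped and a grouped alternation. The patterns and sample strings are made up for illustration.

```javascript
// Without a group, the anchors bind to one alternative each:
// "starts with cat" OR "ends with dog".
var loose = /^cat|dog$/;

// With a group, both anchors apply to the whole alternation.
var strict = /^(cat|dog)$/;

console.log(loose.test('catalog'));   // true
console.log(strict.test('catalog'));  // false
console.log(strict.test('dog'));      // true
```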
If it's a URL, strip it down to the last part and treat it the same way as "just a filename".
function isFile(fileOrUrl) {
  // This will return everything after the last '/'; if there's
  // no forward slash in the string, the unmodified string is used
  var filename = fileOrUrl.split('/').pop();
  return (/.+\..+/).test(filename);
}
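One way the "strip it down to the last part" idea could be combined with a scheme check is sketched below. The classify function and its return labels are hypothetical, and isFile is repeated so the block runs on its own.

```javascript
function isFile(fileOrUrl) {
  var filename = fileOrUrl.split('/').pop();
  return /.+\..+/.test(filename);
}

// Hypothetical wrapper: reports which of the two input shapes was seen.
function classify(input) {
  var looksLikeUrl = /^https?:\/\//i.test(input);
  if (looksLikeUrl) {
    return isFile(input) ? 'url-with-file' : 'url';
  }
  return isFile(input) ? 'filename' : 'neither';
}

console.log(classify('http://ajax.googleapis.com/html5shiv.js'));
console.log(classify('html5shiv.js'));
console.log(classify('readme'));
```

Note that a bare host such as http://ajax.googleapis.com (no trailing slash) would be reported as 'url-with-file', because the host name itself contains dots. Handling that case would need a real URL parser.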
Try this:
var ajx = 'http://ajax.googleapis.com/html5shiv.js';
function isURL(str){
return /((\/\w+)|(^\w+))\.\w{2,}$/.test(str);
}
console.log(isURL(ajx));
Have a look at this (requires no regex at all):
var filename = string.indexOf('/') == -1
? string
: string.split('/').slice(-1)[0];
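This loop walks the string backwards and collects the characters between the last "." and the "/" before it, that is, the file name without its extension:
<script>
var url = "Home/this/example/file.js";
var condition = 0; // 0: before the last ".", 1: inside the name, 2: done
var result = "";
for (var i = url.length - 1; i >= 0 && condition < 2; i--) {
  if (url[i] != "/" && url[i] != ".") {
    if (condition == 1) { result = url[i] + result; }
  } else {
    condition++;
  }
}
document.write(result); // file
</script>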

Add regex to ignore /js /img and /css

I have this regular expression
// Look for /en/ or /en-US/ or /en_US/ on the URL
var matches = req.url.match( /^\/([a-zA-Z]{2,3}([-_][a-zA-Z]{2})?)(\/|$)/ );
With the above regular expression, URLs such as the following cause a problem:
http://mydomain.com/css/bootstrap.css
or
http://mydomain.com/js/jquery.js
because my regular expression strips off any leading segment of 2-3 characters from A-Z or a-z.
My question is: how would I change this regular expression so that it does not strip off
js or img or css or ext
without affecting the original behaviour?
I'm no expert on regular expressions :(
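Negative lookahead?
var matches = req.url.match(/^\/(?!(?:js|img|css|ext)(?:\/|$))([a-zA-Z]{2,3}([-_][a-zA-Z]{2})?)(\/|$)/);
That is: a / not followed by js, img, css or ext as a whole path segment. The non-capturing groups (?:...) keep the numbering of the original capture groups unchanged.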
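A quick sketch of how a negative lookahead for the reserved directories could be checked, assuming req.url holds only the path (as it does in Node's http module). The getLanguage helper and the exact exclusion list (js, img, css, ext, taken from the question) are illustrative.

```javascript
// The lookahead rejects a first segment that is exactly js, img, css or ext.
var langPrefix =
  /^\/(?!(?:js|img|css|ext)(?:\/|$))([a-zA-Z]{2,3}([-_][a-zA-Z]{2})?)(\/|$)/;

// Hypothetical helper: returns the language code or null.
function getLanguage(path) {
  var matches = path.match(langPrefix);
  return matches ? matches[1] : null;
}

console.log(getLanguage('/en/page'));
console.log(getLanguage('/en-US/'));
console.log(getLanguage('/css/bootstrap.css'));
console.log(getLanguage('/js/jquery.js'));
```

Because the excluded names are wrapped in non-capturing groups, matches[1] is still the language code, as in the original expression.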
First of all, you have not defined exactly what you are searching for.
Define an array of lowercase common language codes (Common language codes).
This way you'll know what to look for.
After that, convert your URL to lowercase, replace every '_' with '-', and search the resulting string for each member of the array using indexOf().
Since you said you're using the regex to replace text, I changed it to a replace function. Also, you forced the regex to match the start of the string; I don't see how it would match anything with that. Anyway, here's my approach:
var result = req.url.replace(/\/([a-z]{2,3}([-_][a-z]{2})?)(?=\/|$)/i,
  function (s, t) {
    switch (t) {
      case "js": case "img": case "css": case "ext":
        return s; // leave asset directories untouched
    }
    return ""; // strip the language segment
  }
);

Match filename and file extension from single Regex

I'm sure this must be easy enough, but I'm struggling...
var regexFileName = /[^\\]*$/; // match filename
var regexFileExtension = /(\w+)$/; // match file extension
function displayUpload() {
  var path = $el.val(); // this is a file input
  var filename = path.match(regexFileName); // returns file name
  var extension = filename[0].match(regexFileExtension); // returns extension
  console.log("The filename is " + filename[0]);
  console.log("The extension is " + extension[0]);
}
The function above works fine, but I'm sure it must be possible to achieve with a single regex, by referencing different parts of the array returned with the .match() method. I've tried combining these regex but without success.
Also, I'm not using a string to test it on in the example, as console.log() escapes the backslashes in a filepath and it was starting to confuse me :)
Assuming that all files do have an extension, you could use
var regexAll = /[^\\]*\.(\w+)$/;
Then you can do
var total = path.match(regexAll);
var filename = total[0];
var extension = total[1];
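For illustration, the single-regex approach might be wrapped as below, with a guard for paths that have no extension. The helper name splitUploadPath and the sample path are assumptions, not part of the original answer.

```javascript
var regexAll = /[^\\]*\.(\w+)$/;

// Hypothetical helper: returns the file name and extension, or null
// when the last path component has no ".ext" suffix.
function splitUploadPath(path) {
  var total = path.match(regexAll);
  if (!total) {
    return null;
  }
  return { filename: total[0], extension: total[1] };
}

// Backslashes must be doubled inside a JavaScript string literal.
console.log(splitUploadPath('C:\\fakepath\\report.final.pdf'));
console.log(splitUploadPath('C:\\fakepath\\README'));
```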
Use /^.*\/(.*)\.(.*)$/. The first group is your file name and the second group is the extension.
var myString = "filePath/long/path/myfile.even.with.dotes.TXT";
var myRegexp = /^.*\/(.*)\.(.*)$/;
var match = myRegexp.exec(myString);
alert(match[1]); // myfile.even.with.dotes
alert(match[2]); // TXT
This works even if your file name contains more than one dot. It does not match a name that contains no dot at all, and making the dot optional with \.? does not help: the greedy first group then takes the whole name and the extension group is always empty.
EDIT:
This is for Linux; for Windows use /^.*\\(.*)\.(.*)$/ (the directory separator is / on Linux and \ on Windows).
You can use groups in your regular expression for this:
var regex = /^([^\\]*)\.(\w+)$/;
var matches = filename.match(regex);
if (matches) {
  var name = matches[1];
  var extension = matches[2];
}
I know this is an old question, but here's another solution that can handle multiple dots in the name and also when there's no extension at all (or an extension of just '.'):
/^(.*?)(\.[^.]*)?$/
Taking it a piece at a time:
^
Anchor to the start of the string (to avoid partial matches)
(.*?)
Match any character ., 0 or more times *, lazily ? (don't just grab them all if the later optional extension can match), and put them in the first capture group ( ).
(\.
Start a 2nd capture group for the extension using (. This group starts with the literal . character (which we escape with \ so that . isn't interpreted as "match any character").
[^.]*
Define a character set []. Match characters not in the set by specifying this is an inverted character set ^. Match 0 or more non-. chars to get the rest of the file extension *. We specify it this way so that it doesn't match early on filenames like foo.bar.baz, incorrectly giving an extension with more than one dot in it of .bar.baz instead of just .baz.
. doesn't need to be escaped inside [], since most characters (all except \, ], ^ and -) are literals inside a character set.
)?
End the 2nd capture group ) and indicate that the whole group is optional ?, since it may not have an extension.
$
Anchor to the end of the string (again, to avoid partial matches)
If you're using ES6 you can even use destructuring to grab the results in one line:
const [, filename, extension] = /^(.*?)(\.[^.]*)?$/.exec('foo.bar.baz');
which gives the filename as 'foo.bar' and the extension as '.baz'.
'foo' gives 'foo' and undefined (the optional group did not participate in the match)
'foo.' gives 'foo' and '.'
'.js' gives '' and '.js'
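The cases listed above can be checked with a small wrapper. This is only a sketch: splitName is a hypothetical helper, and it normalises a missing extension to an empty string.

```javascript
// Hypothetical helper around the pattern explained above.
function splitName(name) {
  var match = /^(.*?)(\.[^.]*)?$/.exec(name);
  if (!match) {
    return null; // only possible when the name contains a line break
  }
  // The optional group is undefined when there is no extension.
  return { base: match[1], extension: match[2] || '' };
}

['foo.bar.baz', 'foo', 'foo.', '.js'].forEach(function (name) {
  console.log(name, splitName(name));
});
```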
This will recognize even /home/someUser/.aaa/.bb.c:
function splitPathFileExtension(path) {
  var parsed = path.match(/^(.*\/)(.*)\.(.*)$/);
  // match() returns null when the path has no "/" or no "."
  return parsed ? [parsed[1], parsed[2], parsed[3]] : null;
}
I think this is a better approach, as it matches only valid directory names, file names and extensions, and groups the path, file name and extension separately. It also works when there is no path, only a file name.
^([\w\/]*?)([\w\.]*)\.(\w+)$
Test cases
the/p0090Aath/fav.min.icon.png
the/p0090Aath/fav.min.icon.html
the/p009_0Aath/fav.m45in.icon.css
fav.m45in.icon.css
favicon.ico
Output
[the/p0090Aath/][fav.min.icon][png]
[the/p0090Aath/][fav.min.icon][html]
[the/p009_0Aath/][fav.m45in.icon][css]
[][fav.m45in.icon][css]
[][favicon][ico]
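The listed output can be reproduced with the sketch below, with the extension group written as (\w+) so that it can hold more than one character. The formatPath helper is hypothetical and only formats the three groups the way the output above shows them.

```javascript
var pathPattern = /^([\w\/]*?)([\w.]*)\.(\w+)$/;

// Hypothetical helper: formats the three groups as [path][name][extension].
function formatPath(path) {
  var m = pathPattern.exec(path);
  return m ? '[' + m[1] + '][' + m[2] + '][' + m[3] + ']' : null;
}

[
  'the/p0090Aath/fav.min.icon.png',
  'the/p009_0Aath/fav.m45in.icon.css',
  'fav.m45in.icon.css',
  'favicon.ico'
].forEach(function (path) {
  console.log(formatPath(path));
});
```

Names containing characters outside \w, such as a hyphen, do not match at all, so the helper returns null for them.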
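(?!\w+).(\w+)(\s)
The negative lookahead (?!\w+) requires that the next character is not a word character; . then consumes that character as the delimiter (unescaped, it matches any character, so use \. to require a literal dot); (\w+) captures the word that follows, and (\s) requires a whitespace character after it, so anything past that whitespace is ignored.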

Regex: Getting content from URL

I want to get "the-game" using regex from URLs like
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/
What parts of the URL could vary and what parts are constant? The following regex will always match whatever is in the slashes following "/en/" - the-game in your example.
(?<=/en/).*?(?=/)
This one will match the contents of the 2nd set of slashes of any URL containing "webdev", assuming the first set of slashes contains a 2 or 3 character language code.
(?<=.*?webdev.*?/.{2,3}/).*?(?=/)
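Hopefully you can tweak these examples to accomplish what you're looking for. Both rely on lookbehind, which JavaScript supports only from ES2018 on; the unbounded lookbehind in the second pattern is also rejected by many other regex engines.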
var myregexp = /^(?:[^\/]*\/){4}([^\/]+)/;
var match = myregexp.exec(subject); // subject is the URL string
var result = match != null ? match[1] : "";
This matches whatever lies between the fourth and fifth slash and stores it in the variable result.
You should probably use a URL parsing library rather than resorting to a regex.
In Python 3:
from urllib.parse import urlparse
url = urlparse('http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/')
print(url.path)
Which would yield:
/en/the-game/another-one/another-one/another-one/
From there, you can do simple things like stripping /en/ from the beginning of the path. Otherwise, you're bound to do something wrong with a regular expression. Don't reinvent the wheel!
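The same parse-first idea can be sketched in JavaScript with the built-in URL class, available in browsers and Node.js. The helper name secondPathSegment is hypothetical, and it assumes the wanted value is always the segment right after the language code.

```javascript
// Hypothetical helper: parse the URL, then pick the second path segment.
function secondPathSegment(urlString) {
  var segments = new URL(urlString).pathname
    .split('/')
    .filter(Boolean); // drop the empty strings around leading/trailing slashes
  return segments.length > 1 ? segments[1] : null;
}

console.log(secondPathSegment(
  'http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/'
));
```

Letting the parser separate host, path and query means the pattern no longer has to count slashes in the scheme and host.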
