File path validation in JavaScript

I am trying to validate an XML file path in JavaScript. My regex is:
var isValid = /^([a-zA-Z]:)?(\\{2}|\/)?([a-zA-Z0-9\\s_#-^!#$%&+={}\[\]]+(\\{2}|\/)?)+(\.xml+)?$/.test(str);
It returns true even when the path is wrong.
These are valid paths:
D:/test.xml
D:\\folder\\test.xml
D:/folder/test.xml
D:\\folder/test.xml
D:\\test.xml

First, the obvious errors:
+ is a quantifier meaning "one or more".
So (\.xml+) matches .xm followed by one or more l characters (it would also match .xmlllll). The ? makes the group optional, so (\.xml+)? means the path may end in .xml but does not have to.
The same applies to ([a-zA-Z]:)?, which makes the drive letter optional.
Now the less obvious errors:
[a-zA-Z0-9\\s_#-^!#$%&+={}\[\]] defines the list of allowed characters. You wrote \\s, and I assume you want to allow whitespace, but in a regex literal this allows a backslash and the letter s, so you need to change it to \s. Then there is the part #-^. I assume you want to allow #, - and ^, but - has a special meaning inside [ ]: it defines a range, so you are allowing every character in the range from # to ^. To allow a literal -, escape it and write #\-^. You also need to take care with ^: placed right after the [, it has a special meaning too (it negates the class).
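The two character-class pitfalls described above can be checked directly. This is only a small sketch; the variable names unescaped and escaped are made up for the illustration and the classes are cut down to the relevant part.

```javascript
// Inside [ ], "#-^" is a range from "#" (code 35) to "^" (code 94),
// so it also admits digits, uppercase letters, ":" and more.
var unescaped = /^[#-^]+$/;
// Escaping the hyphen leaves only the three literal characters.
var escaped = /^[#\-^]+$/;

console.log(unescaped.test('A:9')); // true: all three fall inside the range
console.log(escaped.test('A:9'));   // false
console.log(escaped.test('#-^'));   // true

// In a regex literal, [\\s] means "backslash or the letter s", not whitespace.
console.log(/^[\\s]+$/.test(' '));  // false
console.log(/^[\s]+$/.test(' '));   // true
```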
Your regex should contain the following parts:
^[a-z]: starts (^) with the drive letter
((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+ followed by one or more path segments, each starting with either \ or / and having a name made of one or more of your allowed characters (a-z0-9\s_#\-^!#$%&+={}\[\])
\.xml$ ends ($) with .xml
So your final regex should look like this:
/^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(str)
(assuming a case-insensitive match using the i flag)
EDIT:
var path1 = "D:/test.xml"; // D:/test.xml
var path2 = "D:\\folder\\test.xml"; // D:\folder\test.xml
var path3 = "D:/folder/test.xml"; // D:/folder/test.xml
var path4 = "D:\\folder/test.xml"; // D:\folder/test.xml
var path5 = "D:\\test.xml"; // D:\test.xml
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path1) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path2) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path3) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path4) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path5) );
UPDATE:
You need to take care with / and \. Whether and how you have to escape them depends on whether you build the regex from a string, with new RegExp(' ... the regex ... ',"i") or new RegExp(" ... the regex ... ","i"), or write it as a literal / ... the regex ... /i. In a literal, / must be written as \/. In a string, every backslash meant for the regex must be doubled.
For further information about regular expressions, take a look at e.g. www.regular-expressions.info
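To make the difference between the two notations concrete, here is a sketch that writes the same pattern once as a literal and once as a constructor string and runs both against a few sample paths. The names literal, built and samples are illustrative only, and the two invalid sample paths are assumptions added for the demonstration.

```javascript
// Regex literal: "/" is escaped as "\/", backslashes are written once.
var literal = /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i;

// Constructor string: "/" needs no escape, every regex backslash is doubled.
var built = new RegExp(
  '^[a-z]:((\\\\|/)[a-z0-9\\s_#\\-^!#$%&+={}\\[\\]]+)+\\.xml$', 'i');

var samples = [
  'D:/test.xml',          // valid
  'D:\\folder\\test.xml', // valid (the string holds single backslashes)
  'D:/folder/test.txt',   // wrong extension
  'folder/test.xml'       // drive letter missing
];

samples.forEach(function (path) {
  console.log(path, literal.test(path), built.test(path));
});
```

Both columns should agree for every path: true for the first two samples and false for the last two.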

This could work out for you
var str = 'D:/test.xml';
var str2 = 'D:\\folder\\test.xml';
var str3 = 'D:/folder/test.xml';
var str4 = 'D:\\folder/test.xml';
var str5 = 'D:\\test\\test\\test\\test.xml';
var regex = new RegExp('^[a-z]:((\\\\|/)[a-zA-Z0-9_ \\-]+)+\\.xml$', 'i'); // \\. keeps the dot literal; a single \. in a string is just "."
regex.test(str5);
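The reason for writing \\\\ in the RegExp string is that JavaScript string literals use \ to escape special characters (e.g. \n for a new line), and regular expressions use \ in the same way (e.g. \b for a word boundary). So \\\\ in the string literal becomes \\ in the pattern, which matches one literal \ in the path. Every other backslash meant for the regex has to be doubled in the string as well. It also allows you to have different rules for file name and folder name.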
Update
[a-zA-Z0-9_ \-]+ is the part of the regexp that actually matches a file or folder name. To allow more characters in file or folder names, just add them to this class; e.g. to allow a *, make it [a-zA-Z0-9_ \-*]+
Update 2
To add to the answer, the following RegExp adds another check to the validation: it rejects paths that mix / and \\.
var str6 = 'D:/This is folder/test # file.xml';
var str7 = 'D:/This is invalid\\path.xml';
var regex2 = new RegExp('^[a-z]:(/|\\\\)([a-zA-Z0-9_ \\-]+\\1)*[a-zA-Z0-9_ #\\-]+\\.xml?', 'gi');
regex2 will match all of the paths except str7.
Update
My apologies for mistyping a ? instead of $ in regex2. Below is the corrected, intended version (with the backslashes doubled so that the dot before xml stays literal):
var regex2 = new RegExp('^[a-z]:(/|\\\\)([a-zA-Z0-9_ \\-]+\\1)*[a-zA-Z0-9_ #\\-]+\\.xml$', 'i');
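The separator check relies on the backreference \1, which must repeat whatever the first group (the separator after the drive letter) matched. A minimal sketch of that behaviour follows; sameSeparator is a hypothetical name and the pattern is written here with every regex backslash doubled for the string.

```javascript
// Group 1 captures the first separator; \1 forces every later separator to be the same one.
var sameSeparator = new RegExp(
  '^[a-z]:(/|\\\\)([a-zA-Z0-9_ \\-]+\\1)*[a-zA-Z0-9_ #\\-]+\\.xml$', 'i');

console.log(sameSeparator.test('D:/This is folder/test # file.xml')); // true
console.log(sameSeparator.test('D:\\folder\\test.xml'));              // true
console.log(sameSeparator.test('D:/This is invalid\\path.xml'));      // false: mixed separators
console.log(sameSeparator.test('D:\\folder/test.xml'));               // false: mixed separators
```

Note that this is stricter than the original question, which lists D:\\folder/test.xml as a valid path.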

Tested using Scratchpad.
var regex = /^[a-z]:((\/|(\\?))[\w .]+)+\.xml$/i;
Each of these prints true in the Web Console (Ctrl+Shift+K in Firefox):
console.log(regex.test("D:/test.xml"));
console.log(regex.test("D:\\folder\\test.xml"));
console.log(regex.test("D:/folder/test.xml"));
console.log(regex.test("D:\\folder/test.xml"));
console.log(regex.test("D:\\test.xml"));
console.log(regex.test("D:\\te st_1.3.xml")); // spaces, dots allowed
Or, using Alert boxes:
alert(regex.test("D:/test.xml"));
alert(regex.test("D:\\folder\\test.xml"));
alert(regex.test("D:/folder/test.xml"));
alert(regex.test("D:\\folder/test.xml"));
alert(regex.test("D:\\test.xml"));
alert(regex.test("D:\\te st_1.3.xml"));
Invalid file paths:
alert(regex.test("AD:/test.xml")); // invalid drive letter
alert(regex.test("D:\\\\\\folder\\test.xml")); // three backslashes (each one written as \\ in the string literal)
alert(regex.test("/folder/test.xml")); // drive letter missing
alert(regex.test("D:\\folder/test.xmlfile")); // invalid extension

Related

Using JS to modify user input for REGEXP search

I'm taking user input from a searchbar and modifying it to a regexp. From there I can search a json file for valid values and return them. It works fine with input without quotes, but with them, I'm appending "\Q" and "\E" so I can find the entirety of the string (with spaces and other special characters).
if (searchField.includes('"')){
var tempexpress = searchField.substring(1,searchField.length-1);
var tempexpress = "\\Q" + tempexpress + "\\E";
var expression = new RegExp(tempexpress);
} else {
var tempexpress = searchField.replace('(',"\\(");
var tempexpress = tempexpress.replace(')',"\\)");
var tempexpress = tempexpress.replace(/'/g,"\\'");
var tempexpress = tempexpress.replace('*',"\.");
var expression = new RegExp(tempexpress, "i");
};
if (value.data.label.search(expression) != -1){
console.log('found it');
}
If I input "QTT6" into the search field (with quotes for a literal), then it creates the following regexp: /\QQTT6\E/
In my testing, I found that it doesn't match to QTT6 for some reason and I'm not sure why. Any help is appreciated.
Also I'm very new to JS and Jquery, so sorry if my code isn't very well put together.
Per Kelly's comment:
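JavaScript regular expressions do not support \Q and \E. In a pattern without the u flag they are treated as the literal letters Q and E, so /\QQTT6\E/ looks for the text QQTT6E, which is why it never matches QTT6. To match the entire string in JS, anchor the pattern with ^ and $ instead, and escape any special characters in the user's input yourself.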
For more information, see the MDN docs on Regex Assertions:
^:
Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A".
Note: This character has a different meaning when it appears at the start of a character class.
$:
Matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character. For example, /t$/ does not match the "t" in "eater", but does match it in "eat".
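Since JavaScript has no \Q...\E quoting, one way to search for the quoted input literally is to escape its special characters before building the RegExp. The sketch below assumes that approach; escapeRegExp and buildLiteralSearch are names invented for the example, and the escaping pattern is the widely used one for this purpose.

```javascript
// Put a backslash in front of every character that is special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Assumes searchField is wrapped in double quotes, e.g. "QTT6".
function buildLiteralSearch(searchField) {
  var inner = searchField.substring(1, searchField.length - 1);
  return new RegExp(escapeRegExp(inner));
}

console.log(/\QQTT6\E/.test('QTT6'));                       // false: looks for "QQTT6E"
console.log(buildLiteralSearch('"QTT6"').test('QTT6'));     // true
console.log(buildLiteralSearch('"a.b (c)"').test('a.b (c)')); // true
console.log(buildLiteralSearch('"a.b (c)"').test('axb (c)')); // false: the dot is literal
```

To require that the whole label equals the input rather than merely contains it, the escaped text could be wrapped as '^' + escaped + '$'.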

Replace all character matches that are not escaped with backslash

I am using a regex to replace ( in other regexes (or regexs?) with (?: to turn them into non-capturing groups. My expression assumes that no (?X structures are used and looks like this:
(
[^\\] - Not backslash character
|^ - Or string beginning
)
(?:
[\(] - a bracket
)
Unfortunately, this doesn't work when there are two matches next to each other, as in this case: how((\s+can|\s+do)(\s+i)?)?
With lookbehinds, the solution is easy:
/(?<=[^\\]|^)[\(]/g
But javascript doesn't support lookbehinds, so what can I do? My searches didn't bring any easy universal lookbehind alternative.
Use lookbehind through reversal:
function revStr(str) {
return str.split('').reverse().join('');
}
var rx = /[(](?=[^\\]|$)/g;
var subst = ":?(";
var data = "how((\\s+can|\\s+do)(\\s+i)?)?";
var res = revStr(revStr(data).replace(rx, subst));
document.getElementById("res").value = res;
<input id="res" />
Note that the regex pattern is also reversed so that we could use a look-ahead instead of a look-behind, and the substitution string is reversed, too. It becomes too tricky with longer regexps, but in this case, it is still not that unreadable.
One option is to do a two-pass replacement, with a token (I like unicode for this, as it's unlikely to appear elsewhere):
var s = 'how((\\s+can|\\s+do)(\\s+i)?)?';
var token = "\u1234";
// Look for the character preceding the ( you want
// to replace. We'll add the token after it.
var patt1 = /([^\\])(?=\()/g;
// The second pattern looks for the token and the (.
// We'll replace both with the desired string.
var patt2 = new RegExp(token + '\\(', 'g');
s = s.replace(patt1, "$1" + token).replace(patt2, "(?:");
console.log(s);
https://jsfiddle.net/48e75wqz/1/
(EDITED)
string example:
how((\s+can|\s+do)(\s+i)?)?
one line solution:
o='how((\\s+can|\\s+do)(\\s+i)?)?';
alert(o.replace(/\\\(/g,9e9).replace(/\(/g,'(?:').replace(/90{9}/g,'\\('))
result:
how(?:(?:\s+can|\s+do)(?:\s+i)?)?
and of course it works with strings like how((\s+\(can\)|\s+do)(\s+i)?)?
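Another way to get the same result without lookbehind, reversal or placeholder tokens is a single replace with a callback: the pattern matches either an escaped pair (a backslash plus the following character) or a bare (, and only the bare ( is rewritten. This is offered as an alternative sketch, not as any of the answers' methods; toNonCapturing is a made-up name, and like the question it assumes no (?X constructs and no ( inside character classes.

```javascript
function toNonCapturing(pattern) {
  // "\\[\s\S]" consumes a backslash together with whatever it escapes,
  // so an escaped "\(" can never be seen as a bare "(".
  return pattern.replace(/\\[\s\S]|\(/g, function (match) {
    return match === '(' ? '(?:' : match;
  });
}

console.log(toNonCapturing('how((\\s+can|\\s+do)(\\s+i)?)?'));
// how(?:(?:\s+can|\s+do)(?:\s+i)?)?
console.log(toNonCapturing('how((\\s+\\(can\\)|\\s+do)(\\s+i)?)?'));
// how(?:(?:\s+\(can\)|\s+do)(?:\s+i)?)?
```

Because escaped pairs are consumed as a unit, an escaped backslash followed by a real group, such as \\( in the pattern text, is also handled: the ( there is still converted.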

Matching a string with a regex gives null even though it should match

I am trying to get my regex to work in JavaScript, but I have a problem.
Code:
var reg = new RegExp('978\d{10}');
var isbn = '9788740013498';
var res = isbn.match(reg);
console.log(res);
However, res is always null in the console.
This is quite interesting, as the regex should work.
My question: then, what is the right syntax to match a string and a regex?
(If it matters and could have any say in the environment: this code is taken from an app.get view made in Express.js in my Node.js application)
Because you're using a string to build your regex, you need to escape the \. As written, the backslash only escapes the d, which doesn't need escaping, so the string ends up containing a plain d.
You can see what happens if you create your regex on the chrome console:
new RegExp('978\d{10}');
// => /978d{10}/
Note that there is no \d, only a d, so your regex matches 978dddddddddd. That is, the literal 'd' character repeated 10 times.
You need to use \\ to insert a literal \ in the string you're building the regex from:
var reg = new RegExp('978\\d{10}');
var isbn = '9788740013498';
var res = isbn.match(reg);
console.log(res)
// => ["9788740013498", index: 0, input: "9788740013498"]
You need to escape with a double backslash if you use the RegExp constructor:
var reg = new RegExp('978\\d{10}');
Quote from documentation:
When using the constructor function, the normal string escape rules (preceding special characters with \ when included in a string) are necessary. For example, the following are equivalent:
var re = /\w+/;
var re = new RegExp("\\w+");
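The effect of the string escape rules can be seen by inspecting the source property of each regex. The sketch below also shows String.raw as one more way to avoid doubling the backslash, assuming an ES2015 environment; the variable names are illustrative.

```javascript
var isbn = '9788740013498';

var broken = new RegExp('978\d{10}');       // "\d" in a string literal is just "d"
var fromString = new RegExp('978\\d{10}');  // doubled backslash survives as "\d"
var fromRaw = new RegExp(String.raw`978\d{10}`); // raw template: backslashes kept as written
var fromLiteral = /978\d{10}/;              // regex literal: no string escaping involved

console.log(broken.source);          // 978d{10}
console.log(fromString.source);      // 978\d{10}
console.log(broken.test(isbn));      // false
console.log(fromString.test(isbn));  // true
console.log(fromRaw.test(isbn));     // true
console.log(fromLiteral.test(isbn)); // true
```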

Match filename and file extension from single Regex

I'm sure this must be easy enough, but I'm struggling...
var regexFileName = /[^\\]*$/; // match filename
var regexFileExtension = /(\w+)$/; // match file extension
function displayUpload() {
var path = $el.val(); //This is a file input
var filename = path.match(regexFileName); // returns file name
var extension = filename[0].match(regexFileExtension); // returns extension
console.log("The filename is " + filename[0]);
console.log("The extension is " + extension[0]);
}
The function above works fine, but I'm sure it must be possible to achieve with a single regex, by referencing different parts of the array returned with the .match() method. I've tried combining these regex but without success.
Also, I'm not using a string to test it on in the example, as console.log() escapes the backslashes in a filepath and it was starting to confuse me :)
Assuming that all files do have an extension, you could use
var regexAll = /[^\\]*\.(\w+)$/;
Then you can do
var total = path.match(regexAll);
var filename = total[0];
var extension = total[1];
With /^.*\/(.*)\.(.*)$/ the first group is your file name and the second group is the extension.
var myString = "filePath/long/path/myfile.even.with.dotes.TXT";
var myRegexp = /^.*\/(.*)\.(.*)$/;
var match = myRegexp.exec(myString);
alert(match[1]); // myfile.even.with.dotes
alert(match[2]); // TXT
This works even if your file name contains more than one dot. If the name contains no dot at all (no extension), exec returns null, so check for that before reading the groups. Making the dot optional with \.? does not help: the greedy first group then takes the whole name and the second group is always empty.
EDIT:
This is for Linux; for Windows use /^.*\\(.*)\.(.*)$/ (the directory separator is / on Linux and \ on Windows).
You can use groups in your regular expression for this:
var regex = /^([^\\]*)\.(\w+)$/;
var matches = filename.match(regex);
if (matches) {
var filename = matches[1];
var extension = matches[2];
}
I know this is an old question, but here's another solution that can handle multiple dots in the name and also when there's no extension at all (or an extension of just '.'):
/^(.*?)(\.[^.]*)?$/
Taking it a piece at a time:
^
Anchor to the start of the string (to avoid partial matches)
(.*?)
Match any character ., 0 or more times *, lazily ? (don't just grab them all if the later optional extension can match), and put them in the first capture group ( ).
(\.
Start a 2nd capture group for the extension using (. This group starts with the literal . character (which we escape with \ so that . isn't interpreted as "match any character").
[^.]*
Define a character set []. Match characters not in the set by specifying this is an inverted character set ^. Match 0 or more non-. chars to get the rest of the file extension *. We specify it this way so that it doesn't match early on filenames like foo.bar.baz, incorrectly giving an extension with more than one dot in it of .bar.baz instead of just .baz.
. doesn't need to be escaped inside [], since most characters (apart from a few such as ^, -, ] and \) are literals in a character set.
)?
End the 2nd capture group ) and indicate that the whole group is optional ?, since it may not have an extension.
$
Anchor to the end of the string (again, to avoid partial matches)
If you're using ES6 you can even use destructuring to grab the results in one line:
const [, filename, extension] = /^(.*?)(\.[^.]*)?$/.exec('foo.bar.baz');
which gives the filename as 'foo.bar' and the extension as '.baz'.
'foo' gives 'foo' and undefined (the optional group takes no part in the match)
'foo.' gives 'foo' and '.'
'.js' gives '' and '.js'
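Wrapped in a small helper, the same pattern can return an empty string when there is no extension. This is a sketch only; splitFileName and its return shape are assumptions made for the example.

```javascript
function splitFileName(name) {
  var match = /^(.*?)(\.[^.]*)?$/.exec(name);
  // "." does not match line breaks, so a name containing one may not match at all.
  if (!match) return null;
  return {
    base: match[1],
    // The optional group is undefined when there is no extension.
    extension: match[2] === undefined ? '' : match[2]
  };
}

console.log(splitFileName('foo.bar.baz')); // { base: 'foo.bar', extension: '.baz' }
console.log(splitFileName('foo'));         // { base: 'foo', extension: '' }
console.log(splitFileName('foo.'));        // { base: 'foo', extension: '.' }
console.log(splitFileName('.js'));         // { base: '', extension: '.js' }
```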
This will recognize even /home/someUser/.aaa/.bb.c:
function splitPathFileExtension(path){
var parsed = path.match(/^(.*\/)(.*)\.(.*)$/);
return [parsed[1], parsed[2], parsed[3]];
}
I think this is a better approach, as it matches only valid directory names, file names and extensions, and groups the path, file name and file extension. It also works when there is no path, only a file name.
^([\w\/]*?)([\w\.]*)\.(\w+)$
Test cases
the/p0090Aath/fav.min.icon.png
the/p0090Aath/fav.min.icon.html
the/p009_0Aath/fav.m45in.icon.css
fav.m45in.icon.css
favicon.ico
Output
[the/p0090Aath/][fav.min.icon][png]
[the/p0090Aath/][fav.min.icon][html]
[the/p009_0Aath/][fav.m45in.icon][css]
[][fav.m45in.icon][css]
[][favicon][ico]
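(?!\w+).(\w+)(\s)
(?!\w+) is a negative lookahead: it asserts that the next character is not a word character, without consuming anything. The unescaped . then matches that delimiter character (write \. to require a literal dot), (\w+) captures the word that follows it, and (\s) requires a whitespace character after that word, so anything after a blank space is ignored.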

Javascript regular expression to replace word but not within curly brackets

I have some content, for example:
If you have a question, ask for help on StackOverflow
I have a list of synonyms:
a={one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight}
ask={question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate}
I'm using JavaScript to:
Split synonyms based on =
Looping through every synonym, if found in content replace with {...|...}
The output should look like:
If you have {one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight} question, {question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate} for help on StackOverflow
Problem:
Instead of replacing the entire word, it's replacing every character found. My code:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp(word, "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
It should replace a word in the content with its synonym list, but not when the word is already inside {...|...}.
When you build the regexps, you need to include word boundary anchors at both the beginning and the end to match whole words (beginning and ending with characters from [a-zA-Z0-9_]) only:
var match = new RegExp("\\b" + word + "\\b", "ig");
Depending on the specific replacements you are making, you might want to apply your method to individual words (rather than to the entire text at once) matched using a regexp like /\w+/g to avoid replacing words that themselves are the replacements for others. Something like:
content = content.replace(/\w+/g, function(word) {
for(var i = 0, L = allSyn.length; i < L; ++i) {
var rtnSyn = allSyn[i].split("="); // index with the loop counter i
var synonym = (rtnSyn[1] || "").trim();
if(synonym && rtnSyn[0].toLowerCase() == word.toLowerCase()) return synonym;
}
return word; // no synonym found: keep the original word
});
Regular expressions include something called a "word-boundary", represented by \b. It is a zero-width assertion (it just checks something, it doesn't "eat" input) that says in order to match, certain word boundary conditions have to apply. One example is a space followed by a letter; given the string ' X', this regex would match it: / \bX/. So to make your code work, you just have to add word boundaries to the beginning and end of your word regex, like this:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp("\\b"+word+"\\b", "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
[Note that there are two backslashes in each of the word boundary matchers because in javascript strings, the backslash is for escape characters -- two backslashes turns into a literal backslash.]
For optimization, don't create a new RegExp on each iteration. Instead, build one big regex like [^{A-Za-z](a|ask|...)[^}A-Za-z] and a hash that maps each word to its replacement. I'm not familiar enough with JavaScript to write the code on the fly.
Note the separator regex which says the match cannot begin with { or end with }. This is not terribly precise, but hopefully acceptable in practice. If you genuinely need to replace words next to { or } then this can certainly be refined, but I'm hoping we won't have to.
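A rough sketch of the single-regex idea in JavaScript follows. It departs from the separator classes suggested above: it uses \b word boundaries and an extra alternative that matches whole {...} groups so they can be returned unchanged. buildSynonymReplacer and escapeRegExp are hypothetical helpers, and the shortened synonym lists are sample data.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// lines: array of "word={syn1|syn2|...}" strings
function buildSynonymReplacer(lines) {
  var table = Object.create(null); // lower-cased word -> replacement text
  var words = [];
  lines.forEach(function (line) {
    var idx = line.indexOf('=');
    if (idx === -1) return;
    var word = line.slice(0, idx).trim();
    var synonym = line.slice(idx + 1).trim();
    if (!word || !synonym) return;
    table[word.toLowerCase()] = synonym;
    words.push(escapeRegExp(word));
  });
  if (words.length === 0) return function (content) { return content; };
  // Longer words first, so "ask" is tried before "a".
  words.sort(function (a, b) { return b.length - a.length; });
  // First alternative: an existing {...} group (left untouched).
  // Second alternative: one of the words, as a whole word.
  var pattern = new RegExp('\\{[^}]*\\}|\\b(' + words.join('|') + ')\\b', 'gi');
  return function (content) {
    return content.replace(pattern, function (match, word) {
      return word === undefined ? match : table[word.toLowerCase()];
    });
  };
}

var replaceSynonyms = buildSynonymReplacer([
  'a={one typical|only one}',
  'ask={question|inquire of}'
]);
console.log(replaceSynonyms('If you have a question, ask for help on StackOverflow'));
// If you have {one typical|only one} question, {question|inquire of} for help on StackOverflow
```

Because the content is scanned once, text that was just inserted is never searched again, so the word "question" inside a replacement is not at risk of being replaced in turn.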
