How to use regex to match an IPFS URL? - javascript

I have the following IPFS URL.
https://example.com:2053/ipfs/QmPQeMz2vzeLin5HcNYinVoSggPsaXh5QiKDBFtxMREgLf/images/0000000000000000000000000000000000000000000000000000000000000001.png
I want to use regex to match this file, but instead of writing the full URL, I want to just match something like https*00000001.png.
The problem is that when I use
paddedHex = '00000001';
let tmpSearchQuery = `https*${paddedHex}.png`;
It doesn't really match anything. Why?

In https*, the * applies only to the preceding s, so s* repeats an s zero or more times; it is not a wildcard. Also, to create a dynamic regex you have to use the RegExp constructor; a template literal on its own is just a string.
Instead, you can match zero or more non-whitespace chars using \S*, and if you want to match both http and https, make the s optional using s?
const s = `https://ipfs.moralis.io:2053/ipfs/QmPQeMz2vzeLin5HcNYinVoSggPsaXh5QiKDBFtxMREgLf/images/0000000000000000000000000000000000000000000000000000000000000001.png`;
const paddedHex = '00000001';
const tmpSearchQuery = new RegExp(`https?\\S*${paddedHex}\\.png`);
const m = s.match(tmpSearchQuery);
if (m) {
console.log(m[0]);
}
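If the dynamic part can contain regex metacharacters (a dot, for instance), it is usually safer to escape it before interpolating. A minimal sketch, assuming hypothetical helpers named escapeRegExp and buildUrlMatcher that are not part of the original answer:

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a matcher for an http(s) URL ending in "<suffix>.png".
function buildUrlMatcher(suffix) {
  return new RegExp(`https?://\\S*${escapeRegExp(suffix)}\\.png`);
}

const url = 'https://example.com:2053/ipfs/QmPQeMz2vzeLin5HcNYinVoSggPsaXh5QiKDBFtxMREgLf/images/0000000000000000000000000000000000000000000000000000000000000001.png';

console.log(buildUrlMatcher('00000001').test(url));  // true
console.log(buildUrlMatcher('0000.0001').test(url)); // false: the dot is matched literally
```

Without the escaping step, the dot in the second suffix would act as a wildcard and the URL would match.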

Related

RegEx for detecting a string and a path in one go

Here is an example of the kind of regex I need.
I have many of these lines in a file
build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2
I need to detect the string between CXX_COMPILER__ and _Debug, which here is testfoo2.
At the same time, I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp, which comes always after the first match.
I could not figure out a regex for this. So far I have .*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+ and I am using it in typescript like so:
const fileAndTargetRegExp = new RegExp('.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+', 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
//do something
}
But I get no matches. Is there an easy way to do this?
Will it always have the || <stuff here> at the end? If so, this regex based on the one you provided should work:
/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/g
As the regex101 breakdown shows, the first capturing group should contain the string between CXX_COMPILER__ and _Debug, while the second should contain the path, using the space and pipes to detect where the latter ends.
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const matches = line.match(/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/).slice(1); //slice(1) just to not include the first complete match returned by match!
for (let match of matches) {
console.log(match);
}
If the pipes won't always be there, then this version should work instead (regex101):
.*CXX_COMPILER__(\w+)_.+?((?:\/(?:\w|\.|-)+)+).*
But it requires you to add all of the valid path characters individually every time you realize a new one might be there, and you'll need to make sure the paths don't have spaces because adding space to the regex would make it detect the stuff after the path too.
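For a whole file with many such lines, named capture groups together with matchAll may read more easily. This is only a sketch: it assumes every configuration suffix is _Debug, that paths contain no spaces, and the two sample lines (with shortened, made-up paths) stand in for the real file contents.

```javascript
// Made-up sample input with two build lines (paths shortened for illustration).
const fileContents = [
  'build a/testfoo1.cpp.o: CXX_COMPILER__testfoo1_Debug /home/user/project/test/testfoo1.cpp || cmake_object_order_depends_target_testfoo1',
  'build a/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/user/project/test/testfoo2.cpp || cmake_object_order_depends_target_testfoo2',
].join('\n');

// "target" is the text between CXX_COMPILER__ and _Debug;
// "path" is the next run of non-whitespace characters.
const lineRegExp = /CXX_COMPILER__(?<target>\w+)_Debug\s+(?<path>\S+)/g;

const results = [...fileContents.matchAll(lineRegExp)].map((m) => ({
  target: m.groups.target,
  path: m.groups.path,
}));

console.log(results);
```

Named groups and matchAll need a reasonably recent JavaScript engine (ES2018 and ES2020 respectively).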
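Looks good, but you need delimiters. Wrap your regex in "/" instead of quotation marks: inside a quoted string, escapes such as \w lose their backslash before the RegExp constructor ever sees them.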
let fileContents = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const fileAndTargetRegExp = new RegExp(/.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+/, 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
console.log(match);
}
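The difference between the quoted and the delimited form can be checked directly by looking at the source property of each regex. A small sketch, using a shortened pattern and sample string chosen for illustration:

```javascript
// In a plain string literal, '\w' is not a recognised escape, so the backslash is dropped.
const fromString = new RegExp('CXX_COMPILER__(\w+)_Debug');
// Doubling the backslash keeps it in the pattern.
const fromEscapedString = new RegExp('CXX_COMPILER__(\\w+)_Debug');
// A regex literal needs no doubling.
const fromLiteral = /CXX_COMPILER__(\w+)_Debug/;

const sample = 'CXX_COMPILER__testfoo2_Debug';

console.log(fromString.source);                 // CXX_COMPILER__(w+)_Debug
console.log(fromString.test(sample));           // false
console.log(fromEscapedString.exec(sample)[1]); // testfoo2
console.log(fromLiteral.exec(sample)[1]);       // testfoo2
```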
Here's my way of doing it with replace:
I need to detect the string between CXX_COMPILER__ and _Debug, which is here testfoo2.
Replace the whole string with just the first captured group, $1, which holds the text between CXX_COMPILER__ and _Debug:
/.*CXX_COMPILER__(\w+)_Debug.*/
(\w+) <-- testfoo2
I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp
The same again, except this time keep only the second captured group, which is whatever comes after the first captured group, up to the pipes:
/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)(?=\\|\|).*/
(.*?) <-- /home/.../testfoo2.cpp
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2'
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug.*/gm,'$1'))
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)(?=\\|\|).*/gm,'$2'))

regex get part of the link

https://www.example.com/uk/This-Part-I-Need-To-Get/F1ST2/sometext/
need to get "This-Part-I-Need-To-Get", with "-" symbols and capital letters at the wordstart.
All I managed to do is "/([A-Z-])\w+/g", which returns
"This" "-Part" "-I" "-Need" "-To" "-Get" "F1ST2", but I don't need "F1ST2".
How should I do it?
It might depend on URL format, but at this point:
var url = 'https://www.example.com/uk/This-Part-I-Need-To-Get/F1ST2/sometext/';
console.log(url.split('/')[4])
Try this regex
/([A-Z][a-z]|-[A-Z]|-[A-Z][a-z]-|-[A-Z]-)\w+/g
Here is a SNIPPET
var url = 'https://www.example.com/uk/This-Part-I-Need-To-Get/F1ST2/sometext/';
console.log(url.match(/([A-Z][a-z]|-[A-Z]|-[A-Z][a-z]-|-[A-Z]-)\w+/g).join(''))
As @MichałSałaciński said, you should consider using the split function.
BTW, if you want to use regular expressions, then this one will work as long as the URL format does not change: [^\/]+(?=(?:\/\w+){2}\/)
Demo
var re = /[^\/]+(?=(?:\/\w+){2}\/)/
var url = "https://www.example.com/uk/This-Part-I-Need-To-Get/F1ST2/sometext/"
if(re.test(url)) {
// URL match regex pattern, we can safely get full match
var value = re.exec(url)[0];
console.log(value);
}
Explanation
[^\/]+ Any character but a slash, one or more times
(?=...) Followed by
(?:\/\w+){2}\/ a slash and one or more word characters (2 times), then a slash
Solution 2
This one also works using captured group 1: :\/\/[^\/]+\/[^\/]+\/([^\/]+)
Demo
var re = /:\/\/[^\/]+\/[^\/]+\/([^\/]+)/;
var url = "https://www.example.com/uk/This-Part-I-Need-To-Get/F1ST2/sometext/";
if(re.test(url)) {
// URL match regex pattern, we can safely get group 1 value
var value = re.exec(url)[1];
console.log(value );
}
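As an alternative to both split on the raw string and a regex, the built-in URL class can parse the address first, so the protocol and host never get in the way. A minimal sketch, assuming the wanted part is always the second path segment:

```javascript
const parsed = new URL('https://www.example.com/uk/This-Part-I-Need-To-Get/F1ST2/sometext/');

// pathname is "/uk/This-Part-I-Need-To-Get/F1ST2/sometext/";
// filter(Boolean) drops the empty strings produced by the leading and trailing slash.
const segments = parsed.pathname.split('/').filter(Boolean);

console.log(segments[1]); // This-Part-I-Need-To-Get
```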

How would I write a Regular Expression to capture the value between Last Slash and Query String?

Problem:
Extract image file name from CDN address similar to the following:
https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba
Two-stage Solution:
I am using two regular expressions to retrieve the file name:
var postLastSlashRegEx = /[^\/]+$/,
preQueryRegEx = /^([^?]+)/;
var fileFromURL = urlString.match(postLastSlashRegEx)[0].match(preQueryRegEx)[0];
// fileFromURL = "photo%2FB%_2.jpeg"
Question:
Is there a way I can combine both regular expressions?
I've tried using capture groups, but haven't been able to produce a working solution.
From my comment
You can use a lookahead to find the "?" and use [^/] to match any non-slash characters.
/[^/]+(?=\?)/
To remove the dependency on the URL needing a "?", you can make the lookahead match either a question mark or the end of the string (represented by $), but make sure the quantifier is non-greedy; a greedy [^/]+ would run through the query string up to the end.
/[^/]+?(?=\?|$)/
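A quick check of the second pattern against the URL with and without its query string. The wrapper function fileNameFromUrl is a hypothetical name used only for this sketch:

```javascript
const fileNameRegExp = /[^/]+?(?=\?|$)/;

// Hypothetical wrapper: returns the matched file name, or null when nothing matches.
function fileNameFromUrl(urlString) {
  const m = urlString.match(fileNameRegExp);
  return m ? m[0] : null;
}

const withQuery = 'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba';
const withoutQuery = 'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg';

console.log(fileNameFromUrl(withQuery));    // photo%2FB%_2.jpeg
console.log(fileNameFromUrl(withoutQuery)); // photo%2FB%_2.jpeg
```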
You don't have to use regex, you can just use split and substr.
var str = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba".split("?")[0];
var fileName = str.substr(str.lastIndexOf('/')+1);
but if regex is important to you, then:
str.match(/[^?]*\/([^?]+)/)[1]
The code using the substring method would look like the following -
var fileFromURL = urlString.substring(urlString.lastIndexOf('/') + 1, urlString.lastIndexOf('?'))

Removing a query string using regex in JavaScript

I have a requirement to remove a query parameter coming with a REST API call. Below are the sample URLs which need to be considered. In each of these URLs, we need to remove the 'key' parameter and its value.
/test/v1?key=keyval&param1=value1&param2=value2
/test/v1?key=keyval
/test/v1?param1=value1&key=keyval
/test/v1?param1=value1&key=keyval&param2=value2
After removing the key parameter, the final URLs should be as follows.
/test/v1?param1=value1&param2=value2
/test/v1?
/test/v1?param1=value1
/test/v1?param1=value1&param2=value2
We used below regex expression to match and replace this query string in php. (https://regex101.com/r/pK0dX3/1)
(?<=[?&;])key=.*?($|[&;])
We couldn't use the same regex in JavaScript. When we use it in JavaScript, it gives syntax errors. Can you please help us figure out the issue with this regex? How can we change it to match and remove the query parameter as described above?
Lookbehind was not supported in JavaScript before ES2018, so your regex fails with a syntax error in older engines.
In Javascript you can use this:
repl = input.replace(/(\?)key=[^&]*(?:&|$)|&key=[^&]*/gmi, '$1');
RegEx Demo
Regex is working on 2 paths using regex alternation:
If this query parameter is right after ? then we grab till & after parameter and place ? back in replacement.
If this query parameter is after & then &key=value is replaced by an empty string.
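Running that replacement over the four sample URLs from the question gives a quick sanity check. The function name removeKey is just a label for this sketch, and the m flag is dropped because each input is a single line:

```javascript
// Same pattern as above, wrapped in a function for reuse.
const removeKey = (input) =>
  input.replace(/(\?)key=[^&]*(?:&|$)|&key=[^&]*/gi, '$1');

const samples = [
  '/test/v1?key=keyval&param1=value1&param2=value2',
  '/test/v1?key=keyval',
  '/test/v1?param1=value1&key=keyval',
  '/test/v1?param1=value1&key=keyval&param2=value2',
];

for (const sample of samples) {
  console.log(removeKey(sample));
}
// /test/v1?param1=value1&param2=value2
// /test/v1?
// /test/v1?param1=value1
// /test/v1?param1=value1&param2=value2
```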
The regex works in PHP but not in JavaScript because JavaScript engines that predate ES2018 do not support lookbehind.
The easiest fix here would be to replace the lookbehind (?<=[?&;]) with the equivalent characters in a capturing group ([?&;]) and use a backreference ($1) to insert this bit back into the replacement string.
For example:
var path = '/test/v1?key=keyval&param1=value1&param2=value2';
var regex = /([?&;])key=.*?($|[&;])/;
console.log(path.replace(regex, '$1')); // outputs '/test/v1?param1=value1&param2=value2'
Not convinced regex would be the most reliable way of removing a query parameter, but that's a different story :-)
Just in case you want to do it without a regex, here is a function that will do the trick:
var removeQueryString = function (str) {
    var qm = str.lastIndexOf('?');
    var path = str.substr(0, qm + 1);
    var querystr = str.substr(qm + 1);
    var params = querystr.split('&');
    var keyIndex = -1;
    for (var i = 0; i < params.length; i++) {
        if (params[i].indexOf("key=") === 0) {
            keyIndex = i;
            break;
        }
    }
    if (keyIndex != -1) {
        params.splice(keyIndex, 1);
    }
    var result = path + params.join('&');
    return result;
};
The lookbehind feature isn't available in JavaScript engines that predate ES2018, so to test the character before the key/value, you must match it. To make the pattern work whatever the position in the query part of the URL, you can use an alternation in a non-capturing group and capture the question mark:
url = url.replace(/(?:&|(\?))key=[^&#]*(?:(?!\1).)?/, '$1');
Note: the # is excluded in the character class so that the fragment part of the URL (if any) is not matched as part of the key value.
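Where a regex is not a hard requirement, the built-in URLSearchParams class can do the parsing. The sketch below is only one possible shape: removeQueryParam is a hypothetical name, it assumes the input has no fragment part, it drops the trailing "?" when no parameters remain (unlike the expected output in the question), and URLSearchParams may re-encode some characters when serialising.

```javascript
// Hypothetical helper: remove one query parameter from a "path?query" string.
function removeQueryParam(pathAndQuery, name) {
  const qm = pathAndQuery.indexOf('?');
  if (qm === -1) {
    return pathAndQuery; // no query string at all
  }
  const path = pathAndQuery.slice(0, qm);
  const params = new URLSearchParams(pathAndQuery.slice(qm + 1));
  params.delete(name); // removes every occurrence of the parameter
  const query = params.toString();
  return query ? `${path}?${query}` : path;
}

console.log(removeQueryParam('/test/v1?param1=value1&key=keyval&param2=value2', 'key'));
// /test/v1?param1=value1&param2=value2
console.log(removeQueryParam('/test/v1?key=keyval', 'key'));
// /test/v1
```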

File path validation in javascript

I am trying to validate XML file path in javascript. My REGEX is:
var isValid = /^([a-zA-Z]:)?(\\{2}|\/)?([a-zA-Z0-9\\s_#-^!#$%&+={}\[\]]+(\\{2}|\/)?)+(\.xml+)?$/.test(str);
It returns true even when path is wrong.
These are valid paths
D:/test.xml
D:\\folder\\test.xml
D:/folder/test.xml
D:\\folder/test.xml
D:\\test.xml
At first the obvious errors:
+ is a quantifier that means "one or more".
So (\.xml+) matches .xm followed by one or more l (it would also match .xmlllll). The ? means optional, so (\.xml+)? says the path may end in .xml but does not have to.
The same goes for ([a-zA-Z]:)?, which makes the drive letter optional.
Now the not so obvious errors:
[a-zA-Z0-9\\s_#-^!#$%&+={}\[\]] Here you define the set of allowed characters. You have \\s, and I assume you want to allow whitespace, but this allows \ and s, so you need to change it to \s.
Then there is the part #-^. I assume you want to allow #, - and ^, but - has a special meaning inside [ ]: it defines a range, so you allow every character in the range from # to ^. To allow a literal -, you need to escape it and write #\-^. You also need to take care with ^: directly after the [ it has a special meaning too (it negates the class).
your Regex should contain the following parts:
^[a-z]: start with (^) drive letter
((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+ followed by one or more path parts that start with either \ or / and having a path name containing one or more of your defined letters (a-z0-9\s_#\-^!#$%&+={}\[\])
\.xml$ ends with ($) the .xml
therefore your final regex should look like this
/^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(str)
(under the assumption you do a case-insensitive match using the i flag)
EDIT:
var path1 = "D:/test.xml"; // D:/test.xml
var path2 = "D:\\folder\\test.xml"; // D:\folder\test.xml
var path3 = "D:/folder/test.xml"; // D:/folder/test.xml
var path4 = "D:\\folder/test.xml"; // D:\folder/test.xml
var path5 = "D:\\test.xml"; // D:\test.xml
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path1) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path2) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path3) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path4) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path5) );
UPDATE:
Take care with / and \: whether you need to escape them depends on how you create the regex. In a string passed to new RegExp(' ... the regex ... ',"i") or new RegExp(" ... the regex ... ","i"), every backslash must be doubled; in a literal / ... the regex ... /i, the / must be escaped.
For further information about regular expressions you should take a look at e.g. www.regular-expressions.info
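The range pitfall with #-^ is easy to reproduce in isolation. In this small sketch the two patterns differ only in whether the hyphen is escaped:

```javascript
// Unescaped: "#-^" is a range from '#' (0x23) to '^' (0x5E),
// which happens to include digits and uppercase letters.
const withRange = /^[#-^]+$/;
// Escaped: only the three characters '#', '-' and '^' are allowed.
const withEscapedHyphen = /^[#\-^]+$/;

console.log(withRange.test('ABC123'));         // true
console.log(withEscapedHyphen.test('ABC123')); // false
console.log(withEscapedHyphen.test('#-^'));    // true
```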
This could work out for you
var str = 'D:/test.xml';
var str2 = 'D:\\folder\\test.xml';
var str3 = 'D:/folder/test.xml';
var str4 = 'D:\\folder/test.xml';
var str5 = 'D:\\test\\test\\test\\test.xml';
var regex = new RegExp('^[a-z]:((\\\\|\/)[a-zA-Z0-9_ \\-]+)+\\.xml$', 'i'); // \\. keeps the dot literal; a single \. in a string would become . (any character)
regex.test(str5);
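The reason for writing \\\\ in the RegExp string to match a single \ in the path is that JavaScript string literals also use \ as an escape character (\n for a new line, \t for a tab, and so on). The string '\\\\' therefore reaches the regex engine as \\, which matches one literal backslash. Any other regex escape written inside a string needs the same doubling. It also allows you to have different rules for file name and folder name.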
Update
[a-zA-Z0-9_\-]+ this section of regexp actually match file/folder name. So to allow more characters in file/folder name, just add them to this class, e.g., to allow a * in file/folder name make it [a-zA-Z0-9_\-\*]+
Update 2
To add to the answer, the following is a RegExp that adds another check to the validation, i.e., it rejects paths that mix / and \\.
var str6 = 'D:/This is folder/test # file.xml';
var str7 = 'D:/This is invalid\\path.xml'
var regex2 = new RegExp('^[a-z]:(\/|\\\\)([a-zA-Z0-9_ \-]+\\1)*[a-zA-Z0-9_ #\-]+\.xml?', 'gi');
regex2 will match all paths but str7
Update
My apologies for mistyping a ? instead of $ in regex2. Below is the corrected and intended version
var regex2 = new RegExp('^[a-z]:(\/|\\\\)([a-zA-Z0-9_ \\-]+\\1)*[a-zA-Z0-9_ #\\-]+\\.xml$', 'i'); // \\. keeps the dot before xml literal
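One way to avoid doubling backslashes when the pattern has to live in a string is String.raw, which leaves backslashes untouched. The following is a sketch rather than a drop-in replacement: the pattern is a simplified variant of the ones above (it does not check for mixed separators), and the name pathRegExp is made up for the example.

```javascript
// String.raw keeps every backslash as typed, so the pattern reads like a regex literal.
const pathRegExp = new RegExp(
  String.raw`^[a-z]:(?:[\\/][a-zA-Z0-9_ \-]+)+\.xml$`,
  'i'
);

console.log(pathRegExp.test('D:/test.xml'));                   // true
console.log(pathRegExp.test('D:\\test\\test\\test\\test.xml')); // true
console.log(pathRegExp.test('D:\\testAxml'));                   // false: the dot must be a real dot
console.log(pathRegExp.test('/folder/test.xml'));               // false: drive letter missing
```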
Tested using Scratchpad.
var regex = /^[a-z]:((\/|(\\?))[\w .]+)+\.xml$/i;
Prints true in Web Console: (Ctrl+Shift+K on Firefox)
console.log(regex.test("D:/test.xml"));
console.log(regex.test("D:\\folder\\test.xml"));
console.log(regex.test("D:/folder/test.xml"));
console.log(regex.test("D:\\folder/test.xml"));
console.log(regex.test("D:\\test.xml"));
console.log(regex.test("D:\\te st_1.3.xml")); // spaces, dots allowed
Or, using Alert boxes:
alert(regex.test("D:/test.xml"));
alert(regex.test("D:\\folder\\test.xml"));
alert(regex.test("D:/folder/test.xml"));
alert(regex.test("D:\\folder/test.xml"));
alert(regex.test("D:\\test.xml"));
alert(regex.test("D:\\te st_1.3.xml"));
Invalid file paths:
alert(regex.test("AD:/test.xml")); // invalid drive letter
alert(regex.test("D:\\\folder\\test.xml")); // "\\\f" is a backslash followed by a form feed (\f), which [\w .] rejects
alert(regex.test("/folder/test.xml")); // drive letter missing
alert(regex.test("D:\\folder/test.xmlfile")); // invalid extension
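One thing worth checking with this pattern: because the backslash in (\\?) is optional, the separator can be empty, so a path with no separator after the drive letter is accepted as well. The sketch below compares it with a stricter variant; the name stricter and its pattern are suggestions for illustration, not part of the answer above.

```javascript
const original = /^[a-z]:((\/|(\\?))[\w .]+)+\.xml$/i;
// Assumed stricter variant: every path part must start with exactly one / or \.
const stricter = /^[a-z]:(?:[\\\/][\w .]+)+\.xml$/i;

console.log(original.test('D:test.xml'));           // true: the separator is optional
console.log(stricter.test('D:test.xml'));           // false
console.log(stricter.test('D:/test.xml'));          // true
console.log(stricter.test('D:\\folder/test.xml'));  // true
console.log(stricter.test('D:\\te st_1.3.xml'));    // true
```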
