Simple Nodejs Regex: Extract text from between two strings


I'm trying to extract the Vine ID from the following URL:
https://vine.co/v/Mipm1LMKVqJ/embed
I'm using this regex:
/v/(.*)/
and testing it here: http://regexpal.com/
...but it's matching the V and closing "/". How can I just get "Mipm1LMKVqJ", and what would be the cleanest way to do this in Node?

You need to reference the first capture group in order to print only the captured text rather than the whole match.
var re = new RegExp('/v/(.*)/');
var r = 'https://vine.co/v/Mipm1LMKVqJ/embed'.match(re);
if (r)
    console.log(r[1]); //=> "Mipm1LMKVqJ"
Note: If the URL format changes often, I recommend using .*? instead of .* to prevent greediness in your match.
Given a URL like the one above, though, you could also consider splitting on "/".
var r = 'https://vine.co/v/Mipm1LMKVqJ/embed'.split('/')[4]
console.log(r); //=> "Mipm1LMKVqJ"
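As a minimal sketch of the non-greedy advice, the pattern below captures everything up to the next slash with a negated character class instead of relying on .* at all. The helper name extractVineId is hypothetical, and the code assumes the ID itself never contains a slash.

```javascript
// Hypothetical helper: returns the path segment after "/v/", or null if absent.
function extractVineId(url) {
  // [^/]+ stops at the next slash, so extra trailing segments cannot be
  // swallowed the way a greedy .* would swallow them.
  const match = /\/v\/([^/]+)/.exec(url);
  return match ? match[1] : null;
}

console.log(extractVineId('https://vine.co/v/Mipm1LMKVqJ/embed'));        //=> "Mipm1LMKVqJ"
console.log(extractVineId('https://vine.co/v/Mipm1LMKVqJ/embed/simple')); //=> "Mipm1LMKVqJ"
console.log(extractVineId('https://vine.co/about'));                      //=> null
```

With the original greedy /v/(.*)/, the second URL would yield "Mipm1LMKVqJ/embed", which is the case the note warns about.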

Related

How would I write a Regular Expression to capture the value between Last Slash and Query String?

Problem:
Extract image file name from CDN address similar to the following:
https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba
Two-stage Solution:
I am using two regular expressions to retrieve the file name:
var postLastSlashRegEx = /[^\/]+$/,
preQueryRegEx = /^([^?]+)/;
var fileFromURL = urlString.match(postLastSlashRegEx)[0].match(preQueryRegEx)[0];
// fileFromURL = "photo%2FB%_2.jpeg"
Question:
Is there a way I can combine both regular expressions?
I've tried using capture groups, but haven't been able to produce a working solution.

From my comment
You can use a lookahead to find the "?" and use [^/] to match any non-slash characters.
/[^/]+(?=\?)/
To remove the dependency on the URL containing a "?", you can make the lookahead match either a question mark or the end of the string (represented by $), but make sure the first quantifier is non-greedy.
/[^/]+?(?=\?|$)/
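A quick check of the second pattern against a URL with and without a query string might look like this. The function name fileNameFromUrl is an assumption made for illustration; the pattern is the one given above.

```javascript
// Hypothetical wrapper around the lookahead pattern from the answer.
function fileNameFromUrl(urlString) {
  // Lazily match non-slash characters until a "?" or the end of the string.
  const match = urlString.match(/[^/]+?(?=\?|$)/);
  return match ? match[0] : null;
}

const withQuery = 'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba';
const withoutQuery = 'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg';

console.log(fileNameFromUrl(withQuery));    //=> "photo%2FB%_2.jpeg"
console.log(fileNameFromUrl(withoutQuery)); //=> "photo%2FB%_2.jpeg"
```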

You don't have to use regex; you can just use split and substr.
var str = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba".split("?")[0];
var fileName = str.substr(str.lastIndexOf('/')+1);
but if regex is important to you, then:
str.match(/[^?]*\/([^?]+)/)[1]

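The code using the substring method would look like the following. Note that it assumes the URL contains a "?"; without one, lastIndexOf('?') returns -1 and the result is wrong.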
var fileFromURL = urlString.substring(urlString.lastIndexOf('/') + 1, urlString.lastIndexOf('?'))
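If the query string may be absent, a slightly more defensive variant of the same idea could cut the query off first and only then look for the last slash. This is a sketch, not the answerer's code; the function name fileNameFromUrl is hypothetical.

```javascript
// Hypothetical helper: last path segment of a URL, ignoring any query string.
function fileNameFromUrl(urlString) {
  const queryStart = urlString.indexOf('?');
  // Drop the query string, if any, before searching for the last slash,
  // so a "/" inside the query cannot be mistaken for a path separator.
  const path = queryStart === -1 ? urlString : urlString.slice(0, queryStart);
  return path.slice(path.lastIndexOf('/') + 1);
}

console.log(fileNameFromUrl('https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba'));
//=> "photo%2FB%_2.jpeg"
console.log(fileNameFromUrl('https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg'));
//=> "photo%2FB%_2.jpeg"
```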

Get everything except match in javascript regular expression

I have the following regex to get the first part after a url:
^http[s]?:\/\/.*?\/([a-zA-Z-_.%]+).*$
It matches test in the below urls:
foo.com
http://foo.com
http://foo.com/test
http://foo.com/test/
http://foo.com/test?bar
What I'm now trying to do is recreate the same url, but replace test with a different value. Either by taking the parts before and after the match or reversing the result.
I'm sure there's a regexy way of doing this, but I'm unable to find out how to do so.

You can use a capturing group for part before /test and use it as back-reference in replacement:
var re = /^(https?:\/\/[^\/]+\/)[^?\/]+/gmi;
var subst = '$1foobar';
var result = str.replace(re, subst);
[^?\/]+ matches the text up to the next / or ? after the domain name in the URL. Like your original regex, it also assumes that URLs start with http:// or https://.
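To see the capture-group replacement on the URLs from the question, here is a small sketch for a single URL at a time (so without the g and m flags). The function name replaceFirstSegment is hypothetical, and a replacer function is used instead of a '$1...' string so that a "$" in the new value is not interpreted as a replacement pattern.

```javascript
// Hypothetical helper: swap the first path segment of an http(s) URL.
function replaceFirstSegment(url, replacement) {
  const re = /^(https?:\/\/[^\/]+\/)[^?\/]+/i;
  // prefix is the captured scheme and host, including the trailing slash.
  return url.replace(re, function (match, prefix) {
    return prefix + replacement;
  });
}

console.log(replaceFirstSegment('http://foo.com/test', 'foobar'));     //=> "http://foo.com/foobar"
console.log(replaceFirstSegment('http://foo.com/test/', 'foobar'));    //=> "http://foo.com/foobar/"
console.log(replaceFirstSegment('http://foo.com/test?bar', 'foobar')); //=> "http://foo.com/foobar?bar"
console.log(replaceFirstSegment('http://foo.com', 'foobar'));          //=> "http://foo.com" (no path segment, unchanged)
```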

What RegEx would clean up this set of inputs?

I'm trying to figure out a RegEx that would match the following:
.../string-with-no-spaces -> string-with-no-spaces
or
string-with-no-spaces:... -> string-with-no-spaces
or
.../string-with-no-spaces:... -> string-with-no-spaces
where ... can be anything in these example strings:
example.com:8080/string-with-no-spaces:latest
string-with-no-spaces:latest
example.com:8080/string-with-no-spaces
string-with-no-spaces
and a bonus would be
http://example.com:8080/string-with-no-spaces:latest
and all would match string-with-no-spaces.
Is it possible for a single RegEx to cover all those cases?
So far I've gotten as far as /\/.+(?=:)/, but that not only includes the slash, it also only works for case 3. Any ideas?
Edit: Also I should mention that I'm using Node.js, so ideally the solution should pass all of these: https://jsfiddle.net/ys0znLef/

How about:
(?:.*/)?([^/:\s]+)(?::.*|$)
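As a quick sanity check, the pattern above can be run against the sample inputs from the question. The variable names here are illustrative only; the wanted value is read from capture group 1.

```javascript
const re = /(?:.*\/)?([^/:\s]+)(?::.*|$)/;

const inputs = [
  'example.com:8080/string-with-no-spaces:latest',
  'string-with-no-spaces:latest',
  'example.com:8080/string-with-no-spaces',
  'string-with-no-spaces',
  'http://example.com:8080/string-with-no-spaces:latest'
];

// The optional greedy prefix (?:.*\/)? consumes everything up to the last
// slash, so group 1 starts right after it.
const results = inputs.map(function (s) {
  const m = s.match(re);
  return m ? m[1] : null;
});

console.log(results); // every entry is "string-with-no-spaces"
```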

Consider the following solution, which uses a specific regex pattern and the String.prototype.match function:
var re = /(?:[/]|^)([^/:.]+?)(?:[:][^/]|$)/,
    // (?:[/]|^) - non-capturing group: the wanted string must be preceded by '/' or sit at the start of the text
    // (?:[:][^/]|$) - non-capturing group: the wanted string must be followed by ':' plus a non-slash character, or sit at the end of the text
    searchString = function(str){
        var result = str.match(re);
        // match returns null when nothing fits, so guard before indexing
        return result ? result[1] : null;
    };
console.log(searchString("example.com:8080/string-with-no-spaces"));
console.log(searchString("string-with-no-spaces:latest"));
console.log(searchString("string-with-no-spaces"));
console.log(searchString("http://example.com:8080/string-with-no-spaces:latest"));
The output for all the cases above will be string-with-no-spaces

Here's the expression I've got... just trying to tweak to use the slash but not include it.
Updated result works in JS
\S([a-zA-Z0-9.:/\-]+)\S
//works on regexr, regex storm, & regex101 - tested with a local html file to confirm JS matches strings
var re = /\S([a-zA-Z0-9.:/\-]+)\S/;

How to split a string by a character not directly preceded by a character of the same type?

Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but Javascript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!
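For readers on current engines: look-behind assertions were added to JavaScript in ES2018, so the pattern from the question can be used directly there. The sketch below assumes a runtime with look-behind support (for example a recent Node.js release); on older engines the regex literal is a syntax error.

```javascript
// Split on a "." only when it is not directly preceded by another ".".
const tokens = 'We.need..to...split.asap'.split(/(?<!\.)\./);

console.log(JSON.stringify(tokens));
//=> ["We","need",".to","..split","asap"]
```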

Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It uses the detail that if the split expression includes a capture group, the captured items are included in the returned array. These capture groups are actually the only thing we are interested in; the tokens are all empty strings, which we filter out.
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)
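To make the capture-group behaviour of split visible, this sketch prints the raw array before filtering. The variable names raw and items are illustrative; the regex is the first one from this answer.

```javascript
const text = 'We.need.to...split.asap';
const re = /(\.*[^.]+)\./;

// Each match contributes an empty "real" token plus the captured group;
// only the trailing "asap" is an ordinary split token.
const raw = text.split(re);
console.log(JSON.stringify(raw));
//=> ["","We","","need","","to","","..split","asap"]

const items = raw.filter(function (val) { return val.length > 0; });
console.log(JSON.stringify(items));
//=> ["We","need","to","..split","asap"]
```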

You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
    output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it'd work in one line, as it did on https://regex101.com/r/cO1dP3/1, but had to be expanded in the code above because the /g option by default prevents capturing groups from returning with .match (i.e. the correct data was in the capturing groups, but we couldn't immediately access them without doing the above).
See: JavaScript Regex Global Match Groups
An alternative solution with the original one liner (plus one line) is:
document.write(JSON.stringify(
    "We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
        .map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
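On newer runtimes there is a third option: String.prototype.matchAll (ES2020) exposes the capture groups of every global match, which avoids the explicit exec loop. This is a sketch that assumes matchAll is available in your environment; the regex is the same one used above.

```javascript
const subject = 'We.need.to....split.asap';

// matchAll yields one match object per global match; m[1] is capture group 1.
const output = Array.from(subject.matchAll(/\.?(\.*[^.]+)/g), function (m) {
  return m[1];
});

console.log(JSON.stringify(output));
//=> ["We","need","to","...split","asap"]
```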

Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in the revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
    // if `val[0]` does not begin with "." split "."
    // else split "." if not followed by "."
    return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));

RegExp - If first part of search string is found then replace with the full search string value

Is there a RegExp to find and replace a value based on the criteria, "if first part of search string is in the target string then replace the part that matches with the search string."
This is a special search and replace because the replacement is also used as the search string.
For example, I have this URL:
http://www.domain.com/path/something/more/something/
Search for any part of the following and replace with the whole:
/path/user/
Since, "/path/" is in both the replacement string and the target string the results would be:
http://www.domain.com/path/user/something/more/something/
NOTE: The search / replacement value can be anything.
I don't know what the replacement and search string is at the time I make a replacement so I can't use something that hard codes the search string. For example, this won't work because the term is hard coded:
s.replace(/(\/path\/)/, "$1value/");
Another example:
Here is the sentence, "Thank you Susan for your order."
Here is the search and replacement, "Susan Summers"
Here is the desired sentence, "Thank you Susan Summers for your order."
Use Case:
Lets say you are given 1 million text documents that are letters to customers but when they created the documents they used the customers first name only when they were supposed to use the full name. Now it's your job to find and replace every occurrence of their first name with their full name. You only have their full name to work with not first name.
Just realized this may not work as a RegEx and might require code.

You can use:
s = 'http://www.domain.com/path/something/more/something/';
r = s.replace(/(\/path\/)/, "$1user/");
//=> "http://www.domain.com/path/user/something/more/something/"

You don't need to use regular expression for this case:
var url = 'http://www.domain.com/path/something/more/something/';
url.replace('/path/', '/path/user/');
// => "http://www.domain.com/path/user/something/more/something/"

I'm not quite sure I understand the problem correctly. The following replaces any part of /path/user/ (part 1: 'path', part 2: 'user') with the whole /path/user/:
var url1 = "http://www.domain.com/path/something/more/something/";
var url2 = "http://www.domain.com/user/something/more/something/";
url1.replace(/\/path\/|\/user\//, '/path/user/');
url2.replace(/\/path\/|\/user\//, '/path/user/');
results in:
http://www.domain.com/path/user/something/more/something/
http://www.domain.com/path/user/something/more/something/
I hope this is what you need, otherwise, please add another example.
EDIT:
Here is the regex in action: http://regex101.com/r/jL6tK6

split + join alternative:
url = url.split('/path/').join('/path/user/');
Although your requirements are not clear, here is a guess that raises a few extra questions:
var sub = '/path/user/';
var parts = sub.match(/[^\/]+/g);
url = url.replace(new RegExp(
    '\\/(' + [parts.join('\\/')].concat(parts).join('|') + ')\\/'
), sub);
The resulting regular expression is as follows:
/\/(path\/user|path|user)\// // "/path/user/" OR "/path/" OR "/user/"
Let's check some URLs, assuming we live in the best of worlds:
'http://domain/' -> 'http://domain/'
'http://path/user/' -> 'http://path/user/'
'http://path/' -> 'http://path/user/'
'http://user/' -> 'http://path/user/'
Now, what do you think about the following ones?
'http://path/user' -> 'http://path/user/user'
'http://user/path/' -> 'http://path/user/path/'
'http://path/user/path/' -> 'http://path/user/path/'
The remaining questions are:
Is this what you are looking for?
What to do when there is no trailing slash?
What to do in the reverse order case?
What to do with recurrent parts?
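One further reading of the question is that only the leading part of the search string appears in the target, and that part should be expanded to the whole string. Under that assumption, a plain-code sketch (no regex) could look for the longest prefix of the search string that occurs in the target. The function name expandPrefix is hypothetical, and skipping prefixes that end in whitespace is a design choice made here so that "Susan " does not swallow the space before the next word. A short accidental prefix match (for example "/path/u" inside "/path/unrelated/") would still be replaced, so real data would probably need stricter rules.

```javascript
// Hypothetical helper: replace the first occurrence of the longest prefix of
// `full` found in `target` with `full` itself.
function expandPrefix(target, full) {
  for (let len = full.length; len >= 1; len--) {
    const prefix = full.slice(0, len);
    // Skip prefixes that end in whitespace (assumption, see lead-in).
    if (/\s$/.test(prefix)) continue;
    const index = target.indexOf(prefix);
    if (index !== -1) {
      return target.slice(0, index) + full + target.slice(index + len);
    }
  }
  return target; // no part of the search string was found
}

console.log(expandPrefix('http://www.domain.com/path/something/more/something/', '/path/user/'));
//=> "http://www.domain.com/path/user/something/more/something/"
console.log(expandPrefix('Thank you Susan for your order.', 'Susan Summers'));
//=> "Thank you Susan Summers for your order."
```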
