Regular expression to match the beginning and the end without the middle - javascript

How can I merge these two regular expressions into one? This is interesting because it would make it possible to match the beginning and the end of a string without touching the content in the middle.
function cleanURL(url) {
  url = url.replace(/^(?:https?:\/\/)?(?:www\.|ww\d\.|w\d\d\.)?/, '');
  url = url.replace(/(\/.*)/, '');
  return url;
}
console.log(cleanURL('https://hello-world.example.tld/yello-blue/green'))
Result: hello-world.example.tld

Rather than trying to use a regex to tease out parts of URLs, just use the URL class:
new URL("https://hello-world.example.tld/yello-blue/green").hostname
That doesn't mean it can't be done with a regex; you just need to look for whatever is between // and /.
All this said, I'm not sure what your actual intent is, because there's a bunch of shenanigans with www etc. The URL approach won't filter those prefixes; it'll just return the hostname.
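For the original question of merging the two replacements, one possible sketch is a single pattern that captures the host and discards everything around it. A second option is the URL class followed by one prefix strip. The function names below are made up for illustration, and the www-like prefix list is taken from the question.

```javascript
// Single regex: optional scheme, optional www-like prefix, captured host, discarded rest.
function cleanURLRegex(url) {
  return url.replace(/^(?:https?:\/\/)?(?:www\.|ww\d\.|w\d\d\.)?([^\/]*).*$/, '$1');
}

// URL class: parse first, then strip the same www-like prefix from the hostname.
// Assumes the input is an absolute URL; new URL() throws a TypeError otherwise.
function cleanURLParsed(url) {
  return new URL(url).hostname.replace(/^(?:www\.|ww\d\.|w\d\d\.)/, '');
}

console.log(cleanURLRegex('https://hello-world.example.tld/yello-blue/green'));
// → hello-world.example.tld
console.log(cleanURLParsed('https://www.example.tld/yello-blue/green'));
// → example.tld
```

The regex version also accepts input without a scheme, such as www.example.tld/a/b, which the URL class would reject.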

Related

How to remove URL from a string completely in Javascript?

I have a string that may contain several url links (http or https). I need a script that would remove all those URLs from the string completely and return that same string without them.
I tried so far:
var url = "and I said http://fdsadfs.com/dasfsdadf/afsdasf.html";
var protomatch = /(https?|ftp):\/\//; // NB: not '.*'
var b = url.replace(protomatch, '');
console.log(b);
but this only removes the http part and keeps the link.
How do I write a regex that removes everything that follows http and also handles several links in the string?
Thank you so much!
You can use this regex:
var b = url.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '');
//=> and I said
This regex matches and removes any URL that starts with http://, https:// or ftp:// and continues up to the next whitespace character other than a newline, or to the end of input. Because the class [\n\S] also includes the newline, a match can continue across line breaks.
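As a rough sketch of how this could be wrapped up for reuse: the helper name stripUrls is hypothetical, and it uses \S+ rather than [\n\S]+, so each URL stops at any whitespace, including a newline. Collapsing the leftover double spaces is an extra assumption that the question did not ask for.

```javascript
// Hypothetical helper: strips every http, https and ftp URL from a string.
function stripUrls(text) {
  return text
    .replace(/(?:https?|ftp):\/\/\S+/g, '') // g flag: remove all matches, not just the first
    .replace(/ {2,}/g, ' ')                  // assumption: collapse the gaps left behind
    .trim();
}

console.log(stripUrls('and I said http://fdsadfs.com/dasfsdadf/afsdasf.html'));
// → "and I said"
console.log(stripUrls('see http://a.example/x and https://b.example/y now'));
// → "see and now"
```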
Did you search for a URL parser regex? This question has a few comprehensive answers: Getting parts of a URL (Regex)
That said, if you want something much simpler (and maybe not as perfect), you should remember to capture the entire url string and not just the protocol.
Something like
/(https?|ftp):\/\/[.a-zA-Z0-9\/\-]+/
should work better. Notice that the added half parses the rest of the URL after the protocol.

Javascript Regular Expression for non-image url

In JavaScript, I want to extract a non-image url from a string e.g.
http://example.com
http://example.com/a.png
http://www.example.ccom/acd.php
http://www.example.com/b.jpg etc.
I would like to extract 1st and 3rd (non-image) URLs and ignore 2nd and 4th (image) URLs.
I tried the following, which did not work:
(https?:)?\/\/?[^\'"<>]+?^(\.(jpe?g|gif|png))
This is a modification of the following image URL regular expression (RE), to which I added ^() (for "not") to get the snippet above:
(https?:)?//?[^\'"<>]+?\.(jpg|jpeg|gif|png)
Note: the REs in the above examples are case-sensitive; any clue on how to make them case-insensitive would be welcome.
You can use a negative lookahead like these examples. It will exclude anything containing the given string.
Assuming your URLs are newline-delimited like in your example, something like this should work:
(?!.*(jpg|jpeg|gif|png).*).*
EDIT: it looks like my example doesn't work; hopefully it at least points you in the right direction.
Another approach is to first remove the image URLs:
var tmp = text.replace(/https?:\/\/[\S]+\.(png|jpeg|jpg|gif)/gi, '');
and then matching:
var m = tmp.match(/https?:\/\/[\S]+/gi);
console.log(m);
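The negative lookahead idea can also be made to work in a single pass. The following is a sketch rather than a definitive pattern: it assumes each URL ends at whitespace or at the end of the input, and that an image URL is one whose last characters are one of the listed extensions (so a URL like a.png?x=1 would not be treated as an image). The i flag makes the match case-insensitive.

```javascript
const text = [
  'http://example.com',
  'http://example.com/a.png',
  'http://www.example.ccom/acd.php',
  'http://www.example.com/b.jpg etc.'
].join('\n');

// After the scheme, refuse to match if the whitespace-delimited run ends in an image extension.
const nonImageUrl = /https?:\/\/(?!\S*\.(?:png|jpe?g|gif)(?=\s|$))\S+/gi;

// match() returns null when nothing matches, so fall back to an empty array.
const urls = text.match(nonImageUrl) || [];
console.log(urls);
// → [ 'http://example.com', 'http://www.example.ccom/acd.php' ]
```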

Bookmarklet - Verify URL format and extract substring

I'm trying to build a bookmarklet that performs a service client-side, but I'm not really fluent in JavaScript. In my code below I want to take the current page URL and first verify that it follows a specific format after the domain, which is...
/photos/[any alphanumeric string]/[any numeric string]
After that third "/" there should always be the numeric string that I need to extract into a var. Also, I can't just start from the end and work backwards, because sometimes there is another "/" after the numeric string, followed by other things I don't need.
Is indexOf() the right function to verify if the url is the specific format and how would I write that expression? I've tried several things related to indexOf() and Regex(), but had no success. I seem to always end up with an unexpected character or it just doesn't work.
And of course the second part of my question is once I know the url is the right format, how do I extract the numeric string into a variable?
Thank you for any help!
javascript:(function(){
// Retrieve the url of the current page
var photoUrl = window.location.pathname;
if(photoUrl.indexOf(/photos/[any alphanumeric string]/[any numeric string]) == true) {
// Extract the numeric substring into a var and do something with it
} else {
// Do something else
}
})();
var id = window.location.pathname.match(/\/photos\/(\w+)\/(\d+)/i);
if (id) alert(id[1]); // use 1 or 2 depending on what you want
else alert('url did not fit expected format');
(EDIT: changed first \d* to \w+ and second \d* to \d+ and dig to id.)
To test strings for patterns and get their parts, you can use regular expressions. The expression for your criteria would look like this:
/^\/photos\/\w+\/(\d+)\/?$/
It matches any string that starts with /photos/, followed by one or more alphanumeric characters (or underscores), a slash, one or more digits wrapped in a capture group, and an optional / at the end of the string.
So, if we do this:
"/photos/abc123/123".match(/^\/photos\/\w+\/(\d+)\/?$/)
the result will be ["/photos/abc123/123", "123"]. As you might have noticed, the capture group is the second array element.
Ready to use function:
var extractNumeric = function (string) {
var exp = /^\/photos\/\w+\/(\d+)\/?$/,
out = string.match(exp);
return out ? out[1] : false;
};
You can find more detailed example here.
So, the answers:
Is indexOf() the right function to verify if the url is the specific
format and how would I write that expression? I've tried several
things related to indexOf() and Regex(), but had no success. I seem to
always end up with an unexpected character or it just doesn't work.
indexOf isn't the best choice for the job. You were right to reach for a regular expression; you just lacked the experience to write it.
And of course the second part of my question is once I know the url is
the right format, how do I extract the numeric string into a variable?
A regular expression together with the match function lets you test a string for the desired format and get its portions at the same time.
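The question mentions that the path sometimes continues with another "/" after the numeric string, which a pattern ending in \/?$ would reject. A minimal sketch that tolerates such a tail is shown below; the helper name extractPhotoId and the sample paths are made up for illustration.

```javascript
// Hypothetical helper: returns the numeric ID as a string, or null if the path does not fit.
function extractPhotoId(pathname) {
  // (?:\/|$) accepts either the end of the path or a further "/" segment after the digits.
  const match = pathname.match(/^\/photos\/\w+\/(\d+)(?:\/|$)/);
  return match ? match[1] : null;
}

console.log(extractPhotoId('/photos/abc123/456'));             // "456"
console.log(extractPhotoId('/photos/abc123/456/sizes/large')); // "456"
console.log(extractPhotoId('/albums/abc123/456'));             // null
```

In a bookmarklet, the argument would be window.location.pathname, and a null result would take the "do something else" branch.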

Javascript url validation allowing relative and absolute urls

I'm trying to validate a field to allow relative and absolute urls. I'm using the regex from this post but it is allowing spaces in the url.
var urlRegex = new RegExp(/(\/?[\w-]+)(\/[\w-]+)*\/?|(((http|ftp|https):\/\/)?[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?)/gi);
Example:
// this should work
this/will/work.aspx?say=hello
http://www.example.com/this/will/work.aspx?say=hello
// this shouldn't work but does
and/this will also work/even though it shouldn't
and/this-shouldn't/but it does/also
The code below is what I was originally using to validate just absolute urls and it was working perfectly. If I remember properly, I pulled it from the jquery source. If this could be modified to also accept relative urls that would be perfect, but this is out of my league.
var urlRegex = new RegExp(/^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i);
I think you just need to anchor the pattern so that it has to match the whole string:
var urlRegex = /^(\/?[\w-]+)(\/[\w-]+)*\/?|(((http|ftp|https):\/\/)?[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?)$/gi;
The leading ^ and trailing $ means that the pattern has to match the entire string instead of just some part of it.
Edit: that said, the pattern has other problems. First, those HTML entities for & (&amp;) need to be just "&". The slashes don't need to be escaped in [] groups, and we don't need the "g" suffix. That leaves us with:
var urlRegex = /^(?:(\/?[\w-]+)(\/[\w-]+)*\/?|(((http|ftp|https):\/\/)?[\w-]+(\.[\w-]+)*([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?))$/i;
edit again - oops also need to wrap the whole thing.
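A quick way to check the wrapped pattern against the examples from the question is sketched below. The pattern is copied from the answer; the helper name isAcceptedUrl is hypothetical. Leaving out the "g" flag also matters for test(), because a global regex remembers lastIndex between calls and can give alternating results on repeated use.

```javascript
// Pattern copied from the answer above; no "g" flag, so test() keeps no state between calls.
const urlRegex = /^(?:(\/?[\w-]+)(\/[\w-]+)*\/?|(((http|ftp|https):\/\/)?[\w-]+(\.[\w-]+)*([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?))$/i;

// Hypothetical helper name.
function isAcceptedUrl(value) {
  return urlRegex.test(value);
}

const samples = [
  'this/will/work.aspx?say=hello',
  'http://www.example.com/this/will/work.aspx?say=hello',
  'and/this will also work/even though it shouldn\'t',
  'and/this-shouldn\'t/but it does/also'
];

for (const sample of samples) {
  console.log(isAcceptedUrl(sample), sample);
}
// → true, true, false, false (the last two contain spaces and apostrophes)
```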
I wrote an article about URI validation complete with code snippets for all the various URI components as defined by RFC3986 here:
Regular Expression URI Validation
You may find what you are looking for there. Note however that almost any string represents a valid URI - even an empty string!

Javascript routing regex

I need to build a router, that routes a REST request to a correct controller and action. Here some examples:
POST /users
GET /users/:uid
GET /users/search&q=lol
GET /users
GET /users/:uid/pictures
GET /users/:uid/pictures/:pid
It is important to have a single regular expression and as good as possible since routing is essential and done at every request.
We first have to replace each : parameter (up to the end of the string or the next forward slash /) in the route URLs with a regex that we can afterwards use to validate the request URL.
How can we replace these dynamic routings with regex? Like search for a string that starts with ":" and end with "/", end of string or "&".
This is what I tried:
var fixedUrl = new RegExp(url.replace(/\\\:[a-zA-Z0-9\_\-]+/g, '([a-zA-Z0-0\-\_]+)'));
For some reason it does not work. How could I implement a regex that replaces :id with a regex, or just ignores them when comparing to the real request url.
Thanks for help
I'd use :[^\s/]+ for matching parameters starting with colon (match :, then as many characters as possible except / and whitespace).
As replacement, I'm using ([\\w-]+) to match any alphanumeric character, - and _, in a capture group, given you're interested in using the matched parameters as well.
var route = "/users/:uid/pictures";
var routeMatcher = new RegExp(route.replace(/:[^\s/]+/g, '([\\w-]+)'));
var url = "/users/1024/pictures";
console.log(url.match(routeMatcher))
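Building on the same replacement, a sketch of a small route compiler might anchor the pattern with ^ and $ (so that /users/:uid does not also match /users/1/pictures) and remember the parameter names. The names compileRoute and matchRoute are hypothetical, and the sketch assumes route strings contain no other regex metacharacters.

```javascript
// Turn "/users/:uid/pictures" into an anchored RegExp plus the list of parameter names.
function compileRoute(route) {
  const names = [];
  const pattern = route.replace(/:([^\s/]+)/g, (whole, name) => {
    names.push(name);
    return '([\\w-]+)'; // same capture group as in the answer above
  });
  return { regex: new RegExp('^' + pattern + '$'), names };
}

// Return an object of named parameters, or null if the URL does not match the route.
function matchRoute(compiled, url) {
  const match = url.match(compiled.regex);
  if (!match) return null;
  const params = {};
  compiled.names.forEach((name, i) => {
    params[name] = match[i + 1];
  });
  return params;
}

const pictureRoute = compileRoute('/users/:uid/pictures/:pid');
console.log(matchRoute(pictureRoute, '/users/1024/pictures/7'));
// → { uid: '1024', pid: '7' }
console.log(matchRoute(pictureRoute, '/users/1024'));
// → null
```

Compiling each route once at startup keeps the per-request work down to a single regex match per route.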
