Regex to convert URL to string - javascript

REGEX ONLY
I exclusively need JavaScript regex code to convert URLs like
https://hello.romeo-juliet.fr
https://hello.romeojuliet.co.uk
https://hello.romeo-jul-iet.fr
https://hello.romeo-juliet.com
into the string romeojuliet.
Basically, I want to get the alphabetic domain name, removing all other characters as well as https:// and top-level domains such as com, co.uk, fr, etc.
It would be helpful if this were done using JS replace().
Here is what I have tried so far:
let url="https://hello.romeo-juliet.fr";
const test=url.replace(/(^\w+:|^)\/\/(\w+.)/, '');
console.log(test);

A non-regex solution:
Get the host of the URL (by parsing the string with the URL() constructor and reading its host property), split it on periods, take the second item in the resulting array, then remove all occurrences of -:
let url="https://hello.romeo-juliet.fr";
const test = new URL(url).host.split(".")[1].replaceAll("-", '');
console.log(test);

You can do it without regex as follows:
let url="https://hello.romeo-juliet.fr";
url.substring(url.indexOf(".")+1, url.lastIndexOf("."));
// result: romeo-juliet
Note that this keeps the hyphens and, for a two-part TLD such as .co.uk, returns romeojuliet.co. I hope this answers your question.
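Since the question asks specifically for a regex used with replace(), here is a minimal regex-only sketch. It assumes every URL has the shape scheme://<one subdomain label>.<name>.<TLD...>, as in all four examples; the helper name extractName is hypothetical.

```javascript
// Hypothetical helper: returns the alphabetic "name" label of a URL.
// Assumes the form scheme://<one subdomain label>.<name>.<TLD...>
function extractName(url) {
  return url
    // Keep only the second host label: drop the scheme, the first label,
    // and everything from the next "." or "/" onward (TLDs, path, query).
    .replace(/^\w+:\/\/[^.]+\.([^.\/]+).*$/, "$1")
    // Remove every non-letter character, e.g. the hyphens in romeo-jul-iet.
    .replace(/[^a-z]/gi, "");
}

const urls = [
  "https://hello.romeo-juliet.fr",
  "https://hello.romeojuliet.co.uk",
  "https://hello.romeo-jul-iet.fr",
  "https://hello.romeo-juliet.com",
];
urls.forEach((u) => console.log(extractName(u))); // romeojuliet (four times)
```

If a URL does not fit that shape (for example, it has no subdomain), the first replace() leaves it unchanged and the second one merely strips its non-letters, so the result will not be the bare name.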

Related

How to extract URL id from string with regex?

Suppose I have this string:
google.com/:id/:category
How can I extract only id and category from this string?
I have to use regex.
This match doesn't work:
match(/\/:([a-zA-Z0-9]*)/g);
You may try the following:
var url = "google.com/:id/:category";
var parts = url.match(/(?<=\/:)[a-zA-Z0-9]+/g);
console.log(parts);
This approach uses the positive lookbehind (?<=\/:) to avoid matching the unwanted leading /: portion: the marker is asserted but not included in the match. Lookbehind assertions require an ES2018 or later engine.
Capture groups are ignored by match() when the regex has the /g flag. You could use matchAll() instead, like this:
const url = "google.com/:id/:category"
const info = [...url.matchAll(/\/:([a-zA-Z0-9]*)/g)].map(match => match[1])
console.log(info)
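For engines that predate String.prototype.matchAll (ES2020), the same capture-group extraction can be sketched with a RegExp.prototype.exec loop. The helper name extractParams is hypothetical, and the pattern uses + instead of * so every match consumes at least one name character.

```javascript
// Collect capture group 1 of every match using a RegExp.prototype.exec loop,
// which works in engines that predate String.prototype.matchAll (ES2020).
function extractParams(url) {
  const pattern = /\/:([a-zA-Z0-9]+)/g; // the g flag makes exec() advance lastIndex
  const names = [];
  let match;
  while ((match = pattern.exec(url)) !== null) {
    names.push(match[1]); // group 1 holds the text after "/:"
  }
  return names;
}

console.log(extractParams("google.com/:id/:category")); // → ["id", "category"]
```

Creating the regex inside the function keeps lastIndex from leaking between calls, which is a common pitfall when a global regex is shared.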
Credit: Better access to capturing groups (than String.prototype.match())

JavaScript - How to get label names and URLs in a string based on colon?

Sample String:
var demoString="Extract the URLs and Lables from String Google:https://www.google.com Yahoo:http://yahoo.com";
I would like to be able to extract portions of the provided string, like (Google, https://www.google.com, Yahoo, http://yahoo.com).
How can I achieve this with JavaScript?
You can achieve this using a regular expression. Regular expressions specify rules for matching text.
In this case you could use the following regular expression:
[A-Za-z0-9]+\:https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&//=]*)
[A-Za-z0-9]+ matches the site name (with no spaces)
\: matches the colon
https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&//=]*) matches any URL starting with http or https (taken from this answer: https://stackoverflow.com/a/3809435/1943263)
To use this in JavaScript you would do something like this:
var demoString="Extract the URLs and Lables from String Google:https://www.google.com Yahoo:http://yahoo.com";
var regexPattern = /([A-Za-z0-9]+)\:(https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&//=]*))/g;
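// match() with /g returns only whole matches, so use matchAll() to read groups 1 (label) and 2 (URL)
var matches = Array.from(demoString.matchAll(regexPattern), m => [m[1], m[2]]).flat();
console.log(matches); // ["Google", "https://www.google.com", "Yahoo", "http://yahoo.com"]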
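If you would rather have each label paired with its URL, here is a sketch that builds an object with named capture groups (ES2018) and Object.fromEntries (ES2019). The helper name extractLabeledUrls is hypothetical, and it swaps in a deliberately simpler URL pattern (http(s):// followed by any run of non-whitespace) instead of the longer regex above.

```javascript
// Hypothetical helper: build a { label: url } object from "Label:URL" pairs.
// Uses named capture groups and a simplified URL pattern:
// http:// or https:// followed by any run of non-whitespace characters.
function extractLabeledUrls(text) {
  const pattern = /(?<label>[A-Za-z0-9]+):(?<url>https?:\/\/\S+)/g;
  return Object.fromEntries(
    Array.from(text.matchAll(pattern), (m) => [m.groups.label, m.groups.url])
  );
}

const demoString =
  "Extract the URLs and Lables from String Google:https://www.google.com Yahoo:http://yahoo.com";
console.log(extractLabeledUrls(demoString));
// → { Google: "https://www.google.com", Yahoo: "http://yahoo.com" }
```

If the same label appears twice, the later URL overwrites the earlier one; collect into an array of pairs instead if duplicates matter.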

Getting element from filename using continuous split or regex

I currently have the following string :
AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv
But I would like to split it to get only the following result (removing all parent directories and the timestamp before the file name):
1564416946615-file-test.dsv
I currently have the following code, but it doesn't work when the file name itself contains a '-', as in the example.
getFilename(str){
return(str.split('\\').pop().split('/').pop().split('-')[1]);
}
I don't want to use a loop for performance reasons (I may have lots of files to work with), so is there another solution (maybe regex)?
We can try doing a regex replacement with the following pattern:
.*\/\d+-\b
Replacing the match with empty string should leave you with the result you want.
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var output = filename.replace(/.*\/\d+-\b/, "");
console.log(output);
The pattern works by using .*/ to first consume everything up to and including the final path separator. Then \d+- consumes the timestamp and the dash that follows it, leaving only the portion you want.
You may use this regex and get captured group #1:
/[^\/-]+-(.+)$/
RegEx Details:
[^\/-]+: Match one or more characters that are neither / nor -
-: Match literal -
(.+): Match 1+ of any characters
$: End of the string
Code:
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var m = filename.match(/[^\/-]+-(.+)$/);
console.log(m[1]);
//=> 1564416946615-file-test.dsv
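If regex is not a hard requirement, a loop-free alternative is two index lookups. This sketch (the function name getFilename just mirrors the asker's method) also treats \ as a separator, as the asker's original split('\\') did.

```javascript
// Loop-free alternative without regex: two index lookups.
function getFilename(path) {
  // Start after the last path separator; handles both "/" and "\".
  const start = Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\")) + 1;
  const base = path.slice(start);
  // Drop the leading timestamp and its dash; indexOf returns -1 when there is
  // no dash, in which case slice(0) returns the base name unchanged.
  return base.slice(base.indexOf("-") + 1);
}

console.log(getFilename("AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv"));
// → 1564416946615-file-test.dsv
```

Because only the first dash of the base name is used, dashes later in the file name are preserved.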

Get base URL from string with regex and JavaScript

I'm trying to get the base URL from a string (so no window.location).
It needs to remove the trailing slash.
It needs to be regex (no new URL()).
It needs to work with query parameters and anchor links.
In other words, all of the following should return https://apple.com (or http://www.apple.com for the last one):
https://apple.com?query=true&slash=false
https://apple.com#anchor=true&slash=false
http://www.apple.com/#anchor=true&slash=true&whatever=foo
These are just examples; URLs can have different subdomains, e.g. https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk. It could be any URL, like https://foo.bar.
The closest I got is:
const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash
But this doesn't work with anchor links and queries that start right after the host, without a preceding /.
Any idea how I can get it to work on all cases?
You could add # and ? to your negated character class. You don't need .* because that will match until the end of the string.
For your example data, you could match:
^https?:\/\/[^#?\/]+
const strings = [
"https://apple.com?query=true&slash=false",
"https://apple.com#anchor=true&slash=false",
"http://www.apple.com/#anchor=true&slash=true&whatever=foo",
"https://foo.bar/?q=true"
];
strings.forEach(s => {
console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
})
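A slightly more defensive sketch of the same idea: it keeps the asker's optional (\w+:)? scheme prefix, so any scheme and protocol-relative //host strings are accepted, and it returns null instead of throwing when there is no match (indexing [0] on a null match() result throws a TypeError). The helper name getBaseUrl is hypothetical.

```javascript
// Hypothetical helper: returns scheme + host (no path, query, fragment, or
// trailing slash), or null when the string does not look like a URL.
// (?:\w+:)? mirrors the asker's (\w+:)? so any scheme, or none, is accepted.
function getBaseUrl(url) {
  const match = url.match(/^(?:\w+:)?\/\/[^\/?#]+/);
  return match ? match[0] : null;
}

[
  "https://apple.com?query=true&slash=false",
  "https://apple.com#anchor=true&slash=false",
  "http://www.apple.com/#anchor=true&slash=true&whatever=foo",
  "https://shop.apple.co.uk/?query=foo",
  "not a url",
].forEach((s) => console.log(getBaseUrl(s)));
```

A port, if present, is kept (http://host:8080/x gives http://host:8080), which matches what URL's origin property reports.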
You could use the Web API's built-in URL class for this. A URL object also gives you other parsed properties that are easy to get to, like the query string parameters, the protocol, etc.
Regex is a painful way to do something that the browser otherwise makes very simple.
I know that you asked about using regex, but in the event that you (or someone coming here in the future) really just cares about getting the information out and isn't committed to using regex, maybe this answer will help.
let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"
let urlOne = new URL(one)
console.log(urlOne.origin)
let urlTwo = new URL(two)
console.log(urlTwo.origin)
let urlThree = new URL(three)
console.log(urlThree.origin)
const baseUrl = url.replace(/^(\w+:\/\/[^?\/#]+)[?\/#].*$/, '$1');
The following regex gets you everything up to the .com part, so you will have to append .com once you pull out the first part of the URL (it only works for .com domains):
^http.*?(?=\.com)
Or maybe you could do:
myUrl.replace(/(#|\?|\/#).*$/, "")
This removes everything from the first #, ? or /# onward, which leaves only the scheme and host when the URL has no path.

Regex to get a specific query string variable in a URL

I have a URL like
server/area/controller/action/4/?param=2
in which the server can be
http://localhost/abc
https://test.abc.com
https://abc.om
I want to get the first character after "action/", which is 4 in the above URL, with a regex. Is it possible with a regex in JS, or is there another way?
Use the regex \d+(?=\/\?), which matches the digits directly followed by /?:
var url = "server/area/controller/action/4/?param=2";
var param = url.match(/\d+(?=\/\?)/);
console.log(param[0]); // 4
Using this regex in JavaScript:
action/(.)
gives you access to the first capturing group, which contains the first character after action/.
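A minimal sketch of reading that capturing group with match(). The helper names are hypothetical, and the second variant, action\/(\d+), is an assumption for the case where the id might have more than one digit (the single-character version would return only the first digit of 42).

```javascript
// Hypothetical helpers built on the capturing-group approach.
function charAfterAction(url) {
  const match = url.match(/action\/(.)/); // group 1: exactly one character
  return match ? match[1] : null; // match is null when "action/" is absent
}

function idAfterAction(url) {
  const match = url.match(/action\/(\d+)/); // group 1: the whole digit run
  return match ? match[1] : null;
}

const url = "server/area/controller/action/4/?param=2";
console.log(charAfterAction(url)); // → 4
console.log(idAfterAction("server/area/controller/action/42/?param=2")); // → 42
```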
This approach splits the URL on / and takes the second-to-last element:
var url = "server/area/controller/action/4/?param=2".split("/").slice(-2, -1)[0];
