I'm looking for a JavaScript regex that can extract the channel identifier from a YouTube channel link. I've found some solutions on Stack Overflow, but none of them also handles the YouTube channel alias (e.g. https://www.youtube.com/@youtubecreators).
So the regex should be able to match the following URLs:
https://www.youtube.com/c/coca-cola --> coca-cola
https://www.youtube.com/channel/UCosXctaTYxN4YPIvI5Fpcrw --> UCosXctaTYxN4YPIvI5Fpcrw
https://www.youtube.com/@coca-cola --> coca-cola
https://www.youtube.com/coca-cola --> coca-cola
The matching should also work when a path is attached, as in https://www.youtube.com/@Coca-Cola/about
Any hints are welcome!
The pattern you're looking for is:
https:\/\/www\.youtube\.com\/(?:c\/|channel\/|@)?([^/]+)(?:\/.*)?
const urls = [
  "https://www.youtube.com/c/coca-cola",
  "https://www.youtube.com/channel/UCosXctaTYxN4YPIvI5Fpcrw",
  "https://www.youtube.com/@coca-cola",
  "https://www.youtube.com/coca-cola",
  "https://www.youtube.com/@Coca-Cola/about",
];
const pattern = /https:\/\/www\.youtube\.com\/(?:c\/|channel\/|@)?([^/]+)(?:\/.*)?/;
// Print the captured channel identifier for each URL
for (const url of urls) {
  console.log(url.match(pattern)[1]);
}
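Note that indexing the result of match directly throws a TypeError when a URL does not match at all. A minimal sketch of a null-safe wrapper follows; the helper name extractChannelId is made up for illustration, and the pattern is written on the assumption that the alias form is the @handle form:

```javascript
// Pattern from the answer, written with the "@handle" alias form (an assumption).
const CHANNEL_PATTERN = /https:\/\/www\.youtube\.com\/(?:c\/|channel\/|@)?([^/]+)(?:\/.*)?/;

// Hypothetical helper: returns the captured identifier, or null when nothing matches.
function extractChannelId(url) {
  const match = CHANNEL_PATTERN.exec(url);
  return match ? match[1] : null;
}
```

Callers can then test the result for null instead of wrapping the call in try/catch.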
Documentation here:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Examples
Code here:
https://github.com/mdn/webextensions-examples/tree/main/content-script-register
The above example from Firefox's own documentation does not appear to work as expected. Here is the main JS for the extension:
'use strict';
const hostsInput = document.querySelector("#hosts");
const codeInput = document.querySelector("#code");
const defaultHosts = "*://*.org/*";
const defaultCode = "document.body.innerHTML = '<h1>This page has been eaten</h1>'";
hostsInput.value = defaultHosts;
codeInput.value = defaultCode;
function registerScript() {
  browser.runtime.sendMessage({
    hosts: hostsInput.value.split(","),
    code: codeInput.value
  });
}
document.querySelector("#register").addEventListener('click', registerScript);
You can see the line const defaultHosts = "*://*.org/*"; which works as expected. However, no matter what I do, I cannot get it to work for other patterns, e.g. const defaultHosts = *reddit.com/* or *://*google* etc.
Any ideas why that might be?
A match pattern must specify scheme://host/path, so the first pattern should be *://*.reddit.com/*
The last component of the host (the top-level domain) cannot be *, so the second pattern cannot be fixed and you'll have to list all top-level domains explicitly.
P.S. It's also possible to use includeGlobs: ['*.google.*/'] when registering the content script in the background script, but it's a terrible workaround because it will also match that text in the wrong part of a URL, such as /path/ or search?parameter=value.
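To make the two rules concrete, here is a rough sketch of a shape check for match patterns. It is a simplification written for illustration, not the browser's real parser: the function name is made up, and only the *, http and https schemes are considered.

```javascript
// Simplified shape check (an assumption-laden sketch, not the browser's parser):
//   scheme  "*", "http" or "https"
//   host    "*", "*.name" or "name" (a wildcard is only allowed as the leading label)
//   path    must start with "/"
function looksLikeMatchPattern(pattern) {
  if (pattern === "<all_urls>") return true;
  const shape = /^(\*|https?):\/\/(\*|(?:\*\.)?[^\/*]+)\/.*$/;
  return shape.test(pattern);
}

// Patterns from the question and the answer
const candidates = ["*://*.org/*", "*reddit.com/*", "*://*google*", "*://*.reddit.com/*"];
for (const candidate of candidates) {
  console.log(candidate, looksLikeMatchPattern(candidate));
}
```

With this check, the two patterns from the question are rejected (no scheme://, and a wildcard in the wrong place of the host), while *://*.org/* and *://*.reddit.com/* pass.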
I am using DevTools in Google Chrome and need to find out whether the site URL contains a specific string.
I am able to get Url_Str correctly, but when I run the match command it always returns "match found".
How do I fix this?
let Url_Str = ___grecaptcha_cfg.clients[0].B.S.baseURI;
if (Url_Str.match(/*myconstant*/)) {"match found";}
Also, is there an alternative way to get the site URL instead of using grecaptcha_cfg?
I managed to get it working, thanks to Qasim at https://forum.freecodecamp.org/t/developer-tools-match-for-site-url/502817
let Url_Str = window.location.href;
let a = 0;
// Escape the dot so it matches a literal "." rather than any character
if (/myconstant\.com/.test(Url_Str)) { a = 1; }
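For what it's worth, the likely reason the original snippet always reported a match is that /*myconstant*/ is parsed as a block comment, so match is called with no argument at all. A small sketch showing this, plus a plain substring check as an alternative (the helper name urlContains is made up for illustration):

```javascript
// /*myconstant*/ is a block comment, so this is really "...".match(),
// which matches the empty string and therefore always returns a truthy array.
const alwaysTruthy = "https://example.org/".match(/*myconstant*/);
console.log(alwaysTruthy !== null);

// Hypothetical helper: plain substring test, so no regex escaping is needed.
function urlContains(url, needle) {
  return url.includes(needle);
}
```

In the DevTools console this could be called as urlContains(window.location.href, "myconstant.com").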
I have this line:
token = videos.results[i].titlemay_link.split("?v=")[1];
videos.results[i].titlemay_link is the link to a YouTube video, and the split returns only the video code.
The problem is that some of the YouTube links I get look like this:
https://www.youtube.com/watch?v=1zO9nWgI_LY&feature=youtu.be
so the output I get is:
1zO9nWgI_LY&feature=youtu.be
This will not load the video in the embed player. How can I get rid of the
&feature=youtu.be
Thanks!
token = videos.results[i].titlemay_link.split("?v=")[1];
token = token.split("&")[0];
But that won't be sufficient in many cases, because YouTube URLs come in many more complicated forms. Here is a more robust method to fetch the YouTube video ID:
function youtube_parser(url){
  var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#\&\?]*).*/;
  var match = url.match(regExp);
  // Group 7 holds the candidate ID; accept it only if it is 11 characters long
  return (match&&match[7].length==11)? match[7] : false;
}
These are the types of URLs supported:
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
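For the plain watch?v= links from the question, another option is to let the standard URL API parse the query string instead of splitting by hand. A minimal sketch (the helper name getVideoParam is made up, and it deliberately handles only the query-parameter form, not the youtu.be or /embed/ forms):

```javascript
// Hypothetical helper: returns the "v" query parameter, or null if it is
// absent or the input is not an absolute URL.
function getVideoParam(link) {
  try {
    return new URL(link).searchParams.get("v");
  } catch (err) {
    return null; // new URL() throws on strings it cannot parse
  }
}
```

Extra parameters such as &feature=youtu.be and fragments such as #t=0m10s are separated out by the parser, so they never end up in the returned value.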
Add one more condition after you get the token, checking with .includes as below:
if (token.includes('&'))
  token = token.split('&')[0];
I am trying to check whether a URL is a valid YouTube video URL and get the YouTube video ID from it. So far I am using a simple JavaScript split function to achieve this, but it has some disadvantages because YouTube has multiple URL formats.
I have looked at other Stack Overflow threads, but all of them support only one specific URL format, which is not what I need.
I need something that matches all of these URLs:
http(s)://www.youtu.be/videoID
http(s)://www.youtube.com/watch?v=videoID
(and optionally any other short URLs, with the script automatically detecting whether they contain a YouTube video)
Any ideas that the browser can handle quickly and efficiently are greatly appreciated!
Try this:
var url = "...";
var videoid = url.match(/(?:https?:\/{2})?(?:w{3}\.)?youtu(?:be)?\.(?:com|be)(?:\/watch\?v=|\/)([^\s&]+)/);
if(videoid != null) {
  console.log("video id = ",videoid[1]);
} else {
  console.log("The youtube url is not valid.");
}
The regex, explained:
/
(?:https?:\/{2})? // Optional protocol; if present, must be http:// or https://
(?:w{3}\.)? // Optional subdomain; if present, must be www.
youtu(?:be)? // The domain: match 'youtu' and optionally 'be'.
\.(?:com|be) // The domain extension must be .com or .be
(?:\/watch\?v=|\/)([^\s&]+) // Match the value of the 'v' parameter in the query string of the 'watch' page, OR whatever follows the root directory: any run of characters other than whitespace and '&'.
/
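As a quick way to try the pattern, it can be wrapped in a small function that returns null instead of throwing on non-matching input. This is only a sketch; the names VIDEO_PATTERN and extractVideoId are made up for illustration:

```javascript
// Pattern copied from the answer above.
const VIDEO_PATTERN = /(?:https?:\/{2})?(?:w{3}\.)?youtu(?:be)?\.(?:com|be)(?:\/watch\?v=|\/)([^\s&]+)/;

// Hypothetical helper: returns the captured ID, or null for non-matching input.
function extractVideoId(url) {
  const match = url.match(VIDEO_PATTERN);
  return match ? match[1] : null;
}
```

One caveat worth knowing: because the second alternative accepts anything after the root slash, a path-style URL such as http://www.youtube.com/embed/0zM3nApSvMg yields "embed/0zM3nApSvMg" rather than the bare ID, so this pattern fits best when the input is limited to the two formats in the question.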
Maybe you should look at the YouTube API and see whether there is a way to get a video ID by parsing the URL through the API.
Look at this SO post:
Youtube API - Extract video ID
This could be quick:
var url = 'http://www.youtu.be/543221';
//http://www.youtube.com/watch?v=SNfYz6Yw0W8&feature=g-all-esi would work also
var a = url.split("v=")[1];
a = a != undefined ? a : url.split("youtu.be/")[1];
var b = a.split("&")[0];
The variable b will have your ID. Quick. The regex is nicer, though harder to read. I have modified my code to account for both.
There are too many kinds of URL:
latest short format: http://youtu.be/NLqAF9hrVbY
iframe: http://www.youtube.com/embed/NLqAF9hrVbY
iframe (secure): https://www.youtube.com/embed/NLqAF9hrVbY
object param: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
object embed: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
watch: http://www.youtube.com/watch?v=NLqAF9hrVbY
users: http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo
ytscreeningroom: http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I
any/thing/goes!: http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4
any/subdomain/too: http://gdata.youtube.com/feeds/api/videos/NLqAF9hrVbY
more params: http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec
query may have dot: http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be
(Source: How do I find all YouTube video ids in a string using a regex?)
The best approach is to limit the input data.
Good luck
Try this code:
var url = "...";
// The pattern must be written as a regex literal, delimited by slashes
var videoid = url.match(/(?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|e(?:mbed)?)\/|\S*?[?&]v=)|youtu\.be\/)([a-zA-Z0-9_-]{11})/);
if(videoid != null) {
  console.log("video id = ",videoid[1]);
} else {
  console.log("The youtube url is not valid.");
}
The Regex is from
YouTube video ID regex
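To see how this pattern behaves on the URL formats listed in the earlier answer, here is a small sketch that wraps it in a function. The names STRICT_PATTERN and findVideoId are made up for illustration:

```javascript
// Pattern from the answer above, written as a regex literal.
const STRICT_PATTERN = /(?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|e(?:mbed)?)\/|\S*?[?&]v=)|youtu\.be\/)([a-zA-Z0-9_-]{11})/;

// Hypothetical helper: returns the 11-character ID, or null when none is found.
function findVideoId(url) {
  const match = STRICT_PATTERN.exec(url);
  return match ? match[1] : null;
}

// A few of the formats quoted earlier in this thread
const samples = [
  "http://youtu.be/NLqAF9hrVbY",
  "http://www.youtube.com/embed/NLqAF9hrVbY",
  "http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be",
  "http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo",
];
for (const sample of samples) {
  console.log(sample, "->", findVideoId(sample));
}
```

Unlike the simpler pattern, this one requires exactly 11 ID characters, which is why it copes with the /embed/ and /user/ forms.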
You can do it easily using preg_match. Here is an example:
$url = "http://www.youtube.com/watch?v=YzOt12co4nk&feature=g-vrec";
// Video IDs may also contain '-' and '_', so include them in the character class
preg_match('/v=([0-9a-zA-Z_-]+)/', $url, $matches);
$vid = $matches[1];
Now you will have the video ID in $vid: YzOt12co4nk
I found a simple way of doing it without using a regex.
I made a function which does it for you:
function getLink(url){
  // fetch needs an absolute URL; the oEmbed response body is JSON
  return fetch('https://www.youtube.com/oembed?format=json&url=' + encodeURIComponent(url))
    .then(res => res.json())
    .then(data => {
      var thumbnailUrl = data.thumbnail_url;
      // the video ID is the 11 characters following 'vi/' in the thumbnail URL
      return thumbnailUrl.split('vi/')[1].substring(0, 11);
    });
}
// fetch is asynchronous, so the ID arrives through the returned promise
getLink(your_url).then(id => console.log(id));
// here replace 'your_url' with your specified YouTube URL.
All this does is call YouTube's oEmbed endpoint, passing your URL as a parameter; YouTube takes care of the type of URL, so you don't have to worry about it. The function then takes the 'thumbnail_url' field from the response and splits the thumbnail's URL to find the ID of the video.
I have a list of domains e.g.
site.co.uk
site.com
site.me.uk
site.jpn.com
site.org.uk
site.it
The domain names can also contain 3rd- and 4th-level domains, e.g.
test.example.site.org.uk
test2.site.com
I need to extract the 2nd-level domain, which in all these cases is site
Any ideas? :)
There is no way to reliably get that. Subdomains are arbitrary, and there is a monster list of domain extensions that grows every day. The best you can do is check against that list and keep your copy of it up to date.
The list:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
Following @kohlehydrat's suggestion:
import urllib.request

class TldMatcher(object):
    # use class vars for lazy loading
    MASTERURL = "http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1"
    TLDS = None

    @classmethod
    def loadTlds(cls, url=None):
        url = url or cls.MASTERURL
        # grab master list (urlopen returns bytes in Python 3, so decode to text)
        lines = urllib.request.urlopen(url).read().decode('utf-8').splitlines()
        # strip comments and blank lines
        lines = [ln for ln in (ln.strip() for ln in lines) if len(ln) and ln[:2]!='//']
        cls.TLDS = set(lines)

    def __init__(self):
        if TldMatcher.TLDS is None:
            TldMatcher.loadTlds()

    def getTld(self, url):
        best_match = None
        chunks = url.split('.')
        # walk from the last label towards the first, keeping the longest suffix found
        for start in range(len(chunks)-1, -1, -1):
            test = '.'.join(chunks[start:])
            startest = '.'.join(['*']+chunks[start+1:])
            if test in TldMatcher.TLDS or startest in TldMatcher.TLDS:
                best_match = test
        return best_match

    def get2ld(self, url):
        urls = url.split('.')
        tlds = self.getTld(url).split('.')
        # the label immediately before the matched suffix
        return urls[-1 - len(tlds)]

def test_TldMatcher():
    matcher = TldMatcher()
    test_urls = [
        'site.co.uk',
        'site.com',
        'site.me.uk',
        'site.jpn.com',
        'site.org.uk',
        'site.it'
    ]
    errors = 0
    for u in test_urls:
        res = matcher.get2ld(u)
        if res != 'site':
            print("Error: found '{0}', should be 'site'".format(res))
            errors += 1
    if errors==0:
        print("Passed!")
    return (errors==0)
Using python tld
https://pypi.python.org/pypi/tld
$ pip install tld
from tld import get_tld, get_fld
print(get_tld("http://www.google.co.uk"))
'co.uk'
print(get_fld("http://www.google.co.uk"))
'google.co.uk'
The problem is that the suffix can be one or two levels deep.
A trivial solution:
Build a list of possible site suffixes, ordered from the most specific to the most general:
"co.uk", "uk", "co.jp", "jp", "com"
Then check whether a suffix matches the end of the domain. If it does, the label just before it is the site.
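The suffix-list idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the SUFFIXES array is a tiny hand-picked sample covering the domains in the question (a real list would be far longer), and the function name secondLevelLabel is made up.

```javascript
// Illustrative sample only; a real suffix list is much longer.
const SUFFIXES = ["co.uk", "me.uk", "org.uk", "jpn.com", "uk", "com", "it"];

// Hypothetical helper: returns the label just before the longest matching
// suffix, or null when no suffix in the list matches.
function secondLevelLabel(domain) {
  // Try suffixes with more labels first, so "co.uk" wins over "uk".
  const ordered = [...SUFFIXES].sort(
    (a, b) => b.split(".").length - a.split(".").length
  );
  for (const suffix of ordered) {
    if (domain.endsWith("." + suffix)) {
      const rest = domain.slice(0, -(suffix.length + 1));
      return rest.split(".").pop();
    }
  }
  return null;
}
```

Sorting by label count is what implements the "most specific first" ordering, so the list itself can be kept in any order.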
The only possible way would be via a list with all the top level domains (here like .com or co.uk) possible. Then you would scan through this list and check out. I don't see any other way, at least without accessing the internet at runtime.
@Hugh Bothwell
In your example you are not dealing with special domains like parliament.uk; they are represented in the file with "!" (e.g. !parliament.uk).
I made some changes to your code, and also made it look more like the PHP function I used before.
I also added the possibility of loading the data from a local file.
I also tested it with some domains, such as:
niki.bg, niki.1.bg
parliament.uk
niki.at, niki.co.at
niki.us, niki.ny.us
niki.museum, niki.national.museum
www.niki.uk - due to "*" in Mozilla's file this is reported as OK.
Feel free to contact me on GitHub so I can add you as co-author there.
GitHub repo is here:
https://github.com/nmmmnu/TLDExtractor/blob/master/TLDExtractor.py
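For readers who want to see how the "!" exception rules and "*" wildcard rules interact, here is a rough sketch. It is not the code from the repository above: the RULES array is a tiny made-up excerpt modelled on the entries this post mentions (it does not reflect the current contents of Mozilla's file), and both function names are hypothetical.

```javascript
// Made-up excerpt in the list's syntax: plain, wildcard (*) and exception (!) rules.
const RULES = ["bg", "1.bg", "*.uk", "!parliament.uk", "at", "co.at"];

// Returns the public suffix of a domain according to the given rules.
function publicSuffix(domain, rules = RULES) {
  const labels = domain.toLowerCase().split(".");
  let longest = 0; // label count of the longest matching non-exception rule
  for (const rule of rules) {
    const isException = rule.startsWith("!");
    const ruleLabels = (isException ? rule.slice(1) : rule).split(".");
    if (ruleLabels.length > labels.length) continue;
    const tail = labels.slice(labels.length - ruleLabels.length);
    const matches = ruleLabels.every((r, i) => r === "*" || r === tail[i]);
    if (!matches) continue;
    // An exception rule wins outright: the suffix is the rule minus its first label.
    if (isException) return tail.slice(1).join(".");
    longest = Math.max(longest, ruleLabels.length);
  }
  // With no matching rule, fall back to treating the last label as the suffix.
  return labels.slice(labels.length - Math.max(longest, 1)).join(".");
}

// Returns the label just before the public suffix, or null if there is none.
function registrableLabel(domain, rules = RULES) {
  const labels = domain.toLowerCase().split(".");
  const suffixLength = publicSuffix(domain, rules).split(".").length;
  return labels.length > suffixLength ? labels[labels.length - suffixLength - 1] : null;
}
```

With this sample, parliament.uk resolves to the label "parliament" because of the exception rule, while www.niki.uk resolves to "www" because the *.uk wildcard makes niki.uk the suffix, which matches the behaviour described in the test list above.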