Hi, I have a string that contains several YouTube links:
test test test
https://youtu.be/G7KNmW9a75Y?list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD
https://www.youtube.com/watch?v=JlOZR5OwS-8&list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD&index=4
test test test
Is it possible to strip the parameters from the YouTube links so that they become:
https://youtu.be/G7KNmW9a75Y
https://www.youtube.com/watch?v=JlOZR5OwS-8
with the rest of the string left intact:
test test test
https://youtu.be/G7KNmW9a75Y
https://www.youtube.com/watch?v=JlOZR5OwS-8
test test test
Many thanks to whoever offers a solution.
I tried this:
'test test test youtu.be/G7KNmW9a75Y?list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD youtube.com/watch?v=JlOZR5OwS-8&list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD&index=4 test test test'.replace(/<[^>]*>?/gm, '').replace('https://'+videoid[0], "");
but it doesn't extract the complete link, so I can't use replace to remove the parameters from the text.
In general, to get "https://youtu.be/G7KNmW9a75Y" out of "https://youtu.be/G7KNmW9a75Y?list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD" you can use new URL():
let u = new URL('https://youtu.be/G7KNmW9a75Y?list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD')
`${u.origin}${u.pathname}` // "https://youtu.be/G7KNmW9a75Y" <-- your answer
For a youtube.com/watch link the video id lives in the v query parameter, so that parameter has to be kept.
Full code:
const myStr =
  "test test test https://youtu.be/G7KNmW9a75Y?list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD https://www.youtube.com/watch?v=JlOZR5OwS-8&list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD&index=4 test test test";
const result = myStr
  .split(" ")
  .map((str) => {
    if (/^https?:\/\/\S+/i.test(str)) {
      // Parse the current token rather than a hard-coded URL
      const u = new URL(str);
      // watch links carry the video id in the v parameter, so keep it
      const v = u.searchParams.get("v");
      return `${u.origin}${u.pathname}${v ? `?v=${v}` : ""}`;
    }
    return str;
  })
  .join("\r\n");
console.log("result", result); // note: I had to store the result of map()/join() in a variable; otherwise the strings were not placed on new lines
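Splitting on spaces and joining with "\r\n" changes the original whitespace. A possible variation, sketched below, runs replace() over the links themselves so that everything else in the text stays untouched. The helper name cleanYouTubeLinks is hypothetical, and the sketch assumes that every link starts with http:// or https://, that v is the only parameter worth keeping, and that it is acceptable to strip parameters from non-YouTube links as well.

```javascript
// Hypothetical helper: strips query parameters from every http(s) link in a text,
// keeping only the "v" parameter (the video id on youtube.com/watch links).
function cleanYouTubeLinks(text) {
  return text.replace(/https?:\/\/\S+/gi, (link) => {
    const u = new URL(link);
    const v = u.searchParams.get("v");
    return u.origin + u.pathname + (v ? "?v=" + encodeURIComponent(v) : "");
  });
}

const input = [
  "test test test",
  "https://youtu.be/G7KNmW9a75Y?list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD",
  "https://www.youtube.com/watch?v=JlOZR5OwS-8&list=PL08MW4hWrm0I5BMN-Z_r8dqRdNTyuyRaD&index=4",
  "test test test",
].join("\n");

const cleaned = cleanYouTubeLinks(input);
console.log(cleaned);
```

Because only the matched links are rewritten, the line breaks and the surrounding words come out exactly as they went in.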
I have a JavaScript String which looks like this:
From Windows to Linux
With JavaScript, how can I highlight the words From and Linux, via a substring which looks like this:
From Linux
so the string looks like this in the end:
<mark>From</mark> Windows to <mark>Linux</mark>
This is my current implementation of the function to do that job:
function highlightSearchTerm(string, substring) {
const regex = new RegExp(`(${substring})`, 'ig');
return string.replace(regex, '<mark>$1</mark>');
}
I call it like this:
highlightSearchTerm("From Windows to Linux", "from linux")
It works well, the only thing that is missing is to make it work when the substring has words which are not directly next to each other.
These substrings for instance work:
from windows
From
to Linux
While these don't (Words are not directly next to each other in the main string):
Windows Linux
from To
Linux from
Short Answer
Call highlightSearchTerm() with a pipe(|) between the terms to achieve the desired output.
Longer Answer
The answer has to do with how you are building your regex.
The function
function highlightSearchTerm(string, substring) {
const regex = new RegExp(`(${substring})`, 'ig');
return string.replace(regex, '<mark>$1</mark>');
}
It's important to understand what the corresponding RegExp object that is created reads like, and how it equates to a form that we would maybe write out directly.
First, if we call
// assume substring = 'hello';
new RegExp(`(${substring})`, 'ig');
// Equivalent: /(hello)/ig;
Notice that the grouped item is looking for the word hello.
Now, suppose we want to match several things, such as hey and you. If we supply them as a single string separated by a space, e.g.
const substring = 'hey you';
new RegExp(`(${substring})`,'ig');
// Equivalent: /(hey you)/ig
This will not give us what we want because, instead of looking for hey or you, the regex is now looking for hey you as a phrase.
However, if we separate those things by a pipe (|) we get
// assume substring = 'hey|you';
new RegExp(`(${substring})`,'ig');
// Equivalent: /(hey|you)/ig
This now looks for either hey or you in the string. This is because the pipe character in RegEx is the OR.
If you'd like to expand the search for multiple phrases, you separate each specific one by a pipe, e.g.
new RegExp('(hey|you|that guy)', 'ig');
This will search for the words hey and you and the phrase (space included) that guy.
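If the caller should still be able to pass a space-separated string, the alternation can be built inside the function. The following is a minimal sketch under that assumption; highlightTerms and escapeRegExp are hypothetical names, and the escaping step is added so that characters such as . or ( in a search term are matched literally.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical variant of highlightSearchTerm that joins the words with a pipe.
function highlightTerms(string, terms) {
  const words = terms.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (words.length === 0) return string; // nothing to highlight
  const regex = new RegExp(`(${words.join("|")})`, "ig");
  return string.replace(regex, "<mark>$1</mark>");
}

const highlighted = highlightTerms("From Windows to Linux", "linux from");
console.log(highlighted);
```

The order of the words in the search string no longer matters, because each word is just one alternative in the group.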
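You can use the pipe | just as @Jhecht explained above. Alternatively, you can split the substring and highlight each word in turn:
function highlightSearchTerm(string, substring) {
  // Highlight each space-separated word on its own
  substring.split(' ').forEach(el => {
    if (!el) return; // skip empty entries caused by repeated spaces
    // 'ig' matches globally and case-insensitively; $& re-inserts the
    // matched text, so the casing of the original string is preserved
    const regex = new RegExp(el, 'ig');
    string = string.replace(regex, '<mark>$&</mark>');
  });
  return string;
}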
let text = document.querySelector('div').innerHTML;
document.querySelector('div').innerHTML = highlightSearchTerm(text, 'From Linux');
<div>From Windows to Linux</div>
This is how you can return true or false depending on whether your text includes every word of the substring:
let text = document.querySelector('div').innerHTML;
function isIncludesSubstring(text, substring){
let arr = substring.split(' '),
arrResult = [];
arr.forEach(el => {
const regex = new RegExp(el, 'ig');
arrResult.push(regex.test(text));
});
/* arrResult includes true or false based on whether substring single word
is included in the text or not, the function will return true if all words are included
else it will return false */
return !arrResult.includes(false);
}
console.log(isIncludesSubstring(text, 'From Windows Linux'))
console.log(isIncludesSubstring(text, 'To Windows from'))
console.log(isIncludesSubstring(text, 'From Test Linux'))
<div>From Windows to Linux</div>
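The same check can be sketched without building a regex per word, which avoids surprises when a word contains regex metacharacters. This is only an illustration: includesAllWords is a hypothetical name, and the comparison is done on lower-cased strings to mimic the i flag.

```javascript
// Hypothetical helper: true only if every space-separated word occurs in text.
function includesAllWords(text, substring) {
  const haystack = text.toLowerCase();
  return substring
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean) // ignore empty entries from repeated spaces
    .every((word) => haystack.includes(word));
}

console.log(includesAllWords("From Windows to Linux", "From Windows Linux"));
console.log(includesAllWords("From Windows to Linux", "To Windows from"));
console.log(includesAllWords("From Windows to Linux", "From Test Linux"));
```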
I have a URL that will always be in either this format
http://domain.tld/foo/bar/boo
http://www.domain.tld/foo/bar/boo
http://sub.domain.tld/foo/bar/boo
http://www.sub.domain.tld/foo/bar/boo
I'd like to use Regex to extract bar from the url, regardless of the format.
I am using JavaScript.
I tried to break the url up using something like
var x = 'http://domain.tld/foo/bar/boo'
x.split(/^((http[s]?):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$/g)
but this doesn't really work, nor does it help, as I get an array of items when I really just need the value at bar.
var el = document.createElement('a');
el.href = "http://www.domain.tld/foo/bar/boo";
var importantPart = el.pathname.split('/')[2];
console.log(importantPart);
fiddle: https://jsfiddle.net/dcyo4ph5/1/
sources: https://css-tricks.com/snippets/javascript/get-url-and-url-parts-in-javascript/ & JavaScript - Get Portion of URL Path
I guess this doesn't use regex. So that's maybe not what you want.
I'll list both a regex and a non-regex way. Surprisingly, the regex way seems shorter.
Regex Way
The regex to find bar and boo is this /.*\/(.*)\/(.*)$/ which is short, precise and exactly what you need.
Let's put into practice,
const params = "http://www.sub.domain.tld/foo/bar/boo".match(/.*\/(.*)\/(.*)$/)
This results in,
params;
["http://www.sub.domain.tld/foo/bar/boo","bar","boo"]
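Just access the captures as params[1] and params[2] (params[0] is the whole match).
Regex Explanation:
.*\/ greedily consumes everything up to a slash, (.*) captures the second-to-last path segment, \/ matches the last slash, and (.*)$ captures the final segment up to the end of the string.
Extended Version:
The regex can be extended to cope with a path such as /bar/boo/ that ends with a slash, like this,
.*\/\b(.*)\/\b(.*)(\/?)$
Here each \b requires a word character right after the slash, so the empty segment after a trailing slash is not captured as a segment of its own. Note that the second capture may still include the trailing slash.
It can be extended further, but let's keep it simple for now.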
Non Regex Way
Use native methods like .split(),
function getLastParam(str, targetIndex = 1) {
const arr = str
.split("/") // split by slash
.filter(e=>e); // remove empty array elements
return arr[arr.length - targetIndex];
}
Let's test it out quickly for different cases
[
"http://domain.tld/foo/bar/boo",
"http://www.domain.tld/foo/bar/boo",
"http://sub.domain.tld/foo/bar/boo",
"http://www.sub.domain.tld/foo/bar/boo",
"http://domain.tld/foo/bar/boo/",
".../bar/boo"
].forEach(e => {
console.log({ input: e, output: getLastParam(e, 1) });
});
This yields the following,
{input: "http://domain.tld/foo/bar/boo", output: "boo"}
{input: "http://www.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://sub.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://www.sub.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://domain.tld/foo/bar/boo/", output: "boo"}
{input: ".../bar/boo", output: "boo"}
If you want bar, then use 2 for targetIndex instead. It will get the second last. In which case, getLastParam(str, 2) would result in bar.
Speed stuff
Here is a small benchmark: http://jsbench.github.io/#a6bcecaa60b7d668636f8f760db34483
getLastParamNormal: 5,203,853 ops/sec
getLastParamRegex: 6,619,590 ops/sec
Well, it doesn't matter. But nonetheless, it's interesting.
Split and slice will do that as simple as this, where split('/') creates an array and slice(-2)[0] will pick the first [0] of the last two (-2).
With replace(/\/$/, "") you get rid of any trailing slash (shown in the 4th sample below).
Stack snippet
var x = 'http://domain.tld/foo/bar/boo'
console.log( x.split('/').slice(-2)[0] );
var x = 'http://www.sub.domain.tld/foo/bar/boo'
console.log( x.split('/').slice(-2)[0] );
var x = 'http://www.domain.tld/foo/bar/boo'
console.log( x.split('/').slice(-2)[0] );
// and this one will trim trailing slash
var x = 'http://www.domain.tld/foo/bar/boo/'
console.log( x.replace(/\/$/, "").split('/').slice(-2)[0] );
Or maybe just strip the trailing slash, reverse the array and get the 2nd item ([1], as arrays are zero-based)
var x = 'http://www.domain.tld/foo/bar/boo/'
console.log( x.replace(/\/$/, "").split('/').reverse()[1] );
You don't need regex. Anchor elements have an API that breaks down the URL for you. You can then split the pathname to get the path
function parse(path) {
let a = document.createElement('a');
a.href = path;
return a.pathname.split('/')[2];
}
console.log(parse('http://domain.tld/foo/bar/boo'));
console.log(parse('http://www.domain.tld/foo/bar/boo'));
console.log(parse('http://sub.domain.tld/foo/bar/boo'));
console.log(parse('http://www.sub.domain.tld/foo/bar/boo'));
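Where no DOM is available (for example in Node.js), the built-in URL class exposes the same pathname property as an anchor element. The sketch below assumes the input is always an absolute URL, since new URL() throws otherwise; secondPathSegment is a hypothetical name.

```javascript
// Hypothetical helper: returns the second path segment of an absolute URL.
function secondPathSegment(url) {
  const segments = new URL(url).pathname.split("/").filter(Boolean);
  return segments[1];
}

console.log(secondPathSegment("http://domain.tld/foo/bar/boo"));
console.log(secondPathSegment("http://www.sub.domain.tld/foo/bar/boo/"));
```

Filtering out the empty strings means a leading or trailing slash does not shift the index.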
This is actually the first time I have encountered this problem.
I'm trying to parse a string for key-value pairs where the separator can be one of several characters. It works fine in every regex tester I tried, but not in my current JS project. I already found out that JS regex works differently than, for example, PHP's, but I couldn't find what to change in mine.
My regex is the following:
[(\?|\&|\#|\;)]([^=]+)\=([^\&\#\;]+)
it should match:
#foo=bar#site=test
MATCH 1
1. [1-4] `foo`
2. [5-8] `bar`
MATCH 2
1. [9-13] `site`
2. [14-18] `test`
and the JS is:
'#foo=bar#site=test'.match(/[(\?|\&|\#|\;)]([^=]+)\=([^\&\#\;]+)/g);
Result:
["#foo=bar", "#site=test"]
For me it looks like the grouping is not working properly.
Is there a way around this?
When the regex has the g flag, String#match returns only the full matches and leaves out the capture groups. Instead, you loop through regex.exec:
var match;
while (match = regex.exec(str)) {
// Use each result
}
If (like me) that assignment in the condition bothers you, you can use a !! to make it clearly a test:
var match;
while (!!(match = regex.exec(str))) {
// Use each result
}
Example:
var regex = /[(\?|\&|\#|\;)]([^=]+)\=([^\&\#\;]+)/g;
var str = '#foo=bar#site=test';
var match;
while (!!(match = regex.exec(str))) {
console.log("result", match);
}
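In environments that support ES2020, String.prototype.matchAll offers the same iteration without the manual exec loop. The sketch below is one possible way to collect the pairs into an object; the names pairRegex and pairs are made up for the illustration, and the character class is simplified to [?&#;] on the assumption that the parentheses and pipes in the original class were not meant to be matched literally.

```javascript
// Simplified pattern (assumption): separator, then key, "=", then value.
const pairRegex = /[?&#;]([^=]+)=([^&#;]+)/g;

const pairs = {};
for (const match of "#foo=bar#site=test".matchAll(pairRegex)) {
  // match[1] is the key, match[2] is the value
  pairs[match[1]] = match[2];
}

console.log(pairs);
```

matchAll requires the g flag on the regex and throws a TypeError without it.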
I wouldn't rely on a complex regex, which is hard to read and could fail for strange reasons. Instead, use simple functions to split the string:
var str = '#foo=bar#site=test'
// split it by # and do a loop
str.split('#').forEach(function(splitted){
// split by =
var splitted2 = splitted.split('=');
if(splitted2.length === 2) {
// here you have
// splitted2[0] = foo
// splitted2[1] = bar
// and in the next iteration
// splitted2[0] = site
// splitted2[1] = test
}
});
Given:
This is some text which could have
line breaks and tabs before and after {code}
and I want them {code} to be replaced {code}in pairs{code} without any issues.
I want:
This is some text which could have
line breaks and tabs before and after <code>
and I want them </code> to be replaced <code>in pairs</code> without any issues.
JsFiddle: http://jsfiddle.net/egrwD/1
Simple working text sample:
var sample1 = 'test test test {code}foo bar{code} {code}good to know{code}';
var regEx1 = new RegExp('(\{code\})(.*?)(\{code\})', 'gi');
var r1 = sample1.replace(regEx1, '<code>$2</code>');
Gives:
test test test <code>foo bar</code> <code>good to know</code>
Non working sample:
var sample2 = 'test test test {code}\tfoo bar{code} {code}\r\ngood to know{code}';
var regEx2 = new RegExp('(\{code\})(.*?)(\{code\})', 'gi');
var r2 = sample2.replace(regEx2, '<code>$2</code>');
Gives:
test test test {code} foo bar{code} {code}
good to know{code}
Looks like you just need to make the pattern match across line breaks, properly escape that first {, and use a regex literal to fix the need to double-escape backslashes in the string:
/(\{code\})([\s\S]*?)(\{code\})/gi
http://jsfiddle.net/mattball/QNak5
Note that you don't even need the capturing parentheses around the {code}s:
/\{code\}([\s\S]*?)\{code\}/gi
http://jsfiddle.net/mattball/Jk5cr
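To check the simplified pattern against the failing input from the question, a quick sketch could look like this (the variable names are made up for the example):

```javascript
// Input with a tab and a line break inside the {code} pairs
const sample = "test {code}\tfoo bar{code} {code}\r\ngood to know{code}";

// [\s\S] matches any character, including line breaks, which . does not
const replaced = sample.replace(/\{code\}([\s\S]*?)\{code\}/gi, "<code>$1</code>");

console.log(replaced);
```

The lazy quantifier *? keeps each match as short as possible, so the markers are consumed in pairs.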
my code:
var test = "aa";
test += "ee";
alert(test);
Prints out "aaee"
How can I do the same thing, but add the string not to end, but start:
Like this:
var test = "aa";
test = "ee" + test;
This is the long way, but is there some kind of shorter way, as in the first example?
What I want is to avoid writing the variable name again on the right-hand side of the assignment.
There's no built-in operator that allows you to achieve this as in the first example. Also test = "ee" + test; seems pretty self explanatory.
You could do it this way ..
test = test.replace (/^/,'ee');
disclaimer: http://xkcd.com/208/
You have a few possibilities although not a really short one:
var test = "aa";
// all yield eeaa
result = "ee" + test;
result = test.replace(/^/, "ee");
var test = "aa";
alert('ee'.concat(test));
What you have, test = "ee" + test; seems completely fine, and there's no shorter way to do this.
If you want a regex-based alternative, you can do something like:
test = test.replace(/^/, "ee");
There are a whole lot of ways you can achieve this, but test = "ee" + test; seems the best imo.
With String.prototype.padStart() you can add a string at the start (repeated, if needed) until the resulting string reaches the length you want. Note that the first argument is the target length of the result, not the number of characters to add:
var test = "aa";
test = test.padStart(4, "e");
alert(test);
Prints out
eeaa
var test = "aa";
test = test.padStart(5, "e");
alert(test);
Prints out
eeeaa
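Because padStart() pads up to a target length, it only behaves like a prepend when the lengths happen to line up. The short sketch below contrasts the two; prepend is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: always puts prefix in front of value.
function prepend(prefix, value) {
  return prefix + value;
}

const viaPrepend = prepend("ee", "aaaa");    // prefix is always added
const viaPadStart = "aaaa".padStart(4, "e"); // already 4 characters long, so nothing is added
const shortInput = "a".padStart(4, "e");     // padded with three characters to reach length 4

console.log(viaPrepend, viaPadStart, shortInput);
```

So padStart() is a fit when the goal is a fixed width, while plain concatenation stays the right tool for adding a fixed prefix.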