I have a string (content) that contains full YouTube URLs in some places and just the video ID in others.
I need to replace the full-length YouTube URLs with the video ID only.
For example, with this "content" variable:
var content = '{GENERICO:type="youtube",id="DluFA_AUjV8"}{GENERICO:type="youtube",id="https://youtu.be/DluFA_AUjV8"}';
var myRegex = /{GENERICO:type="youtube",id=".*?(?:youtube\.com|youtu\.be)\/(?:embed\/|watch\?v\=)?([^\&\?\/\"]+).*?["&\?]}/gi;
content = content.replace(myRegex, '{GENERICO:type="youtube",id="$1"}' );
console.log(content);
The result I want to achieve (in the example) is:
{GENERICO:type="youtube",id="DluFA_AUjV8"}{GENERICO:type="youtube",id="DluFA_AUjV8"}
What I actually get is the following:
{GENERICO:type="youtube",id="DluFA_AUjV8"}
For some reason, it removes one of the strings in the content.
I can't figure out whether it's a JavaScript issue, a regex issue, or something else I'm doing wrong.
Use content.replace(new RegExp('https://youtu.be/','g'), '') for a global replace.
var content = '{GENERICO:type="youtube",id="DluFA_AUjV8"}{GENERICO:type="youtube",id="https://youtu.be/DluFA_AUjV8"}';
console.log(content.replace(new RegExp('https://youtu.be/','g'), ''))
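As for why the original pattern swallows a token: the `.*?` right after `id="` may run past the closing `"}` of the first token, so a single match starts at the first `{GENERICO` and ends at the end of the second one. A minimal sketch of one possible fix, assuming the id value never contains a double quote, is to replace `.*?` with `[^"]*` so that a match cannot leave the current token. The function name `normalizeYoutubeIds` is hypothetical.

```javascript
// Hypothetical helper: rewrites full YouTube URLs inside GENERICO tokens to bare video IDs.
// Assumption: the id value never contains a double quote.
function normalizeYoutubeIds(content) {
  // [^"]* keeps each match inside a single id="..." value,
  // so a token that already holds a bare ID is left untouched.
  const re = /\{GENERICO:type="youtube",id="[^"]*?(?:youtube\.com|youtu\.be)\/(?:embed\/|watch\?v=)?([^&?\/"]+)[^"]*"\}/gi;
  return content.replace(re, '{GENERICO:type="youtube",id="$1"}');
}

const sample =
  '{GENERICO:type="youtube",id="DluFA_AUjV8"}' +
  '{GENERICO:type="youtube",id="https://youtu.be/DluFA_AUjV8"}';

console.log(normalizeYoutubeIds(sample));
// {GENERICO:type="youtube",id="DluFA_AUjV8"}{GENERICO:type="youtube",id="DluFA_AUjV8"}
```

Unlike a plain global replace of `https://youtu.be/`, this also covers the `youtube.com/watch?v=` and `youtube.com/embed/` forms that the original pattern was aiming at.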
I have a system that dynamically generates links, but the HTML links are displayed like this:
Page Example
Is there a way to remove the repeated <a> tags using JS, so that the link becomes:
Page Example
Let's take a look at your url:
var url='Page Example';
First let's get rid of both occurrences of "
url=url.replace(/"/g,'');
Now remove the first occurrence of </a> by feeding the exact string instead of a regular expression to the .replace method.
url=url.replace('</a>','');
At this point your url looks like this:
Page Example
We're getting closer. Let's remove anything in between the > and the " by
url=url.replace(/\>(.*)\"/,'"');
which gives us
Page Example
Almost done - finally let's get rid of "<a href=
url=url.replace('"<a href=','"');
To make the whole thing a bit more beautiful we can chain all four operations:
var url = 'Page Example';
url = url.replace(/"/g, '').replace('</a>', '').replace(/\>(.*)\"/, '"').replace('"<a href=', '"');
console.log(url);
Within your process you can use regex to extract the url from the href string:
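// single quotes outside, so the double quotes inside the HTML do not end the string
const string = '<a href="/page-example">Page Example</a>';
// match the path that starts with "/" and ends just before the closing double quote
const url = string.match(/\/[\w-]*(?=")/)[0];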
console.log(url);
Yes, using the string split() function like this...
var S = '<a href="/page-example">Page Example</a>';
// split on double quotes; the href value is the second piece
var A = S.split('"');
document.write(A[1]);
This should display "/page-example", and you can then add it as the href to an anchor.
You can retrieve the href value, which seems to hold the correct <a> element, and replace the incorrect element with the correct one:
const a = document.querySelector('a[href]'); //if you have more <a> elements replace it to your need
const attr = a.getAttribute('href'); //get the value of 'href' attribute
const temp = document.createElement('template');
temp.innerHTML = attr; //create the new A element from the 'href' attribute's value
const newA = temp.content.children[0]; //retrieve the new <a> element from the template
a.parentElement.replaceChild(newA, a); //replace the incorrect <a> element with the new one
Page Example
I am trying to find a regex for links that satisfies some conditions. Here is the scenario:
I have created utils.ts, a TypeScript file. It takes an API response as input and returns HTML-formatted text: bold text, emails, images, links.
Here is one scenario I am facing.
This is what utils.ts returns:
https://www.google.com Click here
(Note: normal links and 'a' tag links can occur in any order)
As you can see from the above text, the part Click here is already formatted as HTML.
So I get the following output in the GUI:
https://www.google.com Click here
From this point, I want a regex that formats https://www.google.com but does not touch Click here, as it is already formatted.
I also want to format https://www.google.com as follows:
Google
The main problem is that when I replace strings starting with 'https://..' with tags, the links inside 'href' are replaced as well, like this:
Google Google">Click me</a>
Which is what I don't want.
Please share your thoughts on this.
Thank you
Not yet formatted links can be found using alternations. The idea is - if a link is formatted it's not captured to a group (don't be confused that the regex still finds something - you should only look at Group 1). Otherwise, the link is captured to a group.
The regex below is really simple, just to explain the idea. You might want to update it with a better URL search pattern.
(?:href="https?\S+")|(https?\S+)
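The alternation idea can be sketched in JavaScript with a replacer function: when Group 1 is empty, the match was an already formatted `href="..."` and is returned unchanged; otherwise the bare URL is wrapped in a tag. The function name `linkifyBareUrls` and the simple URL pattern are assumptions for illustration, and the sketch only recognizes double-quoted `href` attributes.

```javascript
// Hypothetical helper: wraps bare http(s) URLs in <a> tags and leaves
// URLs that already sit inside a double-quoted href attribute alone.
function linkifyBareUrls(html) {
  // First alternative consumes formatted links without capturing them;
  // second alternative captures a bare URL into group 1.
  const re = /href="https?[^"\s]*"|(https?:\/\/[^\s<"]+)/g;
  return html.replace(re, (match, url) =>
    url ? `<a href="${url}">${url}</a>` : match
  );
}

const input =
  'https://www.google.com <a href="https://www.google.com">Click here</a>';

console.log(linkifyBareUrls(input));
// <a href="https://www.google.com">https://www.google.com</a> <a href="https://www.google.com">Click here</a>
```

A URL used as the visible text of an existing link would still be wrapped, so a real formatter probably needs a stricter pattern or a DOM-based pass.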
If I understood correctly, you want to extract from the text those web addresses that appear in the text and are not links. If so check out the following javascript:
// the data:
var txt1 = 'https://www.google.com Click here http://other.domain.com';
// strip HTML tags (each tag is replaced by a space)
String.prototype.stripHTML = function () {
    var reTag = /<(?:.|\s)*?>/g;
    return this.replace(reTag, " ");
};
var txt2 = txt1.stripHTML();
//console.log(txt2);
// split into whitespace-separated tokens
var regex1 = /\s/;
var tokens = txt2.split(regex1);
//console.log(tokens);
// build an address table from the tokens that start with http:// or https://
var regex2 = /^https?:\/\/.*/;
var addresses = [];
for (var i = 0; i < tokens.length; i++) {
    if (regex2.test(tokens[i])) {
        addresses.push(tokens[i]);
    }
}
console.log(addresses);
I have the following string:
"Site is <a href='javascript:;' xid='01' gid='02' rid='03' >TEST</a> is here "
This is a string with what seems like an 'a' tag inside. I need to get the 'xid', 'gid', and 'rid' values.
If I understand correctly, you want to get the attributes xid, gid, and rid.
You need to get the link element and then read its attributes. For example:
let link = document.querySelector('a');
if (link !== null) {
    let xid = link.getAttribute('xid');
    let gid = link.getAttribute('gid');
    let rid = link.getAttribute('rid');
}
This is faster than using a regex.
If you only have the string, you can add it to the DOM and then read the attributes.
Using a regex is not the right approach: your code will not be flexible, and it can introduce bugs.
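Where no DOM is available (for example in Node.js without an HTML parser), a narrowly scoped regular expression is one possible fallback. The sketch below is an illustration only and has the fragility mentioned above: it assumes the attribute names are plain identifiers and that values are quoted with matching single or double quotes. The function name `extractAttributes` is hypothetical.

```javascript
// Hypothetical fallback for environments without a DOM.
// Assumptions: names are plain identifiers; values are quoted with ' or ".
function extractAttributes(str, names) {
  const result = {};
  for (const name of names) {
    // name, optional spaces around "=", then a quoted value closed by the same quote
    const re = new RegExp('\\b' + name + '\\s*=\\s*([\'"])(.*?)\\1');
    const m = re.exec(str);
    result[name] = m ? m[2] : null; // null when the attribute is missing
  }
  return result;
}

const snippet =
  "Site is <a href='javascript:;' xid='01' gid='02' rid='03' >TEST</a> is here ";

console.log(extractAttributes(snippet, ['xid', 'gid', 'rid']));
// { xid: '01', gid: '02', rid: '03' }
```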
Firstly I've looked at a lot of posts on Stackoverflow but I don't see one which seems to be the definitive way. Someone always seems to find a flaw in the regex.
I already have retrieved my tweets and obviously they can contain any number of hashtags in each one.
Say I have an array of possible hashtags that I want to find: ["#ENGLAND","#IRELAND","#wales"] etc.
What is a RELIABLE way to check if a tweet contains these hashtags? I don't want to call the API again; I only want to check my existing tweets, because I'm clicking on buttons to change the filtering on the fly and want to avoid the rate limit if users keep clicking around for ages.
EDIT:
Example: Here is a tweet that contains #ENGLAND and #someothertag
I want to search all the tweets and just show the tweets that CONTAIN one or more of my array of tags, I already cache the tweets, I don't want to make a call containing any tags just filter the existing results!
Why only hashify particular hashtags (which you need to specify and then maintain) when you can hashify any hashtag?
I usually use something like this:
var hashregex = /#([a-z0-9_\-]+)/gi;
// text holds the tweet text; every hashtag in it becomes a search link
text = text.replace(hashregex, function (value) {
    return '<a target="_blank" href="http://twitter.com/#!/search?q=' + value.replace('#', '%23') + '">' + value + '</a>';
});
Then you can just use text when you set the content to the processed tweets
You could store the hashtags from the entities on the element, for instance
<div class='tweet' data-hashtags='england ireland'>some_tweet</div>
And filter like this when someone clicks your button:
$('div.tweet').hide();
$('div.tweet[data-hashtags~="ireland"]').show();
It's obviously greatly simplified, but the general approach should help you avoid having to parse out the tags yourself
// match a #, followed by "england", "ireland", or "wales" (case-insensitive)
var myregexp = /#(england|ireland|wales)\b/i;
var match = myregexp.exec(subject); // subject is the tweet text
var result;
if (match != null) {
    result = match[1]; // the matched tag name, as written in the tweet
} else {
    result = "";
}
If you don't know the names of the hashtags beforehand,
replace
var myregexp = /#(england|ireland|wales)\b/i;
with
var myregexp = /#(\w+)/; // Use this instead
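To filter already cached tweets against an array of tags without calling the API again, one option is to build a single case-insensitive pattern from the array and run it over each tweet. This is a minimal sketch, assuming the tweets are cached as plain strings; the function names `buildHashtagMatcher` and `filterTweets` are hypothetical.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds one case-insensitive pattern from tags like "#ENGLAND".
function buildHashtagMatcher(tags) {
  const names = tags.map((t) => escapeRegExp(t.replace(/^#/, '')));
  // "#" must start the text or follow a non-word character, and the tag name
  // must not be followed by another word character (so #englandfans is skipped).
  return new RegExp('(?:^|\\W)#(?:' + names.join('|') + ')(?!\\w)', 'i');
}

// Hypothetical helper: keeps the tweets that contain at least one of the tags.
function filterTweets(tweets, tags) {
  if (tags.length === 0) return []; // no tags selected, nothing can match
  const re = buildHashtagMatcher(tags);
  return tweets.filter((tweet) => re.test(tweet));
}

const cachedTweets = [
  'Here is a tweet that contains #ENGLAND and #someothertag',
  'No tags here',
  'Go #Wales!',
  'Visit #englandfans',
];

console.log(filterTweets(cachedTweets, ['#ENGLAND', '#IRELAND', '#wales']));
// [ 'Here is a tweet that contains #ENGLAND and #someothertag', 'Go #Wales!' ]
```

The pattern has no `g` flag on purpose: a global regex keeps state in `lastIndex` between `test` calls, which would make the filter skip matches.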
var whitelist = ['a','div','img', 'span'];
Given a block of HTML code, I want to go through every single tag using JQuery
Then, if that tag is NOT in my whitelist, remove it and all its children.
The final string should now be sanitized.
How do I do that?
By the way, this is my current code to remove specific tags (but I decided I want to do whitelist instead)
// parse once and keep the wrapper, so the removals affect the same elements
var $canvas = $('<div>' + canvas_html + '</div>');
var blacklist = ['script','object','param','embed','applet','app','iframe',
    'form','input','link','meta','title','button','textarea',
    'head','body','kbd'];
blacklist.forEach(function(r){
    $canvas.find(r).remove();
});
canvas_html = $canvas.html();
Try this:
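var whitelist = ['a','div','img', 'span'];
var $wrapper = $('<div>' + canvas_html + '</div>');
$wrapper.find('*').each(function() {
    // remove every element whose tag name is not in the whitelist
    if ($.inArray(this.nodeName.toLowerCase(), whitelist) === -1) {
        $(this).remove();
    }
});
// read the HTML from the wrapper, not from the set of matched descendants
var output = $wrapper.html();
// output contains the HTML with everything except the whitelisted tags stripped out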
try:
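var $canvas = $(canvas);
$canvas.find(':not(' + whitelist.join(', ') + ')').remove();
var output = $canvas.html(); // read from the wrapper; remove() returns the removed elements
The idea is to turn the whitelist array into "el1, el2, el3" format, then use the :not selector to get the elements that are not in the whitelist, then delete them.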
This obviously could be expensive depending on the size of your html and whitelist.
Unfortunately, using jQuery to sanitize HTML in order to prevent XSS is not safe, as jQuery is not just parsing the HTML, but actually creating elements out of it. Even though it doesn't insert these into the DOM, in some cases embedded Javascript will be executed. So, for example, the snippet:
$('<img src="http://i.imgur.com/cncfg.gif" onload="alert(\'gotcha\');"/>')
will trigger an alert.