regex exclude certain tags - javascript

I just need some quick help solving this problem.
I want to strip all HTML tags out of a string except the tags from a whitelist (a variable).
My code so far:
whitelist = 'p|br|ul|li|strike|em|strong|a',
reqExp = new RegExp('<\/?[^>|' + whitelist + ']+\/?>');
It works more or less, but it fails to remove, for example, b, because that matches the b of br in the whitelist.
I tried different approaches but can't find the right solution.
How can I tell the regex to do something like /.WITHOUT(smth)/ (that is: match everything except what follows)?

Use this regex:
<(?!/?(p|br|ul|li|strike|em|strong|a)(>|\s))[^<]+?>
LIVE DEMO
For more information, refer to my earlier answer, which fulfills your requirement.
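To see how the negative lookahead behaves, here is a minimal sketch that builds the same pattern from the whitelist variable. The helper name stripTags and the sample input are made up for illustration; they are not from the question.

```javascript
// Hypothetical helper: builds the answer's pattern from a pipe-separated whitelist.
function stripTags(html, whitelist) {
  // (?!...) rejects a "<" that is followed by an optional "/" and a
  // whitelisted tag name that ends in ">" or whitespace.
  // Requiring ">" or whitespace after the name is what stops "b" from
  // being mistaken for "br".
  const re = new RegExp('<(?!/?(' + whitelist + ')(>|\\s))[^<]+?>', 'g');
  return html.replace(re, '');
}

const whitelist = 'p|br|ul|li|strike|em|strong|a';
const input = '<p>Hello <b>bold</b> and <em>em</em><br> <span class="x">text</span></p>';

console.log(stripTags(input, whitelist));
// <p>Hello bold and <em>em</em><br> text</p>
```

One limitation of the pattern as written: a self-closing tag such as <br/> is stripped, because the name is followed by "/" rather than ">" or whitespace.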

Related

What RegEx to use to include all subroutes except for specific urls

I'm terrible at regex and I could use some help in building a regular expression so that I can target all subroutes on a specific domain and at the same time exclude a couple of specific subroutes.
The regex is to be used in JavaScript (as page targeting within the Optimizely software).
Should allow:
www.mydomain.com/**/*
www.mydomain.com/foo/**/*
Should not allow
www.mydomain.com/foo/bar/**/*
www.mydomain.com/baz/**/*
The part I am struggling with most is allowing everything, including everything under /foo/..., except when it is under /foo/bar/..., while also excluding anything under /baz/....
Any help is much appreciated, thank you in advance!
You can use a negative lookahead assertion to exclude specific patterns:
^www\.mydomain\.com\/(?!(?:foo\/bar|baz)\/).*\/.*
Demo: https://regex101.com/r/w6MQA0/1
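A quick way to check the lookahead pattern is to run it over a few paths. The concrete paths below are invented stand-ins for the **/* placeholders in the question.

```javascript
// The pattern from the answer: after the domain and slash, the negative
// lookahead refuses anything that starts with "foo/bar/" or "baz/".
const allowed = /^www\.mydomain\.com\/(?!(?:foo\/bar|baz)\/).*\/.*/;

// Hypothetical sample URLs (not from the question).
const urls = [
  'www.mydomain.com/shop/item',
  'www.mydomain.com/foo/shop/item',
  'www.mydomain.com/foo/bar/shop/item',
  'www.mydomain.com/baz/shop/item'
];

const results = urls.map(function (u) { return allowed.test(u); });
console.log(results); // [ true, true, false, false ]
```

Note that the trailing .*\/.* requires at least one more slash after the domain, so a single-segment path such as www.mydomain.com/about does not match.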
Use this (www.mydomain.com\/)(([a-z]+\/)*(foo\/))?\*\*\/\*. It should work.
It's working in this scenario:
`www.mydomain.com/**/*`
or
`www.mydomain.com/<any params may or may not be>/foo/**/*`
Code:
// No g flag here: with g, test() keeps lastIndex between calls, so the second call would return false.
var regx = /(www\.mydomain\.com\/)(([a-z]+\/)*(foo\/))?\*\*\/\*/;
var ar = ['www.mydomain.com/**/*', 'www.mydomain.com/foo/**/*','www.mydomain.com/foo/bar/**/*','www.mydomain.com/baz/**/*'];
regx.test(ar[0]) // true
regx.test(ar[1]) // true
regx.test(ar[2]) // false
regx.test(ar[3]) // false
Demo: https://regex101.com/r/05vUz8/1
Other regexes for reference:
https://regex101.com/r/NoDI87/1
https://regex101.com/r/HFaQo0/1
Thanks for the replies, they helped me in finding the solution myself:
domain\.com((?=\/foo)|(?!\/foo\/bar\/|\/baz\/)).*

Regex expression to match certain url behavior in my website

I have the following url
https://myurl/blogs/<blog-category>/<blog-article>
I'm trying to create a regex so I can trigger a script only when I'm on an article page.
I tried this, among other tests, but it didn't work, and I'm not really the best at building regexes.
window.location.pathname.match(/\/blogs\/^[a-zA-Z0-9_.-]*$\/^[a-zA-Z0-9_.-]*$/
As I understand it, the first part of this regex (\/blogs\/) just matches a fixed string.
The next parts just try to match any combination of digits, letters and _.- (which covers the strings that can appear there).
However, this is not working at all.
My piece of script is looking like this
if(window.location.pathname.match(/\/blogs\/^[a-zA-Z0-9_.-]*$\/^[a-zA-Z0-9_.-]*$/){
// A code implementation here
}
Note: One thing that i noticed when writing this is that if i remove everything and just try
window.location.pathname.match(/\/blogs\/)
It doesn't work either.
Can someone help me solve this? I will also appreciate any guide that can help me improve my RegEx skills.
Thanks!
Update: to have this working i had to separate my condition into two things to get it to work properly.
It ended up looking like this:
var path = window.location.pathname;
const regEx = /\/blogs\/[a-zA-Z0-9_.-]*\/[a-zA-Z0-9_.-]*/i;
if(path.match(regEx)){
// My code here
}
This should work:
\/blogs\/[a-zA-Z0-9_.-]*\/[a-zA-Z0-9_.-]*
The "^" symbol asserts the start of the string and "$" asserts the end; in the middle of the pattern, after /blogs/, neither can ever be true, so the match always fails.
I would suggest using https://regexr.com/ for testing your regex to rule out issues coming from other code.
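The effect of the stray anchors can be checked directly. This is a small sketch; the sample pathname and the stricter variant at the end are assumptions for illustration, not part of the original answers.

```javascript
// Pattern from the question: "^" and "$" sit in the middle, so it never matches.
const broken = /\/blogs\/^[a-zA-Z0-9_.-]*$\/^[a-zA-Z0-9_.-]*$/;
// Pattern from the answer: anchors removed.
const fixed = /\/blogs\/[a-zA-Z0-9_.-]*\/[a-zA-Z0-9_.-]*/;
// Assumed stricter variant: anchored at both ends and requiring a
// non-empty category and article segment.
const strict = /^\/blogs\/[a-zA-Z0-9_.-]+\/[a-zA-Z0-9_.-]+\/?$/;

// Hypothetical pathname standing in for window.location.pathname.
const path = '/blogs/tech/my-first-post';

console.log(broken.test(path));            // false
console.log(fixed.test(path));             // true
console.log(fixed.test('/blogs/tech/'));   // true (the article part may be empty)
console.log(strict.test('/blogs/tech/'));  // false
console.log(strict.test(path));            // true
```

If the script must only run on article pages and not on category pages, the anchored variant is the safer choice, since the unanchored one also accepts a category URL with a trailing slash.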
var patt = /\/blogs\/[a-zA-Z0-9_.-]*\/[a-zA-Z0-9_.-]*/i;
window.location.pathname.match(patt);
You can try using this

Regex to exclude specific websites in javascript?

I have a regex that matches all websites, but I want to exclude two specific websites from it.
Regex is
[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)
Websites I want to exclude are
www.gfycat.com
www.imgur.com
imgur.com/*
gfycat.com/*
Is it possible to write a regex which excludes the specific websites? Any suggestions on how to solve this problem?
/[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)/
I have attached the screenshot for match patterns.
This needs to be implemented as a regex, because I am using RES.
Try this
^(?:(?!(?:www\.)?(?:google|gfycat|imgur))[-a-zA-Z0-9#:%._\+~#=]{2,256})\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)
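To get a feel for what this pattern accepts and rejects, here is a small check. The sample hosts are made up, and the "/" inside the last character class is escaped so the pattern is safe as a regex literal.

```javascript
// The answer's pattern: the negative lookahead is evaluated once, at the
// very start of the string, and rejects hosts beginning with an optional
// "www." followed by google, gfycat or imgur.
const re = /^(?:(?!(?:www\.)?(?:google|gfycat|imgur))[-a-zA-Z0-9#:%._\+~#=]{2,256})\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&\/=]*)/;

// Hypothetical sample inputs.
const samples = [
  'www.example.com/page',
  'www.gfycat.com',
  'imgur.com/gallery/abc',
  'sub.imgur.com'
];

const results = samples.map(function (s) { return re.test(s); });
console.log(results); // [ true, false, false, true ]
```

The last sample shows a limit of this approach: because the lookahead only inspects the start of the string, a subdomain such as sub.imgur.com still matches.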
Not sure why you need a regex for this. Can you not do something simple like the below, unless I have misunderstood completely?
var url = new URL('https://www.google.co.uk/?gfe_rd=cr&ei=bN5IWaP7CYyDtAHKv4CIBg#q=hello');
if (url.hostname === 'www.google.com') {
  // ignore
} else {
  // process
}
The answer is not relevant to the specific question as OP is using a different tool

Only match regex if it doesnt start with a pattern in javascript

I have a bit of a strange one here, I basically have a large chunk of text which may or may not contain links to images.
So let's say it does. I have a pattern which extracts the image URL fine; however, once a match is found, it is replaced with an <img> element with the link as the src. The problem is that there may be multiple matches within the text, and this is where it gets tricky: the URL pattern will now also match the URL inside the src attribute, which basically just enters an infinite loop.
So is there a way to ONLY match in regex if it doesn't start with a pattern like ="|=' ? Then it would match the URL in something like:
some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6
but not
some image <img src="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6">
I am not sure if it is possible, but if it is could someone point me in the right direction? A replace by itself will not suffice in this scenario as the url matched needs to be used elsewhere too so it needs to be used like a capture.
The main scenarios I need to account for are:
Many links in one block of varied text
A single link without any other text
A single link with other varied text
== edit ==
Here is the current regex I am using to match urls:
(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
== edit 2 ==
Just so everyone understands why I cannot use the /g command here is an answer which explains the issue, if I could use this /g like I originally tried then it would make things a lot simpler.
Javascript regex multiple captures again
What you are looking for is a negative lookbehind. JavaScript did not support lookbehind of any kind when this was written (it was added in ES2018), so in older engines you will either have to use a callback function to check what was matched and make sure it is not preceded by a ' or ", or you can use the following regex:
(?:^|[^"'])(\b(https?|ftp|file):\/\/[-a-zA-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
which has a single problem, that is in the case of a successful match it will catch one more character, the one right before the (\b(https?|ftp|file) pattern in the input, but I think you can deal with this easily.
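One way to deal with the extra leading character is to capture it and put it back in the replacement. This is a sketch of that idea, not the answerer's code: the prefix group is made capturing, the inner scheme group is made non-capturing, and the sample text is invented.

```javascript
// Group 1: the character before the URL (or the start of the string).
// Group 2: the image URL itself.
const re = /(^|[^"'])(\b(?:https?|ftp|file):\/\/[-a-zA-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/g;

// Hypothetical input: one bare URL and one URL already inside an img tag.
const text = 'a http://example.com/a.png b <img src="http://example.com/b.png">';

// "$1" re-emits the consumed prefix character, "$2" is the URL.
const out = text.replace(re, '$1<img src="$2">');

console.log(out);
// a <img src="http://example.com/a.png"> b <img src="http://example.com/b.png">
```

The URL that is already quoted inside src="..." is left alone, because the character before it is a double quote and so the prefix group cannot match.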
Regex101 Demo
Using the /ig command at the end should work... the g is for global replace and the i is for case-insensitivity, which is necessary as you've only got A-Z instead of a-zA-Z.
Using the following vanilla JS appears to work for me (see jsfiddle)...
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
What it does highlight, though, is that the query string part of the URL (the ?v=6) is not being picked up by your regex.
For jQuery, it would be (see jsfiddle)...
$(document).ready(function(){
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
$("#output").html(test.replace(re,"<img src=\"$1\"/>"));
});
Update
Just in case my example of using the same image URL in the example doesn't convince you - it also works with different URLs... see this jsfiddle update
var test="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 http://cdn.sstatic.net/serverfault/img/sprites.png?v=7";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
Couldn't you just check for whitespace in front of the URL, instead of that word boundary? It seems to work, although you will have to remove the matched whitespace later.
(\s(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
http://rubular.com/r/9wSc0HNWas
Edit: Damn, too slow :) I'll still leave this here as my regex is shorter ;)
As freefaller said, you could use the /g flag to find all matches in one go, if exec is not a must.
Otherwise, you can add (="|=')? to the beginning of your regex and check whether $1 is undefined. If it is undefined, the match did not start with a ="|=' pattern.
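The optional-prefix idea can be sketched with an exec loop as follows. The function name findBareImageUrls and the sample text are hypothetical; the URL pattern is the one from the question with the prefix group added in front.

```javascript
// Hypothetical helper: returns image URLs that are NOT preceded by =" or ='.
function findBareImageUrls(text) {
  // Group 1: optional =" or =' prefix. Group 2: the URL.
  // The regex is created inside the function so lastIndex starts at 0 each call.
  const re = /(="|=')?\b((?:https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/gi;
  const urls = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // When the prefix did not take part in the match, group 1 is undefined.
    if (m[1] === undefined) {
      urls.push(m[2]);
    }
  }
  return urls;
}

const sample = 'see http://example.com/a.png and <img src="http://example.com/b.png"> ' +
  "plus <img src='http://example.com/c.gif'>";

console.log(findBareImageUrls(sample)); // [ 'http://example.com/a.png' ]
```

Because a quoted URL is consumed together with its prefix, the engine never gets a second chance to match it as a bare URL.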

javascript regex replace some words with links, but not within existing links

Trying to replace certain words in HTML pages with the same word but as a URL linking to that resource.
For example, replace the word 'MySQL' with the same word as a link (MySQL, pointing to mysql.com).
Using the JS replace function with regex, and it's doing the replacing just fine.
BUT it's also replacing words that are already part of URLs... which is the problem.
For the MySQL example, it's replacing BOTH the "MySQL" text that's already linked, AND the URL leading to mysql.com, so breaking the already existing link.
Is there a way to update the inline regex (in the .replace call) to NOT do replacing in existing links, i.e. <a> elements?
Here's the replace code:
var NewHTML = OriginalHTML
.replace(/\bJavaScript\b/gi, "$&")
.replace(/\bMySQL\b/gi, "$&")
;
Here's the full sample code (tried to paste it inline but wasn't looking right with the backticks):
http://pastie.org/private/v4l2s2c42aqduqlopurpw
Went through the JS regexp reference (here), and tried various other permutations in the regex matching, like the following, but all that does is make it not match ANY words on the page...
.replace(/\b(\<a\>*!\>)JavaScript\b/i,xxxxx
The following regex DOES prevent the match from happening wherever the word is literally touching a slash or a dash... but that's not the solution (and it does not fix the mysql example above):
.replace(/\b(?!\>)(?!\-)(?!\/)MySQL\b(?!\-)(?!\/)/gi, "$&")
I've read through the related threads on stackoverflow and elsewhere, but can't seem to find this particular scenario, not in JavaScript anyway.
Any help would be greatly appreciated. :-)
Thanks!
You could change your regex to exclude keywords that precede the end anchor tag, </a>:
.replace(/\bMySQL\b(?![^<]*?<\/a>)/gi, "$&")
See jsfiddle for example.
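Here is a minimal sketch of how that lookahead behaves. The sample HTML and the link target in the replacement string are assumptions for illustration (the original replacement markup is not shown above).

```javascript
// Skip "MySQL" when only non-"<" characters separate it from a closing </a>,
// i.e. when it sits inside an existing link's text or inside its href.
const re = /\bMySQL\b(?![^<]*?<\/a>)/gi;

// Hypothetical page fragment: one plain mention, one existing link.
const html = '<p>MySQL is great. See <a href="https://www.mysql.com/">the MySQL site</a>.</p>';

// Assumed replacement: wrap the matched word ($&) in a link.
const out = html.replace(re, '<a href="https://www.mysql.com/">$&</a>');

console.log(out);
// <p><a href="https://www.mysql.com/">MySQL</a> is great. See <a href="https://www.mysql.com/">the MySQL site</a>.</p>
```

One limitation: if the existing link contains nested markup after the keyword, for example <a href="#">MySQL <b>docs</b></a>, the lookahead hits the "<" of the nested tag before reaching </a>, so that occurrence is still replaced.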
A negative lookahead should be sufficient:
.replace(/\bMySQL(?!\.com)\b/gi, "$&")
