Remove BBCode with Regex in JavaScript

I am trying to remove BBCode tags, including their attributes and the content between the tags. I'm using this regular expression that I found here. I also tried other regexes I found on Stack Overflow, but they didn't work for me; the one I copy here is the closest.
([[\/\!]*?[^\[\]]*?])
I added a . before *?]) and it matches the text between the tags, but it also matches pokemon and I don't want that.
**Regex**: ([[\/\!]*?[^\[\]].*?])
**Text**: I'm a pokemon master and I like
[TAG] this [/TAG] pokemon [TAG] and this [/TAG] text...
I use this site to test regexes: http://regexpal.com/
Can anyone help me?
Thanks in advance.

str = str.replace(/\[(\w+)[^\]]*](.*?)\[\/\1]/g, '');
jsFiddle.

This is what you want:
.replace(/\[(\w+)[^\]]*](.*?)\[\/\1]/g, '$2');
JavaScript demo
Basically you catch the value between tags and then replace the whole string with that value.
Using a regex to do this isn't a very clean way of doing it though...
Sorry Alex, but it seems you didn't read the question.

This should do:
\[(\w+).*?\].*?\[/\1\]
This will look for a closing tag matching the opening tag - and also accept attributes on the opening tag. The JavaScript code should then be:
str = str.replace(/\[(\w+).*?\].*?\[\/\1\]/g, "");
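The two behaviours discussed in the answers above (dropping the tags together with their content, or dropping only the tags) can be sketched as two small functions. This is only a sketch: the names removeBBCode and unwrapBBCode are hypothetical, [\s\S] is swapped in for . so that content spanning several lines is matched as well, and nested tags of the same name are not handled.

```javascript
// Hypothetical helper: removes each [tag ...]...[/tag] pair together with its content.
function removeBBCode(str) {
  // (\w+) captures the tag name, [^\]]* skips any attributes,
  // [\s\S]*? lazily matches the content, \1 requires the same tag name to close.
  return str.replace(/\[(\w+)[^\]]*\][\s\S]*?\[\/\1\]/g, '');
}

// Hypothetical helper: removes only the tags and keeps the content (group 2).
function unwrapBBCode(str) {
  return str.replace(/\[(\w+)[^\]]*\]([\s\S]*?)\[\/\1\]/g, '$2');
}

const text = "I like [TAG] this [/TAG] pokemon [TAG attr=1] and this [/TAG] text...";
console.log(removeBBCode(text)); // "I like  pokemon  text..."
console.log(unwrapBBCode(text)); // "I like  this  pokemon  and this  text..."
```

The g flag matters here: without it only the first tag pair would be replaced.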

Related

replace \n from tag attribute using javascript regex with empty character

I have a tag like <span style="font-size:10.5pt;\nfont-family:\nKaiTi"> and I want to remove every \n inside the tag (replace it with an empty string).
Note: the tag could be anything (it is not fixed).
I want a regular expression that does this in JavaScript.
You should be able to strip out the \n character before applying this HTML to the page.
Having said that, try this (\\n)
You can see it here: regex101
Edit: A bit of refinement and I have this (\W\\n). It works with the example you provided. It breaks down if you have spaces in the body of the tags (<span> \n </span>).
I've tried everything I know to do. Perhaps someone with more regex experience can assist?
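One possible way around the limitation mentioned above is to match each tag first and strip the newline sequences only inside the matched tag, using a replacement callback. The following is a rough sketch, not a definitive answer: the name stripNewlinesInTags is hypothetical, it assumes no attribute value contains a literal ">", and since the question does not say whether \n is a real line break or the two characters backslash and n, it handles both.

```javascript
// Hypothetical helper: strips newline sequences inside tags only,
// leaving the text between tags untouched.
function stripNewlinesInTags(html) {
  // Assumption: no attribute value contains a literal ">".
  return html.replace(/<[^>]*>/g, (tag) =>
    // \\n matches the two characters backslash + n; \r?\n matches a real line break.
    tag.replace(/\\n|\r?\n/g, '')
  );
}

const input = '<span style="font-size:10.5pt;\\nfont-family:\\nKaiTi"> \\n </span>';
console.log(stripNewlinesInTags(input));
// <span style="font-size:10.5pt;font-family:KaiTi"> \n </span>
```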

Check for HTML tags

I'm trying to write a regular expression to find out if there are any HTML tags, so far I have:
/^[^<>]+$/
It's for a validator, if no HTML tags exist, it will validate.
You can use the adapted version of the regex in this SO post:
^((?!<[^<]+>)[\s\S])*$
See demo.
Perhaps, you can further enhance it to only match if the first character after < is a letter:
^((?!<[a-zA-Z][^<]*>)[\s\S])*$
See another demo
You can use a regex like this:
.*<\/?.*?>.*
Working demo
The idea is to find strings with tags like <tag>, <tag withAttribute="something"> or </closingTag>
Update: as Mr. Llama pointed out in his comment, you could enable the s flag so that . matches every character, including line breaks. This will help you with multi-line strings.
(?s).*<\/?.*?>.*
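^--- use this inline single-line flag where the engine supports it, or use the first regex with the `s` flag enabled (in JavaScript, write /.../s, since the inline (?s) prefix is not accepted)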
Working demo
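For the validator use case, the "first character after < is a letter" idea from the answers above can be sketched as a simple test function. The names containsHtmlTag and isValid are hypothetical; this only detects things that look like tags and is not a substitute for real HTML parsing or sanitizing.

```javascript
// Hypothetical helper: true if the string contains something that looks like an HTML tag.
function containsHtmlTag(str) {
  // "<", an optional "/", a letter, then anything except "<" or ">" up to the closing ">".
  // [^<>] also matches line breaks, so no s flag is needed.
  return /<\/?[a-zA-Z][^<>]*>/.test(str);
}

// Hypothetical validator: the input is valid when no tag-like text is present.
function isValid(str) {
  return !containsHtmlTag(str);
}

console.log(isValid("1 < 2 and 3 > 2"));        // true
console.log(isValid("hello <b>world</b>"));     // false
```

Unlike /^[^<>]+$/, this accepts text that merely contains a bare < or > character.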

How to Grab Selected str.replace Value From Regex in JS

OK, I understand the title might be a bit confusing, so let me elaborate.
I am making a BBCode input section that converts the entered BBCode to HTML. I am having one small issue. Let me post my code before I continue:
var newer = $('#my_textarea').val().replace(/\[b\]/gi, '<b>');
This replaces the [b] tags correctly with <b> tags. My problem is that I do not know how to do that for all tags. I have tried shortening the code like this:
var newer = $('#my_textarea').val().replace(/\[(?:b|u|i)\]/gi, '<???>');
I want it to replace all tags (bold, underline, and italic) with the correct HTML tags. How would I go about doing this? How do I put the BBCode tag that was found into the HTML tag? In other words, the part of the regex (?:b|u|i) selects any of the three letters; how can I add that same letter to the HTML tag? Do you understand what my problem is? :) Please help, thank you!
Use $2 to get the second captured group:
var newer = $('#my_textarea').val().replace(/(\[(b|u|i)\])/gi, '<$2>');
A good site for understanding and creating RegEx: https://regex101.com/
To also catch the closing tags [/b], [/u], [/i] in addition to [b], [u], [i], use the following:
var newer = $('#my_textarea').val().replace(/(\[((\/?)(b|u|i))\])/gi, '<$2>');
The regex below will convert the BBCode to HTML:
result = subject.replace(/\[(b|i|p)\](.*?)\[\/\1\]/g, "<$1>$2</$1>");
You'll need to add more tags but you get the idea.
Demo
http://jsfiddle.net/tuga/sp5597aj/1/
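Putting the answers above together, a minimal sketch of a converter for a fixed whitelist of simple tags might look like the following. The names ALLOWED_TAGS and bbcodeToHtml are hypothetical, and the sketch assumes the input has already been HTML-escaped if it comes from an untrusted source.

```javascript
// Hypothetical whitelist of BBCode tags that map one-to-one onto HTML tags.
const ALLOWED_TAGS = ['b', 'u', 'i'];

// Hypothetical helper: converts [b], [/b], [u], ... to <b>, </b>, <u>, ...
function bbcodeToHtml(input) {
  // Builds /\[(\/?)(b|u|i)\]/gi from the whitelist.
  const pattern = new RegExp('\\[(\\/?)(' + ALLOWED_TAGS.join('|') + ')\\]', 'gi');
  // slash is "" or "/", tag is the matched letter; lowercase it for consistent HTML.
  return input.replace(pattern, (match, slash, tag) => '<' + slash + tag.toLowerCase() + '>');
}

console.log(bbcodeToHtml("[b]bold[/b] and [I]italic[/I]"));
// <b>bold</b> and <i>italic</i>
```

Adding a tag then only requires extending the whitelist; tags that are not listed, such as [code], are left untouched.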

replace using javascript regex outside a special tag?

I want to remove HTML tags outside the [code] BBCode tag using JavaScript. For example:
<script>these</script> [code]<script>alert</script>[/code]<script>that</script>
should become
these [code]<script>alert</script>[/code]that
How can I use a regex to replace/remove tags outside [code]?
Replace /(\[code\][\s\S]*?\[\/code\])|<[\s\S]*?>/g with $1:
your_string.replace(/(\[code\][\s\S]*?\[\/code\])|<[\s\S]*?>/g, '$1');
It'll find all [code] tags first, save them, and after that it will find the remaining html tags (which will not be in the [code] tags).
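OK, I found a solution:
// temporarily mask < and > that appear before a closing [/code]
str = str.replace(/<(?=[^\[]*\[\/code\])/gi, "&_lt_;");
str = str.replace(/>(?=[^\[]*\[\/code\])/gi, "&_gt_;");
// do other replacements/customization here
// restore the masked characters
str = str.replace(/&_lt_;/gi, "<");
str = str.replace(/&_gt_;/gi, ">");
That's it! :)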
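The "match the protected block first, then the tags" approach can be checked against the example from the question with a small sketch. The function name stripTagsOutsideCode is hypothetical, and a replacement callback is used in place of '$1' only to make the two cases explicit.

```javascript
// Hypothetical helper: removes HTML tags everywhere except inside [code]...[/code].
function stripTagsOutsideCode(input) {
  return input.replace(
    /(\[code\][\s\S]*?\[\/code\])|<[^>]*>/gi,
    // If the first alternative matched, keep the whole [code] block;
    // otherwise the match is a tag outside [code], so drop it.
    (match, codeBlock) => (codeBlock !== undefined ? codeBlock : '')
  );
}

const sample = '<script>these</script> [code]<script>alert</script>[/code]<script>that</script>';
console.log(stripTagsOutsideCode(sample));
// these [code]<script>alert</script>[/code]that
```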

Regex replace string but not inside html tag

I want to replace a string in an HTML page using JavaScript, but ignore it if it is inside an HTML tag. For example:
visit google search engine
you can search on google tatatata...
I want to replace google by <b>google</b>, but not here:
visit google search engine
you can search on <b>google</b> tatatata...
I tried with this one:
regex = new RegExp(">([^<]*)?(google)([^>]*)?<", 'i');
el.innerHTML = el.innerHTML.replace(regex,'>$1<b>$2</b>$3<');
but the problem is that I also get <b>google</b> inside the <a> tag:
visit <b>google</b> search engine
you can search on <b>google</b> tatatata...
How can I fix this?
You'd be better using an html parser for this, rather than regex. I'm not sure it can be done 100% reliably.
You may or may not be able to do this with a regexp. It depends on how precisely you can define the conditions. Saying you want the string replaced except if it's in an HTML tag is not narrow enough, since everything on the page is presumably within some HTML tag (BODY if nothing else).
It would probably work better to traverse the DOM tree for this instead of trying to use a regexp on the HTML.
Parsing HTML with a regular expression is not going to be easy for anything other than trivial cases, since HTML isn't regular.
For more details see this Stack Overflow question (and answers).
I think you're all missing the question here...
When he says inside the tag, he means inside the opening tag, as in the <a href="google.com"> tag... This is quite different from text inside, say, a <p> </p> tag pair or <body> </body>. While I don't have the answer yet, I'm struggling with this same problem and I know it has to be solvable using regex. Once I figure it out, I'll come back and post.
WORKAROUND
If you can't use an HTML parser, or are quite confident about your HTML structure, try this:
do the "bad" replacement
repeatedly replace (<[^>]*)(<[^>]+>) with $1 a few times (as many as you need)
It's a simple workaround, but it works for me.
Cons?
Well... you have to do the replace twice for the case ... ...> as it removes only the first unwanted tag from every tag on the page
[edit:]
SOLUTION
Why not use jQuery, put the html code into the page and do something like this:
$(containerOrSth).find('a').each(function(){
    if($(this).children().length==0){
        $(this).text($(this).text().replace('google','evil'));
    }else{
        // here you have to take care of child tags, but you have to know where to expect them - before or after the text. Comment for more help
    }
});
I'm using
regex = new RegExp("(?=[^>]*<)google", 'i');
You can't really do that: your "google" is always in some tag, so either replace all occurrences or none.
Well, since everything is part of a tag, your request makes no real sense. If it's just the <a /> tag, you might just check for that part, mainly by making sure you don't have a trailing </a> tag before a fresh <a>.
You can do that using a regex, but filtering blocks like STYLE, SCRIPT and CDATA will need more work, and is not implemented in the following solution.
Most of the answers state that 'your data is always in some tags' but they are missing the point, the data is always 'between' some tags, and you want to filter where it is 'in' a tag.
Note that tag characters in inline scripts will likely break this, so if they exist, they should be processed separately with this method. Take a look here:
complex html string.replace function
I can give you a hacky solution…
Pick a non-printable character that's not in your string… Duplicate your buffer… Now overwrite the tags in the duplicate buffer with the non-printable character… Run the regex on the duplicate buffer to find the position and length of each match… Now you know where to perform the replacement in the original buffer.
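If a DOM-based approach is not available, one regex-only sketch is to match whole tags as one alternative and pass them through unchanged, so the search word is only wrapped when it appears between tags and never inside a tag's attributes. The names escapeRegExp and boldOutsideTags are hypothetical, the sample markup is an assumption about what the question's HTML looked like, and the sketch assumes no attribute value contains ">" and ignores SCRIPT, STYLE and CDATA blocks.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wraps every occurrence of word in <b>...</b>,
// except occurrences inside a tag itself (for example inside an href attribute).
function boldOutsideTags(html, word) {
  // First alternative: a complete tag, captured so it can be returned unchanged.
  // Second alternative: the word, which can now only match outside tags.
  const pattern = new RegExp('(<[^>]*>)|' + escapeRegExp(word), 'gi');
  return html.replace(pattern, (match, tag) =>
    tag !== undefined ? tag : '<b>' + match + '</b>'
  );
}

// Assumed sample markup, not taken verbatim from the question.
const page = '<a href="http://google.com">visit google search engine</a> you can search on google';
console.log(boldOutsideTags(page, 'google'));
// <a href="http://google.com">visit <b>google</b> search engine</a> you can search on <b>google</b>
```

Running it twice on the same markup would wrap the word again, so it is meant to be applied once to the original HTML.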
