I know that regex usually should not be used for parsing HTML content. In my special case I need it
(the reason is that I am using an RTE editor, and when pasting into the editor some attributes of paragraphs need to be replaced).
I have something like
<p attribute1="val1" attribute2="val2" attribut="val3" ...>text blah blah</p>
and I need all attributes stripped out so that I get
<p>text blah blah</p>
How can this be done using a regex?
A solution to strip out attributes from all possible html tags is appreciated too.
Something like this should work on all tags:
replace(/<\s*(\w+).*?>/, '<$1>')
For paragraphs only, just replace the (\w+) group with p:
replace(/<\s*p.*?>/, '<p>')
The \s* in the beginning allows for whitespace before the tag name, so if you for some reason have < p class="foo">, it works on that too.
Because an HTML tag cannot have spaces before the tag name and can continue over multiple lines, I would recommend this instead:
replace(/<(\w+)(.|[\r\n])*?>/, '<$1>');
And for paragraphs only:
replace(/<p\s+?(.|[\r\n])*?>/, '<p>');
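The patterns above replace only the first match unless the global flag is added. A minimal sketch of the same idea with the g flag, using [^>]* so that attributes spread over several lines are covered too. The function names are hypothetical, and the sketch assumes no attribute value contains a literal > character:

```javascript
// Hypothetical helper: strips the attributes from every opening tag.
// Closing tags such as </p> are left alone because "/" is not a word character.
function stripAllAttributes(html) {
  return html.replace(/<(\w+)\b[^>]*>/g, '<$1>');
}

// Hypothetical helper: strips attributes from <p> tags only.
// The \b keeps tags that merely start with "p" (such as <pre>) from matching.
function stripParagraphAttributes(html) {
  return html.replace(/<p\b[^>]*>/g, '<p>');
}

console.log(stripAllAttributes('<p attribute1="val1" attribute2="val2">text blah blah</p>'));
console.log(stripParagraphAttributes('<pre class="x"><p class="y">a</p></pre>'));
```

Restricting the match to [^>]* rather than . is a design choice: the match can never run past the end of the tag, and line breaks inside the tag need no special handling.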
perl -lpe 's/(<\w+)\s+[^>]*/$1/'
Related
I'm trying to remove all the characters between the characters <p and </p> (basically all the attributes in the p tags).
With the following block of code, it removes everything, including the text inside the <p>
MyString.replace(/<p.*>/, '<p>');
Example: <p style="test" class="test">my content</p> gives <p></p>
Thank you in advance for your help!
Try this RegEx: /<p [^>]*>/. It simply excludes the closing bracket from the accepted characters. The . matches every character, including >, and .* is greedy, so your pattern runs on to the last > in the string; that's why it doesn't work. The new one stops at the first >.
Edit: You can add a global and multi-line flag: /<p [^>]*>/gm. Also, as one of the comments pointed out, removing the tag name from the pattern makes it applicable to every tag; however, this will make replacing a bit harder. That RegEx is: /<[^>]*>/gm
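To make the difference concrete, here is a small sketch comparing a greedy match, a lazy match, and a negated character class on the example string from the question. The variable names are illustrative only, and the last pattern uses \b instead of a space, which is an assumption about what should match (it also covers a bare <p>):

```javascript
const input = '<p style="test" class="test">my content</p>';

// Greedy: .* runs on to the LAST ">" in the string, so the content is lost.
const greedy = input.replace(/<p.*>/, '<p>');

// Lazy: .*? stops at the FIRST ">", which ends the opening tag.
const lazy = input.replace(/<p.*?>/, '<p>');

// Negated class: [^>]* cannot cross a ">" at all; g handles every <p> tag.
const negated = input.replace(/<p\b[^>]*>/g, '<p>');

console.log(greedy);
console.log(lazy);
console.log(negated);
```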
MyString.replace(/\<p.*<\/p>/, '<p></p>');
I have a specific div that cannot have tags within it.
Whenever tags are found, I would like to escape them and display as regular text.
For example:
<div class='no-tags-div'>
<h1>Hi!</h1>
<p>Blablablabalablal</p>
</div>
Instead of displaying the Hi! as a header text followed by a paragraph of Blablablabalablal, I would like to literally display it with the tags:
<h1>Hi!</h1>
<p>Blablablabalablal</p>
I already have access to the content I just need to figure out how to escape any of these special characters.
Edit: I should probably specify, the content within the div is posted through an input. I am attempting to not allow users to post other tags through the input, so this isn't just static HTML text we are talking about here.
You can use &lt; and &gt; to escape < and >. If you're doing this on the server side, you can find and replace those. On the client, you can use element.innerText, as D. Pardal suggested, which replaces the contents of element with a text node, rather than interpreting it as HTML.
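The find-and-replace variant could look roughly like this. It is a minimal sketch, assuming the input arrives as a plain string; escapeHtml is a hypothetical name, and the ampersand is escaped as well (and first) so that existing entities in the input are not double-interpreted:

```javascript
// Hypothetical helper: turns markup characters into entities so the
// browser displays them as text instead of interpreting them as tags.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must come first, or the entities added below get re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeHtml('<h1>Hi!</h1>'));
```

If the escaped text is ever placed inside an attribute value, quotes would need escaping too; this sketch only covers element content.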
I need to replace all periods in a user-submitted paragraph of text that will most likely be copied and pasted from a Microsoft Word document, so the text will have formatting on it.
For example, text pasted in from Word looks like this:
<p class="MsoNormal" style="margin-bottom: 5.75pt; text-indent: 0.5in;"><span style="font-size:12.0pt;font-family: etc...
I need to edit all of the periods not within these tags and put span tags around them, so I can't just grab the html and do .replace.
:(
Use this answer to find text nodes, then do the replace on them.
If you have it as a string, convert into document fragment first.
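Walking the text nodes is the more robust route. If the markup is only available as a string and is reasonably well-formed, a string-only alternative (a different technique from the text-node walk, named here plainly) is to match whole tags first and hand them back unchanged, so that only periods outside tags get wrapped. The function name is hypothetical, and the sketch assumes no attribute value contains a literal >:

```javascript
// Hypothetical helper: wraps every period that is NOT inside a tag in a <span>.
// The first alternative consumes complete tags, so periods in attribute
// values (for example "5.75pt") are never seen by the second alternative.
function wrapPeriods(html) {
  return html.replace(/(<[^>]*>)|\./g, (match, tag) =>
    tag !== undefined ? tag : '<span>.</span>'
  );
}

console.log(wrapPeriods('<p class="MsoNormal" style="margin-bottom: 5.75pt;">One. Two.</p>'));
```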
I want to replace (remove) HTML tags outside the [code] BBCode using JavaScript. For example:
<script>these</script> [code]<script>alert</script>[/code]<script>that</script>
should become
these [code]<script>alert</script>[/code]that
How can I use a RegEx to replace/remove tags outside [code]?
Replace matches of /(\[code\][\s\S]*?\[\/code\])|<[\s\S]*?>/g with $1:
your_string.replace(/(\[code\][\s\S]*?\[\/code\])|<[\s\S]*?>/g, '$1');
It'll find all [code] tags first, save them, and after that it will find the remaining html tags (which will not be in the [code] tags).
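The same alternation can be written with a replacer function, which makes the "keep the [code] block, drop everything else" decision explicit. This is only a sketch of that idea; the function name is hypothetical:

```javascript
// Hypothetical helper: removes HTML tags except inside [code]...[/code].
function stripTagsOutsideCode(input) {
  return input.replace(
    /(\[code\][\s\S]*?\[\/code\])|<[\s\S]*?>/g,
    // codeBlock is defined only when the first alternative matched;
    // in that case it is returned unchanged, otherwise the tag is removed.
    (match, codeBlock) => (codeBlock !== undefined ? codeBlock : '')
  );
}

console.log(stripTagsOutsideCode(
  '<script>these</script> [code]<script>alert</script>[/code]<script>that</script>'
));
```

Because the regex engine scans left to right and tries the [code] alternative first, a whole [code] block is consumed in one match before any tag inside it can be seen by the second alternative.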
OK, I found a solution:
replace(/<(?=[^\[]*\[\/code\])/gi,"&_lt_;");
replace(/>(?=[^\[]*\[\/code\])/gi,"&_gt_;");
DO OTHER REPLACEMENT/CUSTOMIZATION HERE
replace(/&_lt_;/gi,"<");
replace(/&_gt_;/gi,">");
That's it! :)
I have the following HTML as a string in my JavaScript function:
<p>one</p> <p align='center'>two</p>
I want to extract this string:
"onetwo" (without quotes obviously)
Can you please suggest some pure JavaScript code (jQuery is also OK...) to get tags' content?
Using jQuery you don't need a complex regex, you can easily parse the HTML and use the DOM:
var s = "<p>one</p> <p align='center'>two</p>";
var wrapper = $('<div />').html(s);
var text = wrapper.text();
In this case $(s).text() would have also worked, but it will fail if you have free text on the first level (e.g. <p>1</p>2), so I usually avoid it.
Note that the result here is "one two" (not "onetwo"), because you have a space between the <p> tags.
If that's a problem, you can use wrapper.children().text() or wrapper.find('p').text(), for example, according to your exact needs.
Working example: http://jsbin.com/osidi3
I made the following Regex to grab content from XML tags.
This will only work with a tag that has content and is followed by a closing tag. Will not get contents of tags that contain other tags.
The tag name is in capture group 1 and the tag content is in capture group 2. This will work to get all content including <, >, ", ' and & inside of tag content.
<([^\s>]+)\s?[^>]*>(.*)(?:<\/\1)>
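A short usage sketch for the pattern above in JavaScript. The function name and the sample XML are made up for illustration:

```javascript
// The pattern from above: group 1 is the tag name, group 2 the content.
const tagPattern = /<([^\s>]+)\s?[^>]*>(.*)(?:<\/\1)>/;

// Hypothetical helper: returns { name, content } for the first matching
// tag in the string, or null when nothing matches.
function extractTag(xml) {
  const m = xml.match(tagPattern);
  return m ? { name: m[1], content: m[2] } : null;
}

console.log(extractTag('<title lang="en">Tom & Jerry</title>'));
console.log(extractTag('<empty/>'));
```

Note that . does not match line breaks in JavaScript, so with this pattern the content has to sit on a single line.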