Remove every html tag with JsHtmlSanitizer - javascript

I finally got the JsHtmlSanitizer working as a standalone client-side script.
Now I'd like to remove all HTML tags from a string, not just script tags and links.
This example:
html_sanitize('<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"><\/script>');
returns "hello" but I'd like to remove all tags.

Why not use regular expressions to remove all HTML tags after sanitizing?
var input = '<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"></script>';
var output = null;
output = html_sanitize(input);
output = output.replace(/<[^>]+>/g, '');
This should strip your input string of all html tags after sanitization.
If you only need basic sanitization (removing script and style tags together with their content, and stripping all other HTML tags), you could implement the entire thing with a regex. I have demonstrated an example below.
var input = '<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"></script>';
input += '<script> if (1 < 2) { alert("This script should be removed!"); } </script><style type="text/css">.cssSelectorShouldBeRemoved > .includingThis { background-color: #FF0000; } </style>';
var output = null;
output = input.replace(/(?:<(?:script|style)[^>]*>[\s\S]+?<\/(?:script|style)[^>]*>)|<[^>]+>/ig, '');
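A slightly tighter variant of the regex above, as a minimal sketch under the same regex-only caveats: the backreference \1 pairs each opening tag with its own closing tag (so <script> is not closed by </style>), and [\s\S]*? also matches empty elements such as <script src="..."></script> on their own. The function name stripScriptsStylesAndTags is hypothetical, and this is not a replacement for a real sanitizer like html_sanitize().

```javascript
// Minimal sketch: removes <script>/<style> elements together with their content,
// then strips every remaining tag. Regex-based, so it assumes reasonably
// well-formed markup and is NOT a security sanitizer.
function stripScriptsStylesAndTags(input) {
  return input
    // \1 makes the closing tag match the opening one (script/script, style/style);
    // [\s\S]*? is lazy and also matches empty elements.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Any remaining tag, e.g. <b>, </b>, <img ...>, <a ...>.
    .replace(/<[^>]+>/g, "");
}

const input =
  '<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)">' +
  '<script src="http://www.google.com"></script>' +
  '<script> if (1 < 2) { alert("This script should be removed!"); } </script>' +
  '<style type="text/css">.sel > .child { background-color: #FF0000; } </style>';

console.log(stripScriptsStylesAndTags(input)); // "hello"
```

Removing the script and style blocks first matters: the content of a script can itself contain "<" or ">" (as in "1 < 2"), which the generic tag pattern alone would not handle correctly.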

Use the JavaScript below to remove all HTML tags from the string you get from html_sanitize().
var output = html_sanitize('<b>hello</b><img src="http://google.com"><a href="javascript:alert(0)"><script src="http://www.google.com"><\/script>');
output = output.replace(/(<.*?>)/ig,"");
Hope it helps :)

Related

txt.replace </blockquote> in textarea js

I give up! I looked at many different answers and tried many different ways, and nothing works. I want to change the </blockquote> tag to <br /> or a newline in the textarea. Alternatively, change it to some other character, because later I can replace that character with <br/> in PHP. How do I do it?
Working example for easier understanding: https://jsfiddle.net/jsf88/rb3xp7am/35/
<textarea id="comment" name="quote" placeholder="quote" style="width:80%;height:200px;"></textarea>
<section class="replyBox" style="width: 100%;"><br/>
[ click for quote ]
<div class="replyMsg">
<blockquote>this is a quote for comment😎 </blockquote><br />
"X" -- HERE I want BR_TAG or new line in textarea after click 'quote' 😐
</div>
</section>
$(document).on('ready', function() {
$('.quoteMsg').click(function() {
var txt = $(this).closest('.replyBox').find('.replyMsg').text();
//txt = txt.replace('</blockquote>', '<br/>');
//txt = txt.replace(/<\/(blockquote)\>/g, "<br/>");
//txt = txt.replace(/blockquote*/g, '<br/>');
//txt = txt.replace(/(.*?)<\/blockquote>(.*?)/g, ' xxx ');
txt = txt.replace(/<\/blockquote>/gi, '<br/>')//NOT WORKING!!
txt = txt.replace(/(?:\r\n|\r|\n)/g, ' ');//working great
console.log(txt);
$("textarea[name='quote']").val($.trim('[quote]' + txt + '[/quote]'));
});
});
To make it funnier, another example, where I change the blockquote tag to br, works without a problem. Why? Can someone explain it?
//OTHER EXAMPLE WHERE CHANGING </BLOCKQUOTE> TO <br/> WORKS FINE... WTF?!
string = ` <blockquote>this is a quote for comment😎 </blockquote><br />"X" -- HERE I want BR_TAG or new line in textarea after click 'quote' 😐`;
string = string
.replace(/<\/blockquote>/gi, ' <br /> ');//but here working! ;/
console.log(string);
You retrieve the text with the text function, $('.replyMsg').text(), but that gives you only the text, without HTML tags like <blockquote>. So first you have to retrieve the HTML to get the blockquote tag:
var txt = $(this).closest('.replyBox').find('.replyMsg').html();
A <br> tag is not interpreted inside a textarea, so you have to use a newline character instead.
Don't forget to remove the opening blockquote tag to get the expected result:
txt = txt.replace(/<blockquote>/gi, '');
$('.quoteMsg').click(function() {
var txt = $(this).closest('.replyBox').find('.replyMsg').html();
txt = txt.replace(/(?:\r\n|\r|\n)/g, ' ');
txt = txt.replace(/<\/blockquote>/gi, '\n');
txt = txt.replace(/<blockquote>/gi, '');
console.log(txt);
$("textarea[name='quote']").val($.trim('[quote]' + txt + '[/quote]'));
});
blockquote {
background-color: silver;
}
.replyMsg {
border: 2px solid green;
}
.quoteMsg {
background-color: green;
color: #fff;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<textarea id="comment" name="quote" placeholder="quote" style="width:80%;height:200px;"></textarea>
<section class="replyBox" style="width: 100%;"><br/>
[ click for quote ]
<div class="replyMsg">
<blockquote>this is a quote for comment😎 </blockquote>
"X" -- HERE I want BR_TAG or new line in textarea after click 'quote' 😐
</div>
</section>
The first problem in your code was how you were attaching the event listener to the ready event. Since ready is something invented by jQuery and not a native event, the correct way to do it as of now (v3.3.1, the version I used in this demo) is $(document).ready(()=>{/*code here*/}).
As a further reference:
https://api.jquery.com/ready/
There is also $(document).on( "ready", handler ), deprecated as of jQuery 1.8 and removed in jQuery 3.0. Note that if the DOM becomes ready before this event is attached, the handler will not be executed.
But it's not entirely clear how you wished to transform your text before setting the value of the textarea, so I refactored your logic into a few clear steps:
grab the blockquote element's text content and trim it (this is the origin)
apply the newline-to-whitespace transform (with the regex, which I left untouched)
build the final string as a template literal that includes the quote content, the meta tags wrapping it, AND anything else you wish to add, for example a newline (\n), which in this example is made visible by the text following it.
There's a hint in your words that makes me want to say something that may be superfluous but still deserves an attempt: the value of a textarea is just plain text and doesn't render HTML content. So a <br> would stay exactly as you read it and wouldn't have any rendering effect on the textarea content. That's why I focused my demonstration on inserting a newline with the escape sequence. It works both in double-quoted strings and in template literals: "\n" `\n`
Further notes
It seems the original approach of processing the blockquote HTML was preferred. It's worth saying that it was apparently a terrible strategy for several reasons:
It grabs the blockquote content as HTML, even though that's not how it's rendered on the page.
It goes to the effort of taking the whole outerHTML and removing the wrapping blockquote tags instead of fetching the innerHTML directly.
It adds the newline as a newline instead of embedding it as <br>, so at this point I ask myself whether the content in the textarea was supposed to be encoded HTML or not... and would the added br then belong to something meta?
It's harder to deal with in case you want to further customize the string processing.
But maybe there's something I didn't get and I'm making weak assumptions.
//since you are using the ready event with jquery, that's the correct syntax
$(document).ready(function() {
$('.quoteMsg').click(function() {
//grabs the text content of the blockquote element (trimming it)
var quoteTextContent = $(this).closest('.replyBox').find('.replyMsg').text().trim();
//performs the transform already in place in your code.. replacing newlines with white spaces
quoteTextContent = quoteTextContent.replace(/(?:\r\n|\r|\n)/g, ' '); //working great
//builds the string to set the textarea value with, using a template literal
//here you can add anything you want.. like a new line but that's just an example
const encoded = `[quote]${quoteTextContent}[/quote]\nand something following to show the new line happening`;
console.log(encoded);
$("textarea[name='quote']").val( encoded );
});
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<textarea id="comment" name="quote" placeholder="quote" style="width:80%;height:200px;"></textarea>
<section class="replyBox" style="width: 100%;"><br/>
[ click for quote ]
<div class="replyMsg">
<blockquote>this is a quote for comment😎
<br>
Having new lines also ... since you perform a regex transform newline=>whitespace
</blockquote><br />
</div>
</section>
Well, thanks for the answers. The problem was a missing .html() call.
This script works almost perfectly for me when quoting a few times:
$(document).on('ready', function() {
$('.quoteMsg').click(function() {
var txt = $(this).closest('.replyBox').find('.replyMsg').html();
txt = txt.replace(/(?:\r\n|\r|\n)/g, ' ');
txt = txt.replace(/&lt;/g, "<");
txt = txt.replace(/&gt;/g, ">");
txt = txt.replace(/&amp;/g, "&");
txt = txt.replace(/&quot;/g, '"');
txt = txt.replace(/&#39;/g, "'");
txt = txt.replace(/<br>/g, "");
txt = txt.replace(/<hr>/g, "[hr]");
//txt = txt.replace(/<hr>/g, "\n");
txt = txt.replace(/<blockquote>/gi, '');
txt = txt.replace(/<\/blockquote>/gi, '[hr]');
txt = txt.replace(/[hr][hr]/gi, "");//not working ([][])
txt = txt.replace(/[hr][hr]/gi, "[hr]");//not working ([[hr]][[hr]])
console.log(txt);
$("textarea[name='quote']").val($.trim('[quote]' + txt + '[/quote]\n'));
});
});
The problem here is that I don't know how to replace the double [hr][hr] with nothing, because txt = txt.replace(/[hr][hr]/g, ""); is not working, so a bit more explanation would be great. Once again, big thanks for the answers! This .replace function is not as intuitive as in PHP.
EDIT: Ah... I think it's not possible to delete this double, because I insert it twice myself. Never mind. I will find and delete this double in PHP.
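The reason /[hr][hr]/g does not work is that square brackets in a regular expression define a character class: [hr] matches one single "h" or "r", so /[hr][hr]/ matches two-letter sequences such as "hr", not the literal text "[hr][hr]". That is exactly why the replacements produced "[][]" and "[[hr]][[hr]]". Escaping the brackets makes them literal. A minimal sketch (collapseHr is a hypothetical name):

```javascript
// Hypothetical helper: collapses runs of two or more literal "[hr]" markers into one.
// \[ and \] match the bracket characters themselves instead of opening a character class.
function collapseHr(text) {
  return text.replace(/(?:\[hr\]){2,}/g, "[hr]");
}

// Unescaped, for comparison: [hr] is a character class matching "h" or "r",
// so only the letters inside the brackets are removed.
const broken = "[hr][hr]".replace(/[hr][hr]/g, "");

console.log(collapseHr("quote[hr][hr]reply")); // "quote[hr]reply"
console.log(broken); // "[][]"
```

If the pair should disappear entirely instead, the same escaping applies: txt.replace(/\[hr\]\[hr\]/g, "").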

Compare string with HTML text

I have a string of text that I want to compare with another string that has HTML code. The problem is that the text I need to compare it to in the HTML code is within different tags. Also, if the string exists in the HTML code then I want to wrap it inside a <mark> tag.
This is the example I am using:
var html = '<h1>This is a heading</h1><div class="subtitle">and this is the subheading</div><p class="small">this is some example text</p>';
var lookup = "is a heading and this is the subheading this is some";
var finalHtml = ""; //will contain new html
//Need to do some comparison and then add a <mark> tag around found string.
console.log(finalHtml);
//This should print "<h1>This <mark>is a heading</h1><div class="subtitle">and this is the subheading</div><p class="small">this is some</mark> example text</p>"
I am using JavaScript/jQuery to do this.
This only checks whether your lookup occurs within the HTML (i.e., no marking). I removed the tags and whitespace first, then checked.
var html = '<h1>This is a heading</h1><div class="subtitle">and this is the subheading</div><p class="small">this is some example text</p>';
//remove html tags & spaces.
var cleanHtml = html.replace(/<\/?[^>]+(>|$)/g, "").replace(/\s/g,"");
var lookup = "is a heading and this is the subheading this is some";
lookup = lookup.replace(/\s/g,'');
if(cleanHtml.includes(lookup)){
//match found
}
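To also add the <mark>, one approach, sketched here under stated assumptions, is to keep the tags, build a whitespace-free copy of the text while remembering where each character came from, search in that copy, and then wrap the matched portion of each text segment separately. Unlike the expected output in the question, this puts one <mark> inside each element instead of one <mark> spanning several elements, so the result stays well-formed HTML. markAcrossTags is a hypothetical name; the tag-splitting regex is naive (it assumes no "<" or ">" inside attribute values, comments, or text) and the search is case-sensitive.

```javascript
// Hypothetical helper: wraps the text matching `lookup` in <mark> tags,
// ignoring whitespace and tag boundaries. Naive tokenizer: assumes no "<" or ">"
// inside attribute values, comments, or text content.
function markAcrossTags(html, lookup) {
  const needle = lookup.replace(/\s+/g, "");
  if (needle === "") return html;

  // With a capturing group, split() keeps the tags: odd indices are tags, even are text.
  const tokens = html.split(/(<[^>]*>)/);

  // Whitespace-free copy of all text, plus where each character came from.
  let compact = "";
  const origin = []; // origin[k] = [tokenIndex, charIndex] for compact[k]
  tokens.forEach((tok, ti) => {
    if (ti % 2 === 1) return; // skip tags
    for (let ci = 0; ci < tok.length; ci++) {
      if (!/\s/.test(tok[ci])) {
        compact += tok[ci];
        origin.push([ti, ci]);
      }
    }
  });

  const start = compact.indexOf(needle);
  if (start === -1) return html; // no match: return the HTML unchanged

  // For every text token touched by the match, record its first and last matched character.
  const ranges = new Map();
  for (let k = start; k < start + needle.length; k++) {
    const [ti, ci] = origin[k];
    const range = ranges.get(ti);
    if (range) range[1] = ci;
    else ranges.set(ti, [ci, ci]);
  }

  // Wrap each touched portion separately so the resulting HTML stays well-formed.
  return tokens
    .map((tok, ti) => {
      const range = ranges.get(ti);
      if (!range) return tok;
      const [from, to] = range;
      return tok.slice(0, from) + "<mark>" + tok.slice(from, to + 1) + "</mark>" + tok.slice(to + 1);
    })
    .join("");
}

const html = '<h1>This is a heading</h1><div class="subtitle">and this is the subheading</div><p class="small">this is some example text</p>';
const lookup = "is a heading and this is the subheading this is some";
console.log(markAcrossTags(html, lookup));
```

For the example from the question, the "is" that gets marked is the one in "is a heading", not the one inside "This", because the first whitespace-free match starts there.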

Remove HTML tags and formatting text

I would like to remove the HTML tags between text and change newlines to spaces. I'm using the pattern below, but it doesn't work perfectly: it leaves two or more spaces between words. How can I fix this pattern?
replace(/(&nbsp;|<([^>]+)>)/ig, ' ');
Try the code below and check:
replace(/(<([^>]+)>)/ig,"");
UPDATE
You can do it this way:
var html = 'Example: <h1></h1><p></p><div>&nbsp;</div><div>CONTENT</div>&nbsp;';
html = html.replace(/\s|&nbsp;/g, ' ');
html = html.replace(/<[^>]+>/g, '');
html = html.replace(/\s+/g, ' ').trim(); // collapse the leftover runs of spaces into one
The output will be:
Example: CONTENT
Play around with the solution above and you will succeed.
Here is how I would do what you want:
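The extra spaces come from replacing every tag and every &nbsp; with its own space. A minimal sketch that keeps that idea (so words in adjacent block elements stay separated) and then collapses every run of whitespace into a single space; htmlToSingleLine is a hypothetical name, and the regex approach assumes simple markup without "<" or ">" inside attributes or comments:

```javascript
// Hypothetical helper: turns an HTML fragment into a single line of text.
function htmlToSingleLine(html) {
  return html
    .replace(/&nbsp;/gi, " ") // non-breaking space entity -> plain space
    .replace(/<[^>]+>/g, " ") // every tag -> space, so "<p>a</p><p>b</p>" doesn't become "ab"
    .replace(/\s+/g, " ") // newlines, tabs, runs of spaces (JS \s also covers U+00A0) -> one space
    .trim();
}

console.log(htmlToSingleLine("Example: <h1></h1><p></p><div>&nbsp;</div><div>CONTENT</div>&nbsp;"));
console.log(htmlToSingleLine("<p>first</p>\n<p>second</p>"));
```

Replacing tags with a space rather than with an empty string is the design choice here: the collapse step removes the duplicates afterwards, while text from neighbouring elements never gets glued together.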
(See comments in my snippet)
// Input data
var input_data = `My<div><br>
<span></span>
<span></span>
</div><p>Content</p>`;
console.log("Input:", input_data);
// Creates html element with Input data
var elm = document.createElement('div');
elm.innerHTML = input_data;
// Use native function '.innerText' to get rid of the html,
// then replace new lines by spaces, and multiple spaces by only one space
var output_data = elm.innerText.replace(/\n/g, ' ').replace(/[\s]+/g, ' ');
console.log("Output:", output_data);
Hope it helps!

How to ignore HTML tags in innerHTML attribute?

I'm making a messenger, and my messages don't ignore HTML tags because I simply paste the text from the input into the message's innerHTML. My code:
function Message(sender) {
...
this["text"] = "";
...
this.addText = function (text) {
this["text"] = text;
};
...
};
And here I display it:
...
var chatMessageText = document.createElement("p");
chatMessageText.innerHTML = message["text"];
...
What can I do for ignoring HTML tags in message["text"]?
Set the HTMLElement#innerText property (or the Node#textContent property) instead:
chatMessageText.innerText = message["text"];
Check the difference between the two here: innerText vs textContent
Refer to: Difference between textContent vs innerText
You can't. The point of innerHTML is that you give it HTML and it interprets it as HTML.
You could escape all the special characters, but the easier solution is to not use innerHTML.
var chatMessagePara = document.createElement("p");
var chatMessageText = document.createTextNode(message["text"]);
chatMessagePara.appendChild(chatMessageText)
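If you do want to keep using innerHTML, the escaping route mentioned in this answer looks roughly like the sketch below. escapeHtml is a hypothetical helper, not a built-in; it replaces the five characters that are special in HTML, and "&" must go first so the entities produced by the later replacements are not escaped a second time.

```javascript
// Hypothetical helper: escapes text so innerHTML displays it literally instead of parsing it.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must be first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<b>hi</b> & "you"')); // &lt;b&gt;hi&lt;/b&gt; &amp; &quot;you&quot;
```

In the messenger code this would be used as chatMessageText.innerHTML = escapeHtml(message["text"]); although textContent or a text node, as suggested above, avoids the need for escaping altogether.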

Match a String in a Webpage along with HTML tags

With the code below, I am trying to match a text in a web page while getting rid of the HTML tags in the page.
var body = $(body);
var str = "Search me in a Web page";
body.find('*').filter(function()
{
$(this).text().indexOf(str) > -1;
}).addClass('FoundIn');
$('.FoundIn').text() = $('.FoundIn').text().replace(str,"<span class='redT'>"+str+"</span>");
But it does not seem to work. Please have a look at this and let me know where the problem is.
Here is the fiddle.
I have tried the code below instead:
function searchText()
{
var rep = body.text();
alert(rep);
var temp = "<font style='color:blue; background-color:yellow;'>";
temp = temp + str;
temp = temp + "</font>";
var rep1 = rep.replace(str,temp);
body.html(rep1);
}
But that removes all the HTML tags from the body.
Change the last line of your code to the one below. You are using the assignment operator, which works with variables, not with the result of a jQuery method call, so you need to pass the replaced string to the text method instead.
$('.FoundIn').text($('.FoundIn').text().replace(str,"<span class='redT'>"+str+"</span>"))
Try this:
$('*:contains("Search me in a Web page")').text("<span class='redT'>Search me in a Web page</span>");
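The searchText() attempt loses the markup because body.text() returns only the text, and body.html(rep1) then writes that plain text back. A minimal sketch of the alternative is to work on the HTML string and replace only in the text between tags. highlightInHtml is a hypothetical name; it assumes the search string never spans a tag boundary, contains no HTML-special characters (innerHTML serializes "&" as "&amp;", for example), and that no "<" or ">" appears inside attribute values or comments.

```javascript
// Hypothetical helper: wraps every occurrence of `needle` found in the text
// between tags, leaving the tags themselves (and their attributes) untouched.
function highlightInHtml(html, needle, className) {
  if (!needle) return html;
  const wrapped = `<span class="${className}">${needle}</span>`;
  return html
    .split(/(<[^>]*>)/) // odd indices are tags, even indices are text
    .map((tok, i) => (i % 2 === 1 ? tok : tok.split(needle).join(wrapped)))
    .join("");
}

const page = '<p title="Search me in a Web page">Search me in a Web page, please</p>';
console.log(highlightInHtml(page, "Search me in a Web page", "redT"));
```

In a browser it could be applied as document.body.innerHTML = highlightInHtml(document.body.innerHTML, str, "redT"); keep in mind that reassigning innerHTML rebuilds the elements, so event listeners attached to them from JavaScript are lost.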
