Strip HTML elements within DIV - javascript

I have a simple search engine on one of our older websites. This site is running IIS 6.0 on Windows Server 2003. The search functionality is provided by Microsoft Indexing Service.
You can see the search functionality on our website. (Just type in "speakers" and you will see some hits.)
I would like to use the "FullHit" feature offered by the indexing service. When using this feature the Indexing service inserts the full hit results in between "begindetail" and "enddetail" on a target web page.
The problem that I have is that the documents that are being returned have HTML. This looks messy. (Just click on "Hit Locator Tool" in the search results above to see what I mean.)
I would like to create a DIV section such as ...
<DIV name="target">
begindetail
enddetail
</DIV>
Then, after the page is populated I would like to use javascript to strip out all of the HTML elements (but not the data) between the opening and closing DIV.
For example, <FONT color="magenta">Good Data</FONT> would be modified to only show Good Data.
I can also use Classic ASP if necessary.
Please let me know if you have any suggestions or know of any functions that I can add to the target page to accomplish this task.
Thanks in advance.

I inspected your webpage, and there must be some logic errors in your ASP code. (1) Instead of markup like <div></div> being passed to the browser, the special characters are being passed as HTML entities, e.g. &lt;DIV&gt; &lt;/DIV&gt;, which is very ugly and is why it renders as text instead of HTML. Your ASP code is apparently not processing the search result text correctly before passing it to the browser. (2) All of this improperly formatted code is inserted after the first closing </html> tag, and then there is another set of closing </body> and </html> tags after it, so somewhere in your ASP code you are telling it to append the output to the end of the document rather than insert it inside the original <body></body>.
If you want to decode the mixture of HTML entities, <br> tags, and text into rendered HTML, this JavaScript may work:
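window.onload = function() {
// innerText is the page's visible text; any entities still in it are decoded,
// and the result is parsed as HTML by assigning it back to the body.
var text = decodeHTMLEntities(document.body.innerText);
document.body.innerHTML = text;
};
function decodeHTMLEntities(text) {
// '&amp;' is decoded last, so "&amp;lt;" becomes the literal text "&lt;"
// instead of being decoded twice into "<".
var entities = [
['apos', '\''],
['#x27', '\''],
['#x2F', '/'],
['#39', '\''],
['#47', '/'],
['lt', '<'],
['gt', '>'],
['nbsp', ' '],
['quot', '"'],
['amp', '&']
];
for (var i = 0, max = entities.length; i < max; ++i)
text = text.replace(new RegExp('&' + entities[i][0] + ';', 'g'), entities[i][1]);
return text;
}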
jsFiddle: https://jsfiddle.net/6ohc1tkr/
But first things first, you need to fix your ASP code, or whatever you use to parse and then display the search results. That's what is causing the improper formatting and display of the HTML. Show us your back-end code and then we can help you.

This is what I used to accomplish what you are trying to do: the string-strip-html package from npm.
It worked pretty well for me.
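If you would rather not add a dependency, here is a minimal sketch of two dependency-free alternatives (this is not how string-strip-html works internally). `stripTags` is a naive regex over a string; `stripTagsInElement` lets the browser's own parser do the work through `textContent`. Both function names are hypothetical, and the usage comment assumes the DIV is given an `id="target"` instead of the `name` attribute used in the question.

```javascript
// Naive tag stripper for plain strings: removes anything that looks like <...>.
// Limitations: a '>' inside a quoted attribute value, comments, and
// <script>/<style> contents are not handled the way a real HTML parser would.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Browser variant: reading textContent returns only the text nodes' content
// (entities already decoded); assigning it back replaces every child node
// with that single piece of plain text.
function stripTagsInElement(el) {
  el.textContent = el.textContent;
}

// Usage in the page, assuming a hypothetical id="target" on the DIV:
//   stripTagsInElement(document.getElementById('target'));

console.log(stripTags('<FONT color="magenta">Good Data</FONT>')); // Good Data
```

The `textContent` route is generally the safer of the two, because the browser rather than a regex decides what counts as a tag; it also decodes entities such as `&amp;`, which the regex version leaves untouched.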

I now have the search feature working as expected. I would like to thank everyone for their insightful comments. This feedback helped me identify and fix the problem.
OS: Windows Server 2003
IIS: 6.0
Microsoft Index Server
The hit locator tool will only work properly for HTML pages. If you use this tool with a simple TXT file then the results will not be displayed correctly.

Related

cordova-plugin-printer printing raw html instead of rendered html

I am using cordova-plugin-printer for one of my Cordova apps. I have installed the plugin and am able to access the Android native printing service. The plugin's print function can be called in the app using its function cordova.plugins.printer.print();, as described in the plugin documents here and on its git repository.
I actually need to print a selected <div> from the HTML, using JavaScript/jQuery.
But the problem I'm facing is that when I pass the HTML as a var, the plugin prints raw HTML instead of the rendered HTML as desired. Whereas if I pass the HTML code directly as a parameter, it gives the desired result.
My Example Code:
When I call
cordova.plugins.printer.print('<h1>How r u</h1>');//print as desired.
But when I write code like this:
<div id="printable">
<h1>How are you</h1>
</div>
var printable = $("#printable").html();
cordova.plugins.printer.print(printable);//prints raw html
Can anyone help, or correct me on what I am doing wrong? It seems to be something silly, which I am probably unable to figure out.
You need to remove line breaks from your html.
Try
var printable = $("#printable").html();
printable = printable.replace(/(\r\n|\n|\r)/gm, '');
cordova.plugins.printer.print(printable);

How to make a live HTML preview textarea safe against HTML/Script Injection

I'm turning here as a last resort. I've scoured Google and I'm having trouble coming to a solution. I have a form with a textarea element that lets you type HTML, and it renders the markup live as you type if preview mode is active. It's not too different from the way StackOverflow shows the preview below a new post.
However, I have recently discovered that my functionality has a vulnerability. All I have to do is type something like:
</textarea>
<script>alert("Hello World!");</script>
<textarea style="display: none;">
And not only does this run live from within the textarea: if you save the form and reload the data on a different page, the code still executes within the textarea on that page, unbeknownst to the user. All they see is a textarea (unless there is an alert, obviously).
I found this post, "Live preview of textarea input with javascript html", and attempted to refactor my JS to match the accepted answer there. I noticed I couldn't write a script tag in its JSFiddle example (though maybe that's JSFiddle blocking that behaviour), but I couldn't get the approach working within my JS file.
These few lines are what I use to render the HTML markup live:
$(".main").on("keyup", "#actualTextArea", function () {
$('#previewTextArea').html($('#actualTextArea').val());
});
$(".main").on("keydown", "#actualTextArea", function () {
$('#previewTextArea').html($('#actualTextArea').val());
});
Is there a way this can be refactored so it's safe? My only idea at the moment is to wipe the live preview and use a toggle on/off and encode it, but I really think this is a cool feature and would like to keep it live instead of toggle. Is there a way to "live encode" it or escape certain tags or something?
In order to sanitise your text area preview, simply replace all the < and > characters with their HTML entity equivalents:
function showPreview()
{
var value = $('#writer').val().trim();
// Global regexes replace every occurrence (a string pattern only replaces the first one).
value = value.replace(/</g, "&lt;");
value = value.replace(/>/g, "&gt;");
$('#preview').html(value);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id="writer" onInput="showPreview();">
</textarea>
<br/>
<hr/>
<div id="preview">
</div>
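With a string pattern, `String.prototype.replace` only replaces the first match, so escaping needs global regexes. A slightly fuller sketch, using a hypothetical `escapeHtml` helper, escapes all five HTML-significant characters and handles `&` first so that the entities produced by the later replacements are not escaped a second time:

```javascript
// Escapes the characters that are significant in HTML text and attribute values.
// '&' must be replaced first; otherwise the '&' at the start of '&lt;' etc.
// would itself be turned into '&amp;'.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The injection from the question becomes inert text.
var input = '</textarea><script>alert("Hello World!");</script>';
console.log(escapeHtml(input));
// &lt;/textarea&gt;&lt;script&gt;alert(&quot;Hello World!&quot;);&lt;/script&gt;
```

Note the trade-off: escaped input is displayed as source text rather than rendered, so the preview no longer shows formatted HTML. jQuery's `.text(value)` gives the same visible result without a helper. Keeping a rendered preview safely requires an allowlist of permitted tags rather than escaping everything.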
Edit: Actually, I think this solution is a little cleaner and makes the code below unnecessary. In the Velocity page, all that is needed is to take advantage of the Spring framework, so I replaced the textarea like so:
#springBindEscaped("myJavaObj.textAreaText" true)
<textarea id="actualTextArea" name="${status.expression}" class="myClass" rows="10" cols="120">$!status.value</textarea>
This, paired with some back-end Java validation, ends up being a much cleaner solution.
But if you want a non-Spring/Velocity solution, the code below works just fine.
I cobbled together a quick fix, as my main purpose is to stop others from easily executing scripts. It's not ideal, and I'm not claiming it to be the best answer, so if someone finds a better solution, please do share. I created a "sanitize" function like so:
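function sanitize(text){
// Global, case-insensitive patterns remove every <script> and </script> tag,
// not just the first exact-case match. This is still a blacklist:
// other vectors (e.g. <img onerror=...>) get through.
var sanitized = text.replace(/<script[^>]*>/gi, "");
sanitized = sanitized.replace(/<\/script\s*>/gi, "");
return sanitized;
}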
Then the previous two event handlers now look like:
$(".main").on("keyup", "#actualTextArea", function () {
var textAreaMarkup = $('#actualTextArea').val();
var sanitizedMarkup = sanitize(textAreaMarkup);
$('#actualTextArea').val(sanitizedMarkup);
$('#previewTextArea').html(sanitizedMarkup);
});
// This one can remain unchanged, and in fact needs to:
// if it did the same as the handler above, it would wipe the textarea
// on a highlight-backspace
$(".main").on("keydown", "#actualTextArea", function () {
$('#previewTextArea').html($('#actualTextArea').val());
});
Along with Java-side sanitization to prevent anything harmful from being stored in the DB, this serves my purpose, but I'm very open to a better solution if one exists.

Prevent Javascript Injection in data attribute

I have a script that pulls a text from an API and sets that as a tooltip in my html.
<div class="item ttip" data-html="<?php echo $obj->titleTag;?>">...</div>
The API allows html and javascript to be entered on their side for that field.
I tried this: $obj->titleTag = htmlentities(strip_tags_content($this->channel->status));
Now a user entered the following (or something similar; he is blocked now, so I cannot check it again):
\" <img src="xx" onerror=window.location.replace(https://www.youtube.com/watch?v=IAISUDbjXj0)>
which does not get caught by the above.
I could str_replace the window.location stuff, but that seems dirty.
What would be the right approach? I am reading a lot about "whitelists", but I don't understand the concept for a case like this.
EDIT: strip_tags_content comes from here: https://php.net/strip_tags#86964
Well, it's not tags you're replacing now but code within tags. You need to allow only certain attributes in your code rather than stripping tags, since you've only got one tag in there ;)
What you want to do is check for any JavaScript event-handler attributes (a full list here) and remove them, such as onerror.
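To illustrate the "whitelist" (allowlist) idea asked about in the question: instead of trying to enumerate every dangerous thing (`onerror`, `onload`, `javascript:` URLs and so on), you enumerate the few things you allow and neutralize everything else. Below is a minimal JavaScript sketch with hypothetical names `escapeHtml` and `allowlistTags`, assuming only a few bare tags without attributes are needed; in this PHP setup the same idea would run server-side.

```javascript
// Escape everything first, so no input can form a real tag.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Then restore only exact, attribute-free tags on the allowlist.
// allowedTags: lower-case names made of letters/digits only (they are placed
// into a regex unescaped), e.g. ['b', 'i'] (a hypothetical list).
function allowlistTags(input, allowedTags) {
  var html = escapeHtml(input);
  allowedTags.forEach(function (tag) {
    // Matches the escaped forms &lt;b&gt; and &lt;/b&gt;, case-insensitively.
    var re = new RegExp('&lt;(/?)' + tag + '&gt;', 'gi');
    html = html.replace(re, '<$1' + tag + '>');
  });
  return html;
}

console.log(allowlistTags('<b>bold</b> <img src="xx" onerror=alert(1)>', ['b']));
// <b>bold</b> &lt;img src=&quot;xx&quot; onerror=alert(1)&gt;
```

Two caveats. The output is HTML intended for the tooltip, so when it is echoed into `data-html` it still has to be escaped once more for the attribute (for example with PHP's `htmlspecialchars($html, ENT_QUOTES)`), because the browser decodes the attribute value before the tooltip script reads it. And for anything beyond a handful of bare tags, a maintained HTML sanitizer library is a safer choice than hand-written regexes.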

how to avoid fetching a part of html page which is being called inside another page?

I am calling a .html page (say A.html, which is dynamically created by another piece of software each time a request is made) inside another webpage (say B.html). I am doing this by using the .load() function. Everything works fine, but the problem is that I do not want the many "br" tags (empty tags) present at the end of A.html to end up in B.html. Is there any way to avoid fetching those "br" tags into B.html? Any suggestion would be of great help. Thank you in advance.
You can't avoid loading part of a file when you are just accessing it.
The best option would be to simply remove the extra <br> tags from the document to begin with. There is probably a better way to accomplish whatever they are attempting to accomplish.
With some server-side scripting, it could be possible to strip them automatically when you load it, but would probably be pretty bothersome to do.
If you can't remove the <br> elements for some reason and are only dealing with a handful of them, it might be easier to simply strip them out after loading.
Since you mention using the load() function, I'm guessing you are using jQuery.
If that's the case, something like this would cleanly strip out any extra <br> tags from the end of the document.
Here is a JSfiddle which will do it: http://jsfiddle.net/dMJ2F/
var html = "<p>A</p><br><p>B</p><br><p>C</p><br><br /><br/>";
var $html = $('<div>').append(html);
var $br;
while (($br = $html.find('br:last-child')).length > 0) {
$br.remove();
}
$('p').text($html.html());
Basically, throw the loaded content into a div (in memory), then loop through and remove each <br> at the end until there aren't any. You could use a regex to do this as well, but it runs a few risks that this jQuery method doesn't.
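For comparison, here is a sketch of the regex route mentioned above, using a hypothetical `stripTrailingBr` helper that only touches a run of `<br>` variants at the very end of the string. It also illustrates the risks: it recognizes only bare `<br>`, `<br/>` and `<br />` (any case, no attributes) and knows nothing about comments or script contents.

```javascript
// Removes a run of <br>, <br/> or <br /> tags (optionally separated by
// whitespace) only at the very end of the string; earlier <br>s are kept.
function stripTrailingBr(html) {
  return html.replace(/(?:\s*<br\s*\/?>)+\s*$/i, '');
}

var html = "<p>A</p><br><p>B</p><br><p>C</p><br><br /><br/>";
console.log(stripTrailingBr(html)); // <p>A</p><br><p>B</p><br><p>C</p>
```

It would be applied to the fetched HTML string before inserting it into the page.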
You should delete the br tags in your A.html.
Substitute them by giving the .sequence class a margin-top: 30px,
and use another value in your B.html file.
You can also run this:
$('br', '.sequence').remove();
in the load callback. It will strip all br tags inside .sequence.
You can't avoid fetching a part of your page, but you CAN fetch only a part of it.
According to the jQuery docs, you can call load like this:
$("#result").load("urlorpage #form-id");
That way, you only load the form html inside the result element.

Replacing to HTML Character Entities and reverting back

When I replace characters in my chat room, they come up in the box as HTML character entities. However, I want them to revert and actually show the typed characters when the message is then shown in the chat room. I am using the following code to stop any HTML from being entered and damaging the chat room, by replacing certain HTML characters with their entities (I want to get one or two working before I look at the others; I know there are many more)...
Javascript
var str1 = this.value.replace(/>/g, '&#62;');
if (str1!=this.value) this.value=str1;
var str2 = this.value.replace(/</g, '&#60;');
if (str2!=this.value) this.value=str2;
And then the following code displays the text after it has been entered into the database etc. On updating the chat box, it uses the following to add in the updated messages...
Returned from PHP and then displayed through the following JavaScript:
$('#chatroomarea').append($("<p>"+ data.text[i] +"</p>"));
I have messed around with this a few times changing it to val and using
.html(.append($("<p>"+ data.text[i] +"</p>")));
Etc. But I have had no luck. I'm not quite sure how to do this; I just need the HTML character entities to show up as their true characters instead of displaying something such as '&#62'.
This might be something I need to put within the replacing code, so that the replacement includes code of its own, such as (this is just an example; I'm not exactly sure how I would write it)...
var str1 = this.value.replace(/>/g, '.html(<)');
Any help on this would be much appreciated, Thank you.
$('#chatroomarea').append($("<xmp>"+ data.text[i] +"</xmp>"));
HTML xmp tag
Its use is deprecated, but it is supported in most browsers.
Another option would be to use a styled textarea. To my knowledge, these two are the tags that display HTML markup as-is instead of rendering it.
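If the stored messages contain numeric references such as `&#62;` and you want the literal characters back in script, here is a minimal sketch using a hypothetical `decodeNumericEntities` helper. It handles decimal and hex forms and also accepts a missing trailing semicolon, as in the `'&#62'` from the question; named entities like `&lt;` are not covered.

```javascript
// Decodes numeric character references: &#62; (decimal) and &#x3E; (hex).
// Named references (&lt;, &gt;, &amp;, ...) are left as they are.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|\d+);?/gi, function (match, code) {
    var n = code.charAt(0).toLowerCase() === 'x'
      ? parseInt(code.slice(1), 16)
      : parseInt(code, 10);
    // Out-of-range values are left untouched instead of throwing a RangeError.
    return n <= 0x10FFFF ? String.fromCodePoint(n) : match;
  });
}

console.log(decodeNumericEntities('a &#62; b &#60;c&#x3E;')); // a > b <c>
```

After decoding, insert the message as text rather than concatenating it into an HTML string, e.g. `$('<p>').text(decoded).appendTo('#chatroomarea');`, so that a decoded `<` is displayed instead of being parsed as markup.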
