Prevent external HTML from interfering with the entire page template - javascript

My mission is to crawl blogs and get their latest posts. I now have a script that does this and stores the content as HTML in a database.
Everything works properly except for template interference. If the stored HTML has, for example, an extra </div> or an unclosed tag, it ruins my entire page.
Question: Is there any way to confine the external content to one div, so that if the external code has problems, they only affect that div box and not the entire template?
Link to correct template
Link to damaged template
Thanks in advance

You can simplify this by using a library that repairs the malformed markup that was scraped.
For example:
<?php
$content = '<div><p>I am a bad guy, and i am gonna put an additional div at the end.</p></div></div>';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();
$content = trim($dom->saveHTML());
echo $content;
It will return:
<div><p>I am a bad guy, and i am gonna put an additional div at the end.</p></div>

The only safe way I'm aware of to ensure it doesn't affect anything else on your page is to put it in an iframe. Anything else means injecting the content into your page, so you risk the things you've mentioned, such as unclosed tags, style tags that override your CSS, and potentially malicious JS, unless you do some serious parsing and error correction. Some of this is done by things like jQuery's AJAX function, but if you can't risk anything at all, I'd go with an iframe that displays a page that renders your scraped content.
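As a rough sketch of the iframe approach, the scraped HTML can be placed in the srcdoc attribute of a sandboxed iframe, so the browser parses it as a separate document. The function names escapeAttribute and wrapInSandboxedIframe are made up for this illustration, and it assumes the markup is generated as a string (server side or in a build step). An empty sandbox attribute also disables scripts inside the frame.

```javascript
// Escapes a string for use inside a double-quoted HTML attribute.
// Ampersands must be replaced first so the entities added for quotes
// are not escaped a second time.
function escapeAttribute(value) {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps untrusted HTML in an iframe whose content comes
// from srcdoc. Stray closing tags inside the content can then only affect
// the framed document, not the surrounding template.
function wrapInSandboxedIframe(untrustedHtml) {
  return '<iframe sandbox srcdoc="' + escapeAttribute(untrustedHtml) + '"></iframe>';
}

console.log(wrapInSandboxedIframe('<div><p>Post body</p></div></div>'));
```

The iframe still needs a width and height from your own CSS, since it does not grow with its content automatically.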

Related

Better way to write HTML instead of document.write()

On my website I have a menu button that goes on every page and also a comments section. Instead of copying and pasting this into every single HTML file, I created a JavaScript file that creates all of the HTML via the document.write function. This works fine, but as it gets longer and more complicated it is also getting harder and harder to find elements and attributes, since they are all squashed into one line.
I want to know if there is a better way to do this, because I feel this is not the correct way given how messy and disorganized it is.
I am just using a JavaScript file. It would look something like this:
document.write('<div id="id"></div>');
but with a lot more HTML.
I would suggest templating with a server-side language such as PHP. This will allow you to format your different sections so that they are easily readable. It will also work even if JavaScript is turned off in the browser.
<html>
<head></head>
<body>
<?php require("menu.php"); ?>
<!-- HTML body content -->
<?php require("comments.php"); ?>
</body>
</html>
If you want to stick with a client-side approach, you can put your menu and comments into separate HTML files and load them with jQuery:
$('#Menu').load('menu.html');
$('#CommentSection').load('comments.html');
You can use jQuery.
Put your button in its own .html file, such as button.html, and pull it into the main HTML file with .load():
$('#WhereYouWantItID').load('whatfolder/button.html');
This loads the contents of button.html into the specified target element on your page.

accessing dynamically created iframe contents/elements like textbox, label from javascript

I am dynamically creating an iframe and then loading aspx pages into it. What I need to do is access the elements of the iframe page and change the value of specific elements only (a text box, a label, etc.) without reloading the whole page.
The first task is to access the elements of the pages loaded into my iframe. I am trying to access them with JavaScript, but have made no progress so far.
I have tried various solutions, like this one:
How to get the body's content of an iframe in Javascript?
Actually, the answer you've attached should work. But note that this is only true if your parent page and the iframe URL are loaded from the same host (domain name). If not, you will get an error message from your browser stating that the operation is blocked.
If you are trying to show another site through an iframe and then manipulate it, you have to give up this dream, because it can't happen that simply.
I can think of one solution for you. I'm not sure about the legality of it, and it is kind of a pain in the ass.
You can set up a server-side script on your own domain that receives a URL, fetches its content and then echoes it. This way you get the original page contents, but served from your own host, so you can manipulate them as mentioned in the attached answer.
Note that it's not easy at all to keep control from there, because once a user clicks a link in the page they are out of your control again. So you may want to change all the page links to point to your server-side script, with the original link attached so the script can fetch it for you. There are probably a lot more issues that I haven't thought about.
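PHP example of such a function:
function fetchURL() {
    // PHP already URL-decodes $_GET values, so no extra urldecode() is needed.
    $urlToFetch = $_GET['url'] ?? '';
    // Only fetch http(s) URLs so the script cannot be used to read local files.
    if (!preg_match('#^https?://#i', $urlToFetch)) {
        http_response_code(400);
        return;
    }
    $contents = file_get_contents($urlToFetch);
    // Here you could rewrite links and other things with str_replace or,
    // if you want more control, load the markup into a DOM parser class,
    // manipulate it and serialize the result back to a string variable.
    echo $contents;
}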
Note that in this case you should load the script in the iframe with the desired URL as a query string, like this:
$yourDesiredURL = 'http://www.example.com';
echo '<iframe src="http://www.yourdomain.com/your/script/path.php?url=' . urlencode($yourDesiredURL) . '"></iframe>';
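The link rewriting mentioned above could look roughly like the following sketch. It is only an illustration: toProxyUrl and rewriteLinks are hypothetical helpers, the proxy address is the same placeholder used in the answer, and the regular expression only handles double-quoted href attributes, so a real page would be better served by a proper HTML parser.

```javascript
// Placeholder proxy endpoint, taken from the example above (an assumption).
const PROXY = "http://www.yourdomain.com/your/script/path.php";

// Resolves a possibly relative link against the page it came from and
// returns the address of the proxy script with the target as a query string.
function toProxyUrl(href, pageUrl, proxy = PROXY) {
  const absolute = new URL(href, pageUrl).href;
  return proxy + "?url=" + encodeURIComponent(absolute);
}

// Naive rewrite of every double-quoted href in a chunk of HTML.
function rewriteLinks(html, pageUrl, proxy = PROXY) {
  return html.replace(/href="([^"]*)"/g, (match, href) =>
    'href="' + toProxyUrl(href, pageUrl, proxy) + '"');
}

console.log(rewriteLinks('<a href="page2.html">next</a>', "http://www.example.com/blog/"));
```

Resolving against the original page URL matters because relative links would otherwise point at your own host once the content is served from the proxy.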
*************** EDIT *****************
Actually, now I see that you tagged .NET, so my example code is probably not the best fit for you, but since it's very short and simple, converting it shouldn't be a problem.
Again, I want to say that I've never tried it and it's probably over your (and my) head, so maybe you'd better give up on the idea.

Is it bad to have script tag inside div?

What is bad about having a script tag inside div inside body?
I'm dynamically updating a div in order to reload the JavaScript code inside it. Are there any issues to worry about?
Edit
As @Bergi insisted on seeing the code, here it is (see below). This div (along with other divs containing presentation HTML elements) is updated via AJAX. The script inside the div contains maps of raw data used to process the newly loaded HTML elements on the page.
<div>
<script type="text/javascript">
var namesMap = <dynamic string from server here>;
var addressesMap = <dynamic string from server here>;
</script>
</div>
It is perfectly ok to place the <script> tag anywhere in the body of the document.
From the HTML 4.01 specification:
The SCRIPT element places a script within a document. This element may appear any number of times in the HEAD or BODY of an HTML document.
However, whenever the parser encounters a <script> tag (without async or defer), it pauses parsing of the HTML until the script has been loaded and executed.
You can add <script></script> inside a DIV tag. Just check on w3c, it is valid HTML.
There is not much bad about it. Most widgets work this way. It is still valid HTML.
If you want to embed an AdSense unit in your page, you will need to do it. The same goes for Amazon widgets. That means the majority of websites have a script tag inside a div.
There are pros and cons for putting scripts inside html. The good thing is that a small script can be placed close to where it is used so you can more easily understand what the page is doing.
If nobody else but that one location needs that script then it is fine to put it there, I think.
Bad thing is that when you divide parts of your program into multiple locations it becomes more difficult to see and manage how such parts interact and interfere with each other. Whereas if you keep your html and javascript in separate files it becomes easier to understand each independently, and then finally focus on how they interact with each other. What are the "interfaces" between them.
If JavaScript is interspersed into the html then you can not organize your script-code separately from organizing the HTML.
ONE MORE THING to be aware of: if you have a DIV, you may think that you can manipulate its content by re-assigning its innerHTML. That works, except that a script injected into the DIV that way will not be executed. SEE:
Can scripts be inserted with innerHTML?
So one bad thing about having a script inside a DIV is that you can not replace such a script by re-assigning its innerHTML.
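One common way around this limitation, sketched below, is to build the script element with DOM methods instead of innerHTML, since a script element created and appended that way is executed by the browser. The name replaceInlineScript and the doc parameter are hypothetical; doc defaults to the page's document and exists only so the function can be tried outside a browser.

```javascript
// Replaces everything inside `container` with a new inline <script>.
// Unlike assigning innerHTML, appending a script element created with
// createElement causes the browser to run it.
function replaceInlineScript(container, code, doc = globalThis.document) {
  const script = doc.createElement("script");
  script.textContent = code;
  // Drop the old children, including any previous script element.
  while (container.firstChild) {
    container.removeChild(container.firstChild);
  }
  container.appendChild(script);
  return script;
}

// Browser usage (assumes an element with id "maps" exists on the page):
// replaceInlineScript(document.getElementById("maps"), "var namesMap = {};");
```

Note that replacing the script element does not undo what the earlier script already did; it only runs the new code.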
A SCRIPT inside a <DIV> still works.
But it can cause annoyances with your layout, such as shaking while scrolling.
Best solution: put the script directly inside <body> or <head> :D
It has always been considered good practice to put your <script></script> tags in <head></head>. However, it has lately been argued that putting the tags at the end of <body>, just before </body>, makes a page load faster.
I would recommend putting your <script></script> in the <head></head> section of your HTML document, since that is preferred. Additionally, putting a <script></script> inside a DIV is not good practice.
You can post your example for a better answer about organizing the structure of your document.
To sum up, there is no problem in what you are doing. But a more organized way is what I suggest.

Data scraping the page JUST written with PHP

I have a page that's written with PHP, and after the PHP writes it, I want to be able to search through the HTML source code to find certain tags. Is this impossible/unwise?
I tried using file_get_contents at the end of the script when everything has technically already been written to the HTML, and I think I might have broken my page temporarily that way (Hit a resource limit on my host)
My main goal is trying to figure out how I can use Javascript to alter elements of my page one by one. Which I figure I could do if I could find the html tags I'm trying to change...which the PHP wrote...in the same page.
Very new to Javascript you see.
You could do this fairly easily, client side, with jquery.
If you absolutely need to process it server side with PHP and you absolutely can't do it while generating the code, you could use ob_start() to capture the output and then ob_get_contents() to copy it into a string before calling ob_end_flush() to send it to the browser.
You can just right click on your rendered web page in most browsers and choose some variant of 'View Source'. Or, you can cURL your page's content, and view it as a text file.
Also, file_get_contents() with a URL makes a new request to get the page's contents. So if a page requests its own URL at the bottom, that request loads the page again, which makes another request, and so on forever. You're creating an infinite loop and exhausting the resources allocated by your hosting provider.
If I understand correctly, you want to change your own HTML output after the page has loaded, so:
<?php
echo "<div id='mydiv'></div>";
?>
<script type="text/javascript">
window.onload = function() {
document.getElementById("mydiv").innerHTML = "updated html";
}
</script>
Unless you're capturing the output as you generate the page, e.g.
<?php
ob_start();
// ... page building here ...
$page = ob_get_clean();
echo $page;
?>
there will be NOTHING for you to work on. However, if you are capturing as above, then you can simply feed $page into DOM and manipulate it there.
But this begs the question... if you need to change the page after it's been built, why not just change how it's built in the first place?

Programmatically remove <script src="/unwanted.js".. /> reference

I have partial control of a web page where by I can enter snippets of code at various places, but I cannot remove any preexisting code.
There is a script reference midway through the page
<script src="/unwanted.js" type="text/javascript"></script>
but I do not want the script to load. I cannot access the unwanted.js file. Is there any way I can use JavaScript executing above this reference to prevent the unwanted.js file from loading?
Edit: To answer the comments asking what and why:
I'm setting up a Stack Exchange site and the WMD* js file loads halfway down the page. SE will allow you to insert HTML in various parts of the page - so you can have your custom header and footer etc. I want to override the standard WMD code with my own version of it.
I can get around the problem by just loading javascript after the original WMD script loads and replacing the functions with my own - but it would be nice not to have such a large chunk of JS load needlessly.
*WMD = the mark down editor used here at SO, and on the SE sites.
In short, you can't. Even if there is a hack, it would heavily depend on the way browsers parse the HTML and load the scripts and hence wouldn't be compatible with all browsers.
Please tell us exactly what you can and cannot do, and (preferably; this sounds fascinating) why.
If you can, try inserting <!-- before the script include and --> afterwards to comment it out.
Alternatively, look through the script file and see if there's any way that you could break it or nullify its effects. (This would depend entirely on the script itself; if you want more specific advice, please post more details or, preferably, the script itself.)
Could you start an HTML comment above it and end it below, in another block?
What do the contents of unwanted.js look like?
You can remove a script from the DOM after it is called by using something simple such as:
var s = document.getElementById("my_script");
s.parentNode.removeChild(s);
This takes the <script> element out of the DOM, but it will not remove the file from the user's cache, and it does not undo anything the script has already executed.
Basically you can't unless you have access to the page content before you render it.
If you can manipulate the HTML before you send it off to the browser, you can write a regular expression that will match the desired piece of code, and remove it.
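As an illustration of the regular-expression idea, here is a small sketch that strips an external script reference from an HTML string before it is sent out. The helper names are invented for this example, and a regex like this is fragile (it assumes a quoted src and an empty element body), so treat it as a starting point rather than a robust filter.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Removes <script ... src="SRC" ...></script> elements for one exact src.
function removeScriptReference(html, src) {
  const pattern = new RegExp(
    '<script\\b[^>]*\\bsrc=["\']' + escapeRegExp(src) + '["\'][^>]*>\\s*</script>',
    "gi"
  );
  return html.replace(pattern, "");
}

const page =
  '<p>a</p><script src="/unwanted.js" type="text/javascript"></script>' +
  '<script src="/wanted.js"></script>';
console.log(removeScriptReference(page, "/unwanted.js"));
```

Because the src value is escaped and matched exactly, other script references on the page are left alone.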
