Data scraping the page JUST written with PHP - javascript

I have a page that's written with PHP, and after the PHP writes it, I want to be able to search through the HTML source code to find certain tags. Is this impossible/unwise?
I tried using file_get_contents at the end of the script, when everything has technically already been written to the HTML, and I think I might have broken my page temporarily that way (I hit a resource limit on my host).
My main goal is to figure out how I can use JavaScript to alter elements of my page one by one, which I figure I could do if I could find the HTML tags I'm trying to change... which the PHP wrote... in the same page.
I'm very new to JavaScript, you see.

You could do this fairly easily, client side, with jQuery.
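If you absolutely need to process it server side with PHP and you absolutely can't do it while generating the code, you could use ob_start() to capture the output, ob_get_contents() to copy it into a string, and then ob_end_clean() to discard the buffer before echoing your (possibly modified) string to the browser. Note that ob_end_clean() throws the buffered output away rather than flushing it; ob_end_flush() is the call that sends it as-is.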

You can just right click on your rendered web page in most browsers and choose some variant of 'View Source'. Or, you can cURL your page's content, and view it as a text file.
Also, when file_get_contents() is given a URL, it makes a new HTTP request for that page. So if a page requests its own URL at the bottom, that request runs the script again, which requests the page again, and so on forever. You're creating an infinite loop and exhausting the resources your hosting provider allocates to you.

If I understand correctly, you want to change your own HTML output after the page has loaded, so:
<?php
echo "<div id='mydiv'></div>";
?>
<script type="text/javascript">
window.onload = function() {
document.getElementById("mydiv").innerHTML = "updated html";
}
</script>
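Building on the window.onload example above, the question's actual goal (finding the tags PHP wrote and altering them one by one) can be sketched with querySelectorAll, which returns every element matching a CSS selector. This assumes the PHP output gives the target elements a common, hypothetical class such as `editable`; `root` is a parameter (normally `document`) only so the function can be tried outside a browser.

```javascript
// Update every element under `root` that matches `selector`, one by one.
// `makeHtml(el, index)` returns the new inner HTML for each element.
function updateEach(root, selector, makeHtml) {
  const elements = root.querySelectorAll(selector);
  let count = 0;
  // NodeList supports forEach in modern browsers
  elements.forEach((el, index) => {
    el.innerHTML = makeHtml(el, index);
    count += 1;
  });
  return count;
}

// In the page (".editable" is a hypothetical class written by the PHP):
// window.onload = function () {
//   updateEach(document, ".editable", (el, i) => "updated item " + (i + 1));
// };
```

Because the script runs after the PHP output has reached the browser, there is no need to re-fetch the page: the elements are already in the live DOM.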

Unless you're capturing the output as you generate the page, e.g.
<?php
ob_start();
// ... page building here ...
$page = ob_get_clean();
echo $page;
?>
there will be NOTHING for you to work on. However, if you are capturing the output as above, you can feed $page into PHP's DOMDocument and manipulate it there before echoing it.
But this raises the question: if you need to change the page after it's been built, why not just change how it's built in the first place?

Related

Can you reload an iframe using PHP when an IF condition is met?

So I have two pages, my PHP script and then an HTML page with two iframes in it.
<html>
<iframe name=1></iframe> <iframe name=2></iframe>
</html>
Iframe 2 currently houses my PHP script, which is something like this:
<?php
$condition = true; // placeholder for the real check
if ($condition) {
    // do this
}
?>
What I am trying to find out is whether the IF condition within my PHP script can reload iframe 1. I've played around with different methods, and onClick obviously works to reload the frame, but I need the frame to reload when the IF condition is met, and I can't really figure that out. TIA!
PHP cannot change or reload a frame in the client's browser, because PHP is a server-side language. You have to use JavaScript here, as you found out: "onClick" (JavaScript) works.
If you like, you can combine JavaScript with Ajax to ask PHP on the server whether JavaScript should reload the frame.
But PHP can NOT reload anything in the browser after the document has loaded.
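The JavaScript-plus-Ajax route from the answer above could look roughly like this. It is a sketch that assumes a hypothetical PHP endpoint, `check.php`, which evaluates the IF condition and replies with JSON such as `{"reload": true}`, and that iframe 1 has a `src` to reload. `fetchFn` and `frame` are parameters so the logic can be tested without a browser.

```javascript
// Ask the server whether the condition is met; if so, reload the given iframe.
// Assumes the endpoint responds with JSON like {"reload": true} (hypothetical format).
async function checkAndReload(fetchFn, frame, url) {
  const response = await fetchFn(url);
  const data = await response.json();
  if (data.reload === true) {
    // Re-assigning src makes the iframe navigate to (reload) its page again
    frame.src = frame.src;
    return true;
  }
  return false;
}

// In the page, e.g. poll every 5 seconds ("check.php" is hypothetical):
// const frame1 = document.querySelector('iframe[name="1"]');
// setInterval(() => checkAndReload(window.fetch.bind(window), frame1, "check.php"), 5000);
```

Polling is the simplest design here; each check costs one request, so a longer interval keeps the load on the host low.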

Prevent external html to interfere entire page template

My mission is to explore blogs and get their latest posts. I now have a script that does the task and stores the content as HTML in a database.
Everything works properly except template interference: if the content's HTML has, for example, an extra </div> or forgets to close a tag, it ruins my entire page.
Question: Is there any way to limit the external content to one division, so that if the external code has problems, it only affects the template of that div box and not the entire template?
Thanks in advance
We can simplify that by using a library that fixes the malformed code that was scraped.
You can do it like this:
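<?php
// Malformed scraped content: one </div> too many
$content = '<div><p>I am a bad guy, and i am gonna put an additional div at the end.</p></div></div>';
$dom = new DOMDocument();
// Suppress libxml warnings about the malformed markup
libxml_use_internal_errors(true);
// NOIMPLIED: don't add <html><body> wrappers; NODEFDTD: don't add a doctype
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();
// Serialize the repaired tree back to a string (the stray </div> is dropped)
$content = trim($dom->saveHTML());
echo $content;
It will output: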
<div><p>I am a bad guy, and i am gonna put an additional div at the end.</p></div>
The only safe way to ensure it doesn't affect anything else on your page, as far as I'm aware, is to put it in an iframe. Anything else is going to be injected into your page, so unless you do some serious parsing and error correction, you'd be risking the things you've mentioned, like unclosed tags, as well as style tags that override your CSS, potentially malicious JS, etc. Some of this is handled by things like jQuery's AJAX function, but if you can't risk anything at all, I'd go with an iframe that displays a page that renders your scraped content.
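A minimal client-side variant of the iframe idea, assuming the scraped HTML is already available in the browser as a string: instead of a separate page, the `srcdoc` attribute renders the string as the iframe's own document, and an empty `sandbox` attribute blocks scripts in it. This swaps in `srcdoc` for the answer's "page that renders your scraped content"; old Internet Explorer does not support it. `doc` and `container` are parameters so the function can be tested outside a browser.

```javascript
// Render scraped HTML inside its own sandboxed iframe so broken markup,
// <style> rules, or scripts in it cannot affect the parent page.
// `doc` is the page's document; it is a parameter only to make testing easy.
function embedScrapedHtml(doc, container, scrapedHtml) {
  const frame = doc.createElement("iframe");
  // An empty sandbox attribute applies every restriction: scripts do not run
  // and the content is treated as coming from a unique origin.
  frame.setAttribute("sandbox", "");
  // srcdoc makes the string the iframe's own document, so a stray </div>
  // only affects the iframe's document, not the surrounding template.
  frame.srcdoc = scrapedHtml;
  container.appendChild(frame);
  return frame;
}

// In the page (the container id and variable name are hypothetical):
// embedScrapedHtml(document, document.getElementById("post-box"), scrapedHtmlFromServer);
```

The iframe does not size itself to its content, so give it an explicit height in your CSS.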

Better way to write HTML instead of document.write()

On my website I have a menu button that goes on every page, and also a comments section. Instead of copying and pasting these into every single HTML file, I created a JavaScript file that creates all of the HTML via the document.write function. This works fine, but as it gets longer and more complicated, it is also getting harder and harder to find elements and attributes, since they are all squashed into one line.
I want to know if there is a better way to do this, because I feel this is not the correct way, given how messy and disorganized it is.
I am just using a JavaScript file. It would look something like this:
document.write('<div id="id"></div>');
but with a lot more HTML.
I would suggest templating with a server side language such as PHP. This will allow you to format your different sections so that they are easily readable. Also it will work even if JavaScript is turned off on the browser.
<!DOCTYPE html>
<html>
<head></head>
<body>
<?php require("menu.php"); ?>
<!-- HTML body content -->
<?php require("comments.php"); ?>
</body>
</html>
If you want to stick with a client-side approach, you can put your menu and comments into separate HTML files and use jQuery to load them:
$('#Menu').load('menu.html');
$('#CommentSection').load('comments.html');
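Without jQuery, a rough equivalent of `.load()` is the Fetch API plus `innerHTML`. A sketch, assuming the same `menu.html`/`comments.html` files and element ids as above; like `.load()`, it typically needs the page to be served over HTTP rather than opened as a local file. `fetchFn` and `target` are parameters only so the function can be tested outside a browser.

```javascript
// Fetch an HTML fragment and insert it into `target`, similar to jQuery's .load().
// Note: <script> tags inside the fragment are NOT executed when inserted via innerHTML.
async function includeHtml(fetchFn, target, url) {
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error("Could not load " + url + ": HTTP " + response.status);
  }
  target.innerHTML = await response.text();
  return target;
}

// In the page:
// includeHtml(window.fetch.bind(window), document.getElementById("Menu"), "menu.html");
// includeHtml(window.fetch.bind(window), document.getElementById("CommentSection"), "comments.html");
```

Keeping the menu and comments markup in their own files means each one can be formatted over many lines, which addresses the readability problem in the question.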
You can use jQuery.
Put your button in its own .html file, like button.html, and use .load() in the main HTML file:
$('#WhereYouWantItID').load('whatfolder/button.html');
This will load the button.html file to a specific target on your page

accessing dynamically created iframe contents/elements like textbox, label from javascript

I am dynamically creating an iframe and then loading ASPX pages into it. What I need to do is access the elements of the iframe page and change the value of specific elements, like a text box or label, without reloading the whole page.
The first task is to access the elements of the pages loaded into my iframe. I am trying to access them with JavaScript, but no progress so far.
I have tried various solutions, like this one:
How to get the body's content of an iframe in Javascript?
Actually, the answer you've linked should work. But note that this is only true if your parent page and the iframe's URL are loaded from the same origin (same protocol, domain name, and port). If not, your browser will give you an error stating that this operation is blocked.
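For the same-origin case, here is a minimal sketch of reaching into the iframe and changing one element. The element ids are hypothetical, and ASP.NET may render a control with a client-side id that differs from its server-side ID, so check the generated HTML. Call it only after the iframe has finished loading, for example from its `load` event.

```javascript
// Return the iframe's document, or null if the browser blocks access
// (cross-origin iframes: contentDocument is null or access throws).
function getIframeDocument(iframe) {
  try {
    return iframe.contentDocument || iframe.contentWindow.document;
  } catch (err) {
    return null;
  }
}

// Change the value of a text box (input) or the text of a label (span) inside the iframe.
function setIframeElement(iframe, elementId, newValue) {
  const doc = getIframeDocument(iframe);
  if (!doc) return false;
  const el = doc.getElementById(elementId);
  if (!el) return false;
  if ("value" in el) {
    el.value = newValue; // e.g. the <input> rendered for a TextBox
  } else {
    el.textContent = newValue; // e.g. the <span> rendered for a Label
  }
  return true;
}

// Usage (ids are hypothetical):
// const frame = document.getElementById("myFrame");
// frame.addEventListener("load", () => setIframeElement(frame, "TextBox1", "new text"));
```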
If you are trying to show another site through an iframe and then manipulate it, you have to give up this dream, because it can't be done that simply.
I can think of one solution for you; I'm not sure about the legality of it, and it is kind of a pain in the ass.
You can set up a server-side script on your own domain that receives a URL, fetches its content, and echoes it. This way you get the original page's contents, but served from your own host, so you can manipulate it as described in the linked answer.
Note that it's not easy at all to control from there, because once a user clicks a link in the page, they're out of your control again. So you may want to rewrite all the page's links to point to your server-side script, with the original link attached so the script fetches it for you. There are probably many more issues that I haven't thought of.
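PHP example of such a function:
function fetchURL() {
    // $_GET values are already URL-decoded, so urldecode() is not needed
    $urlToFetch = isset($_GET['url']) ? $_GET['url'] : '';
    // Allow only http(s) URLs; otherwise file_get_contents() could read local files
    $scheme = parse_url($urlToFetch, PHP_URL_SCHEME);
    if ($scheme !== 'http' && $scheme !== 'https') {
        http_response_code(400);
        return;
    }
    $contents = file_get_contents($urlToFetch);
    // maybe manipulate links and other stuff here with str_replace or, for more control,
    // load it into a DOM parser class, manipulate it and extract the result back into a string.
    echo $contents;
}
fetchURL();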
Note that in that case you should load the script in the iframe with the desired URL as a query string, like this:
$yourDesiredURL = 'http://www.example.com';
echo '<iframe src="http://www.yourdomain.com/your/script/path.php?url=' . urlencode($yourDesiredURL) . '"></iframe>';
*************** EDIT *****************
Actually, now I see that you tagged .NET, so my example code is probably not the best for you, but since it's very short and simple, converting it shouldn't be a problem.
Again, I want to say that I've never tried it and it's probably over your (and my) head; maybe you'd better give up on the idea.

Loading portion of site into iframe

I have a script which loads content into an iframe using:
document.getElementById('iframeid').src = 'site.html';
Now, the problem is that I want only a portion of the page loaded. If I could use jQuery I would specify which part by:
$('#iframeid').load('site.html .classforportion');
However, it seems like jQuery's .load() doesn't work for iframes.
Is it possible to 'fool' the browser into grabbing the contents of a variable or a function for the first case? Or is it possible to create temporary 'html'-files only for so long that the script would load the right contents into it and pass it to an iframe (hm... seems unlikely...).
A work around solution for this specific case would be removing the first line of the loaded page. Is that possible with javascript or jQuery? Is it possible for this situation involving the iframe?
Thanks, going crazy over this thing! So, any help appreciated!
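For the workaround described in the question (removing the first line of the loaded page), plain JavaScript string handling is enough once the page has been fetched as text from a same-origin URL. A minimal sketch; writing the result into the iframe via `srcdoc` is one option and an assumption here, not something from the question.

```javascript
// Remove the first line of `text`; works for both "\n" and "\r\n" line endings.
function dropFirstLine(text) {
  const newline = text.indexOf("\n");
  // No newline means the whole string is a single line, so nothing remains
  return newline === -1 ? "" : text.slice(newline + 1);
}

// In the page (same-origin site.html assumed):
// fetch("site.html")
//   .then((response) => response.text())
//   .then((html) => {
//     document.getElementById("iframeid").srcdoc = dropFirstLine(html);
//   });
```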
I've never heard of a way of doing this with JavaScript, since you cannot access files outside your own domain with JavaScript. You can do it like this with PHP:
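<?php
// Returns the inner HTML of the first <div class="$part"> found on $website, or null.
// Caveat: a regex cannot handle nested <div>s; the lazy (.*?) stops at the first </div>.
function get_content_part($part = null, $website = null)
{
    $html = file_get_contents($website);
    if ($html === false) {
        return null;
    }
    // preg_quote() escapes regex characters in the class name; the /s flag lets . match newlines
    $pattern = '/<div class="' . preg_quote($part, '/') . '">(.*?)<\/div>/s';
    if (preg_match($pattern, $html, $match)) {
        return $match[1];
    }
    return null;
}
?>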
You could then call this from JavaScript with an Ajax request.
Best regards
Jonas
Now, the problem is that I want only a portion of the page loaded
Load the page into an invisible container (iframe, div, etc.) or into a JavaScript string variable, and do whatever you want with it. For example, load the partial content into an invisible div and then copy it into the iframe, or load the whole page into an invisible iframe and then copy the part you need into another, visible one.
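A sketch of the "JavaScript string variable" route: fetch the page as text, parse it with the browser's `DOMParser`, keep the element matching the selector, and show it in the iframe through `srcdoc`. It assumes site.html is same-origin; the fetch function and parser are parameters so the logic can be tested outside a browser. Styles and scripts from the original page's `<head>` are not carried over.

```javascript
// Fetch `url`, keep only the first element matching `selector`, and render it in `iframe`.
// `parseHtml` turns an HTML string into a document (in a browser: DOMParser, see usage below).
async function loadPortionIntoIframe(fetchFn, parseHtml, iframe, url, selector) {
  const response = await fetchFn(url);
  if (!response.ok) return false;
  const html = await response.text();
  const doc = parseHtml(html);
  const part = doc.querySelector(selector);
  if (!part) return false;
  // srcdoc renders the string as the iframe's own document
  iframe.srcdoc = part.outerHTML;
  return true;
}

// In the page:
// loadPortionIntoIframe(
//   window.fetch.bind(window),
//   (html) => new DOMParser().parseFromString(html, "text/html"),
//   document.getElementById("iframeid"),
//   "site.html",
//   ".classforportion"
// );
```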
