What I'm trying to do is read a specific line from a webpage from inside of my PHP application. This is my experimental setup thus far:
<?php
$url = "http://www.some-web-site.com";
$file_contents = file_get_contents($url);
$findme = 'text to be found';
$pos = strpos($file_contents, $findme);
if ($pos == false) {
echo "The string '$findme' was not found in the string";
} else {
echo "The string '$findme' was found in the string";
echo " and exists at position $pos";
}
?>
The "if" statements contain echo operators for now, this will change to database operators later on, the current setup is to test functionality.
Basically the problem is, with using this method any java on the page is returned as script. What I need is the text that the script is supposed to render inside the browser. Is there any way to do this within PHP?
What I'm ultimately trying to achieve is updating stock from within an ecommerce site via reading the stock level from the site's supplier. The supplier does not use RSS feeds for this.
cURL does not have a javascript parser. as such, if the content you are trying to read is placed in the page via Javascript after initial page render, then it will not be accesible via cURL.
The result of the script is supposed executed and return back to your script.
PHP doesn't support any feature about web browser itself.
I suggest you try to learn about "web crawler" and "webbrowsers" which are included in .NET framework ( not PHP )
so that you can use the exec() command in php to call it.
try to find out the example code of web crawler and web browsers on codeproject.com
hope it works.
You can get the entire web page as a file like this:
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('http://example.com/page.htm');
$my_file = 'file.htm';
$handle = fopen($my_file, 'w') or die('Cannot open file: '.$my_file);
fwrite($handle, $returned_content);
Then I suppose you can use a class such as explained in this link below as a guide to separate the javascript from the html (its in the head tags usually). for linked(imported) .js files you would have to repeat the function for those urls, and also for linked/imported css. You can also grab images if you need to save them as files.
http://www.digeratimarketing.co.uk/2008/12/16/curl-page-scraping-script/
Related
How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&','<','>'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.
How can I store text from a site to a local file?
So basically the script needs to do the following:
go to this site (fake site)
http://website/webtv/secure?url=http://streamserver.net/channel/channel.m3u8**&TIMESTAMP**
where TIMESTAMP can be a timestamp to make it unique.
the site will respond with:
{
"url":"http://streamserver.net/channel/channel.m3u8?st=8frnWMzvuN209i-JaQ1iXA\u0026e=1451001462",
"alternateUrl":"",
"Ip":"IPADRESS"
}
Grab the url and convert the text as follows:
http://streamserver.net/channel/channel.m3u8?st=8frnWMzvuN209i-JaQ1iXA\u0026e=1451001462
must be:
http://streamserver.net/channel/channel.m3u8?st=8frnWMzvuN209i-JaQ1iXA&e=1451001462
so \u0026e is replaced by &
and store this text in a local m3u8 file.
I am looking for a script either php or any other code is welcome which can perform this. Any help is appreciated.
I tried a small script just to show the contents but then I get the error:
Failed to open stream: HTTP request Failed!
It seems that php tries to open it as a stream instead of a website. It should see it as a site because only then the response is sent.
<?php
$url = 'http://website/webtv/secure?url=http://streamserver.net/channel/channel.m3u8&1';
$output = file_get_contents($url);
echo $output;
?>
This is not a tutorial website, so I am not going to provide you more details. You can try the following code:
<?php
$json_url = "http://linktoyour.site"; //change the url to your needs
$data = file_get_contents($json_url); //Get the content from url
$json = json_decode($data, true); //Decodes string to JSON Object
$data_to_save=$json["url"]; //Change url to whatever key you want value of
$file = 'm3u8.txt'; //Change File name to your desire
file_put_contents($file, $data_to_save); //Writes to File
?>
I think there is issue with your PHP configuration.
It like as allow_url_fopen is denied.
See more http://php.net/manual/en/filesystem.configuration.php#ini.allow-url-fopen
How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&','<','>'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.
Is it possible to download the resulting HTML code after the JavaScript code on the page has been run using PHP.
For example, when the page has this jQuery code $("p").html("Hello world"); and I use file_get_content('website.com') I don't get the string "Hello world" because the JavaScript runs after the page load.
use cURL:
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Then do :
<?php echo get_data('http://theURLhere.com'); ?>
Hope that helped
Please refer to similar questions and share your research you have done so far:
Get the content (text) of an URL after Javascript has run with PHP
Get Document source After AJAX or js Action With PHP
How to get content of a javascript/ajax -loaded div on a site?
Get HTML code after javascript execution using CURL PHP
One way to achieve this would be to use Selenium, and write a custom script to gather the output from it... But I'm sure that falls far beyond the scope of what you're attempting to do.
The way I would go would be to invert the responsibility. Have the JS send the output to a PHP endpoint, and use that output however you see fit.
Here's an example.
Javascript
<script>
var outputElement = 'html';
var HTML = $(outputElement).html();
var endpoint = 'myEndpoint.php';
$.post(endpoint, { html: HTML }, function(data) {
alert('Output sent');
});
</script>
One caveat here is that you will not get the DOCTYPE declaration, or any attributes on your HTML tag, if this isn't acceptable, you may reconstruct them in the PHP file below.
PHP
<?php
$html = $_POST['html']; // Be VERY CAREFUL with what you do with this...
// If you need to have the doctype and html tag... Use your own doctype.
// $html = sprintf('<DOCTYPE html><html class="my-class">%s</html>', $html);
// Do something with the HTML.
You have to be very careful when sending HTML over POST. If you're using this HTML to output on your website, it can easily be spoofed to reveal sensitive data on your website.
Reference
jQuery.post()
I need to do what a bookmarklet does but from my page directly.
I need to pull the document.title property of a web page given that url.
So say a user types in www.google.com, I want to be able to somehow pull up google.com maybe in an iframe and than access the document.title property.
I know that bookmarklets ( the javacript that runs from bookmark bar ) can access the document.title property of any site the user is on and then ajax that information to the server.
This is essentially what I want to do but from my web page directly with out the use of a bookmarklet.
According to This question
You can achive this using PHP, try this code :
<?php
function getTitle($Url){
$str = file_get_contents($Url);
if(strlen($str)>0){
preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
return $title[1];
}
}
//Example:
echo getTitle("http://www.washingtontimes.com/");
?>
However, i assume it is possible to read file content with JS and do the same logic of searching for the tags.
Try searching here
Unfortunately, its not that easy. For security reasons, JavaScript is not allowed to access the document object of a frame or window that is not on the same domain. This sort of thing has to be done with a request to a backend PHP script that can fetch the requested page, go through the DOM, and retrieve the text in the <title> tag. If you don't have that capability, what you're asking will be much harder.
Here is the basic PHP script, which will fetch the page and use PHP's DOM extension to parse the page's title:
<?php
$html = file_get_contents($_GET["url"]);
$dom = new DOMDocument;
$dom->loadXML($html);
$titles = $dom->getElementsByTagName('title');
foreach ($titles as $title) {
echo $title->nodeValue;
}
?>
Demo: http://www.dstrout.net/pub/title.htm
You could write a server side script that would retrieve the page for you (i.e. using curl) and pars the dom and return the desired properties as json. Then call it with ajax.