For a project at school I am trying to make a website that shows your grades in a prettier way than it's being done now.
I have been able to log in to the site using cURL, and now I want to get the grades into a string so I can edit it with PHP.
The only problem is that cURL gets the HTML source code before it has been modified by the JavaScript that fetches the grades.
So basically I want the code that you see when you open Firebug or the inspector, in a string, so I can edit it with PHP.
Does anyone have an idea how to do this? I have seen several posts that say you have to wait until the page has loaded, but I have no clue how to make my site wait for a third-party site to load.
The code that I am waiting to be executed and of which I want the result is this:
<script type="text/javascript">
var widgetWrapper = $("#objectWrapper325");
if (widgetWrapper[0].timer !== undefined) {
    clearTimeout( jQuery('#objectWrapper325')[0].timer );
}
widgetWrapper[0].timer = setTimeout( function() {
    if (widgetWrapper[0].xhr !== undefined) {
        widgetWrapper[0].xhr.abort();
    }
    widgetWrapper[0].xhr = jQuery.ajax({
        type: 'GET',
        url: "",
        data: {
            "wis_ajax": 1,
            "ajax_object": 325,
            'llnr': '105629'
        },
        success: function(d) {
            var goodWidth = widgetWrapper.width();
            widgetWrapper.html(d);
            /* update width, needed for bug with standard template */
            $("#objectWrapper325 .result__overview").css('width',goodWidth-$("#objectWrapper325 .result__subjectlabels").width());
        }
    });
}, 500+(Math.random()*1000));
</script>
First you have to understand a subtle but very important difference between using cURL to get a webpage, and using your browser visiting that same page.
1. Loading a page with a browser
When you enter the address in the location bar, the browser resolves the host name in the URL to an IP address. Then it contacts the web server at that address and asks for a web page. From now on the browser will only speak HTTP with the web server. HTTP is a protocol made for carrying documents over the network. The browser is actually asking for an HTML document (a bunch of text) from the web server. The web server answers by sending the web page to the browser. If the web page is a static page, the web server just picks an HTML file and sends it over the network. If it's a dynamic page, the web server uses some high-level code (like PHP) to generate the web page and then sends it over.
Once the web page has been downloaded, the browser parses the page and interprets the HTML inside, which produces the actual web page in the browser. During parsing, when the browser finds script tags it interprets their content as JavaScript, a language that runs in the browser to manipulate the look of the web page and do other work there.
Remember, the web server only sent a web page containing HTML content; it does not run the JavaScript inside it.
So when you load a web page on a browser the javascript is ONLY interpreted once it is downloaded on the browser.
2. What is cURL
If you take a look at curl man page, you'll learn that curl is a tool to transfer data from/to servers which can speak some supported protocols and HTTP is one of them.
When you download a page with curl, it downloads the page the same way your browser does but will not parse or interpret anything. cURL does not understand JavaScript or HTML; all it knows is how to speak to web servers.
3. Solution
So what you need in your case is to download the page like cURL does and also somehow have the JavaScript interpreted as if it were inside a browser.
If you have followed me up to here, then you're ready to take a look at CasperJS, which scripts a headless browser (PhantomJS) that does execute the page's JavaScript.
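A lighter alternative, where it applies: the script in the question does nothing more than issue one GET request to the page's own address with three parameters, so the same request can sometimes be made directly, without a browser. The sketch below only builds that URL. The page address is a placeholder (the question does not give it), build_ajax_url is a made-up helper name, and it assumes the page address has no query string of its own and that the server accepts the request with the login cookies the cURL session already holds.

```python
from urllib.parse import urlencode

# Hypothetical address of the grades page; the question does not give it.
PAGE_URL = "https://school.example/grades"


def build_ajax_url(page_url, ajax_object, llnr):
    """Return the URL that the in-page script would request."""
    # url: "" in jQuery.ajax means "the current page", so the parameters
    # are appended to the page's own address as a query string.
    params = {"wis_ajax": 1, "ajax_object": ajax_object, "llnr": llnr}
    return page_url + "?" + urlencode(params)


url = build_ajax_url(PAGE_URL, 325, "105629")
print(url)
# https://school.example/grades?wis_ajax=1&ajax_object=325&llnr=105629
```

If the server treats this request like the browser's, fetching that URL with the logged-in cURL session should return the HTML fragment that the script would otherwise insert into #objectWrapper325.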
Related
I want to get the html web page from https://www.collinsdictionary.com/dictionary/english/supremacy, but part of the html file is loaded by JavaScript. When I use HTTP.jl to get the web page with HTTP.request(), I only get the part of the html file that was loaded before the JavaScript ran, so the web page I get is different from the web page I get in Chrome. How can I get the same web page that Chrome gets? Do I have to use WebDriver.jl, which is a wrapper around Selenium WebDriver's Python bindings?
part of my source:
function get_page(w::word)::Bool
    response = nothing
    try
        response = HTTP.request("GET", "https://www.collinsdictionary.com/dictionary/$(dictionary)/$(w.org_word)",
            connect_timeout=connect_timeout, readtimeout=readtimeout, retries=retries, redirect=true, proxy=proxy)
    catch e
        # record the failure on the word and report it to the caller
        push!(w.err_log, [get_page_http_err, string(e)])
        return false
    end
    open("./assets/org_page.html", "w") do f
        write(f, String(response.body))
    end
    return true
end
dictionary and w.org_word are both String, the function is in a module.
What you want is impossible to achieve with just HTTP.jl. Running the JavaScript part of the page is fundamentally different: you need a JavaScript engine to do so, which is no simple thing.
And this is not a weakness unique to Julia's HTTP.jl:
Python requests.get(url) returning javascript code instead of the page html
(Python's requests does not run JavaScript either; the separate requests-html package can render it, but only by driving a headless Chromium.)
I moved my website and I have a QR code (which is printed in public and can't be easily replaced) that points to a specific file on my old website that has now been moved. Currently, the URL just points to a "Not found" page on my new website. I try to use javascript in the header to catch the URL and forward it to the right URL as following:
<script type="text/javascript">
if (window.location.href === "https://www.website.com/multimedia/hoerproben/1.mp3")
{
    window.location.href = "https://www.webseite.com/app/download/10079133850/1.mp3";
}
</script>
But it doesn't work. Any hints what I am doing wrong?
When you open a URL, the browser makes an HTTP request to your server for that particular resource (in your example, an mp3 file).
JavaScript is not involved at all (actually, there are so-called "service workers", but they are not what you're looking for; they are meant to do caching, not redirecting). The browser does not know that your JavaScript code exists and would not execute it.
What you should do is set up the redirect on the server, so that when the browser asks for /oldlocation/file.mp3, the server answers with a redirect to /newlocation/file.mp3.
How to do this depends on your server. If you have no control over how your server works, what you're asking is simply not possible.
It won't work unless you place that code in the "Not found" page that gets served. If your URL pointed to an HTML file, you could have just placed one to do the redirect. For media files you would have to configure your server to serve an HTML file instead. Don't worry about the extension, it's the Content-Type header that determines the type of the file served. Doing this, however, is not good practice because your server would still be returning a 200 response code.
It's good practice to return 301 Moved Permanently as 101arrowz pointed out in the comments. How that can be accomplished will depend on what server you're using.
Here's how that would be accomplished with express.js:
app.get('/multimedia/hoerproben/1.mp3', function(req, res) {
    // pass 301 explicitly; res.redirect() defaults to 302 Found
    res.redirect(301, '/app/download/10079133850/1.mp3');
});
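If you are not on Express, the same idea is easy to picture with nothing but Python's standard library. This is only a sketch of what a 301 answer looks like on the wire, not a recommendation to replace your web server: REDIRECTS, lookup_redirect, RedirectHandler and serve are names made up for the example, and the port is arbitrary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Old path -> new path, taken from the question.
REDIRECTS = {
    "/multimedia/hoerproben/1.mp3": "/app/download/10079133850/1.mp3",
}


def lookup_redirect(path):
    """Return the new location for a moved path, or None if it never moved."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = lookup_redirect(self.path)
        if target is None:
            self.send_error(404, "Not found")
            return
        # 301 tells browsers and crawlers that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()


def serve(port=8000):
    """Run the redirecting server until interrupted (port is arbitrary)."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling serve() and requesting the old path should show a 301 response with a Location header pointing at the new path; real servers such as Apache or nginx do the same thing through their own configuration.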
I want a script that will scrape a certain web page every hour, and will look for a certain string inside that page.
However, when I open that page and use view-source:, I cannot see that string in the source. I was told that this is because the string I'm looking for comes from an element that is rendered on the client side (JavaScript), and thus I can see it only when I manually inspect that element with the Chrome console, for example.
Which practice / programming language / environment would be the most efficient to achieve what I want, considering that I want to run that script from my webhost server, which has 2.25GB RAM?
Someone suggested that I use PyQt4, but my web host warned me that this will kill my RAM and hurt server performance. I should note that the script is supposed to be very simple and scrape only a single page, once an hour.
It seems that the problem could be solved with PhantomJS, a headless browser: it loads the page and runs its client-side code like a real browser would, so you can extract the rendered information.
For PhantomJS with Javascript, you may check testing-javascript-with-phantomjs
For how to use PhantomJS with python, please take a look at this
Hope it helps~
I cannot see that string in the source
If you only need to fetch one string from the page, you can have your program do the same thing the JS does.
If the JS sends an AJAX request (GET or POST), you send the same request from pure Python and fetch the missing string that way.
Suppose the in-page script performs the following (NB: the code might be in pure JS; see here for an example):
$.ajax({
    url: "test.html",
    context: document.body
}).done(function() {
    $( this ).addClass( "done" );
});
so in your Python script you request the 'test.html' file:
import requests
base = 'http://example.com/'
r = requests.get(base + 'test.html')
thus getting the data desired:
print(r.headers['content-type'])
# 'application/json; charset=utf8'
print(r.text)
# '{"data":"<string>"...'
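Once you have the response text, checking it for the string you care about is a small step. A minimal sketch, assuming the endpoint returns JSON as in the example above; find_string and the sample body are made up for illustration, since the real response format depends on the site:

```python
import json


def find_string(body, needle):
    """Return True if needle occurs inside any string value of a JSON body."""
    def walk(node):
        if isinstance(node, str):
            return needle in node
        if isinstance(node, dict):
            return any(walk(value) for value in node.values())
        if isinstance(node, list):
            return any(walk(item) for item in node)
        return False  # numbers, booleans and null cannot contain the string

    return walk(json.loads(body))


# Hypothetical response body standing in for r.text
sample = '{"data": {"rows": ["alpha", "the string I want"]}}'
print(find_string(sample, "string I want"))
print(find_string(sample, "missing"))
```

Run from an hourly cron job, this needs only the requests call and a JSON parse, which is far lighter on RAM than starting a browser each time.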
I want to load an external webpage on my own server and add my own header. I also need to use data from the external website, such as the URL and content (I need to search for specific data, check whether I have that data in my system, and show my data in the header). The external webpage needs to keep working (the buttons for opening other pages should work, without opening new windows).
I know I can use .NET to create software, but I want to create a website that will do the trick. Can this be done? PHP + iframe is too simple, I think: that won't give me the data from the external website, and my server won't see changes in the external URL (which is what I need).
If it's supposed to be client-side, then you can acquire the data necessary by using an Ajax request, parsing it in JavaScript and then just inserting it into an element. However you have to take into account that if the host doesn't support cross-origin resource sharing, then you won't be able to do it like this.
Ajax page source request: get full html source code of page through ajax request through javascript
Parsing elements from the source: http://ajaxian.com/archives/html-parser-in-javascript (not sure if useful)
Changing the element body:
// data --> the content you want to display in your element
document.getElementById('yourElement').innerHTML = data;
The other approach (server-side though) is to "act" like a browser by setting your user-agent to some browser's and then using cURL, for example, to get the source. But you don't want to fake it, because that's not nice and you would feel bad..
Hope it gets you started!
I want to make an app that can display on any webpage, just like how Disqus or IntenseDebate render on articles & web pages.
It will display a mini-ecommerce store front.
I'm not sure how to get started.
Is there any sample code, framework, or design pattern for these "widgets"?
For example, I'd like to display products.
Should I first create a webservice or RSS that lists all of them?
Or can one of these Ajax Scripts simply digest an XHTML webpage and display that?
thanks for any tips, I really appreciate it.
Basically you have two options - to use iframes to wrap your content or to use DOM injection style.
IFRAMES
Iframes are the easy ones - the host site includes an iframe where the url includes all the necessary params.
<p>Check out this cool webstore:</p>
<iframe src="http://yourdomain.com/store?site_id=123"></iframe>
But this comes with a cost - there's no easy way to resize the iframe considering the content. You're pretty much fixed with initial dimensions. You can come up with some kind of cross frame script that measures the size of the iframe contents and forwards it to the host site that resizes the iframe based on the numbers from the script. But this is really hacky.
DOM injection
The second approach is to "inject" your own HTML directly into the host page. The host site loads a <script> tag from your server, and the script includes all the information needed to add HTML to the page. There are two ways to do this - the first one is to generate all the HTML on your server and use document.write to inject it.
<p>Check out this cool webstore:</p>
<script src="http://yourdomain.com/store?site_id=123"></script>
And the script source would be something like
document.write('<h1>Amazing products</h1>');
document.write('<ul>');
document.write('<li>green car</li>');
document.write('<li>blue van</li>');
document.write('</ul>');
The HTML from the document.write calls is written into the page at the position of the <script> tag, and the injected HTML becomes part of the original page - so there are no resizing problems like with iframes.
<p>Check out this cool webstore:</p>
<h1>Amazing products</h1>
<ul>
<li>green car</li>
<li>blue van</li>
</ul>
Another approach for the same thing is to separate the data from the HTML. The included script consists of two parts - the drawing logic and the data in serialized form (e.g. JSON). This gives the script a lot of flexibility compared to the rather rigid document.write approach. Instead of outputting HTML directly to the page, you generate the needed DOM nodes on the fly and attach them to a specific element.
<p>Check out this cool webstore:</p>
<div id="store"></div>
<script src="http://yourdomain.com/store_data?site_id=123"></script>
<script src="http://yourdomain.com/generate_store"></script>
The first script consists of the data and the second one the drawing logic.
var store_data = [
{title: "green car", id:1},
{title: "blue van", id:2}
];
The script would be something like this
var store_elm = document.getElementById("store");
for(var i=0; i< store_data.length; i++){
    // one link per product, built from the data script's store_data array
    var link = document.createElement("a");
    link.href = "http://yourdomain.com/?id=" + store_data[i].id;
    link.innerHTML = store_data[i].title;
    store_elm.appendChild(link);
}
Though a bit more complicated than document.write, this approach is the most flexible of them all.
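On the server, the data script can be produced by serializing whatever product list you have. A rough sketch in Python, assuming the products are already available as a list of dicts; store_data_script is a made-up name and the product list is the one from the example above:

```python
import json


def store_data_script(products):
    """Build the JavaScript source that defines store_data for the widget."""
    # json.dumps output is also a valid JavaScript array literal here.
    return "var store_data = " + json.dumps(products) + ";"


products = [
    {"title": "green car", "id": 1},
    {"title": "blue van", "id": 2},
]
print(store_data_script(products))
# var store_data = [{"title": "green car", "id": 1}, {"title": "blue van", "id": 2}];
```

Whatever serves the store_data URL would send this text with a JavaScript content type, and the drawing script can stay a static file.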
Sending data
If you want to send some kind of data back to your server, then you can use script injection (you can't use plain AJAX because of the same-origin policy, unless your server opts in with CORS headers, but there are no restrictions on script injection). This consists of putting all the data into the script URL (remember, IE limits URLs to 2,083 characters) and the server responding with the needed data.
var message = "message to the server";
var header = document.getElementsByTagName('head')[0];
var script_tag = document.createElement("script");
var url = "http://yourserver.com/msg";
script_tag.src = url+"?msg="+encodeURIComponent(message)+"&callback=alert";
header.appendChild(script_tag);
Now your server gets the request with the GET params msg=message to the server and callback=alert, does something with it, and responds with
<?php
$l = strlen($_GET["msg"]);
// concatenate $l: variables are not expanded inside single quotes
echo $_GET["callback"].'("request was '.$l.' chars");';
?>
Which would make up
alert("request was 21 chars");
If you change the alert for some kind of your own function then you can pass messages around between the webpage and the server.
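One caveat with this pattern: whatever arrives in the callback parameter is sent back as executable script, so it is worth checking that it really is a function name before echoing it. A minimal sketch of that check, written in Python for brevity; jsonp_response is a made-up helper and the identifier rule is a deliberately strict assumption, not a standard:

```python
import json
import re

# Accept only plain identifiers, optionally dotted (e.g. "alert", "app.onMsg").
CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def jsonp_response(callback, msg):
    """Build the script text the server sends back for a JSONP-style request."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name")
    payload = "request was %d chars" % len(msg)
    # json.dumps yields a correctly quoted JavaScript string literal.
    return "%s(%s);" % (callback, json.dumps(payload))


print(jsonp_response("alert", "message to the server"))
# alert("request was 21 chars");
```

A request with something like callback=alert(document.cookie);// is then rejected instead of being echoed into the host page.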
I haven't done much with either Disqus or IntenseDebate, but I do know how I would approach making such a widget. The actual displaying portion of the widget would be generated with JavaScript. So, say you had a div tag with an id of commerce_store. Your JavaScript code would search the document, when it is first loaded (or when an ajax request alters the page), and find if a commerce_store div exists. Upon finding such a container, it will auto-generate all the necessary HTML. If you don't already know how to do this, you can google 'dynamically adding elements in javascript'. I recommend making a custom JavaScript library for your widget. It doesn't need to be anything too crazy. Something like this:
// ajax() below is a placeholder for your own AJAX helper; it is assumed
// to return the requested data synchronously.
var widget = new function(){
    this.load = function(){
        //search for the commerce_store div
        //get some data from the sql database
        var dat = ajax('actions/getData.php',{type:'get',params:{page:123}});
        //display HTML for data
        for (var i in dat){
            this.addDatNode(dat[i]);
        }
    };
    this.addDatNode = function(stuff){
        //make a node
        var n = document.createElement('div');
        //style the node, and add content
        n.innerHTML = stuff;
        document.getElementById('commerce_store').appendChild(n);
    };
};
window.onload = function(){
    widget.load();
};
Of course, you'll need to set up some type of AJAX framework to get database info and things. But that shouldn't be too hard.
For Disqus and IntenseDebate, I believe the comment forms and everything are all just HTML (generated through JavaScript). The actual 'plugin' portion of the script would be a background framework of either ASP, PHP, SQL, etc. The simplest way to do this, would probably just be some PHP and SQL code. The SQL would be used to store all the comments or sales info into a database, and the PHP would be used to manipulate the data. Something like this:
function addSale(){ /* MySQL code here */ }
function deleteSale(){ /* MySQL code here */ }
function editSale(){ /* MySQL code here */ }
//...
And your big PHP file would have all of the actions your widget would ever need to do (in regards to altering the database). But even with this big PHP file, you'll still need some way of calling individual functions with your AJAX framework. Look back at the actions/getData.php request in the example JavaScript framework. "actions" refers to a folder with a bunch of PHP files, one for each method. For example, addSale.php:
include("../db_connect.php");
db_connect();
//make sure the user is logged in
include("../authenticate.php");
authenticate();
//Get any data that AJAX sent to us
$dat = $_GET['sale_num'];
//Run the method
include("../PHP_methods.php");
addSale($dat);
The reason you would want separate files for the PHP_methods and run files is that you could potentially have more than one PHP_methods file. You could have three method APIs: one for displaying content, one for requesting content, and one for altering content. Once you start reusing your methods more and more, it's best to have them all in one place. Rewritten once, rewritten everywhere.
So, really, that's all you'd need for the widget. Of course, you would want to have an install script that sets up the commerce database and all. But the actual widget would just be a folder with the aforementioned script files:
install.php: gets the database set up
JavaScript library: to load the HTML content and forms and conduct ajax requests
CSS file: for styling the HTML content and forms
db_connect: a generic php script used to connect to the database
authenticate: a php script to check if a user is logged in; this could vary, depending on whether you have your own user system, or are using Gravatar/Facebook/Twitter/etc.
PHP_methods: a big php file with all the database manipulation methods you'd need
actions folder: a bunch of individual php files that call the necessary PHP methods; call each of these php files with AJAX
In theory, all you'd have to do would be copy that folder over to any website, and run the install.php to get it set up. Any page you want the widget to run on, you would simply include the .js file, and it will do all the work.
Of course, that's just how I would set it up. I assume that changes in programming languages, or setup specifics will vary. But, the basic idea holds similar for most website plugins.
Oh, and one more thing. If you were intending to sell the widget, it would be extremely difficult to try and secure all of those scripts from redistribution. Your best bet would be to have the PHP files on your own server. The client would need to have their own db_connect.php that connects to their own database and all. But the actual AJAX requests would need to refer to the files on your remote server. The request would need to send the URL of the valid db_connect, with some type of password or something. Actually, come to think of it, I don't think it's possible to do remote server file sharing. You'd have to research it a bit more, 'cuz I certainly don't know how you'd do it.
I like Azmisov's solution, but it has some disadvantages.
Sites might not want your code on their servers. It'd be much better if you would switch from AJAX to loading scripts (eg. jQuery's getJSON)
So I suggest:
Include jquery hosted on google and a short jquery code from your domain to the client sites. Nothing more.
Write the script with cross-domain calls to your server (through getJSON or getScript) so that everything is fetched directly and nothing has to be installed on the client's server. See here for examples, I wouldn't write anything better here. Adding content to the page is easy enough with jQuery that I won't go into it here :) Just one command.
Distribute easily by providing two lines of <script src= ... ></script>