Fetching the title of a website - javascript

I browsed through SO, but all I found were examples of "how to manipulate a piece of HTML".
In my case, I want to fetch an HTML file from a given URL and parse out just the website's title, not the whole file.
Is there any way to do this with jQuery or a jQuery-like framework?

The only way is to use a server-side proxy which makes the web request, parses out the title, and returns it to your page.
See a PHP example here.
For Python, try urllib.
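For comparison, here is a minimal Node sketch of the same idea (the route and query parameter are made up, and there is no error handling or URL validation, so treat it as a starting point only):

var http = require("http");

http.createServer(function (req, res) {
    // e.g. GET /title?url=http%3A%2F%2Fexample.com%2F
    // (http targets only; use the https module for https targets)
    var target = decodeURIComponent(req.url.split("?url=")[1] || "");
    http.get(target, function (upstream) {
        var body = "";
        upstream.on("data", function (chunk) { body += chunk; });
        upstream.on("end", function () {
            var match = /<title>([\s\S]*?)<\/title>/i.exec(body);
            res.end(match ? match[1] : "");
        });
    });
}).listen(8080);

Your page can then hit this proxy with a normal same-origin AJAX request.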

Question 2911930 looks to have the answer:
$.get("yoururl", function(response){
var theTitle = (/<title>(.*?)<\/title>/m).exec(response)[1];
alert(theTitle);
});
Edit: As pointed out by the commenters, the same-origin policy will restrict you to pages within your own domain. And, generally, parsing HTML with regular expressions is a Bad Thing, but it's not too bad in this case.
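If the page does come from your own domain, a DOM-based parse avoids the regex entirely. A sketch (DOMParser's "text/html" mode is missing in older browsers):

$.get("yoururl", function (response) {
    // Parse into a detached document and read its title property
    var doc = new DOMParser().parseFromString(response, "text/html");
    alert(doc.title);
});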

Related

How do I access the console of the website that I want to extract data from?

Sorry for the confusing title. I am a beginner in JavaScript and would like to build this little project to increase my skill level: an image extractor. The user enters a website name into the form input, presses Extract, and the links to all the images show up.
Question: how do I access the website DOM that was entered into the input field?
As mentioned by @Quentin in the comments, browsers enforce restrictions on cross-domain requests like this. The same-origin policy will prevent your site from pulling the HTML source of a page on a different domain.
Since this is a learning exercise, I'd recommend picking another task that doesn't get into the weeds of cross-origin request security issues. Alternatively, you could implement a "scraper" like this out of the browser using Node (JavaScript), Python, PHP, Ruby, or many other scripting languages.
You could try something like this if you already have the HTML content:
var html = document.createElement('html');
// The markup is parsed as soon as it is assigned to innerHTML
html.innerHTML = "<html><body><div><img src='image-url.png'></div></body></html>";
// The detached element can then be queried like any other DOM node
console.log(html.querySelector("img").src);
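A sketch of the same idea with DOMParser, which builds a fully detached document (assuming htmlString already holds the markup; note that markup assigned to innerHTML can start downloading the referenced images immediately, and that "text/html" parsing is missing in older browsers):

var doc = new DOMParser().parseFromString(htmlString, "text/html");
var imgs = doc.querySelectorAll("img");
for (var i = 0; i < imgs.length; i++) {
    // getAttribute returns the raw value rather than a URL resolved against your own page
    console.log(imgs[i].getAttribute("src"));
}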
If you also need to get the content via Ajax calls, I would suggest moving your entire code server side, using something like Scrapy.

PHP HttpRequest to create a web page - how to handle long response times?

I am currently using JavaScript and XMLHttpRequest on a static HTML page to create a view of a record in Zotero. This works nicely except for one thing: the page's HTML title.
I can of course also change the <title>...</title> tag, but if someone wants to post the view to, for example, Facebook, the static title of the web page will be shown there.
I can't think of any way to fix this with just a static page and JavaScript. I believe I need a dynamically created page from a server that does something similar to XMLHttpRequest.
For PHP there is HttpRequest. Now to the problem: in the JavaScript version I can use asynchronous calls, but with PHP I think I need synchronous calls. Is that something to worry about?
Is there perhaps some other way to handle this that I am not aware of?
UPDATE: It looks like those trying to answer are not at all familiar with Zotero. I should have been more clear. Zotero is a reference database located at http://zotero.org/. It has an API that can be used through XMLHttpRequest (which is what I said above).
Now I cannot use that in the scenario I described above, so I want to call the Zotero server from my server instead (through PHP or something else).
(If you are not familiar with the concepts it might be hard to understand and answer the question. Of course.)
UPDATE 2: For those interested in how Facebook scrapes a URL you post there, please test it here: https://developers.facebook.com/tools/debug
As you can see by testing, no JavaScript is run.
Sorry, I'm not sure if I understand what you are trying to ask. Are you just wanting to change the page's title?
Why not use JavaScript?
document.title = newTitle
Facebook expects the title (or Open Graph og:title tags) to be present when it fetches the page. It won't execute any JavaScript for you to fill in the blanks.
A cool workaround would be to detect the Facebook scraper with PHP by parsing the User Agent string, and serving a version of the page with the information already filled in by PHP instead of JavaScript.
As far as I know, the Facebook scraper uses this header for User Agent: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
You can check to see if part of that string is present in the header and load the page accordingly.
if (strpos($_SERVER['HTTP_USER_AGENT'], 'facebookexternalhit') !== false) {
    // Synchronously load the title and Open Graph tags here.
} else {
    // Load the page normally.
}

Run/inject javascript on page to get the html and post it to a URL

Before, I used to just go to "View source" in the browser, grab all the HTML, and post it into a form on my page. But now that delayed loading of some of the content via Ajax has been implemented, I can't do this anymore.
It was not a problem doing it the old way, but that no longer works, since I'm now missing important information.
Is it possible to somehow run JavaScript in the browser, for example from a bookmark shortcut or something like that, so I can grab all the HTML (or better yet, filter some of the data first) and then post it back to my site?
I have no idea what this is called or if it's even possible.
I guess a browser extension could do this, but making one for every browser would be a pain, if this could be done with JavaScript instead.
All ideas are welcome.
If you are using jQuery, you could just use Ajax and send the HTML of the body (or whatever area of the page you want) to your server:
$.post('url-to-send.ext', { data: $('body').html() });
So, after a lot of searching... I finally found the answer to my own question.
Bookmarklets: http://en.wikipedia.org/wiki/Bookmarklet
As described here: http://www.learningjquery.com/2006/12/jquerify-bookmarklet you can inject jQuery into the site:
Create the following as a bookmark (the code goes in the bookmark's URL field, prefixed with javascript:):
var s = document.createElement('script');
s.setAttribute('src', 'https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js');
document.getElementsByTagName('body')[0].appendChild(s);
Now it's just a matter of extending it to fetch the information I need. Neat little trick, I would say.
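One possible extension (the endpoint is hypothetical, and your server would need to allow cross-origin requests, since the bookmarklet runs on other people's domains):

var s = document.createElement('script');
s.setAttribute('src', 'https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js');
s.onload = function () {
    // Once jQuery is available, post the rendered body HTML back to your own site
    jQuery.post('http://example.com/receive.php', { data: jQuery('body').html() });
};
document.getElementsByTagName('body')[0].appendChild(s);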

How to read text from a website using Javascript?

I am writing an HTML page for my cell phone. It's essentially a dictionary app: I input a word and look up its meaning on, say, Dictionary.com.
I am using JavaScript to get the string, but can I embed the word into the URL "http://dictionary.reference.com/browse/"?
Is there any way of doing this ?
You can embed the word into the URL by doing this:
var lookup_url = "http://dictionary.reference.com/browse/" + your_word;
From there, it depends on how you want to load the website. Unfortunately, there is no way to query a remote website directly from JavaScript. You're going to have to use some other tool you have at your disposal. Until then, maybe you can just do window.location = lookup_url?
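A small sketch putting that together (the input id is hypothetical); encoding the word keeps spaces and punctuation from breaking the URL:

var word = document.getElementById('word-input').value;
var lookup_url = 'http://dictionary.reference.com/browse/' + encodeURIComponent(word);
window.location = lookup_url;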
Due to same-origin policy restrictions this is not possible with pure JavaScript. If the third-party website provides an API that supports JSONP then this could work, but not in the general case.
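For reference, this is what JSONP looks like under the hood, with a hypothetical API endpoint and response shape; the server must wrap its response in the named callback:

function showDefinition(data) {
    alert(data.definition); // hypothetical response shape
}
var s = document.createElement('script');
// The server is expected to respond with: showDefinition({...});
s.src = 'http://api.example.com/define?word=test&callback=showDefinition';
document.getElementsByTagName('head')[0].appendChild(s);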

Cross Domain Javascript Bookmarklet

I've been at this for several days, and searches, including here, haven't given me any solutions yet.
I am creating a bookmarklet which is to interact with a POST API. I've gotten most of it working except the most important part: the sending of data from the iframe (I know, horrible! If anyone knows a better solution, please let me know) to the JavaScript on my domain (the same domain as the API, so the communication with the API is no problem).
From the page where the user clicks the bookmarklet, I need to get the following data to the JavaScript that is included in the iframe:
var title = pageData[0].title;
var address = pageData[0].address;
var lastmodified = pageData[0].lastmodified;
var referralurl = pageData[0].referralurl;
I first fixed this by serializing the data as JSON and sending it through the name="" attribute of the iframe, but realized this breaks on about 20% of web pages; I get an access denied error. It's also not a very pretty method.
Does anyone have any idea how I can solve this? I am not looking to use POSTs that redirect; I want it all to be AJAX and as unobtrusive as possible. It's also worth noting I use the jQuery library.
You should look into easyXDM; it's very easy to use. Check out one of the examples at http://consumer.easyxdm.net/current/example/methods.html
After a lot of work I was able to find a solution using JSONP, which enables cross-domain JavaScript. It's very tricky with the CodeIgniter framework, because passing data along in URLs requires a lot of encoding and making sure you don't have illegal characters. Also, I'm still looking into how secure it really is.
If I understand your question correctly, you might have some success by looking into using a script tag proxy. This is the standard way to do cross-domain AJAX in JavaScript frameworks like jQuery and ExtJS.
See the jQuery AJAX documentation.
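For example, jQuery will handle the script tag injection for you when you ask for JSONP (the endpoint is hypothetical, and the API has to support JSONP on its end):

$.ajax({
    url: 'http://api.example.com/data',
    dataType: 'jsonp', // jQuery appends a callback parameter and injects a <script> tag
    success: function (data) {
        console.log(data);
    }
});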
If you need to pass data to the iframe, and the iframe is actually including another page, but that other page is on the same domain (a lot of assumptions, I know), then the main page code can do this:
DATA_FOR_IFRAME = { 'whatever': 'stuff' };
Then the code on the page included by the iframe can do this:
window.parent.DATA_FOR_IFRAME;
to get at the data :)
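For the genuinely cross-domain case, window.postMessage is worth a look (the iframe id and target origin below are hypothetical). From the host page:

var frame = document.getElementById('bookmarklet-frame');
frame.contentWindow.postMessage(
    JSON.stringify({ title: pageData[0].title, address: pageData[0].address }),
    'http://your-api-domain.example' // only deliver the message to your own iframe
);

And inside the iframe, on your domain:

window.addEventListener('message', function (event) {
    var data = JSON.parse(event.data);
    // validate event.origin and the data before trusting them, then call your API
}, false);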
