How to read text from a website using Javascript? - javascript

I am writing an HTML page for my cell phone. It's essentially a dictionary app: I input a word and look up its meaning from, say, Dictionary.com.
I am using JavaScript to get the string, but can I embed the word into the URL "http://dictionary.reference.com/browse/"?
Is there any way of doing this?

You can embed the word into the URL by doing this:
var lookup_url = "http://dictionary.reference.com/browse/" + your_word;
From there, it depends on how you want to load the website. Unfortunately, there is no way to query a remote website directly from browser JavaScript. You're going to have to use some other tool at your disposal. Until then, maybe you can just do window.location = lookup_url?
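For instance, a minimal sketch of that approach, assuming the word comes from a text input with the hypothetical id "word":
// "word" is a hypothetical input id; adjust it to your own markup.
var word = document.getElementById("word").value;
// encodeURIComponent guards against spaces and other special characters in the word.
var lookup_url = "http://dictionary.reference.com/browse/" + encodeURIComponent(word);
// Navigate the current page to the dictionary entry.
window.location = lookup_url;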

Due to same-origin policy restrictions this is not possible with pure JavaScript. If the third-party web site provides an API that supports JSONP, then this could work, but not in the general case.
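For illustration only, a JSONP request boils down to injecting a script tag whose URL names a callback; the endpoint and parameter names below are made up, since Dictionary.com does not document such an API here:
// Hypothetical JSONP endpoint; a real API documents its own URL and callback parameter.
function handleDefinition(data) {
    console.log(data);
}
var script = document.createElement("script");
script.src = "http://api.example.com/define?word=test&callback=handleDefinition";
document.body.appendChild(script);
// The server responds with handleDefinition({...}), which then runs in your page.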

Related

How do I access the console of the website that I want to extract data from?

Sorry for the confusing title. I am a beginner in JavaScript and would like to build this little project to increase my skill level: an image extractor. The user enters a website name into the form input, presses Extract, and the links of all images show up.
Question: how do I access the DOM of the website that was entered into the input field?
As mentioned by @Quentin in the comments, browsers enforce restrictions on cross-domain requests like this. The Same Origin Policy will prevent your site from pulling the HTML source of a page on a different domain.
Since this is a learning exercise, I'd recommend picking another task that doesn't get into the weeds of cross-origin request security issues. Alternatively, you could implement a "scraper" like this outside the browser using Node (JavaScript), Python, PHP, Ruby, or many other scripting languages.
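As a rough illustration of the outside-the-browser approach, a minimal Node sketch (the URL is a placeholder, and a real scraper would likely use an HTML parser such as cheerio rather than a regex) that downloads a page and prints every img src it finds:
// Fetch a page over HTTPS and print the src of every <img> tag in it.
var https = require("https");

https.get("https://example.com/", function (res) {
    var html = "";
    res.on("data", function (chunk) { html += chunk; });
    res.on("end", function () {
        var imgRegex = /<img[^>]+src=["']([^"']+)["']/g;
        var match;
        while ((match = imgRegex.exec(html)) !== null) {
            console.log(match[1]);
        }
    });
});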
You could try something like this if you already have the HTML content:
var html = document.createElement('html');
html.innerHTML = "<html><body><div><img src='image-url.png'></div></body></html>";
console.log(html.querySelector("img").src);
If you also need to fetch the content via ajax calls, I would suggest doing your entire scrape server side, using something like Scrapy.

Parsing in JavaScript for chrome extension

I have a Chrome extension that extracts all short urls of the form e.g. ini/ini#8012 from any page using a regex.
var regex = /[\w]+.[\w]+#(?:\d*\.)?\d+/g;
What I want to do is make each short url into a clickable link in my popup window and pass it into my web app, so clicking any short url in the list would take you to the web app. The web app URL is like this:
http://192.101.21.1889:8000/links/?user_repo=ini%2Fini&artID=8012&tc=4&tm=years&rows=5&submit=
The user_repo and artID values come from the extracted short url. First of all, is this possible? And if it is, can anyone point me in the right direction as to what to do?
You can use Content Scripts to inject JavaScript into the page, then perform whatever processing you want.
Readings:
Content Scripts
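As for turning a match into a clickable link in the popup, here is a rough sketch (the host/port and query parameters are copied from the question; the function name and splitting logic are hypothetical and assume the short url always looks like user/repo#id):
// shortUrl comes from the regex match, e.g. "ini/ini#8012".
function makeAppLink(shortUrl) {
    var parts = shortUrl.split("#");              // ["ini/ini", "8012"]
    var userRepo = encodeURIComponent(parts[0]);  // "ini%2Fini"
    var artID = parts[1];

    var href = "http://192.101.21.1889:8000/links/?user_repo=" + userRepo +
               "&artID=" + artID + "&tc=4&tm=years&rows=5&submit=";

    var a = document.createElement("a");
    a.href = href;
    a.textContent = shortUrl;
    a.target = "_blank"; // open the web app in a new tab from the popup
    return a;
}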

Looking for a way to scrape HTML with JS

As the title suggests, I'm looking for a hopefully straightforward way of scraping all of the HTML from a webpage, storing it in a string perhaps, and then navigating through that string to pull out the desired element.
Specifically, I want to scrape my Twitter page and display my profile picture inside a new div. I know there are several tools for doing just this, but would anyone have some code examples or suggestions for how I might do this myself?
Thanks a lot
UPDATE
After a very helpful response from T.J. Crowder I did some more searching online and found this resource.
In theory, this is easy. You just do an ajax call to get the text of the page, then use jQuery to turn that into a disconnected DOM, and then use all the usual jQuery tools to find and extract what you need.
$.ajax({
    url: "http://example.com/some/path",
    success: function(html) {
        var tree = $(html);
        var imgsrc = tree.find("img.some-class").attr("src");
        if (imgsrc) {
            // ...add the image to your page
        }
    }
});
But (and it's a big one) it's not likely to work, because of the Same Origin Policy, which prevents cross-origin ajax calls. Certain individual sites may have an open CORS policy, but most won't, and of course supporting CORS on IE8 and IE9 requires an extra jQuery plug-in.
So to do this with sites that don't allow your origin via CORS, there must be a server involved. It can be your server and you can grab the text of the page you want using server-side code and then send it to your page via ajax (or just build the bits you want into your page when you first render it). All of the usual server-side stacks (PHP, Node, ASP.Net, JVM, ...) have the ability to grab web pages. Or, in some cases, you may be able to use YQL as a cross-domain proxy, using their server rather than your own.
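For example, a bare-bones Node proxy along those lines might look like the sketch below (purely illustrative: no validation or caching, the /proxy route and port are arbitrary, and it assumes an https:// target):
// Same-origin proxy sketch: the browser calls /proxy?url=..., the server
// fetches the remote page and streams its HTML back.
var http = require("http");
var https = require("https");
var urlModule = require("url");

http.createServer(function (req, res) {
    var query = urlModule.parse(req.url, true).query;
    var target = query.url;
    if (!target) {
        res.writeHead(400);
        res.end("Missing url parameter");
        return;
    }
    https.get(target, function (remote) {
        res.writeHead(remote.statusCode, { "Content-Type": "text/html" });
        remote.pipe(res);
    }).on("error", function () {
        res.writeHead(502);
        res.end("Upstream request failed");
    });
}).listen(3000);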

Fetching the title of a website

I browsed through SO, but what I found were examples of "how to manipulate a piece of HTML".
In my case, I want to fetch an HTML file from a given URL and parse out just the website's title, not the whole file.
Is there any way to do this with jQuery or any jQuery like framework?
regards,
The only way is to use a server-side proxy which makes the web request and parses out the title, which you can then return to your page.
See a PHP example here
For Python, try urllib
2911930 looks to have the answer:
$.get("yoururl", function(response){
var theTitle = (/<title>(.*?)<\/title>/m).exec(response)[1];
alert(theTitle);
});
Edit: As pointed out by the commenters, you'll be restricted by the Same Origin Policy to pages within your own domain. And, generally, parsing HTML with regular expressions is a Bad Thing, but it's not too bad in this case.
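If you'd rather avoid the regex, a DOMParser-based variant (not part of the original answer, but standard in modern browsers) does the same thing once you have the response text:
$.get("yoururl", function(response) {
    // Parse the fetched HTML into a detached document and read its title.
    var doc = new DOMParser().parseFromString(response, "text/html");
    alert(doc.title);
});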

Cross Domain Javascript Bookmarklet

I've been at this for several days, and searches (including here) haven't given me any solutions yet.
I am creating a bookmarklet which is to interact with a POST API. I've gotten most of it working except the most important part: sending data from the iframe (I know, horrible! If anyone knows a better solution, please let me know) to the JavaScript on my domain (the same domain as the API, so the communication with the API is no problem).
From the page where the user clicks the bookmarklet, I need to get the following data to the JavaScript that is included in the iframe.
var title = pageData[0].title;
var address = pageData[0].address;
var lastmodified = pageData[0].lastmodified;
var referralurl = pageData[0].referralurl;
I first fixed it by serializing this data as JSON and sending it through the name="" attribute of the iframe, but realized that on about 20% of web pages this breaks: I get an access denied error. It's also not a very pretty method.
Does anyone have any idea how I can solve this? I am not looking to use POSTs that redirect; I want it all to be AJAX and as unobtrusive as possible. It's also worth noting that I use the jQuery library.
Thank you very much,
Ice
You should look into easyXDM; it's very easy to use. Check out one of the examples at http://consumer.easyxdm.net/current/example/methods.html
After a lot of work I was able to find a solution using JSONP, which enables cross-domain JavaScript. It's very tricky with the CodeIgniter framework because passing data along the URLs requires a lot of encoding and making sure you don't have illegal characters. Also, I'm still looking into how secure it really is.
If I understand your question correctly, you might have some success by looking into using a script tag proxy. This is the standard way to do cross-domain AJAX in JavaScript frameworks like jQuery and ExtJS.
See the jQuery AJAX documentation
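As a hedged illustration (the URL is a placeholder, and the remote API has to support JSONP for this to work), the jQuery form of a script tag proxy request looks like:
$.ajax({
    url: "http://api.example.com/data",  // placeholder endpoint
    dataType: "jsonp",                   // jQuery injects a script tag and wires up a callback
    success: function (data) {
        console.log(data);
    }
});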
If you need to pass data to the iframe, and the iframe is actually including another page, but that other page is on the same domain (a lot of assumptions, I know), then the main page code can do this:
DATA_FOR_IFRAME = ({'whatever': 'stuff'});
Then the code on the page loaded inside the iframe can do this:
window.parent.DATA_FOR_IFRAME;
to get at the data :)
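If the pages really are on different domains, the underlying primitive that libraries such as easyXDM build on is window.postMessage. This is not from the answers above, but a minimal sketch (the element id and origin are placeholders) looks like:
// In the host page (where the bookmarklet runs): send the collected data to the iframe.
var frame = document.getElementById("bookmarklet-frame"); // hypothetical iframe id
frame.contentWindow.postMessage(JSON.stringify(pageData[0]), "http://your-api-domain.example");

// In the page loaded inside the iframe (on your domain): receive and use it.
window.addEventListener("message", function (event) {
    var data = JSON.parse(event.data); // validate this before trusting it
    // ...then send data to your API with an ordinary same-origin ajax call
}, false);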
