Is there a way, or is it even possible, to get a product's details from its URL? Say I paste the URL of a product from a store like Walmart or Best Buy: would it be possible to write something that retrieves the product info (price, name, description, etc.)? Does this already exist, or would it have to be something site-specific that I write for each store?
One solution I see is to parse the HTML of the page the URL points to, using for example Apache Tika, but I'm not sure the e-commerce website in question will like that very much :) Maybe you could ask them whether they offer an API for accessing their product data?
Yes, it is possible, but not from JavaScript running in the browser, because of the same-origin policy. You must send the URL to your server, read the external page on the server side, and return the results to the browser.
On the server side (in whichever language you are using), download the web page, parse it (with XML/XPath if you can) and extract the relevant information.
As already noted, watch out: some websites forbid this kind of access (called web scraping), and others actively try to prevent it, e.g. by detecting automated clients.
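A minimal sketch of the parsing step, in JavaScript for Node.js. Instead of site-specific XPath, it looks for schema.org "Product" data embedded in the page as JSON-LD, which many stores include for search engines. That is an assumption about the target page, not something every store provides, so treat it as a first attempt with site-specific parsing as the fallback. The function name extractProduct is hypothetical.

```javascript
// Sketch: read product details from schema.org "Product" JSON-LD embedded in a page.
// Assumes the store publishes such data; many retailers do, but it is not guaranteed.
function extractProduct(html) {
  // Find every <script type="application/ld+json"> ... </script> block.
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch (e) {
      continue; // skip blocks that are not valid JSON
    }
    if (data === null || typeof data !== "object") continue;
    // A block may hold one object, an array of objects, or an "@graph" list.
    const items = [].concat(data["@graph"] || data);
    for (const item of items) {
      // "@type" can be a single string or an array of strings.
      const types = [].concat(item && item["@type"]);
      if (!types.includes("Product")) continue;
      const offer = Array.isArray(item.offers) ? item.offers[0] : item.offers;
      return {
        name: item.name,
        description: item.description,
        price: offer ? offer.price : undefined,
        currency: offer ? offer.priceCurrency : undefined,
      };
    }
  }
  return null; // nothing found: fall back to site-specific parsing
}
```

On Node.js 18 or later (which has a built-in fetch), the HTML could come from `await (await fetch(productUrl)).text()`, where productUrl is the pasted URL. Pages that build their content with JavaScript may not include this data in the HTML the server returns.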
What you're talking about is website scraping and yes, it's possible and there are loads of tools out there to help you with it. Some websites aren't happy with you doing it though.
You could do it in C#, using the HttpWebRequest class to request the data from a URL and then parsing it with something like XmlReader or the HTML Agility Pack (http://html-agility-pack.net/).
I am trying to build a simple setup as follows:
A mobile app with a page consisting of 4 lines (4 HTML paragraphs; I am using PhoneGap).
I want to use a web page from which I will enter the data for those 4 lines. This information is sent to a server, and that server passes it on to the app on the mobile phone. Those 4 lines on the phone are then filled with the new information.
Similarly, the user enters information on another page consisting of a 10-item list (li elements). This information is again sent to the server and on to the web page, where it is displayed.
I can almost feel the "internet police guys" getting all hyped and ready to vote this question down. But please understand that I have searched this site and various forums for a tutorial that guides me through this and have not been able to find one.
I am trying to use AJAX for this setup, but I'm confused about how I would use the PHP file. Information such as the username and password for connecting to the server will go in that PHP file. But PHP is a server-side script, so it needs to sit in the public_html folder. How do I use the PHP file from my desktop? Do I write a separate JavaScript file to access it?
It is the concept that confuses me; I am familiar with HTML, JS and PHP.
I would appreciate any guidance, or a link to a tutorial that would help me implement what I described. Thanks for listening.
You will need to create an API in PHP. This API is uploaded to your server and should follow a "RESTful" design. Search for a tutorial that fits your needs. You can set all sorts of rules in this API, such as requiring every request to carry an ID or access token.
Since you are using PhoneGap, your HTML and JS files live on the device, so your API must accept requests from any origin (for example, via the Access-Control-Allow-Origin response header). Unless you know how to configure this yourself, you will have to speak to your hosting provider about it (some providers restrict this by default as an extra security precaution against cross-site attacks).
Next, you can either use jQuery or write the AJAX calls in plain JavaScript yourself.
The most efficient way for this to work is to send JSON objects to and from the API. When sending from your app, you include a "command" in the JSON. On the PHP side, you retrieve this command and use the rest of the data in the JSON object to process the request. Your API then encodes a JSON object to send back (such as a user's profile information).
Here is a basic PHP API tutorial to get you going that explains some of the features of a RESTful API: PHP API
Here is a simple AJAX function (you will probably want to make this much more modular): AJAX
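A minimal sketch of the app side of that JSON "command" protocol, using fetch instead of jQuery. API_URL, the "getLines" command, the token field and the response shape { lines: [...] } are hypothetical; your PHP API defines the real ones. The fetchImpl parameter only exists so the function can be tested without a network.

```javascript
// Hypothetical endpoint of your PHP API.
const API_URL = "https://example.com/api.php";

// Send one command plus its data as a JSON body and return the parsed JSON reply.
async function sendCommand(command, data, token, fetchImpl = fetch) {
  const response = await fetchImpl(API_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // The PHP side reads "command" to decide what to do and uses "data" as parameters.
    body: JSON.stringify({ command: command, token: token, data: data }),
  });
  if (!response.ok) {
    throw new Error("API request failed with status " + response.status);
  }
  return response.json();
}

// Example: fill the app's four <p id="line1"> ... <p id="line4"> elements
// from a hypothetical "getLines" command whose reply is { lines: [...] }.
async function refreshLines(token) {
  const result = await sendCommand("getLines", {}, token);
  result.lines.slice(0, 4).forEach(function (text, i) {
    document.getElementById("line" + (i + 1)).textContent = text;
  });
}
```

On the PHP side, a JSON request body like this can be read with `json_decode(file_get_contents('php://input'), true)`, and the reply sent with `echo json_encode($result);`.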
Since your question is so broad, the easiest thing would be to first create a PHP web page that accesses a SQL database to update the records. In fact, this should serve all the needs of your mobile users, assuming you don't need push notifications for live data updates.
I am assuming, since you are using PhoneGap, that you are more comfortable with web languages. Once the web page is fully operational, you can start building your app on that exact same SQL database. Mobile app development has a lot more "what-ifs" (what if the phone rings, what if the app is running in the background, what if there is no cellular service, etc.).
It is always easier to start with what you know and build on that, rather than starting with a new development platform and troubleshooting as problems arise.
In my app, I will provide my clients with a JavaScript plugin that collects some HTML data and sends it to my server. What's the best way to identify which client is sending the data? Say someone copies the JavaScript and puts it on their own website. A similar case is a live chat plugin.
Your question is not entirely clear to me, but I've been following it from the beginning, and since no one has answered, I can say the following:
1. If your JavaScript plugin is meant to be embedded in websites, like a jQuery plugin, then you can't be sure of anything, because the code can easily be modified to remove any security check.
2. If your JavaScript plugin is meant to be installed in browsers, like a Firefox add-on, it can be modified too, but in most cases you can identify users simply with cookies or a login procedure.
That said, in the first case (a plugin embedded in websites), you could identify each website by having the plugin request an authentication token stored on that website's server (via AJAX) and adding it to the HTML data that is sent to your server.
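A minimal sketch of that token idea in browser JavaScript. TOKEN_PATH (a file each client would serve from its own site), COLLECT_URL and the payload shape are hypothetical. As point 1 says, anyone can copy the token along with the code, so this identifies well-behaved installations rather than proving who is calling.

```javascript
const TOKEN_PATH = "/my-plugin-token.txt"; // hypothetical, served by each client's own site
const COLLECT_URL = "https://collector.example.com/collect"; // hypothetical, your server

async function sendCollectedHtml(html, fetchImpl = fetch) {
  // 1. Same-origin request to the hosting website for the token you issued to it.
  const tokenResponse = await fetchImpl(TOKEN_PATH);
  if (!tokenResponse.ok) throw new Error("site token not found");
  const token = (await tokenResponse.text()).trim();

  // 2. Send the token together with the collected HTML; your server looks the
  //    token up to decide which client the data belongs to.
  return fetchImpl(COLLECT_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ token: token, html: html }),
  });
}
```

Because the POST goes to another origin with a JSON body, the browser sends a CORS preflight first, so your collector has to answer it and allow the client sites.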
Hopefully you can understand my Emglizch :) and this isn't pure garbage.
I've got a problem. I'm working with a food supplier and I need to save the content of each order as HTML. The orders are listed on a single page as links, but there are two difficulties:
The page uses authentication (I need to log in first).
This is the real problem: the page uses a lot of JavaScript. Everything works without the web address changing, so I can't use wget or the rio gem (the URLs are not like www.fooddoe.com/order, www.fooddoe.com/order/1, etc., but always www.fooddoe.com/suplierx).
I think FireWatir would be a good option, but the problem is that I need to save the page in a format similar to HTML (including images). Is that possible with FireWatir? Are there other options in Clojure or JavaScript?
Thanks so much!!
I had to read your question twice to understand what you mean.
From the web address in your example, I assume this is your supplier's web page. So IMHO the easiest way is:
Look at the page source to get an idea of how it gets the data (99% likely some kind of AJAX request).
The request goes to the server, which responds to it.
Now there are two ways:
Work out how the request is made and write an app that makes the same request and generates a web page from the response (more difficult, more general)
Contact your supplier and get the original database (simpler, but a one-time solution)
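A minimal sketch of the first approach, in JavaScript for Node.js 18+ (built-in fetch). It assumes you have found, in the browser's developer tools, the request that loads one order and that it returns JSON; ORDER_URL, the session cookie and the order fields (id, items, name, quantity) are hypothetical stand-ins for whatever you actually observe. It runs outside the browser because browsers do not let scripts set a Cookie header by hand.

```javascript
// Hypothetical endpoint; copy the real one from the request you observe.
const ORDER_URL = "https://www.fooddoe.com/suplierx/orders/";

// Replay the page's own request, sending the logged-in session cookie.
async function fetchOrder(orderId, sessionCookie, fetchImpl = fetch) {
  const response = await fetchImpl(ORDER_URL + encodeURIComponent(orderId), {
    headers: { Cookie: sessionCookie, Accept: "application/json" },
  });
  if (!response.ok) throw new Error("order " + orderId + ": HTTP " + response.status);
  return response.json();
}

// Escape text before putting it into the generated page.
function escapeHtml(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(s).replace(/[&<>"']/g, (c) => map[c]);
}

// Build a simple standalone HTML page for one order (hypothetical fields).
function orderToHtml(order) {
  const rows = order.items
    .map((it) => "<tr><td>" + escapeHtml(it.name) + "</td><td>" + escapeHtml(it.quantity) + "</td></tr>")
    .join("\n");
  return "<html><body><h1>Order " + escapeHtml(order.id) + "</h1>\n<table>\n" + rows + "\n</table></body></html>";
}
```

Each result of orderToHtml(await fetchOrder(id, cookie)) could then be written to disk with fs.writeFileSync; images referenced by the original page would still have to be downloaded separately.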
And I don't think this question is specific to any language.
I am working on implementing a JavaScript web bug that will be inserted into our clients' web pages. One of the features our clients would like is a way to pass pieces of the HTML on their web pages to our server through the web bug. We are using JSONP, and the server hosting the JavaScript web bug is different from the server hosting the web page. The basic idea is this:
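var element = document.getElementById(id);
var html = element.innerHTML;
// Encode the HTML into a GET request: //www.example.com/script?html=<encoded html>
var url = "//www.example.com/script?html=" + encodeURIComponent(html);
// Add a JSONP <script> tag; document.write would wipe the page if called after it has loaded
var script = document.createElement("script");
script.src = url;
document.head.appendChild(script);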
The security problem is that anyone could make a GET request to our server with arbitrary HTML that doesn't come from the web page hosting the web bug. Is there any way to make this secure?
I know we could check the HTTP Referer header, but it can easily be forged. I have seen ideas where the server passes a unique token that must be returned in the GET request, but it seems this could be forged too.
My hunch is that what we're trying to do can't be done securely, but I wanted to throw this out to the community to see if there's something clever that can be done. Otherwise, I'm going to have to build a screen scraper that downloads the pages directly from our clients and extracts the relevant HTML for their page.
Thanks for any and all help!
EDIT
To be clear, our clients' web pages are public-facing, without security. In other words, any Internet user could visit a page and execute the JavaScript bug that submits the HTML fragment.
EDIT 2
An acceptable answer is "this is impossible"! If that is the case, and you give a good explanation of why, I will choose it as the accepted answer.
EDIT 3
What we are building is a kind of Google Analytics system for our clients. We are trying to track each visitor's visits to unique "items" and then automatically collect information about each item via the HTML fragment. We will then insert information about the item on other pages by injecting the HTML fragment we collected from the original item. We are trying to do all this without requiring our clients to install anything on their servers, just by including our JavaScript web bug in their HTML.
If you want to ensure something wasn't tampered with, it cannot go through the client unencrypted.
The only ways to do this securely are to:
As you suggest, retrieve the appropriate page server-side
or
Encrypt/sign the HTML before it goes to the client, using a key unknown to them, so that the client cannot modify it
Assuming you can get your client's web server to md5 something for you, this seems like a good place to use an md5-hashed signature. Essentially, the client server determines which information it would like to send you, concatenates it all into a string, concatenates that with a secret key, and then md5's the whole thing, and passes the result along with all the rest of its input.
On your server, you take all of the input except that signature, concatenate it together, concatenate the secret key onto that, and md5 it. If it matches the signature, you know it's valid input.
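A sketch of the signing scheme above in JavaScript for Node.js, assuming both the client's server and yours can run it. One deliberate substitution: it uses HMAC-SHA256 (Node's crypto.createHmac) instead of md5 over the concatenated input and key, because HMAC is the standard, well-analyzed way to build a keyed signature from a hash function, and md5 itself is no longer considered collision-resistant. SECRET_KEY and the sign/verify names are hypothetical.

```javascript
const crypto = require("crypto");

// Hypothetical secret, known only to the client's server and yours.
const SECRET_KEY = "replace-with-a-long-random-secret";

// Client's server: sign the list of fields it wants to send.
function sign(fields, key) {
  // JSON-encoding the list keeps field boundaries unambiguous ("ab"+"c" vs "a"+"bc").
  const message = JSON.stringify(fields);
  return crypto.createHmac("sha256", key).update(message).digest("hex");
}

// Your server: recompute the signature and compare it in constant time.
function verify(fields, signature, key) {
  const expected = Buffer.from(sign(fields, key), "hex");
  const given = Buffer.from(String(signature), "hex");
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

This only helps if the signing happens on the client's server; if the key were shipped to the browser inside the web bug, anyone could read it and sign whatever they like. Including a timestamp among the signed fields would also let your server reject old signatures that are replayed.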
Unfortunately, it looks like you're determining the HTML to send on the client (browser) side. Because JavaScript is plainly visible to everyone, you can't really use a secret string there.
So, unless it's possible to move that kind of processing to the server side, I think you're out of luck.
I have signed up (paid) for Google Site Search. They gave me the URL of a sort of web service: I send a query to it, it searches my site, and it returns the search results as XML. I am trying to load this XML via AJAX from a page on my site, but I cannot. I can load XML from any of the pages on my own domain, so I am assuming the problem is that the XML is on Google's domain. There has got to be a way to load it, though; I don't think they would have given me the URL if I couldn't do anything with it, lol. Does anyone know how to do this?
Thanks!
UPDATE:
This is what the Google page that gave me the XML says:
How to get XML
You can get XML results for your search engine by replacing query+terms with your search query in this URL:
http://www.google.com/cse?cx=MY_UNIQUE_KEY&client=google-csbe&output=xml_no_dtd&q=query+terms
Where MY_UNIQUE_KEY = my unique key.
You can't load files from another domain with AJAX (the browser's same-origin policy blocks it). However, you can set up a file on your own server that fetches the content and serves it from your domain. For instance, in PHP you could write a file googlexml.php:
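<?php
// Proxy: fetch Google's XML results server-side and serve them from your own domain
header('Content-Type: text/xml; charset=utf-8');
$q = isset($_GET['q']) ? $_GET['q'] : '';
readfile('http://www.google.com/cse?cx=MY_UNIQUE_KEY&client=google-csbe&output=xml_no_dtd&q=' . urlencode($q));
?>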
You could then access googlexml.php with AJAX (note that readfile() can only open a remote URL if PHP's allow_url_fopen setting is enabled). I'm not sure whether Google's terms of use allow this, but if they do, it is an option.
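A minimal sketch of the browser side, in JavaScript, calling the same-origin proxy instead of Google. It assumes googlexml.php passes a q query-string parameter through to Google's URL; the function name searchSite is hypothetical.

```javascript
// Request search results through the proxy on your own domain, so the
// same-origin policy no longer applies. Returns the raw XML text.
async function searchSite(query, fetchImpl = fetch) {
  const response = await fetchImpl("googlexml.php?q=" + encodeURIComponent(query));
  if (!response.ok) throw new Error("search failed: HTTP " + response.status);
  return response.text();
}
```

In the browser, the returned text can be turned into a document with `new DOMParser().parseFromString(xml, "application/xml")` and the results read from there; check Google's XML reference for the exact element names.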
Doesn't Google offer the ability to point a DNS name in your domain at the IP of its service, folding it into your domain? That way, in AJAX you could request
googleAlias.mydomain.com
Google should support this, but I don't know for sure. I imagine they would in the same way they do with GMail and external-domain mail.
This would remove your cross-domain JavaScript issues.
Edit: I expanded on this below, and another user helpfully pointed out that this should work (thanks, Stobor).
Well, to get my company mail into Gmail, if I recall correctly, I needed to change the MX records in my DNS to point to Google's mail servers. If Google supports it, you may be able to add an A record to your domain so that an AJAX request to foo.yourdomain.com goes to search.google.com (or whatever the service host is). Google would then need to recognize requests addressed to your hostname and say "Oh yes, that's me, acting on my client's behalf".
For those coming across this now, the AJAX Search API may be what you want: http://code.google.com/apis/ajaxsearch/documentation/
EDIT: Actually, upon further review, that may not hook in with the site search...