I'm trying to access data from a government website, designed for "point-and-click" downloads. My objective is to find out what the pattern to get to CSVs, and then create an easy API for other people to get to that data. The website is supposed to be open data, but is quite obscure as to how to get data programatically.
I have however failed to figure out what the pattern is to find a URL to the CSVs, because they seem to be hidden behind some JavaScript.
An example of a page is this one, and I want to know what the link behind the png image on the page.
How can I programatically get to the links behind this button?
How can I get to the links behind this button?
Investigate your web browser's "web developer" features. There should be a way to get the browser to log the full URLs for all of the requests that it is making.
Then reverse engineer the pattern from the examples. (This may or may not be possible. But if it is not possible, you should let the people who designed the site that it is unfriendly to people trying to use it ... programatically.)
How can I programatically get to the links behind this button?
Different question. Here are some possible options:
Use a web-scraping framework that understands how to execute Javascript as well.
Use a web testing framework like Selenium
There is a "headless browser" framework called Phantom.JS that may help.
Note that it is a lot more complicated to do this programatically. If reverse engineering is possible, that would be simpler.
Related
I am creating a HTML5 website and I need to create a site search box that
displays results in a results page with description and photo.
How would I go about this.
I have looked alot and only see google search and thats not what im after.
Can this be done without PHP or RAILS?
Looking for purely JS and html5 and css and jquery.
Thanks and a point i the correct direction would be great.
Example is this Wordpress sites search http://agroamerica.com/
I dont want to use WP but hand code it.
Any help is great.
Your best bet, given that you don't want to implement a third party indexing service, would be to set an indexing function on your server's back end to handle search requests. You mentioned Rails, and there are some pretty great gems for this.
One point of trouble you will have with this question is that, in my experience, full site search functionality without a back end / database to query is not a very useful solution for any applications I've seen.
However, given that you want to keep it JS, you might look into the MEAN stack (MongoDB, Express.js, Angular.js, Node.js) which does some pretty sweet things like two-way data binding. It's a pure Javascript solution (albeit not a purely-front end solution).
Honestly, it sounds like you might be taking too big of bites to start off with. Try working through a scripting language on a site like Code Academy and learning about basic web application setups like MVC (a common way to handle different parts of a web application (used by the aforementioned Rails)). Stack Overflow users can be pretty brutal when you ask questions about advanced functionality without some understanding of the functionality's underlying elements or functional requirements, and search engines from the ground up have historically been the thing of doctoral dissertations.
Good luck!
I want to be able to open an Excel sheet using Office Web Viewer, HTML viewing component. (This seems to also be called the "Office Web Apps Viewer".) The viewer component is a really excellent HTML office document rendering engine, but I can find absolutely no documentation online for it. It's even hard to discern what its correct name is. (Does anyone have details on the API?)
I need to be able to load the document and immediately call a JavaScript function to do a search so that the document opens with the search result already highlighted. Even better would be to be able to set the search term in the query string given to the viewer component.
Does anyone know if this is possible and if so, how?
I can say with quite some certainty that there is no such API as the viewer is not intended for such usage. Yeah, I know, not the answer you wished to hear, but then again, that's just the way it is. It's just a viewer, not a component to be used as part of applications.
The quick & dirty solution: Using an extension
One way to achieve what you want is to write an extension that would expose this functionality to you. Of course this would require your users to install this extension, but it is definitely an option and such an extension would be relatively simple to write.
The better solution: PDF.js
Convert your documents to PDF's using some server side solution.
On Linux with OpenOffice.org this could for example look like oowriter -convert-to pdf:writer_pdf_Export doc_file.doc or swriter for LibreOffice.
Present them using Mozilla's cross browser PDF.js library.
Figure out the PDFFindController how to trigger the highlighting. Take a look at this demo and next at this source file. In there they are definitely triggering the highlighting (search for the object I named before), but as it is not directly addressing this question I am not going to figure that one out for you.
I am creating a web app, part of which requires people add "experiences" to a list. These might include foods to eat, places to go, sights to see, etc. I want people to be able to add their own event to a list and have an image associated with them.
I find that when people have to locate their own image from the web, download it, and then upload it to another site, it's too much effort and they won't do it. Therefore I think people should be able to choose an image from a Google search within my app, and select and image which the app then uses as the image for that "experience". I know this can be done using the Google search API, and you can specify only public domain images, so that there's no legal ramifications of using images from Google.
I imagine this is quite a common thing for websites to do. I thought I would be able to find a Javascript library that launches a lightbox over the top of a page and allows a search and selection of an image, but I can't seem to find one. Does anyone know of any such library, or anything similar?
Alternatively, is there some obvious way around this issue that someone has seen? What is a good way to avoid the "upload image" process for a website, but still have image-enriched content? I'm sure there's something obvious I'm overlooking here.
Thanks!
The now deprecated Image API was one starting point to do it, however you still have to implement the lightbox ... https://developers.google.com/image-search/v1/
Be careful when you are looking at using the image search results! apart from (possible) copyright issues also make sure that you have permission from Google to do automated requests against them.
I am building a completely ajax web app (this is the first web app I have ever created). I am not exactly sure if I am going about it the right way. Any suggestions or places where I can go to find suggestions?
Update:
I currently am using jQuery. I am working on fully learning that. I have designed a UI almost completely. I am struggling in some parts trying to balance a good UX, good design and fitting all the options I want to fit in it.
I have started with the design. I am currently struggling with whether to use absolute positioning or not and if not how do I use float etc. to do the same type of thing. I am trying to make it have a liquid layout (I hate fixed-layout pages) and am trying to figure out what I should use to make it look the same in most screen sizes.
Understand JavaScript. Know what a closure is, how JavaScript's event handling works, how JavaScript interacts with the DOM (beyond simply using jQuery), prototypal inheritance, and other things. It will help you when your code doesn't work and you need to fix it.
Maintain usability. All the AJAX magic you add is useless if users cannot figure out how to use it. Keep things simple, don't overload the user by giving him information he doesn't need to know (hide less important information, allowing the user to click a link to show it), and if possible, test your app with actual users to make sure that the interface is intuitive to them.
Code securely. Do not allow your server to get hacked. There are many different types of security flaws in web apps, including cross-site scripting (XSS), cross-site request forgery (CSRF), and SQL injection. You need to be well aware of these and other pitfalls and how to avoid them.
One starting point is to look at the Javascript Libraries and decide which one to use:
http://code.google.com/apis/libraries/
http://en.wikipedia.org/wiki/Comparison_of_JavaScript_frameworks
You probably don't want to do raw Javascript code without any library. Once you decide on a library to use, then you can look at its documentations online or the books about using them. jQuery does have pretty good documentation.
Define "right way."
There are many "right ways" to code an app.
Things to keep in mind are trying to design a nice interface. The interface can make or break an application and studies show that it can even make it seem faster if you do it right. jQuery is good for this.
Another thing to consider going in is what browsers do you want to support? Firefox is really doing well and Google Chrome's market share is growing so you will want so support those for sure. IE is a tough one as it doesn't have the best support for standards, but if you are selling a product you will really want this.
One of the best articles that I've ever come across about the structure of an ajax web application is this one. A little outdated because it refers to XML as the primary data-interchange format, now JSON. jQuery, a javascript framework, contains excellent functionality for both DOM manipulation and AJAX calls. Both are a must in any AJAX-driven web app.
I'm connecting to a web site, logging in.
The website redirects me to new pages and Mechanize deals with all cookie and redirection jobs, but, I can't get the last page. I used Firebug and did same job again and saw that there are two more pages I had to pass with Mechanize.
I took a quick look at the pages and saw that there is some JavaScript and HTML code but couldn't understand it because it doesn't look like normal page code. What are those pages for? How they can redirect to other pages? What should I do to pass these?
If you need to handle pages with Javascript, try WATIR or Selenium - those drive a real web browser, and can thus handle any Javascript. WATIR Classic requires either IE or Firefox with a certain extension installed, and you will see the pages flash on the screen as it works.
Your other option would be understanding what the Javascript on the offending page does and bypassing it manually, but that seems onerous.
At present, Mechanize doesn't handle JavaScript. There's talk of eventually merging Johnson's capabilities into Mechanize, but until that happens, you have two options:
Figure out the JavaScript well enough to understand how to traverse those pages.
Automate an actual browser that does understand JavaScript using Watir.
what are those pages for? how they can redirect to other pages. what should i do to pass these?
Sometimes work is done on those pages. Sometimes the JavaScript is there to prevent automated access like what you're trying to do :). A lot of websites have unnecessary checks to make sure you have a "good" browser, so make sure that your user_agent is set to something common, like IE. Sometimes setting the user_agent to look like an old browser will let you get past without JavaScript.
Website automation is fun because you have to outsmart the website and its software developers, using multiple strategies. Like the others said, Watir is the best tool for getting past JavaScript at the moment.