Crawling a website to extract data [closed] - javascript

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
We have paid for access to information on a website, but the only way to get at it is through the site itself, and there are 1400 records. Because there is so much of it, we want the information in an Excel spreadsheet, which is manageable. However, the organization in charge of the website isn't willing to help.
I can write a Python script that parses the HTML and extracts the relevant data. The problem is that the site is not easily crawlable: it is an ASP site, and many of the "links" are in fact triggers for JavaScript that loads the destination page. This means that a tool like HTTrack doesn't really work.
Are there any other tools or Python modules that can help me do this (bearing in mind the "javascript" links)? I'm totally new to this kind of thing, so I have no experience of what is available to me.

Jython + HtmlUnit may be very useful for your task.

You can use Scrapy, which is a framework for scraping websites.
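
On ASP.NET WebForms sites (an assumption here; the question only says "ASP site"), JavaScript "links" are often of the form `javascript:__doPostBack('target','argument')`. Clicking one submits the page's form with `__EVENTTARGET` and `__EVENTARGUMENT` set, together with hidden fields such as `__VIEWSTATE`. If that is what this site does, a script may be able to reproduce the click with a plain POST. The following is a minimal sketch using only the Python standard library; the sample HTML, the `Records.aspx` name, and the control IDs are made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode

# Matches href="javascript:__doPostBack('target','argument')"
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class PostbackPageParser(HTMLParser):
    """Collects hidden form fields and __doPostBack links from one page."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}   # e.g. __VIEWSTATE, __EVENTVALIDATION
        self.postbacks = []       # list of (event_target, event_argument)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and attrs.get("name"):
            self.hidden_fields[attrs["name"]] = attrs.get("value") or ""
        elif tag == "a":
            match = POSTBACK_RE.search(attrs.get("href") or "")
            if match:
                self.postbacks.append(match.groups())


def build_postback_payload(hidden_fields, event_target, event_argument=""):
    """Return the form data a browser would POST when the link is clicked."""
    payload = dict(hidden_fields)
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload


# Hypothetical page fragment, standing in for HTML fetched from the real site.
SAMPLE_HTML = """
<form method="post" action="Records.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <a href="javascript:__doPostBack('grid$ctl02$lnkDetails','')">Record 1</a>
  <a href="javascript:__doPostBack('grid$ctl03$lnkDetails','')">Record 2</a>
</form>
"""

parser = PostbackPageParser()
parser.feed(SAMPLE_HTML)

# One payload per "link"; each is what the browser would submit on click.
payloads = [
    build_postback_payload(parser.hidden_fields, target, argument)
    for target, argument in parser.postbacks
]
encoded_first = urlencode(payloads[0])  # body for an HTTP POST request
```

Each payload could then be sent as a POST to the form's action URL with an HTTP client that keeps the session cookies, and the response parsed the same way. If the links are driven by other scripts rather than postbacks, a tool that actually executes JavaScript (such as the HtmlUnit suggestion above) is probably the better route.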

Related

Displaying spreadsheet data as plain text in HTML? [closed]

Closed 4 days ago.
I was asked to rework our canteen's website to be more user-friendly to operate for the chef and other staff, who are unfamiliar with coding and content management systems. More specifically, we have a menu page where the meals for the next 3 weeks are displayed, and I was asked to make it easier to update said page.
I figured that the simplest approach would be to create a spreadsheet in Google Sheets or Excel and to pull the data - so the names of meals and some dates - into the HTML page from there, so that whenever the staff needs to update the information, they can just edit the spreadsheet instead of bothering with the backend of the website.
My first approach was to try and connect the HTML to a Google Sheet using Google Apps Script, but this proved difficult since I don't actually know any JavaScript and expected the whole ordeal to be simpler.
I tried following some tutorials I found online, but a) they were mostly for extracting entire tables, not just single pieces of data, and b) they didn't work. I thought about using an actual database program in the hope that it would help, but I don't have access to Microsoft Access at the moment.

display preview of webpage [closed]

Closed 7 years ago.
I tried searching for this and couldn't find much. Is there a package, jQuery or otherwise, that can present a small preview of a linked website?
I'd like to display a small feed of linked content for each sub-niche on a social platform I'm creating. For example, say I wanted to link to a news website and display a live feed of updates in a niche such as tech news, almost like a live Twitter feed plugin.
This would probably kill performance. Nonetheless, I thought it was a cool idea and am wondering if there are any packages out there I can try this out with.
Well, if that news website provides an API to fetch news, you can keep making AJAX requests to its API URLs at regular intervals to retrieve the latest news.
But if the website doesn't provide an API, make the AJAX requests to your own server instead, have the server do some web scraping, and return the news in the AJAX response.
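
The server-side half of that suggestion could look roughly like the sketch below: parse the fetched page, pull out the headlines, and return them as JSON for the AJAX caller. It assumes, purely for illustration, that headlines sit in `<h2>` elements; the sample markup is invented, and a real site would need its own selectors (and permission to scrape it).

```python
import json
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2> element (assumed to hold a headline)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._in_h2 = False
            # Collapse runs of whitespace and skip empty headings.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)


def headlines_as_json(html, limit=5):
    """Return a JSON string the AJAX request could receive as its response."""
    parser = HeadlineParser()
    parser.feed(html)
    return json.dumps({"headlines": parser.headlines[:limit]})


# Hypothetical markup standing in for a page fetched from the news site.
SAMPLE_PAGE = """
<h2><a href="/a">New phone released</a></h2>
<p>Unrelated text</p>
<h2>  Chip shortage
   eases </h2>
"""

response_body = headlines_as_json(SAMPLE_PAGE)
```

Caching the scraped result on the server for a short time, rather than scraping on every AJAX request, would likely ease the performance concern raised in the question.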
Another solution is to use an HTML iframe.
I would not recommend it, though: loading your website would take a long time, because every iframe needs to load as well.

How can I check how many visitors are visiting my website? [closed]

Closed 7 years ago.
I'm a software developer who has done most of my work in older technology stacks and, recently, mobile iOS development. However, I really know nothing about the web other than a bit of HTML and interacting with web APIs.
I recently purchased an out-of-the-box template website to serve as a landing page for an application of mine. I would like to find out how many people are visiting it. What's the best way to do this? Just go out and find some JavaScript applet that will do this for me? Where is the data/running count stored? Or should my hosting provider (Namecheap) provide this information to me automatically?
Install some sort of analytics script. A good one is Google Analytics.
It's common for web hosting providers to have some sort of analytics engine running, like AWStats or Webalizer.
You could also create a free Google Analytics account and follow the instructions to add a snippet of code to your page to count your visits.
Now, if you also want to show a visit counter, the best alternative is to build it with some server-side code and a database (or a file). It is relatively easy, and if you post your server infrastructure (PHP, ASP, MySQL, PostgreSQL, etc.) I could expand this answer with more help.
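
As a rough illustration of the file-based counter idea, here is a minimal sketch. It is written in Python only for the sake of example (the question does not say what the server runs), and the `record_visit` helper and the `visits.txt` file name are hypothetical.

```python
import os
import tempfile


def record_visit(counter_path):
    """Increment the count stored in counter_path and return the new total."""
    try:
        with open(counter_path, "r", encoding="utf-8") as f:
            count = int(f.read().strip() or 0)
    except FileNotFoundError:
        count = 0  # first visit: no counter file yet
    count += 1
    # Write to a temporary file and swap it in, so a crash mid-write
    # does not leave a truncated counter behind.
    tmp_path = counter_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(str(count))
    os.replace(tmp_path, counter_path)
    return count


# Demo in a throwaway directory; a real handler would call record_visit()
# once per page request and render the returned number into the page.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "visits.txt")
totals = [record_visit(demo_path) for _ in range(3)]
```

This sketch does no locking, so two simultaneous requests could read the same value and lose a count. For anything beyond a low-traffic page, a database with an atomic increment is the safer choice.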

Faceted search on client side [closed]

Closed 6 years ago.
Can you suggest a basic faceted search library for use on the client (browser) side?
I quickly looked through exhibit3, but it looks heavy for my usage. It is mentioned somewhere that exhibit3 can be used on the client side alone, but the setup instructions mention a Backstage project in Java.
http://www.simile-widgets.org/exhibit3/examples/nobelists/nobelists.html
The above is an example of the usage I am looking for (only the category search, no timeline view, etc.). If a library can group data and allow faceted search over a JSON file (independent of any backend), that will suit my purpose.
EDIT:
Found a good walkthrough of one of the exhibit examples here.
It pulls in a lot of dependencies, and I wish there were a minified version (ready to use :-) ). Does anyone use this as a client-only solution for medium-sized data?
facetedsearch.js looks like it would be appropriate. Find it here: http://eikes.github.com/facetedsearch/ or skip the fancy site and go straight to the code: https://github.com/eikes/facetedsearch/

Looking for a killer javascript/website monitoring/debug tool [closed]

Closed 8 years ago.
I'm looking for a good way of monitoring and logging the full exchange between a website and the internet.
I know that this can be done with Firebug. The main problem I have with it is that it cannot persist log data across reloads.
Also, it would be very cool to be able to log JavaScript activity and filter the log by the action I'd like to debug (e.g. setting a cookie, changing an attribute, etc.).
Does anyone know of something that meets at least some of those requirements and, for the rest, performs at least as well as Firebug does?
You might want to check out one of the many JavaScript logging utilities, like log4js. You could use it and/or something like PhantomJS to build automated testing and monitoring of your web applications. The discussion around JavaScript logging on Stack Overflow is pretty good too. Check out these questions:
https://stackoverflow.com/questions/1423267/are-there-any-logging-frameworks-for-javascript
A JavaScript frontend logging system that logs to our backend?
You could try Fiddler: http://www.fiddler2.com/fiddler2/
