Any good javascript library to allow crawling of website [closed] - javascript

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I need a JavaScript library to crawl a web application. I found https://github.com/riccardo-forina/status-jquery-crawler, but as the author says, it is at an early stage of development. I could not find anything else after a lot of googling.
Thanks for any input.

JavaScript has many utilities you can use.
The biggest question when choosing your tool is, "does my site use JavaScript to load the content I want?". For example, Google's search page is almost all contained in the HTML they send in response to an HTTP GET request.
Other sites may use JavaScript to load comments, notifications, or pictures that aren't contained in the initial HTML. This means that if you just asked for the HTML of Site A, the page you'd get back would be missing much of the content you wanted.
Static Sites
For most sites where what you want is in the HTML, there are some excellent node.js scraping libraries at your disposal:
x-ray - a neat package that bundles up cheerio inside a declarative scrape object. Provides some simple structure with which to build robust scrapes.
cheerio + request - this is a popular combination, using cheerio to parse the HTML and request to get it for you. You'll find lots of resources explaining the basics of requesting web-pages, extracting the HTML, and even adding authentication and maintaining sessions where required using these tools.
artoo.js - an in-browser scraping utility. Extremely useful for prototyping and one-off scrapes. You can add it as a bookmarklet and run it in your browser's developer console. It allows jQuery-like selectors and has some basic link-following logic.
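Whichever of these you pick, the core of a crawl on a static site is the same: get the HTML, pull the links out, and queue the ones you haven't visited. Here's a rough sketch of the link-extraction step using only what ships with Node. The regex is a crude stand-in for a real parser like cheerio, and `extractLinks` is just a name made up for this example, not part of any of the libraries above.

```javascript
// Minimal sketch: collect the same-origin links found in an HTML string.
// The regex is a simplification; a real crawler should use an HTML parser.
function extractLinks(html, baseUrl) {
  const base = new URL(baseUrl);
  const seen = new Set();
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;

  for (const match of html.matchAll(hrefPattern)) {
    let url;
    try {
      url = new URL(match[2], base); // resolves relative links against the page URL
    } catch {
      continue; // skip hrefs that are not valid URLs
    }
    if (url.origin !== base.origin) continue; // stay on the same site
    url.hash = ''; // "#section" variants point at the same page
    seen.add(url.href); // the Set drops duplicates
  }
  return [...seen];
}
```

To turn this into a crawler you would keep a queue of URLs and a set of visited ones, fetch each page, and feed its HTML back through `extractLinks` until the queue is empty or you hit a page limit.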
Dynamic Sites
If you need a browser-like environment to get content from your site, you'll want to check out headless web browsing and drivers in node.js. PhantomJS is the most popular, but there are many others. Be warned - to use PhantomJS with other JavaScript libraries you'll need to find a node.js driver:
Nightmare - a node library that talks to PhantomJS and simplifies basic web-page workflow and scraping.
SpookyJS - a node library for CasperJS, a tool built on top of PhantomJS that is also a separate package.
PhantomJS-Node - the most popular PhantomJS driver for node.
(Sorry for the lack of links - I don't have enough reputation to post more than 2 right now)

PhantomJS is a JavaScript-based headless WebKit browser, so you could use it for crawling. There is also a newer wrapper built on top of PhantomJS called Nightmare: http://www.nightmarejs.org/.

Related

How to serve HTML + JS + CSS webapp from my machine so that others can view it? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
I'm a newbie developing a small web application that uses HTML, JavaScript (Angular 1.5.0) & CSS. I am using Grunt to process, jslint, minify my javascript and CSS files. This front-end communicates via HTTP Rest calls to a Django application that is also running locally (which has its own database)
This webapp currently lives only on my laptop (MacBook Pro) and I use the PyCharm IDE to edit the files. When I want to test out the app, I simply go to http://localhost:63342/myapp/index.html#. PyCharm's built-in webserver serves it up for me and I can play with it there.
However, I want to allow a select few people to also access the webapp from other locations on the internet. When they try to access http://MyPublicIpAddress:63342/myapp/index.html, they get connection denied.
What is the quickest/easiest/simplest way I can share my webapp with those other people? I do not know much about setting up and configuring Webservers, so if you can give me the simple/easy instructions (or point me to a doc!) that would be most appreciated.
I posted this question to the PyCharm community forum here, but got no response.
Edit
Many answers say I need a hosting service. Yes, if I want to deploy my website to a fixed IP address. But is there no way to simply allow them to briefly visit my webapp while temporarily running a toy web server on my laptop? I understand this is not a long-term solution; it's just to give them a peek. If possible I would like to avoid the effort and learning curve involved in pushing it to a hosting service. I would have to set up the back-end API, database, etc. (which are all currently running locally).
There are many services that allow you to host your project online.
For small projects
CodePen: http://codepen.io/
Plunker: http://plnkr.co/
kodeWeave: http://kodeweave.sourceforge.net/
For large projects
Cloud9IDE: https://c9.io/
Koding: http://koding.com/
Github: https://pages.github.com/
Sourceforge: https://sourceforge.net/
Heroku: https://www.heroku.com/
BTW: kodeWeave is my project. It uses GitHub Gists to save and retrieve your weaves online, so nothing is actually saved on the site itself, and Gists are a very reliable host for small projects like this. (Inspiration from Dabblet.)
It's being made kind of as a JSFiddle alternative for mobile devices, except without all the HTTP requests.
It has many libraries built in (such as jQuery, Angular, Font Awesome, etc.). In addition, when you export as a zip file you get all those libraries (hence the "without all the HTTP requests" comment). You can also export your weave as a Windows, Linux, Mac, or Chrome application, and/or as a Chrome popup extension.
You can watch this video I made that explains how to use kodeWeave for desktop exportation.
I've listed services I use and recommend. I will not list something I haven't tried without warning.
If you have a spare laptop you can use that as a web server. I've never done it myself because it's not worth it for me. However, it's something you may want to look into.
Lastly, you can read Creating a Local Server Configuration with PyCharm, which may be the option you're looking for.
Use localtunnel to expose your localhost: https://github.com/localtunnel/localtunnel
You need hosting, or try codepen.io for a small project.
Change the configuration in PyCharm to host at 0.0.0.0. You will also need to port forward your router... I would strongly suggest not using this as any sort of long, medium or short term solution.
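If you'd rather not depend on PyCharm's built-in server, the same "bind to 0.0.0.0" idea can be sketched as a toy static server in Node using only built-in modules. This is an illustration, not a hardened server: `resolveSafePath` and `createStaticServer` are names invented for this example, the MIME table is deliberately tiny, and the port in the usage line below is arbitrary. You would still need the port forward mentioned above for people outside your network to reach it.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Deliberately small; extend for images, fonts, and so on.
const MIME_TYPES = { '.html': 'text/html', '.js': 'text/javascript', '.css': 'text/css' };

// Map a request path to a file under rootDir, or null if it would escape rootDir.
function resolveSafePath(rootDir, urlPath) {
  const root = path.resolve(rootDir);
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath.split('?')[0]); // drop the query string
  } catch {
    return null; // malformed percent-encoding
  }
  const relative = decoded.replace(/^\/+/, '');
  const target = path.resolve(root, relative || 'index.html');
  const inside = target === root || target.startsWith(root + path.sep);
  return inside ? target : null;
}

// Build the server but do not start listening; the caller picks host and port.
function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveSafePath(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      const type = MIME_TYPES[path.extname(filePath)] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}
```

Usage would look something like `createStaticServer('./myapp').listen(8080, '0.0.0.0');`. Binding to 0.0.0.0 instead of localhost is what lets other machines connect, which is also why you should shut it down as soon as the demo is over.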

Online etl for api mashup, filtering and ordering [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I'm looking for a free web-based solution for small ETL / mashup tasks.
An example could be:
connect to an api
filter response
using data as input to another api
It's something similar to the now-defunct Yahoo Pipes, and for me it is important to have an interface for designers with little coding ability (mostly JavaScript).
Note: I've found this paper with a lot of ideas on this field and some comparison between existing products
Pre warning - this is not a free solution - I did a lot of work around this about a year or so ago, and the free stuff at the time just would not do what I needed.
In the end I used Dell Boomi - now I know what you are thinking - Dell? that sounds horrendous, the manufacturer of crap laptops you say! Why yes….
Boomi came from a bunch of dudes who basically had (what I am assuming to be your problem) to connect a bunch of stuff together, in the cloud, without having to worry about how it all works behind the scenes. It has a fantastic user interface (all web based) - is completely cloud hosted (although you can run the endpoint on your server / computer if you so desire) and, if it all goes tits up with their inbuilt tooling (i.e. you can’t quite do what you need to) - you can run in-line Groovy (Java) code within whatever ETL process you are having trouble with - I think this fits the bill for the user friendly designer stuff!
Boomi’s pedigree was and is connecting web services / rest API’s in a quick and easy way but also supports all the traditional stuff if you need it too (IBM MQ, blah blah)
The big downside is that it is not free - in fact quite expensive if this is not for a paid project
There is a 30 day free trial that I recommend you check out - I really did and do have a great time with Boomi for mashing endpoints together.
Now, at the time I also looked at Talend. If I remember correctly it does not have a web interface; it's all based in Eclipse. The problems with Talend when I looked at it were:
You need to host the endpoint somewhere (this is usually true of all ETL however of course)
The UI was horrible at the time
Ultimately, finding free ‘ETL’ is nearly impossible - hence why pipes went down?
Sorry I can’t be of more help :(
Ballerina is a programming language custom built for integration that includes a mature graphical syntax. It can easily be used to glue interfaces together. Since your requirement is to have such a mashup interface in the cloud, you can use the WSO2 Integration Cloud free trial program to see if it's right for you.
I've written a post here that demonstrates how easy it is to use Ballerina for scraping data from interfaces, you can create a service that's similar in logic and host it in the cloud. Find information on the WSO2 Integration Cloud usage here. Find information on serving a ballerina service from the cloud here.
Some more details would be helpful, such as which API you would like to connect to and how many requests you'd be making. Here's one way you might approach this with free tools:
Extract: An IFTTT integration plus their "Maker Channel" (Will post info from one of their 270+ integrations to an API)
Transform: Sheetsu, which turns a Google Spreadsheet into a RESTful API that you can post to. Transform the data and output it to another sheet.
Load: You can also make GET requests via Sheetsu, or just use the Google Spreadsheets API.
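If a little JavaScript is acceptable, the connect / filter / feed-into-another-API flow from the question can also be sketched by hand. The snippet below is only an illustration under some assumptions: the source API returns a JSON array, both URLs are placeholders you would supply, and `transform` and `runPipeline` are names made up for this example. The global `fetch` it relies on exists in browsers and in Node 18 or later.

```javascript
// Transform step: filter records, order them by one field, keep selected fields.
function transform(records, { where, orderBy, fields }) {
  return records
    .filter(where) // filter() returns a new array, so the input is not mutated by sort()
    .sort((a, b) => (a[orderBy] < b[orderBy] ? -1 : a[orderBy] > b[orderBy] ? 1 : 0))
    .map((record) => Object.fromEntries(fields.map((f) => [f, record[f]])));
}

// Extract from one API, transform, then load into another.
// sourceUrl and targetUrl are placeholders for whatever APIs you are mashing up.
async function runPipeline(sourceUrl, targetUrl, options) {
  const response = await fetch(sourceUrl);
  const records = await response.json(); // assumes the source returns a JSON array
  const rows = transform(records, options);
  return fetch(targetUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(rows),
  });
}
```

Keeping the transform step as a plain function means a designer can tweak the `where`, `orderBy`, and `fields` options without touching the network code.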

Can I host my front end in one hosting service and the backend somewhere else? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 9 years ago.
I have a website hosted in justhost.com. So far it is only HTML/CSS/JS done all from scratch. Recently I have been learning about Server Side Java Script (SSJS) using nodejs and I would like to add some JS backend processing to my site. The problem is that justhost.com does not seem to support nodejs applications, so now I am kind of stuck.
Is there a way to keep all the front end of my site (HTML, CSS and front end JS) hosted in justhost.com and then build the backend in nodejs SSJS and keep that part hosted in another service or server and somehow make it all work together?
Right now it is not a commercial application, so I can play around and break things, and I am open to any suggestion.
Thanks in advance.
The complete answer is "probably, but it's complicated" due to restrictions built into the web itself, like cross-origin isolation, as well as hosting provider restrictions. However, since you are asking this, my suggestion is to just host your entire application (server-side code, HTML, CSS, browser JS, images, etc.) on a node.js hosting service, since they all support that and it's trivial to do. There's no reason to complicate your architecture to stick with a static web host. It takes a handful of lines of "code" in your node app to have a fully functional static web site served along with any custom server-side logic your application may also need. (Consider the static middleware bundled with the express.js application server, for example.)
I agree with Peter Lyons' answer, but in case you really want to do this, I would treat your nodejs server as a REST API (or even a SOAP API). Your front-end server would treat the nodejs server like a database and request information from it; for example, backendserver.com/users would return the users to your front end. Better still, your front end would hold the UI code with a link to backendserver, and the data would be loaded from the browser. That's typically how this kind of setup is handled.
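If you do split things this way, the cross-origin restriction mentioned above is the first thing you'll hit: the browser will block the front end's requests unless the backend sends CORS headers. Here is a minimal sketch of such a backend using Node's built-in `http` module. The allowed origin, the `/users` payload, and the function names are placeholders invented for this example.

```javascript
const http = require('http');

// Hypothetical origin where the static front end is hosted.
const ALLOWED_ORIGINS = ['https://www.example.com'];

// Return CORS headers only for origins we trust; anything else gets none.
function corsHeaders(origin, allowedOrigins = ALLOWED_ORIGINS) {
  if (!allowedOrigins.includes(origin)) return {};
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': 'GET, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Vary': 'Origin',
  };
}

// Build the API server without starting it; the caller decides the port.
function createApiServer() {
  return http.createServer((req, res) => {
    const headers = corsHeaders(req.headers.origin);
    if (req.method === 'OPTIONS') {
      res.writeHead(204, headers); // answer the browser's preflight request
      res.end();
      return;
    }
    if (req.method === 'GET' && req.url === '/users') {
      res.writeHead(200, { ...headers, 'Content-Type': 'application/json' });
      res.end(JSON.stringify([{ id: 1, name: 'Ada' }])); // placeholder data
      return;
    }
    res.writeHead(404, headers);
    res.end();
  });
}
```

You would start it with something like `createApiServer().listen(3000);` and have the front-end JS call `fetch('https://backendserver.com/users')`, with the real backend address in place of that example one.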

RIA application development on linux platform [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
I am building a web based RIA application using web technologies including Rails based server side code, and client side based on HTML,CSS,JQuery and plugins etc.
I am looking forward to creating a standalone cross platform application using the same code-base (I do not want to rewrite the application UI in Qt or some other language/library).
Also, I don't want clients to have to set up a web server to view the content. The application is heavily Ajax-based and communicates with the RESTful backend. The desktop client should have exactly the same functionality, with the additional provision of local storage and synchronization of data.
I wish to carry out the entire development process on linux. Now that Adobe has stopped supporting AIR on linux, I am not taking that as a viable candidate.
I have been looking into Mozilla Prism. It's almost what I am looking for, except that it does not seem to have any provision for local storage or interaction with the local filesystem.
It would be preferable if the solution is open source. My entire codebase from bottom up is based on open source technologies and as far as possible I would like to keep it that way.
Also, I am comfortable hand-coding my application and features like, integration with existing IDEs, GUI development environment, powerful application builder wizards etc. are not necessary requirements.
It has been suggested to me that it is possible to embed a WebKit component in a Qt application and do what I want, but I am unable to locate proper resources that can help me do that. I am familiar with Java and C++, so writing additional wrapper code in some other language is not a major hurdle.
If somehow local storage facility can be added to prism, that would be a highly preferred solution.
Also, creating a plugin for google-chrome/chromium is a possible alternative. How does it compare to the above options?
Any help would be highly appreciated.
At the moment AppJS ( http://appjs.org ) seems to be the most robust contender designed exactly around these same principles.
Another alternative might be a GTK-webkit based solution ( http://webkitgtk.org ) .
[Update:Aug 2013]
Multiple other alternatives are available as well :
TideSDK
TideSDK is a community-based offshoot of the immensely popular Titanium SDK. While the project is very promising, last I checked there were major hiccups running the developer tools on Linux.
Node-webkit
This interesting project provides seamless interoperability between Node.js and WebKit. The end result is that you can start developing an application just like you would write a web page, with the additional ability to call any built-in or third-party node modules. CommonJS modules just work in the browser context. The project is Intel-sponsored and I have personally found it very simple to use and productive.

JavaScript editor within Eclipse [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I'm looking for the best JavaScript editor available as an Eclipse plugin. I've been using Spket, which is good. But is there a better one?
Eclipse HTML Editor Plugin
I too have struggled with this totally obvious question. It seemed crazy that this wasn't an extremely easy-to-find feature with all the web development happening in Eclipse these days.
I was very turned off by Aptana because of how bloated it is, and the fact that it starts up a local web server (by default on port 8000) every time you start Eclipse and you can't disable this functionality. Adobe's port of JSEclipse is now a 400Mb plugin, which is equally insane.
However, I just found a super-lightweight JavaScript editor called Eclipse HTML Editor Plugin, made by Amateras, which was exactly what I was looking for.
Disclaimer, I work at Aptana. I would point out there are some nice features for JS that you might not get so easily elsewhere. One is plugin-level integration of JS libraries that provide CodeAssist, samples, snippets and easy inclusion of the libraries files into your project; we provide the plugins for many of the more commonly used libraries, including YUI, jQuery, Prototype, dojo and EXT JS.
Second, we have a server-side JavaScript engine called Jaxer that not only lets you run any of your JS code on the server but adds file, database and networking functionality so that you don't have to use a scripting language but can write the entire app in JS.
Try the Vjet Javascript IDE from ebay (installation)
Ganymede's version of WTP includes a revamped Javascript editor that's worth a try. The key version numbers are Eclipse 3.4 and WTP 3.0. See http://live.eclipse.org/node/569
There once existed a plugin called JSEclipse that Adobe has subsequently sucked up and killed by making it available only by purchasing and installing FlexBuilder 3 (please someone prove me wrong). I found it worked excellently, but I have since lost it after "upgrading" from Eclipse 3.4 to 3.4.1.
The feature I liked most was Content Outline.
In the Outline window of your Eclipse Screen, JSEclipse lists all classes in the currently opened file. It provides an overview of the class hierarchy and also method and property names. The outline makes heavy use of the code completion engine to find out more about how the code is structured. By clicking on the function entry in the list the cursor will be taken to the function declaration helping you navigate faster in long files with lots of class and method definitions
The new release of Eclipse (Helios) has a specific package for JavaScript web development. I haven't tried it yet, but it's certainly worth a look.
Didn't use eclipse for a while, but there are ATF and Aptana.
Oracle Workshop for WebLogic (formerly BEA Workshop) has excellent support for JavaScript and for visually editing HTML. It supports many servers, not only WebLogic, including Tomcat, JBoss, Resin, Jetty, and WebSphere.
It recently became free, check out my post about it. Given that it was an expensive product not long ago, I guess it's worth checking out.
