For CouchDB, I know that show functions can generate HTML / images / XML feeds on the fly.
In that case, though, the resources have to live in the script itself and be encoded (e.g. base64 for an image), as in here
What is the best way to load static resources that are attachments of a design document
e.g. something as simple as JSON, or images, and process them with server-side JavaScript?
The script file itself is an attachment in the design doc. The variable doc is not available.
Is there any way to do this similar to node.js? Or do we use a trick in a context like _show or _list to show the document with id _design/ddoc?
Making REST requests inside that environment is, I believe, not possible either, as XMLHttpRequest is not available. Is establishing a DB connection also impossible?
This is supposed to be a simple question, so I wonder whether I am missing something in CouchDB.
In order to serve a website directly, you need to use URL rewrites. You'd rewrite / to go to one of your show functions, which bootstraps your site with basic HTML and JS (probably embedded).
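For example, a minimal sketch of such a design document (the show function name index, the design doc name ddoc and the bootstrap HTML are just placeholders); the site is then served from /db/_design/ddoc/_rewrite/:
{
  "_id": "_design/ddoc",
  "rewrites": [
    { "from": "/",  "to": "_show/index" },
    { "from": "/*", "to": "*" }
  ],
  "shows": {
    "index": "function (doc, req) { return '<html><body>bootstrap HTML + embedded JS here</body></html>'; }"
  }
}
The catch-all rule maps any other path onto the design document itself, so attachments of the design doc (CSS, JS, images) can be fetched by name.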
A lot of this work has been done already by CouchApps (basic tutorial here). This is by far the easiest way to get started. This seems to be the way http://npmjs.org is served.
This isn't the place for a walkthrough, so hopefully this gives you enough information to get started.
If your site needs server-side logic (websockets for example), this solution won't work for you. All you get with a CouchApp is a database, HTML, CSS and JavaScript.
I'm trying to write a web scraper to get some sales leads. The problem is that in modern web design, most websites use some JavaScript to modify the DOM (usually React, Angular, or even just some jQuery). So if I scrape a website with the request Node.js package and pass the HTML to cheerio, I'm simply not able to parse the code and get the info I want. Instead, all I can see are some React.js components ¯\_(ツ)_/¯
Any resources on this topic will be helpful, thanks in advance.
That's because the request package will not execute any of the JavaScript on the page; it just downloads the HTML as-is. If you want to see the actual page the way a browser does, you need something that executes all of the page's JavaScript and hands you the resulting DOM.
Luckily, there are some other options here:
You could open the developer tools on the website you want to scrape and look for the XHR requests that fetch the data you need. Then you can call that URL directly (a sketch follows these options).
You could use headless browser scraping with PhantomJS or CasperJS. These load the page, run its JavaScript resources against the downloaded DOM, and give you the rendered result to parse.
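For the first option the scraping code ends up very simple once you have found the endpoint. A rough sketch with request (the URL and the shape of the response are made up; use whatever the network tab shows you):
var request = require('request');

request({
  url: 'https://www.example.com/api/leads?page=1', // hypothetical XHR endpoint found in the dev tools
  json: true
}, function (err, res, body) {
  if (err) throw err;
  // body is already parsed JSON here, so no cheerio needed
  body.items.forEach(function (item) {
    console.log(item.company, item.email);
  });
});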
I am trying to save a couple of web pages using a web crawler. Usually I prefer doing this with Perl's WWW::Mechanize module. However, as far as I can tell, the site I am trying to crawl uses a lot of JavaScript, which seems hard to avoid. Therefore I looked into the following Perl modules:
WWW::Mechanize::Firefox
MozRepl
MozRepl::RemoteObject
The Firefox MozRepl extension itself works perfectly. I can use the terminal to navigate the web site just the way it is shown in the developer's tutorial - in theory. However, I know nothing about JavaScript and am therefore having a hard time using the modules properly.
So here is the source I'd like to start from: Morgan Stanley
For a couple of the firms listed beneath 'Companies - as of 10/14/2011' I'd like to save their respective pages. E.g. clicking on the first listed company (i.e. '1-800-Flowers.com, Inc') calls a JavaScript function with two arguments -> dtxt('FLWS.O','2011-10-14'), which produces the desired new page. That page is what I'd like to save locally.
With perl's MozRepl module I thought about something like this:
use strict;
use warnings;
use MozRepl;
my $repl = MozRepl->new;
$repl->setup;
$repl->execute('window.open("http://www.morganstanley.com/eqr/disclosures/webapp/coverage")');
$repl->repl_enter({ source => "content" });
$repl->execute('dtxt("FLWS.O", "2011-10-14")');
Now I'd like to save the produced HTML page.
So again, the desired code should, for a couple of firms, visit their HTML pages and simply save them. (Here are e.g. three firms: MMM.N, FLWS.O, SSRX.O)
Is it correct that I cannot get around the page's JavaScript functions and therefore cannot use WWW::Mechanize?
Following question 1, are the mentioned Perl modules a plausible approach to take?
And finally, if you say the first two questions can be answered with yes, it would be really nice if you could help me out with the actual coding. E.g. in the above code, the essential part that is missing is a 'save' command. (Maybe using Firefox's saveDocument function?)
The web works via HTTP requests and responses. If you can discover the proper request to send, then you will get the proper response. If the target site uses JavaScript to form the request, then you can either execute that JavaScript, or analyse what it does so that you can do the same in the language that you are using.
An even easier approach is to use a tool that will capture the resulting request for you, whether the request is created by JavaScript or not; then you can craft your scraping code to create the request that you want. The "Web Scraping Proxy" from AT&T is such a tool. You set it up, then navigate the website as you normally would to get to the page you want to scrape, and the WSP will log all requests and responses for you. It logs them in the form of Perl code, which you can then modify to suit your needs.
I believe that this question has been asked in a few different forms, but I've read quite a few different responses.
At first, I had a web application written mostly with jQuery that used servlets to retrieve information from various locations JavaScript could not access (i.e. feeds, images from a server, etc.). Now, however, I've been told to do away with the servlets and application configuration classes so that this project of mine contains only HTML, CSS, and JavaScript/jQuery. Rather than pulling the images off of the server, I need to retrieve them from a local file on the computer. I know that allowing this might seem like terrible design, but it's what I've been asked to do. At any rate, what I really need to do is count the number of image files in a directory and then perhaps compile an array of the filenames themselves. I could do this fine in Java when using the servlets, but without them, I'm not sure how or even if this can be done.
I'm basically trying to use the jQuery Cycle plug-in to cycle through these images like a slideshow. I inject (or $("#div").append()) these images into the div by using a loop based on the number of images present.
So, is there a way I can do this using JavaScript, HTML, a jQuery plug-in, etc.? I'd like to avoid using PHP and Java at this point...
You can't just read a directory with JavaScript; however, there appears to be a way to "exploit" how browsers function, described at http://www.irt.org/articles/js014/. It may not be pretty, but the demo works in the latest Chrome and in IE7-9 for me. I'm sure some of the techniques could be updated to use cleaner code if you'd like to improve upon it.
EDIT:
Another technique you could use can be found in Javascript read files in folder
It definitely looks like a cleaner solution. What I'd recommend is extracting the body contents to inject into a hidden div, or using the path for an iframe that you can read from; a rough sketch follows.
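If the server has directory listing enabled for the image folder, a rough sketch of that idea with jQuery could look like this (the /images/ path, the file extensions and the #div id are just examples; this assumes Apache-style listings with relative filenames in the anchors):
$(function () {
  $.get('/images/', function (html) {
    // The directory index is plain HTML whose <a> tags point at the files.
    $('<div>').html(html).find('a[href$=".jpg"], a[href$=".png"]').each(function () {
      $('#div').append($('<img>', { src: '/images/' + $(this).attr('href') }));
    });
    // Once the images have been injected, start the Cycle plug-in.
    $('#div').cycle({ fx: 'fade' });
  });
});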
I am a JavaScript newbie. I am trying to write a requirements document and need some help describing what I am looking for. We want our application to generate a JavaScript snippet like this:
<script src="http://www.jotform.com/jsform/10511502633"></script>
This will load a web form.
So my question is:
- How does a single script load an entire web form? Is this JSON?
- What is this called? Is this cross-browser JavaScript?
- Can anyone point me in the direction of learning more about what this is?
Thank you for your help!
The JavaScript file is just hosted on an external site. It appears to be dynamically generated, so feel free to use some fancy words ;) But basically, you just include it in your page, as if it were on your own site.
You could say "The application will generate the required script tags to include a dynamically generated JavaScript file from an external, third-party site".
Of course you need to take special precautions for the cases where the include won't work because the other site is not reachable (the site is down, DNS does not resolve, the file has moved to another web server, your application is on an intranet/behind a proxy/firewall...). Why can't you copy their file and mirror it locally? Or use a reliable Content Delivery Network, like Google or Amazon.
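If you do have to rely on the external host, one common defensive pattern is to check for whatever global their script defines and fall back to a locally mirrored copy (the global name and the local path below are placeholders, not jotform's actual API):
<script src="http://www.jotform.com/jsform/10511502633"></script>
<script>
  // If the third-party script failed to load (their site is down, DNS broken,
  // you are behind a firewall...), fall back to a copy on your own server.
  if (typeof window.SomeWidgetGlobal === 'undefined') {
    document.write('<script src="/local/jsform/10511502633.js"><\/script>');
  }
</script>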
There are many names for this type of inclusion, the most common being widget.
What it actually does is roughly this (a sketch follows the list):
takes an id of some sort as a parameter
uses the id to fetch some specific data (most likely from a database)
generates some JS and HTML based on that id/data
usually this involves iframes of some sort
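A rough sketch of what such an included widget script might do (every URL, id and size here is made up for illustration):
(function () {
  var formId = '10511502633'; // the id encoded in the script URL
  var html =
    '<iframe src="http://www.example.com/render-form/' + formId + '"' +
    ' style="border:0" width="100%" height="500"></iframe>';
  // document.write runs while the host page is still being parsed,
  // so the iframe lands exactly where the <script> tag was included.
  document.write(html);
})();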
Using a script rather than a plain HTML iframe has multiple advantages:
you can change what is actually delivered to the users' browsers without changing the include
you can resize the iframe to fit certain predefined sizes
you can inject the necessary things into the page the widget is included in (of course you need to make sure this is sanctioned)
We use this all the time and we have never regretted it.
If you don't want to build the widget infrastructure yourself, you can always use one of the widget providers, like Widgetbox:
http://www.widgetbox.com/widgets/make/
With those you are up and running in no time.
This is typically called a script include.
Google has lots of these types of items, and even they call them by many names: widgets, custom JavaScript, snippets, custom code, etc. It really depends on who you are writing for... I would go with "cross platform embeddable javascript code", meaning that it would need to load all its dependencies. Also specify which browsers need to be supported and what should happen if the user has JavaScript turned off.
EDIT:
Actually, since we are talking about unique IDs, you will probably need two parts: the per-user/site "cross platform embeddable javascript code" and whatever server-side code is needed to support it. Basically this is an API that is accessed using your own JavaScript widget. Feel free to point to examples in your requirements document; programmers love examples.
I'm creating a website that could potentially contain many widgets. These widgets are rendered using external JavaScript code in external JS files. Some of these widgets require the same external JavaScript files, but I obviously don't want the browser including a JavaScript file every time a widget requires one; instead it should only include a JavaScript file when it has not already been retrieved and is not currently being retrieved.
Is there a JavaScript manager that can fulfill the requirements above? Extra points will be given to a solution with jQuery in mind :).
I use Google's AJAX API to load jQuery and the various other libraries I use.
It cuts down on bandwidth, since the user doesn't fetch the resource from my server, and the odds that they already have the library cached are higher.
Here's a link.
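In case it helps, the usual pattern looks roughly like this (the version number is only an example):
<script src="https://www.google.com/jsapi"></script>
<script>
  // Load jQuery through Google's loader; it is fetched from Google's CDN,
  // not from your own server.
  google.load("jquery", "1.4.4");
  google.setOnLoadCallback(function () {
    // jQuery (and any other library you google.load) is available here.
    $(document).ready(function () { /* ... */ });
  });
</script>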
Dojo's dojo.require and dojo.provide do exactly what you are asking for. Example:
dojo.require("com.project.widgets.someWidget");
Which would load com/project/widgets/someWidget.js. Inside of that file, you would have
dojo.provide("com.project.widgets.someWidget");
Which registers the source as loaded.
It basically does a synchronous AJAX request, evals the file, and marks the path as "fetched" so that it doesn't make the same request twice.
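If you would rather stay with jQuery, a small hand-rolled "load each script only once" helper along the same lines might look like this (a sketch, not a full dependency manager; the widget paths are made up):
var loadOnce = (function () {
  var loaded = {}; // url -> promise of the script request
  return function (url) {
    if (!loaded[url]) {
      loaded[url] = $.ajax({ url: url, dataType: 'script', cache: true });
    }
    return loaded[url]; // callers chain .done(...) on it
  };
})();

// Usage: both widgets need widget-core.js, but it is only fetched once.
loadOnce('/js/widget-core.js').done(function () { /* init widget A */ });
loadOnce('/js/widget-core.js').done(function () { /* init widget B */ });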