I have built a really simple script using jQuery and the JSAPI. I want to deploy it to my Raspberry Pi, so I need it to use the minimal amount of code possible (the Pi is already full!).
It will not have a network connection, so let's say I just want to grab one file from the JSAPI (as an example - I will not actually do this, as it isn't legal).
So, I opened Fiddler, opened my webpage, and saw what dependencies it had. After loading the JSAPI, it fetched the following two files:
GET www.google.com/uds/api/visualization/1.0/69d4d6122bf8841d4832e052c2e3bf39/format+en,default+en.I.js HTTP/1.1
GET www.google.com/uds/?file=visualization&v=1 HTTP/1.1
So, two questions. Firstly, is there any legal way for me to get these files and host them locally for the JSAPI?
Assuming the answer is no, let's pretend the files are jQuery modules instead, where I believe this would be allowed. How would I grab them and point to them locally? When I try to navigate to either of the addresses above, I get an error message or nothing loads - so is it not possible to include these modules (or other jQuery modules) separately?
Thank you!
I don't know if this will work in your exact setup, but you can do something like the following to get it pared down to the bare essentials:
Write up your webpage with the full jQuery package and any modules necessary (per your second example)
Pass all the JavaScript you're using through something like Google's Closure Compiler: https://closure-compiler.appspot.com/home
Include the compiled JavaScript, which will contain only the absolutely necessary functions (a rough sketch follows below).
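For step 2, if you'd rather script it than paste code into the web UI, something along these lines might work. This is only a rough sketch, assuming Node and the google-closure-compiler npm package (a wrapper around the same compiler) - check the package docs for the exact option names; the file names are illustrative:
const ClosureCompiler = require('google-closure-compiler').compiler;

// feed in jQuery plus your own script; ADVANCED mode strips unused functions,
// but it is aggressive and may need externs if other code calls into the output
new ClosureCompiler({
  js: ['jquery.js', 'myscript.js'],
  compilation_level: 'ADVANCED',
  js_output_file: 'app.min.js'
}).run((exitCode, stdOut, stdErr) => {
  if (exitCode !== 0) console.error(stdErr);
});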
I develop a plugin that is currently being used on thousands of websites. The code to install the plugin includes a reference to a JavaScript file without the protocol, for example:
//www.mysite.com/js/script.js
This works fine on the majority of the websites, requesting the https or http version depending on the current protocol.
However, from time to time - let's say 0.5% of the cases - there are websites that don't recognize this way of referencing a JS script. When I look at the website's code I find:
http://www.userwebsite.com//www.mysite.com/js/script.js
This is not a browser-specific issue, because I have tested it with every browser and I still have the issue; it's more a website-specific problem.
I've read everywhere that this is the recommended practice, and I can't find the source of the problem. Any ideas?
- Most of the sites that use the plugin are WordPress sites
- The JS reference is included directly in the HTML, inside the body
If it's already in the website code (the HTML source) then it can't be a browser or JavaScript issue. As you already correctly assumed, it must be a server-side problem.
Maybe the pages your plugin is placed on are rewriting those links: they don't recognize the protocol-relative double slashes, so they treat the reference as a relative URL on their own server and prepend the protocol and domain.
Maybe they use some sort of code optimization / JavaScript minification that is changing your links.
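One defensive workaround (a sketch, not something those sites will do for you) is to build the URL with an explicit protocol at runtime, so a server-side rewriter or minifier never sees a bare "//host/path" string it could mistake for a relative URL:
(function () {
  // pick the protocol that matches the current page
  var proto = ('https:' === document.location.protocol) ? 'https://' : 'http://';
  var s = document.createElement('script');
  s.src = proto + 'www.mysite.com/js/script.js';
  s.async = true;
  (document.head || document.getElementsByTagName('head')[0]).appendChild(s);
})();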
How do I read a local text file in JavaScript, without jQuery etc.? I realize this question has been asked before, with a gazillion answers (but both HTML and JavaScript have also changed a lot over the years).
So here is my scenario:
- I do not want to use any webserver for this purpose, as I wish to send a tarball + make install to someone for a demo and want zero installation
- a blocking read is fine
- the files are simple text files such as "downloaded/xyz.json"; they are indeed stringify'd JSON. I do not want to use a file dialog box to select the file name - hardcoding it is desirable for this purpose
- security is not an issue in this context, although I am not prepared to run the browser with --disable-web-security or do anything that's even remotely cross-domain
- environment: Linux (openSUSE 11.4), using google-chrome
- I do not want to use jQuery for this
Will greatly appreciate a simple code snippet, if it is feasible.
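A minimal sketch of the kind of snippet being asked for: a blocking XMLHttpRequest against a hardcoded relative path, with the page itself opened from file://. Firefox allows this out of the box; Chrome has historically required the --allow-file-access-from-files flag (less drastic than --disable-web-security):
function readTextFile(path) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', path, false);   // false = synchronous (blocking) read
  xhr.send(null);
  return xhr.responseText;        // raw text; parse as needed
}

var data = JSON.parse(readTextFile('downloaded/xyz.json'));
console.log(data);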
I am trying to save a couple of web pages using a web crawler. Usually I prefer doing this with Perl's WWW::Mechanize module. However, as far as I can tell, the site I am trying to crawl uses a lot of JavaScript, which seems to be hard to avoid. Therefore I looked into the following Perl modules:
WWW::Mechanize::Firefox
MozRepl
MozRepl::RemoteObject
The Firefox MozRepl extension itself works perfectly. I can use the terminal to navigate the web site just the way it is shown in the developer's tutorial - in theory. However, I have no idea about JavaScript and am therefore having a hard time using the modules properly.
So here is the source I would like to start from: Morgan Stanley
For a couple of the firms listed beneath 'Companies - as of 10/14/2011' I would like to save their respective pages. E.g. clicking on the first listed company (i.e. '1-800-Flowers.com, Inc') calls a JavaScript function with two arguments - dtxt('FLWS.O','2011-10-14') - which produces the desired new page, which I then want to save locally.
With Perl's MozRepl module I thought of something like this:
use strict;
use warnings;
use MozRepl;

# connect to the MozRepl extension running inside Firefox
my $repl = MozRepl->new;
$repl->setup;

# open the coverage page, then switch the REPL context to the page content
$repl->execute('window.open("http://www.morganstanley.com/eqr/disclosures/webapp/coverage")');
$repl->repl_enter({ source => "content" });

# call the page's own JavaScript function for the first firm
$repl->execute('dtxt("FLWS.O", "2011-10-14")');
Now I would like to save the produced HTML page.
So again, the desired code should visit the HTML pages of a couple of firms and simply save them. (Here are e.g. three firms: MMM.N, FLWS.O, SSRX.O)
Is it correct that I cannot get around the page's JavaScript functions and therefore cannot use WWW::Mechanize?
Following question 1, are the mentioned Perl modules a plausible approach to take?
And finally, if the first two questions can be answered with yes, it would be really nice if you could help me out with the actual coding. E.g. in the above code, the essential part that is missing is a 'save' command. (Maybe using Firefox's saveDocument function?)
The web works via HTTP requests and responses. If you can discover the proper request to send, then you will get the proper response. If the target site uses JS to form the request, then you can either execute the JS, or analyse what it does so that you can do the same in the language that you are using.
An even easier approach is to use a tool that will capture the resulting request for you, whether the request is created by JS or not; then you can craft your scraping code to create the request that you want.
The "Web Scraping Proxy" from AT&T is such a tool. You set it up, then navigate the website as normal to get to the page you want to scrape, and the WSP will log all requests and responses for you. It logs them in the form of Perl code, which you can then modify to suit your needs.
Our project has more than 300 JSP files and more than 200 JavaScript files. I'd like to do some cleanup, removing unnecessary JS files: even if a JSP includes a JS file, maybe none of its functions are used. The goal is to reduce both complexity and the time needed to load the page. My IDE is Eclipse. Given the dynamic nature of JavaScript, I guess this will be hard or even impossible.
If it's conceivable that the application can be tested with a lot of coverage (i.e. going through every dialog, error message, and situation imaginable), you may be able to work with your access log files - compare the full list of JS files to the ones actually fetched after a period of heavy use.
An alternative implementation of this would be setting up a "honeypot" (see my answer to this question).
Both these methods are of course "soft" in that their quality relies on how thoroughly the application is actually used during testing time.
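As a rough illustration of the access-log / honeypot idea (the endpoint and file name here are made up): drop a one-line beacon at the top of each JS file, run the application hard for a while, and any file that never shows up in the server log is a removal candidate:
// at the top of e.g. some-module.js; /js-used is any URL that ends up in your access log
(new Image()).src = '/js-used?file=' + encodeURIComponent('some-module.js');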
If you have any way of grepping all script references, that would be preferable. Maybe you can do a global search on {anything}.js; that would match most ways of embedding a JS file.
To find out which functions and JavaScript files are used in a project, you need code coverage tools, like JSCoverage or Code Coverage for Firebug. These tools will report the functions and files used. Using them with an automated test suite like Selenium, or with randomized testing, should give you a fairly good idea of which files are loaded.
If the files are loaded dynamically, you can also use Firebug or Fiddler to log the requests for the JS files.
Unfortunately, if you want certainty - not just the extremely high likelihood you get with the above tools - you would have to generate a call graph for your entire webapp, maybe using a JavaScript compiler like Rhino...
What are your tricks for getting the caching part of a web application just right?
Make the expiry date too long and we'll have a lot of stale caches; too short and we risk overloading the servers with unnecessary requests.
How to make sure that all changes will refresh all cache?
How to embed the SVN revision into the code/URL?
Does having multiple version side-by-side really help to address version mismatch problem?
Look at the minify project. It's written in PHP but you could use it as a blueprint for any language.
Key features:
a config file to combine & minify several js or css files into one
always uses the last modified date of the last modified file in a config group as a URL parameter
An example resource might look like:
<script type="text/javascript" src="/min/g=js1&1248185458"></script>
which would fetch the 'js1' group of JavaScript files in your configuration, with the version number "1248185458" - which is really just the last modified date converted to epoch time.
When you put updated js files on your production servers, they'll have a new modified date which automatically becomes a new version number - no stale caches, no manual versioning.
It's a very cool project with some really well-thought-out ideas about optimization and caching. I've modified the process slightly to insert the YUI Compressor into the build process. You can optimize even further by preventing last-modified lookups from the browser via your server's headers (here and here).
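The same last-modified-date trick can be sketched in a few lines of Node, in case you want the idea without the PHP project (the file names and group name here are illustrative):
const fs = require('fs');

// version a group of files by the newest modification time, so the URL
// changes whenever any file in the group changes
function versionFor(files) {
  const newest = Math.max(...files.map(f => fs.statSync(f).mtimeMs));
  return Math.floor(newest / 1000); // epoch seconds, like 1248185458 above
}

const group = ['js/jquery.js', 'js/app.js'];
console.log('/min/g=js1&' + versionFor(group));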
I think you are on the right track with putting version numbers on your JS/CSS files. You may also want to use a build tool to put all of this together for you, like http://ant.apache.org/ or http://nant.sourceforge.net/
A couple of ways to deal with this issue:
Following the clue given above about using version numbers: if that presents difficulties in your build environment, it is just as effective to put a URL parameter at the end of your URL. Browser clients will treat each URL with a different version parameter as a URL not in their cache and will download the file again. For static content, the servers won't care that the parameter is there.
So, for example, http://mydomain.com/js/main.js can be included in your HTML as http://mydomain.com/js/main.js?v1.5. It might be easier for you to pass version numbers into your server-side scripts and append them onto your client-side include URLs.
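A minimal sketch of that, assuming the version string is available to whatever renders your HTML (here it is simply hardcoded):
const APP_VERSION = '1.5'; // e.g. read from your build or SVN revision

function scriptTag(src) {
  // append the version as a query parameter so a new release busts the cache
  return '<script src="' + src + '?v' + APP_VERSION + '"></script>';
}

console.log(scriptTag('http://mydomain.com/js/main.js'));
// -> <script src="http://mydomain.com/js/main.js?v1.5"></script>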
The second method I've seen work well is to use a server-side controller to deliver your code. Facebook makes use of this; you will see includes in script tags that end in ".php" all the time.
E.g.
<script src="http://static.ak.connect.facebook.com/js/api_lib/v0.4/FeatureLoader.js.php" type="text/javascript"></script>
Their backend determines what JS needs to be sent to the client based on the environment that was sent up in the request.
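A very rough sketch of that controller idea using Node/Express (Facebook's actual FeatureLoader.js.php is PHP; the endpoint, query parameter, and file names here are invented for illustration):
const express = require('express');
const fs = require('fs');
const app = express();

app.get('/js/loader.js', (req, res) => {
  // decide which modules this client needs based on the request
  const files = ['core.js'];
  if (req.query.features === 'charts') files.push('charts.js');

  res.type('application/javascript');
  res.set('Cache-Control', 'public, max-age=300'); // short cache; tune to taste
  res.send(files.map(f => fs.readFileSync('js/' + f, 'utf8')).join('\n'));
});

app.listen(3000);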