How do I automate JavaScript and CSS minification on Google App Engine?

I couldn't find any proper solution for automating Google App Engine CSS and Javascript minification.

If your JS/CSS files are to be used inside an HTML page, then a very good option is to have App Engine optimize your site automatically (minification, bundling, inlining and more) via the experimental "Page Speed" feature. You can turn this on by doing this:
Go to your project's App Engine dashboard:
https://appengine.google.com/settings?&app_id=s~your_project_id
Click on "Application Settings" (bottom left under "Administration" section).
Scroll down to the "Performance" section and locate "PageSpeed Service". Check the "Enable PageSpeed Service" checkbox and hit "Save".
This will add response filters that automatically do things like combine and minify your JS, and turn the minified bundle from a script reference into an inline script (to reduce the number of server requests), along with other effortless performance improvements. Read more about this feature here: https://developers.google.com/speed/pagespeed/service/faq

Write a deploy script that makes a copy of your app with minified JS and CSS, and then calls appcfg on it. You shouldn't be minifying it dynamically unless you're generating it dynamically.
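As a rough sketch (not from the original answer; the paths, the build directory name and the jsmin dependency are assumptions), such a deploy script might look like this in Python:
import shutil
import subprocess
from pathlib import Path

from jsmin import jsmin  # pip install jsmin; slimit or any other minifier works too

SRC = Path(".")                 # your working tree (example path)
BUILD = Path("../myapp-build")  # throwaway copy that gets deployed (example path)

# Make a fresh copy of the app so the working tree stays untouched
if BUILD.exists():
    shutil.rmtree(BUILD)
shutil.copytree(SRC, BUILD, ignore=shutil.ignore_patterns(".git", "*.pyc"))

# Minify every JavaScript file in the copy
for js_file in BUILD.rglob("*.js"):
    js_file.write_text(jsmin(js_file.read_text()))

# Deploy the minified copy instead of the original sources
subprocess.check_call(["appcfg.py", "update", str(BUILD)])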

I ended up creating this appengine script (uses memcache and slimit).
I found slimit to be the best minification script all around, but I'm thinking about using the one from Google instead.
http://ronreiterdotcom.wordpress.com/2011/08/30/automatic-javascript-minification-using-slimit-on-google-app-engine/

You can automate the process pretty efficiently by loading the content of your script into a string, processing it with jsmin, and finally saving and serving the result. Don't worry about performance: you only run jsmin when the application instance is created (certainly not for every request).
You can grab jsmin.py here.
Let's say I have this function in the logger.py module that reads the JS from the file system (the uncompressed, debug version) and returns its content as a string:
import os

class ScriptManager():
    def get_javascript(self):
        # Read the uncompressed, debug version of the script from disk
        path_to_js = os.path.join(os.path.dirname(__file__), 'js/script.js')
        return file(path_to_js, 'rb').read()
Process it with jsmin, and make sure to use proper caching headers. Take this jsrender sample module as an example:
import datetime

import webapp2
import jsmin
import scripts.logger

# Minify once, when the application instance starts, not on every request
js_compressed = jsmin.jsmin(scripts.logger.ScriptManager().get_javascript())

JS_CACHE_FOR_DAYS = 30

class scriptjs(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/javascript'
        expires_date = datetime.datetime.utcnow() + datetime.timedelta(days=JS_CACHE_FOR_DAYS)
        expires_str = expires_date.strftime('%d %b %Y %H:%M:%S GMT')
        self.response.headers.add_header('Expires', expires_str)
        self.response.headers['Cache-Control'] = 'public'
        self.response.cache_control.no_cache = None
        self.response.out.write(js_compressed)
Now return that from a dynamic content handler in your main.py:
app = webapp2.WSGIApplication([
    ('/scripts/script.js', jsrender.scriptjs),
    ...
])

Nick's answer is the correct way to do it, but you could do it on the fly when the JS/CSS is requested - then set cache-control to public to cache the results upstream.
Slimmer - http://pypi.python.org/pypi/slimmer/
JSmin.py -
http://code.google.com/p/v8/source/browse/branches/bleeding_edge/tools/jsmin.py
Cache control header chatter -
http://groups.google.com/group/google-appengine/browse_thread/thread/be3fa7b5e170f378 and blog post - http://www.kyle-jensen.com/proxy-caching-on-google-appengine

You could try a build-time or a runtime solution (using a Maven plugin) provided by a tool called wro4j.
Disclaimer: this is a project I'm working on.

Related

Minifying and Hosting Google tag manager JavaScript files locally

Before I start, I should mention that I know hosting the Google JavaScript file locally is not recommended by Google. But since the caching time on these files is so low, I want to host them locally with a much longer cache time, while still automatically getting the latest version when the real file changes.
I'm trying to fetch the ga.js file with a daily cron job like this and host it locally:
$ana = file_get_contents('https://www.google-analytics.com/analytics.js');
if ($ana !== false) {
    file_put_contents('ANA_LOCAL_PATH', $ana);
    $tag = file_get_contents('https://www.googletagmanager.com/gtag/js');
    if ($tag !== false) {
        $tag = str_replace('https://www.google-analytics.com/analytics.js', 'ANA_LOCAL_URL', $tag);
        file_put_contents('GTAG_LOCAL_PATH', $tag);
    }
}
I'm also getting analytics.js file and replacing Google URL with local analytics.js file.
But I want it to be future-proof. Google uses this URL:
https://www.googletagmanager.com/gtag/js?id=UA-xxxxxxxxx-x
I'm using this:
https://www.googletagmanager.com/gtag/js
I have more than 10 Tag Manager IDs, and I only fetch the JavaScript file once, without an ID.
Another thing is that the header of the Google file starts with this:
// Copyright 2012 Google Inc. All rights reserved.
(function(){
var data = {
"resource": {
"version":"1",
"macros":[],
"tags":[],
"predicates":[],
"rules":[]
},
"runtime":[
[],[]
]
This file has multiple newlines (not only in the header), so it is not properly minified.
I found that Google does not minify the script well. For example, case "url_no_fragment" has a space after case; it could be case"url_no_fragment". You can find lots of switch cases with whitespace in the file.
The Question
Does my downloading method have a problem (now or in the future), just because I don't fetch the file with a Google Tag Manager ID?
Can I minify Google's files with my own method and remove some of the flaws I mentioned?
I know this part is not programming-related, and I expect the answer is no, but just for the sake of minification: am I allowed to remove that copyright comment to get the smallest minified file?

What is the filename.js?m=1497883755 in a script src attribute? [duplicate]

I have noticed that some browsers (in particular, Firefox and Opera) are very zealous in using cached copies of .css and .js files, even between browser sessions. This leads to a problem when you update one of these files, but the user's browser keeps on using the cached copy.
What is the most elegant way of forcing the user's browser to reload the file when it has changed?
Ideally, the solution would not force the browser to reload the file on every visit to the page.
I have found John Millikin's and da5id's suggestion to be useful. It turns out there is a term for this: auto-versioning.
I have posted a new answer below which is a combination of my original solution and John's suggestion.
Another idea that was suggested by SCdF would be to append a bogus query string to the file. (Some Python code, to automatically use the timestamp as a bogus query string, was submitted by pi..)
However, there is some discussion as to whether or not the browser would cache a file with a query string. (Remember, we want the browser to cache the file and use it on future visits. We only want it to fetch the file again when it has changed.)
This solution is written in PHP, but it should be easily adapted to other languages.
The original .htaccess regex can cause problems with files like json-1.3.js. The solution is to only rewrite if there are exactly 10 digits at the end. (Because 10 digits covers all timestamps from 9/9/2001 to 11/20/2286.)
First, we use the following rewrite rule in .htaccess:
RewriteEngine on
RewriteRule ^(.*)\.[\d]{10}\.(css|js)$ $1.$2 [L]
Now, we write the following PHP function:
/**
 * Given a file, i.e. /css/base.css, replaces it with a string containing the
 * file's mtime, i.e. /css/base.1221534296.css.
 *
 * @param $file The file to be loaded. Must be an absolute path (i.e.
 * starting with slash).
 */
function auto_version($file)
{
    if (strpos($file, '/') !== 0 || !file_exists($_SERVER['DOCUMENT_ROOT'] . $file))
        return $file;

    $mtime = filemtime($_SERVER['DOCUMENT_ROOT'] . $file);
    return preg_replace('{\\.([^./]+)$}', ".$mtime.\$1", $file);
}
Now, wherever you include your CSS, change it from this:
<link rel="stylesheet" href="/css/base.css" type="text/css" />
To this:
<link rel="stylesheet" href="<?php echo auto_version('/css/base.css'); ?>" type="text/css" />
This way, you never have to modify the link tag again, and the user will always see the latest CSS. The browser will be able to cache the CSS file, but when you make any changes to your CSS the browser will see this as a new URL, so it won't use the cached copy.
This can also work with images, favicons, and JavaScript. Basically anything that is not dynamically generated.
Simple Client-side Technique
In general, caching is good... So there are a couple of techniques, depending on whether you're fixing the problem for yourself as you develop a website, or whether you're trying to control cache in a production environment.
General visitors to your website won't have the same experience that you're having when you're developing the site. Since the average visitor comes to the site less frequently (maybe only a few times each month, unless you're a Google or hi5 Networks), then they are less likely to have your files in cache, and that may be enough.
If you want to force a new version into the browser, you can always add a query string to the request, and bump up the version number when you make major changes:
<script src="/myJavascript.js?version=4"></script>
This will ensure that everyone gets the new file. It works because the browser looks at the URL of the file to determine whether it has a copy in cache. If your server isn't set up to do anything with the query string, it will be ignored, but the name will look like a new file to the browser.
On the other hand, if you're developing a website, you don't want to change the version number every time you save a change to your development version. That would be tedious.
So while you're developing your site, a good trick would be to automatically generate a query string parameter:
<!-- Development version: -->
<script>document.write('<script src="/myJavascript.js?dev=' + Math.floor(Math.random() * 100) + '"\><\/script>');</script>
Adding a query string to the request is a good way to version a resource, but for a simple website this may be unnecessary. And remember, caching is a good thing.
It's also worth noting that the browser isn't necessarily stingy about keeping files in cache. Browsers have policies for this sort of thing, and they are usually playing by the rules laid down in the HTTP specification. When a browser makes a request to a server, part of the response is an Expires header... a date which tells the browser how long it should be kept in cache. The next time the browser comes across a request for the same file, it sees that it has a copy in cache and looks to the Expires date to decide whether it should be used.
So believe it or not, it's actually your server that is making that browser cache so persistent. You could adjust your server settings and change the Expires headers, but the little technique I've written above is probably a much simpler way for you to go about it. Since caching is good, you usually want to set that date far into the future (a "Far-future Expires Header"), and use the technique described above to force a change.
If you're interested in more information on HTTP or how these requests are made, a good book is "High Performance Web Sites" by Steve Souders. It's a very good introduction to the subject.
Google's mod_pagespeed plugin for Apache will do auto-versioning for you. It's really slick.
It parses HTML on its way out of the webserver (works with PHP, Ruby on Rails, Python, static HTML -- anything) and rewrites links to CSS, JavaScript, image files so they include an id code. It serves up the files at the modified URLs with a very long cache control on them. When the files change, it automatically changes the URLs so the browser has to re-fetch them. It basically just works, without any changes to your code. It'll even minify your code on the way out too.
Instead of changing the version manually, I would recommend you use an MD5 hash of the actual CSS file.
So your URL would be something like
http://mysite.com/css/[md5_hash_here]/style.css
You could still use the rewrite rule to strip out the hash, but the advantage is that now you can set your cache policy to "cache forever", since if the URL is the same, that means that the file is unchanged.
You can then write a simple shell script that would compute the hash of the file and update your tag (you'd probably want to move it to a separate file for inclusion).
Simply run that script every time CSS changes and you're good. The browser will ONLY reload your files when they are altered. If you make an edit and then undo it, there's no pain in figuring out which version you need to return to in order for your visitors not to re-download.
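As a minimal sketch of such a build step (done here in Python rather than shell; the file paths and the include fragment are assumptions, not part of the original answer):
import hashlib
from pathlib import Path

CSS_FILE = Path("public/css/style.css")        # the real stylesheet (example path)
INCLUDE_FILE = Path("includes/css_link.html")  # fragment your pages pull in (example path)

# Hash the current stylesheet content and rewrite the include fragment
digest = hashlib.md5(CSS_FILE.read_bytes()).hexdigest()
INCLUDE_FILE.write_text(
    '<link rel="stylesheet" href="/css/%s/style.css" type="text/css" />\n' % digest
)
print("stylesheet is now served from /css/%s/style.css" % digest)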
I am not sure why you guys/gals are taking so much pain to implement this solution.
All you need to do is get the file's modified timestamp and append it as a query string to the file.
In PHP I would do it as:
<link href="mycss.css?v=<?= filemtime('mycss.css') ?>" rel="stylesheet">
filemtime() is a PHP function that returns the file modified timestamp.
You can just put ?foo=1234 at the end of your CSS / JavaScript import, changing 1234 to be whatever you like. Have a look at the Stack Overflow HTML source for an example.
The idea there being that the ? parameters are discarded / ignored on the request anyway and you can change that number when you roll out a new version.
Note: There is some argument with regard to exactly how this affects caching. I believe the general gist of it is that GET requests, with or without parameters should be cachable, so the above solution should work.
However, it is down to both the web server to decide if it wants to adhere to that part of the spec and the browser the user uses, as it can just go right ahead and ask for a fresh version anyway.
I've heard this called "auto versioning". The most common method is to include the static file's modification time somewhere in the URL, and strip it out using rewrite handlers or URL configurations:
See also:
Automatic asset versioning in Django
Automatically Version Your CSS and JavaScript Files
The 30 or so existing answers are great advice for a circa 2008 website. However, when it comes to a modern, single-page application (SPA), it might be time to rethink some fundamental assumptions… specifically the idea that it is desirable for the web server to serve only the single, most recent version of a file.
Imagine you're a user that has version M of a SPA loaded into your browser:
Your CD pipeline deploys the new version N of the application onto the server
You navigate within the SPA, which sends an XMLHttpRequest (XHR) to the server to get /some.template
(Your browser hasn't refreshed the page, so you're still running version M)
The server responds with the contents of /some.template — do you want it to return version M or N of the template?
If the format of /some.template changed between versions M and N (or the file was renamed or whatever) you probably don't want version N of the template sent to the browser that's running the old version M of the parser.†
Web applications run into this issue when two conditions are met:
Resources are requested asynchronously some time after the initial page load
The application logic assumes things (that may change in future versions) about resource content
Once your application needs to serve up multiple versions in parallel, solving caching and "reloading" becomes trivial:
Install all site files into versioned directories: /v<release_tag_1>/…files…, /v<release_tag_2>/…files…
Set HTTP headers to let browsers cache files forever
(Or better yet, put everything in a CDN)
Update all <script> and <link> tags, etc. to point to that file in one of the versioned directories
That last step sounds tricky, as it could require calling a URL builder for every URL in your server-side or client-side code. Or you could just make clever use of the <base> tag and change the current version in one place.
† One way around this is to be aggressive about forcing the browser to reload everything when a new version is released. But for the sake of letting any in-progress operations to complete, it may still be easiest to support at least two versions in parallel: v-current and v-previous.
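As a minimal sketch of the URL-builder idea mentioned above (the RELEASE_TAG constant and the versioned() helper are assumptions, not part of the original answer):
# RELEASE_TAG would be stamped in at deploy time (the name is an assumption)
RELEASE_TAG = "v2018_06_01_001"

def versioned(path):
    """Turn '/js/app.js' into '/v2018_06_01_001/js/app.js'."""
    return "/" + RELEASE_TAG + path

# e.g. in a template: <script src="{{ versioned('/js/app.js') }}"></script>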
In Laravel (PHP) we can do it in the following clear and elegant way (using file modification timestamp):
<script src="{{ asset('/js/your.js?v='.filemtime('js/your.js')) }}"></script>
And similar for CSS
<link rel="stylesheet" href="{{asset('css/your.css?v='.filemtime('css/your.css'))}}">
Example HTML output (filemtime returns the time as a Unix timestamp):
<link rel="stylesheet" href="assets/css/your.css?v=1577772366">
Don’t use foo.css?version=1!
Browsers aren't supposed to cache URLs with GET variables. According to http://www.thinkvitamin.com/features/webapps/serving-javascript-fast, though Internet Explorer and Firefox ignore this, Opera and Safari don't! Instead, use foo.v1234.css, and use rewrite rules to strip out the version number.
Here is a pure JavaScript solution
(function(){
    // Match this timestamp with the release of your code
    var lastVersioning = Date.UTC(2014, 11, 20, 2, 15, 10);

    var lastCacheDateTime = localStorage.getItem('lastCacheDatetime');

    if (lastCacheDateTime) {
        if (lastVersioning > lastCacheDateTime) {
            var reload = true;
        }
    }

    localStorage.setItem('lastCacheDatetime', Date.now());

    if (reload) {
        location.reload(true);
    }
})();
The above will look for the last time the user visited your site. If the last visit was before you released new code, it uses location.reload(true) to force page refresh from server.
I usually have this as the very first script within the <head> so it's evaluated before any other content loads. If a reload needs to occur, it's hardly noticeable to the user.
I am using local storage to store the last visit timestamp on the browser, but you can add cookies to the mix if you're looking to support older versions of IE.
The RewriteRule needs a small update for JavaScript or CSS files that contain a dot notation versioning at the end. E.g., json-1.3.js.
I added a dot negation class [^.] to the regex, so .number. is ignored.
RewriteRule ^(.*)\.[^.][\d]+\.(css|js)$ $1.$2 [L]
Interesting post. Having read all the answers here, combined with the fact that I have never had any problems with "bogus" query strings (I am unsure why everyone is so reluctant to use them), I guess the solution (which removes the need for Apache rewrite rules as in the accepted answer) is to compute a short hash of the CSS file contents (instead of the file datetime) and use it as the bogus query string.
This would result in the following:
<link rel="stylesheet" href="/css/base.css?[hash-here]" type="text/css" />
Of course, the datetime solutions also get the job done in the case of editing a CSS file, but I think it is about the CSS file content and not about the file datetime, so why get these mixed up?
For ASP.NET 4.5 and greater you can use script bundling.
The request http://localhost/MvcBM_time/bundles/AllMyScripts?v=r0sLDicvP58AIXN_mc3QdyVvVj5euZNzdsa2N1PKvb81 is for the bundle AllMyScripts and contains a query string pair v=r0sLDicvP58AIXN_mc3QdyVvVj5euZNzdsa2N1PKvb81. The query string v has a value token that is a unique identifier used for caching. As long as the bundle doesn't change, the ASP.NET application will request the AllMyScripts bundle using this token. If any file in the bundle changes, the ASP.NET optimization framework will generate a new token, guaranteeing that browser requests for the bundle will get the latest bundle.
There are other benefits to bundling, including increased performance on first-time page loads with minification.
For my development, I find that Chrome has a great solution.
https://superuser.com/a/512833
With developer tools open, simply long click the refresh button and let go once you hover over "Empty Cache and Hard Reload".
This is my best friend, and is a super lightweight way to get what you want!
Thanks to Kip for his perfect solution!
I extended it to use it as a Zend_View_Helper. Because my client runs his page on a virtual host, I also extended it for that.
/**
 * Extend a file path with its timestamp to force the browser to
 * automatically refresh files when they are updated.
 *
 * This is based on Kip's version, but now
 * also works on virtual hosts.
 * @link http://stackoverflow.com/questions/118884/what-is-an-elegant-way-to-force-browsers-to-reload-cached-css-js-files
 *
 * Usage:
 * - extend your .htaccess file with
 *   # Route for My_View_Helper_AutoRefreshRewriter
 *   # which extends files with their timestamp so that if these
 *   # are updated an automatic refresh occurs
 *   RewriteRule ^(.*)\.[^.][\d]+\.(css|js)$ $1.$2 [L]
 * - then use it in your view script like
 *   $this->headLink()->appendStylesheet($this->autoRefreshRewriter($this->cssPath . 'default.css'));
 */
class My_View_Helper_AutoRefreshRewriter extends Zend_View_Helper_Abstract {

    public function autoRefreshRewriter($filePath) {

        if (strpos($filePath, '/') !== 0) {
            // Path has no leading '/'
            return $filePath;
        } elseif (file_exists($_SERVER['DOCUMENT_ROOT'] . $filePath)) {
            // File exists under the normal path, so build the new path based on it
            $mtime = filemtime($_SERVER['DOCUMENT_ROOT'] . $filePath);
            return preg_replace('{\\.([^./]+)$}', ".$mtime.\$1", $filePath);
        } else {
            // Fetch the directory of the index.php file (all other files are included from it)
            // and keep only the directory
            $indexFilePath = dirname(current(get_included_files()));

            // Check if the file exists relative to the index file
            if (file_exists($indexFilePath . $filePath)) {
                // Get the timestamp based on this relative path
                $mtime = filemtime($indexFilePath . $filePath);
                // Write the generated timestamp into the path,
                // but use the original path, not the relative one
                return preg_replace('{\\.([^./]+)$}', ".$mtime.\$1", $filePath);
            } else {
                return $filePath;
            }
        }
    }
}
I haven't seen the client-side DOM approach mentioned yet: creating the script (or CSS) node dynamically:
<script>
    var node = document.createElement("script");
    node.type = "text/javascript";
    node.src = 'test.js?' + Math.floor(Math.random() * 999999999);
    document.getElementsByTagName("head")[0].appendChild(node);
</script>
Say you have a file available at:
/styles/screen.css
You can either append a query parameter with version information onto the URI, e.g.:
/styles/screen.css?v=1234
Or you can prepend version information, e.g.:
/v/1234/styles/screen.css
IMHO, the second method is better for CSS files, because they can refer to images using relative URLs which means that if you specify a background-image like so:
body {
    background-image: url('images/happy.gif');
}
Its URL will effectively be:
/v/1234/styles/images/happy.gif
This means that if you update the version number used, the server will treat this as a new resource and not use a cached version. If you base your version number on the Subversion, CVS, etc. revision this means that changes to images referenced in CSS files will be noticed. That isn't guaranteed with the first scheme, i.e. the URL images/happy.gif relative to /styles/screen.css?v=1235 is /styles/images/happy.gif which doesn't contain any version information.
I have implemented a caching solution using this technique with Java servlets and simply handle requests to /v/* with a servlet that delegates to the underlying resource (i.e. /styles/screen.css). In development mode I set caching headers that tell the client to always check the freshness of the resource with the server (this typically results in a 304 if you delegate to Tomcat's DefaultServlet and the .css, .js, etc. file hasn't changed) while in deployment mode I set headers that say "cache forever".
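The implementation above is Java servlets; as a rough Python/webapp2 sketch of the same idea (the route, paths, and handler name are assumptions, not the original code), a handler that strips the version prefix and serves the underlying file with long-lived cache headers could look like this:
import os
import webapp2

class VersionedStatic(webapp2.RequestHandler):
    def get(self, version, path):
        # 'version' only exists in the URL; the file on disk is unversioned
        full_path = os.path.join(os.path.dirname(__file__), path)
        if not os.path.isfile(full_path):
            self.abort(404)
        if path.endswith('.css'):
            self.response.headers['Content-Type'] = 'text/css'
        else:
            self.response.headers['Content-Type'] = 'application/javascript'
        # The version is baked into the URL, so tell clients to cache "forever"
        self.response.headers['Cache-Control'] = 'public, max-age=31536000'
        self.response.out.write(open(full_path, 'rb').read())

app = webapp2.WSGIApplication([
    (r'/v/(\d+)/(styles/.+|scripts/.+)', VersionedStatic),
])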
You could simply add a random number to the CSS and JavaScript URL, like:
example.css?randomNo = Math.random()
Google Chrome has the Hard Reload as well as the Empty Cache and Hard Reload option. You can click and hold the reload button (in Inspect Mode) to select one.
I recently solved this using Python. Here is the code (it should be easy to adapt to other languages):
import os

def import_tag(pattern, name, **kw):
    if name[0] == "/":
        name = name[1:]

    # Additional HTML attributes
    attrs = ' '.join(['%s="%s"' % item for item in kw.items()])

    try:
        # Get the file's modification time
        mtime = os.stat(os.path.join('/documentroot', name)).st_mtime
        include = "%s?%d" % (name, mtime)
        # This is the same as sprintf(pattern, attrs, include) in other
        # languages
        return pattern % (attrs, include)
    except:
        # In case of error, return the include without the added query
        # parameter.
        return pattern % (attrs, name)

def script(name, **kw):
    return import_tag('<script %s src="/%s"></script>', name, **kw)

def stylesheet(name, **kw):
    return import_tag('<link rel="stylesheet" type="text/css" %s href="/%s">', name, **kw)
This code basically appends the file's timestamp as a query parameter to the URL. The following call
stylesheet("/main.css")
will result in
<link rel="stylesheet" type="text/css" href="/main.css?1221842734">
The advantage, of course, is that you never have to change your HTML content again; touching the CSS file will automatically trigger cache invalidation. It works very well and the overhead is not noticeable.
You can force a "session-wide caching" if you add the session-id as a spurious parameter of the JavaScript/CSS file:
<link rel="stylesheet" src="myStyles.css?ABCDEF12345sessionID" />
<script language="javascript" src="myCode.js?ABCDEF12345sessionID"></script>
If you want a version-wide caching, you could add some code to print the file date or similar. If you're using Java you can use a custom-tag to generate the link in an elegant way.
<link rel="stylesheet" src="myStyles.css?20080922_1020" />
<script language="javascript" src="myCode.js?20080922_1120"></script>
For ASP.NET I propose the following solution with advanced options (debug/release mode, versions):
Include JavaScript or CSS files this way:
<script type="text/javascript" src="Scripts/exampleScript<%=Global.JsPostfix%>" />
<link rel="stylesheet" type="text/css" href="Css/exampleCss<%=Global.CssPostfix%>" />
Global.JsPostfix and Global.CssPostfix are calculated in the following way in Global.asax:
protected void Application_Start(object sender, EventArgs e)
{
    ...
    string jsVersion = ConfigurationManager.AppSettings["JsVersion"];
    bool updateEveryAppStart = Convert.ToBoolean(ConfigurationManager.AppSettings["UpdateJsEveryAppStart"]);
    int buildNumber = System.Reflection.Assembly.GetExecutingAssembly().GetName().Version.Revision;
    JsPostfix = "";
#if !DEBUG
    JsPostfix += ".min";
#endif
    JsPostfix += ".js?" + jsVersion + "_" + buildNumber;
    if (updateEveryAppStart)
    {
        Random rand = new Random();
        JsPostfix += "_" + rand.Next();
    }
    ...
}
If you're using Git and PHP, you can reload the script from the cache each time there is a change in the Git repository, using the following code:
exec('git rev-parse --verify HEAD 2> /dev/null', $gitLog);
echo '<script src="/path/to/script.js?v=' . $gitLog[0] . '"></script>' . PHP_EOL;
Simply add this code where you want to do a hard reload (force the browser to reload cached CSS and JavaScript files):
$(window).load(function() {
    location.reload(true);
});
Do this inside the .load, so it does not refresh like a loop.
For development: use a browser setting: for example, Chrome network tab has a disable cache option.
For production: append a unique query parameter to the request (for example, ?q=Date.now()) with a server-side rendering framework or pure JavaScript code.
// Pure JavaScript unique query parameter generation
//
//=== myfile.js
function hello() { console.log('hello'); }
//=== end of file

<script type="text/javascript">
    // document.write is considered bad practice!
    // We can't use hello() yet
    document.write('<script type="text/javascript" src="myfile.js?q=' + Date.now() + '"><\/script>');
</script>
<script type="text/javascript">
    hello();
</script>
For developers with this problem while developing and testing:
Remove caching briefly. Trying to "keep caching consistent with the file" is way too much hassle.
Generally speaking, I don't mind loading more; reloading files that did not change is practically irrelevant on most projects. While developing an application we are mostly loading from disk, on localhost:port, so the increase in network traffic is not a deal-breaking issue.
Most small projects are just playing around; they never end up in production. So for them you don't need anything more...
As such, if you use Chrome DevTools, you can follow this disable-caching approach via the cache setting in the Network panel; Firefox's developer tools offer a similar option if you have caching issues there.
Do this only in development. For production you also need a mechanism to force a reload, since your users will keep using stale cached modules if you update your application frequently and don't provide a dedicated cache-invalidation mechanism like the ones described in the answers above.
Yes, this information is already in previous answers, but I still needed to do a Google search to find it.
It seems all answers here suggest some sort of versioning in the naming scheme, which has its downsides.
Browsers should be well aware of what to cache and what not to cache by reading the web server's response, in particular the HTTP headers - for how long is this resource valid? Was this resource updated since I last retrieved it? etc.
If things are configured 'correctly', just updating the files of your application should (at some point) refresh the browser's caches. You can for example configure your web server to tell the browser to never cache files (which is a bad idea).
A more in-depth explanation of how that works is in How Web Caches Work.
Just use server-side code to add the date of the file... that way it will be cached and only reloaded when the file changes.
In ASP.NET:
<link rel="stylesheet" href="~/css/custom.css?d=#(System.Text.RegularExpressions.Regex.Replace(File.GetLastWriteTime(Server.MapPath("~/css/custom.css")).ToString(),"[^0-9]", ""))" />
<script type="text/javascript" src="~/js/custom.js?d=#(System.Text.RegularExpressions.Regex.Replace(File.GetLastWriteTime(Server.MapPath("~/js/custom.js")).ToString(),"[^0-9]", ""))"></script>
This can be simplified to:
<script src="<%= Page.ResolveClientUrlUnique("~/js/custom.js") %>" type="text/javascript"></script>
By adding an extension method to your project to extend Page:
public static class Extension_Methods
{
    public static string ResolveClientUrlUnique(this System.Web.UI.Page oPg, string sRelPath)
    {
        string sFilePath = oPg.Server.MapPath(sRelPath);
        string sLastDate = System.IO.File.GetLastWriteTime(sFilePath).ToString();
        string sDateHashed = System.Text.RegularExpressions.Regex.Replace(sLastDate, "[^0-9]", "");

        return oPg.ResolveClientUrl(sRelPath) + "?d=" + sDateHashed;
    }
}
You can use SRI to break the browser cache. You only have to update your index.html file with the new SRI hash every time. When the browser loads the HTML and finds out the SRI hash on the HTML page didn't match that of the cached version of the resource, it will reload your resource from your servers. It also comes with a good side effect of bypassing cross-origin read blocking.
<script src="https://jessietessie.github.io/google-translate-token-generator/google_translate_token_generator.js" integrity="sha384-muTMBCWlaLhgTXLmflAEQVaaGwxYe1DYIf2fGdRkaAQeb4Usma/kqRWFWErr2BSi" crossorigin="anonymous"></script>

AngularJS - SEO - S3 Static Pages

My application uses AngularJS for frontend and .NET for the backend.
In my application I have a list view. On clicking each list item, it will fetch a pre-rendered HTML page from S3.
I am using Angular states.
app.js
...
state('staticpage', {
    url: "/staticpage",
    templateUrl: function () {
        return 'http://xxxxxxx.cloudfront.net/staticpage/staticpage1.html';
    },
    controller: 'StaticPageCtrl',
    title: 'Static Page'
})
StaticPage1.html
<div>
    Hello static world 1!
</div>
How do I do SEO here?
Do I really need to do HTML snapshots using PhantomJS or something similar?
Yes, PhantomJS would do the trick, or you can use prerender.io; with that service you can just use their open-source renderer and run your own server.
Another way is to use the _escaped_fragment_ meta tag.
I hope this helps; if you have any questions, add comments and I will update my answer.
Did you know that Google renders HTML pages and executes the JavaScript code in them, and does not need any pre-rendering anymore?
https://webmasters.googleblog.com/2014/05/understanding-web-pages-better.html
And take a look at these :
http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157
http://wijmo.com/blog/how-to-improve-seo-in-angularjs-applications/
My project's front end is also built on top of Angular, and I decided to solve the SEO issue like this:
I created an endpoint for all search engines (SE) where all requests go with the _escaped_fragment_ parameter;
I parse the HTTP request for the _escaped_fragment_ GET parameter;
I make a cURL request with the parsed category and article parameters and get the article content;
Then I render the simplest (and SEO-friendly) template for the SE with the article content, or throw a 404 Not Found exception if the article does not exist;
In total: I do not need to prerender any HTML pages or use prerender.io, I have a nice user interface for my users, and search engines index my pages very well.
P.S. Do not forget to generate sitemap.xml and include in it all the URLs (with _escaped_fragment_) which you want to be indexed.
P.P.S. Unfortunately my project's back end is built on top of PHP, so I cannot show you a suitable example. But if you want more explanation, do not hesitate to ask.
Firstly, you cannot assume anything.
Google does say that its bots can understand JavaScript applications very well, but that is not true for all scenarios.
Start by using the "Fetch as Google" feature in Webmaster Tools for your link and see if the page is rendered properly. If yes, then you need not read further.
If you see just your skeleton HTML, it is because the Google bot assumes the page load is complete before it actually is. To fix this you need an environment where you can recognize that a request comes from a bot and return it a prerendered page.
To create such an environment, you need to make some changes to your code.
Follow the instructions in Setting up SEO with Angularjs and Phantomjs,
or alternatively just write code in any server-side language like PHP to generate prerendered HTML pages of your application.
(PhantomJS is not mandatory.)
Create a redirect rule in your server config which detects the bot and redirects it to the prerendered plain HTML files. (The only thing you need to make sure of is that the content of the page you return matches the actual page content, otherwise bots might not consider the content authentic.)
Note that you also need to consider how you will add entries to sitemap.xml dynamically when you add pages to your application in the future.
If you are not looking for such overhead and you are short on time, you can use a managed service like Prerender.
Eventually bots will mature, they will understand your application, and you will say goodbye to your SEO proxy infrastructure. This is just for the time being.
At this point in time, the question really becomes somewhat subjective, at least with Google -- it really depends on your specific site, like how quickly your pages render, how much content renders after the DOM loads, etc. Certainly (as @birju-shaw mentions) if Google can't read your page at all, you know you need to do something else.
Google has officially deprecated the _escaped_fragment_ approach as of October 14, 2015, but that doesn't mean you might not want to still pre-render.
YMMV on trusting Google (and other crawlers) for reasons stated here, so the only definitive way to find out which is best in your scenario would be to test it out. There could be other reasons you may want to pre-render, but since you mentioned SEO specifically, I'll leave it at that.
If you have a server-side templating system (php, python, etc.) you can implement a solution like prerender.io
If you only have AngularJS-only files hosted on a static server (e.g. amazon s3) => Have a look at the answer in the following post : AngularJS SEO for static webpages (S3 CDN)
Yes, you need to prerender the page for the bots. prerender.io can be used, and your page must have the meta tag:
<meta name="fragment" content="!">

javascript (DOJO) file caching - Client side

I want to implement caching of the JavaScript files (Dojo Toolkit) which are not going to change. Currently my home page takes about 15-17 seconds to load, and upon refresh it takes 5-6 seconds. Is there a way to reuse the cached files when the application loads in a new browser session? I do not want the browser to make requests to the server when the application home page loads in a new browser session. Also, is there an option to set the expiry to a certain number of days? I tried with a META tag and it did not help much; either I'm doing something wrong or I'm not implementing it correctly.
I have implemented the Dojo compression toolkit and see a slight, but not significant, improvement in performance.
Usually your browser should do that already. Please check that caching is really turned on, and not just for the session.
However, you can create a custom Dojo build with an app profile that defines layers; the build puts all your code together and bundles it with dojo.js (the files are still available independently). The result is just one HTTP request for all of the code (a larger file, but fetched only once). The speed gained by reducing HTTP requests is far more than a cache could ever provide.
For details refer to the tutorial: http://dojotoolkit.org/documentation/tutorials/1.8/build/
Caching is done by the browser, whose behaviour is influenced by the Cache-Control HTTP header (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html). Normally, the browser will ask whether a newer version of a resource is available, so you still get one short request for each resource.
From my experience, even with very aggressive caching, when the browser is instructed not to ask the server for a new version of a resource for a given period of time, checking the browser cache for such an immense number of resources is a costly process.
The real solution is custom builds. You wrote something about "dojo compression", so I assume you're acquainted with Dojo build profiles. They are rather poorly documented, but once you're successful with one, you should get some big file(s) with layer(s), in the following format:
require({cache:{
    "name/of/dojo/resource": function() { ... here comes the content of the JS file ... },
    ...
}});
It's a multi-definition file that inlines all module definitions within a single layer, so loading such a file will load many modules in a single request. But you must load the layer.
In order to get the layers to run, I had to add an extra require to each of my entry JS files (those that are referenced in the header of the HTML file):
require(["dojo/domReady!"], function(){
// load the layers
require(['dojo/dojo-qm',/*'qmrzsv/qm'*/], function(){
// here is your normal dojo code, all modules will be loaded from the memory cache
require(["dojo", "dojo/on", "dojo/dom-attr", "dojo/dom-class"...],
function(dojo,on,domAttr,domClass...){
....
})
})
})
It has significantly improved the performance. The bottleneck was loading a large number of little JavaScript modules, not parsing them. Loading and parsing all modules at once is much cheaper than loading hundreds of them on demand.

Rails Asset Pipeline: Return digest version when non-digest requested

I am providing a snippet for a client to paste into their static html that refers to my application.js file.
As this sits on a page that I do not have control over, and I do not want to ask the client to update their snippet every time I push a release, I am wondering if there is a way to return my digest-application.js version when the normal one is requested, to ensure that the browser is getting the most recent version?
I can set a cache-busting timestamp on the script src, but I'm not sure this is really reliable.
Any thoughts on the best way to handle this?
We are doing something similar for our "public" JavaScript, which is integrated into a third-party web application.
The way we do this is by creating a symlink on our asset server during the Capistrano deployment that maps the non-digest name of the file to the digested file. Since they are just files on our web server, Apache does the rest.
I think the most elegant way is to use a controller that does a 302 redirect to your assets.
You can give your client a link such as /public-assets/my_assets.
In your routes file, create the route:
match '/public-assets/:asset_name' => 'public_asset#index'
And create your PublicAssetController:
class PublicAssetController < ApplicationController
  def index
    redirect_to asset_path(params[:asset_name])
  end
end
