
Git post-receive Hooks, Pt. 2

So, last time I covered a post-receive hook for WordPress deployment. This time we tackle one for a Jekyll-based site.

A Jekyll post-receive Hook

Here’s what we’re working towards:

#!/bin/sh

# Locations: the bare repo, a scratch clone to build from, and the web root
REPONAME=.git
GIT_REPO=$HOME/$REPONAME
TMP_GIT_CLONE=$HOME/.tmp/$REPONAME
PUBLIC=/home/public

# Clone the bare repo into a temporary working copy
git clone $GIT_REPO $TMP_GIT_CLONE

# Build the site straight into the public folder
jekyll --no-auto $TMP_GIT_CLONE $PUBLIC

# Clean up the temporary clone
rm -Rf $HOME/.tmp/$REPONAME/.git/objects
rm -Rf $HOME/.tmp/$REPONAME/.git
rm -Rf $HOME/.tmp/$REPONAME

# Pre-compress the text assets for serving
find $PUBLIC/ -iname '*.html' -exec sh -c 'gzip < {} > {}.gz' \;
find $PUBLIC/ -iname '*.css' -exec sh -c 'gzip < {} > {}.gz' \;
find $PUBLIC/ -iname '*.js' -exec sh -c 'gzip < {} > {}.gz' \;

exit

Slightly more exciting than the previous one. Insofar as a post-receive hook can be considered exciting. Here’s the English translation of the process:

  1. Set up some variables to make things marginally clearer (in retrospect, these are not particularly useful, so perhaps just skip this step – truly, naming things is the hardest part of programming);

  2. Clone the remote repo into a temporary folder in $HOME, so that we can conveniently have our wicked way with it;

  3. Run Jekyll and output the compiled site to the public folder;

  4. Get rid of the evidence of us having our wicked way (by which I mean delete the temporary folder);

  5. Since, as discussed, NFSN doesn’t do on-the-fly compression, create compressed versions of all the usefully-compressible files.
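For completeness, this is roughly how the hook gets wired up: the hook file lives in the bare repo’s hooks directory and needs to be executable, and locally you just push to the server as a remote. (The remote name and server address below are placeholders; the repo path matches the $HOME/.git used in the script.)

# on the server: install the hook and make it executable
cp post-receive $HOME/.git/hooks/post-receive
chmod +x $HOME/.git/hooks/post-receive

# locally: add the server as a remote (placeholder address; repo is $HOME/.git there)
git remote add live user@example.com:.git
git push live master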

I was initially a little unsure about how best to handle versioning: should the whole directory be a repo, or just the _site folder? How was deployment to work? Obviously, the ideal was to have the whole thing be a repo, so that version control applied to the source files and not just the generated site. (In fact, I ended up adding the _site folder to .gitignore.) But I wasn’t sure how to push just one subfolder, and I didn’t want to resort to FTP for deployment. The solution ended up being obvious: push the source files and run Jekyll on the server.
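For reference, the .gitignore for that arrangement is tiny – something along these lines:

# keep the generated site out of the repo; the server rebuilds it anyway
_site/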

I won’t go through the whole thing line by line, as for the most part it’s pretty self-explanatory. But I will talk a bit more about the compression step, because it caused me some headaches.

Compressing at Compile-Time

So with Jekyll I’m just using HTML, CSS and JavaScript files. Obviously there’s no need to compress images further, as they’re already compressed and you’ll just waste visitors’ time. The three steps I use to do this in the post-receive hook are as follows:

find $PUBLIC/ -iname '*.html' -exec sh -c 'gzip < {} > {}.gz' \;
find $PUBLIC/ -iname '*.css' -exec sh -c 'gzip < {} > {}.gz' \;
find $PUBLIC/ -iname '*.js' -exec sh -c 'gzip < {} > {}.gz' \;

Before I go any further, let me issue this disclaimer: there is probably a much better way of doing this than I know. But as crummy and ugly a hack as this is, I can at least testify that it does work.
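(One obvious tidy-up, for example, would be collapsing the three find invocations into a single one:)

find $PUBLIC/ \( -iname '*.html' -o -iname '*.css' -o -iname '*.js' \) -exec sh -c 'gzip < {} > {}.gz' \;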

The basic process is that we find all files of the appropriate types and create a Gzipped copy of each. I initially tried this with the marginally more elegant:

find $PUBLIC/ -iname '*.html' -exec gzip {} \;

(Taking just the HTML files as an example.)

This works, in a sense. It correctly Gzips all the files we want zipped. But there’s a problem: gzip deletes the original, uncompressed file and leaves only the .gz version. This is an issue for two reasons.

Firstly, some older browsers might not support compressed files. Honestly, these days, this is basically irrelevant. If this were the only problem, I straight-up wouldn’t care.

The second reason is, unfortunately, fatal. For security reasons, I use the following .htaccess setting:

Options All -Indexes

That prevents people from viewing bare directories. Definitely a setting I want enabled. Apparently, though, Apache is not okay with swapping out index.html for index.html.gz – it treats directories containing only the latter as having no index file at all. That’s something of a big deal, as it stops anyone from viewing anything on the site.

Now it is possible to get Gzip to create a duplicate rather than overwriting: have it read the file on STDIN and write the compressed output to STDOUT. Suffice to say that, while I did manage to get this working when targeting specific files, I could not get it to work on whatever find was spitting out. From what I can gather, the issue is that the input and output redirections are shell syntax, and -exec runs gzip directly rather than through a shell, so there is nothing there to interpret them. You need to actually run the command in a shell of its own: hence using -exec sh and then calling gzip inside that second shell. Thank you stackoverflow.

I won’t pretend to really understand how this is working. I also don’t understand why -c is passed to sh and not to gzip. I don’t know what < {} > does. In fact, my understanding of how that command works tells me that this shouldn’t work. It does work. Clearly I don’t understand.
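Still, for anyone who wants to pick it apart, here’s my best attempt at an annotated reading (pieced together from the find and sh man pages, so treat it as a sketch rather than gospel):

# -exec runs the command once per matching file, with {} replaced by the
# file's path and \; marking the end of the command.
# sh -c '...' starts a fresh shell and tells it to run the quoted string
# as a command. Inside that shell, < feeds the file to gzip on STDIN and
# > sends gzip's output to a new file with .gz tacked on, so the original
# is left untouched.
find $PUBLIC/ -iname '*.html' -exec sh -c 'gzip < {} > {}.gz' \;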

Fun with .htaccess

Whatever, the point of the story is this: regardless of my lack of comprehension, this does generate compressed duplicates of all the necessary files. Now all we need to do is serve them instead of the uncompressed versions, using a bit of .htaccess-fu. Which, ironically, I do understand. How many people can honestly say they understand Apache configuration better than Bash? Corollary: how many people are that bad at Bash scripting?

<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteCond %{HTTP:accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME} !^.+\.gz$
    RewriteCond %{REQUEST_FILENAME}.gz -f
    RewriteRule ^(.+) $1.gz [L]
</IfModule>

All this does is check whether mod_rewrite is available (it is on NFSN) and, if it is, maybe do some rewriting of the request. Basically, if the request’s Accept-Encoding header includes gzip (i.e. the browser making the request supports Gzipped files), and the request isn’t already for a .gz file itself, then the server checks to see if a Gzipped version of the requested file exists.

Say I request /css/styles.css and my browser can handle Gzip: Apache will check to see if /css/styles.css.gz exists. If it doesn’t, it just serves the original file as requested. Thanks to our earlier post-receive hook, though, the compressed version should also be available. In that case, Apache will serve that version rather than the original, and the visitor gets all the speed benefits this brings.
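If you want to verify that the rewrite is actually kicking in, a quick check with curl (substituting your own domain for the placeholder) should show a much smaller Content-Length when compression is requested:

# ask for the compressed version (example.com is a placeholder)
curl -sI -H 'Accept-Encoding: gzip' http://example.com/css/styles.css
# and the uncompressed one, for comparison
curl -sI http://example.com/css/styles.css

Depending on the host’s Apache configuration you may also need an AddEncoding gzip .gz line so browsers know to decompress what they receive; NFSN evidently takes care of that already, since the setup above works as-is.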

This does mean that you have to pay storage costs for the compressed duplicates of the files on NFSN. So cunning. Either you pay the extra bandwidth cost of serving uncompressed files, or you pay for the storage. Still, the latter is more predictable, and amounts to – seriously – a couple of dollars per year. Totally worth it.