Building a Blog with Mirage OS

So why a blog? Well, it's almost a cliché at this point that the first post in a tech blog is about the writer making their own blogging engine, but I was looking for something simple and web-based to play with Mirage. It seems like a good first step, with network and disk interaction and some dynamic processing. I'm certainly not the first to do it, and somerandomidiot even beat me to a yak shaving blog title about it.

A few special requirements make it a little more complicated than a static website. I wanted to use this novelty domain I've had for years as a tag-indexer, so you should be able to see this post at ocaml.is-awesome.net. On top of that I wanted a bit of logic so I could set a future date on my posts and have them appear as appropriate. I'm also doing a little dynamic content rendering.

Building blocks

As I mentioned in my previous post, Mirage comes with a simple webserver. It uses a co-operative, user-space threading library called Lwt, and there are some compiler extensions that make working with that easier. You write your code parameterised over the devices you need and wire them in as part of the config. This is the signature of my main module:

module Main (S : Cohttp_lwt.Server) (C : CONSOLE) (BLOG : KV_RO) (CONTENT : KV_RO)

I've parameterised my code over the web server, a console (for logging output) and two key-value stores (filesystem representations of my blog content and static resources). From there I set up all my request handling and pass it to the HTTP server to start the main event loop.

A couple of parsing functions and a big pattern match do my routing:

match (subdomain, parsePath (Uri.path uri)) with
| _, [`String "img"; `String filename]       -> File ("/img/" ^ filename)
| _, [`String "css"; `String filename]       -> File ("/css/" ^ filename)
| _, [`String "js"; `String filename]        -> File ("/js/"  ^ filename)
| _, [`Int year]                             -> BlogListing (get_by_year blog year)
| _, [`Int year; `Int month]                 -> BlogListing (get_by_month blog year month)
| ...
| Some tag, []                               -> BlogListing (get_by_tag blog tag)
| ...
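
The post doesn't show parsePath itself, so here is a minimal sketch of how such a function might produce the tagged segments the router matches on. The names parse_path, route and the route type are my own stand-ins, not the blog's actual code, and only a few of the routes are reproduced.

```ocaml
(* Hypothetical sketch: split a URI path into typed segments so the
   router can pattern match on them. Not the blog's real parsePath. *)
type segment = [ `Int of int | `String of string ]

let parse_path (path : string) : segment list =
  String.split_on_char '/' path
  |> List.filter (fun s -> s <> "")          (* drop empty pieces around '/' *)
  |> List.map (fun s ->
       match int_of_string_opt s with
       | Some n -> `Int n                    (* numeric segment, e.g. a year *)
       | None -> `String s)

(* Simplified result type; the real handlers return listings and files. *)
type route =
  | File of string
  | ByYear of int
  | ByMonth of int * int
  | ByTag of string
  | NotFound

let route (subdomain : string option) (path : string) : route =
  match (subdomain, parse_path path) with
  | _, [ `String "css"; `String filename ] -> File ("/css/" ^ filename)
  | _, [ `Int year ] -> ByYear year
  | _, [ `Int year; `Int month ] -> ByMonth (year, month)
  | Some tag, [] -> ByTag tag                (* tag taken from the subdomain *)
  | _ -> NotFound
```

Tagging each segment up front keeps the match itself flat: a year route and a file route differ only in the shape of the list, so no nested conditionals are needed.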

Most of what I've built is at this level of "the simplest thing that works", but in several places the OCaml language makes it seem like a fairly clean, sometimes elegant, solution.

Storage and Async I/O

Mirage can use Xen block devices (FAT32 is supported) but I've gone with the simpler solution of crunch. The static content for the site is compressed and compiled into an OCaml binary object that will exist in memory. I access it through the standard IO interfaces, and from the caller's point of view there's no difference between it and a hard drive. The Mirage project makes use of the powerful OCaml module system to provide this unified interface: my code is written against the signature of a KV_RO (key-value read-only) store, and at configure/compile time the implementation is provided. If I later want to swap it out for a real disk, it's only a small configuration change.
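
To make the "written against a signature" idea concrete, here is a small sketch. The KV_RO signature below is a heavily simplified stand-in of my own (the real Mirage one is asynchronous and richer), and Memory_store and Renderer are hypothetical names.

```ocaml
(* Simplified, hypothetical stand-in for Mirage's KV_RO signature. *)
module type KV_RO = sig
  type t
  val read : t -> string -> string option
end

(* An in-memory implementation, in the spirit of a crunched filesystem. *)
module Memory_store : KV_RO with type t = (string * string) list = struct
  type t = (string * string) list
  let read store key = List.assoc_opt key store
end

(* Code that only knows about the signature, never the implementation. *)
module Renderer (Store : KV_RO) = struct
  let page (store : Store.t) (key : string) : string =
    match Store.read store key with
    | Some body -> "<article>" ^ body ^ "</article>"
    | None -> "<p>not found</p>"
end

(* The implementation is chosen in one place, when the functor is applied. *)
module R = Renderer (Memory_store)
```

Swapping to a disk-backed store would mean writing another module that satisfies KV_RO and changing only the last line.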

I've used the ocaml-mustache templating library to provide basic server-side rendering. The blog content and metadata are stored in JSON files and rendered into the templates at request time. I want to show you a bit of the template handling code because it demonstrates a few features. I'll admit this can look a little alien at first but it's actually not too bad. These are two functions in my renderer module that are part of the process of getting a mustache template from disk.

let get_template (fs : FS.t) (filename : string) : Mustache.t Lwt.t =
  match checkCache filename with
  | Some template -> return template
  | None -> FS.read_file fs filename
            >|= Mustache.of_string
            >|= updateCache filename

First up, the signature: get_template : FS.t -> string -> Mustache.t Lwt.t. It takes a filesystem handle and a filename and returns a mustache template (Mustache.t) wrapped in an Lwt thread (Lwt.t, similar to Task<T> in C#).

Looking at the implementation, it checks an in-memory cache in case we've already generated the template, and wraps it up with return (think Task.FromResult) if it exists. Otherwise we're reading from the filesystem, which is an async operation. The map operator a >|= f (which reads like the Haskell a >>= return . f) sets up an async callback: when the IO is over, the template is generated from the file contents, the cache is updated and, because the update function returns its template argument, the whole expression yields the template. The plumbing for the async might not be the clearest but you can see a nicer way of handling it in our next function, wrap_template:

let wrap_template (fs : FS.t) (filename : string) : Mustache.t Lwt.t =
  lwt head = get_template fs "header.html"
  and body = get_template fs filename
  and foot = get_template fs "footer.html"
  in return (Mustache.concat [head; body; foot])

Because wrap_template calls get_template it also has to handle the async nature of the underlying IO. This is where the compiler extension I mentioned earlier comes in. The lwt keyword that looks so similar to the standard let bindings is actually a compiler extension that awaits those three get_template calls, and does the bind and callback setup for us. This gives us the readability of async/await in OCaml where we want it and alternatively the monadic style where that's clearer.
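
One detail from get_template worth spelling out is the cache: the pipeline only works because the update function hands back the template it was given. The post doesn't show checkCache or updateCache, so this is a guess at their shape, written synchronously with plain strings standing in for Mustache.t and |> standing in for >|=. The names check_cache, update_cache, load_template and loads are all hypothetical.

```ocaml
(* Hypothetical cache helpers; strings stand in for parsed templates. *)
let cache : (string, string) Hashtbl.t = Hashtbl.create 16

let check_cache (filename : string) : string option =
  Hashtbl.find_opt cache filename

(* Stores the template and returns it unchanged, so it can sit at the
   end of a pipeline without losing the value. *)
let update_cache (filename : string) (template : string) : string =
  Hashtbl.replace cache filename template;
  template

(* Counts how often the slow path runs, for illustration only. *)
let loads = ref 0

let load_template (filename : string) : string =
  match check_cache filename with
  | Some template -> template
  | None ->
    incr loads;
    (* Stand-in for reading and parsing a file. *)
    ("<h1>" ^ filename ^ "</h1>") |> update_cache filename
```

Calling load_template twice with the same filename runs the slow path once; the second call is served from the table.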

Web stuff

I'm using the Pure library (and examples) to provide the starting point for the styling, and using the codehilite markdown plugin to generate Pygments-highlighted code blocks. I've pulled in a mustache templating library to save reinventing that wheel and have the comments sections hosted by Disqus.

There's nothing really exciting in there but I should point out I've still got a lot to do. I'm not setting any Cache-Control headers, the MIME types are all wrong, there's no RSS feed (does anyone still use them?) and I should add some Open Graph tags. A good accessibility review is also on my list (I usually go with NVDA for this sort of testing). These are part of the long tail of little tasks you get when you roll your own blog.

Development Experience

Testing was great with Mirage and this is where the OCaml module system really shines. Because the entry point is parameterised over everything it needs there are no hard dependencies on implementations. Depending on the configure and compile flags those requirements could be satisfied by a local webserver on a standard socket and stdout, the Mirage TCP/IP stack and a tun/tap interface, or the full Mirage stack on top of a Xen network driver compiled into a VM. Being able to build and run locally with a simple userspace webserver knowing I could expect the same interactions with my unikernel VM was excellent.

There are a few things that were tough as I was learning. Mirage is still not stable and a couple of times I ran into issues with the toolchain, mostly with the CLOCK interface. Fortunately they were mostly raised and resolved on the mailing list before I had a chance to worry about them. One outstanding issue with that device is that the fields on the date object match the time.h tm struct: the month is 0-indexed and the year is "years since 1900". That's probably an interface I'd break now before it gets relied upon too heavily.
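
As an illustration of the adjustment those tm-style fields need before they can be shown to a reader, here is a small sketch. The record is my own stand-in mirroring the time.h field names, not the actual Mirage CLOCK type, and to_calendar and iso_date are hypothetical helpers.

```ocaml
(* Hypothetical record mirroring the relevant fields of a C-style tm struct. *)
type tm = {
  tm_year : int;  (* years since 1900 *)
  tm_mon : int;   (* month, 0-indexed: 0 = January *)
  tm_mday : int;  (* day of the month, 1-indexed *)
}

(* Convert to the calendar values a reader would expect. *)
let to_calendar (t : tm) : int * int * int =
  (t.tm_year + 1900, t.tm_mon + 1, t.tm_mday)

let iso_date (t : tm) : string =
  let year, month, day = to_calendar t in
  Printf.sprintf "%04d-%02d-%02d" year month day
```

With this, { tm_year = 114; tm_mon = 4; tm_mday = 20 } formats as 2014-05-20. Forgetting either offset is easy to miss, because the result is still a plausible-looking date.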

Note: At time of publishing I'm still having issues with CLOCK on AWS

I also struggled a bit with OUnit, the OCaml unit testing framework. The module system makes it very easy to mock dependencies but without some way of doing a "Show" type-class I couldn't work out how to get useful information from my failing tests in a re-usable way. There's probably something I'm just missing on that one but I couldn't find anything online.

Finally, the "ocamlbuild" tool is very powerful and really useful when Mirage generates your makefiles for you, but trying to work out how to use it or the "oasis" tool standalone was tough, and there doesn't seem to be much documentation out there.

Again, these are mostly community maintenance issues. OCaml hasn't been widely used and the people that do use it at this stage are all fairly experienced. I expect these sorts of things will improve with time, as at least the Mirage team are doing an excellent job of engaging new users.

Mirage Community

I also wanted to point out that the Mirage OS community is really active. I've been following the mailing list for over six months now and it's a very friendly and helpful space. The excellent resource Real World OCaml was released last year and is available freely online.

On top of that, there are some great blogs around about how to get started with Mirage OS. Check out the blogroll on the main Mirage site for more info.

Conclusion

Writing this blog engine has been a lot of fun. It's still a long way from where I'd like it to be and I'm sure there are plenty of improvements to make to what's already there, but it works and I'm quite happy with the code. Doing this amount of OCaml has certainly helped my F# skills and given me a better appreciation of the tradeoffs they've made there. Whether I stick with a Mirage unikernel long-term for my blog is probably still up in the air, but for now it works and it'll be fun to improve.

At some stage I'd like to put my blog source up but as I'm still learning OCaml I want to tidy it up a bit first. It came out at just over 400 lines of OCaml including configuration (and overzealous use of module definitions) for everything between the webserver and the content. Total size of the VM (including my monkey picture) is under 6 MB.

