This lab notebook is built on top of Github, using the Jekyll static website build system created by Tom Preston-Werner and others from Github and the open source community. My requirements for the lab notebook are:
Previous attempts at these requirements always missed something. Instiki satisfied most of them for several years, but I never found an offline-plus-hosted setup for it that didn’t involve writing code and manually managing a hosted website account with all of Instiki’s dependencies and a database engine, so that never happened. Wikispaces was a good online solution, with decent math, but it had no offline mode and no way to write plugins to handle the bibliographic bits easily.
Carl Boettiger’s lab notebook intrigued me, and was built on Github Pages, so the hosting was included in my Github account already, and I could do “interesting” things with its underlying system. It’s been a journey, but I think I have a version figured out which is different from CB’s, but workable given my needs. Welcome to my lab notebook. Here’s how it works, if you want to do your own.
Github Pages allows one to host straight HTML or Github-flavored Markdown, or Jekyll-based Markdown, which is then turned into HTML. For very good reasons, Github Pages disallows third-party Jekyll plugins, because that would be a massive trojan horse and performance disaster. Bad juju.
So the solution is that I have two Github repositories for the lab notebook. One is the “source code” for the lab notebook, where I write posts, test formatting, and do everything “offline”: I compile it using a Rakefile and it runs on localhost. The other holds the generated site that Github Pages serves. When I’m happy, I commit the changes to git, and run a script which pushes everything to the live site. More on this below under “Deployment”.
Everything here is ultimately a “post,” meaning that apart from the site structure itself (often HTML pages with Jekyll/Liquid tags, Javascript, and logic), everything is a Markdown file located in the "_posts" directory of the "lnraw" source repository.
At a conceptual level, I distinguish between “essays” and “notes,” in the following way. A note is like a page in your lab notebook. It records something useful (e.g., the setup of an experiment, ideas about fixing a problem, some results) but a note is not expected to stand alone without other context. The context of a note is provided by the project it lives in, the model or simulation code it may relate to, and perhaps an experiment within which the note is a step or result. Each of these bits of metadata is recorded as a category, which Jekyll provides to break websites into logical units. A note can (and almost always does) belong to more than one category. For example, a recent note on planning an experiment for ctmixtures had the following header:
---
layout: post
title: Experiment Planning for CT Mixture Model
tags: [cultural transmission, coarse graining, simulation, dissertation, experiments, experiment-ctmixture]
categories:
- project:coarse grained model project
- model:ctmixtures
- experiment:experiment-ctmixtures
---
You’ll see at the bottom of the YAML header that this note belongs to the project "project:coarse grained model project", relates to the model "model:ctmixtures", and is part of the ongoing experiment "experiment:experiment-ctmixtures". These categories are used by the website logic itself to build lists for project pages, experiments, and to list notes related to a given body of code.
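To make the "kind:name" convention concrete, here is a minimal Ruby sketch of how prefixed categories could be split and used to group notes. It is an illustration only, not the site's actual logic: the `Post` struct, the method names, and the "general" bucket for unprefixed categories are all assumptions.

```ruby
# Hypothetical sketch: group post titles by "kind:name" categories.
# Post, split_category, and index_by_category are illustrative names only.
Post = Struct.new(:title, :categories)

# Split "model:ctmixtures" into ["model", "ctmixtures"]. Categories without
# a prefix (e.g. "essays") are filed under the assumed kind "general".
def split_category(category)
  kind, name = category.split(":", 2)
  name.nil? ? ["general", kind] : [kind, name]
end

# Build a nested hash of the form { kind => { name => [titles] } }.
def index_by_category(posts)
  index = Hash.new { |h, k| h[k] = Hash.new { |hh, kk| hh[kk] = [] } }
  posts.each do |post|
    post.categories.each do |category|
      kind, name = split_category(category)
      index[kind][name] << post.title
    end
  end
  index
end

posts = [
  Post.new("Experiment Planning for CT Mixture Model",
           ["project:coarse grained model project", "model:ctmixtures",
            "experiment:experiment-ctmixtures"]),
  Post.new("Continuous Integration and Testing for Simulation Codes",
           ["essays", "simulation", "reproducible science"])
]

index = index_by_category(posts)
puts index["experiment"]["experiment-ctmixtures"]
```

Keeping the kind in the category string itself means a single Jekyll field can drive the project, model, and experiment lists without any extra front matter.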
Essays, on the other hand, may be related to a specific project but are expected to be comprehensible in a standalone manner. For example, a recent essay on “continuous integration” and Travis CI had this metadata:
---
layout: essay
title: Continuous Integration and Testing for Simulation Codes
tags: [experiments, simulation, open science, reproducible science]
categories:
- essays
- simulation
- reproducible science
---
This essay isn’t related to any specific project or experiment. The category “essays” will put it in the list of essays and general writings on the site. Also note that I give the layout as "essay" here, which has minor UI tweaks compared to the lab note/post format. At the moment, it puts a link to the essay list in the upper toolbar, which lab notes don’t have because they are assumed to be cross-linked in all sorts of ways, with no obvious “back” button.
Each posting also has a series of tags, which are wholly optional. These are often related to topics, and represent another way to see content, via the tag cloud (Notes by Tag).
Finally, the “byexperiment” page (and eventually, other lists) is built from custom Liquid filter tags which can pull out and make nice lists of posts from experiments, models, or projects. The source (and executable) for this Liquid filter is located in the _plugins directory. At the moment, the “bymodel” page is not fully automatic, because I put the Github repository links on each model.
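For readers who have not written one, a Liquid filter is a plain Ruby module whose methods are registered with `Liquid::Template.register_filter`. The sketch below is not the plugin from this site's _plugins directory. The module name, the filter names, and the assumption that each post exposes string-keyed "title", "url", and "categories" fields are all hypothetical, and how post objects behave inside a filter depends on the Jekyll version.

```ruby
require "cgi"

# Hypothetical filter module; the notebook's real plugin may differ.
module NotebookFilters
  # Keep only posts whose "categories" field includes the given string.
  def in_category(posts, category)
    posts.select { |post| Array(post["categories"]).include?(category) }
  end

  # Render posts as an HTML unordered list of links, escaping text.
  def as_link_list(posts)
    items = posts.map do |post|
      url = CGI.escapeHTML(post["url"])
      title = CGI.escapeHTML(post["title"])
      %(<li><a href="#{url}">#{title}</a></li>)
    end
    "<ul>#{items.join}</ul>"
  end
end

# Register only when Liquid is loaded, as it is during a Jekyll build.
Liquid::Template.register_filter(NotebookFilters) if defined?(Liquid::Template)
```

Under those assumptions, a page template could chain the two filters, for example `{{ site.posts | in_category: "experiment:experiment-ctmixtures" | as_link_list }}`.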
I also have a few small shell scripts which make it easy to start a new note or essay. newlabnote.sh copies a template Markdown file, with the proper YAML header, to a date-stamped file with the current date, in the _posts directory of the lab notebook’s source directory. newblogpost.sh does the same thing with an essay template. For example, starting a new note about the seriationct project, I type:
mark:_posts/ (master*) $ newlabnote.sh seriationct-requirements    [13:15:00]
Creating lab notebook post for: 2014-06-17-seriationct-requirements.md
This helps me get the categories, tags, and layout right since they’re important to how the site is organized.
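The actual helpers are shell scripts, which are not reproduced here. As a rough equivalent, the following Ruby sketch shows the same idea: build a Jekyll-style date-stamped filename and copy a template to it. The method names and the template path "templates/labnote.md" are assumptions for illustration.

```ruby
require "date"
require "fileutils"

# Build the Jekyll post filename: YYYY-MM-DD-slug.md
def post_filename(slug, date = Date.today)
  "#{date.strftime('%Y-%m-%d')}-#{slug}.md"
end

# Copy a template into posts_dir under the date-stamped name.
# Refuses to overwrite an existing note.
def create_note(slug, template:, posts_dir:, date: Date.today)
  target = File.join(posts_dir, post_filename(slug, date))
  raise ArgumentError, "#{target} already exists" if File.exist?(target)
  FileUtils.mkdir_p(posts_dir)
  FileUtils.cp(template, target)
  target
end

# Command-line use: ruby newlabnote.rb seriationct-requirements
# The template path below is hypothetical.
if __FILE__ == $PROGRAM_NAME && ARGV.first
  path = create_note(ARGV.first, template: "templates/labnote.md", posts_dir: "_posts")
  puts "Creating lab notebook post for: #{File.basename(path)}"
end
```

Refusing to overwrite is a deliberate choice in this sketch, so that starting a second note with the same name on the same day cannot clobber the first.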
When I’m ready to push a note or some edits to the live website, the Rakefile in the source tree takes the following steps:
That’s it. Takes about 60 seconds to push everything on a reasonably slow connection, and when I’m done both the source and the production site are version controlled. I can go back to work locally at this point.
The site uses Jekyll at whatever its current release is. I only modify Jekyll by upgrading it, or adding plugins. When you upgrade, pay attention to updates for Jekyll Scholar dependencies, and in particular the csl-styles package, which hasn’t necessarily come along for the ride in the past and can cause errors. It updates just fine via the normal Ruby gem mechanism, but you may have to do it manually.
I also use Bootstrap as the CSS and layout engine for the site. Bootstrap is superb: it provides a flexible grid system for easy layout, terrific pre-defined styles for buttons and menus, and a ton of small icons for common UI design tasks. If you’re not using Bootstrap, check it out. It’s one of Twitter’s great gifts to the software community, along with Storm.
In addition, I use several plug-ins to Jekyll:
I have also replaced the default Markdown parser (and its supported alternatives) with Pandoc. Pandoc is a generic document conversion utility, written in Haskell, and I have a plug-in, borrowed originally from Carl Boettiger but modified slightly, to generate appropriate links for my website. It does a superb job of translating Markdown whose citations refer to entries in BibTeX files into HTML5 with a full bibliography in Chicago author-date format.
In addition, I have templates for writing RMarkdown, with embedded R language code, and using Knitr to produce either plain Markdown with source code annotations, or full LaTeX files. This allows reproducible research results on the lab notebook and an easy transition to journal articles, with bibliographies, source code, results, tables, and figures. Nice.
The only thing Pandoc isn’t good at is handling big bibliography files, since it does a file scan per post. This is a known issue, and a 1000+ citation file basically bogs down the site build to unusability. 100+ is fine, with a small but noticeable build time. Sometime in the next year, I’ll write a build script adjunct that pulls out citations, builds a per-post subset BibTeX file, and feeds it to Jekyll, but I need to know more about Jekyll’s core before I do that.
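The per-post subset idea can be sketched in a few lines of Ruby. This is not the planned build script, only an illustration of the approach: pull Pandoc-style `@key` citations out of a post, then keep only the matching entries from the big BibTeX file. The method names are hypothetical, and the regular expressions are a simplification: they assume each entry starts at the beginning of a line and they drop `@string` and `@comment` blocks, so a real version would want a proper BibTeX parser.

```ruby
require "set"

# Collect Pandoc-style citation keys such as [@smith2010, p. 4] or @jones_2012.
# The lookbehind skips e-mail addresses like me@example.com.
def citation_keys(markdown)
  markdown.scan(/(?<!\w)@([A-Za-z0-9_][\w:.-]*)/)
          .flatten
          .map { |key| key.sub(/[.:-]+\z/, "") } # drop trailing punctuation
          .to_set
end

# Return only the BibTeX entries whose key is in `keys`.
# Simplification: an entry runs from "@type{" at a line start to the next one.
def subset_bibtex(bibtex, keys)
  entries = bibtex.scan(/^@\w+\s*\{.*?(?=^@\w+\s*\{|\z)/m)
  entries.select do |entry|
    keys.include?(entry[/\A@\w+\s*\{\s*([^,\s]+)\s*,/, 1])
  end.join
end

post = "As argued in [@smith2010, p. 4], drift matters. Contact me@example.com."
bib = <<~BIB
  @article{smith2010,
    title = {Drift},
  }

  @book{jones2012,
    title = {Selection},
  }
BIB

puts subset_bibtex(bib, citation_keys(post))
```

Writing the result to a small per-post .bib file would let Pandoc scan only the handful of entries that post actually cites, rather than the whole library.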
All of this is open source, and the source to my lab notebook is available on Github. Grab it, wipe out my posts, install Jekyll, Pandoc, and whatever Ruby dependencies it all needs, and go to town. The layouts will need some editing, as will the _config.yml file and some other bits, but you should have this up and running within an hour or two. Email me if you need to, and I’ll provide guidance.