Thursday, May 15, 2014

Shake as a dependency library

Summary: You can use Shake as a library to implement other build tools.

The Shake build tool is often used to define a specific build system, as an alternative to Make. But Shake is really a library, and can be used to implement other build tools. In this post I'm going to show a rough implementation of the Sake build tool using Shake.

What is Sake?

Extracted from the Sake documentation:

Sake is a way to easily design, share, build, and visualize workflows with intricate interdependencies. Sake is a simple and self-documenting build system, targeted at scientists, data analysts and business teams.

The Sake build rules are defined in YAML, and a simple example is:

create the input:
    help: create the input file
    formula: echo test > input.txt
    output:
        - input.txt
convert to uppercase:
    help: change the input file to uppercase
    dependencies:
        - input.txt
    formula: cat input.txt | tr '[a-z]' '[A-Z]' > output.txt
    output:
        - output.txt

Sake build rules are simple, contain lots of help text, and are quite explicit. I can see why some users would prefer it to Shake or Make (especially as the Sake tool also produces nice visualisations and help information).
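To make the shape of a rule concrete, here is a rough sketch of how the two rules above could be modelled as a plain Haskell record. The SakeRule type, its field names and the allOutputs helper are hypothetical, invented purely for illustration. They are not part of Sake or Shake, and the implementation later in this post works on the raw YAML instead.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical record mirroring the fields of a single Sake rule.
-- Optional YAML fields are modelled with Maybe: Nothing means the
-- field was absent from the rule.
data SakeRule = SakeRule
    { ruleName    :: String
    , ruleHelp    :: String
    , ruleFormula :: String
    , ruleDeps    :: Maybe [FilePath]
    , ruleOutputs :: Maybe [FilePath]
    } deriving (Show, Eq)

-- The two rules from the example file, written out by hand.
exampleRules :: [SakeRule]
exampleRules =
    [ SakeRule "create the input" "create the input file"
        "echo test > input.txt"
        Nothing (Just ["input.txt"])
    , SakeRule "convert to uppercase" "change the input file to uppercase"
        "cat input.txt | tr '[a-z]' '[A-Z]' > output.txt"
        (Just ["input.txt"]) (Just ["output.txt"])
    ]

-- Every file that some rule promises to produce, in rule order.
allOutputs :: [SakeRule] -> [FilePath]
allOutputs = concatMap (fromMaybe [] . ruleOutputs)

main :: IO ()
main = mapM_ putStrLn (allOutputs exampleRules)
```

The first rule has no dependencies field at all, which is why ruleDeps is a Maybe rather than a plain list.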

Sake on top of Shake

This section contains an implementation of Sake that can execute the file above, as well as the tests from the Sake repo. I'm going to intersperse the implementation with some notes. First we give language extensions and imports:

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative
import Control.Exception
import Development.Shake
import Data.Yaml
import qualified Data.HashMap.Strict as Map
import qualified Data.Vector as Vector
import qualified Data.Text as Text

The interesting imports are Shake (the build system) and Yaml (the parser for YAML files). Our main function loads the Sake YAML file, then defers to Shake:

main = do
    build <- either throw id <$> decodeFileEither "Sakefile.yaml"
    shakeArgs shakeOptions $ elaborate build

We are using shakeArgs to get Shake to provide command line handling for our tool. The interesting part is elaborate, which translates the Sake rules into Shake rules. We define elaborate as:

elaborate (Object x) | Map.member "formula" x = do
    let formula = fromString $ x Map.! "formula"
    let dependencies = map fromString . fromArray <$> Map.lookup "dependencies" x
    let output = map fromString . fromArray <$> Map.lookup "output" x
    let act = do
            maybe alwaysRerun need dependencies
            command_ [] "sh" ["-c",formula]
    case output of
        Nothing -> action act
        Just output -> do want output; output *>> \_ -> act
elaborate (Object x) = mapM_ elaborate $ Map.elems x
elaborate _ = return ()

The first case is the interesting one. We look for formula fields, which indicate build rules. We extract the fields formula, dependencies and output. We then define act, which is the action Shake will run:

maybe alwaysRerun need dependencies
command_ [] "sh" ["-c",formula]

If there is no dependencies field, we always rerun the rule; otherwise we require the dependencies using need. Next we run the formula command using sh. Then we define the rules:

case output of
    Nothing -> action act
    Just output -> do want output; output *>> \_ -> act

If a Sake rule has no output field, then it is always run, which Shake specifies with action. Otherwise we want the output (since all Sake outputs are always built) and define a rule producing multiple outputs (the *>> function) which runs act. Finally, we have a few helpers to extract the fields from the YAML:

fromString (String x) = Text.unpack x
fromArray (Array x) = Vector.toList x
fromArray Null = []
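The helpers above are partial: any unexpected YAML value causes a pattern match failure at runtime. As a sketch of what friendlier checking might look like, the following variants return an error message instead. The names stringField and listField are hypothetical, and the code assumes that Data.Yaml re-exports the Value constructors, as the implementation above already relies on.

```haskell
import Data.Yaml (Value(..))
import qualified Data.Text as Text
import qualified Data.Vector as Vector

-- Hypothetical total variant of fromString: the first argument is the
-- field name, used only to build the error message.
stringField :: String -> Value -> Either String String
stringField _    (String x) = Right (Text.unpack x)
stringField name _          = Left ("Sake field '" ++ name ++ "' must be a string")

-- Hypothetical total variant of fromArray that also converts each element.
-- As in the original helper, a Null value is treated as an empty list.
listField :: String -> Value -> Either String [String]
listField name (Array xs) = mapM (stringField name) (Vector.toList xs)
listField _    Null       = Right []
listField name _          = Left ("Sake field '" ++ name ++ "' must be a list")

main :: IO ()
main = do
    let file = String (Text.pack "input.txt")
    print (listField "output" (Array (Vector.fromList [file])))
    print (listField "output" Null)
    print (stringField "formula" (Bool True))
```

Using Either keeps the checks pure, so a caller can decide whether to abort the whole build or just report the offending rule.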

Note that the full Sake implementation contains additional features and error checking. However, I think it is quite nice that a reimplementation of the basics can be done in only 16 lines of Haskell. The reimplementation also supports several features that the original Sake does not, including profiling, progress reporting and staunch mode.
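These extra features come from Shake itself. As a minimal sketch, assuming a recent Shake release (where shakeReport is a list of file paths and %> is the single-output rule operator), the three features can be switched on by default through ShakeOptions. The opts name and the rule are illustrative only, and the rule expects an input.txt source file to exist already.

```haskell
import Data.Char (toUpper)
import Development.Shake

-- Defaults for the three features mentioned above. Flags parsed by
-- shakeArgs on the command line are applied on top of these values.
opts :: ShakeOptions
opts = shakeOptions
    { shakeStaunch  = True             -- keep building after a rule fails
    , shakeProgress = progressSimple   -- periodic progress messages
    , shakeReport   = ["report.html"]  -- write a profiling report
    }

main :: IO ()
main = shakeArgs opts $ do
    want ["output.txt"]
    "output.txt" %> \out -> do
        -- readFile' reads the file and records it as a dependency.
        src <- readFile' "input.txt"
        writeFile' out (map toUpper src)
```

Because the Sake reimplementation also goes through shakeArgs, the same options are available to it without any extra code.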

Conclusions

Shake is capable of implementing other build tools, and can be used as a build system in its own right, or as a library supplying dependency tracking. I believe there is plenty of scope for higher-level build specifications (Cabal is one example), and hope that these tools can delegate their dependency logic to Shake.

5 comments:

Anonymous said...

wow. very nice stuff. glad to see you continuing to advance the good parts of haskell.

Boris Lykah said...

Do you plan on promoting Shake? It is hard to persuade people to consider it when there is not even a page on Wikipedia and it is not clear how many projects use it.

Neil Mitchell said...

Boris: I do want to promote it. What are the notability requirements for giving it a Wikipedia page? I'm very happy to help with the content, especially if someone who knows more about Wikipedia rules can help guide me. I know many companies using Shake, but only a few have openly declared so - I'll try and persuade some more.

Boris Lykah said...

I think it should have references to several articles and pages, preferably written by someone else (no original research). A while ago I added a link to Shake's GitHub page on the page "List of build automation software", but it was removed with a comment that the link was unhelpful.

Neil Mitchell said...

Boris: As the author of most material on Shake, it sounds like it would be necessary for someone else to write the page? I'd be more than happy to supply any material, answer any questions, write blog posts covering anything that was ambiguous, proofread etc. There is plenty of content floating around, such as the ICFP paper (http://community.haskell.org/~ndm/downloads/paper-shake_before_building-10_sep_2012.pdf) and user manual (https://github.com/ndmitchell/shake/blob/master/docs/Manual.md#readme).

Certainly there are build systems on Wikipedia that look a lot less popular than Shake (e.g. Maak) or where very little is known about them (e.g. MPW Make). I notice currently Shake is on the list of build automation software, but with no page behind it.