Wednesday, September 20, 2017

Shake 0.16 - revised rule definitions

Summary: I've just released shake v0.16. A lot has changed, but it's probably only visible if you have defined your own rules or oracles.

Shake-0.16 is now out, 8 months since the last release, and with a lot of improvements. For full details read the changelog, but in this post I'm going to go through a few of the things that might have the biggest impact on users.

Rule types redefined

Since the first version of Shake there has been a Rule key value type class defining all rule types - for instance the file rule type has key of filename and value of modification time. With version 0.16 the type class is gone, rules are harder to write, but offer higher performance and more customisation. For people using the builtin rule types, you'll see those advantages, and in the future see additional features that weren't previously possible. For people defining custom rule types, those will require rewriting - read the docs and if things get tough, ask on StackOverflow.

The one place many users might encounter the changes are that oracle rules now require a type instance defining between the key and value types. For example, if defining an oracle for the CompilerVersion given the CompilerName, you would have to add:

type instance RuleResult CompilerName = CompilerVersion

As a result of this type instance the previously problematic askOracle can now infer the result type, removing possible sources of error and simplifying callers.

The redefining of rule types represents most of the work in this release.

Add cmd_

The cmd_ function is not much code, but I suspect will turn out to be remarkably useful. The cmd function in Shake is variadic (can take multiple arguments) and polymorphic in the return type (you can run it in multiple monads with multiple results). However, because of the overloading, if you didn't use the result of cmd it couldn't be resolved, leading to ugly code such as () <- cmd args. With cmd_ the result is constrained to be m (), so cmd_ args can be used.

Rework Skip/Rebuild

Since the beginning Shake has tried to mirror the make command line flags. In terms of flags to selectively control rebuilding, make is based entirely on ordered comparison of timestamps, and flags such as --assume-new don't make a lot of sense for Shake. In this release Shake stops trying to pretend to be make, removing the old flags (that never worked properly) and adding --skip (don't build something even if it is otherwise required) and --build (build something regardless). Both these flags can take file patterns, e.g, --build=**/*.o to rebuild all object files. I don't think these flags are finished with, but it's certainly less of a mess than before.

Sunday, September 17, 2017

Existential Serialisation

Summary: Using static pointers you can perform binary serialisation of existentials.

Many moons ago I asked how to write a Binary instance for a type including an existential, such as:

data Foo = forall a . (Typeable a, Binary a) => Foo a

Here we have a constructor Foo which contains a value. We don't statically know the type of the contained value, but we do know it has the type classes Typeable (so we can at runtime switch on its type) and Binary (so we can serialise it). But how can we deserialise it? We can store the relevant TypeRep when serialising, but when deserialising there is no mechanism to map from TypeRep to a Binary instance.

In Shake, I needed to serialise existentials, as described in the S4.1 of the original paper. My solution was to build a global mapping table, storing pairs of TypeRep and Binary instances for the types I knew were relevant. This solution works, but cannot deserialise anything that has not already been added to the global table, which required certain functions to live in weird places to ensure that they were called before deserialisation. Effective, but ugly.

Recently Abhiroop Sarkar suggested using the relatively new static pointers extension. This extension lets you turn top-level bindings with no arguments into a StaticPtr which can then be serialised/deserialsed, even between different instances of a process. To take advantage of this feature, we can redefine Foo as:

data Foo = forall a . (StaticFoo a, Binary a) => Foo a

class StaticFoo a where
    staticFoo :: a -> StaticPtr (Get Foo)

The approach is to switch from serialising the TypeRep (from which we try to look up Get Foo), to serialising the Get Foo directly. We can write a Binary Foo instance by defining put:

put :: Foo -> Put
put (Foo x) = do
    put $ staticKey $ staticFoo x
    put x

Here we simply grab a StaticPtr (Get Foo) which can deserialise the object, then use staticKey to turn it into something that can be serialised itself. Next, we write out the payload. To reverse this process we define get:

get :: Get Foo
get = do
    ptr <- get
    case unsafePerformIO (unsafeLookupStaticPtr ptr) of
        Just value -> deRefStaticPtr value :: Get Foo
        Nothing -> error "Binary Foo: unknown static pointer"

We first get the staticKey, use unsafeLookupStaticPtr to turn it into a StaticPtr (Get Foo) followed by deRefStaticPtr to turn it into a Get Foo. The unsafe prefix on these functions is justified - bugs while developing this code resulted in segfaults.

The final piece of the puzzle is defining StaticFoo instances for the various types we might want to serialise. As an example for String:

instance StaticFoo String where
    staticFoo _ = static (Foo <$> (get :: Get String))

We perform the get, wrap a Foo around it, and then turn it into a StaticPtr. All other types follow the same pattern, replacing String with Int (for example). The expression passed to static must have no free variables, including type variables, so we cannot define an instance for a, or even an instance for [a] - it must be [Char] and [Int] separately.

A complete code sample and test case is available here.

This approach works, and importantly allows extra constraints on the existential. The two disadvantages are: 1) that static isn't very flexible or easy to abstract over, resulting in a lot of StaticFoo boilerplate; 2) the static pointer is not guaranteed to be valid if the program changes in any way.

Will Shake be moving over to this approach? No. The next version of Shake has undergone an extensive rewrite, and in the process, moved away from needing this feature. A problem I had for 8 years has been solved, just as I no longer need the solution!

Sunday, August 20, 2017

Ghcid and VS Code

Summary: There's now a Ghcid VS Code addin that gives you red squiggles.

I've been using Ghcid for about 3 years, and VS Code for about 6 months. Ghcid alone is able to display the errors and warnings, and update them whenever you save. In this post I'll show how VS Code users can also click on errors to jump to them, and how to get red squiggles in the text buffer for errors.

Clicking on errors

Using a recent VS Code, if you run ghcid from the terminal window, hold Ctrl and click the filename and it jumps to the right location in the erroneous file.

Red squiggles

Red squiggles are now possible using the haskell-ghcid addin. To get it working:

  • Run ghcid -o ghcid.txt which will produce a file ghcid.txt which updates every time ghcid updates. Running the underlying ghci with -ferror-spans will significantly improve the errors reported.
  • Open ghcid.txt in VS Code as the active editor. Run the VS Code command (Ctrl+Shift+P) named "Watch Ghcid output".

These steps cause the ghcid errors to appear in the VS Code Problems pane, and have red squiggles in the editor. Even though the errors are in the proper problems pane, I still prefer the output provided by the ghcid terminal, so still look at that.

The VS Code addin is not very well polished - but I'm using it on a daily basis.

Thursday, July 06, 2017

HaskellX Bytes talk next Tuesday

I'm talking about "Static Analysis in Haskell" at HaskellX Bytes next Tuesday (11th July), in London UK. Registration is free. The abstract is:

Haskell is a strongly typed programming language, which should be well suited to static analysis - specifically any insights about the program which don't require running the program. Alas, while type systems are becoming increasingly powerful, other forms of static analysis aren't advancing as fast. In this talk we'll give an overview of some of the forms of non-type-based static analysis that do exist, from the practical (GHC warnings, HLint, Weeder) to the research (LiquidHaskell, Catch).

I'm a big fan of static analysis, so this will be part summary, part sales pitch, and part call to arms. Followed by beer.

Clarification: I originally got the day wrong, and the url persists the original incorrect day. The talk is on Tuesday 11th July.

Tuesday, June 20, 2017

Announcing Weeder: dead export detection

Most projects accumulate code over time. To combat that, I've written Weeder which detects unused Haskell exports, allowing dead code to be removed (pulling up the weeds). When used in conjunction with GHC -fwarn-unused-binds -fwarn-unused-imports and HLint it will enable deleting unused definitions, imports and extensions.

Weeder piggy-backs off files generated by stack, so first obtain stack, then:

  • Install weeder by running stack install weeder --resolver=nightly.
  • Ensure your project has a stack.yaml file. If you don't normally build with stack then run stack init to generate one.
  • Run weeder . --build, which builds your project with stack and reports any weeds.

What does Weeder detect?

Weeder detects a bunch of weeds, including:

  • You export a function helper from module Foo.Bar, but nothing else in your package uses helper, and Foo.Bar is not an exposed-module. Therefore, the export of helper is a weed. Note that helper itself may or may not be a weed - once it is no longer exported -fwarn-unused-binds will tell you if it is entirely redundant.
  • Your package depends on another package but doesn't use anything from it - the dependency should usually be deleted. This functionality is quite like packunused, but implemented quite differently.
  • Your package has entries in the other-modules field that are either unused (and thus should be deleted), or are missing (and thus should be added). The stack tool warns about the latter already.
  • A source file is used between two different sections in a .cabal file - e.g. in both the library and the executable. Usually it's better to arrange for the executable to depend on the library, but sometimes that would unnecessarily pollute the interface. Useful to be aware of, and sometimes worth fixing, but not always.
  • A file has not been compiled despite being mentioned in the .cabal file. This situation can be because the file is unused, or the stack compilation was incomplete. I recommend compiling both benchmarks and tests to avoid this warning where possible - running weeder . --build will use a suitable command line.

Beware of conditional compilation (e.g. CPP and the Cabal flag mechanism), as these may mean that something is currently a weed, but in different configurations it is not.

I recommend fixing the warnings relating to other-modules and files not being compiled first, as these may cause other warnings to disappear.

Ignoring weeds

If you want your package to be detected as "weed free", but it has some weeds you know about but don't consider important, you can add a .weeder.yaml file adjacent to the stack.yaml with a list of exclusions. To generate an initial list of exclusions run weeder . --yaml > .weeder.yaml.

You may wish to generalise/simplify the .weeder.yaml by removing anything above or below the interesting part. As an example of the .weeder.yaml file from ghcid:

- message: Module reused between components
- message:
  - name: Weeds exported
  - identifier: withWaiterPoll

This configuration declares that I am not interested in the message about modules being reused between components (that's the way ghcid works, and I am aware of it). It also says that I am not concerned about withWaiterPoll being a weed - it's a simplified method of file change detection I use for debugging, so even though it's dead now, I sometimes do switch to it.

Running with Continuous Integration

Before running Weeder on your continuous integration (CI) server, you should first ensure there are no existing weeds. One way to achieve that is to ignore existing hints by running weeder . --yaml > .weeder.yaml and checking in the resulting .weeder.yaml.

On the CI you should then run weeder . (or weeder . --build to compile as well). To avoid the cost of compilation you may wish to fetch the latest Weeder binary release. For certain CI environments there are helper scripts to do that.

Travis: Execute the following command:

curl -sL | sh -s .

The arguments after -s are passed to weeder, so modify the final . if you want other arguments.

Appveyor: Add the following statement to .appveyor.yml:

- ps: Invoke-Command ([Scriptblock]::Create((Invoke-WebRequest '').Content)) -ArgumentList @('.')

The arguments inside @() are passed to weeder, so add new arguments surrounded by ', space separated - e.g. @('.' '--build').

What about Cabal users?

Weeder requires the textual .hi file for each source file in the project. Stack generates that already, so it was easy to integrate in to. There's no reason that information couldn't be extracted by either passing flags to Cabal, or converting the .hi files afterwards. I welcome patches to do that integration.

Sunday, June 11, 2017

Haskell Website Working Group - Update

Summary: We have agreed a set of principles for the website and are collecting information.

I'm writing this partly in my capacity as the chair of the Haskell Website Working Group, and partly as an individual (so blame me rather than the committee). It's fair to say that the original goal of the committee was to make sure everyone agrees on the download page. Discussions amongst the committee lead to a shared goal that the download page itself should clearly direct users along a precise path, without requiring beginners to make decisions requiring judgement. That probably means that download page should only describe one installation path, pushing alternatives onto a separate page.

To decide what should go on the download page, our first step was to evaluate what was currently available, and what experience a beginner might have. We've started that on this page. As an example, it says how to install Haskell, how to open ghci, how to install a tool etc - all using the different options.

When I actually tried installing and using the various options listed on the current download page, they all had confusing steps, unintuitive behaviour and problems that I had to overcome. As an example, Stack on Windows recommends using the 32bit version, while noting that only the 64bit version works. At the same time, Core Platform starts by telling me to edit a global Cabal config file.

I invite everyone to help contribute to that page, via pull requests. At the same time, it would be great if the issues raised could be ironed out, leading to a smooth beginner experience (I'm talking to maintainers in person and raising GitHub tickets). Once the information is collected, and ideally the tools have improved, it will be time to make a decision between the options. When the decision comes, it will be technically motivated, and hopefully unambiguous.

Sunday, May 21, 2017

Proving fib equivalence

Summary: Using Idris I proved the exponential and linear time fib functions are equivalent.

The Haskell wiki proclaims that Implementing the Fibonacci sequence is considered the "Hello, world!" of Haskell programming. It also provides multiple versions of fib - an exponential version and a linear version. We can translate these into Idris fairly easily:

fibExp : Nat -> Nat
fibExp Z = 0
fibExp (S Z) = 1
fibExp (S (S n)) = fibExp (S n) + fibExp n

fibLin' : Nat -> Nat -> Nat -> Nat
fibLin' Z a b = b
fibLin' (S n) a b = fibLin' n (a + b) a

fibLin : Nat -> Nat
fibLin n = fibLin' n 1 0

We've made the intermediate go function in fibLin top-level, named it fibLib' and untupled the accumulator - but it's still fundamentally the same. Now we've got the power of Idris, it would be nice to prove that fibExp and fibLin are equivalent. To do that, it's first useful to think about why fibLib' works at all. In essence we're decrementing the first argument, and making the next two arguments be fib of increasingly large values. If we think more carefully we can come up with the invariant:

fibLinInvariant : (d : Nat) -> (u : Nat) ->
    fibLin' d (fibExp (1+u)) (fibExp u) = fibExp (d+u)

Namely that at all recursive calls we must have the fib of two successive values (u and u+1), and after d additional loops we end up with fib (d+u). Actually proving this invariant isn't too hard:

fibLinInvariant Z u = Refl
fibLinInvariant (S d) u =
    rewrite fibLinInvariant d (S u) in
    rewrite plusSuccRightSucc d u in

Idris simplifies the base case away for us. For the recursive case we apply the induction hypothesis to leave us with:

fibExp (plus d (S u)) = fibExp (S (plus d u))

Ignoring the fibExp we just need to prove d+(u+1) = (d+u)+1. That statement is obviously true because + is associative, but in our particular case, we use plusSuccRightSucc which is the associative lemma where +1 is the special S constructor. After that, everything has been proven.

Armed with the fundamental correctness invariant for fibLib, we can prove the complete equivalence.

fibEq : (n : Nat) -> fibLin n = fibExp n
fibEq n =
    rewrite fibLinInvariant n 0 in
    rewrite plusZeroRightNeutral n in

We appeal to the invariant linking fibLin' and finExp, do some minor arithmetic manipulation (x+0 = x), and are done. Note that depending on exactly how you define the fibLinInvariant you require different arithmetic lemmas, e.g. if you write d+u or u+d. Idris is equipped with everything required.

I was rather impressed that proving fib equivalence wasn't all that complicated, and really just requires figuring out what fundamental property makes fibLin actually work. In many ways the proof makes things clearer.