GPG signing for github & mac

I just went through a few steps to get gpg signing to work on my mac and show up on github. I wanted to quickly document the process since the instructions are a little bit scattered. All of it basically came from a few clicks on the github help pages, though, so thanks for all the info, github.

Step 1: Download and install GPG Suite
https://gpgtools.org/

Step 2: Create a gpg key via GUI, but not really?
After I followed the GPG Suite wizard to create a gpg key, it for some reason did not show up in the GPG Keychain GUI. Maybe I accidentally clicked “cancel”. I don’t know. I ended up ignoring this and moving on to the next step.

Step 3: Create a gpg key via command line.
https://help.github.com/articles/generating-a-new-gpg-key/
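For reference, the process on that page boils down to roughly these commands (the key ID below is a placeholder you'd replace with your own):

```shell
# Generate a new key pair (interactive; pick RSA and a 4096-bit key size)
gpg --gen-key

# List your secret keys to find the long key ID of the new key
gpg --list-secret-keys --keyid-format long

# Export the public key in ASCII armor, ready to paste into github
gpg --armor --export <YOUR-KEY-ID>
```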

Step 4: Add that gpg key to GPG Keychain
While following instructions on Step 3, at one point you copy the public key to your clipboard. At this point, GPG Keychain notices that a gpg key is in your clipboard and asks if you want to import it. I said yes.

Step 5: Associate this GPG Key with your account on github
https://help.github.com/articles/adding-a-new-gpg-key-to-your-github-account/

Step 6: Tell git to always sign my commits

git config --global commit.gpgsign true

Step 7: Make a commit as usual
This magically pops up a window asking for my passphrase.
I told the keychain to remember my passphrase after entering it.
On subsequent commits, it doesn’t ask me anymore.

The end. After these steps, my commits started being automagically signed, and on github they show up as “Verified”. Note that for verification to work, the email on your commits needs to match an email associated with both your GPG key and your github account. Since mine all matched, I didn’t have to worry about it.
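If your identities don't already match, the relevant knobs are git config settings. A sketch (the key ID and email here are made-up examples):

```shell
# Tell git which GPG key to sign with (example key ID, not a real one)
git config --global user.signingkey 3AA5C34371567BD2

# The commit email should match an email on both the key and your github account
git config --global user.email "you@example.com"
```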

Posted in Uncategorized | 2 Comments

Stackage LTS and GHC 8.0

The release of GHC 8.0.1 has recently been announced. Hooray! People are already asking about when LTS Haskell will include the new GHC. While I’m also excited for this to happen as soon as possible, it’s worth taking a look at what happened last time in order to set realistic expectations.

Here’s the rough timeline as observed last year:

It took about 4.5 months from the release of GHC 7.10 until its inclusion in Stackage LTS.

I’d like to see this time span shortened, but I don’t expect it to be much shorter this time around. Here’s an optimistic estimate of how I expect it to go down. This is just my personal estimate; specific timeline decisions are still being discussed amongst the Stackage curators.

* May 21: GHC 8.0.1 release announced
* June: Stackage LTS 6.0 released using GHC 7.10.3 (again)
* June: Stackage nightly switched to GHC 8.0.1
* Sept: Stackage LTS 7.0 released with GHC 8.0.1

There are a few reasons we might delay the release of LTS 7.0 with GHC 8.0.

First and foremost is the obvious: the whole package ecosystem needs to be ready. I am optimistic that this can be accomplished by September, perhaps sooner. I expect some nightly snapshots to be available in the next few weeks that will be quite usable, if not the full Stackage experience.

Another reason to delay an LTS is because each LTS lasts for a minimum of 3 months before we start the next one (except in the case of LTS 4 which we cut short due to issues with aeson-0.10). LTS 5 has been around for about 4 months, and we’re itching to make a new LTS with aeson-0.11. So we intend to release LTS 6 soon, and then it’ll be at least 3 months until LTS 7.

One more reason to delay an LTS is because each LTS is also pegged to a particular compiler version. If GHC 8.0.2 is close to being released, we’ll probably want to delay the next LTS until it is released, like we did last year with GHC 7.10.2.

I hope this sets some clear expectations and explains some of the reasons why LTS Haskell is probably going to take a couple months to adopt GHC 8.0. LTS Haskell isn’t on bleeding edge, and that’s the whole point. LTS Haskell lags a little behind the latest and greatest in order to deliver a stable and cohesive Haskell experience.

Posted in Uncategorized | 1 Comment

Stackage is reverting to aeson-0.9

Starting immediately, Stackage nightly builds will be stepping back from aeson-0.10.0.0 to aeson-0.9.0.1. Due to issues with aeson-0.10, we are planning to discontinue LTS 4. Next Sunday (2016-01-24) we will begin LTS 5, which will ship with aeson-0.9.

Under normal circumstances, the support duration of an LTS Haskell series is 3 months at minimum. We believe that, in retrospect, the inclusion of aeson-0.10 in LTS Haskell was premature, and we felt it necessary to take quick action to reverse this mistake. We anticipate that LTS 5 and onward will be supported for the usual 3+ months.

Looking forward, we hope that LTS 6 (around April/May 2016) will be able to incorporate ghc-8. Several patches for aeson can be seen in the pipeline, so we also hope that LTS 6 will be able to include a new version of aeson with all of these improvements. Aeson is a key component of the Haskell ecosystem, and we thank Bryan O’Sullivan and aeson contributors for the hard work that has gone into it. We also thank all of the downstream package authors that have been working hard to keep pace.

Posted in Uncategorized | 1 Comment

What to do with aeson-0.10?

aeson-0.10.0.0 has been out since September 2015. Aeson is widely used, and adoption of the new version was fairly slow. The changelog claims, “the API changes below should be upwards compatible from older versions of aeson. If you run into upgrade problems, please file an issue with details.”

An issue was immediately filed: Breaking changes on 0.10 regarding Null field parsing on Maybe types. (This issue remains open.)

Not long after, another issue was filed: 0.10 causes Couldn’t match expected type `Data.Text.Internal.Text’ with actual type `Data.ByteString.Builder.Internal.Builder’ (This issue was addressed fairly quickly.)

I’m not going to rehash every single issue and bugfix, but the point is that a handful of bugs have been fixed since the 0.10.0.0 release, while a few regressions remain unfixed. Moreover, the fixes that do exist have not yet been published to Hackage.

When LTS Haskell 4.0 was released with aeson-0.10, one particularly nasty regression came back into the spotlight: GHC Killed with out of memory when using generics.

Stackage curators tend to agree that including aeson-0.10.0.0 in LTS Haskell was a mistake. So, where do we go from here? We have a few options.

* Revert to aeson-0.9
* Use a “compatibility layer” package
* Wait for a patch to aeson-0.10
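In the meantime, individual projects can opt out of aeson-0.10 regardless of what the snapshot says, by adding a constraint to their cabal.config. A sketch (the exact version bound is up to you):

```
-- cabal.config: pin aeson back to the 0.9 series for this project
constraints: aeson ==0.9.0.1
```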

How have you been dealing with aeson-0.10? Did its inclusion in LTS Haskell have any impact on you and your projects? How can we improve the way we deal with situations like this in the future?

Posted in Uncategorized | Leave a comment

An informal explanation of stackage-sandbox

Works on my machine, will it work on yours?

Suppose there’s a Haskell project called stackage-cli that I’d like to share with you. It builds just fine on my machine, but will it build on yours? If we have different versions of installed Haskell packages, we might run into cabal hell trying to sort things out.

To start off, let’s first agree on using ghc-7.8.4 and cabal-install-1.20.0.3. (In this blog post, I’ll assume you are able to install these somehow. Next time I might have some more to say about how to get this far.)

Next, let’s agree to use the same package versions for the dependencies. Cabal provides a handy command to help with this: cabal freeze. This creates a file called cabal.config which lists package constraints. Here’s what I got when I did a cabal freeze on the project as built on my machine: http://lpaste.net/130815.
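In case that paste link rots, here is the general shape of a cabal freeze output — a cabal.config is just a constraints block. The package versions below are illustrative, not the actual contents of that paste:

```
constraints: aeson ==0.8.0.2,
             base ==4.7.0.2,
             bytestring ==0.10.4.0,
             text ==1.2.0.4
```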

It would be rather tedious for you to blow away your entire package database, start fresh, and install these exact versions of dependencies. Thankfully, there’s a better way.

cabal sandbox

You don’t have to blow away your whole package database just to build this project I want to share with you. You can instead create a “cabal sandbox” just for this project. Then you can install the dependencies there.

I did a cabal freeze when I was on git commit 9b68a74, so let’s check out that particular version.

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.4
$ cabal --version
cabal-install version 1.20.0.3
using version 1.20.0.0 of the Cabal library 
$ git clone git@github.com:fpco/stackage-cli
$ cd stackage-cli
$ git checkout 9b68a74
$ wget -O cabal.config http://lpaste.net/raw/130815
$ cabal sandbox init
$ cabal install --only-dependencies -j && cabal build

And now we wait for the dependencies to install. Maybe take a bathroom break, grab a coffee. This took about 8 minutes on my machine. Presumably, it worked, and we have successfully avoided a trip through cabal hell. Not bad, but we can do even better.

shared sandboxes

Local cabal projects can share local sandboxes. So, if you happen to be working on different projects that have the exact same dependency versions, you can safely share one sandbox between the various projects. If they have differing dependency versions, then you’re in cabal hell territory. You might get it to work. You might have some butterflies to battle. It’s probably easier to just have different sandboxes in that case.

Let’s check out the same project and rebuild it against the same sandbox.

$ git clone git@github.com:fpco/stackage-cli
$ cd stackage-cli
$ git checkout 9b68a74
$ cp ../cabal.config ./cabal.config
$ cabal sandbox init --sandbox ../.cabal-sandbox
Writing a default package environment file to
/home/dan/stackage-cli/stackage-cli/cabal.sandbox.config
Using an existing sandbox located at /home/dan/stackage-cli/.cabal-sandbox

See how it said “Using an existing sandbox”? That’s because we told it which sandbox to reuse.

$ cabal install --only-dependencies
Resolving dependencies...
All the requested packages are already installed:
Use --reinstall if you want to reinstall anyway.
$ cabal build

No need to reinstall the dependencies. They’re all there. The build itself took under 30 seconds on my machine.

The exact same dependencies? Really?

Suppose someone else wants your help developing a different project. It seems pretty unlikely that she and I would just happen to be using the exact same dependency versions.

Suppose you’re starting up your own project. How do you pick dependency versions that will work together? Just cabal install dependencies as they arise and cross your fingers?

Stackage

FP Complete has developed a project called Stackage.org that is meant to help with these concerns. The main service provided by Stackage.org that we will make use of in this blog post is the cabal.config files it provides. Remember how we got our cabal.config via cabal freeze, which allowed me to share with you the dependency versions that worked for me? Stackage provides cabal.config files that include dependency constraints for a very large subset of Hackage. These dependency versions are known to all work together. New builds are calculated nightly, but to provide a more stable point of reference, LTS Haskell is also available.

So here’s an idea: create a shared sandbox on your machine with only LTS Haskell packages inside. Let’s create a location for lts-2.3 and install things there. We’ll make sure to use the lts-2.3 cabal.config with our project, so that only lts-2.3 packages get installed into the shared sandbox.

$ git clone git@github.com:fpco/stackage-cli
$ cd stackage-cli
$ git checkout 9b68a74
$ mkdir -p ~/.stackage/sandboxes/ghc-7.8.4/lts-2.3/
$ wget -O cabal.config http://stackage.org/snapshot/lts-2.3/cabal.config
$ cabal sandbox init --sandbox ~/.stackage/sandboxes/ghc-7.8.4/lts-2.3/
$ cabal install --only-dependencies -j

Since this is the first time anything has been installed to that sandbox, it will take the same 8 minutes as before. But only the first time. Now, any of my other projects that can be developed against lts-2.3 can share those same dependencies! No need to reinstall the same dependencies over and over into fresh sandboxes for each project.

The stackage command line interface

Installation

The stackage-cli project provides a tool that automates some of these processes for you. Move out of these sandboxed project directories and do cabal install stackage-cli to install it to your usual place. If that fails, then use the sandbox tricks I’ve described above to install it into a sandbox:

$ mkdir sandboxed-dir
$ cd sandboxed-dir
$ wget -O cabal.config http://stackage.org/lts/cabal.config
$ cabal sandbox init
$ cabal update
$ cabal install stackage-cli

You can see where the binaries got installed with cabal exec, which lets you execute commands “in the sandbox context”:

$ cabal exec which stackage
/home/dan/sandboxed-dir/.cabal-sandbox/bin/stackage

Copy the various binaries named stackage and stackage-* onto your path, or add that bin directory to your path. Let’s review the functionality provided by the stackage command-line tool.

Managing cabal.config and a pre-configured sandbox

First off is stackage purge, which deletes your current cabal.config and prompts you to unregister everything in your sandbox. If you don’t have a sandbox configured, it will instead prompt you to unregister from your global package database.

$ stackage purge
(Sandbox) /home/dan/.stackage/sandboxes/ghc-7.8.4/lts-2.3/x86_64-linux-ghc-7.8.4-packages.conf.d
Detected 221 packages to purge from this database
Unregister 221 packages (y/n)? [default: n] 

I don’t actually want to purge that shared sandbox, so I chose the default: no. As you can see, I’ve installed a lot of packages in there that I’d rather not purge. If you have a non-shared project sandbox, you may want to purge it prior to stackage init.

Indeed, next up is stackage init. This just downloads the appropriate cabal.config for you.

$ stackage init lts-2.3

Leave off the lts-2.3 argument and it will just download the latest LTS. You can also use stackage upgrade which is simply purge followed by init.

These three commands are not particularly aware of shared sandboxes. You can use stackage sandbox to automatically set up and use LTS-based shared sandboxes.

Managing cabal.config, cabal.sandbox.config and shared sandboxes

$ stackage sandbox delete

This command deletes both your cabal.config and your cabal.sandbox.config. It doesn’t touch your actual sandbox unless you give it an argument. There are certain sandboxes that are managed by stackage sandbox, and it can help you delete them so you don’t have to go looking to see where they are.

$ stackage sandbox delete lts-2.3

I didn’t actually run that command, though, because I don’t actually want to delete my precious lts-2.3 sandbox, because it has 221 packages that I really don’t want to bother reinstalling until the next LTS release.

$ stackage sandbox init lts-2.3

This does a couple things.

  • It downloads the lts-2.3 cabal.config
  • It creates a folder for the lts-2.3 shared sandbox, if it doesn’t already exist
  • It calls cabal sandbox init with the --sandbox argument set to the lts-2.3 shared sandbox

In other words, it readies your project to be built with your shared lts-2.3 sandbox.

You can easily do a delete followed by an init by using:

$ stackage sandbox upgrade lts-2.3

This leaves your old sandbox intact, wherever it was, but replaces your cabal.config and cabal.sandbox.config with the appropriate configurations for the lts-2.3 shared sandbox (which it also sets up if necessary). stackage sandbox upgrade is the command I recommend using most of the time.

Build it again

Remember this example from before?

$ git clone git@github.com:fpco/stackage-cli
$ cd stackage-cli
$ git checkout 9b68a74
$ mkdir -p ~/.stackage/sandboxes/ghc-7.8.4/lts-2.3/
$ wget -O cabal.config http://stackage.org/snapshot/lts-2.3/cabal.config
$ cabal sandbox init --sandbox ~/.stackage/sandboxes/ghc-7.8.4/lts-2.3/
$ cabal install --only-dependencies -j

We can replace the wget and cabal sandbox init commands with stackage sandbox upgrade, and accomplish the same thing.

$ git clone git@github.com:fpco/stackage-cli
$ cd stackage-cli
$ git checkout 9b68a74
$ stackage sandbox upgrade lts-2.3
$ cabal install --only-dependencies -j

Again, since we’ve already installed this package’s deps there, there should be nothing new to install.

Summary

Let’s recap the main ideas behind stackage sandbox upgrade.

  • LTS Haskell provides a common platform of dependency versions for developers to develop against. It stays fresh enough to remain relevant, but stable enough to provide a solid point of reference.
  • Shared sandboxes allow you to develop your various projects against the same platform. No more reinstalling all-the-things for every single project.
  • stackage sandbox upgrade helps you to easily manage your shared sandboxes based on LTS Haskell

Upgrade your sandboxes to Stackage! Or if you want to customize the workflow, then mix and match whichever commands provided by stackage that you find convenient. You can even develop your own executable as a “stackage-cli plugin”. Any executable on your path with a name that starts with “stackage-” will be treated as a plugin, and you’ll notice that stackage-init, stackage-purge, stackage-upgrade, and stackage-sandbox are all simply plugins.
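As a tiny illustration of the plugin convention (the plugin name and its output here are made up): any executable whose name starts with “stackage-” becomes a subcommand.

```shell
# Create a hypothetical plugin called "stackage-hello"
cat > stackage-hello <<'EOF'
#!/bin/sh
echo "hello from a stackage plugin"
EOF
chmod +x stackage-hello

# Run it directly; with it on your PATH, `stackage hello` would invoke it
./stackage-hello
```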

How is this different than Halcyon?

Halcyon is a project focused on installation of Haskell executables. In contrast, the stackage command line tool is focused on aiding development of Haskell projects.

Future work

We still haven’t gotten rid of all of the tedium in this process yet. For example, imagine that we both remotely log in to the same machine; we could share build artifacts! But that’s unrealistic. Instead, we can agree on using the same kernel or VM; that way we can share build artifacts between machines.

That’s roughly the idea behind docker, and FP Complete is working on docker-based build tools. It’s also roughly what Halcyon does, and we want to extend this benefit to development, not just installation.

Your feedback is more than welcome on the issue tracker.

https://github.com/fpco/stackage-cli/issues

Further reading

For a more nuanced example of stackage-cli, check out the project wiki on github:

https://github.com/fpco/stackage-cli/wiki/Example

Posted in Uncategorized | 1 Comment

Similarities: Monoid, MonadPlus, Category

This is perhaps obvious to anyone who has thoroughly studied category theory, but the similarities between Monoid, MonadPlus, and Category, have really struck me lately. I’m going to take a smidgeon of artistic license to present this train of thought.

class Monoid (a :: *) where
  mempty :: a
  (<>) :: a -> a -> a

class MonadPlus (m :: * -> *) where
  mzero :: forall x. m x
  (<+>) :: forall x. m x -> m x -> m x

class Category (c :: * -> * -> *) where
  id :: forall x. c x x
  (>>>) :: forall x y z. c x y -> c y z -> c x z

These classes all come with the same set of laws.

id >>> x = x -- left identity
x >>> id = x -- right identity
(x >>> y) >>> z = x >>> (y >>> z) -- associativity

I’d like to present three structures, which correspond to some sort of “free” instance. Notice how they all have the same shape. (I added a superfluous unit to the Nat to make the parallel just a little clearer.) I put “free” in quotes because I do not claim to actually understand what this means in category theory, nor do I claim to be using that term correctly in the category theoretic sense. I’m pretty sure I’m somewhat close to that definition, though.

My “free” Monoid is the natural numbers. Inexplicably, I’m going to do something weird and hide an arbitrary value inside the succ constructor. Just pretend that “v” isn’t there if it confuses you. It’s just the nats.

data Nat where
  Zero :: Nat
  Succ :: v -> Nat -> Nat
instance Monoid Nat where
  mempty = Zero
  Zero <> y = y
  Succ v x <> y = Succ v (x <> y)

My “free” MonadPlus is homogeneous lists.

data List x where
  Nil :: List x
  Cons :: x -> List x -> List x
instance MonadPlus List where
  mzero = Nil
  Nil <+> my = my
  Cons x mx <+> my = Cons x (mx <+> my)

My “free” category is… a little more abstract than the last two. It’s extending any type relation with reflexivity and transitivity (regardless of whether the original relation includes reflexivity and transitivity).

data ReflTrans (rel :: * -> * -> *) :: * -> * -> * where
  Refl :: ReflTrans rel x x
  Trans :: rel x y -> ReflTrans rel y z -> ReflTrans rel x z
instance Category (ReflTrans rel) where
  id = Refl
  Refl >>> yz = yz
  Trans rwx xy >>> yz = Trans rwx (xy >>> yz)

Also note an added similarity between the three:

unity :: v -> Nat
unity v = Succ v Zero

singleton :: x -> List x
singleton x = Cons x Nil

liftRel :: rel x y -> ReflTrans rel x y
liftRel r = Trans r Refl

infinity :: () -> Nat
infinity () = Succ () (infinity ())

repeat :: x -> List x
repeat x = Cons x (repeat x)

wat :: rel x x -> ReflTrans rel x x
wat r = Trans r (wat r)

So what’s my point? If you erase all of the types, then the code I have written for all three of these is identical modulo alpha renaming.

data Nat            data List           data ReflTrans
Zero                Nil                 Refl
Succ v Nat          Cons x List         Trans r ReflTrans

Succ v x <> y = Succ v (x <> y)
Cons x xs <+> ys = Cons x (xs <+> ys)
Trans rwx xy >>> yz = Trans rwx (xy >>> yz)

And this raises the question: why should I write this code over and over, just to appease the type system? Is there a good way to unify these abstractions instead? How might we adjust Haskell’s type system to alleviate this redundancy?

Posted in Uncategorized | 4 Comments

Two implementations of Seers

Last time, we implemented a bowling game scorer by using a Tardis. If you aren’t yet familiar with the Tardis’s interface, then I recommend you check out the explanation on Hackage. (tl;dr it’s a State monad with get and put, except there are two streams of state, one forwards and one backwards, so there are four operations: getPast, getFuture, sendPast, and sendFuture.)

Today, we’ll take another large step in the esoteric direction, and implement a Seer by using a Tardis. Why, you ask? My response: why not? There may be some deep motivating reasons for you to study this, but I don’t pretend to know what those might be.

> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE FunctionalDependencies #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE DoRec #-}
> {-# OPTIONS_GHC -Wall #-}
> import Control.Applicative (Applicative)
> import Control.Monad (liftM)
> import Control.Monad.Fix (MonadFix, mfix)
> import Control.Monad.Trans.Class (lift)
> import Control.Monad.Trans.Tardis
> import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
> import Control.Monad.Trans.Writer (WriterT, tell, runWriterT)
> import Data.Monoid

What is a Seer?

A seer is someone who foretells the future. But how do seers know the future? Suppose you are writing a novel, and you want to devise a semi-believable “system” for how seers work. What would the rules be?

Well, rule number one for me is that in a legitimate system, all seers must agree about the future. If different seers predict different outcomes for the same future period, then there is reason to doubt such a system. I decided that in my seer system, all seers see “the whole universe”. All seers see the same thing, regardless of when or where in space and time they decide to “see” it.

Now, where does this information come from? Are there separate people that send information to these seers? My first idea was that the seer system could be a network of seers, and all information comes from within the network itself. All seers are therefore required to provide accurate information about their “present” in order to tap into the reservoir of mystical information about their past and future.

We therefore come to the main operation that I have devised for seers.

contact :: Monoid w => w -> Seer w

A seer provides their worldview in exchange for the grand worldview. The “whole” world should be of the form past <> present <> future, where present is whatever value is provided as the argument to contact.

Remember when I wondered about whether those that “see” the universe and those that “send” information about the universe might be different people? It turns out that we can easily write operations see and send in terms of contact. Or, alternatively, given see and send, we can easily write contact in terms of those.

> class (Monad m, Monoid w) => MonadSeer w m | m -> w where
>   see :: m w
>   send :: w -> m ()
>   contact :: w -> m w
>   
>   see = contact mempty
>   send w = contact w >> return ()
>   contact w = send w >> see

I’ve created a typeclass for the Seer interface, because we are going to implement a seer in two different ways.

Seer in terms of a Tardis

The Tardis allows us to both get and send messages to both the past and future. Given the timey-wimey nature of seers, a tardis seems like the perfect candidate for implementing them.

> newtype SeerT w m a = SeerT { unSeerT :: TardisT w w m a }
>                     deriving (Functor, Applicative, Monad, MonadFix)

A single contact consists of a seer getting in touch with both the past and the future. It seems only fair that this seer should share with the future his newfound knowledge of the past, and with the past his knowledge of the future. The past is inquiring the present about its (the past’s) future, which includes both the present and the future, or in other words present <> future. The future is inquiring the present about its (the future’s) past, which includes both the present and the past, or in other words, past <> present. The result of the contact is the whole universe, spanning all of time, in other words, past <> present <> future. In all cases, we want to make sure to keep things in “chronological” order.

Did you follow all of that? In short, information from the past should be sent forwards to the future, and information from the future should be sent backwards to the past. We can encode this flow of information easily using the Tardis operations:

> instance (Monoid w, MonadFix m) => MonadSeer w (SeerT w m) where
>   contact present = SeerT $ do
>     rec past <- getPast
>         sendPast (present <> future)
>         sendFuture (past <> present)
>         future <- getFuture
>     return (past <> present <> future)

Now, in order to “run” a seer operation, all we have to do is provide mempty at both ends of the time continuum, and run the tardis as usual.

> runSeerT :: (MonadFix m, Monoid w) => SeerT w m a -> m a
> runSeerT = flip evalTardisT (mempty, mempty) . unSeerT

Here is a dumb example demonstrating how it works.

> dumbExample :: MonadSeer [Int] m => m [Int]
> dumbExample = do
>   world1 <- see
>   send [world1 !! 2]
>   send [1]
>   world2 <- see
>   send [world2 !! 1]
>   world3 <- see
>   return world3
ghci> runSeerT dumbExample
  [1,1,1]

It is actually unnecessary to see more than once, since it is always the unchanging truth of past <> present <> future. The following is equivalent:

dumbExample = do
  world <- see
  send [world !! 2]
  send [1]
  send [world !! 1]
  return world

Seer in terms of a Reader/Writer

The astute observer should have noticed an odd similarity between see and ask, send and tell. They embody practically the same concept! The only nuance is that when you ask, what you will get is everything that you have tell’d, and everything you will tell. It turns out that this is quite easy to write in terms of the Reader and Writer monad transformers, which happen to be instances of MonadFix.

> newtype RWSeerT w m a = RWSeerT { unRWSeerT :: ReaderT w (WriterT w m) a }
>                       deriving (Functor, Applicative, Monad, MonadFix)

As I said before, see is simply ask, while send is simply tell. We merely lift and wrap the operations as necessary to keep the type system happy:

> instance (Monoid w, Monad m) => MonadSeer w (RWSeerT w m) where
>   see = RWSeerT ask
>   send w = RWSeerT (lift (tell w))

Now, to run a Seer built on top of a Reader/Writer pair, all we have to do is feed the results of the Writer straight back into the Reader. We accomplish this via mfix.

> runRWSeerT :: (Monoid w, MonadFix m) => RWSeerT w m a -> m a
> runRWSeerT (RWSeerT rwma) = liftM fst $
>   mfix (\ ~(_, w) -> runWriterT (runReaderT rwma w))

Here is a dumb example demonstrating that it works:

ghci> runRWSeerT dumbExample
  [1,1,1]

So why use a Tardis?

For fun, obviously!

More seriously, notice that we can “run” SeerT differently, depending on whether we implemented it with Tardis or with Reader/Writer. With Tardis, we can supply “bookends”, the further past and the further future.

> runSeerTWith :: (MonadFix m, Monoid w) => w -> w -> SeerT w m a -> m a
> runSeerTWith past future = flip evalTardisT (future, past) . unSeerT

Exercise: Predict the output of runSeerTWith [10, 11, 12] [16, 17, 18] dumbExample.

Whereas with the reader/writer pair, we can fool the seers by giving them a false reality.

> runRWSeerTWith :: (Monoid w, Monad m) => w -> RWSeerT w m a -> m a
> runRWSeerTWith falseReality (RWSeerT rwma) = liftM fst $
>   runWriterT (runReaderT rwma falseReality)

Exercise: Predict the output of runRWSeerTWith [10, 11, 12] dumbExample.

What the ramifications of these are, I really don’t know. I just follow the types, lean on the laziness, and things just seem to work in Haskell, even mystical things like time travel and seers.

Download this code and play with it! Don’t forget to cabal install tardis first.

Posted in Uncategorized | 1 Comment

Bowling on a Tardis

> {-# LANGUAGE DoRec #-}
> import Control.Monad.Tardis

A few months ago, I released the tardis package. I promised a few blog posts about it, but put it off until now. If you haven’t heard of my "tardis" package yet, then you should probably take a look at the hackage documentation I’ve already written up for Control.Monad.Tardis.

Bowling

Let’s whip up a contrived example to which Tardis is applicable. Bowling scores is one such example, because the score you have on a given frame depends on both the past score as well as up to two future throws. Any time you need to know something from both the past and the future, Tardis might be able to help.

Let’s first define a data type that captures the essence of a bowling game. A game consists of 10 "frames". Although we model a single Frame as a data type, there are special rules that apply to the final frame, so we will model it separately as LFrame.

> data BowlingGame = BowlingGame
>   { frames :: [Frame]  -- should be 9, too tedious to type restrict
>   , lastFrame :: LFrame }
> 
> data Frame = Strike
>            | Spare { firstThrow :: Int }
>            | Frame { firstThrow, secondThrow :: Int }
> 
> data LFrame = LStrike { bonus1, bonus2 :: Int }
>             | LSpare { throw1, bonus1 :: Int }
>             | LFrame { throw1, throw2 :: Int }

For details on how bowling is scored, see Wikipedia > Bowling # Scoring.

Sample data

Here’s a few games’ worth of sample bowling data.

> --    X  9/ X  X  X   81  7/  X   X   XXX
> -- 0  20 40 70 98 117 126 146 176 206 236
> -- this guy is really good.
> sampleGame = BowlingGame
>   { frames =
>     [ Strike    , Spare 9
>     , Strike    , Strike
>     , Strike    , Frame 8 1
>     , Spare 7   , Strike
>     , Strike
>     ]
>   , lastFrame = LStrike 10 10
>   }
> 
> perfectGame = BowlingGame
>   { frames = replicate 9 Strike
>   , lastFrame = LStrike 10 10
>   }
> 
> worstGame = BowlingGame
>   { frames = replicate 9 (Frame 0 0)
>   , lastFrame = LFrame 0 0
>   }
> 
> main = mapM_ (print . toScores) [sampleGame, perfectGame, worstGame]

Using a Tardis

Well now we want to write the function toScores :: BowlingGame -> [Int]. We’ll do this by stepping through each Frame and creating the appropriate score. Whenever using a Tardis, I recommend you create separate newtypes for the backwards- and forwards-travelling state so you don’t get them mixed up.

> newtype PreviousScores = PreviousScores [Int]
> newtype NextThrows = NextThrows (Int, Int)

Here I’ve chosen the newtype PreviousScores for the forwards state (because coming from the past to the present is moving "forwards" in time) and NextThrows as the backwards state (because coming from the future to the present is moving "backwards" in time).
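For reference, the primitives we’ll use from the tardis package have roughly these shapes (a sketch, shown specialized to the plain Tardis monad; the real versions live in a MonadTardis class over TardisT):

```haskell
-- Shapes of the tardis primitives, specialized to Tardis bw fw,
-- where bw travels backwards in time and fw travels forwards:
getPast    :: Tardis bw fw fw        -- receive the state sent from the past
getFuture  :: Tardis bw fw bw        -- receive the state sent from the future
sendFuture :: fw -> Tardis bw fw ()  -- send state onward to the future
sendPast   :: bw -> Tardis bw fw ()  -- send state backward to the past
evalTardis :: Tardis bw fw a -> (bw, fw) -> a  -- run, given both initial states
```

So for us, bw will be NextThrows and fw will be PreviousScores.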

> toScores :: BowlingGame -> [Int]
> toScores game = flip evalTardis initState $ go (frames game) where
>   go :: [Frame] -> Tardis NextThrows PreviousScores [Int]

First, we handle the case where we have another frame to process. We begin by assuming we have access to the next two throws (nextThrow1 and nextThrow2), as well as the previous score.

>   go (f : fs) = do
>     rec
>       let (score', throws') = case f of
>             Strike    -> (score + 10 + nextThrow1 + nextThrow2, (10, nextThrow1))
>             Spare n   -> (score + 10 + nextThrow1,              (n, 10 - n))
>             Frame n m -> (score + n + m,                        (n, m))

We need to determine the new state for each of the two streams of state. score' is determined by a combination of the previous score, the current frame, and future throws. This is the new score that we will send forwards in time. throws' is determined only by the current frame and future throws. This is the new "next two throws" that we will send backwards in time, which is why we put the current frame’s first throw as the earliest.

Now that we’ve got that figured out, we just use the tardis’s capabilities in order to retrieve and send information along its correct time stream. A good rule of thumb seems to be, if you want to get information from the past, then send the past some information first. Likewise, if you want info from the future, then send it some info first. However, I have no idea if this rule of thumb is necessary at all; the Tardis will sometimes Just Work even if you jumble it up a little.

>       sendPast $ NextThrows throws'
>       PreviousScores scores@(score : _) <- getPast
>       sendFuture $ PreviousScores (score' : scores)
>       NextThrows ~(nextThrow1, nextThrow2) <- getFuture

Great! Finally, we move on to the rest of the frames.

>     go fs

Once we run out of frames, we need to handle the last frame. There is no future to be concerned about, and we can just set up the values to be sent to the recent past via initState, so all we have to do is look at the past score, add the final frame’s score, and we’re done.

>   go [] = do
>     PreviousScores scores@(score : _) <- getPast
>     return $ (finalFrameScore + score) : scores

All that’s left is to figure out how to determine the final frame’s score, as well as the initial state. The former is easy, given the specifications of how to score a bowling game.

>   finalFrameScore = case lastFrame game of
>     LStrike b1 b2 -> 10 + b1 + b2
>     LSpare  t1 b1 -> 10 + b1
>     LFrame  t1 t2 -> t1 + t2

The "initial state" fed into a tardis is the farthest past for the forwards-travelling state, and the farthest future for the backwards-travelling state. The farthest past is a score of zero, while the farthest future is the final two throws of the game. Well, not quite: it’s the next two throws as seen from the second-to-last frame, which is to say the first two throws of the last frame. The last frame is guaranteed to consist of at least two throws. In the case of LStrike or LSpare, there are always three throws in the last frame, so the final throw is ignored. Remember, we’re sending the past its "closest" future two throws.

>   initState = (NextThrows $ case lastFrame game of
>     LStrike b1 b2 -> (10, b1)
>     LSpare t1 _b1 -> (t1, 10 - t1)
>     LFrame t1 t2  -> (t1, t2)
>     , PreviousScores [0])

And… that’s it! All we had to do was encode the rules of Bowling into a Tardis, and via some timey-wimey trickery, the Tardis assembles all of the information into a list of bowling scores, from the last frame to the first.

ghci> main
  [236,206,176,146,126,117,98,70,40,20,0]
  [300,270,240,210,180,150,120,90,60,30,0]
  [0,0,0,0,0,0,0,0,0,0,0]

Exercise: download this code, and remove the tilde (~) from line 133. What happens? Why?

Next time

Bowling was a rather simple example, to warm you up to the idea of what a Tardis is and what it can do. Next time, we’ll get even more timey-wimey by sketching out the concept of "seers" with nothing more than tardis primitives and a vague idea of some ground rules to rationally explain how you might describe a believable system of "seers" in a fictional setting.

Posted in Uncategorized | 1 Comment

My experience with Typed Racket: the negative parts


A while ago, I began a project to re-implement Racket’s web server library in Typed Racket (TR) by providing type annotations to the existing code. I began by typing the xml collection library, but after spending a lot of time on just that, I decided to drop the project. I think TR is a very impressive feat, and I really like the direction it is going, but I feel that the current state of TR is not adequate for typing large swaths of pre-existing Racket code. In this document, I will attempt to explain the roadblocks and speed bumps I encountered while working on this project.

Although I will focus on some negative aspects here, I do not wish to communicate that my overall experience with TR was bad. Quite the contrary, friends on the #racket irc channel were extremely helpful, and for every buggy type annotation that I demonstrate, there are mountains of code that are easily and effortlessly typed. Also, most of these problems could be avoided by simply starting your project with Typed Racket, rather than trying to go back to old code and type it.

Expectations

Let me start by stating what I expect, or rather, what I wish were true of Typed Racket (TR).

First, I’d like TR to be a drop-in Racket replacement for any module. I’d like to be able to take a project, pick any file written in regular Racket, add type annotations, and have it automagically work with the rest of the project.

Second, I’d like to be able to take any Racket code, and provide a type for it. Anything I write in regular Racket should somehow be typeable in TR, without modifying the code itself. Sometimes, TR type annotations are intrusive, but as long as the original logic remains identical, I consider that to be "unmodified" code. This is essentially impossible, so let me limit that a bit: any sane code that I write in Racket should be typeable by TR. Most people write code with particular types in mind, whether or not they are explicitly using type annotations. The xml collection code I was working with provided types in the comments for every function, and also provided contracts for most functions. I think it’s reasonable to expect this kind of code to be typeable.

Now with that in mind, let’s start with one of the first bugs I ran into, and as it turns out, one of the most devastating.

Regarding Structs

TR originally targeted R5RS, and used to be called "Typed Scheme". This perhaps explains why struct support isn’t quite there.

As a passing thought, let me just mention that there is an unfortunate disconnect between TR conventions and Racket struct conventions. It is typical for TR types to be capitalized (e.g. Continuation-Mark-Set). This helps visually distinguish type names, and I think it is a good convention. However, it is typical for Racket structs to be lowercased (e.g. (struct location (line char offset))). This is convenient because it makes for nice derived identifiers (e.g. location-line). However, there is a disconnect between the two conventions: TR will create a type with the same name as the struct, which means that we usually end up with a lowercase type name. This is nothing that a define-type can’t fix, but it’s annoying nonetheless.

Extending a struct

Now on to the main event. Consider these two racket files: the first provides a struct, and the second creates another struct which extends the first.

foo.rkt

#lang racket
(define-struct foo ())
(provide (struct-out foo))

bar.rkt

#lang racket
(require "foo.rkt")
(define-struct (bar foo) ())

Let’s try to convert foo.rkt to Typed Racket. We’ll simply switch define-struct to define-struct:. If that struct had fields, we would provide type annotations for those, too.

foo.rkt

#lang typed/racket
(define-struct: foo ())
(provide (struct-out foo))

That file works just fine, but now if we try to run bar.rkt, the type checker reprimands us:

Type Checker: The type of struct:foo cannot be converted to a contract in: struct:foo5

Thus was born ticket 12503.

Typing a stream consumer

On the side, I’ve been following recent iteratee conversations in the Haskell community, and wanted to write up "pipes" in TR.

A simplified version of a "pipe" is a stream consumer. It consumes an unknown number of inputs of the same type, and then produces some result. The consumer therefore has two states: "need another input", or "have a result". (Let’s ignore side effects for the sake of simplicity here.) This is easily written in Racket using structs to distinguish the two cases.

#lang racket
(require racket/match)

(struct fun (f))
(struct done (result))

; A way to run a consumer by giving it the same thing over and over
(define (repeatedly-apply consumer x)
  (match consumer
    [(fun f) (repeatedly-apply (f x) x)]
    [(done result) result]))
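For comparison with the Haskell world this idea came from, the same two-state consumer as a plain Haskell sum type (a standalone sketch; sumNext is a made-up example consumer):

```haskell
-- A stream consumer: either it needs another input (Fun),
-- or it has produced a result (Done).
data Consumer i r = Fun (i -> Consumer i r) | Done r

-- Run a consumer by feeding it the same input over and over.
repeatedlyApply :: Consumer i r -> i -> r
repeatedlyApply (Fun f)  x = repeatedlyApply (f x) x
repeatedlyApply (Done r) _ = r

-- Example consumer: sum the next n inputs.
sumNext :: Int -> Int -> Consumer Int Int
sumNext 0 acc = Done acc
sumNext n acc = Fun (\i -> sumNext (n - 1) (acc + i))
```

The Racket version distinguishes the two cases with two structs and a match; the Haskell version makes the union explicit in one datatype, which is exactly what we will try to express below with TR’s union types.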

Now we wish to type this code. TR’s union types should do the trick.

#lang typed/racket
(require racket/match)

(struct: (I R) fun ([f : (I -> (Consumer I R))]))
(struct: (R) done ([result : R]))

(define-type (Consumer I R)
  (U (fun I R)
     (done I R)))

; A way to run a consumer by giving it the same thing over and over
(: repeatedly-apply ((Consumer I R) I -> R))
(define (repeatedly-apply consumer x)
  (match consumer
    [(fun f) (repeatedly-apply (f x) x)]
    [(done result) result]))

Unfortunately, in Racket 5.3, TR says that this is a type error:

Type Checker: Structure type constructor fun applied to
non-regular arguments (g5588 R) in: (fun I R)

Fortunately, merely a few days after I created ticket 12999, a fix was patched onto HEAD. Keep up the good work, guys!

Filters

Type filters are a really cool feature of Typed Racket, and are essential to typing Racket code. Basically, whenever you use a function with a filter attached, you can refine the type information for code constructs with multiple branches, such as cond and if. See the pastebin link at the end of this section for a more detailed explanation of type filters; I’m unaware of any good official documentation on the topic.

One annoyance I ran into was being unable to control which filter a function I defined has. For example, suppose I have a function that determines whether its input is a happy char.

#lang racket
(require racket/match)

(define (happy-char? c)
  (match c
    [(or #\h #\a #\p #\y) #t]
    [_ #f]))

(andmap happy-char? (string->list "happy")) ;; => #t
(andmap happy-char? (string->list "sad"))   ;; => #f

(Coming from Haskell, I really like racket/match.)

We’d like to define a type to represent happy chars. Happily, this can be done in Typed Racket with relative ease.

#lang typed/racket

(define-type Happy-Char (U #\h #\a #\p #\y))
(define-predicate happy-char? Happy-Char)

(andmap happy-char? (string->list "happy")) ;; => #t
(andmap happy-char? (string->list "sad"))   ;; => #f

Sadly, this requires using define-predicate. Remember that one of the things I expect from TR is that all sane Racket code that I write should be typeable. If we try to type the happy-char? that I wrote originally, we’ll run into problems:

#lang typed/racket
(require racket/match)

(define-type Happy-Char (U #\h #\a #\p #\y))

(: happy-char? (Any -> Boolean : Happy-Char))
(define (happy-char? c)
  (match c
    [(or #\h #\a #\p #\y) #t]
    [_ #f]))

The type system says "no" to this.

Expected result with filter ((Happy-Char @ c) | (! Happy-Char @ c)),
got filter (Top | Top) in: (match c ((or #\h #\a #\p #\y) #t) (_ #f))

: Happy-Char is the filter part of the function type annotation: if this function produces #t, then the result is guaranteed to have type Happy-Char, and if this function produces #f, then the result is guaranteed to not have type Happy-Char. In error messages, TR expresses this as ((Happy-Char @ c) | (! (Happy-Char @ c))).

Aside: One annoyance is that there is no way for the programmer to annotate anything other than a filter of the form (definitely-yes | definitely-no); see feature request #12528 for details.

Back to the issue at hand. The main reason this is invalid is because TR is simply unaware of racket/match; filters don’t flow through branches of racket/match like you would expect them to. (This is on the long term list of goals for TR.)

However, there is a more fundamental problem with the current implementation of filters. I wrote up a Typed Racket file with comments that explain what filters are, and one of the latest limitations that I ran into: http://pastebin.com/JQ9txdrX.

Macros

Optional arguments are somewhat annoying to deal with in TR. Suppose I want to type the following function:

#lang racket
(define (foo x [y 3]) (+ x y))

TR provides the case-> type, which allows a function to have multiple arities. We can use this to type foo like so:

#lang typed/racket
(: foo (case-> (Number -> Number)
               (Number Number -> Number)))
(define (foo x [y 3]) (+ x y))

Not bad, though it’s a bit annoying to have to repeat the other arguments. Well, in Racket, when you find yourself writing the same flavor of annoying code over and over, what do you do? You write a macro!

I’m not much of a macro wiz, but here’s a function describing something like the macro I’d like to write:

#lang racket

(define (case-opt s)
 (match s [`(,args ? ,opt-args -> ,result)
  (match opt-args
   [(list)
    (append args `(-> ,result))]
   [(cons opt-first opt-rest)
    `(case->
      ,(append args `(-> ,result))
      ,(case-opt `(,(append args (list opt-first)) ? ,opt-rest -> ,result)))]
  )]))

(case-opt '((Foo Bar) ? (Baz Quux) -> End)) ;; =>
;; '(case-> (Foo Bar -> End)
;;   (case-> (Foo Bar Baz -> End)
;;            (Foo Bar Baz Quux -> End)))

Seems reasonable, right? I would like a cleaner syntax for optional args, and it seems like a straightforward desugaring of my desired syntax could be accomplished through the macro system. Alas, Typed Racket parses its type annotations before your macros are expanded, so user-defined macros cannot appear in type positions.

#lang typed/racket

(define-syntax-rule (never-mind-me t) t)

(: x (never-mind-me Integer))
(define x 3)

;; Type checker: Unbound type name never-mind-me in: never-mind-me

The define-type mechanism could cover this simple example, but isn’t flexible enough to define more complex desugarings, like case-opt. There are probably some good reasons to keep it this way, but it’s disappointing to run into restrictions like this; it just feels non-Racket-y.

Contracts

A good Racket library will often provide contracts with the functions that it exports. Contracts are basically restrictions on input and promises about output that are checked at runtime. At this point in time, I think it’s safe to say that Racket is the one true implementation of contracts, and other languages sometimes provide a dumbed down version of them.

There is a lot of overlap between contracts and a type system. If TR is to serve as a drop-in replacement for Racket, then it needs to be able to define and export contracts just like regular Racket. Remember, an important use case of TR is to type some code originally written in Racket, in such a way that said code behaves just like it used to, without having to type any code that depends on it.

Contracts can behave like predicates, and are therefore connected to the idea of type filters. Consider the following simplified code taken from collects/xml/private/xexpr.rkt:

#lang racket/base
(require racket/contract)

(define (correct-xexpr? true false x) ...)
(define (xexpr? x) (correct-xexpr? (λ () #t) (λ (exn) #f) x))
(define (validate-xexpr x) (correct-xexpr? (λ () #t) (λ (exn) (raise exn)) x))

(define xexpr/c
  (make-flat-contract
   #:name 'xexpr?
   #:projection
   (lambda (blame)
     (lambda (val)
       (with-handlers ([exn:invalid-xexpr?
                        ... ])
         (validate-xexpr val)
         val)))
   #:first-order xexpr?))

The contract xexpr/c is designed around the correct-xexpr? function. Rather than spitting out plain #t or #f values, correct-xexpr? can take two actions to run under the "true" or "false" circumstances, respectively. The "false" action must be a function that can take an exn:invalid-xexpr as input. This design allows correct-xexpr? to provide detailed custom error messages in its implementation, and the caller can choose whether to inspect the error message, or simply throw it away.

Like racket/match, contracts play a crucial role in large, well-designed Racket programs, but Typed Racket just isn’t powerful enough yet to grant the programmer the ability to customize contracts. Instead, you can only generate contracts mechanically for a given data type using define-predicate. define-predicate is an invasive change; it forces me to throw away the custom code that served the same purpose, and to modify all code that depended on custom behavior of the pre-existing code.

Conclusions

I love Typed Racket. I hope this post does not discourage you from looking into TR. I especially think that TR is well-suited to new projects that can be built from the ground up with TR in mind. The TR type system is surprisingly flexible about the programs that it can type.

Unfortunately, TR doesn’t quite cover all of Racket. It can serve you very well as Typed Scheme, but it lacks full support for and cooperation with Rackety things such as structs, pattern matching, contracts, and macros. I eagerly look forward to the day when Typed Racket fully and completely meets my expectations, and I wouldn’t be too surprised if this happened over the next few years.

Posted in Uncategorized | 3 Comments

Pipes to Conduits part 8: A comparison

In this series, we started with the simplest of Pipe implementations, and added features one by one until we reached Conduit-like functionality. Today, we’ll strip away the abort and close features not present in Conduit (the former might be considered a misfeature, though without using indexed monads it is a necessity for the latter), and compare the results. There is one major difference, which I believe illustrates a serious flaw in both implementations. I will illustrate this issue at the end of the post.

For now, walk with me through some of our old code, as we compare it side-by-side with the code from Data.Conduit.Internal (from conduit-0.5.2.2).

> {-# OPTIONS_GHC -Wall #-}
> {-# LANGUAGE NoMonomorphismRestriction #-}
> 
> module PipeConduit where
> 
> import Control.Monad.Trans.Free (FreeT(..), FreeF(..), liftF, wrap)
> 
> import Data.Void (Void, absurd)
> import Control.Monad.Trans.Class
> import Control.Monad.Trans.Resource (MonadResource, allocate, release)
> 
> import qualified Data.Conduit as C
> import qualified Data.Conduit.List as C
> 
> import qualified Control.Frame as F
> import Control.IMonad.Trans (liftU)
> import Control.IMonad.Restrict (foreverR, mapMR_, (!>=), (!>))

Helpers

> pass :: Monad m => m ()
> pass = return ()
> 
> unreachable :: Monad m => m ()
> unreachable = error "You've reached the unreachable finalizer"

The Pipe type

I’ve decided for this post to revamp and re-arrange the PipeF type. Sans close and abort, and with mild re-arrangement of the order of fields for given constructors, you should be able to tell that it is identical to the PipeF type we have worked with before.

> data PipeF l i o u m next
>   = Yield next (m ()) o
>   | Await (i -> next) (u -> next)
>   | Leftover next l

The Functor instance for this type is entirely mechanical, based on its components. We could have just as easily used -XDeriveFunctor and arrived at the same instance.
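That derived version would look like this (a standalone sketch, not part of the literate module, which writes the instance out by hand below):

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- The same PipeF, with the Functor instance derived rather than hand-written.
-- GHC maps over every occurrence of `next`, including under the function
-- arrows in Await, and leaves the finalizer (m ()) and other fields alone.
data PipeF l i o u m next
  = Yield next (m ()) o
  | Await (i -> next) (u -> next)
  | Leftover next l
  deriving Functor
```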

> instance Functor (PipeF l i o u m) where
>   fmap h (Yield next fin o) = Yield (h next) fin o
>   fmap h (Await f g) = Await (h . f) (h . g)
>   fmap h (Leftover next l) = Leftover (h next) l

> type Pipe l i o u m r =
>   FreeT (PipeF l i o u m) m r

Now compare this with the Pipe type from Data.Conduit.Internal. I’ve rearranged the order of the lines of code, and removed comments, but otherwise the code is untouched.

data Pipe l i o u m r =
    HaveOutput (Pipe l i o u m r) (m ()) o
  | NeedInput (i -> Pipe l i o u m r) (u -> Pipe l i o u m r)
  | Leftover (Pipe l i o u m r) l

  | Done r
  | PipeM (m (Pipe l i o u m r))

If you are comfortable in your grasp of the Free monad transformer, then you should be able to see that our two representations are equivalent. The Done and PipeM constructors are analogous to Return and Wrap, while HaveOutput, NeedInput, and Leftover are analogous to Yield, Await, and Leftover respectively.
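To make the correspondence concrete, here is a sketch (not part of the literate program) of a translation from our FreeT form into a direct-style type mirroring Conduit’s; CPipe, CLeftover, and toDirect are made-up names:

```haskell
-- A direct-style pipe type mirroring Data.Conduit.Internal's Pipe:
data CPipe l i o u m r
  = HaveOutput (CPipe l i o u m r) (m ()) o
  | NeedInput (i -> CPipe l i o u m r) (u -> CPipe l i o u m r)
  | CLeftover (CPipe l i o u m r) l
  | Done r
  | PipeM (m (CPipe l i o u m r))

-- Pure ~ Done, the monadic wrapping ~ PipeM, and each PipeF
-- constructor maps to its direct-style counterpart.
toDirect :: Monad m => Pipe l i o u m r -> CPipe l i o u m r
toDirect (FreeT m) = PipeM $ do
  step <- m
  return $ case step of
    Pure r                  -> Done r
    Free (Yield next fin o) -> HaveOutput (toDirect next) fin o
    Free (Await f g)        -> NeedInput (toDirect . f) (toDirect . g)
    Free (Leftover next l)  -> CLeftover (toDirect next) l
```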

I’m going to define some synonyms for FreeT and runFreeT to help illustrate the similarities in implementation.

> pipeM :: m (FreeF (PipeF l i o u m) r (Pipe l i o u m r))
>       -> Pipe l i o u m r
> pipeM m = FreeT m
> 
> runPipeM :: Pipe l i o u m r
>          -> m (FreeF (PipeF l i o u m) r (Pipe l i o u m r))
> runPipeM (FreeT m) = m

For the Conduit implementation, you could imagine analogous methods that would allow us to write the Conduit code in similar fashion to what you’ve seen here before.

pipeM :: Monad m => m (Pipe l i o u m r) -> Pipe l i o u m r
pipeM m = PipeM m

runPipeM :: Monad m => Pipe l i o u m r -> m (Pipe l i o u m r)
runPipeM (PipeM m) = m >>= runPipeM
runPipeM p = return p

> type Producer   o   m r = Pipe Void () o    () m r
> type Consumer l i u m r = Pipe l    i  Void u  m r
> type Pipeline       m r = Pipe Void () Void () m r

Working with PipeF

I’ll keep using pipeCase to maintain similarity with previous code, although without the functor composition cruft, it’s really not that bad to just use direct pattern matching.

I’ve upgraded to transformers-free-1.0 which means that Return and Wrap are now called Pure and Free respectively.

> pipeCase :: FreeF (PipeF l i o u m) r next
>  -> (r                          -> a) -- Return
>  -> (next -> l                  -> a) -- Leftover
>  -> (next -> m () -> o          -> a) -- Yield
>  -> ((i -> next) -> (u -> next) -> a) -- Await
>                                 -> a
> pipeCase (Pure r)
>   k _ _ _ = k r
> pipeCase (Free (Leftover next l))
>   _ k _ _ = k next l
> pipeCase (Free (Yield next fin o))
>   _ _ k _ = k next fin o
> pipeCase (Free (Await f g))
>   _ _ _ k = k f g

Pipe primitives

The Free monad transformer allows us to write the primitives using the convenient liftF combinator.

> awaitE :: Monad m =>        Pipe l i o u m (Either u i)
> awaitE = liftF $ Await Right Left
> 
> yield :: Monad m => o ->    Pipe l i o u m ()
> yield b = liftF $ Yield () pass b
> 
> leftover :: Monad m => l -> Pipe l i o u m ()
> leftover l = liftF $ Leftover () l
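As an aside, the more familiar Maybe-flavored await can be recovered from awaitE (a sketch; assumes Control.Monad.liftM is in scope alongside the definitions above):

```haskell
-- Treat upstream termination as Nothing, and an input as Just.
await :: Monad m => Pipe l i o u m (Maybe i)
await = liftM (either (const Nothing) Just) awaitE
```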

The Conduit implementation is a bit crufty in comparison, but obviously identical.

awaitE :: Pipe l i o u m (Either u i)
awaitE = NeedInput (Done . Right) (Done . Left)

yield :: Monad m => o -> Pipe l i o u m ()
yield = HaveOutput (Done ()) (return ())

leftover :: l -> Pipe l i o u m ()
leftover = Leftover (Done ())

Pipe composition

> (<+<) :: Monad m => Pipe Void i' o u' m r -> Pipe l i i' u m u' -> Pipe l i o u m r
> p1 <+< p2 = composeWithFinalizer pass p1 p2
> (<?<) :: Monad m => Pipe Void i' o u' m r -> Pipe l i i' u m u' -> Pipe l i o u m r
> p1 <?< p2 = composeWithFinalizer unreachable p1 p2

Conduit uses the same technique of defining <+< in terms of a "compose with finalizer" function. Well, to be honest, I stole the technique from the Conduit code, because I just couldn’t figure out how to do it on my own. However, after I got the idea from Conduit, I implemented it separately. I knew that Conduit didn’t use unreachable, but that doesn’t really change the behavior of the code. There is another important difference that I will point out. Let’s compare the code case by case.

> composeWithFinalizer :: Monad m => m ()
>                  -> Pipe Void i' o u' m r -> Pipe l i i' u m u' -> Pipe l i o u m r
> composeWithFinalizer finalizeUpstream p1 p2 = pipeM $ do
>   x1 <- runPipeM p1
>   let p1' = pipeM $ return x1
>   runPipeM $ pipeCase x1
pipe' final left right =
  case right of
    PipeM mp -> PipeM (liftM (pipe' final left) mp)

Note that one unimportant difference is that pipe' has the two pipe inputs in the opposite order of composeWithFinalizer. So left is p2 and right is p1. We both begin by casing on the downstream pipe.

>   {- Return -} (\r       -> lift finalizeUpstream >> return r)
    Done r2 -> PipeM (final >> return (Done r2))

If downstream returns, we both run the current finalizer and then return the same result.

>   {- L-over -} (\_next l -> absurd l)
    Leftover _ i -> absurd i

Obviously the same.

>   {- Yield  -} (\next finalizeDownstream o ->
>                        let (<*<) = composeWithFinalizer finalizeUpstream
>                        in wrap $ Yield
>                            (next <*< p2)
>                            (finalizeUpstream >> finalizeDownstream)
>                            o)
    HaveOutput p c o -> HaveOutput (pipe' final left p) c o

Notice that (next <*< p2) is identical to pipe' final left p, we both reuse the current finalizer for the next computation. And we both yield the o without modification. However, there is an important difference: in the yield construct, I have created a new finalizer by combining the "current" finalizeUpstream with the finalizer found inside the yield we are inspecting. This way, when control is transferred further downstream, both p1 and p2 will have a chance to be finalized. The conduit-0.5.2.2 implementation does not factor in the current upstream finalizer (instead, it just passes c along), and as I will later demonstrate, this causes undesirable behavior. I have to admit, when I saw this discrepancy, I was unsure whether I had missed something, or whether I was right. I put a lot of effort into part 5 explaining finalization, and it turns out that I was right, but not without a grave mistake of my own, which I shall also demonstrate.

Let’s press on with our comparison.

>   {- Await  -} (\f1 g1 -> pipeM $ do
>     x2 <- runPipeM p2
>     runPipeM $ pipeCase x2
    NeedInput rp rc -> upstream rp rc
  where
    upstream rp rc =
      case left of
        PipeM mp -> PipeM (liftM (\left' -> pipe' final left' right) mp)

In the event of downstream await, control shifts upstream in both implementations.

>     {- Return -} (\u'     -> g1 u' <+< return u')
        Done r1 -> pipe (Done r1) (rc r1)

In the absence of abort, we must return to the broken record technique: just keep giving the upstream result every time an upstream value is awaited. This is identical to Conduit behavior.

>     {- L-over -} (\next l -> wrap $ Leftover (p1' <?< next) l)
        Leftover left' i -> Leftover (pipe' final left' right) i

Here the only difference is that I use unreachable while Conduit just passes the current finalizer. Since it will never be reached, the behavior is the same.

>     {- Yield  -} (\next newFinalizer o ->
>                       let (<*<) = composeWithFinalizer newFinalizer
>                       in f1 o <*< next)
        HaveOutput left' final' o -> pipe' final' left' (rp o)

When upstream yields to downstream, the choice is obvious. A new upstream finalizer is provided, so we both use that.

>     {- Await  -} (\f2 g2 -> wrap $ Await
>                           (\i -> p1' <?< f2 i)
>                           (\u -> p1' <?< g2 u)))
        NeedInput left' lc -> NeedInput
          (\a -> pipe' final (left' a) right)
          (\r0 -> pipe' final (lc r0) right)

This is also the same, modulo unreachable. Notice how in our code, we had to bind p1', the pipe we got after runPipeM p1. We wouldn’t want to re-invoke those effects all over again; they should only be invoked once. The Conduit code doesn’t have to worry about that, since it partitions effects into PipeM.

> (>+>) :: Monad m => Pipe l i i' u m u' -> Pipe Void i' o u' m r -> Pipe l i o u m r
> (>+>) = flip (<+<)
> 
> infixr 9 <+<
> infixr 9 >+>

Running a pipeline

It is easy to observe that runPipe is the same.

> runPipe :: Monad m => Pipeline m r -> m r
> runPipe p = do
>   e <- runPipeM p
>   pipeCase e
>   {- Return -} (\r             -> return r)
>   {- L-over -} (\_next l       -> absurd l)
>   {- Yield  -} (\_next _fin o  -> absurd o)
>   {- Await  -} (\f _g          -> runPipe $ f ())
runPipe :: Monad m => Pipe Void () Void () m r -> m r
runPipe (PipeM mp) = mp >>= runPipe
runPipe (Done r)              = return r
runPipe (Leftover _ i)        = absurd i
runPipe (HaveOutput _ _ o)    = absurd o
runPipe (NeedInput _ c)       = runPipe (c ())

Getting rid of leftovers

The code is a little more involved here, but inspect each case and you’ll see that our implementations of injectLeftovers are also identical.

> injectLeftovers :: Monad m => Pipe i i o u m r -> Pipe l i o u m r
> injectLeftovers = go [] where
>   go ls p = pipeM $ do
>     x <- runPipeM p
>     runPipeM $ pipeCase x
>     {- Return -} (\r -> return r)
>     {- L-over -} (\next l -> go (l:ls) next)
>     {- Yield  -} (\next fin o -> wrap $ Yield (go ls next) fin o)
>     {- Await  -} (\f g -> case ls of
>       [] -> wrap $ Await (go [] . f) (go [] . g)
>       l : ls' -> go ls' (f l))
injectLeftovers :: Monad m => Pipe i i o u m r -> Pipe l i o u m r
injectLeftovers =
    go []
  where
    go ls (PipeM mp) = PipeM (liftM (go ls) mp)
    go _ (Done r) = Done r
    go ls (Leftover p l) = go (l:ls) p
    go ls (HaveOutput p c o) = HaveOutput (go ls p) c o
    go [] (NeedInput p c) = NeedInput (go [] . p) (go [] . c)
    go (l:ls) (NeedInput p _) = go ls $ p l

Adding finalizers to a pipe

cleanupP and addCleanup differ only in a matter of style: Conduit’s addCleanup finalizer takes a Bool input to determine whether termination is "normal" or "abnormal", while cleanupP takes two separate finalizers to cover the two cases. The third abort case is obviously removed with the removal of abort.

> cleanupP :: Monad m => m () -> m () -> Pipe l i o u m r
>          -> Pipe l i o u m r
> cleanupP discardedFinalize returnFinalize = go where
>   go p = pipeM $ do
>     x <- runPipeM p
>     runPipeM $ pipeCase x
addCleanup :: Monad m => (Bool -> m ()) -> Pipe l i o u m r -> Pipe l i o u m r
addCleanup cleanup (PipeM msrc) = PipeM (liftM (addCleanup cleanup) msrc)

Identical modulo pipeM/runPipeM.

>     {- Return -} (\r -> lift returnFinalize >> return r)
addCleanup cleanup (Done r) = PipeM (cleanup True >> return (Done r))

Here we see both invoke the "normal termination" finalizer.

>     {- L-over -} (\next l -> wrap $ Leftover (go next) l)
addCleanup cleanup (Leftover p i) = Leftover (addCleanup cleanup p) i

Identical.

>     {- Yield  -} (\next finalizeRest o -> wrap $
>                       Yield (go next) (finalizeRest >> discardedFinalize) o)
addCleanup cleanup (HaveOutput src close x) = HaveOutput
    (addCleanup cleanup src)
    (cleanup False >> close)
    x

Here we see both will pass along the "abnormal termination" finalizer. However, we chose to order them differently. This may be significant.

>     {- Await  -} (\f g -> wrap $ Await (go . f) (go . g))
addCleanup cleanup (NeedInput p c) = NeedInput
    (addCleanup cleanup . p)
    (addCleanup cleanup . c)

Identical.

> finallyP :: Monad m => m () -> Pipe l i o u m r -> Pipe l i o u m r
> finallyP finalize = cleanupP finalize finalize
> 
> catchP :: Monad m => m () -> Pipe l i o u m r -> Pipe l i o u m r
> catchP finalize = cleanupP finalize pass
> 
> successP :: Monad m => m () -> Pipe l i o u m r -> Pipe l i o u m r
> successP finalize = cleanupP pass finalize

I didn’t see these combinators provided by Conduit, but they are nothing more than trivial wrappers around addCleanup. I patterned bracketP after the Conduit code so it should be no surprise that they are identical modulo pipeM/runPipeM. I think my code is a touch more readable, though I cannot speak for efficiency.

> bracketP :: MonadResource m => IO a -> (a -> IO ()) -> (a -> Pipe l i o u m r)
>          -> Pipe l i o u m r
> bracketP create destroy mkPipe = do
>   (key, val) <- lift $ allocate create destroy 
>   finallyP (release key) (mkPipe val)
bracketP alloc free inside =
    PipeM start
  where
    start = do
        (key, seed) <- allocate alloc free
        return $ addCleanup (const $ release key) (inside seed)
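As an aside, the finallyP/catchP/successP trio has a familiar analogue for plain IO exceptions. This standalone sketch (my own code, using only Control.Exception, not either pipes library) mirrors their intended semantics: run a finalizer always, only on failure, or only on success.

```haskell
import Control.Exception (SomeException, evaluate, finally, onException, try)

finallyIO :: IO () -> IO a -> IO a        -- like finallyP: always
finallyIO fin act = act `finally` fin

catchIO :: IO () -> IO a -> IO a          -- like catchP: failure only
catchIO fin act = act `onException` fin

successIO :: IO () -> IO a -> IO a        -- like successP: success only
successIO fin act = do { r <- act; fin; return r }

main :: IO ()
main = do
  finallyIO (putStrLn "always") (putStrLn "action ran")
  _ <- successIO (putStrLn "success only") (return ())
  r <- try (catchIO (putStrLn "failure only") (evaluate (error "boom")))
         :: IO (Either SomeException ())
  putStrLn (either (const "caught") (const "no exception") r)
```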

Finalization and associativity of composition

Let’s explore the discrepancy in finalization.

> finallyC :: Monad m => m () -> C.Pipe l i o u m r -> C.Pipe l i o u m r
> finallyC fin = C.addCleanup (const fin)
> 
> idC :: Monad m => C.Pipe l i i u m u
> idC = C.awaitForever C.yield
> 
> printerC :: Show i => C.Pipe l i Void u IO u
> printerC = C.awaitForever $ lift . print
> 
> idMsgC :: String -> C.Pipe l i i u IO u
> idMsgC msg = finallyC (putStrLn msg) idC
> 
> takeC :: Monad m => Int -> C.Pipe l i i u m ()
> takeC 0 = return ()
> takeC n = C.awaitE >>= \ex -> case ex of
>   Left _u -> return ()
>   Right i -> C.yield i >> takeC (pred n)
> 
> testPipeC :: Show o => C.Pipe Void Int o () IO r -> IO r
> testPipeC p = C.runPipe $ printerC C.<+< p C.<+< C.sourceList [1..]

Now that we’re equipped with a few convenient ways to create pipes with finalizers, let’s see what happens when we compose three pipes together: the farthest downstream will cause termination, and the two upstream of it will both contain finalizers.

ghci> testPipeC $ (takeC 2 C.<+< idMsgC "foo") C.<+< idMsgC "bar"
  1
  2
  foo
  bar

ghci> testPipeC $ takeC 2 C.<+< (idMsgC "foo" C.<+< idMsgC "bar")
  1
  2
  foo

Where did the "bar" go? It is as I suspected: conduit-0.5.2.2 drops the up-upstream finalizers. While I certainly approve of the use of ResourceT, I’m afraid that relying on it too heavily could hide these sorts of bugs in Conduit code.

The deeply scary thing is that this demonstrates that conduit composition is not associative. It is now known that pipes with upstream results do not behave entirely like a Category, but they should nevertheless behave as much like one as possible, especially when pipes are constructed, composed, and run using only the provided primitives.

Let’s take a look at my implementation and see how it handles this situation.

> fromList :: Monad m => [o] -> Producer o m ()
> fromList = mapM_ yield
> 
> awaitForever :: Monad m => (i -> Pipe l i o u m r) -> Pipe l i o u m u
> awaitForever f = go where
>   go = awaitE >>= \ex -> case ex of
>     Left u  -> return u
>     Right i -> f i >> go
> 
> pipe :: Monad m => (i -> o) -> Pipe l i o u m u
> pipe f = awaitForever $ yield . f
> 
> idP :: Monad m => Pipe l i i u m u
> idP = pipe id
> 
> printer :: Show i => Consumer l i u IO u
> printer = awaitForever $ lift . print
> 
> idMsg :: String -> Pipe l i i u IO u
> idMsg msg = finallyP (putStrLn msg) idP
> 
> take' :: Monad m => Int -> Pipe l i i u m ()
> take' 0 = return ()
> take' n = awaitE >>= \ex -> case ex of
>   Left _u -> return ()
>   Right i -> yield i >> take' (pred n)
> 
> testPipe :: Show o => Pipe Void Int o () IO r -> IO r
> testPipe p = runPipe $ printer <+< p <+< fromList [1..]

ghci> testPipe $ (take' 2 <+< idMsg "foo") <+< idMsg "bar"
  1
  2
  foo
  bar

ghci> testPipe $ take' 2 <+< (idMsg "foo" <+< idMsg "bar")
  1
  2
  bar
  foo

Ugh! While it didn’t drop the bar finalizer (yay!), my choices for "consistency" were obviously wrong, because it still does not preserve associativity of composition.

> printerF :: Show i => F.Frame Void IO (F.M i) F.C r
> printerF = foreverR $ (F.await !>= liftU . print)
> 
> idMsgF :: String -> F.Frame i IO (F.M i) F.C r
> idMsgF msg = F.finallyF (putStrLn msg) F.idF
> 
> takeF :: Int -> F.Frame i IO (F.M i) F.C ()
> takeF 0 = F.close
> takeF n = F.await !>= F.yield !> takeF (pred n)
> 
> fromListF :: [o] -> F.Frame o IO (F.M i) F.C ()
> fromListF xs = F.close !> mapMR_ F.yield xs
> 
> testPipeF :: Show o => F.Frame o IO (F.M Int) F.C () -> IO ()
> testPipeF p = F.runFrame $ printerF F.<-< p F.<-< fromListF [1..]

ghci> testPipeF $ (takeF 2 F.<-< idMsgF "foo") F.<-< idMsgF "bar"
  1
  2
  bar
  foo

ghci> testPipeF $ takeF 2 F.<-< (idMsgF "foo" F.<-< idMsgF "bar")
  1
  2
  bar
  foo

Looks like somebody got it right. :)
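For intuition about why the Frame ordering works, here is a toy model of my own (not code from any of these libraries): if we identify each pipe with the list of finalizers it still owes, and composition always schedules upstream finalizers first, then combining finalizers is just list append, which is associative — and it reproduces the bar-before-foo order seen in both Frame runs above.

```haskell
-- Toy model: a "pipe" is the list of finalizer names it still owes,
-- and composition always schedules upstream finalizers first.
type Fins = [String]

compose :: Fins -> Fins -> Fins   -- downstream `compose` upstream
compose down up = up ++ down

main :: IO ()
main = do
  let take2 = []; foo = ["foo"]; bar = ["bar"]
  print ((take2 `compose` foo) `compose` bar)  -- ["bar","foo"]
  print (take2 `compose` (foo `compose` bar))  -- ["bar","foo"]
```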

Next time

There is no next time; that’s it, folks! Personally, I will be taking a closer look at the order of finalizers; hopefully we can pick an order that always preserves the associativity of composition and patch it into the next version of conduit!

There are still a lot of interesting options to explore when it comes to implementing pipes.

You can play with this code for yourself by downloading PipeConduit.lhs.
