An encoding of a simple programming language well-typed by construction, first in Haskell and then in Rust.

Haskell is notorious for its sophisticated type system, in which we express certain properties as types and let type checking enforce them.

Consider the task of constructing an AST for a simple programming language with booleans, integers, addition, and if–then–else. We’d like to forbid certain nonsensical expressions, such as:

adding to booleans to integers
ifs with non-boolean conditions
ifs whose then and else branches yield different types

Rather than implementing a separate function to check well-typedness prior to evaluation, we’d rather enforce that only well-typed expressions can ever be instantiated.

As per Typed Tagless Final Interpreters, we can employ GADTs, which, broadly speaking, enable us to assign more precise types to data constructors; for example:

data Expr v where
  EBool :: Bool -> Expr Bool
  EInt :: Int -> Expr Int
  EAdd :: Expr Int -> Expr Int -> Expr Int
  EIf :: Expr Bool -> Expr v -> Expr v -> Expr v

The main idea is to let each data constructor (EBool, EInt, EAdd, and EIf ) pick a suitable type for v. Namely, v defines the type of the expression produced by the data constructor, e.g. EBool is an Expr Bool.

Attempts to violate our rules are met with type errors, say adding 10 to false:

λ> EAdd (EInt 10) (EBool False)

Triggers a type-error:

• Couldn't match type ‘Bool’ with ‘Int’
  Expected: Expr Int
    Actual: Expr Bool
• In the second argument of ‘EAdd’, namely ‘(EBool False)’
  In the expression: EAdd (EInt 10) (EBool False)
  In an equation for ‘it’: it = EAdd (EInt 10) (EBool False)

For completeness, let’s write a simple recursive evaluator:

eval :: Expr v -> v
eval (EBool v) = v
eval (EInt v) = v
eval (EAdd l r) = eval l + eval r
eval (EIf c t e) = if eval c then eval t else eval e

GHC is clever enough to use the matched data constructor to refine the return type. So, an Expr Book evaluates to a Bool and an Expr Int to an Int.

We could improve this much further, e.g. by constraining the kind of the type parameter v in the definition of Expr v to valid types. However, I’m more interested in the Rust encoding, so let’s jump into that.

Rust encoding

Although Rust lacks GADTs, we can take inspiration from the concept and perhaps reap some benefits.

Each node in the AST will get its own type, unrelated to each other:

#[derive(Debug, Clone, Copy)]
struct EBool(bool);

#[derive(Debug, Clone, Copy)]
struct EInt(i64);

#[derive(Debug, Clone, Copy)]
struct EAdd<Left, Right>(Left, Right);

#[derive(Debug, Clone, Copy)]
struct EIf<Cond, Then, Alt> {
    cond: Cond,
    then: Then,
    alt: Alt,
}

Note that what otherwise would be recursive elements EIf‘cond are replaced with type parameters.

Also, although we could constrain the type parameters already, for simplicity’s sake, let’s add the bounds only to Eval (so, it’ll be possible instantiate invalid expressions, but not evaluate them):

// Outcome is a little extra protection to limit Eval to manipulate only bools and i64s. TODO: Make it sealed.
trait Outcome {}

impl Outcome for bool {}
impl Outcome for i64 {}

trait Eval {
    type Out: Outcome;

    fn eval(self) -> Self::Out;
}

Each node will implement Eval and get to pick the type of the evaluation outcome and therefore its type:

impl Eval for EBool {
    type Out = bool;

    fn eval(self) -> Self::Out {
        self.0
    }
}

impl Eval for EInt {
    type Out = i64;

    fn eval(self) -> Self::Out {
        self.0
    }
}

impl<Left, Right> Eval for EAdd<Left, Right>
where
    Left: Eval<Out = i64>,
    Right: Eval<Out = i64>,
{
    type Out = i64;

    fn eval(self) -> Self::Out {
        self.0.eval() + self.1.eval()
    }
}

impl<Cond, Then, Alt, O> Eval for EIf<Cond, Then, Alt>
where
    Cond: Eval<Out = bool>,
    Then: Eval<Out = O>,
    Alt: Eval<Out = O>,
    O: Outcome,
{
    type Out = O;

    fn eval(self) -> Self::Out {
        if self.cond.eval() {
            self.then.eval()
        } else {
            self.alt.eval()
        }
    }
}

We added trait bounds to each implementation block and constrained each node’s Eval<Out> to a suitable type.

For example, Cond: Eval<Out = bool>, Then: Eval<Out = O>, Alt: Eval<Out = O>, O: Outcome means that an if’s condition must evaluate to a boolean, both branches to the same type O, and O must be a valid type (bool or i64, the only implementers of Outcome).

We can evaluate expressions as:

let e1 = EIf {
    cond: EBool(true),
    then: EBool(false),
    alt: EBool(true),
};
dbg!(e1.eval());

let e2 = EAdd(EInt(10), EInt(20));
dbg!(e2.eval());    

Similar to Haskell, attempts to evaluate nonsensical expressions are type errors, e.g.:

let e3 = EAdd(EInt(10), EBool(false));
dbg!(e3.eval());

Triggers a type-error:

error[E0599]: the method `eval` exists for struct `EAdd<EInt, EBool>`, but its trait bounds were not satisfied
  --> src/main.rs:75:9
   |
 4 | struct EBool(bool);
   | ------------ doesn't satisfy `<EBool as Eval>::Out = i64`
...
10 | struct EAdd<Left, Right>(Left, Right);
   | ------------------------ method `eval` not found for this struct because it doesn't satisfy `EAdd<EInt, EBool>: Eval`
...
75 | dbg!(e3.eval());
   |         ^^^^ method cannot be called on `EAdd<EInt, EBool>` due to unsatisfied trait bounds
   |
note: trait bound `<EBool as Eval>::Out = i64` was not satisfied

Although there’s a lot we could improve on, it’s good enough for today.

Conclusion

We saw how we could use type-parameters and bounds to classify expressions into valid types and thus type-check them by construction.

Tags: haskell rust