An encoding of a simple programming language well-typed by construction, first in Haskell and then in Rust.
Haskell is notorious for its sophisticated type system, in which we express certain properties as types and let type checking enforce them.
Consider the task of constructing an AST for a simple programming language with booleans, integers, addition, and if–then–else. We’d like to forbid certain nonsensical expressions, such as:
- adding to booleans to integers
- ifs with non-boolean conditions
- ifs whose then and else branches yield different types
Rather than implementing a separate function to check well-typedness prior to evaluation, we’d rather enforce that only well-typed expressions can ever be instantiated.
As per Typed Tagless Final Interpreters, we can employ GADTs, which, broadly speaking, enable us to assign more precise types to data constructors; for example:
data Expr v where
EBool :: Bool -> Expr Bool
EInt :: Int -> Expr Int
EAdd :: Expr Int -> Expr Int -> Expr Int
EIf :: Expr Bool -> Expr v -> Expr v -> Expr v
The main idea is to let each data constructor (EBool, EInt, EAdd,
and EIf ) pick a suitable type for v. Namely, v defines the type
of the expression produced by the data constructor, e.g. EBool is an
Expr Bool.
Attempts to violate our rules are met with type errors, say adding 10
to false:
λ> EAdd (EInt 10) (EBool False)
Triggers a type-error:
• Couldn't match type ‘Bool’ with ‘Int’
Expected: Expr Int
Actual: Expr Bool
• In the second argument of ‘EAdd’, namely ‘(EBool False)’
In the expression: EAdd (EInt 10) (EBool False)
In an equation for ‘it’: it = EAdd (EInt 10) (EBool False)
For completeness, let’s write a simple recursive evaluator:
eval :: Expr v -> v
eval (EBool v) = v
eval (EInt v) = v
eval (EAdd l r) = eval l + eval r
eval (EIf c t e) = if eval c then eval t else eval e
GHC is clever enough to use the matched data constructor to refine the
return type. So, an Expr Book evaluates to a Bool and an Expr Int
to an Int.
We could improve this much further, e.g. by constraining the
kind of the type parameter v in the definition of Expr v to valid
types. However, I’m more interested in the Rust encoding, so let’s jump
into that.
Rust encoding
Although Rust lacks GADTs, we can take inspiration from the concept and perhaps reap some benefits.
Each node in the AST will get its own type, unrelated to each other:
#[derive(Debug, Clone, Copy)]
struct EBool(bool);
#[derive(Debug, Clone, Copy)]
struct EInt(i64);
#[derive(Debug, Clone, Copy)]
struct EAdd<Left, Right>(Left, Right);
#[derive(Debug, Clone, Copy)]
struct EIf<Cond, Then, Alt> {
cond: Cond,
then: Then,
alt: Alt,
}
Note that what otherwise would be recursive elements EIf‘cond are
replaced with type parameters.
Also, although we could constrain the type parameters already,
for simplicity’s sake, let’s add the bounds only to Eval (so, it’ll be
possible instantiate invalid expressions, but not evaluate them):
// Outcome is a little extra protection to limit Eval to manipulate only bools and i64s. TODO: Make it sealed.
trait Outcome {}
impl Outcome for bool {}
impl Outcome for i64 {}
trait Eval {
type Out: Outcome;
fn eval(self) -> Self::Out;
}
Each node will implement Eval and get to pick the type of the
evaluation outcome and therefore its type:
impl Eval for EBool {
type Out = bool;
fn eval(self) -> Self::Out {
self.0
}
}
impl Eval for EInt {
type Out = i64;
fn eval(self) -> Self::Out {
self.0
}
}
impl<Left, Right> Eval for EAdd<Left, Right>
where
Left: Eval<Out = i64>,
Right: Eval<Out = i64>,
{
type Out = i64;
fn eval(self) -> Self::Out {
self.0.eval() + self.1.eval()
}
}
impl<Cond, Then, Alt, O> Eval for EIf<Cond, Then, Alt>
where
Cond: Eval<Out = bool>,
Then: Eval<Out = O>,
Alt: Eval<Out = O>,
O: Outcome,
{
type Out = O;
fn eval(self) -> Self::Out {
if self.cond.eval() {
self.then.eval()
} else {
self.alt.eval()
}
}
}
We added trait bounds to each implementation block and constrained each
node’s Eval<Out> to a suitable type.
For example,
Cond: Eval<Out = bool>, Then: Eval<Out = O>, Alt: Eval<Out = O>, O: Outcome
means that an if’s condition must evaluate to a boolean, both branches
to the same type O, and O must be a valid type (bool or i64, the
only implementers of Outcome).
We can evaluate expressions as:
let e1 = EIf {
cond: EBool(true),
then: EBool(false),
alt: EBool(true),
};
dbg!(e1.eval());
let e2 = EAdd(EInt(10), EInt(20));
dbg!(e2.eval());
Similar to Haskell, attempts to evaluate nonsensical expressions are type errors, e.g.:
let e3 = EAdd(EInt(10), EBool(false));
dbg!(e3.eval());
Triggers a type-error:
error[E0599]: the method `eval` exists for struct `EAdd<EInt, EBool>`, but its trait bounds were not satisfied
--> src/main.rs:75:9
|
4 | struct EBool(bool);
| ------------ doesn't satisfy `<EBool as Eval>::Out = i64`
...
10 | struct EAdd<Left, Right>(Left, Right);
| ------------------------ method `eval` not found for this struct because it doesn't satisfy `EAdd<EInt, EBool>: Eval`
...
75 | dbg!(e3.eval());
| ^^^^ method cannot be called on `EAdd<EInt, EBool>` due to unsatisfied trait bounds
|
note: trait bound `<EBool as Eval>::Out = i64` was not satisfied
Although there’s a lot we could improve on, it’s good enough for today.
Conclusion
We saw how we could use type-parameters and bounds to classify expressions into valid types and thus type-check them by construction.