Sneak peeking at Dependent Types and an encoding of length-indexed vectors in Rust to constraint consumers on operations they can perform.
It’s customary to introduce Dependent
Types (types as
first-class citizens that may depend on terms) with vectors (or rather
linked-lists) parameterized by a type A
, indexed by a type N
of
Peano-encoded (unary)
natural numbers, and two constructors:
VEmpty
: an empty vector with lengthZero
($0$)VCons
: adding an element into a vector with lengthN
gives back a vector with lengthSucc(N)
($N + 1$)
From there, we implement all sorts of operations and establish
relationships between input/output vectors and their lengths N
via
constructors to provide some useful guarantees, such as:
- Accessing the first element of a vector is only allowed on
IndexedVec<Succ<N>>
for someN
, i.e. it’s impossible to read from an empty vector - Mapping an
IndexedVec<A, N>
withf: A -> B
produces anIndexedVec<B, N>
, i.e. mapping preserves lengths - Appending an
IndexedVec<A, M>
ontoIndexedVec<A, N>
produces anIndexedVec<A, N + M
- Zipping two vectors is only allowed when their lengths are equal
Haskell encoding
To start off, let’s write it in Haskell with just enough GHC extensions enabled to get us somewhere “close” to dependent typing:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}
import GHC.Types (Type)
data Nat = Zero | Succ Nat deriving Show
type family Add n m where
Add Zero m = m -- 0 + m = m
Add (Succ n) m = Succ (Add n m) -- (n + 1) + m = 1 + (n + m)
data IndexedVec (a :: Type) (n :: Nat) where
VEmpty :: IndexedVec a Zero -- an empty vector has length 0
VCons :: a -> IndexedVec a n -> IndexedVec a (Succ n) -- add an element to vector increments its length
deriving instance (Show a) => Show (IndexedVec a n)
ivhead :: IndexedVec a (Succ n) -> a
ivhead (VCons a _) = a
-- >>> ivhead (1 `VCons` VEmpty)
ivmap :: (a -> b) -> IndexedVec a n -> IndexedVec b n
ivmap _ VEmpty = VEmpty
ivmap f (VCons a as) = VCons (f a) (ivmap f as)
-- >>> ivmap (+10) (1 `VCons` VEmpty)
ivappend :: IndexedVec a n -> IndexedVec a m -> IndexedVec a (Add n m)
ivappend VEmpty ys = ys
ivappend (VCons x xs) ys = VCons x (ivappend xs ys)
-- >>> ivappend (1 `VCons` VEmpty) (2 `VCons` VEmpty)
ivzip :: IndexedVec a n -> IndexedVec b n -> IndexedVec (a, b) n
ivzip VEmpty VEmpty = VEmpty
ivzip (VCons a as) (VCons b bs) = VCons (a, b) (ivzip as bs)
-- >>> ivzip (1 `VCons` VEmpty) ("a" `VCons` VEmpty)
It’s worth noting that ivhead
is a total-function: we specify
Succ m
in the type-signature, so GHC knows that only VCons
can
inhabit such a type and therefore the VEmpty
branch isn’t needed –
consumers can’t call into this function with an empty vector. Similarly
in ivzip
, by pattern-matching on constructors, we refine N
, such
that if the left vector was VEmpty
, then the right must be too (the
same applies to VCons
).
We could keep on implementing more functions (as I did here or even better go with a proper package like vec), but that’d require more effort and what we already have is enough to illustrate the main point:
The more correctness properties we encode into types, the more we get from the type-system.
Additionally, we’ve made it impossible to ever produce invalid vectors
with respect to their lengths. For instance, an empty vector whose
length isn’t Zero
, or an adding vector whose length isn’t Succ(N)
(N
being the length of the added-to vector).
This means that an IndexedVec<A, N>
is valid by construction and
that’s a strong property.
We ensure both:
- consumers can only ever see valid vectors
- implementers of functions on vectors must obey the constraints imposed by the constructors
Now, classes of errors due to consumers misusing our API or implementers providing incorrect implementations are type errors.
By the way, there is a deeper relationship between constraints and guarantees that we might explore someday. Check this presentation Constraints liberate, liberate constraints for ideas.
Rust encoding
Although Rust doesn’t have Dependent Types and hence can’t offer the same lever of protection to consumers and implementers. We can at least get something a little close in terms of guarantees to consumers, if we’re willing to push things further with some machinery.
We shall tackle the problem as follows:
Encapsulate
IndexedVec
and functions over it in a module where we establish trust:
- Inside the module, all bets are off and we’re on our own to write the implementation without any assurance: we can see everything and do anything
- Outside the module, we can only perform operations constrained: we can only see what type-signatures allow us to see
IMPORTANT: There’s no type-system-verified relationship between type-signatures and their bodies, it’s upon us (the implementers) to ensure they hold.
In Haskell, the constructors held tight relationships between themselves and their lengths. In Rust, we use modules (with a couple of tricks) to restrict how consumers can create and modify vectors and thus artificially establish relationships roughly similar Haskell.
We can do with by having just a few selected functions as the only
way to access/manipulate the internal representation of IndexedVec
.
Namely, we produce an empty vector with IndexedVec<A, _>::default
returning IndexedVec<A, Zero>
(like VEmpty
). We add an element A
into an IndexedVec<A, N>
with IndexedVec<A, N>::pushed
returning an
IndexedVec<A, Succ<N>>
( like VCons
).
Additionally, all functions moves input vectors and produce new
vectors (internally by simply consuming thr input). This happens because
calling functions change types, e.g. from IndexedVec<A, N>
to
IndexedVec<A, Succ<N>>
to encode the usage protocol we want, akin to
the Type-state
pattern.
The code looks like this:
pub mod ivec {
use std::marker::PhantomData;
pub trait Nat: private::Sealed {}
#[derive(Debug)]
pub struct Zero;
impl private::Sealed for Zero {}
impl Nat for Zero {}
#[derive(Debug)]
pub struct Succ<N: Nat>(N);
impl<N: Nat> private::Sealed for Succ<N> {}
impl<N: Nat> Nat for Succ<N> {}
pub trait AddI<M: Nat>: Nat + private::Sealed {
type Out: Nat;
}
impl<M: Nat> AddI<M> for Zero {
type Out = M;
}
impl<N, M: Nat> AddI<M> for Succ<N>
where
N: AddI<M>,
{
type Out = Succ<<N as AddI<M>>::Out>;
}
pub type Add<N, M> = <N as AddI<M>>::Out;
#[derive(Debug)]
pub struct IndexedVec<A, N: Nat> {
pub inner: Vec<A>,
_len: PhantomData<N>,
}
impl<A> Default for IndexedVec<A, Zero> {
fn default() -> Self {
unverified_from(Vec::default())
}
}
impl<A, N: Nat> IndexedVec<A, Succ<N>> {
pub fn first(&self) -> &A {
self.inner.first().unwrap()
}
}
impl<A, N: Nat> IndexedVec<A, N> {
pub fn pushed(mut self, value: A) -> IndexedVec<A, Succ<N>> {
self.inner.push(value);
unverified_from(self.inner)
}
pub fn zipped<B>(self, rhs: IndexedVec<B, N>) -> IndexedVec<(A, B), N> {
unverified_from(self.inner.into_iter().zip(rhs.inner).collect())
}
pub fn appended<M: Nat>(mut self, mut rhs: IndexedVec<A, M>) -> IndexedVec<A, Add<N, M>>
where
N: AddI<M>,
{
self.inner.append(&mut rhs.inner);
unverified_from(self.inner)
}
}
fn unverified_from<A, N: Nat>(v: Vec<A>) -> IndexedVec<A, N> {
IndexedVec {
inner: v,
_len: PhantomData::default(),
}
}
mod private {
pub trait Sealed {}
}
}
Inside ivec
, we as implementers don’t get much from the type-system in
terms of bodies of functions satisfying their type-signatures – special
mention to unverified_from
.
The private
unverified_from
produces anIndexedVec<A, N>
whose lengthN
is fully controlled by callers and therefore itself doesn’t perform any sort of checking on the length of the inputVec<A>
– like I said, no assurances for implementers. But since we control who can call into that by making it private to the module, we end up with a small surface to mess and/or audit.
However, assuming our implementation is correct, type-signatures propagate as constraints to consumers and restrict the usage of the API, triggering type-errors on misuses:
fn main() {
use ivec::*;
let x: IndexedVec<i32, Zero> = IndexedVec::<i32, _>::default();
// x.first(); // empty vector -> doesn't type-check.
assert_eq!(x.pushed(1).first(), &1);
let y: IndexedVec<(i32, char), Succ<Succ<Zero>>> = IndexedVec::default()
.pushed(1)
.pushed(2)
// .pushed(30) // zipping vectors with different lengths -> doesn't type-check.
.zipped(IndexedVec::default().pushed('a').pushed('b'));
assert_eq!(y.inner, vec![(1, 'a'), (2, 'b')]);
let z: IndexedVec<i32, Succ<Succ<Succ<Succ<Zero>>>>> = IndexedVec::default()
.pushed(1)
.pushed(2)
.appended(IndexedVec::default().pushed(3).pushed(4));
assert_eq!(z.inner, vec![1, 2, 3, 4]);
}
Conclusion
We’ve seen how Dependent Types extends the capabilities of a type-checker to statically verify correctness properties and how we can reap some benefits in Haskell (really, GHC with a couple of extensions) and then we’ve built an encoding in Rust (by the way, perhaps we could simplify it with generic expressions in const-context?).
For simplicity, we’ve implemented only a handful of functions. Yet they should be enough to substantiate the main point:
We can express (some) properties as types and let the type-checker verifies them.
There are limits in terms of capabilities required from the type-system and the overall usability of resulting APIs. This is likely a trade-off between several factors at play, so it’s up to us to decide when and how this is appropriate.
There’s far more to be said about Dependent Types and the whole notion of types as first-class citizens. This installment barely scratches the surface, but I hope to inspire your curiosity.
Anyway, to be fair, this was more of an excuse for me to play with Rust (shh! that’s a secret).