A length-indexed Vector in Rust

Sneak peeking at Dependent Types and an encoding of length-indexed vectors in Rust to constraint consumers on operations they can perform.


It’s customary to introduce Dependent Types (types as first-class citizens that may depend on terms) with vectors (or rather linked-lists) parameterized by a type A, indexed by a type N of Peano-encoded (unary) natural numbers, and two constructors:

  • VEmpty: an empty vector with length Zero ($0$)
  • VCons: adding an element into a vector with length N gives back a vector with length Succ(N) ($N + 1$)

From there, we implement all sorts of operations and establish relationships between input/output vectors and their lengths N via constructors to provide some useful guarantees, such as:

  1. Accessing the first element of a vector is only allowed on IndexedVec<Succ<N>> for some N, i.e. it’s impossible to read from an empty vector
  2. Mapping an IndexedVec<A, N> with f: A -> B produces an IndexedVec<B, N>, i.e. mapping preserves lengths
  3. Appending an IndexedVec<A, M> onto IndexedVec<A, N> produces an IndexedVec<A, N + M
  4. Zipping two vectors is only allowed when their lengths are equal

Haskell encoding

To start off, let’s write it in Haskell with just enough GHC extensions enabled to get us somewhere “close” to dependent typing:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}

import GHC.Types (Type)

data Nat = Zero | Succ Nat deriving Show

type family Add n m where
  Add Zero     m = m -- 0 + m = m
  Add (Succ n) m = Succ (Add n m) -- (n + 1) + m = 1 + (n + m)

data IndexedVec (a :: Type) (n :: Nat) where
  VEmpty :: IndexedVec a Zero                           -- an empty vector has length 0
  VCons :: a -> IndexedVec a n -> IndexedVec a (Succ n) -- add an element to vector increments its length

deriving instance (Show a) => Show (IndexedVec a n)
ivhead :: IndexedVec a (Succ n) -> a
ivhead (VCons a _) = a
-- >>> ivhead (1 `VCons` VEmpty)

ivmap :: (a -> b) -> IndexedVec a n -> IndexedVec b n
ivmap _ VEmpty = VEmpty
ivmap f (VCons a as) = VCons (f a) (ivmap f as)
-- >>> ivmap (+10) (1 `VCons` VEmpty)

ivappend :: IndexedVec a n -> IndexedVec a m -> IndexedVec a (Add n m)
ivappend VEmpty       ys = ys
ivappend (VCons x xs) ys = VCons x (ivappend xs ys)
-- >>> ivappend (1 `VCons` VEmpty) (2 `VCons` VEmpty)

ivzip :: IndexedVec a n -> IndexedVec b n -> IndexedVec (a, b) n
ivzip VEmpty VEmpty = VEmpty
ivzip (VCons a as) (VCons b bs) = VCons (a, b) (ivzip as bs)
-- >>> ivzip (1 `VCons` VEmpty) ("a" `VCons` VEmpty)

It’s worth noting that ivhead is a total-function: we specify Succ m in the type-signature, so GHC knows that only VCons can inhabit such a type and therefore the VEmpty branch isn’t needed – consumers can’t call into this function with an empty vector. Similarly in ivzip, by pattern-matching on constructors, we refine N, such that if the left vector was VEmpty, then the right must be too (the same applies to VCons).

We could keep on implementing more functions (as I did here or even better go with a proper package like vec), but that’d require more effort and what we already have is enough to illustrate the main point:

The more correctness properties we encode into types, the more we get from the type-system.

Additionally, we’ve made it impossible to ever produce invalid vectors with respect to their lengths. For instance, an empty vector whose length isn’t Zero, or an adding vector whose length isn’t Succ(N) (N being the length of the added-to vector).

This means that an IndexedVec<A, N> is valid by construction and that’s a strong property.

We ensure both:

  • consumers can only ever see valid vectors
  • implementers of functions on vectors must obey the constraints imposed by the constructors

Now, classes of errors due to consumers misusing our API or implementers providing incorrect implementations are type errors.

By the way, there is a deeper relationship between constraints and guarantees that we might explore someday. Check this presentation Constraints liberate, liberate constraints for ideas.

Rust encoding

Although Rust doesn’t have Dependent Types and hence can’t offer the same lever of protection to consumers and implementers. We can at least get something a little close in terms of guarantees to consumers, if we’re willing to push things further with some machinery.

We shall tackle the problem as follows:

Encapsulate IndexedVec and functions over it in a module where we establish trust:

  • Inside the module, all bets are off and we’re on our own to write the implementation without any assurance: we can see everything and do anything
  • Outside the module, we can only perform operations constrained: we can only see what type-signatures allow us to see

IMPORTANT: There’s no type-system-verified relationship between type-signatures and their bodies, it’s upon us (the implementers) to ensure they hold.

In Haskell, the constructors held tight relationships between themselves and their lengths. In Rust, we use modules (with a couple of tricks) to restrict how consumers can create and modify vectors and thus artificially establish relationships roughly similar Haskell.

We can do with by having just a few selected functions as the only way to access/manipulate the internal representation of IndexedVec.

Namely, we produce an empty vector with IndexedVec<A, _>::default returning IndexedVec<A, Zero> (like VEmpty). We add an element A into an IndexedVec<A, N> with IndexedVec<A, N>::pushed returning an IndexedVec<A, Succ<N>> ( like VCons).

Additionally, all functions moves input vectors and produce new vectors (internally by simply consuming thr input). This happens because calling functions change types, e.g. from IndexedVec<A, N> to IndexedVec<A, Succ<N>> to encode the usage protocol we want, akin to the Type-state pattern.

The code looks like this:

pub mod ivec {
    use std::marker::PhantomData;

    pub trait Nat: private::Sealed {}

    #[derive(Debug)]
    pub struct Zero;
    impl private::Sealed for Zero {}
    impl Nat for Zero {}

    #[derive(Debug)]
    pub struct Succ<N: Nat>(N);
    impl<N: Nat> private::Sealed for Succ<N> {}
    impl<N: Nat> Nat for Succ<N> {}

    pub trait AddI<M: Nat>: Nat + private::Sealed {
        type Out: Nat;
    }
    impl<M: Nat> AddI<M> for Zero {
        type Out = M;
    }
    impl<N, M: Nat> AddI<M> for Succ<N>
    where
        N: AddI<M>,
    {
    }

    pub type Add<N, M> = <N as AddI<M>>::Out;

    #[derive(Debug)]
    pub struct IndexedVec<A, N: Nat> {
        pub inner: Vec<A>,
        _len: PhantomData<N>,
    }

    impl<A> Default for IndexedVec<A, Zero> {
        fn default() -> Self {
            unverified_from(Vec::default())
        }
    }

    impl<A, N: Nat> IndexedVec<A, Succ<N>> {
        pub fn first(&self) -> &A {
            self.inner.first().unwrap()
        }
    }

    impl<A, N: Nat> IndexedVec<A, N> {
        pub fn pushed(mut self, value: A) -> IndexedVec<A, Succ<N>> {
            self.inner.push(value);
            unverified_from(self.inner)
        }

        pub fn zipped<B>(self, rhs: IndexedVec<B, N>) -> IndexedVec<(A, B), N> {
            unverified_from(self.inner.into_iter().zip(rhs.inner).collect())
        }

        pub fn appended<M: Nat>(mut self, mut rhs: IndexedVec<A, M>) -> IndexedVec<A, Add<N, M>>
        where
            N: AddI<M>,
        {
            self.inner.append(&mut rhs.inner);
            unverified_from(self.inner)
        }
    }

    fn unverified_from<A, N: Nat>(v: Vec<A>) -> IndexedVec<A, N> {
        IndexedVec {
            inner: v,
            _len: PhantomData::default(),
        }
    }

    mod private {
        pub trait Sealed {}
    }
}

Inside ivec, we as implementers don’t get much from the type-system in terms of bodies of functions satisfying their type-signatures – special mention to unverified_from.

The private unverified_from produces an IndexedVec<A, N> whose length N is fully controlled by callers and therefore itself doesn’t perform any sort of checking on the length of the input Vec<A> – like I said, no assurances for implementers. But since we control who can call into that by making it private to the module, we end up with a small surface to mess and/or audit.

However, assuming our implementation is correct, type-signatures propagate as constraints to consumers and restrict the usage of the API, triggering type-errors on misuses:

fn main() {
    use ivec::*;

    let x: IndexedVec<i32, Zero> = IndexedVec::<i32, _>::default();
    // x.first(); // empty vector -> doesn't type-check.
    assert_eq!(x.pushed(1).first(), &1);

    let y: IndexedVec<(i32, char), Succ<Succ<Zero>>> = IndexedVec::default()
        .pushed(1)
        .pushed(2)
        // .pushed(30) // zipping vectors with different lengths -> doesn't type-check.
        .zipped(IndexedVec::default().pushed('a').pushed('b'));
    assert_eq!(y.inner, vec![(1, 'a'), (2, 'b')]);

    let z: IndexedVec<i32, Succ<Succ<Succ<Succ<Zero>>>>> = IndexedVec::default()
        .pushed(1)
        .pushed(2)
        .appended(IndexedVec::default().pushed(3).pushed(4));
    assert_eq!(z.inner, vec![1, 2, 3, 4]);
}

Conclusion

We’ve seen how Dependent Types extends the capabilities of a type-checker to statically verify correctness properties and how we can reap some benefits in Haskell (really, GHC with a couple of extensions) and then we’ve built an encoding in Rust (by the way, perhaps we could simplify it with generic expressions in const-context or typenum?).

For simplicity, we’ve implemented only a handful of functions. Yet they should be enough to substantiate the main point:

We can express (some) properties as types and let the type-checker verifies them.

There are limits in terms of capabilities required from the type-system and the overall usability of resulting APIs. This is likely a trade-off between several factors at play, so it’s up to us to decide when and how this is appropriate.

There’s far more to be said about Dependent Types and the whole notion of types as first-class citizens. This installment barely scratches the surface, but I hope to inspire your curiosity.

Anyway, to be fair, this was more of an excuse for me to play with Rust (shh! that’s a secret).

Tags: haskell rust
Share: X (Twitter) Facebook LinkedIn