Let’s stretch Go a bit and fake sum types for the sake of, well, fun.
Introduction
We can compose Algebraic Data Types (ADTs) as products (structs, records, conjunctions, etc), in Haskell:
data SocketAddress = SocketAddress { host :: Host, port :: Int }
Or sums (co-products, variants, disjunction, etc):
data Host = Hostname String | HostIp IpAddress
We construct instances of these types using constructors and deconstruct them with pattern-matching:
ping :: Host -> IO ()
ping = \case
Hostname h -> putStrLn $ "Pinging with hostname: " <> h
HostIp i -> putStrLn $ "Pinging with ip: " <> show i
We may call into ping as:
ping $ Hostname "localhost"
-- OUTPUT: Pinging with hostname: localhost
The key advantage is that an instance of Host is exactly a
Hostname or a HostIp and the choices are mutually-exclusive, always.
Moreover, it is statically impossible to access an IpAddress from a
Hostname (modulo some shenanigans). These guarantees apply to both the
implementers and users of the type. This ensures that illegal states are
unrepresentable, making the design correct by construction.
While most statically-typed languages natively support product types,
sum types are less common. We shall briefly revisit a way to fake
encode sum types in Go with a visitor.
Sums as visitation
First, we define our data types:
type Host struct {
name *Hostname
ip *HostIp
}
type Hostname struct {
Value string
}
type HostIp struct {
Value net.IPAddr
}
A Host has pointers to Hostname and HostIp, which
approximates the “either this or that” relationship. Crucially, it could
point to both simultaneously or none at all.
In isolation, a Host “is” either a Hostname or HostIp or both or
neither; nothing prevents both pointers from being set or left nil.
Let’s patch that up with a little encapsulation.
While I have decided to implement
Hostwith two pointers. We could implement it differently, e.g. two value types plus a boolean (or an enum) to disambiguate which variant we’re in. The main point stays as-is.
Instead of having the compiler enforcing the correct construction of
Host, we rely on encapsulation through (i) making name and ip
unexported fields (ii) factories. Therefore, we restrict the creation of
Host instances outside its package to these two factories:
func NewHostname(v string) Host {
return Host{name: &Hostname{v}}
}
func NewHostIp(v net.IPAddr) Host {
return Host{ip: &HostIp{v}}
}
To emphasise: The only way a user (more on that later) can
produce a Host is by calling either NewHostname or NewHostIp.
Furthermore, assuming that we have implemented both constructors
correctly: any instance of Host we receive from users is guaranteed,
by encapsulation, to represent a valid Hostname or HostIp. In
effect, the package that defines Host acts as a safe boundary,
establishing this relationship as an invariant that we, as
implementers, must uphold.
DISCLAIMER: This is not really accurate due to zero-value initialisation, e.g.
var aHost host.Hostcompiles and thus trivially violates the invariant as both pointers arenilnow. Oops! We could probably paper over it with documentation or even by makingHostitself unexported and exporting an interface instead. Although slightly safer, albeit more verbose, it still isn’t surefire.
With constructors out of the way, we’re only missing a way to
“pattern-match” on Host. For that, let’s introduce an aptly named
match function:
func match[T any](host *Host, caseName func(Hostname) T, caseIp func(HostIp) T) T {
switch {
case host.name != nil:
return caseName(*host.name)
case host.ip != nil:
return caseIp(*host.ip)
}
panic("Host must be a Hostname (via NewHostname) or a HostIp (via NewHostIp)")
}
Each variant gets its own function defining its behaviour.
The regrettable panic is there because only we as implementers
know that Host holds either a Hostname or a HostIp. The
compiler, on the other hand, is none the wiser and therefore cannot
check that invariant for us. We’ve basically moved away from
constructive to predicative
data.
matchis a function and not a method ofHostbecause at the time of writing methods cannot introduce type parameters.
For convenience, let’s add a function that matches on Host only for
its side-effects, i.e. make T effectively like void in C:
func matchDiscarding(host *Host, caseName func(Hostname), caseIp func(HostIp)) {
match(host, discarding(caseName), discarding(caseIp))
}
func discarding[T any](f func(T)) func(T) struct{} {
return func(t T) struct{} {
f(t)
return struct{}{}
}
}
With the machinery in-place, we’re finally ready to implement Ping:
func (h *Host) Ping() {
caseHostname := func(h Hostname) {
fmt.Printf("Pinging with hostname: %v", h.Value)
}
caseHostIp := func(h HostIp) {
fmt.Printf("Pinging with ip: %v", h.Value)
}
matchDiscarding(h, caseHostname, caseHostIp)
}
We specify what to do with each variant and offload the rest onto
matchDiscarding.
For exposition, I’ve used function literals. But we could (and probably should) use top-level functions.
The key advantage of this approach is that, as long as the
implementation is sound, users of Host (such as Ping or other
externally defined functions) cannot observe invalid states. From their
perspective, a Host is always either a Hostname or a HostIp, and
they can rely on pattern matching to be exhaustively checked.
Generalising into Either[L, R]
Looking into the implementation, we note that parts of it are not tied
to the specific Host type.
Let’s factor them out into a reusable, generic type for the case where
we have two variants into an Either.
package either
type Either[L, R any] struct {
left *L
right *R
}
func NewLeft[L, R any](l L) Either[L, R] {
return Either[L, R]{left: &l}
}
func NewRight[L, R any](r R) Either[L, R] {
return Either[L, R]{right: &r}
}
func Match[L, R, T any](e *Either[L, R], caseLeft func(L) T, caseRight func(R) T) T {
switch {
case e.left != nil:
return caseLeft(*e.left)
case e.right != nil:
return caseRight(*e.right)
}
panic("Either must be a Left (via NewLeft) or a Right (via NewRight)")
}
func MatchDiscarding[L, R any](e *Either[L, R], caseLeft func(L), caseRight func(R)) {
Match(e, discarding(caseLeft), discarding(caseRight))
}
func discarding[T any](f func(T)) func(T) struct{} {
return func(t T) struct{} {
f(t)
return struct{}{}
}
}
As this is just a toy, I didn’t give it much thought on pointer vs value parameters. A serious implementation should.
Then Host becomes a wrapper around Either:
type Host struct {
inner either.Either[Hostname, HostIp]
}
The constructors delegate to Either’s NewLeft and NewRight:
func NewHostname(v string) Host {
return Host{either.NewLeft[Hostname, HostIp](Hostname{v})}
}
func NewHostIp(v net.IPAddr) Host {
return Host{either.NewRight[Hostname, HostIp](HostIp{v})}
}
Finally, Ping calls into MatchDiscarding to handle the branching
logic:
func (h *Host) Ping() {
caseHostname := func(h Hostname) {
fmt.Printf("Pinging with hostname: %v", h.Value)
}
caseHostIp := func(h HostIp) {
fmt.Printf("Pinging with ip: %v", h.Value)
}
either.MatchDiscarding(&h.inner, caseHostname, caseHostIp)
}
Either[L, R] is probably fine when there are only two variants.
However, supporting more variants requires nesting (e.g.,
Either[Hostname, Either[HostIp, HostId]]), which can quickly become
unwieldy.
Conclusion
In conclusion, ADTs offer a powerful framework for modeling domain problems with type safety. Adding sum types enriches our vocabulary and reduces the likelihood of illegal states manifesting as bugs.
However, applying these concepts in languages without native support, like Go, can be challenging and may feel unnatural or non-idiomatic.
Despite the mechanics, reasoning in terms of ADTs as closed set of variants can be valuable. Even if we solve the problem using type-switches (naturally non-exhaustive) or a wholly different design, such as defining operations first and types second (often the more idiomatic approach in Go).
There’s much more to the topic, for instance the Expression Problem shows an interesting trade-off in the design space. But that’s for another day.