On Go 2 Generics
Last summer the Go team announced an updated design draft for adding generics to the language. Included in the announcement was a new experimentation tool, named go2go, for developers to compile, test, and run code written using the generics feature described in the latest design draft. This blog post documents my experience using the go2go tool to add support for generics to an existing PubSub package.
tl;dr: here is the code diff
go2go
The go2go tool is a working prototype. It can parse the new language syntax and transform it to a sort of “intermediate” Go 1 code, which can then be fed to the standard Go compiler and executed. I have read that it relies on monomorphization to transform code, but I can’t say that I understand what that means.
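For readers in the same boat, monomorphization generally means generating one specialized copy of a generic function or type for each set of type arguments it is used with. The sketch below is only an illustration of that idea, not output from go2go: Last, lastInt, and lastString are hypothetical names, and the code is written in the square-bracket syntax used later in this post, so it assumes a Go toolchain with type parameter support.

```go
package main

import "fmt"

// Last is the generic function as a developer would write it.
// (Hypothetical example, not taken from the pubsub package.)
func Last[T any](xs []T) T {
	return xs[len(xs)-1]
}

// lastInt and lastString are hand-written stand-ins for what a
// monomorphizing translator could emit: one plain Go 1 function
// per type argument that the program actually uses.
func lastInt(xs []int) int          { return xs[len(xs)-1] }
func lastString(xs []string) string { return xs[len(xs)-1] }

func main() {
	// Each generic call behaves like its specialized counterpart.
	fmt.Println(Last([]int{1, 2, 3}), lastInt([]int{1, 2, 3}))
	fmt.Println(Last([]string{"a", "b"}), lastString([]string{"a", "b"}))
}
```

The trade-off usually attributed to this approach is more generated code in exchange for calls that need no runtime type information.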
You can install the go2go tool now by fetching & building the latest version of the dev.go2go branch of the Go project’s source repository. There is also an online version at The go2go Playground.
Generic PubSub
There are many ways generics can be useful in a codebase, but I am particularly interested in how they can be applied to “container” code. That is, code that typically implements a specialized algorithm or data structure unrelated to the structure of the objects it holds.
PubSub is a good fit for generics for a few reasons. The need for high performance (either low latency or high throughput) meshes well with compiler optimizations that generics enable. And it is typically used in highly concurrent programs where correctness checks are critical to avoiding data races and memory bugs. Generics can lift these checks from runtime to compile time, which helps the implementer build a correct PubSub implementation.
But (in my opinion) the most important benefit is the elimination of “bookkeeping” code, which simplifies the API surface area for consumers of the PubSub library. PubSub is a particularly tricky pattern to use correctly, so any reduction in API complexity can be a major boon to developers.
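To make the runtime versus compile time point concrete, here is a minimal sketch that contrasts an interface{}-based topic with a type-parameterized one. UntypedTopic and Topic are hypothetical types invented for this illustration; they are not the API of the pubsub package, and they ignore concurrency entirely.

```go
package main

import "fmt"

// UntypedTopic is a hypothetical interface{}-based topic: every
// subscriber has to check the dynamic type of each value itself.
type UntypedTopic struct{ subs []func(interface{}) }

func (t *UntypedTopic) Sub(fn func(interface{})) { t.subs = append(t.subs, fn) }
func (t *UntypedTopic) Pub(v interface{}) {
	for _, fn := range t.subs {
		fn(v)
	}
}

// Topic is the hypothetical generic equivalent: the element type is
// fixed when the topic is declared, so mismatches fail to compile.
type Topic[T any] struct{ subs []func(T) }

func (t *Topic[T]) Sub(fn func(T)) { t.subs = append(t.subs, fn) }
func (t *Topic[T]) Pub(v T) {
	for _, fn := range t.subs {
		fn(v)
	}
}

func main() {
	var u UntypedTopic
	u.Sub(func(v interface{}) {
		s, ok := v.(string) // bookkeeping: a runtime type check
		if !ok {
			fmt.Printf("dropped value of type %T\n", v)
			return
		}
		fmt.Println("untyped:", s)
	})
	u.Pub("hello")
	u.Pub(42) // compiles fine; the mistake only shows up at runtime

	var t Topic[string]
	t.Sub(func(s string) { fmt.Println("typed:", s) })
	t.Pub("hello")
	// t.Pub(42) // rejected by the compiler: 42 is not a string
}
```

The subscriber on the generic topic needs no type assertion at all, which is the kind of bookkeeping the paragraph above refers to.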
github.com/benburkert/pubsub
The package I’ve chosen to analyze is a pretty bare bones PubSub implementation I wrote a while ago. It allows package users to publish and/or subscribe to a single stream of objects, via a function call/callback or channel send/receive. The objects can be of any type and are internally tracked as empty interface instances. There is a complicated API for adding type safe publishers and subscribers (enforced at runtime), which works with both functions and channels.
Under the hood, it is broken down into two major components: a fixed size ring buffer with a max number of readers/writers, and a PubSub layer that sits over the underlying buffer. The basic API is quite simple, but the API for typed functions and channels is not: the user has to manipulate the underlying buffer at times. It’s a leaky abstraction that I wish did not exist.
It has already proven to be quite reliable and performant in production environments, but (im)proving correctness is always a good thing. Likewise for removing potential runtime panics.
Caveats
At the moment, I’m only interested in how generics can improve the correctness of this package, its APIs, and programs that consume them. As mentioned above, go2go is only a prototype, and I don’t expect it to produce a faster or smaller binary, so I’ve skipped that analysis entirely. There will most certainly be impacts on performance (both good and bad!), but any investigation would be pure speculation until generics support moves beyond the prototype phase.
The go2go tool itself is still rough around the edges: error messages can be confusing and un-googlable, and it may require code changes to get around limitations of the tool. The go2go playground is a better place to start exploring generics; I don’t recommend the go2go tool route unless you are working on an existing, multi-file package.
Overview
I spent two Sunday afternoons on this project, or about 10 hours total. I hadn’t kept up closely with the state of the generics language feature, so this included reading parts of the proposal and getting up to speed with the new tooling and syntax.
The changes clocked in at a little over 500 LOC, with about 100 lines of added code. Almost all of the added code was in additional tests.
I started by updating the Buffer type, next the PubSub type, and finally the Publisher, Subscriber, and Context types. Nothing else required changes.
Buffer
The major change to the Buffer type was swapping out the underlying storage slice, changing it from an empty interface slice (data []interface{}) to a slice of Ts, where T is a comparable type parameter on Buffer. The comparable constraint was needed because the buffer uses an equality check (==) to test if the next position to read is marked empty (i.e. unwritten). Previously the package used a typed byte value to mark the initial data slice items as empty. But because the element type is now the type parameter T, the global empty marker trick no longer works.
To compensate, an empty value parameter was added to the NewBuffer func and an empty T field added to Buffer. The new data slice of Ts is initialized to the empty value instead of the zero value for T. Careful: the empty value cannot be written to the buffer, so using T’s zero value as the empty value may have very bad consequences if the zero value of T can be written by the program (e.g. int(0)). In practice, this shouldn’t be an issue because the PubSub type uses its own trick to avoid having the empty value of T conflict with user values.
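The empty-marker idea can be sketched roughly as follows. This is a deliberately simplified, single-goroutine stand-in and not the package's actual Buffer: it does not wrap around like a real ring buffer, has no reader or writer limits, and the method names and NewBuffer signature are assumptions made for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Buffer is a simplified sketch: fixed capacity, no wrap-around.
// T must be comparable so slots can be tested against the marker.
type Buffer[T comparable] struct {
	data  []T
	empty T   // marker for "slot not written yet"
	r, w  int // next read and write positions
}

// NewBuffer fills every slot with the caller-chosen empty marker.
func NewBuffer[T comparable](size int, empty T) *Buffer[T] {
	data := make([]T, size)
	for i := range data {
		data[i] = empty
	}
	return &Buffer[T]{data: data, empty: empty}
}

// Write refuses the empty marker, since readers could not tell it
// apart from an unwritten slot.
func (b *Buffer[T]) Write(v T) error {
	if v == b.empty {
		return errors.New("cannot write the empty value")
	}
	if b.w == len(b.data) {
		return errors.New("buffer full")
	}
	b.data[b.w] = v
	b.w++
	return nil
}

// Read reports ok == false when the next slot is still empty.
func (b *Buffer[T]) Read() (v T, ok bool) {
	if b.r == len(b.data) || b.data[b.r] == b.empty {
		return b.empty, false
	}
	v = b.data[b.r]
	b.r++
	return v, true
}

func main() {
	// -1 is the marker, so the zero value 0 stays writable.
	buf := NewBuffer(4, -1)
	fmt.Println("write 0:", buf.Write(0))
	v, ok := buf.Read()
	fmt.Println("read:", v, ok)
	_, ok = buf.Read()
	fmt.Println("read from unwritten slot ok:", ok)
	fmt.Println("write -1:", buf.Write(-1))
}
```

Had 0 been chosen as the marker here, Write(0) would be rejected, which is the hazard described above for types whose zero value is a legitimate message.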
After this small but cardinal set of changes, the rest was mostly swapping out empty interfaces for type parameters, and adding additional Buffer tests.
Since writing the latest version of this package, I have picked up the habit of mostly writing table style tests. The added tests were done in this fashion, and they are structured pretty much the same as non-generic table style tests. It feels like generics could improve these table style tests further, but it’s not immediately obvious to me. Perhaps there are improvements lurking around the corner, but for now it’s only a hope.
PubSub
Next up was the PubSub type. Unlike the Buffer type, it was important that the PubSub type be able to work on any type T, not just comparable types (i.e. [T any] vs [T comparable]). Otherwise, maps and slices would be unsupported type arguments. This requirement introduced a catch: reusing the T type parameter on PubSub for Buffer[T] would propagate Buffer’s constraint up to PubSub, meaning the T for PubSub would have to be comparable.
To get around this, I relied on the comparability of pointers: a pointer to a type with an any type parameter is always comparable. This meant adding a layer of indirection via a cell struct, which the underlying Buffer holds a slice of, instead of a slice of PubSub’s Ts. It might seem unfortunate that the solution required this extra indirection, but in actuality it doesn’t introduce any extra overhead compared to the previous version. That is because the cell type is not much different than the empty interface type that Buffer used prior. It also provided a way of reimplementing the unsubscribe mechanism.
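A minimal sketch of the indirection, assuming the buffer stores pointers to cells. The Buffer here is reduced to a bare slice, and the Pub and Values methods are hypothetical helpers added only so the example runs; none of this is the package's real code.

```go
package main

import "fmt"

// Buffer stands in for the comparable-constrained ring buffer.
type Buffer[T comparable] struct {
	data []T
}

// cell wraps a value of any type, comparable or not.
type cell[T any] struct {
	v T
}

// PubSub accepts any T because it instantiates Buffer with *cell[T],
// and pointer types always satisfy comparable.
type PubSub[T any] struct {
	buf Buffer[*cell[T]]
	// buf Buffer[T] // would not compile: T does not satisfy comparable
}

// Pub boxes the value in a cell and stores the pointer.
func (p *PubSub[T]) Pub(v T) {
	p.buf.data = append(p.buf.data, &cell[T]{v: v})
}

// Values unboxes everything published so far.
func (p *PubSub[T]) Values() []T {
	out := make([]T, 0, len(p.buf.data))
	for _, c := range p.buf.data {
		out = append(out, c.v)
	}
	return out
}

func main() {
	// Slices are not comparable, yet PubSub[[]string] is allowed.
	var ps PubSub[[]string]
	ps.Pub([]string{"a", "b"})
	ps.Pub([]string{"c"})
	fmt.Println(ps.Values())
}
```

Comparing two *cell[T] values compares pointer identity, not the wrapped contents, which is all the buffer's empty check needs.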
The API allows for the function and channel subscribers to unsubscribe by calling a function or closing a channel, respectively. In both these cases, the subscriber can signal to the callback/receiver handler to close the subscription. So each subscriber must have their own way of being signaled for shutdown.
This signal is implemented as an instance of an empty struct chan. Because a channel is comparable and equal only to itself, these channels can be used both as a per-subscriber marker and for signal delivery. Instead of pushing these channels directly into the buffer, the cell type gained a ch field to hold them. These “control” cell values can be differentiated from “data” cell values by their non-nil ch.
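The control versus data distinction might look something like the sketch below. The cell layout follows the description above, but readUntil is a hypothetical helper that scans a plain slice; the real package reads from its concurrent buffer, which is not modeled here.

```go
package main

import "fmt"

// cell carries either a data value (ch == nil) or a control signal
// addressed to one subscriber (ch != nil).
type cell[T any] struct {
	v  T
	ch chan struct{}
}

// readUntil collects data values until it meets the control cell
// whose channel is this subscriber's own stop channel. Control cells
// meant for other subscribers are skipped.
func readUntil[T any](cells []*cell[T], stop chan struct{}) []T {
	var out []T
	for _, c := range cells {
		switch {
		case c.ch == nil: // data cell
			out = append(out, c.v)
		case c.ch == stop: // our own unsubscribe marker
			return out
		}
	}
	return out
}

func main() {
	subA := make(chan struct{})
	subB := make(chan struct{})

	// A channel value is equal only to copies of the same channel.
	alias := subA
	fmt.Println(alias == subA, subA == subB)

	cells := []*cell[string]{
		{v: "hello"},
		{ch: subB}, // B unsubscribes here
		{v: "world"},
		{ch: subA}, // A unsubscribes here
		{v: "late"},
	}
	fmt.Println("A saw:", readUntil(cells, subA))
	fmt.Println("B saw:", readUntil(cells, subB))
}
```

Because channel equality is identity, no extra subscriber ID is needed: the stop channel doubles as the marker.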
The PubSub methods were then updated to swap the empty interfaces for type parameters. The only real behavioral change to a method was in PubSlice: the []T argument had to be copied into an equivalent []cell[T] before being written to the Buffer. While it is additional overhead, that overhead was likely already being paid by users, because their typed slice would have to be copied to an empty interface slice prior to calling the old PubSlice. If anything, it’s an improvement because it eliminates more bookkeeping for the user. Typically the conversion of an argument to an interface{} parameter is handled by the compiler/runtime, but that is not the case for a slice of empty interfaces.
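The two copies being compared can be sketched side by side. toCells and toInterfaces are hypothetical helper names, and the sketch assumes the buffer stores pointers to cells; the point is only that an element-by-element copy happens in both designs, just on different sides of the API.

```go
package main

import "fmt"

// cell is the wrapper type stored in the buffer (simplified).
type cell[T any] struct{ v T }

// toCells is the copy the generic PubSlice would perform internally:
// each element of the caller's []T is boxed into its own cell.
func toCells[T any](vs []T) []*cell[T] {
	cells := make([]*cell[T], len(vs))
	for i, v := range vs {
		cells[i] = &cell[T]{v: v}
	}
	return cells
}

// toInterfaces is the copy a caller of an interface{}-based PubSlice
// had to write by hand, because a []string cannot be passed where a
// []interface{} is expected.
func toInterfaces(vs []string) []interface{} {
	out := make([]interface{}, len(vs))
	for i, v := range vs {
		out[i] = v
	}
	return out
}

func main() {
	msgs := []string{"a", "b", "c"}

	// Old style: the user converts before calling the library.
	fmt.Println(len(toInterfaces(msgs)))

	// New style: the user passes msgs as is; the library converts.
	fmt.Println(len(toCells(msgs)))
}
```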
Publisher, Subscriber, & Context
With the move to type parameters for the PubSub type, there is no longer a need for the old “type safe” API: the basic API now supports compile time type safety. It’s a big improvement over the old API. The Publisher & Subscriber interfaces were deleted, along with the Context interface.
n.b. Context was unfortunately named due to being written before the context package was part of the standard library.
I don’t think I can overstate how gratifying it is to see this sort of improvement naturally fall out from a new language feature. Deleting code always feels good, but that hardly compares to removing leaky abstractions. At this point the Buffer type could be unexported, but I held off for the sake of keeping the diff readable.
The “unusable empty value” Buffer problem mentioned above is solved rather trivially: because the Buffer of a PubSub is no longer exposed via the API to users, a nil cell value can safely be used as the empty value, because there is no other opportunity for a nil cell to be written to the Buffer.
Conclusion
I struggled to grasp generics at the outset of this experiment. They are complex in a way that I haven’t encountered in a while with Go: I wasn’t sure when to reach for them and when to use another feature of Go. I didn’t have a firm understanding of “what” needed to be done until I started using the go2go tool. It was difficult, but they worked. And most importantly, I was able to solve a problem in a way that was not previously possible.
It’s not a silver bullet though. While it was surprisingly easy to replace empty interface code with type parameters, I still wished for sum types in a few places like the cell type. And the extra cognitive overhead of type parameters and constraints is real. On a minor note, I would like to see some sort of unification between the new any constraint and empty interfaces; I would rather not have to remember which one to use and where to use it.
I’m also concerned that it’s going to be overused, and thus abused, for a time after its introduction. Probably similar to the hangover of “channels everywhere” that seemed rampant in early Go code. Perhaps not though; it doesn’t seem to me to help much with code verbosity or coding in styles that are idiomatic to other languages, so maybe it won’t be used as a cudgel for code that is too clever by half.
Thanks
A big thanks to the Go team, Ian Lance Taylor and Robert Griesemer in particular, for all their hard work and dedication to making Generics happen for Go. It’s been a herculean effort and I’m excited to see where it goes from here.