On Go 2 Generics

Posted on Jan 29, 2021

Last summer the Go team announced an updated design draft for adding generics to the language. Included in the announcement was a new experimentation tool, named go2go, for developers to compile, test, and run code written using the generics feature described in the latest design draft. This blog post documents my experience using the go2go tool to add support for generics to an existing PubSub package.

tl;dr: here is the code diff

go2go

The go2go tool is a working prototype. It can parse the new language syntax and transform it to a sort of “intermediate” Go 1 code, which can then be fed to the standard Go compiler and executed. I have read that it relies on monomorphization to transform code, but I can’t say that I understand what that means.

You can install the go2go tool now by fetching & building the latest version of the dev.go2go branch of the Go project’s source repository. There is also an online version at The go2go Playground.

Generic PubSub

There are many ways generics can be useful in a codebase, but I am particularly interested in how they can be applied to “container” code. That is, code that typically implements a specialized algorithm or data structure unrelated to the structure of the objects it holds.

PubSub is a good fit for generics for a few reasons. The need for high performance (either low latency or high throughput) meshes well with compiler optimizations that generics enable. And it is typically used in highly concurrent programs where correctness checks are critical to avoiding data races and memory bugs. Generics can lift these checks from runtime to compile time, which helps the implementer build a correct PubSub implementation.

But (in my opinion) the most important benefit is the elimination of “book keeping” code, which shrinks the API surface area for consumers of the PubSub library. PubSub is a particularly tricky pattern to use correctly, so any reduction in API complexity can be a major boon to developers.

github.com/benburkert/pubsub

The package I’ve chosen to analyze is a pretty bare bones PubSub implementation I wrote a while ago. It allows package users to publish and/or subscribe to a single stream of objects, via a function call/callback or channel send/receive. The objects can be of any type and are internally tracked as empty interface instances. There is a complicated API for adding type safe publishers and subscribers (enforced at runtime), which works with both functions and channels.

Under the hood, it is broken down into two major components: a fixed size ring buffer with a max number of readers/writers, and a PubSub layer that sits over the underlying buffer. The basic API is quite simple, but the API for typed functions and channels is not: the user has to manipulate the underlying buffer at times. It’s a leaky abstraction that I wish did not exist.

It has already proven to be quite reliable and performant in production environments, but (im)proving correctness is always a good thing. Likewise for removing potential runtime panics.

Caveats

At the moment, I’m only interested in how generics can improve the correctness of this package, its APIs, and programs that consume them. As mentioned above, go2go is only a prototype, and I don’t expect it to produce a faster or smaller binary, so I’ve skipped that analysis entirely. There will most certainly be impacts on performance (both good and bad!), but any investigation would be pure speculation until generics support moves beyond the prototype phase.

The go2go tool itself is still rough around the edges: error messages can be confusing and un-googlable. And it may require code changes to get around limitations of the tool. The go2go playground is a better place to start exploring with generics; I don’t recommend the go2go tool route unless you are working on an existing, multi-file package.

Overview

I spent two Sunday afternoons on this project, or about 10 hours total. I hadn’t kept up closely with the state of the generics language feature, so this included reading parts of the proposal and getting up to speed with the new tooling and syntax.

The changes clocked in at a little over 500 LOC, with about 100 lines of added code. Almost all of the added code was in additional tests.

I started by updating the Buffer type, next the PubSub type, and finally the Publisher, Subscriber, and Context types. Nothing else required changes.

Buffer

The major change to the Buffer type was swapping out the underlying storage slice, changing it from an empty interface slice (data []interface{}) to a slice of Ts, where T is a comparable type parameter on Buffer. The comparable constraint was needed because the buffer uses an equality check (==) to test if the next position to read is marked empty (i.e. unwritten). Previously the package used a typed byte value to mark the initial data slice items as empty. But because the element type is now a type parameter, the global empty marker trick no longer works.

To compensate, an empty value parameter was added to the NewBuffer func and an empty T field added to Buffer. The new data slice of Ts is filled with the empty value instead of the zero value for T. Careful: the empty value cannot be written to the buffer, so using T's zero value as the empty value may have very bad consequences if the program can also write that zero value (e.g. int(0)). In practice, this shouldn’t be an issue because the PubSub type uses its own trick to keep the empty value of T from conflicting with user values.
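
A minimal sketch of the idea, assuming a much simplified single-goroutine Buffer. The Write and Read methods and their signatures are illustrative guesses, not the package's actual API. It also shows why picking the zero value as the empty marker is dangerous.

```go
package main

import "fmt"

// Buffer is a simplified, single-goroutine stand-in for the real ring buffer.
type Buffer[T comparable] struct {
	data  []T
	empty T // marker for slots that have not been written yet
}

// NewBuffer fills every slot with the caller-supplied empty marker.
func NewBuffer[T comparable](size int, empty T) *Buffer[T] {
	b := &Buffer[T]{data: make([]T, size), empty: empty}
	for i := range b.data {
		b.data[i] = empty
	}
	return b
}

// Write stores v at pos, wrapping around the ring.
func (b *Buffer[T]) Write(pos int, v T) { b.data[pos%len(b.data)] = v }

// Read returns the value at pos and whether the slot has been written.
// The comparison against the marker is why T must be comparable.
func (b *Buffer[T]) Read(pos int) (T, bool) {
	v := b.data[pos%len(b.data)]
	return v, v != b.empty
}

func main() {
	b := NewBuffer[int](4, -1) // -1 is reserved as the empty marker
	b.Write(0, 0)              // the zero value is safe to write here
	v, ok := b.Read(0)
	fmt.Println(v, ok) // 0 true
	_, ok = b.Read(1)
	fmt.Println(ok) // false

	// Using the zero value as the marker makes a written 0 look unwritten.
	z := NewBuffer[int](4, 0)
	z.Write(0, 0)
	_, ok = z.Read(0)
	fmt.Println(ok) // false
}
```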

After this small but cardinal set of changes, the rest was mostly swapping out empty interfaces for type parameters, and adding additional Buffer tests. Since writing the latest version of this package, I have picked up the habit of mostly writing table style tests. The added tests were done in this fashion, and they are structured pretty much the same as non-generic table style tests. It feels like generics could improve these table style tests further, but it’s not immediately obvious to me. Perhaps there are improvements lurking around the corner, but for now it’s only a hope.

PubSub

Next up was the PubSub type. Unlike the Buffer type, it was important that the PubSub type be able to work on any type T, not just comparable types (i.e. [T any] rather than [T comparable]). Otherwise, maps and slices could not be used as type arguments. This requirement introduced a catch: reusing PubSub’s T type parameter for Buffer[T] would propagate Buffer’s constraint upward, meaning the type parameter for PubSub would have to be comparable as well.

To get around this, I relied on the comparability of pointers: a pointer is always comparable, even when it points to a type whose type parameter is only constrained by any. This meant adding a layer of indirection via a cell struct: the underlying Buffer holds pointers to cells instead of the Ts passed to PubSub. It might seem unfortunate that the solution required this extra indirection, but in actuality it doesn’t introduce any extra overhead compared to the previous version. That is because the cell type is not much different than the empty interface type that Buffer used prior. And it also provided a way of reimplementing the unsubscribe mechanism.
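
The pointer trick can be sketched as follows. The cell name comes from the package, but its fields here are guesses, and indexOf is a hypothetical helper that stands in for any code with a comparable constraint, such as Buffer.

```go
package main

import "fmt"

// cell wraps a value of any type; the val field is an assumption.
type cell[T any] struct {
	val T
}

// indexOf only compiles for comparable element types, much like Buffer.
func indexOf[E comparable](items []E, target E) int {
	for i, it := range items {
		if it == target {
			return i
		}
	}
	return -1
}

func main() {
	// []int is not comparable, so indexOf cannot be used on a [][]int.
	// indexOf([][]int{{1}}, []int{1}) // would not compile

	// *cell[[]int] is a pointer, and pointers are always comparable.
	a := &cell[[]int]{val: []int{1, 2}}
	b := &cell[[]int]{val: []int{1, 2}}
	cells := []*cell[[]int]{a, b}

	var none *cell[[]int]             // a nil cell pointer
	fmt.Println(indexOf(cells, b))    // 1: matched by pointer identity, not contents
	fmt.Println(indexOf(cells, none)) // -1: no nil cell in the slice
}
```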

The API allows for the function and channel subscribers to unsubscribe by calling a function or closing a channel, respectively. In both these cases, the subscriber can signal to the callback/receiver handler to close the subscription. So each subscriber must have their own way of being signaled for shutdown.

This signal is implemented as an instance of an empty struct chan. Because a channel is comparable and equal only to itself, these channels can be used both as a per-subscriber marker and for signal delivery. Instead of pushing these channels directly into the buffer, the cell type gained a ch field to hold them. These “control” cell values are differentiated from “data” cell values by their non-nil ch.
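
Here is a rough sketch of how control and data cells might be told apart. The deliver function is hypothetical, and it only demonstrates the marker half of the idea (channel identity), not the signal delivery or the concurrent buffer reads.

```go
package main

import "fmt"

// cell is either a data cell (ch == nil) or a control cell (ch != nil).
type cell[T any] struct {
	val T
	ch  chan struct{}
}

// deliver walks cells on behalf of one subscriber and stops at that
// subscriber's own control cell; control cells for others are skipped.
func deliver[T any](cells []*cell[T], stop chan struct{}) []T {
	var out []T
	for _, c := range cells {
		switch {
		case c.ch == nil: // data cell
			out = append(out, c.val)
		case c.ch == stop: // this subscriber's unsubscribe marker
			return out
		}
	}
	return out
}

func main() {
	mine := make(chan struct{})
	other := make(chan struct{})
	cells := []*cell[string]{
		{val: "a"},
		{ch: other}, // someone else unsubscribed
		{val: "b"},
		{ch: mine}, // this subscriber unsubscribed
		{val: "c"},
	}
	fmt.Println(deliver(cells, mine)) // [a b]
}
```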

The PubSub methods were then updated to drop the empty interfaces for type parameters. The only real behavioral change to a method was in PubSlice: the []T argument had to be copied into an equivalent []cell[T] before being written to the Buffer. While it is additional overhead, that overhead was likely already being performed by users, because their typed slice would have to be copied to an empty interface slice prior to calling the old PubSlice. If anything, it’s an improvement because it eliminates more book-keeping for the user. Typically the conversion to interface{} parameters is handled by the compiler/runtime, but not for a slice of empty interfaces.
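
Both copies can be sketched side by side. The wrap and toInterfaces helpers are hypothetical names, and storing pointers to cells (rather than cell values) is an assumption here.

```go
package main

import "fmt"

type cell[T any] struct{ val T }

// wrap copies a typed slice into a slice of cell pointers, the kind of
// conversion a generic PubSlice would now perform internally.
func wrap[T any](vals []T) []*cell[T] {
	cells := make([]*cell[T], len(vals))
	for i, v := range vals {
		cells[i] = &cell[T]{val: v}
	}
	return cells
}

// toInterfaces is the copy that callers of an []interface{} API had to write
// by hand, because a []string cannot be converted to []interface{} directly.
func toInterfaces(vals []string) []interface{} {
	out := make([]interface{}, len(vals))
	for i, v := range vals {
		out[i] = v
	}
	return out
}

func main() {
	vals := []string{"x", "y"}
	cells := wrap(vals)
	fmt.Println(len(cells), cells[1].val) // 2 y
	fmt.Println(len(toInterfaces(vals)))  // 2
}
```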

Publisher, Subscriber, & Context

With the move to type parameters for the PubSub type, there is no longer a need for the old “type safe” API: the basic API now supports compile time type safety. It’s a big improvement over the old API. The Publisher & Subscriber interfaces were deleted, along with the Context interface.

n.b. Context was unfortunately named due to being written before the context package was part of the standard library.

I don’t think I can overstate how gratifying it is to see this sort of improvement naturally fall out from a new language feature. Deleting code always feels good, but that hardly compares to removing leaky abstractions. At this point the Buffer type could be unexported, but I held off for the sake of keeping the diff readable.

The “unusable empty value” Buffer problem mentioned above is solved rather trivially: because the Buffer of a PubSub is no longer exposed via the API to users, a nil cell value can safely be used as the empty value, because there is no other opportunity for a nil cell to be written to the Buffer.
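
A minimal sketch of that arrangement, using a toy buffer rather than the real ring buffer. The New, Pub, and At names are placeholders, not the package's API.

```go
package main

import "fmt"

type cell[T any] struct{ val T }

// buffer is a tiny stand-in for the ring buffer; empty marks unwritten slots.
type buffer[E comparable] struct {
	data  []E
	empty E
}

func newBuffer[E comparable](size int, empty E) *buffer[E] {
	b := &buffer[E]{data: make([]E, size), empty: empty}
	for i := range b.data {
		b.data[i] = empty
	}
	return b
}

// PubSub accepts any T because only *cell[T] has to be comparable.
type PubSub[T any] struct {
	buf  *buffer[*cell[T]]
	next int
}

func New[T any](size int) *PubSub[T] {
	// nil is the empty marker: Pub always allocates, so it never writes nil.
	return &PubSub[T]{buf: newBuffer[*cell[T]](size, nil)}
}

// Pub wraps v in a freshly allocated cell and stores it in the next slot.
func (p *PubSub[T]) Pub(v T) {
	p.buf.data[p.next%len(p.buf.data)] = &cell[T]{val: v}
	p.next++
}

// At returns the value at pos and whether that slot has been written.
func (p *PubSub[T]) At(pos int) (T, bool) {
	c := p.buf.data[pos%len(p.buf.data)]
	if c == p.buf.empty {
		var zero T
		return zero, false
	}
	return c.val, true
}

func main() {
	ps := New[[]int](4) // slices are not comparable, yet this compiles
	ps.Pub(nil)         // even a nil slice is a legitimate published value
	v, ok := ps.At(0)
	fmt.Println(v == nil, ok) // true true
	_, ok = ps.At(1)
	fmt.Println(ok) // false
}
```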

Conclusion

I struggled to grasp generics at the outset of this experiment. They are complex in a way that I haven’t encountered in a while with Go: I wasn’t sure when to reach for them and when to use another feature of Go. I didn’t have a firm understanding of “what” needed to be done until I started using the go2go tool. It was difficult, but they worked. And most importantly, I was able to solve a problem in a way that was not previously possible.

It’s not a silver bullet though. While it was surprisingly easy to replace empty interface code with type parameters, I still wished for sum types in a few places like the cell type. And the extra cognitive overhead of type parameters and constraints is real. On a minor note, I would like to see some sort of unification between the new any constraint and empty interfaces; I would rather not have to remember which one to use and where to use it.

I’m also concerned that it’s going to be overused, and thus abused, for a time after its introduction. Probably similar to the hangover of “channels everywhere” that seemed rampant in early Go code. Perhaps not though; it doesn’t seem to me to help much with code verbosity or coding in styles that are idiomatic to other languages, so maybe it won’t be used as a cudgel for code that is too clever by half.

Thanks

A big thanks to the Go team, Ian Lance Taylor and Robert Griesemer in particular, for all their hard work and dedication to making Generics happen for Go. It’s been a herculean effort and I’m excited to see where it goes from here.