Padded Slices
#BLM
If this blog post interests you, there’s a good chance you work in tech. Given the nature of our work and the effects of the internet, many of us can have an outsized impact on people’s lives & livelihoods. This is an important moment and also an opportunity to examine, challenge, and change problematic parts of our society. And for us in tech, it is critical that we take this moment to better the work we do and how we do it. Please familiarise yourself with ways you can help & actions you can take.
Garbage Creation
Go: there is no free. There is only garbage collection.
– Rob Pike
As with all languages, memory management inevitably becomes a concern when writing performant Go programs. And because the language specification covers garbage collection and memory allocation, there is a high cost for bypassing the standard memory management components. Particular care is needed to avoid violating the Go memory model. As a result, well designed Go programs tend to focus on minimizing garbage creation instead of optimizing garbage collection.
There are a few common techniques for minimizing garbage and they each address different types of garbage. So, typically, the best approach is to first identify the type of garbage before reaching for one of these solutions.
sync.Pool
Short lived objects are often a big source of garbage in transactional
programs. The allocation cost of some temporary objects can be amortized by
pooling and reusing them. A sync.Pool
is
the standard way to pool temporary objects, however it’s not a magic
solution
and it’s important to understand the
tradeoffs.
Be sure to always zero out memory returned to a pool!
AppendX
Intermediate slices are slices allocated and returned by a function call, and
typically the data is read/copied only once and then left unused. A useful
technique for eliminating these intermediate slices is to add a destination
argument (typically the first argument) to the function, so
called
AppendX
functions.
By allowing the caller of the function to specify the destination, the callee function can write directly to the destination slice and eliminate an extra copy.
append
, len
, & cap
Slice reallocations are another common source of garbage. During an append
,
when the combined new and old data exceeds the capacity of a slice, a new slice
(and backing array) is allocated and the data is copied from old to new.
These reallocations can often be avoided by initially allocating the destination slice with enough capacity to hold the finished data. Typically, it’s better to over allocate the initial capacity if doing so can prevent possible reallocations.
Prepend
These sources of garbage are especially common when converting between internal & external representations of data, such as object serialization/deserialization and working with network protocols.
When encoding messages with header data before the payload data, particularly variable
length messages, the payload is often constructed first followed by the header
data. This often requires extra allocating and copying to combine the header
and payload data. Smart use of append
and slice capacity can minimize this
overhead, but extra bookkeeping is still required because in order to
efficiently use append
to “prepend” slice data.
The github.com/benburkert/padded
package aims to eliminate the cognitive
overhead required to efficiently prepend data to the front of a slice.
github.com/benburkert/padded
The padded
package provides a padded byte slice type. The padding is a
scratch space allocated in front of the slice and behaves roughly the same as
the capacity. The slice type is conceptually a hybrid of
sk_buff
in the Linux kernel and the
sds
string encoding: internally they are
represented the same as regular byte slices, and are safe to pass to functions
that read but don’t modify the slice.
Before explaining how the padded slice works, it’s important to first
understand how regular slices and append
operate.
Regular Slices
Let’s start by looking at how a regular byte slice works:
s := make([]byte, 4, 6) ┌────cap────┐
┌───────────┐
│0|0|0|0|0|0│
└───────────┘
└──len──┘
Allocating the byte slice s
with a length of 4
and capacity of 6
gives us
a contiguous block of memory (i.e. a backing array) with each cell initialized
to the zero value (0
for byte
values). As Russ explains in his blog
post, s
is really a struct that holds a
pointer to the beginning of the backing array, along with a length & capacity value.
s = append(s, 1) ┌────cap────┐
┌───────────┐
│0|0|0|0|1|0│
└───────────┘
└───len───┘
Appending to s
manipulates both the data in the backing array and the length
value of s
.
Padded Slices
The padded.Make
function adds an extra argument for the padding size of the new slice.
s := padded.Make(4, 6, 2) ┌─pad─┬────cap────┐
┌─────────────────┐
│p|p|p|0|0|0|0|0|0│
└─────────────────┘
└──len──┘
s
is a padded slice with a length of 4
, capacity of 6
, and padding of
2
. The value of p
depends on the position of the cell in the padding: each
p
value is the number of cells of remaining padding. The first p
always has
a value of 0
to indicate the left end of the backing array.
Because the padding is encoded in the cell data (unlike the length & capacity
of a slice, which are struct fields), a padded.Slice
is just a []byte
, so
len
, cap
, and slicing works the same.
The size of each p
value’s cell is the size of an integer for the machine
architecture: 8
bytes on 64bit machines and 4
bytes on 32bit machines. In
this example, each p
value is one byte in length, but in reality they will
span multiple bytes in the backing array.
Because the
p
values span multiple bytes, prepending a number of bytes that is not a multiple of the integer size causes the prepend to fall back to a regularappend
.
s = padded.Prepend(s, 1) ┌pad┬────cap──────┐
s = padded.Append(s, 2, 3) ┌─────────────────┐
│p|p|1|0|0|0|0|3|4│
└─────────────────┘
└─────len─────┘
The Prepend
call places the 1
into the first spot in s
, then updates s
so that s[0]
points to the location of the 1
, and increases the capacity &
length.
Caveats
The padded
package can give a big speed boost to Prepend
operations when
used well with appropriate slack space. (Check out the package’s benchmarks for
a comparison.) And the API can improve the bookkeeping & readability of code
that uses append
to prepend. However, there is one major downside to the
padded
package.
The Go language specification covers slices but it does not include a
description of the backing arrays. So the trick used in the padded
package to
grow into unreferenced parts of a backing array is not covered by the language
spec. At the moment, this is not a problem because the garbage collector will
not sweep unreferenced parts of a backing array. However, in the future, the
implementation of the garbage collector and/or allocator may change and break
this assumption. Be aware that by using the padded
package you may be
sacrificing compatibility with future versions (and alternate implementations)
of the Go runtime. It’s for this reason that I have only ever used this
package once in production code, even though prepending to a byte slice is
something I do regularly in the code I write.