Padded Slices

Posted on Jun 30, 2020

#BLM

If this blog post interests you, there’s a good chance you work in tech. Given the nature of our work and the effects of the internet, many of us can have an outsized impact on people’s lives & livelihoods. This is an important moment and also an opportunity to examine, challenge, and change problematic parts of our society. And for us in tech, it is critical that we take this moment to better the work we do and how we do it. Please familiarise yourself with ways you can help & actions you can take.

Garbage Creation

Go: there is no free. There is only garbage collection.
    – Rob Pike

As with all languages, memory management inevitably becomes a concern when writing performant Go programs. And because the language specification covers garbage collection and memory allocation, there is a high cost for bypassing the standard memory management components. Particular care is needed to avoid violating the Go memory model. As a result, well designed Go programs tend to focus on minimizing garbage creation instead of optimizing garbage collection.

There are a few common techniques for minimizing garbage and they each address different types of garbage. So, typically, the best approach is to first identify the type of garbage before reaching for one of these solutions.

sync.Pool

Short lived objects are often a big source of garbage in transactional programs. The allocation cost of some temporary objects can be amortized by pooling and reusing them. A sync.Pool is the standard way to pool temporary objects, however it’s not a magic solution and it’s important to understand the tradeoffs.

Be sure to always zero out memory returned to a pool!

AppendX

Intermediate slices are slices allocated and returned by a function call, and typically the data is read/copied only once and then left unused. A useful technique for eliminating these intermediate slices is to add a destination argument (typically the first argument) to the function, so called AppendX functions.

By allowing the caller of the function to specify the destination, the callee function can write directly to the destination slice and eliminate an extra copy.

append, len, & cap

Slice reallocations are another common source of garbage. During an append, when the combined new and old data exceeds the capacity of a slice, a new slice (and backing array) is allocated and the data is copied from old to new.

These reallocations can often be avoided by initially allocating the destination slice with enough capacity to hold the finished data. Typically, it’s better to over allocate the initial capacity if doing so can prevent possible reallocations.

Prepend

These sources of garbage are especially common when converting between internal & external representations of data, such as object serialization/deserialization and working with network protocols.

When encoding messages with header data before the payload data, particularly variable length messages, the payload is often constructed first followed by the header data. This often requires extra allocating and copying to combine the header and payload data. Smart use of append and slice capacity can minimize this overhead, but extra bookkeeping is still required because in order to efficiently use append to “prepend” slice data.

The github.com/benburkert/padded package aims to eliminate the cognitive overhead required to efficiently prepend data to the front of a slice.

github.com/benburkert/padded

The padded package provides a padded byte slice type. The padding is a scratch space allocated in front of the slice and behaves roughly the same as the capacity. The slice type is conceptually a hybrid of sk_buff in the Linux kernel and the sds string encoding: internally they are represented the same as regular byte slices, and are safe to pass to functions that read but don’t modify the slice.

Before explaining how the padded slice works, it’s important to first understand how regular slices and append operate.

Regular Slices

Let’s start by looking at how a regular byte slice works:

s := make([]byte, 4, 6)       ┌────cap────┐
                              ┌───────────┐
                              │0|0|0|0|0|0│
                              └───────────┘
                              └──len──┘

Allocating the byte slice s with a length of 4 and capacity of 6 gives us a contiguous block of memory (i.e. a backing array) with each cell initialized to the zero value (0 for byte values). As Russ explains in his blog post, s is really a struct that holds a pointer to the beginning of the backing array, along with a length & capacity value.

s = append(s, 1)              ┌────cap────┐
                              ┌───────────┐
                              │0|0|0|0|1|0│
                              └───────────┘
                              └───len───┘

Appending to s manipulates both the data in the backing array and the length value of s.

Padded Slices

The padded.Make function adds an extra argument for the padding size of the new slice.

s := padded.Make(4, 6, 2)     ┌─pad─┬────cap────┐
                              ┌─────────────────┐
                              │p|p|p|0|0|0|0|0|0│
                              └─────────────────┘
                                    └──len──┘

s is a padded slice with a length of 4, capacity of 6, and padding of 2. The value of p depends on the position of the cell in the padding: each p value is the number of cells of remaining padding. The first p always has a value of 0 to indicate the left end of the backing array.

Because the padding is encoded in the cell data (unlike the length & capacity of a slice, which are struct fields), a padded.Slice is just a []byte, so len, cap, and slicing works the same.

The size of each p value’s cell is the size of an integer for the machine architecture: 8 bytes on 64bit machines and 4 bytes on 32bit machines. In this example, each p value is one byte in length, but in reality they will span multiple bytes in the backing array.

Because the p values span multiple bytes, prepending a number of bytes that is not a multiple of the integer size causes the prepend to fall back to a regular append.

s = padded.Prepend(s, 1)      ┌pad┬────cap──────┐
s = padded.Append(s, 2, 3)    ┌─────────────────┐
                              │p|p|1|0|0|0|0|3|4│
                              └─────────────────┘
                                  └─────len─────┘

The Prepend call places the 1 into the first spot in s, then updates s so that s[0] points to the location of the 1, and increases the capacity & length.

Caveats

The padded package can give a big speed boost to Prepend operations when used well with appropriate slack space. (Check out the package’s benchmarks for a comparison.) And the API can improve the bookkeeping & readability of code that uses append to prepend. However, there is one major downside to the padded package.

The Go language specification covers slices but it does not include a description of the backing arrays. So the trick used in the padded package to grow into unreferenced parts of a backing array is not covered by the language spec. At the moment, this is not a problem because the garbage collector will not sweep unreferenced parts of a backing array. However, in the future, the implementation of the garbage collector and/or allocator may change and break this assumption. Be aware that by using the padded package you may be sacrificing compatibility with future versions (and alternate implementations) of the Go runtime. It’s for this reason that I have only ever used this package once in production code, even though prepending to a byte slice is something I do regularly in the code I write.