Accidental complexity, essential complexity, and Kubernetes
2022-09-05
Paul Butler
Kubernetes has a reputation among developers for being complex and scary, enough so that posts about not using Kubernetes are a genre unto themselves. In this post, I want to delve into the roots of Kubernetes’ complexity, as a framework for evaluating when it's the right tool for the job.
Complexity in software systems can be broken into essential complexity and accidental complexity, following Fred Brooks’ 1987 paper No Silver Bullet. Essential complexity is the unavoidable complexity that results from building a system that solves complex problems. Accidental complexity might be best described as any complexity that isn’t essential: complexity introduced by path dependence, design decisions, bad assumptions, and poorly-chosen abstractions.
Through this accidental-vs-essential lens, a lot of discussion around when to use Kubernetes boils down to the idea that essential complexity can become accidental complexity when you use the wrong tool for the job. The number of states my microwave can enter is essential complexity if I want to heat food, but accidental complexity if I just want a timer.
With that in mind, I want to talk about the sources of complexity in Kubernetes. It’s useful to separate Kubernetes’ functionality into three distinct responsibilities: a distributed control loop framework, a container orchestrator, and an abstract interface to cloud resources, which I'll discuss in turn.
Control loops
One way to think of Kubernetes is as a distributed framework for control loops. Broadly speaking, control loops create a declarative layer on top of an imperative system. Think of a thermostat: instead of switching a cooler or heater on or off, you set a desired state (temperature). The thermostat makes frequent measurements, calculates the difference between the actual and desired state, and translates that difference into a sequence of imperative actions.
Control loops are just a generalization of this concept to anything that can be measured, compared with a desired state, and acted upon. This idea is at the core of Kubernetes. “Deploying” software with Kubernetes really means providing Kubernetes with a desired state in which that software is deployed. The same is true for setting up networking, storage, and every other resource that Kubernetes interacts with.
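To make that shape concrete, here is a minimal sketch in Go of the thermostat as a control loop. The thermostat type and its methods are invented for illustration; nothing here is Kubernetes-specific.

```go
package main

import (
	"fmt"
	"time"
)

// thermostat stands in for any system we can observe and act on. Everything
// here is hypothetical, purely to illustrate the shape of a control loop.
type thermostat struct {
	temp     float64
	heaterOn bool
}

func (t *thermostat) currentTemp() float64 { return t.temp }
func (t *thermostat) setHeater(on bool)    { t.heaterOn = on }

// reconcile compares the actual state to the desired state and translates the
// difference into an imperative action. Observe -> diff -> act is the whole loop.
func reconcile(t *thermostat, desiredTemp float64) {
	t.setHeater(t.currentTemp() < desiredTemp)
}

func main() {
	t := &thermostat{temp: 17.0}
	desired := 21.0 // callers only ever declare a desired state

	// The loop runs on a schedule; convergence happens eventually, not on command.
	for i := 0; i < 3; i++ {
		reconcile(t, desired)
		fmt.Printf("temp=%.1f heater=%v\n", t.temp, t.heaterOn)
		time.Sleep(time.Second)
	}
}
```

Kubernetes' own controllers follow the same pattern, except that the desired state is read from its data store rather than hard-coded.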
Resources in Kubernetes are just YAML blobs, and developers can create their own resource types by providing a schema definition. The underlying storage for these resources is a strongly consistent, distributed data store. Together, these two facts make Kubernetes a solid base for implementing your own control loops, a practice known as the Operator Pattern.
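As a rough sketch of what that looks like in practice (using the Go types from k8s.io/apimachinery, in the style popularized by kubebuilder; the CronTweet resource and its fields are invented for illustration):

```go
// A custom resource type, defined as plain Go structs. Tools like controller-gen
// typically turn these structs into the schema definition (CRD) that Kubernetes
// needs. The CronTweet resource itself is invented for illustration.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CronTweetSpec is the desired state: the schema for the YAML blob users write.
type CronTweetSpec struct {
	Schedule string `json:"schedule"`
	Message  string `json:"message"`
}

// CronTweetStatus is the observed state, written back by the control loop.
type CronTweetStatus struct {
	LastPosted metav1.Time `json:"lastPosted,omitempty"`
}

// CronTweet is the resource itself; Kubernetes stores it in its consistent
// data store alongside the built-in resource types.
type CronTweet struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              CronTweetSpec   `json:"spec,omitempty"`
	Status            CronTweetStatus `json:"status,omitempty"`
}

// An operator is then "just" a control loop over these objects: libraries like
// controller-runtime expect a single method of roughly the shape
//   Reconcile(ctx context.Context, req Request) (Result, error)
// that fetches a CronTweet, compares Spec to Status, and acts on the difference.
```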
Control loops are a building block of highly available systems, but they bring their own complexity. In an imperative system, there is a tight feedback loop between action and effect. If I switch on a broken fan, I know immediately that something is wrong, because I don't feel a nice breeze. By contrast, if I turn a thermostat down and don't immediately feel cold air, I can't tell whether the thermostat is broken, the AC is broken, or everything works and just hasn't kicked in yet.
I suspect this detachment between cause and effect is what grates on many Kubernetes beginners. The tightness of the cause/effect loop directly affects how easy (and fun) something is to learn. Bret Victor took this to one extreme in Inventing on Principle; control loops take it to the opposite extreme. Learning Kubernetes is less about learning how to do things than about learning how to observe the state of the system.
Container orchestration
On top of the underlying control loop framework, Kubernetes is a container orchestrator. Fundamentally, a container orchestrator creates an abstraction over multiple computers that allows them to be treated as one abstract blob within which containers can be run. When you run a container on Kubernetes, you don’t specify which computer it runs on; you (hopefully) don’t care.
Actually, that’s not quite true. Containers generally correspond to one process, and inter-process communication is a common thing to want. If processes aren’t running on the same machine, that IPC now has to go over the network, which may be orders of magnitude slower. So instead of containers being the level of abstraction, Kubernetes treats groups of containers (pods) as the unit of compute that gets scheduled.
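To make the grouping concrete, here is a minimal sketch of a two-container pod using the Go API types from k8s.io/api; the names and images are placeholders.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Two processes that want cheap IPC and shared fate are declared as one
	// pod, so they are always scheduled onto the same node together.
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "web"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "app", Image: "example.com/app:1.0"},             // main process
				{Name: "log-shipper", Image: "example.com/shipper:1.0"}, // sidecar
			},
		},
	}

	out, _ := yaml.Marshal(pod)
	fmt.Println(string(out)) // the YAML blob Kubernetes actually stores
}
```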
Unfortunately, as Joel Spolsky’s law of leaky abstractions states:
All non-trivial abstractions, to some degree, are leaky.
Kubernetes abstracts away the decision of which computer a pod runs on, but reality has a way of poking through. For example, if you want multiple pods to access the same persistent storage volume, whether or not they are running on the same node suddenly becomes your concern again.
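One concrete place the leak shows up is the access mode on a volume claim; a minimal sketch with the Go API types:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sharedClaim declares a volume that several pods want to mount.
func sharedClaim() corev1.PersistentVolumeClaim {
	return corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "shared-data"},
		Spec: corev1.PersistentVolumeClaimSpec{
			// ReadWriteOnce means the volume is mounted read-write by a single
			// node, so pods sharing it must land on that node. ReadWriteMany
			// removes the constraint, but only if the storage backend supports
			// it. Either way, node placement is your concern again.
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
		},
	}
}

func main() { _ = sharedClaim() }
```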
This is just the nature of providing abstractions: reality doesn’t care about them. That doesn’t mean creating abstractions is a bad approach; it just means you can’t get rid of all the complexity you would like to.
Aside: Containers
Some other complexity around container orchestration comes from the nature of containers themselves. At some point, we decided that it was useful for programs to be able to share libraries, data, and language interpreters with other programs on the same computer. This meant that programs occupied less space on disk and in memory.
Then, disk and memory got cheap, and we realized that depending on shared libraries and interpreters led to deployment headaches. Containers came around to solve the problem by clumping together all a program’s user-space dependencies into one deployable unit.
The result is kind of a mess. It’s nobody’s fault; it’s just vestigial. But now, in addition to writing code, we also have to collect all of our system dependencies and bake them into a mini Linux distro (sans kernel).
Containers were a great advance, but much of their complexity falls into the “accidental” category. There’s no fundamental reason that I should need anything except my compiler to produce a universally deployable unit of code. To that end, I’m excited by what’s happening with WebAssembly (including our stateroom and wasmbox), but for now containers are a fact of life.
Cloud interface
A third piece of Kubernetes is that it provides a vendor-agnostic(ish) abstraction over cloud services.
A container orchestrator is more useful if things like network ingress and storage volumes can be attached to it, and doing so on the cloud requires interacting with vendor-specific APIs. Kubernetes provides a higher-level, resource-based abstraction on top of these interfaces, so that you can specify them declaratively in the same way you do containers.
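The canonical example is a Service of type LoadBalancer: the declaration says nothing about any particular vendor, and the cloud-specific controller running in the cluster translates it into an actual AWS or GCP load balancer behind the scenes. A minimal sketch with the Go API types (the names and labels are placeholders):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// webService declares "I want a load balancer in front of these pods" without
// naming a cloud provider; the cluster's cloud controller provisions the
// vendor-specific resource to satisfy it.
func webService() corev1.Service {
	return corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "web"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "web"}, // placeholder label
			Ports:    []corev1.ServicePort{{Port: 80}},
		},
	}
}

func main() { _ = webService() }
```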
This aspect of Kubernetes might be its most under-appreciated, because if you use a managed Kubernetes service like GKE or EKS, it’s not always clear where Kubernetes ends and the cloud APIs begin. But it’s the reason that releasing Kubernetes was an ingenious strategy for Google: it gave cloud vendors and open-source developers an API to target that wasn't tied to AWS.
As you might expect, though, abstracting over cloud providers is a minefield of leaky abstractions. The Kubernetes documentation is littered with notes about special cases to be aware of with different providers, even for core resources like services. Kubernetes reduces lock-in, but seamlessly transferring infrastructure between cloud vendors is still a pipe dream.
Accidental complexity
Each of the above cases is an instance of essential complexity. They do incorporate some accidental complexity, as I discussed with containers, but we can’t fault Kubernetes for the world it was born into.
There are other areas where Kubernetes does introduce accidental complexity, though. I won’t attempt an exhaustive list, but a few that I’ve encountered:
- Multiple sets of semantics for patching resources, which differ in subtle ways (a taste of this in the sketch after this list).
- The use of YAML as the primary interface, which is notoriously full of foot-guns.
- The fact that you need a special distribution like minikube, Kind, k3s, microk8s, or k0s to run Kubernetes on a single node. (The fact that I can name five such distributions is a canary of its own.)
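To give a taste of the first item: the API server accepts several patch formats, each with its own merge semantics, identified by the content types below (from the Go client libraries). The example payload is hypothetical; the subtle part is how each format treats lists.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
)

func main() {
	// The same patch body can mean different things under different patch types.
	// As a JSON merge patch, the containers list below replaces the existing list
	// wholesale; as a strategic merge patch, it is merged element-by-element
	// keyed on "name", so other containers in the pod template survive.
	patch := []byte(`{"spec":{"template":{"spec":{"containers":[{"name":"app","image":"example.com/app:2.0"}]}}}}`)
	_ = patch

	// The content types the API server distinguishes between:
	fmt.Println(types.MergePatchType)          // application/merge-patch+json
	fmt.Println(types.StrategicMergePatchType) // application/strategic-merge-patch+json
	fmt.Println(types.JSONPatchType)           // RFC 6902: explicit operations with paths and indices
}
```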
Tool/job fit
A tool is a good fit for a job when their complexities line up: the complexity a tool brings should be complexity the job actually needs. Kubernetes’ complexities come from being highly available, container-based, and integrated with cloud providers.
At Drifting in Space, we’ve found that Kubernetes makes sense for use-cases that align with those complexities (for example, hosting our NATS cluster). For the ephemeral containers that are our bread and butter, Kubernetes was handy for our proof-of-concept, but we quickly realized the complexities we faced were not the same ones Kubernetes was built for. To orchestrate those containers, we built a control plane called Plane, but that's a subject for another post.
Note: when this post was written, Plane was called Spawner. The post has been updated to reflect this change.