Demystifying etcd in Kubernetes
As we continue to expand our knowledge and experience working with Kubernetes on our managed cloud clusters, we’ve been working closely with etcd, the configuration store that lives at the center of a Kubernetes cluster. Our cluster ops team decided to dig deeper into etcd because our clusters are only available when etcd is available. We also thought of some use cases where etcd could help solve other problems we face in maintaining our clusters.
What is etcd?
At its simplest, etcd is a distributed key/value store: you insert key-value pairs and later use the key to retrieve the value. Functionally, it is similar to Redis or Memcached. Like Redis, etcd supports expiring values, but several characteristics set it apart from Redis. Instead of being primarily an in-memory data store, etcd focuses on persisting data to the filesystem. It’s also designed to be highly available, meaning a single etcd deployment typically runs a number of redundant nodes, one of which acts as the elected leader during normal operation.
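To make the "key/value store with expiring values" idea concrete, here is a minimal sketch in Python. It is a toy, single-process, in-memory dictionary and not etcd’s API or implementation: the class name `ToyKVStore`, its methods, and the example keys are all hypothetical, and it has none of etcd’s persistence or replication.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ToyKVStore:
    """Toy in-memory key/value store with optional expiry (NOT etcd's API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so expiry can be demonstrated without sleeping.
        self._clock = clock
        # key -> (value, expiry timestamp, or None for "never expires")
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, key: str, value: str, ttl_seconds: Optional[float] = None) -> None:
        """Store a value, optionally expiring ttl_seconds from now."""
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        """Return the value for key, or None if it is missing or has expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # Drop the expired entry lazily on read.
            return None
        return value


if __name__ == "__main__":
    store = ToyKVStore()
    store.put("/config/feature", "on")                 # hypothetical key
    store.put("/locks/node-1", "held", ttl_seconds=5)  # hypothetical key
    print(store.get("/config/feature"))
    print(store.get("/locks/node-1"))
```

A real etcd deployment adds what this sketch leaves out: writes are persisted to disk and replicated across the redundant nodes before they are acknowledged.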
Often etcd is introduced as the “backbone” of Kubernetes. However, Kubernetes hides the vast majority of the interaction with etcd, with the only notable exception being backing up the cluster state using the client `etcdctl`.
Backing Up, Here’s a Brief History of etcd
CoreOS built etcd in 2013 to update configuration across a number of infrastructure nodes without taking down the services hosted on those nodes. When pushing config updates, the CoreOS engineers ran into a problem: all the nodes would receive the updates at the same time and restart simultaneously, leaving the cluster with no active nodes and effectively shutting it down.
Letting each server update in an orderly fashion would prevent too many nodes from being out of service for updates at one time. This required an external datastore, so CoreOS built etcd. It let them control the updates by storing information in a centrally accessible, fault-tolerant repository and by providing an API for obtaining locks, ensuring that only a limited number of hosts would update at the same time. Since its inception, etcd has undergone a number of refactors and iterations to improve its API and performance. In 2014, Google launched the Kubernetes project using etcd.
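The locking idea described above can be sketched as a small simulation. This is only an illustration of capping how many hosts update at once, assuming a single process and an in-memory set of lock holders. It does not reproduce how CoreOS implemented it, and the names `UpdateSlots` and `rolling_update` are hypothetical.

```python
from typing import Iterable, List, Set


class UpdateSlots:
    """Toy stand-in for a shared lock that caps concurrent updates."""

    def __init__(self, max_concurrent: int) -> None:
        if max_concurrent < 1:
            raise ValueError("max_concurrent must be at least 1")
        self.max_concurrent = max_concurrent
        self.holders: Set[str] = set()

    def try_acquire(self, host: str) -> bool:
        """Give host an update slot if one is free; never block."""
        if host in self.holders:
            return True
        if len(self.holders) >= self.max_concurrent:
            return False
        self.holders.add(host)
        return True

    def release(self, host: str) -> None:
        """Free the slot held by host, if any."""
        self.holders.discard(host)


def rolling_update(hosts: Iterable[str], slots: UpdateSlots) -> List[List[str]]:
    """Return the batches in which hosts update, never exceeding the slot limit."""
    pending = list(hosts)
    batches: List[List[str]] = []
    while pending:
        # Every pending host asks for a slot; only max_concurrent of them get one.
        batch = [host for host in pending if slots.try_acquire(host)]
        batches.append(batch)
        # The "update" happens here; afterwards each host frees its slot.
        for host in batch:
            slots.release(host)
        pending = [host for host in pending if host not in batch]
    return batches


if __name__ == "__main__":
    print(rolling_update(["a", "b", "c", "d", "e"], UpdateSlots(max_concurrent=2)))
```

With five hosts and two slots, the hosts update in three rounds instead of all restarting at once, which is the failure mode the lock was meant to prevent.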
To keep its redundant nodes in agreement, etcd uses Raft, a protocol for achieving consensus in a distributed, multi-node environment. If a node fails or the members disagree, Raft lets a majority of the cluster determine the correct state. If you want to learn more about Raft, I highly recommend checking out this consensus visualization walkthrough and this informational video from HashiCorp.
We did some etcd durability testing using a kubeadm multi-master cluster. First, I killed one of the three master nodes to see how the cluster would react in the real world. I was happy to see that the cluster tolerated the dead node and kept running on the remaining two. But when I killed a second master node, etcd entered an error state that made kubectl stop working entirely. This is consistent with how Raft works: a three-member cluster needs two healthy members to form a majority, so a single survivor cannot accept changes. The etcd pod then went into a crash loop, preventing any access to the cluster control plane and effectively rendering that test cluster dead.
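The outcome of this test follows from the majority rule that Raft relies on, and the arithmetic is short enough to write down. The helper names below (`quorum`, `failures_tolerated`, `has_quorum`) are hypothetical and only illustrate the calculation; they are not part of etcd or Kubernetes.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True if the healthy members still form a majority of the cluster."""
    return healthy >= quorum(members)


if __name__ == "__main__":
    for size in (1, 3, 5):
        print(size, quorum(size), failures_tolerated(size))
    print(has_quorum(3, 2))  # one of three masters killed
    print(has_quorum(3, 1))  # two of three masters killed
```

For the three-node test cluster, the quorum is two, so losing one master is survivable and losing two is not. The same arithmetic shows that a fourth member raises the quorum to three without letting the cluster tolerate any more failures than three members do.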
Conclusion and Next Steps
Etcd tends to be overlooked since it’s not something you typically have to interact with directly. When you update Kubernetes, etcd gets updated behind the scenes. However, etcd maintains the state of Kubernetes, so it is important to understand, especially if you work with multiple Kubernetes clusters, because you will eventually end up in a place where you have to attempt to recover it. This post introduces some concepts, but etcd is somewhat difficult to interact with. In the next post, we will discuss some methods of backing up etcd and how to restore it if you get in a bind.