TL;DR - Kubernetes Up & Running, Part 3

How deep it goes, no one knows.
Categories: gcp, cloud, k8s, aws, kubernetes
Published

September 12, 2019

Abstract

These are my notes from reading Kubernetes Up & Running by Kelsey Hightower, Brendan Burns, and Joe Beda. Kelsey Hightower is a Staff Developer Advocate for the Google Cloud Platform. Brendan Burns is a Distinguished Engineer in Microsoft Azure and cofounded the Kubernetes project at Google. Joe Beda is the CTO of Heptio and cofounded the Kubernetes project, as well as Google Compute Engine.

This is a phenomenal book that covers both the whys and hows of Kubernetes. I read the 1st edition, but a 2nd edition is coming out soon. I’m using this as study material for my CKAD and CKA certifications.

This article is part of a series. You can read Part 1 and Part 2.

Chapter 9: DaemonSets

The Pods deployed by a ReplicaSet are completely decoupled from the node they are running on - the Pods can run anywhere, and multiple Pods can land on the same node. A DaemonSet is distinctly different in that it places a Pod onto every node in the cluster (or a subset of nodes). The Pods managed by a DaemonSet usually land some sort of agent or daemon onto the node. They are not traditional serving applications (like ReplicaSet Pods), but instead augment the capabilities of the cluster itself. By defining DaemonSets in declarative configuration, we can be sure that Pods are running on all of the proper nodes, even in an autoscaling cluster where nodes come and go freely.

Which nodes a DaemonSet runs on is defined in the DaemonSet spec using labels, so it's possible to target only a subset of nodes. A common use case is selecting nodes with certain hardware (e.g., GPUs or SSDs). In order to do this, the nodes must be properly labeled. Here is an example command to label a node: $ kubectl label nodes galens-awesome-node-123 gpu=true. This label can now be specified in the nodeSelector field of the Pod template in the DaemonSet spec. Because DaemonSets manage Pods using a reconciliation loop, if a required label is removed from a node, the DaemonSet Pod on that node will also be removed. Similar to a ReplicaSet, if a DaemonSet is deleted, its Pods will be deleted as well unless you pass --cascade=false.
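As a rough sketch (not the book's example - the name, labels, and image below are placeholders), a DaemonSet using the current apps/v1 API that targets only GPU-labeled nodes might look like this:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-node-agent
spec:
  selector:
    matchLabels:
      app: gpu-node-agent
  template:
    metadata:
      labels:
        app: gpu-node-agent
    spec:
      nodeSelector:
        gpu: "true"                           # only nodes labeled gpu=true receive a Pod
      containers:
      - name: agent
        image: example.com/gpu-node-agent:1.0 # placeholder image for the node agent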

Prior to Kubernetes version 1.6, updating a DaemonSet required updating its declarative configuration and then performing a rolling delete of each Pod, or deleting the entire DaemonSet and redeploying it. While the latter is much simpler, the drawback is downtime. A rolling delete can be performed with the following snippet:

PODS=$(kubectl get pods -o jsonpath --template='{.items[*].metadata.name}')
for x in $PODS; do
  kubectl delete pods ${x}
  sleep 60 # delete one Pod every 60 seconds so replacements have time to come up
done

The delete method is still the default update strategy for DaemonSets in order to maintain backwards compatibility. However, newer versions of Kubernetes support a rolling update strategy similar to Deployments. You will need to set the spec.updateStrategy.type field of the DaemonSet to RollingUpdate; any change to the DaemonSet's spec.template field (or its subfields) will then trigger a rolling update. Rolling updates come with two additional parameters:

- spec.minReadySeconds: how long a Pod's status must be "Ready" before the rollout moves on to the next Pod
- spec.updateStrategy.rollingUpdate.maxUnavailable: how many Pods may be updated simultaneously

It's best practice to set spec.minReadySeconds to something like 30-60 seconds to ensure that Pods are truly healthy before proceeding. Setting spec.updateStrategy.rollingUpdate.maxUnavailable to 1 is the safest value, but depending on the size of the application and cluster, it can result in long rollouts. Increasing the value speeds things up but also increases the blast radius of a failed rollout, so it's best to start low and only increase it if users or admins complain about rollout speed.
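Here is a minimal sketch of how those fields fit into a DaemonSet spec, using the conservative values discussed above:

spec:
  minReadySeconds: 60             # wait until a Pod has been Ready for 60s before moving on
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1           # update one Pod at a time; safest, but slowest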

Chapter 10: Jobs

Jobs are used to run short-lived, one-off tasks. They create and manage Pods that run until successful termination (i.e., exit with 0). If a Pod fails before terminating successfully, the Job controller creates a replacement from the Pod template in the Job spec. When a Job completes, the Job and its Pods stick around; you will need to pass the -a flag to kubectl to see completed Jobs. Jobs can be created both imperatively and declaratively, and both options use the restartPolicy parameter/field. It is recommended to use restartPolicy=OnFailure so that Pods are restarted in place. Using restartPolicy=Never creates an entirely new Pod after each failure and can lead to a lot of "junk". It's not uncommon for a bug to cause a Pod to crash as soon as it starts. The kubelet on the node notices this and sets the Pod status to CrashLoopBackOff without the Job controller doing anything; CrashLoopBackOff delays recreating the Pod to avoid eating resources on the node. Pods can also appear healthy but be deadlocked, so Jobs support liveness probes to determine Pod health in these situations.
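As a hedged sketch of the declarative form (the name and image below are placeholders, not the book's example):

apiVersion: batch/v1
kind: Job
metadata:
  name: oneshot
spec:
  template:
    spec:
      restartPolicy: OnFailure              # restart the container in place on failure
      containers:
      - name: oneshot
        image: example.com/batch-task:1.0   # placeholder image for the one-off task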

Jobs have two major parameters that control their behavior: completions and parallelism. parallelism determines how many copies of the Pod to spin up at once, and completions determines the number of successful exits before the Job stops running. If completions is left unset, the Job is put into a worker-pool mode: once the first Pod exits successfully, the Job starts winding down and will not add any new Pods. This means that none of the workers should exit until the work is done and they are all in the process of finishing up.
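For a fixed batch of work, both fields are set explicitly. A sketch with illustrative numbers (10 completions, at most 5 Pods at a time; names and image are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-batch
spec:
  completions: 10          # the Job is done after 10 Pods exit successfully
  parallelism: 5           # run at most 5 Pods concurrently
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: example.com/batch-worker:1.0   # placeholder image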

Chapter 11: ConfigMaps and Secrets

We want to make our container images as reusable and portable as possible. In order to do this, we need to be able to configure them at runtime so that the application runs properly according to its environment. This is where ConfigMaps and Secrets come in handy. In essence, both ConfigMaps and Secrets provide key-value pairs to containers right before they are run.

There are 3 main ways to use a ConfigMap:

1. Filesystem: the ConfigMap is mounted as a volume in the container. A file is created for each key, named after the key, and the contents of the file are set to the value.
2. Environment variable: set an environment variable $KEY=VALUE from a key.
3. Command-line argument: reference those environment variables in the container's command line.
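Here is a sketch of a single Pod that uses all three methods, assuming a ConfigMap named my-config with a key extra-param (all names and the image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  containers:
  - name: app
    image: example.com/app:1.0                          # placeholder image
    command: ["/app/server", "--extra=$(EXTRA_PARAM)"]  # 3. command-line argument via env substitution
    env:
    - name: EXTRA_PARAM                                 # 2. environment variable from a key
      valueFrom:
        configMapKeyRef:
          name: my-config
          key: extra-param
    volumeMounts:
    - name: config-volume
      mountPath: /config                                # 1. filesystem: each key becomes a file here
  volumes:
  - name: config-volume
    configMap:
      name: my-config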

Because key names for both ConfigMaps and Secrets are designed to map to valid environment variable names, they have corresponding naming constraints. If a ConfigMap or Secret is updated, the new information becomes available to the application without a restart; however, this means your application must be written to reread its configuration values. Currently, there is no built-in way to signal an application when a new version of a ConfigMap or Secret is deployed. ConfigMap values are UTF-8 text, whereas Secret values hold arbitrary data encoded in base64, which makes it possible to store binary data but also makes it harder to manage secrets stored in YAML files.
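To make the UTF-8 vs. base64 distinction concrete, here is an illustrative pair of objects (names and values are made up):

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  extra-param: "some-value"         # plain UTF-8 text
---
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=            # base64-encoded ("password"); could also hold binary data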

Secrets can be consumed via the Kubernetes API, or preferably via a secrets volume. Secrets volumes are managed by the kubelet and are backed by tmpfs - the secret is never written to disk on the node. There is a special use case for secrets to access private Docker registries, supported via image pull secrets. These are consumed just like regular secrets, but are declared in the spec.imagePullSecrets field of the Pod manifest.
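A sketch of both patterns in a single Pod manifest, assuming Secrets named my-secret and my-registry-key (both names and the image are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  imagePullSecrets:
  - name: my-registry-key                          # registry credentials for the private image
  containers:
  - name: app
    image: registry.example.com/private-app:1.0    # placeholder private image
    volumeMounts:
    - name: tls-certs
      mountPath: /tls                              # keys show up as files on a tmpfs-backed volume
      readOnly: true
  volumes:
  - name: tls-certs
    secret:
      secretName: my-secret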

Resources:

- Kubernetes Up & Running by Kelsey Hightower, Brendan Burns, and Joe Beda (O'Reilly)