Kubernetes Node Affinity

In this article we will explore the node affinity features in Kubernetes. If you are not familiar with Node Selectors or just want a refresher, check out my node selector article.

The primary purpose of the node affinity feature is to ensure that pods are hosted on particular nodes. There are two ways of achieving this:

  1. Node selectors: the simple way, discussed in another article.
  2. Node affinity: more complex and more powerful.

Unlike node selectors, the node affinity feature provides us with more advanced capabilities to limit pod placement to specific nodes. Keep in mind: with great power comes great complexity.

Work It Out

Let's start off with a simple node selector specification file:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor

  nodeSelector:
    size: Large

In the previous file, we specified that we want myapp-pod to be deployed only on a node whose size label is set to Large.

With node affinity, the previous node selector specification looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In
                values:
                  - Large

Although both files do exactly the same thing, let's look at the second one a bit closer.

Under spec you have affinity, then nodeAffinity under that, and then a property that reads like a sentence: requiredDuringSchedulingIgnoredDuringExecution. We will see what this means later on.

Finally, we have nodeSelectorTerms, which is an array, and that is where you specify the key-value pairs.

The key-value pairs take the form key, operator, and values. Here the operator is In, which ensures that the pod is placed on a node whose size label has any value in the list of values specified. In this case the list holds just one value, Large.

If you think your pod could be placed on a large or a medium node, you could simply add the value to the list of values like this:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In
                values:
                  - Large
                  - Medium # New

You could use the NotIn operator to say something like size NotIn Small, where node affinity matches nodes whose size is not set to Small:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: NotIn
                values:
                  - Small # New

We know that we have only set the size label on the large and medium nodes. The smaller nodes don't even have the label set. So we don't really have to check the value of the label at all. As long as we are sure we don't set a size label on the smaller nodes, using the Exists operator will give us the same result.

The Exists operator simply checks whether the size label exists on a node. You don't need the values section for it, as it does not compare values.
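
As a minimal sketch of the Exists variant, the pod specification could look like the following. It reuses the pod and label names from the earlier examples and assumes, as stated above, that only the large and medium nodes carry a size label.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: Exists # No values list: only the presence of the label is checked
```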

There are a number of other operators as well. Check the documentation for specific details.
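
For illustration, here is a sketch of two of those other operators, Gt and DoesNotExist. The label names cpu-count and maintenance are hypothetical and are not used anywhere else in this article. Gt (like its counterpart Lt) takes a single value that is interpreted as an integer, and expressions listed in the same term must all match.

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cpu-count # Hypothetical label
              operator: Gt
              values:
                - "8" # Values are always strings, even for Gt and Lt
            - key: maintenance # Hypothetical label
              operator: DoesNotExist # Matches nodes that do not carry this label
```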

Node Affinity Types

  1. But what if node affinity could not match a node with a given expression?

  2. What if there are no nodes with a label called size?

  3. Say we had the labels and the pods are scheduled. What if someone changes the label on the node at a future point in time? Will the pod continue to stay on the node?

All of these questions are answered by the long sentence-like property under nodeAffinity, also known as the type of node affinity. In our example we had:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    ## ^^ This One ^^

The type of node affinity defines the behaviour of the scheduler with respect to node affinity and the stages in the lifecycle of the pod.

There are currently two types of node affinity available:

  1. requiredDuringSchedulingIgnoredDuringExecution
  2. preferredDuringSchedulingIgnoredDuringExecution

Planned:

  1. requiredDuringSchedulingRequiredDuringExecution

We will now break these down to understand them further, starting with the two available affinity types.

There are two states in the lifecycle of a pod when considering node affinity: during scheduling and during execution. During scheduling is the state where a pod does not exist yet and is created for the first time. When a pod is first created, the affinity rules specified are considered in order to place the pod on the right node.

Now, what if nodes with matching labels are not available? For example, we forgot to label the node as Large. That is where the type of node affinity used comes into play.

If you select the required type, which is the first one, the scheduler will mandate that the pod be placed on a node that satisfies the given affinity rules. If it cannot find one, the pod will not be scheduled. This type is used in cases where the placement of the pod is crucial.

But let's say the pod placement is less important than running the workload itself. In that case you could set the type to preferred, and in cases where a matching node is not found, the scheduler will simply ignore the node affinity rules and place the pod on any available node. This is a way of telling the scheduler: hey, try your best to place the pod on a matching node, but if you really cannot find one, just place it anywhere.
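
A rough sketch of the preferred type, reusing the pod from the earlier examples, is shown below. Note that the structure differs from the required type: instead of nodeSelectorTerms, it takes a list of entries that each have a weight and a preference. The weight of 1 is an arbitrary choice made for this illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor

  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1 # Range 1-100; a higher weight means a stronger preference
          preference:
            matchExpressions:
              - key: size
                operator: In
                values:
                  - Large
```

With this in place, the pod should still be scheduled even when no node carries the size label set to Large.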

The second part of the property, or the other state, is during execution. During execution is the state where a pod has been running and a change is made in the environment that affects node affinity, such as a change in the label of a node. For example, say an administrator removed the label we set earlier, size=Large, from the node. What would happen to the pods that are running on that node?

As you can see, the two types of node affinity available today have this value set to Ignored, which means pods will continue to run, and any changes in node affinity will not impact them once they are scheduled.

The new types expected in the future differ only in the during-execution phase. A new option called RequiredDuringExecution is introduced, which will evict any pods that are running on nodes that do not meet the affinity rules. In the earlier example, a pod running on the large node will be evicted or terminated if the label size=Large is removed from the node.

Well, that's it for this article. Head over to the coding exercises and practice working with node affinity rules. In the next article, we will compare taints and tolerations with node affinity.