OpenShift Pod Scheduling for a Highly Available Architecture on AWS

Gerry Kovan
Nov 2, 2021

The project I am currently working on requires an application to be deployed using a highly available architecture running on an OpenShift cluster deployed on the AWS cloud.

Our OpenShift cluster is deployed in the us-east-1 AWS region and the cluster has six worker nodes spread across three availability zones as shown in the diagram below.

Six OpenShift worker nodes deployed in the us-east-1 AWS region

There are many aspects to consider when designing a highly available application that runs in an OpenShift cluster. One of them is pod scheduling. To achieve high availability and resiliency, you want several pods running the application, and those pods should be located in different availability zones. The application should remain fully operational in the event of a pod failure, a node failure or even an availability zone failure.

We examined three pod schedulers available on OpenShift:

  • default
  • pod anti-affinity preferred
  • pod anti-affinity required

We scaled out our application from one to six pods and observed the pod placement.

Default Scheduler

The default pod placement algorithm produced the following pod placement:

Pod placement for six pods using default scheduler

Each pod is labeled according to the order in which it was created, so “pod 1” was created first, “pod 2” second and so on. We observed that the default algorithm did a good job of spreading pods across the three availability zones but did not place pods on all of the worker nodes. The first three pods were spread across the three AZs and the next three were also spread across the three AZs. However, only three of the six nodes were used.
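In the default case there is no affinity stanza at all. A minimal sketch of what such a Deployment looks like, reusing the image and namespace from the appendix manifests (the name httpd-gk-default is just illustrative) and with replicas already scaled out to six:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-gk-default   # illustrative name
  namespace: gktest
spec:
  replicas: 6              # scaled out from one to six for the test
  selector:
    matchLabels:
      app: httpd-gk
  template:
    metadata:
      labels:
        app: httpd-gk
    spec:
      # no affinity block: placement is left entirely to the default scheduler scoring
      containers:
      - name: httpd-gk
        image: docker.io/gkovan/httpd-example@sha256:76b3a0
        ports:
        - containerPort: 8080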

Pod Anti-Affinity Preferred Scheduler

The pod anti-affinity preferred scenario produced the following result:

Pod placement for six pods using “pod anti-affinity preferred” scheduler

The pod anti-affinity preferred scheduler attempts to avoid placing a pod on a node that is already running a pod with the same label.

As the diagram above shows, the first three pods were placed on three unique nodes across the three availability zones, which is good. The fourth pod was placed on a fourth unique node; however, pods five and six were placed on nodes that already had a running pod. Two of the nodes never got picked by the scheduler because other criteria, such as resource availability (e.g. CPU and memory), also factor into the placement decision.
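This behaviour comes from the preferred anti-affinity rule in the Deployment (the full manifest is in the appendix). The rule is expressed as a weighted preference keyed to the pod’s app label, with the node hostname as the topology domain:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                  # a preference weight, not a hard rule
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - httpd-gk             # prefer to avoid nodes already running a pod with this label
        topologyKey: kubernetes.io/hostname

Because it is only a preference, the scheduler is free to override it when other scoring criteria point elsewhere, which is exactly what we saw with pods five and six.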

Pod Anti-Affinity Required Scheduler

The pod anti-affinity required scenario produced the following result:

Pod placement for six pods using “pod anti-affinity required” scheduler

This scheduling algorithm will only place a pod on a node that does not already have a running pod with a matching label. As the diagram above shows, pods one through six were scheduled across the six unique nodes. From a high availability perspective, this may appear to be the best option; however, other important factors come into play when scheduling a pod, such as CPU and memory utilization. For example, with this scheduler a pod can be placed on a node that is experiencing very high CPU utilization simply because that node satisfies the anti-affinity rule. Note also that with only six worker nodes, a seventh replica would have nowhere to go and would remain in the Pending state.
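The required variant swaps preferredDuringSchedulingIgnoredDuringExecution for requiredDuringSchedulingIgnoredDuringExecution, which the scheduler treats as a hard filter rather than a preference (full manifest in the appendix):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - httpd-gk
      topologyKey: "kubernetes.io/hostname"   # at most one matching pod per node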

Conclusion

We examined three different pod schedulers in order to understand the high availability properties of our application when deployed on an OpenShift cluster with six worker nodes across three availability zones. For our application, the default scheduler was determined to provide a sufficient level of high availability, as it did a good job of spreading the pods evenly across the three availability zones. The pod anti-affinity preferred scheduler was also a good choice, as it utilized more of the nodes than the default scheduler did. The pod anti-affinity required scheduler seemed too strict for our purposes; its narrow focus on the anti-affinity criteria alone could end up lowering the high availability properties of the application.

Appendix

Deployment manifest for the “pod anti-affinity preferred” scenario

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-gk-anti-affinity-preferred
  namespace: gktest
spec:
  selector:
    matchLabels:
      app: httpd-gk
  replicas: 1
  template:
    metadata:
      labels:
        app: httpd-gk
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - httpd-gk
              topologyKey: kubernetes.io/hostname
      containers:
      - name: httpd-gk
        image: docker.io/gkovan/httpd-example@sha256:76b3a0
        ports:
        - containerPort: 8080

Deployment manifest for the “pod anti-affinity required” scenario

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-gk-anti-affinity-required
  namespace: gktest
spec:
  selector:
    matchLabels:
      app: httpd-gk
  replicas: 1
  template:
    metadata:
      labels:
        app: httpd-gk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - httpd-gk
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: httpd-gk
        image: docker.io/gkovan/httpd-example@sha256:76b3a0
        ports:
        - containerPort: 8080

References:

https://docs.openshift.com/container-platform/4.7/nodes/scheduling/nodes-scheduler-pod-affinity.html#nodes-scheduler-pod-affinity-example-antiaffinity_nodes-scheduler-pod-affinity

https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
