Pravar Agrawal Technology & Travel

An Intro to Kubernetes Descheduler

If I mention the “control plane” in Kubernetes, four components automatically come to mind: kube-apiserver, etcd, kube-scheduler and kube-controller-manager. Of these four, as we know, kube-scheduler is the one responsible for assigning our workload pods to the nodes in a K8s cluster. But we often see that worker node resources are not utilized evenly. Since kube-scheduler only makes a decision at the time a pod is scheduled and takes no responsibility for rescheduling pods from highly utilized nodes to under-utilized ones, we are sometimes left with cluster resources that are not fully used. Some nodes end up hosting a large number of pods while others host very few. The Kubernetes scheduler does not handle redistribution of pods based on resource usage in the cluster, and that is where we may need Descheduler.

To understand this, consider a multi-node K8s cluster with a few under-utilized and a few over-utilized nodes. That is, some nodes are running at over 80 or 90% resource usage, while others are at as little as 30 or 40%.

Figure: Multi-node cluster

Now, with the help of Descheduler, we can evict some of the pods from the over-utilized nodes and have them scheduled onto the under-utilized ones.
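To make the rebalancing idea a bit more concrete, here is a toy Python sketch. It is not Descheduler's actual algorithm: the `Node` class, the abstract "capacity units", the 40/80 thresholds and the smallest-pod-first rule are all assumptions made up for illustration. It only shows the general shape of the decision: classify nodes by utilization, then pick pods to evict from the hot ones.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int                              # hypothetical abstract capacity units
    pods: dict = field(default_factory=dict)   # pod name -> requested units

    @property
    def utilization(self) -> float:
        """Requested resources as a percentage of capacity."""
        return 100.0 * sum(self.pods.values()) / self.capacity


def classify(nodes, low=40.0, high=80.0):
    """Split nodes into under- and over-utilized using two thresholds."""
    under = [n for n in nodes if n.utilization < low]
    over = [n for n in nodes if n.utilization > high]
    return under, over


def pick_evictions(nodes, low=40.0, high=80.0):
    """Return (pod, node) pairs to evict until each hot node drops to `high`.

    Nothing is evicted when no under-utilized node exists to absorb the pods.
    """
    under, over = classify(nodes, low, high)
    if not under:
        return []
    evictions = []
    for node in over:
        used = sum(node.pods.values())
        # Assumption: evict the smallest pods first; a real policy weighs much more.
        for pod, request in sorted(node.pods.items(), key=lambda kv: kv[1]):
            if 100.0 * used / node.capacity <= high:
                break
            evictions.append((pod, node.name))
            used -= request
    return evictions


nodes = [
    Node("node-a", 100, {"web-1": 40, "web-2": 30, "db-1": 20}),  # 90% used
    Node("node-b", 100, {"cache-1": 30}),                         # 30% used
    Node("node-c", 100, {"api-1": 60}),                           # 60% used
]
print(pick_evictions(nodes))  # [('db-1', 'node-a')]
```

Note that the sketch only decides what to evict. Placing the evicted pod on another node is still the job of kube-scheduler.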

Descheduler is a kubernetes-sigs project maintained under sig-scheduling. It’s an open-source project written in Go which runs as a Kubernetes add-on. It can be deployed inside a K8s cluster as a Deployment, Job or CronJob, or with the help of a Helm chart. Descheduler also supports a CLI which can be used to interact with the descheduler server running inside the cluster. Descheduler’s main objective is to manage node resources efficiently by evicting pods from over-utilized nodes so that kube-scheduler can place them again on under-utilized nodes. It does all of this with the help of custom policies, defined as a DeschedulerPolicy. Descheduler has the following main components:

  • Descheduler-server, which runs inside a Pod in the kube-system namespace.
  • Policy ConfigMap, which defines the policies to use for pod evictions and is created in the kube-system namespace.
  • Informers such as PodInformer, NodeInformer and NamespaceInformer, with whose help the descheduler server does all of its work. These informers watch resources like Nodes and Pods for resource utilization and report back to the descheduler server, which takes action based on the defined policy.
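The informer idea from the list above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Informer` class and its methods are hypothetical and are not the client-go API that Descheduler actually uses. The event names `ADDED`, `MODIFIED` and `DELETED` do mirror Kubernetes watch event types. The point is the pattern: watch events keep a local cache up to date, and the consumer reads that cache instead of asking the API server every time.

```python
class Informer:
    """Toy stand-in for an informer: a local cache plus event callbacks."""

    def __init__(self):
        self.store = {}      # object name -> latest known state
        self.handlers = []   # callables invoked as handler(event_type, name, obj)

    def add_handler(self, handler):
        self.handlers.append(handler)

    def on_event(self, event_type, name, obj=None):
        """Apply one watch event to the cache, then notify the handlers."""
        if event_type == "DELETED":
            self.store.pop(name, None)
        else:  # "ADDED" or "MODIFIED"
            self.store[name] = obj
        for handler in self.handlers:
            handler(event_type, name, obj)


pod_informer = Informer()
seen = []
pod_informer.add_handler(lambda kind, name, obj: seen.append((kind, name)))

# Simulated watch events; a real informer receives these from the API server.
pod_informer.on_event("ADDED", "web-1", {"node": "node-a", "cpu": 30})
pod_informer.on_event("MODIFIED", "web-1", {"node": "node-a", "cpu": 45})
pod_informer.on_event("ADDED", "db-1", {"node": "node-a", "cpu": 20})
pod_informer.on_event("DELETED", "db-1")

# The consumer reads the cache rather than querying the cluster again.
pods_on_node_a = [n for n, p in pod_informer.store.items() if p["node"] == "node-a"]
print(pods_on_node_a)  # ['web-1']
```

In this sketch the handler list is where a policy-driven component would hook in and decide whether the new state calls for an eviction.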

Descheduler is pretty easy to get started with; the installation guide is mentioned here. We can also edit the default descheduler policy based on a few examples mentioned here.

I also presented a talk at Kubernetes Community Days Bangalore 2022 and here’s the link.

So, this was a short introduction to Kubernetes Descheduler, and I hope to cover it in a much more detailed post later.

Until next time \o/ .