Yes, you read that right: build a K8s operator in Python! I often get reactions like, "But doesn't it have to be in Golang?" Fortunately, that's not necessary if you prefer Python. My background is in data/ML, where Python is my strong suit, so I find it more comfortable to work with. Let me guide you through a small how-to so you can build your own operator.

Why use an operator?

An operator is like a smart assistant that understands the complexity of your Kubernetes cluster and can automate specific tasks. It enables you to create custom automation for your (custom) applications, minimizing manual intervention.

The power of an operator lies in its ability to autonomously perform tasks such as scaling, maintenance, and self-healing. This boosts efficiency and minimizes errors.

While some tasks can be performed with deployments, etc., if you're looking for additional logic, an operator is an excellent choice.

When deploying something on K8s, I always look for the opportunity to write an operator for it. Automating on Kubernetes makes life a lot easier for your team in the long run.

Let's Start

Now that we know you can build an operator in Python and understand their usefulness, let's get started with building one.

The operator is a lightweight monitoring tool that checks the status of pods and containers. If something is amiss, it sends a message to your MS Teams channel. Consider it a tool for scenarios where Grafana or other monitoring tools would be overkill for your project.

I assume you are somewhat familiar with Python, but let's start from the beginning.


# Creating a folder
mkdir fullstaq-operator

Create a venv to control and pin the Python version and packages.

Create a venv

Activate the venv and you can begin.

venv activation

A small tip to check that the venv is actually active: run pip list. If the list is (nearly) empty, showing only pip and perhaps setuptools, you know the venv is active.
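If you prefer to check from inside Python, the interpreter itself can tell you whether it runs in a venv. This is only a small sketch; the function name `in_virtualenv` is hypothetical and not part of the operator.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
```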

Create pip list

The first step is making a config file.

I use it to load environment variables and set up a logging function, saving the trouble of defining them repeatedly. It might seem a bit overkill now, but making it a habit pays off.

Python Kubernetes Operator

I create a Config class that holds the environment variables our operator needs. If you don't know all of them up front, you can add more later.

Next, we move to logging. A static method in Python is a special type of method in a class that doesn't receive an implicit reference to the object (the class instance) when called. Instead, it behaves like a regular function associated with the class and can be called without creating an instance of the class.

I've also done this with the cluster. You can place it in Config without a static method, but in this case, I chose to keep it separate to avoid calling the entire class repeatedly.
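To make the idea concrete, here is a minimal sketch of such a config module. It is not the code from the repo: every environment variable name (`MONITOR_SCOPE`, `TEAMS_WEBHOOK_URL`, `CLUSTER_CONTEXT`, and so on) and both method names are assumptions chosen for illustration.

```python
import logging
import os


class Config:
    """Environment-driven settings; every variable name here is hypothetical."""

    def __init__(self) -> None:
        # Read once at construction, with defaults for local development.
        self.scope = os.getenv("MONITOR_SCOPE", "cluster")
        self.namespace = os.getenv("MONITOR_NAMESPACE", "default")
        self.label_selector = os.getenv("MONITOR_LABEL", "")
        self.teams_webhook_url = os.getenv("TEAMS_WEBHOOK_URL", "")

    @staticmethod
    def setup_logging(name: str = "operator") -> logging.Logger:
        # No `self` argument: callable as Config.setup_logging()
        # without creating a Config instance first.
        logging.basicConfig(
            level=os.getenv("LOG_LEVEL", "INFO").upper(),
            format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        )
        return logging.getLogger(name)

    @staticmethod
    def cluster() -> str:
        # Name of the kubeconfig context to use (empty means the current one).
        return os.getenv("CLUSTER_CONTEXT", "")
```

Calling `Config.setup_logging()` or `Config.cluster()` works directly on the class, which is the point of making them static.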

Now, let's move on to our main script where we can pick up these values and use them.

Config main script

Starting from the top, import the necessary libraries. Then set up the logger from the config file and set the cluster. Choose the cluster you want to use, especially for local development, where you might have multiple clusters in your context.

Next, you have kubeconfig, allowing you to run the operator locally for testing and in your cluster. This saves the hassle of rebuilding or setting up a local cluster for testing. You can now use any random cluster online/offline to test your script.
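One way to support both modes is sketched below, assuming the official `kubernetes` Python client. The helper names are hypothetical; `load_incluster_config` and `load_kube_config` are the client's own loaders. The check relies on the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` variables that Kubernetes sets inside every pod.

```python
import os
from typing import Mapping, Optional


def running_in_cluster(env: Mapping[str, str] = os.environ) -> bool:
    # Kubernetes injects these variables into the containers of every pod.
    return "KUBERNETES_SERVICE_HOST" in env and "KUBERNETES_SERVICE_PORT" in env


def load_cluster_config(context: Optional[str] = None) -> None:
    # Imported lazily so the helper above works without the client installed.
    from kubernetes import config

    if running_in_cluster():
        # Inside a pod: use the mounted service account token.
        config.load_incluster_config()
    else:
        # Local run: use ~/.kube/config, optionally with a named context.
        config.load_kube_config(context=context or None)
```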

Now, let's dive into functions to inspect pods, divided into three variations: namespace, label, or the entire cluster. You can determine the scope of our monitoring tool with an environment variable.
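The three variations could be folded into one dispatcher, roughly like this. It is a sketch, not the repo's code: the function name `list_pods` and the scope values are assumptions, and `api` is expected to be a `CoreV1Api` instance from the `kubernetes` client, whose `list_namespaced_pod` and `list_pod_for_all_namespaces` methods are used.

```python
from typing import Any, List


def list_pods(api: Any, scope: str, namespace: str = "default",
              label_selector: str = "") -> List[Any]:
    """Return the pods to monitor for the given scope."""
    if scope == "namespace":
        # Only the pods in one namespace.
        result = api.list_namespaced_pod(namespace)
    elif scope == "label":
        # Pods across the cluster that match a label, e.g. "app=web".
        result = api.list_pod_for_all_namespaces(label_selector=label_selector)
    elif scope == "cluster":
        # Every pod in the cluster.
        result = api.list_pod_for_all_namespaces()
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return result.items
```

Passing the API object in as an argument keeps the function easy to test with a fake client.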

Inspecting pods

In the next step, create a function to determine the action to take when we notice that a pod or container is not working.

What to do when pod not working

We look at the pods, and in the second loop, we look at the failed containers, defined in another function.

In handle_failed_pods, we define how to make a request to the Teams channel.
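As an illustration of such a request, the sketch below posts a plain text message to a Teams incoming webhook using only the standard library. The function names, the message wording, and the simple `{"text": ...}` payload are assumptions; your webhook may expect a richer card format.

```python
import json
import urllib.request


def build_teams_payload(pod: str, namespace: str, reason: str) -> bytes:
    # Simplest payload shape: a JSON object with a single "text" field.
    text = f"Pod {namespace}/{pod} needs attention: {reason}"
    return json.dumps({"text": text}).encode("utf-8")


def send_teams_message(webhook_url: str, payload: bytes,
                       timeout: float = 10.0) -> int:
    # POST the JSON body to the webhook URL and return the HTTP status code.
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```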

Containers in pods

Define what we are looking for in the containers in the pod. Then, create a function to determine the scope of our operator.
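What counts as "not working" is a choice. The sketch below shows one possible definition, not necessarily the one used in the repo: a container is flagged when it is waiting for a reason other than normal startup, or when it terminated with a non-zero exit code. The function name and the list of ignored reasons are assumptions; the attribute names follow the pod objects returned by the `kubernetes` client.

```python
from typing import Any, List, Tuple

# Waiting reasons that are part of a normal startup (assumed list).
NORMAL_WAITING = ("ContainerCreating", "PodInitializing")


def find_failed_containers(pod: Any) -> List[Tuple[str, str]]:
    """Return (container name, reason) pairs for unhealthy containers."""
    failed = []
    # container_statuses can be None while a pod is still being scheduled.
    for status in pod.status.container_statuses or []:
        state = status.state
        if state.waiting is not None:
            reason = state.waiting.reason
            if reason is not None and reason not in NORMAL_WAITING:
                failed.append((status.name, reason))
        elif state.terminated is not None and state.terminated.exit_code != 0:
            failed.append((status.name, state.terminated.reason or "Terminated"))
    return failed
```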

Startup tasks

Now, we start with the kopf component. kopf is an operator framework with decorators you can use for operator tasks. We create two – one on startup to start the defined tasks when the executable runs in the pod, and another on a timer to complete tasks at specified intervals.

Now, let's start creating a Dockerfile. It's a simple Dockerfile I often use for Python-based side projects.

Create Dockerfile

The Dockerfile creates a work directory, copies the required scripts, and installs the requirements. It contains no command; I usually put that in the YAML. When I have multiple scripts in one side project and want to run each in its own container, I still need only one Dockerfile.

Now, we can start with the YAML files to deploy and provide the necessary permissions.

Deploy Kubernetes Operator

I've created a deployment where you can fill in the values needed to run the K8s operator.

The deployment is straightforward; the service account is named, and we define the environment variables. With this setup, the variables are not secrets. If running this in a production environment, it's advisable to use secrets. For now, I find that a bit overkill.

Now, we need a service account, a binding, and the rights for the service account. These rights are needed to monitor the pods; otherwise, the operator won't work.

Create service account to monitor pods

Now that you have all the YAMLs, you can build the Docker image and deploy using the YAMLs. Your lightweight monitoring tool is now deployed.

Here is the repo:

Feel free to fork and use it according to your needs. Good luck!


You've learned how to create a simple K8s operator and deploy it.

What can you add to make the operator more mature?

  • Add values as secrets
  • Create a Helm chart
  • Make the code more mature, e.g., by placing the kubeconfig in
  • Add tests ;)

I help organizations as a Data/AI Expert to get the most value out of their data and IT systems. I do that with honesty, empathy, and technical knowledge. Working at a startup in the early days of my career gave me the superpower to move fast and work in a truly agile manner. Later in my career I did projects abroad, where I learned to work with diverse cultures and mindsets. One thing that stayed the same all the time? Working on Azure and on-premises solutions, solving challenges for AI, BI architectures, and hardware architectures, and most importantly, keeping them secure and compliant with the clients' standards within tight deadlines. As computer science keeps evolving, I want to keep learning and work on tough challenges.