8 questions about Talos answered

Talos is a minimal Kubernetes OS that's quickly gaining popularity because of its ease of use and strong focus on security by default. A significant number of companies already run it in production, where it has reduced costs, maintenance time and operational complexity. In this article, we'll dive into the use cases of Talos and when you should or shouldn't use it.

What is Talos? 

"Talos Linux is Linux designed for Kubernetes - secure, immutable and minimal."

In other words, Talos is designed from the ground up to run Kubernetes. Stripping away unnecessary components streamlines the Kubernetes experience and simplifies maintenance and setup through straightforward API calls. It was first released as an alpha in 2018.

It’s an open-source project from Sidero Labs. Sidero Labs also offers Sidero Omni, a service with a graphical user interface for managing clusters and the machines in them, which allows completely hands-off provisioning of worker nodes.

It is already used for maintaining Kubernetes clusters at companies like Equinix and Nokia. It has enabled them to reduce operational costs (in money and time) and complexity in their environments.

When should I use Talos?

Consider using Talos (or another immutable k8s OS) whenever you manage Kubernetes clusters yourself, including the operating system the cluster runs on. It makes running and managing Kubernetes significantly easier, especially on bare metal. It eliminates host-level dependencies and the operational cost of maintaining a full operating system. Talos forces you to treat your hosts as cattle, never as pets. This might be annoying at first, but in the end it forces you to engineer your applications in a more cloud-native way, which also results in more stable deployments.

When shouldn’t I use Talos?

Don't use Talos if your applications cannot handle data loss and you don't have a good backup strategy. By default, Talos wipes the EPHEMERAL partition during upgrades unless you instruct it otherwise, which could lead to data loss if you're not careful.
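If you do need local data to survive an upgrade, `talosctl upgrade` has a `--preserve` flag that tells Talos to keep the EPHEMERAL partition. Below is a minimal Python sketch that only builds the command and does not run it; the node IP and the installer image tag are placeholders, not recommendations.

```python
import shlex


def talos_upgrade_cmd(node_ip: str, installer_image: str, preserve: bool = True) -> list[str]:
    """Build, but do not run, a talosctl upgrade command for a single node."""
    cmd = ["talosctl", "upgrade", "--nodes", node_ip, "--image", installer_image]
    if preserve:
        # Keep the EPHEMERAL partition instead of letting the upgrade wipe it.
        cmd.append("--preserve")
    return cmd


if __name__ == "__main__":
    # Placeholder node IP and installer tag: substitute your own.
    print(shlex.join(talos_upgrade_cmd("10.0.0.21", "ghcr.io/siderolabs/installer:v1.7.0")))
```

Keeping the command as a list means you can hand it straight to `subprocess.run(cmd, check=True)` without worrying about shell quoting.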

Some users, like data scientists and machine learning engineers, prefer more direct control over the host so they can fine-tune it and squeeze the most performance out of the hardware.

If your organization only allows licensed or supported operating systems like Red Hat Enterprise Linux or SUSE Linux Enterprise Server, using a custom OS like Talos Linux can be a hard “no”.

Starting with Talos can look daunting compared to running a regular Linux distribution with a Kubernetes cluster on top. But because Talos removes a lot of the moving parts underneath Kubernetes, it also removes quite a bit of operational overhead. The immutable nature of Talos makes maintenance simpler, and reasoning about the state of your cluster becomes much easier because the hosts cannot drift away from their configuration.

What does it mean that Talos is immutable?

The root filesystem on Talos is mounted read-only, and host-level tools like shells and SSH are removed entirely. The OS runs from an in-memory SquashFS image and persists no changes to itself, which means that every reboot is a clean start.

OS upgrades are triggered by an API call and will, by default, wipe any data in the EPHEMERAL partition. Talos uses an A-B image scheme: the new version is installed alongside the running image, the node reboots into the new (B) image, and if that boot fails it rolls back to the previous (A) image.

[Figure: A-B upgrade scheme]
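To make the A-B idea concrete, here is a toy Python model. It is not how Talos implements upgrades internally; the slot names, the version strings and the `boots_ok` health check are made up purely to show the install-alongside, reboot, roll-back-on-failure flow.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ABNode:
    """Toy model of an A-B image scheme: two image slots, one of them active."""

    slots: dict[str, str] = field(default_factory=lambda: {"A": "v1.6.0", "B": ""})
    active: str = "A"

    @property
    def running_version(self) -> str:
        return self.slots[self.active]

    def inactive_slot(self) -> str:
        return "B" if self.active == "A" else "A"

    def upgrade(self, new_version: str, boots_ok: Callable[[str], bool]) -> bool:
        """Install into the inactive slot, boot it, and roll back if the boot fails."""
        previous = self.active
        target = self.inactive_slot()
        self.slots[target] = new_version  # the running image is never modified
        self.active = target              # "reboot" into the freshly installed slot
        if not boots_ok(new_version):
            self.active = previous        # boot failed: fall back to the old slot
            return False
        return True


if __name__ == "__main__":
    node = ABNode()
    node.upgrade("v1.7.0", boots_ok=lambda version: False)
    print(node.active, node.running_version)  # prints "A v1.6.0" (rolled back)
    node.upgrade("v1.7.0", boots_ok=lambda version: True)
    print(node.active, node.running_version)  # prints "B v1.7.0"
```

Because the old image stays untouched in its own slot, rolling back is nothing more than switching which slot is active.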

That sounds nice and all, but this is the real world: how does Talos handle persistent data?

You can still use disks attached to the nodes for storage. Depending on the workload, you could use distributed storage like Rook-Ceph, distributed object storage like MinIO, or native clustering in Postgres with Postgres-operator.

Or, if you are running in the cloud, you can use the CSI drivers offered by your cloud provider, or go fully cloud-native and make sure you don't need any local storage at all.
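Whichever backend you pick, workloads typically request storage through a PersistentVolumeClaim rather than writing to a node's disk directly. A small Python sketch that builds such a claim; the claim name and the `rook-ceph-block` StorageClass are assumptions, so use whatever StorageClass your CSI driver or Rook installation actually provides.

```python
import json


def pvc_manifest(name: str, storage_class: str, size: str = "10Gi") -> dict:
    """Return a PersistentVolumeClaim asking a CSI-backed StorageClass for a volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # "postgres-data" and "rook-ceph-block" are hypothetical names.
    print(json.dumps(pvc_manifest("postgres-data", "rook-ceph-block"), indent=2))
```

kubectl accepts JSON manifests, so the output can be piped into `kubectl apply -f -`.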

Why would I use Talos over K3S?

If your workloads run entirely on Kubernetes, Talos makes managing your cluster and its upgrades a lot easier than managing a full OS plus upgrades to k8s itself. A mutable OS that can break while installing upgrades, with no rollback strategy, can leave your cluster in a broken state and cost you many hours of debugging. Fixing a broken Talos node should be as simple as removing it from the cluster and adding a new node, or issuing a `talosctl rollback` command to roll back the affected nodes.
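The two recovery paths described above can be written down as a small runbook. The Python sketch below only assembles the commands and does not execute them; the node name, IPs and the `worker.yaml` path are hypothetical, and it assumes the replacement machine has been booted into Talos maintenance mode so that it accepts an `--insecure` apply-config.

```python
import shlex


def replace_node_plan(node_name: str, old_ip: str, new_ip: str, worker_config: str) -> list[list[str]]:
    """Commands to retire a broken worker and join a fresh one (not executed here)."""
    return [
        # Evict workloads from the broken node.
        ["kubectl", "drain", node_name, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Wipe the old machine, if it is still reachable.
        ["talosctl", "reset", "--nodes", old_ip],
        # Remove the node object from Kubernetes.
        ["kubectl", "delete", "node", node_name],
        # Join a new machine using the worker config stored when the cluster was created.
        ["talosctl", "apply-config", "--insecure", "--nodes", new_ip, "--file", worker_config],
    ]


def rollback_cmd(node_ip: str) -> list[str]:
    """The alternative path: return the node to its previous Talos installation."""
    return ["talosctl", "rollback", "--nodes", node_ip]


if __name__ == "__main__":
    for cmd in replace_node_plan("worker-2", "10.0.0.22", "10.0.0.31", "worker.yaml"):
        print(shlex.join(cmd))
    print(shlex.join(rollback_cmd("10.0.0.22")))
```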

Setting up a highly available k8s cluster with Talos is trivial, while K3S makes it a bit more complicated. Talos also forces you to create configuration files that describe your cluster up front, which, if stored properly (for example in version control), make it easy to modify and update your cluster.
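Those configuration files typically come out of `talosctl gen config`, and your own changes can live in patch files kept in version control. A minimal Python sketch; the cluster name, the endpoint, the patch path and the `/dev/sda` install disk are hypothetical values chosen for illustration.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical strategic-merge patch: install Talos to /dev/sda on every node.
INSTALL_DISK_PATCH = """\
machine:
  install:
    disk: /dev/sda
"""


def write_patch(path: Path, content: str = INSTALL_DISK_PATCH) -> Path:
    """Write a config patch to disk so it can be committed to version control."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return path


def gen_config_cmd(cluster_name: str, endpoint: str, patch_file: Path) -> list[str]:
    """Build a talosctl gen config command that applies the stored patch."""
    return ["talosctl", "gen", "config", cluster_name, endpoint, "--config-patch", f"@{patch_file}"]


if __name__ == "__main__":
    # Demo in a throwaway directory; in a real repository you would commit the patch.
    with tempfile.TemporaryDirectory() as repo:
        patch = write_patch(Path(repo) / "patches" / "install-disk.yaml")
        print(shlex.join(gen_config_cmd("demo", "https://10.0.0.10:6443", patch)))
```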

K3S, on the other hand, can be installed with a single curl command. But if you want to change options, that command can get complicated, and if you don't store it somewhere, upgrades can become a nightmare to figure out because you no longer know which options were used to create the cluster.
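One way to avoid that with K3S is to put the options in its configuration file, `/etc/rancher/k3s/config.yaml`, whose keys mirror the CLI flags, and keep that file in version control instead of a long curl command. A small Python sketch that renders flat options into that format; the option values are examples rather than recommendations, and this renderer only handles strings, booleans and lists of strings.

```python
import json


def render_k3s_config(options: dict[str, object]) -> str:
    """Render flat k3s options (str, bool or list of str) as YAML for config.yaml."""
    lines: list[str] = []
    for key, value in options.items():
        if isinstance(value, bool):
            lines.append(f"{key}: {'true' if value else 'false'}")
        elif isinstance(value, list):
            lines.append(f"{key}:")
            # JSON strings are valid double-quoted YAML scalars.
            lines.extend(f"  - {json.dumps(str(item))}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(str(value))}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_k3s_config({
        "write-kubeconfig-mode": "0644",
        "tls-san": ["k3s.example.com"],
        "disable": ["traefik"],
        "cluster-init": True,
    }), end="")
```

Quoting every string keeps values like `0644` from being reinterpreted as numbers by the YAML parser.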

Why would I use K3S over Talos?

If you have a mixed workload, where some services run as systemd units and others as k8s deployments, and your applications are not ready to run in a more cloud-native way, you are probably better off running K3S on your existing nodes.

If you just want a quick and dirty way to start a k8s cluster, K3S makes it trivial and lets you start using it within a couple of minutes.

If your company has hard requirements on the operating system, running K3S on top of the mandated OS may be unavoidable.

What alternatives are available?

Besides Talos, there are several options available, including the cloud-provided Kubernetes offerings like GKE, EKS and AKS. These might be more attractive if you are already locked into a certain cloud provider, or if you want to externalize your operational overhead.

Other options include k3os (now deprecated), Elemental (by Rancher) and Kairos. Elemental might be more attractive to engineers who prefer a graphical user interface because you can use it with the Rancher Manager interface.

Kairos lets you choose (or build) your own underlying operating system image, which gives you more manual control over certain parameters.

Conclusion

If you’re looking for a way to make your Kubernetes cluster(s) easier to manage, you should really consider Talos. Its immutable nature and API-driven management make it trivial to manage and reason about your cluster.

To quickly try it locally, take a look at the Talos Quickstart.

Erwin is a DevOps Engineer at Fullstaq. He loves educating people on what it means to build cloud-native applications (and it’s not just using the “cloud”). He uses the Cloud Native landscape to make application delivery and lifecycle management simple and reliable. At home he is always looking for side quests: monitoring his power usage to send notifications when his washing machine is done, connecting the garage door to the internet (securely) and adding music to his elevator are just a few examples. His experience makes him a very practical engineer: finding a simple solution matters more to him than having a modern, shiny one.