What is Kubernetes?
Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications. The name Kubernetes originates from Greek, meaning helmsman or pilot. Google open-sourced the Kubernetes project in 2014.
Why is Kubernetes so useful?
Let’s first look at how applications were deployed and managed before Kubernetes.
- Traditional Deployment Era: Early applications ran on physical servers. There was no way to define resource boundaries for applications on a physical server, which caused resource allocation problems. For example, if multiple applications ran on the same physical server, one application could take up most of the resources and the others would underperform. One solution was to run each application on its own physical server, but then resources were underutilized.
- Virtualized Deployment Era: Virtualization allows us to run multiple virtual machines (VMs) on a single physical server. Applications are isolated between VMs, which also provides a degree of security, since one application cannot be accessed by another. Virtualization also allows better utilization of a server’s resources.
- Container Deployment Era: Containers are similar to VMs: each container has its own filesystem and its own share of CPU, memory, process space, and more. But containers are much more lightweight, so applications can be bundled and run inside containers easily.
Why we need Kubernetes and what it can do
Today applications run inside containers, and in a production environment we need to manage those containers so that there is no downtime. If a container goes down, another container needs to start in its place; this is where Kubernetes comes into the picture. Kubernetes takes care of scaling and failover for your application, provides deployment patterns, and more.
Kubernetes provides the following features:
- Service discovery and load balancing: Kubernetes can expose a container using a DNS name or IP address, and it can also load-balance to distribute traffic across containers.
- Storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice.
- Automated rollouts and rollbacks: You can describe the desired state for your deployed containers using Kubernetes, and it changes the actual state to the desired state.
- Automatic bin packing: You provide Kubernetes with a cluster of nodes that it can use to run containers. You tell Kubernetes how much CPU and memory each container needs, and it fits containers onto the nodes to make the best use of resources.
- Self-healing: Kubernetes restarts containers that fail, and replaces and kills containers that don’t respond to health checks.
- Secret and configuration management: Kubernetes helps us to store and manage sensitive information such as passwords.
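Several of these features come together in a single Deployment manifest, the declarative document you hand to Kubernetes to describe your desired state. Below is a minimal sketch; the application name, labels, and image are illustrative placeholders, not anything from a real cluster:

```yaml
# Minimal Deployment manifest (hypothetical names) showing:
# desired state (replicas), bin packing (resource requests),
# and self-healing (a liveness probe).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app              # illustrative application name
spec:
  replicas: 3                # desired state: keep 3 pods running
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx:1.25    # any container image would do here
        ports:
        - containerPort: 80
        resources:
          requests:          # used by the scheduler for bin packing
            cpu: "250m"
            memory: "128Mi"
        livenessProbe:       # self-healing: restart on failed checks
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
```

Applying this with `kubectl apply -f deployment.yaml` asks Kubernetes to drive the actual state toward the declared one; changing the image and re-applying triggers an automated rollout, and `kubectl rollout undo deployment/web-app` rolls it back.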
Over the last couple of years, the amount of attention paid to Kubernetes has increased. What started as a container management system open-sourced by Google has turned into a must-have technology for running machine learning and advanced analytics applications, among other workloads. Today applications run on clouds in self-contained units of infrastructure called containers, which can be started, stopped, scaled up, scaled down, and moved without impacting the underlying application, and Google developed Kubernetes to be the orchestration layer for managing large numbers of Docker containers.
According to a recent survey from the Cloud Native Computing Foundation (CNCF), 84% of companies are using containers in production this year, up from 23% in 2016. Nearly 80% of them are using Kubernetes to manage those containers.
The major cloud providers have developed their own Kubernetes distributions based on Google’s source, including Google Cloud’s Google Kubernetes Engine (GKE), Amazon Web Services’ Elastic Kubernetes Service (EKS), and Microsoft Azure’s Azure Kubernetes Service (AKS). Other distributions include Red Hat’s OpenShift, Rancher from Rancher Labs, and Cloud Foundry. In the Hadoop ecosystem, K8s has overtaken YARN as the most-used resource scheduler, at least for cloud deployments.
Kubernetes Case Studies
OpenAI
OpenAI is an AI research and deployment company. Its mission is to ensure that artificial general intelligence benefits all of humanity.
Challenge: OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers.
Solution: OpenAI started running Kubernetes on top of AWS in 2016 and migrated to Azure in early 2017. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity. It uses Kubernetes mainly as a batch scheduling system and relies on its own autoscaler to dynamically scale the cluster up and down.
Benefits: Kubernetes provides a consistent API, through which OpenAI can move research experiments easily between clusters while still being able to use its own data centers. This lowers costs and provides access to hardware that the team wouldn’t necessarily have access to in the cloud. Launching experiments also takes far less time: one researcher working on a new distributed training system was able to get his experiment running in two or three days, and within a week or two he scaled it out to hundreds of GPUs. Previously, that would easily have been a couple of months of work.
“Research teams can now take advantage of the frameworks we’ve built on top of Kubernetes, which make it easy to launch experiments, scale them by 10x or 50x, and take little effort to manage.” — CHRISTOPHER BERNER, HEAD OF INFRASTRUCTURE FOR OPENAI
The New York Times
The New York Times is an American daily newspaper based in New York City with a worldwide influence and readership. Nicknamed “the Gray Lady”, the Times has long been regarded within the industry as a national “newspaper of record”.
Challenge: When The New York Times decided to move out of its data centers to the public cloud, its critical applications were managed on virtual machines. “We started building more and more tools, and at some point we realized that we were doing a disservice by treating Amazon as another data center,” says Deep Kapadia, Executive Director, Engineering at The New York Times. Kapadia was given the responsibility of leading a Delivery Engineering Team that would design infrastructure around what cloud providers offer.
Solution: The Delivery Engineering Team decided to use Google Cloud Platform and its Kubernetes-as-a-service offering, GKE, to build the infrastructure for the company’s applications.
Benefits: Delivery speed increased from 45 minutes with VM-based deployments to just a few seconds to a couple of minutes with Kubernetes, says Engineering Manager Brian Balser. Adopting Cloud Native Computing Foundation technologies also allows a more unified approach to deployment across the engineering staff, and gives the company portability.
“I think once you get over the initial hump, things get a lot easier and actually a lot faster.” — DEEP KAPADIA, EXECUTIVE DIRECTOR, ENGINEERING AT THE NEW YORK TIMES
Spotify
Spotify is a Swedish audio streaming and media services provider, launched in October 2008. The audio-streaming platform has grown to over 200 million monthly active users across the world. With Spotify, it’s easy to find the right music or podcast for every moment, on your phone, your computer, your tablet, and more.
Challenge: Spotify had containerized microservices running across its fleet of VMs with a container orchestration system called Helios. By late 2017, it became clear that having a small team working on Helios’s features was just not as efficient as adopting something supported by a much bigger community.
Solution: “We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that,” says Jai Chakrabarti, Director of Engineering, Infrastructure and Operations. Kubernetes was more feature-rich than Helios. Plus, “we wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools.” The migration, which would happen in parallel with Helios running, could go smoothly because “Kubernetes fit very nicely as a complement and now as a replacement to Helios,” says Chakrabarti.
Benefits: The biggest service currently running on Kubernetes handles about 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Site Reliability Engineer James Wen. He adds, “Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes.” In addition, with Kubernetes’s bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold.
“We saw the amazing community that’s grown up around Kubernetes, and we wanted to be part of that. We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools.” — JAI CHAKRABARTI, DIRECTOR OF ENGINEERING, INFRASTRUCTURE AND OPERATIONS, SPOTIFY