What is Kubernetes (also called K8s)?
Kubernetes is like a post office: it manages containerised environments and makes sure packages are delivered correctly. Whether the destination is local or overseas, as long as we know the address of the recipient, the post office will take care of the rest. How long the delivery takes depends on the size of the package and how far away it is going, but the main goal is always to make sure the package arrives where it is supposed to be delivered.
Now imagine Kubernetes as a post office that only handles containers. We declare everything we need done with our containers, but we don’t have to care how it gets done. The containerised environments in Kubernetes are like boxes, and they are called pods. Each pod can contain one or more containers. The pod also carries information about whom it is for, where it runs and how long you will have to wait for it to get there. And of course, each pod has its own unique name and IP address. Lastly, we put the application into the box and describe how and where this box is delivered.
Fun fact: The word Kubernetes comes from the Greek for helmsman or pilot.
Another awesome feature that Kubernetes has to offer is autoscaling, which allows companies to scale up and down based on actual demand. With autoscaling, Kubernetes monitors the load for you, increases the number of pods when traffic rises and reduces it when traffic is low. This also saves cost, as we do not pay for resources that we don’t use.
If 100,000 users suddenly go to https://app.sca3.wpenginepowered.com, our server needs to scale to handle that amount of load. K8s will automatically spawn more pods (say 10-15 new pods) to split the load of the 100,000 users. When the user count drops back to 1,000 or 100, K8s will automatically kill the “extra” pods, because they are not needed anymore, which helps us save cost.
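To make the scaling decision a bit more concrete, here is a minimal sketch of the rule the Horizontal Pod Autoscaler documents: multiply the current pod count by the ratio of observed load to target load and round up. The "users per pod" metric, the target of 7,000 users per pod and the min/max bounds are assumptions made up for this illustration; a real autoscaler usually works on CPU, memory or custom metrics and also applies tolerances and cool-down windows that are left out here.

```python
import math


def desired_replicas(current_replicas, load_per_pod, target_per_pod,
                     min_replicas=2, max_replicas=20):
    """Simplified version of the Horizontal Pod Autoscaler scaling rule."""
    # Core rule: scale the pod count by how far the observed load per pod
    # is from the target load per pod, rounding up.
    wanted = math.ceil(current_replicas * load_per_pod / target_per_pod)
    # Stay inside the configured bounds (hypothetical values).
    return max(min_replicas, min(max_replicas, wanted))


# Traffic spike: 100,000 users spread over 2 pods, target 7,000 users per pod.
print(desired_replicas(2, 100_000 / 2, 7_000))

# Traffic drops: 1,000 users spread over 15 pods, so we fall back to the minimum.
print(desired_replicas(15, 1_000 / 15, 7_000))
```

With these assumed numbers the spike asks for 15 pods, and the quiet period shrinks the instance back to the minimum of 2.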
And if a node goes down or a container stops responding, Kubernetes will replace or kill the containers that don’t respond to the configured health check. Usually we keep at least 2 nodes ready for each customer instance to ensure availability. Because of that setup, we are able to do our production deployments with little to no downtime for our customers’ instances, as only one node is brought down at a time while the other is still available.
A good way to explain this is to look at a release (e.g. the M47 production release). Let’s say the SCA instance has 2 pods (Pod A and Pod B), and both of them run the M46 code now.
During the release, K8s will automatically create a 3rd pod (Pod C) with the M47 code. When Pod C is ready, K8s will gradually turn off Pod A and Pod B, slowly switching everything to the M47 code base without any downtime.
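The release flow above can be sketched as a tiny simulation. It assumes the rollout starts one extra pod at a time and never drops below the original pod count, which is what the Deployment settings `maxSurge: 1` and `maxUnavailable: 0` would give; our real rollout settings may differ. The function name and the list-of-versions representation are invented for this example.

```python
def rolling_update(pods, new_version):
    """Simulate a rolling update and return every intermediate pod set.

    `pods` is a list of version labels, one per running pod.
    """
    pods = list(pods)
    history = [tuple(pods)]
    old_versions = [v for v in pods if v != new_version]
    for old in old_versions:
        # Surge: start one extra pod with the new code and wait until it is ready.
        pods.append(new_version)
        history.append(tuple(pods))
        # Only then retire one pod that still runs the old code.
        pods.remove(old)
        history.append(tuple(pods))
    return history


for step in rolling_update(["M46", "M46"], "M47"):
    print(step)
```

Every step in the history has at least 2 pods serving traffic, which is why users should not notice the switch from M46 to M47.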
Kubernetes uses a consistent and highly available key-value store called etcd as the backing store for all cluster data. In case of a cluster failure, we can use snapshots of etcd to restore the cluster and its components to the state they were in when the snapshot was taken.
So think of etcd as an external hard disk, or online storage like Apple’s iCloud or Google Drive. They act as a backup for the precious pictures you took on vacation last summer. If you accidentally delete the photos from your phone, you can always get them back from your external hard disk or from the cloud storage. Pretty cool. Kubernetes likewise has a built-in store that keeps the current state of the cluster, and in case something bad happens, we can use the states that we have backed up to restore our clusters.
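The backup-and-restore idea can be shown with a toy key-value store. This is only an illustration of the concept: `TinyStore` and its methods are invented for this example and are not the etcd API, and the keys and values are made up.

```python
import copy


class TinyStore:
    """Toy in-memory key-value store with snapshot and restore."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def snapshot(self):
        # Copy the data so later changes do not alter the backup.
        return copy.deepcopy(self._data)

    def restore(self, snap):
        # Replace the current state with the state saved in the snapshot.
        self._data = copy.deepcopy(snap)


store = TinyStore()
store.put("instance/sca/replicas", 2)
backup = store.snapshot()              # take a backup while all is well

store.delete("instance/sca/replicas")  # something bad happens
store.restore(backup)                  # roll back to the saved state
print(store.get("instance/sca/replicas"))
```

Note that a restore brings back the state from the moment of the snapshot, so anything that changed after the backup was taken is lost.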
What does it look like in Connect?
Here is an example of how a request ends up in our K8s cluster.
In the example above, when you (the user) visit https://app.sca3.wpenginepowered.com in Google Chrome, the frontend web page sends an API request to our GKE cluster to get the latest news for the current user (User A in the diagram).
The request first lands on our Ingress web server (a reverse proxy), which does the load balancing and allocates the request to one of the active pods in our cluster.
In the example shown above, only pod #2 and pod #3 are currently free, and pod #3 is chosen to serve the request. All of this happens automatically in our Kubernetes cluster.
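As a rough picture of what the load balancer does, the sketch below hands each request to the next pod that is currently free. Round-robin is an assumption chosen for simplicity; the actual algorithm depends on how the Ingress controller is configured. The `Pod` and `RoundRobinBalancer` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ready: bool  # True when the pod is free to take requests


class RoundRobinBalancer:
    """Hand out requests to ready pods in turn."""

    def __init__(self, pods):
        self.pods = pods
        self._next = 0

    def pick(self):
        # Only pods that are ready may receive traffic.
        ready = [p for p in self.pods if p.ready]
        if not ready:
            raise RuntimeError("no ready pods to serve the request")
        pod = ready[self._next % len(ready)]
        self._next += 1
        return pod


balancer = RoundRobinBalancer([
    Pod("pod-1", ready=False),  # busy, so it is skipped
    Pod("pod-2", ready=True),
    Pod("pod-3", ready=True),
])
print([balancer.pick().name for _ in range(3)])
```

Here pod-1 never receives a request, and the traffic alternates between pod-2 and pod-3.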
The pod runs our backend implementation: it retrieves the information needed from the database, repackages everything nicely and returns the data to our lovely frontend.
Each Connect instance on our Kubernetes production cluster has at least two pods to serve requests. If one of them fails or takes too long to respond, the cluster will automatically spawn new pods to replace the failed ones.
This is to keep every instance available, with 100% uptime as the goal.
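The self-healing behaviour described above boils down to a loop that compares what is running with what we asked for. The sketch below is a simplified illustration: `reconcile`, the health check and the pod naming are all invented for this example, and a real cluster runs this kind of check continuously rather than once.

```python
import itertools


def reconcile(pods, desired, is_healthy, new_pod):
    """Drop unhealthy pods and start new ones until `desired` pods are running."""
    # Keep only the pods that pass the health check.
    running = [p for p in pods if is_healthy(p)]
    # Start replacements until we are back at the desired count.
    while len(running) < desired:
        running.append(new_pod())
    return running


counter = itertools.count(1)


def make_pod():
    # Hypothetical naming scheme for replacement pods.
    return f"pod-new-{next(counter)}"


# pod-b stops responding, so it is replaced by a fresh pod.
pods = reconcile(["pod-a", "pod-b"], desired=2,
                 is_healthy=lambda p: p != "pod-b", new_pod=make_pod)
print(pods)
```

After the loop runs, the instance is back to two healthy pods: pod-a and a freshly started replacement.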