K8s

Architecture

Masters

In a basic Kubernetes cluster, the masters run the following components:

  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler
  • etcd

Workers

In a basic K8s cluster, the workers run the following components:

  • kube-proxy
  • kubelet

kube-apiserver

The kube-apiserver is the front end of the Kubernetes control plane. It resides on the masters and exposes the Kubernetes API. That means all control plane communication with the cluster, both internal and external, must pass through the kube-apiserver.

When we submit kubectl commands, we are interacting with the kube-apiserver.
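
As a quick illustration, kubectl can be run with a higher verbosity level; it then prints the HTTP requests it sends to the kube-apiserver (output omitted here, and the exact URLs depend on your cluster):

# -v=8 logs the REST calls kubectl makes against the kube-apiserver
kubectl get pods -v=8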

etcd

etcd is a distributed key-value store used as the Kubernetes backing store for all cluster data.
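
As a rough sketch, etcd's own health and membership can be queried directly with etcdctl; the endpoint and certificate paths below are assumptions and will differ per installation:

# Assumed endpoint and cert paths - adjust to your cluster's layout
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  member list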

kube-scheduler

Control plane component that watches for newly created Pods with no assigned node, and selects a node for them to run on.
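
A simple way to see this (a sketch, using a throwaway pod named nginx): create a pod, then check where it landed and which scheduler placed it.

kubectl run nginx --image=nginx
kubectl get pod nginx -o wide    # the NODE column shows the node the scheduler selected
kubectl describe pod nginx       # the Events section shows a "Scheduled" event from default-scheduler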

kube-controller-manager

Control Plane component that runs controller processes.

Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process.

These controllers include:

  • Node controller: Responsible for noticing and responding when nodes go down.
  • Replication controller: Responsible for maintaining the correct number of pods for every replication controller object in the system.
  • Endpoints controller: Populates the Endpoints object (that is, joins Services & Pods).
  • Service Account & Token controllers: Create default accounts and API access tokens for new namespaces.
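
A quick way to watch a controller at work (a sketch, using a throwaway Deployment named web): delete one of its pods and the controller creates a replacement to restore the desired replica count.

kubectl create deployment web --image=nginx --replicas=3
kubectl get pods -l app=web              # three pods running
kubectl delete pod <one-of-the-web-pods> # simulate a failure
kubectl get pods -l app=web              # a replacement pod is already being created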

cloud-controller-manager

A Kubernetes control plane component that embeds cloud-specific control logic. The cloud controller manager lets you link your cluster into your cloud provider's API, and separates out the components that interact with that cloud platform from components that just interact with your cluster.
The cloud-controller-manager only runs controllers that are specific to your cloud provider. If you are running Kubernetes on your own premises, or in a learning environment inside your own PC, the cluster does not have a cloud controller manager.

As with the kube-controller-manager, the cloud-controller-manager combines several logically independent control loops into a single binary that you run as a single process. You can scale horizontally (run more than one copy) to improve performance or to help tolerate failures.

The following controllers can have cloud provider dependencies:

  • Node controller: For checking the cloud provider to determine if a node has been deleted in the cloud after it stops responding
  • Route controller: For setting up routes in the underlying cloud infrastructure
  • Service controller: For creating, updating and deleting cloud provider load balancers
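
To check whether a cluster is actually running one (the pod name varies by provider, so this is just a rough grep):

kubectl -n kube-system get pods | grep cloud-controller-manager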

Node Components

Node components run on every node, maintaining running pods and providing the Kubernetes runtime environment.

kubelet

An agent that runs on each node in the cluster. It makes sure that containers are running in a Pod.

The kubelet takes a set of PodSpecs that are provided through various mechanisms and ensures that the containers described in those PodSpecs are running and healthy. The kubelet doesn't manage containers which were not created by Kubernetes.
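
On most installs the kubelet runs as a systemd service directly on the node (an assumption; managed offerings differ), so its status and logs can be checked there:

systemctl status kubelet    # is the kubelet service running?
journalctl -u kubelet -f    # follow the kubelet logs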

kube-proxy

kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.

kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.

kube-proxy uses the operating system packet filtering layer if there is one and it's available. Otherwise, kube-proxy forwards the traffic itself.
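
If kube-proxy is running in iptables mode (an assumption; IPVS mode is also common), the rules it programs can be inspected on any node:

iptables -t nat -L KUBE-SERVICES -n | head    # one entry per Service, jumping to per-service chains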

Container runtime

The container runtime is the software that is responsible for running containers.

Kubernetes supports several container runtimes: Docker, containerd, CRI-O, and any implementation of the Kubernetes CRI (Container Runtime Interface).
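
The runtime each node is using is shown in the CONTAINER-RUNTIME column of the node listing:

kubectl get nodes -o wide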

Addons

Addons use Kubernetes resources (DaemonSet, Deployment, etc) to implement cluster features. Because these are providing cluster-level features, namespaced resources for addons belong within the kube-system namespace.

Selected addons are described below; for an extended list of available addons, please see Addons.

DNS

While the other addons are not strictly required, all Kubernetes clusters should have cluster DNS, as many examples rely on it.

Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services.

Containers started by Kubernetes automatically include this DNS server in their DNS searches.

If you are running CoreDNS as a Deployment, it will typically be exposed as a Kubernetes Service with a static IP address (10.32.0.10 in this cluster). The kubelet passes DNS resolver information to each container with the --cluster-dns=<dns-service-ip> flag.

Hence, the following works: the cluster DNS IP address is passed to the busybox container, so it knows how to resolve cluster DNS names:

[root@c8vm-k8 ~]# kubectl exec -ti busybox -- nslookup kubernetes
Server:    10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.32.0.1 kubernetes.default.svc.cluster.local

CoreDNS takes its configuration from a ConfigMap. A forwarder can be set to handle queries for names outside the cluster domain.

kubectl get configmap coredns -n kube-system -o yaml
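
A typical Corefile inside that ConfigMap looks roughly like the sketch below; the forward line sends queries outside the cluster domain to the node's resolvers, and the exact plugin list varies by install:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf    # forwarder for non-cluster queries
    cache 30
    loop
    reload
    loadbalance
}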

Web UI (Dashboard)

Dashboard is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot applications running in the cluster, as well as the cluster itself.

Container Resource Monitoring

Container Resource Monitoring records generic time-series metrics about containers in a central database, and provides a UI for browsing that data.

Cluster-level Logging

A cluster-level logging mechanism is responsible for saving container logs to a central log store with a search/browsing interface.

K8s Ingress - wtf?

https://oteemo.com/2019/10/28/ingress-101-what-is-kubernetes-ingress-why-does-it-exist/

NodePort - Exists to allow you to expose containers to the outside world. When you create this type of service, Kubernetes exposes the same port (from the 30000-32767 range by default) on every node in the cluster. You can then open a firewall port to the node and access the service at https://<node external ip>:<nodePort>/. The obvious disadvantage is that you access your webapp on a non-standard port. To overcome this, you could create a load balancer in front of the nodes that reverse proxies 80/443 to the exposed NodePorts on the upstream nodes.
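
A minimal NodePort Service sketch (the app label, ports and nodePort value are assumptions for illustration):

apiVersion: v1
kind: Service
metadata:
  name: webapp-nodeport
spec:
  type: NodePort
  selector:
    app: webapp        # assumed pod label
  ports:
    - port: 80         # ClusterIP port inside the cluster
      targetPort: 8080 # assumed container port
      nodePort: 30080  # must fall in the 30000-32767 range by default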

LoadBalancer - Exists for cloud providers that offer L7 load balancer services that can be called via an API. This allows exact domains and paths to be created dynamically on the upstream/external/cloud-hosted load balancer automatically. You could also use a product like AVI for bare-metal on-prem installations.

Ingress-Controller, e.g. ingress-nginx - This is a pod that handles incoming requests from an external entry point and routes them to Services inside the cluster according to Ingress rules.
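
With an ingress controller such as ingress-nginx installed, traffic is routed by Ingress resources like the sketch below (the host, Service name and port are assumptions):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: webapp.example.com       # assumed hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webapp         # assumed Service name
                port:
                  number: 80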

Health Checks

[1] Blog Healthchecks
[2] Official K8's documentation on healthcheck endpoints
[3] Worker Node Health DaemonSet

node-problem-detector seems to dump its logs to the container log files on the node, e.g.:

tail -f /var/log/containers/node-problem-detector-v0.1-8q6jx_kube-system_node-problem-detector-ed38e636839c2cd73ec9dcbc05543a744790730ea27a85629b8e70a277aa43df.log

kubectl get componentstatus is deprecated.

[root@c8vm-k8 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-1               Healthy   {"health":"true"}
etcd-0               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}

Instead, use health check calls to the individual services:

kube-apiserver health check: locally with curl (below), or remotely with kubectl get --raw='/readyz?verbose' and the correct auth; the check includes an etcd health check.

root@controller-0:~# curl -k https://localhost:6443/livez?verbose
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed

Each one of these health checks can be called individually, e.g. kubectl get --raw='/livez/etcd' or kubectl get --raw='/livez/poststarthook/rbac/bootstrap-roles'.

kube-scheduler - There can only be one active kube-scheduler at a time; the schedulers on the other masters are dormant. This is achieved by leader election: the active scheduler periodically updates a lease via the API server (backed by etcd). The following command shows which master is currently the leader:

kubectl get endpoints kube-scheduler -n kube-system -o yaml | grep control-plane

kube-controller-manager - Likewise, it can only be active on one of the master nodes at a time:

kubectl get endpoints kube-controller-manager -n kube-system -o yaml | grep control-plane
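
On newer versions the leader election records live in Lease objects rather than Endpoints annotations, so the same information can also be read with (version dependent):

kubectl -n kube-system get lease kube-scheduler -o yaml            # holderIdentity shows the active master
kubectl -n kube-system get lease kube-controller-manager -o yaml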