k8s 05: scheduler

In this entry, I will show you how to deploy the scheduler. Up to this point, I could only deploy a pod by specifying which node it should run on (e.g. aio-node). With the scheduler, I no longer need to specify a node for each pod, and pods get placed more intelligently, because the scheduler "schedules" each pod onto the best available node based on various criteria.

Previously, I showed an image like the one below, which depicts the case where no scheduler is configured.

In this entry, it looks like the one below:

Note that I deleted the all-in-one node (aio-node) I had been working on and changed the deployment a bit: there is now a dedicated master node (master-01) and two worker nodes (worker-01, worker-02), for easier visualisation. The deployment itself is not much different from the all-in-one one. The only difference is that we need to tell the kubelet on the worker nodes that the API server address is not localhost but a reachable external IP address (in my case, the internal IP address of the GCE instance).
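Concretely, the only worker-side change is the server address in the kubelet kubeconfig, which points at the master's internal IP instead of localhost (the full file is shown later in this entry):

clusters:
- cluster:
    server: http://10.240.0.11:8080    # master's internal IP, instead of http://127.0.0.1:8080
  name: local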

Deploy a master node

1. Create compute instance (GCP)

I use Ubuntu 16.04 and add a "kube-master" tag so that it can be used in firewall rules later.

$ gcloud compute instances create master-01 --zone=europe-west3-b --machine-type=n1-standard-1 --subnet=subnet-00 --private-network-ip=10.240.0.11 --tags=kube-master --image=ubuntu-1604-xenial-v20180912 --image-project=ubuntu-os-cloud --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=master-01-sda
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/python100pj/zones/europe-west3-b/instances/master-01].
NAME       ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
master-01  europe-west3-b  n1-standard-1               10.240.0.11   35.198.68.x  RUNNING

2. Install etcd and API server 

You can refer to my previous post "k8s 03: etcd-and-api-server".

[ etcd install ]

shogo@master-01:~/work$ curl -L https://github.com/etcd-io/etcd/releases/download/v3.3.9/etcd-v3.3.9-linux-amd64.tar.gz -o etcd-v3.3.9-linux-amd64.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   620    0   620    0     0    713      0 --:--:-- --:--:-- --:--:--   712
100 10.7M  100 10.7M    0     0  4494k      0  0:00:02  0:00:02 --:--:-- 15.0M
shogo@master-01:~/work/etcd-v3.3.9-linux-amd64$ sudo mv etcd* /usr/local/bin/
shogo@master-01:~/work/etcd-v3.3.9-linux-amd64$ sudo su -
root@master-01:~# mkdir -p /work/etcd-data/
root@master-01:~# cat << EOF >> /etc/systemd/system/etcd.service
> [Unit]
> Description=etcd
> Documentation=https://github.com/etcd-io/etcd
> 
> [Service]
> ExecStart=/usr/local/bin/etcd --data-dir=/work/etcd-data
> Restart=on-failure
> RestartSec=5
> 
> [Install]
> WantedBy=multi-user.target
> EOF
root@master-01:~# systemctl daemon-reload
root@master-01:~# systemctl start etcd
root@master-01:/home/k_shogo/work# systemctl enable etcd
Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /etc/systemd/system/etcd.service.
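Before moving on, it is worth checking that etcd is actually healthy. A minimal check, assuming the default client port 2379 and the etcdctl binary that was copied to /usr/local/bin above:

root@master-01:~# systemctl status etcd --no-pager
root@master-01:~# ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint health

The service should be active and the endpoint should report as healthy.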

[ API server install ]

You can download the latest server binaries from the Kubernetes releases page on GitHub (see the curl command below).

shogo@master-01:~/work$ curl -L https://github.com/kubernetes/kubernetes/releases/download/v1.10.8/kubernetes.tar.gz -o kubernetes.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   607    0   607    0     0    922      0 --:--:-- --:--:-- --:--:--   923
100 2657k  100 2657k    0     0   706k      0  0:00:03  0:00:03 --:--:--  889k
k_shogo@master-01:~/work$ tar xzf kubernetes.tar.gz 
k_shogo@master-01:~/work$ ll ./kubernetes/server/
total 96
drwxr-xr-x 2 k_shogo k_shogo  4096 Sep 14 17:07 ./
drwxr-xr-x 9 k_shogo k_shogo  4096 Sep 14 17:07 ../
-rw-r--r-- 1 k_shogo k_shogo 82827 Sep 14 17:07 kubernetes-manifests.tar.gz
-rw-r--r-- 1 k_shogo k_shogo   153 Sep 14 17:07 README
k_shogo@master-01:~/work$ ./kubernetes/cluster/get-kube-binaries.sh 
Kubernetes release: v1.10.8
Server: linux/amd64  (to override, set KUBERNETES_SERVER_ARCH)
Client: linux/amd64  (autodetected)

Will download kubernetes-server-linux-amd64.tar.gz from https://dl.k8s.io/v1.10.8
Will download and extract kubernetes-client-linux-amd64.tar.gz from https://dl.k8s.io/v1.10.8
Is this ok? [Y]/n

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   161  100   161    0     0    234      0 --:--:-- --:--:-- --:--:--   234
100  413M  100  413M    0     0  59.3M      0  0:00:06  0:00:06 --:--:-- 76.3M

md5sum(kubernetes-server-linux-amd64.tar.gz)=8f6520bcc84fb77863ddd6da2fad89b0
sha1sum(kubernetes-server-linux-amd64.tar.gz)=3da578370d54344e2a54e992c9bf92dac48328eb

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   161  100   161    0     0    242      0 --:--:-- --:--:-- --:--:--   242
100 12.8M  100 12.8M    0     0  9008k      0  0:00:01  0:00:01 --:--:-- 37.3M

md5sum(kubernetes-client-linux-amd64.tar.gz)=48f39399da29d4b8c66f79d00b5b625a
sha1sum(kubernetes-client-linux-amd64.tar.gz)=f783ebc8b2c5698612870302fd22c8d4ef241d6b

Extracting /home/k_shogo/work/kubernetes/client/kubernetes-client-linux-amd64.tar.gz into /home/k_shogo/work/kubernetes/platforms/linux/amd64
Add '/home/k_shogo/work/kubernetes/client/bin' to your PATH to use newly-installed binaries.
shogo@master-01:~/work$ ll ./kubernetes/server/
total 423920
drwxr-xr-x  2 k_shogo k_shogo      4096 Sep 25 09:20 ./
drwxr-xr-x 10 k_shogo k_shogo      4096 Sep 25 09:20 ../
-rw-r--r--  1 k_shogo k_shogo     82827 Sep 14 17:07 kubernetes-manifests.tar.gz
-rw-rw-r--  1 k_shogo k_shogo 433994920 Sep 25 09:20 kubernetes-server-linux-amd64.tar.gz
-rw-r--r--  1 k_shogo k_shogo       153 Sep 14 17:07 README
k_shogo@master-01:~/work$ tar xzf ./kubernetes/server/kubernetes-server-linux-amd64.tar.gz 
k_shogo@master-01:~/work$ ll ./kubernetes/server/
total 423924
drwxr-xr-x  3 k_shogo k_shogo      4096 Sep 14 16:58 ./
drwxr-xr-x 11 k_shogo k_shogo      4096 Sep 14 16:59 ../
drwxr-xr-x  2 k_shogo k_shogo      4096 Sep 14 16:59 bin/
-rw-r--r--  1 k_shogo k_shogo     82827 Sep 14 17:07 kubernetes-manifests.tar.gz
-rw-rw-r--  1 k_shogo k_shogo 433994920 Sep 25 09:20 kubernetes-server-linux-amd64.tar.gz
-rw-r--r--  1 k_shogo k_shogo       153 Sep 14 17:07 README
shogo@master-01:~/work$ ll ./kubernetes/server/bin/
total 2050124
drwxr-xr-x 2 k_shogo k_shogo      4096 Sep 14 16:59 ./
drwxr-xr-x 3 k_shogo k_shogo      4096 Sep 14 16:58 ../
-rwxr-xr-x 1 k_shogo k_shogo  59063289 Sep 14 16:58 apiextensions-apiserver*
-rwxr-xr-x 1 k_shogo k_shogo 134938788 Sep 14 16:58 cloud-controller-manager*
-rw-r--r-- 1 k_shogo k_shogo         8 Sep 14 16:58 cloud-controller-manager.docker_tag
-rw-r--r-- 1 k_shogo k_shogo 136332800 Sep 14 16:58 cloud-controller-manager.tar
-rwxr-xr-x 1 k_shogo k_shogo 269503888 Sep 14 16:58 hyperkube*
-rwxr-xr-x 1 k_shogo k_shogo 159545416 Sep 14 16:58 kubeadm*
-rwxr-xr-x 1 k_shogo k_shogo  57823009 Sep 14 16:58 kube-aggregator*
-rw-r--r-- 1 k_shogo k_shogo         8 Sep 14 16:58 kube-aggregator.docker_tag
-rw-r--r-- 1 k_shogo k_shogo  59216896 Sep 14 16:58 kube-aggregator.tar
-rwxr-xr-x 1 k_shogo k_shogo 226908214 Sep 14 16:58 kube-apiserver*
-rw-r--r-- 1 k_shogo k_shogo         8 Sep 14 16:58 kube-apiserver.docker_tag
-rw-r--r-- 1 k_shogo k_shogo 228302336 Sep 14 16:58 kube-apiserver.tar
-rwxr-xr-x 1 k_shogo k_shogo 149688305 Sep 14 16:58 kube-controller-manager*
-rw-r--r-- 1 k_shogo k_shogo         8 Sep 14 16:58 kube-controller-manager.docker_tag
-rw-r--r-- 1 k_shogo k_shogo 151081984 Sep 14 16:58 kube-controller-manager.tar
-rwxr-xr-x 1 k_shogo k_shogo  55088352 Sep 14 16:59 kubectl*
-rwxr-xr-x 1 k_shogo k_shogo 155785360 Sep 14 16:58 kubelet*
-rwxr-xr-x 1 k_shogo k_shogo  52146229 Sep 14 16:58 kube-proxy*
-rw-r--r-- 1 k_shogo k_shogo         8 Sep 14 16:58 kube-proxy.docker_tag
-rw-r--r-- 1 k_shogo k_shogo 100128256 Sep 14 16:58 kube-proxy.tar
-rwxr-xr-x 1 k_shogo k_shogo  50057879 Sep 14 16:58 kube-scheduler*
-rw-r--r-- 1 k_shogo k_shogo         8 Sep 14 16:58 kube-scheduler.docker_tag
-rw-r--r-- 1 k_shogo k_shogo  51451904 Sep 14 16:58 kube-scheduler.tar
-rwxr-xr-x 1 k_shogo k_shogo   2165591 Sep 14 16:58 mounter*
shogo@master-01:~/work$ sudo su
root@master-01:/home/k_shogo/work# mv ./kubernetes/server/bin/kube-apiserver /usr/local/bin/
root@master-01:/home/k_shogo/work# cat << EOF >> /etc/systemd/system/kube-apiserver.service
> [Unit]
> Description=Kubernetes API Server
> Documentation=https://github.com/kubernetes/kubernetes
> 
> [Service]
> ExecStart=/usr/local/bin/kube-apiserver \
>   --etcd-servers=http://127.0.0.1:2379 \
>   --service-cluster-ip-range=10.32.0.0/24 \
>   --insecure-bind-address=0.0.0.0
> Restart=on-failure
> RestartSec=5
> 
> [Install]
> WantedBy=multi-user.target
> EOF
root@master-01:/home/k_shogo/work# systemctl daemon-reload
root@master-01:/home/k_shogo/work# systemctl start kube-apiserver
root@master-01:/home/k_shogo/work# systemctl enable kube-apiserver
Created symlink from /etc/systemd/system/multi-user.target.wants/kube-apiserver.service to /etc/systemd/system/kube-apiserver.service.
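As a quick sanity check, the API server should now answer locally on the insecure port 8080 configured above:

root@master-01:/home/k_shogo/work# curl http://127.0.0.1:8080/healthz
root@master-01:/home/k_shogo/work# curl http://127.0.0.1:8080/version

The first call should return "ok" and the second the version of the binary we just installed.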

3. Install kubectl

root@controller-1:/home/k_shogo/work# sudo mv  kubernetes/server/bin/kubectl /usr/local/bin
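Since there is no kubeconfig on the master yet, kubectl falls back to http://localhost:8080, which is exactly where our insecure API server port listens, so it can talk to the cluster right away:

$ kubectl get componentstatuses
$ kubectl get nodes

At this point etcd should show up as healthy, the scheduler and controller-manager entries stay unhealthy until they are deployed, and the node list is still empty because no worker has registered yet.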


Deploy two worker nodes

1. Create a first compute instance (GCP)

I deploy one node first, then I will clone the disk to create the second node.

$ gcloud compute instances create worker-1 --zone=europe-west3-b --machine-type=n1-standard-1 --subnet=subnet-00 --private-network-ip=10.240.0.21 --image=ubuntu-1604-xenial-v20180912 --image-project=ubuntu-os-cloud --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=worker-01
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
\Created [https://www.googleapis.com/compute/v1/projects/python100pj/zones/europe-west3-b/instances/worker-1].
NAME       ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
worker-1  europe-west3-b  n1-standard-1               10.240.0.21   35.242.194.x  RUNNING

2. Install container runtime and kubelet

You can refer to my previous post "k8s 02: how kubelet works".
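The apt-get install docker-ce step below assumes Docker's apt repository is already configured on the node. If it is not, something along these lines sets it up first (a sketch using Docker's standard Ubuntu repository, in the same style as the kubernetes.list setup further down):

root@worker-01:~# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
root@worker-01:~# cat <<EOF >/etc/apt/sources.list.d/docker.list
> deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable
> EOF
root@worker-01:~# apt-get update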

shogo@worker-01:~$ sudo apt-get install docker-ce
Reading package lists... Done
...
shogo@worker-01:~$ sudo su -
root@worker-01:~# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
OK
root@worker-01:~# cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
> deb http://apt.kubernetes.io/ kubernetes-xenial main
> EOF
root@worker-01:~# apt update
root@worker-01:~# apt install -y kubelet
root@worker-1:~# cat << EOF >> /var/lib/kubelet/worker.kubeconfig
> apiVersion: v1
> kind: Config
> clusters:
> - cluster:
>     server: http://10.240.0.11:8080
>   name: local
> contexts:
> - context:
>     cluster: local
>     user: ""
>   name: local
> current-context: local
> preferences: {}
> users: []
> EOF
root@worker-1:~# cat /lib/systemd/system/kubelet.service 
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
[Service]
ExecStart=/usr/bin/kubelet --kubeconfig=/var/lib/kubelet/worker.kubeconfig
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
root@worker-1:~# systemctl daemon-reload
root@worker-1:~# systemctl restart kubelet
root@worker-1:~# systemctl status kubelet
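If the status looks off, the kubelet logs are the quickest way to see whether the worker can actually reach the API server at 10.240.0.11:8080 (it will keep retrying until the firewall rule from the next section is in place):

root@worker-1:~# journalctl -u kubelet --no-pager | tail -n 20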

3. Create a second compute instance (GCP)

Now I clone the first node's disk (via a snapshot) and create a second instance with the hostname "worker-02".
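The command below builds the new boot disk from a snapshot named "ss-kube-worker"; such a snapshot can be taken from worker-1's boot disk beforehand, roughly like this:

$ gcloud compute disks snapshot worker-1 --zone=europe-west3-b --snapshot-names=ss-kube-worker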

$gcloud compute disks create "worker-02" --size "10" --zone "europe-west3-b" --source-snapshot "ss-kube-worker" --type "pd-standard" && gcloud beta compute instances create worker-02 --zone=europe-west3-b --machine-type=n1-standard-1 --subnet=subnet-00 --private-network-ip=10.240.0.22 --tags=kube-worker --disk=name=worker-02,device-name=worker-02
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/python100pj/zones/europe-west3-b/disks/worker-02].
NAME       ZONE            SIZE_GB  TYPE         STATUS
worker-02  europe-west3-b  10       pd-standard  READY
Created [https://www.googleapis.com/compute/beta/projects/python100pj/zones/europe-west3-b/instances/worker-02].
NAME       ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
worker-02  europe-west3-b  n1-standard-1  true         10.240.0.22  35.242.194.x  RUNNING


Configure GCP Firewall rules

1. Add a Firewall rule for worker -> master

For the node to register itself correctly (and eventually to get pods running), the worker needs to reach the API server on TCP port 8080 (the insecure port in our case; the secure port is TCP 6443 by default).

GCP has a permissive allow-all rule for internal communication if the compute instances are in the default VPC, but I'm using a custom VPC, so all internal traffic is dropped unless a rule is created (implicit deny).

gcloud compute firewall-rules create internal-worker2master --direction=INGRESS --network=default --action=ALLOW --rules=tcp:8080 --source-tags=kube-worker --target-tags=kube-master
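One thing to double-check: the rule above matches on the kube-worker source tag, but worker-1 was created earlier without any tags, so the tag may need to be added after the fact, e.g.:

$ gcloud compute instances add-tags worker-1 --zone=europe-west3-b --tags=kube-worker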

Confirm the setup

1. Check node status on the master

Now, I can see the node is correctly registered on the master.

shogo@controller-1:~$ kubectl get nodes
NAME        STATUS    ROLES     AGE       VERSION
worker-02   Ready     <none>    27s       v1.11.3
worker-1    Ready     <none>    39m       v1.11.3

2. Check if a pod is deployed with nodename

And I can see a pod can be deployed in the worker node if there is nodename specified in the manifest.

shogo@controller-1:~/work$ cat << EOF >> nginxtest-with-nodename.yaml
>apiVersion: v1
>kind: Pod
>metadata:
>  name: nginx-test-with-nodename
>spec:
>  containers:
>  - name: nginx
>    image: nginx
>    ports:
>    - containerPort: 80
>  nodeName: worker-1
>EOF
shogo@controller-1:~/work$ kubectl apply -f nginxtest-with-nodename.yaml 
pod "nginx-test-with-nodename" created
shogo@controller-1:~/work$ kubectl get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
nginx-test-with-nodename   1/1       Running   0          2m        172.17.0.2   worker-1

So this is where I left the last entry, now I start deploying the scheduler.

Deploy scheduler

1. Install kube-scheduler

shogo@controller-1:~$ cd work/
shogo@controller-1:~/work$ sudo mv ./kubernetes/server/bin/kube-scheduler /usr/local/bin/
shogo@controller-1:~/work$ sudo cp /etc/systemd/system/kube-apiserver.service /etc/systemd/system/kube-scheduler.service
# edit kube-scheduler.service as follows:
shogo@controller-1:~/work$ cat /etc/systemd/system/kube-scheduler.service 
[Unit]
Description=Kubernetes Scheduler Server
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-scheduler  --master=http://127.0.0.1:8080
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
shogo@controller-1:~/work$ sudo systemctl daemon-reload
k_shogo@controller-1:~/work$ sudo systemctl restart kube-scheduler
k_shogo@controller-1:~/work$ sudo systemctl status kube-scheduler
● kube-scheduler.service - Kubernetes Scheduler Server
   Loaded: loaded (/etc/systemd/system/kube-scheduler.service; disabled; vendor preset: enabl
   Active: active (running) since Tue 2018-09-25 17:18:22 UTC; 13s ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 3114 (kube-scheduler)
    Tasks: 5
   Memory: 11.8M
      CPU: 149ms
   CGroup: /system.slice/kube-scheduler.service
           └─3114 /usr/local/bin/kube-scheduler --master=http://127.0.0.1:8080

Sep 25 17:18:22 controller-1 systemd[1]: Stopped Kubernetes Scheduler Server.
Sep 25 17:18:22 controller-1 systemd[1]: Started Kubernetes Scheduler Server.
Sep 25 17:18:22 controller-1 kube-scheduler[3114]: W0925 17:18:22.248051    3114 server.go:16
Sep 25 17:18:22 controller-1 kube-scheduler[3114]: I0925 17:18:22.302638    3114 server.go:55
Sep 25 17:18:22 controller-1 kube-scheduler[3114]: I0925 17:18:22.311257    3114 server.go:57
Sep 25 17:18:23 controller-1 kube-scheduler[3114]: I0925 17:18:23.216142    3114 controller_u
Sep 25 17:18:23 controller-1 kube-scheduler[3114]: I0925 17:18:23.316334    3114 controller_u
Sep 25 17:18:23 controller-1 kube-scheduler[3114]: I0925 17:18:23.316404    3114 leaderelecti
Sep 25 17:18:23 controller-1 kube-scheduler[3114]: I0925 17:18:23.323714    3114 leaderelecti
shogo@controller-1:~/work$ sudo systemctl enable kube-scheduler
Created symlink from /etc/systemd/system/multi-user.target.wants/kube-scheduler.service to /etc/systemd/system/kube-scheduler.service.
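Before throwing a pod at it, a quick way to confirm that the API server can actually see the scheduler (assuming kubectl still points at localhost:8080 as before):

shogo@controller-1:~/work$ kubectl get componentstatuses

The scheduler entry should now report Healthy; the controller-manager entry will stay unhealthy until the next entry.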

2. Check

The scheduler is working now, so it should do all the hard work of scheduling/assigning the requested pod onto an available worker node.

# create manifest without nodename
shogo@controller-1:~/work$ cat nginxtest.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
# and apply it
shogo@controller-1:~/work$ kubectl apply -f nginxtest.yaml 
pod "nginxtest" created
# now I can see the scheduler assigned worker-1 to this pod, and it's working
shogo@controller-1:~/work$ kubectl get pods -o wide
NAME        READY     STATUS    RESTARTS   AGE       IP           NODE
nginxtest   1/1       Running   0          1m        172.17.0.2   worker-1

So far so good.

But the deployment still lacks a few vital parts. One of them is the controller-manager. It monitors node status and deployment status, and takes the necessary actions when things are not in the desired state. Basically, all the other components just do what they are told and don't care what the actual situation is. Without the controller-manager, worker nodes could stop working without the master ever noticing; it would simply keep waiting for a node, which from its point of view is still "Ready", to fetch the work assigned to it.

In the next entry, I will deploy the controller-manager.