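A Pod that reports as running does not guarantee that the container inside it is healthy, and a running container does not guarantee that the service inside it is healthy. Detecting these states correctly is therefore key to application health. livenessProbe determines whether the container itself is still alive; when it fails, the container is restarted. readinessProbe determines whether the service in the container is ready to handle requests. Both probes matter: a container should only receive traffic from a Service after a probe has shown it to be ready, otherwise user requests may fail. Setting a readinessProbe also lets a rolling update judge the state of the service in each new container, so the application keeps serving healthy responses.

livenessProbe and readinessProbe support three probing methods: exec (run a command inside the container), tcpSocket, and httpGet. The postStart and preStop hooks are configured with the same handler types, although the Kubernetes API documentation notes that tcpSocket is not supported for hooks.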
livenessProbe

    kubectl explain pods.spec.containers.livenessProbe
livenessProbe supports three kinds of liveness checks: tcpSocket, exec, and httpGet. Two of them are demonstrated below.
exec liveness probe

Create a YAML file with the following content:

    apiVersion: v1
    kind: Pod
    metadata:
      name: liveness-exec-pod
      namespace: default
    spec:
      containers:
      - name: liveness-exec-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        # Create the marker file, remove it after 30s, then keep the container running
        command: ["/bin/sh", "-c", "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 3600"]
        livenessProbe:
          exec:
            # Exits 0 (healthy) only while the marker file exists
            command: ["test", "-e", "/tmp/healthy"]
          initialDelaySeconds: 1
          periodSeconds: 3
          failureThreshold: 3
          successThreshold: 1
          timeoutSeconds: 1

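Once this Pod is created, the container creates /tmp/healthy, sleeps for 30s, and then deletes the file. The health check starts 1s after the container starts and tests whether /tmp/healthy exists. It runs every 3s (periodSeconds: 3), and after 3 consecutive failures the container is considered unhealthy and is restarted. The pod list below shows the restart count.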

    [root@k8s001 rexyan]
    NAME                READY   STATUS    RESTARTS   AGE
    liveness-exec-pod   1/1     Running   5          6m17s

View the details:

    [root@k8s001 rexyan]
    Name:               liveness-exec-pod
    Namespace:          default
    Priority:           0
    PriorityClassName:  <none>
    Node:               k8s002/172.20.245.189
    Start Time:         Sun, 19 May 2019 16:05:01 +0800
    Labels:             <none>
    Annotations:        <none>
    Status:             Running
    IP:                 10.244.2.2
    Containers:
      liveness-exec-container:
        Container ID:  docker://b6d08991993bb306f32b58f7bcc71651ac2b68d1021a05634bcae6832bbbe169
        Image:         busybox:latest
        Image ID:      docker-pullable://docker.io/busybox@sha256:4b6ad3a68d34da29bf7c8ccb5d355ba8b4babcad1f99798204e7abb43e54ee3d
        Port:          <none>
        Host Port:     <none>
        Command:
          /bin/sh
          -c
          touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 3600
        State:          Waiting
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Error
          Exit Code:    137
          Started:      Sun, 19 May 2019 16:10:48 +0800
          Finished:     Sun, 19 May 2019 16:11:57 +0800
        Ready:          False
        Restart Count:  5
        Liveness:       exec [test -e /tmp/healthy] delay=1s timeout=1s period=3s
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      default-token-vckdx:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-vckdx
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age                    From               Message
      ----     ------     ----                   ----               -------
      Normal   Scheduled  7m16s                  default-scheduler  Successfully assigned default/liveness-exec-pod to k8s002
      Normal   Pulling    7m16s                  kubelet, k8s002    Pulling image "busybox:latest"
      Normal   Pulled     7m14s                  kubelet, k8s002    Successfully pulled image "busybox:latest"
      Normal   Killing    4m17s (x3 over 6m35s)  kubelet, k8s002    Container liveness-exec-container failed liveness probe, will be restarted
      Normal   Created    3m47s (x4 over 7m14s)  kubelet, k8s002    Created container liveness-exec-container
      Normal   Started    3m47s (x4 over 7m13s)  kubelet, k8s002    Started container liveness-exec-container
      Normal   Pulled     3m47s (x3 over 6m5s)   kubelet, k8s002    Container image "busybox:latest" already present on machine
      Warning  Unhealthy  2m5s (x13 over 6m41s)  kubelet, k8s002    Liveness probe failed:

The Containers section shows the health check that was just configured:

    Restart Count:  5
    Liveness:       exec [test -e /tmp/healthy] delay=1s timeout=1s period=3s #success=1 #failure=3

http get liveness probe

    apiVersion: v1
    kind: Pod
    metadata:
      name: liveness-http-pod
      namespace: default
    spec:
      containers:
      - name: liveness-http-get-container
        image: ikubernetes/myapp:v1
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80
        livenessProbe:
          httpGet:
            # Refers to the named container port declared above
            port: http
            path: /index.html
          initialDelaySeconds: 1
          periodSeconds: 3
          failureThreshold: 3
          successThreshold: 1
          timeoutSeconds: 1

Check the container status:

    [root@k8s001 rexyan]
    NAME                READY   STATUS             RESTARTS   AGE
    liveness-exec-pod   0/1     CrashLoopBackOff   9          23m
    liveness-http-pod   1/1     Running            0          104s

View the details:

    [root@k8s001 rexyan]
    Name:               liveness-http-pod
    Namespace:          default
    Priority:           0
    PriorityClassName:  <none>
    Node:               k8s003/172.20.245.191
    Start Time:         Sun, 19 May 2019 16:27:15 +0800
    Labels:             <none>
    Annotations:        <none>
    Status:             Running
    IP:                 10.244.1.3
    Containers:
      liveness-http-get-container:
        Container ID:   docker://9cb65d175dc8263f54891b597e3a5f4a334f20c4ab636d532887cabfeb7cff3c
        Image:          ikubernetes/myapp:v1
        Image ID:       docker-pullable://docker.io/ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
        Port:           80/TCP
        Host Port:      0/TCP
        State:          Running
          Started:      Sun, 19 May 2019 16:27:18 +0800
        Ready:          True
        Restart Count:  0
        Liveness:       http-get http://:http/index.html delay=1s timeout=1s period=3s
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      default-token-vckdx:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-vckdx
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type    Reason     Age    From               Message
      ----    ------     ----   ----               -------
      Normal  Scheduled  2m58s  default-scheduler  Successfully assigned default/liveness-http-pod to k8s003
      Normal  Pulling    2m58s  kubelet, k8s003    Pulling image "ikubernetes/myapp:v1"
      Normal  Pulled     2m55s  kubelet, k8s003    Successfully pulled image "ikubernetes/myapp:v1"
      Normal  Created    2m55s  kubelet, k8s003    Created container liveness-http-get-container
      Normal  Started    2m55s  kubelet, k8s003    Started container liveness-http-get-container
    [root@k8s001 rexyan]

The Containers section shows the health check that was just configured:

    Restart Count:  0
    Liveness:       http-get http://:http/index.html delay=1s timeout=1s period=3s

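Now enter the container manually (for example with kubectl exec) and delete index.html, the page that the health check requests. Immediately afterwards the restart count is still 0: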

    [root@k8s001 rexyan]
    NAME                READY   STATUS             RESTARTS   AGE
    liveness-exec-pod   0/1     CrashLoopBackOff   11         28m
    liveness-http-pod   1/1     Running            0          6m4s

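Checking the pod status again shows that the pod has restarted once. The restart creates a fresh container from the image, so the deleted file is back and no further restarts occur.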

    [root@k8s001 rexyan]
    NAME                READY   STATUS             RESTARTS   AGE
    liveness-exec-pod   0/1     CrashLoopBackOff   11         30m
    liveness-http-pod   1/1     Running            1          8m12s

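The third method, tcpSocket, is not demonstrated in this post. As a minimal sketch, the manifest below reuses the image and probe timings from the http get example and only swaps the probe handler. The pod and container names are hypothetical and do not come from the original walkthrough.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp-pod            # hypothetical name
  namespace: default
spec:
  containers:
  - name: liveness-tcp-container    # hypothetical name
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      tcpSocket:
        # The probe passes if a TCP connection to the named port can be opened
        port: http
      initialDelaySeconds: 1
      periodSeconds: 3
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 1
```

A TCP check only shows that something is listening on the port. It would not notice the deleted index.html from the http get example, so httpGet is usually the stricter choice for a web server.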
readinessProbe

    kubectl explain pods.spec.containers.readinessProbe
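readinessProbe also supports three kinds of checks, here for readiness rather than liveness: tcpSocket, exec, and httpGet. One of them is demonstrated below.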
http get readiness probe

    apiVersion: v1
    kind: Pod
    metadata:
      name: readiness-http-pod
      namespace: default
    spec:
      containers:
      - name: readiness-http-get-container
        image: ikubernetes/myapp:v1
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80
        readinessProbe:
          httpGet:
            # Refers to the named container port declared above
            port: http
            path: /index.html
          initialDelaySeconds: 1
          periodSeconds: 3
          failureThreshold: 3
          successThreshold: 1
          timeoutSeconds: 1


    [root@k8s001 rexyan]
    pod/readiness-http-pod created


    [root@k8s001 rexyan]
    NAME                 READY   STATUS    RESTARTS   AGE
    liveness-http-pod    1/1     Running   1          26m
    readiness-http-pod   1/1     Running   0          5s

Then enter the container and delete index.html.
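Looking at the pod list, the READY count of readiness-http-pod has dropped to 0. In the READY column, the number before the / is the number of ready containers in the pod, and the number after it is the total number of containers in the pod. Unlike a failed liveness probe, a failed readiness probe does not restart the container: RESTARTS stays at 0.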

    [root@k8s001 rexyan]
    NAME                 READY   STATUS    RESTARTS   AGE
    liveness-http-pod    1/1     Running   1          30m
    readiness-http-pod   0/1     Running   0          3m43s

Enter the container again and write new content to nginx's index file.
Checking the pod list again shows that the READY count has gone from 0 back to 1:

    [root@k8s001 rexyan]
    NAME                 READY   STATUS    RESTARTS   AGE
    liveness-http-pod    1/1     Running   1          38m
    readiness-http-pod   1/1     Running   0          11m

View the detailed pod information:

    [root@k8s001 rexyan]
    Name:               readiness-http-pod
    Namespace:          default
    Priority:           0
    PriorityClassName:  <none>
    Node:               k8s002/172.20.245.189
    Start Time:         Sun, 19 May 2019 16:54:04 +0800
    Labels:             <none>
    Annotations:        <none>
    Status:             Running
    IP:                 10.244.2.3
    Containers:
      readiness-http-get-container:
        Container ID:   docker://2989185e07600a552f6a57ecc3e813156002e2218701da07da8b2efbfaf7c966
        Image:          ikubernetes/myapp:v1
        Image ID:       docker-pullable://docker.io/ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
        Port:           80/TCP
        Host Port:      0/TCP
        State:          Running
          Started:      Sun, 19 May 2019 16:54:07 +0800
        Ready:          True
        Restart Count:  0
        Readiness:      http-get http://:http/index.html delay=1s timeout=1s period=3s
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      default-token-vckdx:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-vckdx
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age                   From               Message
      ----     ------     ----                  ----               -------
      Normal   Scheduled  14m                   default-scheduler  Successfully assigned default/readiness-http-pod to k8s002
      Normal   Pulling    14m                   kubelet, k8s002    Pulling image "ikubernetes/myapp:v1"
      Normal   Pulled     14m                   kubelet, k8s002    Successfully pulled image "ikubernetes/myapp:v1"
      Normal   Created    14m                   kubelet, k8s002    Created container readiness-http-get-container
      Normal   Started    14m                   kubelet, k8s002    Started container readiness-http-get-container
      Warning  Unhealthy  4m4s (x134 over 10m)  kubelet, k8s002    Readiness probe failed: HTTP probe failed with statuscode: 404
    [root@k8s001 rexyan]

The Containers section shows the health check that was just configured:

    Restart Count:  0
    Readiness:      http-get http://:http/index.html delay=1s timeout=1s period=3s

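The introduction ties readiness to Service traffic, but the walkthrough never creates a Service. As a rough sketch of that connection, the manifest below selects pods by a label. The Service name and the app: readiness-demo label are assumptions for illustration: the pod in this post has no labels, so it would need that label added under metadata.labels before the Service could select it.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: readiness-demo-svc    # hypothetical name
  namespace: default
spec:
  selector:
    app: readiness-demo       # hypothetical label; the pod must carry the same label
  ports:
  - name: http
    port: 80
    # Refers to the container port named "http" in the pod spec
    targetPort: http
```

With such a Service in place, a pod whose readiness probe is failing (READY 0/1) is left out of the Service's endpoints and receives no traffic until the probe succeeds again.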
Container start and stop hooks

Containers have hooks that run right after they start and right before they stop: postStart and preStop.
postStart

    kubectl explain pods.spec.containers.lifecycle.postStart
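postStart defines three handler types: tcpSocket, exec, and httpGet. The API documentation marks tcpSocket as not supported for hooks, so in practice exec and httpGet are the usable ones.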
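As a minimal sketch of an exec postStart hook, the manifest below writes a marker file right after the container is created. The pod name, container name, and the /tmp/poststart path are hypothetical, chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: poststart-exec-pod            # hypothetical name
  namespace: default
spec:
  containers:
  - name: poststart-exec-container    # hypothetical name
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "sleep 3600"]
    lifecycle:
      postStart:
        exec:
          # Runs once after the container is created; /tmp/poststart is an example path
          command: ["/bin/sh", "-c", "echo started > /tmp/poststart"]
```

The hook runs asynchronously with the container's main command, so there is no guarantee that it finishes, or even starts, before that command begins.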
preStop

    kubectl explain pods.spec.containers.lifecycle.preStop
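preStop defines the same three handler types: tcpSocket, exec, and httpGet, with the same caveat that tcpSocket is marked as not supported for hooks.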
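A preStop hook is typically used to let a server shut down gracefully before the container is killed. The sketch below assumes the ikubernetes/myapp:v1 image used earlier ships the nginx binary on its PATH, as the walkthrough's mention of nginx suggests. The pod and container names are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prestop-exec-pod            # hypothetical name
  namespace: default
spec:
  containers:
  - name: prestop-exec-container    # hypothetical name
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    lifecycle:
      preStop:
        exec:
          # Ask nginx to finish in-flight requests and exit (assumes nginx is on PATH)
          command: ["/bin/sh", "-c", "nginx -s quit"]
```

The hook runs before the container receives its termination signal and counts against the pod's termination grace period, so a slow hook can still be cut short.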