Tasks

Step-by-step instructions for performing operations with Kubernetes.

Edit This Page

应用程序自检和调试

一旦你的应用程序运行起来了,你将不可避免地需要对它进行调试。 之前我们介绍过如何使用 kubectl get pod 来检索有关您的 pod 的简单状态信息。但还有很多方法可以获得有关应用程序的更多信息。

使用 kubectl describe pod 来获取有关 pod 的详细信息

在这个例子中,我们将使用 Deployment 来创建两个 pod,与前面的示例类似。

nginx-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 80

使用如下命令来创建 deployment:

$ kubectl create -f https://k8s.io/docs/tasks/debug-application-cluster/nginx-dep.yaml
deployment "nginx-deployment" created
$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          11s
nginx-deployment-1006230814-fmgu3   1/1       Running   0          11s

我们可以使用 kubectl describe pod 获取每个 pod 的更多信息。例如:

$ kubectl describe pod nginx-deployment-1006230814-6winp
Name:		nginx-deployment-1006230814-6winp
Namespace:	default
Node:		kubernetes-node-wul5/10.240.0.9
Start Time:	Thu, 24 Mar 2016 01:39:49 +0000
Labels:		app=nginx,pod-template-hash=1006230814
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind"                           :"ReplicaSet","namespace":"default","name":"nginx-deployment-1956810328","uid":"14e607e7-8ba1-11e7-b5cb-fa16"                             ...
Status:		Running
IP:		10.244.0.6
Controllers:	ReplicaSet/nginx-deployment-1006230814
Containers:
  nginx:
    Container ID:	docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    Image:		nginx
    Image ID:		docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    Port:		80/TCP
    QoS Tier:
      cpu:	Guaranteed
      memory:	Guaranteed
    Limits:
      cpu:	500m
      memory:	128Mi
    Requests:
      memory:		128Mi
      cpu:		500m
    State:		Running
      Started:		Thu, 24 Mar 2016 01:39:51 +0000
    Ready:		True
    Restart Count:	0
    Environment:        <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5kdvl (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-4bcbi:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-4bcbi
    Optional:   false
QoS Class:      Guaranteed
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen	LastSeen	Count	From					SubobjectPath		Type		Reason		Message
  ---------	--------	-----	----					-------------		--------	------		-------
  54s		54s		1	{default-scheduler }						Normal		Scheduled	Successfully assigned nginx-deployment-1006230814-6winp to kubernetes-node-wul5
  54s		54s		1	{kubelet kubernetes-node-wul5}	spec.containers{nginx}	Normal		Pulling		pulling image "nginx"
  53s		53s		1	{kubelet kubernetes-node-wul5}	spec.containers{nginx}	Normal		Pulled		Successfully pulled image "nginx"
  53s		53s		1	{kubelet kubernetes-node-wul5}	spec.containers{nginx}	Normal		Created		Created container with docker id 90315cc9f513
  53s		53s		1	{kubelet kubernetes-node-wul5}	spec.containers{nginx}	Normal		Started		Started container with docker id 90315cc9f513

在这里您可以看到有关容器和 Pod 的配置信息(标签,资源需求等),以及有关容器和 Pod 的状态信息(状态,准备情况,重新启动次数,事件等)。

容器状态是 Waiting,Running 或 Terminated 之一。根据状态,可以获得更多信息 – 在这里您可以看到,对于处于运行状态的容器,系统会告诉您何时启动的容器。

Ready 告诉您容器是否通过了最后一次准备就绪探测。(在这种情况下,容器没有配置就绪探针;如果未配置准备就绪探针,则假定容器已准备就绪。)

重启数量会告诉您容器重新启动的次数; 此信息可用于检测重启策略为 ‘always’ 的容器的循环崩溃。

目前,与 Pod 相关的唯一条件是二进制 Ready 状态,这表明该 Pod 可以处理请求,并且应该添加到所有匹配服务的负载均衡池中。

最后,您会看到与您的 Pod 有关的最近事件日志。系统压缩多个相同的事件,只显示第一次和最后一次出现的时间以及出现的次数。”From” 表示记录事件的组件,”SubobjectPath” 告诉您哪个对象(例如容器内的容器)被引用,”Reason” 和 “Message” 告诉您发生了什么。

示例:调试 Pending 状态的 Pod

通过事件排查的一种常见情况是创建了不适合任何节点的 Pod。例如,Pod 可能会请求比任何节点上的空闲资源更多的资源,或者可能会指定一个不匹配任何节点的标签选择器。 假设我们在上面的 Deployment 例子中创建 5 个 replicas(而不是 2 个),并请求 600 millicores 而不是 500 millicores,集群拥有 4 个节点,每个(虚拟)机器有 1 个 CPU。 在这种情况下,其中一个 Pod 将无法调度。(请注意,由于在每个节点上运行了集群附加 pod,例如 fluentd 和 skydns 等,如果我们请求 1000 millicores,则没有任何一个 pod 可以成功调度。)

$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          7m
nginx-deployment-1006230814-fmgu3   1/1       Running   0          7m
nginx-deployment-1370807587-6ekbw   1/1       Running   0          1m
nginx-deployment-1370807587-fg172   0/1       Pending   0          1m
nginx-deployment-1370807587-fz9sd   0/1       Pending   0          1m

要找出 nginx-deployment-1370807587-fz9sd pod 未运行的原因,我们可以在待处理的 Pod 上使用 kubectl describe pod 并查看其事件:

$ kubectl describe pod nginx-deployment-1370807587-fz9sd
  Name:		nginx-deployment-1370807587-fz9sd
  Namespace:	default
  Node:		/
  Labels:		app=nginx,pod-template-hash=1370807587
  Status:		Pending
  IP:
  Controllers:	ReplicaSet/nginx-deployment-1370807587
  Containers:
    nginx:
      Image:	nginx
      Port:	80/TCP
      QoS Tier:
        memory:	Guaranteed
        cpu:	Guaranteed
      Limits:
        cpu:	1
        memory:	128Mi
      Requests:
        cpu:	1
        memory:	128Mi
      Environment Variables:
  Volumes:
    default-token-4bcbi:
      Type:	Secret (a volume populated by a Secret)
      SecretName:	default-token-4bcbi
  Events:
    FirstSeen	LastSeen	Count	From			        SubobjectPath	Type		Reason			    Message
    ---------	--------	-----	----			        -------------	--------	------			    -------
    1m		    48s		    7	    {default-scheduler }			        Warning		FailedScheduling	pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
  fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
  fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000

在这里,您可以看到 scheduler 生成的事件,表明由于 FailedScheduling(可能还有其他原因),Pod 无法调度。该消息告诉我们没有任何节点能够满足 Pod 的需求。

要解决这种情况,可以使用 kubectl scale 来更新您的部署以指定 4 个或更少的 replicas。(或者您可以让一个Pod 保持 pending,这是无害的。)

在 etcd 中存储了类似于 kubectl describe pod 结尾处看到的事件,并提供有关集群中正在发生的事情的高级信息。您可以使用如下命令列出所有事件:

kubectl get events

但是您需要记住事件是具有命名空间的。这意味着如果您对某些命名空间对象的事件感兴趣(例如,命名空间 my-namespace 中的 Pod 发生了什么),则需要明确地为命令提供一个命名空间:

kubectl get events --namespace=my-namespace

要查看来自所有命名空间的事件,可以使用 --all-namespaces 参数。

kubectl describe pod 之外,另一种获得关于 pod 额外信息的方法(超出了 kubectl get pod 提供的内容)是将 -o yaml 输出格式标志传递给 kubectl get pod。 这会给你 YAML 格式的信息,甚至比 kubectl describe pod 更多的信息 – 基本上是系统拥有的 Pod 的所有信息。 在这里,您将看到类似注解(这是没有标签限制的键值元数据,给 Kubernetes 系统组件内部使用)、重新启动策略、端口和卷。

$ kubectl get pod nginx-deployment-1006230814-6winp -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1006230814","uid":"4c84c175-f161-11e5-9a78-42010af00005","apiVersion":"extensions","resourceVersion":"133434"}}
  creationTimestamp: 2016-03-24T01:39:50Z
  generateName: nginx-deployment-1006230814-
  labels:
    app: nginx
    pod-template-hash: "1006230814"
  name: nginx-deployment-1006230814-6winp
  namespace: default
  resourceVersion: "133447"
  selfLink: /api/v1/namespaces/default/pods/nginx-deployment-1006230814-6winp
  uid: 4c879808-f161-11e5-9a78-42010af00005
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 500m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4bcbi
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: kubernetes-node-wul5
  restartPolicy: Always
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-4bcbi
    secret:
      secretName: default-token-4bcbi
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2016-03-24T01:39:51Z
    status: "True"
    type: Ready
  containerStatuses:
  - containerID: docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    image: nginx
    imageID: docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2016-03-24T01:39:51Z
  hostIP: 10.240.0.9
  phase: Running
  podIP: 10.244.0.6
  startTime: 2016-03-24T01:39:49Z

示例:调试一个关闭(或者无法到达)的节点

有时,在调试时,查看节点的状态可能很有用 – 例如,您已经注意到节点上运行的 Pod 的奇怪行为,或想查明 Pod 不调度到节点上的原因。与 Pod 一样,可以使用 kubectl describe nodekubectl get node -o yaml 来检索有关节点的详细信息。例如,如果某个节点关闭(从网络断开连接,或 kubelet 死亡并不会重新启动等),您将看到以下内容。 注意显示节点为 NotReady 的事件,并且还注意到 Pod 不再运行(它们在 NotReady 状态五分钟后被驱逐)。

$ kubectl get nodes
NAME                     STATUS        AGE     VERSION
kubernetes-node-861h     NotReady      1h      v1.6.0+fff5156
kubernetes-node-bols     Ready         1h      v1.6.0+fff5156
kubernetes-node-st6x     Ready         1h      v1.6.0+fff5156
kubernetes-node-unaj     Ready         1h      v1.6.0+fff5156

$ kubectl describe node kubernetes-node-861h
Name:			kubernetes-node-861h
Role
Labels:		 beta.kubernetes.io/arch=amd64
           beta.kubernetes.io/os=linux
           kubernetes.io/hostname=kubernetes-node-861h
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:	Mon, 04 Sep 2017 17:13:23 +0800
Phase:
Conditions:
  Type		Status		LastHeartbeatTime			LastTransitionTime			Reason					Message
  ----    ------    -----------------     ------------------      ------          -------
  OutOfDisk             Unknown         Fri, 08 Sep 2017 16:04:28 +0800         Fri, 08 Sep 2017 16:20:58 +0800         NodeStatusUnknown       Kubelet stopped posting node status.
  MemoryPressure        Unknown         Fri, 08 Sep 2017 16:04:28 +0800         Fri, 08 Sep 2017 16:20:58 +0800         NodeStatusUnknown       Kubelet stopped posting node status.
  DiskPressure          Unknown         Fri, 08 Sep 2017 16:04:28 +0800         Fri, 08 Sep 2017 16:20:58 +0800         NodeStatusUnknown       Kubelet stopped posting node status.
  Ready                 Unknown         Fri, 08 Sep 2017 16:04:28 +0800         Fri, 08 Sep 2017 16:20:58 +0800         NodeStatusUnknown       Kubelet stopped posting node status.
Addresses:	10.240.115.55,104.197.0.26
Capacity:
 cpu:           2
 hugePages:     0
 memory:        4046788Ki
 pods:          110
Allocatable:
 cpu:           1500m
 hugePages:     0
 memory:        1479263Ki
 pods:          110
System Info:
 Machine ID:                    8e025a21a4254e11b028584d9d8b12c4
 System UUID:                   349075D1-D169-4F25-9F2A-E886850C47E3
 Boot ID:                       5cd18b37-c5bd-4658-94e0-e436d3f110e0
 Kernel Version:                4.4.0-31-generic
 OS Image:                      Debian GNU/Linux 8 (jessie)
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.12.5
 Kubelet Version:               v1.6.9+a3d1dfa6f4335
 Kube-Proxy Version:            v1.6.9+a3d1dfa6f4335
ExternalID:                     15233045891481496305
Non-terminated Pods:            (9 in total)
  Namespace                     Name                                            CPU Requests    CPU Limits      Memory Requests Memory Limits
  ---------                     ----                                            ------------    ----------      --------------- -------------
......
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests         Memory Limits
  ------------  ----------      ---------------         -------------
  900m (60%)    2200m (146%)    1009286400 (66%)        5681286400 (375%)
Events:         <none>

$ kubectl get node kubernetes-node-861h -o yaml
apiVersion: v1
kind: Node
metadata:
  creationTimestamp: 2015-07-10T21:32:29Z
  labels:
    kubernetes.io/hostname: kubernetes-node-861h
  name: kubernetes-node-861h
  resourceVersion: "757"
  selfLink: /api/v1/nodes/kubernetes-node-861h
  uid: 2a69374e-274b-11e5-a234-42010af0d969
spec:
  externalID: "15233045891481496305"
  podCIDR: 10.244.0.0/24
  providerID: gce://striped-torus-760/us-central1-b/kubernetes-node-861h
status:
  addresses:
  - address: 10.240.115.55
    type: InternalIP
  - address: 104.197.0.26
    type: ExternalIP
  capacity:
    cpu: "1"
    memory: 3800808Ki
    pods: "100"
  conditions:
  - lastHeartbeatTime: 2015-07-10T21:34:32Z
    lastTransitionTime: 2015-07-10T21:35:15Z
    reason: Kubelet stopped posting node status.
    status: Unknown
    type: Ready
  nodeInfo:
    bootID: 4e316776-b40d-4f78-a4ea-ab0d73390897
    containerRuntimeVersion: docker://Unknown
    kernelVersion: 3.16.0-0.bpo.4-amd64
    kubeProxyVersion: v0.21.1-185-gffc5a86098dc01
    kubeletVersion: v0.21.1-185-gffc5a86098dc01
    machineID: ""
    osImage: Debian GNU/Linux 7 (wheezy)
    systemUUID: ABE5F6B4-D44B-108B-C46A-24CCE16C8B6E

接下来还有什么?

了解更多调试工具,包括:

Analytics

Create an Issue Edit this Page