Tuesday, October 17, 2017

debugging a kubernetes cluster

One more day at work.. trying to check if my app was running in the cluster.. kubectl says no pods running.. all nodes gone :(
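the first checks, from my machine (roughly what I ran; I didn't save the exact output):

kubectl get nodes
kubectl get pods --all-namespaces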

so here starts my debugging journey


accessing one node:
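in our setup that's just plain ssh with the CoreOS core user (hostname as it appears in the prompts below):

ssh core@k8s-node-2017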


checking if the kube processes are running

ps -ef | grep kube


core@k8s-node-2017:~$ ps -ef | grep kube
root       1739      1  0 15:16 ?        00:00:03 /hyperkube proxy --cluster-cidr=10.100.0.0/16 --master=http://10.120.6.4:8080
core      24484  23080  0 15:42 pts/0    00:00:00 grep --color=auto kube


the proxy is running OK.. but no kubelet process in that list


checking the kubelet's logs

journalctl -u kubelet -f --no-pager


Oct 17 15:44:56 k8s-node-2017 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 15:44:56 k8s-node-2017 systemd[1]: kubelet.service: Unit entered failed state.
Oct 17 15:44:56 k8s-node-2017 systemd[1]: kubelet.service: Failed with result 'exit-code'.
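-f follows the journal live.. to see what led up to the crash, drop it and grab the last lines instead (same journalctl, different flags):

journalctl -u kubelet --no-pager -n 100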


so the kubelet is the one logging errors and dying..
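a quick way to spot everything that's down at once (a sketch, probably not exactly what I typed back then):

systemctl list-units --state=failed --no-pager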

checking the logs of all services:

Oct 17 15:21:42 k8s-node-2017 systemd[1]: docker-dnsmasq.service: Unit entered failed state.
Oct 17 15:21:42 k8s-node-2017 systemd[1]: docker-dnsmasq.service: Failed with result 'exit-code'.
Oct 17 15:21:43 k8s-node-2017 systemd[1]: kube-docker.service: Service hold-off time over, scheduling restart.
Oct 17 15:21:43 k8s-node-2017 systemd[1]: Stopped kube docker - docker for our k8s cluster.
Oct 17 15:21:43 k8s-node-2017 systemd[1]: Starting kube docker - docker for our k8s cluster...
Oct 17 15:21:43 k8s-node-2017 kube-docker.sh[7240]: Device "flannel.1" does not exist.
Oct 17 15:21:43 k8s-node-2017 kube-docker.sh[7240]: flannel ip address:
Oct 17 15:21:43 k8s-node-2017 kube-docker.sh[7240]: DNS ip address: ...1
Oct 17 15:21:43 k8s-node-2017 kube-docker.sh[7240]: starting docker daemon with flannel CIDR:...1/24
Oct 17 15:21:43 k8s-node-2017 kube-docker.sh[7240]: invalid value "...1" for flag --dns: ...1 is not an ip address
Oct 17 15:21:43 k8s-node-2017 kube-docker.sh[7240]: See 'dockerd --help'.


here we can see a problem with the flannel service.
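we can confirm the missing interface directly.. it's the same message the kube-docker.sh script got:

ip link show flannel.1
Device "flannel.1" does not exist.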

checking kube-flannel:

journalctl -u kube-flannel -f

Oct 17 15:53:41 k8s-node-2017 kube-flannel.sh[1759]: timed out
Oct 17 15:53:41 k8s-node-2017 kube-flannel.sh[1759]: E1017 17:53:41.837899    1759 main.go:344] Couldn't fetch network config: 100: Key not found (/coreos.com) [6578]


seems that etcd lost the cluster entries..
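flannel keeps its network config in etcd under /coreos.com/network by default, which is exactly the key the error complains about.. so we can ask etcd directly (assuming the etcdctl v2 API, which matches the error format):

etcdctl get /coreos.com/network/config

if that also answers "100: Key not found", the config is really gone.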

debugging the master:

checking etcd on the master:

core@k8s-master-2017:~$ etcdctl ls /registry
/registry/serviceaccounts
/registry/ranges
/registry/services
/registry/events
/registry/namespaces
/registry/apiregistration.k8s.io
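a quick double-check: list the etcd root.. if the flannel config were still there, a /coreos.com entry would show up next to /registry:

core@k8s-master-2017:~$ etcdctl ls /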


the master lost the flannel config because it rebooted.. here we run etcd as a single master + proxies, not as a real cluster, so the reboot wiped everything.. we need to persist the etcd data on disk, not on ephemeral storage..
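to bring the nodes back we can re-seed the flannel key.. the Network value below is an assumption on my part, taken from the --cluster-cidr=10.100.0.0/16 we saw on the proxy:

etcdctl set /coreos.com/network/config '{ "Network": "10.100.0.0/16" }'

and for the real fix, point etcd at a persistent data dir (the --data-dir flag, or the ETCD_DATA_DIR env var) so the next reboot doesn't wipe it again.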