kube-state-metrics is missing despite being deployed and running and shows in Prometheus #83

Open
yoyoraso opened this issue May 10, 2024 · 10 comments

Comments

@yoyoraso

Hi, I have the main Coroot instance deployed in one cluster and am working on adding other clusters to it. Those clusters already have Prometheus and kube-state-metrics deployed, so I am only deploying coroot-node-agent on them. However, I can't see kube-state-metrics or the service map:
[two screenshots]
So I started investigating and found these failures in the coroot-node-agent pods: "failed to get container metadata for pid 16843 -> /kubepods/burstable/pod6f222fb5-3d0e-425e-899c-e5495124a057/ea64d45c2a6338bb0f9aae2f05ec4a77e323915d25ed11b19cb2504cbf2113d0: failed to interact with dockerd (%!s(<nil>)) or with containerd (%!s(<nil>))"

kubernetes version : v1.25.16+vmware.1
OS: Ubuntu 22.04.4 LTS
kernel : 6.5.0-21-generic
container runtime : containerd://1.6.28
coroot node agent tag : 1.18.9
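
The cgroup path in the error message above encodes the QoS class, the pod UID, and the container ID, which can help match a failing PID to a pod. A minimal sketch for pulling those fields out, assuming the cgroupfs-style layout `/kubepods/<qos>/pod<uid>/<container-id>` seen in this thread (the function name `parse_kubepods_cgroup` is hypothetical, and systemd-style cgroup paths are not handled):

```python
import re

# Assumed layout, taken from the error message in this thread:
#   /kubepods/<qos>/pod<pod-uid>/<container-id>
# The QoS segment is treated as optional; with this layout, pods without
# one are assumed to be in the "guaranteed" class.
CGROUP_RE = re.compile(
    r"^/kubepods/(?:(?P<qos>burstable|besteffort)/)?"
    r"pod(?P<pod_uid>[0-9a-f-]{36})/"
    r"(?P<container_id>[0-9a-f]+)$"
)


def parse_kubepods_cgroup(path):
    """Return a dict with qos, pod_uid and container_id, or None if no match."""
    m = CGROUP_RE.match(path)
    if m is None:
        return None
    return {
        "qos": m.group("qos") or "guaranteed",
        "pod_uid": m.group("pod_uid"),
        "container_id": m.group("container_id"),
    }
```

The resulting `pod_uid` can be compared against `metadata.uid` of the pods scheduled on that node.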

@apetruhin
Member

@yoyoraso, we need to examine the node-agent's log. Could you please restart it, wait a minute, and then provide the entire log here?

@yoyoraso
Author

@apetruhin, here it is
I0510 14:40:50.531724 606823 net.go:30] ephemeral-port-range: 32768-60999
I0510 14:40:50.540212 606823 cilium.go:30] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0510 14:40:50.540261 606823 cilium.go:36] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0510 14:40:50.540272 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0510 14:40:50.540280 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0510 14:40:50.540290 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0510 14:40:50.540300 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0510 14:40:50.540313 606823 main.go:102] agent version: 1.18.9
I0510 14:40:50.540380 606823 main.go:108] hostname: ******
I0510 14:40:50.540389 606823 main.go:109] kernel version: 6.5.0-21-generic
I0510 14:40:50.541001 606823 main.go:75] machine-id: ******
I0510 14:40:50.541035 606823 tracing.go:34] no OpenTelemetry traces collector endpoint configured
I0510 14:40:50.541048 606823 otel.go:26] no OpenTelemetry logs collector endpoint configured
I0510 14:40:50.541180 606823 metadata.go:67] cloud provider:
I0510 14:40:50.541193 606823 collector.go:157] instance metadata:
I0510 14:40:50.541282 606823 profiling.go:49] no profiles endpoint configured
W0510 14:40:50.541721 606823 registry.go:75] Cannot connect to the Docker daemon at unix:///proc/1/root/run/docker.sock. Is the docker daemon running?
W0510 14:40:54.544388 606823 registry.go:78] couldn't connect to containerd through the following UNIX sockets [/var/snap/microk8s/common/run/containerd.sock,/run/k0s/containerd.sock,/run/k3s/containerd/containerd.sock,/run/containerd/containerd.sock]: failed to dial "/proc/1/root/run/containerd/containerd.sock": context deadline exceeded
W0510 14:40:54.544482 606823 registry.go:81] stat /proc/1/root/var/run/crio/crio.sock: no such file or directory
I0510 14:40:54.878632 606823 registry.go:281] calculated container id 1 -> / ->
I0510 14:40:54.878729 606823 registry.go:286] "ignoring" cg="/" pid=1
I0510 14:40:54.878791 606823 registry.go:281] calculated container id 2 -> / ->
I0510 14:40:54.878805 606823 registry.go:286] "ignoring" cg="/" pid=2
I0510 14:40:54.878844 606823 registry.go:281] calculated container id 3 -> / ->
I0510 14:40:54.878856 606823 registry.go:286] "ignoring" cg="/" pid=3
I0510 14:40:54.878893 606823 registry.go:281] calculated container id 4 -> / ->
I0510 14:40:54.878901 606823 registry.go:286] "ignoring" cg="/" pid=4
I0510 14:40:54.878936 606823 registry.go:281] calculated container id 5 -> / ->
I0510 14:40:54.878947 606823 registry.go:286] "ignoring" cg="/" pid=5
I0510 14:40:54.878982 606823 registry.go:281] calculated container id 6 -> / ->
I0510 14:40:54.878994 606823 registry.go:286] "ignoring" cg="/" pid=6
I0510 14:40:54.879027 606823 registry.go:281] calculated container id 8 -> / ->
I0510 14:40:54.879038 606823 registry.go:286] "ignoring" cg="/" pid=8
I0510 14:40:54.879073 606823 registry.go:281] calculated container id 11 -> / ->
I0510 14:40:54.879081 606823 registry.go:286] "ignoring" cg="/" pid=11
I0510 14:40:54.879113 606823 registry.go:281] calculated container id 12 -> / ->
I0510 14:40:54.879121 606823 registry.go:286] "ignoring" cg="/" pid=12
I0510 14:40:54.879153 606823 registry.go:281] calculated container id 13 -> / ->
I0510 14:40:54.879166 606823 registry.go:286] "ignoring" cg="/" pid=13
I0510 14:40:54.879200 606823 registry.go:281] calculated container id 14 -> / ->
I0510 14:40:54.879211 606823 registry.go:286] "ignoring" cg="/" pid=14
I0510 14:40:54.879244 606823 registry.go:281] calculated container id 15 -> / ->
I0510 14:40:54.879251 606823 registry.go:286] "ignoring" cg="/" pid=15
I0510 14:40:54.879283 606823 registry.go:281] calculated container id 16 -> / ->
I0510 14:40:54.879291 606823 registry.go:286] "ignoring" cg="/" pid=16
I0510 14:40:54.879325 606823 registry.go:281] calculated container id 17 -> / ->
I0510 14:40:54.879332 606823 registry.go:286] "ignoring" cg="/" pid=17
I0510 14:40:54.879366 606823 registry.go:281] calculated container id 18 -> / ->
I0510 14:40:54.879377 606823 registry.go:286] "ignoring" cg="/" pid=18
I0510 14:40:54.879410 606823 registry.go:281] calculated container id 19 -> / ->
I0510 14:40:54.879419 606823 registry.go:286] "ignoring" cg="/" pid=19
I0510 14:40:54.879452 606823 registry.go:281] calculated container id 20 -> / ->
I0510 14:40:54.879466 606823 registry.go:286] "ignoring" cg="/" pid=20
I0510 14:40:54.879500 606823 registry.go:281] calculated container id 21 -> / ->
I0510 14:40:54.879511 606823 registry.go:286] "ignoring" cg="/" pid=21
I0510 14:40:54.879544 606823 registry.go:281] calculated container id 22 -> / ->
I0510 14:40:54.879556 606823 registry.go:286] "ignoring" cg="/" pid=22
I0510 14:40:54.879588 606823 registry.go:281] calculated container id 23 -> / ->
I0510 14:40:54.879600 606823 registry.go:286] "ignoring" cg="/" pid=23
I0510 14:40:54.879633 606823 registry.go:281] calculated container id 25 -> / ->
I0510 14:40:54.879640 606823 registry.go:286] "ignoring" cg="/" pid=25
I0510 14:40:54.879674 606823 registry.go:281] calculated container id 26 -> / ->
I0510 14:40:54.879685 606823 registry.go:286] "ignoring" cg="/" pid=26
I0510 14:40:54.879718 606823 registry.go:281] calculated container id 27 -> / ->
I0510 14:40:54.879729 606823 registry.go:286] "ignoring" cg="/" pid=27
I0510 14:40:54.879768 606823 registry.go:281] calculated container id 28 -> / ->
I0510 14:40:54.879781 606823 registry.go:286] "ignoring" cg="/" pid=28
I0510 14:40:54.879816 606823 registry.go:281] calculated container id 29 -> / ->
I0510 14:40:54.879823 606823 registry.go:286] "ignoring" cg="/" pid=29
I0510 14:40:54.879858 606823 registry.go:281] calculated container id 31 -> / ->
I0510 14:40:54.879865 606823 registry.go:286] "ignoring" cg="/" pid=31
I0510 14:40:54.879897 606823 registry.go:281] calculated container id 32 -> / ->
I0510 14:40:54.879904 606823 registry.go:286] "ignoring" cg="/" pid=32
I0510 14:40:54.879936 606823 registry.go:281] calculated container id 33 -> / ->
I0510 14:40:54.879949 606823 registry.go:286] "ignoring" cg="/" pid=33
I0510 14:40:54.879985 606823 registry.go:281] calculated container id 34 -> / ->
I0510 14:40:54.879992 606823 registry.go:286] "ignoring" cg="/" pid=34
I0510 14:40:54.880055 606823 registry.go:281] calculated container id 35 -> / ->
I0510 14:40:54.880063 606823 registry.go:286] "ignoring" cg="/" pid=35
I0510 14:40:54.880098 606823 registry.go:281] calculated container id 37 -> / ->
I0510 14:40:54.880107 606823 registry.go:286] "ignoring" cg="/" pid=37
I0510 14:40:54.880140 606823 registry.go:281] calculated container id 38 -> / ->
I0510 14:40:54.880154 606823 registry.go:286] "ignoring" cg="/" pid=38
I0510 14:40:54.880189 606823 registry.go:281] calculated container id 39 -> / ->
I0510 14:40:54.880202 606823 registry.go:286] "ignoring" cg="/" pid=39
W0510 14:40:54.880228 606823 init.go:35] open /proc/1/net/tcp6: no such file or directory
I0510 14:40:54.880236 606823 registry.go:281] calculated container id 40 -> / ->
I0510 14:40:54.880290 606823 registry.go:286] "ignoring" cg="/" pid=40
I0510 14:40:54.880340 606823 registry.go:281] calculated container id 41 -> / ->
I0510 14:40:54.880353 606823 registry.go:286] "ignoring" cg="/" pid=41
I0510 14:40:54.880391 606823 registry.go:281] calculated container id 43 -> / ->
I0510 14:40:54.880399 606823 registry.go:286] "ignoring" cg="/" pid=43
I0510 14:40:54.880433 606823 registry.go:281] calculated container id 44 -> / ->
I0510 14:40:54.880439 606823 registry.go:286] "ignoring" cg="/" pid=44
I0510 14:40:54.880472 606823 registry.go:281] calculated container id 45 -> / ->
I0510 14:40:54.880480 606823 registry.go:286] "ignoring" cg="/" pid=45
I0510 14:40:54.880511 606823 registry.go:281] calculated container id 46 -> / ->
I0510 14:40:54.880518 606823 registry.go:286] "ignoring" cg="/" pid=46
I0510 14:40:54.880549 606823 registry.go:281] calculated container id 47 -> / ->
I0510 14:40:54.880556 606823 registry.go:286] "ignoring" cg="/" pid=47
I0510 14:40:54.880591 606823 registry.go:281] calculated container id 50 -> / ->
I0510 14:40:54.880598 606823 registry.go:286] "ignoring" cg="/" pid=50
I0510 14:40:54.880631 606823 registry.go:281] calculated container id 51 -> / ->
I0510 14:40:54.880638 606823 registry.go:286] "ignoring" cg="/" pid=51
I0510 14:40:54.880671 606823 registry.go:281] calculated container id 52 -> / ->
I0510 14:40:54.880678 606823 registry.go:286] "ignoring" cg="/" pid=52
I0510 14:40:54.880711 606823 registry.go:281] calculated container id 53 -> / ->
I0510 14:40:54.880718 606823 registry.go:286] "ignoring" cg="/" pid=53
I0510 14:40:54.880750 606823 registry.go:281] calculated container id 55 -> / ->
I0510 14:40:54.880757 606823 registry.go:286] "ignoring" cg="/" pid=55
I0510 14:40:54.880790 606823 registry.go:281] calculated container id 56 -> / ->
I0510 14:40:54.880797 606823 registry.go:286] "ignoring" cg="/" pid=56
I0510 14:40:54.880835 606823 registry.go:281] calculated container id 57 -> / ->
I0510 14:40:54.880843 606823 registry.go:286] "ignoring" cg="/" pid=57
I0510 14:40:54.880877 606823 registry.go:281] calculated container id 58 -> / ->
I0510 14:40:54.880884 606823 registry.go:286] "ignoring" cg="/" pid=58
I0510 14:40:54.880918 606823 registry.go:281] calculated container id 59 -> / ->
I0510 14:40:54.969239 606823 registry.go:213] TCP connection from unknown container {connection-open none 9196 11.0.101.3:33262 11.33.38.9:9093 34 622082154767560 }
W0510 14:40:55.888703 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s(<nil>)) or with containerd (%!s(<nil>))
I0510 14:40:55.888742 606823 registry.go:213] TCP connection from unknown container {connection-open none 14343 11.32.115.7:51738 10.100.192.1:443 111 622083074116078 }
W0510 14:40:55.929900 606823 registry.go:277] failed to get container metadata for pid 20694 -> /kubepods/burstable/pod671ca5e2-5ce3-46a0-b10f-f5e4f8098e33/4bda181fc1ca52ebe65399cb8e11649c0e133cd6a85f959f0f6a3d370478f2cb: failed to interact with dockerd (%!s(<nil>)) or with containerd (%!s(<nil>))
I0510 14:40:55.929929 606823 registry.go:213] TCP connection from unknown container {connection-open none 20694 127.0.0.1:44250 127.0.0.1:8080 14 622083115307215 }
W0510 14:40:55.946816 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s(<nil>)) or with containerd (%!s(<nil>))
I0510 14:40:55.946857 606823 registry.go:213] TCP connection from unknown container {connection-open none 14343 11.32.115.7:51752 10.100.192.1:443 107 622083132268869 }
W0510 14:40:55.946943 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s(<nil>)) or with containerd (%!s(<nil>))
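
A log this long is easier to triage when reduced to its warning sites. The sketch below assumes the standard klog header layout (`Lmmdd hh:mm:ss.uuuuuu pid file:line] message`) visible in the lines above. It counts warnings, errors and fatals per `file:line`, treating byte-identical repeated lines as one occurrence. The function name `warning_sites` is hypothetical and not part of coroot-node-agent.

```python
import re
from collections import Counter

# klog header: severity letter, mmdd, time with microseconds, pid, file:line]
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) (?P<src>[^\s:]+:\d+)\] (?P<msg>.*)$"
)


def warning_sites(lines):
    """Count W/E/F log lines per source location, ignoring identical repeats."""
    seen = set()
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        m = KLOG_RE.match(line)
        if m is None or m.group("sev") == "I":
            continue  # not a klog line, or informational only
        if line in seen:
            continue  # same timestamp and text: count it once
        seen.add(line)
        counts[m.group("src")] += 1
    return counts
```

Applied to the log above, the sites that stand out are `registry.go:75`, `registry.go:78` and `registry.go:81` (runtime socket discovery) followed by repeated `registry.go:277` metadata failures.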

@apetruhin
Member

Could you please ssh to the node and check for containerd.sock:

# ls -l /run/containerd/containerd.sock
srw-rw---- 1 root root 0 Jan  4 09:04 /run/containerd/containerd.sock

@yoyoraso
Author

@apetruhin I can't access the cluster nodes sadly :(

@apetruhin
Member

The agent failed to locate containerd.sock.

Please exec into the node-agent pod and try to find the containerd.sock file:

kubectl -n coroot exec -ti coroot-node-agent-dwwrf -- bash

root@coroot-node-agent-dwwrf:/# ls -l /proc/1/root/run/containerd/containerd.sock

The root filesystem should be accessible from a node-agent pod under /proc/1/root/.
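
The same check can be run for every socket path the agent tried. This is a rough sketch, not the agent's own logic: the candidate list is copied from the `registry.go:78` warning in the log above, while `probe_sockets` and its return values are hypothetical names chosen for illustration. It uses `lstat` so that a symlink is reported as such rather than followed.

```python
import os
import stat

HOST_ROOT = "/proc/1/root"  # host filesystem as seen from the node-agent pod

# Candidate paths as listed in the agent's "couldn't connect to containerd" warning.
CANDIDATES = [
    "/var/snap/microk8s/common/run/containerd.sock",
    "/run/k0s/containerd.sock",
    "/run/k3s/containerd/containerd.sock",
    "/run/containerd/containerd.sock",
]


def probe_sockets(host_root=HOST_ROOT, candidates=CANDIDATES):
    """Classify each candidate as missing, a socket, a symlink, or something else."""
    results = {}
    for sock in candidates:
        path = os.path.join(host_root, sock.lstrip("/"))
        try:
            st = os.lstat(path)  # do not follow a symlink at the final component
        except OSError:
            results[sock] = "missing"
            continue
        if stat.S_ISLNK(st.st_mode):
            results[sock] = "symlink -> " + os.readlink(path)
        elif stat.S_ISSOCK(st.st_mode):
            results[sock] = "socket"
        else:
            results[sock] = "other"
    return results
```

Running `print(probe_sockets())` inside the pod would list which candidates exist and whether any of them is a symlink.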

@yoyoraso
Author

@apetruhin
root@node-agent-ntkdw:/# ls -l /proc/1/root/run/containerd/containerd.sock
lrwxrwxrwx 1 root root 44 May 3 10:14 /proc/1/root/run/containerd/containerd.sock -> /var/vcap/sys/run/containerd/containerd.sock

@apetruhin
Member

@yoyoraso, could you please check whether /proc/1/root/var/vcap/sys/run/containerd/containerd.sock is itself a symlink to another location?

root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock
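
Following such a chain by hand is needed because an absolute symlink target is resolved against the root of the process doing the lookup, not against /proc/1/root. The sketch below walks the chain while re-rooting each target under the host root. It is only an illustration: `resolve_under_root` is a hypothetical helper, it follows symlinks at the final path component only (not symlinked parent directories), and whether this is the actual cause of the failure in this thread is not established.

```python
import os


def resolve_under_root(host_root, path, max_hops=16):
    """Return the list of host-absolute paths visited while following symlinks.

    `path` is absolute from the host's point of view, for example
    /run/containerd/containerd.sock; each hop is looked up under host_root.
    """
    hops = [path]
    current = path
    for _ in range(max_hops):
        full = os.path.join(host_root, current.lstrip("/"))
        if not os.path.islink(full):
            return hops  # a real file, a socket, or a missing path ends the walk
        target = os.readlink(full)
        if not os.path.isabs(target):
            # relative targets are resolved against the link's own directory
            target = os.path.normpath(os.path.join(os.path.dirname(current), target))
        hops.append(target)
        current = target
    raise RuntimeError("too many symlink hops, possible loop")
```

The last element of the returned list is the path to check for existence under the host root.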

@yoyoraso
Author

Hi @apetruhin,
root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock
ls: cannot access '/proc/1/root/var/vcap/sys/run/containerd/containerd.sock': No such file or directory

@apetruhin
Member

@yoyoraso, please provide details about your setup and instructions on how to run this type of Kubernetes environment to reproduce the issue.

@yoyoraso
Author

@apetruhin, it is a basic Kubernetes cluster created with VMware Tanzu:
kubernetes version : v1.25.16+vmware.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.16+vmware.1", GitCommit:"84fd181a4243c4354b9208f4292f1b6cd82726b1", GitTreeState:"clean", BuildDate:"2023-11-21T10:59:59Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
OS: Ubuntu 22.04.4 LTS
kernel : 6.5.0-21-generic
container runtime : containerd://1.6.28
coroot node agent tag : 1.18.9

Let me know if you need more information.


2 participants