Use ephemeral containers during sysdump if Cilium is stuck in crashloop #1867

Open
jrajahalme opened this issue Jul 15, 2021 · 3 comments
Labels
area/CI Continuous Integration testing issue or flake help wanted Extra attention is needed kind/feature New feature or request sig/agent

Comments

@jrajahalme
Member

Currently, bugtool output is missing from the sysdump for Cilium agents stuck in a crashloop. A lot of helpful information (e.g., open sockets, iptables rules) could still be collected from nodes where the Cilium agent fails to start. Would it be possible to run a job on the node with a bugtool/bpftool image to collect the current node state when the cilium pod fails to start?

@jrajahalme jrajahalme added the kind/feature New feature or request label Jul 15, 2021
@aanm
Member

aanm commented Jul 19, 2021

> Currently, bugtool output is missing from the sysdump for Cilium agents stuck in a crashloop. A lot of helpful information (e.g., open sockets, iptables rules) could still be collected from nodes where the Cilium agent fails to start. Would it be possible to run a job on the node with a bugtool/bpftool image to collect the current node state when the cilium pod fails to start?

Yes, this is possible in one of two ways: a) use ephemeral containers, if the cluster supports them: https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/ or b) run a Deployment on the node(s) selected by a specific label, or even on all nodes, that runs the bugtool on those nodes.
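To make option a) a bit more concrete, here is a minimal sketch of the logic sysdump might need: find the containers of a pod that are waiting in `CrashLoopBackOff`, and build the ephemeral container entry that would be submitted through the pod's `ephemeralcontainers` subresource. It works on a plain dict shaped like the JSON of a Pod object and does not talk to a cluster. The function names, the container name `sysdump-debug`, the image and the command are all hypothetical placeholders, not something sysdump does today. The sketch also assumes the debug container only needs the pod's shared namespaces, so it does not set `targetContainerName`.

```python
CRASHLOOP_REASON = "CrashLoopBackOff"


def crashlooping_containers(pod):
    """Return the names of containers in `pod` waiting in CrashLoopBackOff.

    `pod` is a dict shaped like the JSON of a Kubernetes Pod object.
    """
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [
        s["name"]
        for s in statuses
        if s.get("state", {}).get("waiting", {}).get("reason") == CRASHLOOP_REASON
    ]


def ephemeral_container_patch(pod, image, command):
    """Build the ephemeral container body for a crashlooping pod.

    Returns None when no container is crashlooping, so the caller can fall
    back to the regular bugtool collection path.
    """
    if not crashlooping_containers(pod):
        return None
    return {
        "spec": {
            "ephemeralContainers": [
                {
                    "name": "sysdump-debug",  # hypothetical name
                    "image": image,           # hypothetical bugtool/bpftool image
                    "command": command,       # hypothetical collection command
                }
            ]
        }
    }


if __name__ == "__main__":
    # Hand-written pod status for illustration only, not real cluster output.
    example_pod = {
        "metadata": {"name": "cilium-example", "namespace": "kube-system"},
        "status": {
            "containerStatuses": [
                {
                    "name": "cilium-agent",
                    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                }
            ]
        },
    }
    print(ephemeral_container_patch(
        example_pod, "example.invalid/bugtool:latest", ["cilium-bugtool"]
    ))
```

Option b) would not need the crashloop check at all, since the collection workload would be scheduled independently of the cilium pod; the trade-off is that it requires permission to create new workloads on the affected nodes.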

@ti-mo ti-mo added area/CI Continuous Integration testing issue or flake sig/agent labels Jun 22, 2023
@christarazi christarazi changed the title CI: bugtool should run in the node with Cilium in crashloop Use ephemeral containers during sysdump if Cilium is stuck in crashloop. Jul 26, 2023
@christarazi christarazi changed the title Use ephemeral containers during sysdump if Cilium is stuck in crashloop. Use ephemeral containers during sysdump if Cilium is stuck in crashloop Jul 26, 2023
@christarazi
Member

^ Good idea. I've updated the issue to reflect this feature request and am transferring it to the CLI repo, as that's where sysdump lives.

@christarazi christarazi transferred this issue from cilium/cilium Jul 26, 2023
@christarazi christarazi added the help wanted Extra attention is needed label Jul 26, 2023

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.
