Fix operator panic when checking for pod crashloop status #115
Description
Note: I am facing this issue with the `v1.0.0` version of the operator on my < 1.25 K8s cluster. I think the same issue exists with later versions of the operator. So this change is off of the main branch.

Druid cluster has statefulsets with OrderedReady and we observed that the operator panics and goes into an irrecoverable state with an index-out-of-bounds exception when a pod is in the `Pending` state without creating any containers.

In my case it occurred when:
- `Pending` state (this is fixed in later versions with Delay removal of orphanPVC to avoid the removal of PVC in use #67)
- `Pending` state

In both cases there are no containers available and hence no `ContainerStatuses` were available.

Error log:
Possible Solution presented in this PR
The solution is to split the `PodStatus` and the `ContainerStatus` checks without using index addressing for arrays. This change:
- Checks whether the pod is in the `Failed` or `Unknown` state, and deletes the pod if it is.
- Checks the `ContainerStatus` of all containers. If any one container is not ready and has restarted more than once, then the pod is killed.

This PR has:
Key changed/added files in this PR
handler.go
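
The split check described under "Possible Solution" can be sketched roughly as follows. This is a minimal illustration, not the code in `handler.go`: the `PodStatus` and `ContainerStatus` structs are simplified stand-ins for the Kubernetes `core/v1` types, and `shouldDeletePod` is a hypothetical helper name. The point it shows is that ranging over `ContainerStatuses` is safe when the slice is empty, whereas indexing `ContainerStatuses[0]` panics for a `Pending` pod with no containers.

```go
package main

import "fmt"

// ContainerStatus is a simplified stand-in for the Kubernetes core/v1 type
// (assumption: only the fields needed for this sketch are included).
type ContainerStatus struct {
	Name         string
	Ready        bool
	RestartCount int32
}

// PodStatus is a simplified stand-in for the Kubernetes core/v1 type.
type PodStatus struct {
	Phase             string // e.g. "Pending", "Running", "Failed", "Unknown"
	ContainerStatuses []ContainerStatus
}

// shouldDeletePod is a hypothetical helper that mirrors the two checks
// described in this PR, without indexing into ContainerStatuses.
func shouldDeletePod(status PodStatus) bool {
	// Pod-level check: needs no container information at all.
	if status.Phase == "Failed" || status.Phase == "Unknown" {
		return true
	}
	// Container-level check: ranging over a nil or empty slice runs zero
	// iterations, so a Pending pod with no containers cannot cause a panic.
	for _, cs := range status.ContainerStatuses {
		if !cs.Ready && cs.RestartCount > 1 {
			return true
		}
	}
	return false
}

func main() {
	pending := PodStatus{Phase: "Pending"} // no containers created yet
	crashing := PodStatus{
		Phase: "Running",
		ContainerStatuses: []ContainerStatus{
			{Name: "druid", Ready: false, RestartCount: 3},
		},
	}
	failed := PodStatus{Phase: "Failed"}

	fmt.Println(shouldDeletePod(pending))  // false
	fmt.Println(shouldDeletePod(crashing)) // true
	fmt.Println(shouldDeletePod(failed))   // true
}
```

A pod in `Pending` with no container statuses is simply left alone here, which matches the intent of avoiding the panic rather than deleting the pod.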