Issue with Dynamic Allocation of ClusterCIDR for New Nodes Without Component Restart #17

Open
EanWo opened this issue Jun 20, 2024 · 14 comments · May be fixed by #37

Labels: kind/bug (Categorizes issue or PR as related to a bug.)

EanWo commented Jun 20, 2024

I encountered an issue as follows:

1. I started the component.
2. I created a new ClusterCIDR object with a NodeSelector.
3. I added a new node whose label matches the NodeSelector of the new ClusterCIDR.
4. I observed that the new node was not assigned a PodCIDR from the new ClusterCIDR but instead used the default ClusterCIDR.

However, if I follow the same steps to create a new ClusterCIDR object and then restart the component, subsequent new nodes correctly use the new ClusterCIDR.
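For reference, a ClusterCIDR of the kind described in step 2 might look like the following minimal sketch, assuming the upstream v1alpha1 ClusterCIDR schema; the apiVersion, name, label, and CIDR range are illustrative, so adjust them to the CRD your deployment actually installs:

```yaml
apiVersion: networking.k8s.io/v1alpha1   # illustrative; use the group/version of your installed CRD
kind: ClusterCIDR
metadata:
  name: example-clustercidr              # hypothetical name
spec:
  nodeSelector:                          # only nodes matching this selector get CIDRs from this object
    nodeSelectorTerms:
    - matchExpressions:
      - key: example-key                 # hypothetical label key/value
        operator: In
        values:
        - example-value
  perNodeHostBits: 8                     # each node gets a /24 from the IPv4 range below
  ipv4: 10.100.0.0/16                    # illustrative range
```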

The official documentation states that it is not necessary to restart the component. Additionally, I checked the logs and found that the component is aware of the new ClusterCIDR object even without restarting.

To add a node back, I first removed it with 'kubectl delete node', then logged into the node and restarted kubelet so it re-registered with the cluster.

My issue can be summarized as:

  • New nodes do not use the newly created ClusterCIDR unless the component is restarted.
  • The component does recognize the new ClusterCIDR without restarting (as seen in the logs).

Could you please help me understand why the new ClusterCIDR is not applied to new nodes without restarting the component, and how to resolve this issue?

Thank you for your assistance!

aojea (Contributor) commented Jun 20, 2024

@EanWo can you please share the logs of the component?

Can you check that you have disabled the default IPAM controller in the kube-controller-manager and/or the cloud-controller-manager?

EanWo (Author) commented Jun 20, 2024

> @EanWo can you please share the logs of the component?
>
> Can you check that you have disabled the default IPAM controller in the kube-controller-manager and/or the cloud-controller-manager?

Yes, I am certain. I have set the 'allocate-node-cidrs' parameter to 'false' in the 'kube-controller-manager' configuration on all masters.
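For context, on a kubeadm-style control plane this usually means editing the kube-controller-manager static Pod manifest; a sketch, assuming the standard kubeadm file layout (the path, image, and surrounding flags vary by distribution):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (kubeadm layout; path varies)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: registry.k8s.io/kube-controller-manager:v1.25.0   # illustrative version
    command:
    - kube-controller-manager
    - --allocate-node-cidrs=false   # leave CIDR allocation to node-ipam-controller
    # ... all other flags unchanged ...
```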

EanWo (Author) commented Jun 20, 2024

@aojea At step ①, I created a new ClusterCIDR object named 'clustercidr-name-ean' whose NodeSelector matches the label 'name=ean'; the logs recorded this. At step ②, I added a node with the label 'name=ean', but the PodCIDR it was assigned belonged to the default ClusterCIDR. At step ③, I removed that node with 'kubectl delete node', re-added it to the cluster by restarting kubelet on it ('systemctl restart kubelet'), and then restarted the controller. After that, at step ④, the node was assigned a PodCIDR from the correct 'clustercidr-name-ean' ClusterCIDR.
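To illustrate step ② (all values here are made up; only the label and ClusterCIDR name come from the report above), the resulting Node object would have looked roughly like this:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1              # hypothetical node name
  labels:
    name: ean                 # matches clustercidr-name-ean's NodeSelector
spec:
  podCIDR: 10.244.3.0/24      # illustrative; allocated from the default range, not the new one
```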

mneverov (Member) commented

@EanWo the original issue was for the old repo. Could you please confirm that you are using the latest v0.2.0 node-ipam-controller?

EanWo (Author) commented Jun 20, 2024

> @EanWo the original issue was for the old repo. Could you please confirm that you are using the latest v0.2.0 node-ipam-controller?

@mneverov I am using the latest code from the main branch. I have also tried the code at tag v0.2.0, and the results were the same.

EanWo (Author) commented Jun 20, 2024

@mneverov @aojea I modified the code to print some logs and found the following: at step ①, I added a ClusterCIDR object named 'name-ean'. At step ②, I printed the contents of the cidrMap, which contained only the default ClusterCIDR and not the newly added 'name-ean'. At step ③, I restarted the controller, after which the cidrMap contained the new 'name-ean' ClusterCIDR.

mneverov (Member) commented

@EanWo thank you for the info!
Do you use any CNI by chance? Can you share the new node's YAML before you restart the manager?
Which k8s distribution do you use (k3s, kind, minikube, k3d, ...)?

EanWo (Author) commented Jun 20, 2024

@mneverov I am using a cluster provided by a cloud vendor. I tested three Kubernetes cluster versions with their corresponding CNI plugins: v1.21 with Flannel, v1.23 with Calico, and v1.25 with Flannel. The results were the same for all versions. I can add and remove cluster nodes at will.

EanWo (Author) commented Jun 21, 2024

I have resolved the issue. The cause of the problem was that the YAML for the ClusterCIDR I created already specified the controller's finalizer.
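For anyone hitting the same symptom, the misconfiguration looked roughly like the sketch below: the manifest pre-declares a finalizer on the ClusterCIDR, so the controller treats the object as already processed and never adds it to its in-memory map. The finalizer string and other values are illustrative, not taken from the actual manifest:

```yaml
apiVersion: networking.k8s.io/v1alpha1       # illustrative, as in the sketch above
kind: ClusterCIDR
metadata:
  name: clustercidr-name-ean
  finalizers:
  - clustercidr.networking.k8s.io/finalizer  # hypothetical finalizer string; do NOT pre-set this
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: name
        operator: In
        values:
        - ean
  perNodeHostBits: 8
  ipv4: 10.100.0.0/16                        # illustrative range
```

Dropping the 'finalizers' entry and letting the controller manage its own finalizer avoids the symptom; the idempotency fix discussed below addresses it on the controller side.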

EanWo closed this as completed Jun 21, 2024
aojea (Contributor) commented Jun 22, 2024

/reopen

@sarveshr7 made a good analysis of this problem and I still think we should handle this scenario better. Quoting from the chat conversation for reference:

> We should modify the code here: https://github.com/kubernetes-sigs/node-ipam-controller/blob/d843244e4ae2ea7e4a877[…]f6751f461daf0/pkg/controller/ipam/multi_cidr_range_allocator.go , make the createClusterCIDR function idempotent and create the in-memory map irrespective of whether the finalizer is present or not

/assign @sarveshr7

k8s-ci-robot (Contributor) commented

@aojea: Reopened this issue.

In response to this:

> /reopen
>
> @sarveshr7 made a good analysis of this problem and I still think we should handle this scenario better. Quoting from the chat conversation for reference:
>
> We should modify the code here: https://github.com/kubernetes-sigs/node-ipam-controller/blob/d843244e4ae2ea7e4a877[…]f6751f461daf0/pkg/controller/ipam/multi_cidr_range_allocator.go , make the createClusterCIDR function idempotent and create the in-memory map irrespective of whether the finalizer is present or not
>
> /assign @sarveshr7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot reopened this Jun 22, 2024
aojea added the kind/bug label Jun 22, 2024
k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 20, 2024
mneverov (Member) commented

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Sep 20, 2024
mneverov (Member) commented

/assign

mneverov linked a pull request Sep 26, 2024 that will close this issue