Issue with Dynamic Allocation of ClusterCIDR for New Nodes Without Component Restart #17

Open
EanWo opened this issue Jun 20, 2024 · 14 comments · May be fixed by #37

Labels: kind/bug (Categorizes issue or PR as related to a bug.)

EanWo commented Jun 20, 2024

I encountered an issue as follows:

1. I started the component.
2. I created a new ClusterCIDR object with a NodeSelector.
3. I added a new node whose label matches the NodeSelector of the new ClusterCIDR.
4. I observed that the new node was not assigned a PodCIDR from the new ClusterCIDR but instead used the default ClusterCIDR.

However, if I follow the same steps to create a new ClusterCIDR object and then restart the component, subsequent new nodes correctly use the new ClusterCIDR.
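For reference, a ClusterCIDR of the kind described in step 2 might look like the following minimal sketch, assuming the upstream v1alpha1 ClusterCIDR schema; the apiVersion, name, label, and CIDR range are illustrative, so adjust them to the CRD your deployment actually installs:

```yaml
apiVersion: networking.k8s.io/v1alpha1   # illustrative; use the group/version of your installed CRD
kind: ClusterCIDR
metadata:
  name: example-clustercidr              # hypothetical name
spec:
  nodeSelector:                          # only nodes matching this selector get CIDRs from this object
    nodeSelectorTerms:
    - matchExpressions:
      - key: example-key                 # hypothetical label key/value
        operator: In
        values:
        - example-value
  perNodeHostBits: 8                     # each node gets a /24 from the IPv4 range below
  ipv4: 10.100.0.0/16                    # illustrative range
```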

The official documentation states that it is not necessary to restart the component. Additionally, I checked the logs and found that the component is aware of the new ClusterCIDR object even without restarting.

To add a node back, I first removed it with 'kubectl delete node', then logged into the node and restarted kubelet so it re-registered with the cluster.

My issue can be summarized as:

  • New nodes do not use the newly created ClusterCIDR unless the component is restarted.
  • The component does recognize the new ClusterCIDR without restarting (as seen in the logs).

Could you please help me understand why the new ClusterCIDR is not applied to new nodes without restarting the component, and how to resolve this issue?

Thank you for your assistance!

aojea (Contributor) commented Jun 20, 2024

@EanWo can you please share the logs of the component?

Can you check that you have disabled the default IPAM controller in the kube-controller-manager and/or the cloud-controller-manager?

EanWo (Author) commented Jun 20, 2024

> @EanWo can you please share the logs of the component?
>
> Can you check that you have disabled the default IPAM controller in the kube-controller-manager and/or the cloud-controller-manager?

Yes, I am certain. I have set the 'allocate-node-cidrs' parameter to 'false' in the 'kube-controller-manager' configuration on all masters.
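For context, on a kubeadm-style control plane this usually means editing the kube-controller-manager static Pod manifest; a sketch, assuming the standard kubeadm file layout (the path, image, and surrounding flags vary by distribution):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (kubeadm layout; path varies)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: registry.k8s.io/kube-controller-manager:v1.25.0   # illustrative version
    command:
    - kube-controller-manager
    - --allocate-node-cidrs=false   # leave CIDR allocation to node-ipam-controller
    # ... all other flags unchanged ...
```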

EanWo (Author) commented Jun 20, 2024

@aojea At step ①, I created a new ClusterCIDR object named 'clustercidr-name-ean' whose NodeSelector matches the label 'name=ean'; the logs recorded this. At step ②, I added a node with the label 'name=ean', but the PodCIDR it was assigned belonged to the default ClusterCIDR. At step ③, I removed that node with 'kubectl delete node', re-added it to the cluster by restarting kubelet on it ('systemctl restart kubelet'), and then restarted the controller. After that, at step ④, the node was assigned a PodCIDR from the correct 'clustercidr-name-ean' ClusterCIDR.
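To illustrate step ② (all values here are made up; only the label and ClusterCIDR name come from the report above), the resulting Node object would have looked roughly like this:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1              # hypothetical node name
  labels:
    name: ean                 # matches clustercidr-name-ean's NodeSelector
spec:
  podCIDR: 10.244.3.0/24      # illustrative; allocated from the default range, not the new one
```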

mneverov (Member) commented

@EanWo the original issue was for the old repo. Could you please confirm that you are using the latest v0.2.0 node-ipam-controller?

EanWo (Author) commented Jun 20, 2024

> @EanWo the original issue was for the old repo. Could you please confirm that you are using the latest v0.2.0 node-ipam-controller?

@mneverov I am using the latest code from the main branch. I have also tried the code at tag v0.2.0, and the results were the same.

EanWo (Author) commented Jun 20, 2024

@mneverov @aojea I modified the code to print some logs and found the following: at step ①, I added a ClusterCIDR object named 'name-ean'. At step ②, I printed the contents of the cidrMap, which contained only the default ClusterCIDR and not the newly added 'name-ean'. At step ③, I restarted the controller, after which the cidrMap contained the new 'name-ean' ClusterCIDR.

mneverov (Member) commented

@EanWo thank you for the info!
Do you use any CNI by chance? Can you share the new node's YAML before you restart the manager?
Which k8s distribution do you use (k3s, kind, minikube, k3d, ...)?

EanWo (Author) commented Jun 20, 2024

@mneverov I am using a cluster provided by a cloud vendor. I tested three Kubernetes cluster versions with their corresponding CNI plugins: v1.21 with Flannel, v1.23 with Calico, and v1.25 with Flannel. The results were the same for all versions. I can add and remove cluster nodes at will.

EanWo (Author) commented Jun 21, 2024

I have resolved the issue. The cause of the problem was that the YAML for the ClusterCIDR I created already specified the controller's finalizer.
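For anyone hitting the same symptom, the misconfiguration looked roughly like the sketch below: the manifest pre-declares a finalizer on the ClusterCIDR, so the controller treats the object as already processed and never adds it to its in-memory map. The finalizer string and other values are illustrative, not taken from the actual manifest:

```yaml
apiVersion: networking.k8s.io/v1alpha1       # illustrative, as in the sketch above
kind: ClusterCIDR
metadata:
  name: clustercidr-name-ean
  finalizers:
  - clustercidr.networking.k8s.io/finalizer  # hypothetical finalizer string; do NOT pre-set this
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: name
        operator: In
        values:
        - ean
  perNodeHostBits: 8
  ipv4: 10.100.0.0/16                        # illustrative range
```

Dropping the 'finalizers' entry and letting the controller manage its own finalizer avoids the symptom; the idempotency fix discussed below addresses it on the controller side.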

EanWo closed this as completed Jun 21, 2024
aojea (Contributor) commented Jun 22, 2024

/reopen

@sarveshr7 made a good analysis of this problem and I still think we should handle this scenario better. Quoting from the chat conversation for reference:

> We should modify the code here: https://github.com/kubernetes-sigs/node-ipam-controller/blob/d843244e4ae2ea7e4a877[…]f6751f461daf0/pkg/controller/ipam/multi_cidr_range_allocator.go , make the createClusterCIDR function idempotent and create the in-memory map irrespective of whether the finalizer is present or not

/assign @sarveshr7

k8s-ci-robot (Contributor) commented

@aojea: Reopened this issue.

In response to this:

> /reopen
>
> @sarveshr7 made a good analysis of this problem and I still think we should handle this scenario better. Quoting from the chat conversation for reference:
>
> We should modify the code here: https://github.com/kubernetes-sigs/node-ipam-controller/blob/d843244e4ae2ea7e4a877[…]f6751f461daf0/pkg/controller/ipam/multi_cidr_range_allocator.go , make the createClusterCIDR function idempotent and create the in-memory map irrespective of whether the finalizer is present or not
>
> /assign @sarveshr7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot reopened this Jun 22, 2024
aojea added the kind/bug label Jun 22, 2024
k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 20, 2024
mneverov (Member) commented

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Sep 20, 2024
mneverov (Member) commented

/assign

mneverov linked a pull request Sep 26, 2024 that will close this issue