EndpointAccessUpdate getting reverted back sporadically #752

Open
3 tasks done
cpinjani opened this issue Aug 14, 2024 · 9 comments
Assignees
Labels
kind/bug Something isn't working kind/regression
Milestone

Comments

@cpinjani
Contributor

cpinjani commented Aug 14, 2024

Rancher version:

Rancher - v2.9-92f15949d71efb11baf63c878cc2f64bcd25b1e8-head
eks-operator - v1.9.1-rc.6

Cluster Type: Downstream EKS cluster

Describe the bug:
EndpointAccessUpdate and LoggingUpdate are sporadically reverted, probably due to a race condition.

For example:
c756d5c6-eca7-3d88-99e8-3e3087078822 - Initial update
5a3cc622-d06d-3fa8-890b-3920ca1c8064 - Reverted back

image

Steps

  • Provision EKS cluster with Public Access - true, Private Access - false
  • Edit Private access and set to true

Logs:

time="2024-08-14T15:07:35Z" level=info msg="cluster [c-m82t7] finished updating"
time="2024-08-14T15:09:16Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:09:16Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:09:46Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:10:17Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:10:47Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:11:17Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:11:47Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:12:18Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:12:52Z" level=info msg="cluster [c-m82t7] finished updating"
time="2024-08-14T15:12:52Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:12:53Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:13:23Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:13:53Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:14:23Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:14:54Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:15:24Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:15:54Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:16:24Z" level=info msg="waiting for cluster [c-m82t7] to finish updating"
time="2024-08-14T15:16:55Z" level=info msg="cluster [c-m82t7] finished updating"

PR's:

@cpinjani cpinjani added the kind/bug Something isn't working label Aug 14, 2024
@cpinjani cpinjani added this to the v2.9-Next2 milestone Aug 14, 2024
@vatsalparekh vatsalparekh self-assigned this Sep 2, 2024
@mjura mjura self-assigned this Sep 24, 2024
@mjura mjura moved this from In Progress (8 max) to PR to be reviewed in CAPI & Hosted Kubernetes providers (EKS/AKS/GKE) Sep 24, 2024
mjura added a commit to mjura/eks-operator that referenced this issue Sep 26, 2024
mjura added a commit to mjura/eks-operator that referenced this issue Sep 26, 2024
@mjura mjura moved this from PR to be reviewed to Done in CAPI & Hosted Kubernetes providers (EKS/AKS/GKE) Sep 30, 2024
@mjura mjura closed this as completed Sep 30, 2024
@cpinjani cpinjani reopened this Sep 30, 2024
@cpinjani cpinjani self-assigned this Oct 1, 2024
@cpinjani
Contributor Author

cpinjani commented Oct 3, 2024

Validation passed on build, EndpointAccessUpdate is applied successfully and not getting reverted.

Rancher - v2.9-4814b506835d6118024dd3141bf2465bdafbb0f3-head
eks-operator - v1.9.3-rc.1

Logs:

time="2024-10-02T19:05:57Z" level=info msg="Updating public access to true and private access to true for cluster [cpinjani-eks6 (id: c-95ts8)]"
time="2024-10-02T19:05:59Z" level=info msg="Updating public access to true and private access to true for cluster [cpinjani-eks6 (id: c-95ts8)]"
time="2024-10-02T19:06:00Z" level=info msg="Cluster [cpinjani-eks6 (id: c-95ts8)] finished updating"
time="2024-10-02T19:06:00Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:06:00Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:06:30Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:07:00Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:07:31Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:08:01Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:08:31Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:09:01Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:09:32Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:10:02Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:10:32Z" level=info msg="Waiting for cluster [cpinjani-eks6 (id: c-95ts8)] to finish updating"
time="2024-10-02T19:11:03Z" level=info msg="Cluster [cpinjani-eks6 (id: c-95ts8)] finished updating"
.
.
.
time="2024-10-02T19:18:48Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks6 (id: c-95ts8)]"
time="2024-10-02T19:18:49Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks6 (id: c-95ts8)]"
time="2024-10-02T19:18:50Z" level=info msg="Cluster [cpinjani-eks6 (id: c-95ts8)] finished updating"

@valaparthvi
Contributor

valaparthvi commented Oct 14, 2024

The issue still exists.

Operator version: rancher/eks-operator:v1.9.3-rc.2
Rancher: v2.9-99b2583e3370321a922e29a11ac5ff9f845baeb6-head
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:01:19Z" level=info msg="Updating public access to false and private access to true for cluster [pvala-eks-again (id: c-2nzpf)]"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:01:28Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:01:29Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:02:00Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:02:30Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:03:01Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:03:32Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:04:03Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:04:34Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:05:04Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:05:35Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:06:06Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:06:18Z" level=info msg="Bringing up vpc"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:06:37Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:06:51Z" level=info msg="Creating service role"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:07:08Z" level=info msg="Waiting for cluster [pvala-eks-again (id: c-2nzpf)] to finish updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:07:20Z" level=info msg="Waiting for cluster [pvala-eks-gpu (id: c-tq8nt)] to finish creating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:07:42Z" level=info msg="Updating public access to true and private access to false for cluster [pvala-eks-again (id: c-2nzpf)]"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:07:46Z" level=info msg="Updating public access to true and private access to false for cluster [pvala-eks-again (id: c-2nzpf)]"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:07:47Z" level=info msg="Cluster [pvala-eks-again (id: c-2nzpf)] finished updating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:07:51Z" level=info msg="Waiting for cluster [pvala-eks-gpu (id: c-tq8nt)] to finish creating"
eks-config-operator-8dd669847-2csl5:time="2024-10-14T10:07:51Z" level=info msg="Updating public access to true and private access to false for cluster [pvala-eks-again (id: c-2nzpf)]"

@valaparthvi
Contributor

valaparthvi commented Oct 14, 2024

It updates to the desired config, but not always. The behavior is random: sometimes the update sticks and sometimes the config is reverted. It only worked for me on the third try.

It happens with both imported and provisioned clusters.

@cpinjani
Contributor Author

cpinjani commented Oct 14, 2024

@valaparthvi I see the results below on eks-operator:v1.9.3-rc.2.
The desired spec gets applied, then it seems to revert and get re-applied (we can file a separate issue for this).
The final spec is the desired one.

Logs

time="2024-10-14T11:17:24Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:17:26Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:17:27Z" level=info msg="Cluster [cpinjani-eks (id: c-7bgv9)] finished updating"
time="2024-10-14T11:17:28Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:17:50Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:17:50Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:18:20Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:18:51Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:19:21Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:19:51Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:20:22Z" level=info msg="Updating public access to true and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:20:23Z" level=info msg="Updating public access to true and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:20:24Z" level=info msg="Cluster [cpinjani-eks (id: c-7bgv9)] finished updating"
time="2024-10-14T11:20:25Z" level=info msg="Updating public access to true and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:22:50Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:22:50Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:23:21Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:23:23Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:23:23Z" level=info msg="Cluster [cpinjani-eks (id: c-7bgv9)] finished updating"
time="2024-10-14T11:23:23Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:23:24Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:23:54Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:24:24Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:24:54Z" level=info msg="Waiting for cluster [cpinjani-eks (id: c-7bgv9)] to finish updating"
time="2024-10-14T11:25:25Z" level=info msg="Cluster [cpinjani-eks (id: c-7bgv9)] finished updating"
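The apply, revert, re-apply pattern in the log above can be spotted mechanically. Below is a minimal Go sketch, not part of eks-operator, that scans log text for the "Updating public access to ... and private access to ..." messages and flags a cluster whose requested state returns to an earlier value. The regular expression is inferred only from the log lines quoted in this thread, and the names `accessState`, `distinctStates`, and `hasFlipFlop` are hypothetical helpers made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// accessState is the (public, private) pair named in one log message.
type accessState struct {
	Public, Private string
}

// updateRe matches the "Updating public access ..." message format seen in
// this thread. It is inferred from those lines, not from the operator source.
var updateRe = regexp.MustCompile(
	`Updating public access to (true|false) and private access to (true|false) for cluster \[[^\]]*\(id: ([^)]+)\)\]`)

// distinctStates returns the requested states for one cluster ID in log
// order, with consecutive repeats of the same state collapsed into one entry.
func distinctStates(logs, clusterID string) []accessState {
	var seq []accessState
	for _, line := range strings.Split(logs, "\n") {
		m := updateRe.FindStringSubmatch(line)
		if m == nil || m[3] != clusterID {
			continue
		}
		s := accessState{Public: m[1], Private: m[2]}
		if len(seq) == 0 || seq[len(seq)-1] != s {
			seq = append(seq, s)
		}
	}
	return seq
}

// hasFlipFlop reports whether a state reappears after a different one
// (A, B, A). It expects a sequence produced by distinctStates.
func hasFlipFlop(seq []accessState) bool {
	seen := map[accessState]bool{}
	for _, s := range seq {
		if seen[s] {
			return true
		}
		seen[s] = true
	}
	return false
}

// sample holds four lines copied from the log excerpt above.
const sample = `time="2024-10-14T11:17:24Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:17:26Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:20:22Z" level=info msg="Updating public access to true and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"
time="2024-10-14T11:23:21Z" level=info msg="Updating public access to false and private access to true for cluster [cpinjani-eks (id: c-7bgv9)]"`

func main() {
	seq := distinctStates(sample, "c-7bgv9")
	fmt.Println("distinct requested states:", seq)
	fmt.Println("flip-flop detected:", hasFlipFlop(seq))
}
```

For the sample lines, the sketch collapses the four messages into three distinct requests (false/true, true/true, false/true) and flags the flip-flop, which matches the behavior described in this comment.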

@valaparthvi
Contributor

@valaparthvi I see the results below on eks-operator:v1.9.3-rc.2.
The desired spec gets applied, then it seems to revert and get re-applied (we can file a separate issue for this).
The final spec is the desired one.

Interesting. I will retest this again and file another one. Thanks for testing.

@valaparthvi
Contributor

valaparthvi commented Oct 15, 2024

I tested by updating the public access source config. This time the change was reverted and the desired config was not re-applied: I waited almost 20 minutes, but it did not come back. It did work on the second try, though.


time="2024-10-15T08:01:57Z" level=info msg="Updating public access source config to [0.0.0.0/0 49.37.240.101/32]  for cluster [pvala-eks-tbimported (id: c-cvrqh)]"
time="2024-10-15T08:01:58Z" level=info msg="Updating public access source config to [0.0.0.0/0 49.37.240.101/32]  for cluster [pvala-eks-tbimported (id: c-cvrqh)]"
time="2024-10-15T08:01:58Z" level=info msg="Cluster [pvala-eks-tbimported (id: c-cvrqh)] finished updating"
time="2024-10-15T08:01:59Z" level=info msg="Updating public access source config to [0.0.0.0/0 49.37.240.101/32]  for cluster [pvala-eks-tbimported (id: c-cvrqh)]"
time="2024-10-15T08:02:14Z" level=info msg="Waiting for cluster [pvala-eks-priv-only (id: c-v9hnw)] to finish updating"
time="2024-10-15T08:02:45Z" level=info msg="Cluster [pvala-eks-priv-only (id: c-v9hnw)] finished updating"
time="2024-10-15T08:02:47Z" level=info msg="Waiting for cluster [pvala-eks-tbimported (id: c-cvrqh)] to finish updating"
time="2024-10-15T08:02:47Z" level=info msg="Waiting for cluster [pvala-eks-tbimported (id: c-cvrqh)] to finish updating"
time="2024-10-15T08:03:18Z" level=info msg="Updating public access source config to [0.0.0.0/0]  for cluster [pvala-eks-tbimported (id: c-cvrqh)]"
time="2024-10-15T08:03:19Z" level=info msg="Updating public access source config to [0.0.0.0/0]  for cluster [pvala-eks-tbimported (id: c-cvrqh)]"
time="2024-10-15T08:03:20Z" level=info msg="Cluster [pvala-eks-tbimported (id: c-cvrqh)] finished updating"
time="2024-10-15T08:03:21Z" level=info msg="Waiting for cluster [pvala-eks-tbimported (id: c-cvrqh)] to finish updating"
time="2024-10-15T08:03:21Z" level=info msg="Waiting for cluster [pvala-eks-tbimported (id: c-cvrqh)] to finish updating"
time="2024-10-15T08:03:51Z" level=info msg="Cluster [pvala-eks-tbimported (id: c-cvrqh)] finished updating"

@mjura
Contributor

mjura commented Oct 17, 2024

@cpinjani @valaparthvi it looks like a race condition that does not generate any errors. Changes to the EKS endpoint are usually made only when the cluster is first set up, and after that this option changes at most once, if at all, during the whole cluster lifetime. Usually no one changes multiple options at once, but even then a valid workaround is to reapply the changes, which fixes the problem. I think we are losing too many resources on this issue, which is not important.
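The "reapply the changes" workaround can be sketched as a small retry loop. The Go code below is an illustration under stated assumptions, not eks-operator or AWS SDK code: `accessClient`, `reapplyUntilStable`, and `flakyClient` are hypothetical names, and the fake client simply undoes the first few applies to imitate the reverts reported in this thread. A real version would also wait between observations, which is left out here to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
)

// EndpointAccess mirrors the two flags discussed in this issue.
type EndpointAccess struct {
	Public, Private bool
}

// accessClient is a hypothetical interface invented for this sketch.
type accessClient interface {
	Apply(desired EndpointAccess) error
	Observe() (EndpointAccess, error)
}

// reapplyUntilStable applies desired, then observes the config stableChecks
// times. If any observation differs, it reapplies. It returns the number of
// Apply calls made, and an error if maxApplies is reached without success.
func reapplyUntilStable(c accessClient, desired EndpointAccess, stableChecks, maxApplies int) (int, error) {
	applies := 0
	for applies < maxApplies {
		if err := c.Apply(desired); err != nil {
			return applies, err
		}
		applies++
		stable := true
		for i := 0; i < stableChecks; i++ {
			got, err := c.Observe()
			if err != nil {
				return applies, err
			}
			if got != desired {
				stable = false
				break
			}
		}
		if stable {
			return applies, nil
		}
	}
	return applies, errors.New("desired endpoint access did not stay applied")
}

// flakyClient is a fake that reverts the config on its first `reverts`
// observations, to imitate the sporadic reverts described above.
type flakyClient struct {
	current, previous EndpointAccess
	reverts           int
}

func (f *flakyClient) Apply(d EndpointAccess) error {
	f.previous = f.current
	f.current = d
	return nil
}

func (f *flakyClient) Observe() (EndpointAccess, error) {
	if f.reverts > 0 {
		f.reverts--
		f.current = f.previous // simulate the unwanted revert
	}
	return f.current, nil
}

func main() {
	// Start from the reproduction steps: public access on, private access off.
	c := &flakyClient{current: EndpointAccess{Public: true, Private: false}, reverts: 2}
	n, err := reapplyUntilStable(c, EndpointAccess{Public: true, Private: true}, 3, 5)
	fmt.Println("applies:", n, "error:", err)
}
```

With two simulated reverts the loop needs three applies before the config stays put, similar to the "third try" reported earlier in the thread. The loop only masks the symptom; it does not address the suspected race itself.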

@valaparthvi
Contributor

I don't disagree with you, Michal. But since this is still an issue, would it be okay to keep this issue around to be fixed at a later stage?

@mjura
Contributor

mjura commented Oct 17, 2024

yup, it will be reworked once again later on, back to backlog

@mjura mjura modified the milestones: v2.9.3, v2.10.0 Oct 17, 2024
@kkaempf kkaempf modified the milestones: v2.10.0, v2.10.1 Nov 5, 2024
Development

No branches or pull requests

5 participants