DVM/DVMP reconciling too much while waiting on transfers to complete and Pods to come up. #1004

Open
djwhatle opened this issue Mar 16, 2021 · 4 comments · Fixed by #1013
Labels
kind/bug Categorizes issue or PR as related to a bug. release-1.5.0

Comments

djwhatle commented Mar 16, 2021

Describe the bug
See the video: while waiting for DVM to complete, DVM reconciles at a very high rate even though there is no work to be done. I believe I have traced this back to DVMP reconciling at a very high frequency (7 reconciles/sec) when no Rsync Pod exists: #1004 (comment)

I think we should slow down the reconciles during waiting periods like this to about one every 3 seconds, to be good citizens on the clusters we're deployed into.

cc @alaypatel07 @pranavgaikwad

Screen.Recording.2021-03-16.at.7.10.27.PM.mov
djwhatle added the kind/bug label Mar 16, 2021

djwhatle commented Mar 16, 2021

It looks like we are already using PollReQ. Maybe this is DVMP causing frequent watch events? Or maybe this is the echo reconcile from lack of predicates.

djwhatle commented Mar 16, 2021

Got some more data from Jaeger. It looks like DVMP is kicking DVM to reconcile every ~500ms, even after the migration is completed.
[image: Jaeger trace]

djwhatle commented Mar 18, 2021

@alaypatel07 I think I figured out what's going on. If the DVMP resource still exists with an invalid PodRef, it reconciles non-stop and increments its resource version about 15 times every 2 seconds, meaning all of those watch events get sent up the chain.

Edit: I was confusing resourceVersion with generation; I forgot that resourceVersion is global for etcd. Regardless, as you can see from the Jaeger trace, there are a lot of DVMP updates going around.
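
To illustrate the distinction, and the predicate idea raised earlier in the thread, here is a rough sketch of a generation-based update filter. It is not code from this project: `objectMeta` is a hypothetical, trimmed-down stand-in for Kubernetes object metadata, and `specChanged` is an illustrative name. It assumes the CRD has the status subresource enabled, so that status-only writes change `resourceVersion` but leave `generation` untouched. controller-runtime ships a comparable filter, `predicate.GenerationChangedPredicate`.

```go
package main

import "fmt"

// objectMeta is a hypothetical stand-in holding only the two fields discussed here.
type objectMeta struct {
	Generation      int64  // assumed to change only when the spec changes
	ResourceVersion string // changes on every write to the object
}

// specChanged reports whether an update event should trigger a reconcile.
// It ignores updates where only resourceVersion moved (for example a
// status-only write), which is what causes "echo" reconciles.
func specChanged(oldObj, newObj objectMeta) bool {
	return oldObj.Generation != newObj.Generation
}

func main() {
	statusOnly := specChanged(
		objectMeta{Generation: 1, ResourceVersion: "100"},
		objectMeta{Generation: 1, ResourceVersion: "101"},
	)
	specEdit := specChanged(
		objectMeta{Generation: 1, ResourceVersion: "101"},
		objectMeta{Generation: 2, ResourceVersion: "102"},
	)
	fmt.Println("reconcile on status-only update:", statusOnly)
	fmt.Println("reconcile on spec update:", specEdit)
}
```

Whether a filter like this is appropriate depends on whether the parent controller needs to react to child status changes; if it does, a coarser filter or a delayed requeue may be the better fit.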

Screen.Recording.2021-03-18.at.11.08.53.AM.mov

djwhatle changed the title from "DVM should using PollReQ while waiting on transfers to complete and Pods to come up." to "DVM reconciling too much while waiting on transfers to complete and Pods to come up." Mar 18, 2021
djwhatle changed the title from "DVM reconciling too much while waiting on transfers to complete and Pods to come up." to "DVM/DVMP reconciling too much while waiting on transfers to complete and Pods to come up." Mar 18, 2021

djwhatle commented May 5, 2021

Possibly fixed by #1013
