Don't wait forever in cc-worker #256
Comments
I think this was originally added because of this issue: #165, though I don't quite understand how it addresses it. See cloudfoundry/cloud_controller_ng@85b7ac8. BackgroundJobEnvironment still waits for the DB to be migrated, so it seems to me that the original issue would still occur: a cc-worker would never progress to the point of opening the readiness port. I suppose it limits the failure to a single cc-worker?
Just documenting my understanding of this. The current behaviour when pre-start succeeds:
Now in the case when pre-start fails:
Thoughts/Questions:
Note: probably missing some context here; will update this comment as I poke around more.
Side note: There are parameters like
Another interesting observation: from this ticket https://www.pivotaltracker.com/story/show/171417894 it seems the purpose of this was to allow non-HTTP jobs to have their readiness checked for k8s, and this readiness property was added to worker/clock/deployment_updater. However, it is disabled by default in everything but the worker, see https://github.com/cloudfoundry/capi-release/blob/develop/jobs/cc_deployment_updater/spec#L87C21-L87C39 and https://github.com/cloudfoundry/cloud_controller_ng/blob/main/lib/cloud_controller/background_job_environment.rb#L23. This commit added it: 56a7240. One option here would be to set the port back to its default. Another option is to be more aggressive and remove this entire check, however I'm not sure if there is still a k8s use case for this.
Another note: this is more important to fix now because the cc api pre-start script is much more likely to fail due to the addition of the stack checker https://github.com/cloudfoundry/capi-release/blob/develop/jobs/cloud_controller_ng/templates/pre-start.sh.erb#L145. If the stack checker fails on a deprecated stack and there is a new migration in the upgrade, this bug will occur every time.
I don't think there is a k8s use case for this anymore. Cf-for-k8s, which included capi-k8s-release, was deprecated in 2021, and cloudfoundry/cloud_controller_ng#3140 pretty much ensured that Cloud Controller's native k8s support was removed. I think on Kubernetes we couldn't control the ordering of how these jobs started as well as we could on BOSH, so some of these weird things were put in place to make it more tolerant of that sort of rollout. I don't remember much more than that, unfortunately, because I was mostly involved with the networking team at the time.
Thanks for submitting an issue to capi-release. We are always trying to improve! To help us, please fill out the following template.
Issue
cc-worker can block BOSH from cancelling a deployment, since the post-start script waits forever:
capi-release/jobs/cloud_controller_worker/templates/post-start.sh.erb, line 16 (at 5f061c8)
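For context, a simplified sketch of the pattern in question (not the literal ERB template; the port variable and the use of nc are assumptions for illustration):

```bash
#!/usr/bin/env bash
# Simplified sketch of the unbounded wait in post-start (not the actual template).
# READINESS_PORT is a placeholder for the port the worker opens once
# BackgroundJobEnvironment has seen the DB migrations complete.
READINESS_PORT=8888

# Loops forever: if pre-start (migrations) failed, the worker never opens the
# port, post-start never exits, and BOSH cannot finish cancelling the task.
until nc -z localhost "${READINESS_PORT}"; do
  sleep 1
done
```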
Steps to Reproduce
CC migrations fail, or pre-start script fails in another way
Expected result
the deployment fails, the task is cancelled, and the lock is deleted
Current result
the deployment fails, the task hangs when cancelled, and the lock is not deleted
Possible Fix
introduce some kind of timeout in the post-start script
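A minimal sketch of what such a timeout could look like, assuming the script polls a readiness port with nc; the port, timeout, and poll interval are placeholder values, not the job's real properties:

```bash
#!/usr/bin/env bash
# Sketch of one possible fix (not the actual template): bound the readiness poll
# so a failed pre-start surfaces as a post-start failure instead of hanging.
READINESS_PORT=8888     # placeholder for the configured readiness port
TIMEOUT_SECONDS=600     # placeholder; could be exposed as a job property

elapsed=0
until nc -z localhost "${READINESS_PORT}"; do
  if (( elapsed >= TIMEOUT_SECONDS )); then
    echo "cc-worker readiness port ${READINESS_PORT} not open after ${TIMEOUT_SECONDS}s; giving up" >&2
    exit 1
  fi
  sleep 5
  (( elapsed += 5 ))
done
```

With a bound like this, a failed pre-start (e.g. a failed migration) turns into a post-start failure that BOSH can report and cancel cleanly, rather than a task that hangs and leaves the lock in place.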