Here is one common cause of the "server is not destroyed" problem, though not necessarily the only one.
Check the logs of the autoscaler container. Example:
{"id":"seUyIeJvPpjurPuw","level":"debug","max-pool":300,"min-pool":0,"msg":"check capacity","pending-builds":4,"running-builds":36,"server-buffer":0,"server-capacity":42,"server-count":42,"time":"2023-06-20T15:42:30Z"}
In this case there are 4 undead agents (servers that never get destroyed), matching "pending-builds":4.
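To spot this pattern without reading the logs by hand, a small filter over the autoscaler output can help. The sketch below is only an illustration: it assumes the autoscaler writes one JSON object per line, like the example above, and uses only the keys visible there (`msg`, `pending-builds`, `running-builds`, `server-count`, `time`). The function names and the `min_checks=3` threshold are hypothetical choices, not part of Drone.

```python
import json
import sys


def capacity_checks(lines):
    """Yield parsed 'check capacity' entries from autoscaler JSON log lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("msg") == "check capacity":
            yield entry


def persistent_pending(entries, min_checks=3):
    """Return the trailing run of checks that all report pending builds,
    or an empty list if that run is shorter than `min_checks` (hypothetical threshold)."""
    run = []
    for entry in entries:
        if entry.get("pending-builds", 0) > 0:
            run.append(entry)
        else:
            run = []  # a check with no pending builds resets the run
    return run if len(run) >= min_checks else []


if __name__ == "__main__":
    # Hypothetical usage:
    #   docker logs <autoscaler-container> 2>&1 | python check_pending.py
    stuck = persistent_pending(list(capacity_checks(sys.stdin)))
    if stuck:
        last = stuck[-1]
        print(f"{len(stuck)} consecutive checks with pending builds; "
              f"latest: pending={last['pending-builds']} "
              f"running={last['running-builds']} "
              f"servers={last['server-count']} at {last['time']}")
    else:
        print("no persistent pending builds found")
```

A pending count that never drops back to zero across many checks is the symptom described here; a single check with pending builds is normal while builds are queueing, which is why the sketch looks at a run of consecutive checks.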
I have a hypothesis (not certain) that this is caused by the auto-cancel features:

- "Auto cancel pull requests: Automatically cancel pending pull request builds."
- "Auto cancel pushes: Automatically cancel pending push builds."
- "Auto cancel running: Automatically cancel running builds if newer commit pushed."

Maybe multiple builds were cancelled in quick succession and the autoscaler got confused, i.e. some sort of race condition.
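"Solution" (really a workaround): run this query against the Drone database:

```sql
-- Mark stages as killed when they are still 'pending'
-- but their parent build is no longer running.
-- Caveat: a build that is queued but not yet started is also
-- "not running", so its pending stages may be killed too.
update stages set stage_status = 'killed'
where stage_id in (
  select stage_id from stages s
  join builds b on s.stage_build_id = b.build_id
  where b.build_status not in ('running')
    and s.stage_status = 'pending'
);
```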
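Before running a bulk `update` like this against production, it can help to see exactly which rows it touches. The sketch below is an illustration, not Drone's real schema: it replays the query's logic against an in-memory SQLite database with hypothetical, heavily reduced `builds` and `stages` tables containing only the columns the query uses. The helper names and the sample rows are made up for the demonstration.

```python
import sqlite3

# Hypothetical, heavily reduced schema: only the columns the cleanup query
# touches. Drone's real tables have many more columns.
SCHEMA = """
CREATE TABLE builds (build_id INTEGER PRIMARY KEY, build_status TEXT);
CREATE TABLE stages (stage_id INTEGER PRIMARY KEY,
                     stage_build_id INTEGER REFERENCES builds(build_id),
                     stage_status TEXT);
"""

# Same logic as the workaround query.
CLEANUP = """
UPDATE stages SET stage_status = 'killed'
WHERE stage_id IN (
    SELECT stage_id FROM stages s
    JOIN builds b ON s.stage_build_id = b.build_id
    WHERE b.build_status NOT IN ('running') AND s.stage_status = 'pending'
)
"""


def kill_orphaned_stages(conn):
    """Run the cleanup query and return how many stages were marked 'killed'."""
    cur = conn.execute(CLEANUP)
    conn.commit()
    return cur.rowcount


def demo_db():
    """Build an in-memory database with one stage per interesting case."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO builds VALUES (?, ?)", [
        (1, "running"),  # live build: its pending stage must survive
        (2, "killed"),   # cancelled build with a leftover pending stage
        (3, "success"),  # finished build: stage already 'success'
        (4, "pending"),  # queued build that has not started yet
    ])
    conn.executemany("INSERT INTO stages VALUES (?, ?, ?)", [
        (1, 1, "pending"),
        (2, 2, "pending"),
        (3, 3, "success"),
        (4, 4, "pending"),
    ])
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_db()
    print(kill_orphaned_stages(conn))  # 2
    print(dict(conn.execute(
        "SELECT stage_id, stage_status FROM stages ORDER BY stage_id")))
    # {1: 'pending', 2: 'killed', 3: 'success', 4: 'killed'}
```

Stage 2 is the intended target. Stage 4 shows a side effect worth knowing about: a build that is merely queued also matches `not in ('running')`, so running the cleanup on a schedule while builds are waiting could cancel them. Running the function a second time returns 0, so repeated runs are harmless for already-cleaned rows.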
Ideally, the real source of the bug would be found. Otherwise, Drone itself could run this cleanup on a schedule. This gives me the idea to set up a Postgres cron task on the server.
Once in a while, an instance created on Hetzner does not get destroyed. Because the server keeps running, this results in a big bill from Hetzner.
Could it be that the autoscaler forgets that some servers exist when the autoscaler restarts, or when the Git server restarts?
This is our config: