(Hetzner) Sometimes the server is not destroyed #129

Open

HenkVanMaanen opened this issue Feb 2, 2023 · 1 comment

@HenkVanMaanen

Once in a while the created instance on Hetzner does not get destroyed. This results in a big bill from Hetzner because the server keeps running.

Could it be that the autoscaler forgets that some servers exist when the autoscaler or the git server gets restarted?

This is our config:

drone-autoscaler:
    image: drone/autoscaler:1.8.2
    restart: unless-stopped
    volumes:
      - drone_autoscaler_data:/data
    environment:
      - DRONE_POOL_MIN=0
      - DRONE_POOL_MAX=4
      - DRONE_POOL_MIN_AGE=1h
      - DRONE_CAPACITY_BUFFER=0
      - DRONE_AGENT_CONCURRENCY=5
      - DRONE_SERVER_PROTO=https
      - DRONE_SERVER_HOST=REDACTED
      - DRONE_SERVER_TOKEN=${DRONE_SERVER_TOKEN}
      - DRONE_AGENT_TOKEN=${DRONE_RPC_SECRET}
      - DRONE_HETZNERCLOUD_DATACENTER=${DRONE_HETZNERCLOUD_DATACENTER}
      - DRONE_HETZNERCLOUD_IMAGE=ubuntu-20.04
      - DRONE_HETZNERCLOUD_TYPE=cx51
      - DRONE_HETZNERCLOUD_SSHKEY=${DRONE_HETZNERCLOUD_SSHKEY}
      - DRONE_HETZNERCLOUD_TOKEN=${DRONE_HETZNERCLOUD_TOKEN}
      - DRONE_INTERVAL=10s
      - DRONE_LOGS_DEBUG=false
@sdarwin commented Jul 28, 2024

Here is one common cause of "server is not destroyed". It is not necessarily the only one.

Check the logs of the autoscaler container. Example:
{"id":"seUyIeJvPpjurPuw","level":"debug","max-pool":300,"min-pool":0,"msg":"check capacity","pending-builds":4,"running-builds":36,"server-buffer":0,"server-capacity":42,"server-count":42,"time":"2023-06-20T15:42:30Z"}
There are 4 undead agents that correspond to the "pending-builds":4.

I have a hypothesis (not certain) that this is caused by the auto-cancel features: "Auto cancel pull requests" (automatically cancel pending pull request builds), "Auto cancel pushes" (automatically cancel pending push builds), and "Auto cancel running" (automatically cancel running builds if a newer commit is pushed). Maybe multiple builds were cancelled in quick succession and the autoscaler got confused; some sort of race condition.

"Solution":

Run this database query against the Drone database:

update stages set stage_status='killed'
where stage_id in
(select stage_id from stages s
 join builds b on s.stage_build_id = b.build_id
 where b.build_status not in ('running') and s.stage_status='pending');
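Before running the update, a read-only check can confirm which stages would be affected. This is a minimal sketch assuming the same stages/builds schema used in the query above:

-- Sketch: list pending stages whose parent build is no longer running.
-- Uses the same stages/builds tables referenced in the update above.
select s.stage_id, s.stage_status, b.build_id, b.build_status
from stages s
join builds b on s.stage_build_id = b.build_id
where b.build_status not in ('running')
  and s.stage_status = 'pending';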

Ideally, the real source of the bug could be found. Otherwise, Drone itself could run this cleanup on a schedule. This is giving me the idea to set up a Postgres cron task on the server.
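For the cron idea, here is a minimal sketch using the pg_cron extension, assuming Drone runs on Postgres and pg_cron (1.3 or newer, for named jobs) is installed in the Drone database; the job name and the 15-minute schedule are arbitrary choices:

-- Sketch, assuming the pg_cron extension is available in the Drone database.
-- Schedules the cleanup query every 15 minutes; the job name is arbitrary.
CREATE EXTENSION IF NOT EXISTS pg_cron;

SELECT cron.schedule(
  'drone-kill-orphaned-stages',   -- job name (pg_cron >= 1.3)
  '*/15 * * * *',                 -- every 15 minutes
  $$update stages set stage_status='killed'
    where stage_id in
    (select stage_id from stages s
     join builds b on s.stage_build_id = b.build_id
     where b.build_status not in ('running') and s.stage_status='pending')$$
);

By default pg_cron runs jobs in the database named by cron.database_name, so this assumes that setting points at the Drone database (or that the job is created there).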
