Snapshots of PostgreSQL on haumea.nixos.org cause full disk issues #446

Open
mweinelt opened this issue Jun 21, 2024 · 3 comments

@mweinelt
Member

Hydra's database on haumea.nixos.org runs PostgreSQL on ZFS, with zrepl taking snapshot-based backups. Every once in a while, snapshot sizes grow from <1 GB to 70-120 GB, which fills up the disk.

My current working theory is

  • It is not related to WAL: the WAL is only ~500 MB, and a zrepl hook forces a CHECKPOINT before each snapshot is taken
  • It is likely an index being rewritten (the jobsetevalmembers_pk index is ~60 GB)
  • We are likely also seeing write amplification: PostgreSQL writes 8K pages, while the ZFS dataset uses a 16K recordsize (down from the 128K default). Going further down to 8K is not recommended. Maybe use 128K records and a dedicated SLOG device instead?
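To make the write-amplification point concrete, here is a rough Python model (a sketch, not a measurement) of how much old data a snapshot pins when PostgreSQL dirties scattered 8K pages. It assumes every dirtied page makes ZFS write a new copy of the whole record containing it (copy-on-write), and it ignores compression, metadata and indirect blocks. The function names and the workload are hypothetical.

```python
"""Rough model of snapshot space pinned by PostgreSQL page rewrites on ZFS.

Assumptions (simplifications, not measured behaviour):
- each dirtied 8 KiB PostgreSQL page causes ZFS to write a new copy of the
  whole record containing it, so the previous snapshot keeps the old record;
- compression, metadata and indirect blocks are ignored.
"""

PG_PAGE_SIZE = 8 * 1024  # PostgreSQL default block size


def records_touched(dirty_pages: list[int], recordsize: int) -> set[int]:
    """Indices of the ZFS records that contain the given dirty page numbers."""
    return {page * PG_PAGE_SIZE // recordsize for page in dirty_pages}


def snapshot_retained_bytes(dirty_pages: list[int], recordsize: int) -> int:
    """Bytes of old data the previous snapshot keeps referencing."""
    return len(records_touched(dirty_pages, recordsize)) * recordsize


def write_amplification(dirty_pages: list[int], recordsize: int) -> float:
    """Bytes rewritten by ZFS divided by bytes PostgreSQL actually changed."""
    distinct_pages = set(dirty_pages)
    changed = len(distinct_pages) * PG_PAGE_SIZE
    return snapshot_retained_bytes(list(distinct_pages), recordsize) / changed


if __name__ == "__main__":
    # Hypothetical workload: every 16th page of an index gets modified.
    pages = list(range(0, 1000, 16))
    for rs_kib in (8, 16, 128):
        amp = write_amplification(pages, rs_kib * 1024)
        print(f"recordsize={rs_kib}K amplification={amp:.1f}x")
```

With this scattered workload the model gives 1.0x, 2.0x and 16.0x for recordsizes of 8K, 16K and 128K. If the dirty pages were contiguous instead, all three would come out at roughly 1x, so under this model the cost of a larger recordsize mostly shows up with scattered writes, such as updates spread across a large index.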
@mweinelt
Member Author

I could correlate the timing of these spikes with autovacuum runs.

https://github.com/NixOS/infra/blob/master/build/haumea/postgresql.nix#L79-L87

@mweinelt
Member Author

Switched compression from zstd to lz4, because it is probably lighter on the CPU.

@vcunat
Member

vcunat commented Jun 27, 2024

Years ago (ec61098), @edolstra set some vacuuming parameters to 1/100 of their defaults. Maybe we could relax that a bit, since we apparently suffer from too much vacuuming?
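PostgreSQL documents the autovacuum trigger as: vacuum a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (defaults 50 and 0.2). Which parameters ec61098 actually changed is not stated here, so the sketch below hypothetically assumes the scale factor was one of them and was divided by 100. It ignores per-table overrides and the insert-based trigger added in PostgreSQL 13; the function names are illustrative.

```python
"""Autovacuum dead-tuple trigger, following the documented PostgreSQL formula:

    vacuum threshold = autovacuum_vacuum_threshold
                       + autovacuum_vacuum_scale_factor * reltuples
"""

DEFAULT_THRESHOLD = 50      # autovacuum_vacuum_threshold default
DEFAULT_SCALE_FACTOR = 0.2  # autovacuum_vacuum_scale_factor default


def vacuum_threshold(reltuples: float,
                     threshold: int = DEFAULT_THRESHOLD,
                     scale_factor: float = DEFAULT_SCALE_FACTOR) -> float:
    """Number of dead tuples a table may accumulate before autovacuum runs."""
    return threshold + scale_factor * reltuples


def needs_autovacuum(dead_tuples: int, reltuples: float,
                     threshold: int = DEFAULT_THRESHOLD,
                     scale_factor: float = DEFAULT_SCALE_FACTOR) -> bool:
    """True once dead tuples exceed the threshold (strictly greater)."""
    return dead_tuples > vacuum_threshold(reltuples, threshold, scale_factor)


if __name__ == "__main__":
    rows = 1_000_000_000  # hypothetical table size
    default = vacuum_threshold(rows)
    # Hypothetical: only the scale factor reduced to 1/100 of its default.
    tuned = vacuum_threshold(rows, scale_factor=DEFAULT_SCALE_FACTOR / 100)
    print(f"default: vacuum after ~{default:,.0f} dead tuples")
    print(f"scale factor / 100: vacuum after ~{tuned:,.0f} dead tuples")
```

For a hypothetical table of one billion rows, this moves the trigger from about 200,000,050 to about 2,000,050 dead tuples, i.e. roughly 100 times more vacuum runs for the same update rate. Each run can rewrite heap and index pages that the previous snapshot still references.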


2 participants