Auto clean-up feature for the Prefect internal database #16054

Open
rmnvncnt opened this issue Nov 19, 2024 · 1 comment
Labels
performance Related to an optimization or performance improvement

Comments

@rmnvncnt
rmnvncnt commented Nov 19, 2024

I noticed that our Prefect server deployment slowed down over time, and we had trouble scheduling new jobs and updating data in the UI. The cause was the Prefect internal database, which was overflowing with logs from old runs. A script suggested by @Arthurhussey mitigated the problem by removing logs older than a week.

While this solution worked in my case, having a scheduled flow modify the Prefect database directly might cause problems down the line.

It would be very nice if Prefect server had a way of cleaning up its logs automatically, for instance through an environment variable similar to PREFECT_EVENTS_RETENTION_PERIOD, but for flow runs and task runs.

The initial discussion:

@rmnvncnt the Prefect server doesn't have any auto clean-up features right now, but if that's something you'd like, please open an issue so we can discuss it further!

It looks like the issue of deployments not being displayed has been solved by reducing the amount of data in your DB so the scheduler can insert scheduled runs, so I'm going to close this issue.

Originally posted by @desertaxle in #15919 (comment)

rmnvncnt closed this as not planned Nov 19, 2024
rmnvncnt reopened this Nov 19, 2024
@mikelogaciuk
That is a good idea.

In my company, we delete everything from:

  • artifact
  • flow_run
  • flow_run_state
  • task_run
  • task_run_state
  • events
  • event_resources
  • log

that is older than 60 days (WHERE created < (CURRENT_DATE - 60);).

We also, of course, run a periodic VACUUM on those tables to reclaim the storage.
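For readers who want to see the shape of such a retention job, here is a minimal sketch of the approach described above. It is not Prefect's API and not the script mentioned earlier in the thread. It assumes each table has a `created` timestamp column (taken from the WHERE clause above), and it runs against a throwaway in-memory SQLite table rather than a real Prefect database. The names `delete_older_than`, `ALLOWED_TABLES` and `run_demo` are hypothetical, and the sketch ignores foreign keys and deletion order, which a real schema would probably require you to handle.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Table names copied from the list in the comment above. Whether they match
# the schema of your Prefect version is an assumption you should verify.
ALLOWED_TABLES = {
    "artifact", "flow_run", "flow_run_state", "task_run",
    "task_run_state", "events", "event_resources", "log",
}


def delete_older_than(conn, table, retention_days, now=None):
    """Delete rows whose `created` value is older than the retention window.

    Returns the number of rows deleted. The table name is checked against an
    allow-list because it is interpolated into the SQL text.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(f"DELETE FROM {table} WHERE created < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def run_demo():
    """Exercise the helper on a stand-in `log` table (not Prefect's schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, created TEXT NOT NULL)")
    now = datetime(2024, 11, 22, tzinfo=timezone.utc)
    # Four rows aged 1, 30, 61 and 90 days; timestamps stored as ISO strings.
    ages = (1, 30, 61, 90)
    conn.executemany(
        "INSERT INTO log (created) VALUES (?)",
        [((now - timedelta(days=d)).isoformat(),) for d in ages],
    )
    deleted = delete_older_than(conn, "log", retention_days=60, now=now)
    remaining = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(run_demo())
```

On PostgreSQL, the deleted rows only free disk space after a VACUUM, as noted above, so a real job would likely schedule that step separately. Back up the database before trying anything like this against a live server.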

zzstoatzz added the performance label Nov 22, 2024