-
-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make "timed out" and "log limit exceeded" builds aborted #1369
base: master
Are you sure you want to change the base?
Conversation
In 7369408 I gave builds that failed because of a timeout or exceeded log limit a stop sign and I stand by that reasoning: with that it's possible to distinguish between actual build failures and rather transient things such as timeouts. Back then I considered it a feature that these are shown in a different tab, but I don't think that's a good idea anymore. When using a jobset to e.g. track the regressions from a mass rebuild (like a compiler or gcc update), "Newly failed builds" should exclusively display regressions (and flaky builds of course, not much I can do about that). Also, when a bunch of builds fail in such a jobset because of e.g. a broken connection to a builder that results in a timeout, I want to be able to restart them all w/o rebuilding actual regressions. To make it clear that we not only have "Aborted" builds in the tab, I renamed the label to "Aborted / Timed out".
Deployed on hydra.nixos.org. |
I find it a little weird that "restart all aborted jobs" now won't restart this whole tab but only a subset of it (won't restart the timed out jobs). |
OK I only tested it until the point of where a certain job was displayed in a jobset eval. You're right, that should be part of that PR as well. We should probably factor the logic on when something is aborted out to get rid of this duplication (in the restart, the jobset eval template and the jobset diff). |
Also for hydra.nixos.org there's a practical disadvantage that timing out jobs can't be filtered out when doing diffs, i.e. instead of in "still failing" tab they'll be in "aborted" tab. I mean the timeouts that aren't transient but just a bad build, though yeah – we'd better fix those somehow. |
@vcunat part of the motivation was that I don't want to see timed out builds in "newly failing" (and subsequently I don't want to restart actual regressions when restarting timed out builds) and aborted builds seem more sensible here. Perhaps we want to add another tab? Doesn't seem too complicated I guess, the categorization happens entirely outside of the DB AFAIK. |
Note: "log limit exceeded" is usually not transient, in my experience. Restarting such builds will typically produce too much logs again, so I'm not sure about the move in that case. I consider it similar to "output size exceeded". Though yes, it's somewhat impure – at least in the sense that these limits are configurable without changing the derivation hash. |
In 7369408 I gave builds that failed because of a timeout or exceeded log limit a stop sign and I stand by that reasoning: with that it's possible to distinguish between actual build failures and rather transient things such as timeouts.
Back then I considered it a feature that these are shown in a different tab, but I don't think that's a good idea anymore. When using a jobset to e.g. track the regressions from a mass rebuild (like a compiler or gcc update), "Newly failed builds" should exclusively display regressions (and flaky builds of course, not much I can do about that).
Also, when a bunch of builds fail in such a jobset because of e.g. a broken connection to a builder that results in a timeout, I want to be able to restart them all w/o rebuilding actual regressions.
To make it clear that we not only have "Aborted" builds in the tab, I renamed the label to "Aborted / Timed out".
cc @NixOS/infra who will probably be the most affected by that for opinions.