Make "timed out" and "log limit exceeded" builds aborted #1369

Ma27 · 2024-03-13T10:03:00Z

In 7369408 I gave builds that failed because of a timeout or exceeded log limit a stop sign and I stand by that reasoning: with that it's possible to distinguish between actual build failures and rather transient things such as timeouts.

Back then I considered it a feature that these are shown in a different tab, but I don't think that's a good idea anymore. When using a jobset to e.g. track the regressions from a mass rebuild (like a compiler update, e.g. gcc), "Newly failed builds" should exclusively display regressions (and flaky builds of course, not much I can do about that).

Also, when a bunch of builds fail in such a jobset because of e.g. a broken connection to a builder that results in a timeout, I want to be able to restart them all w/o rebuilding actual regressions.

To make it clear that the tab no longer contains only "Aborted" builds, I renamed the label to "Aborted / Timed out".

cc @NixOS/infra for opinions, since you will probably be the most affected by this.

mweinelt · 2024-03-16T03:23:11Z

Deployed on hydra.nixos.org.

vcunat · 2024-03-19T21:15:21Z

I find it a little weird that "restart all aborted jobs" now won't restart this whole tab but only a subset of it (won't restart the timed out jobs).

Ma27 · 2024-03-19T22:04:29Z

OK, I only tested it up to the point where a certain job was displayed in a jobset eval. You're right, that should be part of this PR as well.

We should probably factor out the logic that decides when something counts as aborted, to get rid of this duplication (in the restart, the jobset eval template and the jobset diff).
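A minimal sketch of what such a shared predicate could look like, written in Python purely for illustration (Hydra itself is not written in Python). The names `BuildStatus`, `ABORTED_LIKE`, `is_aborted` and `builds_to_restart` are hypothetical, and the numeric codes are assumed to mirror Hydra's build status values; they should be checked against the actual schema.

```python
from enum import IntEnum


class BuildStatus(IntEnum):
    # Assumption: values mirror Hydra's build status codes; verify before relying on them.
    SUCCEEDED = 0
    FAILED = 1
    ABORTED = 3
    TIMED_OUT = 7
    LOG_LIMIT_EXCEEDED = 10


# Single definition of what belongs in the "Aborted / Timed out" tab.
ABORTED_LIKE = frozenset({
    BuildStatus.ABORTED,
    BuildStatus.TIMED_OUT,
    BuildStatus.LOG_LIMIT_EXCEEDED,
})


def is_aborted(status):
    """Return True if a build with this status is shown as aborted."""
    return status in ABORTED_LIKE


def builds_to_restart(builds):
    """Select build ids for a 'restart all aborted' action.

    `builds` is an iterable of (build_id, status) pairs (hypothetical shape).
    Uses the same predicate as the tab, so both always agree.
    """
    return [build_id for build_id, status in builds if is_aborted(status)]
```

If the restart action, the eval template and the diff all went through one predicate like `is_aborted`, "restart all aborted jobs" would cover exactly the builds listed in the tab.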

vcunat · 2024-03-22T09:52:39Z

Also, for hydra.nixos.org there's a practical disadvantage: jobs that time out can't be filtered out when doing diffs, i.e. instead of landing in the "still failing" tab they'll be in the "aborted" tab. I mean the timeouts that aren't transient but just a bad build, though yeah, we'd better fix those somehow.

Ma27 · 2024-03-23T10:31:29Z

@vcunat part of the motivation was that I don't want to see timed out builds in "newly failing" (and consequently I don't want to restart actual regressions when restarting timed out builds), and the aborted tab seems like the more sensible place for them.

Perhaps we want to add another tab? Doesn't seem too complicated I guess, the categorization happens entirely outside of the DB AFAIK.
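As a rough illustration of how a separate tab could be derived in application code without touching the DB, here is a hedged Python sketch. The status strings, tab names and the `tab_for` function are all hypothetical and do not reflect Hydra's actual implementation; whether "log limit exceeded" belongs in the same tab as timeouts is a choice made here only for the example.

```python
# Hypothetical status groups; real status names and codes would come from Hydra.
ABORTED = {"aborted"}
TIMED_OUT = {"timed_out", "log_limit_exceeded"}


def tab_for(previous, current):
    """Pick the diff tab for a job from its previous and current status.

    Aborted and timed-out builds get their own tabs, so the failing tabs
    only ever contain real build failures.
    """
    if current == "succeeded":
        return "Succeeded"
    if current in ABORTED:
        return "Aborted"
    if current in TIMED_OUT:
        return "Timed out"
    # Everything else is treated as a real failure.
    if previous == "succeeded":
        return "Newly failing"
    return "Still failing"
```

With a split like this, timed-out builds stay out of "Newly failing", and a restart action could target the "Timed out" tab alone.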

vcunat · 2024-04-16T07:59:05Z

Note: "log limit exceeded" is usually not transient, in my experience. Restarting such builds will typically produce too much log output again, so I'm not sure about the move in that case. I consider it similar to "output size exceeded".

Though yes, it's somewhat impure – at least in the sense that these limits are configurable without changing the derivation hash.

lheckemann approved these changes Mar 13, 2024
