Use needles from correct ref of NEEDLES_DIR #5175

sdclarke · 2023-06-01T11:18:39Z

If the CASEDIR variable is a git repo URL with a fragment specifying the ref of the repository, we attempt to use it as the source of needles when viewing needle diffs in test results.

If the TEST_GIT_SHA variable is set, we use that instead as it should be the most accurate way to determine exactly which commit was used

Martchus

I don't think creating another worktree for every rev we might encounter scales very well. It would make more sense to use git show … to obtain a specific needle for a specific rev.

Note that we've been pondering about this problem ourselves and coming up with an approach that scales well enough for being usable on our production instances is hard (e.g. see https://progress.opensuse.org/issues/56789. https://progress.opensuse.org/issues/91905 and other related tickets).

lib/OpenQA/WebAPI/Controller/Step.pm

sdclarke · 2023-06-01T12:41:06Z

I appreciate the feedback!

I don't think creating another worktree for every rev we might encounter scales very well. It would make more sense to use git show … to obtain a specific needle for a specific rev.

I agree that it's unlikely to scale well, using git show seems like a good idea, I'll take a look into that and see if I can come up with something functional.

Martchus · 2023-06-01T12:45:22Z

By the way, I think the most important point of my review was that this should be put behind a configuration option and be disabled by default - at least until we are confident this works well in all kinds of instances (and currently we are likely pretty far from that).

sdclarke · 2023-06-01T12:48:23Z

By the way, I think the most important point of my review was that this should be put behind a configuration option and be disabled by default - at least until we are confident this works well in all kinds of instances (and currently we are likely pretty far from that).

Yes, definitely, this patch is very much a hack that was put together to get a quick and nasty solution to the problem for a specific use case, I realise there's an awful lot of change required for it to be anything like production-ready.

sdclarke · 2023-06-02T14:54:53Z

I've changed this to make use of git show to get the contents of the needle files from the correct SHA of the repository.

It now also has a config option to prevent the feature being used by default.

The hacky clean up has been removed for now, so another method of cleaning up will be needed.

stale · 2023-09-17T00:48:49Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

lib/OpenQA/Task/Needle/RemoveVersions.pm

Martchus

The error handling should be improved. I commented on the most important places.

Also, see my other inline-comments.

lib/OpenQA/Task/Needle/RemoveVersions.pm

lib/OpenQA/WebAPI/Controller/Step.pm

script/openqa-enqueue-needle-versions-cleanup

t/14-grutasks.t

lib/OpenQA/WebAPI/Controller/File.pm

lib/OpenQA/Utils.pm

foursixnine · 2024-02-07T07:35:22Z

@Martchus sorry I thought you hadn't reviewed and clicked again :) (just noticed the thingie, @sdclarke mentioned this during our chat at FOSDEM)

okurz

https://app.circleci.com/pipelines/github/os-autoinst/openQA/12874/workflows/bc1b1c0b-a2fa-4362-9269-22b44265b395/jobs/119993 looks very much related

Martchus

The error handling should be improved as mentioned in my last review.

The cleanup code also needs further improvement, see https://github.com/os-autoinst/openQA/pull/5175/files#r1480067530.

Martchus · 2024-03-06T17:14:16Z

@sdclarke You're not actively working on this, right? I would take over as we'd actually also like to have this feature ourselves (see https://progress.opensuse.org/issues/154783). I would keep your commits as-is (except for rebasing) and add my own commit on top of them.

neillydun · 2024-06-10T14:50:08Z

Is this currently being worked on, or is there anything that could be done to help get it merged?

okurz · 2024-06-10T14:59:24Z

There are at least three points necessary:

Rebase
Fix rebase/merge conflicts
Add tests for missing coverage

Martchus · 2024-06-10T15:19:23Z

Also see https://progress.opensuse.org/issues/154783#note-10 and my subsequent comments in that ticket for what's left.

Martchus · 2024-06-13T09:34:09Z

The minimum effort we need to put into this PR to be mergable is:

Resolve conflicts
Ensure test coverage checks pass (I suppose UI tests are missing)
Maybe some minor tweaks coming up in further reviews of the PR

If anyone wants to work on it (e.g. @sdclarke or someone from our team), drop a comment here for better coordination. We now also have https://progress.opensuse.org/issues/162035 for tracking this.

okurz · 2024-08-07T09:01:44Z

@sdclarke I saw that you pushed an update. Can you state what changes we should expect?

sdclarke · 2024-08-07T10:01:44Z

@sdclarke I saw that you pushed an update. Can you state what changes we should expect?

This push was just fixing the conflicts, I'm trying to look for the correct places to improve the test coverage but I'm having some trouble working out where the changes are needed. Any pointers in that regard would be appreciated.

Assuming I can deal with the test coverage issues, would that be enough to get this merged for now? From my perspective it would be good to get this over the line as it works well enough for our purposes, and it is disabled by default in the config. We can potentially revisit it later to address any outstanding points.

okurz · 2024-08-07T11:57:07Z

@sdclarke I saw that you pushed an update. Can you state what changes we should expect?

This push was just fixing the conflicts, I'm trying to look for the correct places to improve the test coverage but I'm having some trouble working out where the changes are needed. Any pointers in that regard would be appreciated.

As you are also adding new modules you could create new test modules like
t/15-shared-plugin-gru.t
which are basically unit tests with mocking all external dependencies.

Assuming I can deal with the test coverage issues, would that be enough to get this merged for now?

Yes. I already asked the team to support for which I created ticket https://progress.opensuse.org/issues/162035 but unfortunately no one picked it up yet.

Martchus · 2024-08-07T13:31:25Z

Assuming I can deal with the test coverage issues, would that be enough to get this merged for now? From my perspective it would be good to get this over the line as it works well enough for our purposes, and it is disabled by default in the config. We can potentially revisit it later to address any outstanding points.

@sdclarke As stated in #5175 (comment) the answer is no - nothing else needs to be done except further problems are found in PR review. However, I'd strongly recommend looking into point 2 of this note because in my opinion it makes not much sense to enable it on a production instance without it.

mergify · 2024-08-14T18:46:15Z

This pull request is now in conflicts. Could you fix it? 🙏

perlpunk · 2024-08-20T10:22:21Z

lib/OpenQA/WebAPI/Controller/Step.pm

+    };
+    chomp $needles_ref;
+    return undef unless $needles_ref;
+    return undef unless run_cmd_with_log ['git', '-C', $needle_dirs, 'fetch', '--depth', 1, 'origin', $needles_ref];


@sdclarke We have not forgotten about this.

We are currently working on some git features and fixes, and I just wanted to note that we have a git_auto_clone feature now that will fetch any requested ref on the webui (branch or sha) in a minion job before the job starts running. (Note that it only works if it is the same origin and not some fork.)

That would make this fetch here not necessary anymore and also avoid the need for a lockfile.
The sha should always be in NEEDLES_GIT_HASH so the rev-parse also seems unnecessary.

Would you consider to use the git_auto_clone feature in order to not have to implement that part here?

I'm not sure if there are any cases which git_auto_clone wouldn't cover. @Martchus said there might be.

Note that git_auto_clone only helps if a Git-URL is specified as NEEDLES_DIR (or CASEDIR which is relevant if needles are part of the tests).

Even then it is not guaranteed to fetch the refs the test actually runs on. The test might specify a branch name (or no branch which means "default branch") and when the test is finally executed on the worker host the ref that branch points to might differ from the ref checked out by the web UI at the time the test was scheduled.

Additionally, we probably want to have some cleanup for the fetching done by the git_auto_clone feature. So the ref might have also already been cleaned up.

So I think we need that fetch in any case.

That would make this fetch here not necessary anymore and also avoid the need for a lockfile.

I don't think we need a lockfile here. We can probably arrange it so that it would not matter if any other commands are executed in the middle. (The use of FETCH_HEAD definitely needs to go way but it can probably be replaced with whatever we specify on the fetch command.)

The sha should always be in NEEDLES_GIT_HASH so the rev-parse also seems unnecessary.

That's true. So we could (and probably should) get rid of it. If one specifies a branch name via NEEDLES_DIR the NEEDLES_GIT_HASH still has the concrete sha.

Note that git_auto_clone only helps if a Git-URL is specified as NEEDLES_DIR (or CASEDIR which is relevant if needles are part of the tests).

The PR description says that it is about CASEDIR/NEEDLES_DIR being set.

And if it is not set, there is no certain ref to fetch, just the default remote branch

In the near future, git_auto_clone will also automatically update clones if CASEDIR/NEEDLES_DIR is empty. It will just update the checkout branch in the checked out directory.

And this is all we need, no?
What else needs to be fetched that git_auto_clone does not fetch?

The only thing could be forks, but as far as I can see this PR also doesn't handle forks.

Even then it is not guaranteed to fetch the refs the test actually runs on. The test might specify a branch name (or no branch which means "default branch") and when the test is finally executed on the worker host the ref that branch points to might differ from the ref checked out by the web UI at the time the test was scheduled.

What's checked out on the webui does not matter. The important thing is that it has been fetched at some point, so we can do a git show on it, right?

Additionally, we probably want to have some cleanup for the fetching done by the git_auto_clone feature. So the ref might have also already been cleaned up.

Yes. I'm not sure what git already offers there and how to know what we can cleanup.

What's checked out on the webui does not matter. The important thing is that it has been fetched at some point, so we can do a git show on it, right?

I meant "fetched by the web UI". And I meant the following sequence of events:

A test is scheduled pointing to a branch, e.g. just the default branch.

The web UI fetches the latest ref of that branch because git_auto_clone = 1 is configured.

Someone pushes additional commits to that branch or even force-pushes there.

The test is finally assigned to a worker and that worker now fetches a different ref than the web UI in step 2 so NEEDLES_GIT_HASH points to a ref that is not yet checked out (despite git_auto_clone = 1).

Someone wants to display the candidate needles for that job. We need to fetch the missing ref.

And this is not very contrived. That someone specifies a branch where many commits happen (e.g. the default branch of some repo with many contributors) is probably a common use case. That it can take between the scheduling of a job and the time it actually gets executed is also something we need to expect.

perlpunk · 2024-08-20T10:28:21Z

Just to clarify a few things, I'm currently trying to understand the feature.

If the CASEDIR variable is a git repo URL with a fragment specifying the ref of the repository, we attempt to use it as the source of needles when viewing needle diffs in test results.

If the TEST_GIT_SHA variable is set, we use that instead as it should be the most accurate way to determine exactly which commit was used

The title is talking about NEEDLES_DIR, the code mentions NEEDLES_GIT_SHA, but the PR description talks about CASEDIR and TEST_GIT_SHA.
Of course there can be needles in a CASEDIR if the repo has a needles dir itself. I just found it a bit confusing.

I was also wondering: Even if you don't specify CASEDIR/NEEDLES_DIR, the feature is about showing the correct ref from the time the test was running, even if there was a newer version in git meanwhile, right?

Because we have this ticket, but I can't find a clear description of the story, so no acceptance criteria, so I have to figure it out from the code.

kalikiana · 2024-09-09T11:03:06Z

Because we have this ticket, but I can't find a clear description of the story, so no acceptance criteria, so I have to figure it out from the code.

@sdclarke Hey! 👋 Did you have a chance to look at the ticket description and the acceptance criteria? To be sure we're on the same page now on what's being implemented here.

kalikiana · 2024-09-23T07:18:01Z

@sdclarke Hey! 👋 Did you have a chance to look at the ticket description and the acceptance criteria? To be sure we're on the same page now on what's being implemented here.

@sdclarke ping? In case you missed the previous one 🙃

sdclarke · 2024-09-23T16:11:01Z

Hi all, sorry for the lack of updates, I haven't had time to follow up on this lately.

Because we have this ticket, but I can't find a clear description of the story, so no acceptance criteria, so I have to figure it out from the code.

@sdclarke Hey! 👋 Did you have a chance to look at the ticket description and the acceptance criteria? To be sure we're on the same page now on what's being implemented here.

Looking at the ticket, the acceptance criteria look to match what I was trying to achieve.

The title is talking about NEEDLES_DIR, the code mentions NEEDLES_GIT_SHA, but the PR description talks about CASEDIR and TEST_GIT_SHA. Of course there can be needles in a CASEDIR if the repo has a needles dir itself. I just found it a bit confusing.

I think the title became out of sync with the description after an initial round of reviews, when I first wrote it the workflow I was using had the needles directory within the CASEDIR so I didn't initially consider them separately. I subsequently used the NEEDLES_DIR in this PR but must have missed the description.

I was also wondering: Even if you don't specify CASEDIR/NEEDLES_DIR, the feature is about showing the correct ref from the time the test was running, even if there was a newer version in git meanwhile, right?

The original problem we were having that inspired this was that the web UI might not have any version of a particular needle, as we had a workflow which involved testing new needles in branches of the tests/needles repository before merging them to the main branch.

But ultimately, yes your description of the problem this PR is attempting to solve is correct, I based it on CASEDIR/NEEDLES_DIR as that was how we used it, and it provided the path of least resistance to doing what we needed.

As mentioned I don't have time to work on this for now, once again sorry for my delayed response.

systemd/openqa-enqueue-needle-ref-cleanup.timer

b10n1k · 2024-10-25T08:12:43Z

etc/openqa/openqa.ini

@@ -117,6 +117,10 @@
 #git_auto_clone = yes
 ## Experimental - Ensure a git update of all test code and needles
 #git_auto_update = no
+# whether openQA should attempt to display needles of the correct version in the web UI


I have a feeling that the comment is a bit cryptic from users' side. What does it mean attempt? Also correct version I think is not very explanatory. I am talking from the POV of someone who do not know much about the openQA or a new user.

I don't think we'll be able to explain this feature within the config file. I think this comment is good enough for people trying to enable this feature after reading documentation about it elsewhere. So we could document this feature as part of the normal openQA documentation. However, I'd be fine keeping this as an almost undocumented "expert feature" for now.

Yeah, maybe we can still change the word "correct". Who wouldn't want "correct versions"? :)

How about

Suggested change

# whether openQA should attempt to display needles of the correct version in the web UI

# whether openQA should display needles from different branches in the web UI

This is not about the branch that was used (branches are moving targets) but really about the exact version that was used (not a moving target). This lookup will probably also be best effort. So I find the version of this comment as it is right now much better than the suggested change.

Who wouldn't want "correct versions"? :)

Someone who doesn't want to pay the price for it. We could state the disadvantages here, especially that the feature is still experimental.

So rather say "exact version" and state the drawbacks

t/14-grutasks.t

mergify · 2024-10-31T13:24:11Z

This pull request is now in conflicts. Could you fix it? 🙏

If the NEEDLES_DIR (or CASEDIR as a backup) variable is a git repo URL with a fragment specifying the ref of the repository, we attempt to use it as the source of needles when viewing needle diffs in test results. If the NEEDLES_GIT_SHA variable is set, we use that instead as it should be the most accurate way to determine exactly which commit was used

There is now a minion task which will clean up alternate needle files created by the WebUI. The minimum retention time of the needle files can be set in the config, and defaults to 30 minutes

* Improve error handling * Simplify code * Split code into smaller functions * Move needle-related code into its own module so utilities don't become too big * Avoid using a shell to run Git commands; this breaks when needle paths contain characters that needed escaping * Avoid invoking Git commands if temporary needle files are still present * Use the usual directory the web UI stores cache files in (instead of hard-coding `/tmp/…`) * Output temporary files directly instead of using intermediate buffer * Keep using the real needle path as before for populating last seen/match because using the relative path breaks compatibility with existing databases and this change is supposedly not required anyway * See https://progress.opensuse.org/issues/154783

* Use the settings key `temp_needle_refs_retention` which makes it clear that this setting is about temporary needle refs * Document settings key in default `openqa.ini` * Adapt the cleanup to be in accordance with the changed path for temporary needle refs * Rename cleanup task to `limit_temp_needle_refs` to be more in-line with existing cleanup tasks * Add systemd units to enqueue the cleanup task automatically * Change the default retention to two hours (from 30 minutes) * Use the modification time instead of the access time because the access time might be updated very to easily unintendedly * Use signal guard to retry cleanup in case the gru service is restarted * See https://progress.opensuse.org/issues/154783

mergify · 2024-11-05T15:20:22Z

This pull request is now in conflicts. Could you fix it? 🙏

Martchus requested changes Jun 1, 2023

View reviewed changes

sdclarke force-pushed the needles_from_casedir branch 2 times, most recently from 562ad9e to 30a6b8e Compare June 2, 2023 14:41

sdclarke force-pushed the needles_from_casedir branch from 30a6b8e to 0953592 Compare June 2, 2023 15:05

sdclarke force-pushed the needles_from_casedir branch from 7756787 to 6c60bba Compare June 12, 2023 14:56

stale bot added the stale label Sep 17, 2023

sdclarke marked this pull request as ready for review September 20, 2023 09:13

stale bot removed the stale label Sep 20, 2023

sdclarke force-pushed the needles_from_casedir branch from 6c60bba to fe86c76 Compare November 1, 2023 11:04

okurz reviewed Nov 4, 2023

View reviewed changes

lib/OpenQA/Task/Needle/RemoveVersions.pm Outdated Show resolved Hide resolved

sdclarke force-pushed the needles_from_casedir branch from fe86c76 to bbe3787 Compare November 24, 2023 14:31

kalikiana reviewed Dec 11, 2023

View reviewed changes

lib/OpenQA/Task/Needle/RemoveVersions.pm Outdated Show resolved Hide resolved

sdclarke force-pushed the needles_from_casedir branch 5 times, most recently from f2d4db8 to e6992d7 Compare February 1, 2024 17:05

kalikiana approved these changes Feb 1, 2024

View reviewed changes

foursixnine requested a review from Martchus February 3, 2024 16:46

Martchus requested changes Feb 6, 2024

View reviewed changes

foursixnine requested a review from Martchus February 7, 2024 07:32

okurz requested changes Feb 7, 2024

View reviewed changes

Martchus previously requested changes Feb 20, 2024

View reviewed changes

sdclarke force-pushed the needles_from_casedir branch from 07e3a1f to 9567c32 Compare August 7, 2024 08:38

perlpunk reviewed Aug 20, 2024

View reviewed changes

kalikiana force-pushed the needles_from_casedir branch 3 times, most recently from b31f643 to 124af5c Compare October 22, 2024 16:36

b10n1k reviewed Oct 25, 2024

View reviewed changes

systemd/openqa-enqueue-needle-ref-cleanup.timer Outdated Show resolved Hide resolved

b10n1k reviewed Oct 25, 2024

View reviewed changes

Martchus reviewed Oct 25, 2024

View reviewed changes

t/14-grutasks.t Outdated Show resolved Hide resolved

sdclarke and others added 6 commits October 31, 2024 15:50

Add cleanup for alternate needle versions

14f9242

There is now a minion task which will clean up alternate needle files created by the WebUI. The minimum retention time of the needle files can be set in the config, and defaults to 30 minutes

Add unit test for needle version removal

5ac430f

Drop use of undeclared File::Touch dependency

e134fff

kalikiana force-pushed the needles_from_casedir branch from 64cd052 to e134fff Compare October 31, 2024 15:11

	# whether openQA should attempt to display needles of the correct version in the web UI
	# whether openQA should display needles from different branches in the web UI

Use needles from correct ref of NEEDLES_DIR #5175

Are you sure you want to change the base?

Use needles from correct ref of NEEDLES_DIR #5175

Conversation

sdclarke commented Jun 1, 2023

Martchus left a comment

Choose a reason for hiding this comment

sdclarke commented Jun 1, 2023

Martchus commented Jun 1, 2023

sdclarke commented Jun 1, 2023

sdclarke commented Jun 2, 2023

stale bot commented Sep 17, 2023

Martchus left a comment

Choose a reason for hiding this comment

foursixnine commented Feb 7, 2024

okurz left a comment

Choose a reason for hiding this comment

Martchus left a comment

Choose a reason for hiding this comment

Martchus commented Mar 6, 2024

neillydun commented Jun 10, 2024

okurz commented Jun 10, 2024

Martchus commented Jun 10, 2024

Martchus commented Jun 13, 2024 • edited Loading

okurz commented Aug 7, 2024

sdclarke commented Aug 7, 2024

okurz commented Aug 7, 2024

Martchus commented Aug 7, 2024

mergify bot commented Aug 14, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

perlpunk Aug 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

perlpunk commented Aug 20, 2024 • edited Loading

kalikiana commented Sep 9, 2024

kalikiana commented Sep 23, 2024

sdclarke commented Sep 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mergify bot commented Oct 31, 2024

mergify bot commented Nov 5, 2024

Martchus commented Jun 13, 2024 •

edited

Loading

perlpunk Aug 21, 2024 •

edited

Loading

perlpunk commented Aug 20, 2024 •

edited

Loading