
move to alma9 by default; enable choice through DEFAULT_LINUX_VERSION #6626

Merged
merged 8 commits into from
Nov 23, 2024

Conversation

h-vetinari
Member

@h-vetinari h-vetinari commented Oct 29, 2024

Alternative to #6548

Closes #6283
Closes #6629

We can do this because all the required infrastructure is now in place (c.f. conda-forge/conda-forge.github.io#1941); the last missing piece was to verify how entries of yum_requirements.txt match between centos7 & alma8 - the following overview shows that there's a high degree of compatibility (and many things can be replaced by our own build anyway). The handful of differences can be fixed on the affected feedstocks.

Should wait for:

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@h-vetinari h-vetinari changed the title move to alma8 images move to alma8 by default; add alma9 & enable choice through DEFAULT_LINUX_VERSION Oct 31, 2024
@h-vetinari
Member Author

h-vetinari commented Oct 31, 2024

For the reasoning behind c7f5973, see the discussion here. The preceding commit fails if a feedstock overrides c_stdlib_version; see also this branch, where I've tested this for an affected feedstock.

c7f5973 is thus necessary to make the docker image (sufficiently) independent of c_stdlib_version, so that the two can be set independently (if that's indeed something we want to enable; one reason is that the image version then doesn't have to affect the sysroot baseline, particularly when a newer image is only needed to load a build tool or test dependency).
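For context, an override of this kind (illustrative, hypothetical values; not part of this PR) would typically sit in a feedstock's `recipe/conda_build_config.yaml`:

```yaml
# hypothetical feedstock override: raise the glibc baseline
# independently of the docker image (value is illustrative)
c_stdlib_version:     # [linux]
  - "2.28"            # [linux]
```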

@h-vetinari h-vetinari mentioned this pull request Nov 3, 2024
respect convenience setting DEFAULT_LINUX_VERSION to avoid feedstocks
having to override the monster-zip just to change c_stdlib_version & image.
this allows separate overrides for the c_stdlib_version and the image
(the latter can easily be set using `os_version` in `conda-forge.yml`)
@h-vetinari h-vetinari changed the title move to alma8 by default; add alma9 & enable choice through DEFAULT_LINUX_VERSION move to alma9 by default; enable choice through DEFAULT_LINUX_VERSION Nov 11, 2024
@h-vetinari h-vetinari marked this pull request as ready for review November 12, 2024 01:44
@h-vetinari h-vetinari requested a review from a team as a code owner November 12, 2024 01:44
@h-vetinari
Member Author

@jakirkham, you mentioned in the core call just now that all the CUDA feedstocks already set os_version explicitly? I only did a spot check, and it looks like this, for example:

```yaml
os_version:
  linux_64: cos7
  linux_aarch64: cos7
  linux_ppc64le: cos7
```

Following conda-forge/docker-images#291 (using the consolidated names that allow choosing the distro version), this would eventually have to move to

```yaml
os_version:
  linux_64: cos7
  linux_aarch64: ubi8
  linux_ppc64le: ubi8
```

or just

```yaml
os_version:
  linux_64: cos7
```

because we don't/won't support distro overrides for aarch/ppc. In either case, once we've confirmed that we got os_version set in the relevant feedstocks doing binary repackaging, this PR should be ready. I'll leave the image renaming to #6687.

@h-vetinari
Member Author

> once we've confirmed that we got os_version set in the relevant feedstocks doing binary repackaging, this PR should be ready.

@jakirkham do you have a list of cuda-related feedstocks that I can double-check?

@h-vetinari
Member Author

Gentle ping @jakirkham

@jakirkham
Member

Thanks Axel! 🙏

Please don't feel obliged to track or make these changes. This is something NVIDIA plans to pick up. In fact folks are eager to adopt these changes once available

There is not one comprehensive list as there are the CUDA Toolkit (CTK) packages, CUDA Math libraries, CUDA DL libraries, etc. Plus more packages are being added

That said, for the CUDA Toolkit packages we handled this work as part of issue conda-forge/cuda-feedstock#26, which has a list of CTK feedstocks that had this change included

Other NVIDIA feedstocks have been encouraged to follow the same pattern

@h-vetinari
Member Author

> Please don't feel obliged to track or make these changes. This is something NVIDIA plans to pick up. In fact folks are eager to adopt these changes once available

I agree that this makes sense, but in the last core call the ask had been to commit `os_version: ...` into the binary repackaging feedstocks before switching. That would currently be a no-op, so we could even just push them with a `[ci skip]`. I'm willing to do that (or help), but I don't have a list of feedstocks (well, except perhaps searching for "cuda" in our feedstock names).

@jakirkham
Member

The issue link above hopefully answers that question. Admittedly not comprehensively, but that is pretty close

Will look for other cases we may have missed and file issues

@h-vetinari
Member Author

Thank you! 🙏

@jakirkham
Member

Found a few stragglers. Have put together a list in this comment: conda-forge/cuda-feedstock#28 (comment)

Will start working through those. Also will update the list if we find more

Member

@carterbox carterbox left a comment


This is the one that we're going with instead of #6548, right?

My understanding is that the following is the expected behavior because it is what I observed when using the PR to render a feedstock.

| os_version | container x86 | container aarch64 | container ppc64le |
|------------|---------------|-------------------|-------------------|
| undefined | linux-anvil-alma-x86_64:9 | linux-anvil-alma-aarch64:9 | linux-anvil-alma-ppc64le:9 |
| alma9 | linux-anvil-alma-x86_64:9 | linux-anvil-alma-aarch64:9 | linux-anvil-alma-ppc64le:9 |
| alma8 | linux-anvil-alma-x86_64:8 | linux-anvil-alma-aarch64:8 | linux-anvil-alma-ppc64le:8 |
| wrong | linux-anvil-comp7 | linux-anvil-comp7 | linux-anvil-comp7 |
| alma7 | linux-anvil-comp7 | linux-anvil-comp7 | linux-anvil-comp7 |
| cos7 | linux-anvil-cos7-x86_64 | linux-anvil-aarch64 | linux-anvil-ppc64le |

The exception is CUDA renders, where an os_version that is not one of the valid values (alma9, alma8, cos7) breaks the zipping of the keys; this is fine. It's probably a feature that an invalid os_version fails to render.

CUDA 12 renders follow the non-cuda builds exactly.

CUDA 11 has only one container option for each architecture.

| os_version | container x86 | container aarch64 | container ppc64le |
|------------|---------------|-------------------|-------------------|
| undefined | linux-anvil-cuda:11.8 | linux-anvil-aarch64-cuda:11.8 | linux-anvil-ppc64le-cuda:11.8 |
| alma9 | linux-anvil-cuda:11.8 | linux-anvil-aarch64-cuda:11.8 | linux-anvil-ppc64le-cuda:11.8 |
| alma8 | linux-anvil-cuda:11.8 | linux-anvil-aarch64-cuda:11.8 | linux-anvil-ppc64le-cuda:11.8 |
| cos7 | linux-anvil-cuda:11.8 | linux-anvil-aarch64-cuda:11.8 | linux-anvil-ppc64le-cuda:11.8 |

I expect that os_version is completely decoupled from c_stdlib_version, so they should not affect each other.

@h-vetinari
Member Author

h-vetinari commented Nov 20, 2024

Your list is correct as of this PR, but I think it'd actually be better to pull in the changes from conda-forge/docker-images#293 / #6687 into this PR. I'll update conda-forge/docker-images#291 to finish the rest of the necessary changes.

I have thought about the fact that only a single value of os_version can be used per platform, and how to handle the split between different distros between CUDA 11.8 and the rest. I'll push some changes to this effect momentarily. The result would look as follows (not counting the cross-compilation split for cuda 11.8)

| os_version | container x86 | container aarch64 | container ppc64le | for CUDA 11.8 |
|------------|---------------|-------------------|-------------------|---------------|
| undefined | linux-anvil-x86_64:alma9 | linux-anvil-aarch64:alma9 | linux-anvil-ppc64le:alma9 | no |
| alma9 | linux-anvil-x86_64:alma9 | linux-anvil-aarch64:alma9 | linux-anvil-ppc64le:alma9 | no |
| alma8 | linux-anvil-x86_64:alma8 | linux-anvil-aarch64:alma8 | linux-anvil-ppc64le:alma8 | no |
| ubi8 | linux-anvil-x86_64:alma8 | linux-anvil-aarch64:alma8 | linux-anvil-ppc64le:alma8 | no |
| cos7 | linux-anvil-x86_64:cos7 | linux-anvil-aarch64:cos7 | linux-anvil-ppc64le:cos7 | no |
| undefined | l.-a.-x86_64-cuda11.8:ubi8 | l.-a.-aarch64-cuda11.8:ubi8 | l.-a.-ppc64le-cuda11.8:ubi8 | yes |
| alma9 | l.-a.-x86_64-cuda11.8:ubi8 | l.-a.-aarch64-cuda11.8:ubi8 | l.-a.-ppc64le-cuda11.8:ubi8 | yes |
| alma8 | l.-a.-x86_64-cuda11.8:ubi8 | l.-a.-aarch64-cuda11.8:ubi8 | l.-a.-ppc64le-cuda11.8:ubi8 | yes |
| ubi8 | l.-a.-x86_64-cuda11.8:ubi8 | l.-a.-aarch64-cuda11.8:ubi8 | l.-a.-ppc64le-cuda11.8:ubi8 | yes |
| cos7 | l.-a.-x86_64-cuda11.8:cos7 | l.-a.-aarch64-cuda11.8:ubi8 | l.-a.-ppc64le-cuda11.8:ubi8 | yes |

For wrong values of os_version, we could add something like the following,

```yaml
  # intentional breakage for unsupported values of DEFAULT_LINUX_VERSION
  - quay.io/condaforge/this_image_does_not_exist    # [linux and os.environ.get("DEFAULT_LINUX_VERSION", "alma9") not in ("cos7", "ubi8", "alma8", "alma9")]
  - quay.io/condaforge/this_image_does_not_exist    # [linux and os.environ.get("DEFAULT_LINUX_VERSION", "alma9") not in ("cos7", "ubi8", "alma8", "alma9")]
```

but arguably, failed renders are hard to debug, so IMO this should rather be a linter rule in smithy.
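A minimal sketch of what such a linter rule might check (hypothetical function name and messages; not the actual conda-smithy API):

```python
# Hypothetical sketch of a smithy-style lint: validate `os_version` entries
# in conda-forge.yml against the supported distro tags.
VALID_OS_VERSIONS = {"cos7", "ubi8", "alma8", "alma9"}

def lint_os_version(conda_forge_yml: dict) -> list[str]:
    """Return lint messages for unsupported os_version values."""
    lints = []
    for platform, value in conda_forge_yml.get("os_version", {}).items():
        if value not in VALID_OS_VERSIONS:
            lints.append(
                f"os_version for {platform} is {value!r}; "
                f"expected one of {sorted(VALID_OS_VERSIONS)}"
            )
    return lints
```

This would flag typos like `alma7` at lint time instead of letting the render fail opaquely.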

@h-vetinari
Member Author

OK, conda-forge/docker-images#291 was merged and I've tested it on the faiss feedstock with several variations of cos7 / ubi8 / alma8 / alma9, and it did the right thing in all those cases (compare table above).

In other words, this PR should be ready. :)

I'll write up the announcement that John asked for shortly.

@h-vetinari
Member Author

> so IMO this should rather be a linter rule in smithy

I started implementing this in smithy, but ran into conda-forge/conda-smithy#2152, which we should very likely solve first.

@h-vetinari
Member Author

I have a smithy PR for a linter rule now: conda-forge/conda-smithy#2155

@jakirkham
Member

Thanks Axel! 🙏

Opened a couple of test PRs

Was able to re-render ok

Though did see some issues pulling the Docker images. Were the images from PR ( conda-forge/docker-images#291 ) pushed ok? Or are there some naming changes that still need to be picked up here?

@h-vetinari
Member Author

Thanks for testing! 🙏

> Though did see some issues pulling the Docker images. Were the images from PR ( conda-forge/docker-images#291 ) pushed ok? Or are there some naming changes that still need to be picked up here?

I had double-checked the CI runs there, and aside from being green, the deploy step looks normal too. Not sure what could be causing the

```
+ docker pull quay.io/condaforge/linux-anvil-x86_64:alma8
Error response from daemon: unauthorized: access to the requested resource is not authorized
```

you're seeing.

@isuruf
Member

isuruf commented Nov 21, 2024

Default setting for new images on quay.io is private. Made them public just now.

@jakirkham
Member

Ah right 😅

Thanks Isuru! 🙏

Remembered there was something on Quay.io that needed an extra tweak. Had forgotten exactly what it was

Restarted the failed CI runs in those test PRs

@jakirkham
Member

Looks like both of those builds passed! 🥳

Also looks like the recent work changing `cdt_name` to `conda` in PR ( conda-forge/conda-forge-ci-setup-feedstock#365 ) has fixed CUDA cross-compilation on Alma 8 (am guessing Alma 9 would work similarly)

@h-vetinari h-vetinari mentioned this pull request Nov 22, 2024
@h-vetinari
Member Author

@conda-forge/core, this should be ready to go and I'd welcome reviews. 🙏

It's been tested both by me and by @jakirkham, and covers what we discussed in the core call (c.f. #6283 & conda-forge/docker-images#293)

To accompany this, I also have an announcement (as suggested by John) and a doc update ready. A linter rule for the distrotags in os_version is waiting for some dependent work and will follow soon, but we don't need to wait for that here.

Member

@jakirkham jakirkham left a comment


Looking forward to these improvements! Thanks again Axel for working with us on them 😄
