Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to FindCUDAToolKit #83

Merged
merged 59 commits into from
Jan 11, 2024

Conversation

Tobias-Fischer
Copy link
Contributor

@Tobias-Fischer Tobias-Fischer commented Jan 6, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@Tobias-Fischer
Copy link
Contributor Author

@conda-forge-admin, please rerender

@Tobias-Fischer
Copy link
Contributor Author

/cc @jakirkham - this hopefully would fix #82 #78

conda-forge-webservices[bot] and others added 2 commits January 6, 2024 11:25
@Tobias-Fischer
Copy link
Contributor Author

@conda-forge-admin, please rerender

@Tobias-Fischer
Copy link
Contributor Author

@conda-forge-admin, please rerender

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tobias! 🙏

Indeed it looks like the ARM builds are doing well

Had a minor suggestion to fix an issue spotted in one of the ARM CI logs

recipe/meta.yaml Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

jakirkham commented Jan 7, 2024

As to the Linux x86_64 builds, it looks like the errors there are caused by a CUDA version mismatch between build and host. Looking at this snippet from CI:

    nvlink fatal : Input file
    '/home/conda/feedstock_root/build_artifacts/dlib-split_1704585866969/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/libcudadevrt.a:cuda_device_runtime.o'
    newer than toolkit (122 vs 120)

Think we can fix this by taking this portion of the recipe...

host:
- cudnn # [cuda_compiler_version != "None"]

...and making this change to it

       host:
+        - cuda-version {{ cuda_compiler_version }}  # [(cuda_compiler_version or "None") != "None"]
         - cudnn                                  # [cuda_compiler_version != "None"]

This should align the two environments on the same CUDA compiler version

@Tobias-Fischer
Copy link
Contributor Author

Many thanks @jakirkham! Do you have any idea how to resolve the windows issues?

@Tobias-Fischer
Copy link
Contributor Author

And there seems to be a missing compatible cudnn on aarch64 too :(

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tobias! 🙏

Added a few relevant suggestions below

recipe/meta.yaml Outdated
Comment on lines 65 to 66
- cuda-nvcc-tools # [(cuda_compiler_version or "").startswith("12")]
- cuda-nvcc-impl # [(cuda_compiler_version or "").startswith("12")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we drop these from host? Think this is confusing CMake on Windows

Also please let me know any context as to why they were added

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Unfortunately still the same issue :(. In conda-forge/pycolmap-feedstock#16 I went back to a setuptools build, that fixed it somehow. I think the correct flags are still not being passed.

/cc @carterbox

recipe/meta.yaml Show resolved Hide resolved
@Tobias-Fischer
Copy link
Contributor Author

Finally. It seems using Ninja did the trick. Gosh, what an ordeal ;).

I suggest to:

Any more thoughts @jakirkham? Thanks a lot for all your help!

*:

FAILED: dlib_build/CMakeFiles/dlib.dir/cuda/cuda_dlib.cu.obj
  C:\PROGRA~1\NVIDIA~2\CUDA\v11.2\bin\nvcc.exe -forward-unknown-to-host-compiler -DDLIB_JPEG_STATIC -ID:\bld\dlib-split_1704865264769\work\dlib\external\libpng -ID:\bld\dlib-split_1704865264769\work\dlib\external\zlib -ID:\bld\dlib-split_1704865264769\_h_env\Library\include -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include" "--use-local-env" -Xcompiler="-MD -O2 -Ob2" -DNDEBUG -std=c++14 "--generate-code=arch=compute_35,code=[sm_35]" "--generate-code=arch=compute_37,code=[sm_37]" "--generate-code=arch=compute_50,code=[sm_50]" "--generate-code=arch=compute_52,code=[sm_52]" "--generate-code=arch=compute_53,code=[sm_53]" "--generate-code=arch=compute_60,code=[sm_60]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_62,code=[sm_62]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_72,code=[sm_72]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" -DDLIB_NO_ABORT_ON_2ND_FATAL_ERROR -DDLIB_JPEG_SUPPORT -DDLIB_USE_CUDA -DDLIB_PNG_SUPPORT -MD -MT dlib_build\CMakeFiles\dlib.dir\cuda\cuda_dlib.cu.obj -MF dlib_build\CMakeFiles\dlib.dir\cuda\cuda_dlib.cu.obj.d -x cu -c D:\bld\dlib-split_1704865264769\work\dlib\cuda\cuda_dlib.cu -o dlib_build\CMakeFiles\dlib.dir\cuda\cuda_dlib.cu.obj -Xcompiler=-Fddlib_build\CMakeFiles\dlib.dir\dlib.pdb,-FS
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\xutility(1260): error: expected a "("
            detected during instantiation of "void std::_Adl_verify_range(const _Iter &, const _Sentinel &) [with _Iter=const char *, _Sentinel=const char *]"
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\xlocale(1971): here

@jakirkham
Copy link
Member

jakirkham commented Jan 10, 2024

Nicely done Tobias! 🥳 Thank you for all your hard work here! 🙏

Yeah agree this was not easy. Exploring with upstream whether we can starting using FindCUDAToolkit in dlib sooner, which would help alleviate some of the pain here. It might be worth raising any other pain points in new (or existing) issues so we can make this process smoother going forward

Everything you propose sounds reasonable

CUDA 12 ARM probably needs a newer GLIBC (due to cuDNN) ( conda-forge/conda-forge.github.io#1941 )

As to CUDA 11.2, happy with dropping it on Windows (or anywhere else for that matter). A few feedstocks have already dropped CUDA 11.2. Also have started a discussion about moving from CUDA 11.2 to 11.8 in conda-forge ( conda-forge/conda-forge-pinning-feedstock#5339 )

Edit: FWIW did find this Windows CUDA 11.2 issue ( actions/runner-images#3485 ), which might make sense of the issues we are seeing there

@Tobias-Fischer
Copy link
Contributor Author

@conda-forge-admin, please rerender

@Tobias-Fischer
Copy link
Contributor Author

@conda-forge-admin, please rerender

@Tobias-Fischer
Copy link
Contributor Author

@jakirkham @conda-forge/dlib do you want to take another look, please? All green now.

recipe/meta.yaml Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

Generally this looks good. Only had a couple suggestions on adding comments for future readers

Thanks again for all of your hard work here Tobias! 🙏

@Tobias-Fischer
Copy link
Contributor Author

Thanks @jakirkham for the review and your help - will add automerge label here :)

@Tobias-Fischer Tobias-Fischer added the automerge Merge the PR when CI passes label Jan 10, 2024
@github-actions github-actions bot merged commit 61a2b74 into conda-forge:main Jan 11, 2024
73 checks passed
Copy link
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • azure: passed

Thus the PR was passing and merged! Have a great day!

@jakirkham
Copy link
Member

Thanks Tobias! 🙏

Happy to help. Glad to see this in 😄

@jakirkham
Copy link
Member

It sounds like we need to set CMAKE_GENERATOR_TOOLSET. The interesting part is "must provide Visual Studio integration files in path .\extras\visual_studio_integration\ MSBuildExtensions" - no idea whether that's the case for conda-forge supplied cuda.

Circling back to this question, there is a CUDA Toolkit package for Visual Studio integration. Think that is what we would need for MSBuild support. Raised an issue about this: conda-forge/staged-recipes#28064

Certainly using a different CMake Generator than MSBuild would bypass this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Merge the PR when CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants