-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BLAS] Simplify CublasScopedContextHandler #609
[BLAS] Simplify CublasScopedContextHandler #609
Conversation
…We don't need to do any early cleanup upon sycl context destruction.
9c6f8e7
to
8f547cd
Compare
@oneapi-src/onemkl-blas-write any thoughts on this? |
I very much like the simplification here. If a user has two cuda devices available, how do they specify which one to use? |
They would specify that at the creation of a |
Thanks for the review @andrewtbarker! In that case I will apply similar changes to other backends as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. I agree that the CublasScopedContextHandler
can be improved but I have 2 big concerns with the suggested changes.
Ideally we should have a closer look at the impact of this change on an application using oneMKL+cuBLAS or create an example that calls a few oneMKL functions.
b527920
to
ee669b8
Compare
…xtendedDeleter It seems the static thread_local unordered map needs to stay because of all the thread shenanigans. But we're removing the use of detail namespace in sycl since it's not necessary for correctness.
ee669b8
to
46a2661
Compare
I modified this PR such that the cublasHandle(s) are destroyed only in one place: at the end of the program. Because of all the threading shenanigans I believe the By removing the use of The good news is that those changes make AdaptiveCpp and DPC++ implementations identical (if that's of any benefit). |
Alright, that looks fine to me then.
Do you think it would be possible to remove |
Good point, I think those files can be removed but I don't have much experience with AdaptiveCpp so I wouldn't be able to test the build (at least right now). |
Looking at the internal CI, it does not seem that the blas domain compiles with AdaptiveCpp anyway. One of the issue is #567. I see that there are still some differences like |
Nowadays we only use the cuda primary context in DPC++, hence we can refactor
CublasScopedContextHandler
.unordered_map<context, handle>
to beunordered_map<device, handle>
,sycl::detail::contextSetExtendedDeleter
is not necessary. It provides additional feature tying the lifetime of cublasHandle to the corresponding sycl queue but it makes the code very complicated (with atomics etc.) and uses the detail namespace which is not ideal.cublas_handle.hpp
was modified such that it is not templated anymore. Both DPC++ and AdaptiveCpp versions use the same mapping so that could be simplified. Also, we need to set the correct context in the custom destructor in order to destroy cublas handles so the native type was necessary there.Side note:
We can definitely remove the dependency on UR headers just by changing the templated types to the native types.
test_main_blas_rt.txt
test_main_blas_ct.txt