Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move lapack_info_check inside of onemkl_cusolver_host_task #238

Open
wants to merge 12 commits into
base: develop
Choose a base branch
from

Conversation

JackAKirk
Copy link
Contributor

Description

In some cases cuSolver operations can return a successful error code while failing. The previous implementation of this check is done via SYCL and requires the CPU to wait until the cuSolver function completes. This is not great for performance as it prevents the user for issuing more work to the queue until it has received a response from oneMKL.

This PR moves the check inside the host task.

@AidanBeltonS I've updated the cusolver_batch.cpp cases too, can you check it is OK?

@@ -571,7 +569,6 @@ inline sycl::event getrf_batch(const char *func_name, Func func, sycl::queue &qu
// Allocate memory with 32-bit ints then copy over results
std::uint64_t ipiv_size = stride_ipiv * batch_size;
int *ipiv32 = (int *)malloc_device(sizeof(int) * ipiv_size, queue);
int *devInfo = (int *)malloc_device(sizeof(int) * batch_size, queue);

auto done = queue.submit([&](sycl::handler &cgh) {
int64_t num_events = dependencies.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dependencies loop can be simplified in the batch with depends_on_events(cgh, dependencies);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one. I've made these changes.

@@ -581,16 +578,17 @@ inline sycl::event getrf_batch(const char *func_name, Func func, sycl::queue &qu
onemkl_cusolver_host_task(cgh, queue, [=](CusolverScopedContextHandler &sc) {
auto handle = sc.get_handle(queue);
auto a_ = reinterpret_cast<cuDataType *>(a);
auto devInfo_ = reinterpret_cast<int *>(devInfo);
auto scratchpad_ = reinterpret_cast<cuDataType *>(scratchpad);
auto ipiv_ = reinterpret_cast<int *>(ipiv32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also remove this unnecessary reinterpret cast as ipiv32 is already an int *

@AidanBeltonS
Copy link
Contributor

LGTM!

@JackAKirk
Copy link
Contributor Author

@ericlars Could we get a review of this from someone? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants