
CreateComputeInstance() shows “Not Supported” #65

Open
ytaoeer opened this issue Jul 8, 2023 · 8 comments

Comments

@ytaoeer

ytaoeer commented Jul 8, 2023

Go: 1.18
go-nvml: v0.12.0-1
The code is below:
[screenshot]
[screenshot]
The GPU instance can be created with go-nvml, and from the command line I can create both the GI (GPU instance) and the CI (compute instance).

@ytaoeer
Author

ytaoeer commented Jul 9, 2023

NVIDIA-SMI 535.54.03
Driver Version: 535.54.03
CUDA Version: 12.2
NVIDIA A100-PCIE-40GB

ytaoeer changed the title from CreateComputeInstance() shows “not support” to CreateComputeInstance() shows “Not Supported” on Jul 9, 2023
@elezar
Member

elezar commented Jul 10, 2023

> The GPU instance can be created with go-nvml, and from the command line I can create both the GI (GPU instance) and the CI (compute instance).

Which commands (I assume nvidia-smi) do you use?

@ytaoeer
Author

ytaoeer commented Jul 10, 2023

Yes. With "sudo nvidia-smi mig -i 0 -cgi 19 -C" I can create both the GI and the CI.
But with the go-nvml library I can only create the GI; creating the CI with CreateComputeInstance() fails.

@elezar
Copy link
Member

elezar commented Jul 10, 2023

Just as a sanity check: do you have multiple A100 devices available? In the example you show above you access the device with index 1, whereas the nvidia-smi command accesses device 0.

For what it's worth, we use the following flow to create a compute instance:

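```go
// Create the GPU instance from giProfileInfo (e.g. as returned by
// device.GetGpuInstanceProfileInfo).
gi, ret := device.CreateGpuInstance(&giProfileInfo)
if ret != nvml.SUCCESS {
	return fmt.Errorf("error creating GPU instance: %v", ret)
}

// Ask the GPU instance for the compute instance profile info
// (profile 0, engine profile 0) instead of constructing it by hand.
ciProfileInfo, ret := gi.GetComputeInstanceProfileInfo(0, 0)
if ret != nvml.SUCCESS {
	return fmt.Errorf("error getting compute instance profile info: %v", ret)
}

_, ret = gi.CreateComputeInstance(&ciProfileInfo)
if ret != nvml.SUCCESS {
	return fmt.Errorf("error creating compute instance: %v", ret)
}
```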

in some of our tooling. Note the call to GetComputeInstanceProfileInfo; I believe you can use 0 for both arguments.

@ytaoeer
Author

ytaoeer commented Jul 10, 2023

Yes, this node has 4 A100s. I will try using GetComputeInstanceProfileInfo to get the CI profile info. Thanks!

@ytaoeer
Author

ytaoeer commented Jul 10, 2023

It works, thank you very much. So I guess the problem was constructing the ComputeInstanceProfileInfo myself. But according to the source code, the Go library only passes ComputeInstanceProfileInfo.Id to NVML:
[screenshot]

@elezar
Member

elezar commented Jul 10, 2023

As far as I am aware, there are different versions of the ComputeInstanceProfileInfo struct (or at least a versioned struct was introduced recently). It may be that constructing it manually produced the wrong version, and this is what caused the error.

Actually, it may also be that the profile ID you used and the Id in the returned info are not the same. Could you print the returned ciProfileInfo struct and confirm its Id value?

@ytaoeer
Author

ytaoeer commented Jul 11, 2023

I am using the release version go-nvml v0.12.0-1. The output is below:
[screenshot]
[screenshot]

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants