Hello,
First off, thanks for making this repo, and for all the other awesome work you do putting out high-quality, open-source implementations of SOTA ML papers!
I've been having issues getting ConditionalFlowMatcherWrapper.sample to work. For training, I wrapped the ConditionalFlowMatcherWrapper in a simple PyTorch Lightning module and trained the model in mixed-precision FP16 like so:
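(Paraphrasing from memory rather than pasting the exact code — the dataloader and the ConditionalFlowMatcherWrapper construction are defined elsewhere, and the module/variable names here are just placeholders:)

```python
import pytorch_lightning as pl
import torch

class CFMLitModule(pl.LightningModule):
    def __init__(self, cfm_wrapper):
        super().__init__()
        # cfm_wrapper is a pre-built ConditionalFlowMatcherWrapper instance
        self.cfm_wrapper = cfm_wrapper

    def training_step(self, batch, batch_idx):
        # assuming the wrapper's forward returns the flow matching loss directly
        loss = self.cfm_wrapper(batch)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)

# cfm_wrapper and train_dataloader are built elsewhere
trainer = pl.Trainer(accelerator='gpu', devices=1, precision=16)
trainer.fit(CFMLitModule(cfm_wrapper), train_dataloader)
```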
I'm using torch 2.0 and A100 GPUs, so attend.py defaults to Flash Attention, which only works with FP16 AFAIK. Training works fine as long as I include the precision=16 part, so no issues here.
However, when I load the model for sampling and call .to('cuda:0').half() on both it and the input batch, I get the error: expected scalar type Half but found Float
It seems to be originating from here. Going down the stack trace, it looks like the inputs to the cdist call in the Vector Quantizer, here, are fp32, even though everything further up the stack trace is fp16. Does this have to do with the @autocast here, maybe? I'm not sure what I can do to get around this, since all of my inputs and the model are in fp16.
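To illustrate the kind of mismatch I mean (this is not the repo's actual code, just a standalone sketch of the mechanism I'm guessing at, with made-up shapes): an autocast-disabled region that upcasts its input back to fp32 will clash with buffers/weights that were converted to fp16 by .half(), even though everything outside that region is fp16.

```python
import torch
from torch.cuda.amp import autocast

@autocast(enabled=False)
def toy_quantize(x, codebook):
    x = x.float()                    # upcast back to fp32 inside the quantizer
    return torch.cdist(x, codebook)  # codebook is still fp16 here

x = torch.randn(4, 8, device='cuda', dtype=torch.float16)
codebook = torch.randn(16, 8, device='cuda', dtype=torch.float16)  # as left by .half()
# toy_quantize(x, codebook)  # fails with a Half/Float dtype mismatch
```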
I also tried not calling half() on the model or data, but then attend.py throws a No kernel available error because it can't do flash attention with FP32. I tried making a slight modification at line 105 to avoid using flash:
if self.flash and q.dtype == torch.float16:
    return self.flash_attn(q, k, v, mask = mask)
but then I get a cryptic cuFFT error: CUFFT_INTERNAL_ERROR.
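Is the intended approach maybe to keep the model in fp32 and only wrap sampling in autocast, so the attention kernels still see fp16? Something like this (untested on my side, and cond is just a stand-in for whatever sample() actually expects):

```python
import torch

# model kept in fp32, no .half() anywhere
cfm_wrapper = cfm_wrapper.to('cuda:0').eval()

with torch.no_grad(), torch.autocast(device_type='cuda', dtype=torch.float16):
    output = cfm_wrapper.sample(cond)  # `cond` is a placeholder for the conditioning input
```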
Any help would be greatly appreciated. Thank you!