I have two separate questions that I couldn't find answers to yet, so I'm posting them here in the hope that someone can answer:

1. When doing a TRT conversion from TorchScript to TensorRT, does nav call `polygraphy surgeon sanitize` to do things like constant folding? This is helpful when dealing with larger models. It seems nav uses Polygraphy under the hood, but I want to check whether it also sanitizes. (See the sketch after these questions for the kind of command I mean.)
2. There's an alpha release of the TRT-LLM tool, which combines TensorRT and FasterTransformer. Is supporting this tool on your roadmap? As a nav user, I like the simpler interface it provides compared to doing compilation/conversion in multiple steps. It would be great to see future support for LLMs.
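To make question 1 concrete, this is the kind of sanitize pass I mean. It's only a minimal sketch: the file names are placeholders, and `--fold-constants` is the option I'm referring to.

```bash
# Fold constants and clean up the exported ONNX graph before building the TRT engine.
# File names are placeholders.
polygraphy surgeon sanitize model.onnx --fold-constants -o model_folded.onnx
```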
Currently, we do not use `polygraphy surgeon`, but it is on our roadmap, and we're aiming to support it as early as next month.
Our plans for supporting TRT-LLM are still under discussion, but we're definitely interested in integrating it where Navigator can provide assistance. We're also open to suggestions – if you have any specific use cases where you see Model Navigator being helpful, please feel free to share.
Thank you for the reply. Regarding 1, it would be nice to have that: some of our models are small enough to fit on a 16GB GPU, but compilation went OOM, and it seems the sanitize step would help reduce the ONNX model size.
We also tried out TRT-LLM. It looks promising when we compare its benchmarks against FasterTransformer, since it provides the latest multi-head attention optimization techniques. However, its engine building requires many steps and parameters. I think model-navigator might be able to package all the scripts and provide a higher-level, user-friendly API on top of TRT-LLM (a rough sketch of what I mean follows below).
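To be clear, this is a purely hypothetical sketch of the kind of single-call interface I'm imagining; the `nav.llm` namespace and all the parameter names below are made up, standing in for whatever would wrap TRT-LLM's checkpoint-conversion and engine-build steps:

```python
import model_navigator as nav  # real package; everything used below is hypothetical

# Hypothetical one-call wrapper around TRT-LLM's multi-step build
# (checkpoint conversion, engine build, config generation).
package = nav.llm.optimize(            # "nav.llm" does not exist -- illustrative only
    model="path/to/hf_checkpoint",     # placeholder for a local HF-format checkpoint
    dtype="float16",
    max_batch_size=8,
    max_input_len=2048,
    max_output_len=512,
)
package.save("my_llm.nav")             # hypothetical packaged engine + runtime configs
```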
These are just a few suggestions from an external user; you likely have more up-to-date information than I do on where these tools are heading.