
Questions related to TRT conversion and TRT-LLM support #26

Open
shixianc opened this issue Aug 28, 2023 · 3 comments
Labels
enhancement New feature or request non-stale

Comments

@shixianc

I have two separate questions that I could not find answers to, so I'm posting them here in the hope that someone can answer:

  1. When doing TRT conversion from TorchScript to TRT, does nav call `polygraphy surgeon sanitize` to perform steps like constant folding? This is helpful when dealing with larger models. It seems nav uses Polygraphy under the hood, but I want to check whether it also sanitizes.

  2. There's an alpha release of TRT-LLM, a tool that combines TensorRT and FasterTransformer. Is supporting this tool on your roadmap? As a nav user, I like the simpler interface it provides compared to doing compilation/conversion in multiple steps. It would be great to see future support for LLMs.
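For reference, the sanitization step asked about in question 1 is normally run as a Polygraphy CLI command (`polygraphy surgeon sanitize ... --fold-constants`). A minimal sketch of building that command from Python, with placeholder file names:

```python
# Sketch: construct the `polygraphy surgeon sanitize` CLI invocation that
# performs constant folding on an ONNX model. The file paths are placeholders;
# this only builds the command list (it does not require polygraphy installed).
import shlex


def sanitize_cmd(onnx_in: str, onnx_out: str) -> list[str]:
    """Return the argv list for a constant-folding sanitize pass."""
    return shlex.split(
        f"polygraphy surgeon sanitize {onnx_in} --fold-constants -o {onnx_out}"
    )


# The resulting list can be passed to subprocess.run() on a machine
# where polygraphy is installed.
cmd = sanitize_cmd("model.onnx", "model_folded.onnx")
```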

@ptarasiewiczNV
Collaborator

Hi @shixianc ,

Thank you for the questions.

  1. Currently, we do not use polygraphy surgeon, but it is on our roadmap, and we're aiming to support it as early as next month.

  2. Our plans for supporting TRT-LLM are still under discussion, but we're definitely interested in integrating it where Navigator can provide assistance. We're also open to suggestions – if you have any specific use cases where you see Model Navigator being helpful, please feel free to share.

Best regards,
Piotr

@shixianc
Author

shixianc commented Sep 20, 2023

@ptarasiewiczNV

Thank you for the reply. Regarding 1: it would be nice to have that, as some of our models are small enough to load on a 16 GB GPU but go OOM during compilation, and it seems the sanitize scripts would help reduce the ONNX model size.

We also tried out TRT-LLM. It looks promising when we compare its benchmarks against FasterTransformer's, since it provides the latest MHA attention optimization techniques. However, building its engines requires many steps and parameters. I think model-navigator could package all those scripts and provide a higher-level, user-friendly API on top of TRT-LLM.

These are a few suggestions from an external user; you likely have more up-to-date information than I do on where these projects are heading.
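To make the suggestion concrete, here is a purely illustrative sketch of the kind of one-call wrapper the comment asks for. `EngineConfig` and `build_engine` are hypothetical names invented for this example, not a real model-navigator or TRT-LLM API; the step descriptions only stand in for TRT-LLM's actual checkpoint-conversion and engine-build commands:

```python
# Hypothetical sketch of a high-level wrapper that packages TRT-LLM's
# multi-step engine build behind a single call. All names here are
# invented for illustration; no real API is implied.
from dataclasses import dataclass


@dataclass
class EngineConfig:
    model_dir: str
    dtype: str = "float16"
    max_batch_size: int = 8
    use_attention_plugin: bool = True  # toggle fused-MHA optimization


def build_engine(cfg: EngineConfig) -> list[str]:
    """Return the ordered build steps the wrapper would run for the user."""
    steps = [
        f"convert checkpoint in {cfg.model_dir} to TRT-LLM weight format",
        f"build engine (dtype={cfg.dtype}, max_batch_size={cfg.max_batch_size})",
    ]
    if cfg.use_attention_plugin:
        steps.append("enable fused multi-head attention plugin")
    return steps
```

A wrapper like this would let a user express the whole pipeline as one config object instead of memorizing each build script's flags.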


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 20, 2023
@jkosek jkosek added enhancement New feature or request non-stale and removed Stale labels Nov 21, 2023
3 participants