Use the unit testing framework instead of the MGMN testing framework for _test_te.yaml #560

Open
yhtang opened this issue Feb 16, 2024 · 0 comments
Collaborator

yhtang commented Feb 16, 2024

Currently, the TE tests as implemented in _test_te.yaml consist of two parts:

  1. unit testing on V100 only
  2. multi-GPU testing on A100 via SLURM jobs.

This seems to make the testing setup unnecessarily complex.

We already have a V100/A100 unit testing framework, as exemplified in _test_jax.yaml, which allows the same unit-test and multi-GPU test logic to be run as a matrix over GPU types and over GPU counts from 1 to 8.
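
For illustration only, a matrix of this kind might look roughly like the sketch below. This is not the contents of _test_jax.yaml or _test_te.yaml: the job name, the runner labels, the specific GPU counts (1, 2, 4, 8), and the `run_te_tests.sh` script are all placeholders assumed for the example. Only `strategy.matrix`, `fail-fast`, and `runs-on` are standard GitHub Actions syntax.

```yaml
# Hypothetical sketch; names, labels, and the test script are assumptions.
jobs:
  te-tests:
    strategy:
      fail-fast: false              # let the other combinations finish if one fails
      matrix:
        gpu_arch: [V100, A100]      # GPU types to cover
        n_gpu: [1, 2, 4, 8]         # assumed sampling of the 1-8 GPU range
    # Assumed self-hosted runner labels; the real label scheme may differ.
    runs-on: [self-hosted, "${{ matrix.gpu_arch }}", "${{ matrix.n_gpu }}GPU"]
    steps:
      - name: Run TE tests
        # Placeholder command: the same test logic runs for every combination.
        run: ./run_te_tests.sh --num-gpus ${{ matrix.n_gpu }}
```

With a layout like this, the V100-only unit tests and the A100 multi-GPU tests would be entries of one matrix rather than two separately maintained code paths.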

@terrykong @ashors1 would you be able to refactor the TE tests to follow the JAX unit testing framework?


3 participants