Issues: NVIDIA/Megatron-LM
Issues list
[BUG] 0.9.0 release version got param_gather_handle error with 3d parallel
#1292 opened Nov 19, 2024 by SeunghyunSEO
[QUESTION] How to convert torch_dist format checkpoint to torch format?
#1291 opened Nov 19, 2024 by zhangyilalala
Where can I download the tokenizer for the model mcore-llava-mistral-7b-instruct-clip336-pretraining?
#1281 opened Nov 11, 2024 by herolxl
[QUESTION] is there any restriction to use allgather with moe_expert_capacity_factor?
#1277 opened Nov 7, 2024 by Louis-J
[BUG] TP-comm-overlap bug when replacing TELayerNormColumnParallelLinear into TEColumnParallelLinear.
#1275 opened Nov 6, 2024 by wplf
[BUG] The cached_loss_mask may be modified unexpectedly in GPTDataset?
#1269 opened Nov 1, 2024 by shmily326
[QUESTION] How to use loader_mcore and why it requires torch distributed
#1266 opened Oct 29, 2024 by KookHoiKim
[ENHANCEMENT] Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining
#1263 opened Oct 28, 2024 by dhia680
[ENHANCEMENT] Add layer name in a layer to improve code debugging
#1198 opened Oct 4, 2024 by rybakov