Sync trainer state with evaluators #2733
Comments
Many handlers/metrics provide a |
Can I work on this? I am pretty new to this |
@jalajk24 right now it is still under discussion whether we need to work on something here. Do you have any ideas or suggestions on the topic? |
I am proposing a new API function for
@vfdev-5 does this make sense? It can be called like |
The core question of the issue is whether to abstract a |
Hey @louis-she, I guess the API can be helpful for comparing the performance of two or more training methods; it can also help when training ensemble models. I have been working on GANs and adversarial training, and I have noticed that you sometimes need to combine two training methods to get better results, so this may be a helpful addition in the |
@guptaaryan16 can you please give a concrete example of what you are talking about? |
Sure @vfdev-5, I think it would mostly be useful for hyperparameter tuning and for checking how results vary, which makes training easier, e.g. reducing the number of epochs and testing different training methods. For instance, here is something that happened when I was training a model on CIFAR-10 with Gaussian augmentation training (https://arxiv.org/abs/1902.02918) to measure the Average Certified Radius (ACR) of the model under randomized smoothing. I noticed that adding PGD adversarial training (https://arxiv.org/pdf/1706.06083.pdf) on top of the Gaussian augmentation training gave a very high ACR, but to find the right hyperparameters you need the current training epoch so you can see where the evaluators get their best results. So this API may be helpful, although you can also get that epoch without it. |
@guptaaryan16 thanks for the details, but I was wondering more about the code. Can you provide some code to illustrate your idea? As for HP tuning and multiple experiments, you can check
I think there is nothing impossible here. I imagine that you have a handler to run validation:

import torch

best_acr = 0.0

def run_validation():
    # best_acr is reassigned below, so declare it global to avoid UnboundLocalError
    global best_acr
    evaluator.run(val_data)
    metrics = evaluator.state.metrics
    if metrics["ACR"] > best_acr:
        best_acr = metrics["ACR"]
        current_epoch = trainer.state.epoch
        # save locally a bundle:
        fp = f"/path/to/output/{current_epoch}_best_acr.pt"
        torch.save({
            "best_acr": best_acr,
            "epoch": current_epoch,
            "model": model.state_dict(),
            # ...
        }, fp) |
Yes @vfdev-5, I do not have the specific code for that, but I imagine it was written along the same lines (that project did not use ignite). |
🚀 Feature
There can be use cases where we would like to get the trainer's epoch/iteration and/or other items from trainer.state. Let's propose an API such that we could easily get the trainer's state from the evaluator.
Context: https://discuss.pytorch.org/t/get-current-epoch-inside-process-function-of-evaluator/162926