Testing & Benchmark #5
Thanks for your comments. We have indeed used the TensorFlow benchmarks in our evaluation and have experimented with both. We will commit our TensorFlow experimental setup in due time. Our Docker container should already be sufficient to re-run our ResNet-50 experiments; I will document this shortly too.
Thanks for your prompt answer ;) I will be very happy to report back the results I obtain on 8x Tesla V100. Thanks a lot once again, all the best.
Experiments with ResNet-50 on 8x V100 certainly align with our course of action - I am about to give them a go. I am more than happy to share this setup with you. Re: paper improvements, besides the variable-update and all-reduce strategies used, what else would you consider missing from the experimental setup? Feel free to ping me with additional comments. I will leave this issue open to keep you informed as I make progress with your requests.
Thanks for the additional information, much appreciated. If you want, we can set up a call so that I can launch experiments with your help on DGX-1 and DGX-2 systems (8x Tesla V100 16GB, 8x Tesla V100 32GB, and 16x Tesla V100 32GB).

A few practical points:

- Make sure you use the TFRecords version of ImageNet; it allows you to maximise throughput.
- This kind of system can really be interesting, but you need to compare on all factors, e.g. average CPU load for TF.Distributed/Horovod/Crossbow. It is quite likely that your approach is much more intensive on CPU/RAM, for example, as you launch more threads ;) So it is important to highlight that point, or people may report that they cannot reproduce your results because they do not have as much RAM as you.
- Metrics: imgs/sec seems the best performance proxy to measure throughput.

I genuinely think this is really interesting and want to try it as soon as possible. However, the publication felt a little unclear at first read.
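Since imgs/sec is the proposed throughput proxy, a small helper for aggregating it from benchmark logs may be handy when comparing runs. A minimal sketch in Python; the log-line format assumed here (`... images/sec: 362.1 +/- 0.5 ...`) follows the style printed by tf_cnn_benchmarks and is an assumption, so adjust the pattern to your actual logs.

```python
import re
from statistics import mean

# Matches per-step throughput lines in the style printed by
# tf_cnn_benchmarks, e.g. "100  images/sec: 362.1 +/- 0.5 (jitter = 2.3)".
# The exact log format is an assumption; tweak the regex if yours differs.
STEP_RE = re.compile(r"images/sec:\s*([0-9]+(?:\.[0-9]+)?)")

def mean_images_per_sec(log_lines):
    """Return the mean imgs/sec over all matching lines, or None if none match."""
    rates = [float(m.group(1))
             for line in log_lines
             if (m := STEP_RE.search(line))]
    return mean(rates) if rates else None

if __name__ == "__main__":
    sample = [
        "10  images/sec: 360.0 +/- 0.5 (jitter = 2.3)",
        "20  images/sec: 364.0 +/- 0.4 (jitter = 2.1)",
    ]
    print(mean_images_per_sec(sample))  # prints 362.0
```

Averaging over the logged steps (rather than quoting a single peak number) makes comparisons across TF.Distributed, Horovod, and Crossbow less noisy.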
Hi,
Very interesting work. I have some remarks:
In your paper, you often speak about performance "compared to TensorFlow", but which distribution strategy do you mean? Which strategy: xring? nccl?
I guess you ran the experiments on vanilla TF; could we see the code you used to collect these numbers? By the way, if I understood correctly, you used a non-official implementation that is not the best one available; comparing against https://github.com/tensorflow/benchmarks would be a lot more interesting.
It offers different tf.distributed strategies plus Horovod.
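For a sweep over those strategies, the tf_cnn_benchmarks driver in tensorflow/benchmarks accepts a `--variable_update` flag. A hedged sketch that only builds the command lines, assuming the flag names from that repository's README; the data path and batch size are placeholders, not values from the paper:

```python
# Build tf_cnn_benchmarks command lines for several distribution
# strategies. Flag names follow tensorflow/benchmarks' tf_cnn_benchmarks;
# the data_dir and batch size below are hypothetical placeholders.

def benchmark_cmd(variable_update, num_gpus=8, all_reduce_spec=None):
    """Return the argv list for one ResNet-50 benchmark run."""
    cmd = [
        "python", "tf_cnn_benchmarks.py",
        "--model=resnet50",
        "--data_name=imagenet",
        "--data_dir=/data/imagenet-tfrecords",  # hypothetical TFRecords path
        f"--num_gpus={num_gpus}",
        "--batch_size=64",  # placeholder; match the paper's setting
        f"--variable_update={variable_update}",
    ]
    if all_reduce_spec:
        cmd.append(f"--all_reduce_spec={all_reduce_spec}")
    return cmd

# Strategies exposed by tf_cnn_benchmarks; horovod runs need an
# mpirun/horovodrun wrapper around the command.
STRATEGIES = [
    ("parameter_server", None),
    ("replicated", "nccl"),  # NCCL all-reduce
    ("distributed_replicated", None),
    ("horovod", None),
]

if __name__ == "__main__":
    for vu, spec in STRATEGIES:
        print(" ".join(benchmark_cmd(vu, all_reduce_spec=spec)))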
And by the way, if we could have a Docker container and a script to test your results with Crossbow, that would be interesting ;) => RN50 seems a decent benchmark, as you pointed out ;)
Thanks for your help, and congrats on this interesting project.