
Testing & Benchmark #5

Open
DEKHTIARJonathan opened this issue Feb 7, 2019 · 4 comments

@DEKHTIARJonathan commented Feb 7, 2019

Hi,

Very interesting work. I have some remarks:

  • In your paper, you often speak of performance "compared to TensorFlow". Which distribution strategy are you referring to: xring? nccl?

  • I guess you ran the experiments on vanilla TF; could we see the code you used to collect these numbers? Also, if I understood correctly, you used an unofficial implementation that is not the best one available; comparing against https://github.com/tensorflow/benchmarks would be a lot more interesting.
    It offers the different tf.distribute strategies plus Horovod.

  • And if we could have a Docker container and a script to test your results with Crossbow, that would be interesting ;) => RN50 seems a decent benchmark, as you pointed out ;)

Thanks for your help, and congrats on this interesting project.

@alexandroskoliousis (Collaborator)

Thanks for your comments. We did indeed use the TensorFlow benchmarks in our evaluation. We experimented with both the replicated and parameter_server (on the GPU) options for variable updates, and with nccl for all-reduce.
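For reference, invocations roughly like the following exercise those options (a sketch, not the exact commands from our runs; the GPU count and batch size are placeholders):

```bash
# Sketch: replicated variables with NCCL all-reduce.
python tf_cnn_benchmarks.py \
  --num_gpus=8 \
  --model=resnet50 \
  --batch_size=64 \
  --variable_update=replicated \
  --all_reduce_spec=nccl

# Sketch: parameter server variant, keeping the variables on the GPUs.
python tf_cnn_benchmarks.py \
  --num_gpus=8 \
  --model=resnet50 \
  --batch_size=64 \
  --variable_update=parameter_server \
  --local_parameter_device=gpu
```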

We will commit our TensorFlow experimental setup in due course. Our Docker container should already be sufficient to re-run our ResNet-50 experiments; I will document this shortly too.
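In the meantime, something along these lines should work (the image name and flags below are assumptions, not our documented instructions):

```bash
# Hypothetical image name -- check the repository's Docker
# instructions for the actual build/run steps.
docker build -t crossbow .
docker run --runtime=nvidia -it crossbow
```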

@DEKHTIARJonathan (Author)

Thanks for your prompt answer ;) I will be very happy to report back the results I obtain on 8x Tesla V100.
I think the paper could be improved by providing more details on what was tested and how, especially because this system targets efficiency and scalability ;) So this part should be as detailed as possible ;)

Thanks a lot once again,

All the best

@alexandroskoliousis (Collaborator)

Experiments with ResNet-50 on 8x V100 certainly align with our course of action - I am about to give it a go. I am more than happy to share this setup with you.

Re: paper improvements, besides the variable update and all-reduce strategy used, what else would you consider missing from the experimental setup? Feel free to ping me with additional comments.

I will leave this issue open to inform you as I make progress with your requests.

@DEKHTIARJonathan (Author) commented Feb 12, 2019

Thanks for the additional information, much appreciated.

If you want, we can set up a call so that I can launch experiments with your help on DGX-1 and DGX-2 (8x Tesla V100 16GB, 8x Tesla V100 32GB, and 16x Tesla V100 32GB).

  • Make sure that you use the TFRecords version of ImageNet; it maximises throughput.
  • Try FP16: for RN50 it is free performance at no accuracy cost, but it puts more load on your solution (can it keep up?).
  • Provide a simple container that we can build/pull easily.
  • Provide the exact command to reproduce the same setup as you did (a sketch of what I have in mind follows this list).
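For the TF baseline, for instance, something like this (assumed data path and batch size, using the tf_cnn_benchmarks flags):

```bash
# Sketch: FP16 run over the ImageNet TFRecords; the data path and
# batch size are placeholders.
python tf_cnn_benchmarks.py \
  --num_gpus=8 \
  --model=resnet50 \
  --batch_size=256 \
  --use_fp16 \
  --data_name=imagenet \
  --data_dir=/data/imagenet-tfrecords
```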


This kind of system can really be interesting; however, you need to compare on all factors (see the sketch after this list):

  • average CPU load: TF.Distributed / Horovod / Crossbow
  • average RAM used: TF.Distributed / Horovod / Crossbow
  • if multi-node, average network speed: TF.Distributed / Horovod / Crossbow
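For example, a minimal sampling sketch (it assumes the sysstat tools are installed; run it alongside each framework and stop it with Ctrl-C when the job finishes):

```bash
# Sample CPU, memory and per-interface network utilisation once per
# second while the training job runs (sar comes from sysstat).
sar -u 1 > cpu.log &      # average CPU load
sar -r 1 > ram.log &      # memory usage
sar -n DEV 1 > net.log &  # network traffic per interface
wait
```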

It's quite likely that your approach is much more intensive on CPU/RAM, for example, since you launch more threads ;) So it's important to highlight that point, or people may complain that they can't reproduce your results because they don't have as much RAM as you.

Metrics: imgs/sec seems the best proxy for measuring throughput.

I genuinely think this is really interesting and want to try it as soon as possible. However, the publication felt a little unclear at first read.
