
upsample noise to concatenate with quantized representation #13

Open

Jillian2017 opened this issue Jun 15, 2018 · 6 comments


@Jillian2017

Hi, while reading the paper I saw that it proposes an optional choice of concatenating the representation with noise v. In your code, a dcgan_generator is used to generate the noise v, and its output size is [,32,64,32]. If the dataset is Cityscapes, the input image is resized to 512×1024 and the feature maps' size is 32×64×C (C = 8), so I think the concatenated feature map z's size is [,32,64,32+8] = [,32,64,40]. My questions are:
1. Why generate the noise with a DCGAN network?
2. If the input size changes, for example training ADE20k with input size 512×512, then the noise's size no longer matches the quantized representation for concatenation, so do we need to change the DCGAN network?
3. Adding noise will increase the bpp by a large margin, as its output size is so big.
These are my questions; looking forward to your reply.
Thanks for sharing your code.
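For concreteness, the shape arithmetic above can be sketched in NumPy (the batch size of 1 and the exact channel counts are just illustrative, matching the numbers in the question):

```python
import numpy as np

# Hypothetical shapes for a 512x1024 Cityscapes input downsampled to 32x64:
# the quantized representation has C = 8 channels, the upsampled noise 32.
w_hat = np.random.randn(1, 32, 64, 8)   # quantized representation
v = np.random.randn(1, 32, 64, 32)      # noise upsampled by the generator

# Concatenation happens along the last (channel) axis, so only the
# spatial dimensions [32, 64] must agree between the two tensors.
z = np.concatenate([w_hat, v], axis=-1)
print(z.shape)  # (1, 32, 64, 40)
```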

@Justin-Tan
Owner

Hello,

  1. The DCGAN framework is an arbitrary choice; I have used it before in WGAN with reasonable results, but you may have success trying other architectures, e.g. https://github.com/igul222/improved_wgan_training/blob/master/gan_64x64.py. The authors have not yet specified the details of noise upsampling, so this is my best guess; they may have a more principled method. If you come across any new insights while reading the paper, I would definitely like to know more.
  2. The noise is just concatenated to the last channel dimension so this is independent of the [height, width] dimensions of the input image.
  3. I think we can consider the noise vector part of the generator architecture and just save the quantized representation to disk, so the entropy of the quantized vector is still an upper bound on the bpp.
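The upper-bound argument in point 3 can be sketched numerically. This assumes a hypothetical 5-level quantizer and the 32×64×8 map from the question; only the quantized symbols are stored, so their empirical entropy bounds the bits per pixel:

```python
import numpy as np

# Hypothetical 5-level quantizer output with the shapes from the question:
# a 32x64x8 quantized map for a 512x1024 input image.
rng = np.random.default_rng(0)
symbols = rng.integers(0, 5, size=(32, 64, 8))

# Empirical entropy in bits per symbol.
_, counts = np.unique(symbols, return_counts=True)
p = counts / counts.sum()
entropy = -np.sum(p * np.log2(p))

# Upper bound on bpp: only the quantized map goes to disk; the noise is
# regenerated at decode time, so it costs nothing.
bpp = entropy * symbols.size / (512 * 1024)
print(entropy, bpp)
```

With a near-uniform 5-level quantizer the entropy is close to log2(5) ≈ 2.32 bits per symbol, regardless of how many noise channels the generator concatenates.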

@Jillian2017
Author

Thank you for your timely and generous reply.
For answer 1: I also have no idea about the noise mentioned in the paper, so thanks for sharing your thoughts here.
Have you ever trained these networks on ADE20k, and how were your results? I am working on this part, but my results are quite bad.

@Justin-Tan
Owner

To produce the images in their paper, the authors train ADE20k for 50 epochs using the semantic label maps as additional information. I haven't tried training on ADE20k yet, because I don't have the compute power to spare right now but I will update the readme when I do. One possible explanation for the disparity in image quality is that the authors incorporate a VGG loss (Section 5.2 in the paper) based on intermediate activations in the VGG network, which I haven't implemented yet.

Of course, if the results are bad for general images despite appearing to work well on street scenes, it is highly possible there is a mistake in the implementation somewhere.
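A minimal sketch of such a feature-matching loss, with plain arrays standing in for the intermediate VGG activations (the layer choice and weights here are assumptions, not the paper's exact Section 5.2 setup):

```python
import numpy as np

def perceptual_loss(feats_real, feats_fake, weights=None):
    """Weighted MSE between intermediate activations of a feature extractor.

    In the paper's setup the activations would come from a pretrained VGG
    network; any lists of matching-shape arrays work for this sketch.
    """
    if weights is None:
        weights = [1.0] * len(feats_real)
    return sum(w * np.mean((fr - ff) ** 2)
               for w, fr, ff in zip(weights, feats_real, feats_fake))

# Identical features give zero loss; differing features give a positive one.
a = [np.ones((4, 4, 64)), np.ones((2, 2, 128))]
b = [np.zeros((4, 4, 64)), np.ones((2, 2, 128))]
print(perceptual_loss(a, a))  # 0.0
print(perceptual_loss(a, b))  # 1.0
```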

@Jillian2017
Author

Okay, got it. I will train on ADE20k again, without the semantic maps and with the VGG loss. Thanks a lot.

@chenxianghu

@Justin-Tan We are looking forward to your compression results on the ADE20k dataset; can you try it? @Jillian2017 and I have tried it, but the results are not so good. Maybe some code modification is needed for ADE20k. Thank you very much!

@Justin-Tan
Owner

Yes, I think implementing the VGG perceptual loss may help. Unfortunately I am quite busy at the moment, but it is top of the to-do list.
