Relationship between channels C and bpp #5

Open
chenxianghu opened this issue May 26, 2018 · 20 comments

@chenxianghu

You said "C=8 channels (compression to 0.072 bpp)" in the Results section.
I don't understand the relationship between C and bpp, can you explain it to me?

We compare this codec's compression performance against BPG:
PNG image -> decode -> our encoder -> our quantizer -> quantized representation

So the bpp comparison is between our quantized representation and the BPG-encoded file:
at the same bpp, which decoded image quality is better, or at the same decoded image quality, which bpp is lower. Is that right?

@Justin-Tan
Owner

If you read the original paper (https://arxiv.org/pdf/1804.02958.pdf), the upper bound on the bitrate is given by Eq. 5, where dim(w_hat) is determined by the number of channels C.
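
To make that concrete, a quick back-of-the-envelope calculation, assuming the paper's default setup of a 16x spatial downsampling in the encoder and L = 5 quantization centers (both values are assumptions taken from the paper, not read out of this repo):

import math

H, W = 512, 1024                        # Cityscapes resolution used here
C = 8                                   # bottleneck channels
L = 5                                   # assumed number of quantization centers (paper)
dim_w_hat = (H // 16) * (W // 16) * C   # assumed 16x spatial downsampling (paper)
bpp = dim_w_hat * math.log2(L) / (H * W)
print(round(bpp, 4))                    # -> 0.0726, consistent with the quoted 0.072 bpp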

@chenxianghu
Author

I want to test the performance of this model, so I modified the single_plot function as shown below and ran your compress.py. There are two steps:

  1. original image -> quantized representation: ~623 ms
  2. quantized representation -> reconstructed image: ~115 ms

Is my test method right? And what timings do you measure on your side?

If I want to build end-to-end image compression, I think I should save the quantized representation to a file on the sender side and recover the reconstructed image from that file on the receiver side, with both sender and receiver loading the well-trained model. Is my thinking right?

def single_plot(epoch, global_step, sess, model, handle, name, config, single_compress=False):

    real = model.example
    gen = model.reconstruction
    zz = model.z
    start = time.time()
    # Original: run encoder + decoder in a single pass.
    # r, g = sess.run([real, gen], feed_dict={model.training_phase: True, model.handle: handle})
    # Modified: first run the encoder + quantizer only.
    r, z = sess.run([real, zz], feed_dict={model.training_phase: True, model.handle: handle})
    print("encoder + quantizer time: {:.3f} s".format(time.time() - start))
    print('z shape:', z.shape)
    # print('z result:', z)
    start = time.time()
    # Then feed the quantized representation back through the generator.
    g = sess.run(gen, feed_dict={model.training_phase: True, model.z: z})
    print("generator time: {:.3f} s".format(time.time() - start))

@chenxianghu
Author

chenxianghu commented May 29, 2018

I tested your pre-trained model; the timings:

1. original image -> quantized representation: about 1.5 s
2. quantized representation -> reconstructed image: about 1 s

These results differ from my earlier numbers because my own input images are 256x256.

I also tested the model's quality on different images:

  1. images from leftImg8bit/train: good
  2. images from leftImg8bit/test: worse than images from the train dir
  3. images from the internet: terrible

Can this model be used to compress arbitrary images?

@Justin-Tan
Owner

Justin-Tan commented May 29, 2018

If you want to compress arbitrary images, train on a large dataset of natural images like ImageNet or the ADE20k dataset. The pretrained model was only trained on the Cityscapes dataset, which is a collection of street scenes from Germany and Switzerland.

The distribution of images in ImageNet/ADE20k will be more diverse, so the model will probably take longer to converge. To train on ADE20k, download the dataset from the link in the readme and pass the -ds ADE20k flag:

python3 train.py -ds ADE20k <args>

To train on ImageNet you will have to write your own data loader. I think it will work with the default setup, but you will have to check this.
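
A rough sketch of what such a loader could look like with tf.data (hypothetical, not code from this repo; the width-512 rescaling mirrors the ADE20k preprocessing, and the paths are illustrative):

import tensorflow as tf

def load_image(path):
    # Decode a JPEG and rescale so the width is 512px, keeping aspect ratio.
    image = tf.image.decode_jpeg(tf.read_file(path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    shape = tf.shape(image)
    new_h = tf.cast(shape[0] * 512 / shape[1], tf.int32)
    return tf.image.resize_images(image, [new_h, 512])

paths = tf.data.Dataset.list_files('/path/to/imagenet/*.JPEG')
dataset = paths.map(load_image).batch(1)   # batch 1: images have differing heights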

@chenxianghu
Author

First I trained my model on Cityscapes for 60 epochs, then continued training it on ADE20k for 10 epochs, and I find the compression quality became worse. Maybe the model hasn't converged. I think it is hard to compress arbitrary images with one model.

@Justin-Tan
Owner

Don't train on Cityscapes initially; just train on ADE20k. Make sure you pull the latest version, I fixed a couple of errors in the code.

It will take a long time for the model to converge on ADE20k; the authors originally trained for 50 epochs to get the results in the paper.

@chenxianghu
Author

OK, this morning I also read the paper and found I should train on ADE20k from scratch, but an error occurred:
it seems the shapes of self.w_hat and Gv didn't match, so I disabled noise sampling by adding a condition as below. Now it is working and training. Thank you!

        # Skip noise sampling for ADE20k: the spatial shape of Gv from
        # dcgan_generator does not match self.w_hat for these image sizes.
        if config.sample_noise is True and dataset != 'ADE20k':
            print('Sampling noise...')
            # noise_prior = tf.contrib.distributions.Uniform(-1., 1.)
            # self.noise_sample = noise_prior.sample([tf.shape(self.example)[0], config.noise_dim])
            noise_prior = tf.contrib.distributions.MultivariateNormalDiag(loc=tf.zeros([config.noise_dim]), scale_diag=tf.ones([config.noise_dim]))
            v = noise_prior.sample(tf.shape(self.example)[0])
            Gv = Network.dcgan_generator(v, config, self.training_phase, C=config.channel_bottleneck, upsample_dim=config.upsample_dim)
            print('Gv:', Gv)
            self.z = tf.concat([self.w_hat, Gv], axis=-1)
        else:
            self.z = self.w_hat

@wensihan

wensihan commented Jun 12, 2018

I modified the network as you did, but there is still a problem: Incompatible shapes: [1,3,688,512] vs. [1,3,683,512] at line 127 of model.py: distortion_penalty = config.lambda_X * tf.losses.mean_squared_error(self.example, self.reconstruction). Do you have any suggestions?

@wensihan

@chenxianghu

@chenxianghu
Author

The shapes of self.example and self.reconstruction should be the same; for the Cityscapes dataset that is
[1, 512, 1024, 3], i.e. [batch_size, height, width, channels].
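
The 688 vs. 683 mismatch suggests one input dimension is not a multiple of the network's total downsampling factor (16 here): 683 is not divisible by 16, so the decoder upsamples back to 688 rather than 683. One common workaround (a sketch; I don't know whether the repo's input pipeline already handles this) is to crop each dimension down to the nearest multiple of 16 before feeding the network:

import tensorflow as tf

def crop_to_multiple(image, factor=16):
    # Crop H and W down so that encoder downsampling and decoder
    # upsampling round-trip to exactly the input shape.
    shape = tf.shape(image)
    h = (shape[0] // factor) * factor
    w = (shape[1] // factor) * factor
    return image[:h, :w, :]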

@wensihan

wensihan commented Jun 12, 2018

I'm using the ADE20k dataset, only rescaling the width to 512px. Are there any changes to other parameters besides disabling the sample noise? @chenxianghu

@chenxianghu
Author

I modified many places:
1) made my own h5 file, using only the 200x200 to 975x975 JPEG images in ADE20k (as in the paper)
2) resized images to [512, 512], with no padding or cropping
3) used tf.image.decode_jpeg, not tf.image.decode_png
4) modified Network.dcgan_generator to adapt to [512, 512]

I think it is better to learn some basics first and then try to train your own model!
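
For example, steps 2 and 3 together look roughly like this (a sketch, not my exact code):

import tensorflow as tf

def preprocess(path):
    image = tf.image.decode_jpeg(tf.read_file(path), channels=3)   # step 3: JPEG, not PNG
    image = tf.image.convert_image_dtype(image, tf.float32)
    return tf.image.resize_images(image, [512, 512])               # step 2: plain resize, no pad/crop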

@wensihan

wensihan commented Jun 12, 2018

@chenxianghu First, thank you very much for your reply. I still have a question: does 200x200 to 975x975 mean that images larger or smaller than this range are excluded? And so the dataset contains fewer than 20210 training images, right?

@chenxianghu
Author

Yes, this is the description from the original paper:

Data sets: We train the proposed method on two popular data sets that come with hand-annotated semantic label maps, namely Cityscapes [42] and ADE20k [43]. Both of these data sets were previously used with GANs [12, 33], hence we know that GANs can model their distribution, at least to a certain extent. Cityscapes contains 2975 training and 500 validation images of dimension 2048×1024px, which we resampled to 1024×512px for our experiments. The training and validation images are annotated with 34 and 19 classes, respectively. From the ADE20k data set we use the SceneParse150 subset with 20 210 training and 2000 validation images of a wide variety of sizes (200×200px to 975×975px), each annotated with 150 classes. During training, the ADE20k images are rescaled such that the width is 512px.

@wensihan

wensihan commented Jun 12, 2018

I know this. I'm just puzzled: does the sentence "20 210 training and 2000 validation images of a wide variety of sizes (200×200px to 975×975px)" mean that the 20210 training images' sizes all vary within 200x200 to 975x975?

@chenxianghu
Author

I checked; some JPEG images' sizes are not in the range 200x200 to 975x975.
For example, ADE20K\images\training\h\hacienda\ADE_train_00008829.jpg is 1024x768.
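
A quick way to find them (a sketch; adjust the glob pattern to your local layout):

import glob
from PIL import Image

for path in glob.glob('ADE20K/images/training/**/*.jpg', recursive=True):
    w, h = Image.open(path).size
    if not (200 <= w <= 975 and 200 <= h <= 975):
        print(path, w, h)   # e.g. ADE_train_00008829.jpg -> 1024 768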

@wensihan

Yes, that's why I was puzzled... Okay, I see now: the training dataset is smaller than 20210 images. Thank you~

@Jillian2017

@chenxianghu Hi, do you add noise while training on ADE20k? I ran into an error caused by a mismatch between the noise's dimension and the encoder network's output, so I wonder if we have to change the method of generating the noise. Also, is your result acceptable on the ADE20k dataset? Mine is quite poor and the generator is not convergent after almost 40 epochs.

@chenxianghu
Author

@Jillian2017 I do add noise while training on the ADE20k dataset, by modifying the Network.dcgan_generator function to adapt to 512x512. My generated image quality is also poor after 40 epochs; some generated images even have strange colored patches that don't exist in the original images. Do you see this too? I don't know why.

@zhiqiang-zhu

@chenxianghu Hi, could you leave an email address? I would like to ask you some questions.
