DADS reward implementation #13

slee01 · 2021-10-31T02:10:18Z

Thank you for sharing your great code :)

I think the reward function differs slightly from the one defined in the paper (ICLR 2020):

dads/unsupervised_skill_learning/dads_agent.py

Lines 142 to 144 in abc37f5

    # final DADS reward
    intrinsic_reward = np.log(num_reps + 1) - np.log(1 + np.exp(
        np.clip(logp_altz - logp.reshape(1, -1), -50, 50)).sum(axis=0))

As far as I understand, the first term of the reward in eq. 6 of the paper is $\log q(s' \mid s, z) - \log \sum_{i=1}^{L} q(s' \mid s, z_i)$. However, the reward in this repo appears to be defined as $\sum_{i=1}^{L} \left[ \log q(s' \mid s, z) - \log q(s' \mid s, z_i) \right]$, computed using NumPy's broadcasting. Have I misunderstood something, or is there a practical technique I'm missing?
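A small NumPy check may help pin down what the quoted expression computes. The sketch below re-creates the snippet and compares it with two candidate formulas: the sum of log ratios described above, and a log ratio whose denominator is a log-sum-exp over the executed skill plus the $L$ alternatives. The shapes (`logp` as `(B,)`, `logp_altz` as `(L, B)`), the assumption `num_reps == L`, and the helper names `snippet_reward`, `sum_of_log_ratios` and `log_ratio_with_logsumexp` are hypothetical and chosen for illustration; they are not taken from the repository. The inputs are random log-probabilities, not outputs of a trained skill-dynamics model.

```python
import numpy as np


def snippet_reward(logp, logp_altz, num_reps):
    """Re-creates the expression quoted from dads_agent.py (lines 142-144).

    Assumed shapes (inferred from the reshape, not checked against the repo):
      logp:      (B,)   log q(s'|s,z) for the executed skill z
      logp_altz: (L, B) log q(s'|s,z_i) for L alternative skills z_i
    """
    return np.log(num_reps + 1) - np.log(1 + np.exp(
        np.clip(logp_altz - logp.reshape(1, -1), -50, 50)).sum(axis=0))


def sum_of_log_ratios(logp, logp_altz):
    # Reading from the question: sum_i [log q(s'|s,z) - log q(s'|s,z_i)]
    return (logp.reshape(1, -1) - logp_altz).sum(axis=0)


def log_ratio_with_logsumexp(logp, logp_altz, num_reps):
    # log(L+1) + log q(s'|s,z) - log(q(s'|s,z) + sum_i q(s'|s,z_i)),
    # i.e. a log ratio with z itself counted among L+1 candidates.
    all_logp = np.concatenate([logp.reshape(1, -1), logp_altz], axis=0)
    m = all_logp.max(axis=0)  # subtract the max for a stable log-sum-exp
    lse = m + np.log(np.exp(all_logp - m).sum(axis=0))
    return np.log(num_reps + 1) + logp - lse


rng = np.random.default_rng(0)
L, B = 5, 4
logp = rng.normal(size=B)
logp_altz = rng.normal(size=(L, B))

r_code = snippet_reward(logp, logp_altz, L)
print(np.allclose(r_code, log_ratio_with_logsumexp(logp, logp_altz, L)))  # True
print(np.allclose(r_code, sum_of_log_ratios(logp, logp_altz)))  # False
```

In this check the snippet matches the log-sum-exp form: `np.exp` is applied to each difference before `.sum(axis=0)`, so the sum runs over probability ratios $q(s' \mid s, z_i) / q(s' \mid s, z)$ rather than over log ratios. The clip to `[-50, 50]` has no effect on these inputs and presumably guards against overflow in `np.exp`.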
