This repository is the official TensorFlow implementation of "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation".
Jiaxuan You*, Bowen Liu*, Rex Ying, Vijay Pande, Jure Leskovec, Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
- Install rdkit. Using Anaconda is recommended; please refer to the official website for further details:
conda create -c rdkit -n my-rdkit-env rdkit
- Install mpi4py, networkx:
conda install mpi4py
pip install networkx==1.11
- Install OpenAI baseline dependencies:
cd rl-baselines
pip install -e .
- Install customized molecule gym environment:
cd gym-molecule
pip install -e .
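After the installation steps above, a quick check that every package can be found may save a failed run later. The following is a minimal sketch, not part of the repository; the import names `baselines` and `gym_molecule` are assumptions based on the folder names used in this README.

```python
import importlib.util

# Import names assumed for the packages installed above; "baselines" and
# "gym_molecule" are guesses based on the folder names in this repository.
REQUIRED_MODULES = ["rdkit", "mpi4py", "networkx", "baselines", "gym_molecule"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be located."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat any lookup failure as "not installed".
            status[name] = False
    return status


# Print one line per module so missing packages are easy to spot.
for module_name, found in check_modules(REQUIRED_MODULES).items():
    print(f"{module_name}: {'ok' if found else 'MISSING'}")
```

Run it inside the conda environment created above; any line marked MISSING points to an installation step that needs to be repeated.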
There are 4 important files:
- run_molecule.py is the main code for running the program. You may tune all kinds of hyper-parameters there.
- The molecule environment code is in gym-molecule/gym_molecule/envs/molecule.py.
- RL related code is in the rl-baselines/baselines/ppo1 folder: gcn_policy.py is the GCN policy network; pposgd_simple_gcn.py is the PPO algorithm specifically tuned for GCN policy.
- single process run
python run_molecule.py
- multiple processes run
mpirun -np 8 python run_molecule.py 2>/dev/null
2>/dev/null redirects standard error to /dev/null, which hides the warnings printed by the rdkit package (along with any other error output).
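The two launch modes above can also be driven from Python, which may be convenient when scripting several runs. This is a hedged sketch rather than anything shipped with the repository: `build_command` and `launch` are hypothetical helper names, and the code assumes `python` and `mpirun` are on the PATH and that it is run from the directory containing run_molecule.py.

```python
import subprocess


def build_command(num_processes=1, script="run_molecule.py"):
    """Build the launch command for a single- or multi-process run."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    base = ["python", script]
    if num_processes == 1:
        return base
    # Multiple processes are started through mpirun, as in the command above.
    return ["mpirun", "-np", str(num_processes)] + base


def launch(num_processes=1, hide_stderr=True):
    """Run training and return the exit code.

    hide_stderr=True mirrors the 2>/dev/null redirection.
    """
    stderr = subprocess.DEVNULL if hide_stderr else None
    return subprocess.run(build_command(num_processes), stderr=stderr).returncode


# Example (starts a real training run, so it is left commented out):
# launch(num_processes=8)
```

Setting `hide_stderr=False` is worth trying when a run exits unexpectedly, since the redirection hides real errors as well as rdkit warnings.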
We highly recommend using tensorboard to monitor the training process. To do this, you may run
tensorboard --logdir runs
All molecules generated during training are stored in the molecule_gen folder, with each run configuration written to a separate csv file. Molecules are stored as SMILES strings, along with the scores of the desired properties.
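To inspect the generated molecules, the csv files can be read with the standard library. The sketch below is illustrative only: the exact column layout is not documented here, so it assumes the first column of each row is the SMILES string and the remaining columns are scores, which it keeps as raw strings. `load_molecules` is a hypothetical helper name.

```python
import csv
import glob
import os


def load_molecules(folder="molecule_gen"):
    """Collect rows from every csv file in `folder`.

    Assumption: the first column is a SMILES string and the remaining
    columns are property scores. Check a real file before relying on this.
    """
    records = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if not row:
                    continue  # skip blank lines
                records.append({
                    "file": os.path.basename(path),  # one file per run configuration
                    "smiles": row[0].strip(),
                    "scores": row[1:],  # kept as strings; convert once the layout is known
                })
    return records
```

For example, `len({r["smiles"] for r in load_molecules()})` would give the number of distinct SMILES strings across all runs.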