We will use Cloud Shell in the Google Cloud Platform (GCP) Console for all the steps below.
- Create a Google Cloud (GCP) Account
- GPU quota enabled in GCP
- Open Cloud Shell from GCP Console
- This section is adapted from fastai documentation. Please see acknowledgements and references below.
- We want to use Docker, CUDA 10.1 and the conda package manager, leveraging existing Google Cloud Deep Learning VM images
- Please ensure GPU quota is enabled; otherwise, refer to Step 3 of the fastai GCP setup link above
export IMAGE_FAMILY="pytorch-latest-gpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="drml"
export INSTANCE_TYPE="n1-highmem-4"
gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator="type=nvidia-tesla-t4,count=1" \
--machine-type=$INSTANCE_TYPE \
--boot-disk-size=200GB \
--metadata="install-nvidia-driver=True"
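Once the create command returns, it can be useful to confirm that the VM has actually reached the RUNNING state before trying to SSH in. The following is a minimal sketch, assuming the INSTANCE_NAME and ZONE variables exported above are still set in the same Cloud Shell session; the helper name instance_status is only illustrative and not part of the project.

```shell
# Hypothetical helper: print the lifecycle status of a Compute Engine VM.
# Usage: instance_status <instance_name> <zone>
instance_status() {
  gcloud compute instances describe "$1" --zone="$2" --format='value(status)'
}

# Example (run in Cloud Shell after the create command):
#   instance_status "$INSTANCE_NAME" "$ZONE"    # RUNNING once the VM is up
```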
- Ensure your project has been set in Cloud Shell; if not, execute
gcloud config set project <project_id>
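To see whether a default project is already configured, the active value can be read back with gcloud config get-value. A small sketch follows; the helper name project_is_set is an assumption made for illustration.

```shell
# Hypothetical helper: succeed only if a default project is configured.
# gcloud prints nothing on stdout (and "(unset)" on stderr) when no project is set.
project_is_set() {
  project="$(gcloud config get-value project 2>/dev/null)"
  [ -n "$project" ] && [ "$project" != "(unset)" ]
}

# Example:
#   project_is_set || gcloud config set project <project_id>
```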
- Login to VM from Cloud Shell
gcloud compute ssh --zone=us-west1-b jupyter@drml
- Create a new tmux session so that you can leave training running after closing Cloud Shell
tmux new-session -A -s airsimenv
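The -A flag makes new-session attach to the named session if it already exists, so the same command can be reused after reconnecting. To check which sessions exist without attaching, something along these lines may help; session_state is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a named tmux session already exists.
# Usage: session_state <session_name>
session_state() {
  if tmux has-session -t "$1" 2>/dev/null; then
    echo "exists"
  else
    echo "missing"
  fi
}

# Example:
#   session_state airsimenv
#   tmux ls    # lists all sessions on the VM
```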
- Get project code
git clone https://github.com/raymondng76/IRS-Practice-Module-Dev.git
- Create conda environment
sudo /opt/conda/bin/conda create -n airsim python=3.6.7
- Activate conda environment
conda activate airsim
- Install packages:
pip install -r IRS-Practice-Module-Dev/requirements.txt
- Get AirSim:
git clone https://github.com/microsoft/AirSim.git
- Update settings file
rm AirSim/docker/settings.json
cp IRS-Practice-Module-Dev/airsim\ settings/settings.json.nodisplay AirSim/docker/
mv AirSim/docker/settings.json.nodisplay AirSim/docker/settings.json
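The three commands above amount to overwriting AirSim/docker/settings.json with the no-display template. As a sketch, the same result can be reached with a single copy that names the destination file directly; install_settings is an illustrative helper name, not something shipped with the project.

```shell
# Hypothetical helper: copy a settings template over the active settings file.
# Usage: install_settings <template_path> <docker_dir>
install_settings() {
  # -f overwrites an existing settings.json; the template itself is left in place
  cp -f "$1" "$2/settings.json"
}

# Example, run from the VM home folder (quotes handle the space in the folder name):
#   install_settings "IRS-Practice-Module-Dev/airsim settings/settings.json.nodisplay" AirSim/docker
```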
- Attach to the airsimenv tmux session created earlier (the -A flag attaches if the session already exists):
tmux new-session -A -s airsimenv
cd AirSim/docker
- Execute the build script, targeting Ubuntu 18.04 and CUDA 10.1:
python build_airsim_image.py \
--base_image=nvidia/cudagl:10.1-devel-ubuntu18.04 \
--target_image=airsim_binary:10.1-devel-ubuntu18.04
- Verify docker image built:
docker images | grep airsim
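For a check that can be used in a script rather than read by eye, docker image inspect returns a non-zero exit code when the tag is absent. A minimal sketch, with image_exists as a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the given image tag exists locally.
# Usage: image_exists <image:tag>
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Example:
#   image_exists airsim_binary:10.1-devel-ubuntu18.04 && echo "image is ready"
```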
- To use the default Blocks environment run:
./download_blocks_env_binary.sh
- To use a packaged AirSim Unreal Environment, for example Neighborhood:
wget https://github.com/microsoft/AirSim/releases/download/v1.2.0Linux/Neighborhood.zip
- For additional environments that can run on Linux, go to Microsoft AirSim Linux Release 1.2.0
- Unzip to AirSim docker dir
unzip Neighborhood.zip -d .
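Before launching, it may be worth confirming that the environment's launcher script was extracted and is executable, since unzip does not always preserve the execute bit. The sketch below assumes the launcher path Neighborhood/AirSimNH.sh used in the run step; ensure_launcher is an illustrative name.

```shell
# Hypothetical helper: make sure an environment launcher script exists
# and is executable. Usage: ensure_launcher <path_to_script>
ensure_launcher() {
  if [ ! -f "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  # Add the execute bit only if it is not already set
  [ -x "$1" ] || chmod +x "$1"
}

# Example, run from AirSim/docker:
#   ensure_launcher Neighborhood/AirSimNH.sh
```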
- Run environment in headless mode:
./run_airsim_image_binary.sh airsim_binary:10.1-devel-ubuntu18.04 Neighborhood/AirSimNH.sh -windowed -ResX=1080 -ResY=720 -- headless
- Replace the environment bash file as required.
- Note: in the settings.json file, no-display mode has also been set up to conserve resources.
- Detach tmux session:
ctrl-b ctrl-b d
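After detaching, one way to confirm that the simulator container is still up is to ask Docker for running containers created from the AirSim image. This is only a sketch; container_running is a hypothetical helper, and it assumes the image tag built earlier.

```shell
# Hypothetical helper: succeed if at least one running container was
# started from the given image. Usage: container_running <image:tag>
container_running() {
  [ -n "$(docker ps --quiet --filter "ancestor=$1")" ]
}

# Example:
#   container_running airsim_binary:10.1-devel-ubuntu18.04 || echo "AirSim container is not running"
```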
- Create new session named code:
tmux new-session -A -s code
- Activate conda environment
conda activate airsim
- Ensure the Python dependencies have been installed, then execute the commands below
- Execute
gdown 'https://drive.google.com/uc?id=1ciGqwUpfNPQu_Ua7cowU8mDIXOG_9kkf'
- Unzip the weights:
unzip Final_Weights_Models.zip
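A quick check that the archive unpacked as expected can save a failed copy later. The sketch below tests for the subfolders referenced in the following steps (Yolov3_drone_weights and, for resumed runs, RDQN_Single_Model); check_dirs is an illustrative helper name.

```shell
# Hypothetical helper: verify that each named subfolder exists under a root.
# Usage: check_dirs <root> <subfolder> [<subfolder> ...]
check_dirs() {
  root="$1"
  shift
  for name in "$@"; do
    [ -d "$root/$name" ] || { echo "missing: $root/$name" >&2; return 1; }
  done
}

# Example, run from the folder where the zip was extracted:
#   check_dirs Final_Weights_Models Yolov3_drone_weights RDQN_Single_Model
```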
- Execute
- Copy YOLOv3 model weights to the IRS-Practice-Module-Dev main directory:
cp -r Final_Weights_Models/Yolov3_drone_weights/ IRS-Practice-Module-Dev/weights
- (Optional) If you are continuing training, copy the existing RL model/iteration weights to IRS-Practice-Module-Dev/code
cd ..
- e.g.
cp -r Final_Weights_Models/RDQN_Single_Model/3rd_Iteration/* IRS-Practice-Module-Dev/code
- Execute the required model training file in the IRS-Practice-Module-Dev/code folder. Note: this is for the initial run. To resume or continue a run, please see item 6.
cd code
python <model>.py --verbose
- Detach tmux session:
ctrl-b ctrl-b d
- You can close Cloud Shell and let training continue
- OPTIONAL: Setting up Stackdriver monitoring for CPU utilization is recommended so that any stop in training can be detected. An alert threshold of CPU utilization < 40% for 5 min is recommended. More information can be found in the codelabs or the documentation
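As a rough local stand-in for such an alert (not a replacement for Stackdriver, and not part of the project), CPU utilization can be estimated on the VM from two readings of /proc/stat. The helper names cpu_sample and cpu_busy_pct below are assumptions made for this sketch, and the 40% and 5 minute figures simply mirror the recommendation above.

```shell
# Print "total idle" jiffies from the aggregate cpu line of /proc/stat (Linux only).
# Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
cpu_sample() {
  awk '/^cpu /{t=0; for(i=2;i<=9;i++) t+=$i; print t, $5}' /proc/stat
}

# Integer busy percentage between two samples.
# Usage: cpu_busy_pct <total1> <idle1> <total2> <idle2>
cpu_busy_pct() {
  dt=$(( $3 - $1 ))
  di=$(( $4 - $2 ))
  [ "$dt" -gt 0 ] || { echo 0; return; }
  echo $(( (dt - di) * 100 / dt ))
}

# Example: warn if the VM was less than 40% busy over a 5 minute window.
#   s1="$(cpu_sample)"; sleep 300; s2="$(cpu_sample)"
#   [ "$(cpu_busy_pct $s1 $s2)" -ge 40 ] || echo "training may have stopped"
```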
- Login to VM with user as jupyter from Cloud Shell
gcloud compute ssh --zone=us-west1-b jupyter@drml
- View code progress or AirSim output by typing tmux attach -t <SESSION>, where <SESSION> can be airsimenv or code
- If for some reason (e.g. unexpected errors or VM restart) you need to resume training, use the following command
python <model>.py --verbose --load_model
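The choice between a fresh run and a resumed run can be sketched as a small helper that adds --load_model only when a saved model file is present. This assumes the saved file lives at save_model/<model>.h5 inside the code folder, as the transfer step suggests; training_command and the model name rdqn in the example are hypothetical.

```shell
# Hypothetical helper: print the training command for a model script,
# adding --load_model only when a saved .h5 file already exists.
# Usage: training_command <model_name> [save_dir]   (save_dir defaults to save_model)
training_command() {
  model="$1"
  save_dir="${2:-save_model}"
  if [ -f "$save_dir/$model.h5" ]; then
    echo "python $model.py --verbose --load_model"
  else
    echo "python $model.py --verbose"
  fi
}

# Example, run from IRS-Practice-Module-Dev/code (rdqn is a placeholder name):
#   $(training_command rdqn)
```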
- Install croc:
curl https://getcroc.schollz.com | bash
- From the VM home folder, execute the following for the required model:
croc send IRS-Practice-Module-Dev/code/save_model/<model>.h5
- Note the passcode output by the previous command and execute the following on your local machine:
croc -yes <passcode>
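To confirm that the model file arrived intact, its SHA-256 digest can be compared on the VM and on the local machine. A minimal sketch follows, assuming sha256sum is available on both sides (on macOS, shasum -a 256 is the usual equivalent); file_digest is an illustrative helper name.

```shell
# Hypothetical helper: print only the SHA-256 digest of a file.
# Usage: file_digest <path>
file_digest() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Example: run on both machines and compare the two digests by eye.
#   file_digest IRS-Practice-Module-Dev/code/save_model/<model>.h5   # on the VM
#   file_digest <model>.h5                                           # on local
```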