GitHub - rbiswasfc/kaggle-nbme-3rd-place-solution: 3rd Place solution for NBME - Score Clinical Patient Notes Kaggle competition

Please refer to the following documentation to reproduce my solution for the NBME - Score Clinical Patient Notes competition.

If you run into any trouble with the setup/code or have any questions please contact me at [email protected]

HARDWARE

Colab Pro + (High RAM + GPU)

The following specs were used to create the original solution:

Ubuntu 18.04.5 LTS (Bionic Beaver) with 200 GB disk
8 vCPUs, 56 GB memory
1 x NVIDIA Tesla P100

SOFTWARE

Python 3.7
CUDA 11.2
Python packages are detailed separately in requirements.txt

It is assumed that the Kaggle API is installed.
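Before running anything, it can help to confirm that the environment matches these assumptions. The following is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper that only checks the Python version and whether a `kaggle` executable is on the PATH. It does not verify CUDA or the packages in requirements.txt.

```python
import shutil
import sys


def check_environment(expected_python=(3, 7)):
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []

    # Compare only major.minor, e.g. (3, 7).
    found = sys.version_info[:2]
    if found != tuple(expected_python):
        warnings.append(
            "Python {}.{} detected, expected {}.{}".format(
                found[0], found[1], expected_python[0], expected_python[1]
            )
        )

    # The Kaggle API package installs a `kaggle` command-line tool.
    if shutil.which("kaggle") is None:
        warnings.append("kaggle command not found on PATH (try `pip install kaggle`)")

    return warnings


if __name__ == "__main__":
    for message in check_environment():
        print("WARNING:", message)
```

An empty result only means these two checks passed. Downloading competition data additionally requires Kaggle API credentials to be configured.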

Please execute the following command from the top-level directory, i.e. the folder containing this file:

python convert_deberta_v2_v3_tokenizer.py --python_path <path_to_python_env>

where path_to_python_env is the path to the folder that contains the site-packages folder, e.g. /Users/rajabiswas/opt/anaconda3/envs/nbme_env/lib/python3.7/. This converts the slow tokenizer to the fast tokenizer for DeBERTa V2/V3 models.
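The value for --python_path can be derived from the active interpreter instead of being typed by hand. The sketch below is an assumed convenience workflow, not something the repository provides: `find_python_path` and `build_convert_command` are hypothetical helpers, and the script name and flag are taken from the command above. It relies on `sysconfig` reporting the package folder of the interpreter it runs under, so run it with the environment's own Python. On some Linux distributions that folder is named dist-packages rather than site-packages.

```python
import shlex
import sys
import sysconfig
from pathlib import Path


def find_python_path():
    """Return the folder that contains site-packages for the running interpreter."""
    site_packages = Path(sysconfig.get_paths()["purelib"])
    return site_packages.parent


def build_convert_command(python_path):
    """Build the argument list for the tokenizer conversion script."""
    return [
        sys.executable,
        "convert_deberta_v2_v3_tokenizer.py",
        "--python_path",
        str(python_path),
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    command = build_convert_command(find_python_path())
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the command keeps the step explicit. Once the path looks right, the printed line can be pasted into a shell from the top-level directory.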

MODEL BUILD:

There are two options to produce the solution.

1) ordinary prediction
    a) uses binary model in prod-models folder (~8 hours)
2) retrain models
    a) expect this to run around two weeks
    b) trains all models from scratch
    c) follow this with (1) to produce entire solution from scratch

For option 1:

Please follow the 5 steps detailed in "# Section B: NBME Predictions" of entry_points.md. This overwrites files in the outputs folder.

For option 2:

Please follow the 5 steps detailed in "# Section A: NBME Training" of entry_points.md. This overwrites files in the prod-models folder.
