🐊 Large Language Models as Instructors: A Study on Multilingual Clinical Entity Extraction

👁️ Description

This repository contains the code for our weak supervision experiments on the E3C dataset, annotated with InstructGPT-3 and with a dictionary.

Using the E3C dataset, we compare models trained on each type of annotation, across all languages, in both monolingual and multilingual settings.

🚀 Quick start

poetry install

Train the model with the default configuration:

python weak_supervision/train.py

Train the model with an experiment configuration chosen from configs/experiment/:

python weak_supervision/train.py experiment={experiment_name}

You can override any parameter from the command line like this:

python weak_supervision/train.py trainer.max_epochs=20 data.batch_size=64
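
To make the override syntax concrete, here is a minimal sketch of how dotted `key=value` pairs map onto a nested configuration. It is a simplified illustration, not Hydra's actual implementation; the helper names `parse_value` and `apply_overrides` and the default values are hypothetical.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Interpret a raw override value as bool, int, float, or plain string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'dotted.key=value' overrides to a nested dict, in place."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the nesting, creating intermediate levels if needed.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return config


# Hypothetical defaults, for illustration only; the real values live in configs/.
defaults = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}
updated = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
print(updated)
```

Each override names a path through the configuration tree, so `trainer.max_epochs=20` only touches the `max_epochs` entry of the `trainer` group and leaves the other defaults unchanged.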

To deploy the project, run:

docker build -t weak_supervision .
docker run -v $(pwd):/workspace/project -e WANDB_API_KEY=$WANDB_API_KEY --gpus all -it --rm weak_supervision zsh

⚗️ Experiments

Here is a description of each experiment listed in the Makefile. The corresponding configurations are in the Hydra folder configs/experiment:

layer_2_comparison: Performance comparison between two encoder models trained on layer 2, one with dictionary-based weak supervision annotations and one with InstructGPT-3 annotations.
layer_2_validation_comparison: Same setup, but comparing manual and InstructGPT-3 annotations on the layer 2 subset.
layer_2_blended_comparison: Same experiment as layer_2_comparison, but a small amount of manual annotation is added to each dataset.
layer_2_blended_methods: We experiment with different ratios for blending the dictionary and InstructGPT-3 annotations.
layer_2_xlm: We train a model on all the available layer 2 data (all languages are used) and compare dictionary-based weak supervision annotations with InstructGPT-3 annotations.
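
As an illustration of what blending two annotation sources by ratio can look like, here is a minimal sketch. It assumes blending happens per document, with a fixed fraction of documents taking their labels from InstructGPT-3 and the rest from the dictionary; this is an assumption made for illustration, not necessarily how the experiment configurations implement it. The function name `blend_annotations` and the toy data are hypothetical.

```python
import random


def blend_annotations(dictionary_docs, gpt_docs, gpt_ratio, seed=0):
    """Mix two weakly annotated versions of the same corpus.

    gpt_ratio is the fraction of documents whose labels come from the
    InstructGPT-3 annotations; the others keep the dictionary annotations.
    Both inputs are assumed to be aligned lists over the same documents.
    """
    if len(dictionary_docs) != len(gpt_docs):
        raise ValueError("both annotation sets must cover the same documents")
    if not 0.0 <= gpt_ratio <= 1.0:
        raise ValueError("gpt_ratio must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    n_docs = len(gpt_docs)
    n_gpt = round(gpt_ratio * n_docs)
    gpt_indices = set(rng.sample(range(n_docs), n_gpt))
    return [
        gpt_docs[i] if i in gpt_indices else dictionary_docs[i]
        for i in range(n_docs)
    ]


# Toy corpus: each item is (document id, annotation source).
dictionary_docs = [(f"doc{i}", "dictionary") for i in range(10)]
gpt_docs = [(f"doc{i}", "gpt") for i in range(10)]

blended = blend_annotations(dictionary_docs, gpt_docs, gpt_ratio=0.3)
n_from_gpt = sum(1 for _, source in blended if source == "gpt")
print(n_from_gpt, "of", len(blended), "documents use InstructGPT-3 annotations")
```

Sweeping `gpt_ratio` from 0 to 1 goes from a purely dictionary-annotated training set to a purely InstructGPT-3-annotated one, which is the kind of range a ratio experiment would explore.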
