Exploring Task Selection for Intermediate-Task Transfer Learning

| Installation | Datasets | Prompt Transfer | Task Selection |

This repository contains the implementation of "Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning: A Systematic Study", covering prompt tuning and prompt transfer.

Installation

Python Version

  • Python >= 3.8

Environment

Create a conda environment and activate it.

conda create -n intermediate-task-selection
conda activate intermediate-task-selection

Install the dependencies first.

. install_dependencies.sh

We suggest installing in editable mode; this works for most cases.

pip install -e .

Datasets

We train 23 tasks across NLI, paraphrase detection, semantic similarity, question answering, reading comprehension, grammatical acceptability, word sense disambiguation, sentiment analysis, and coreference resolution.

We split these datasets by size: those with fewer than 1K training samples serve as target tasks, and those with more than 1K samples serve as source tasks.

In total, we select 13 source tasks and 10 target tasks. To run the following tasks, please use the provided names below:

Source Tasks

Source tasks are the datasets with richer annotations; each is paired with its evaluation metrics below. We train the 13 source tasks with prompt tuning and then use the resulting prompt weights to initialize continued training on the target tasks. We apply a learning rate of 5e-1 for all source tasks.

| Dataset | Metrics |
| --- | --- |
| mnli | accuracy |
| mnli_mismatched | accuracy |
| mnli_matched | accuracy |
| qqp | accuracy, f1 |
| qnli | accuracy |
| superglue-record | f1, em |
| cxc | pearson, spearmanr |
| squad | f1, em |
| drop | f1, em |
| sst2 | accuracy |
| winogrande | accuracy |
| hellaswag | accuracy |
| superglue-multirc | f1, em |
| cosmosqa | accuracy |
| race | accuracy |

Target Tasks

We train on the 10 target tasks as baselines and transfer each of the 13 source prompts to every target task. We apply a learning rate of 2.

| Dataset | Metrics |
| --- | --- |
| boolq | accuracy |
| cola | matthews_correlation |
| stsb | pearson, spearmanr |
| superglue-wic | accuracy |
| cr | accuracy |
| mrpc | accuracy, f1 |
| rte | accuracy |
| superglue-wsc | accuracy |
| superglue-copa | accuracy |
| cb | f1_multiclass, accuracy |

Prompt Transfer

This section covers intermediate-task transfer with prompt tuning. This involves:

  1. Prompt tuning initialized from token embeddings sampled from the language model's vocabulary.
  2. Prompt transfer initialized from a prompt pretrained on a source task.

To reproduce the results of Table 5.2 (effect of prompt transfer), you need to execute both the prompt tuning script and the prompt transfer script.

We applied the same configuration for both prompt tuning and prompt transfer.
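
To make the two initialization modes concrete, here is a rough sketch (not the repository's exact code; the checkpoint layout and function name are assumptions): prompt tuning samples rows of the language model's token embedding matrix, while prompt transfer loads a prompt already trained on a source task, e.g. from a prefix_shared.bin file.

```python
import torch

def init_prompt(embedding_matrix: torch.Tensor, prompt_length: int = 100,
                checkpoint_path: str = None) -> torch.nn.Parameter:
    """Return a trainable soft prompt of shape (prompt_length, hidden_dim).

    - Prompt tuning: sample `prompt_length` rows from the model's token
      embedding matrix (initialization from the vocabulary).
    - Prompt transfer: load a prompt already trained on a source task.
    """
    if checkpoint_path is not None:
        # Prompt transfer: reuse a pretrained source-task prompt.
        # (The checkpoint layout is an assumption; adapt the key to your files.)
        state = torch.load(checkpoint_path, map_location="cpu")
        prompt = state if isinstance(state, torch.Tensor) else list(state.values())[0]
    else:
        # Prompt tuning: sample token embeddings from the vocabulary.
        vocab_size = embedding_matrix.size(0)
        token_ids = torch.randint(0, vocab_size, (prompt_length,))
        prompt = embedding_matrix[token_ids].clone()
    return torch.nn.Parameter(prompt)
```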

You can find our example scripts under seq2seq/scripts. These scripts demonstrate prompt tuning, prompt transfer, and task selection using the configuration files located in seq2seq/configs. To run them, first change into the directory:

cd intermediate-task-selection/seq2seq

Configuration File and Arguments

When training a model with prompt tuning, all training code requires a configuration file defined in the configs folder. Our implementation organizes these files by fine-tuning method and model type, e.g., configs/prompt_tuning_tokens_config/t5-base. Feel free to create your own directory.

Note that we expose a subset of arguments in our main Python script (run_seq2seq.py) to enable flexible configuration of hyperparameters. These command-line arguments make it easy to sweep over different values, overriding the corresponding entries in the configuration files.
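
For orientation, a prompt tuning configuration pins down the model, task, prompt setup, and optimization hyperparameters. The sketch below writes such a file; the key names are hypothetical placeholders, so compare them against the provided files under configs/prompt_tuning_tokens_config/t5-base before using them.

```python
import json

# Hypothetical sketch: the key names below are placeholders and may not match
# the exact arguments run_seq2seq.py expects; mirror an existing config file.
config = {
    "model_name_or_path": "t5-base",
    "tasks": ["rte"],
    "prompt_length": 100,        # number of soft prompt tokens
    "learning_rate": 2.0,        # 2 for target tasks, 5e-1 for source tasks
    "output_dir": "outputs/prompt_tuning_rte",
}

with open("configs/prompt_tuning_tokens_config/t5-base/my_config.json", "w") as f:
    json.dump(config, f, indent=2)
```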

Run Prompt Tuning

To perform prompt tuning on a single source or target task, execute the following command. This script trains prompt tuning with the prompt initialized from tokens of the language model's vocabulary.

For target tasks we set the learning rate to 2, while for source tasks a learning rate of 5e-1 is used.

To run prompt tuning, please run the command:

. script/prompt_tuning_tokens_init.sh

To get the average performance, please run:

python dev/get_prompt_scores.py
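
The aggregation itself is straightforward. Below is a minimal sketch of averaging a metric across seeds, assuming each run directory contains a Hugging Face style eval_results.json with a metric key; the paths, directory layout, and key names are assumptions, and dev/get_prompt_scores.py handles the actual output format.

```python
import glob
import json
from statistics import mean

# Assumption: one output directory per seed, each with an eval_results.json
# containing a metric such as "eval_accuracy". Adapt paths/keys to your runs.
scores = [
    json.load(open(path))["eval_accuracy"]
    for path in glob.glob("outputs/prompt_tuning_rte_seed-*/eval_results.json")
]
print(f"mean accuracy over {len(scores)} seeds: {mean(scores):.4f}")
```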

Run Prompt Transfer

These commands train prompt tuning with the prompt initialized from pretrained prompt weights. Please point CKPTS to your prompt checkpoints.

. script/tf_prompt_tuning.sh

To get the relative performance, please run:

python dev/get_transfer_scores.py
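
Here we read "relative performance" as the gain of prompt transfer over the plain prompt tuning baseline on the same target task; the sketch below illustrates that computation (an assumption about the definition, not the exact logic of dev/get_transfer_scores.py).

```python
def relative_gain(transfer_score: float, baseline_score: float) -> float:
    """Relative performance of prompt transfer vs. the prompt tuning baseline,
    expressed in percent of the baseline score."""
    return 100.0 * (transfer_score - baseline_score) / baseline_score

# E.g. transfer 72.0 vs. prompt tuning baseline 68.0 on the same target task:
print(f"{relative_gain(72.0, 68.0):+.2f}%")  # +5.88%
```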

Creating Ranking Files

After training all models on the target tasks, you can create a ranking of the empirical prompt transfer results as the ground-truth reference for evaluating task embedding predictions. For example, to evaluate all 13 source tasks transferring to RTE, run:

python dev/get_transfer_ranking.py \
	--tgt_task=rte \
	--output_dir=PATH_TO_YOUR_DIR/spot_eval/transfer_ranking

We save the result file in --output_dir; it is required for the ranking evaluation. Each file is named eval_tgt-TASKNAME_seed-VALUE and contains the prompt tuning and prompt transfer performances, along with the ranking of intermediate tasks sorted by prompt transfer performance.

You can also run the following script once all training jobs are done:

. scripts/run_transfer_ranking.sh
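
Conceptually, the ground-truth ranking simply orders the source tasks by the transfer performance they achieve on the chosen target task, as described above. A minimal sketch with made-up scores:

```python
# Toy example: empirical transfer performance on one target task (e.g. rte),
# keyed by source task. The numbers are illustrative only.
transfer_scores = {"mnli": 71.5, "qnli": 69.3, "squad": 68.0, "sst2": 66.2}

# Ground-truth ranking: source tasks sorted by transfer performance.
ranking = sorted(transfer_scores, key=transfer_scores.get, reverse=True)
print(ranking)  # ['mnli', 'qnli', 'squad', 'sst2']
```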

Task Selection

In order to evaluate the transferability of a source task to a given target task empirically, one would need to run prompt tuning on all tasks. We provide code for estimating transferability via vocabulary similarity and via task embeddings built from prompt weights.

  • vocab similarity

Vocabulary similarity estimates the overlap between the vocabularies of two tasks (a sketch of both estimation heuristics appears at the end of this section). Please run:

. scripts/run_vocab_sim.sh
  • task embeddings

With our training scripts, the prompt weights are saved in the file prefix_shared.bin. All task embedding experiments compute the task embeddings from these weights.

Besides prompt similarity over the mean prompt (feature_mean), we provide additional constructions of task embeddings. The following values are supported for the --task_embedding_type argument: feature_mean, flatten, unigram, bigram, max_pairwise.

. scripts/get_prompt_similarity.sh

We save the predicted ranking file eval_tgt-TASKNAME_seed-VALUE.json in the --output_dir.
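
As a rough sketch of the two heuristics (a simplification under assumptions, not the repository's exact implementation): vocabulary similarity scores a source task by the overlap between the source and target task vocabularies, and the prompt-based task embedding, shown here for the feature_mean construction (the mean over prompt token vectors), ranks source tasks by cosine similarity between source and target prompts. All function names below are illustrative.

```python
import torch

def vocab_similarity(src_vocab: set, tgt_vocab: set) -> float:
    # Overlap of two task vocabularies; Jaccard overlap is one common choice
    # (the exact measure used by run_vocab_sim.sh may differ).
    return len(src_vocab & tgt_vocab) / len(src_vocab | tgt_vocab)

def feature_mean(prompt: torch.Tensor) -> torch.Tensor:
    # Task embedding as the mean over the prompt tokens:
    # (prompt_length, hidden_dim) -> (hidden_dim,).
    return prompt.mean(dim=0)

def rank_sources_by_prompt(tgt_prompt: torch.Tensor, src_prompts: dict) -> list:
    # Rank source tasks by cosine similarity between task embeddings.
    tgt_emb = feature_mean(tgt_prompt)
    sims = {
        name: torch.cosine_similarity(feature_mean(p), tgt_emb, dim=0).item()
        for name, p in src_prompts.items()
    }
    return sorted(sims, key=sims.get, reverse=True)
```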

Evaluation on Task Selection

We evaluate with metrics such as ndcg and regret_at_k, selected via the --method argument. To assess a task selection method (random, size-based, text embedding-based, or task embedding-based), both a prediction file and a reference file are required.

python dev/get_ndcg.py \
	--pred_dir=PATH_TO_YOUR_DIR \
	--target_dir=PATH_TO_YOUR_DIR \
	--output_dir=PATH_TO_YOUR_DIR \
	--method=ndcg

You can replace ndcg with regret_at_k or top_k_performance.
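
For orientation, the following is a minimal NDCG computation for a predicted source-task ranking, using the empirical transfer performances as relevance gains with a standard log2 position discount; dev/get_ndcg.py may differ in details such as the gain definition, and the scores below are made up.

```python
import math

def ndcg(predicted_ranking: list, relevance: dict) -> float:
    """NDCG of a predicted source-task ranking.

    relevance maps each source task to its empirical transfer performance,
    which serves as the gain; positions are discounted by log2(rank + 1)."""
    def dcg(order):
        return sum(relevance[t] / math.log2(i + 2) for i, t in enumerate(order))
    ideal = sorted(relevance, key=relevance.get, reverse=True)
    return dcg(predicted_ranking) / dcg(ideal)

# Toy example with made-up transfer scores:
rel = {"mnli": 71.5, "qnli": 69.3, "squad": 68.0, "sst2": 66.2}
print(ndcg(["qnli", "mnli", "sst2", "squad"], rel))
```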

To evaluate the random baseline, pass the corresponding boolean flag directly:

python dev/get_ndcg.py --ranking_by_random

For the data size method, please run:

python dev/get_ndcg.py --ranking_by_size

Fine-tuning methods and adding tasks

This repo builds on COMPACTER and contains implementations of recent parameter-efficient fine-tuning methods. For full fine-tuning, please run:

. scripts/baseline.sh

Other parameter-efficient fine-tuning methods (Adapter, AdapterDrop, Low-Rank, BitFit, Compacter, Compacter++, PHM-Adapters, Intrinsic-SAID) are available in the scripts folder. Please check the scripts for details.

If you wish to add a new task, you will need to create a new dataset class in /data/{tasks,postprocessors}.py and its corresponding configuration file. For example, when running a task, a configuration file such as prompt_tuning_tokens_config/t5-base/prompt_tuning_tokens_init_boolq.json is required.
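
As a very rough, hypothetical sketch of what such a dataset class can look like, assuming a structure loosely similar to the existing tasks in data/tasks.py (the class layout, method names, and dataset id below are placeholders; copy an existing task such as boolq as the real template):

```python
from datasets import load_dataset

class MyNewTask:
    # Placeholder task definition; align the attributes and hooks with an
    # existing class in data/tasks.py and register it the same way.
    name = "my_new_task"
    labels_list = ["0", "1"]
    metric = ["accuracy"]

    def load(self, split):
        # Load the raw split from the Hugging Face hub (dataset id is a placeholder).
        return load_dataset("my_org/my_new_dataset", split=split)

    def preprocessor(self, example):
        # Map one raw example to the seq2seq (source, target) text format.
        return {
            "source": f"sentence: {example['sentence']}",
            "target": self.labels_list[example["label"]],
        }
```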

Contact Information

For help or issues using our code, please submit an issue or contact [email protected]
