This repository contains code for using the
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition. In Proceedings of EMNLP 2024 (Findings).
Note: Despite being proposed specifically for visual storytelling, this method is generalizable and can be extended to any task involving model-generated outputs with corresponding references.
Install Python (e.g., version 3.11) and the other dependencies provided in requirements.txt, e.g., using:
pip install -r requirements.txt
For generating stories using the models and settings proposed in this work, refer to this documentation.
For computing visual grounding scores (G), check out the GROOViST repository.
For computing coherence (C) and repetition (R) scores, use the following utility adapted from RoViST, e.g.:
python evaluate/eval_C_R.py -i ./data/stories/vist/gt_test.json -o ./data/scores/vist/gt_test
Note 1: Download the pre-trained ALBERT model from here and place it under the data/ folder.
Note 2: The requirements for this step differ; check out the evaluate/requirements file.
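When several story files need scoring, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: it assumes only the `-i` and `-o` flags shown in the example, and the helper names (`build_eval_command`, `commands_for_directory`) and the idea that every `*.json` file in a directory is a story file are hypothetical.

```python
import sys
from pathlib import Path


def build_eval_command(input_json, output_prefix, script="evaluate/eval_C_R.py"):
    """Return the argument list for one run of the C/R scoring utility.

    Only the -i and -o flags from the example above are used.
    """
    return [sys.executable, script, "-i", str(input_json), "-o", str(output_prefix)]


def commands_for_directory(stories_dir, scores_dir):
    """Build one command per JSON file in stories_dir (hypothetical layout).

    The output prefix mirrors the example: the input file name without
    its .json extension, placed under scores_dir.
    """
    commands = []
    for story_file in sorted(Path(stories_dir).glob("*.json")):
        output_prefix = Path(scores_dir) / story_file.stem
        commands.append(build_eval_command(story_file, output_prefix))
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failing run stops the batch instead of being silently skipped.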
Similar to Step 1A.
For obtaining aggregate
python dHM.py -d VIST
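To illustrate how per-dimension scores for reference and model-generated outputs might be combined into a single number, here is a rough sketch. It is an assumption, not a description of what `dHM.py` does: it takes the absolute difference between the mean reference score and the mean model score for each dimension (C, G, R) and averages those differences. The function names and the dictionary layout are hypothetical; consult `dHM.py` for the actual computation.

```python
from statistics import mean


def dimension_distance(reference_scores, model_scores):
    """Absolute gap between the mean reference and mean model score."""
    return abs(mean(reference_scores) - mean(model_scores))


def aggregate_distance(reference, model, dimensions=("C", "G", "R")):
    """Average the per-dimension gaps (assumed aggregation, see lead-in).

    `reference` and `model` map a dimension name to a list of
    per-story scores for that dimension.
    """
    return mean(dimension_distance(reference[d], model[d]) for d in dimensions)


# Toy values for illustration only; these are not results from the paper.
reference = {"C": [0.8, 0.6], "G": [0.6, 0.6], "R": [0.9, 0.9]}
model = {"C": [0.5, 0.5], "G": [0.9, 0.9], "R": [0.8, 0.8]}
score = aggregate_distance(reference, model)
```

Under this assumed definition, a lower value means the model's outputs sit closer to the references on the chosen dimensions. Because nothing in the sketch is specific to stories, the same pattern applies to other tasks with model outputs and corresponding references.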
🔗 If you find this work useful, please consider citing it:
@inproceedings{
EMNLP 2024 Findings (to appear)
}