chore: update readme to keeptrack the flow
honghanhh committed Oct 18, 2024
1 parent 93d37f9 commit 588c312
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions lib/questions_eval/README.md
@@ -0,0 +1,34 @@
# How to evaluate the performance of generated synthetic data?

## Datasets

@Simon updating ...

## Metrics

| Term | Definition | Formula | Interpretation |
|---|---|---|---|
| Coverage score | How comprehensively the summary covers the content of the original document. | 100 − X, where X is the percentage of document-generated questions that receive an "IDK" (I Don’t Know) response when answered from the summary. | A higher coverage score indicates that the summary captures more of the original details and is less generic. |
| Conformity score | Whether the summary avoids contradicting the document. | 100 − X, where X is the percentage of questions for which the summary’s answer is "NO" and the document’s is "YES", or vice versa. | A higher conformity score indicates closer agreement between the summary and the document. |
| Consistency score | The degree of non-hallucination, based on the factual accuracy of the summary relative to the document. | 100 − X, where X is the percentage of summary-derived questions that receive an "IDK" response when answered from the document, indicating factual discrepancies. | A higher consistency score suggests that the summary is more factual and contains fewer inaccuracies or fabrications. |
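
The three formulas in the table can be sketched in a few lines of Python. This is only an illustration, not the code in `run.py`: the function names, the plain `"YES"` / `"NO"` / `"IDK"` string labels, and the choice of all questions as the denominator for conformity are assumptions made for the example.

```
from typing import Sequence


def _idk_percentage(answers: Sequence[str]) -> float:
    """Percentage of answers equal to "IDK" (hypothetical helper)."""
    if not answers:
        raise ValueError("at least one answer is required")
    return 100.0 * sum(a == "IDK" for a in answers) / len(answers)


def coverage_score(summary_answers: Sequence[str]) -> float:
    """Answers to document-generated questions, given only the summary."""
    return 100.0 - _idk_percentage(summary_answers)


def consistency_score(document_answers: Sequence[str]) -> float:
    """Answers to summary-derived questions, given only the document."""
    return 100.0 - _idk_percentage(document_answers)


def conformity_score(
    document_answers: Sequence[str], summary_answers: Sequence[str]
) -> float:
    """Penalize questions where one side says YES and the other says NO."""
    if not document_answers or len(document_answers) != len(summary_answers):
        raise ValueError("answer lists must be non-empty and of equal length")
    contradictions = sum(
        {doc, summ} == {"YES", "NO"}
        for doc, summ in zip(document_answers, summary_answers)
    )
    # Assumption: the percentage is taken over all questions.
    return 100.0 - 100.0 * contradictions / len(document_answers)


if __name__ == "__main__":
    print(coverage_score(["YES", "IDK", "NO", "YES"]))  # 1 IDK out of 4
    print(
        conformity_score(
            ["YES", "NO", "YES", "YES"], ["YES", "YES", "IDK", "NO"]
        )
    )  # 2 contradictions out of 4
    print(consistency_score(["YES", "IDK", "IDK", "NO", "YES"]))  # 2 IDK out of 5
```

In this sketch an "IDK" paired with a "YES" or "NO" is not counted as a contradiction; it only lowers the coverage or consistency score, depending on which side produced it.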

## Implementation

Command:
```
python run.py -m model=llama3.1-405b-local samples=10 num_questions=5
```

Scripts:
```
cd ./open-nlp/lib/questions_eval
bash/experiments/super_tiny.sh
```

## References
- [SemScore: Evaluating LLMs with Semantic Similarity](https://huggingface.co/blog/g-ronimo/semscore)
- [MEDIC: Towards a Comprehensive Framework for evaluating LLMs in Clinical Applications](https://arxiv.org/pdf/2409.07314)

## Contributors
- [@simonmeoni](https://github.com/simonmeoni)
- [@honghanhh](https://github.com/honghanhh)
