diff --git a/lib/shroom/README.md b/lib/shroom/README.md
index c0611ed..3abbafd 100644
--- a/lib/shroom/README.md
+++ b/lib/shroom/README.md
@@ -1,29 +1,40 @@
 # SemEval-2025 Task-3 — Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
 
+[Description](#description) | [Data](#data) | [Models](#models) | [Results](#results) |
+[Leaderboard](#leaderboard) |[Contributors](#contributors)
 
-[Description](#description) | [Data](#data) | [Models](#models) | [Results](#results) | [Leaderboard](#leaderboard) |[Contributors](#contributors)
-
-In this repo, we provide our solution to solve [Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes](https://helsinki-nlp.github.io/shroom/).
+In this repo, we provide our solution to solve
+[Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes](https://helsinki-nlp.github.io/shroom/).
 
 ## Description
 
-ARKHN aims to detect hallucination spans in the outputs of instruction-tuned LLMs in a multilingual context in Mu-SHROOM, which stands for “Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes”.
+ARKHN aims to detect hallucination spans in the outputs of instruction-tuned LLMs in a multilingual
+context in Mu-SHROOM, which stands for “Multilingual Shared-task on Hallucinations and Related
+Observable Overgeneration Mistakes”.
 
 Evaluation metrics:
-- intersection-over-union of characters marked as hallucinations in the gold reference vs. predicted as such
-- how well the probability assigned by the participants’ system that a character is part of a hallucination correlates with the empirical probabilities observed in our annotators.
+
+- intersection-over-union of characters marked as hallucinations in the gold reference vs. predicted
+  as such
+- how well the probability assigned by the participants’ system that a character is part of a
+  hallucination correlates with the empirical probabilities observed in our annotators.
 
 ## Data
+
 (updating)
 
 ## Models
+
 (updating)
 
 ## Results
+
 (updating)
 
 ## Leaderboard
+
 (updating)
 
 ## Contributors
+
 (updating)