
Datasets of trait-concept pairs for specific trait types in English and Spanish.


cardiffnlp/trait-concept-datasets


trait-concept-datasets

Datasets of trait-concept pairs for specific trait types in English and Spanish, derived from the McRae [1] and CSLB Norms [2] datasets.

The datasets cover five trait types: colours, components, materials, size & shape, and tactile. For details on how the datasets were constructed from the original norms and how they were translated into Spanish, see the original paper:

@inproceedings{and22-dist-hyp,
    title = "Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: {T}rait-based Relational Knowledge and the Impact of Co-occurrences",
    author = "Anderson, Mark and Camacho Collados, Jose",
    booktitle = "To appear in proceedings of *SEM 2022: The Eleventh Joint Conference on Lexical and Computational Semantics",
    month = jul,
    year = "2022",
    address = "Seattle",
    publisher = "Association for Computational Linguistics",
}

For English, there are datasets for each trait type derived from both McRae and the CSLB Norms, each available in single-labelled and multi-labelled versions. For Spanish, there is only a single-labelled dataset, derived from McRae.
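The single- vs multi-labelled distinction can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a dataset has been loaded as a list of (concept, trait) pairs, and that "single-labelled" means each concept is paired with exactly one trait of the given type while "multi-labelled" allows several. The repository's actual file format and column names are not described here, so `group_by_concept` and `is_single_labelled` are illustrative helpers, not part of the datasets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: each dataset row is a (concept, trait) pair.
# The real file layout of the datasets is an assumption, not documented here.
Pair = Tuple[str, str]


def group_by_concept(pairs: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Collect the set of trait labels attached to each concept."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for concept, trait in pairs:
        groups[concept].add(trait)
    return dict(groups)


def is_single_labelled(pairs: Iterable[Pair]) -> bool:
    """Return True if every concept carries exactly one trait label."""
    return all(len(traits) == 1 for traits in group_by_concept(pairs).values())


if __name__ == "__main__":
    # Made-up toy pairs for illustration only; not taken from the datasets.
    toy = [("apple", "red"), ("apple", "green"), ("banana", "yellow")]
    print(group_by_concept(toy))
    print(is_single_labelled(toy))  # → False, since "apple" has two colours
```

Grouping by concept first makes the check independent of row order, and the same grouping could be reused to turn multi-labelled pairs into a per-concept label set.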

[1] Ken McRae, George S. Cree, Mark S. Seidenberg, and Chris McNorgan (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37:547–559.

[2] Barry Devereux, Lorraine K. Tyler, Jeroen Geertzen, and Billi Randall (2014). The Centre for Speech, Language and the Brain (CSLB) concept property norms. Behavior Research Methods, 46:1119–1127.
