Protests are an important and well-researched aspect of political behavior, which makes the validity of their measurement crucial. Unlike conventional forms of political participation such as voting, protest can be difficult to observe. Most studies rely on news articles for event coding, which introduces possible selection bias. Such data are usually validated by comparing the characteristics of different newspaper-based measures or by drawing on independent sources. In this paper, I benchmark a manually coded and a partly automatically coded dataset from the PolDem project against a unique, large government dataset covering all extreme right demonstrations and rallies in Germany from 2005 to 2020. Whether newspapers cover an event depends mainly on its region and its number of participants. Machine learning, in turn, can provide a good confidence estimate of whether an event has been misdetected. These results have important implications for the study of protest: researchers should carefully weigh the advantages and shortcomings of datasets based on news media.
Cornelius Erfort
Post-doctoral Researcher
University of Witten/Herdecke
Department of Philosophy, Politics, and Economics
Alfred-Herrhausen-Straße 50, 58455 Witten, Germany
ORCID: 0000-0001-8534-7748
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 390285477/GRK 2458.