BeforeIAccept is an Insight Data Science 2017 Remote Program project that develops tools for automagically analyzing privacy policy content.
Every year, consumers sign billions of pages of legal contracts with a few simple clicks, without reading or understanding what those contracts say. Website and web service privacy policies are generally opaque to the public. BeforeIAccept uses natural language processing and machine learning to digest privacy policies for consumers.
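As a rough illustration of what "digesting" a policy with machine learning can mean, the sketch below labels a policy sentence with a data-practice category using a small multinomial naive Bayes classifier. It is not BeforeIAccept's actual model or pipeline. The `NaiveBayes` class, the `tokenize` helper, and the four training sentences are hypothetical and written only for this example. The two label names are borrowed from the OPP-115 annotation scheme.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)        # documents per label
        self.word_counts = defaultdict(Counter)    # word counts per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # skip unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy sentences, invented for this sketch (not corpus data).
TRAIN = [
    ("We collect your email address and location when you register.",
     "First Party Collection/Use"),
    ("We use the information we collect to personalize your experience.",
     "First Party Collection/Use"),
    ("We may share your personal data with advertising partners.",
     "Third Party Sharing/Collection"),
    ("Your information may be disclosed to third party service providers.",
     "Third Party Sharing/Collection"),
]

model = NaiveBayes().fit(
    [text for text, _ in TRAIN],
    [label for _, label in TRAIN],
)
prediction = model.predict("We share data with third party advertising partners.")
print(prediction)
```

With these toy sentences the script prints `Third Party Sharing/Collection`. A usable classifier would need far more labeled text than this, such as an annotated policy corpus.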
BeforeIAccept is trained on the OPP-115 corpus from UsablePrivacy.org. Please see the paper listed below for details on the development and content of the dataset.
The creation and analysis of a website privacy policy corpus. Shomir Wilson, Florian Schaub, Aswarth Abhilash Dara, Frederick Liu, Sushain Cherivirala, Pedro Giovanni Leon, Mads Schaarup Andersen, Sebastian Zimmeck, Kanthashree Mysore Sathyendra, N. Cameron Russell, Thomas B. Norton, Eduard Hovy, Joel Reidenberg, and Norman Sadeh. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, August 2016.
3-Clause BSD License. See LICENSE.txt.