Unifying Privacy Policy Detection Toolchain

This is the accompanying repository for the paper "Unifying Privacy Policy Detection", published at the Privacy Enhancing Technologies Symposium (PETS) 2021.

The aim of this project is to give privacy policy researchers a unified solution for creating privacy policy corpora based on currently available best practices.

At the moment, the repository contains the source code as a proof of concept, along with the trained classifiers and vectorizers for English and German. We plan to provide a pip package as soon as possible to make the toolchain easier to use.

Explanation

The toolchain consists of five steps:

1. Finding potential privacy/cookie policies on websites
2. Text-from-HTML extraction
3. Language detection
4. Key phrase extraction
5. Classification
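
To illustrate how the steps above feed into each other, the following is a minimal, standard-library-only sketch of steps 2 to 5. It is not the implementation in ppt.py: the stop-word language guess, the phrase lists, and the threshold that stands in for the trained classifier are all hypothetical placeholders, and step 1 (finding the policy page) is assumed to have already produced the HTML.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Step 2: text-from-HTML extraction."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical, deliberately tiny stop-word sets; a real setup would use a
# dedicated language detection library.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "we", "your", "is"},
    "de": {"der", "die", "und", "wir", "ihre", "ist", "das"},
}


def detect_language(text):
    """Step 3: guess the language by counting stop-word hits."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def extract_key_phrases(text, phrases):
    """Step 4: keep the known phrases that occur in the text."""
    lowered = text.lower()
    return [phrase for phrase in phrases if phrase in lowered]


def run_pipeline(html, phrases_by_lang, min_phrases=2):
    """Chain steps 2 to 5 for a single HTML document."""
    text = extract_text(html)
    lang = detect_language(text)
    found = extract_key_phrases(text, phrases_by_lang[lang])
    # Step 5 placeholder: a simple threshold instead of a trained classifier.
    return {
        "language": lang,
        "key_phrases": found,
        "is_policy": len(found) >= min_phrases,
    }
```

A call might look like this, with made-up phrase lists:

```python
html = "<html><body><p>We collect your personal data and use cookies.</p></body></html>"
phrases = {"en": ["personal data", "cookies"], "de": ["personenbezogene daten"]}
print(run_pipeline(html, phrases))
```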

Structure of the repository

The repository is currently structured as follows:

.
|-- LICENSE
|-- README.md
|-- privacy_policy_link_detection
|   |-- README.md
|   |-- custom_command_find_privacy_policies.py
|   `-- demo_privacy_policy_download.py
`-- privacy_policy_toolchain
    |-- code
    |   |-- ppt.py
    |   `-- resources
    |       |-- VotingClassifier_soft_de.pkl
    |       |-- VotingClassifier_soft_en.pkl
    |       |-- trained_vectorizer_de.pkl
    |       `-- trained_vectorizer_en.pkl
    |-- data
    |   `-- privacy_policies
    |-- environment.yml
    |-- feature_list
    |   |-- feature_list_de.txt
    |   `-- feature_list_en.txt
    |-- logs
    |   `-- language_analysis
    `-- results
        `-- classification

The folder privacy_policy_toolchain/code/resources contains the trained classifiers (VotingClassifier_soft_*.pkl) and vectorizers (trained_vectorizer_*.pkl) for both English and German.
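
As a rough sketch of how these files could be located and loaded per language, the helper below builds the file paths from the names shown in the tree above. It assumes the files are ordinary pickles readable with Python's pickle module and that the code runs from the repository root; neither is confirmed here, and the function names are hypothetical rather than taken from ppt.py.

```python
import pickle
from pathlib import Path

# Location taken from the repository tree; relative to the repository root.
RESOURCES_DIR = Path("privacy_policy_toolchain/code/resources")
SUPPORTED_LANGUAGES = ("en", "de")


def resource_paths(lang, resources_dir=RESOURCES_DIR):
    """Return the classifier and vectorizer paths for a language code."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    resources_dir = Path(resources_dir)
    return {
        "classifier": resources_dir / f"VotingClassifier_soft_{lang}.pkl",
        "vectorizer": resources_dir / f"trained_vectorizer_{lang}.pkl",
    }


def load_resources(lang, resources_dir=RESOURCES_DIR):
    """Unpickle both files for a language (assumes plain pickle format)."""
    loaded = {}
    for name, path in resource_paths(lang, resources_dir).items():
        with open(path, "rb") as handle:
            loaded[name] = pickle.load(handle)
    return loaded
```

Unpickling executes code embedded in the file, so only load these files from a source you trust. The library that defines the pickled classes must also be importable in the active environment for loading to succeed.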

Paper

Henry Hosseini, Martin Degeling, Christine Utz, Thomas Hupperich. "Unifying Privacy Policy Detection." PETS 2021.

Contact

Henry Hosseini: [email protected]
