The Problem

Do you ever read the privacy policies for the websites you visit or the apps you use? If not, you're hardly alone. Privacy policies are often lengthy and difficult to interpret, and only about 1 in 5 Americans say they always or often read them.

This is problematic because the collection and sharing of personal data by companies pose real risks. Abuse of personal information has become commonplace, sometimes with dire consequences. Fortunately, consumers are taking notice: a recent survey found that half of Americans have decided not to use a product or service because of privacy concerns.

Multiple states have introduced legislation to protect consumer privacy. The California Consumer Privacy Act (CCPA) was signed into law in June 2018 and took effect in January 2020. However, only half of U.S. security professionals surveyed in October 2019 said their firms were already compliant or would be by the end of the year. The CCPA is itself lengthy, difficult to parse, and open to interpretation, and no existing tools help consumers assess compliance.

Our mission for this project has been to protect personal information by helping consumers assess privacy policies. We believe that if the 80% of Americans who typically skip privacy policies began reviewing them, these empowered consumers would demand better ones.

Our Solution

To achieve our mission, we developed Privacy Screen, a web tool that uses natural language processing and machine learning to help consumers determine whether a privacy policy allows them to request deletion of their personal information. We drew our inspiration from a key element of the CCPA, CA Civ Code § 1798.105(b):

"A business that collects personal information about consumers shall disclose, pursuant to Section 1798.130, the consumer's rights to request the deletion of the consumer's personal information."

With no prelabeled data available to train and test our machine-learning model, we began by randomly sampling 400 URLs from the MAPS Policies Dataset. This yielded 307 usable, distinct English-language privacy policies, which we then labeled ourselves. The resulting dataset was moderately imbalanced: 36% of policies were labeled positive (i.e., the user can request deletion of their data) and the remaining 64% negative. We documented the procedures we developed and used on the Labeling Method page of this project website.
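
To make that class balance concrete, here is a minimal sketch of how a small, imbalanced dataset like this could be split for training and evaluation. The file name and column names are hypothetical placeholders, not our actual schema.

```python
# Minimal sketch of preparing a hand-labeled dataset like ours for
# modeling. The file name and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

# Each row: full privacy-policy text plus a binary label
# (1 = user can request deletion, 0 = no such provision found).
df = pd.read_csv("labeled_policies.csv")

# With only 307 examples and a 36/64 class split, stratifying keeps
# the same imbalance in both the training and test sets.
train_df, test_df = train_test_split(
    df,
    test_size=0.2,
    stratify=df["can_delete"],
    random_state=42,
)

print(train_df["can_delete"].mean())  # roughly 0.36 in both splits
```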

Many privacy policies are lengthy, and the specific subsection addressing a user's right to request deletion makes up a small fraction of the overall text. For positive policies, we therefore also extracted the passage indicating that users can request deletion of their data. Privacy Screen uses these excerpts to locate the subsection of a submitted policy most pertinent to the right to request deletion. That subsection is then fed to a Support Vector Machine (SVM) model, which predicts whether the policy allows the user to request deletion of their personal information. A sketch of this two-stage approach appears below.
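
The description above does not pin down the exact segmentation, features, or similarity measure, so the following is only a plausible sketch of the two-stage approach, assuming TF-IDF features, cosine similarity against the hand-extracted excerpts, and scikit-learn's LinearSVC. All function names and the toy data are illustrative, not our actual implementation.

```python
# A plausible sketch of the two-stage approach, not the project's
# actual implementation: (1) find the subsection most similar to known
# deletion-rights excerpts, then (2) classify it with an SVM.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import LinearSVC

# Toy stand-ins for the real labeled subsections and hand-extracted
# excerpts (shapes only; the real data has hundreds of examples).
train_texts = [
    "You may request deletion of your personal data at any time.",
    "We share information with advertising partners to improve our services.",
]
labels = [1, 0]  # 1 = grants deletion rights, 0 = does not
excerpts = ["You may request deletion of your personal data at any time."]

def split_into_subsections(policy_text, min_chars=200):
    """Naive segmentation: treat blank-line-separated blocks as subsections."""
    blocks = [b.strip() for b in policy_text.split("\n\n") if b.strip()]
    return [b for b in blocks if len(b) >= min_chars] or blocks or [policy_text]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)
clf = LinearSVC(class_weight="balanced")  # offsets the 36/64 class imbalance
clf.fit(X_train, labels)

excerpt_matrix = vectorizer.transform(excerpts)

def assess_policy(policy_text):
    """Return (prediction, most_pertinent_subsection) for one policy."""
    sections = split_into_subsections(policy_text)
    section_vecs = vectorizer.transform(sections)
    # Stage 1: pick the subsection closest, on average, to known excerpts.
    sims = cosine_similarity(section_vecs, excerpt_matrix).mean(axis=1)
    best = sections[int(np.argmax(sims))]
    # Stage 2: the SVM decides whether that subsection grants deletion rights.
    pred = int(clf.predict(vectorizer.transform([best]))[0])
    return pred, best
```

A real system would need more careful segmentation of HTML-scraped policies; class_weight="balanced" is one simple way to compensate for the imbalance noted above.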

After submitting a privacy policy, the user sees a simple assessment of compliance, accompanied by the specific portion of the text the model identified as most pertinent to the right to request deletion. We hope you'll give Privacy Screen a try!
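
To make that flow concrete, here is a toy continuation of the sketch above; the submitted policy and the message wording are illustrative only, not the tool's actual output.

```python
# Toy continuation of the sketch above: turn the model's output into
# the simple assessment a user would see (wording is illustrative).
submitted_policy_text = """How we use data.
We collect usage data to improve our services.

Your choices.
You may contact us to request deletion of the personal data we hold about you."""

pred, passage = assess_policy(submitted_policy_text)
if pred == 1:
    print("This policy appears to let you request deletion of your data.")
else:
    print("We could not find a clear right to request deletion in this policy.")
print("\nMost pertinent passage:\n" + passage)
```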


Our Privacy Policy

We DO NOT store or share any data as you use Privacy Screen. However, we DO acknowledge your right to request deletion of your personal data and information!

The Team

We created Privacy Screen for our Capstone project over the course of the Spring 2020 term, completing our journeys through the UC Berkeley Master of Information and Data Science (MIDS) program. We welcome any feedback; email us at privacy (d°t) screen (at) lists (d°t) berkeley (d°t) edu. Additional information can be found on the Resources page of this project website.