Rogue one: a legitimacy story


Full description

Authors:
Chacón Buitrago, Valentina
Resource type:
Undergraduate thesis
Publication date:
2020
Institution:
Universidad de los Andes
Repository:
Séneca: Uniandes repository
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/51468
Online access:
http://hdl.handle.net/1992/51468
Keywords:
Web pages
Data privacy
Computer security
Engineering
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-nd/4.0/
Description
Summary: The constant evolution of the interactions and activities available on the internet continuously introduces privacy concerns regarding online presence. Companies address these concerns by including privacy policies on their websites. Nonetheless, these policies have several problems: the vocabulary used in the documents is not clear to all users, legal terms are difficult to understand, and the policies can run up to 8 pages, meaning that they are not accessible and therefore fail to inform users effectively. Studies show that it would take an average person about 200 hours a year to actually read the policy of every unique website visited in a year, not to mention the updated versions of policies for sites visited repeatedly. Accessing web pages while ignoring privacy policies exposes users to risks regarding the handling of their personal information and the legitimacy of the services offered by websites. To prevent users from disclosing their private information indiscriminately, and to reduce the time and effort involved in reading a privacy policy, this project develops a model that determines whether a website is legitimate or rogue based on the contents of its privacy policy, with 93.2% accuracy. This task falls at the crossroads of Information Retrieval, Natural Language Processing, and supervised Machine Learning. The project takes a top-down approach: the experiments are designed to reduce the number of viable classifiers and configurations at each step, thereby shrinking the search space for the setup with the highest classification accuracy. There are two stages of experimentation, with three experiments in total, in which we identify the configuration that provides the best classification accuracy.
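The summary does not specify the exact features or classifiers used; as a hedged illustration of the general approach it describes (supervised classification of privacy-policy text), a minimal sketch might combine TF-IDF features with a linear classifier. The example texts and labels below are hypothetical stand-ins, not data from the project:

```python
# Hedged sketch: classifying privacy-policy text as "legit" vs "rogue"
# with TF-IDF features and logistic regression. The real project's
# features, classifier, and data are not specified here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples standing in for real privacy-policy documents (hypothetical).
policies = [
    "We collect your email only to provide the service and never sell data.",
    "Your data may be shared with our partners for marketing purposes.",
    "We retain personal information as required by law and you may opt out.",
    "By using this site you grant us unlimited rights to all your information.",
]
labels = ["legit", "rogue", "legit", "rogue"]

# Bag-of-words TF-IDF features feed a supervised linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(policies, labels)

# Predict a label for an unseen policy sentence.
prediction = model.predict(["We never sell your personal data."])[0]
print(prediction)
```

In the same spirit as the project's top-down experimentation, one could compare several vectorizer and classifier configurations and keep only the best-performing ones at each stage.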