Information flow analysis has largely ignored the setting where the analyst has neither control over nor a complete model of the analyzed system. We formalize such limited information flow analyses and study an instance of them: detecting the usage of data by websites. We prove that these problems are problems of causal inference. Leveraging this connection, we push beyond traditional information flow analysis to provide a systematic methodology based on experimental science and statistical analysis. Our methodology allows us to systematize prior works in the area, viewing them as instances of a general approach, and to develop a statistically rigorous tool, AdFisher, for detecting information usage.
AdFisher uses machine learning to automate the selection of a statistical test. We use it to find that Google's Ad Settings is opaque about some features of a user's profile, that it does provide some choice on ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse changes the ads shown but not the settings page. We also found that setting the gender to female results in fewer instances of an ad related to high-paying jobs than setting it to male.
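As a rough illustration of the kind of statistical analysis a randomized experiment like this permits, the sketch below runs a one-sided permutation test on the difference in mean ad counts between two groups of simulated browsers. It is a minimal sketch under stated assumptions, not AdFisher's code: the function name `permutation_test`, the choice of test statistic, and the ad counts are all hypothetical, and the abstract above does not say which test the tool selects.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """One-sided permutation test on the difference in means (a minus b).

    Hypothetical helper for illustration; not part of AdFisher.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    # Under the null hypothesis the group labels are exchangeable,
    # so we pool the measurements and reassign labels at random.
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a = pooled[:n_a]
        perm_b = pooled[n_a:]
        diff = sum(perm_a) / n_a - sum(perm_b) / n_b
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up counts of one ad per browser instance, for illustration only.
group_male = [12, 9, 15, 11, 14, 10, 13, 12]
group_female = [2, 4, 1, 3, 0, 2, 5, 1]

p_value = permutation_test(group_male, group_female)
print(f"estimated p-value: {p_value:.4f}")
```

A permutation test is a natural fit here because it relies only on the random assignment of browsers to groups, not on a model of how the ad network behaves.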
We make our tool, AdFisher, freely available on GitHub at https://github.com/tadatitam/info-flow-experiments.
The code used for running our experiments and the raw data from them are available below with each publication that details the results. Also see the tool's webpage to learn about the results we found with it.
Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination
Privacy Enhancing Technologies Symposium (PETS) 2015
See the website: here
Read the paper: official version, preprint
Tech report arXiv:1408.6491: version 1, version 2
Download the code and raw data: version 1, version 2
Read additional details here
Information Flow Investigations: Extended Abstract
Abstract for 5-Minute Talk at CSF 2013
Read the paper here