k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY
Latanya Sweeney. 2002. (View Paper →)

Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.
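The definition is easy to check mechanically: group the release by its quasi-identifier columns and verify that every group has at least k members. Below is a minimal Python sketch of that check, not the Datafly, µ-Argus, or k-Similar implementations; the table, column names, and generalized values are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records of the release."""
    counts = Counter(
        tuple(row[qi] for qi in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in counts.values())

# Hypothetical release in which ZIP code and age were generalized
# so that each (zip, age) combination covers at least two people.
release = [
    {"zip": "0213*", "age": "30-39", "diagnosis": "flu"},
    {"zip": "0213*", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "0214*", "age": "40-49", "diagnosis": "flu"},
    {"zip": "0214*", "age": "40-49", "diagnosis": "diabetes"},
]

print(is_k_anonymous(release, ["zip", "age"], k=2))  # True
print(is_k_anonymous(release, ["zip", "age"], k=3))  # False
```

In practice, a data holder reaches this state by generalizing values (truncating ZIP codes, bucketing ages) or suppressing records until every equivalence class reaches size k.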

We need healthcare data for research, and anonymising it is hard. This paper is more than 20 years old, but it teaches a lesson that still holds: re-identification is a real threat, especially when a release can be linked with second or third datasets. It pushed the field to develop stronger techniques. Today, newer approaches such as differential privacy are often preferred for more robust privacy guarantees, especially for high-dimensional datasets and complex use cases.
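To make the linkage threat concrete, here is a minimal sketch of the kind of attack the paper warns about: joining a "de-identified" release with a public dataset on shared quasi-identifiers. All records, names, and column names below are entirely hypothetical.

```python
# Hypothetical tables: a "de-identified" medical release and a public
# voter roll that share the quasi-identifiers (zip, dob, sex).
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-01-02", "sex": "M", "diagnosis": "asthma"},
]
voter_roll = [
    {"name": "Jane Doe", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

for record in medical:
    # Find voters whose quasi-identifiers match this medical record.
    matches = [
        voter for voter in voter_roll
        if all(record[qi] == voter[qi] for qi in QUASI_IDENTIFIERS)
    ]
    if len(matches) == 1:  # a unique match re-identifies the record
        print(f"{matches[0]['name']} re-identified: {record['diagnosis']}")
```

A release satisfying k-anonymity over those columns blocks exactly this step: every quasi-identifier combination matches at least k people, so the join can never pin a record to a single individual.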