Efficiently Supporting Online Privacy-Preserving Data Publishing in a Distributed Computing Environment

Bibliographic Details
Main author: Jong Wook Kim
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
T
Online access: https://doaj.org/article/3a8ed715e88b46d18d7276989ca88e93
Description
Summary: There is a growing need to collect and share microdata, that is, records containing information about individual entities. Because microdata typically contain sensitive information about an individual, releasing them directly for public use may violate existing privacy requirements. Extensive studies have therefore been conducted on privacy-preserving data publishing (PPDP), which ensures that any microdata released satisfy the privacy policy requirements. Most existing PPDP algorithms consider a scenario in which a data publisher, on receiving a request for the release of data containing personal information, anonymizes the data prior to publishing, a process that is usually conducted offline. However, with the increasing demand for sharing data among various parties, it is more desirable to integrate the data anonymization functionality into existing systems that are capable of supporting online query processing. We therefore developed a novel scheme that anonymizes query results on the fly and so supports efficient online privacy-preserving data publishing. In particular, given a user's query, the proposed approach estimates the generalization level of each quasi-identifier attribute from statistical information, thereby achieving the k-anonymity property in the query result datasets without applying k-anonymity to all of the actual datasets, which is a costly procedure. The experimental results show that the proposed method achieves significant gains in processing time.
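To make the terms in the summary concrete, the following is a minimal sketch of what a "generalization level" per quasi-identifier attribute and the k-anonymity property mean. It is not the article's method: the article estimates the levels from statistical information, whereas this sketch finds them by brute force on the actual records, which is exactly the costly step the article avoids. The attributes (age, ZIP code), the generalization hierarchies, the function names, and the sample records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def generalize_age(age, level):
    """Hypothetical hierarchy: 0 = exact age, 1 = ten-year band, 2+ = suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zip_code, level):
    """Hypothetical hierarchy: mask the last `level` digits with '*'."""
    keep = len(zip_code) - level
    return zip_code[:keep] + "*" * level


def is_k_anonymous(rows, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def find_levels(records, k, max_age_level=2, max_zip_level=5):
    """Brute-force search for the least-generalized (age_level, zip_level) pair
    that makes `records` k-anonymous. Returns None if no pair works."""
    candidates = product(range(max_age_level + 1), range(max_zip_level + 1))
    # Try lower total generalization first to limit information loss.
    for age_level, zip_level in sorted(candidates, key=lambda c: (sum(c), c)):
        rows = [
            (generalize_age(age, age_level), generalize_zip(z, zip_level))
            for age, z in records
        ]
        if is_k_anonymous(rows, k):
            return age_level, zip_level
    return None


# Illustrative query result: (age, ZIP code) pairs, invented for this sketch.
records = [
    (23, "13053"), (27, "13068"), (25, "13053"), (29, "13068"),
    (41, "14850"), (47, "14853"), (45, "14850"), (49, "14853"),
]

print(find_levels(records, k=2))  # (1, 0)
print(find_levels(records, k=4))  # (1, 2)
```

With k = 2, generalizing age to ten-year bands is enough; with k = 4, two ZIP digits must also be masked. An online scheme of the kind the summary describes would aim to predict such levels from precomputed statistics rather than scanning the query result for every candidate pair.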