Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified. Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets. Design: Authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. Measurement: Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's performance speed was also evaluated. Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
Journal of the American Medical Informatics Association
School of Computer Science

El Emam, K. (Khaled), Dankar, F.K. (Fida Kamal), Issa, R. (Romeo), Jonker, E. (Elizabeth), Amyot, D. (Daniel), Cogo, E. (Elise), … Bottomley, J. (Jim). (2009). A Globally Optimal k-Anonymity Method for the De-Identification of Health Data. Journal of the American Medical Informatics Association, 16(5), 670–682. doi:10.1197/jamia.M3144