|
Journal of Zhejiang University SCIENCE A
ISSN 1673-565X(Print), 1862-1775(Online), Monthly
2009 Vol.10 No.7 P.952-963
Distributed anonymous data perturbation method for privacy-preserving data mining
Abstract: Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preserving. However, it shows poor performance in large scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a light-weight anonymous data perturbation method for efficient privacy preserving in distributed data mining. We first define the privacy constraints for data perturbation based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experiment results show that our approach achieves a high security level and is very efficient in a large scale distributed environment.
Key words: Privacy-preserving data mining (PPDM), Distributed data mining, Data perturbation
References:
[1] Agrawal, D., Aggarwal, C.C., 2001. On the Design and Quantification of Privacy Preserving Data Mining Algorithms. Proc. 20th ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems, p.247-255.
[2] Agrawal, R., Srikant, R., 2000. Privacy-preserving data mining. ACM SIGMOD Record, 29(2):439-450.
[3] Ashley, P., Hada, S., Karjoth, G., 2003. The Enterprise Privacy Authorization Language (EPAL 1.1), IBM. Available from: http://www.zurich.ibm.com/security/enterprise-privacy/epal/
[4] Ashrafi, M.Z., Taniar, D., Smith, K., 2003. Towards Privacy Preserving Distributed Association Rule Mining. Distributed Computing-IWDC, p.279-289.
[5] Beaver, D., 1991. Foundations of secure interactive computing. CRYPTO, 1991:377-391.
[6] Bertino, E., Fovino, I.N., Provenza, L.P., 2005. A framework for evaluating privacy preserving data mining algorithms. Data Min. Knowl. Discov., 11(2):121-154.
[7] Chaum, D., Crepeau, C., Damgard, I., 1988. Multiparty Unconditionally Secure Protocols. Proc. 20th Annual ACM Symp. on Theory of Computing, p.11-19.
[8] Chawla, S., Dwork, C., McSherry, F., Smith, A., Wee, H., 2005. Toward Privacy in Public Databases. Theory of Cryptography Conf., p.363-385.
[9] Cramer, R., Damgard, I., Nielsen, J.B., 2001. Multiparty Computation from Threshold Homomorphic Encryption. Proc. EUROCRYPT, p.280-300.
[10] Cranor, L., Langheinrich, M., Marchiori, M., Presler-Marshall, M., Reagle, J. (Eds.), 2002. The Platform for Privacy Preferences 1.0 (P3P1.0) Specification. W3C. Available from: http://www.w3.org/TR/P3P/
[11] CSA (Canadian Standards Association), 2004. Privacy Code. Available from: http://www.csa.ca/standards/privacy/Default.asp?laguage=English
[12] Evfimievski, A., Srikant, R., Agrawal, R., Gehrke, J., 2004. Privacy preserving mining of association rules. Inf. Syst., 29(4):343-364.
[13] Fienberg, S.E., McIntyre, J., 2004. Data swapping: variations on a theme by dalenius and reiss. Priv. Statist. Datab., 3050:14-29.
[14] Fukasawa, T., Wang, J., Takata, T., Miyazaki, M., 2004. An Effective Distributed Privacy-preserving Data Mining Algorithm. Intelligent Data Engineering and Automated Learning (IDEAL), p.320-325.
[15] Goldreich, O., Micali, S., Wigderson, A., 1987. How to Play Any Mental Game or a Completeness Theorem for Protocols with Honest Majority. 19th ACM Symp. on the Theory of Computing, p.218-229.
[16] Kargupta, H., Das, K., Liu, K., 2007. Multi-party, privacy-preserving distributed data mining using a game theoretic framework. LNCS, 4702:523.
[17] Liew, C.K., Choi, U.J., Liew, C.J., 1985. A data distortion by probability distribution. ACM Trans. Datab. Syst., 10(3):395-411.
[18] Paillier, P., 1999. Public-key cryptosystems based on composite degree residuosity classes. Advances in Cryptology EUROCRYPT, 99:223-238. Available from: http://www.springerlink.com/content/kwjvf0k8fqyy2h3d/
[19] Rizvi, S.J., Haritsa, J.R., 2002. Maintaining Data Privacy in Association Rule Mining. Proc. 28th Int. Conf. on Very Large Data Bases, 28:682-693.
[20] Sweeney, L., 2002. Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncert., Fuzz. and Knowl.-based Syst., 10(5):571-588.
[21] Yao, A.C., 1986. How to Generate and Exchange Secrets. Proc. 27th IEEE Symp. on Foundations of Computer Science, p.162-167.
[22] Zhang, P., Tong, Y.H., Tang, S.W., Yang, D.Q., 2005. Privacy Preserving Naive Bayes Classification Advanced Data Mining and Applications. 3584:744-752.
Open peer comments: Debate/Discuss/Question/Opinion
<1>
DOI:
10.1631/jzus.A0820320
CLC number:
TP391.7
Download Full Text:
Downloaded:
4841
Clicked:
6071
Cited:
3
On-line Access:
Received:
2008-04-08
Revision Accepted:
2008-08-10
Crosschecked:
2009-04-29