CLC number: TP391

On-line Access: 2011-08-03

Received: 2010-10-14

Revision Accepted: 2011-04-12

Crosschecked: 2011-07-04

Journal of Zhejiang University SCIENCE C 2011 Vol.12 No.8 P.615-628


Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data

Author(s):  Sun Hee Kim, Hyung Jeong Yang, Kam Swee Ng

Affiliation(s):  Department of Computer Science, Chonnam National University, Gwangju 500-757, Korea

Corresponding email(s):   hjyang@jnu.ac.kr

Key Words:  Electroencephalography (EEG), Missing value imputation, Hidden pattern discovery, Expectation maximization, Principal component analysis

Sun Hee Kim, Hyung Jeong Yang, Kam Swee Ng. Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data[J]. Journal of Zhejiang University Science C, 2011, 12(8): 615-628.

@article{Kim2011,
title="Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data",
author="Sun Hee Kim and Hyung Jeong Yang and Kam Swee Ng",
journal="Journal of Zhejiang University Science C",
volume="12",
number="8",
pages="615-628",
year="2011",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C10b0359"
}

%0 Journal Article
%T Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data
%A Sun Hee Kim
%A Hyung Jeong Yang
%A Kam Swee Ng
%J Journal of Zhejiang University SCIENCE C
%V 12
%N 8
%P 615-628
%@ 1869-1951
%D 2011
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C10b0359

TY - JOUR
T1 - Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data
A1 - Sun Hee Kim
A1 - Hyung Jeong Yang
A1 - Kam Swee Ng
JO - Journal of Zhejiang University Science C
VL - 12
IS - 8
SP - 615
EP - 628
SN - 1869-1951
Y1 - 2011
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C10b0359
ER -

Missing values occur in bio-signal processing for various reasons, including technical problems and biological characteristics. Such values are typically either excluded or replaced with estimates before further processing. Electroencephalography (EEG) signals arrive quickly and successively, so estimating their missing values requires rapid processing of high-speed data to support immediate decision making. In this study, we propose an incremental expectation maximization principal component analysis (iEMPCA) method that automatically estimates missing values in multivariate EEG time series data without requiring a complete data set. The proposed method avoids the biased model that inevitably results from removing incomplete data rather than estimating them, and thus reduces information loss by imputing missing values in real time. Because it is incremental, the method also minimizes memory usage and processing time for continuously arriving data. Experimental results show that the proposed method estimates missing values more accurately than previous methods.
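
The general idea of EM-style PCA imputation can be illustrated with a short sketch. This is not the iEMPCA algorithm of the paper, whose update rules are not given on this page; it is a generic iterative low-rank imputation written under stated assumptions. The function names `em_pca_impute` and `impute_stream`, the parameters `n_components`, `n_iter` and `tol`, and the window-by-window treatment of the stream are all hypothetical choices made for illustration. Missing samples are assumed to be marked as NaN, and every channel is assumed to have at least one observed value per window.

```python
import numpy as np


def em_pca_impute(X, n_components=2, n_iter=50, tol=1e-6):
    """Fill NaNs in X (samples x channels) by alternating a rank-k PCA fit
    with re-estimation of the missing entries (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: per-channel mean of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        centered = filled - mean
        # Fit step: principal subspace of the currently completed data.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        basis = vt[:n_components]
        # Estimation step: project onto the subspace and reconstruct.
        recon = centered @ basis.T @ basis + mean
        # Observed entries are never overwritten; only the gaps are updated.
        updated = np.where(missing, recon, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


def impute_stream(windows, n_components=2):
    """Impute each arriving window on its own, so the whole recording never
    has to be held in memory (an assumption, not the paper's update rule)."""
    for window in windows:
        yield em_pca_impute(window, n_components=n_components)
```

Processing one window at a time keeps memory bounded by the window size, which is the property the abstract attributes to the incremental approach; a faithful implementation would presumably also carry model state from one window to the next rather than refitting from scratch.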
