CLC number: Q39; O213
Received: 2008-12-16
Revision Accepted: 2009-07-08
Crosschecked: 2009-09-08
Yan-feng SHEN, Jun ZHU. Power analysis of principal components regression in genetic association studies[J]. Journal of Zhejiang University Science B, 2009, 10(10): 721-730.
@article{Shen2009,
title="Power analysis of principal components regression in genetic association studies",
author="Yan-feng SHEN and Jun ZHU",
journal="Journal of Zhejiang University Science B",
volume="10",
number="10",
pages="721-730",
year="2009",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.B0830866"
}
%0 Journal Article
%T Power analysis of principal components regression in genetic association studies
%A Yan-feng SHEN
%A Jun ZHU
%J Journal of Zhejiang University SCIENCE B
%V 10
%N 10
%P 721-730
%@ 1673-1581
%D 2009
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.B0830866
TY - JOUR
T1 - Power analysis of principal components regression in genetic association studies
A1 - Yan-feng SHEN
A1 - Jun ZHU
JO - Journal of Zhejiang University Science B
VL - 10
IS - 10
SP - 721
EP - 730
SN - 1673-1581
Y1 - 2009
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.B0830866
ER -
Abstract: Association analysis provides an opportunity to find genetic variants underlying complex traits. A principal components regression (PCR)-based approach has been shown to outperform some competing approaches. However, a limitation of this method is that the principal components (PCs) selected from single nucleotide polymorphisms (SNPs) may be unrelated to the phenotype. In this article, we investigate the theoretical properties of this method in more detail. We first derive the exact power function of the PCR-based test, and thereby clarify the relationship between the power of the test and its degrees of freedom (DF). Next, we extend the PCR test to a general weighted PCs test, which provides a unified framework for understanding the properties of some related statistics. We then compare the performance of these tests. We also introduce several data-driven adaptive alternatives to overcome difficulties in the PCR approach. Finally, we illustrate our results using simulations based on real genotype data. The simulation study shows the risk of using the unsupervised rule to determine the number of PCs, and demonstrates that no single method is uniformly most powerful for detecting genetic variants.
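To make the PCR-based test described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a quantitative phenotype, an additively coded genotype matrix, and an unsupervised rule that keeps the smallest number of PCs explaining at least 80% of the SNP variance. The function name `pcr_association_test`, the parameter `var_threshold`, and the 80% cut-off are illustrative assumptions that do not come from the article. The sketch returns the F statistic together with its degrees of freedom, which is where the dependence of power on DF enters.

```python
import numpy as np


def pcr_association_test(genotypes, phenotype, var_threshold=0.8):
    """Return (F statistic, df1, df2) for regressing the phenotype on leading PCs.

    Hypothetical helper for illustration; var_threshold=0.8 is an assumed
    unsupervised rule, not a value taken from the article.
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = X.shape[0]

    # Centre each SNP column and the phenotype (this absorbs the intercept).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # SVD of the centred genotypes: columns of U are orthonormal PC score directions.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)

    # Unsupervised rule: smallest k whose PCs explain at least var_threshold.
    k = int(np.searchsorted(explained, var_threshold)) + 1
    k = min(k, len(s))

    # Because the retained PCs are orthonormal, the model sum of squares is
    # the squared length of the projection of the phenotype onto them.
    proj = U[:, :k].T @ yc
    ss_model = float(proj @ proj)
    ss_resid = float(yc @ yc) - ss_model

    df1, df2 = k, n - k - 1
    f_stat = (ss_model / df1) / (ss_resid / df2)
    return f_stat, df1, df2
```

Because the PCs are chosen without reference to the phenotype, nothing in this rule guarantees that the retained components carry the association signal, which is the limitation the abstract points out.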