CLC number: Q39

On-line Access: 2018-12-03

Received: 2018-03-14

Revision Accepted: 2018-07-12

Crosschecked: 2018-11-08

ORCID: Cheng-yin Ye, https://orcid.org/0000-0001-8384-1857

Journal of Zhejiang University SCIENCE B 2018 Vol.19 No.12 P.935-947

http://doi.org/10.1631/jzus.B1800162


An ensemble-based likelihood ratio approach for family-based genomic risk prediction


Author(s):  Hui An, Chang-shuai Wei, Oliver Wang, Da-hui Wang, Liang-wen Xu, Qing Lu, Cheng-yin Ye

Affiliation(s):  Department of Health Management, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China

Corresponding email(s):   yechengyin@hznu.edu.cn

Key Words:  Family-based study, Genetic risk prediction, High-dimensional data


Hui An, Chang-shuai Wei, Oliver Wang, Da-hui Wang, Liang-wen Xu, Qing Lu, Cheng-yin Ye. An ensemble-based likelihood ratio approach for family-based genomic risk prediction[J]. Journal of Zhejiang University Science B, 2018, 19(12): 935-947.

@article{An2018,
title="An ensemble-based likelihood ratio approach for family-based genomic risk prediction",
author="Hui An and Chang-shuai Wei and Oliver Wang and Da-hui Wang and Liang-wen Xu and Qing Lu and Cheng-yin Ye",
journal="Journal of Zhejiang University Science B",
volume="19",
number="12",
pages="935-947",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.B1800162"
}

%0 Journal Article
%T An ensemble-based likelihood ratio approach for family-based genomic risk prediction
%A Hui An
%A Chang-shuai Wei
%A Oliver Wang
%A Da-hui Wang
%A Liang-wen Xu
%A Qing Lu
%A Cheng-yin Ye
%J Journal of Zhejiang University SCIENCE B
%V 19
%N 12
%P 935-947
%@ 1673-1581
%D 2018
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.B1800162

TY - JOUR
T1 - An ensemble-based likelihood ratio approach for family-based genomic risk prediction
A1 - Hui An
A1 - Chang-shuai Wei
A1 - Oliver Wang
A1 - Da-hui Wang
A1 - Liang-wen Xu
A1 - Qing Lu
A1 - Cheng-yin Ye
JO - Journal of Zhejiang University Science B
VL - 19
IS - 12
SP - 935
EP - 947
SN - 1673-1581
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.B1800162
ER -


Abstract:
Objective: Family-based design is one of the most popular designs in genetic research and is well recognized for advantages such as robustness against population stratification and admixture. With vast amounts of genetic data being collected from family-based studies, there is great interest in evaluating the role of genetic markers in risk prediction. This study aims to develop a new statistical approach for family-based risk prediction that achieves better prediction accuracy than existing methods based on family history.
Methods: We propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to account for correlations among family members, and uses a computationally efficient tree-assembling procedure for variable selection and model building.
Results: In simulations, Fam-ELR was robust across a range of underlying disease models and pedigree structures, and outperformed two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset of conduct disorder, Fam-ELR was able to integrate potential risk predictors and their interactions into the model for improved accuracy, especially at the genome-wide level.
Conclusions: Compared with existing approaches, such as the genetic risk-score approach, Fam-ELR can incorporate genetic variants with small or moderate marginal effects, together with their interactions, into an improved risk prediction model. It is therefore a robust and useful approach for high-dimensional family-based risk prediction, especially for complex diseases whose etiology is unknown or poorly understood.
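The likelihood ratio and ROC ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' Fam-ELR implementation: it scores a single marker, uses toy data, and ignores within-family correlation, which the paper handles with a clustered ROC method. The function names, the smoothing constant `pseudo`, and the data are all assumptions made for illustration.

```python
from collections import Counter


def genotype_likelihood_ratios(genotypes, status, pseudo=0.5):
    """Estimate LR(g) = P(G=g | case) / P(G=g | control) for each genotype.

    `pseudo` is an additive smoothing constant (an assumption of this sketch)
    that keeps the ratio finite when a genotype is absent from one group.
    """
    levels = sorted(set(genotypes))
    cases = Counter(g for g, d in zip(genotypes, status) if d == 1)
    controls = Counter(g for g, d in zip(genotypes, status) if d == 0)
    n_case = sum(cases.values()) + pseudo * len(levels)
    n_ctrl = sum(controls.values()) + pseudo * len(levels)
    return {
        g: ((cases[g] + pseudo) / n_case) / ((controls[g] + pseudo) / n_ctrl)
        for g in levels
    }


def empirical_auc(scores, status):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    case_scores = [s for s, d in zip(scores, status) if d == 1]
    ctrl_scores = [s for s, d in zip(scores, status) if d == 0]
    wins = 0.0
    for c in case_scores:
        for k in ctrl_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(ctrl_scores))


# Toy data: genotype coded as minor-allele count, status 1 = affected.
genotypes = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
status = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]

lr = genotype_likelihood_ratios(genotypes, status)
scores = [lr[g] for g in genotypes]  # each subject is scored by its genotype's LR
auc = empirical_auc(scores, status)
print(round(auc, 2))  # 0.96
```

Ranking subjects by likelihood ratio is what makes the resulting ROC curve a natural accuracy measure here. The AUC above is computed on the same data used to estimate the ratios, so it is optimistic; a real analysis would evaluate on held-out families.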

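Chinese title (translated): Genomic risk prediction of disease using an ensemble-based likelihood ratio algorithm for family data

Objective: As one of the most commonly used designs in genetic research, family-based design is widely recognized for its advantages, such as the robustness of family data under population stratification and admixture. In disease risk prediction, researchers are very interested in how to identify and analyze the role of genetic markers using family-based genetic data. This study aims to develop a new statistical method for genetic risk prediction based on family data.
Innovation: The new method is expected to capture genetic factors with small or moderate marginal effects, together with their interactions, and to achieve higher prediction accuracy than existing risk prediction methods based on family history or family data.
Methods: In this study, we propose a new ensemble-based likelihood ratio (ELR) method, Fam-ELR, for genomic disease risk prediction with family data. Fam-ELR uses a clustered receiver operating characteristic (ROC) curve method to account for correlations within family samples, and uses computationally efficient assembled trees for variable selection and model building.
Conclusions: In simulations, Fam-ELR was robust across various genetic disease models and pedigree structures, and outperformed two existing risk prediction methods based on family data. In a real application to a family-based genome-wide dataset of conduct disorder, Fam-ELR demonstrated its ability to integrate potential risk predictors and their interactions into the model to improve accuracy, especially at the genome-wide level. Compared with existing methods, such as the genetic risk-score method, Fam-ELR was shown to be able to incorporate genetic variants with small or moderate marginal effects and their interactions into an improved risk prediction model. It is therefore a powerful and practical method for high-dimensional genetic risk prediction based on family data, especially for complex human diseases whose etiology is unknown or poorly understood.

Keywords: Family-based study; Genetic risk prediction; High-dimensional data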


List of electronic supplementary materials

Table S1 Significant interaction effects identified by logistic regression in the genome-wide prediction

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE