CLC number: Q789; R73
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
MAO Yong, ZHOU Xiao-bo, PI Dao-ying, SUN You-xian, WONG Stephen T.C.. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm[J]. Journal of Zhejiang University Science B, 2005, 6(10): 961-973.
@article{Mao2005,
title="Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm",
author="MAO Yong and ZHOU Xiao-bo and PI Dao-ying and SUN You-xian and WONG Stephen T.C.",
journal="Journal of Zhejiang University Science B",
volume="6",
number="10",
pages="961-973",
year="2005",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.2005.B0961"
}
%0 Journal Article
%T Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm
%A MAO Yong
%A ZHOU Xiao-bo
%A PI Dao-ying
%A SUN You-xian
%A WONG Stephen T.C.
%J Journal of Zhejiang University SCIENCE B
%V 6
%N 10
%P 961-973
%@ 1673-1581
%D 2005
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2005.B0961
TY - JOUR
T1 - Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm
A1 - MAO Yong
A1 - ZHOU Xiao-bo
A1 - PI Dao-ying
A1 - SUN You-xian
A1 - WONG Stephen T.C.
JO - Journal of Zhejiang University Science B
VL - 6
IS - 10
SP - 961
EP - 973
SN - 1673-1581
Y1 - 2005
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.2005.B0961
ER -
Abstract: In microarray-based cancer classification, gene selection is an important issue because of the large number of variables, the small number of samples, and the non-linearity of the problem. It is difficult to obtain satisfactory results with conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm that integrates gene selection and cancer classification into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm when it is implemented with Gaussian kernel SVMs: a genetic algorithm searches for a pair of optimal parameters, as a better alternative to the common practice of picking the apparently best parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, one for hereditary breast cancer and one for acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
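The parameter search described in the abstract can be illustrated with a minimal sketch. It assumes that the pair of parameters is the SVM penalty C and the Gaussian kernel width sigma, searched on a log10 scale; the abstract does not name them. The function `ga_search`, its operators (truncation selection, arithmetic crossover, Gaussian mutation, elitism) and the `toy_fitness` surface are all hypothetical choices made for illustration, not the authors' implementation. In practice the fitness would be something like the cross-validated accuracy of SVM RFE at the given parameters.

```python
import random


def ga_search(fitness, bounds, pop_size=20, generations=40,
              mutation_rate=0.2, seed=0):
    """Maximise fitness(log_c, log_sigma) with a simple real-coded GA.

    bounds is ((lo, hi), (lo, hi)) for the two log-scaled parameters.
    All operator choices here are illustrative assumptions.
    """
    rng = random.Random(seed)

    def score(individual):
        return fitness(*individual)

    # Start from individuals drawn uniformly inside the search box.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [list(ranked[0])]           # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()                   # arithmetic crossover weight
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation scaled to the width of the range.
                    child[i] += rng.gauss(0.0, 0.1 * (hi - lo))
                child[i] = max(lo, min(hi, child[i]))  # stay inside bounds
            children.append(child)
        population = children

    return max(population, key=score)


def toy_fitness(log_c, log_sigma):
    """Hypothetical stand-in for cross-validated accuracy; peak at (1, -2)."""
    return -((log_c - 1.0) ** 2 + (log_sigma + 2.0) ** 2)


search_bounds = ((-3.0, 3.0), (-4.0, 2.0))
best_log_c, best_log_sigma = ga_search(toy_fitness, search_bounds)
best_c, best_sigma = 10 ** best_log_c, 10 ** best_log_sigma
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. Replacing `toy_fitness` with a real evaluation makes every call expensive, which is presumably why the abstract raises fast implementation as a practical concern.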