CLC number: TP181
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-04-20
Han YAN, Chongquan ZHONG, Yuhu WU, Liyong ZHANG, Wei LU. A hybrid-model optimization algorithm based on the Gaussian process and particle swarm optimization for mixed-variable CNN hyperparameter automatic search[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(11): 1557-1573.
@article{FITEE2200515,
title="A hybrid-model optimization algorithm based on the Gaussian process and particle swarm optimization for mixed-variable CNN hyperparameter automatic search",
author="Han YAN and Chongquan ZHONG and Yuhu WU and Liyong ZHANG and Wei LU",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="24",
number="11",
pages="1557--1573",
year="2023",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2200515"
}
%0 Journal Article
%T A hybrid-model optimization algorithm based on the Gaussian process and particle swarm optimization for mixed-variable CNN hyperparameter automatic search
%A Han YAN
%A Chongquan ZHONG
%A Yuhu WU
%A Liyong ZHANG
%A Wei LU
%J Frontiers of Information Technology & Electronic Engineering
%V 24
%N 11
%P 1557-1573
%@ 2095-9184
%D 2023
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2200515
TY  - JOUR
T1  - A hybrid-model optimization algorithm based on the Gaussian process and particle swarm optimization for mixed-variable CNN hyperparameter automatic search
A1  - Han YAN
A1  - Chongquan ZHONG
A1  - Yuhu WU
A1  - Liyong ZHANG
A1  - Wei LU
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 24
IS  - 11
SP  - 1557
EP  - 1573
SN  - 2095-9184
Y1  - 2023
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2200515
ER  -
Abstract: Convolutional neural networks (CNNs) have developed rapidly in many real-world fields. However, a CNN’s performance depends heavily on its hyperparameters, and finding suitable hyperparameters for CNNs in application fields is challenging for three reasons: (1) the problem of mixed-variable encoding for different types of hyperparameters in CNNs, (2) the expensive computational cost of evaluating candidate hyperparameter configurations, and (3) the problem of ensuring convergence rates and model performance during hyperparameter search. To overcome these challenges, this paper proposes a hybrid-model optimization algorithm, based on the Gaussian process and particle swarm optimization (GPPSO), that searches for suitable hyperparameter configurations automatically. First, a new encoding method is designed to deal efficiently with the CNN hyperparameter mixed-variable problem. Second, a hybrid-surrogate-assisted model is proposed to reduce the high cost of evaluating candidate hyperparameter configurations. Third, a novel activation function is suggested to improve model performance and ensure the convergence rate. Extensive experiments on image-classification benchmark datasets demonstrate the superior performance of GPPSO over state-of-the-art methods. Moreover, a case study on metal fracture diagnosis evaluates the performance of the GPPSO algorithm in a practical application. Experimental results demonstrate the effectiveness and efficiency of GPPSO, which achieves accuracies of 95.26% and 76.36% with only 0.04 and 1.70 GPU days on the CIFAR-10 and CIFAR-100 datasets, respectively.
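The abstract does not give the encoding or the update rules, so the following is only a minimal sketch of the general idea of running particle swarm optimization over mixed-variable hyperparameters. It is not the GPPSO algorithm itself. The search space, the `decode` mapping (rounding for integers, index selection for categorical values), the `toy_cost` function, and all parameter values are assumptions made for illustration. The Gaussian process surrogate is omitted, and `toy_cost` stands in for the expensive CNN evaluation that a surrogate would approximate.

```python
import random

# Hypothetical search space; names and ranges are illustrative, not from the paper.
SPACE = [
    ("learning_rate", "float", (1e-4, 1e-1)),
    ("num_filters", "int", (16, 128)),
    ("activation", "cat", ("relu", "tanh", "elu")),
]


def decode(position):
    """Map a particle position in [0, 1]^d to a mixed-variable configuration."""
    config = {}
    for x, (name, kind, domain) in zip(position, SPACE):
        x = min(max(x, 0.0), 1.0)  # clamp to the unit interval
        if kind == "float":
            lo, hi = domain
            config[name] = lo + x * (hi - lo)
        elif kind == "int":
            lo, hi = domain
            config[name] = lo + round(x * (hi - lo))
        else:  # categorical: the position selects an index into the options
            config[name] = domain[min(int(x * len(domain)), len(domain) - 1)]
    return config


def toy_cost(config):
    """Assumed stand-in for an expensive CNN evaluation or a surrogate prediction."""
    penalty = 0.0 if config["activation"] == "relu" else 0.5
    return ((config["learning_rate"] - 0.01) ** 2
            + ((config["num_filters"] - 64) / 128) ** 2
            + penalty)


def pso(cost, dim, particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO on [0, 1]^dim; positions are decoded before evaluation."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(decode(p)) for p in pos]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), 1.0)
            c = cost(decode(pos[i]))
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return decode(gbest), gbest_cost


best_config, best_cost = pso(toy_cost, dim=len(SPACE))
print(best_config, best_cost)
```

Keeping every particle in a continuous unit cube and decoding only at evaluation time is one common way to let a continuous optimizer handle integer and categorical hyperparameters. In a surrogate-assisted setting, most calls to the cost function would be replaced by cheap model predictions, with real training runs reserved for the most promising candidates.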