Weighted Random Search for CNN Hyperparameter Optimization
Keywords:
Hyperparameter optimization, supervised learning, random search, convolutional neural networks
Abstract
Nearly all machine learning algorithms involve two distinct sets of parameters: the training parameters and the meta-parameters (hyperparameters). While the training parameters are learned during the training phase, the values of the hyperparameters have to be specified before learning starts. For a given dataset, we would like to find the optimal combination of hyperparameter values within a reasonable amount of time. This is a challenging task because of its computational complexity. In previous work [11], we introduced the Weighted Random Search (WRS) method, a combination of Random Search (RS) and a probabilistic greedy heuristic. In the current paper, we compare the WRS method with several state-of-the-art hyperparameter optimization methods on Convolutional Neural Network (CNN) hyperparameter optimization. The criterion is the classification accuracy achieved within the same number of tested combinations of hyperparameter values. According to our experiments, the WRS algorithm outperforms the other methods.
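To make the greedy re-sampling idea concrete, here is a minimal Python sketch, not the authors' reference implementation: at each step, each hyperparameter is re-sampled only with some probability (its weight) and otherwise keeps the best value found so far. All names (wrs_optimize, change_probs, and so on) are illustrative assumptions; in the actual WRS method [11], the change probabilities are derived from the estimated importance of each hyperparameter.

import random

def wrs_optimize(space, objective, n_iter, change_probs, seed=0):
    # Weighted Random Search sketch (after Florea & Andonie [11]).
    # space        : dict mapping hyperparameter name -> list of candidate values
    # objective    : function(config) -> score to maximize (e.g. CNN accuracy)
    # change_probs : dict mapping name -> probability of re-sampling that
    #                hyperparameter at each step (the per-dimension weight)
    rng = random.Random(seed)
    # Start from a fully random configuration, exactly as plain RS would.
    best = {name: rng.choice(values) for name, values in space.items()}
    best_score = objective(best)
    for _ in range(n_iter - 1):
        candidate = dict(best)
        for name, values in space.items():
            # Re-sample this hyperparameter only with its assigned probability;
            # otherwise keep the best value found so far (the greedy component).
            if rng.random() < change_probs[name]:
                candidate[name] = rng.choice(values)
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

In a CNN setting, space would hold, e.g., learning rates, filter counts, and kernel sizes; objective would train the network and return its validation accuracy; and hyperparameters estimated to matter more would receive larger change probabilities.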
References
Albelwi, S.; Mahmood, A. (2016). Analysis of instance selection algorithms on large datasets with deep convolutional neural networks. In 2016 IEEE Long Island Systems, Applications and Technology Conference (LISAT), 1-5, 2016.
Albelwi, S.; Mahmood, A. (2017). A framework for designing the architectures of deep convolutional neural networks. Entropy, 19(6), 2017.
Andonie, R. (2019). Hyperparameter optimization in learning systems. Journal of Membrane Computing, 2019.
Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. (2011a). Algorithms for Hyper-Parameter Optimization. In 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), volume 24 of Advances in Neural Information Processing Systems, Granada, Spain. Neural Information Processing Systems Foundation, 2011.
Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. (2011b). Algorithms for hyper-parameter optimization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q., editors, NIPS, 2546-2554, 2011.
Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D. D. (2015). Hyperopt: a Python library for model selection and hyperparameter optimization. Computational Science and Discovery, 8(1), 014008, 2015.
Bergstra, J.; Yamins, D.; Cox, D. D. (2013). Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML’13, JMLR.org, I-115-I-123, 2013.
Chang, C.-C.; Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
Domhan, T.; Springenberg, J. T.; Hutter, F. (2015). Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, AAAI Press, 3460-3468, 2015.
Florea, A. C.; Andonie, R. (2018). A dynamic early stopping criterion for random search in svm hyperparameter optimization. In Iliadis, L., Maglogiannis, I., and Plagianakos, V., editors, Artificial Intelligence Applications and Innovations, Cham. Springer International Publishing, 168-180, 2018.
Florea, A.-C.; Andonie, R. (2019). Weighted random search for hyperparameter optimization. International Journal of Computers Communications & Control, 14(2), 154-169, 2019.
He, K.; Zhang, X.; Ren, S.; Sun, J. (2016). Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778, 2016.
Hinz, T.; Navarro-Guerrero, N.; Magg, S.; Wermter, S. (2018). Speeding up the hyperparameter optimization of deep convolutional neural networks. International Journal of Computational Intelligence and Applications, 17(02), 1850008, 2018.
Hutter, F.; Hoos, H.; Leyton-Brown, K. (2014a). An efficient approach for assessing hyperparameter importance. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, 754-762, 2014.
Hutter, F.; Hoos, H.; Leyton-Brown, K. (2014b). An efficient approach for assessing hyperparameter importance. In Proceedings of International Conference on Machine Learning 2014 (ICML 2014), 754-762, 2014.
Ilievski, I.; Akhtar, T.; Feng, J.; Shoemaker, C. A. (2016). Hyperparameter optimization of deep neural networks using non-probabilistic RBF surrogate model. CoRR, abs/1607.08316, 2016.
Ilievski, I.; Akhtar, T.; Feng, J.; Shoemaker, C. A. (2017). Efficient hyperparameter optimization for deep learning algorithms using deterministic RBF surrogates. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, 822-829, 2017.
Kennedy, J.; Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, 1942-1948, 1995.
Kotthoff, L.; Thornton, C.; Hoos, H. H.; Hutter, F.; Leyton-Brown, K. (2017). Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 18(25), 1-5, 2017.
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, Curran Associates Inc., USA, 1097-1105, 2012.
Larson, J.; Menickelly, M.; Wild, S. M. (2019). Derivative-free optimization methods. Acta Numerica, 28, 287-404, 2019.
Lemley, J.; Jagodzinski, F.; Andonie, R. (2016). Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data. In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 1, 563-571, 2016.
Li, L.; Jamieson, K. G.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. (2016). Efficient hyperparameter optimization and infinitely many armed bandits. CoRR, abs/1603.06560, 2016.
Liu, H.; Simonyan, K.; Yang, Y. (2018). DARTS: Differentiable architecture search. CoRR, abs/1806.09055, 2018.
Loshchilov, I.; Hutter, F. (2016). CMA-ES for hyperparameter optimization of deep neural networks. CoRR, abs/1604.07269, 2016.
Luo, R.; Tian, F.; Qin, T.; Chen, E.; Liu, T.-Y. (2018). Neural architecture optimization. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 7816-7827, 2018.
Martinez-Cantin, R. (2014). BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. Journal of Machine Learning Research, 15, 3915-3919, 2014.
Miikkulainen, R.; Liang, J.; Meyerson, E.; Rawal, A.; Fink, D.; Francon, O.; Raju, B.; Shahrzad, H.; Navruzyan, A.; Duffy, N.; Hodjat, B. (2019). Chapter 15 - Evolving deep neural networks. In Kozma, R., Alippi, C., Choe, Y., and Morabito, F. C., editors, Artificial Intelligence in the Age of Neural Networks and Brain Computing, Academic Press, 293-312, 2019.
Miikkulainen, R.; Liang, J. Z.; Meyerson, E.; Rawal, A.; Fink, D.; Francon, O.; Raju, B.; Shahrzad, H.; Navruzyan, A.; Duffy, N.; Hodjat, B. (2017). Evolving deep neural networks. CoRR, abs/1703.00548, 2017.
Nelder, J. A.; Mead, R. (1965). A Simplex Method for Function Minimization. Computer Journal, 7, 308-313, 1965.
Patterson, J.; Gibson, A. (2017). Deep Learning: A Practitioner’s Approach. O’Reilly Media, Inc., 1st edition, 2017.
Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine learning. CoRR, abs/1811.12808, 2018.
Real, E.; Aggarwal, A.; Huang, Y.; Le, Q. V. (2018). Regularized evolution for image classifier architecture search. CoRR, abs/1802.01548, 2018.
Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y. L.; Le, Q. V.; Kurakin, A. (2017). Large-scale evolution of image classifiers. CoRR, abs/1703.01041, 2017.
Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1), 148-175, 2016.
Snoek, J.; Larochelle, H.; Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, 2960-2968, 2012.
Sobol, I. (1976). Uniformly distributed sequences with an additional uniform property. USSR Computational Mathematics and Mathematical Physics, 16(5), 236-242, 1976.
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. (2015). Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR), 2015.
Talathi, S. S. (2015). Hyper-parameter optimization of deep convolutional networks for object recognition. In 2015 IEEE International Conference on Image Processing (ICIP), 3982-3986, 2015.
Wu, B.; Dai, X.; Zhang, P.; Wang, Y.; Sun, F.; Wu, Y.; Tian, Y.; Vajda, P.; Jia, Y.; Keutzer, K. (2018). FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. CoRR, abs/1812.03443, 2018.
Young, S. R.; Rose, D. C.; Karnowski, T. P.; Lim, S.-H.; Patton, R. M. (2015). Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, MLHPC ’15, ACM, New York, NY, USA, Article 4, 1-5, 2015.
Zeiler, M. D.; Fergus, R. (2014). Visualizing and understanding convolutional networks. In Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T., editors, Computer Vision - ECCV 2014, Cham. Springer International Publishing, 818-833, 2014.
Zoph, B.; Le, Q. V. (2016). Neural architecture search with reinforcement learning. CoRR, abs/1611.01578, 2016.
[online] Optunity. Available: http://optunity.readthedocs.io/en/latest/. Accessed: 2017-09-01.