EfficientNet Convolutional Neural Network with Gram Matrices Modules for Predicting Sadness Emotion
DOI: https://doi.org/10.15837/ijccc.2024.5.6697

Keywords: EfficientNetV2, Gram matrix, emotion prediction, images of general nature, sadness emotion

Abstract
Images are becoming an attractive area for emotional analysis. Recognizing emotions in images of general nature is gaining increasing research attention. Such emotion recognition is more sophisticated than, and different from, conventional computer vision tasks. Due to human subjectivity, ambiguous judgments, and cultural and personal differences, there is no unambiguous model for such emotion assessment. In this paper, we have chosen sadness as the target emotion, as it has a significant impact on the richness of human experience and the depth of personal meaning. The main hypothesis of our research is that by extending the capabilities of convolutional neural networks to integrate both deep and shallow layer feature maps, it is possible to improve the detection of the sadness emotion in images. We propose integrating different convolutional layers by taking the learned features from selected layers and applying a pairwise operation to compute the Gram matrices of feature sub-maps. Our findings show that this approach improves the network's ability to recognize sadness in the context of binary classification, resulting in higher emotion recognition accuracy. We experimentally evaluated the proposed network on the stated binary classification problem under different parameters and datasets. The results demonstrate that the improved network achieves higher accuracy than both the baseline (EfficientNetV2) and the previous state-of-the-art model.
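The abstract describes the method in prose only. As a purely illustrative aid, the following minimal PyTorch sketch shows one way the described pairwise Gram-matrix operation over feature sub-maps could look; the function names, group count, and feature-map shapes are our own assumptions, not the authors' implementation.

import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # Pairwise channel correlations of a feature map:
    # (B, C, H, W) -> (B, C, C), normalized by the spatial size.
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (h * w)

def sub_map_grams(features: torch.Tensor, n_groups: int = 4) -> torch.Tensor:
    # Split the channels into n_groups sub-maps and stack each
    # sub-map's Gram matrix: (B, C, H, W) -> (B, G, C/G, C/G).
    groups = torch.chunk(features, n_groups, dim=1)
    return torch.stack([gram_matrix(g) for g in groups], dim=1)

# Hypothetical shallow and deep feature maps, with sizes loosely
# modelled on EfficientNetV2 stages (assumed, not taken from the paper).
shallow = torch.randn(2, 48, 56, 56)
deep = torch.randn(2, 256, 7, 7)
print(sub_map_grams(shallow).shape)  # torch.Size([2, 4, 12, 12])
print(sub_map_grams(deep).shape)     # torch.Size([2, 4, 64, 64])

In a design along these lines, the flattened Gram entries from shallow and deep layers would typically be concatenated with the backbone's pooled features before the binary (sadness / no sadness) classifier head; the paper itself should be consulted for the exact layer choices and fusion scheme.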
References
Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer Series in Information Science and Statistics. Springer, 2006. ISBN: 978-0387310732.
Chen, T.; Borth, D.; Darrell, T.; Chang, S.F. (2014). Deepsentibank: Visual sentiment concept classification with deep convolutional neural networks, arXiv Preprint arXiv:1410.8586, 2014.
Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation, arXiv Preprint arXiv:1706.05587, 2017.
Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions, In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1800–1807, 2017.
Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. (2020). Randaugment: Practical automated data augmentation with a reduced search space, In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, IEEE, 702–703, 2020.
Dellandrea, E.; Liu, N.; Chen, L. (2010). Classification of affective semantics in images based on discrete and dimensional models of emotions, In: 2010 International Workshop on Content Based Multimedia Indexing (CBMI), IEEE, 2010. https://doi.org/10.1109/CBMI.2010.5529906.
Deshmukh, R.S.; Jagtap, V.; Paygude, S. (2017). Facial emotion recognition system through machine learning approach, In: 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), 272–277, 2017. https://doi.org/10.1109/ICCONS.2017.8250725.
Duchi, J.D.; Hazan, E.; Singer, Y. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Journal of Machine Learning Research, 12, 2121–2159, 2011.
Goutte, C.; Gaussier, E. (2005). A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation, In: Advances in Information Retrieval, Springer Berlin Heidelberg, 345–359, 2005.
He, K.; Zhang, X.; Ren, S.; Sun, J. (2016). Deep residual learning for image recognition, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
He, X.; Zhang, W. (2018). Emotion recognition by assisted learning with convolutional neural networks, Neurocomputing, 291, 187–194, 2018. https://doi.org/10.1016/j.neucom.2018.02.073.
Hu, J.; Shen, L.; Sun, G. (2018). Squeeze-and-excitation networks, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141, 2018.
Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. (2017). Densely connected convolutional networks, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700–4708, 2017.
Iqbal, J.L.M.; Kumar, M.S.; Mishra, G.; Asha, G.R.; Saritha, A.N.; Karthik, A.; Kotaiah, N.B. (2023). Facial emotion recognition using geometrical features based deep learning techniques, International Journal of Computers Communications & Control, 18(4), 2023. https://doi.org/10.15837/ijccc.2023.4.4644.
Janssens, A.C.J.W.; Martens, F.K. (2020). Reflection on modern methods: Revisiting the area under the ROC Curve, International Journal of Epidemiology, 49(4), 1397–1403, 2020. https://doi.org/10.1093/ije/dyz274.
Johnson, J.; Karpathy, A.; Fei-Fei, L. (2016). Densecap: Fully convolutional localization networks for dense captioning, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 4565–4574, 2016.
Karbauskaite, R.; Sakalauskas, L.; Dzemyda, G. (2020). Kriging predictor for facial emotion recognition using numerical proximities of human emotions, Informatica, 31(2), 249–275, 2020. https://doi.org/10.15388/20-INFOR419.
Levine, S.; Pastor, P.; Krizhevsky, A.; Ibarz, J.; Quillen, D. (2018). Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, The International Journal of Robotics Research, 37(4–5), 421–436, 2018.
Maas, A.L.; Hannun, A.Y.; Ng, A.Y. (2013). Rectifier nonlinearities improve neural network acoustic models, In: Proc. International Conference on Machine Learning (ICML), 2013.
Panda, R.; Zhang, J.; Li, H.; Lee, J.-Y.; Lu, X.; Roy-Chowdhury, A.K. (2018). Contemplating visual emotions: Understanding and overcoming dataset bias, arXiv Preprint arXiv:1807.03797, 2018.
Polycarpou, M.M. (2008). Editorial: A new era for the IEEE Transactions on Neural Networks, IEEE Transactions on Neural Networks, 19(1), 1–2, 2008. https://doi.org/10.1109/TNN.2007.915293.
Revina, I.M.; Emmanuel, W.R.S. (2018). A survey on human face expression recognition techniques, Journal of King Saud University – Computer and Information Sciences, 2018. https://doi.org/10.1016/j.jksuci.2018.09.002.
Ronneberger, O.; Fischer, P.; Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation, In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, Springer, 234–241, 2015.
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A.C.; Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge, International Journal of Computer Vision, 115, 211–252, 2015. https://doi.org/10.1007/s11263-015-0816-y.
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510–4520, 2018.
Smith, S.L.; Kindermans, P.-J.; Ying, C.; Le, Q.V. (2017). Don’t decay the learning rate, increase the batch size, arXiv Preprint arXiv:1711.00489, 2017.
Shao, J.; Qian, Y. (2019). Three convolutional neural network models for facial expression recognition in the wild, Neurocomputing, 355, 2019. https://doi.org/10.1016/j.neucom.2019.05.005.
Simonyan, K.; Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition, arXiv Preprint arXiv:1409.1556, 2014.
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 15(1), 1929–1958, 2014.
Tan, M.; Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks, In: International Conference on Machine Learning, 6105–6114, PMLR, 2019.
Tan, M.; Le, Q.V. (2021). EfficientNetV2: Smaller Models and Faster Training, arXiv Preprint arXiv:2104.00298, 2021.
Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. (2014). Deepface: Closing the gap to human-level performance in face verification, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1701–1708, 2014.
Yang, J.; She, D.; Sun, M. (2017). Joint image emotion classification and distribution learning via deep convolutional neural network, In: IJCAI’17: Proceedings of the 26th International Joint Conference on Artificial Intelligence, 3266–3272, 2017.
Yang, H.; Fan, Y.; Lv, G.; Liu, S.; Guo, Z. (2022). Exploiting emotional concepts for image emotion recognition, The Visual Computer, 2022. https://doi.org/10.1007/s00371-022-02472-8.
You, Q.; Luo, J.; Jin, H.; Yang, J. (2016). Building a large scale dataset for image emotion recognition: The fine print and the benchmark, In: Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), 2016.
Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization, arXiv Preprint arXiv:1710.09412, 2017.
Zhang, H.; Liu, Y.; Xu, D.; He, K.; Peng, G.; Yue, Y.; Liu, R. (2022). Learning multi-level representations for image emotion recognition in the deep convolutional network, In: Proceedings of SPIE, 2022. https://doi.org/10.1117/12.2623414.
Zhao, G.; Yang, H.; Tu, B.; Zhang, L. (2021). A survey on image emotion recognition, Journal of Information Processing Systems, 17(6), 2021.
License
Copyright (c) 2024 Modestas Motiejauskas, Prof. Gintautas Dzemyda
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is provided for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
- Share: copy and redistribute the material in any medium or format;
- Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.