Recognizing A Complex Human Behaviour via A Shallow Neural Network with Zero Video Training Sample
DOI:
https://doi.org/10.15837/ijccc.2025.5.6882

Keywords:
complex human behaviour recognition, temporal action localization, motion data structure, action encoding, skeleton-based action recognition

Abstract
In contrast to human action recognition (HAR), understanding complex human behaviour (CHB), which consists of multiple basic actions, poses a significant challenge for researchers due to its extended duration, the large number of behaviour types, and substantial data-labeling costs. In this paper, a new approach to recognizing CHB from a semantic point of view is proposed, which can be roughly summarized as judging a behaviour by action quantization and action-combination similarity. To fully evaluate the effectiveness of our method, our self-collected dataset, HanYue Action3D, is extended to become the first public skeleton-based dataset with complex-behaviour samples and temporal calibration. Experimental results demonstrate the feasibility and consistent superiority of our method. Moreover, its zero-shot learning capability bridges the divide between laboratory settings and real-world applications.
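To make the pipeline sketched in this abstract concrete, a minimal Python sketch follows. It is a hypothetical reconstruction under stated assumptions, not the authors' implementation: it assumes a window-level basic-action classifier (standing in for the paper's shallow neural network) and behaviour templates expressed as ordered lists of basic-action labels; the function names and the SequenceMatcher-based similarity measure are illustrative choices only.

```python
# Illustrative sketch of the abstract's idea: quantize a skeleton sequence into
# basic actions, then score the resulting action combination against behaviour
# templates. All names and the similarity measure are assumptions, not the
# authors' code.
from difflib import SequenceMatcher

def quantize_actions(skeleton_windows, classify_basic_action):
    """Map each temporal window of skeleton frames to a basic-action label
    (action quantization, played here by any window-level classifier)."""
    return [classify_basic_action(window) for window in skeleton_windows]

def combination_similarity(observed, template):
    """Order-aware similarity in [0, 1] between two action-label sequences
    (a stand-in for the paper's action-combination similarity)."""
    return SequenceMatcher(None, observed, template).ratio()

def recognize_chb(skeleton_windows, classify_basic_action, behaviour_templates):
    """Return the complex behaviour whose template best matches the input,
    together with the per-template similarity scores."""
    observed = quantize_actions(skeleton_windows, classify_basic_action)
    scores = {name: combination_similarity(observed, template)
              for name, template in behaviour_templates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical templates: a new behaviour is added as a label sequence only,
# without any new video training samples.
templates = {
    "make_tea":  ["stand", "walk", "pour", "stir", "drink"],
    "mop_floor": ["bend", "wring", "walk", "wipe", "walk"],
}
```

Under this reading, extending the system to a new behaviour requires only writing a new label-sequence template, which is consistent with the zero-video-training-sample claim in the title.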
License
Copyright (c) 2025 Leiyue Yao

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
- Share: copy and redistribute the material in any medium or format;
- Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.