Recognizing A Complex Human Behaviour via A Shallow Neural Network with Zero Video Training Sample
DOI:
https://doi.org/10.15837/ijccc.2025.5.6882

Keywords:
complex human behaviour recognition, temporal action localization, motion data structure, action encoding, skeleton-based action recognition

Abstract
In contrast to human action recognition (HAR), understanding complex human behaviour (CHB), which consists of multiple basic actions, poses a significant challenge for researchers due to its extended duration, the large number of behaviour types, and substantial data-labeling costs. In this paper, a new approach to recognizing CHB from a semantic point of view is proposed, which can be roughly summarized as judging a behaviour by action quantization and action-combination similarity. To fully evaluate the effectiveness of our method, our self-collected dataset, HanYue Action3D, is extended to become the first public skeleton-based dataset with complex-behaviour samples and temporal calibration. Experimental results demonstrate the feasibility and consistent superiority of our method. Moreover, its zero-shot learning capability bridges the divide between laboratory settings and real-world applications.
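To make the pipeline sketched in this abstract concrete, a minimal Python sketch follows. It is a hypothetical reconstruction under stated assumptions, not the authors' implementation: it assumes a window-level basic-action classifier (standing in for the paper's shallow neural network) and behaviour templates expressed as ordered lists of basic-action labels; the function names and the SequenceMatcher-based similarity measure are illustrative choices only.

```python
# Illustrative sketch of the abstract's idea: quantize a skeleton sequence into
# basic actions, then score the resulting action combination against behaviour
# templates. All names and the similarity measure are assumptions, not the
# authors' code.
from difflib import SequenceMatcher

def quantize_actions(skeleton_windows, classify_basic_action):
    """Map each temporal window of skeleton frames to a basic-action label
    (action quantization, played here by any window-level classifier)."""
    return [classify_basic_action(window) for window in skeleton_windows]

def combination_similarity(observed, template):
    """Order-aware similarity in [0, 1] between two action-label sequences
    (a stand-in for the paper's action-combination similarity)."""
    return SequenceMatcher(None, observed, template).ratio()

def recognize_chb(skeleton_windows, classify_basic_action, behaviour_templates):
    """Return the complex behaviour whose template best matches the input,
    together with the per-template similarity scores."""
    observed = quantize_actions(skeleton_windows, classify_basic_action)
    scores = {name: combination_similarity(observed, template)
              for name, template in behaviour_templates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical templates: a new behaviour is added as a label sequence only,
# without any new video training samples.
templates = {
    "make_tea":  ["stand", "walk", "pour", "stir", "drink"],
    "mop_floor": ["bend", "wring", "walk", "wipe", "walk"],
}
```

Under this reading, extending the system to a new behaviour requires only writing a new label-sequence template, which is consistent with the zero-video-training-sample claim in the title.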
License
Copyright (c) 2025 Leiyue Yao

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
- Share: copy and redistribute the material in any medium or format;
- Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.