Research on Key Technology of Web Hierarchical Topic Detection and Evolution Based on Behaviour Tracking Analysis

Authors

  • Mo Chen, Beijing Union University

Keywords:

Web hierarchical topic, topic detection, event evolution, behaviour tracking analysis

Abstract

In the big data era, research on Web hierarchical topic detection and evolution over semi-structured and unstructured data has attracted wide attention from academics. This paper proposes an approach to Web hierarchical topic detection and evolution based on behaviour tracking analysis, taking network big data as the research object. It describes the main implementation methods: instance analysis of the usage mode, instance analysis of the seed, set analysis of similar instances supporting topics, set analysis of similar instances supporting events, and evolution analysis of events. It then presents the algorithm for Web hierarchical topic detection and evolution based on behaviour tracking analysis. The experimental analysis has three parts. First, it evaluates the quality of topic detection: how the accuracy rate varies with the number of instances concerned and the seed threshold, and how it varies with the number of instances concerned and the probability threshold. Second, it evaluates the quality of topic evolution: how the accuracy rate varies with parameter adjustment, and how it varies with the number of instances concerned and the similarity threshold. Finally, it compares the time different methods take to solve the main research problem, and the qualitative results of topic detection and evolution on different data sets. The results show that the approach is feasible, verifiable and superior, and that it can play a major role in reconstructing the Web hierarchical topic corpus and in providing an intelligent big data warehouse for network information evolution applications.

Published

2019-05-31
