Latent Semantic Analysis using a Dennis Coefficient for English Sentiment Classification in a Parallel System
Keywords:
English sentiment classification, parallel system, Cloudera, Hadoop Map and Hadoop Reduce, Dennis Measure, Latent Semantic Analysis
Abstract
We have surveyed many significant approaches to sentiment classification over many years, because it makes crucial contributions to everyday life, such as in political activities, commodity production, and commercial activities. We propose a novel model that uses Latent Semantic Analysis (LSA) and a Dennis Coefficient (DNC) for big data sentiment classification in English. Many LSA vectors (LSAVs) have been successfully reformed by using the DNC. We use the DNC and the LSAVs to classify the 11,000,000 documents of our testing data set against the 5,000,000 documents of our training data set in English. The model uses the sentiment lexicons of our basis English sentiment dictionary (bESD). We have tested the proposed model in both a sequential environment and a distributed network system. The results of the sequential system are not as good as those of the parallel environment. We achieved 88.76% accuracy on the testing data set, which is better than the accuracies of many previous semantic analysis models. We have also compared the novel model with previous models, and the results of our proposed model are better than those of the previous models. The results of the novel model can be widely used in many fields, both in commercial applications and in surveys of sentiment classification.
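As an illustration of the similarity step mentioned in the abstract, the following minimal sketch computes a Dennis coefficient between two binary vectors and uses it to label a vector by its average similarity to a positive and a negative group. It assumes the form of the Dennis measure commonly listed for binary vectors, (ad - bc) / sqrt(n(a + b)(a + c)), where a counts positions set in both vectors, b and c count positions set in only one, d counts positions set in neither, and n = a + b + c + d. The function names, the toy vectors, and the averaging rule are hypothetical and are not taken from the paper; the paper's LSA vectors, bESD lexicons, and Hadoop pipeline are not reproduced here.

```python
from math import sqrt


def dennis_coefficient(x, y):
    """Dennis similarity between two equal-length binary vectors.

    Assumed definition: (a*d - b*c) / sqrt(n * (a + b) * (a + c)).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)          # set in both
    b = sum(1 for p, q in zip(x, y) if p and not q)      # set only in x
    c = sum(1 for p, q in zip(x, y) if not p and q)      # set only in y
    d = sum(1 for p, q in zip(x, y) if not p and not q)  # set in neither
    n = a + b + c + d
    denom = sqrt(n * (a + b) * (a + c))
    # An all-zero vector makes the denominator zero; treat it as "no similarity".
    return 0.0 if denom == 0 else (a * d - b * c) / denom


def classify(vector, positive_group, negative_group):
    """Label a vector by its mean Dennis similarity to each group (hypothetical rule)."""
    pos = sum(dennis_coefficient(vector, v) for v in positive_group) / len(positive_group)
    neg = sum(dennis_coefficient(vector, v) for v in negative_group) / len(negative_group)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Toy binary vectors standing in for document vectors.
    positive_group = [[1, 1, 0, 0], [1, 0, 0, 0]]
    negative_group = [[0, 0, 1, 1]]
    print(classify([1, 1, 0, 0], positive_group, negative_group))
```

In a distributed setting, the per-document similarity computation is independent across documents, so it would map naturally onto a Map phase with the comparison of group scores in a Reduce phase; that split is only a suggestion here, not a description of the authors' implementation.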
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the terms of the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.