An Ensemble Machine Learning Approach to Understanding the Effect of a Global Pandemic on Twitter Users’ Attitudes

Bokang Jia; Domnica Dzitac; Samridha Shrestha; Komiljon Turdaliev; Nurgazy Seidaliev

Authors

Bokang Jia New York University Abu Dhabi
Domnica Dzitac New York University Abu Dhabi
Samridha Shrestha New York University Abu Dhabi
Komiljon Turdaliev New York University Abu Dhabi
Nurgazy Seidaliev New York University Abu Dhabi

Keywords:

COVID-19, Coronavirus, Machine Learning, Natural Language Processing, Automatic Hate-Speech Detection, Racism

Abstract

The COVID-19 outbreak is widely thought to have fuelled racism and discrimination, especially towards Asian individuals [10]. To test this hypothesis, we build upon existing work to classify racist tweets posted before and after COVID-19 was declared a global pandemic. To overcome the linguistic difficulty and class imbalance of the classification task, we combine an ensemble of machine learning techniques such as Linear Support Vector Classifiers, Logistic Regression models, and Deep Neural Networks. We fill a gap in the existing literature by (1) using a combined machine learning approach to understand the effect of COVID-19 on Twitter users’ attitudes and (2) improving on the performance of automatic racism detectors. We show that there has not been a sharp increase in racism towards Asian people on Twitter, and that users who posted racist tweets before the pandemic tend to post approximately the same amount during the outbreak. Previous research on racism during other virus outbreaks suggests that racism towards communities associated with the region where a virus originated is not attributable solely to the outbreak; rather, it is a continuing symptom of deep-rooted biases towards minorities [13]. Our research supports these findings. We conclude that the COVID-19 outbreak is an additional outlet for discrimination against Asian people rather than its main cause.
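The abstract does not state how the outputs of the three model families are combined. One common option is hard (majority) voting, sketched below under stated assumptions: labels are binary (1 = racist, 0 = not racist), and the names `majority_vote`, `svc_pred`, `logreg_pred`, and `dnn_pred` are hypothetical, with made-up predictions used purely for illustration. This is not necessarily the authors' method.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several models by hard voting.

    per_model_predictions: list of equal-length label lists, one per model.
    Returns one label per example: the most common label across models.
    """
    if not per_model_predictions:
        raise ValueError("need predictions from at least one model")
    n_examples = len(per_model_predictions[0])
    if any(len(p) != n_examples for p in per_model_predictions):
        raise ValueError("all models must predict the same number of examples")

    combined = []
    for votes in zip(*per_model_predictions):
        # most_common(1) picks the label with the highest count; on a tie,
        # the label encountered first (from the earliest model) wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on four tweets (illustrative only).
svc_pred = [1, 0, 0, 1]
logreg_pred = [1, 0, 1, 0]
dnn_pred = [0, 0, 1, 1]

ensemble_pred = majority_vote([svc_pred, logreg_pred, dnn_pred])
```

With an odd number of binary classifiers, ties cannot occur, so each final label is simply the one that at least two of the three models agree on.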

References

[1] Aken, B., Risch, J., Löser, A. (2018). Challenges for Toxic Comment Classification: An In-Depth Error Analysis CoRR, DOI: 10.18653/v1/W18-5105 https://doi.org/10.18653/v1/W18-5105

[2] Atkeson, A. (2020). What Will Be the Economic Impact of COVID-19 in the US? Rough Estimates of Disease Scenarios National Bureau of Economic Research, DOI: 10.3386/w26867 https://doi.org/10.3386/w26867

[3] Chen, E., Lerman, K., Ferrara, E. (2020). COVID-19: The First Public Coronavirus Twitter Dataset JMIR Public Health Surveill , DOI: 10.2196/19273 https://doi.org/10.2196/19273

[4] Davidson T., Warmsley D., Macy M., Weber I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language, AAAI Publications, Eleventh International AAAI Conference on Web and Social Media, DOI: http://arxiv.org/abs/1703.04009

[5] Devakumar, D., Shannon, G., Bhopal, S. S., Abubakar, I. (2020). Racism and discrimination in COVID-19 responses The Lancet, DOI: https://doi.org/10.1016/S0140-6736(20)30792-3

[6] Godin, F., Vandersmissen, B., De Neve, W., Van de Walle, R. (2015). Named Entity Recognition for Twitter Microposts using Distributed Word Representations Proceedings of the Workshop on Noisy User-generated Text , DOI: 10.18653/v1/W15-4322 https://doi.org/10.18653/v1/W15-4322

[7] Hanasoge, S., Horiuchi, N., Huang, C., Jia, H., Kim, N. Y., Murao, M., Seo, M., Tan, R., Wilkinson, J. (2020). Visibility challenges for Asian scientists Nature Reviews Physics, DOI: https://doi.org/10.1038/s42254-020-0162-z

[8] Kwok, I., Wang, Y. (2013). Locate the Hate: Detecting Tweets against Blacks, AAAI Publications, Twenty-Seventh AAAI Conference on Artificial Intelligence , DOI: 10.5555/2891460.2891697

[9] Li J., Guo K., Viedma E. H., Lee H., Liu J., Zhong N., Gomes L. F. A. M., Filip F.G., Fang SC., Özdemir M.S., Liu X., Lu G., Shi Y. (2020), Culture versus Policy: More Global Collaboration to Effectively Combat COVID-19 The Innovation, Volume 1, Issue 2 https://doi.org/10.1016/j.xinn.2020.100023

[10] Nature (2020). Stop the coronavirus stigma now Nature 580, 165, DOI: https://doi.org/10.1038/d41586-020-01009-0

[11] Pitsilis, G., Ramampiaro, H., Langseth, H. (2018). Effective hate-speech detection in Twitter data using recurrent neural networks Appl Intell 48, DOI: https://doi.org/10.1007/s10489-018-1242-y

[12] Saars H. A., Keil, R. (2006). Global Cities and the Spread of Infectious Disease: The Case of Severe Acute Respiratory Syndrome (SARS) in Toronto, Canada Urban Studies , DOI: https://doi.org/10.1080/00420980500452458

[13] Siu, J. Y. (2015). Influence of social experiences in shaping perceptions of the Ebola virus among African residents of Hong Kong during the 2014 outbreak: a qualitative study International Journal for Equity in Health , DOI: https://doi.org/10.1186/s12939-015-0223-6

[14] Dong E, Du H, Gardner L. (2020). An interactive web-based dashboard to track COVID-19 in real time Lancet Inf Dis. 20(5):533-534, DOI: 10.1016/S1473-3099(20)30120-1 https://doi.org/10.1016/S1473-3099(20)30120-1

[15] Wang, W., Chen, L., Thirunarayan, K., Sheth, A. P. (2014). Cursing in English on Twitter Association for Computing Machinery , DOI: https://doi.org/10.1145/2531602.2531734

[16] Waseem, Z., Hovy, D. (2016). Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter Proceedings of the NAACL Student Research Workshop, DOI: 10.18653/v1/N16-2013 https://doi.org/10.18653/v1/N16-2013

[17] World Health Organization (2020). Coronavirus disease 2019 (COVID-19): situation report, 52 World Health Organization

[18] Zimmerman, S., Kruschwitz, U., Fox, C. (2018). Improving Hate Speech Detection with Deep Learning Ensembles Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

[19] Zubiaga, A., Voss, A., Procter, R., Liakata, M., Wang, B., Tsakalidis, A. (2016). Towards Real-Time, Country-Level Location Classification of Worldwide Tweets IEEE Transactions on Knowledge and Data Engineering, Volume: 29, Issue: 9, Sept. 1 2017, DOI: 10.1109/TKDE.2017.2698463 https://doi.org/10.1109/TKDE.2017.2698463
