Evaluating and Mitigating Gender Bias in Generative Large Language Models
DOI: https://doi.org/10.15837/ijccc.2024.6.6853

Keywords: Artificial Intelligence, Large Language Models, Natural Language Processing, Gender Bias

Abstract
Gender bias in generative large language models (LLMs), alongside other demographic biases such as race, nationality, and religion, is attracting increasing attention from both the scientific community and industry. These biases surface in popular products built on generative LLMs and can compromise user experiences. A growing body of research aims to improve gender representation in natural language processing (NLP) across a spectrum of generative LLMs. This paper surveys current research on identifying and evaluating gender bias in generative LLMs and presents a comprehensive investigation that evaluates and mitigates gender bias across five distinct generative LLMs. The mitigation strategies yield significant improvements in gender bias scores, up to 46% better than zero-shot text generation. We also examine how LLM precision and quantization levels affect gender bias, offering insight into how such technical factors influence mitigation strategies. By addressing these challenges and outlining directions for future work, we aim to contribute to the ongoing discussion of gender bias in language technologies and to promote more equitable and inclusive NLP systems.
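To make the two technical levers mentioned in the abstract concrete, the following is a minimal sketch, not the paper's experimental code: it loads an open LLM with QLoRA-style 4-bit NF4 quantization, compares a zero-shot prompt against a prompt-based mitigation (an explicit gender-neutrality instruction), and scores the continuations with a crude gendered-pronoun counter. The model identifier, prompts, and metric are illustrative assumptions; the paper's five models, prompt sets, and bias scores are not reproduced here.

```python
# Illustrative sketch only -- NOT the paper's evaluation pipeline.
# Assumes transformers, accelerate, and bitsandbytes are installed;
# the model choice below is a hypothetical example.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # arbitrary open LLM for illustration

# 4-bit NF4 quantization (QLoRA-style). Dropping quantization_config loads the
# model at full precision, allowing a comparison of bias scores across precisions.
quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, quantization_config=quant, device_map="auto"
)

def continuation(prompt: str) -> str:
    """Generate a short continuation of `prompt` and return only the new text."""
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=40, do_sample=True, temperature=0.7)
    return tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

def pronoun_counts(text: str) -> dict:
    """Crude lexical bias proxy: counts of gendered vs. neutral pronouns."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "male": sum(words.count(w) for w in ("he", "him", "his")),
        "female": sum(words.count(w) for w in ("she", "her", "hers")),
        "neutral": sum(words.count(w) for w in ("they", "them", "their")),
    }

# Zero-shot prompt vs. a prompt-based mitigation (a gender-neutrality instruction).
zero_shot = "Complete the sentence: The nurse walked into the room and"
mitigated = "Use gender-neutral language and avoid stereotyped pronouns. " + zero_shot

for name, prompt in [("zero-shot", zero_shot), ("mitigated", mitigated)]:
    text = continuation(prompt)
    print(name, pronoun_counts(text), "|", text)
```

In practice, curated prompt sets and validated bias metrics would replace the toy pronoun counter, and each model would be evaluated at several precision levels to isolate the effect of quantization.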
License
Copyright (c) 2024 Hanqing Zhou, Diana Inkpen, Burak Kantarci
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free of charge under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
- Share: copy and redistribute the material in any medium or format;
- Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.