9:00 – 9:15    Opening
9:15 – 10:15   Keynote 1 (Isabelle, Live)
10:15 – 11:00  Coffee Break
11:00 – 12:15  Oral papers
12:15 – 14:00  Lunch
14:00 – 15:30  Poster session
15:30 – 16:00  Coffee Break
16:00 – 17:00  Keynote 2 (Hal, Remote)
17:00 – 18:00  Lightning talks (Remote)
18:00 – 18:15  Closing Remarks
Type | Title | Authors
Oral + Poster / In-person | A Parameter-Efficient Multi-Objective Approach to Mitigate Stereotypical Bias in Language Models | Yifan Wang, Vera Demberg
Lightning talk / Remote | Do PLMs and Annotators Share the Same Gender Bias? Definition, Dataset, and Framework of Contextualized Gender Bias | Shucheng Zhu, Bingjie Du, Jishun Zhao, Ying Liu, Pengyuan Liu
Lightning talk / Remote | We Don’t Talk About That: Case Studies on Intersectional Analysis of Social Bias in Large Language Models | Hannah Devinney, Jenny Björklund, Henrik Björklund
Lightning talk / Remote | An Explainable Approach to Understanding Gender Stereotype Text | Manuela Nayantara Jeyaraj, Sarah Jane Delany
Lightning talk / Remote | A Fairness Analysis of Human and AI-Generated Student Reflection Summaries | Bhiman Kumar Baghel, Arun Balajiee Lekshmi Narayanan, Michael Miller Yoder
Lightning talk / Remote | On Shortcuts and Biases: How Finetuned Language Models Distinguish Audience-Specific Instructions in Italian and English | Nicola Fanton, Michael Roth
Lightning talk / Remote | The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs | Aleix Sant, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Maite Melero
Poster / In-person | Detecting Gender Discrimination on Actor Level Using Linguistic Discourse Analysis | Stefanie Urchs, Veronika Thurner, Matthias Aßenmacher, Christian Heumann, Stephanie Thiemichen
Poster / In-person | What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction | Hongyu Chen, Michael Roth, Agnieszka Falenska
Oral + Poster / In-person | Towards Fairer NLP Models: Handling Gender Bias In Classification Tasks | Nasim Sobhani, Sarah Jane Delany
Lightning talk / Remote | Investigating Gender Bias in STEM Job Advertisements | Malika Dikshit, Houda Bouamor, Nizar Habash
Oral + Poster / In-person | Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People’s Gender and Origin | Marco Antonio Stranisci, Pere-Lluís Huguet Cabot, Elisa Bassignana, Roberto Navigli
Lightning talk / Remote | Gender Bias in Turkish Word Embeddings: A Comprehensive Study of Syntax, Semantics and Morphology Across Domains | Duygu Altinok
Oral + Poster / In-person | Disagreeable, Slovenly, Honest and Un-named Women? Investigating Gender Bias in English Educational Resources by Extending Existing Gender Bias Taxonomies | Haotian Zhu, Kexin Gao, Fei Xia, Mari Ostendorf
Poster / In-person | Generating Gender Alternatives in Machine Translation | Sarthak Garg, Mozhdeh Gheini, Clara Emmanuel, Tatiana Likhomanenko, Qin Gao, Matthias Paulik
Oral + Poster / In-person | Beyond Binary Gender Labels: Revealing Gender Bias in LLMs through Gender-Neutral Name Predictions | Zhiwen You, HaeJin Lee, Shubhanshu Mishra, Sullam Jeoung, Apratim Mishra, Jinseok Kim, Jana Diesner
Poster / In-person | Is there Gender Bias in Dependency Parsing? Revisiting “Women’s Syntactic Resilience” | Paul Stanley Go, Agnieszka Falenska
Poster / In-person | From ‘Showgirls’ to ‘Performers’: Fine-tuning with Gender-inclusive Language for Bias Reduction in LLMs | Marion Bartl, Susan Leavy
Lightning talk / Remote | Sociodemographic Bias in Language Models: A Survey and Forward Path | Vipul Gupta, Pranav Narayanan Venkit, Shomir Wilson, Rebecca J. Passonneau
Lightning talk / Remote | Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP | Vagrant Gautam, Arjun Subramonian, Anne Lauscher, Os Keyes
Lightning talk / Remote | Evaluating Gender Bias in Multilingual Multimodal AI Models: Insights from an Indian Context | Kshitish Ghate, Arjun Choudhry, Vanya Bannihatti Kumar
Lightning talk / Remote | Detecting and Mitigating LGBTQIA+ Bias in Large Norwegian Language Models | Selma Kristine Bergstrand, Björn Gambäck
Lightning talk / Remote | Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation | Ian Stewart, Rada Mihalcea
Lightning talk / Remote | Analysis of Annotator Demographics in Sexism Detection | Narjes Tahaei, Sabine Bergler
Lightning talk / Remote | An Empirical Study of Gendered Stereotypes in Emotional Attributes for Bangla in Multilingual Large Language Models | Jayanta Sadhu, Maneesha Rani Saha, Rifat Shahriyar
Poster / In-person | Multilingual DAMA for Debiasing Translation | Tomasz Limisiewicz, David Mareček
Poster / In-person | Overview of the Shared Task on Machine Translation Gender Bias Evaluation with Multilingual Holistic Bias | Marta R. Costa-jussà, Pierre Andrews, Christine Basta, Juan Ciro, Agnieszka Falenska, Seraphina Goldfarb-Tarrant, Rafael Mosquera, Debora Nozza, Eduardo Sánchez
Findings / Poster / In-person | Biasly: An Expert-Annotated Dataset for Subtle Misogyny Detection and Mitigation
Findings / Poster / In-person | Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models
Findings / Poster / In-person | More than Minorities and Majorities: Understanding Multilateral Bias in Language Generation
Findings / Poster / In-person | Ask LLMs Directly, “What shapes your bias?”: Measuring Social Bias in Large Language Models
Findings / Poster / In-person | Pro-Woman, Anti-Man? Identifying Gender Bias in Stance Detection

Submissions will be accepted as short papers (4-6 pages) and as long papers (8-10 pages), plus additional pages for references, following the ACL 2024 guidelines. Supplementary material can be uploaded separately. Blind submission is required.

All submissions are required to include a statement that explicitly defines

(a) what system behaviours are considered as bias in the work, and

(b) why those behaviours are harmful, in what ways, and to whom (cf. Blodgett et al. (2020)). In this statement and in their work in general, we encourage authors to engage with definitions of bias and related concepts such as prejudice, harm, and discrimination from outside NLP, especially from the social sciences and normative ethics.

Please find guidance on how to write a bias statement here.

Non-archival option

Authors have the option of submitting previously unpublished research as non-archival, meaning that only the abstract will be published in the conference proceedings. We expect these submissions to be of the same quality and follow the same format as archival submissions.

Paper submission link: here

Final submissions: Both long and short papers can be extended by 1 page in the camera-ready version.

Shared Task on Machine Translation Gender Bias Evaluation with Multilingual Holistic Bias

Motivation 

Demographic biases are relatively infrequent phenomena, but they present a very important problem. The development of datasets in this area has raised interest in evaluating Natural Language Processing (NLP) models beyond standard quality terms. In Machine Translation (MT), gender bias is observed when translations contain errors in linguistic gender determination even though the source content provides sufficient gender clues for a system to infer the correct gendered forms. To illustrate this phenomenon, sentence (1) below does not contain enough linguistic clues for a translation system to decide which gendered form should be used when translating into a language where the word for doctor is gendered. Sentence (2), however, includes a gendered pronoun which most likely has the word doctor as its antecedent. Sentence (3) shows two variants of the same sentence that differ only in gender inflection.

1. I didn’t feel well, so I made an appointment with my doctor. 

2. My doctor is very attentive to her patients’ needs. 

3. Mi amiga es una ama de casa / Mi amigo es un amo de casa. (in English, My (female/male) friend is a homemaker)

Gender bias is observed when the system produces the wrong gendered form when translating sentence (2) into a language that uses distinct gendered forms for the word doctor. A single error in the translation of an utterance like sentence (1) would not be sufficient to conclude that gender bias exists in the model; doing so would require consistently observing one linguistic gender chosen over the other. Finally, a lack of robustness is revealed if translation quality differs between the two variants in sentence (3). It has previously been hypothesized that one possible source of gender bias is gender representation imbalance in large training and evaluation data sets, e.g. [Costa-jussà et al., 2022; Qian et al., 2022].
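To make the distinction between a one-off error and a systematic preference concrete, here is a minimal sketch (our illustration, not part of the shared task materials): it counts how often a system chooses each gendered form when translating a set of gender-ambiguous sources like sentence (1). The classify_gender argument is a hypothetical stand-in; in practice the label would come from morphological analysis of the target language.

```python
from collections import Counter

def gender_choice_rate(translations, classify_gender):
    """Given translations of gender-ambiguous source sentences (like sentence (1))
    and a function that labels the gendered form chosen in each translation as
    'masculine', 'feminine', or 'neutral', return the proportion of each label.
    A single wrong form is not evidence of bias; a strong, consistent skew is."""
    counts = Counter(classify_gender(t) for t in translations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Toy usage with a stand-in classifier; real labels require proper morphological analysis.
toy_outputs = ["mi médico", "mi médico", "mi médica", "mi médico"]
toy_classifier = lambda t: "feminine" if t.endswith("a") else "masculine"
print(gender_choice_rate(toy_outputs, toy_classifier))
# {'masculine': 0.75, 'feminine': 0.25} -- a consistent skew rather than a one-off error
```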

Goals

The goals of the shared translation task are:

Shared Task Description

We propose to evaluate three cases of gender bias: gender-specific translation, gender robustness, and unambiguous gender.

Description Task 1: Gender-specific

In the English-to-X translation direction, we evaluate the capacity of machine translation systems to generate gender-specific translations from gender-neutral English inputs (e.g. I didn’t feel well, so I made an appointment with my doctor.). This case is motivated by the fact that machine translation (MT) models systematically translate neutral source sentences into masculine or feminine forms depending on the stereotypical usage of the word (e.g. “homemakers” into “amas de casa”, the feminine form in Spanish, and “doctors” into “médicos”, the masculine form).

Description Task 2: Gender Robustness

In the X-to-English translation direction, we compare the robustness of the model when the source inputs differ only in gender (masculine or feminine), e.g. in Spanish: Mi amiga es una ama de casa / Mi amigo es un amo de casa.

Description Task 3: Unambiguous Gender

In the X-to-X translation direction, we evaluate unambiguous gender translation across languages without being English-centric, e.g. Spanish-to-Catalan: Mi amiga es una ama de casa is translated into La meva amiga és una mestressa de casa.

Submission details

X Languages. In addition to English, our challenge covers 26 languages: Modern Standard Arabic, Belarusian, Bulgarian, Catalan, Czech, Danish, German, French, Italian, Lithuanian, Standard Latvian, Marathi, Dutch, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tamil, Thai, Ukrainian, and Urdu.

Evaluation. The challenge will be evaluated using automatic metrics. The evaluation criteria will be overall translation quality and the difference in performance between the male and female sets. More details will be provided.
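As a rough illustration of this kind of evaluation, here is a minimal sketch that assumes chrF from the sacrebleu library as the automatic metric; the exact metrics and protocol are the organizers’ to define. It scores the male and female sets separately and reports the gap between them.

```python
# Minimal sketch of a quality-gap evaluation; assumes chrF as the metric.
# Requires: pip install sacrebleu
from sacrebleu.metrics import CHRF

def quality_gap(hyps_male, refs_male, hyps_female, refs_female):
    """Score the male and female evaluation sets separately with chrF and
    report the difference in translation quality between them."""
    chrf = CHRF()
    male = chrf.corpus_score(hyps_male, [refs_male]).score
    female = chrf.corpus_score(hyps_female, [refs_female]).score
    return {"male": male, "female": female, "gap": male - female}

# Toy usage with one-sentence sets; real sets would come from the Multilingual HolisticBias data.
print(quality_gap(
    hyps_male=["Mi amigo es un amo de casa."],
    refs_male=["Mi amigo es un amo de casa."],
    hyps_female=["Mi amiga es un amo de casa."],
    refs_female=["Mi amiga es una ama de casa."],
))
```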

Submission platform. We will use the Dynabench platform for all tasks.

Important Dates.

From December: Fill in the interest form

Mar 20, 2024: Model Submission Opens

May 20, 2024: Model Submission Closes

May 24, 2024: System paper submission deadline

June 21, 2024: Notifications of the acceptance

July 5, 2024: Camera-Ready version

August 16, 2024: Workshop at ACL

Citation

Marta Costa-jussà, Pierre Andrews, Eric Smith, Prangthip Hansanti, Christophe Ropers, Elahe Kalbassi, Cynthia Gao, Daniel Licht, and Carleigh Wood. 2023. Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil Demographic Biases in Languages at Scale. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14141–14156, Singapore. Association for Computational Linguistics.

24 May, 2024: Workshop Paper Due Date (extended from 17 May)

21 June, 2024: Notification of Acceptance (extended from 17 June)

5 July, 2024: Camera-Ready Papers Due (extended from 1 July)

16 August, 2024: Workshop Date

Isabelle Augenstein, University of Copenhagen

Title: Quantifying societal biases towards entities

Abstract: Language is known to be influenced by the gender of the speaker and the referent, a phenomenon that has received much attention in sociolinguistics. This can lead to harmful societal biases, such as gender bias, the tendency to make assumptions based on gender rather than objective factors. Moreover, these biases are then picked up by language models and perpetuated in models for downstream NLP tasks. Most research on quantifying the biases emerging in text and in language models has used artificial probing templates imposing fixed sentence constructions, has been conducted for English, and has ignored biases beyond gender, including intersectional ones. In our work, by contrast, we focus on detecting biases towards specific entities and adopt a cross-lingual, intersectional approach. This allows for studying more complex interdependencies, such as the relationship between a politician’s origin and the language of the analysed text, or relationships between gender and racial bias.

Hal Daumé III, University of Maryland and Microsoft Research NYC

Title: Gender, Stereotypes, and Harms

Abstract: Gender is expressed and performed in a plethora of ways in the world, and reflected in complex, interconnected ways in language. I’ll discuss recent and ongoing work measuring how modern NLP models encode (some of) these expressions of gender, how those encodings reflect cultural stereotypes (and whose cultural stereotypes), and how that impacts people using these models. This will reflect joint work with a number of collaborators, including students Haozhe An, Connor Baumler, Yang Trista Cao, Eve Fleisig, Amanda Liu, and Anna Sotnikova.


Program Committee

Bashar Alhafni (NYU)
Jasmijn Bastings (Google)
Hannah Devinney (Umeå University, Sweden)
Marco Gaido (Trento, FBK)
Dorna Behdadi (University of Gothenburg, Sweden)
Matthias Gallé (Naver Labs Europe, France)
Mercedes García-Martínez (Pangeanic, Spain)
Nizar Habash (NYU Abu Dhabi, Abu Dhabi)
Ben Hachey (Harrison.AI, Australia)
Lucy Havens (University of Edinburgh)
Wael Khreich (American University of Beirut)
Svetlana Kiritchenko (National Research Council, Canada)
Gabriella Lapesa (GESIS, Germany)
Antonis Maronikolakis (LMU Munich, Germany)
Maite Melero (Barcelona Supercomputing Center)
Carla Perez Almendros (Cardiff University, UK)
Michael Roth (University of Stuttgart)
Rafal Rzepka (Hokkaido University, Japan)
Beatrice Savoldi (Trento, FBK)
Masashi Takeshita (Hokkaido University)
Soroush Vosoughi (Dartmouth College)

5th Workshop on Gender Bias in Natural Language Processing

At ACL in Bangkok, Thailand, 16th August, 2024

Gender bias, among other demographic biases (e.g. race, nationality, religion), in machine-learned models is of increasing interest to the scientific community and industry. Models of natural language are highly affected by such biases, which are present in widely used products and can lead to poor user experiences. There is a growing body of research into improved representations of gender in NLP models. Key example approaches are to build and use balanced training and evaluation datasets (e.g. Webster et al., 2018; Bentivogli et al., 2020; Renduchintala et al., 2021) and to change the learning algorithms themselves (e.g. Bolukbasi et al., 2016). While these approaches show promising results, there is more to do to solve identified and future bias issues. In order to make progress as a field, we need to create widespread awareness of bias and a consensus on how to work against it, for instance by developing standard tasks and metrics. Our workshop provides a forum to achieve this goal.

Organizers

Christine Basta, Alexandria University
Marta R. Costa-jussà, FAIR, Meta
Agnieszka Faleńska, University of Stuttgart
Seraphina Goldfarb-Tarrant, Cohere
Debora Nozza, Bocconi University