How to write a bias statement
At this year’s Workshop on Gender Bias in NLP (GeBNLP 2020), we’d like to encourage authors to give more explicit consideration to the wider aspects of bias, building on the great work we saw at last year’s workshop. One of the things we’re doing to achieve this is asking all authors to include an explicit bias statement in their work. The idea behind this requirement is to encourage a common format for discussing the assumptions and normative stances inherent in any research on bias, and to make them explicit so they can be discussed. The requirement is inspired by the recommendations of Blodgett et al. (2020), and we borrow from them in our definition of the bias statement. In this blog post, we’d like to provide some guidance to help you write a bias statement for your research.
Two things are worth highlighting. Firstly, this blog post is intended to help you write a bias statement, but perhaps your work is a bit different from what we had in mind when we wrote it. That’s fine: we’re really keen on promoting this discussion. We don’t require a specific form for the statement, but we suggest you give it a dedicated section. If your case is different, do whatever makes sense. The reviewers will be asked to comment on your bias statement in the specific context of your work, and we’ve recruited some reviewers from the social sciences and humanities to help us with that. Secondly, we’d like to encourage you to think about your concepts of bias and how they relate to the lived experience of humans throughout your work, from beginning to end. That’s the really important thing. The bias statement is just a way to condense the discussion in one place.
Types of Harm
One part of a successful bias statement is to clarify what type of harm we are worried about, and who suffers because of it. Doing so explicitly serves two purposes. First, it acknowledges that by describing certain behaviours as harmful, we make a judgement based on the values we hold. It’s a normative judgement, because we declare that one thing is right (for instance, treating all humans equally), and another thing wrong (for instance, exploiting humans for profit). Second, being explicit about our normative assumptions makes it easier for ourselves, our readers and our reviewers to evaluate whether the methods we propose are in fact effective at reducing the harmful effects we fear, and that will help us make progress more quickly.
The schema of harms described in this section is not final. We are not suggesting that authors adhere to it strictly; rather, we point to these categories broadly to get your imagination going about what kinds of harm might arise.
Following the categories of Blodgett et al. (2020):
- Allocational harms: arise when an automated system allocates resources or opportunities unfairly to different social groups. For example:
  - A personnel selection system trained on a database containing only men would discard women as candidates.
- Representational harms: arise when a system represents some social groups in a less favorable light than others, demeans them, or fails to recognize their existence altogether. Some examples are as follows:
  - Stereotyping: propagating negative generalisations about particular social groups.
  - Differences in system performance that affect users from different social groups unequally.
  - Language that misrepresents the distribution of social groups.
  - Language that denigrates certain social groups.
Recommendations for authors
- Provide explicit statements of why the system behaviours described as “bias” are harmful, in what ways, and to whom. Authors should be thoughtful about their own definitions of “bias”: are there any limitations to choosing this definition, any injustices or undesirable outcomes it might overlook that would call for a different kind of definition, or any implicit assumptions it makes or requires that might not always hold true? This is essentially the same kind of discussion that any analysis of modeling decisions generally requires.
- Be forthright about the normative reasoning underlying these statements.
  - Negative example: “Biased word embeddings can lead to biased downstream systems and contribute to social injustice.”
  - Positive examples: “Coreference systems with gender labels that treat gender as fixed, immutable, and binary are harmful because they erase or exclude non-binary or transgender people”; or “Toxicity systems that treat Mainstream U.S. English as more toxic than African-American English are harmful because they contribute to the stigmatization of African-American English, may disenfranchise AAE speakers online, and may result in burdens of dealing with toxicity systems that are differentially distributed across speaker groups.”
Concrete Examples
PAPER: Basta, C., Costa-jussà, M.R. and Casas, N. Evaluating the Underlying Gender Bias in Contextualized Word Embeddings, CORR, arXiv:2019, Proceedings of the 1st ACL Workshop on Gender Bias for Natural Language Processing, 2nd August, Florence
BIAS STATEMENT: In this paper, we study stereotypical associations between male and female gender and professional occupations in contextual word embeddings. If a system systematically and by default associates certain professions with a specific gender, this creates a representational harm by perpetuating inappropriate stereotypes about what activities men and women are able, allowed or expected to perform, e.g. contributing to there being fewer women professionals in STEM (McGuire et al., 2020). When such representations are used in downstream NLP applications, there is an additional risk of unequal performance across genders (Gonen & Webster, 2020). Our work is based on the belief that the observed correlations between genders and occupations in word embeddings are a symptom of an inadequate training process, and that decorrelating genders and occupations would enable systems to counteract rather than reinforce existing gender imbalances.
Acknowledgements
We would like to thank Su Lin Blodgett for her contributions to this blog post.