Talking about Sampling, Syntax, and Sentence Completions – The (Overlooked?) Impact of Gender on NLP Tools
Language reflects demographics: people of different demographic groups systematically use language differently, and this holds for gender as well. Gender-specific language affects NLP tools, usually because it is not sufficiently modeled; instead, models assume all language is the same. If these linguistic differences are ignored, they lead to uneven performance and even discrimination. If we instead start paying attention to WHO is talking, and not just WHAT is said, these differences give us the opportunity to model language in all its diversity, open up new applications, and improve both performance and fairness.
In this talk, I will first show how gender representation is reflected in syntax, and then how this affects various NLP tasks. Lack of awareness of gendered differences means that NLP models perform differently for different genders in part-of-speech tagging and parsing, and it also affects the style of machine translations and of sentence completions in BERT and GPT-2.