Evaluating gender bias in NLP
What is your take on intrinsic vs. extrinsic evaluation?
Should gender bias evaluation be task-specific?
What is currently missing the most in this space?
How does evaluating gender bias relate to evaluating other harmful biases and phenomena (e.g., hate speech)?
(+ questions from Twitter or panelists)
Panelists:
Kellie Webster, Google Research
Kai-Wei Chang, University of California, Los Angeles
Seraphina Goldfarb-Tarrant, University of Edinburgh
Mark Yatskar, University of Pennsylvania