Wikipedia:Wikipedia Signpost/2022-05-29/In focus
Measuring gender diversity in Wikipedia articles
When thinking about gender diversity in Wikipedia, we often think of the number of biographical articles about men and women. The Humaniki project shows that about 19% of biographical articles on the English Wikipedia are about women. However, this is only one aspect of gender diversity. In this article, I develop a method which measures gender diversity at the article level and I show why it's useful.
Motivation
While working on the article about economics on the French Wikipedia, I was surprised by the low number of women among the people cited in the article, so I started exploring methods to measure gender diversity. I draw a distinction between gender diversity and gender parity[1]. First, gender parity presupposes a binary view of gender, whereas we now recognize that there are non-binary people. Second, gender parity implies that the ideal would be a fifty-fifty split between men and women. After some iterations, I found a way to measure gender diversity at the article level. This tool can be used to explore gender diversity in articles about academic fields, activities, or occupations. My approach is very basic: it simply computes the share of people cited in an article by gender.
This simple quantitative approach to measuring gender diversity is similar to many research projects on this theme in computational social science. David Doukhan counts women's speaking time on the radio[2]. Antoine Mazières and his co-authors compute the share of screen time given to women in popular movies[3], and Gilles Bastin and his co-authors compute the gender frequency of people cited in French newspapers[4].
Methodology
For each article, I get the list of internal links (also known as blue links). I retrieve them using the Wikipedia links API. Then I combine this query with a Wikidata SPARQL query[5]. I select all links corresponding to human beings in Wikidata (property P31 is Q5) and I retrieve their gender (property P21 in Wikidata). Note that gender in Wikidata can be male, female, non-binary, intersex, transgender female, transgender male or agender. I would find it more intuitive to group transgender males with males and transgender females with females, but I prefer to keep Wikidata's classification.
Last, I count the number of entities by gender and compute the share.
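The three steps above (retrieve links, keep the humans and their gender, compute shares) can be sketched in Python. This is only an illustration under stated assumptions, not the queries used in the project (those are in the methodology referenced in [5]): the function names, the batch size and the choice to match link titles against English Wikipedia sitelinks are all hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

WIKI_API = "https://en.wikipedia.org/w/api.php"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
USER_AGENT = "gender-diversity-sketch/0.1 (illustrative example)"


def _get_json(url, params):
    """GET a URL with query parameters and decode the JSON response."""
    request = urllib.request.Request(
        url + "?" + urllib.parse.urlencode(params),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_links(title):
    """Return the titles of main-namespace links found in one article."""
    params = {
        "action": "query", "prop": "links", "titles": title,
        "plnamespace": 0, "pllimit": "max",
        "format": "json", "formatversion": 2,
    }
    links = []
    while True:
        data = _get_json(WIKI_API, params)
        for page in data["query"]["pages"]:
            links += [link["title"] for link in page.get("links", [])]
        if "continue" not in data:
            return links
        params.update(data["continue"])  # follow the API continuation


def build_gender_query(titles):
    """Build a SPARQL query: humans (P31 = Q5) among titles, with gender (P21)."""
    values = " ".join(json.dumps(t, ensure_ascii=False) + "@en" for t in titles)
    return f"""
SELECT ?item ?genderLabel WHERE {{
  VALUES ?title {{ {values} }}
  ?sitelink schema:about ?item ;
            schema:isPartOf <https://en.wikipedia.org/> ;
            schema:name ?title .
  ?item wdt:P31 wd:Q5 ;
        wdt:P21 ?gender .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def fetch_genders(titles, batch_size=50):
    """Return one gender label per matched human, querying in batches."""
    labels = []
    for start in range(0, len(titles), batch_size):
        query = build_gender_query(titles[start:start + batch_size])
        data = _get_json(SPARQL_ENDPOINT, {"query": query, "format": "json"})
        labels += [row["genderLabel"]["value"]
                   for row in data["results"]["bindings"]]
    return labels


def gender_shares(labels):
    """Count entities by gender and return each gender's share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


# Example usage (needs network access):
# shares = gender_shares(fetch_genders(fetch_links("Economics")))
```

Two caveats about this sketch: links that point to redirects will not match a sitelink name and are silently dropped, and a person with several P21 values in Wikidata is counted once per value. The explorer tool may handle both cases differently.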
Anyone can compute gender diversity for a single Wikipedia article using the gender diversity explorer tool.
This is a very basic approach. It doesn't distinguish between entities cited in the references and entities cited in the body of the article. It doesn't take into account people who are cited in the article without a link to a Wikipedia article. But even if it's imperfect, I believe this is a useful approach.
Numbers should be interpreted with caution. The number of gendered entities cited in a single article is often very low. I personally don't interpret proportions if the total number of gendered entities is lower than 50.
Insights
Focus on economics
Let's have a look at the article about economics. In May 2022, we find 137 males, 6 females and 1 transgender female[6]. So less than 5% of the people cited in the article are women. Of course, everyone knows that many prominent economists, from Adam Smith to Jean Tirole, are men, so no one is really surprised to find a vast majority of men in the results. Nobody would be able to say what the fair share of women in the article would be. However, I personally think that 5% is not much and that the contribution of women to economics is greater than this share suggests. Harriet Martineau, Mary Paley Marshall, Joan Robinson, Elinor Ostrom, Anna Schwartz, Janet Yellen, Esther Duflo and Susan Athey have all made major contributions to economics.
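The arithmetic behind these figures can be checked with a few lines of Python. The counts are the ones reported above; the `summarize` helper and the `MIN_TOTAL` name are hypothetical, and the threshold of 50 is the author's personal rule of thumb from the methodology section.

```python
# Counts reported in this article for "Economics" (May 2022).
counts = {"male": 137, "female": 6, "transgender female": 1}
MIN_TOTAL = 50  # below this total, the author does not interpret shares


def summarize(counts, min_total=MIN_TOTAL):
    """Return the total, the percentage share per gender, and whether
    the total is large enough for the shares to be interpreted."""
    total = sum(counts.values())
    shares = {g: round(100 * n / total, 1) for g, n in counts.items()}
    return total, shares, total >= min_total


total, shares, interpretable = summarize(counts)
print(total, shares, interpretable)
# 144 {'male': 95.1, 'female': 4.2, 'transgender female': 0.7} True
```

With 144 gendered entities, the total is above the threshold. The share stays under 5% whether or not the transgender female entry is grouped with females (6 of 144 is about 4.2%, 7 of 144 is about 4.9%).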
Focus on academic fields
In this section, I compare gender diversity in Wikipedia articles about some important academic fields. As with economics, we know that most academic fields have long been dominated by male figures, so we're not surprised to find a relatively low share of women in Wikipedia articles. By comparing Physics, Architecture, Economics, Social science, Computer science, Philosophy, Mathematics, Psychology, Medicine, Music, Political science, Sociology, Biology, Science, Art, History and Literature, I find that all of them have a proportion of men higher than 80%[7]. Values for computer science and political science should be taken with caution, since the number of people cited in those articles is lower than 50. If we exclude computer science and political science, we find that 10 out of 15 articles have less than 10% of women among all gendered entities! If we look at raw numbers, the count of women in each article is really low: 4 women in mathematics, 4 women in medicine, 1 woman in physics.
Conclusion and discussion
I believe that measuring helps to raise awareness of the problem of gender diversity in Wikipedia articles. Anyone can play with the gender diversity explorer tool and discover some insights.
In the coming months, I would like to explore gender diversity in articles about occupations (journalist, politician, etc.) and activities (journalism, politics, sport, etc.). I would also like to see large-scale studies looking at all articles about academic fields or all articles about an occupation.
My experiments with measuring gender diversity in Wikipedia articles lead me to believe that women are often forgotten or downplayed in Wikipedia articles about general topics. It would be worth giving specific attention to this topic. WikiProjects such as Women in Red could focus on this issue and make sure that the role of women hasn't been downplayed in articles.
References
- ^ "The idea of closing the “gender gap” itself has always struck me as somewhat problematic as it implies a gulf between two equivalent sides and reinforces the idea of binary gender. An aspiration to equitable “gender diversity” might be more fitting" writes Katherine Maher in "Capstone: Making History, Building the Future Together", in Wikipedia @ 20, MIT Press, 2020, https://wikipedia20.pubpub.org/pub/4d61w771/release/2?readingCollection=08ec69da
- ^ https://larevuedesmedias.ina.fr/la-radio-et-la-tele-les-femmes-parlent-deux-fois-moins-que-les-hommes
- ^ "Computational appraisal of gender representativeness in popular movies", https://www.nature.com/articles/s41599-021-00815-9
- ^ Gendered News project, https://gendered-news.imag.fr/genderednews/
- ^ See the SPARQL queries in the project methodology
- ^ https://observablehq.com/@pac02/explore-gender-diversity-in-a-single-wikipedia-article?wikipedia=en.wikipedia.org&article=Economics
- ^ https://observablehq.com/@pac02/gender-diversity-in-wikipedia-articles-evidence-from-some?collection=@pac02/gender-diversity-in-wikipedia-articles