An analysis of unconscious gender bias in academic texts by means of a decision algorithm.

Inclusive language focuses on using vocabulary that avoids exclusion or discrimination, especially with regard to gender. Finding gender bias in written documents is normally a manual, time-consuming task. Consequently, studying the use of non-inclusive language in a document, and the impact of document properties (such as author gender, date of presentation, etc.) on how many non-inclusive instances are found, is quite difficult or even impossible for large datasets. This research analyzes gender bias in academic texts using a study corpus of more than 12 billion words obtained from more than one hundred thousand doctoral theses from Spanish universities. For this purpose, an automated algorithm was developed to evaluate the characteristics of each document and look for interactions among the author's age, year of publication, gender, and the field of knowledge in which the doctoral thesis is framed. The algorithm identifies information patterns using a convolutional neural network (CNN) applied to vector representations of the sentences. The results showed that bias grew with the age of the authors, with men more likely to use non-inclusive terms (an index of up to 23.12); women showed greater awareness of inclusiveness than men across all age ranges (an average index of 14.99), and this awareness increases the younger the candidate is (falling to 13.07). By field of knowledge, the humanities are the most biased (20.97), setting aside the subgroup of Linguistics, which shows the least bias at all levels (9.90); science and engineering also show comparatively little bias (13.46). These results support the assumption that the bias in academic texts (doctoral theses) is unconscious: otherwise, it would not depend on field, age, or gender, and it would occur in every field in the same proportion. The innovation of this research lies mainly in the ability to detect, within a Spanish-language document, whether the use of language can be considered non-inclusive, based on a CNN trained in the context of doctoral theses. A large number of documents were used: all accessible doctoral theses from Spanish universities from the last 40 years. A dataset of this size is only manageable with data-mining systems; the training makes it possible to identify non-inclusive terms in context effectively and to compile them into a novel dictionary of non-inclusive terms.
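
The abstract describes the classifier only at a high level: a CNN applied to vector representations of sentences, trained to flag non-inclusive language. The following is a minimal sketch of that kind of architecture in Python with Keras; the vocabulary size, sentence length, layer sizes, and every other detail here are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch of a CNN sentence classifier of the kind the abstract
# describes. All names, dimensions, and hyperparameters are assumptions
# for illustration; the paper's actual model is not specified in this record.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000  # assumed vocabulary size
MAX_LEN = 60         # assumed maximum sentence length in tokens
EMBED_DIM = 128      # assumed embedding dimensionality

def build_classifier() -> models.Model:
    """Binary classifier: output near 1 = sentence flagged as non-inclusive."""
    inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
    # Map token ids to dense vectors (the "vector representation of the sentences").
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
    # 1-D convolutions pick up local n-gram patterns such as gendered phrasings.
    x = layers.Conv1D(64, kernel_size=3, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier()
model.summary()
```

Such a model would be trained with `model.fit(...)` on sentences labeled as inclusive or non-inclusive; the record does not specify the authors' preprocessing, labeling, or training setup.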

Bibliographic Details
Main Authors: Pedro Orgeira-Crespo, Carla Míguez-Álvarez, Miguel Cuevas-Alonso, Elena Rivo-López
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2021
Published in: PLoS ONE, Vol 16, Iss 9, p e0257903 (2021)
ISSN: 1932-6203
DOI: https://doi.org/10.1371/journal.pone.0257903
Subjects: Medicine (R), Science (Q)
Online Access: https://doaj.org/article/ac933f33d8a74ac2bf1204b25abfb9ca