SUBTLEX-CH: Chinese word and character frequencies based on film subtitles.

Background: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other l...

Bibliographic Details
Main authors: Qing Cai, Marc Brysbaert
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2010
Subjects:
R
Q
Online access: https://doaj.org/article/5a1c31a8e36b4ac1b6eacd342a631be4
id oai:doaj.org-article:5a1c31a8e36b4ac1b6eacd342a631be4
record_format dspace
spelling oai:doaj.org-article:5a1c31a8e36b4ac1b6eacd342a631be4
2021-12-02T20:21:14Z
1932-6203
10.1371/journal.pone.0010729
2010-06-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/20532192/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 5, Iss 6, p e10729 (2010)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.
Methodology: Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.
Conclusions: Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
format article
author Qing Cai
Marc Brysbaert
title SUBTLEX-CH: Chinese word and character frequencies based on film subtitles.
publisher Public Library of Science (PLoS)
publishDate 2010
url https://doaj.org/article/5a1c31a8e36b4ac1b6eacd342a631be4