TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI


Bibliographic Details
Main authors: Christophe Parisse, Carole Etienne, Loïc Liégeois
Format: Article
Languages: DE, EN, ES, FR, IT
Published: OpenEdition, 2021
Subjects: TEI
Online access: https://doaj.org/article/f1216e400ffe429d9d5bd4232acfc402
Record ID: oai:doaj.org-article:f1216e400ffe429d9d5bd4232acfc402
Journal: Journal of the Text Encoding Initiative, Vol 13 (2021)
ISSN: 2162-5603
DOI: 10.4000/jtei.3464
Full text: http://journals.openedition.org/jtei/3464
Journal table of contents: https://doaj.org/toc/2162-5603
Keywords: TEI, transcription, oral corpora, conversion, annotationBlock
Classification: Computer engineering. Computer hardware (TK7885-7895)
Collection: DOAJ
Description: CORLI is a consortium of Huma-Num, the French national infrastructure dedicated to the technical support and promotion of digital humanities. The goal of CORLI is to promote good and efficient research practices in corpus linguistics, especially for spoken language corpora, and to provide tools and information that support them. Because collecting and transcribing spoken language resources is time-consuming, such resources are scarce, so corpora must be interoperable and reusable to support research on topics such as phonology, prosody, interaction, syntax, and textometry. To help researchers reach this goal, CORLI has designed two tools: TEICORPO, which assists in the conversion and use of spoken language corpora, and TEIMETA, which handles metadata. TEICORPO is built on an underlying common format, namely TEI XML as described in its specification for spoken language (ISO 2016). It converts between the TEI format, which serves as a lossless pivot, and transcriptions created with alignment software such as CLAN, Transcriber, Praat, or ELAN, as well as common file formats (CSV, XLSX, TXT, or DOCX). Backward conversion, from TEI to these formats, is possible in many cases, within the limitations inherent in each target format. TEICORPO can run the TreeTagger part-of-speech tagger and the Stanford CoreNLP tools on TEI files and can export the resulting files to textometric tools such as TXM, Le Trameur, or Iramuteq, making it suitable both for editing spoken language corpora and for a range of research purposes.
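The pivot principle can be illustrated with a minimal Python sketch that uses only the standard library. It is not TEICORPO's implementation: the four-column CSV layout (speaker, start, end, text), the function names `rows_to_tei` and `tei_to_rows`, and the `T0`-relative timeline are hypothetical choices made for illustration. The TEI it produces is a heavily simplified subset of the spoken-language encoding cited above, keeping only a `<timeline>` of `<when>` anchors and one `<annotationBlock>` with a single `<u>` per row.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return a TEI-namespaced tag name in ElementTree's {uri}local form."""
    return f"{{{TEI_NS}}}{tag}"


def rows_to_tei(rows):
    """Convert (speaker, start, end, text) rows into a simplified TEI tree.

    Each distinct timestamp becomes one <when> in a shared <timeline>;
    each row becomes an <annotationBlock> that points to its start and end
    anchors and wraps a single <u> (utterance). Hypothetical, simplified layout.
    """
    root = ET.Element(tei("TEI"))
    text = ET.SubElement(root, tei("text"))
    timeline = ET.SubElement(text, tei("timeline"), unit="s", origin="#T0")
    body = ET.SubElement(text, tei("body"))
    ET.SubElement(timeline, tei("when"), {XML_ID: "T0"})  # origin, 0 s
    anchors = {0.0: "T0"}

    def anchor(seconds):
        # Reuse the existing <when> when two rows share a timestamp.
        if seconds not in anchors:
            ident = f"T{len(anchors)}"
            ET.SubElement(timeline, tei("when"),
                          {XML_ID: ident, "interval": str(seconds), "since": "#T0"})
            anchors[seconds] = ident
        return "#" + anchors[seconds]

    for speaker, start, end, utterance in rows:
        block = ET.SubElement(body, tei("annotationBlock"), who="#" + speaker,
                              start=anchor(float(start)), end=anchor(float(end)))
        ET.SubElement(block, tei("u")).text = utterance
    return root


def tei_to_rows(root):
    """Convert the simplified TEI tree back into (speaker, start, end, text) rows."""
    # Every <when> is measured from T0, so its interval is its absolute time.
    times = {w.get(XML_ID): float(w.get("interval", "0"))
             for w in root.iter(tei("when"))}
    rows = []
    for block in root.iter(tei("annotationBlock")):
        u = block.find(tei("u"))
        rows.append((block.get("who").lstrip("#"),
                     times[block.get("start").lstrip("#")],
                     times[block.get("end").lstrip("#")],
                     u.text or ""))
    return rows


# Round trip: CSV -> TEI pivot -> rows.
source = "SPK1,0.0,1.5,bonjour\nSPK2,1.5,3.2,salut ça va\n"
rows = list(csv.reader(io.StringIO(source)))
xml_text = ET.tostring(rows_to_tei(rows), encoding="unicode")
print(tei_to_rows(ET.fromstring(xml_text)))
# [('SPK1', 0.0, 1.5, 'bonjour'), ('SPK2', 1.5, 3.2, 'salut ça va')]
```

In this toy round trip TEI behaves as a lossless pivot because every CSV column maps one-to-one onto a TEI element or attribute. A fuller TEI file can also hold extra tiers, such as part-of-speech tags, that a four-column CSV has no place for; that mismatch is the kind of limitation the description attributes to the target format in backward conversion.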