A TEI-based Approach to Standardising Spoken Language Transcription

This paper formulates a proposal for standardising spoken language transcription, as practised in conversation analysis, sociolinguistics, dialectology and related fields, with the help of the TEI guidelines. Two areas relevant to standardisation are identified and discussed: first, the macro struct...

Bibliographic Details
Main Author:	Thomas Schmidt
Format:	article
Language:	DE, EN, ES, FR, IT
Published:	OpenEdition 2011
Subjects:	digital infrastructures; spoken language; standardization; transcription; Computer engineering. Computer hardware; TK7885-7895
Online Access:	https://doaj.org/article/8f064ee6c7324930b5fa5255bfd4b387

id	oai:doaj.org-article:8f064ee6c7324930b5fa5255bfd4b387
record_format	dspace
spelling	oai:doaj.org-article:8f064ee6c7324930b5fa5255bfd4b387 | 2021-12-02T11:30:34Z | A TEI-based Approach to Standardising Spoken Language Transcription | 2162-5603 | 10.4000/jtei.142 | https://doaj.org/article/8f064ee6c7324930b5fa5255bfd4b387 | 2011-06-01T00:00:00Z | http://journals.openedition.org/jtei/142 | https://doaj.org/toc/2162-5603 | This paper formulates a proposal for standardising spoken language transcription, as practised in conversation analysis, sociolinguistics, dialectology and related fields, with the help of the TEI guidelines. Two areas relevant to standardisation are identified and discussed: first, the macro structure of transcriptions, as embodied in the data models and file formats of transcription tools such as ELAN, Praat or EXMARaLDA; second, the micro structure of transcriptions as embodied in transcription conventions such as CA, HIAT or GAT. A two-step process is described in which first the macro structure is represented in a generic TEI format based on elements defined in the P5 version of the Guidelines. In the second step, character data in this representation is parsed according to the regularities of a transcription convention resulting in a more fine-grained TEI markup which is also based on P5. It is argued that this two-step process can, on the one hand, map idiosyncratic differences in tool formats and transcription conventions onto a unified representation. On the other hand, differences motivated by different theoretical decisions can be retained in a manner which still allows a common processing of data from different sources. In order to make the standard usable in practice, a conversion tool, TEI Drop, is presented which uses XSL transformations to carry out the conversion between different tool formats (CHAT, ELAN, EXMARaLDA, FOLKER and Transcriber) and the TEI representation of transcription macro structure (and vice versa) and which also provides methods for parsing the micro structure of transcriptions according to two different transcription conventions (HIAT and cGAT). Using this tool, transcribers can continue to work with software they are familiar with while still producing TEI-conformant transcription files. The paper concludes with a discussion of the work needed in order to establish the proposed standard. It is argued that both tool formats and the TEI guidelines are in a sufficiently mature state to serve as a basis for standardisation. Most work consequently remains in analysing and standardising differences between different transcription conventions. | Thomas Schmidt | OpenEdition | article | digital infrastructures | spoken language | standardization | transcription | Computer engineering. Computer hardware | TK7885-7895 | DE | EN | ES | FR | IT | Journal of the Text Encoding Initiative, Vol 1 (2011)
institution	DOAJ
collection	DOAJ
language	DE EN ES FR IT
topic	digital infrastructures spoken language standardization transcription Computer engineering. Computer hardware TK7885-7895
spellingShingle	digital infrastructures spoken language standardization transcription Computer engineering. Computer hardware TK7885-7895 Thomas Schmidt A TEI-based Approach to Standardising Spoken Language Transcription
description	This paper formulates a proposal for standardising spoken language transcription, as practised in conversation analysis, sociolinguistics, dialectology and related fields, with the help of the TEI guidelines. Two areas relevant to standardisation are identified and discussed: first, the macro structure of transcriptions, as embodied in the data models and file formats of transcription tools such as ELAN, Praat or EXMARaLDA; second, the micro structure of transcriptions as embodied in transcription conventions such as CA, HIAT or GAT. A two-step process is described in which first the macro structure is represented in a generic TEI format based on elements defined in the P5 version of the Guidelines. In the second step, character data in this representation is parsed according to the regularities of a transcription convention resulting in a more fine-grained TEI markup which is also based on P5. It is argued that this two-step process can, on the one hand, map idiosyncratic differences in tool formats and transcription conventions onto a unified representation. On the other hand, differences motivated by different theoretical decisions can be retained in a manner which still allows a common processing of data from different sources. In order to make the standard usable in practice, a conversion tool, TEI Drop, is presented which uses XSL transformations to carry out the conversion between different tool formats (CHAT, ELAN, EXMARaLDA, FOLKER and Transcriber) and the TEI representation of transcription macro structure (and vice versa) and which also provides methods for parsing the micro structure of transcriptions according to two different transcription conventions (HIAT and cGAT). Using this tool, transcribers can continue to work with software they are familiar with while still producing TEI-conformant transcription files. The paper concludes with a discussion of the work needed in order to establish the proposed standard. It is argued that both tool formats and the TEI guidelines are in a sufficiently mature state to serve as a basis for standardisation. Most work consequently remains in analysing and standardising differences between different transcription conventions.
format	article
author	Thomas Schmidt
author_facet	Thomas Schmidt
author_sort	Thomas Schmidt
title	A TEI-based Approach to Standardising Spoken Language Transcription
title_short	A TEI-based Approach to Standardising Spoken Language Transcription
title_full	A TEI-based Approach to Standardising Spoken Language Transcription
title_fullStr	A TEI-based Approach to Standardising Spoken Language Transcription
title_full_unstemmed	A TEI-based Approach to Standardising Spoken Language Transcription
title_sort	tei-based approach to standardising spoken language transcription
publisher	OpenEdition
publishDate	2011
url	https://doaj.org/article/8f064ee6c7324930b5fa5255bfd4b387
work_keys_str_mv	AT thomasschmidt ateibasedapproachtostandardisingspokenlanguagetranscription AT thomasschmidt teibasedapproachtostandardisingspokenlanguagetranscription
_version_	1718395890853478400
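
The second step of the two-step process described in the abstract (parsing the character data of a generic TEI representation according to a transcription convention) can be illustrated with a minimal sketch. It is not TEI Drop, which according to the abstract relies on XSL transformations, and it implements neither HIAT nor cGAT. The toy convention below is an assumption made purely for illustration: "(0.5)" marks a timed pause, "(.)" a micro pause, and any other whitespace-separated token is a word. The function name parse_utterance and the attribute choices on the pause element are likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Toy convention (an assumption, not HIAT or cGAT):
#   "(0.5)" -> timed pause, "(.)" -> micro pause, anything else -> word.
TOKEN = re.compile(r"\((\d+(?:\.\d+)?)\)|\(\.\)|\S+")


def parse_utterance(u: ET.Element) -> ET.Element:
    """Return a new <u> whose character data is split into <w>/<pause> children."""
    parsed = ET.Element("u", dict(u.attrib))
    for match in TOKEN.finditer(u.text or ""):
        token = match.group(0)
        if match.group(1) is not None:
            # Timed pause: keep the duration in ISO 8601 form (illustrative choice).
            ET.SubElement(parsed, "pause", {"dur": f"PT{match.group(1)}S"})
        elif token == "(.)":
            # Micro pause: no measured duration in the toy convention.
            ET.SubElement(parsed, "pause", {"type": "micro"})
        else:
            ET.SubElement(parsed, "w").text = token
    return parsed


# Step 1 result: macro structure only, transcription text still unparsed.
macro = ET.fromstring('<u who="#SPK0">okay (0.5) so (.) yes</u>')

# Step 2: derive the finer-grained markup from the character data.
micro = parse_utterance(macro)
print(ET.tostring(micro, encoding="unicode"))
```

Because the speaker attribution and the utterance element stay the same in both steps, data parsed under different conventions would still share the same macro structure and differ only in the child elements produced here.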
