Towards Finer Granularity in Metadata

In early 2010, the Austrian Academy of Sciences’ ICLTT instituted an experiment in selective metadata creation for a medium-sized collection (<100 million tokens) of digitised periodicals. The project has two main objectives: (a) assigning basic structures to previously digitised texts, so-c...

Bibliographic Details
Main authors:	Gerhard Budin, Heinrich Kabas, Karlheinz Mörth
Format:	article
Language:	DE EN ES FR IT
Published:	OpenEdition 2012
Subjects:	corpus annotation; metadata; metadata creation; TEI headers; tools; Computer engineering. Computer hardware; TK7885-7895
Online access:	https://doaj.org/article/9c937d6b011c463c91f64314cd0fbb44

id	oai:doaj.org-article:9c937d6b011c463c91f64314cd0fbb44
record_format	dspace
spelling	oai:doaj.org-article:9c937d6b011c463c91f64314cd0fbb44 2021-12-02T11:29:53Z Towards Finer Granularity in Metadata 2162-5603 10.4000/jtei.416 https://doaj.org/article/9c937d6b011c463c91f64314cd0fbb44 2012-02-01T00:00:00Z http://journals.openedition.org/jtei/416 https://doaj.org/toc/2162-5603 Gerhard Budin Heinrich Kabas Karlheinz Mörth OpenEdition article corpus annotation metadata metadata creation TEI headers tools Computer engineering. Computer hardware TK7885-7895 DE EN ES FR IT Journal of the Text Encoding Initiative, Vol 2 (2012)
institution	DOAJ
collection	DOAJ
language	DE EN ES FR IT
topic	corpus annotation metadata metadata creation TEI headers tools Computer engineering. Computer hardware TK7885-7895
spellingShingle	corpus annotation metadata metadata creation TEI headers tools Computer engineering. Computer hardware TK7885-7895 Gerhard Budin Heinrich Kabas Karlheinz Mörth Towards Finer Granularity in Metadata
description	In early 2010, the Austrian Academy of Sciences’ ICLTT instituted an experiment in selective metadata creation for a medium-sized collection (<100 million tokens) of digitised periodicals. The project has two main objectives: (a) assigning basic structures, so-called divisions in TEI nomenclature, to previously digitised texts, thus creating a set of new digital objects, and (b) subsequently categorising these texts so that thematically organised sub-corpora can be created. An additional objective is to have the metadata stored as TEI headers. Attempts at streamlining metadata creation are legion, in particular in the library community. Tools to do the job are often incorporated into workflow engines, which include commercial products (such as docWORKS[e] and C-3) as well as free products such as Goobi, which incorporates the metadata creation tool RusDML, and the Archivists’ Toolkit™. The experimental workflow being tested at the ICLTT is an attempt to capture detailed metadata for a comparatively large collection of digitised periodicals and other collective publications such as yearbooks, readers, commemorative publications, almanacs, and anthologies. While all higher-level digital objects in the corpus were furnished with metadata from the beginning of the digitisation process, the current experiment is designed to enrich this data so that it describes the contents of the material more fully. To achieve this end, the department’s standard tools were adapted, which had the added benefit of keeping software production costs to a minimum. While in earlier experiments our group of researchers (metadata creators) created the TEI header for each text division manually, we have since been approaching the problem by exploiting the contents section of the digitised issues and/or other secondary sources, which has resulted in a tangible acceleration of the process.
Together with collecting basic data such as author, title, publication date, and creation date, the project classifies each division by text type and topic, the latter using the standard Dewey Decimal Classification (version 22, German) with supplementary keywords. This paper discusses a number of issues concerning the quality and type of the resulting data. It also touches upon the issue of automation and the points in the process at which human intervention is indispensable. Particular attention is directed at the software module for creating TEI headers.
format	article
author	Gerhard Budin Heinrich Kabas Karlheinz Mörth
author_facet	Gerhard Budin Heinrich Kabas Karlheinz Mörth
author_sort	Gerhard Budin
title	Towards Finer Granularity in Metadata
title_short	Towards Finer Granularity in Metadata
title_full	Towards Finer Granularity in Metadata
title_fullStr	Towards Finer Granularity in Metadata
title_full_unstemmed	Towards Finer Granularity in Metadata
title_sort	towards finer granularity in metadata
publisher	OpenEdition
publishDate	2012
url	https://doaj.org/article/9c937d6b011c463c91f64314cd0fbb44
work_keys_str_mv	AT gerhardbudin towardsfinergranularityinmetadata AT heinrichkabas towardsfinergranularityinmetadata AT karlheinzmorth towardsfinergranularityinmetadata
_version_	1718395894216261632
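
The abstract in this record describes storing author, title, publication date, creation date, text type, and a Dewey Decimal class with supplementary keywords as a TEI header for each text division. The record does not show the authors' actual header layout or their software module, so the following is only a minimal sketch of what such a header could look like when built with Python's standard library. The function name `build_division_header`, the `#DDC22` scheme pointer, the `type="textType"` attribute value, and all sample values are assumptions made for illustration, and the output has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def build_division_header(title, author, published, created,
                          text_type, ddc_class, keywords):
    """Return a <teiHeader> element for one text division (hypothetical layout)."""
    ET.register_namespace("", TEI_NS)

    def sub(parent, tag, text=None, **attrib):
        # Create a namespaced child element with optional text and attributes.
        el = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}", attrib)
        if text is not None:
            el.text = text
        return el

    header = ET.Element(f"{{{TEI_NS}}}teiHeader")

    # Basic bibliographic data: title and author of the division.
    file_desc = sub(header, "fileDesc")
    title_stmt = sub(file_desc, "titleStmt")
    sub(title_stmt, "title", title)
    sub(title_stmt, "author", author)
    pub_stmt = sub(file_desc, "publicationStmt")
    sub(pub_stmt, "p", "Illustrative sketch; not an actual corpus record.")

    # Publication date of the printed source the division comes from.
    source_desc = sub(file_desc, "sourceDesc")
    bibl = sub(source_desc, "bibl")
    sub(bibl, "date", published, when=published)

    # Creation date and classification: DDC class, text type, keywords.
    profile = sub(header, "profileDesc")
    creation = sub(profile, "creation")
    sub(creation, "date", created, when=created)
    text_class = sub(profile, "textClass")
    sub(text_class, "classCode", ddc_class, scheme="#DDC22")  # assumed pointer
    kw = sub(text_class, "keywords")
    sub(kw, "term", text_type, type="textType")  # assumed attribute value
    for keyword in keywords:
        sub(kw, "term", keyword)
    return header


if __name__ == "__main__":
    # All values below are invented placeholders.
    header = build_division_header(
        title="Sample article", author="A. Author",
        published="1912-05-01", created="1912",
        text_type="essay", ddc_class="830", keywords=["literature"],
    )
    print(ET.tostring(header, encoding="unicode"))
```

Keeping the text type and the supplementary keywords as `term` elements next to a single `classCode` is one possible design; a project could equally point to a declared taxonomy with `catRef`.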
