Towards Finer Granularity in Metadata

In early 2010, the Austrian Academy of Sciences’ ICLTT instituted an experiment in selective metadata creation for a medium-sized collection (<100 million tokens) of digitised periodicals. The project has two main objectives: (a) assigning basic structures to previously digitised texts, so-c...

Bibliographic Details
Main authors: Gerhard Budin, Heinrich Kabas, Karlheinz Mörth
Format: article
Language: DE, EN, ES, FR, IT
Published: OpenEdition 2012
Subjects: corpus annotation, metadata, metadata creation, TEI headers, tools, Computer engineering. Computer hardware (TK7885-7895)
Online access: https://doaj.org/article/9c937d6b011c463c91f64314cd0fbb44
id oai:doaj.org-article:9c937d6b011c463c91f64314cd0fbb44
record_format dspace
spelling oai:doaj.org-article:9c937d6b011c463c91f64314cd0fbb44
2021-12-02T11:29:53Z
Towards Finer Granularity in Metadata
2162-5603
10.4000/jtei.416
https://doaj.org/article/9c937d6b011c463c91f64314cd0fbb44
2012-02-01T00:00:00Z
http://journals.openedition.org/jtei/416
https://doaj.org/toc/2162-5603
Journal of the Text Encoding Initiative, Vol 2 (2012)
institution DOAJ
collection DOAJ
language DE
EN
ES
FR
IT
topic corpus annotation
metadata
metadata creation
TEI headers
tools
Computer engineering. Computer hardware
TK7885-7895
description In early 2010, the Austrian Academy of Sciences’ ICLTT instituted an experiment in selective metadata creation for a medium-sized collection (<100 million tokens) of digitised periodicals. The project has two main objectives: (a) assigning basic structures, so-called divisions in TEI nomenclature, to previously digitised texts, thus creating a set of new digital objects, and (b) subsequently categorising these texts so that thematically organised sub-corpora can be created. An additional objective is to store the metadata as TEI headers. Attempts at streamlining metadata creation are legion, in particular in the library community. Tools to do the job are often incorporated into workflow engines, which include commercial products (such as docWORKS[e] and C-3) as well as free products such as Goobi, which incorporates the metadata creation tool RusDML, and the Archivists’ Toolkit™. The experimental workflow being tested at the ICLTT is an attempt to capture detailed metadata for a comparatively large collection of digitised periodicals and other collective publications such as yearbooks, readers, commemorative publications, almanacs, and anthologies. While all higher-level digital objects in the corpus were furnished with metadata from the beginning of the digitisation process, the current experiment is designed to enrich this data so that it describes the contents of the material more fully. To achieve this end, the department’s standard tools were adapted, which had the added benefit of keeping software production costs to a minimum. While in earlier experiments our group of researchers (metadata creators) created the TEI header for each text division manually, we have since been approaching the problem by exploiting the contents sections of the digitised issues and/or other secondary sources, which has tangibly accelerated the process.
Together with collecting basic data such as author, title, publication date, and creation date, the project classifies each division by text type and topic, the latter using the standard Dewey Decimal Classification (version 22, German) with supplementary keywords. This paper discusses a number of issues concerning the quality and type of the resulting data. It also touches upon the issue of automation and the points in the process at which human intervention is indispensable. Particular attention is directed at the software module for creating TEI headers.
format article
author Gerhard Budin
Heinrich Kabas
Karlheinz Mörth
title Towards Finer Granularity in Metadata
publisher OpenEdition
publishDate 2012
url https://doaj.org/article/9c937d6b011c463c91f64314cd0fbb44