Native Language Identification Across Text Types: How Special Are Scientists?

Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific artic...

Bibliographic Details
Main authors: Sabrina Stehwien, Sebastian Padó
Format: article
Language: EN
Published: Accademia University Press 2016
Subjects:
H
Online access: https://doaj.org/article/ad3c03ee29b045bcb83cade3d11d93f2
id oai:doaj.org-article:ad3c03ee29b045bcb83cade3d11d93f2
record_format dspace
spelling oai:doaj.org-article:ad3c03ee29b045bcb83cade3d11d93f2
2021-12-02T09:52:25Z
Native Language Identification Across Text Types: How Special Are Scientists?
2499-4553
10.4000/ijcol.348
https://doaj.org/article/ad3c03ee29b045bcb83cade3d11d93f2
2016-06-01T00:00:00Z
http://journals.openedition.org/ijcol/348
https://doaj.org/toc/2499-4553
Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific articles. Our main results are: (a) the science corpus is not harder to model than some learner corpora; (b) it cannot profit as much as learner corpora from corpus combination via domain adaptation; (c) this pattern can be explained in terms of the respective models focusing on language transfer and topic indicators to different extents.
Sabrina Stehwien
Sebastian Padó
Accademia University Press
article
Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
EN
IJCoL, Vol 2, Iss 1 (2016)
institution DOAJ
collection DOAJ
language EN
topic Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
spellingShingle Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
Sabrina Stehwien
Sebastian Padó
Native Language Identification Across Text Types: How Special Are Scientists?
description Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific articles. Our main results are: (a) the science corpus is not harder to model than some learner corpora; (b) it cannot profit as much as learner corpora from corpus combination via domain adaptation; (c) this pattern can be explained in terms of the respective models focusing on language transfer and topic indicators to different extents.
format article
author Sabrina Stehwien
Sebastian Padó
author_facet Sabrina Stehwien
Sebastian Padó
author_sort Sabrina Stehwien
title Native Language Identification Across Text Types: How Special Are Scientists?
title_short Native Language Identification Across Text Types: How Special Are Scientists?
title_full Native Language Identification Across Text Types: How Special Are Scientists?
title_fullStr Native Language Identification Across Text Types: How Special Are Scientists?
title_full_unstemmed Native Language Identification Across Text Types: How Special Are Scientists?
title_sort native language identification across text types: how special are scientists?
publisher Accademia University Press
publishDate 2016
url https://doaj.org/article/ad3c03ee29b045bcb83cade3d11d93f2
work_keys_str_mv AT sabrinastehwien nativelanguageidentificationacrosstexttypeshowspecialarescientists
AT sebastianpado nativelanguageidentificationacrosstexttypeshowspecialarescientists
_version_ 1718397963380719616