Native Language Identification Across Text Types: How Special Are Scientists?
Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific artic...
Main authors: | Sabrina Stehwien, Sebastian Padó |
---|---|
Format: | article |
Language: | EN |
Published: | Accademia University Press, 2016 |
Subjects: | Social Sciences; Computational linguistics. Natural language processing |
Online access: | https://doaj.org/article/ad3c03ee29b045bcb83cade3d11d93f2 |
id |
oai:doaj.org-article:ad3c03ee29b045bcb83cade3d11d93f2 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:ad3c03ee29b045bcb83cade3d11d93f2 2021-12-02T09:52:25Z Native Language Identification Across Text Types: How Special Are Scientists? 2499-4553 10.4000/ijcol.348 https://doaj.org/article/ad3c03ee29b045bcb83cade3d11d93f2 2016-06-01T00:00:00Z http://journals.openedition.org/ijcol/348 https://doaj.org/toc/2499-4553 Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific articles. Our main results are: (a) the science corpus is not harder to model than some learner corpora; (b) it cannot profit as much as learner corpora from corpus combination via domain adaptation; (c) this pattern can be explained in terms of the respective models focusing on language transfer and topic indicators to different extents. Sabrina Stehwien Sebastian Padó Accademia University Press article Social Sciences H Computational linguistics. Natural language processing P98-98.5 EN IJCoL, Vol 2, Iss 1 (2016) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Social Sciences H Computational linguistics. Natural language processing P98-98.5 |
spellingShingle |
Social Sciences H Computational linguistics. Natural language processing P98-98.5 Sabrina Stehwien Sebastian Padó Native Language Identification Across Text Types: How Special Are Scientists? |
description |
Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific articles. Our main results are: (a) the science corpus is not harder to model than some learner corpora; (b) it cannot profit as much as learner corpora from corpus combination via domain adaptation; (c) this pattern can be explained in terms of the respective models focusing on language transfer and topic indicators to different extents. |
format |
article |
author |
Sabrina Stehwien Sebastian Padó |
author_facet |
Sabrina Stehwien Sebastian Padó |
author_sort |
Sabrina Stehwien |
title |
Native Language Identification Across Text Types: How Special Are Scientists? |
title_short |
Native Language Identification Across Text Types: How Special Are Scientists? |
title_full |
Native Language Identification Across Text Types: How Special Are Scientists? |
title_fullStr |
Native Language Identification Across Text Types: How Special Are Scientists? |
title_full_unstemmed |
Native Language Identification Across Text Types: How Special Are Scientists? |
title_sort |
native language identification across text types: how special are scientists? |
publisher |
Accademia University Press |
publishDate |
2016 |
url |
https://doaj.org/article/ad3c03ee29b045bcb83cade3d11d93f2 |
work_keys_str_mv |
AT sabrinastehwien nativelanguageidentificationacrosstexttypeshowspecialarescientists AT sebastianpado nativelanguageidentificationacrosstexttypeshowspecialarescientists |
_version_ |
1718397963380719616 |