Native Language Identification Across Text Types: How Special Are Scientists?

Bibliographic Details
Main Authors: Sabrina Stehwien, Sebastian Padó
Format: article
Language: EN
Published: Accademia University Press 2016
Subjects: H
Online Access: https://doaj.org/article/ad3c03ee29b045bcb83cade3d11d93f2
Description
Summary: Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific articles. Our main results are: (a) the science corpus is not harder to model than some learner corpora; (b) it cannot profit as much as learner corpora from corpus combination via domain adaptation; (c) this pattern can be explained in terms of the respective models focusing on language transfer and topic indicators to different extents.