The affectability of writing assessment scores: a G-theory analysis of rater, task, and scoring method contribution

Abstract The present study investigated factors that affect EFL writing scores using generalizability theory (G-theory). To this end, one hundred and twenty students completed one independent and one integrated writing task. Their performances were then scored by six raters: one self-rating, three peer ratings, and two instructor ratings. The main purpose of the study was to determine the relative and absolute contributions of different facets, such as student, rater, task, method of scoring, and background of education, to the validity of writing assessment scores. The results indicated three major sources of variance: (a) the student-by-task-by-method-of-scoring (nested in background of education) interaction (STM:B), contributing 31.8% of the total variance; (b) the student-by-rater-by-task-by-method-of-scoring (nested in background of education) interaction (SRTM:B), contributing 26.5%; and (c) the student-by-rater-by-method-of-scoring (nested in background of education) interaction (SRM:B), contributing 17.6%. With regard to the G-coefficients in the G-study (relative G-coefficient ≥ 0.86), the assessment results were found to be highly valid and reliable. The sources of error variance were the student-by-rater (nested in background of education) interaction (SR:B) and the rater-by-background-of-education interaction, contributing 99.2% and 0.8% of the error variance, respectively. Additionally, ten separate G-studies were conducted to investigate the contribution of different facets across rater, task, and method of scoring as the differentiation facet. These studies suggested that peer rating, the analytic scoring method, and integrated writing tasks were the most reliable and generalizable writing assessment designs. Finally, five decision-making studies (D-studies) were conducted at the optimization level, indicating that at least four raters (G-coefficient = 0.80) are necessary for a valid and reliable assessment. Based on these results, to achieve the greatest gain in generalizability, teachers should have their students take two writing assessments and have their performance rated with at least two scoring methods by at least four raters.
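To make the D-study logic concrete, the sketch below (not taken from the article; all variance components are hypothetical placeholders) shows how a relative G-coefficient for a simplified person-by-rater design is projected as the number of raters grows. This is the kind of calculation that underlies recommendations such as "at least four raters" in a decision study.

```python
# Minimal D-study sketch, assuming a simplified person-by-rater (p x r) design.
# Not the author's code; the variance components below are hypothetical
# placeholders, not values reported in the study.

def relative_g(var_person: float, var_interaction_error: float, n_raters: int) -> float:
    """Relative G-coefficient: universe-score (person) variance divided by
    itself plus the relative error variance, i.e. the person-by-rater
    interaction/residual variance averaged over the number of raters."""
    relative_error = var_interaction_error / n_raters
    return var_person / (var_person + relative_error)

# Hypothetical variance components, for illustration only.
var_p = 4.0      # person (universe-score) variance
var_pr_e = 6.0   # person-by-rater interaction / residual variance

for n in range(1, 7):
    print(f"raters = {n}: relative G = {relative_g(var_p, var_pr_e, n):.2f}")
```

As the loop shows, adding raters shrinks the relative error term, so the projected G-coefficient rises with each additional rater until it levels off.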

Bibliographic Details
Main Author: Ali Khodi
Format: article
Language: EN
Published: SpringerOpen, 2021
Published in: Language Testing in Asia, Vol 11, Iss 1, Pp 1-27 (2021)
DOI: https://doi.org/10.1186/s40468-021-00134-5
ISSN: 2229-0443
Subjects: Classical test theory; Decision-making study; Generalizability theory; Writing assessment; Language and Literature (P)
Online Access: https://doaj.org/article/d3a9b55a30d24b82bd4f354aaa36c4c2