Quality Assessment Methods for Textual Conversational Interfaces: A Multivocal Literature Review

The evaluation and assessment of conversational interfaces is a complex task, since such software products are challenging to validate through traditional testing approaches. We conducted a systematic Multivocal Literature Review (MLR) across five different literature sources to provide a view of the quality attributes, evaluation frameworks, and evaluation datasets proposed to aid researchers and practitioners in the field. We arrived at a final pool of 118 contributions, including grey (35) and white (83) literature. We categorized 123 different quality attributes and metrics under ten different categories and four macro-categories: Relational, Conversational, User-Centered, and Quantitative attributes. While Relational and Conversational attributes are the most commonly explored in the scientific literature, we observed a predominance of User-Centered attributes in the industrial literature. We also identified five different academic frameworks/tools that automatically compute sets of metrics, and 28 datasets (subdivided into seven different categories based on the type of data contained) that can produce conversations for the evaluation of conversational interfaces. Our analysis highlights that a large number of qualitative and quantitative attributes are available in the literature to evaluate the performance of conversational interfaces. Our categorization can serve as a valid entry point for researchers and practitioners to select the proper functional and non-functional aspects to be evaluated for their products.


Bibliographic Details
Main Authors: Riccardo Coppola, Luca Ardito
Format: article
Language: EN
Published: MDPI AG, 2021
Subjects: conversational interfaces; software quality attributes; software quality; Information technology (T58.5-58.64)
Online Access: https://doaj.org/article/8ce15c5efd6241b6bf23093b83bb1edf
DOI: 10.3390/info12110437
ISSN: 2078-2489
Published in: Information, Vol. 12, Iss. 11, p. 437 (2021)
Article URL: https://www.mdpi.com/2078-2489/12/11/437