A Gold-Standard for Entity Resolution within Sexually Transmitted Infection Networks

Abstract Contact tracing for venereal disease control has been widespread since 1936 and relies on reported information about contacts’ attributes to determine whether two contacts may represent the same individual. We developed and implemented a gold-standard for determining overlap between contact...

Bibliographic Details
Main authors: John Schneider, L. Philip Schumm, Maya Fraser, Vijay Yeldandi, Chuanhong Liao
Format: article
Language: EN
Published: Nature Portfolio 2018
Subjects:
R
Q
Online access: https://doaj.org/article/3eab4c6c10954703890f755ab5df7e89
id oai:doaj.org-article:3eab4c6c10954703890f755ab5df7e89
record_format dspace
spelling oai:doaj.org-article:3eab4c6c10954703890f755ab5df7e89
2021-12-02T15:07:52Z
A Gold-Standard for Entity Resolution within Sexually Transmitted Infection Networks
10.1038/s41598-018-26794-7
2045-2322
https://doaj.org/article/3eab4c6c10954703890f755ab5df7e89
2018-06-01T00:00:00Z
https://doi.org/10.1038/s41598-018-26794-7
https://doaj.org/toc/2045-2322
Abstract Contact tracing for venereal disease control has been widespread since 1936 and relies on reported information about contacts’ attributes to determine whether two contacts may represent the same individual. We developed and implemented a gold-standard for determining overlap between contacts reported by different individuals using cell phone numbers as unique identifiers. This method was then used to evaluate the performance of using reported names and demographic characteristics to infer overlap. Cell-phone numbers, names and demographic data for a sample of high-risk men in India and their contacts were collected using a novel, hybrid instrument involving both cell-phone data extraction and Computer-Assisted Personal Interviewing (CAPI). Logistic regression was used to model the probability that a pair of contacts reported by different respondents were identical, based on the correspondence between their reported names and attributes. A discrete mixture model is proposed which provides predictions nearly as good as the logistic model but may be used in a new population without re-calibration. Despite achieving AUCs of 0.83–0.86, the low rate of true overlap among a very large number of contact pairs still results in a high rate of false positives. Next generation contact tracing calls for more archived or digital matching processes.
John Schneider
L. Philip Schumm
Maya Fraser
Vijay Yeldandi
Chuanhong Liao
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 8, Iss 1, Pp 1-8 (2018)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Contact tracing for venereal disease control has been widespread since 1936 and relies on reported information about contacts’ attributes to determine whether two contacts may represent the same individual. We developed and implemented a gold-standard for determining overlap between contacts reported by different individuals using cell phone numbers as unique identifiers. This method was then used to evaluate the performance of using reported names and demographic characteristics to infer overlap. Cell-phone numbers, names and demographic data for a sample of high-risk men in India and their contacts were collected using a novel, hybrid instrument involving both cell-phone data extraction and Computer-Assisted Personal Interviewing (CAPI). Logistic regression was used to model the probability that a pair of contacts reported by different respondents were identical, based on the correspondence between their reported names and attributes. A discrete mixture model is proposed which provides predictions nearly as good as the logistic model but may be used in a new population without re-calibration. Despite achieving AUCs of 0.83–0.86, the low rate of true overlap among a very large number of contact pairs still results in a high rate of false positives. Next generation contact tracing calls for more archived or digital matching processes.
format article
author John Schneider
L. Philip Schumm
Maya Fraser
Vijay Yeldandi
Chuanhong Liao
author_sort John Schneider
title A Gold-Standard for Entity Resolution within Sexually Transmitted Infection Networks
publisher Nature Portfolio
publishDate 2018
url https://doaj.org/article/3eab4c6c10954703890f755ab5df7e89