Using de novo assembly to identify structural variation of eight complex immune system gene regions.
Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the com...
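Main Authors: | Jia-Yuan Zhang, Hannah Roberts, David S C Flores, Antony J Cutler, Andrew C Brown, Justin P Whalley, Olga Mielczarek, David Buck, Helen Lockstone, Barbara Xella, Karen Oliver, Craig Corton, Emma Betteridge, Rachael Bashford-Rogers, Julian C Knight, John A Todd, Gavin Band |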
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2021 |
Subjects: | Biology (General); QH301-705.5 |
Online Access: | https://doaj.org/article/bea2427071be4bdeb11fe0c7dccabc1c |
id |
oai:doaj.org-article:bea2427071be4bdeb11fe0c7dccabc1c |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:bea2427071be4bdeb11fe0c7dccabc1c 2021-12-02T19:58:09Z Using de novo assembly to identify structural variation of eight complex immune system gene regions. 1553-734X 1553-7358 10.1371/journal.pcbi.1009254 https://doaj.org/article/bea2427071be4bdeb11fe0c7dccabc1c 2021-08-01T00:00:00Z https://doi.org/10.1371/journal.pcbi.1009254 https://doaj.org/toc/1553-734X https://doaj.org/toc/1553-7358 Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the complexity of these loci, which contain extensive regions of paralogy, segmental duplication and high copy-number repeats, but recent progress in long-read sequencing and optical mapping techniques suggests this problem may now be tractable. Here we assess this by using long-read sequencing platforms from PacBio and Oxford Nanopore, supplemented with short-read sequencing and Bionano optical mapping, to sequence DNA extracted from CD14+ monocytes and peripheral blood mononuclear cells from a single European individual identified as HV31. We use this data to build a de novo assembly of eight genomic regions encoding four key components of the immune system, namely the human leukocyte antigen, immunoglobulins, T cell receptors, and killer-cell immunoglobulin-like receptors. Validation of our assembly using k-mer based and alignment approaches suggests that it has high accuracy, with estimated base-level error rates below 1 in 10 kb, although we identify a small number of remaining structural errors. We use the assembly to identify heterozygous and homozygous structural variation in comparison to GRCh38.
Despite analyzing only a single individual, we find multiple large structural variants affecting core genes at all three immunoglobulin regions and at two of the three T cell receptor regions. Several of these variants are not accurately callable using current algorithms, implying that further methodological improvements are needed. Our results demonstrate that assessing haplotype variation in these regions is possible given sufficiently accurate long-read and associated data. Continued reductions in the cost of these technologies will enable application of these methods to larger samples and provide a broader catalogue of germline structural variation at these loci, an important step toward making these regions accessible to large-scale genetic association studies. Jia-Yuan Zhang Hannah Roberts David S C Flores Antony J Cutler Andrew C Brown Justin P Whalley Olga Mielczarek David Buck Helen Lockstone Barbara Xella Karen Oliver Craig Corton Emma Betteridge Rachael Bashford-Rogers Julian C Knight John A Todd Gavin Band Public Library of Science (PLoS) article Biology (General) QH301-705.5 EN PLoS Computational Biology, Vol 17, Iss 8, p e1009254 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Biology (General) QH301-705.5 |
spellingShingle |
Biology (General) QH301-705.5 Jia-Yuan Zhang Hannah Roberts David S C Flores Antony J Cutler Andrew C Brown Justin P Whalley Olga Mielczarek David Buck Helen Lockstone Barbara Xella Karen Oliver Craig Corton Emma Betteridge Rachael Bashford-Rogers Julian C Knight John A Todd Gavin Band Using de novo assembly to identify structural variation of eight complex immune system gene regions. |
description |
Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the complexity of these loci, which contain extensive regions of paralogy, segmental duplication and high copy-number repeats, but recent progress in long-read sequencing and optical mapping techniques suggests this problem may now be tractable. Here we assess this by using long-read sequencing platforms from PacBio and Oxford Nanopore, supplemented with short-read sequencing and Bionano optical mapping, to sequence DNA extracted from CD14+ monocytes and peripheral blood mononuclear cells from a single European individual identified as HV31. We use this data to build a de novo assembly of eight genomic regions encoding four key components of the immune system, namely the human leukocyte antigen, immunoglobulins, T cell receptors, and killer-cell immunoglobulin-like receptors. Validation of our assembly using k-mer based and alignment approaches suggests that it has high accuracy, with estimated base-level error rates below 1 in 10 kb, although we identify a small number of remaining structural errors. We use the assembly to identify heterozygous and homozygous structural variation in comparison to GRCh38. Despite analyzing only a single individual, we find multiple large structural variants affecting core genes at all three immunoglobulin regions and at two of the three T cell receptor regions. Several of these variants are not accurately callable using current algorithms, implying that further methodological improvements are needed. Our results demonstrate that assessing haplotype variation in these regions is possible given sufficiently accurate long-read and associated data. 
Continued reductions in the cost of these technologies will enable application of these methods to larger samples and provide a broader catalogue of germline structural variation at these loci, an important step toward making these regions accessible to large-scale genetic association studies. |
format |
article |
author |
Jia-Yuan Zhang Hannah Roberts David S C Flores Antony J Cutler Andrew C Brown Justin P Whalley Olga Mielczarek David Buck Helen Lockstone Barbara Xella Karen Oliver Craig Corton Emma Betteridge Rachael Bashford-Rogers Julian C Knight John A Todd Gavin Band |
author_facet |
Jia-Yuan Zhang Hannah Roberts David S C Flores Antony J Cutler Andrew C Brown Justin P Whalley Olga Mielczarek David Buck Helen Lockstone Barbara Xella Karen Oliver Craig Corton Emma Betteridge Rachael Bashford-Rogers Julian C Knight John A Todd Gavin Band |
author_sort |
Jia-Yuan Zhang |
title |
Using de novo assembly to identify structural variation of eight complex immune system gene regions. |
title_short |
Using de novo assembly to identify structural variation of eight complex immune system gene regions. |
title_full |
Using de novo assembly to identify structural variation of eight complex immune system gene regions. |
title_fullStr |
Using de novo assembly to identify structural variation of eight complex immune system gene regions. |
title_full_unstemmed |
Using de novo assembly to identify structural variation of eight complex immune system gene regions. |
title_sort |
using de novo assembly to identify structural variation of eight complex immune system gene regions. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2021 |
url |
https://doaj.org/article/bea2427071be4bdeb11fe0c7dccabc1c |
work_keys_str_mv |
AT jiayuanzhang usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT hannahroberts usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT davidscflores usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT antonyjcutler usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT andrewcbrown usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT justinpwhalley usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT olgamielczarek usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT davidbuck usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT helenlockstone usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT barbaraxella usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT karenoliver usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT craigcorton usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT emmabetteridge usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT rachaelbashfordrogers usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT juliancknight usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT johnatodd usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions AT gavinband usingdenovoassemblytoidentifystructuralvariationofeightcompleximmunesystemgeneregions |
_version_ |
1718375800875515904 |