Construction of a public CHO cell line transcript database using versatile bioinformatics analysis pipelines.

Chinese hamster ovary (CHO) cell lines represent the most commonly used mammalian expression system for the production of therapeutic proteins. In this context, detailed knowledge of the CHO cell transcriptome might help to improve biotechnological processes conducted by specific cell lines. Neverth...

Bibliographic Details
Main Authors:	Oliver Rupp, Jennifer Becker, Karina Brinkrolf, Christina Timmermann, Nicole Borth, Alfred Pühler, Thomas Noll, Alexander Goesmann
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2014
Subjects:	Medicine R Science Q
Online Access:	https://doaj.org/article/eae2d8a145fb40729561ea44d3d1e62b

id	oai:doaj.org-article:eae2d8a145fb40729561ea44d3d1e62b
record_format	dspace
spelling	oai:doaj.org-article:eae2d8a145fb40729561ea44d3d1e62b | 2021-11-18T08:37:59Z | ISSN 1932-6203 | DOI 10.1371/journal.pone.0085568 | https://doaj.org/article/eae2d8a145fb40729561ea44d3d1e62b | 2014-01-01T00:00:00Z | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24427317/pdf/?tool=EBI | https://doaj.org/toc/1932-6203 | PLoS ONE, Vol 9, Iss 1, p e85568 (2014)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
description	Chinese hamster ovary (CHO) cell lines represent the most commonly used mammalian expression system for the production of therapeutic proteins. In this context, detailed knowledge of the CHO cell transcriptome might help to improve biotechnological processes conducted by specific cell lines. Nevertheless, very few assembled cDNA sequences of CHO cells were publicly released until recently, which puts a severe limitation on biotechnological research. Two extended annotation systems and web-based tools, one for browsing eukaryotic genomes (GenDBE) and one for viewing eukaryotic transcriptomes (SAMS), were established as the first step towards a publicly usable CHO cell genome/transcriptome analysis platform. This is complemented by the development of a new strategy to assemble the ca. 100 million reads, sequenced from a broad range of diverse transcripts, to a high quality CHO cell transcript set. The cDNA libraries were constructed from different CHO cell lines grown under various culture conditions and sequenced using Roche/454 and Illumina sequencing technologies in addition to sequencing reads from a previous study. Two pipelines to extend and improve the CHO cell line transcripts were established. First, de novo assemblies were carried out with the Trinity and Oases assemblers, using varying k-mer sizes. The resulting contigs were screened for potential CDS using ESTScan. Redundant contigs were filtered out using cd-hit-est. The remaining CDS contigs were re-assembled with CAP3. Second, a reference-based assembly with the TopHat/Cufflinks pipeline was performed, using the recently published draft genome sequence of CHO-K1 as reference. Additionally, the de novo contigs were mapped to the reference genome using GMAP and merged with the Cufflinks assembly using the cuffmerge software. With this approach 28,874 transcripts located on 16,492 gene loci could be assembled. Combining the results of both approaches, 65,561 transcripts were identified for CHO cell lines, which could be clustered by sequence identity into 17,598 gene clusters.
format	article
author	Oliver Rupp Jennifer Becker Karina Brinkrolf Christina Timmermann Nicole Borth Alfred Pühler Thomas Noll Alexander Goesmann
author_facet	Oliver Rupp Jennifer Becker Karina Brinkrolf Christina Timmermann Nicole Borth Alfred Pühler Thomas Noll Alexander Goesmann
author_sort	Oliver Rupp
title	Construction of a public CHO cell line transcript database using versatile bioinformatics analysis pipelines.
publisher	Public Library of Science (PLoS)
publishDate	2014
url	https://doaj.org/article/eae2d8a145fb40729561ea44d3d1e62b
_version_	1718421534747394048
