Trellis for efficient data and task management in the VA Million Veteran Program



Bibliographic details
Main authors: Paul Billing Ross, Jina Song, Philip S. Tsao, Cuiping Pan
Format: Article
Language: English
Published: Nature Portfolio, 2021
Subjects: R, Q
Online access: https://doaj.org/article/977e68558af542b39e3e97a8f93a32b7
Description
Abstract: Biomedical studies have grown larger and now yield large quantities of data, yet processing that data efficiently remains a challenge. Here we present Trellis, a cloud-based data and task management framework that fully automates the process from data ingestion to result presentation, while tracking data lineage, facilitating information queries, and supporting fault tolerance and scalability. By using a graph database to coordinate the state of the data processing workflows and a scalable microservice architecture to perform bioinformatics tasks, Trellis has enabled efficient variant calling on 100,000 human genomes collected in the VA Million Veteran Program.
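The abstract names two cooperating parts: a graph database that holds workflow state and data lineage, and microservices that perform the tasks. The Python sketch that follows is a hypothetical, minimal illustration of that general pattern, not Trellis's actual implementation. It assumes an in-memory dictionary as a stand-in for the graph database and a `deque` as a stand-in for an event or message bus; every class, function, node label, and task name in it (`WorkflowGraph`, `trigger_rules`, `run_task`, `"Fastq"`, `"align"`, and so on) is invented for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex in a toy property graph: a data object or a job (hypothetical schema)."""
    node_id: str
    label: str  # illustrative labels only, e.g. "Fastq", "Job", "Bam"
    props: dict = field(default_factory=dict)


class WorkflowGraph:
    """In-memory stand-in for a graph database recording workflow state and lineage."""

    def __init__(self):
        self.nodes = {}       # node_id -> Node
        self.parents = {}     # node_id -> upstream node_ids (lineage edges)
        self.queue = deque()  # pending events, mimicking a message bus

    def add_node(self, node, parents=()):
        self.nodes[node.node_id] = node
        self.parents[node.node_id] = list(parents)
        self.queue.append(node.node_id)  # every write emits an event

    def lineage(self, node_id):
        """Return all ancestors of node_id, nearest first (breadth-first traversal)."""
        seen, order, frontier = set(), [], deque(self.parents[node_id])
        while frontier:
            nid = frontier.popleft()
            if nid in seen:
                continue
            seen.add(nid)
            order.append(nid)
            frontier.extend(self.parents[nid])
        return order


def trigger_rules(node):
    """Hypothetical trigger: choose which task, if any, a newly written node launches."""
    if node.label == "Fastq":
        return "align"
    return None


def run_task(graph, task, node):
    """Hypothetical microservice: run a task and write its job and output back to the graph."""
    sample = node.props["sample"]
    job_id = f"job-{task}-{sample}"
    if job_id in graph.nodes:
        return  # idempotent: a re-delivered event does not launch a duplicate job
    job = Node(job_id, "Job", {"task": task})
    graph.add_node(job, parents=[node.node_id])
    if task == "align":
        bam = Node(f"bam-{sample}", "Bam", {"sample": sample})
        graph.add_node(bam, parents=[job.node_id])


def process_events(graph):
    """Drain the event queue, launching whatever tasks the trigger rules select."""
    while graph.queue:
        node = graph.nodes[graph.queue.popleft()]
        task = trigger_rules(node)
        if task is not None:
            run_task(graph, task, node)


if __name__ == "__main__":
    g = WorkflowGraph()
    g.add_node(Node("fastq-S1", "Fastq", {"sample": "S1"}))
    process_events(g)
    print(g.lineage("bam-S1"))  # ['job-align-S1', 'fastq-S1']
```

In this sketch, the graph is the single source of truth: each task reads a node, writes its job and output nodes back with edges to their inputs, and those writes become new events. Lineage then falls out of a simple ancestor traversal. Checking for an existing job before launching one is one simple way to tolerate duplicate event delivery; how Trellis itself handles faults is described in the full article.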