FASTAFS: file system virtualisation of random access compressed FASTA files

Abstract Background The FASTA file format, used to store polymeric sequence data, has become a bioinformatics file standard used for decades. The relatively large files require additional files, beyond the scope of the original format, to identify sequences and to provide random access. Multiple com...

Bibliographic Details
Main Authors: Youri Hoogstrate, Guido W. Jenster, Harmen J. G. van de Werken
Format: article
Language: EN
Published: BMC 2021
Subjects:
Online Access: https://doaj.org/article/fbb2aed2325b447aa3f461d033a7c2f8
id oai:doaj.org-article:fbb2aed2325b447aa3f461d033a7c2f8
record_format dspace
spelling oai:doaj.org-article:fbb2aed2325b447aa3f461d033a7c2f8
2021-11-07T12:22:09Z
FASTAFS: file system virtualisation of random access compressed FASTA files
10.1186/s12859-021-04455-3
1471-2105
https://doaj.org/article/fbb2aed2325b447aa3f461d033a7c2f8
2021-11-01T00:00:00Z
https://doi.org/10.1186/s12859-021-04455-3
https://doaj.org/toc/1471-2105
Abstract Background The FASTA file format, used to store polymeric sequence data, has become a bioinformatics file standard used for decades. The relatively large files require additional files, beyond the scope of the original format, to identify sequences and to provide random access. Multiple compressors have been developed to archive FASTA files back and forth, but these lack direct access to targeted content or metadata of the archive. Moreover, these solutions are not directly backwards compatible to FASTA files, resulting in limited software integration. Results We designed a linux based toolkit that virtualises the content of DNA, RNA and protein FASTA archives into the filesystem by using filesystem in userspace. This guarantees in-sync virtualised metadata files and offers fast random-access decompression using bit encodings plus Zstandard (zstd). The toolkit, FASTAFS, can track all its system-wide running instances, allows file integrity verification and can provide, instantly, scriptable access to sequence files and is easy to use and deploy. The file compression ratios were comparable but not superior to other state of the art archival tools, despite the innovative random access feature implemented in FASTAFS. Conclusions FASTAFS is a user-friendly and easy to deploy backwards compatible generic purpose solution to store and access compressed FASTA files, since it offers file system access to FASTA files as well as in-sync metadata files through file virtualisation. Using virtual filesystems as in-between layer offers format conversion without the need to rewrite code into different programming languages while preserving compatibility.
Youri Hoogstrate
Guido W. Jenster
Harmen J. G. van de Werken
BMC
article
FASTA
FASTAFS
Integrity
FUSE
Zstd
Metadata
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
EN
BMC Bioinformatics, Vol 22, Iss 1, Pp 1-12 (2021)
institution DOAJ
collection DOAJ
language EN
topic FASTA
FASTAFS
Integrity
FUSE
Zstd
Metadata
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
spellingShingle FASTA
FASTAFS
Integrity
FUSE
Zstd
Metadata
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
Youri Hoogstrate
Guido W. Jenster
Harmen J. G. van de Werken
FASTAFS: file system virtualisation of random access compressed FASTA files
description Abstract. Background: The FASTA file format, used to store polymeric sequence data, has been a bioinformatics file standard for decades. The relatively large files require additional files, beyond the scope of the original format, to identify sequences and to provide random access. Multiple compressors have been developed to archive FASTA files back and forth, but these lack direct access to targeted content or metadata of the archive. Moreover, these solutions are not directly backwards compatible with FASTA files, resulting in limited software integration. Results: We designed a Linux-based toolkit that virtualises the content of DNA, RNA and protein FASTA archives into the filesystem by using Filesystem in Userspace (FUSE). This guarantees in-sync virtualised metadata files and offers fast random-access decompression using bit encodings plus Zstandard (zstd). The toolkit, FASTAFS, can track all its system-wide running instances, allows file integrity verification, can instantly provide scriptable access to sequence files, and is easy to use and deploy. The file compression ratios were comparable but not superior to those of other state-of-the-art archival tools, despite the innovative random access feature implemented in FASTAFS. Conclusions: FASTAFS is a user-friendly, easy-to-deploy, backwards-compatible, general-purpose solution to store and access compressed FASTA files, since it offers file system access to FASTA files as well as in-sync metadata files through file virtualisation. Using virtual filesystems as an in-between layer offers format conversion without the need to rewrite code into different programming languages while preserving compatibility.
format article
author Youri Hoogstrate
Guido W. Jenster
Harmen J. G. van de Werken
author_facet Youri Hoogstrate
Guido W. Jenster
Harmen J. G. van de Werken
author_sort Youri Hoogstrate
title FASTAFS: file system virtualisation of random access compressed FASTA files
title_short FASTAFS: file system virtualisation of random access compressed FASTA files
title_full FASTAFS: file system virtualisation of random access compressed FASTA files
title_fullStr FASTAFS: file system virtualisation of random access compressed FASTA files
title_full_unstemmed FASTAFS: file system virtualisation of random access compressed FASTA files
title_sort fastafs: file system virtualisation of random access compressed fasta files
publisher BMC
publishDate 2021
url https://doaj.org/article/fbb2aed2325b447aa3f461d033a7c2f8
work_keys_str_mv AT yourihoogstrate fastafsfilesystemvirtualisationofrandomaccesscompressedfastafiles
AT guidowjenster fastafsfilesystemvirtualisationofrandomaccesscompressedfastafiles
AT harmenjgvandewerken fastafsfilesystemvirtualisationofrandomaccesscompressedfastafiles
_version_ 1718443524458807296
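
The description above mentions random-access decompression built on bit encodings. As a rough illustration of why a fixed-width bit encoding allows random access, the sketch below packs an ACGT-only sequence at two bits per base and decodes an arbitrary slice by touching only the bytes that hold it. This is an assumption-laden toy, not the FASTAFS archive format: the function names `pack_2bit` and `fetch` are hypothetical, the Zstandard stage is omitted, and ambiguous bases (such as N), RNA and protein alphabets are not handled.

```python
# Hypothetical illustration only: this is NOT the FASTAFS on-disk format.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an ACGT-only sequence into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i lives in byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def fetch(packed: bytes, start: int, end: int) -> str:
    """Decode bases [start, end) by reading only the bytes that hold them."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(start, end)
    )


seq = "ACGTTGCAAC"
packed = pack_2bit(seq)
print(len(packed))          # 3 bytes for 10 bases
print(fetch(packed, 3, 7))  # TTGC
```

Because every base occupies exactly two bits, the byte and bit position of base `i` follow directly from `i`, so a slice can be served without decoding anything before it. A real archive would additionally need the sequence length, header names and a scheme for non-ACGT symbols, none of which are modelled here.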