Compression of Pangenomic DNA Sequence Data

DNA is being sequenced at an unprecedented pace and scale. As a result, two problems afflict numerous fields across science and industry: a lack of methods for efficiently storing and transmitting large collections of DNA sequence data, and a lack of representations that facilitate analysis of those collections at scale. The former increasingly burdens researchers with monetary and time costs, while the latter stymies innovation and insight in fields ranging from agriculture to medicine. This project aims to address both problems simultaneously by leveraging the proliferation of pangenomes, which are collections of DNA sequence data from a single species, to develop new compression methods that enable analysis directly on the compressed collections. The primary goal of this project is to provide a software toolkit that researchers can use to compress and analyze large collections of DNA sequence data, and to update the compressed collections over time as new DNA sequence data are generated and existing data are revised.

Publications

There are currently no publications listed.

About NCGR

The National Center for Genome Resources is a not-for-profit research institute that innovates, collaborates, and educates in the field of genomic data science. As leaders in DNA sequence analysis, we partner with government, industry, and academia to drive biological discovery in all kingdoms of life. We deliver value through expertise in experimental design, software, computation, data integration, and the training of a skilled workforce.

© 2021 National Center for Genome Resources.