Pangenomes: Developing algorithms and linking to phenotypes

PIs: Brendan Mumey (MSU); Joann Mudge (NCGR)
Co-PIs: Thiru Ramaraj (DePaul University); Indika Kahanda (MSU)

The current trajectory of next-generation sequencing, with falling costs and increasing read lengths and throughput, means that sequencing multiple genomes per species will become routine within the next decade. This project begins work on a new generation of bioinformatics software that exploits the additional information available from multiple accessions and uses it intelligently for unbiased, species-wide analyses.

We develop pangenomic algorithms and software tools that scale to complex eukaryotic organisms. These tools allow researchers to study large numbers of genome sequences from a single species and to identify the genomic regions responsible for phenotypic adaptations, such as the ability to thrive in different environments. Each individual's genome sequence corresponds to a path through a graph data structure called a De Bruijn graph. These graphs are large and tangled and can contain millions of nodes and edges. Our tool finds hotspots, or frequented regions (FRs), in De Bruijn graphs. FRs represent regions shared across individuals, and the tool also identifies regions that are not frequented (unique to an individual).
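To make the graph picture concrete, here is a minimal Python sketch, assuming a plain node-per-k-mer De Bruijn graph built directly from a few toy genome strings. The genome names, sequences, k value, and function names are all hypothetical. The sketch records each genome as a path through the shared graph and counts how many genomes traverse each node. Per-node counting is only a simplified stand-in for illustration. It is not the published frequented-region algorithm, which works on regions spanning many nodes rather than on individual k-mers.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield each length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_pangenome_dbg(genomes, k):
    """Build a simple (uncompacted) De Bruijn graph from several genomes.

    Nodes are k-mers; an edge a -> b is added whenever b follows a in some
    genome, so a and b overlap by k-1 bases. Each genome is recorded as its
    walk (path) through the shared graph.
    """
    edges = defaultdict(set)
    paths = {}
    for name, seq in genomes.items():
        path = list(kmers(seq, k))
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
        paths[name] = path
    return edges, paths


def node_support(paths):
    """Count how many distinct genomes traverse each node."""
    support = defaultdict(int)
    for path in paths.values():
        for node in set(path):  # count each genome at most once per node
            support[node] += 1
    return support


def classify_nodes(paths, min_shared):
    """Split nodes into shared (>= min_shared genomes) and unique (1 genome).

    Hypothetical per-node simplification, not the FR-finding algorithm.
    """
    support = node_support(paths)
    shared = {n for n, c in support.items() if c >= min_shared}
    unique = {n for n, c in support.items() if c == 1}
    return shared, unique


if __name__ == "__main__":
    genomes = {"g1": "ACGTACGA", "g2": "ACGTTCGA", "g3": "ACGTACGT"}
    edges, paths = build_pangenome_dbg(genomes, k=4)
    shared, unique = classify_nodes(paths, min_shared=len(genomes))
    print(sorted(shared))         # ['ACGT']
    print(sorted(unique))         # ['ACGA', 'CGTT', 'GTTC', 'TCGA', 'TTCG']
    print(sorted(edges["TACG"]))  # ['ACGA', 'ACGT']
```

In this toy input, g3 revisits ACGT, so its path loops back through the edge TACG -> ACGT. This is a small example of the cycles that make real pangenome graphs tangled. Practical tools typically compact non-branching runs of k-mers into single nodes to keep graphs with millions of nodes manageable.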

We are developing software and machine learning techniques that automatically filter the shared and unique regions in a pangenome to identify the most promising candidate regions. These tools will help researchers discover regions that are evolutionarily conserved, regions that are novel, regions that have diverged under positive selection, and regions that underlie phenotypic differences across the population.
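Automatic filtering of candidate regions could take many forms. As one hedged illustration, and not the project's machine learning method, the sketch below assumes each individual has already been reduced to the set of regions (for example, FRs) that its path traverses, together with a binary phenotype label. It ranks regions by the difference in how often they occur in cases versus controls. The region IDs, individuals, and the `rank_regions` function are hypothetical.

```python
def rank_regions(presence, phenotype):
    """Rank candidate regions by how differently they occur in two groups.

    presence:  dict mapping individual -> set of region IDs its path traverses
    phenotype: dict mapping individual -> True (case) or False (control)
    Returns (region, score) pairs sorted by score, highest first, where score
    is |fraction of cases with region - fraction of controls with region|.
    This is a hypothetical frequency-difference heuristic, not a trained model.
    """
    cases = [i for i, is_case in phenotype.items() if is_case]
    controls = [i for i, is_case in phenotype.items() if not is_case]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    regions = set().union(*presence.values())
    scored = []
    for region in regions:
        f_case = sum(region in presence.get(i, set()) for i in cases) / len(cases)
        f_ctrl = sum(region in presence.get(i, set()) for i in controls) / len(controls)
        scored.append((region, abs(f_case - f_ctrl)))
    # Sort by descending score, breaking ties by region ID for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    presence = {"a": {"R1", "R2"}, "b": {"R1", "R2"}, "c": {"R1"}, "d": {"R1", "R3"}}
    phenotype = {"a": True, "b": True, "c": False, "d": False}
    print(rank_regions(presence, phenotype))
    # [('R2', 1.0), ('R3', 0.5), ('R1', 0.0)]
```

Here R1 is shared by everyone and scores 0, so it would be filtered out as uninformative. R2 occurs only in cases and ranks first. A raw frequency difference ignores sample size, population structure, and multiple testing, so a real analysis would need a proper statistical test or a trained classifier.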

Algorithms and software tools are available at github.com/abi-pangenomics.

Publications

S. Hokin, A. Cleary and J. Mudge.
Disease association with frequented regions of genotype graphs
medRxiv, 2020
S. Hokin and A. Cleary.
Disease Classification with Pan-Genome Frequented Regions and Machine Learning
Gordon Research Conference, 2019
A. Cleary, T. Ramaraj, I. Kahanda, J. Mudge and B. Mumey.
Exploring frequented regions in pan-genomic graphs
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018, DOI 10.1109/tcbb.2018.2864564