With the development of long-read sequencing technology and improved computational methods, small teams of researchers can now create reference-quality genome assemblies, which will lead to a better understanding of the full spectrum of human genetic variation.
Using sequencing data from the Human Pangenome Reference Consortium, we have created two reference-quality assemblies, one from an Ashkenazi individual and one from a Puerto Rican individual, both of which are more contiguous than the current GRCh38 reference genome. To function as an effective reference, a genome also needs to be accurately annotated. For this, we developed Liftoff, a lift-over tool designed specifically for gene annotations. With Liftoff, we were able to map more than 99% of human protein-coding and non-coding genes onto both assemblies.
- The importance of developing a diverse set of human reference genomes
- Computational methods for assembling high-quality genomes using long-read sequencing data
- Computational methods for mapping annotations onto new reference genomes