Fast and accurate long-range phasing in a UK Biobank cohort

Nat Genet. 2016 Jul;48(7):811-6. doi: 10.1038/ng.3571. Epub 2016 Jun 6.

Abstract

Recent work has leveraged the extensive genotyping of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here we develop a fast and accurate LRP method, Eagle, that extends this paradigm to populations with much smaller proportions of genotyped samples by harnessing long (>4-cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N ≈ 150,000 samples (0.2% of the British population) from the UK Biobank and found that it is 1–2 orders of magnitude faster than existing methods while achieving similar or better phasing accuracy (switch error rate ≈ 0.3%, corresponding to perfect phase in a majority of 10-Mb segments). We also observed that, within an imputation pipeline, prephasing with Eagle improved downstream imputation accuracy compared with prephasing in batches using existing methods, which is the batching those methods require to reach comparable computational cost.
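
The accuracy figure above is a switch error rate. As the metric is commonly defined, it is the fraction of consecutive heterozygous-site pairs whose relative phase is inferred incorrectly, so a rate of about 0.3% corresponds to roughly one switch per ~330 such pairs. The sketch below illustrates that definition only; it is not Eagle's code. It assumes a benchmark where the true phase is known, that homozygous sites have already been removed, and that each haplotype is given as the allele (0/1) carried on haplotype 1 at each heterozygous site. The function name `switch_error_rate` is hypothetical.

```python
from typing import Sequence


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs whose relative phase is wrong.

    Hypothetical helper for illustration. Both inputs list the allele (0 or 1)
    on haplotype 1 at each heterozygous site, in genomic order; homozygous
    sites are assumed to have been filtered out beforehand.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same heterozygous sites")
    n = len(true_hap)
    if n < 2:
        raise ValueError("need at least two heterozygous sites")
    # True where inferred haplotype 1 matches true haplotype 1 at this site.
    # Haplotype labels are arbitrary, so only changes in agreement matter.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error is counted wherever agreement flips between neighbouring sites.
    switches = sum(a != b for a, b in zip(agrees, agrees[1:]))
    return switches / (n - 1)
```

For example, `switch_error_rate([0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1])` finds one flip in agreement across five adjacent pairs and returns 0.2. An inferred haplotype that is the exact complement of the truth scores 0.0, because swapping the two haplotype labels is not a phasing error.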

MeSH terms

  • Algorithms*
  • Biological Specimen Banks*
  • Cohort Studies
  • Computational Biology / methods*
  • Genetics, Population*
  • Genome, Human
  • Genomics
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Inheritance Patterns / genetics*
  • Polymorphism, Single Nucleotide / genetics
  • Sequence Analysis, DNA / methods
  • United Kingdom
  • White People