Haplotype phasing

We develop computational tools to solve statistical and algorithmic challenges in quantitative genetics.

We are based in the Division of Genetics and Center for Data Sciences at Brigham and Women's Hospital / Harvard Medical School. We are affiliated with the Program in Medical and Population Genetics at the Broad Institute.

Our work is generously supported by an NIH Director's New Innovator Award, a Burroughs Wellcome Fund Career Award at the Scientific Interface, and a Broad Institute Next Generation Fund award. We are grateful for past support from a Glenn Foundation for Medical Research and AFAR Grant for Junior Faculty and a Sloan Research Fellowship.

Latest News

Two platform talks at ASHG 2022

August 31, 2022

We're very excited to share our latest work at ASHG this October! Ronen Mukamel and Margaux Hujoel will present platform talks describing strong associations of structural variants with heritable traits and diseases that were revealed by statistical haplotype-sharing models. Ronen and Margaux were also selected as finalists for the Charles J. Epstein Trainee Awards -- congratulations!

Ronen Mukamel: "Repeat polymorphisms underlie top genetic risk loci for glaucoma and colorectal cancer" (platform talk, Wed 10/26 at 1:45pm, #206)

Margaux Hujoel: "...


Paper on cancer mutation modeling published in Nature Biotechnology

June 20, 2022
Maxwell Sherman's paper on modeling somatic mutation rates to uncover cancer drivers (Sherman*, Yaari*, Priebe* et al. 2022 Nat Biotech) is now published -- congratulations, Max! This work, a collaboration with Bonnie Berger's group at MIT, developed a deep-learning model to predict cancer-specific neutral mutation rates at kilobase-scale resolution from epigenomic annotations. Applying this model to the Pan-Cancer Analysis of Whole Genomes (PCAWG) resource...

Paper on spectrum of recessiveness among Mendelian disease variants published in AJHG

May 31, 2022
Alison Barton's paper on mitigated phenotypes observed in carriers of recessive disease variants (Barton et al. 2022 AJHG) is now published -- congratulations, Alison! This work leveraged whole-exome sequencing together with imputation in UK Biobank to identify carrier effects of rare variants known to cause recessive Mendelian diseases in homozygotes. These analyses identified 103 significant associations between quantitative traits and carrier status for 35...

Alison Barton receives her PhD

May 27, 2022
Alison Barton has graduated from the Harvard Medical School Bioinformatics and Integrative Genomics (BIG) PhD program and will be moving on to a postdoc in population genetics with David Reich. Congratulations, Alison!

Talk on haplotype-informed CNV analysis at ProbGen 2022

March 16, 2022
At the 2022 Probabilistic Modeling in Genomics (ProbGen) conference, Margaux Hujoel will be presenting her work on haplotype-informed CNV detection and subsequent association and fine-mapping analysis in UK Biobank: "Influences of rare copy number variation on human complex traits" (Mon Mar 28).

Po-Ru Loh receives 2022 ISCB Overton Prize

February 19, 2022
Po-Ru Loh has been awarded the International Society for Computational Biology's Overton Prize for outstanding accomplishment by an early to mid-career scientist in the field of computational biology. A big thank-you to all of the mentors, collaborators, and trainees who contributed to the work recognized by this award! Po-Ru will be accepting the award and presenting a keynote talk at the ISMB 2022 conference in July.

Recent Publications

Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets

Márquez-Luna C, Gazal S, Loh P-R, Kim SS, Furlotte N, Auton A, Price AL. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat Commun 2021;12(1):6052.
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373 K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107 K) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.

GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health

Zhao Y, Stankovic S, Koprulu M, Wheeler E, Day FR, Lango Allen H, Kerrison ND, Pietzner M, Loh P-R, Wareham NJ, Langenberg C, Ong KK, Perry JRB. GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health. Nat Commun 2021;12(1):4178.
Mosaic loss of chromosome Y (LOY) in leukocytes is the most common form of clonal mosaicism, caused by dysregulation in cell-cycle and DNA damage response pathways. Previous genetic studies have focussed on identifying common variants associated with LOY, which we now extend to rarer, protein-coding variation using exome sequences from 82,277 male UK Biobank participants. We find that loss of function of two genes, CHEK2 and GIGYF1, reach exome-wide significance. Rare alleles in GIGYF1 have not previously been implicated in any complex trait, but here loss-of-function carriers exhibit six-fold higher susceptibility to LOY (OR = 5.99 [3.04-11.81], p = 1.3 × 10⁻¹⁰). These same alleles are also associated with adverse metabolic health, including higher susceptibility to Type 2 Diabetes (OR = 6.10 [3.51-10.61], p = 1.8 × 10⁻¹²), 4 kg higher fat mass (p = 1.3 × 10⁻⁴), 2.32 nmol/L lower serum IGF1 levels (p = 1.5 × 10⁻⁴) and 4.5 kg lower handgrip strength (p = 4.7 × 10⁻⁷), consistent with proposed GIGYF1 enhancement of insulin and IGF-1 receptor signalling. These associations are mirrored by a common variant nearby associated with the expression of GIGYF1. Our observations highlight a potential direct connection between clonal mosaicism and metabolic health.

Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses

Barton AR, Sherman MA, Mukamel RE, Loh P-R. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat Genet 2021;53(8):1260-1269.
Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R² > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10⁻⁸) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.

Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection

Zekavat SM, Lin S-H, Bick AG, Liu A, Paruchuri K, Wang C, Uddin MM, Ye Y, Yu Z, Liu X, Kamatani Y, Bhattacharya R, Pirruccello JP, Pampana A, Loh P-R, Kohli P, McCarroll SA, Kiryluk K, Neale B, Ionita-Laza I, Engels EA, Brown DW, Smoller JW, Green R, Karlson EW, Lebo M, Ellinor PT, Weiss ST, Daly MJ, Terao C, Zhao H, Ebert BL, Reilly MP, Ganna A, Machiela MJ, Genovese G, Natarajan P. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat Med 2021;27(6):1012-1024.
Age is the dominant risk factor for infectious diseases, but the mechanisms linking age to infectious disease risk are incompletely understood. Age-related mosaic chromosomal alterations (mCAs), detected from genotyping of blood-derived DNA, are structural somatic variants indicative of clonal hematopoiesis, and are associated with aberrant leukocyte cell counts, hematological malignancy, and mortality. Here, we show that mCAs predispose to diverse types of infections. We analyzed mCAs from 768,762 individuals without hematological cancer at the time of DNA acquisition across five biobanks. Expanded autosomal mCAs were associated with diverse incident infections (hazard ratio (HR) 1.25; 95% confidence interval (CI) = 1.15-1.36; P = 1.8 × 10⁻⁷), including sepsis (HR 2.68; 95% CI = 2.25-3.19; P = 3.1 × 10⁻²⁸), pneumonia (HR 1.76; 95% CI = 1.53-2.03; P = 2.3 × 10⁻¹⁵), digestive system infections (HR 1.51; 95% CI = 1.32-1.73; P = 2.2 × 10⁻⁹) and genitourinary infections (HR 1.25; 95% CI = 1.11-1.41; P = 3.7 × 10⁻⁴). A genome-wide association study of expanded mCAs identified 63 loci, which were enriched at transcriptional regulatory sites for immune cells. These results suggest that mCAs are a marker of impaired immunity and confer increased predisposition to infections.

A model and test for coordinated polygenic epistasis in complex traits

Sheppard B, Rappoport N, Loh P-R, Sanders SJ, Zaitlen N, Dahl A. A model and test for coordinated polygenic epistasis in complex traits. Proc Natl Acad Sci U S A 2021;118(15):e1922305118.
Interactions between genetic variants (epistasis) are pervasive in model systems and can profoundly impact evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.