Haplotype phasing

We develop computational tools to solve statistical and algorithmic challenges in quantitative genetics.

We are based in the Division of Genetics and Center for Data Sciences at Brigham and Women's Hospital / Harvard Medical School. We are affiliated with the Program in Medical and Population Genetics at the Broad Institute.

Our work is generously supported by an NIH Director's New Innovator Award, a Burroughs Wellcome Fund Career Award at the Scientific Interface, and a Broad Institute Next Generation Fund award. We are also grateful for past support from a Glenn Foundation for Medical Research and AFAR Grant for Junior Faculty and from a Sloan Research Fellowship.

Latest News

Protein-coding variable number tandem repeat (VNTR) paper published in Science

September 23, 2021
Ronen Mukamel and Bob Handsaker's paper on the phenotypic effects of protein-coding variable-number-of-tandem-repeat (VNTR) polymorphisms (Mukamel*, Handsaker* et al. 2021 Science) is now published -- congratulations, Ronen and Bob! This exciting collaboration with Steve McCarroll's lab found that some of the largest effects of common genetic variants on human phenotypes (including height, biomarkers of health, and hair morphology) arise...

Three talks and a poster talk at ASHG 2021

August 23, 2021

We're very excited to share our ongoing work at ASHG this October! Alison Barton and Margaux Hujoel will present platform talks on penetrance of disease variants and CNV associations in UK Biobank, Maxwell Sherman will present a plenary talk on somatic mutations in cancer, and Ronen Mukamel will present a poster talk on dissecting Lp(a) genetics. Alison, Margaux, and Max were all named semifinalists for the Charles J. Epstein Trainee Award -- congratulations!

Alison Barton: "Incomplete penetrance of disease variants in the UK Biobank" (platform talk, Wed 10/20 at 11:15am)...

New preprint on learning patterns of somatic mutation in cancer

August 4, 2021
We are excited to share a new preprint, "Learning the mutational landscape of the cancer genome" (Sherman*, Yaari*, Priebe* et al.). This work, a collaboration with Bonnie Berger's group at MIT, developed a deep-learning model to predict cancer-specific neutral mutation rates at kilobase-scale resolution from epigenomic annotations. Applying this model to the Pan-Cancer Analysis of Whole Genomes (PCAWG) resource identified potential new driver mutations in understudied...

Whole-exome imputation paper published in Nature Genetics

July 5, 2021
Alison Barton's paper on whole-exome imputation and subsequent association and fine-mapping analyses in UK Biobank (Barton et al. 2021 Nat Genet) is now published -- congratulations, Alison! Imputation is a statistical approach that leverages genetic data from a reference panel to enable analysis of genetic variants that are not directly measured in a cohort, thereby expanding the utility of existing data sets without incurring additional cost....

Two talks and two posters at ProbGen 2021

April 16, 2021
Our lab attended the Probabilistic Modeling in Genomics (ProbGen) 2021 virtual conference. Alison Barton and Maxwell Sherman spoke about their work on whole-exome imputation in UK Biobank (Alison) and deep-learning models of neutral somatic mutation rates in cancer (Max), and Margaux Hujoel and Ronen Mukamel presented posters on genotyping and association analysis of copy-number variants (Margaux) and variable-number tandem repeats (Ronen).

New preprint on large-effect protein-coding repeat polymorphisms

January 20, 2021
We are excited to share a new preprint, "Protein-coding repeat polymorphisms strongly shape diverse human phenotypes" (Mukamel*, Handsaker* et al.), which finds that some of the largest effects of common genetic variants on human phenotypes arise from variable-number-of-tandem-repeat (VNTR) variation unseen by the analytical approaches used in large-scale human genetic studies. This exciting collaboration with Bob Handsaker and Steve McCarroll leveraged the initial...

Recent Publications

GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health

Zhao Y, Stankovic S, Koprulu M, Wheeler E, Day FR, Lango Allen H, Kerrison ND, Pietzner M, Loh P-R, Wareham NJ, Langenberg C, Ong KK, Perry JRB. GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health. Nat Commun 2021;12(1):4178.
Mosaic loss of chromosome Y (LOY) in leukocytes is the most common form of clonal mosaicism, caused by dysregulation in cell-cycle and DNA damage response pathways. Previous genetic studies have focussed on identifying common variants associated with LOY, which we now extend to rarer, protein-coding variation using exome sequences from 82,277 male UK Biobank participants. We find that loss of function of two genes, CHEK2 and GIGYF1, reaches exome-wide significance. Rare alleles in GIGYF1 have not previously been implicated in any complex trait, but here loss-of-function carriers exhibit six-fold higher susceptibility to LOY (OR = 5.99 [3.04-11.81], p = 1.3 × 10^-10). These same alleles are also associated with adverse metabolic health, including higher susceptibility to Type 2 Diabetes (OR = 6.10 [3.51-10.61], p = 1.8 × 10^-12), 4 kg higher fat mass (p = 1.3 × 10^-4), 2.32 nmol/L lower serum IGF1 levels (p = 1.5 × 10^-4) and 4.5 kg lower handgrip strength (p = 4.7 × 10^-7), consistent with proposed GIGYF1 enhancement of insulin and IGF-1 receptor signalling. These associations are mirrored by a nearby common variant associated with the expression of GIGYF1. Our observations highlight a potential direct connection between clonal mosaicism and metabolic health.

Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses

Barton AR, Sherman MA, Mukamel RE, Loh P-R. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat Genet 2021;53(8):1260-1269.
Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R^2 > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10^-8) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.

Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection

Zekavat SM, Lin S-H, Bick AG, Liu A, Paruchuri K, Wang C, Uddin MM, Ye Y, Yu Z, Liu X, Kamatani Y, Bhattacharya R, Pirruccello JP, Pampana A, Loh P-R, Kohli P, McCarroll SA, Kiryluk K, Neale B, Ionita-Laza I, Engels EA, Brown DW, Smoller JW, Green R, Karlson EW, Lebo M, Ellinor PT, Weiss ST, Daly MJ, Terao C, Zhao H, Ebert BL, Reilly MP, Ganna A, Machiela MJ, Genovese G, Natarajan P. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat Med 2021;27(6):1012-1024.
Age is the dominant risk factor for infectious diseases, but the mechanisms linking age to infectious disease risk are incompletely understood. Age-related mosaic chromosomal alterations (mCAs), detected from genotyping of blood-derived DNA, are structural somatic variants indicative of clonal hematopoiesis and are associated with aberrant leukocyte cell counts, hematological malignancy, and mortality. Here, we show that mCAs predispose to diverse types of infections. We analyzed mCAs from 768,762 individuals without hematological cancer at the time of DNA acquisition across five biobanks. Expanded autosomal mCAs were associated with diverse incident infections (hazard ratio (HR) 1.25; 95% confidence interval (CI) = 1.15-1.36; P = 1.8 × 10^-7), including sepsis (HR 2.68; 95% CI = 2.25-3.19; P = 3.1 × 10^-28), pneumonia (HR 1.76; 95% CI = 1.53-2.03; P = 2.3 × 10^-15), digestive system infections (HR 1.51; 95% CI = 1.32-1.73; P = 2.2 × 10^-9) and genitourinary infections (HR 1.25; 95% CI = 1.11-1.41; P = 3.7 × 10^-4). A genome-wide association study of expanded mCAs identified 63 loci, which were enriched at transcriptional regulatory sites for immune cells. These results suggest that mCAs are a marker of impaired immunity and confer increased predisposition to infections.

A model and test for coordinated polygenic epistasis in complex traits

Sheppard B, Rappoport N, Loh P-R, Sanders SJ, Zaitlen N, Dahl A. A model and test for coordinated polygenic epistasis in complex traits. Proc Natl Acad Sci U S A 2021;118(15).
Interaction between genetic variants (epistasis) is pervasive in model systems and can profoundly impact evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.

Large mosaic copy number variations confer autism risk

Sherman MA, Rodin RE, Genovese G, Dias C, Barton AR, Mukamel RE, Berger B, Park PJ, Walsh CA, Loh P-R. Large mosaic copy number variations confer autism risk. Nat Neurosci 2021;24(2):197-203.
Although germline de novo copy number variants (CNVs) are known causes of autism spectrum disorder (ASD), the contribution of mosaic (early-developmental) copy number variants (mCNVs) has not been explored. In this study, we assessed the contribution of mCNVs to ASD by ascertaining mCNVs in genotype array intensity data from 12,077 probands with ASD and 5,500 unaffected siblings. We detected 46 mCNVs in probands and 19 mCNVs in siblings, affecting 2.8-73.8% of cells. Probands carried a significant burden of large (>4-Mb) mCNVs, which were detected in 25 probands but only one sibling (odds ratio = 11.4, 95% confidence interval = 1.5-84.2, P = 7.4 × 10). Event size positively correlated with severity of ASD symptoms (P = 0.016). Surprisingly, we did not observe mosaic analogues of the short de novo CNVs recurrently observed in ASD (e.g., 16p11.2). We further experimentally validated two mCNVs in postmortem brain tissue from 59 additional probands. These results indicate that mCNVs contribute a previously unexplained component of ASD risk.
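As an arithmetic check on the odds ratio quoted above, the short Python sketch below recomputes it from the reported counts (25 of 12,077 probands and 1 of 5,500 siblings carrying a large mCNV). It assumes a standard Woolf (log-scale) 95% confidence interval; the paper's exact interval method is not stated here, and the helper name `odds_ratio_woolf` is hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Hypothetical helper: odds ratio with a Woolf (log-scale) confidence interval.

    2x2 table layout (assumed):
        a = cases with the event,    b = cases without it
        c = controls with the event, d = controls without it
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Counts from the abstract: large (>4-Mb) mCNVs in 25 of 12,077 probands
# and in 1 of 5,500 unaffected siblings.
probands, siblings = 12077, 5500
or_, lo, hi = odds_ratio_woolf(25, probands - 25, 1, siblings - 1)
print(f"OR = {or_:.1f}, 95% CI = {lo:.1f}-{hi:.1f}")
# prints "OR = 11.4, 95% CI = 1.5-84.2"
```

Under this assumption the sketch reproduces the reported estimate and interval; the very wide interval reflects the single sibling carrier, which dominates the 1/c term in the standard error.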

Genetically predicted telomere length is associated with clonal somatic copy number alterations in peripheral leukocytes

Brown DW, Lin S-H, Loh P-R, Chanock SJ, Savage SA, Machiela MJ. Genetically predicted telomere length is associated with clonal somatic copy number alterations in peripheral leukocytes. PLoS Genet 2020;16(10):e1009078.
Telomeres are DNA-protein structures at the ends of chromosomes essential in maintaining chromosomal stability. Observational studies have identified associations between telomeres and elevated cancer risk, including hematologic malignancies, but biologic mechanisms relating telomere length to cancer etiology remain unclear. Our study sought to better understand the relationship between telomere length and cancer risk by evaluating genetically predicted telomere length (gTL) in relation to the presence of clonal somatic copy number alterations (SCNAs) in peripheral blood leukocytes. Genotyping array data were acquired from 431,507 participants in the UK Biobank and used to detect SCNAs from intensity information and infer telomere length using a polygenic risk score (PRS) of variants previously associated with leukocyte telomere length. In total, 15,236 (3.5%) of individuals had a detectable clonal SCNA on an autosomal chromosome. Overall, higher gTL value was positively associated with the presence of an autosomal SCNA (OR = 1.07, 95% CI = 1.05-1.09, P = 1.61 × 10^-15). There was high consistency in effect estimates across strata of chromosomal event location (e.g., telomeric ends, interstitial or whole chromosome event; P_het = 0.37) and strata of copy number state (e.g., gain, loss, or neutral events; P_het = 0.05). Higher gTL value was associated with a greater cellular fraction of clones carrying autosomal SCNAs (β = 0.004, 95% CI = 0.002-0.007, P = 6.61 × 10^-4). Our population-based examination of gTL and SCNAs suggests inherited components of telomere length do not preferentially impact autosomal SCNA event location or copy number status, but rather likely influence cellular replicative potential.