Haplotype phasing

We develop computational tools to solve statistical and algorithmic challenges in quantitative genetics.

We are based in the Division of Genetics and Center for Data Sciences at Brigham and Women's Hospital / Harvard Medical School. We are affiliated with the Program in Medical and Population Genetics at the Broad Institute.

Our work is generously supported by an NIH Director's New Innovator Award, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, a Glenn Foundation for Medical Research and AFAR Grant for Junior Faculty, a Sloan Research Fellowship, a Broad Institute Next Generation Fund award, and startup funding from the Brigham and Women's Hospital Divisions of Genetics and Cardiovascular Medicine.


Recent Publications

Genes with High Network Connectivity Are Enriched for Disease Heritability

Kim SS, Dai C, Hormozdiari F, van de Geijn B, Gazal S, Park Y, O'Connor L, Amariuta T, Loh P-R, Finucane H, Raychaudhuri S, Price AL. Genes with High Network Connectivity Are Enriched for Disease Heritability. Am J Hum Genet 2019;104(5):896-913.
Abstract
Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.
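Stratified LD score regression rests on a linear model in which a SNP's expected association χ² grows with its LD to each annotation category: E[χ²_j] ≈ N Σ_c τ_c ℓ(j, c) + 1. Here ℓ(j, c) is the LD score of SNP j with annotation c, and τ_c is that annotation's per-SNP heritability contribution. The Python sketch below fits this model by plain least squares on simulated data. It assumes a known sample size and allows no confounding beyond a free intercept. The function name `fit_stratified_ldsc` and all simulated values are hypothetical. The published method also uses regression weights, a block jackknife for standard errors, and many baseline annotations. Conditioning on the baseline-LD model corresponds to adding its annotations as extra columns of the design matrix.

```python
import numpy as np


def fit_stratified_ldsc(chi2, ld_scores, n):
    """Fit E[chi2_j] = N * sum_c tau_c * l(j, c) + intercept by unweighted least squares.

    chi2:      array of shape (M,), per-SNP association chi-square statistics
    ld_scores: array of shape (M, C), annotation-stratified LD scores l(j, c)
    n:         GWAS sample size N
    Returns (tau, intercept), where tau has shape (C,).
    """
    m = chi2.shape[0]
    # One column N * l(j, c) per annotation, plus a column of ones for the intercept.
    design = np.column_stack([n * ld_scores, np.ones(m)])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef[:-1], coef[-1]


# Toy simulation with two hypothetical annotations (values are illustrative only).
rng = np.random.default_rng(0)
M, C, N = 20_000, 2, 100_000
ld = rng.uniform(1.0, 100.0, size=(M, C))
true_tau = np.array([2e-8, 8e-8])
expected_chi2 = N * ld @ true_tau + 1.0
# Small Gaussian noise stands in for sampling variation; real chi-square
# statistics are far noisier, which is why the real method needs weights
# and a jackknife.
chi2 = expected_chi2 + rng.normal(0.0, 0.1, size=M)

tau_hat, intercept_hat = fit_stratified_ldsc(chi2, ld, N)
```

The annotation with the larger τ_c is "enriched": each unit of LD to it predicts more χ² inflation per SNP.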

Maternal and fetal genetic effects on birth weight and their relevance to cardio-metabolic risk factors

Warrington NM, Beaumont RN, Horikoshi M, Day FR, Helgeland Ø, Laurin C, Bacelis J, Peng S, Hao K, Feenstra B, Wood AR, Mahajan A, Tyrrell J, Robertson NR, Rayner WN, Qiao Z, Moen G-H, Vaudel M, Marsit CJ, Chen J, Nodzenski M, Schnurr TM, Zafarmand MH, Bradfield JP, Grarup N, Kooijman MN, Li-Gao R, Geller F, Ahluwalia TS, Paternoster L, Rueedi R, Huikari V, Hottenga J-J, Lyytikäinen L-P, Cavadino A, Metrustry S, Cousminer DL, Wu Y, Thiering E, Wang CA, Have CT, Vilor-Tejedor N, Joshi PK, Painter JN, Ntalla I, Myhre R, Pitkänen N, van Leeuwen EM, Joro R, Lagou V, Richmond RC, Espinosa A, Barton SJ, Inskip HM, Holloway JW, Santa-Marina L, Estivill X, Ang W, Marsh JA, Reichetzeder C, Marullo L, Hocher B, Lunetta KL, Murabito JM, Relton CL, Kogevinas M, Chatzi L, Allard C, Bouchard L, Hivert M-F, Zhang G, Muglia LJ, Heikkinen J, Morgen CS, van Kampen AHC, van Schaik BDC, Mentch FD, Langenberg C, Luan J'an, Scott RA, Zhao JH, Hemani G, Ring SM, Bennett AJ, Gaulton KJ, Fernandez-Tajes J, van Zuydam NR, Medina-Gomez C, de Haan HG, Rosendaal FR, Kutalik Z, Marques-Vidal P, Das S, Willemsen G, Mbarek H, Müller-Nurasyid M, Standl M, Appel EVR, Fonvig CE, Trier C, van Beijsterveldt CE, Murcia M, Bustamante M, Bonas-Guarch S, Hougaard DM, Mercader JM, Linneberg A, Schraut KE, Lind PA, Medland SE, Shields BM, Knight BA, Chai J-F, Panoutsopoulou K, Bartels M, Sánchez F, Stokholm J, Torrents D, Vinding RK, Willems SM, Atalay M, Chawes BL, Kovacs P, Prokopenko I, Tuke MA, Yaghootkar H, Ruth KS, Jones SE, Loh P-R, .., Ong KK, McCarthy MI, Perry JRB, Evans DM, Freathy RM. Maternal and fetal genetic effects on birth weight and their relevance to cardio-metabolic risk factors. Nat Genet 2019;51(5):804-814.
Abstract
Birth weight variation is influenced by fetal and maternal genetic and non-genetic factors, and has been reproducibly associated with future cardio-metabolic health outcomes. In expanded genome-wide association analyses of own birth weight (n = 321,223) and offspring birth weight (n = 230,069 mothers), we identified 190 independent association signals (129 of which are novel). We used structural equation modeling to decompose the contributions of direct fetal and indirect maternal genetic effects, then applied Mendelian randomization to illuminate causal pathways. For example, both indirect maternal and direct fetal genetic effects drive the observational relationship between lower birth weight and higher later blood pressure: maternal blood pressure-raising alleles reduce offspring birth weight, but only direct fetal effects of these alleles, once inherited, increase later offspring blood pressure. Using maternal birth weight-lowering genotypes to proxy for an adverse intrauterine environment provided no evidence that it causally raises offspring blood pressure, indicating that the inverse birth weight-blood pressure association is attributable to genetic effects, and not to intrauterine programming.
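The split into direct fetal and indirect maternal effects can be illustrated with a simplified linear model. This model is an assumption made here, not the structural equation model used in the paper. Under random mating, regressing a child's allele dosage on its mother's gives a slope of 1/2, and the reverse regression does too. The effect estimated in the own-birth-weight GWAS is then approximately b_own = F + M/2. The effect of maternal genotype in the offspring-birth-weight GWAS is approximately b_offspring = M + F/2. Here F and M are the fetal and maternal effects. Solving the two equations gives the Python sketch below (`decompose_effects` is a hypothetical name). It ignores paternal effects, sample overlap, and estimation error.

```python
def decompose_effects(b_own, b_offspring):
    """Split GWAS effects into direct fetal (F) and indirect maternal (M) parts.

    Assumes the simplified model
        b_own       = F + M / 2   (own birth weight on own genotype)
        b_offspring = M + F / 2   (offspring birth weight on maternal genotype)
    and ignores paternal effects, sample overlap, and estimation error.
    """
    fetal = (4.0 * b_own - 2.0 * b_offspring) / 3.0
    maternal = (4.0 * b_offspring - 2.0 * b_own) / 3.0
    return fetal, maternal


# Hypothetical variant with true F = 0.03 and M = -0.02 (birth-weight SD units).
b_own = 0.03 + (-0.02) / 2
b_offspring = -0.02 + 0.03 / 2
fetal, maternal = decompose_effects(b_own, b_offspring)
```

Under this toy model, a variant with a negative maternal effect and a positive fetal effect shows a muted signal in both GWAS. Solving the pair of equations recovers the opposing components.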

Estimating cross-population genetic correlations of causal effect sizes

Galinsky KJ, Reshef YA, Finucane HK, Loh P-R, Zaitlen N, Patterson NJ, Brown BC, Price AL. Estimating cross-population genetic correlations of causal effect sizes. Genet Epidemiol 2019;43(2):180-188.
Abstract
Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated $\rho_g$, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of $\rho_g$ depends both on the cross-population correlation of true causal effect sizes ($\rho_b$) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio $\rho_g / \rho_b$ as a function of LD in each population. By applying existing methods to obtain estimates of $\rho_g$, we can use this ratio to estimate $\rho_b$. Our estimates of $\rho_b$ were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.

Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection

Schoech AP, Jordan DM, Loh P-R, Gazal S, O'Connor LJ, Balick DJ, Palamara PF, Finucane HK, Sunyaev SR, Price AL. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat Commun 2019;10(1):790.
Abstract
Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to $[p(1-p)]^{\alpha}$, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with best-fit mean of -0.38 (s.e. 0.02) across traits. Despite larger rare variant effect sizes, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability for most traits analyzed. Using evolutionary modeling and forward simulations, we validate the α model of MAF-dependent trait effects and assess plausible values of relevant evolutionary parameters.
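Under the α model, a SNP's expected contribution to heritability equals its per-allele effect variance times its genotype variance 2p(1-p), assuming Hardy-Weinberg equilibrium. That contribution therefore scales as [p(1-p)]^(1+α). The Python sketch below compares a variant at MAF 0.5% with one at MAF 50%, using the reported mean α = -0.38. Its helper names are hypothetical. The rare variant's per-allele effect variance comes out about four times larger, yet its expected per-SNP heritability is under a tenth of the common variant's. This only illustrates the per-SNP scaling. The share of total heritability explained by rare variants also depends on how many SNPs fall in each MAF range.

```python
def relative_effect_variance(p, alpha, p_ref=0.5):
    """Per-allele effect-size variance at MAF p relative to MAF p_ref,
    under the model Var(beta) proportional to [p(1 - p)] ** alpha."""
    return (p * (1 - p)) ** alpha / (p_ref * (1 - p_ref)) ** alpha


def relative_snp_heritability(p, alpha, p_ref=0.5):
    """Expected heritability contributed by one SNP at MAF p relative to MAF p_ref.

    Genotype variance is 2p(1 - p) under Hardy-Weinberg equilibrium, so the
    per-SNP contribution scales as [p(1 - p)] ** (1 + alpha).
    """
    genotype_var_ratio = (p * (1 - p)) / (p_ref * (1 - p_ref))
    return relative_effect_variance(p, alpha, p_ref) * genotype_var_ratio


alpha = -0.38  # best-fit mean across traits reported in the abstract
rare_vs_common_effect = relative_effect_variance(0.005, alpha)
rare_vs_common_h2 = relative_snp_heritability(0.005, alpha)
```

Setting α = -1 makes every SNP contribute the same expected heritability regardless of MAF. Setting α = 0 makes per-allele effect variance independent of MAF.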

Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes

Chung W, Chen J, Turman C, Lindstrom S, Zhu Z, Loh P-R, Kraft P, Liang L. Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes. Nat Commun 2019;10(1):569.
Abstract
We introduce cross-trait penalized regression (CTPR), a powerful and practical approach for multi-trait polygenic risk prediction in large cohorts. Specifically, we propose a novel cross-trait penalty function with the Lasso and the minimax concave penalty (MCP) to incorporate the shared genetic effects across multiple traits for large-sample GWAS data. Our approach extracts information from the secondary traits that is beneficial for predicting the primary trait based on individual-level genotypes and/or summary statistics. Our novel implementation of a parallel computing algorithm makes it feasible to apply our method to biobank-scale GWAS data. We illustrate our method using large-scale GWAS data (~1M SNPs) from the UK Biobank (N = 456,837). We show that our multi-trait method outperforms the recently proposed multi-trait analysis of GWAS (MTAG) for predictive performance. With the aid of BMI, the prediction accuracy for height improves from R² = 35.8% (MTAG) to 42.5% (MCP + CTPR) or 42.8% (Lasso + CTPR) with UK Biobank data.

Leveraging Polygenic Functional Enrichment to Improve GWAS Power

Kichaev G, Bhatia G, Loh P-R, Gazal S, Burch K, Freund MK, Schoech A, Pasaniuc B, Price AL. Leveraging Polygenic Functional Enrichment to Improve GWAS Power. Am J Hum Genet 2019;104(1):65-75.
Abstract
Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory, and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, functionally informed novel discovery of risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9%-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N = 130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p values that do not use functional data. We replicated the additional loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N = 416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.
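The abstract describes reweighting association results using prior information from functional annotations. One generic way to do this is the weighted p-value idea, sketched below in Python. Each SNP gets a nonnegative weight, the weights are rescaled to average 1, and p_j / w_j is compared with the usual threshold. Under the null, P(p_j / w_j ≤ t) = w_j t whenever w_j t ≤ 1. With weights averaging 1, the Bonferroni-style bound on the family-wise error rate therefore still holds. The prior scores and the function name `weighted_pvalues` are hypothetical inputs. The sketch does not reproduce FINDOR's own weighting procedure.

```python
import numpy as np


def weighted_pvalues(pvals, prior_scores):
    """Divide each p-value by a weight proportional to its prior score.

    Weights are rescaled to average 1, which keeps a weighted Bonferroni
    threshold at the same family-wise error bound as the unweighted test.
    Weighted p-values are capped at 1.
    """
    weights = np.asarray(prior_scores, dtype=float)
    weights = weights / weights.mean()
    return np.minimum(np.asarray(pvals, dtype=float) / weights, 1.0)


# Hypothetical inputs: SNP 2 has a higher prior score than the others.
pvals = np.array([1e-8, 6e-8, 6e-8, 0.5])
prior_scores = np.array([1.0, 3.0, 1.0, 1.0])
weighted = weighted_pvalues(pvals, prior_scores)
significant = weighted < 5e-8
```

Without weighting, only SNP 1 passes the 5 × 10⁻⁸ threshold. After weighting, SNP 2 also passes. SNP 3 has the same raw p-value as SNP 2 but a below-average prior score, so it moves further from the threshold. Upweighting some SNPs necessarily downweights others, and any gain in power depends on the prior scores actually tracking true associations.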