Mosaic loss of chromosome Y (LOY) in leukocytes is the most common form of clonal mosaicism, caused by dysregulation in cell-cycle and DNA damage response pathways. Previous genetic studies have focussed on identifying common variants associated with LOY, which we now extend to rarer, protein-coding variation using exome sequences from 82,277 male UK Biobank participants. We find that loss of function of two genes-CHEK2 and GIGYF1-reach exome-wide significance. Rare alleles in GIGYF1 have not previously been implicated in any complex trait, but here loss-of-function carriers exhibit six-fold higher susceptibility to LOY (OR = 5.99 [3.04-11.81], p = 1.3 × 10-10). These same alleles are also associated with adverse metabolic health, including higher susceptibility to Type 2 Diabetes (OR = 6.10 [3.51-10.61], p = 1.8 × 10-12), 4 kg higher fat mass (p = 1.3 × 10-4), 2.32 nmol/L lower serum IGF1 levels (p = 1.5 × 10-4) and 4.5 kg lower handgrip strength (p = 4.7 × 10-7) consistent with proposed GIGYF1 enhancement of insulin and IGF-1 receptor signalling. These associations are mirrored by a common variant nearby associated with the expression of GIGYF1. Our observations highlight a potential direct connection between clonal mosaicism and metabolic health.
Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R2 > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10-8) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.
Zekavat SM, Lin S-H, Bick AG, Liu A, Paruchuri K, Wang C, Uddin MM, Ye Y, Yu Z, Liu X, Kamatani Y, Bhattacharya R, Pirruccello JP, Pampana A, Loh P-R, Kohli P, McCarroll SA, Kiryluk K, Neale B, Ionita-Laza I, Engels EA, Brown DW, Smoller JW, Green R, Karlson EW, Lebo M, Ellinor PT, Weiss ST, Daly MJ, Daly MJ, Daly MJ, Terao C, Zhao H, Ebert BL, Reilly MP, Ganna A, Machiela MJ, Genovese G, Natarajan P. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat Med 2021;27(6):1012-1024.Abstract
Age is the dominant risk factor for infectious diseases, but the mechanisms linking age to infectious disease risk are incompletely understood. Age-related mosaic chromosomal alterations (mCAs) detected from genotyping of blood-derived DNA, are structural somatic variants indicative of clonal hematopoiesis, and are associated with aberrant leukocyte cell counts, hematological malignancy, and mortality. Here, we show that mCAs predispose to diverse types of infections. We analyzed mCAs from 768,762 individuals without hematological cancer at the time of DNA acquisition across five biobanks. Expanded autosomal mCAs were associated with diverse incident infections (hazard ratio (HR) 1.25; 95% confidence interval (CI) = 1.15-1.36; P = 1.8 × 10-7), including sepsis (HR 2.68; 95% CI = 2.25-3.19; P = 3.1 × 10-28), pneumonia (HR 1.76; 95% CI = 1.53-2.03; P = 2.3 × 10-15), digestive system infections (HR 1.51; 95% CI = 1.32-1.73; P = 2.2 × 10-9) and genitourinary infections (HR 1.25; 95% CI = 1.11-1.41; P = 3.7 × 10-4). A genome-wide association study of expanded mCAs identified 63 loci, which were enriched at transcriptional regulatory sites for immune cells. These results suggest that mCAs are a marker of impaired immunity and confer increased predisposition to infections.
Interactions between genetic variants-epistasis-is pervasive in model systems and can profoundly impact evolutionary adaption, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.
Although germline de novo copy number variants (CNVs) are known causes of autism spectrum disorder (ASD), the contribution of mosaic (early-developmental) copy number variants (mCNVs) has not been explored. In this study, we assessed the contribution of mCNVs to ASD by ascertaining mCNVs in genotype array intensity data from 12,077 probands with ASD and 5,500 unaffected siblings. We detected 46 mCNVs in probands and 19 mCNVs in siblings, affecting 2.8-73.8% of cells. Probands carried a significant burden of large (>4-Mb) mCNVs, which were detected in 25 probands but only one sibling (odds ratio = 11.4, 95% confidence interval = 1.5-84.2, P = 7.4 × 10). Event size positively correlated with severity of ASD symptoms (P = 0.016). Surprisingly, we did not observe mosaic analogues of the short de novo CNVs recurrently observed in ASD (eg, 16p11.2). We further experimentally validated two mCNVs in postmortem brain tissue from 59 additional probands. These results indicate that mCNVs contribute a previously unexplained component of ASD risk.
Telomeres are DNA-protein structures at the ends of chromosomes essential in maintaining chromosomal stability. Observational studies have identified associations between telomeres and elevated cancer risk, including hematologic malignancies; but biologic mechanisms relating telomere length to cancer etiology remain unclear. Our study sought to better understand the relationship between telomere length and cancer risk by evaluating genetically-predicted telomere length (gTL) in relation to the presence of clonal somatic copy number alterations (SCNAs) in peripheral blood leukocytes. Genotyping array data were acquired from 431,507 participants in the UK Biobank and used to detect SCNAs from intensity information and infer telomere length using a polygenic risk score (PRS) of variants previously associated with leukocyte telomere length. In total, 15,236 (3.5%) of individuals had a detectable clonal SCNA on an autosomal chromosome. Overall, higher gTL value was positively associated with the presence of an autosomal SCNA (OR = 1.07, 95% CI = 1.05-1.09, P = 1.61×10-15). There was high consistency in effect estimates across strata of chromosomal event location (e.g., telomeric ends, interstitial or whole chromosome event; Phet = 0.37) and strata of copy number state (e.g., gain, loss, or neutral events; Phet = 0.05). Higher gTL value was associated with a greater cellular fraction of clones carrying autosomal SCNAs (β = 0.004, 95% CI = 0.002-0.007, P = 6.61×10-4). Our population-based examination of gTL and SCNAs suggests inherited components of telomere length do not preferentially impact autosomal SCNA event location or copy number status, but rather likely influence cellular replicative potential.
Clonally expanded blood cells that contain somatic mutations (clonal haematopoiesis) are commonly acquired with age and increase the risk of blood cancer. The blood clones identified so far contain diverse large-scale mosaic chromosomal alterations (deletions, duplications and copy-neutral loss of heterozygosity (CN-LOH)) on all chromosomes, but the sources of selective advantage that drive the expansion of most clones remain unknown. Here, to identify genes, mutations and biological processes that give selective advantage to mutant clones, we analysed genotyping data from the blood-derived DNA of 482,789 participants from the UK Biobank. We identified 19,632 autosomal mosaic chromosomal alterations and analysed these for relationships to inherited genetic variation. We found 52 inherited, rare, large-effect coding or splice variants in 7 genes that were associated with greatly increased vulnerability to clonal haematopoiesis with specific acquired CN-LOH mutations. Acquired mutations systematically replaced the inherited risk alleles (at MPL) or duplicated them to the homologous chromosome (at FH, NBN, MRE11, ATM, SH2B3 and TM2D3). Three of the genes (MRE11, NBN and ATM) encode components of the MRN-ATM pathway, which limits cell division after DNA damage and telomere attrition; another two (MPL and SH2B3) encode proteins that regulate the self-renewal of stem cells. In addition, we found that CN-LOH mutations across the genome tended to cause chromosomal segments with alleles that promote the expansion of haematopoietic cells to replace their homologous (allelic) counterparts, increasing polygenic drive for blood-cell proliferation traits. Readily acquired mutations that replace chromosomal segments with their homologous counterparts seem to interact with pervasive inherited variation to create a challenge for lifelong cytopoiesis.
The extent to which the biology of oncogenesis and ageing are shaped by factors that distinguish human populations is unknown. Haematopoietic clones with acquired mutations become common with advancing age and can lead to blood cancers. Here we describe shared and population-specific patterns of genomic mutations and clonal selection in haematopoietic cells on the basis of 33,250 autosomal mosaic chromosomal alterations that we detected in 179,417 Japanese participants in the BioBank Japan cohort and compared with analogous data from the UK Biobank. In this long-lived Japanese population, mosaic chromosomal alterations were detected in more than 35.0% (s.e.m., 1.4%) of individuals older than 90 years, which suggests that such clones trend towards inevitability with advancing age. Japanese and European individuals exhibited key differences in the genomic locations of mutations in their respective haematopoietic clones; these differences predicted the relative rates of chronic lymphocytic leukaemia (which is more common among European individuals) and T cell leukaemia (which is more common among Japanese individuals) in these populations. Three different mutational precursors of chronic lymphocytic leukaemia (including trisomy 12, loss of chromosomes 13q and 13q, and copy-neutral loss of heterozygosity) were between two and six times less common among Japanese individuals, which suggests that the Japanese and European populations differ in selective pressures on clones long before the development of clinically apparent chronic lymphocytic leukaemia. Japanese and British populations also exhibited very different rates of clones that arose from B and T cell lineages, which predicted the relative rates of B and T cell cancers in these populations. We identified six previously undescribed loci at which inherited variants predispose to mosaic chromosomal alterations that duplicate or remove the inherited risk alleles, including large-effect rare variants at NBN, MRE11 and CTU2 (odds ratio, 28-91). We suggest that selective pressures on clones are modulated by factors that are specific to human populations. Further genomic characterization of clonal selection and cancer in populations from around the world is therefore warranted.
Family history of disease can provide valuable information in case-control association studies, but it is currently unclear how to best combine case-control status and family history of disease. We developed an association method based on posterior mean genetic liabilities under a liability threshold model, conditional on case-control status and family history (LT-FH). Analyzing 12 diseases from the UK Biobank (average N = 350,000) we compared LT-FH to genome-wide association without using family history (GWAS) and a previous proxy-based method incorporating family history (GWAX). LT-FH was 63% (standard error (s.e.) 6%) more powerful than GWAS and 36% (s.e. 4%) more powerful than the trait-specific maximum of GWAS and GWAX, based on the number of independent genome-wide-significant loci across all diseases (for example, 690 loci for LT-FH versus 423 for GWAS); relative improvements were similar when applying BOLT-LMM to GWAS, GWAX and LT-FH phenotypes. Thus, LT-FH greatly increases association power when family history of disease is available.
Thompson DJ, Genovese G, Halvardson J, Ulirsch JC, Wright DJ, Terao C, Davidsson OB, Day FR, Sulem P, Jiang Y, Danielsson M, Davies H, Dennis J, Dunlop MG, Easton DF, Fisher VA, Zink F, Houlston RS, Ingelsson M, Kar S, Kerrison ND, Kinnersley B, Kristjansson RP, Law PJ, Li R, Loveday C, Mattisson J, McCarroll SA, Murakami Y, Murray A, Olszewski P, Rychlicka-Buniowska E, Scott RA, Thorsteinsdottir U, Tomlinson I, Moghadam BT, Turnbull C, Wareham NJ, Gudbjartsson DF, Kamatani Y, Hoffmann ER, Jackson SP, Stefansson K, Auton A, Ong KK, Machiela MJ, Loh P-R, Dumanski JP, Chanock SJ, Forsberg LA, Perry JRB. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 2019;575(7784):652-657.Abstract
Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism, yet our knowledge of the causes and consequences of this is limited. Here, using a computational approach, we estimate that 20% of the male population represented in the UK Biobank study (n = 205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes that are involved in cell-cycle regulation and cancer susceptibility, as well as somatic drivers of tumour growth and targets of cancer therapy. We demonstrate that genetic susceptibility to LOY is associated with non-haematological effects on health in both men and women, which supports the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues. Single-cell RNA sequencing identifies dysregulated expression of autosomal genes in leukocytes with LOY and provides insights into why clonal expansion of these cells may occur. Collectively, these data highlight the value of studying clonal mosaicism to uncover fundamental mechanisms that underlie cancer and other ageing-related diseases.
The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.
Mosaic loss of chromosome Y (mLOY) is frequently observed in the leukocytes of ageing men. However, the genetic architecture and biological mechanisms underlying mLOY are not fully understood. In a cohort of 95,380 Japanese men, we identify 50 independent genetic markers in 46 loci associated with mLOY at a genome-wide significant level, 35 of which are unreported. Lead markers overlap enhancer marks in hematopoietic stem cells (HSCs, P ≤ 1.0 × 10). mLOY genome-wide association study signals exhibit polygenic architecture and demonstrate strong heritability enrichment in regions surrounding genes specifically expressed in multipotent progenitor (MPP) cells and HSCs (P ≤ 3.5 × 10). ChIP-seq data demonstrate that binding sites of FLI1, a fate-determining factor promoting HSC differentiation into platelets rather than red blood cells (RBCs), show a strong heritability enrichment (P = 1.5 × 10). Consistent with these findings, platelet and RBC counts are positively and negatively associated with mLOY, respectively. Collectively, our observations improve our understanding of the mechanisms underlying mLOY.
Regulatory variation plays a major role in complex disease and that cell-type-specific binding of transcription factors (TF) is critical to gene regulation. However, assessing the contribution of genetic variation in TF binding sites to disease heritability is challenging, as binding is often cell-type-specific and annotations from directly measured TF binding are not currently available for most cell-type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF binding annotations constructed by intersecting sequence-based TF binding predictions with cell-type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell-types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context, and the limitation that sequence-based predictions are generally not cell-type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified LD score regression with the baseline-LD model (which is not cell-type-specific) plus the new annotations. We determined that 100bp windows around MotifMap sequenced-based TF binding predictions intersected with a union of six cell-type-specific chromatin marks (imputed using ChromImpute) performed best, with an 58% increase in heritability enrichment compared to the chromatin marks alone (11.6x vs 7.3x; P = 9 x 10-14 for difference) and a 20% increase in cell-type-specific signal conditional on annotations from the baseline-LD model (P = 8 x 10-11 for difference). Our results show that TF binding annotations explain substantial disease heritability and can help refine genome-wide association signals.
Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.
Warrington NM, Beaumont RN, Horikoshi M, Day FR, Helgeland Ø, Laurin C, Bacelis J, Peng S, Hao K, Feenstra B, Wood AR, Mahajan A, Tyrrell J, Robertson NR, Rayner WN, Qiao Z, Moen G-H, Vaudel M, Marsit CJ, Chen J, Nodzenski M, Schnurr TM, Zafarmand MH, Bradfield JP, Grarup N, Kooijman MN, Li-Gao R, Geller F, Ahluwalia TS, Paternoster L, Rueedi R, Huikari V, Hottenga J-J, Lyytikäinen L-P, Cavadino A, Metrustry S, Cousminer DL, Wu Y, Thiering E, Wang CA, Have CT, Vilor-Tejedor N, Joshi PK, Painter JN, Ntalla I, Myhre R, Pitkänen N, van Leeuwen EM, Joro R, Lagou V, Richmond RC, Espinosa A, Barton SJ, Inskip HM, Holloway JW, Santa-Marina L, Estivill X, Ang W, Marsh JA, Reichetzeder C, Marullo L, Hocher B, Lunetta KL, Murabito JM, Relton CL, Kogevinas M, Chatzi L, Allard C, Bouchard L, Hivert M-F, Zhang G, Muglia LJ, Heikkinen J, Morgen CS, van Kampen AHC, van Schaik BDC, Mentch FD, Langenberg C, Luan J'an, Scott RA, Zhao JH, Hemani G, Ring SM, Bennett AJ, Gaulton KJ, Fernandez-Tajes J, van Zuydam NR, Medina-Gomez C, de Haan HG, Rosendaal FR, Kutalik Z, Marques-Vidal P, Das S, Willemsen G, Mbarek H, Müller-Nurasyid M, Standl M, Appel EVR, Fonvig CE, Trier C, van Beijsterveldt CE, Murcia M, Bustamante M, Bonas-Guarch S, Hougaard DM, Mercader JM, Linneberg A, Schraut KE, Lind PA, Medland SE, Shields BM, Knight BA, Chai J-F, Panoutsopoulou K, Bartels M, Sánchez F, Stokholm J, Torrents D, Vinding RK, Willems SM, Atalay M, Chawes BL, Kovacs P, Prokopenko I, Tuke MA, Yaghootkar H, Ruth KS, Jones SE, Loh P-R, .., Ong KK, McCarthy MI, Perry JRB, Evans DM, Freathy RM. Maternal and fetal genetic effects on birth weight and their relevance to cardio-metabolic risk factors. Nat Genet 2019;51(5):804-814.Abstract
Birth weight variation is influenced by fetal and maternal genetic and non-genetic factors, and has been reproducibly associated with future cardio-metabolic health outcomes. In expanded genome-wide association analyses of own birth weight (n = 321,223) and offspring birth weight (n = 230,069 mothers), we identified 190 independent association signals (129 of which are novel). We used structural equation modeling to decompose the contributions of direct fetal and indirect maternal genetic effects, then applied Mendelian randomization to illuminate causal pathways. For example, both indirect maternal and direct fetal genetic effects drive the observational relationship between lower birth weight and higher later blood pressure: maternal blood pressure-raising alleles reduce offspring birth weight, but only direct fetal effects of these alleles, once inherited, increase later offspring blood pressure. Using maternal birth weight-lowering genotypes to proxy for an adverse intrauterine environment provided no evidence that it causally raises offspring blood pressure, indicating that the inverse birth weight-blood pressure association is attributable to genetic effects, and not to intrauterine programming.
Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated ρ g , the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of ρ g depends both on the cross-population correlation of true causal effect sizes ( ρ b ) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio ρ g / ρ b as a function of LD in each population. By applying existing methods to obtain estimates of ρ g , we can use this ratio to estimate ρ b . Our estimates of ρ b were equal to 0.55 ( SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 ( SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 ( SE = 0.06) and 0.65 ( SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.
Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1 - p)], where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with best-fit mean of -0.38 (s.e. 0.02) across traits. Despite larger rare variant effect sizes, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability for most traits analyzed. Using evolutionary modeling and forward simulations, we validate the α model of MAF-dependent trait effects and assess plausible values of relevant evolutionary parameters.
We introduce cross-trait penalized regression (CTPR), a powerful and practical approach for multi-trait polygenic risk prediction in large cohorts. Specifically, we propose a novel cross-trait penalty function with the Lasso and the minimax concave penalty (MCP) to incorporate the shared genetic effects across multiple traits for large-sample GWAS data. Our approach extracts information from the secondary traits that is beneficial for predicting the primary trait based on individual-level genotypes and/or summary statistics. Our novel implementation of a parallel computing algorithm makes it feasible to apply our method to biobank-scale GWAS data. We illustrate our method using large-scale GWAS data (~1M SNPs) from the UK Biobank (N = 456,837). We show that our multi-trait method outperforms the recently proposed multi-trait analysis of GWAS (MTAG) for predictive performance. The prediction accuracy for height by the aid of BMI improves from R = 35.8% (MTAG) to 42.5% (MCP + CTPR) or 42.8% (Lasso + CTPR) with UK Biobank data.
Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory, and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, functionally informed novel discovery of risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9%-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N = 130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p values that do not use functional data. We replicated the additional loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N = 416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.
Common variant heritability has been widely reported to be concentrated in variants within cell-type-specific non-coding functional annotations, but little is known about low-frequency variant functional architectures. We partitioned the heritability of both low-frequency (0.5%≤ minor allele frequency <5%) and common (minor allele frequency ≥5%) variants in 40 UK Biobank traits across a broad set of functional annotations. We determined that non-synonymous coding variants explain 17 ± 1% of low-frequency variant heritability ([Formula: see text]) versus 2.1 ± 0.2% of common variant heritability ([Formula: see text]). Cell-type-specific non-coding annotations that were significantly enriched for [Formula: see text] of corresponding traits were similarly enriched for [Formula: see text] for most traits, but more enriched for brain-related annotations and traits. For example, H3K4me3 marks in brain dorsolateral prefrontal cortex explain 57 ± 12% of [Formula: see text] versus 12 ± 2% of [Formula: see text] for neuroticism. Forward simulations confirmed that low-frequency variant enrichment depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict effect size variance of causal rare variants (minor allele frequency <0.5%).