Publications by Year: 2017

2017
Weng L-C, Choi SH, Klarin D, Smith GJ, Loh P-R, Chaffin M, Roselli C, Hulme OL, Lunetta KL, Dupuis J, Benjamin EJ, Newton-Cheh C, Kathiresan S, Ellinor PT, Lubitz SA. Heritability of Atrial Fibrillation. Circ Cardiovasc Genet 2017;10(6).
BACKGROUND: Previous reports have implicated multiple genetic loci associated with AF, but the contributions of genome-wide variation to AF susceptibility have not been quantified. METHODS AND RESULTS: We assessed the contribution of genome-wide single-nucleotide polymorphism variation to AF risk (single-nucleotide polymorphism heritability, h²g) using data from 120 286 unrelated individuals of European ancestry (2987 with AF) in the population-based UK Biobank. We ascertained AF based on self-report, medical record billing codes, procedure codes, and death records. We estimated h²g using a variance components method with variants having a minor allele frequency ≥1%. We evaluated h²g in age, sex, and genomic strata of interest. The h²g for AF was 22.1% (95% confidence interval, 15.6%-28.5%) and was similar for early- versus older-onset AF (≤65 versus >65 years of age), as well as for men and women. The proportion of AF variance explained by genetic variation was mainly accounted for by common (minor allele frequency, ≥5%) variants (20.4%; 95% confidence interval, 15.1%-25.6%). Only 6.4% (95% confidence interval, 5.1%-7.7%) of AF variance was attributed to variation within known AF susceptibility, cardiac arrhythmia, and cardiomyopathy gene regions. CONCLUSIONS: Genetic variation contributes substantially to AF risk. The risk for AF conferred by genomic variation is similar to that observed for several other cardiovascular diseases. Established AF loci only explain a moderate proportion of disease risk, suggesting that further genetic discovery, with an emphasis on common variation, is warranted to understand the causal genetic basis of AF.
Márquez-Luna C, Loh P-R, Price AL. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol 2017;41(8):811-823.
Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sample size or training data from the target population in small sample size, but not both. Here, we introduce a multiethnic polygenic risk score that combines training data from European samples and training data from the target population. We applied this approach to predict type 2 diabetes (T2D) in a Latino cohort using both publicly available European summary statistics in large sample size (Neff = 40k) and Latino training data in small sample size (Neff = 8k). We attained a >70% relative improvement in prediction accuracy (from R² = 0.027 to 0.047) compared to methods that use only one source of training data, consistent with large relative improvements in simulations. We observed a systematically lower load of T2D risk alleles in Latino individuals with more European ancestry, which could be explained by polygenic selection in ancestral European and/or Native American populations. Applying the approach to predict T2D in a South Asian UK Biobank cohort using European (Neff = 40k) and South Asian (Neff = 16k) training data attained a >70% relative improvement in prediction accuracy, and applying it to predict height in an African UK Biobank cohort using European (N = 113k) and African (N = 2k) training data attained a 30% relative improvement. Our work reduces the gap in polygenic risk prediction accuracy between European and non-European target populations.
Gazal S, Finucane HK, Furlotte NA, Loh P-R, Palamara PF, Liu X, Schoech A, Bulik-Sullivan B, Neale BM, Gusev A, Price AL. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat Genet 2017;49(10):1421-1427.
Recent work has hinted at the linkage disequilibrium (LD)-dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability. Here we analyzed summary statistics from 56 complex traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with low LLD have significantly larger per-SNP heritability and that roughly half of this effect can be explained by functional annotations negatively correlated with LLD, such as DNase I hypersensitivity sites (DHSs). The remaining signal is largely driven by our finding that more recent common variants tend to have lower LLD and to explain more heritability (P = 2.38 × 10⁻¹⁰⁴); the youngest 20% of common SNPs explain 3.9 times more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that they jointly predict deleterious effects.
Willems SM, Wright DJ, Day FR, Trajanoska K, Joshi PK, Morris JA, Matteini AM, Garton FC, Grarup N, Oskolkov N, Thalamuthu A, Mangino M, Liu J, Demirkan A, Lek M, Xu L, Wang G, Oldmeadow C, Gaulton KJ, Lotta LA, Miyamoto-Mikami E, Rivas MA, White T, Loh P-R, .., Rivadeneira F, Langenberg C, Perry JRB, Wareham NJ, Scott RA. Large-scale GWAS identifies multiple loci for hand grip strength providing biological insights into muscular fitness. Nat Commun 2017;8:16015.
Hand grip strength is a widely used proxy of muscular fitness, a marker of frailty, and predictor of a range of morbidities and all-cause mortality. To investigate the genetic determinants of variation in grip strength, we perform a large-scale genetic discovery analysis in a combined sample of 195,180 individuals and identify 16 loci associated with grip strength (P < 5 × 10⁻⁸) in combined analyses. A number of these loci contain genes implicated in structure and function of skeletal muscle fibres (ACTG1), neuronal maintenance and signal transduction (PEX14, TGFA, SYT1), or monogenic syndromes with involvement of psychomotor impairment (PEX14, LRPPRC and KANSL1). Mendelian randomization analyses are consistent with a causal effect of higher genetically predicted grip strength on lower fracture risk. In conclusion, our findings provide new biological insight into the mechanistic underpinnings of grip strength and the causal role of muscular strength in age-related morbidities and mortality.
Day FR, Thompson DJ, Helgason H, Chasman DI, Finucane H, Sulem P, Ruth KS, Whalen S, Sarkar AK, Albrecht E, Altmaier E, Amini M, Barbieri CM, Boutin T, Campbell A, Demerath E, Giri A, He C, Hottenga JJ, Karlsson R, Kolcic I, Loh P-R, .., Murray A, Murabito JM, Stefansson K, Ong KK, Perry JRB. Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk. Nat Genet 2017;49(6):834-841.
The timing of puberty is a highly polygenic childhood trait that is epidemiologically associated with various adult diseases. Using 1000 Genomes Project-imputed genotype data in up to ∼370,000 women, we identify 389 independent signals (P < 5 × 10⁻⁸) for age at menarche, a milestone in female pubertal development. In Icelandic data, these signals explain ∼7.4% of the population variance in age at menarche, corresponding to ∼25% of the estimated heritability. We implicate ∼250 genes via coding variation or associated expression, demonstrating significant enrichment in neural tissues. Rare variants near the imprinted genes MKRN3 and DLK1 were identified, exhibiting large effects when paternally inherited. Mendelian randomization analyses suggest causal inverse associations, independent of body mass index (BMI), between puberty timing and risks for breast and endometrial cancers in women and prostate cancer in men. In aggregate, our findings highlight the complexity of the genetic regulation of puberty timing and support causal links with cancer susceptibility.
Hill A, Loh P-R, Bharadwaj RB, Pons P, Shang J, Guinan E, Lakhani K, Kilty I, Jelinsky SA. Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis. Gigascience 2017;6(5):1-10.
Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics.
Hayeck TJ, Loh P-R, Pollack S, Gusev A, Patterson N, Zaitlen NA, Price AL. Mixed Model Association with Family-Biased Case-Control Ascertainment. Am J Hum Genet 2017;100(1):31-39.
Mixed models have become the tool of choice for genetic association studies; however, standard mixed model methods may be poorly calibrated or underpowered under family sampling bias and/or case-control ascertainment. Previously, we introduced a liability threshold-based mixed model association statistic (LTMLM) to address case-control ascertainment in unrelated samples. Here, we consider family-biased case-control ascertainment, where case and control subjects are ascertained non-randomly with respect to family relatedness. Previous work has shown that this type of ascertainment can severely bias heritability estimates; we show here that it also impacts mixed model association statistics. We introduce a family-based association statistic (LT-Fam) that is robust to this problem. Similar to LTMLM, LT-Fam is computed from posterior mean liabilities (PML) under a liability threshold model; however, LT-Fam uses published narrow-sense heritability estimates to avoid the problem of biased heritability estimation, enabling correct calibration. In simulations with family-biased case-control ascertainment, LT-Fam was correctly calibrated (average χ² = 1.00-1.02 for null SNPs), whereas the Armitage trend test (ATT), standard mixed model association (MLM), and case-control retrospective association test (CARAT) were mis-calibrated (e.g., average χ² = 0.50-1.22 for MLM, 0.89-2.65 for CARAT). LT-Fam also attained higher power than other methods in some settings. In 1,259 type 2 diabetes-affected case subjects and 5,765 control subjects from the CARe cohort, downsampled to induce family-biased ascertainment, LT-Fam was correctly calibrated whereas ATT, MLM, and CARAT were again mis-calibrated. Our results highlight the importance of modeling family sampling bias in case-control datasets with related samples.