Key Points
- Genome-wide association studies are systematic, well-powered surveys to explore the relationships between sites of common genome sequence variation and disease predisposition on a genome-wide scale.
- The capacity to undertake genome-wide association studies has resulted in spectacular advances in the understanding of the genetic basis of common phenotypes of biomedical importance, such as diabetes, asthma and some cancers.
- Application of this approach to large, well-characterized data sets has revealed more than 50 disease-susceptibility loci and has provided valuable insights into the allelic architecture of multifactorial traits.
- The implementation of such studies requires meticulous attention to all stages of the experimental process, from the ascertainment of the samples through to analysis and interpretation of the findings. Unless precautions are taken, a wide variety of errors and biases can produce spurious associations.
- Extensive replication of positive findings remains the best guarantee against erroneous claims of association. The demand for large-scale replication is leading to extensive international collaborations between groups.
- Nonetheless, substantial challenges remain as researchers seek more complete descriptions of the susceptibility architecture of traits of interest and work to translate the information gathered into improvements in clinical management.
Abstract
The past year has witnessed substantial advances in understanding the genetic basis of many common phenotypes of biomedical importance. These advances have been the result of systematic, well-powered, genome-wide surveys exploring the relationships between common sequence variation and disease predisposition. This approach has revealed over 50 disease-susceptibility loci and has provided insights into the allelic architecture of multifactorial traits. At the same time, much has been learned about the successful prosecution of association studies on such a scale. This Review highlights the knowledge gained, defines areas of emerging consensus, and describes the challenges that remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of biomedical traits of interest and to translate the information gathered into improvements in clinical management.
Acknowledgements
Preparation of this article was supported by funding from the European Commission to the MolPAGE Consortium (LSHG-CT-2004-512066: MMcC) and by research grants from the National Institutes of Health (NHGRI and NHLBI; GRA). We thank our colleagues, particularly P. Donnelly, J. Marchini, J. Barrett, E. Zeggini, C. Lindgren, M. Boehnke, F. Collins, C. Spencer and D. Altshuler, for discussions and the reviewers for their comments.
Ethics declarations
Competing interests
After the completion of this work, Professor Cardon accepted the post of Head of Genetics at GlaxoSmithKline. Although this move occurred after the completion of the paper, it nonetheless might be perceived as a conflict with industry and thus we wish to declare it explicitly.
Related links
FURTHER INFORMATION
Catalog of published genome-wide association studies
Consolidated standards of reporting trials (CONSORT)
European Genotyping Archive (EGA)
Genetic Association Information Network (GAIN)
Human Genome Epidemiology Network (HuGeNet)
International HapMap Consortium
National Cancer Institute's cancer genetic markers of susceptibility (CGEMS) study
Policy for sharing of data obtained in NIH supported or conducted GWA studies
Strengthening the reporting of observational studies in epidemiology (STROBE)
Glossary
- Genome-wide association (GWA) studies
-
Studies in which a dense array of genetic markers, which captures a substantial proportion of common variation in genome sequence, is typed in a set of DNA samples that are informative for a trait of interest. The aim is to map susceptibility effects through the detection of associations between genotype frequency and trait status.
- Case–control design
-
An association study design in which the primary comparison is between a group of individuals (cases), ascertained for the phenotype of interest and that are presumed to have a high prevalence of susceptibility alleles for that trait, and a second group (controls), not ascertained for the phenotype and considered likely to have a lower prevalence of such alleles.
- Selection bias
-
Bias arising from the fact that the samples ascertained for the study (particularly controls) might not be representative of the wider population that they are purported to represent.
- Misclassification bias
-
Bias resulting from the failure to correctly assign individuals to the relevant group in a case–control study; for example, the presence of some individuals who meet the criteria for being cases in a population-based control sample.
- Population stratification
-
The presence in study samples of individuals with different ancestral and demographic histories: if cases and controls differ with respect to these features, markers that are informative for them might be confounded with disease status and lead to spurious associations.
- Cryptic relatedness
-
Evidence typically gained from analysis of GWA data that, despite allowance for known family relationships, individuals in the study sample have residual, non-trivial degrees of relatedness, which can violate the independence assumptions of standard statistical techniques.
- Family-based association methods
-
A suite of analytical approaches in which association testing is performed within families: such approaches offer protection from population substructure effects but at the price of reduced power.
- Pleiotropy
-
The phenomenon whereby a single allele can affect several distinct aspects of the phenotype of an organism, often traits not previously thought to be mechanistically related.
- Linkage disequilibrium (LD)
-
The nonrandom allocation of alleles at nearby variants to individual chromosomes as a result of recent mutation, genetic drift or selection, manifest as correlations between genotypes at closely linked markers.
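As a rough illustration of the correlation described in this entry, the sketch below computes two standard summary measures of LD for a pair of biallelic variants: the coefficient D and the squared correlation r². The function name `ld_stats` and its arguments are illustrative assumptions, and the haplotype frequency is taken as known rather than estimated from unphased genotypes.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic variants.

    p_ab : frequency of the haplotype carrying allele A at the first
           variant and allele B at the second (assumed known)
    p_a  : frequency of allele A at the first variant
    p_b  : frequency of allele B at the second variant
    """
    # D measures the departure of the haplotype frequency from the
    # value expected if the two alleles were allocated independently.
    d = p_ab - p_a * p_b

    # r^2 rescales D^2 by the allele-frequency variances at both sites.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic")
    return d, d * d / denom


# Two perfectly correlated variants, each with allele frequency 0.5.
d, r2 = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
```

An r² close to 1 means that genotypes at one variant predict those at the other almost perfectly; a value near 0 means they carry little information about each other.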
- Copy number variant (CNV)
-
A class of DNA sequence variant (including deletions and duplications) in which the result is a departure from the expected diploid representation of DNA sequence.
- DNA pooling approaches
-
Association studies that are conducted using estimates of allele frequencies derived from pools of DNA compiled from multiple subjects rather than individual DNA samples.
- Informative missingness
-
If patterns of missing data are nonrandom with respect to both genotype and trait status, then analysis of the available genotypes can result in misleading associations where none truly exists.
- Signal intensity (cluster) plots
-
Plots of raw intensity data for individual variants that are generated by the genotyping platform and represent the extent to which the various genotypes can be discriminated: these provide a useful visual diagnostic for the genotyping data quality.
- Hardy–Weinberg equilibrium (HWE)
-
A theoretical description of the relationship between genotype and allele frequencies that is based on expectation in a stable population undergoing random mating in the absence of selection, new mutations and gene flow: in the context of genetic studies, departures from equilibrium can be used to highlight genotyping errors.
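One simple way to check a variant for departure from Hardy–Weinberg proportions is a one-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts. The sketch below is a minimal version of that idea; the function name `hwe_chi_square` is an assumption, and an exact test is often preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for departure from Hardy-Weinberg proportions.

    Arguments are the observed counts of the two homozygotes (n_aa, n_bb)
    and the heterozygote (n_ab). Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("variant is monomorphic; the test is undefined")

    # Expected counts under random mating: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# A complete absence of heterozygotes is a strong departure from HWE.
chi2, p_value = hwe_chi_square(50, 0, 50)
```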
- Quantile-quantile plot (Q-Q plot)
-
In the context of GWA studies, a Q-Q plot is a diagnostic plot that compares the distribution of observed test statistics with the distribution expected under the null.
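The coordinates of such a plot can be sketched as follows, working on p values rather than raw test statistics. The function name `qq_points` and the plotting positions (rank − 0.5)/n for the expected uniform quantiles are assumptions chosen for illustration; other conventions exist.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(p) for a Q-Q plot.

    Under the null, p values are uniform on (0, 1], so the i-th smallest
    of n p values is compared with the plotting position (i - 0.5) / n.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("no p values supplied")
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")

    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = (rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# p values that match the null expectation exactly fall on the diagonal.
null_like = [(i - 0.5) / 4 for i in range(1, 5)]
points = qq_points(null_like)
```

Points that rise above the diagonal across the whole range suggest systematic inflation of the test statistics, whereas departure only in the extreme tail is the pattern expected from a small number of true associations.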
- Cochran–Armitage test
-
A genotype-based contingency-table test for association that is well suited to the detection of trends across ordinal categories (in this case, genotypes).
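A minimal sketch of the trend test for a 2 × 3 table of genotype counts is given below. The function name `cochran_armitage_trend` is an assumption, as are the default weights (0, 1, 2), which correspond to an additive coding of the genotypes; other weights can be supplied for dominant or recessive models.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    cases, controls : counts per genotype category, in the same order
    weights         : scores assigned to the ordered categories
    Returns (chi2, p_value), with chi2 referred to 1 degree of freedom.
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")

    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    if n_cases == 0 or n_controls == 0:
        raise ValueError("both cases and controls are required")

    sum_t_r = sum(t * r for t, r in zip(weights, cases))
    sum_t_n = sum(t * c for t, c in zip(weights, col_totals))
    sum_t2_n = sum(t * t * c for t, c in zip(weights, col_totals))

    # Trend statistic and its variance under the null of no association.
    t_stat = n * sum_t_r - n_cases * sum_t_n
    variance = n_cases * n_controls * (n * sum_t2_n - sum_t_n ** 2) / n
    if variance <= 0:
        raise ValueError("no variation in genotype scores")

    chi2 = t_stat ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Genotype counts ordered as (0, 1, 2) copies of the tested allele.
chi2, p_value = cochran_armitage_trend(cases=(0, 0, 10), controls=(10, 0, 0))
```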
- Frequentist
-
A school of statistics that uses p values and combines them with hypothesis testing to make inferences.
- Bayes' factors
-
The Bayesian alternative to classical frequentist approaches to hypothesis testing, essentially equivalent to likelihood ratio tests: prior and posterior information are combined in a ratio that measures the strength of the evidence in favour of one model rather than the other.
- False-positive report probability
-
The probability that a reported association between a genetic variant and a trait of interest is not true.
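One common way to put a number on this probability applies Bayes' rule to the significance threshold, the power of the study and the prior probability that the association is real. The sketch below follows that formulation; the function name and parameter names are assumptions made for illustration.

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a 'significant' association is a false positive.

    alpha : significance threshold (or observed p value)
    power : probability of detecting a true association at that threshold
    prior : prior probability that the association is real
    """
    for name, value in (("alpha", alpha), ("power", power), ("prior", prior)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must lie in (0, 1]")

    # Expected share of significant findings arising from null variants ...
    false_positives = alpha * (1 - prior)
    # ... and from variants that are truly associated.
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)


# With a 1% prior, a finding at alpha = 0.05 is still most likely false.
fprp = false_positive_report_probability(alpha=0.05, power=0.8, prior=0.01)
```

The example shows why a nominally significant result offers weak evidence when the prior probability of a true association is low, as it is for most variants tested in a genome-wide scan.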
- Haplotype-based methods
-
Association methods that rely on the relationship between the distribution of estimated haplotype frequencies and trait status, rather than each individual variant in turn.
- Imputation methods
-
A set of approaches for filling in missing genotype data using a sparse set of genotypes (for example, from a GWA scan) and a scaffold of linkage disequilibrium relationships (as provided by the HapMap).
- Mendelian randomization
-
An analytical approach that allows one to test for a causal relationship between two phenotypes that show observational associations, but are subject to confounding: Mendelian randomization makes use of the random segregation of susceptibility alleles at meiosis to explore causality in a model that is freed from most sources of confounding.
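In its simplest single-variant form, this idea is often implemented as a ratio estimate: the effect of the variant on the outcome divided by its effect on the intermediate phenotype. The sketch below assumes that form; the function name `wald_ratio` and its arguments are hypothetical, and the standard error is a first-order approximation that ignores uncertainty in the variant–exposure estimate.

```python
def wald_ratio(beta_variant_exposure, beta_variant_outcome, se_variant_outcome):
    """Ratio estimate of the causal effect of an exposure on an outcome.

    beta_variant_exposure : per-allele effect of the variant on the exposure
    beta_variant_outcome  : per-allele effect of the variant on the outcome
    se_variant_outcome    : standard error of the variant-outcome effect
    Returns (estimate, approximate_standard_error).
    """
    if beta_variant_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")

    estimate = beta_variant_outcome / beta_variant_exposure
    # First-order approximation: treats the variant-exposure effect as fixed.
    approx_se = se_variant_outcome / abs(beta_variant_exposure)
    return estimate, approx_se


# Illustrative numbers only, not taken from any study.
estimate, approx_se = wald_ratio(0.5, 0.1, 0.02)
```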
Cite this article
McCarthy, M., Abecasis, G., Cardon, L. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356–369 (2008). https://doi.org/10.1038/nrg2344
DOI: https://doi.org/10.1038/nrg2344