Genome-Wide Association

Which Do You Want First: the Good News, the Bad News, or the Good News?

Kent D. Taylor (1,2), Jill M. Norris (3), and Jerome I. Rotter (1,2,4)

  1. Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, California
  2. Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California
  3. Department of Preventive Medicine and Biometrics, University of Colorado Health Sciences Center, Denver, Colorado
  4. Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, California

Address correspondence and reprint requests to Jerome I. Rotter, Medical Genetics Institute, Cedars-Sinai Medical Center, 8700 Beverly Blvd., Suite 665, West Los Angeles, California 90048-1804. E-mail: jerome.rotter{at}cshs.org

It has long been appreciated that there is a genetic component to type 2 diabetes, but progress in gene finding has been slow because the genetic component is complex. Accumulating data on various diabetes-related phenotypes suggest that so-called “type 2 diabetes” is likely a collection of many diseases due to varying but often overlapping underlying mechanisms. As such, the once hoped-for simple genetic basis of type 2 diabetes has proved elusive.

This year marks the application of the next gene-finding tool to the study of type 2 diabetes, the genome-wide association study (GWA). This type of study is the logical extension of the candidate gene association study, an approach in which a few (1–20 or so) single nucleotide polymorphisms (SNPs) within a single gene are tested for association with the phenotype of interest. By that approach, the candidate gene is chosen based on biochemistry or physiology related to that phenotype. The candidate gene approach has identified some genes but has not yielded the definitive picture of the genetic contribution to type 2 diabetes (Table 1; reviewed in [1]).
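As a rough illustration of what testing a single SNP for association involves, the sketch below compares allele counts in cases and controls with a 2×2 chi-square test. The function name and the counts are hypothetical and not taken from any study cited here; published analyses typically use dedicated genetics software and adjust for covariates.

```python
from math import erfc, sqrt


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Allelic odds ratio, chi-square statistic (1 df), and p-value
    for a 2x2 table of allele counts (hypothetical helper)."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][k] + table[1][k] for k in range(2)]
    total = sum(row_totals)

    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected

    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Made-up counts: 1,000 case alleles and 1,000 control alleles
odds_ratio, chi2, p_value = allelic_association(360, 640, 300, 700)
```

A candidate gene study repeats a test of this kind for a handful of SNPs, whereas a GWA repeats it for hundreds of thousands.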

In contrast, the GWA approach tests every gene by testing the association of SNPs in every known gene (∼100,000 SNPs) or in both known genes and in regions outside of genes throughout the genome (∼300,000 to 1 million). The GWA is therefore not biased by a priori assumptions based on prior observations of the phenotype (e.g., kinetics of insulin signaling or glucose-mediated insulin secretion). Its strength is that it has the potential to identify genes of high genetic effect that were previously unsuspected as candidates. Such genes may have gone unsuspected because little was known about them previously or because investigators simply had not addressed the action of their products in prior studies of the biochemistry and physiology of type 2 diabetes.

In one sense, the GWA approach is driven by gene or marker location rather than by gene function and thus extends the genome-wide linkage approach. However, the advantages of the GWA approach are that it does not depend on the availability of families for study and that it can detect smaller effects than detectable by a genome-wide linkage approach. GWAs are now technically feasible because of several recent developments. First, there has been a dramatic reduction in genotyping costs, brought about by new technologies. Second, a large number of possible SNPs are now available for use in the catalog of over 12 million human SNPs in public databases (9). Third, there has been a reduction in the number of SNPs required to cover the entire human genome because of the block structure of the human genome and the ability to predict tag SNPs. Tag SNPs are SNPs that provide the most information for association studies (10,11), available from data provided by the International HapMap Project (12).
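To make the idea of tag SNPs concrete, the sketch below computes the pairwise correlation r² between SNPs from a small set of phased haplotypes and then picks tags greedily. This is only one simple heuristic; it is not claimed to be the method used in the cited work (10,11) or by the HapMap Project. The haplotypes, the function names, and the 0.8 threshold are illustrative assumptions.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between SNP i and SNP j,
    with alleles coded 0/1 on each phased haplotype."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no information
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Repeatedly pick the SNP that captures the most untagged SNPs
    at r^2 >= threshold (assumed cutoff) until all are covered."""
    untagged = set(range(len(haplotypes[0])))
    tags = []
    while untagged:
        best = max(
            sorted(untagged),
            key=lambda s: sum(
                r_squared(haplotypes, s, t) >= threshold for t in untagged
            ),
        )
        tags.append(best)
        untagged -= {
            t for t in untagged
            if t == best or r_squared(haplotypes, best, t) >= threshold
        }
    return tags


# Hypothetical haplotypes over four SNPs; SNP 0 and SNP 1 always travel together
haps = [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1)]
tags = greedy_tags(haps)
```

In this toy data set, SNP 1 need not be genotyped because SNP 0 predicts it perfectly, which is the sense in which block structure reduces the number of SNPs required.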

The good news is that yesterday there were twice as many confirmed type 2 diabetes genes to study compared with the number confirmed when the survey by Willer et al. (1) was published in January 2007 (Table 1). Moreover, articles in this issue both add to this list and extend GWA findings into other populations important for type 2 diabetes: Amish (39), Pima Indian (40), and Mexican American (41). As Table 1 shows, genes such as PPARG, KCNJ11, and TCF7L2 were also associated with type 2 diabetes, with high significance in several GWAs, thus reassuring us that GWAs do find genetic determinants of diabetes. Table 1 also shows that other genes with evidence for linkage and/or association to type 2 diabetes, such as CAPN10 and ENPP1, have not yet been found to be associated with type 2 diabetes using this tool. This is possible for several reasons: 1) Technological: the particular “chip” configuration used covered these genes poorly and so the genetic effect was missed. 2) Genetic: the genes identified using the previous tools were not detected using GWA because their quantitative contribution to type 2 diabetes is lower when ranked and compared with other genes. Such a ranking occurs in a GWA study. Finally, 3) Epidemiological: the association was missed because it was not present to a great degree in a particular study sample. For example, CAPN10 and ENPP1 may yet be important within the context of a subset of metabolic disease or of a particular ethnic group.

The possibility that results are merely random is ever present in genetic analysis because subject recruitment is in fact a sampling from a population and because the path to gene finding requires statistical methods. In addition, the discipline of genetic analysis continues to mature, and we are finding out that we may have glossed over very complex problems in past work. For example, we have made naive assumptions about populations. Recent papers have demonstrated the difficulty of genetic analysis in the “Hispanic” or “Latino” population, due to subgroups with various amounts of contribution from ancestral Native-American, West African, and European populations (35). Further, the Wellcome Trust GWA identified a substructure within the European-Caucasian population and even within the U.K. (2). It is possible that ENPP1 and CAPN10, for example, showed linkage and/or association only in particular populations. There is as yet no definitive answer on what the ENPP1 and CAPN10 candidate gene studies mean to the genetic component of type 2 diabetes as a whole.

Candidate gene studies of type 2 diabetes and complications of related physiological abnormalities will, however, remain important in the years ahead for a variety of reasons: 1) A candidate gene study may extend the results of a GWA into other ethnic groups or into groups with detailed phenotypes such as the oral glucose tolerance test, as represented by articles in this issue (39–42). 2) A candidate gene study may evaluate an association in greater depth and may include haplotype structure, population-attributable risks, and resequencing to identify specific variants. 3) A candidate gene study is more straightforward to conduct than a GWA because it is testing a specific hypothesis. 4) A candidate gene study may have greater power to detect a genetic effect and may be able to detect smaller effects because there are fewer statistical tests being performed. 5) Completion of a candidate gene study is practical for many more investigators in the research community. 6) A candidate gene study may focus on particular subgroups, e.g., minority populations, groups that do and do not respond to particular drug therapies, subgroups sharing particular clinical characteristics, or smaller numbers of patients with data from very detailed but very complicated physiological studies.
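The point about fewer statistical tests (reason 4 above) can be illustrated with the Bonferroni correction, one simple and conservative way to control false positives across many tests. The sketch below is an assumption-laden illustration: the text does not say which correction any particular study used, and the SNP counts are round numbers chosen to match the ranges mentioned earlier.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise
    error rate at alpha across n_tests independent tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Candidate gene study: about 20 SNPs in a single gene
candidate_threshold = bonferroni_threshold(0.05, 20)

# GWA: about 500,000 SNPs across the genome
gwa_threshold = bonferroni_threshold(0.05, 500_000)
```

Under these assumptions a SNP in the candidate gene study needs a p-value below roughly 0.0025, whereas the same SNP in the GWA needs a p-value below roughly 0.0000001, so a weaker effect can reach significance in the smaller study.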

However, the bad news is that, as we have long suspected, the increase in risk for each gene remains modest, in the range of ∼1.2–1.4. Thus, the genetic analysis of type 2 diabetes by no means ends this year with the GWA, whether published earlier this year (2–8) or as reported by Florez et al. (42), Rampersaud et al. (39), Hayes et al. (41), and Hanson et al. (40).
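To see why such modest effects are hard to detect, the sketch below applies the textbook normal-approximation formula for comparing two proportions to estimate how many alleles must be sampled per group. It assumes, for illustration only, that the risk increase can be treated as an allelic odds ratio, that the risk allele has a frequency of 0.3 in controls, and that 80% power is wanted; none of these values come from the studies discussed.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control
    frequency and an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def alleles_per_group(control_freq, odds_ratio, alpha, power=0.8):
    """Approximate number of alleles needed in each of the case and
    control groups (two-sided test of two proportions)."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Same hypothetical SNP at a nominal and at a genome-wide style threshold
n_nominal = alleles_per_group(0.3, 1.3, alpha=0.05)
n_genome_wide = alleles_per_group(0.3, 1.3, alpha=1e-7)
```

Each person contributes two alleles, so the number of individuals is roughly half the allele count. The required sample grows both as the odds ratio approaches 1 and as the significance threshold becomes stricter, which is the combination a GWA of type 2 diabetes faces.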

There remains much to do. These articles show the types of approaches to be taken to disentangle type 2 diabetes genetics and to bring the changes to clinical practice promised by the genome era of medicine.

Genotyping with different SNP ensembles.

Each particular GWA is performed with a particular technology using materials manufactured by a particular company, e.g., Illumina or Affymetrix. These technologies are different, and the actual list of SNPs tested by each chip type is different. This difference is due to design decisions and to which SNPs work reliably with each method. For example, the 100K chips represented by the articles in this issue focused more on including SNPs that were located within annotated genes and that changed amino acids in the protein sequence. The design of this particular chip type therefore focused more on testing all of the genes known at the time of chip manufacture rather than attempting to place SNPs within regions that had no known genes. In contrast, the designs of the 300K and 500K configurations have attempted to include SNPs that would tag haplotype blocks across the entire genome in the Caucasian population. Designs of 700K to 1M have attempted to include SNPs important for other populations. This latter, more genome-wide approach might well identify genetic contributions from regions that do not appear at this time to contain a gene that codes for any protein. This may be due either to our lack of knowledge of all proteins or to our lack of knowledge of how regions seemingly far from a known gene may exert control over other protein-coding genes. A recent example is the region identified for myocardial infarction that is not, as of today, located within a gene annotated in the dbGene software (NCBI) (36,37). Thus, the papers in this issue focus on candidate genes (in this context, all genes) rather than on truly complete genome coverage. The usefulness of different chip designs together has been demonstrated in studies of Crohn’s disease, another complex genetic trait. An association between Crohn’s disease and TNFSF15 was first detected by a 100K candidate gene approach but only by one of two 300K whole-genome approaches (13–17). The importance of autophagy and the ATG16L1 gene in Crohn’s disease was demonstrated first using a collection of nonsynonymous SNPs and then observed by using a whole-genome collection (18,19), while using a whole-genome collection emphasized the importance of the IL23R pathway (14).

Therefore, while the number of SNPs in the studies reported in this issue is perhaps lower than that in the other studies reported this year, the studies in this issue make a substantial contribution to identifying genetic associations with type 2 diabetes phenotypes. The four studies shared results earlier rather than later so that replication and confirmation studies could be rapidly set up and finished (39–42).

Extension into other ethnic groups.

Observations of both the same and different genes across Caucasian Americans, Mexican Americans, the Amish, and Pima Indians are reported here. These differences are most likely related to the different evolutionary histories of these populations and probably contribute to the complexity of the genetics of type 2 diabetes because they affect different pathways or different pathophysiology, but all ultimately result in the clinical phenotype of type 2 diabetes. Close attention to the similarities and differences between populations definitely promises to help unravel the genetic complexity.

Analysis of subphenotypes.

The definition of type 2 diabetes is arbitrary and changes every few years. It is therefore not surprising that different insights arise from examining the association between genetic variants and measurements of physiological parameters (e.g., by the oral glucose tolerance test). In the present studies, associations between SNPs and diabetes-related phenotypes are reported, and this approach will likely be powerful in gene finding. Furthermore, these studies demonstrate the great value of beginning with SNPs associated with more than one related trait (for example, type 2 diabetes and fasting plasma glucose). Similarities and differences in the association of SNPs with many diabetes-related subclinical phenotypes promise to point to pathways important both in the development of subsets of patients with type 2 diabetes and in the temporal development of type 2 diabetes.

Of interest to the study of type 2 diabetes is that many, if not most, of the genes identified by earlier GWAs (2–8) appear to be involved in β-cell function rather than insulin sensitivity/resistance, i.e., only one of the two primary phenotypes directly predisposing to type 2 diabetes. Why might this be so? One explanation is that the variance in insulin sensitivity is due to a greater environmental component and thus that genetic risk for type 2 diabetes is indeed more related to β-cell function, development, and/or survival. A second possible explanation is that defects impairing β-cell function are less common than defects leading to insulin resistance. This would lead to a higher relative risk for β-cell defects and greater ease of identifying genes contributing to that risk. Continuing with this line of reasoning, the genes leading to insulin resistance would be many, with each contributing a slight increase in risk that would be difficult or impossible to detect reliably. These considerations may explain why identifying genes for Crohn’s disease by GWA appears to require a smaller sample and has had greater success than GWA for type 2 diabetes thus far (2). As shown in families, genes contribute more to the relative risk for Crohn’s disease than to type 2 diabetes. A further implication of this line of reasoning is that, as GWAs continue to be applied and focus on traits such as insulin sensitivity/resistance, genes for this important component of type 2 diabetes may also emerge. A third possibility is that we may find, as we continue to dissect type 2 diabetes genetics, that genes related to insulin sensitivity are also genes related to β-cell function and that, therefore, the major genetic component of type 2 diabetes is indeed the genetic alteration of β-cell function, development, and/or survival acting through many different pathways. Evidence for such genetic pleiotropism has been suggested for the susceptibility contributed by CAPN10 (36).

The new vista open to type 2 diabetes genetics goes beyond the individual gene to the entire pathway related to each. This is also good news; there remains much more to do, but we have a better place to start from than we did last year.

TABLE 1

Genetic factors for type 2 diabetes

Footnotes

  • The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

    • Accepted September 29, 2007.
    • Received September 18, 2007.
