Genetic Studies of the Etiology of Type 2 Diabetes in Pima Indians

Hunting for Pieces to a Complicated Puzzle

Leslie J. Baier and Robert L. Hanson

From the Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, Arizona

Address correspondence and reprint requests to Leslie J. Baier, PhD, Clinical Diabetes and Nutrition Section, NIDDK, NIH, 4212 N. 16th St., Phoenix, AZ 85016. E-mail: lbaier{at}

Obesity has become a major public health concern in the U.S., reaching epidemic proportions among adults and children. Recent national surveys show that American adults have experienced a 50% increase in the prevalence of overweight and obesity, while children and adolescents have experienced a 100% increase since the 1970s (rev. in 1). Coincident with this increase in obesity, the prevalence of type 2 diabetes has also reached epidemic proportions. The prevalence of diagnosed diabetes among adults in the U.S. increased by 40% from 1990 to 1999 and is projected to increase by 165% between the years 2000 and 2050 (2,3). Recent studies have estimated the lifetime risk of diagnosed diabetes to be approximately one in three for males and two in five for females born in the U.S. in the year 2000 (4). Moreover, an alarming increase in the incidence of type 2 diabetes has now been reported in very young children (5).

The escalating rates of both type 2 diabetes and obesity are likely due to changes in the environment, coupled with changes in human behavior and lifestyle. However, in most developed countries, food is now plentiful and lifestyle is generally sedentary, but not all people become obese, and furthermore, most obese people do not develop type 2 diabetes. It is likely that genetic factors underlie a significant portion of the susceptibility to both obesity and type 2 diabetes, and that expression of this susceptibility is dependent on environmental variables.

Type 2 diabetes and obesity have a genetic basis.

The large variation in prevalence rates of type 2 diabetes among ethnic groups living in similar environments, the increased risk to siblings of affected individuals, and the high concordance rate for the disease in monozygotic twins compared with dizygotic twins indicate that this disease has a significant genetic component (rev. in 6). Studies in twins, and particularly in twins reared apart, have also produced high heritability estimates for BMI, ranging between 0.6 and 0.9 (7). However, the extent to which this familial aggregation reflects a small number of genes with major effects, a large number of genes each with small effect (polygenic inheritance), or environmental factors shared among family members remains unknown. For example, certain aspects of the intrauterine environment, such as the blood supply from the placenta, are shared more strongly between monozygotic than between dizygotic twins, and intrauterine effects are known to have a significant impact on the future development of diabetes (8).

Complex segregation analysis is a technique whereby the distribution and familial occurrence of a trait are analyzed to infer the extent to which they are consistent with the action of at least one major gene, multifactorial inheritance, or a mixture of the two. Segregation analyses of type 2 diabetes in a number of populations have been inconsistent. In some populations there is strong evidence for a single major gene influencing susceptibility (9), whereas in others there is evidence only for multifactorial inheritance (10,11). However, segregation analysis requires a number of distributional assumptions, and the accuracy of the results depends on the validity of these assumptions. For example, the absence of a major-gene effect may simply reflect the failure of the model to adequately capture the underlying genetic architecture. It is also possible that environmental factors have aggregated in such a way as to lead to a false inference of a major-gene effect.

Segregation analyses of BMI have been somewhat more consistent, as more studies have suggested the effect of at least one major gene in addition to multifactorial inheritance (12). However, the relative contribution of the multifactorial and major genetic effects appears to vary by population and, perhaps, age (13). In addition, some studies have suggested that more than one locus with a major effect may influence BMI, and the transmission patterns for these loci may not be strictly Mendelian (14). These findings suggest complexity beyond a single major gene model; however, segregation analyses alone cannot unequivocally determine the number of major loci contributing to a disease.

Intermediate phenotypes of type 2 diabetes may be less genetically complex.

If the genetic basis of type 2 diabetes is highly complex, perhaps the genetic basis for a single risk factor, or an intermediate phenotype, of type 2 diabetes is less “complex.” Independent of obesity, type 2 diabetes is predicted by a combination of impaired insulin secretion and increased insulin resistance (15). Quantitative measures of insulin sensitivity and insulin secretion can be obtained in controlled physiologic conditions (e.g., using a hyperinsulinemic-euglycemic clamp and an intravenous glucose tolerance test). It is reasonable to hypothesize that these traits may be influenced by fewer physiologic pathways and, thus, be determined by fewer genetic loci than the development of type 2 diabetes itself.

Twin studies have provided the strongest evidence that glucose tolerance and indexes of insulin sensitivity and secretion in nondiabetic subjects have significant genetic components (16). Although heritability estimates vary among studies, in general, the heritability is 40–50% for fasting glucose, 40–88% for glucose 1–2 h after a glucose load, 14–53% for fasting insulin, and 47–60% for 30-min insulin (17). Segregation analyses have suggested that the heritable portion of the variance in serum insulin concentrations may be partially explained by the action of at least one major gene with residual polygenic inheritance (18,19).

The genetics of type 2 diabetes in the Pima Indians.

Another approach to reduce the “complexity” of type 2 diabetes has been to study this disease within populations that have limited genetic and environmental variability. The Pima Indians of Arizona have the highest reported prevalence of diabetes of any population in the world (20). This population has minimal European admixture (21), and their diabetes appears to be exclusively type 2 diabetes, with no evidence of the autoimmunity characteristic of type 1 diabetes, even in very young subjects with an early onset of the disease (22). The absence of type 1 diabetes and the minimal admixture in this population may indicate limited genetic and environmental variability in the etiology of type 2 diabetes in the Pima Indians, making this population more amenable to the identification of susceptibility loci. Diabetes in Pima Indians is also familial, and the degree of familiality is greater at younger ages of onset compared with older ages of onset (23).

The Pima Indians of the Gila River Indian Community have participated in longitudinal studies of the etiology of diabetes since 1965 (20). Many of the research findings originally described in the Pima population appear to be universal. The current diagnostic criteria for type 2 diabetes adopted by the World Health Organization were initially established in this Native American population (24). Pima Indians with type 2 diabetes are metabolically characterized by obesity, insulin resistance, insulin secretory dysfunction, and increased rates of endogenous glucose production, which are the clinical characteristics that define this disease across most populations (25). Therefore, it is likely that the major metabolic pathways that determine type 2 diabetes in the Pima Indians will be common to both Native American and non–Native American populations. However, the specific genetic polymorphisms that alter the function of a susceptibility gene may vary among populations.

Segregation analyses that included >2,600 Pima Indians suggested that the inheritance of this disease was consistent with the hypothesis that at least one major gene influences the risk of type 2 diabetes by affecting age of onset (23). Evidence for a major genetic effect on lifetime susceptibility to diabetes was much weaker, and these findings are consistent with the epidemiological findings that young-onset diabetes is strongly familial. These results suggest that discrete genetic factors may have important influences on age of onset of diabetes in Pima Indians, beyond the influence of multifactorial inheritance alone. However, these results are subject to all of the potential limitations of segregation analysis outlined above. In particular, the number of genes with major effects that are potentially detectable by linkage analysis remains unknown.

The degree of heritability of type 2 diabetes and its intermediate phenotypes cannot be assessed by studies in twins in Pima Indians, as there are not enough pairs in this population to make such studies feasible. However, an estimate of heritability (h²) can be calculated from the analysis of familial resemblance. Using maximum likelihood methods, such estimates have been obtained for several intermediate phenotypes in nondiabetic Pima Indians (26). For example, 38–49% of the variance in insulin action, independent of the effect of obesity, is familial in Pima Indians. The acute insulin response (AIR) is highly familial (h² = 0.80 at 10 min), even after controlling for percent fat and insulin action (h² = 0.70). Insulin action at physiologic plasma insulin concentrations is familial (h² = 0.61) but less so after controlling for percent fat and waist-to-thigh ratio (WTR) (h² = 0.38). At maximally stimulating insulin concentrations, insulin action is familial (h² = 0.45) and is less influenced by controlling for percent fat and WTR (h² = 0.49). These levels of familiality for quantitative measurements in Pima Indians are slightly higher than estimates determined in other populations (27,28). By comparison, based on sibling-sibling resemblance, h² = 0.65 for maximum BMI after age 15 years (observed in the longitudinal study adjusted for age, sex, and birth cohort), and h² = 0.62 for liability to type 2 diabetes (estimated from tetrachoric correlation coefficient derived from the sibling-sibling concordance adjusted for age, sex, and birth cohort).
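As a rough illustration of how sibling resemblance relates to heritability, the sketch below uses a textbook approximation rather than the maximum likelihood methods cited above. It assumes purely additive genetic effects and no shared environment, in which case the full-sibling correlation equals half the heritability. The function name and the example correlation are hypothetical.

```python
def h2_from_sibling_correlation(r_sib):
    """Heritability implied by a full-sibling correlation.

    Assumes purely additive genetic effects and no shared environment,
    so that the sibling correlation equals half the heritability. Shared
    environment would inflate the estimate, making this an upper bound.
    """
    if not -1.0 <= r_sib <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return 2.0 * r_sib


# Hypothetical input: a sibling correlation of 0.325 corresponds to a
# heritability of 0.65 under the assumptions stated above.
example_h2 = h2_from_sibling_correlation(0.325)
```

Because siblings also share environments, values obtained this way are better read as estimates of familiality than of strictly genetic variance.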

Approaches to identify susceptibility genes in Pima Indians.

Two basic approaches have been used to identify susceptibility genes for complex diseases: analysis of candidate genes and genomic approaches. Candidate genes can be analyzed for sequence variation that is associated or linked with the disease. Selection of a biologically defined candidate gene for analysis requires some a priori knowledge of the pathophysiology of a disease. Alternatively, susceptibility genes can be identified by genome-wide linkage (or association) scans, which are followed by positional cloning. Positional cloning requires no knowledge and/or judgment of the “biologically plausible” genetic candidates. Instead, a disease gene is discovered because it resides on a chromosomal region that segregates with a phenotype. Both of these approaches are being pursued in the Pima Indian population, in which a genome-wide linkage study has been conducted and several candidate genes have been analyzed.

Genome-wide linkage results.

In 1998, we completed an autosomal genome-wide linkage study to search for loci influencing type 2 diabetes and BMI in Pima Indians (29). Among 264 nuclear families containing 966 siblings (1,766 sibling pairs), 516 autosomal markers with a median distance between adjacent markers of 6.4 cM were genotyped. Variance component methods were used to test for linkage with an age-adjusted diabetes score and with BMI. In multipoint analysis, the strongest evidence for linkage with either phenotype was on chromosome 11q23-q24, where linkage to BMI had a logarithm of odds (LOD) score of 3.6. This same region had some evidence for linkage to age-adjusted diabetes (LOD = 1.7), and a bivariate analysis gave very strong evidence for linkage to both traits (LOD = 5.0). The bivariate method allows one to conduct statistical tests of whether linkage signals for two traits localizing to the same chromosomal region reflect the coincidence of two separate loci for each trait or whether a single locus pleiotropically influences both traits. In the Pima Indians, these analyses strongly supported the hypothesis of pleiotropy and suggested that an allele at this putative locus predisposes to obesity and young-onset diabetes.
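The relation between the number of siblings and the number of sibling pairs can be sketched as follows. A sibship of n siblings yields n(n − 1)/2 distinct pairs, so large sibships contribute disproportionately many pairs to a linkage analysis. The sibship sizes below are hypothetical and are not the actual family structures of the study.

```python
from math import comb


def sibling_pairs(sibship_sizes):
    """Total number of distinct sibling pairs across nuclear families."""
    return sum(comb(n, 2) for n in sibship_sizes)


# Hypothetical sibship sizes (not the study's actual families).
example_sizes = [2, 3, 4, 6]
total_pairs = sibling_pairs(example_sizes)  # 1 + 3 + 6 + 15 pairs
```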

Additional analysis of sib-pairs concordant for young-onset diabetes versus discordant pairs (with young-onset defined as affected before the age of 45 years) provided evidence for linkage to type 2 diabetes on chromosome 1 (LOD = 2.5) in Pima Indians. Conventionally, a LOD ≥3.6 is considered to represent genome-wide significance, while a LOD of 2.1–3.6 is considered “suggestive” of linkage (30). Lower LOD scores (1.0–2.0), though not conclusive, are still consistent with linkage and may provide evidence for replication if other studies have demonstrated linkage in the same region.
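The thresholds quoted above can be expressed as a small classifier, shown here as a minimal sketch. Two assumptions are ours and not the authors': a LOD of exactly 3.6 is treated as meeting the significance threshold (as the chromosome 11 BMI signal is treated in this article), and scores between 2.0 and 2.1 are grouped with the lower band. The pointwise p-value conversion assumes that 2·ln(10)·LOD is asymptotically chi-square distributed with 1 degree of freedom under no linkage, halved for a one-sided test; the function names are hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumes 2*ln(10)*LOD follows a chi-square distribution with 1 df
    under the null hypothesis, with the tail probability halved.
    """
    chi_sq = 2.0 * math.log(10.0) * lod
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi_sq / 2.0))


def classify_lod(lod):
    """Label a LOD score using the thresholds quoted in the text."""
    if lod >= 3.6:
        return "genome-wide significant"
    if lod >= 2.1:
        return "suggestive"
    if lod >= 1.0:
        return "consistent with linkage"
    return "little evidence"


# LOD scores reported in this article for the Pima genome scan.
reported = [
    ("11q23-q24, BMI", 3.6),
    ("chromosome 1, young-onset diabetes", 2.5),
    ("11q23-q24, age-adjusted diabetes", 1.7),
]
for region, lod in reported:
    print(f"{region}: {classify_lod(lod)} (p ~ {lod_to_pointwise_p(lod):.1e})")
```

The pointwise p-value does not account for the many positions tested across the genome, which is why the genome-wide threshold is set so far above the nominal 0.05 level.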

We additionally sought loci for pre-diabetic, or intermediate, traits in our genome-wide linkage study. Evidence for linkage was observed at several chromosomal regions, including 3q21-q24 linked to fasting plasma insulin concentration (LOD = 1.5) and in vivo insulin action (LOD = 1.0), 4p15-q12 linked to fasting plasma insulin concentration (LOD = 1.0), 9q21 linked to 2-h insulin concentration during oral glucose tolerance testing (LOD = 2.2), and 22q12-q13 linked to fasting plasma glucose concentration (LOD = 2.4) (31). However, none of these regions showed linkage to type 2 diabetes itself, and none of the linkages met genome-wide significance. Genomic regions that showed linkage (LOD >2.0) for any phenotype using a variance components method are given in online appendix 1 (available at

Positional cloning within linked regions.

The process of identifying variants responsible for a linkage signal is extremely costly, both in terms of time and money. Therefore, it is not feasible to pursue every region identified by linkage studies. However, selection of “the best” region for positional cloning studies is not always straightforward: some arguments favor a region that provides the highest LOD score in a genome scan, whereas others favor a region that has been replicated in other studies. Alternatively, positional cloning in a region linked to an intermediate phenotype may be preferable because the underlying variant would be predicted to have a larger effect on a less complex trait.

In studies of Pima Indians, chromosome 11q23-q24 was selected for positional cloning of the putative obesity locus because this was the only region of linkage that met genome-wide significance (LOD = 3.6 for BMI). This region also had nominal linkage to type 2 diabetes (LOD = 1.7). Chromosome 1q21-q24 was also selected for positional cloning because it provided the highest LOD score to type 2 diabetes (LOD = 2.5). Although a LOD score of this magnitude falls below the accepted criteria for genome-wide significance, it still represented the most probable position to harbor a diabetes susceptibility locus in the Pima population. In retrospect, the decision to pursue these two regions for positional cloning appears to have been judicious. Linkage to BMI at the identical position on chromosome 11 (D11S4464) has now been replicated in Caucasians from the Framingham Heart Study (32) and obese males from Utah (33), and an adjacent region (11q22) has been linked to BMI in Nigerians (34). Linkage to type 2 diabetes, again at D11S4464, has also been replicated in Mexican Americans (35). Perhaps more importantly, linkage to type 2 diabetes on 1q21-q24 has been replicated in studies of Utah Caucasians, Old Order Amish, French, U.K., Framingham, and Chinese populations, making 1q21-q24 one of the most consistently identified regions emerging from genome-wide linkage studies of type 2 diabetes (rev. in 36).

At present, we are not attempting to identify any of the variants responsible for linkage signals with an intermediate trait. None of these regions met the criteria for genome-wide significance. Our linkage study may have lacked the power to convincingly identify loci influencing these traits, possibly due to the smaller number of nondiabetic sibpairs that had been metabolically phenotyped. Detailed metabolic measurements were available on 388 nondiabetic sibpairs, of whom 186 sibpairs were normal glucose tolerant for analysis of the AIR. However, a larger number of nondiabetic sibling pairs were available for analysis of surrogate indexes of insulin sensitivity and secretion derived from the oral glucose tolerance test (these indexes are simpler to measure but potentially less accurate than more sophisticated measures). Linkage analysis of these traits revealed modest evidence for linkage with 2-h corrected insulin response, a measure of insulin secretion (LOD = 1.6), near the region of chromosome 1q linked to young-onset diabetes (37). These findings suggest that the chromosome 1q locus may influence diabetes risk through an effect on insulin secretion, but that power to detect linkage with sophisticated measures in the present number of families may have been insufficient. It should be noted, however, that gathering detailed metabolic data on this number of subjects has taken >10 years, and expansion of the number of metabolically phenotyped subjects to adequately power a linkage study will take many more years. In addition, if a region is linked to an intermediate phenotype, but is not linked to type 2 diabetes, its overall significance in type 2 diabetes susceptibility comes into question. Therefore, although the decision not to pursue linkage to intermediate phenotypes remains debatable, we have opted to prioritize positional cloning in regions that presumably target disease-causing loci.

Two approaches are being used to identify the genetic variant or variants that gave rise to the linkage signals on chromosome 1q21-q24 and 11q23-q24 in Pima Indians. Each of these regions encompasses ∼30 million base pairs of DNA. Positional candidate genes that map to these regions are being screened for variation, and variants are being genotyped for association analyses in the same subjects who were part of the genome-wide linkage scan (n = 1,338). A list of genes that have been screened as positional candidates is given (online appendix 2). To date, no variant in a positional candidate gene has been identified that, by itself, accounts for a linkage signal. However, no gene has been excluded because it always remains possible that the causative variant within the gene was not detected (e.g., it is positioned in an intronic or distal regulatory region that wasn’t sequenced, or it is rare and wasn’t observed in the subjects who were screened for variant detection). At the same time, linkage disequilibrium (LD) mapping is being used in an effort to narrow each region of interest. For LD mapping, single nucleotide polymorphisms (SNPs) are being identified at fixed intervals (e.g., every 25 kb) across the two regions of linkage, and each SNP is genotyped in the same subjects that were part of the genome-wide linkage scan (n = 1,338). Using this technique, it is anticipated that clusters of SNPs will be identified that are associated with either BMI or type 2 diabetes, presumably because they are in close proximity and thus in high LD with the disease-causing variant. For LD mapping on chromosome 11, >750 evenly spaced SNPs have been genotyped, to date, across the 30-Mb region of linkage, and the largest remaining “gap” is 150 kb. For LD mapping on chromosome 1, >350 SNPs have been genotyped to date, but many large gaps remain. Completion of these two LD maps remains in progress.
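Tracking the coverage of an LD map of this kind amounts to checking the spacing between adjacent genotyped SNPs. The following is a minimal sketch of such a check; the function name and the SNP coordinates are hypothetical, and the 25-kb limit simply reuses the example interval mentioned above.

```python
def spacing_gaps(positions_bp, max_gap_bp):
    """Return (largest_gap, gaps_over_limit) for SNP positions in base pairs.

    gaps_over_limit lists the (start, end) coordinates of adjacent SNPs
    whose spacing exceeds max_gap_bp.
    """
    ordered = sorted(positions_bp)
    adjacent = list(zip(ordered, ordered[1:]))
    largest = max((end - start for start, end in adjacent), default=0)
    over = [(start, end) for start, end in adjacent if end - start > max_gap_bp]
    return largest, over


# Hypothetical SNP coordinates (bp) on a short stretch of a linked region.
snps = [100_000, 125_000, 150_000, 300_000, 325_000]
largest, over = spacing_gaps(snps, max_gap_bp=25_000)
# largest is 150_000 bp; the only interval over the limit is (150_000, 300_000)
```

Physical spacing is only a proxy for coverage, since LD decays at different rates in different parts of the genome.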

Candidate genes, not in regions of linkage, in Pima Indians.

The power of linkage studies, with typical sample sizes, is low for detection of loci with modest effects on a trait, particularly if the genetic variation is not very polymorphic (i.e., the susceptibility variant is either extremely common or extremely rare). As the effects of such “minor” loci may be biologically important, we are also investigating candidate genes that map to genomic regions that did not show significant linkage in our genome-wide scan. Candidate genes are selected for analysis based on a known physiologic role in glucose and/or lipid metabolism or on a reported association with type 2 diabetes or BMI in other populations or animal models.

We have investigated >50 physiologic candidate genes that do not map to either chromosome 1q21-q24 or 11q22-q24. Among these genes only two, PPP1R3 (38) and IRS-1 (39), contain variants that were statistically significantly associated with type 2 diabetes when analyzed in at least 900 full-heritage Pima Indian subjects. The PPP1R3 locus maps to a region on chromosome 7q31 demonstrating some evidence for linkage to type 2 diabetes in this population (LOD = 1.8), while IRS-1 maps to chromosome 2q36, where there is no evidence for linkage. The difference in prevalence of type 2 diabetes between two genotypic extremes for an “ATTTA” polymorphism (ARE1 and ARE2) in the 3′ untranslated region of PPP1R3 was 13% (type 2 diabetes prevalence = 56% for ARE1/ARE1 vs. 69% for ARE2/ARE2; odds ratio = 1.33; 95% CI 1.10–1.69). The difference in type 2 diabetes prevalence between genotypic extremes for a silent Ala804 (G to A) variant in IRS-1 was 10% (type 2 diabetes prevalence = 69% for G/G vs. 59% for A/A; odds ratio = 1.29; 95% CI 1.02–1.63). Genetic variants in both PPP1R3 and IRS-1 were also associated with several measures of insulin resistance in nondiabetic Pima Indians, which provides a potential mechanistic role for these genes in increasing susceptibility to type 2 diabetes. Nondiabetic subjects homozygous for the ARE2 variant in PPP1R3 had a 7% higher mean fasting insulin level and a 25% lower mean glucose uptake rate in response to a euglycemic clamp as compared with subjects homozygous for the ARE1 variant. Similarly, nondiabetic subjects homozygous for the G804 variant in IRS-1 had a 7% higher mean fasting insulin level and a 12% lower mean glucose uptake rate in response to a hyperinsulinemic clamp as compared with subjects homozygous for the A804 variant. However, the overall contribution of these variants to the occurrence of type 2 diabetes in Pima Indians appears to be minor, as each of them accounts for ∼2% of the total variance in liability (as estimated from the polychoric correlation coefficient).
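For orientation, the prevalences quoted above can be converted into crude odds ratios between the two homozygous groups, as in the sketch below. These unadjusted values need not equal the reported odds ratios (1.33 and 1.29), which presumably come from a different model, for example a per-allele or covariate-adjusted analysis; that explanation is our assumption and is not stated in the text. The function name is hypothetical.

```python
def crude_odds_ratio(prev_exposed, prev_reference):
    """Unadjusted odds ratio comparing two prevalences given as proportions."""
    for prevalence in (prev_exposed, prev_reference):
        if not 0.0 < prevalence < 1.0:
            raise ValueError("prevalence must be strictly between 0 and 1")
    odds_exposed = prev_exposed / (1.0 - prev_exposed)
    odds_reference = prev_reference / (1.0 - prev_reference)
    return odds_exposed / odds_reference


# Prevalences of type 2 diabetes quoted in the text for the genotypic extremes.
ppp1r3_or = crude_odds_ratio(0.69, 0.56)  # ARE2/ARE2 vs. ARE1/ARE1
irs1_or = crude_odds_ratio(0.69, 0.59)    # G/G vs. A/A
```

The gap between a crude comparison of homozygotes and a modeled odds ratio is a reminder that effect sizes depend on the genetic model and adjustments used.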

Variations in four additional candidate genes that do not map to chromosome 1q21-q24 or 11q22-q24 were associated with metabolic predictors of type 2 diabetes, but were not significantly associated with type 2 diabetes itself, when analyzed in full-heritage Pima Indians. Three of these genes, FABP2 (40), CAPN10 (41), and PPARγ2 (42), were associated with measurements of insulin action. For example, the difference in insulin-stimulated glucose disposal rates in response to a hyperinsulinemic-euglycemic clamp between subjects grouped by genotype for a single SNP within each of these genes was 12% for FABP2 (Ala54Thr), 12% for PPARγ2 (Pro12Ala), and 6% for CAPN10 (SNP43), whereas differences in mean fasting insulin levels were 26% for PPARγ2, 14% for FABP2, and 10% for CAPN10. Only one gene, PGC-1 (Gly482Ser), was found to be significantly associated with measures of acute insulin secretion in Pima Indians, where the AIR differed by 28% between subjects grouped by genotype (43).

It is recognized that association between an allele and a phenotype may occur by a number of mechanisms: the allele may be the functional variant that affects the phenotype per se, it may be in LD with such a variant, or it may be associated with some unmeasured confounding variable that defines different subpopulations (population stratification). Functional studies have been done for specific associated variants in four of the six candidate genes described above. The ARE1/ARE2 variant in the 3′ untranslated region of PPP1R3 has been shown to affect mRNA stability (38), the Ala54Thr substitution in the coding region of FABP2 has been shown to alter lipid binding and transport characteristics (44), the Pro12Ala substitution in PPARγ2 has been shown to reduce protein activity (45), and intronic SNP43 in CAPN10 has been shown to alter transcription levels (46). In contrast, the G804A polymorphism in IRS-1 predicts a silent substitution and has not been functionally studied because it is assumed to be in high LD with a distal variant in an unknown regulatory region. Functional studies of the Gly482Ser substitution in PGC-1 have not yet been reported.

However, we are now recognizing that identifying a single functional variant within a gene is not always equivalent to identifying the causative variant or variants. A difference that is measured in vitro may be insufficient to cause, or even be unrelated to, the in vivo physiologic perturbations that lead to type 2 diabetes. Studies on CAPN10 have clearly shown that variation in SNP43 alone is not sufficient to explain the difference in type 2 diabetes susceptibility observed at this locus (46). Similarly, recently identified variants in the promoters of both PPARγ2 and FABP2 have been shown to alter in vitro transcription levels (42,47). However, these functional promoter variants are in high LD with the Pro12Ala and Ala54Thr variants, respectively, making it unclear whether gene expression levels, missense coding substitutions, or a combination of both contribute to the pathophysiology underlying the type 2 diabetes–related associations observed at these loci. It should also be noted that previous studies on IRS-1 in numerous populations have focused on a Gly972Arg variant that impairs insulin-stimulated signaling in transfected cells (48), and a meta-analysis of 27 studies estimated a risk ratio for this variant of 1.25 (95% CI 1.05–1.48) (49). In Pima Indians, this variant was monomorphic, but the relative risk for other variants within this gene was of comparable magnitude (39). Once again, there may be multiple functional variations within or near the IRS-1 locus contributing to type 2 diabetes–related phenotypes, or these variants may be in LD with a major, but as yet unidentified, functional variant.
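The statement that two variants are in high LD can be made concrete with the standard pairwise measures D, D′, and r², computed from haplotype and allele frequencies. The sketch below uses hypothetical frequencies and does not represent the Pima data; the function name is also hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (both polymorphic)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D is scaled by the largest magnitude it could take given the allele
    # frequencies; D_prime keeps the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: allele A (10%) is always found on haplotypes
# that also carry allele B (12%).
d, d_prime, r2 = ld_measures(p_ab=0.10, p_a=0.10, p_b=0.12)
```

When r² is close to 1, as between a promoter variant and a nearby coding variant inherited together, association analysis alone cannot separate their effects, which is the ambiguity described above.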


A few functional variants in biologic candidate genes have been identified, but these are only minor pieces in the multigenic puzzle of type 2 diabetes. It is our expectation that variants with larger effects ultimately will be identified in genes that map to a region of linkage (i.e., chromosome 1q21-q24 and 11q23-q24). Whether all the genes that influence type 2 diabetes and obesity in Pima Indians also influence these diseases in other populations remains uncertain. However, the replication of the linkage results on chromosomes 1 and 11 across multiple ethnic groups suggests that important susceptibility genes may be the same, and the identification of these genes could ultimately lead to the development of therapeutic and preventative interventions for populations worldwide. Although disease-locus identification within chromosomal regions identified by genome-wide linkage scans requires an enormous commitment of time and resources, this strategy has been successful for other complex diseases (ADAM33 in asthma, NOD2 in Crohn’s disease, BRCA1/2 in breast cancer [50–54]), and we believe this remains the best strategy to pursue. It is our hope that the recent wealth of information provided in Human Genome databases, SNP databases, and haplotype maps, combined with technological advances in large-scale genotyping methodologies, will accelerate efforts to identify the larger, missing pieces to this complicated puzzle.


  • Additional information for this article can be found in an online appendix at

    • Accepted January 29, 2004.
    • Received October 20, 2003.