Diabetes
Published online September 10, 2007
Diabetes 56:3045-3052, 2007
DOI: 10.2337/db07-0462
© 2007 by the American Diabetes Association

A Search for Variants Associated With Young-Onset Type 2 Diabetes in American Indians in a 100K Genotyping Array

Robert L. Hanson¹, Clifton Bogardus¹, David Duggan², Sayuko Kobes¹, Michele Knowlton², Aniello M. Infante¹, Leslie Marovich², Deb Benitez², Leslie J. Baier¹, and William C. Knowler¹

¹ Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, Arizona
² Translational Genomics Research Institute, Phoenix, Arizona

Address correspondence and reprint requests to Robert L. Hanson, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, 550 E. Indian School Rd., Phoenix, AZ 85014. E-mail: rhanson@phx.niddk.nih.gov

Abbreviations: HRR, hazard rate ratio; PAR, population-attributable risk; SNP, single nucleotide polymorphism


    ABSTRACT
 
OBJECTIVE— To identify genetic variants in linkage disequilibrium with those conferring diabetes susceptibility, a genome-wide association study for young-onset diabetes was conducted in an American-Indian population.

RESEARCH DESIGN AND METHODS— Data come from 300 case subjects with type 2 diabetes with age of onset <25 years and 334 nondiabetic control subjects aged ≥45 years. To provide for tests of within-family association, 121 nondiabetic siblings of case subjects were included along with 140 diabetic siblings of control subjects (172 sibships). Individuals were genotyped on the Affymetrix 100K array, resulting in 80,044 usable single nucleotide polymorphisms (SNPs). SNPs were analyzed for within-family association and for general association in case and control subjects, and these tests were combined by Fisher's method, with priority given to the within-family test.

RESULTS— There were more SNPs with low P values than expected theoretically under the global null hypothesis of no association, and 128 SNPs had evidence for association at P < 0.001. The association of these SNPs with diabetes was further investigated in 1,207 diabetic and 1,627 nondiabetic individuals from the population study who were not included in the genome-wide study. SNPs from 10 genomic regions showed evidence for replication at P < 0.05. These included SNPs on chromosome 3 near ZNF659, chromosome 11 near FANCF, chromosome 11 near ZBTB15, and chromosome 12 near SENP1.

CONCLUSIONS— These studies suggest several regions where marker alleles are potentially in linkage disequilibrium with variants that confer susceptibility to young-onset type 2 diabetes in American Indians.

Type 2 diabetes is substantially influenced by genetic factors, as indicated by studies of familial aggregation (1–3) and twins (4–6); however, the identity of most of the specific variants that influence diabetes susceptibility remains unknown. Consistent, albeit modest, associations have been observed with alleles at PPARG and KCNJ11 (7,8). Recently, Grant et al. (9) identified an association of type 2 diabetes with alleles in TCF7L2. This association, which is of greater magnitude than those for the other polymorphisms, has been widely replicated (10–13). These variants explain only a small fraction of the genetic contribution to type 2 diabetes, so it is likely that additional variants remain unidentified. A number of genome-wide linkage studies have been conducted (14,15), and, while these have revealed several putative susceptibility loci, the variants responsible have not been definitively identified.

Recent advances in technology have produced methods for large-scale genotyping of dense panels of single nucleotide polymorphisms (SNPs). Thus, genome-wide association studies are feasible, and these provide another, potentially powerful, approach for detection of novel variants that influence susceptibility to diabetes. The present study represents a genome-wide association study of type 2 diabetes in the Pima Indians, a population with a high prevalence of obesity and diabetes in whom diabetes, when it occurs, is overwhelmingly (if not exclusively) type 2, even in childhood (16). Analyses of the familial pattern of diabetes in this population show that young-onset diabetes is particularly familial and that genetic determinants are likely to influence age at onset of diabetes (3,17,18); therefore, the present study was designed to detect variants associated with young-onset diabetes.


    RESEARCH DESIGN AND METHODS
 
The present data come from participants in a longitudinal study conducted in the Gila River Indian Community in central Arizona (19). In this study, community residents aged ≥5 years are invited to a research examination every 2 years. These examinations include a 75-g oral glucose tolerance test. Diabetes is diagnosed if the fasting plasma glucose concentration is ≥7.0 mmol/l, the 2-h plasma glucose concentration is ≥11.1 mmol/l (20), or the diagnosis is made during routine clinical care (19). DNA has been extracted from blood leukocytes. To detect variants associated with young-onset diabetes, individuals were selected from the extremes of the age-of-onset distribution. Thus, 300 case subjects were selected who had developed type 2 diabetes before the age of 25 years; for comparison, 334 control subjects were selected who were nondiabetic and aged ≥45 years when last examined. All individuals were full-heritage American Indian, and all individuals fulfilling criteria were selected regardless of affection status of other family members.

Although the case-control approach is potentially powerful for detection of associated variants (21,22), it is liable to spurious results due to population stratification. Therefore, to allow for within-family association tests that are robust to such confounding, siblings of these case or control subjects who defined discordant sibling pairs were selected. Thus, 121 siblings of case subjects were included who were nondiabetic when last examined and whose age was older than that at which the youngest case of diabetes onset in the family occurred. Likewise, 140 diabetic siblings of control subjects with younger age of onset than that of the oldest control subject in the family were included. These individuals constituted 340 discordant sibling pairs in 172 potentially informative sibships.

Population-based association studies.
To examine the potential importance of associated markers on a population basis, a population-based sample of individuals from the longitudinal study was selected for genotyping with selected markers. This sample consisted of all participants from the longitudinal study with available DNA whose heritage was full Pima and/or Tohono O'odham (a closely related tribe); 1,561 of these individuals had diabetes, while 1,940 were nondiabetic. There were 2,834 individuals who were not included in the genome-wide association study, and analyses in these individuals were used to provide a replication of results from the genome-wide study. Analyses in all 3,501 individuals from the population study (who constituted 1,880 sibships) were used to determine population-based parameters. Because of differences in selection criteria, there were 228 individuals in the genome-wide study who were not in the population-based study. Characteristics of individuals in the various studies are shown in Table 1.


TABLE 1 Characteristics of individuals in the genome-wide and population-based studies

 
Genotyping.
DNA was isolated using a proteinase K high-salt ethanol precipitation method. Before genotyping for the genome-wide association study, DNA was purified using Montage PCR plates (Millipore). Individuals were genotyped using the Affymetrix 100K Human Mapping array (Affymetrix, Santa Clara, CA), which contains 115,810 SNPs with known positions on the autosomal and X-chromosomes, according to the manufacturer's protocol. Genotypes were generated using the BRLMM (Robust Linear Model with Mahalanobis distance classifier and the addition of a Bayesian step) algorithm (23). A total of 50 individuals were genotyped in duplicate, and 5,122 SNPs were excluded from analysis that had discrepancy rates of >2.7% or that produced valid genotypes in <85% of individuals. The mean genotypic concordance rate among the 50 pairs of duplicates was 99.5%. Since spurious associations may occur more frequently with very rare alleles, a further 28,215 SNPs with minor allele frequency <1% were excluded. Hardy-Weinberg equilibrium was tested among all genotyped individuals using a continuity correction to produce a better approximate statistic for rarer alleles (24). Since systematic genotyping errors may produce severe violations of Hardy-Weinberg equilibrium, a further 2,429 SNPs were excluded that deviated from Hardy-Weinberg equilibrium at P < 0.001. Thus, the present genome-wide association analyses include results for 80,044 SNPs. Family relationships (parents and siblings) were confirmed by comparison of the observed proportion of alleles identical by state for these markers with that expected. SNPplex (Applied Biosystems, Foster City, CA) was used to genotype individuals in the follow-up population-based studies. Physical positions are given according to NCBI Build 35.
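The SNP-level exclusions described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the Yates-style 0.5 correction is only an assumed stand-in for the continuity correction of reference 24, whose exact form is not given in the text. The thresholds (discrepancy rate >2.7%, call rate <85%, minor allele frequency <1%, Hardy-Weinberg P < 0.001) are taken from the paragraph.

```python
from math import erfc, sqrt


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_p_value(n_aa, n_ab, n_bb, correction=0.5):
    """1-d.f. chi-square test of Hardy-Weinberg proportions.

    ASSUMPTION: subtracting 0.5 from each |observed - expected| stands in
    for the continuity correction cited in the text (reference 24).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = 0.0
    for o, e in zip(observed, expected):
        if e > 0:
            chi2 += max(abs(o - e) - correction, 0.0) ** 2 / e
    # Survival function of a chi-square variable with 1 d.f.
    return erfc(sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, call_rate, discrepancy_rate):
    """Apply the four exclusion thresholds stated in the text."""
    if discrepancy_rate > 0.027 or call_rate < 0.85:
        return False
    if minor_allele_frequency(n_aa, n_ab, n_bb) < 0.01:
        return False
    return hwe_p_value(n_aa, n_ab, n_bb) >= 0.001


# Tally of SNPs on the array minus the three exclusion steps in the text.
REMAINING_SNPS = 115_810 - 5_122 - 28_215 - 2_429
print(REMAINING_SNPS)  # 80044
```

The tally at the end reproduces the 80,044 SNPs reported as analyzed, which confirms that the three exclusion counts are mutually exclusive steps applied in sequence.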

Statistical analysis.
Analyses were performed using the SAS package (SAS Institute, Cary, NC). The association between genotype and case-control status was assessed with logistic regression. Genotype was analyzed as a numeric variable representing the number (0, 1, or 2) of copies of a given allele. For X-chromosome markers outside the pseudoautosomal regions, men were coded as homozygous. To account for the inclusion of multiple individuals in the same sibship, data were analyzed using a modified regressive model in which, for each individual, the prevalence of case status among all of his or her siblings was included as a covariate (25). This produces an approximation to the regressive model of Bonney, in which covariates are used to model the residual phenotypic correlation among siblings (26). To provide for a specific test of within-family association in sibships discordant for diabetes, data were also analyzed with conditional logistic regression (27). All models included sex as a covariate, and the likelihood ratio test was used to assess statistical significance. Odds ratios (ORs) were calculated from regression coefficients. In the event that one allele is absent from case or control subjects, parameter estimates will not reliably converge, but the likelihood ratio statistic is still approximately correct. In these cases, the estimated OR is infinite; therefore, we report an OR of infinity for these SNPs.
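The coding conventions in this paragraph can be illustrated with a short sketch. It shows the additive 0/1/2 genotype coding (with men coded as homozygous for X-chromosome markers), the sibling case-prevalence covariate of the modified regressive model, and the conversion of a regression coefficient to an OR. All function names and the record layout are hypothetical, and the handling of individuals with no siblings in the data (set to 0.0 here) is an assumption, since the text does not say how they were treated. The model fitting itself, done in SAS in the paper, is not reproduced.

```python
from collections import defaultdict
from math import exp


def code_genotype(allele_copies, x_linked=False, male=False):
    """Additive coding: number of copies (0, 1, or 2) of the counted allele.

    For X-chromosome markers outside the pseudoautosomal regions, a man
    carries a single allele and is coded as homozygous (0 or 2).
    """
    if x_linked and male:
        if allele_copies not in (0, 1):
            raise ValueError("a hemizygous male carries 0 or 1 copies")
        return 2 * allele_copies
    if allele_copies not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2 copies")
    return allele_copies


def sibling_case_prevalence(records):
    """Covariate for the modified regressive model.

    records: iterable of (person_id, sibship_id, is_case).
    Returns {person_id: proportion of that person's siblings who are cases},
    excluding the person themselves.
    ASSUMPTION: a person with no siblings in the data gets 0.0.
    """
    cases = defaultdict(int)
    sizes = defaultdict(int)
    for _, sibship_id, is_case in records:
        sizes[sibship_id] += 1
        cases[sibship_id] += int(is_case)
    covariate = {}
    for person_id, sibship_id, is_case in records:
        n_sibs = sizes[sibship_id] - 1
        if n_sibs == 0:
            covariate[person_id] = 0.0
        else:
            covariate[person_id] = (cases[sibship_id] - int(is_case)) / n_sibs
    return covariate


def odds_ratio(beta):
    """Per-allele odds ratio from a logistic regression coefficient."""
    return exp(beta)
```

For example, in a sibship with two case subjects and one control subject, each case subject receives a covariate of 0.5 and the control subject receives 1.0.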

To summarize the results of case-control and within-family analyses, P values for the two tests were combined with Fisher's method (28). To maintain robustness to population stratification, priority was given to the P value for the within-family test (PWI). This was accomplished by calculating a one-sided P value for the case-control test (PCC1) for an association in the same direction as that observed in the within-family analysis. The contribution of the case-control result (PCC*) to the combined test statistic was taken as the maximum of PWI and 1 – (1 – PCC1)², where PCC1 employs a correction to partially account for the fact that the two tests have been performed in some of the same individuals. The combined test statistic was calculated as follows:

$$\chi^2_{CC\text{-}WI} = -2\left[\ln(P_{WI}) + \ln(P_{CC*})\right]$$

Simulation studies show that this method augments power of the within-family test by using information from the general test while maintaining robustness to population stratification (29). If PWI and PCC* were independent, the P value associated with χ²CC-WI on 4 d.f. would have a uniform distribution under the null hypothesis (28). However, because PCC* is truncated to be ≥PWI and because of correlation among the tests, this "nominal" P value for χ²CC-WI does not have a uniform distribution. To generate a P value corrected for these distributional deviations, 50 million pairs of standardized test statistics, representing within-family and case-control logarithms of the OR divided by their SEs, were simulated from a bivariate distribution with correlation of 0.32 (the observed correlation among these statistics for 80,044 SNPs). In these simulated data, χ²CC-WI was calculated for each pair of tests, and the P value for the value of χ²CC-WI observed for a given SNP was taken as the proportion of replicates at which the test statistic for these simulated values exceeded the observed value.
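The combined test and its simulation-based correction can be sketched as follows. This is a minimal illustration under stated assumptions: the combined statistic is taken to be the usual Fisher form, minus twice the sum of the logarithms of PWI and PCC*; the bivariate distribution is assumed to be standard normal; and far fewer replicates are drawn than the 50 million used in the study. The function names, the seed, and the small floor on P values are conveniences of this sketch, not part of the original method.

```python
import random
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
P_FLOOR = 1e-300  # guards against log(0) for extreme z; sketch convenience


def p_values_from_z(z_wi, z_cc):
    """Two-sided within-family P and one-sided case-control P.

    The case-control P is one-sided in the direction of the
    within-family effect, as the text describes.
    """
    p_wi = max(2.0 * STD_NORMAL.cdf(-abs(z_wi)), P_FLOOR)
    direction = 1.0 if z_wi >= 0 else -1.0
    p_cc1 = STD_NORMAL.cdf(-direction * z_cc)
    return p_wi, p_cc1


def combined_chi2(p_wi, p_cc1):
    """Fisher-type statistic with priority given to the within-family test."""
    p_cc_star = max(p_wi, 1.0 - (1.0 - p_cc1) ** 2)
    return -2.0 * (log(p_wi) + log(p_cc_star))


def corrected_p(observed_chi2, rho=0.32, n_sim=100_000, seed=1):
    """Proportion of simulated null statistics exceeding the observed one.

    ASSUMPTION: pairs are drawn from a bivariate standard normal with
    correlation rho (0.32 is the correlation reported in the text).
    """
    rng = random.Random(seed)
    tail = sqrt(1.0 - rho * rho)
    exceed = 0
    for _ in range(n_sim):
        z_wi = rng.gauss(0.0, 1.0)
        z_cc = rho * z_wi + tail * rng.gauss(0.0, 1.0)
        if combined_chi2(*p_values_from_z(z_wi, z_cc)) > observed_chi2:
            exceed += 1
    return exceed / n_sim
```

Because PCC* can never fall below PWI, a strong case-control result cannot by itself produce a large statistic; the within-family test bounds the contribution, which is the sense in which it is given priority.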

Permutations.
To assess experiment-wide statistical significance, affection status was permuted across individuals, and analyses of the association of genotypes with permuted affection status were repeated genome-wide. A total of 200 permutations were thus analyzed. To maintain the familial nature of the study, all data for a sibship were permuted in tandem (across sibships of the same size), and affection status of individuals, including covariates, was then further permuted within the sibship. The proportion of permutations in which the maximum of χ²CC-WI exceeded the value observed for a given SNP was taken as the experiment-wide P value for that SNP. To compare the global distribution of test statistics with that expected under the null hypothesis of no association with any marker, the signed Kolmogorov-Smirnov deviation (d) was calculated for the observed distribution in comparison with the null distribution from each permutation (30). This statistic is the maximum deviation, over the full distribution of P values, of the observed cumulative distribution function (f) for any given observed P value [f(P)obs] from the value expected under the null [f(P)null]:

$$d = i \cdot \max_{P} \left| f(P)_{obs} - f(P)_{null} \right|$$
where i is an indicator variable that is 1 if f(P)obs ≥ f(P)null and –1 if f(P)obs < f(P)null at the point of maximum deviation. Under the global null hypothesis, the expected value of d is 0; thus, the test that the mean d = 0 across all permutations provides an empirical test of the global null hypothesis. Values of d > 0 indicate a shift toward lower P values in the observed compared with the null distribution. Since this analysis assumes exchangeability among men and women, it was restricted to the 78,568 autosomal SNPs.
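The two permutation-based summaries described above can be sketched as follows: the experiment-wide P value for a SNP, and the signed Kolmogorov-Smirnov deviation between the observed P values and those from one permutation. The function names are hypothetical, and the code assumes the test statistics and P values have already been computed for the observed data and for each permutation.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, stdev


def experiment_wide_p(observed_chi2, max_chi2_per_permutation):
    """Proportion of permutations whose genome-wide maximum statistic
    exceeds the statistic observed for a given SNP."""
    n_exceed = sum(1 for m in max_chi2_per_permutation if m > observed_chi2)
    return n_exceed / len(max_chi2_per_permutation)


def signed_ks_deviation(p_obs, p_null):
    """Largest gap between the empirical CDFs of observed and null P values,
    signed so that d > 0 means an excess of low P values in the observed data."""
    obs = sorted(p_obs)
    null = sorted(p_null)
    best = 0.0
    # The largest gap between two step functions occurs at a data point.
    for x in sorted(set(obs) | set(null)):
        f_obs = bisect_right(obs, x) / len(obs)
        f_null = bisect_right(null, x) / len(null)
        diff = f_obs - f_null
        if abs(diff) > abs(best):
            best = diff
    return best


def mean_and_se(deviations):
    """Mean d across permutations and its standard error."""
    return mean(deviations), stdev(deviations) / sqrt(len(deviations))
```

In use, `signed_ks_deviation` would be called once per permutation (200 times in the study), and `mean_and_se` applied to the resulting values to test whether the mean d differs from 0.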

Analysis of population study.
SNPs that had the strongest associations in the genome-wide study were genotyped and analyzed in the population sample to examine the association on a population basis. To assess replication in a separate group of individuals, the association of genotype with prevalence of diabetes at the last available examination was analyzed among the 2,834 individuals not included in the genome-wide association study. These analyses were conducted using logistic regression models fit with generalized estimating equations to account for correlation among siblings (31). Within-family tests of association were also conducted using a modification of the method of Abecasis et al. (32), which partitions the association into between- and within-family components. In this method, the sibship mean of the numeric variable representing genotype is used to assess the between-family effects, and each individual's deviation from this mean is used to assess within-family effects. The P value for this within-family coefficient (PWI) was further combined with the general association result using the modification of Fisher's method described above to produce a summary test of these two effects, with the difference that PWI was calculated in a one-sided fashion to ensure that claims of replication would only be made if the direction of association was the same as that observed in the genome-wide association study. The distribution-corrected P value was calculated as described above by simulation of 50 million pairs of test statistics from a bivariate distribution in which the correlation was 0.52.
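The between- and within-family partition described above amounts to simple arithmetic on the numeric genotype. The sketch below shows only that decomposition; the record layout and function name are hypothetical, and the regression models in which the two components are entered are not reproduced.

```python
from collections import defaultdict


def partition_genotypes(records):
    """Split each numeric genotype into between- and within-family parts.

    records: iterable of (person_id, sibship_id, genotype), genotype in 0/1/2.
    Returns {person_id: (between, within)} where `between` is the sibship
    mean genotype and `within` is the individual's deviation from that mean.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, sibship_id, genotype in records:
        totals[sibship_id] += genotype
        counts[sibship_id] += 1
    components = {}
    for person_id, sibship_id, genotype in records:
        between = totals[sibship_id] / counts[sibship_id]
        components[person_id] = (between, genotype - between)
    return components
```

By construction, the within-family component sums to zero in every sibship, and it is zero for an individual with no siblings in the data, so such individuals contribute only to the between-family effect.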

Analyses of the potential importance of these SNPs in the population were conducted among the 3,501 individuals from the population study. The hazard rate ratio (HRR) for diabetes associated with each copy of the marker allele was estimated. In this model, individuals who developed diabetes were considered to have developed the disease at the age of onset observed in the longitudinal study, while nondiabetic individuals were considered to be at risk for diabetes until the age at last examination. To account for familial resemblance among siblings, these analyses were conducted with a generalized estimating equations model in which diabetes incidence rates were represented as a Poisson function. The HRR was used to calculate population-attributable risk (PAR) for each marker (33). Where PLL, PLH, and PHH represent, respectively, the proportion of individuals homozygous for the low-risk genotype and heterozygous and homozygous for the high-risk genotype, the PAR, under a multiplicative model, is as follows:

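$$\mathrm{PAR} = \frac{P_{LH}(\mathrm{HRR} - 1) + P_{HH}(\mathrm{HRR}^2 - 1)}{P_{LL} + P_{LH}\,\mathrm{HRR} + P_{HH}\,\mathrm{HRR}^2}$$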

The PAR represents the proportion by which diabetes incidence would decrease if all individuals had the same risk as individuals with the low-risk genotype.
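As a worked illustration of this definition, the sketch below computes the PAR under the assumption that the multiplicative model assigns relative risks of 1, HRR, and HRR² to the three genotypes, so that the PAR is the excess of the population-average relative risk over 1, divided by that average. The function name and the example frequencies are hypothetical and are not values from the study.

```python
def population_attributable_risk(p_ll, p_lh, p_hh, hrr):
    """PAR under a multiplicative (per-allele) model.

    p_ll, p_lh, p_hh: genotype proportions (low-risk homozygote,
    heterozygote, high-risk homozygote); they must sum to 1.
    hrr: hazard rate ratio per copy of the high-risk allele.
    """
    if abs(p_ll + p_lh + p_hh - 1.0) > 1e-9:
        raise ValueError("genotype proportions must sum to 1")
    if hrr <= 0:
        raise ValueError("hazard rate ratio must be positive")
    # Population-average risk relative to the low-risk genotype.
    mean_relative_risk = p_ll + p_lh * hrr + p_hh * hrr ** 2
    return (mean_relative_risk - 1.0) / mean_relative_risk


# Illustrative values only: allele frequency 0.5, HRR of 2 per allele.
print(round(population_attributable_risk(0.25, 0.50, 0.25, 2.0), 3))  # 0.556
```

The example shows why a modest HRR can still give a large PAR when the risk allele is common: most of the population carries at least one copy.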


    RESULTS
 
The distribution of P values for all 80,044 SNPs is shown in Fig. 1, along with the distribution expected under the global null hypothesis. Overall, the observed distribution of P values was similar to that expected under the null hypothesis; however, there was a slight excess of low P values beyond that expected, and this is more apparent if the portion of the distribution at P < 0.05 is examined (Fig. 1B). The average Kolmogorov-Smirnov deviation comparing the observed and expected distributions in 200 permutations was significantly greater than 0 (average d = 0.012, SE 0.00036, and P < 0.0001). This indicates that the observed distribution contains a statistically significantly greater proportion of low P values than would be expected under the global null hypothesis of no association with any SNP. Similar results were obtained with the within-family test alone (d = 0.006, SE 0.00035, and P < 0.0001). To explore genotyping artifact as a potential source of bias, analyses were repeated with more stringent thresholds for Hardy-Weinberg equilibrium, genotype success rate, and minor allele frequency with similar results. For the 25,915 SNPs with P > 0.05 for Hardy-Weinberg equilibrium, genotype success rate >0.99, and minor allele frequency >0.1, results were very similar (e.g., for the within-family test, d = 0.008, SE 0.00058, and P < 0.0001).


FIG. 1. Cumulative distribution of P values observed for all 80,044 SNPs in the genome-wide association study compared with the expected distribution under the null hypothesis of no association with any marker. A: Entire distribution. B: Distribution below P = 0.05.

 
Results of tests for individual SNPs for association with young-onset type 2 diabetes are shown in Fig. 2. There were several regions where ≥1 SNP showed fairly strong evidence for association. The SNPs with the lowest P values were rs686989 (P = 2.7 × 10⁻⁶, which corresponds with an experiment-wide P value of 0.11) and rs672849 (P = 1.5 × 10⁻⁵, experiment-wide P = 0.55), both on chromosome 11 at 113.54 Mb; rs2164000 on chromosome 9 at 18.75 Mb (P = 2.3 × 10⁻⁵, experiment-wide P = 0.69); and rs10520926 on chromosome 5 at 25.36 Mb (P = 2.6 × 10⁻⁵, experiment-wide P = 0.73). The two chromosome 11 SNPs were highly concordant (r² = 0.99). There were 128 SNPs with P < 0.001 (~80 expected under the null), and these SNPs were further genotyped in individuals from the larger population. The 128 SNPs are listed in supplementary Table 1 (available in an online appendix at http://dx.doi.org/10.2337/db07-0462).


FIG. 2. P values for association with diabetes across all chromosomes. The x-axis represents the position of SNPs on each chromosome. The y-axis is the negative of the base 10 logarithm of the P value (higher values represent greater statistical significance). For ease of presentation, only SNPs with P < 0.01 are shown.

 
Of these SNPs, 11 (out of 119 successfully genotyped) also showed evidence for association with diabetes (P < 0.05), in the same direction as in the genome-wide study, among individuals from the population who were not included in the genome-wide study. One would expect approximately six such replications by chance at this level of significance. Results of analyses for these SNPs are shown in Table 2. The chromosome 11 SNPs at 113.54 Mb (rs686989 and rs672849) and the chromosome 5 SNP at 25.36 Mb (rs10520926) were among those replicated. In general, the HRR in the population study was lower than the OR from the original study; this is expected given the selection procedure and the fact that the OR only approximates the HRR under certain assumptions (e.g., where incident cases are sampled or the disease is rare [34]). The HRR for most alleles is modest, and those with high PAR are largely those for which the risk alleles are at high frequency.


TABLE 2 Association of SNPs detected in the genome-wide association study with evidence for replicated association in additional individuals

 
To assess the extent to which SNPs identified as associated with type 2 diabetes in the present study replicated in other populations, data from three other studies conducted with the Affymetrix 100K array were analyzed for all 646 SNPs with P < 0.007 in the present study. These data were provided by investigators from the Amish Family Diabetes Study (35), the Framingham Heart Study (36), and the Starr County Diabetes Study of Mexican-Americans (37). P values for an association in the same direction observed in the present study were combined across all three other studies by Fisher's method (28). Of the SNPs, 88.2% had valid data for all three other studies, 6.0% for two other studies, 5.6% for one other study, and 0.2% (one SNP) for none. There were 30 SNPs, shown in Table 3, that replicated the results of the present study at P < 0.05 (~32 expected under the null). None of these overlap with those shown in Table 2. The overall analysis strategy and the number of SNPs selected at various stages of analysis are summarized in Fig. 3.
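The chance expectations quoted in the Results follow from multiplying the number of tests by the significance threshold. The short sketch below reproduces that arithmetic for the three stages reported in the text; it treats the tests as if the threshold applied uniformly and takes no account of linkage disequilibrium among SNPs, so it gives expected counts only, not a significance test. The stage labels are descriptive names chosen here.

```python
def expected_null_hits(n_tests, alpha):
    """Expected number of tests below `alpha` if no SNP is associated."""
    return n_tests * alpha


# (stage, number of SNPs tested, threshold, number observed below threshold)
STAGES = [
    ("genome-wide study, P < 0.001", 80_044, 0.001, 128),
    ("population replication, P < 0.05", 119, 0.05, 11),
    ("other 100K studies, combined P < 0.05", 646, 0.05, 30),
]

for label, n_tests, alpha, observed in STAGES:
    expected = expected_null_hits(n_tests, alpha)
    print(f"{label}: observed {observed}, expected about {expected:.1f}")
```

The expected counts come to roughly 80, 6, and 32, matching the figures given in the text: the first two stages show an excess over chance, while the comparison with other populations does not.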


TABLE 3 SNPs with evidence for replicated association with type 2 diabetes (combined P value <0.05) in Amish, Framingham, and Starr County studies

 

FIG. 3. Diagrammatic representation of the present study, along with molecular and analytic strategies to assess replication.

 

    DISCUSSION
 
Type 2 diabetes is partially genetically determined (1–6), but susceptibility variants conclusively identified to date explain only a small portion of the familial risk. It is therefore likely that there are additional susceptibility variants that have not yet been identified. Genome-wide association studies are a potentially powerful way to detect these variants. The present analysis represents a genome-wide association study in Pima Indians, a population with a high prevalence of type 2 diabetes and obesity (3,19). Diabetes in this population is highly familial, particularly when it occurs at young ages (3,17,18,38); therefore, the present study was designed to detect variants associated with young-onset diabetes. Although case-control and related association tests can be powerful, they are liable to confounding by population stratification (39,40). In contrast, within-family tests, though less powerful, are robust to such confounding (40). In the present study, both case-control and family-based designs were employed and results combined so that the case-control results augment the power of the within-family test while maintaining robustness to population stratification (29). Thus, markers identified with this approach are likely to reflect both association and linkage with diabetes susceptibility variants.

Many of the associations observed in the present study have P values that are quite strong by conventional criteria. However, the interpretation of P values in genetic association studies is problematic. Because the prior probability that any given variant is in linkage disequilibrium with a disease susceptibility allele is low, many statisticians recommend very stringent criteria for declaration of statistical significance, e.g., P values from 10⁻⁵ to 10⁻⁸ (41–43). The problem is compounded in genome-wide studies, where because of multiple tests, one would expect some strong associations to occur by chance. Classical methods of adjustment for multiple comparisons, such as Bonferroni, are too stringent for genome-wide association studies because they assume that tests are independent and ignore linkage disequilibrium among markers. Furthermore, while corrections for multiple testing proceed on the assumption that the global null hypothesis of no association with any marker is of interest (44), the corrections are applied on a marker-wise basis and, thus, may not efficiently utilize information from the full distribution of P values relevant to this global null.

In the present analysis, a permutation procedure was used to assess experiment-wide statistical significance. This procedure maintains the linkage disequilibrium structure among markers. In addition, the Kolmogorov-Smirnov test was used to compare the overall distribution of P values observed in the actual data with that observed in data permuted under the null hypothesis. Although no single marker achieved experiment-wide statistical significance, analysis of the distribution of P values resulted in strong rejection of the global null hypothesis of no association with any marker. This result is expected if there are multiple susceptibility variants because, in such a situation, a test of the full distribution can be more powerful than a test of a single variant. A systematic bias is an alternative possibility that is difficult to completely exclude, but the present techniques are robust to bias from population stratification. The present results were also unchanged when restricted to SNPs with increased stringency of Hardy-Weinberg equilibrium, allele frequency, and genotype success rate, and this suggests that they are not explained by bias due to genotypic artifacts that can be captured by these measures. The number of true functional susceptibility variants is difficult to determine, given that modest linkage disequilibrium may extend over long distances and that many associations may be due to chance. Thus, while the present results strongly suggest that some of the low P values observed reflect linkage disequilibrium with diabetes susceptibility loci, the distinction between true- and false-positive results is difficult.

Replication of the association in a separate group of individuals provides additional evidence that a marker is associated with diabetes. In the present study, SNPs with the strongest evidence for association in the genome-wide study were further evaluated in a separate set of individuals from the population. (However, some of these individuals were related to those in the genome-wide study.) Several SNPs showed nominal evidence for association (P < 0.05) in this replication set. This analysis is limited in power because the vast majority of individuals from the extremes of the age-of-onset distribution, who provide much of the statistical power, were excluded due to participation in the genome-wide study. In addition, one would expect some of the markers to show association by chance. Thus, some false-positives are likely to remain among the SNPs that replicated and some true positives among those that did not. However, the probability that SNPs showing replication are in linkage disequilibrium with diabetes susceptibility variants is enhanced above that for the SNPs for which replication was not observed. The result for rs686989 is of particular interest because it is in a region identified as linked to diabetes and obesity in a previous genome-wide linkage study in this population (15); however, the association with rs686989 itself does not explain the linkage signal (data not shown).

The PAR, calculated in a group representative of the full population, provides a measure of the potential importance of each associated SNP, and this information can be used to prioritize regions for follow-up studies. The PAR is calculated on the assumption that the observed association is causal; therefore, it may be underestimated if the marker is not highly concordant with a functional allele or overestimated if confounded by population stratification. In the present analysis, the PAR was calculated from longitudinal data observed in the population study. Thus, in contrast to estimates derived from case-control studies based on prevalent cases, the present results do not depend on the questionable assumptions of a rare disease or of sampling only incident cases (34).

Studies of these SNPs in other populations may also be relevant in prioritization of regions for follow-up studies. The present results were further compared with those obtained from three other genome-wide association studies of type 2 diabetes on the Affymetrix 100K array (35–37). Comparison among studies is complicated by the fact that all had different study designs; the present study focused on young-onset diabetes and gave priority to within-family tests. (Characteristics of each study are presented in supplementary Tables 2 and 3.) Nonetheless, several of the SNPs identified in the present study had some evidence for association in the other studies (overall P < 0.05). These regions are also high priority for follow-up studies. It is noteworthy that rs516415, which is also in the diabetes-linked region on chromosome 11q, showed replication, as did rs1886004, which is in a region on chromosome 1q linked to diabetes in Pima Indians and other populations (14,15). However, none of the SNPs with evidence for replication in the Pima Indians at P < 0.05 also had a combined P < 0.05 in these other studies. This may reflect low power across the studies to detect alleles of modest effect.

The power of genetic association studies depends on the frequency of the functional polymorphisms and on the magnitude of their effects (22). Given that case subjects represent the lower 10% of the age-of-onset distribution and control subjects the upper 45%, we estimate that the present sample size has ~75% power to detect a common (minor allele frequency >0.1) functional allele at P < 0.001 that explains 3% of the variance in age at onset of diabetes. Power also depends on the extent to which one of the typed markers is strongly concordant with an allele that influences disease susceptibility (21,22). Since the 100K array used in the present study does not exhaustively capture allelic variation, it is possible that additional regions that contain diabetes susceptibility variants were not identified. The pattern of linkage disequilibrium among SNPs identified in the HapMap project suggests that ~30% of common variants have r² > 0.8 with a marker on this array in non-African populations (45). American Indians are not included in the HapMap, and they tend to share relatively few common haplotypes with HapMap populations (46). This may be reflected in the fact that a larger proportion of markers were nearly monomorphic in the present study than in the other studies that used the 100K array (35–37), and this could result in lower power for this array to detect associations in American Indians. On the other hand, surveys of various populations suggest that linkage disequilibrium is higher among American Indians than in many other populations, such that fewer variants are required to capture common haplotypes (46). Thus, fixed marker sets, such as the 100K array, may capture common variation in American Indians to an extent similar to that in other non-African populations.
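The r² criterion used to judge whether an array marker captures an untyped variant can be sketched from haplotype frequencies. This is a minimal illustration of the standard definition for two biallelic loci; the function names and the example frequencies are hypothetical and are not taken from the study data.

```python
def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation between alleles at two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        # A monomorphic marker carries no information about the other locus.
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


def is_well_captured(p_ab: float, p_a: float, p_b: float, threshold: float = 0.8) -> bool:
    """True if the marker tags the variant above the r^2 threshold quoted in the text."""
    return r_squared(p_ab, p_a, p_b) > threshold


if __name__ == "__main__":
    # Hypothetical frequencies: marker and variant alleles almost always co-occur.
    print(round(r_squared(0.28, 0.30, 0.30), 3))
    print(is_well_captured(0.28, 0.30, 0.30))
```

The explicit error for a monomorphic locus mirrors the observation above: markers that are nearly monomorphic in a population contribute little to coverage of common variation there.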

Recently, several high-density genome-wide association studies have been conducted in northern European populations, and these have identified six gene regions (apart from TCF7L2, PPARG, and KCNJ11) that contain putative diabetes susceptibility variants (47–50). None of the SNPs most strongly associated in the present study was in any of these regions. Furthermore, none of the SNPs consistently associated across these other genome-wide studies is well captured by the 100K array. However, as shown in Table 4, some SNPs in these regions had modest evidence for association in the present study (P < 0.05). These putative diabetes susceptibility variants have been largely identified in northern Europeans, and it is not clear whether they play a significant role in diabetes in American Indians or other high-risk populations. A more exhaustive survey of variation in these regions is required to quantify their role in diabetes susceptibility in the Pima Indian population. The variants in TCF7L2 that are most strongly associated with type 2 diabetes in other populations (9–13) have been typed in the Pima Indians, in whom there is little evidence that they influence diabetes risk (51).



 
TABLE 4 Strongest association results in present study for SNPs in regions identified as associated with type 2 diabetes in multiple high-density genome-wide association studies

 
Genome-wide mapping studies are typically only an initial step in the elucidation of susceptibility variants. The present analyses have identified several regions that may harbor genetic variants that influence susceptibility to young-onset diabetes in American Indians. Several of the associated SNPs are in or near genes, including ZNF659 (chromosome 3, 21.47 Mb), FANCF (chromosome 11, 22.60 Mb), ZBTB15 (chromosome 11, 113.54 Mb), and SENP1 (chromosome 12, 46.71 Mb). Fine-mapping studies of these regions are needed to confidently localize the signals to specific genes. Confirmation of the role of genes in the regions identified in the present study will require replication studies in other populations and, ultimately, functional studies.


    ACKNOWLEDGMENTS
 
This research was supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK).

We thank Jose Florez for review of the manuscript. We also thank the staff of the Diabetes Epidemiology and Clinical Research Section and the Diabetes Molecular Genetics Section (NIDDK, Phoenix, Arizona) for assistance. We further thank members of the Gila River Indian Community who participated in the study.


    FOOTNOTES
 
Published ahead of print at http://diabetes.diabetesjournals.org on 10 September 2007. DOI: 10.2337/db07-0462.

Additional information for this article can be found in an online appendix at http://dx.doi.org/10.2337/db07-0462.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received for publication April 3, 2007 and accepted in revised form September 5, 2007


    REFERENCES

  1. Rich SS: Mapping genes in diabetes: genetic epidemiological perspective. Diabetes 39:1315–1319, 1990[Abstract]
  2. Hanson RL, Knowler WC: Type 2 diabetes and maturity-onset diabetes of the young. In Analysis of Multifactorial Disease. Bishop T, Sham P, Eds. Oxford, U.K., BIOS Scientific Publishers, 2000, p.131–147
  3. Knowler WC, Pettitt DJ, Saad MF, Bennett PH: Diabetes mellitus in the Pima Indians: incidence, risk factors and pathogenesis. Diabetes Metab Rev 6:1–27, 1990[Medline]
  4. Newman B, Selby JV, King MC, Slemenda C, Fabsitz R, Friedman GD: Concordance for type 2 (non-insulin-dependent) diabetes mellitus in male twins. Diabetologia 30:763–768, 1987[Medline]
  5. Kaprio J, Tuomilehto J, Koskenvuo M, Romanov K, Reunanen A, Eriksson J, Stengård J, Kesäniemi YA: Concordance for type 1 (insulin-dependent) and type 2 (non-insulin-dependent) diabetes mellitus in a population-based cohort of twins in Finland. Diabetologia 35:1060–1067, 1992[Medline]
  6. Matsuda A, Kuzuya T: Diabetic twins in Japan. Diabetes Res Clin Pract 24 (Suppl.):S63–S67, 1994[Medline]
  7. Altshuler D, Hirschhorn JN, Klannemark M, Lindgren CM, Vohl MC, Nemesh J, Lane CR, Schaffner SF, Bolk S, Brewer C, Tuomi T, Gaudet D, Hudson TJ, Daly M, Groop L, Lander ES: The common PPAR{gamma} Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 26:76–80, 2000[Medline]
  8. Gloyn AL, Weedon MN, Owen KR, Turner MJ, Knight BA, Hitman G, Walker M, Levy JC, Sampson M, Halford S, McCarthy MI, Hattersley AT, Frayling TM: Large-scale association studies of variants in genes encoding the pancreatic ß-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes 52:568–572, 2003[Abstract/Free Full Text]
  9. Grant SFA, Thorleifsson G, Reynisdottir I, Benediktsson R, Manolescu A, Sainz J, Helgason A, Stefansson H, Emilsson V, Helgadottir A, Styrkarsdottir U, Magnusson KP, Walters GB, Palsdottir E, Jonsdottir T, Gudmundsdottir T, Gylfason A, Saemundsdottir J, Wilensky RL, Reilly MP, Rader DJ, Bagger Y, Christiansen C, Gudnason V, Sigurdsson G, Thorsteinsdottir U, Gulcher JR, Kong A, Stefansson K: Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat Genet 38:320–323, 2006[Medline]
  10. Zeggini E, McCarthy MI: TCF7L2: the biggest story in diabetes genetics since HLA? Diabetologia 50:1–4, 2007[Medline]
  11. Helgason A, Pálsson S, Thorleifsson G, Grant SFA, Emilsson V, Gunnarsdottir S, Adeyemo A, Chen Y, Chen G, Reynisdottir I, Benediktsson R, Hinney A, Hansen T, Andersen G, Borch-Johnsen K, Jorgensen T, Schäfer H, Faruque M, Doumatey A, Zhou J, Wilensky RL, Reilly MP, Rader DJ, Bagger Y, Christiansen C, Sigurdsson G, Hebebrand J, Pedersen O, Thorsteinsdottir U, Gulcher JR, Kong A, Rotimi C, Stefánsson K: Refining the impact of TCF7L2 gene variants on type 2 diabetes and adaptive evolution. Nat Genet 39:218–225, 2007[Medline]
  12. Florez JC, Jablonski KA, Bayley N, Pollin TI, de Bakker PIW, Shuldiner AR, Knowler WC, Nathan DM, Altshuler D, the Diabetes Prevention Program Research Group: TCF7L2 polymorphisms and progression to diabetes in the Diabetes Prevention Program. N Engl J Med 355:241–250, 2006[Abstract/Free Full Text]
  13. Chandak GR, Janipalli CS, Bhaskar S, Kulkarni SR, Mohankrishna P, Hattersley AT, Frayling TM, Yajnik CS: Common variants in the TCF7L2 gene are strongly associated with type 2 diabetes mellitus in the Indian population. Diabetologia 50:63–67, 2007[Medline]
  14. McCarthy MI: Growing evidence for diabetes susceptibility genes from genome scan data. Curr Diab Rep 3:159–167, 2003[Medline]
  15. Hanson RL, Ehm MG, Pettitt DJ, Prochazka M, Thompson DB, Timberlake D, Foroud T, Kobes S, Baier L, Burns DK, Almasy L, Blangero J, Garvey WT, Bennett PH, Knowler WC: An autosomal genomic scan for loci linked to type II diabetes mellitus and body-mass index in Pima Indians. Am J Hum Genet 63:1130–1138, 1998[Medline]
  16. Dabelea D, Palmer JP, Bennett PH, Pettitt DJ, Knowler WC: Absence of glutamic acid decarboxylase antibodies in Pima Indian children with diabetes. Diabetologia 42:1265–1266, 1999[Medline]
  17. Hanson RL, Knowler WC: Analytic strategies to detect linkage to a common disorder with genetically determined age of onset: diabetes mellitus in Pima Indians. Genet Epidemiol 15:299–315, 1998[Medline]
  18. Hanson RL, Elston RC, Pettitt DJ, Bennett PH, Knowler WC: Segregation analysis of non-insulin-dependent diabetes mellitus in Pima Indians: evidence for a major-gene effect. Am J Hum Genet 57:160–170, 1995[Medline]
  19. Knowler WC, Bennett PH, Hamman RF, Miller M: Diabetes incidence and prevalence in Pima Indians: a 19-fold greater incidence than in Rochester, Minnesota. Am J Epidemiol 108:497–505, 1978[Abstract/Free Full Text]
  20. The Expert Committee on the Diagnosis and Classification of Diabetes Mellitus: Report of the Expert Committee on the Diagnosis and Classification of Diabetes Mellitus. Diabetes Care 20:1183–1197, 1997[Medline]
  21. Zondervan KT, Cardon LR: The complex interplay among factors that influence allelic association. Nat Rev Genet 5:89–100, 2004[Medline]
  22. Hanson RL, Looker HC, Ma L, Muller YL, Baier LJ, Knowler WC: Design and analysis of genetic association studies to finely map a locus identified by linkage analysis: sample size and power calculations. Ann Hum Genet 70:332–349, 2006
  23. BRLMM: an improved genotype calling method for the GeneChip Human Mapping 500K Array Set [article online], 2006. Available from http://www.affymetrix.com/support/technical/whitepapers/brlmm_whitepaper.pdf. Accessed 25 March 2007
  24. Emigh TH: A comparison of tests for Hardy-Weinberg equilibrium. Biometrics 36:627–642, 1980
  25. Schnell AH, Karunaratne PM, Witte JS, Dawson DV, Elston RC: Modeling age of onset and residual familial correlations for the linkage analysis of bipolar disorder. Genet Epidemiol 14:675–680, 1997[Medline]
  26. Bonney GE: Regressive logistic models for familial disease and other binary traits. Biometrics 42:611–625, 1986[Medline]
  27. Witte JS, Gauderman WJ, Thomas DC: Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs. Am J Epidemiol 149:693–705, 1999[Abstract/Free Full Text]
  28. Elston RC: On Fisher's method of combining p-values. Biometrical J 33:339–345, 1991
  29. Hanson RL, Knowler WC: Design and analysis of genetic association studies to finely map a locus identified by linkage analysis: assessment of the extent to which an association can account for the linkage. Ann Hum Genet. 12 July 2007 [Epub ahead of print]
  30. Massey FJ: The Kolmogorov-Smirnov test for goodness of fit. J Am Stat Assoc 253:68–78, 1951
  31. Zeger SL, Liang KY: Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42:121–130, 1986[Medline]
  32. Abecasis GR, Cardon LR, Cookson WOC: A general test of association for quantitative traits in nuclear families. Am J Hum Genet 66:279–292, 2000[Medline]
  33. Kleinbaum DG, Kupper LL, Morgenstern H: Measures of potential impact and summary of the measures. In Epidemiologic Research: Principles and Quantitative Methods. Kleinbaum DG, Kupper LL, Morgenstern H, Eds. New York, Van Nostrand Reinhold Company, 1982, p.159–180
  34. Walter SD: The estimation and interpretation of attributable risk in health research. Biometrics 32:829–849, 1976[Medline]
  35. Rampersaud E, Damcott CM, O'Connell J, McArdle P, Shen H, Fu M, Shelton J, Ying J, Shi X, Ott SH, Zhang L, Zhao Y, Mitchell BD, Shuldiner AR: Identification of novel candidate genes in the Old Order Amish with replication in independent genome-wide association scans (GWAS) of type 2 diabetes. Diabetes 56:3053–3062, 2007[Abstract/Free Full Text]
  36. Florez JC, Manning AK, Dupuis J, McAteer J, Irenze K, Gianniny L, Mirel DB, Fox CS, Cupples LA, Meigs JB: A 100K genome-wide association scan for diabetes and related traits in the Framingham Heart Study: replication and integration with other genome-wide datasets. Diabetes 56:3063–3074, 2007[Abstract/Free Full Text]
  37. Hayes MG, Pluzhnikov A, Miyake K, Sun Y, Below JE, Ng MCY, Roe CA, Bell GI, Cox NJ, Hanis CL: Identification and replication of novel type 2 diabetes genes in Mexican Americans through genome-wide association studies. Diabetes 56:3033–3044, 2007[Abstract/Free Full Text]
  38. Baier LJ, Hanson RL: Genetic studies of the etiology of type 2 diabetes in Pima Indians: hunting for pieces to a complicated puzzle. Diabetes 53:1181–1186, 2004[Free Full Text]
  39. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG: Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet 43:520–526, 1988[Medline]
  40. Cardon LR, Bell JI: Association study designs for complex diseases. Nat Rev Genet 2:91–99, 2001[Medline]
  41. Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science 273:1516–1517, 1996[Abstract/Free Full Text]
  42. Colhoun HM, McKeigue PM, Davey Smith G: Problems of reporting genetic associations with complex outcomes. Lancet 361:865–872, 2003[Medline]
  43. Manly KF: Reliability of statistical associations between genes and disease. Immunogenetics 57:549–558, 2005[Medline]
  44. Rothman KJ: No adjustments are needed for multiple comparisons. Epidemiology 1:43–46, 1990[Medline]
  45. Pe'er I, de Bakker PIW, Maller J, Yelensky R, Altshuler D, Daly MJ: Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat Genet 38:663–667, 2006[Medline]
  46. Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, Rosenberg NA, Pritchard JK: A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 38:1251–1260, 2006[Medline]
  47. Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, Boutin P, Vincent D, Belisle A, Hadjadj S, Balkau B, Heude B, Charpentier G, Hudson TJ, Montpetit A, Pshezhetsky AV, Prentki M, Posner BI, Balding DJ, Meyre D, Polychronakos C, Froguel P: A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445:881–885, 2007[Medline]
  48. Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PIW, Chen H, Roix JR, Kathiresan S, Hirschhorn JN, Daly MJ, Hughes TE, Groop L, Altshuler D: Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316:1331–1335, 2007[Abstract/Free Full Text]
  49. Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JRB, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney ASF; Wellcome Trust Case Control Consortium (WTCCC), McCarthy MI, Hattersley AT: Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316:1336–1341, 2007[Abstract/Free Full Text]
  50. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M: A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316:1341–1345, 2007[Abstract/Free Full Text]
  51. Guo T, Hanson RL, Traurig M, Muller YL, Ma L, Mack J, Kobes S, Knowler WC, Bogardus C, Baier LJ: TCF7L2 is not a major susceptibility gene for type 2 diabetes in Pima Indians: analysis of 3,501 individuals. Diabetes 56:3075–3088, 2007[Abstract/Free Full Text]



This article has been cited by other articles:

S. S. Rich, J. M. Norris, and J. I. Rotter
Genes Associated With Risk of Type 2 Diabetes Identified by a Candidate-Wide Association Scan: As a Trickle Becomes a Flood
Diabetes, November 1, 2008; 57(11): 2915 - 2917.

L. Ma, R. L. Hanson, L. N. Que, Y. Guo, S. Kobes, C. Bogardus, and L. J. Baier
PCLO Variants Are Nominally Associated With Early-Onset Type 2 Diabetes and Insulin Resistance in Pima Indians
Diabetes, November 1, 2008; 57(11): 3156 - 3160.

K. J. Gaulton, C. J. Willer, Y. Li, L. J. Scott, K. N. Conneely, A. U. Jackson, W. L. Duren, P. S. Chines, N. Narisu, L. L. Bonnycastle, et al.
Comprehensive Association Study of Type 2 Diabetes and Related Quantitative Traits With 222 Candidate Genes
Diabetes, November 1, 2008; 57(11): 3136 - 3144.

K. D. Taylor, J. M. Norris, and J. I. Rotter
Genome-Wide Association: Which Do You Want First: the Good News, the Bad News, or the Good News?
Diabetes, December 1, 2007; 56(12): 2844 - 2848.

E. Rampersaud, C. M. Damcott, M. Fu, H. Shen, P. McArdle, X. Shi, J. Shelton, J. Yin, Y.-P. C. Chang, S. H. Ott, et al.
Identification of Novel Candidate Genes for Type 2 Diabetes From a Genome-Wide Association Scan in the Old Order Amish: Evidence for Replication From Diabetes-Related Quantitative Traits and From Independent Populations
Diabetes, December 1, 2007; 56(12): 3053 - 3062.

M. G. Hayes, A. Pluzhnikov, K. Miyake, Y. Sun, M. C.Y. Ng, C. A. Roe, J. E. Below, R. I. Nicolae, A. Konkashbaev, G. I. Bell, et al.
Identification of Type 2 Diabetes Genes in Mexican Americans Through Genome-Wide Association Studies
Diabetes, December 1, 2007; 56(12): 3033 - 3044.

J. C. Florez, A. K. Manning, J. Dupuis, J. McAteer, K. Irenze, L. Gianniny, D. B. Mirel, C. S. Fox, L. A. Cupples, and J. B. Meigs
A 100K Genome-Wide Association Scan for Diabetes and Related Traits in the Framingham Heart Study: Replication and Integration With Other Genome-Wide Datasets
Diabetes, December 1, 2007; 56(12): 3063 - 3074.
