Comprehensive Association Study of Type 2 Diabetes and Related Quantitative Traits With 222 Candidate Genes

  1. Kyle J. Gaulton1,
  2. Cristen J. Willer2,
  3. Yun Li2,
  4. Laura J. Scott2,
  5. Karen N. Conneely2,
  6. Anne U. Jackson2,
  7. William L. Duren2,
  8. Peter S. Chines3,
  9. Narisu Narisu3,
  10. Lori L. Bonnycastle3,
  11. Jingchun Luo4,
  12. Maurine Tong3,
  13. Andrew G. Sprau3,
  14. Elizabeth W. Pugh5,
  15. Kimberly F. Doheny5,
  16. Timo T. Valle6,
  17. Gonçalo R. Abecasis2,
  18. Jaakko Tuomilehto6,7,8,
  19. Richard N. Bergman9,
  20. Francis S. Collins3,
  21. Michael Boehnke2 and
  22. Karen L. Mohlke1
  1. 1Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  2. 2Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan
  3. 3Genome Technology Branch, National Human Genome Research Institute, Bethesda, Maryland
  4. 4Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  5. 5Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland
  6. 6Diabetes and Genetic Epidemiology Unit, Department of Epidemiology and Health Promotion, National Public Health Institute, Helsinki, Finland
  7. 7Department of Public Health, University of Helsinki, Helsinki, Finland
  8. 8South Ostrobothnia Central Hospital, Seinäjoki, Finland
  9. 9Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, California
  1. Corresponding author: Karen Mohlke, mohlke{at}med.unc.edu

Abstract

OBJECTIVE—Type 2 diabetes is a common complex disorder with environmental and genetic components. We used a candidate gene–based approach to identify single nucleotide polymorphism (SNP) variants in 222 candidate genes that influence susceptibility to type 2 diabetes.

RESEARCH DESIGN AND METHODS—In a case-control study of 1,161 type 2 diabetic subjects and 1,174 control Finns who are normal glucose tolerant, we genotyped 3,531 tagSNPs and annotation-based SNPs and imputed an additional 7,498 SNPs, providing 99.9% coverage of common HapMap variants in the 222 candidate genes. Selected SNPs were genotyped in an additional 1,211 type 2 diabetic case subjects and 1,259 control subjects who are normal glucose tolerant, also from Finland.

RESULTS—Using SNP- and gene-based analysis methods, we replicated previously reported SNP-type 2 diabetes associations in PPARG, KCNJ11, and SLC2A2; identified significant SNPs in genes with previously reported associations (ENPP1 [rs2021966, P = 0.00026] and NRF1 [rs1882095, P = 0.00096]); and implicated novel genes, including RAPGEF1 (rs4740283, P = 0.00013) and TP53 (rs1042522, Arg72Pro, P = 0.00086), in type 2 diabetes susceptibility.

CONCLUSIONS—Our study provides an effective gene-based approach to association study design and analysis. One or more of the newly implicated genes may contribute to type 2 diabetes pathogenesis. Analysis of additional samples will be necessary to determine their effect on susceptibility.

Type 2 diabetes is a metabolic disorder characterized by insulin resistance and pancreatic β-cell dysfunction and is a leading cause of morbidity and mortality in the U.S. and worldwide. The incidence of type 2 diabetes is rapidly increasing, with 1.6 million new cases of diabetes diagnosed in individuals aged ≥20 years in the U.S. in 2007 (available at http://www.diabetes.niddk.nih.gov/dm/pubs/statistics/). While environmental factors play a major role in predisposition to type 2 diabetes, substantial evidence supports the influence of genetic factors on disease susceptibility. For example, the twin concordance rate is an estimated 34% for monozygotic twins and 16% for dizygotic twins (1). However, the underlying genetic variants are just beginning to be identified (2).

Numerous published reports (3–5) have identified associations between type 2 diabetes and common genetic variants in human populations; however, until very recently, variants in only a few genes have been consistently replicated across populations and with large sample sizes. Among these are the Pro12Ala (rs1801282) variant in peroxisome proliferator–activated receptor γ (PPARG) (6), the Glu23Lys (rs5219) variant in the potassium channel gene KCNJ11 (7), and several variants in the Wnt-receptor signaling pathway member TCF7L2 (8).

Recent genome-wide studies have implicated many previously unreported genes in type 2 diabetes susceptibility. The first reported genome-wide association (GWA) scan implicated variants at five susceptibility loci that include TCF7L2 and novel loci near the genes SLC30A8, IDE-KIF11-HHEX, LOC387761, and EXT2-ALX4 (9). Three companion GWA studies (10–12), including one by our group, replicated evidence for PPARG, KCNJ11, TCF7L2, SLC30A8, and IDE-KIF11-HHEX and provided new evidence for CDKAL1, CDKN2A-CDKN2B, IGF2BP2, FTO, and a region of chromosome 11 with no annotated genes. Further GWA studies (13–18) provided additional evidence for TCF7L2, CDKAL1, and SLC30A8. The candidate genes WFS1 (19) and TCF2 (20,21) have also been confirmed in large samples, bringing the current list of type 2 diabetes susceptibility loci to at least 10. Even so, these recently discovered loci explain only a small fraction (∼2.3%) of the overall risk of type 2 diabetes (12). Therefore, novel susceptibility genes remain to be identified through increasingly comprehensive analyses of both individual genes and the entire genome.

The Finland-U.S. Investigation of Type 2 Diabetes Genetics (FUSION) study aims to identify variants influencing susceptibility to type 2 diabetes and related quantitative traits in the Finnish population (22). FUSION has previously identified modest type 2 diabetes association in Finns with variants in HNF4A (23); four genes known to cause maturity-onset diabetes of the young (5,23,24); PPARG, KCNJ11, ENPP1, SLC2A2, PCK1, TNF, IL6 (5), and TCF7L2 (25); and the loci identified in the GWA studies.

As a complementary approach to GWA studies, which are conducted without a priori biological hypotheses, we sought to perform an in-depth analysis of >200 genes likely to influence susceptibility to type 2 diabetes and quantitative trait variation that we selected by applying CandidAtE Search And Rank (CAESAR), a text- and data-mining algorithm (26). We aimed to analyze the full spectrum of HapMap-based common variation in each of these candidate genes. The combination of high throughput genotyping, linkage disequilibrium (LD) information from HapMap (27), the ability to impute ungenotyped variants (28), and the improved functional annotation of the genome makes in-depth candidate gene–based association analysis possible.

RESEARCH DESIGN AND METHODS

The stage 1 sample set consisted of 2,335 Finnish individuals from the FUSION (22,29) and Finrisk 2002 (30) studies (Table 1) (online appendix Table 1A [available at http://dx.doi.org/10.2337/db07-1731]). The sample included 1,161 individuals with type 2 diabetes and 1,174 control subjects with normal glucose tolerance. Diabetes was defined according to 1999 World Health Organization criteria (fasting plasma glucose concentration ≥7.0 mmol/l or 2-h plasma glucose concentration ≥11.1 mmol/l), by report of diabetes medication use, or based on medical record review. Normal glucose tolerance was defined as having fasting glucose <6.1 mmol/l and 2-h glucose <7.8 mmol/l. A total of 120 FUSION offspring with genotyped parents were included for quantitative trait analysis; all offspring had normal glucose tolerance except one type 2 diabetic individual who was included in the case sample.
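The glucose thresholds above can be written as a small decision rule. The following Python sketch is illustrative only: the function name, argument names, and return labels are hypothetical, and medical record review, which the study also used to define diabetes, is not modeled.

```python
from typing import Optional


def classify_glucose_status(fasting_mmol_l: float,
                            two_hour_mmol_l: float,
                            on_diabetes_medication: bool = False) -> Optional[str]:
    """Classify a subject using the thresholds quoted in the text.

    Returns "diabetes", "NGT" (normal glucose tolerance), or None when the
    subject meets neither definition (hypothetical labels for illustration).
    """
    # 1999 WHO criteria as quoted: fasting >= 7.0 or 2-h >= 11.1 mmol/l,
    # or reported use of diabetes medication.
    if on_diabetes_medication or fasting_mmol_l >= 7.0 or two_hour_mmol_l >= 11.1:
        return "diabetes"
    # Normal glucose tolerance: fasting < 6.1 and 2-h < 7.8 mmol/l.
    if fasting_mmol_l < 6.1 and two_hour_mmol_l < 7.8:
        return "NGT"
    # Intermediate values fall into neither group.
    return None
```

Subjects with intermediate values (for example, fasting glucose of 6.5 mmol/l) satisfy neither definition under this rule, which is why the function returns None for them rather than forcing a label.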

Stage 2 consisted of 2,473 Finnish individuals (Table 1) (online appendix Table 1B) and included 1,215 individuals with type 2 diabetes and 1,258 control subjects with normal glucose tolerance (10). A total of 56 duplicate samples were used for quality control. The sample sets are identical to those used in the FUSION GWA study (10). Study protocols were approved by local ethics committees and/or institutional review boards, and informed consent was obtained from all study participants.

Gene selection.

A total of 222 candidate genes were selected for study using two strategies. Two hundred and seventeen candidate genes were selected using CAESAR, an algorithm that prioritizes candidate genes for complex human traits based on trait-relevant functional annotation (26). Given a trait-relevant input text, CAESAR 1) uses text mining to extract gene symbols and to find and rank terms present in four biomedical ontologies (gene ontology biological process [31], gene ontology molecular function [31], eVOC anatomy [32], and mammalian phenotype ontology [33]) based on frequency of occurrence, 2) uses the ranked ontology terms and extracted gene symbols to data mine several public databases for human genes annotated with the ontology terms or extracted gene symbols, and 3) integrates the resulting gene annotation lists to provide a combined score and rank for each gene. Details of gene selection using custom parameters for CAESAR are provided in the online appendix.

Five genes were not ranked high enough to have been included using CAESAR. ENPP1, HFE, WFS1, and ZNHIT3 were included because each had one or more single nucleotide polymorphisms (SNPs) associated with type 2 diabetes (P < 0.1) in a prior study of a subset of FUSION samples (6) (C.J.W., L.L.B., M.B., and K.L.M., unpublished data); in addition, ENPP1 and WFS1 had been previously studied as type 2 diabetes candidate genes. CAPN10 was included because it had been previously studied by FUSION (34) and others (35,36).

SNP selection.

We defined the “transcribed region” of each of the 222 candidate genes as the sequence including the first exon of any transcribed isoform through the last exon of any transcribed isoform, and we aimed to capture variation up to 10 kb upstream and 5 kb downstream of the transcribed region (−10 kb/+5 kb). In this process, we allowed SNPs to be located as far as 50 kb upstream and 50 kb downstream (−50 kb/+50 kb) of the transcribed region if they tagged a −10-kb/+5-kb SNP at r2 > 0.8.
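As a rough illustration of the region definition above, the sketch below computes the −10-kb/+5-kb window and the wider −50-kb/+50-kb window around a transcribed region. It assumes that "upstream" and "downstream" are taken relative to the direction of transcription, so the offsets swap on the minus strand; the function name and coordinate convention are hypothetical and not taken from the study.

```python
def gene_window(tx_start: int, tx_end: int, strand: str,
                upstream: int = 10_000, downstream: int = 5_000) -> tuple:
    """Return (low, high) genomic coordinates of a gene region.

    tx_start < tx_end are the outermost coordinates of the transcribed
    region (first exon of any isoform through last exon of any isoform).
    """
    if tx_start >= tx_end:
        raise ValueError("tx_start must be smaller than tx_end")
    if strand == "+":
        low, high = tx_start - upstream, tx_end + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        low, high = tx_start - downstream, tx_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the start of the chromosome.
    return max(low, 0), high


# Narrow (-10 kb/+5 kb) and extended (-50 kb/+50 kb) windows for a toy gene.
narrow = gene_window(100_000, 150_000, "+")
extended = gene_window(100_000, 150_000, "+", upstream=50_000, downstream=50_000)
```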

Briefly, 3,531 SNPs were selected for stage 1 genotyping as follows. We selected SNPs from the Illumina Infinium II HumanHap300 BeadChip that tagged one or more −10-kb/+5-kb SNPs (r2 > 0.8). Then, to evaluate each gene region more comprehensively, we selected 1) additional tagSNPs and 2) functionally annotated non-HapMap SNPs for genotyping on an Illumina GoldenGate panel. We also included eight SNPs that had been previously genotyped in candidate gene studies on a smaller subset of FUSION samples (5). Additional details of SNP selection are provided in the online appendix.

Genotyping.

Stage 1 genotyping of 317,503 SNPs was performed at the Center for Inherited Disease Research on the HumanHap300 BeadChip using the Illumina Infinium II assay protocol (10), and 1,527 SNPs were genotyped in partnership with the Mammalian Genotyping Core at the University of North Carolina using the Illumina GoldenGate assay. We performed additional genotyping for eight previously reported SNPs (5) using the Sequenom homogeneous MassEXTEND assay and four imputed SNPs using Applied Biosystems TaqMan allelic discrimination assays. There was a genotype consistency rate of >99.88% between each platform, using 79 duplicate samples. Stage 2 genotyping of 31 SNPs was performed using the homogeneous MassEXTEND assay; there was a genotype consistency rate of 100%, using 56 duplicate samples. SNP and sample success rates and quality-control filters are described in the online appendix.

Imputation.

We used MACH, a computationally efficient hidden Markov model–based algorithm (available at http://www.sph.umich.edu/csg/abecasis/MACH/) (28), to impute genotypes in FUSION samples for 7,498 common (minor allele frequency [MAF] > 0.05) HapMap SNPs present in the target regions but not genotyped in our study. To improve the quality of imputation near the ends of the target regions, we used at least 1 Mb of flanking genotype information to impute SNPs in target regions.

Coverage of HapMap SNPs.

Coverage was calculated as the percentage of all common (MAF > 0.05) HapMap Release 21 CEU SNPs in the −10-kb/+5-kb gene regions that are tagged by a genotyped SNP at an r2 threshold of at least 0.8.
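The coverage definition above amounts to counting target SNPs that have at least one genotyped tag at or above the r2 threshold. A minimal sketch follows, assuming pairwise r2 values are available as a dictionary keyed by unordered SNP pairs and that a target SNP that was itself genotyped counts as covered; the function name, the data layout, and the rs identifiers in the example are hypothetical placeholders.

```python
def hapmap_coverage(target_snps, genotyped_snps, pairwise_r2, threshold=0.8):
    """Percentage of target SNPs tagged by a genotyped SNP at r2 >= threshold."""
    targets = list(target_snps)
    if not targets:
        raise ValueError("no target SNPs supplied")
    genotyped = set(genotyped_snps)
    covered = 0
    for snp in targets:
        if snp in genotyped:
            # A genotyped SNP tags itself (r2 = 1).
            covered += 1
        elif any(pairwise_r2.get(frozenset((snp, tag)), 0.0) >= threshold
                 for tag in genotyped):
            covered += 1
    return 100.0 * covered / len(targets)


# Toy example with placeholder identifiers.
r2 = {
    frozenset(("rsB", "rsX")): 0.92,  # rsB is tagged by rsX
    frozenset(("rsC", "rsX")): 0.55,  # below threshold, not tagged
}
pct = hapmap_coverage(["rsA", "rsB", "rsC", "rsD"], ["rsA", "rsX"], r2)
```

In this toy example two of the four targets are covered (one genotyped directly, one tagged), so `pct` is 50.0.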

Type 2 diabetes association analysis.

Genotyped SNPs were tested for type 2 diabetes association using logistic regression under additive (Padd), dominant, and recessive genetic models with adjustment for 5-year age category, sex, and birth province. Imputed SNPs were tested for type 2 diabetes association using logistic regression under an additive model (Pimpute), with the expected allele count in place of the allele count and adjusted for the same covariates. This approach takes into account the degree of uncertainty of genotype imputation in a computationally efficient manner by replacing allele counts (0, 1, and 2) at the marker locus by predicted allele counts based on estimated probabilities of 0, 1, or 2 copies of a SNP allele (available at http://www.sph.umich.edu/csg/abecasis/MACH/) (28).
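The expected allele count used for imputed SNPs is the probability-weighted mean of the possible allele counts 0, 1, and 2. The sketch below shows only that calculation; the function name and the tolerance check are illustrative assumptions, and the sketch does not reproduce how MACH reports its output.

```python
def expected_allele_count(p0: float, p1: float, p2: float) -> float:
    """Expected number of copies of an allele given genotype probabilities.

    p0, p1, p2 are the estimated probabilities of carrying 0, 1, or 2
    copies; they are assumed to sum to 1 (within a small tolerance).
    """
    total = p0 + p1 + p2
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # E[count] = 0*p0 + 1*p1 + 2*p2
    return p1 + 2.0 * p2


# A confidently imputed heterozygote versus an uncertain call.
confident = expected_allele_count(0.01, 0.98, 0.01)
uncertain = expected_allele_count(0.10, 0.30, 0.60)
```

The resulting value replaces the integer allele count as the predictor in the additive-model regression, so an uncertain call contributes an intermediate value rather than a hard genotype.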

We accounted for carrying out multiple correlated tests using the P value adjusted for correlated tests (PACT) method (37). The PACT method was used to correct the minimum P value among 1) tests of three genetic models for a single SNP (PSNP) and 2) multiple SNPs and models across a gene region (Pgene). Details are provided in the online appendix. We determined the independence of significant association signals in genes by including one SNP as a covariate in logistic regression and reassessing the evidence for association with the other SNPs.

Quantitative trait analysis.

We tested all genotyped and imputed SNPs for association with 20 type 2 diabetes–related quantitative traits, including, in control subjects only, fasting insulin, fasting glucose, homeostasis model assessment, and fasting free fatty acids; and, in all samples, BMI, weight, waist circumference, hip circumference, waist-to-hip ratio, waist-to-height² ratio, total cholesterol, HDL cholesterol, LDL cholesterol, triglyceride level, cholesterol-to-HDL ratio, triglyceride-to-HDL ratio, diastolic blood pressure, systolic blood pressure, pulse, and pulse pressure.

For case and control subjects separately, we regressed the quantitative trait variables on age, age2, sex, birth province, and study indicator and transformed the residuals of each quantitative trait to approximate normality using inverse normal scores, which involves ranking the residual values and then converting these to z-scores according to quantiles of the standard normal distribution. We then carried out association analysis on the residuals. To allow for relatedness, regression coefficients were estimated in the context of a variance component model that also accounted for background polygenic effects (38). For genotyped SNPs, we tested for association using the residuals under an additive model. For imputed SNPs, we tested for association using the residuals and the expected allele count in place of the allele count under an additive model. Case and control results were combined using meta-analysis, as described in the online appendix.
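The inverse normal transformation described above can be sketched in a few lines: rank the residuals, convert ranks to quantiles, and map the quantiles through the standard normal inverse CDF. The rank-to-quantile rule used here, (rank − 0.5)/n with average ranks for ties, is an assumption made for illustration; the text does not state which offset was used.

```python
from statistics import NormalDist


def inverse_normal_scores(residuals, offset=0.5):
    """Rank-based inverse normal transform of a list of residuals."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and residuals[order[j + 1]] == residuals[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # (rank - offset) / n lies strictly between 0 and 1 for offset = 0.5.
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


z = inverse_normal_scores([0.3, -1.2, 2.5])
```

The transformed values keep the ordering of the residuals but follow an approximately standard normal distribution, which limits the influence of outlying trait values on the regression.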

RESULTS

We studied 222 candidate genes for type 2 diabetes association in our stage 1 sample of 1,161 type 2 diabetic case subjects and 1,174 control subjects with normal glucose tolerance from the FUSION study (Table 1). Of 10,762 target HapMap SNPs (MAF > 0.05) in the −10-kb/+5-kb gene regions, 3,531 genotyped SNPs cover 10,299 (95.7%) SNPs at an r2 threshold of 0.8. This represents an improvement over the genome-wide HumanHap300 genotyped SNPs, which alone cover 79.0% of the target SNPs at r2 ≥ 0.8 (Table 2). A total of 3,187 of 3,531 genotyped SNPs are located in the −10-kb/+5-kb regions. Of the remaining 7,575 ungenotyped target SNPs, 7,498 were successfully imputed. Altogether, 99.9% of all target variation was genotyped, imputed, or tagged (r2 ≥ 0.8) by an analyzed SNP.

We evaluated the significance of genotyped SNPs in each gene region after correcting for multiple SNPs tested while accounting for the LD between SNPs, designated Pgene (37). Given six pairs of adjacent genes (see online appendix), we analyzed 216 distinct gene regions for type 2 diabetes association (online appendix Table 2). SNPs in four gene regions (rs11183212 in ARID2 [Pgene = 0.0029], rs2235718 in FOXC1 [Pgene = 0.0028], rs8069976 in SOCS3 [Pgene = 0.0037], and rs222852 in SLC2A4 [Pgene = 0.0024]) were significantly associated with type 2 diabetes at Pgene < 0.005, although no Pgene result reached a study-wide significance of 0.00023, a threshold determined using a Bonferroni correction. SNPs in 19 genes were significant at Pgene < 0.05, including SNPs in three genes previously implicated in type 2 diabetes susceptibility in FUSION (5) (Table 3). There was an excess of significant Pgene results at both thresholds (4 at Pgene < 0.005 [P = 0.024]; 19 at Pgene < 0.05 [P = 0.013]). The excess of significant results at Pgene < 0.005 is maintained after excluding 1) seven genes showing prior evidence of association with any SNP in FUSION samples (P = 0.022) or 2) five genes not selected by CAESAR (P = 0.022), as no excluded genes were significant at that threshold (see online appendix).
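Two of the quantities in this paragraph can be checked with simple arithmetic. The sketch below computes the Bonferroni threshold for 216 gene regions and, as a rough check only, a binomial upper-tail probability for observing at least a given number of significant regions. The binomial calculation assumes independent gene regions and is not presented as the authors' procedure, which is described in the online appendix; the function names are hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), assuming independent tests."""
    below_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


# Study-wide threshold for 216 gene regions at alpha = 0.05.
threshold = bonferroni_threshold(0.05, 216)  # about 0.00023

# Chance of seeing 4 or more of 216 regions below 0.005 under the null.
excess_p = binomial_upper_tail(216, 0.005, 4)
```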

To evaluate all 3,531 genotyped SNPs (online appendix Table 3), we permuted the case/control status to estimate whether an excess of significant results was observed. A total of 214 SNPs showed significant type 2 diabetes association at a PSNP threshold of 0.05, and, of these, 26 were associated at a PSNP threshold of 0.005 (Table 4). There was modest, but not significant, excess at both of these PSNP thresholds (observed = 214, expected = 183.3, P = 0.09 and observed = 26, expected = 18.9, P = 0.12, respectively). The most significant PSNP value of 3.6 × 10⁻⁴ was observed for rs11183212, an intronic SNP in the ARID2 gene, but when compared with an empirical distribution of the most significant P values, this SNP does not reach a study-wide significance threshold of 6.3 × 10⁻⁵, based on 1,000 permutations. In the combined stage 1 and 2 sample, we have >99% power (80% in stage 1 alone) to detect the most strongly associated previously observed type 2 diabetes SNP, rs7903146 in TCF7L2 (9–12), at a study-wide significance level, and substantially less power to detect type 2 diabetes–associated SNPs with smaller effect sizes.
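A permutation-based study-wide threshold of the kind described above can be derived from the smallest P value recorded in each permuted data set. The sketch below takes such a list as given (how each permuted data set is analyzed is outside its scope) and returns the value that only a fraction alpha of permutations reach; the function name and the choice of order statistic are illustrative assumptions rather than the authors' exact procedure.

```python
from math import floor


def studywide_threshold(min_p_values, alpha=0.05):
    """Empirical significance threshold from per-permutation minimum P values.

    Returns the k-th smallest minimum P value, where k = floor(alpha * N),
    so that a fraction alpha of the N permutations have a minimum P value
    at or below the threshold.
    """
    ordered = sorted(min_p_values)
    # Small epsilon guards against floating-point error in alpha * N.
    k = floor(alpha * len(ordered) + 1e-9)
    if k < 1:
        raise ValueError("too few permutations for this alpha")
    return ordered[k - 1]


def is_studywide_significant(observed_min_p, min_p_values, alpha=0.05):
    """True if the observed minimum P value is at or below the threshold."""
    return observed_min_p <= studywide_threshold(min_p_values, alpha)
```

With 1,000 permutations and alpha = 0.05, the threshold under this rule is the 50th smallest of the permutation minimum P values.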

Nineteen of 216 gene regions have at least one SNP significantly associated with type 2 diabetes at PSNP < 0.005; among these, Pro12Ala (rs1801282) in PPARG (PSNP = 0.0025) was the only SNP that matched or was in high LD (r2 ≥ 0.8) with a previously reported variant, given the available HapMap LD information. Imputation identified 421 additional SNPs in 59 genes significantly associated with type 2 diabetes (Pimpute < 0.05) (online appendix Table 4), including SNPs in 10 genes that did not contain a significant genotyped SNP (PSNP > 0.05). We genotyped four of these initially imputed SNPs that were both significantly associated with type 2 diabetes (Pimpute < 0.05) and for which the imputation-based P value was at least five times more significant than that for any nearby genotyped SNP; three of four SNPs had highly concordant imputed and genotyped P values (online appendix Table 5).

We selected for follow-up genotyping in stage 2 samples 24 SNPs that were either significant at PSNP < 0.005 or, if a nonsynonymous variant, significant at PSNP < 0.01 (Table 1). The most significant SNPs in the combined stage 1 and 2 samples were rs4740283 in RAPGEF1 (PSNP = 0.00013), rs2021966 in ENPP1 (PSNP = 0.00026), Arg72Pro (rs1042522) in TP53 (PSNP = 0.00086), and rs1882095 in NRF1 (PSNP = 0.00096). In total, 16 SNPs were significant at PSNP < 0.05 in the combined stage 1 and 2 samples (Table 4).

To evaluate the effect of BMI, we included BMI as an additional covariate in an analysis of the additive model for all genotyped and imputed SNPs. Of 11 SNPs originally significant at Padd < 0.001, all P values were similar (Padd < 0.01) after adjustment (online appendix Table 6A). Of 16 SNPs significant at Padd < 0.001 after adjustment, two SNPs had notably less significant P values (Padd > 0.01) before adjustment; both SNPs are located at the TRIP10/C3 locus (online appendix Table 6B).

Four genotyped and 30 imputed SNPs were strongly associated (P < 0.0001) with one or more of 20 quantitative traits after combining case and control subjects by meta-analysis (see research design and methods) (Table 5 and online appendix Table 7). Variants in APOE and PPARA showed strong evidence of association with serum lipid levels, confirming previous reports (39,40). Strong novel associations (P < 1 × 10⁻⁴) were observed for rs4912407 in PRKAA2 with triglyceride level (P = 3.68 × 10⁻⁶), rs10517844 in CPE with HDL level (P = 2.07 × 10⁻⁵), and rs4689388 in WFS1 with LDL level (P = 5.30 × 10⁻⁵). We followed up genotyped SNPs significantly associated (P < 0.0001) with one or more quantitative traits by genotyping the stage 2 samples. No SNP showed study-wide significance in the combined stage 1 and 2 samples (Table 5).

DISCUSSION

In this study, we evaluated the evidence for type 2 diabetes association for SNPs in 222 candidate genes and provided a framework for thorough analysis of association of common variation with disease using gene-based functional annotation, HapMap LD information, and imputation of genotypes. This framework could be used in the context of a GWA study or an independent investigation of candidate genes. We replicated previous type 2 diabetes association with SNPs in PPARG, KCNJ11, and SLC2A2; identified significant SNPs in genes previously implicated in type 2 diabetes risk, NRF1 and ENPP1; and identified additional genes that may influence susceptibility to type 2 diabetes and related quantitative traits, including RAPGEF1 and TP53. While some of the genes may be significant by chance, one or more may represent true susceptibility genes. We expect that true susceptibility genes identified in our sample set will, in many cases, be shared in additional populations, as the FUSION GWA study identified many of the same risk alleles as other GWA studies of European populations (9–13).

To assess the role of 222 genes in susceptibility to type 2 diabetes, we attempted to achieve complete coverage of common (MAF > 0.05) SNPs in the HapMap CEU database. The coverage of common HapMap CEU SNPs across all 222 candidate genes using genotyped SNPs was 95.7%, a 16.7 percentage point improvement over the coverage of 79.0% based on the Illumina HumanHap300 genome-wide panel (Table 2). HapMap provides excellent coverage of common variation in European samples; however, there are additional non-HapMap SNPs in these gene regions (27). Of 122 genotyped SNPs not in HapMap, 10 were not tagged at an r2 threshold of 0.8 by a HapMap SNP, indicating that some of the non-HapMap variation is better covered in our study than in the GWA study panel.

The SNP most strongly associated with type 2 diabetes in the combined stage 1 and 2 samples was rs4740283 (PSNP = 0.00013), located 4 kb downstream of Rap guanine nucleotide exchange factor 1 (RAPGEF1). RAPGEF1 is a ubiquitously expressed gene involved in insulin signaling (41) and Ras-mediated tumor suppression (42). rs4740283 is in strong LD with SNPs in the coding region and may affect either a regulatory element or protein function. Variation in this gene may contribute to susceptibility through reduced ability of peripheral tissues to absorb glucose in response to insulin.

Another of the most strongly associated SNPs in the stage 1 and 2 samples was Arg72Pro in TP53 (rs1042522, PSNP = 0.00086), which was originally identified by imputation, subsequently genotyped, and not well tagged by any originally genotyped SNP (maximum r2 = 0.27 with rs2909430). TP53 encodes the tumor suppressor protein p53, and the Arg72Pro variant has a functional role in the efficiency of p53 in inducing apoptosis, possibly through reduced localization to the mitochondria (43). The risk allele Arg72 has higher apoptotic potential, which is consistent with a possible link between increased pancreatic β-cell apoptosis, impaired insulin secretion, and type 2 diabetes.

We observed significant association with SNPs in two genes previously implicated in type 2 diabetes susceptibility, nuclear respiratory factor 1 (NRF1) and the insulin-dependent facilitated glucose transporter SLC2A2. NRF1 helps regulate mitochondrial transcription and oxidative phosphorylation (44), which has a known role in insulin resistance, and the associated NRF1 variant, rs1882095, is located 1 kb downstream of the gene and not in modest LD (r2 > 0.6) with any HapMap SNP. In SLC2A2 we found supporting evidence in stage 1 for the nonsynonymous variant Thr110Ile (rs5400) (PSNP = 0.0065), as well as a previously unreported variant, rs10513684 (PSNP = 0.0046). The rs10513684 signal became slightly more significant after stage 2 genotyping (PSNP = 0.0023); however, the signal was attenuated (P = 0.18) after inclusion of Thr110Ile in the analysis.

Among the most significant type 2 diabetes–associated SNPs is rs2021966 in ENPP1 (PSNP = 0.00026). SNPs in high LD with rs2021966 are located in intron 1, in a region of strong multispecies conservation containing a pseudogene but no known transcripts. Previous studies of ENPP1 have reported associations with rs1044498 and with a related three-SNP haplotype (rs1044498, rs1799774, and rs7754561) and support a modest role in type 2 diabetes susceptibility, possibly acting through obesity (45). In our study, rs1044498 (PSNP = 0.16) and rs7754859 (PSNP = 0.18, r2 = 1 with rs7754561) were not significantly associated with type 2 diabetes (rs1799774 was not tested). The newly identified variants are in very low LD with rs1044498 (r2 < 0.05).

Although we observed significant quantitative trait associations in previously implicated genes (APOE and PPARA with serum lipid levels), no quantitative trait associations became more significant after addition of stage 2 samples (Table 5). This is likely due in part to the small number of SNPs selected for follow-up. Stage 2 genotyping of SNPs less significant in stage 1 samples will be necessary to establish whether any novel SNPs contribute to quantitative trait variability.

In any gene-based study, the definition of gene boundaries is critical but, by necessity, somewhat arbitrary. We defined a gene region as 10 kb upstream of the first known exon through 5 kb downstream of the last known exon in an attempt to capture the majority of nearby regulatory elements influencing a gene. Regulatory elements, however, can often be found up to several hundred kilobases away from a gene (46). We evaluated whether a broader definition of a gene had a substantial effect on the Pgene results by testing extended gene regions 50 kb upstream and 50 kb downstream of transcribed regions and by including HumanHap300 SNPs from these regions in our analysis. Using the extended gene boundaries, the insulin gene INS would be the most significant gene in our study (Pgene = 0.0019), driven by SNP rs10743152 (PSNP = 0.00015) located 13 kb upstream of the first exon. Other genes that had significant SNPs (Pgene < 0.05) only in the extended gene region were MAP2K1, CDK4, and IRF4.

Even using the narrow gene boundaries, several SNPs in our study may influence expression or function of other nearby or even more distant genes. Recent GWA studies have confirmed novel susceptibility variants downstream of HHEX, a gene selected for this study by CAESAR (9–12); the reported SNPs are located outside of the narrow gene region (−10 kb/+5 kb) in a large LD block that includes KIF11 and IDE, and we only detected nominal significance in the narrow HHEX region (PSNP = 0.037 for rs12262390). For some genes, the extent of LD surrounding significant SNPs implicates flanking genes. For example, in ARID2, rs35115 (PSNP = 0.0067) is located in intron 7 but also tags the nonsynonymous variant rs7315731 in SFRS2IP (r2 = 0.93). These examples demonstrate that defining a gene boundary requires a balance between capturing all possible SNPs influencing the gene and introducing SNPs that may be more functionally relevant to other genes. A more sophisticated approach to establish gene boundaries that defines each gene boundary separately by considering the genomic context around the gene may be helpful in future gene-based approaches.

Gene-based approaches to interpreting the results of candidate gene and even genome-wide association studies are important because most variation influencing susceptibility to type 2 diabetes and other common complex traits is currently expected to be gene centric, although the definition of a gene is constantly evolving. Detailed coverage of the common variation in these genes represents a critical requirement for an effective and thorough gene-based study. Here, we have identified genes significantly associated with type 2 diabetes and related quantitative traits that are attractive targets for future replication studies. Confirmation in a larger sample set and meta-analyses across studies will be important to help determine the role of these genes.

TABLE 1

Characteristics of the stage 1 and 2 case and control samples

TABLE 2

Coverage of 10,762 HapMap SNPs (MAF > 0.05)* within −10 kb/+5 kb of 222 candidate genes

TABLE 3

Gene regions (−10 kb/+5 kb) associated with type 2 diabetes (Pgene < 0.05) in stage 1 samples

TABLE 4

Type 2 diabetes association for SNPs genotyped in FUSION stage 1 and 2 samples, sorted by combined stages 1 and 2 PSNP

TABLE 5

Quantitative trait association results for SNPs genotyped in FUSION stage 1 and 2 samples

Acknowledgments

Support for this research was provided by National Institutes of Health (NIH) Grants DK072193 (to K.L.M.) and DK062370 (to M.B.), a postdoctoral fellowship award from the American Diabetes Association (to C.J.W.), and the National Center for Integrative Biomedical Informatics (NCIBI) at the University of Michigan (U54 DA021519). K.L.M. and G.R.A. are Pew Scholars in the Biomedical Sciences. Genome-wide genotyping was performed by the Johns Hopkins University Genetic Resources Core Facility (GRCF) SNP Center at the Center for Inherited Disease Research (CIDR), with support from CIDR NIH contract no. N01-HG-65403 and the GRCF SNP Center.

We thank the Finnish citizens who generously participated in this study, Michael Andre and Rachana Kshatriya of the University of North Carolina Mammalian Genotyping Core for Illumina GoldenGate genotyping, Amy Swift and Mario Morken of the NHGRI for stage 2 genotyping, and Kurt Hetrick, Michael Barnhart, Craig Bark, Janet Goldstein, and Lee Watkins of the CIDR for expert technical work on genome-wide Illumina Infinium genotyping.

Footnotes

  • Published ahead of print at http://diabetes.diabetesjournals.org on 4 August 2008.

    Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. See http://creativecommons.org/licenses/by-nc-nd/3.0/ for details.

    The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

    • Accepted July 21, 2008.
    • Received December 11, 2007.

REFERENCES


This Article

  1. Diabetes vol. 57 no. 11 3136-3144