Diabetes 54:1214-1221, 2005 © 2005 by the American Diabetes Association, Inc.
A Single Nucleotide Polymorphism in MGEA5 Encoding O-GlcNAc-selective N-Acetyl-β-D Glucosaminidase Is Associated With Type 2 Diabetes in Mexican Americans
1 Department of Medicine, University of Texas Health Science Center, San Antonio, Texas
Excess O-glycosylation of proteins by O-linked β-N-acetylglucosamine (O-GlcNAc) may be involved in the pathogenesis of type 2 diabetes. The enzyme O-GlcNAc-selective N-acetyl-β-D glucosaminidase (O-GlcNAcase), encoded by MGEA5 on 10q24.1-q24.3, reverses this modification by catalyzing the removal of O-GlcNAc. We have previously reported the linkage of type 2 diabetes and age at diabetes onset to an overlapping region on chromosome 10q in the San Antonio Family Diabetes Study (SAFADS). In this study, we investigated meningioma-expressed antigen-5 (MGEA5) as a positional candidate gene. Twenty-four single nucleotide polymorphisms (SNPs), identified by sequencing 44 SAFADS subjects, were genotyped in 436 individuals from 27 families whose data were used in the original linkage report. Association tests indicated significant association of a novel SNP with the traits diabetes (P = 0.0128, relative risk = 2.77) and age at diabetes onset (P = 0.0017). The associated SNP is located in intron 10, which contains an alternate stop codon and may lead to decreased expression of the 130-kDa isoform, the isoform predicted to contain the O-GlcNAcase activity. We investigated whether this variant was responsible for the original linkage signal. The variance attributed to this SNP accounted for 25% of the logarithm of odds. These results suggest that this variant within the MGEA5 gene may increase diabetes risk in Mexican Americans.
Many nuclear and cytoplasmic proteins are glycosylated on serine or threonine residues by O-linked β-N-acetylglucosamine (O-GlcNAc) (1,2). This posttranslational modification is a dynamic and regulated process, much like protein phosphorylation (3), requiring the coordinated action of two enzymes: O-GlcNAc transferase, which uses the substrate uridine diphosphate-N-acetylglucosamine to attach a single O-GlcNAc residue, and O-GlcNAc-selective N-acetyl-β-D glucosaminidase (O-GlcNAcase), which catalyzes its removal (4). Aberrant protein glycosylation by O-GlcNAc may be involved in the pathogenesis of type 2 diabetes. Elevated levels of extracellular glucose, by providing more substrate for O-GlcNAc transferase, appear to lead to increased intracellular O-GlcNAc modification of proteins, which can perturb normal insulin signaling events (5,6). Pancreatic β-cells are particularly vulnerable to alterations in O-GlcNAc metabolism (rev. in 7). The β-cells are uniquely enriched with O-GlcNAc transferase and are therefore heavily dependent on the activity of O-GlcNAcase to regulate the O-glycosylation pathway. The pancreatic β-cell-specific toxin streptozotocin, an analogue of GlcNAc that is widely used to induce diabetes in animal models, irreversibly inhibits O-GlcNAcase (6,8). The resulting accumulation of glycosylated proteins in pancreatic β-cells may be the underlying mechanism causing β-cell death leading to diabetes in these models (9). Altered metabolism of O-GlcNAc has also been linked to insulin resistance (10,11), and potential substrates for O-GlcNAc involved in the mechanism include glycogen synthase (12,13) as well as proteins in the insulin signaling cascade (11,14). Recent studies (12,15) have demonstrated that the removal of O-GlcNAc residues, either enzymatically or by introducing virally transmitted O-GlcNAcase, is sufficient to normalize cell function despite continued exposure to elevated extracellular glucose. These data suggest that O-GlcNAcase has the ability to counteract the detrimental effects of exposure to hyperglycemia. Therefore, it follows that impairment of O-GlcNAcase enzymatic activity, via alterations of the gene encoding O-GlcNAcase (meningioma-expressed antigen-5 [MGEA5]), could influence susceptibility to diabetes.
MGEA5 has been localized to chromosome 10q24.1-q24.3 (16). The MGEA5 gene consists of 16 exons spanning
We have previously reported linkage of type 2 diabetes and age at onset of diabetes to a region overlapping the MGEA5 locus on chromosome 10q in the San Antonio Family Diabetes Study (SAFADS) (19). In addition, results from a number of genome scans for type 2 diabetes and measures of insulin sensitivity in other populations, including the Pima Indians (20-23), have also implicated chromosome 10q as a region that might harbor a gene(s) influencing susceptibility to these traits. Therefore, in this study, we investigated MGEA5 as a positional and biological candidate gene for type 2 diabetes in the SAFADS, an extended pedigree study consisting of Mexican-American families.
Subjects used in this study were participants of the population-based SAFADS, which has been described in detail elsewhere (19). Probands for SAFADS were low-income Mexican Americans with type 2 diabetes, and all first-, second-, and third-degree relatives of the probands, aged ≥18 years, were considered eligible for the study. The institutional review board of the University of Texas Health Science Center at San Antonio approved all procedures, and all subjects gave informed consent.
As part of a prior genome-mapping project, highly polymorphic markers providing coverage at 10- to 20-cM intervals on all autosomes were genotyped in a subset of 440 participants in the 27 most informative extended families. The genotyping methods and marker information have already been described (19,24). A genome-wide scan for type 2 diabetes susceptibility genes previously conducted using the genotypic and phenotypic data from these subjects revealed significant evidence for linkage to a region on chromosome 10q (19). Subsequently, a new genomic scan of 382 highly polymorphic markers distributed throughout the genome at
Molecular methods.
All variants identified using this sequencing strategy were genotyped in 436 individuals for whom DNA was available. Most single nucleotide polymorphism (SNP) assays were performed using the Applied Biosystems (Foster City, CA) TaqMan Allelic Discrimination methodology on an ABI Prism 7900HT Sequence Detection System. Others were genotyped using restriction fragment-polymorphism assays, primer extension (ABI SNaPshot; Applied Biosystems), or direct sequencing, as listed in Table 1. SNP LLY-MGEA5-14 was genotyped using a restriction fragment-polymorphism assay, and subsequently all genotypes were confirmed by sequencing.
Statistical analysis. Linkage disequilibrium (LD) between each pair of SNPs was calculated by direct correlation (|r|) between SNP genotype vectors in which individual SNP genotypes were scored as 0, 1, or 2, depending on how many copies of the rarer allele an individual carried. This calculation is performed in SOLAR, which then produces a graphical plot of the absolute correlations among SNPs by nucleotide position. Haplotypes were estimated using the computer program MERLIN (37). For haplotypes with sufficient frequency (greater than five copies existing in the samples), haplotype score vectors were then generated with elements containing a 0, 1, or 2, depending upon the number of copies of a specific haplotype that an individual carried.
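As a minimal sketch of the pairwise LD measure described above, the following Python computes the absolute Pearson correlation |r| between two SNP genotype vectors scored 0, 1, or 2 by copies of the rarer allele. The function name and the example genotype vectors are hypothetical and serve only as illustration; the study itself performed this calculation in SOLAR.

```python
from math import sqrt

def abs_genotype_correlation(g1, g2):
    """Absolute Pearson correlation |r| between two genotype score vectors.

    Each vector holds one score per individual: 0, 1, or 2 copies of the
    rarer allele. Returns 0.0 if either SNP is monomorphic in the sample
    (an illustrative convention, not taken from the study).
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have the same length")
    n = len(g1)
    mean1 = sum(g1) / n
    mean2 = sum(g2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = sum((a - mean1) ** 2 for a in g1)
    var2 = sum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0
    return abs(cov / sqrt(var1 * var2))

# Hypothetical genotype scores for six individuals at two SNPs.
snp_a = [0, 1, 2, 0, 1, 0]
snp_b = [0, 1, 2, 0, 1, 0]
print(abs_genotype_correlation(snp_a, snp_b))
```

Because the absolute value is taken, two SNPs whose rarer alleles always occur together and two whose rarer alleles never occur together both score as strongly correlated.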
To test the association between each SNP or haplotype and the traits diabetes and age of onset of diabetes (i.e., the Martingale residual), a measured genotype approach (38) was used, with the allele counts at individual SNPs or the haplotype counts at all SNPs jointly serving as the measured genotypes. This method accounts for the relatedness among family members by estimating the likelihood of genetic models given the pedigree structure. The likelihood for a model in which the trait mean is allowed to vary according to genotype was compared with that of a nested model in which the genotypic means were restricted to be equal to each other. The significance of the association was tested by likelihood ratio tests, which compare the likelihoods of the full and nested models. Two times the difference between the logarithm of the likelihoods of the two models is distributed asymptotically as a
To assess whether a SNP accounted for the linkage signal, linkage on chromosome 10q was reevaluated conditional on the measured genotype effects. By including a genotype-based covariate in the model of the trait mean, the variance attributed to it is removed from the linkage model. If the measured genotype is the sole functional variant in this region of linkage that is influencing the trait, then identity-by-descent allele sharing should provide no additional information, and the LOD score in the conditional linkage analysis should drop to nearly zero. If the genotyped variant is one of several functional variants or is in LD with the true functional variant, not all of the quantitative trait locus variance will be absorbed into the mean effects model, and some evidence for linkage should remain in the conditional analysis. This method and background are described in Almasy and Blangero (41).
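The likelihood ratio test used in the measured genotype approach can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the full model adds exactly one parameter (a single additive allele-count effect), so the statistic is referred to a chi-square distribution with 1 degree of freedom, and the log-likelihood values in the example are invented. The function name is hypothetical.

```python
from math import erfc, sqrt

def lrt_pvalue_1df(loglik_full, loglik_nested):
    """Likelihood ratio test for one extra parameter (assumed 1 df).

    Returns (statistic, p_value), where the statistic is twice the
    difference in log-likelihoods between the full and nested models.
    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_nested)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    return stat, erfc(sqrt(stat / 2.0))

# Invented log-likelihoods: genotype-specific means vs. equal means.
stat, p = lrt_pvalue_1df(loglik_full=-1000.0, loglik_nested=-1003.0)
print(f"LRT statistic = {stat:.2f}, P = {p:.4f}")
```

In a pedigree analysis the two log-likelihoods would come from variance component models fitted with and without the genotype covariate, so that relatedness among family members is already reflected in both.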
The data from a total of 436 individuals, aged 17-97 years, were used for this study. The characteristics for these subjects by diabetes status are presented in Table 2. The age- and age²-adjusted heritability (h² ± SE) for diabetes was 0.63 ± 0.16 (P < 0.0001), while the heritability for the Martingale residual was 0.23 ± 0.081 (P = 0.0002). The Martingale residuals for diabetes age of onset meet the prerequisites of the variance component method used (all phenotypes are within 4 SDs of the mean, kurtosis of 0.49, residual kurtosis of 0.52, and skewness of 0.37).
Sequencing of the 44 selected SAFADS subjects identified 24 SNPs in this locus, of which 19 are novel. The minor allele frequency ranged from 0.02 to 0.25, as described in Table 3. Pairwise LD tests were conducted with all SNP genotypes. As can be seen from Fig. 1, there is weak or no association between LD and physical distance in this particular gene, suggesting that LD in this region is unpredictable. For example, SNPs LLY-MGEA5-4 and LLY-MGEA5-12 have a minor allele frequency >15% and are <1 kb apart but exhibit very little LD. In contrast, common SNPs LLY-MGEA5-23 and LLY-MGEA5-16 are >10 kb apart and are in complete LD. Also, rare (minor allele frequency <5%) SNPs LLY-MGEA5-22, LLY-MGEA5-3, and LLY-MGEA5-5 span >7 kb and are in near complete LD, while rare SNPs LLY-MGEA5-1 and LLY-MGEA5-2 are only 12 bp apart and exhibit no LD. The average absolute correlation among the 24 SNPs was 0.133, which is quite low. However, four sets of SNPs showed high intraset correlation. The highly correlated sets are LLY-MGEA5-4, -9, and -23; LLY-MGEA5-10, -12, and -13; and LLY-MGEA5-3, -5, and -22. The members of each of these sets exhibit a correlation of at least 0.95 with every other member of the set. The 24 SNPs behave statistically like 20.8 independent SNPs using the method described by Nyholt (39). This level of observed nonindependence among SNPs requires, using Bonferroni's correction, that we observe a P value <0.002460 (or negative log P > 2.6091) to obtain an experiment-wide P value of 0.05.
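The arithmetic behind a per-test significance threshold for 20.8 effective independent tests can be sketched in Python. Two common forms are shown: the simple Bonferroni ratio and the Šidák form. The text names only Bonferroni's correction, so including the Šidák form is an assumption made here for comparison; numerically it lands nearer to the reported threshold of 0.002460. The function names are hypothetical.

```python
from math import log10

def bonferroni_threshold(alpha, n_effective):
    """Per-test threshold as alpha divided by the effective number of tests."""
    return alpha / n_effective

def sidak_threshold(alpha, n_effective):
    """Per-test threshold from 1 - (1 - alpha)^(1 / n_effective)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_effective)

alpha = 0.05          # target experiment-wide P value
n_effective = 20.8    # effective number of independent SNPs (Nyholt method)

for name, fn in (("Bonferroni", bonferroni_threshold), ("Sidak", sidak_threshold)):
    threshold = fn(alpha, n_effective)
    print(f"{name}: P < {threshold:.6f} (-log10 P > {-log10(threshold):.4f})")
```

Either way, the observed P = 0.0017 for diabetes age of onset falls below the threshold, while P = 0.0128 for diabetes does not.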
Individual association tests indicated association of two SNPs with the traits diabetes age of onset or diabetes, as shown in Fig. 2 and Table 3. Significant association of SNP LLY-MGEA5-14 was observed with the traits diabetes age of onset (P = 0.0017) and diabetes (P = 0.0128). The risk for diabetes was 2.77 times greater for subjects carrying one copy of the T allele for SNP LLY-MGEA5-14 compared with those subjects with two A alleles. No homozygotes for the T allele were observed. In addition, SNP LLY-MGEA5-20 was moderately associated with diabetes age of onset (P = 0.0336). Both SNPs were only present in the heterozygous state; therefore, only association tests using additive (in this case, equivalent to dominance) models were done. Using the quantitative trait disequilibrium method of Abecasis et al. (40), we observed no evidence for hidden stratification for these SNPs (data not shown). Haplotype analyses revealed that the rare variants for MGEA5-14 and MGEA5-20 reside on distinct haplotypes. Association tests using haplotype information for all SNPs did not reveal any stronger association than the individual SNPs themselves.
After correcting for multiple testing as described above, the association of SNP LLY-MGEA5-14 with the trait diabetes age of onset remained significant, so next we investigated whether this variant was responsible for the original linkage signal. Variance components linkage analysis conditional on the SNP genotypes as fixed effects was conducted. As shown in Fig. 3A, the variance attributed to SNP LLY-MGEA5-14 accounted for 25% of the LOD score for the trait diabetes age of onset. The LOD dropped from 3.77 to 2.84 when SNP LLY-MGEA5-14 was in the model as a fixed effect. As an exploratory analysis, we separated our families based on the presence of the T allele in at least one family member. Following the nomenclature of Silander et al. (42), those families carrying the T allele were identified as "at risk" (12 families), and those in whom the T allele was not present were identified as "not at risk" (15 families). Linkage analysis of the at-risk and not-at-risk families indicated that nearly all of the evidence for linkage on chromosome 10q was observed in the former (Fig. 3B). The peak LOD in all 27 families was 3.77. The peak LOD in the at-risk families was 3.75, and the LOD in the not-at-risk families was 0.48. The average number of family members for whom both phenotypic and genotypic information were available in the at-risk and not-at-risk families was 21.2 (range 2-41) and 12.6 (range 4-23), respectively. Therefore, the at-risk families are larger on average and may be contributing more to linkage simply due to the greater number of relative pairs for which identity-by-descent information is available.
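The 25% figure follows directly from the two LOD scores, and a LOD score can be converted to a pointwise P value. The Python below is a sketch under an assumption not stated in the text: that the variance component linkage statistic, 2 ln(10) × LOD, follows the usual 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom. The function names are hypothetical.

```python
from math import erfc, log, sqrt

def lod_fraction_explained(lod_before, lod_after):
    """Fraction of the original LOD removed by conditioning on a covariate."""
    return (lod_before - lod_after) / lod_before

def lod_to_pointwise_p(lod):
    """Pointwise P value for a LOD score.

    Assumes the likelihood ratio statistic 2*ln(10)*LOD is distributed as a
    50:50 mixture of a point mass at zero and a 1-df chi-square.
    """
    chi2 = 2.0 * log(10.0) * lod
    return 0.5 * erfc(sqrt(chi2 / 2.0))

fraction = lod_fraction_explained(3.77, 2.84)   # unconditional vs. conditional LOD
p_residual = lod_to_pointwise_p(2.84)           # evidence remaining after conditioning
print(f"fraction of LOD explained: {fraction:.3f}")
print(f"pointwise P for residual LOD: {p_residual:.5f}")
```

With the reported scores the fraction is roughly 0.25, and under the stated mixture assumption the residual LOD of 2.84 corresponds to a pointwise P value of about 0.00015.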
Characteristics of the subjects who carried a T allele at SNP MGEA5-14 were compared with those of subjects who did not. As shown in Table 4, the mean age and BMI were not statistically different between the groups (P = 0.82 for age, P = 0.40 for BMI) when using a measured genotype approach to account for family relations. However, as stated above, age of diabetes onset was significantly lower and the prevalence of diabetes was significantly higher in the individuals who carried a T allele at this SNP.
The gene encoding O-GlcNAcase, MGEA5, is an appealing candidate gene for type 2 diabetes. Accumulating evidence using animal models suggests that impairment of the enzyme activity may impair pancreatic ß-cell function and/or lead to insulin resistance, thereby enhancing susceptibility to diabetes. We have therefore investigated MGEA5 as a positional and biological candidate gene for type 2 diabetes and age of onset of diabetes in the SAFADS, an admixed population of European and Native-American origin.
We identified variants in the gene by resequencing the coding and potential regulatory regions of the locus in diabetic and nondiabetic subjects. Twenty-four SNPs were identified in the
To determine whether the variance attributed to SNP LLY-MGEA5-14 accounted for our linkage signal, we reevaluated linkage on chromosome 10, conditional on the measured genotype effect. By including LLY-MGEA5-14 genotypes as a covariate in the model, the LOD score for the trait diabetes age of onset dropped by 25%, indicating that this variant accounts for a considerable part of the observed LOD. The point estimate of the residual LOD is still significant (P = 0.00015), however, suggesting that other as yet unidentified variants in this gene, such as those in nonconserved introns or more distant regulatory regions, may be involved as well. We are currently expanding our resequencing efforts to screen more regions of the gene for further investigation. It is also possible that a cluster of genes underlies the linkage signal, and variation in those genes may account for the remaining LOD score. Alternatively, the LLY-MGEA5-14 variant may be in LD with another variant in this region. It is also interesting to note that this SNP is only present in the 12 families that contributed nearly all of the evidence for linkage to chromosome 10q, yet conditional linkage results indicate that variation at this SNP clearly does not account for all of the linkage. In this case, identity-by-descent allele sharing among the relatives of those 12 families is providing additional information for linkage, indicating that additional variation at this locus is influencing the trait. That is, although the at-risk families are responsible for nearly all of the evidence for linkage, the LLY-MGEA5-14 SNP itself is not.
Farook et al. (43) previously examined the MGEA5 gene as a candidate gene in the Pima Indians and reported no evidence to support its involvement in susceptibility to diabetes or insulin resistance. Based on SNP discovery efforts conducted on 30 subjects, that study identified only two variants in the gene regions that were screened. One SNP was located in the putative promoter with a minor allele frequency of only 2% and was not analyzed. We did not observe this SNP in any of the 436 subjects in our study (data not shown). The second SNP corresponded to dbSNP entry rs2305194 and showed no association with any indexes of insulin resistance. SNP rs2305194 was observed in the SAFADS subjects with a minor allele frequency of only 23% compared with 40% in the Pima Indians, and likewise, no association was observed between this SNP and the traits diabetes or diabetes age of onset. While Farook et al. investigated similar regions of MGEA5 and found only two SNPs, this study identified many additional SNPs. This could be due to a variety of reasons, such as the use of different methods for variant detection (we did not pool samples), differences in population admixture (the SAFADS population consists of an admixed population of Native and European Americans, while the Pimas are primarily of Native-American descent), and the screening of additional intronic regions in this study. Eight of the SNPs identified in this study were located in conserved regions of introns 10 and 11 that were not screened in the previous study. Interestingly, the SNPs exhibiting association with diabetes traits in this study are located in this region of the gene.
In conclusion, this study provides the first evidence that the gene encoding O-GlcNAcase may be a susceptibility locus for type 2 diabetes in humans. The relative risk for diabetes attributed to having the rare allele for the associated SNP LLY-MGEA5-14 is substantial in this population. Future functional studies are planned to determine whether this variant located in intron 10 results in impairment of the O-GlcNAcase enzyme activity.
This research was supported by grants from the National Institutes of Health (R01-DK-42273, R01-DK-47482, R01-DK-53889, MH-59490, and P50DK061597) and a Junior Faculty Award from the American Diabetes Association (to D.M.L.). We thank the participants of SAFADS and are grateful for their participation and cooperation. We are also very appreciative of the support from Dr. Jude Onyia for this project.
Additional information for this article can be found in an online appendix at http://diabetes.diabetesjournals.org.
Address correspondence and reprint requests to Donna M. Lehman, Department of Medicine/Clinical Epidemiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr., San Antonio, Texas 78229. E-mail: lehman{at}uthscsa.edu
Received for publication September 30, 2004 and accepted in revised form December 9, 2004.
Abbreviations: LD, linkage disequilibrium; LOD, logarithm of odds; MGEA5, meningioma-expressed antigen-5; O-GlcNAc, O-linked β-N-acetylglucosamine; O-GlcNAcase, O-GlcNAc-selective N-acetyl-β-D glucosaminidase; SAFADS, San Antonio Family Diabetes Study; SNP, single nucleotide polymorphism