Searching for Type 2 Diabetes Genes on Chromosome 20
Genome scans in families with type 2 diabetes identified a putative locus on chromosome 20q. For this study, linkage disequilibrium mapping was used in an effort to narrow a 7.3-Mb region in an Ashkenazi type 2 diabetic population. The region encompassed a 1-logarithm of odds (LOD) interval around the microsatellite marker D20S107, which gave a LOD score of >3 in linkage analysis of a combined Caucasian population. This 7.3-Mb region contained 25 known and 99 predicted genes. Predicted single nucleotide polymorphisms (SNPs) were chosen from public databases and validated. Two SNPs were unique to the Ashkenazi population. Here, 91 SNPs with a minor allele frequency of ≥10% were genotyped in pooled DNA from 150 case subjects and 150 control subjects of Ashkenazi Jewish descent. The SNP association study showed that SNP rs2664537 in the TIX1 gene had a nominally significant P value of 0.035, but the finding did not replicate in an additional case pool. In addition, HNF4a and Mybl2 were screened for mutations and new polymorphisms. No mutations were identified, but a new nonsynonymous SNP (R687C in exon 14 of Mybl2) was found. The limits to this type of association study are discussed.
Type 2 diabetes is a complex metabolic disorder characterized by abnormal hepatic glucose output, insulin resistance, and impaired insulin production (1). Multiple environmental and genetic factors contribute, and genome scans in families with multiple affected individuals from several racial/ethnic groups have been undertaken (Table 1). A number of potential loci have been identified, but in general, the evidence for linkage has not been strong, and the regions identified have been quite broad. A major question remaining is how to proceed with the search for complex disease genes, knowing that a single gene is neither necessary nor sufficient and that recombinant mapping in families will not suffice. The next phase of the search for diabetes susceptibility genes will likely require new strategies.
A genome scan with microsatellite markers at an average distance of 9.5 cM was completed in type 2 diabetic sibling pairs (n = 472) of Ashkenazi Jewish descent (2). The Ashkenazi population is relatively young and homogeneous, having undergone several constrictions and expansions resulting in reduced genetic heterogeneity in comparison to that of most Western Caucasian subjects. Studies of DNA polymorphisms have suggested that present-day Ashkenazi Jews descended from a small founder population, numbering perhaps as few as 10,000 individuals who existed in Eastern Europe at about 1500 AD (3). Today, there are about 10 million, representing a 1,000-fold expansion in roughly 20 generations. In the genome scan, five regions on four chromosomes exhibited nominal evidence for linkage (P < 0.05) (2). A maximal signal of Z = 2.05 was observed on chromosome 20 near D20S195. Because four other groups had previously reported evidence for linkage in the same region of chromosome 20q in Caucasians (see summary in Table 1), this region was further considered.
Subsequent to the Ashkenazi type 2 diabetes genome scan, several investigators contributed linkage data for chromosome 20 to the International Type 2 Diabetes Linkage Analysis Consortium (http://www.sfbr.org/external/diabetes), a collaborative effort organized principally by the National Institute of Diabetes and Digestive and Kidney Diseases to map genes for the disease. Common markers on chromosome 20 were genotyped, and results were combined for a total of 1,852 families. The initial results of this analysis suggested a peak of linkage at D20S107 with a logarithm of odds (LOD) of >3. We therefore decided to target this region in our search in Ashkenazi patients with type 2 diabetes.
Gene mapping by linkage analysis in related families identifies broad conserved regions of chromosomal DNA. In contrast, “unrelated” affected individuals share smaller conserved chromosomal regions because many more meioses separate them, resulting in greater recombination around the region harboring the disease gene. Theoretically, polymorphic markers in linkage disequilibrium with the disease locus can be used to detect association with the region containing the gene mutation.
Linkage disequilibrium mapping has been shown to be an important tool for fine-mapping of monogenic diseases (10–12), and recently there has been success in identifying a gene involved in a complex disease, inflammatory bowel disease (13). However, there is little doubt that linkage disequilibrium mapping for most other complex diseases will be more difficult (14). The distance over which disequilibrium extends between markers and disease loci is not well understood, nor is the degree of genetic risk contributed by any particular locus, suggesting that genotyping closely spaced markers in many case and control subjects would be required. Single nucleotide polymorphisms (SNPs), although biallelic, have been preferred over simple sequence repeat polymorphisms for this type of analysis because SNPs are more abundant in the genome (15). As a general rule, the extent of linkage disequilibrium or association between a marker and a disease locus depends on the genetic distance between the two and the number of generations that have occurred since the mutation originated (16). In isolated populations such as the Ashkenazi Jews, linkage disequilibrium has been shown to extend over a broad region (3). We therefore undertook a search for diabetes susceptibility gene(s) on chromosome 20q in unrelated Ashkenazi Jewish patients with diabetes. Here we report the initial results of our association studies with 91 validated SNPs spanning a 1-LOD interval (7.3 Mb) around the microsatellite marker D20S107 in the candidate region on chromosome 20. DNA from case and control subjects was genotyped in pools by a recently described method involving pyrosequencing technology (17). No significant associations have been found to date.
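As a rough illustration of the general rule stated above (that linkage disequilibrium depends on genetic distance and on the number of generations elapsed), the standard decay relation D_t = D_0(1 − θ)^t can be evaluated for a few marker distances. This is only a sketch under stated assumptions: random mating, a conversion of about 1 cM per Mb (a genome-wide rule of thumb, not a measured value for 20q), the recombination fraction θ approximated by the map distance in Morgans, and the roughly 20 generations cited for the Ashkenazi expansion. The function name is illustrative.

```python
def ld_remaining(theta: float, generations: int) -> float:
    """Fraction of the initial linkage disequilibrium that remains after
    `generations` generations, given recombination fraction `theta`
    (standard relation D_t = D_0 * (1 - theta) ** t)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    return (1.0 - theta) ** generations


# Assumption: 1 Mb corresponds to about 1 cM (0.01 Morgan).
MORGANS_PER_MB = 0.01
GENERATIONS = 20  # approximate figure cited for the Ashkenazi expansion

for distance_mb in (0.01, 0.1, 1.0, 7.3):
    theta = distance_mb * MORGANS_PER_MB
    fraction = ld_remaining(theta, GENERATIONS)
    print(f"{distance_mb:>5} Mb: {fraction:.3f} of initial LD remains")
```

Under these assumptions most of the initial disequilibrium survives at sub-megabase distances, whereas it is substantially eroded across the full 7.3-Mb interval.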
RESEARCH DESIGN AND METHODS
Type 2 diabetic case subjects and control subjects were of Ashkenazi Jewish descent as described (2). As shown in Table 2, the control subjects were significantly older than the case subjects by design because control subjects were selected for old age and absence of diabetes.
DNA isolation, quantification, and construction of pools.
The DNA samples were isolated from whole blood using the Puregene kit as described (Gentra Systems, Minneapolis, MN). DNA was quantified with the TKO 100 Mini-Fluorometer and Hoechst dye method as described (Hoefer Scientific Instruments, San Francisco, CA). When creating DNA pools, accurate determination of the DNA concentration of each sample is critical because errors will skew the proportion of each genotype in the pool. Spectrophotometric analysis was avoided because substances such as protein and salts may give spurious results (18). The DNA samples were gently mixed on a rocking platform to ensure homogeneity before pipetting. Equal volumes of each sample were delivered to a sterile 55-ml polypropylene solution basin (Labcor Products, Frederick, MD) using an accurately calibrated multichannel pipette. After gentle and thorough mixing overnight, the pooled DNA was divided into 1.0-ml aliquots in sterile 1.5-ml polypropylene microtubes and stored at 4°C in a dedicated refrigerator. As a quality control, the uniformity of the mixing procedure was verified by genotyping replicate aliquots of the pools for several SNPs.
The reaction consisted of 2.5 μl GeneAmp 10× Buffer II (Applied Biosystems, Foster City, CA), 2.0–3.0 μl of 25 mmol/l MgCl2 solution, 0.5 μl of each 20 mmol/l dNTP (Amersham Pharmacia, Piscataway, NJ), 1 μl of 10 pmol/μl 5′ biotin-triethylene glycol-labeled high-performance liquid chromatography (HPLC)-purified primer and 1 μl of 10 pmol/μl unlabeled primer (IDT, Coralville, IA), 0.25 μl (1.25 units) Amplitaq Gold (Applied Biosystems), 2.5 μl of 10 ng/μl DNA (pooled or individual), and sterile water to 25 μl total volume. Thermal cycling was done interchangeably on a GeneAmp 9700 (Applied Biosystems) or PTC-200 (MJ Research, Watertown, MA) with a heated lid, using the following profile: 95°C for 10 min (1 cycle); 45 cycles of 95°C for 45 s, annealing temperature for 45 s, and 72°C for 1 min; then a hold at 4°C. The annealing temperature varied from 56 to 62°C. Forty-five cycles ensured that all PCR components were exhausted. PCR primers were designed with Primer3 Software (code available at http://www-genome.wi.mit.edu/genome_software/other/primer3.html) (19), and the predicted reaction conditions (annealing temperature and MgCl2) were tested on several nonessential DNA samples. Fragment sizes between 100 and 500 bp were successfully analyzed.
PCR plate setup, template preparation, and pyrosequencing.
The 96-well plates (8 × 12) were set up with eight replicates each of the case and control pools. Replicate numbers from three to ten were tested, and eight replicates most consistently gave an SD of ≤2%. Template preparation and pyrosequencing (Pyrosequencing AB) were conducted as described (17). Genotypes for a 96-well plate were generated in 10 min.
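The replicate criterion described above can be checked with a few lines of Python. The sketch below is illustrative only: the eight readings are hypothetical values, not data from this study, and it assumes that the SD threshold of 2% is expressed in percentage points of allele frequency.

```python
from statistics import mean, stdev


def summarize_replicates(freqs_percent):
    """Return (mean, sample SD) of replicate allele-frequency readings,
    both in percentage points."""
    if len(freqs_percent) < 2:
        raise ValueError("at least two replicates are needed for an SD")
    return mean(freqs_percent), stdev(freqs_percent)


# Hypothetical readings (%) for one allele in eight replicate wells of a pool.
replicates = [20.1, 21.5, 19.8, 22.0, 20.9, 21.2, 20.4, 20.5]

pool_mean, pool_sd = summarize_replicates(replicates)
SD_LIMIT = 2.0  # assumed to be percentage points, per the <=2% criterion

print(f"mean = {pool_mean:.1f}%, SD = {pool_sd:.2f}")
print("within SD limit" if pool_sd <= SD_LIMIT else "repeat the assay")
```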
Allele quantification software.
Allele frequencies in the samples were assessed by SNP Software AQ (Pyrosequencing AB) as described, and the data were exported to an Excel (Microsoft) spreadsheet for further analysis.
The PCR was as previously described. Heteroduplex DNA was formed and analyzed as described with the Wave (Transgenomic, Omaha, NE).
For each SNP assay, a reference individual and three DNA pools of 42 individuals each from African-American, Asian (10 Chinese/32 Japanese), and Caucasian populations (the SNP Consortium DNA panels, Coriell Institute, Camden, NJ, http://arginine.umdnj.edu) were amplified by PCR. The Caucasian pool was used as a reference for the Ashkenazi study. PCR products were cycle-sequenced using one of the original PCR primers and fluorescent dye-terminators and electrophoresed on an ABI 3700 (Applied Biosystems). For analysis, the allele frequencies in the pooled samples were estimated by comparison of the nucleotide peak heights in the electropherogram (20).
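The peak-height comparison can be sketched as a simple ratio. The exact procedure of reference 20 is not reproduced here; the optional normalization against a heterozygous reference individual (in whom the two alleles are present 1:1) is an assumption made for illustration, and the function and parameter names are hypothetical.

```python
def pooled_allele_frequency(h_a, h_b, het_ratio=1.0):
    """Estimate the frequency of allele A in a pool from the peak heights
    of allele A (h_a) and allele B (h_b) at the polymorphic position.

    het_ratio is the A/B peak-height ratio seen in a heterozygous reference
    sample; 1.0 means the two alleles give equal signal (assumed default).
    """
    if h_a < 0 or h_b < 0 or het_ratio <= 0:
        raise ValueError("peak heights must be >= 0 and het_ratio > 0")
    total = h_a + het_ratio * h_b
    if total == 0:
        raise ValueError("no signal at this position")
    return h_a / total


# Hypothetical peak heights (arbitrary fluorescence units).
print(f"uncorrected: {pooled_allele_frequency(300, 700):.2f}")
print(f"corrected:   {pooled_allele_frequency(300, 200, het_ratio=1.5):.2f}")
```

The correction divides the A signal by the bias observed in the heterozygote, which is algebraically the same as scaling the B peak by `het_ratio` before taking the ratio.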
The P values quantifying the significance of the difference between pools of case and control subjects were calculated using the two-sample test for binomial proportions (normal theory test) (21), including twice the measurement variance in the calculation of the Z statistic.
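A minimal sketch of such a test follows. It uses the usual pooled-proportion Z statistic without a continuity correction and adds twice a measurement variance to the sampling variance; exactly how the measurement variance entered the original calculation is not specified, so that term, the function name, and the example measurement variance are assumptions. The example uses the TIX1 %T values reported in the Results and treats each pool of 150 diploid subjects as 300 chromosomes; it is not expected to reproduce the published P value exactly.

```python
from math import erfc, sqrt


def pooled_z_test(p_case, p_control, n_case, n_control, meas_var=0.0):
    """Two-sample normal-theory test for binomial proportions.

    p_case, p_control: allele frequencies (0-1) estimated from each pool.
    n_case, n_control: number of chromosomes in each pool.
    meas_var: variance of the pooled-frequency measurement (assumed to be
              added twice, once for each pool).
    Returns (z, two-sided P value).
    """
    p_bar = (p_case * n_case + p_control * n_control) / (n_case + n_control)
    sampling_var = p_bar * (1.0 - p_bar) * (1.0 / n_case + 1.0 / n_control)
    z = (p_case - p_control) / sqrt(sampling_var + 2.0 * meas_var)
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided tail of the normal
    return z, p_value


# Assumed measurement variance: SD of 2% over eight replicates.
example_meas_var = 0.02 ** 2 / 8

z, p = pooled_z_test(0.208, 0.132, 300, 300, meas_var=example_meas_var)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

Including the measurement variance widens the denominator, so a pooled comparison is always somewhat more conservative than the same comparison on individually genotyped samples.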
The microsatellite markers bordering the 1-LOD interval around D20S107 include RPN2 and D20S911 at 35.7 and 43 Mb, respectively. This 7.3-Mb region contains a total of 25 known and 99 predicted genes (NCBI Human Genome Map, Build 28) (http://www.ncbi.nlm.nih.gov). All of the known genes in this region are shown in Table 3, along with the indication of those genes with expressed sequence tags (ESTs) expressed in pancreatic islets found in UniGene (http://www.ncbi.nlm.nih.gov/UniGene). As can be seen, seven of the known genes and two predicted genes were found with islet ESTs, and one gene (MAFB) showed relatively high expression in islets.
Because it would be difficult to directly sequence all 124 known and predicted genes within the putative at-risk region for a significant number of patients with diabetes, linkage disequilibrium mapping was used to evaluate the entire 7.3-Mb region. SNPs were identified in public databases and validated in the Ashkenazi population through a collaborative arrangement with a member of the SNP Consortium (P. Kwok, Washington University Medical School). Direct sequencing of pooled DNA from Caucasian, Asian, and African-American individuals was conducted. SNPs with minor allele frequencies of ≥10% in the Caucasian subjects were then tested in pooled samples of DNA from Ashkenazi subjects. Interestingly, only ∼50% of the SNPs submitted for validation were found to have a minor allele frequency of ≥10%. The validation data were entered into the public SNP database 1 week after testing (http://www.ncbi.nih.gov/SNP). The results for the validated SNPs for the four racial/ethnic groups are shown in Table A1 (in the appendix). The SNPs rs736823 and rs932440, which were monomorphic in the Caucasian subjects, had minor allele frequencies of 42.2 and 11.1%, respectively, in the Ashkenazi subjects.
The allele frequencies of SNPs were compared between Ashkenazi case subjects and control subjects (n = 150 each). As shown in Table A2, a total of 91 SNPs were examined. Of these, 65 SNPs were located in known or predicted genes, and 26 were intergenic. Statistical analyses (Fig. 1) showed that the allele frequency for one SNP at TIX1 appeared to differ (13.2 vs. 20.8% T for control and case subjects, respectively; P = 0.035), but on replication no difference was found (17.8% T in case pool 2, P = 0.22 vs. control) (Table A2 and Fig. 1).
Two genes in the region, Mybl2 and HNF4a, were screened by denaturing HPLC for exonic mutations and new polymorphisms. Mybl2 was screened because of the marginally significant P value of 0.057 for rs419842. HNF4a has been implicated in maturity-onset diabetes of the young (MODY)-1 (22) and proposed as a type 2 diabetes candidate gene, and it had not yet been examined in Ashkenazi subjects. No coding or obvious splicing mutations were identified in either gene. However, in Mybl2, a previously unreported nonsynonymous SNP, R687C in exon 14, was identified (nt2186 C to T, accession number NM_002466) (data not shown).
In addition, allele frequencies for reported type 2 diabetes candidate gene SNPs were determined for the islet ATP-sensitive K+ channel (KIR6.2 and SUR), peroxisome proliferator-activated receptor (PPAR)-γ, three SNPs for Calpain 10, and two SNPs in insulin receptor substrate 1, each previously shown in more than one study to be associated with type 2 diabetes (23,24). No differences were found between allele frequencies in case and control subjects for these candidate SNPs (Table 4).
This study involved SNP association in pooled DNA for a 7.3-Mb chromosomal region on 20q containing a total of 124 known and predicted genes. Allele frequencies were assessed in DNA pools rather than by individual genotyping to expedite the study and decrease costs. With pyrosequencing technology, allele frequencies measured in the pooled DNAs were within 2% of the frequencies determined from individual genotypes (17).
One SNP (TIX1) out of 91 tested showed marginal significance in case subjects versus control subjects, which was not replicated in a second pool of type 2 diabetic case subjects. Despite the lack of association with a specific SNP at this preliminary stage in the study, the occurrence of significant LOD scores along chromosome 20q in studies of four racial/ethnic groups supports the hypothesis that a genetic element contributing to type 2 diabetes is present. However, there are several limiting factors. First, each of the chromosome 20q peaks is fairly broad and encompasses ∼10–20 Mb of DNA. This in turn makes it difficult to anticipate the risk of not identifying the disease genes in the region around D20S107 and to determine whether a shift in focus more centromeric toward D20S195 or more telomeric toward D20S197 is indicated. Second, the lack of reported SNPs in this region mandates denaturing HPLC and sequencing of the genome in an effort to find more. For example, MAFB, a transcription factor, had fourfold more expression in islet cDNAs than any other gene, yet only one SNP has been tested. Testing of 91 SNPs does not constitute an adequate evaluation of a 7.3-Mb region, and the SNP association study is still in progress. Further, we used the NCBI Chromosome Mapviewer as a roadmap for our SNP work. There have been at least five different “builds” or updates to the map since we started about 1 year ago. The map has expanded from 25 to possibly 124 genes in the region. It is evolving and so are our studies.
A major question remains as to what SNP density is required for accurate analysis. Does this mean typing one SNP every 10 kb requiring ∼730 SNPs or one SNP every 1 kb requiring ∼7,300 SNPs? A public project initiative to identify linkage disequilibrium blocks in the human genome may facilitate an answer to this question; however, this project is at least 2 years from completion. Furthermore, the differences in SNP allele frequencies between Ashkenazi and Western Caucasian subjects (Table A1) suggest that a unique haplotype map for isolated Caucasian subjects may be necessary. On the positive side, five groups have identified essentially the same region as harboring a putative diabetes gene(s) on chromosome 20q. In fact, a gene associated with type 2 diabetes, PPAR-γ, on chromosome 3p has not given a positive linkage signal in any of the Caucasian genome scans, suggesting perhaps that the signal on chromosome 20q confers a greater risk than that for PPAR-γ. The results of the current study indicate that the process of linkage disequilibrium mapping to narrow regions of linkage for complex diseases such as type 2 diabetes will be a difficult one.
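The marker counts quoted above follow from simple division, and the same arithmetic shows how sparse the present panel is. The sketch below assumes evenly spaced markers across the 7.3-Mb interval, which is an idealization; the actual SNPs are not evenly distributed.

```python
REGION_BP = 7_300_000  # 7.3-Mb interval around D20S107
SNPS_TESTED = 91       # SNPs genotyped in this study


def snps_needed(spacing_bp: int) -> int:
    """Number of evenly spaced SNPs needed to cover the region at one
    marker per `spacing_bp` base pairs."""
    return REGION_BP // spacing_bp


for spacing_bp in (10_000, 1_000):
    print(f"1 SNP per {spacing_bp // 1000} kb: {snps_needed(spacing_bp):,} SNPs")

# Average spacing achieved with the current panel, in kb.
current_spacing_kb = REGION_BP / SNPS_TESTED / 1000
print(f"current panel: 1 SNP per ~{current_spacing_kb:.0f} kb on average")
```

On this idealized basis the 91 SNPs tested so far correspond to roughly one marker per 80 kb, about eightfold sparser than the 10-kb density considered above.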
This work was supported in part by National Institutes of Health Grants DK16746, DK07120, and DK49583 (to M.A.P.).
Address correspondence and reprint requests to M.A. Permutt, 660 S. Euclid Ave., Campus Box 8127, Saint Louis, MO 63110.
Received for publication 20 March 2002 and accepted in revised form 15 May 2002.
LOD, logarithm of odds; PPAR, peroxisome proliferator-activated receptor; SNP, single nucleotide polymorphism.
The symposium and the publication of this article have been made possible by an unrestricted educational grant from Servier, Paris.