Analysis of single nucleotide polymorphisms identifies major type 1A diabetes locus telomeric of the major histocompatibility complex.

OBJECTIVE
HLA-DRB1*03-DQB1*0201/DRB1*04-DQB1*0302 (DR3/4-DQ8) siblings who share both major histocompatibility complex (MHC) haplotypes identical-by-descent with their proband siblings have a higher risk for type 1A diabetes than DR3/4-DQ8 siblings who do not share both MHC haplotypes identical-by-descent. Our goal was to search for non-DR/DQ MHC genetic determinants that cause the additional risk in the DR3/4-DQ8 siblings who share both MHC haplotypes.


RESEARCH DESIGN AND METHODS
We completed an extensive single nucleotide polymorphism (SNP) analysis of the extended MHC in 237 families with type 1A diabetes from the U.S. and 1,240 families from the Type 1 Diabetes Genetics Consortium.


RESULTS
We found evidence for an association with type 1A diabetes (rs1233478, P = 1.6 × 10⁻²³, allelic odds ratio 2.0) in the UBD/MAS1L region, telomeric of the classic MHC. We also observed over 99% conservation for up to 9 million nucleotides between chromosomes containing a common haplotype with the HLA-DRB1*03, HLA-B*08, and HLA-A*01 alleles, termed the "8.1 haplotype." The diabetes association in the UBD/MAS1L region remained significant both after chromosomes with the 8.1 haplotype were removed (rs1233478, P = 1.4 × 10⁻¹²) and after adjustment for known HLA risk factors HLA-DRB1, HLA-DQB1, HLA-B, and HLA-A (P = 0.01).


CONCLUSIONS
Polymorphisms in the region of the UBD/MAS1L genes are associated with type 1A diabetes independent of HLA class II and I alleles.

Genetic susceptibility to immune-mediated diabetes is determined by polymorphisms/mutations in multiple genes in both human and animal models. This includes rare immunologic syndromes such as the autoimmune polyendocrine syndrome, type 1 (APS-1), and the immune dysfunction, polyendocrinopathy, enteropathy, X-linked (IPEX) syndrome, in which mutations of single genes greatly enhance the risk of developing autoimmune diabetes (1-3). The majority of individuals born with mutations of the FOXP3 gene (IPEX syndrome) develop diabetes as neonates or young infants, indicating that, in the absence of FOXP3-dependent function (most likely because of the lack of a major class of regulatory T-cells), there is high risk for immune destruction of islet β-cells (4). In contrast, for common forms of type 1A diabetes, single genes with major influence outside of the major histocompatibility complex (MHC) region have apparently been ruled out, because whole-genome association studies of large and independent populations have not identified single non-MHC genes with high odds ratios for diabetes risk (5). The MHC is the principal determinant for type 1A diabetes risk, with the largest contribution attributed to the HLA-DR/DQ alleles, including contributions from HLA-DRB1*04 subtypes and HLA-DP alleles (6,7).
Several reports have suggested the existence of MHC and MHC-linked loci that contribute to diabetes risk in addition to the HLA-DR/DQ loci (8-12). We have recently reported that risk for anti-islet autoimmunity may be as high as 80% for HLA-DRB1*0301-DQB1*0201/DRB1*04-DQB1*0302 (DR3/4-DQ8) siblings who share both of their MHC haplotypes with their proband identical-by-descent, but risk is only 20% for those DR3/4-DQ8 siblings who share one or no haplotypes identical-by-descent with their sibling proband (13). This strongly implicates major diabetogenic MHC-linked alleles not accounted for by molecular genotyping of known HLA-DR/DQ sequences.
There is a classic body of work documenting existence of what have been termed "extended" or "ancestral" MHC haplotypes on which a series of polymorphisms extending over ~1 million nucleotides are in almost total linkage disequilibrium (14-16). With the advent of high-density single nucleotide polymorphism (SNP) analysis and resequencing of extended regions of the MHC, extended haplotypes, such as the HLA-DRB1*03, HLA-B*08, HLA-A*01 (8.1) haplotype, have been shown to have over 99% conservation for 3.2 Mb of the MHC region, including alleles of the HLA-DRB1, HLA-DQB1, HLA-B, HLA-C, and HLA-A genes (17,18). The 8.1 haplotype is a common haplotype that is present in ~9% of Caucasian MHC control haplotypes and 18% of MHC case haplotypes from families with type 1A diabetes (19). Even though there are other conserved extended haplotypes (e.g., HLA-DRB1*03, HLA-B*18, HLA-A*30) (20), none have even half the frequency of the 8.1 haplotype in the Caucasian population with type 1A diabetes (16). We report here strong evidence for an association between type 1A diabetes and a locus in the UBD/MAS1L region that is apparently independent of HLA class II and I alleles and also independent of linkage disequilibrium with the common MHC 8.1 haplotype.

RESEARCH DESIGN AND METHODS
We analyzed three datasets (Table 1), beginning with a survey study that we called "Illumina 1." In Illumina 1, samples from 45 families (143 individuals) of children in the prospective cohort Diabetes Autoimmunity Study of the Young (DAISY) were selected for analysis at 656 MHC SNPs. DAISY has HLA-typed >30,000 newborns in Denver, Colorado, at the HLA-DR/DQ loci with informed parental consent and Institutional Review Board (IRB) oversight at the University of Colorado at Denver and Health Sciences Center (21,22). A subset of these children at high, intermediate, and low risk for type 1 diabetes by family history and HLA type were followed with measurements of autoantibody levels (insulin, GAD65, and IA-2) (22). Given preliminary results from the Illumina 1 analysis, a follow-up study, "Illumina 2," was performed to refine the regions of association in Illumina 1 and extend the coverage of the telomeric end of the MHC. In the Illumina 2 study, DAISY families (n = 165 families, 523 individuals, 138 families independent of the Illumina 1 study) and Human Biological Data Interchange (HBDI) families (n = 72 families, 241 individuals) were analyzed at 364 SNPs. All of the 237 families in the Illumina 2 study were independent of the Illumina 1 study except for 27 DAISY families who were included in both studies (families with a prospectively followed child at extremely high risk for developing type 1A diabetes [high-risk DR3/4-DQ8 genotype and a relative with type 1A diabetes]). As a replication study of Illumina 2, we analyzed data provided by the Type 1 Diabetes Genetics Consortium (T1DGC) ("Illumina 3"). This study included 1,240 families (6,297 individuals, primarily affected sib-pairs and their parents) from the British Diabetes Association, Danish, Joslin, HBDI, and U.K. populations.
We designed custom SNP panels for our Illumina 1 (656 SNPs) and Illumina 2 (364 SNPs) analyses. The Illumina 1 SNP panel covered 3 million base pairs of the classic MHC and included mapping and exon-centric SNPs with an average of one SNP every 6.3 kb and a maximum interval of 30 kb. The Illumina 2 custom SNP panel concentrated on nominally significant SNPs from the Illumina 1 study and SNPs telomeric of the classic MHC. We selected 53 SNPs in the classic MHC region (including nominally significant SNPs from the Illumina 1 analysis [P < 0.01 within the peak of association around the HLA-DRB1 and -DQB1 genes or P < 0.05 outside of the HLA-DR/DQ major peak of association], SNPs near genes of interest [MICA, TNF, and HLA-DPB1], and 12 "sentinel" SNPs with alleles defining the DR3, B8, A1 [… overlapping SNPs, 2,918 of 3,072 SNPs were successfully typed for a 95% SNP success rate). Illumina 2 and Illumina 3 had 83 SNPs in common for our combined study of the classic MHC. In addition, rs1233478, a highly significant SNP in the Illumina 3 analysis that was not included in the initial Illumina 2 panel, was genotyped with Taqman probes in the Illumina 2 study samples (hybridization/extension reaction with a fluorescence detection system developed by Applied Biosystems).
For all three Illumina SNP datasets, the PedCheck program (http://watson.hgen.pitt.edu) (23) was used to ensure that the SNP genotype data in each family demonstrated a Mendelian inheritance pattern. Merlin software (www.sph.umich.edu/csg/abecassis/Merlin) (24) was used to phase the SNP genotype data from families into haplotypes. For those SNPs for which phase was ambiguous for an individual, we excluded that haplotype from analysis for that specific SNP. Using affected family-based control (AFBAC) methodology, chromosomes were assigned case status if they were present in any family member who was anti-islet autoantibody positive (Illumina 2) or diagnosed with diabetes (Illumina 2 and Illumina 3) (25). Chromosomes were assigned control status if they were in unaffected families or were not carried by any affected family members (25).
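The AFBAC assignment rule described above can be sketched in a few lines. This is only an illustration of the rule, not the authors' pipeline: the function name `afbac_status`, the haplotype labels, and the family layout (member id mapped to two phased haplotype labels plus an affected flag) are all assumptions made for the example, and phasing is taken as already done.

```python
def afbac_status(parental_haplotypes, members):
    """Split one family's parental chromosomes into AFBAC case and control sets.

    parental_haplotypes: iterable of haplotype labels carried by the parents.
    members: dict mapping member id -> (hap_a, hap_b, affected_flag).
    (Hypothetical data layout, for illustration only.)
    """
    # A chromosome is a case chromosome if any affected member carries it.
    case = set()
    for hap_a, hap_b, affected in members.values():
        if affected:
            case.update((hap_a, hap_b))
    # Only parental chromosomes are counted.
    parental = set(parental_haplotypes)
    case &= parental
    # Parental chromosomes never carried by an affected member are controls.
    control = parental - case
    return case, control


family = {
    "father": ("F1", "F2", False),
    "mother": ("M1", "M2", False),
    "child_1": ("F1", "M1", True),   # affected child
    "child_2": ("F1", "M2", False),  # unaffected sibling
}
case, control = afbac_status(["F1", "F2", "M1", "M2"], family)
print(sorted(case), sorted(control))
```

In this toy family, the two chromosomes inherited by the affected child become case chromosomes, and the remaining two parental chromosomes serve as controls.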
SNP allele frequencies from case versus control parental chromosomes were analyzed with Fisher's two-tailed exact test (25). To verify that statistically significant findings in Illumina 3 were not caused by population stratification, the transmission disequilibrium test was also computed using the mapping panel data from Illumina 3 with Haploview software (26) (http://www.broad.mit.edu/mpg/haploview). Prism software (www.graphpad.com/prism) was used to perform survival curve analysis, using the log-rank test with the α level for significance set at 0.05. SAS software (www.SAS.com) was used to perform logistic regression analysis.
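For readers unfamiliar with the transmission disequilibrium test, its basic form is a McNemar-type chi-square on transmitted versus untransmitted alleles from heterozygous parents. The sketch below is a minimal illustration under that assumption; it is not a reproduction of the Haploview implementation, and the function name `tdt` is ours. The counts used are the rs1233478 transmission counts reported in the Results (964 transmitted, 544 not transmitted).

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-type TDT statistic (1 degree of freedom) and its P value."""
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = tdt(964, 544)
print(f"chi2 = {chi2:.1f}, P = {p_value:.1e}")
```

With these counts the statistic is about 117, giving a P value on the order of 10⁻²⁷, in line with the value reported for rs1233478.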

RESULTS
The first two projects performed involved association analyses of MHC SNPs in families with type 1A diabetes from DAISY (a study of prospectively followed HLA-characterized children at risk for type 1 diabetes who were selected for prospective follow-up after HLA typing >30,000 newborns) and the HBDI study. SNP association analyses were performed by comparing allele frequencies in case versus control chromosomes using AFBAC methodology (25). In a survey study of 45 DAISY families, the "Illumina 1" study, we detected a cluster of six SNPs that were nominally associated with type 1 diabetes (P < 0.05) in a region telomeric of the HLA-F locus (29.2-29.5 Mb) at the telomeric end of the classic MHC. We were particularly interested in this region with significant SNPs given its long distance (over 3 million nucleotides) from the diabetes-associated HLA-DR/DQ alleles.
We performed a follow-up project in a larger panel of DAISY and HBDI families, the "Illumina 2" study, genotyping a greater density of SNPs telomeric of the HLA-F gene, and with extension of the SNP analysis 6 Mb farther telomeric of the MHC. In the Illumina 2 study, we also genotyped 12 sentinel SNPs reported to define the 8.1 haplotype (boxed in Fig. 1) and SNPs in the MHC region that were significantly associated with diabetes and autoimmunity in the Illumina 1 study (17). We again found a cluster of nominally significant SNPs telomeric of the HLA-F gene, some with far more significant P values (Fig. 1, seven SNPs with P < 1.0 × 10⁻⁵). The most significant SNP in this cluster, rs389419, is located 2 kb telomeric of the ubiquitin D (UBD) gene. As expected, many of the SNP…

For more convincing replication, we compared MHC SNP allele frequencies in case and control chromosomes from family data provided by the T1DGC from three countries (U.K., U.S., and Denmark), the "Illumina 3" study. In this analysis of the T1DGC data, rs389419 was again significantly associated with diabetes (P = 2.3 × 10⁻¹², Fig. 2). The most significant SNP telomeric of the HLA-F gene was rs1233478, located 48 kb telomeric of the UBD gene and 23 kb centromeric of the MAS1L gene (P = 2.7 × 10⁻²³, odds ratio [OR] 2.2 [95% CI 1.9-2.6], Fig. 2). Allele "A" of rs1233478 is present in 27.5% (817 of 2,967) of case chromosomes and 14.7% (228 of 1,550) of control chromosomes. Analysis of the T1DGC data for distorted transmission of alleles (transmission disequilibrium test) showed that the "A" allele for rs1233478 was significantly overtransmitted to children with diabetes (P = 2.9 × 10⁻²⁷; 964 transmitted:544 not transmitted). A haplotype with the "A" allele for rs1233478 and alleles from four adjacent SNPs was also significantly overtransmitted (P = 4.1 × 10⁻²²). The OR for rs1233478 with combined analysis of DAISY, HBDI, and T1DGC data was 2.0 (1.7-2.3, P = 1.6 × 10⁻²³).
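The allelic odds ratio quoted for rs1233478 can be checked directly from the chromosome counts given above (817 of 2,967 case chromosomes and 228 of 1,550 control chromosomes carrying the "A" allele). The sketch below uses the standard log-odds (Woolf) approximation for the confidence interval; that the authors used this particular interval method is an assumption, and the function name is ours.

```python
import math


def allelic_odds_ratio(case_with, case_total, control_with, control_total, z=1.96):
    """Odds ratio for carrying an allele, with a Woolf (log-odds) 95% CI."""
    a, b = case_with, case_total - case_with            # case: allele / other
    c, d = control_with, control_total - control_with   # control: allele / other
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


odds_ratio, lower, upper = allelic_odds_ratio(817, 2967, 228, 1550)
print(f"OR {odds_ratio:.1f} [{lower:.1f}-{upper:.1f}]")
```

Under these assumptions the counts give an OR of about 2.2 with an interval of roughly 1.9 to 2.6, consistent with the reported estimate.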
Since rs1233478 was not typed as part of Illumina 2, we subsequently typed it in our Illumina 2 families and found that it was significantly associated with diabetes and autoimmunity (P = 1.1 × 10⁻⁶, OR 2.2 [1.6-2.9]).
There were highly significant SNPs close to the HLA-DRA gene (e.g., rs3129871, P = 1.9 × 10⁻¹⁵ in the Illumina 2 data, P = 1.1 × 10⁻¹²⁵ in the Illumina 3 data) and also near the HLA-B and HLA-C genes (e.g., rs2596560, P = 8.8 × 10⁻⁵¹ in the Illumina 3 data) (supplementary Tables 1 and 2). However, we lacked sufficient information to distinguish whether these class II and class I MHC SNPs remained significant independent of linkage disequilibrium with HLA-DR/DQ alleles. There was also a significant SNP in the Illumina 2 data near the PRSS16 gene (rs996247, P = 8.5 × 10⁻⁵), telomeric of the classic MHC (supplementary Tables 1 and 2), but this region was not included in the Illumina 3 T1DGC data.
Others have previously identified MHC regions of SNP identity between unrelated individuals using DNA sequence or microsatellite data (14-16), and we have described a 3 million nucleotide conserved MHC region on the 8.1 haplotype using SNP analyses (18). This means that almost all 8.1 haplotypes from different individuals have close to 100% identity for the 3 million nucleotides of the MHC between the HLA-DQB1 and HLA-A loci. Analysis of the Illumina 3 study data demonstrated that 91% of the 8.1 (HLA-DR3, B8, A1) haplotypes (n = 467) had essentially identical SNP alleles for the 3 Mb between HLA-DQB1 and HLA-F, confirming the remarkable conservation of the 8.1 haplotype. The Illumina 2 study extends the SNP analysis over 6 Mb farther telomeric than the Illumina 3 study (Fig. 3) and shows that the region of identity between unrelated chromosomes with 8.1 haplotypes gradually breaks up telomeric of HLA-A. In a subset of 8.1 chromosomes, the region of identity extends as far as 9 million nucleotides telomeric of the HLA-DR/DQ loci. Also of note, the conservation of the case 8.1 haplotypes is similar to the conservation of the control 8.1 haplotypes.
In addition, we found that two sentinel SNPs can be used to identify 8.1 haplotypes (rs17196849 and rs3117573). For 99% of chromosomes with the 8.1 haplotype, rs17196849 and rs3117573 both have "G" alleles, whereas <5% of non-8.1 haplotypes have "G" alleles for the two SNPs.

FIG. 1. AFBAC results for Illumina 2 (DAISY and HBDI). The x-axis represents the distance of each SNP from the telomere, and the y-axis represents the -log(p) value for each SNP. The circled SNPs represent the -log(p) values for the most significant SNPs telomeric of HLA-F. The most significant SNP (rs389419) is 2 kb telomeric of the UBD gene. The SNPs surrounded by boxes (n = 12) are sentinel SNPs that identify and are specific for the 8.1 haplotype.
Considering that the 8.1 haplotype has an unusually high frequency for an extended haplotype (present in 18% of case haplotypes) and its consensus sequence is associated with the high-risk HLA-DRB1*03 allele, we wanted to correct the association analysis of rs1233478 for a possible confounding effect because of linkage disequilibrium with the 8.1 haplotype. The linkage disequilibrium between the "A" allele of rs1233478 and the 8.1 haplotype is moderate but incomplete (Illumina 3 study: D′ = 0.81, r² = 0.28). However, even with exclusion of all of the 8.1 chromosomes (n = 467) from the Illumina 3 dataset, the association of the "A" allele of rs1233478 with case chromosomes remained significant (rs1233478, P = 1.4 × 10⁻¹²).
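A high D′ paired with a modest r² is what one expects when two alleles usually travel together but differ in frequency. The sketch below computes both measures from the standard definitions. The input frequencies are hypothetical values chosen only to illustrate the pattern; they are not taken from the study data, and the function name is ours.

```python
def linkage_disequilibrium(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic loci.

    p_a, p_b: frequencies of allele A and allele B.
    p_ab: frequency of chromosomes carrying both A and B.
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies (NOT study data): a SNP allele at 20%, a haplotype
# at 10%, and 8.5% of chromosomes carrying both.
d_prime, r2 = linkage_disequilibrium(0.20, 0.10, 0.085)
print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

Because r² is bounded by the mismatch in allele frequencies, an allele can be strongly tied to a haplotype (high D′) while still occurring often on other backgrounds, which is why excluding the 8.1 chromosomes is an informative check.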
To evaluate whether rs1233478 accounted for the association in the region or whether other SNPs were independently associated, we performed stepwise logistic regression using the Illumina 3 data. The 15 most significant UBD/MAS1L SNPs were included as explanatory variables and a 0.05 level of significance was required for each addition to and retention within the model (27). The SNP with the lowest P value in the final model with stepwise selection was rs1233478 (P = 6.6 × 10⁻⁸, OR 1.8 [1.4-2.2]). Two other SNPs were included in the model, but were not significant after correction for HLA alleles (rs3131020, P = 0.02, OR 1.1 [1.0-1.2], and rs1611527, P = 0.0001, OR 1.3 [1.2-1.6]).
We wanted to further account for linkage disequilibrium with other HLA alleles to confirm that the association with rs1233478 was independent of known HLA effects.
Stepwise logistic regression analysis was performed including HLA-DRB1, HLA-DQB1, HLA-B, HLA-A, and rs1233478 as explanatory variables. The final model included HLA-… Table 2). Even though the P value for rs1233478 is not as striking in this analysis as in our previous analyses and the estimate of effect is attenuated, proper adjustment for all of the HLA alleles required the estimation of many additional parameters, causing the power for the specific test of rs1233478 to be reduced. Details of HLA-DRB1, HLA-DQB1, HLA-B, and HLA-A types included as input for the logistic regression analysis can be found in supplementary Tables 3-6. HLA-A24 and HLA-B39 are both class I MHC alleles that have been reported to contribute high risk for type 1A diabetes (28,29). Within the Illumina 3 chromosomes, HLA-A24 is present in 10% of cases and 7% of controls, HLA-B39 in 4% of cases and 1% of controls, and the rs1233478 "A" allele in 28% of cases and 15% of controls. We completed a stepwise logistic regression analysis by analyzing HLA-DRB1, HLA-DQB1, HLA-B39, HLA-A24, and rs1233478 as explanatory variables. The final model includes all of the variables (rs1233478, P = 0.01, OR 1.3 [1.1-1.6], Table 2).

FIG. 2. AFBAC results for Illumina 3 (T1DGC). Comparison of SNP allele frequencies in 3,138 case and 1,681 AFBAC control chromosomes from T1DGC data indicates that a SNP between the UBD and MAS1L genes is significantly associated with type 1A diabetes (rs1233478, P = 2.7 × 10⁻²³).

FIG. 3. HLA-DRB1*03, HLA-B*08, and HLA-A*01 (8.1)
Finally, D6S2223 has been reported to influence type 1A diabetes risk (8,30,31) and therefore logistic regression analysis was performed including D6S2223 as an explanatory variable, in addition to those listed above (with HLA-B and HLA-A as class variables). All variables were included in the best-fitting final model except for HLA-A (D6S2223, P = 0.001, OR 1.4 [1.1-1.7]; rs1233478, P = 0.01, OR 1.4 [1.1-1.7]), as in the previous analysis.
DAISY is prospectively following a cohort of children with an immediate family history of type 1A diabetes and children from the general population (with no immediate family history of type 1 diabetes) for anti-islet autoimmunity and diabetes. As illustrated in Fig. 4, we stratified children with the high-risk HLA-DR3/4-DQ8 genotype by their genotypes for rs1233478 (A/A, A/C, and C/C) and analyzed their risk for progression to type 1A diabetes. Children with the A/A genotype had a higher risk than children with either the A/C or C/C genotypes (43.7 ± 16.8% [means ± SE] A/A vs. 13.8 ± 6.1% A/C vs. 11.8 ± 3.8% C/C by age 12 years, P = 0.05, log-rank test).

DISCUSSION
The MHC region has a major influence on risk for type 1A diabetes that has been predominantly ascribed to HLA alleles and in particular to HLA-DR/DQ alleles. Nevertheless, it is estimated that individuals from the general population with the highest risk HLA-DRB1*0301-DQB1*0201/DRB1*04-DQB1*0302 (DR3/4-DQ8) genotype have only a 5% risk of developing childhood-onset type 1A diabetes. This is significantly greater than the approximate general population risk of 0.3%, but significantly less than our recently reported risk of 80% for persistent anti-islet autoantibody expression among the DR3/4-DQ8 siblings who share both MHC regions identical-by-descent with their proband siblings (13). The DR3/4-DQ8 siblings who do not share both MHC haplotypes with their proband siblings have a risk of 20%. The high risk in the DR3/4-DQ8 siblings who share both haplotypes identical-by-descent strongly suggests that additional loci within or linked to the MHC contribute to risk. Our analysis of data from multiple independent populations confirms a major locus marked by SNPs in the UBD/MAS1L region.
The UBD protein (also known as ubiquitin D, diubiquitin, and FAT10) was so named because it has two tandem head-to-tail ubiquitin-like domains. Ubiquitin proteins are generally known for their ability to bind to proteins, targeting them for degradation by the proteasome. They are also involved in regulation of members of the immunoreceptor family, including B-cell antigen receptors (32). The UBD gene is predominantly expressed in B-cells and dendritic cells, which are antigen-presenting cells that initiate the T-cell response that triggers diabetes, and these cells also participate in T-cell tolerance (33). Ocklenburg et al. (34) recently reported that the UBD protein regulates CD4⁺ T-cell anergy as a downstream element of the FOXP3 protein, possibly by upregulating LGALS3 protein expression. Expression studies of the…

Our extended analysis of SNP alleles of the 8.1 haplotype demonstrates that the remarkable conservation for this common haplotype can extend as far as 9 million base pairs telomeric of the HLA-DR/DQ locus. Analysis of 467 8.1 chromosomes showed that >91% of 8.1 haplotypes are essentially completely conserved between HLA-DQB1 and HLA-F. We analyzed a subset of 12 SNPs reported by Smith et al. (17) to define the 8.1 haplotype and identified two sentinel SNPs (rs17196849 "G" allele and rs3117573 "G" allele) that can be used to identify 8.1 haplotypes with high sensitivity (99%) and specificity (95%).
When we excluded chromosomes with the 8.1 haplotype from the analysis, the association of the UBD/MAS1L region with diabetes remained significant. The persistence of the association after removal of the 8.1 haplotypes indicates that the differences in allele frequencies between the case and control chromosomes in the UBD/MAS1L region are independent of linkage disequilibrium with the 8.1 haplotype.
An important aspect of finding a major diabetes-associated region is that it has the potential to improve genetic prediction of type 1A diabetes. Survival curve analysis of children with the high-risk DR3/4-DQ8 genotype indicates that children with the A/A genotype for rs1233478 in the UBD/MAS1L region have a higher risk for type 1A diabetes by the age of 12 than children with an A/C or C/C genotype (43.7% A/A vs. 13.8% A/C vs. 11.8% C/C).
We have concentrated on the UBD region signal because it was confirmed in multiple populations and is not caused by linkage disequilibrium with class I and class II HLA alleles. However, it is likely that other MHC region alleles/SNPs influence risk for type 1A diabetes. Specific HLA-B alleles, HLA-A alleles (e.g., HLA-A24), and a microsatellite (D6S2223) have been associated with diabetes in previous reports. In our data, there were clusters of diabetes-associated SNPs near, but not contiguous with, HLA-B and HLA-A and a very strong signal at HLA-DRA (supplementary Tables 1 and 2). Stepwise logistic regression analysis with HLA-DR, HLA-DQ, HLA-A24, HLA-B39, D6S2223, and the UBD SNP rs1233478 showed that rs1233478 has an independent effect on diabetes risk. Of note, a SNP marking the ITPR3 gene (rs2229634) was not significantly associated with type 1A diabetes (P = 0.4). Extension of these studies to larger populations as well as finer mapping of SNPs in the UBD/MAS1L region will be important to assess the full impact of this region on the prediction and pathogenesis of type 1A diabetes.