PTPN22 Trp620 Explains the Association of Chromosome 1p13 With Type 1 Diabetes and Shows a Statistical Interaction With HLA Class II Genotypes

  1. Deborah J. Smyth,
  2. Jason D. Cooper,
  3. Joanna M.M. Howson,
  4. Neil M. Walker,
  5. Vincent Plagnol,
  6. Helen Stevens,
  7. David G. Clayton and
  8. John A. Todd
  1. From the Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, U.K
  1. Corresponding author: John A. Todd, Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, U.K. E-mail: john.todd{at}


OBJECTIVE—The disease association of the common 1858C>T Arg620Trp (rs2476601) nonsynonymous single nucleotide polymorphism (SNP) of protein tyrosine phosphatase, nonreceptor type 22 (PTPN22) on chromosome 1p13 has been confirmed in type 1 diabetes and also in other autoimmune diseases, including rheumatoid arthritis and Graves’ disease. Some studies have reported additional associated SNPs independent of rs2476601/Trp620, suggesting that it may not be the sole causal variant in the region, and others have reported that the relative risk of rs2476601/Trp620 is greater in lower-risk HLA class II genotypes than in the highest-risk class II category.

RESEARCH DESIGN AND METHODS—We resequenced PTPN22 and used these and other data to provide >150 SNPs to evaluate the association of the PTPN22 gene and its flanking chromosome region with type 1 diabetes in a minimum of 2,000 case subjects and 2,400 control subjects.

RESULTS—Due to linkage disequilibrium, we were unable to distinguish between rs2476601/Trp620 (P = 2.11 × 10⁻⁸⁷) and rs6679677 (P = 3.21 × 10⁻⁸⁷), an intergenic SNP between the genes putative homeodomain transcription factor 1 and round spermatid basic protein 1. None of the previously reported disease-associated SNPs proved to be independent of rs2476601/Trp620. We did not detect any interaction with age at diagnosis or sex. However, we found that rs2476601/Trp620 has a higher relative risk in type 1 diabetic case subjects carrying lower risk HLA class II genotypes than in those carrying higher risk ones (P = 1.36 × 10⁻⁴ in a test of interaction).

CONCLUSIONS—In our datasets, there was no evidence for allelic heterogeneity at the PTPN22 locus in type 1 diabetes, indicating that the SNP rs2476601/Trp620 remains the best candidate in this chromosome region in European populations. The heterogeneity of rs2476601/Trp620 disease risk by HLA class II genotype is consistent with previous studies, and the joint effect of the two loci is still greater in the high-risk group.

To date, there are 10 loci with confirmed evidence for association with type 1 diabetes: the major histocompatibility complex (MHC) HLA class I and II genes (1,2), insulin (3,4), the CTLA4 locus (5,6), protein tyrosine phosphatase, nonreceptor type 22 (PTPN22; 7), interleukin-2 receptor α chain (IL2RA) (8–10), interferon induced with helicase C domain 1 (IFIH1) (11), and the regions on chromosomes 12q24, 12q13, 16p13, and 18p11 (12–14). With the recent capability to scan hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome in thousands of samples, the identification of complex disease genes and regions has been greatly enhanced (12). However, the localization of the causal variant(s) in the chromosome regions identified by genome-wide association (GWA) studies requires comprehensive resequencing, extensive genotyping, and statistical analysis in large sample sets. With the exceptions of the MHC (2), cytotoxic T-lymphocyte–associated protein 4 (CTLA4)-ICOS (6), IL2RA (8), and INS (4), which have been subjected to further resequencing and genotyping to try to narrow down candidates for the causal variant(s), the other type 1 diabetes loci have not yet been studied in any great detail (6,11,13). For PTPN22, there has been extensive focus on the single nonsynonymous SNP (nsSNP) rs2476601/Arg620Trp (7,15–18). Functional studies suggest that the Trp620 allele has gain-of-function, immunosuppressive effects compared with the more common allele Arg620 (19–21). However, there have been attempts to assess whether there are other variants that confer risk of disease that are independent of rs2476601/Trp620 (22,23).

One of the largest studies thus far to characterize disease association with PTPN22 resequenced the coding regions of the gene in 48 North American individuals and genotyped 37 SNPs in or near PTPN22 in up to 1,136 rheumatoid arthritis case subjects and 1,797 control subjects (set 1: 475 case subjects and 475 control subjects; set 2: 661 case subjects and 1,322 control subjects) (24). They reported two SNPs (rs1310182 and rs3789604) on a common haplotype that were associated with rheumatoid arthritis independently of rs2476601/Trp620 (rs1310182, P = 0.002 and 0.052 for sample sets 1 and 2, respectively; rs3789604, P = 0.002 and 0.014 for sample sets 1 and 2, respectively). In another rheumatoid arthritis study, Steer et al. (22) genotyped 45 SNPs from the PTPN22 region using Affymetrix and Illumina GWA technology on pools of 250 rheumatoid arthritis case subjects and 250 control subjects. They reported a SNP (rs1343125) in the gene membrane-associated guanylate kinase, WW, and PDZ domain containing 3 (MAGI3) that provided some evidence for an association independent of rs2476601/Trp620 (P = 0.03).

For type 1 diabetes, two studies proposed that there is evidence for allelic heterogeneity. Onengut-Gumuscu et al. (23) resequenced the coding regions in 94 type 1 diabetic case subjects and genotyped 11 SNPs in 374 type 1 diabetic families. They identified a rare nsSNP K750N, which generates a premature stop codon, and presented some evidence of association with type 1 diabetes (P = 0.026, minor allele frequency [MAF] = 0.006). An Asian population study (25) did not detect the rs2476601/Arg620Trp variant in 1,690 Japanese or 180 Korean subjects but found a promoter SNP −1123G/C (rs2488457) to be associated with type 1 diabetes (P = 0.0105; odds ratio [OR] 1.41 [95% CI 1.09–1.82]). They also genotyped this SNP in 95 families from the Diabetes U.K. Warren 1 Repository and found the associations to be the same for −1123G/C (rs2488457) and rs2476601/Trp620 (P = 0.019, relative risk (RR) 1.46 [95% CI 1.06–2.01]; and P = 0.046, 1.46 [1.01–2.13], respectively) (25). Furthermore, a Norwegian rheumatoid arthritis study of 861 case subjects and 559 control subjects (26) found they could not distinguish between −1123G/C (rs2488457) and rs2476601/Trp620 because of linkage disequilibrium.

Recently, Chelala et al. (27) genotyped the rheumatoid arthritis–associated SNP rs3789604, which marks a haplotype that may be associated with rheumatoid arthritis independently of rs2476601/Trp620, and the Japanese-associated SNP rs2488457 but failed to find evidence of an independent association with type 1 diabetes in 528 French, Danish, and American multiplex families. Heward et al. (28) selected five common rheumatoid arthritis–associated SNPs to use as tag SNPs (rs2488458, rs12730735, rs1310182, rs1217413, and rs3811021) and reported no evidence of association with Graves’ disease in 768 case subjects, 768 control subjects, and 313 families (minimum P = 0.292). Using the five tag SNPs and rs2476601/Trp620, they did, however, obtain evidence for a predisposing haplotype (haplotype 2, P = 6.77 × 10⁻⁸) and a protective haplotype (haplotype 3, P = 3.7 × 10⁻⁵) (28). Dissection of the association of these haplotypes revealed that rs2476601/Trp620 was responsible for the predisposing effect observed with haplotype 2 but could not explain the apparent protective effect obtained for haplotype 3 (28).

Two previous studies have explored heterogeneity in the disease-predisposing effect of rs2476601/Trp620 with regard to HLA class II genotype. Hermann et al. (29) observed that the effect of rs2476601/Trp620 was more pronounced in subjects with non–DR4-DQ8/low-risk HLA genotypes (P = 0.0004) from a dataset of 546 case subjects, 538 control subjects, and 245 nuclear families from Finland (mean age at diagnosis of case subjects 8.2 ± 4.1 years). They also reported some evidence that boys carrying the rs2476601 Trp620 allele were at higher risk of disease than girls (P = 0.021; 29). In Steck et al. (30), it was observed that after stratification by high-risk HLA-DR3/4 genotype, the effect of rs2476601/Trp620 was greater in the non-DR3/4 subgroup (OR 2.07 [95% CI 1.54–2.79]; P < 0.0001) compared with DR3/4-positive case subjects (1.44 [0.85–2.45]; P = 0.18), using 690 non-Hispanic white case subjects and 515 non-Hispanic white control subjects (mean age at diagnosis of case subjects 11.2 years; range 0.3–54 years).

In the current study, we assessed as comprehensively as possible given current knowledge of the polymorphism content of the region, from our resequencing and public databases, whether rs2476601/Trp620 is the sole type 1 diabetes susceptibility variant in PTPN22 and then whether there was any evidence for additional disease risk alleles or haplotypes in the entire 400-kb linkage disequilibrium block containing PTPN22 and at least six other known genes: MAGI3; putative homeodomain transcription factor 1 (PHTF1); round spermatid basic protein 1 (RSBN1); adaptor-related protein complex 4, β1 subunit (AP4B1); DNA cross-link repair 1B (DCLRE1B); and chromosome 1 open reading frame 178 (C1orf178) (Supplementary Fig. 1, available in an online appendix at

RESEARCH DESIGN AND METHODS


The type 1 diabetic individuals were recruited as part of the Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory British case collection (Genetic Resource Investigating Diabetes), which is a joint project between the Department of Pediatrics, Addenbrooke's Hospital, University of Cambridge, and the Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge. Most affected individuals were <16 years of age at the time of collection (mean age at diagnosis 7.5 years; range 0.5–16 years) and all resided in the U.K. The control samples were obtained from the British 1958 Birth Cohort (B58C), an ongoing study of all people born in the U.K. during 1 week in 1958. All case and control subjects were of self-reported white ethnicity. The relevant research ethics committees approved the study, and written informed consent was obtained from the participants or their parents/guardian for those too young to consent. A total of 2,400 control subjects and 2,800 case subjects had been genotyped as part of the nsSNP GWA study (31), and 3,000 control subjects and 2,000 case subjects were genotyped as part of the Wellcome Trust Case-Control Consortium (WTCCC) GWA study (12), with 1,100 control subjects and 1,700 case subjects being common to both studies. For TaqMan fine-mapping genotyping, a maximum of 7,200 control subjects and 7,600 case subjects were used, encompassing all of the samples from both GWA studies.

PTPN22 sequencing.

Polymorphisms in PTPN22 were identified by resequencing 32 Centre d'Etude du Polymorphisme Humain (CEPH) DNA samples (from Utah residents with northern and western European ancestry) in common with HapMap (32). The sequencing reactions were performed using Applied Biosystems BigDye chemistry (version 3.1), and the sequences were resolved using an ABI 3700 Genetic Analyzer. Analyses of the sequence traces were performed using the Staden package. All 14 newly discovered SNPs have been submitted to dbSNP.

Genotyping methods.

Genotype data for the PTPN22 region came from three sources: the WTCCC study, which used the GeneChip 500K Mapping Array Set (Affymetrix chip) (12); our nsSNP GWA study, which used molecular inversion probe (MIP) technology (Affymetrix) (31,33); and TaqMan (Applied Biosystems) genotyping, which was carried out in accordance with the manufacturers’ protocols. All genotyping data were scored blind to case-control status; TaqMan genotyping was double scored by a second operator to minimize error. To call genotypes of SNPs incorporated in our nsSNP GWA study (13), we used the clustering method developed by Plagnol et al. (34), which addresses bias due to differential misclassification in genotyping in large-scale association studies. No SNPs deviated from Hardy-Weinberg equilibrium (P > 0.05). The genotype concordance between TaqMan genotyping and MIP technology for rs2476601/Arg620Trp was 99.67% for control samples (2,383 of 2,391) and 99.07% for case samples (2,986 of 3,014), and the genotype concordance between TaqMan genotyping and GeneChip 500K (Affymetrix) results for rs6679677 was 99.78% for control samples (1,380 of 1,383) and 99.35% for case samples (1,826 of 1,838).
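The concordance percentages quoted above follow directly from the reported counts. The short Python sketch below recomputes them as a check; the dictionary layout and function names are illustrative choices and are not part of the original analysis.

```python
# Concordance counts quoted in the text: (matching genotype calls, samples compared).
# The keys and the function names are illustrative, not taken from the study.
CONCORDANCE_COUNTS = {
    ("rs2476601", "control"): (2383, 2391),  # TaqMan vs. MIP
    ("rs2476601", "case"): (2986, 3014),     # TaqMan vs. MIP
    ("rs6679677", "control"): (1380, 1383),  # TaqMan vs. GeneChip 500K
    ("rs6679677", "case"): (1826, 1838),     # TaqMan vs. GeneChip 500K
}


def concordance_percent(matching, compared):
    """Percentage of samples whose genotype calls agree between two platforms."""
    return 100.0 * matching / compared


def summarize(counts):
    """Return {(snp, group): (concordance %, number of discordant samples)}."""
    return {
        key: (round(concordance_percent(matching, compared), 2), compared - matching)
        for key, (matching, compared) in counts.items()
    }


SUMMARY = summarize(CONCORDANCE_COUNTS)
```

The number of discordant samples is returned alongside each percentage because even a small number of miscalls can matter when two SNPs are in near-perfect linkage disequilibrium.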

Initially, despite almost perfect linkage disequilibrium (r² = 0.994 in control subjects) between rs2476601 and rs6679677, we found that we could apparently differentiate between the effects of the two SNPs in stepwise logistic regression analyses. The P value for adding rs2476601 to rs6679677 was 0.010, whereas in the reverse regression analysis the P value for adding rs6679677 to rs2476601 was 0.497, suggesting that rs2476601 was sufficient to model the association. This result was unexpected given the almost perfect linkage disequilibrium. Consequently, we merged data from three TaqMan assays of rs2476601 that had been run on the case-control collection at different times and identified 36 individuals who were discordant between assays. After resequencing these 36 individuals for both rs2476601 and rs6679677, we found that 15 of them had genuinely different genotypes at rs2476601 and rs6679677. The remaining 21 differences were due to random genotyping error (<0.5%) and were excluded. On repeating the logistic regression analysis, as expected, neither SNP added significantly to the other (P > 0.05). These results highlight the sensitivity of the logistic regression analysis to genotyping error and missing data.

Statistical analyses.

All statistical analyses were performed in either Stata or the R environment. Logistic regression models were used for all case-control association tests. We stratified the analysis by place of collection for the case subjects and place of birth for the control subjects for 12 geographical regions of England, Scotland, and Wales (Southwestern, Southern, Southeastern, London, Eastern, Wales, Midlands, North Midlands, Northwestern, East and West Riding, Northern, and Scotland) to exclude the possibility of confounding by geography with little loss of power, given how well the case and control subjects were matched geographically (31). In the logistic regression analysis of a SNP, we performed a 1-d.f. likelihood ratio test to determine whether a 1-d.f. multiplicative allelic effects model or a 2-d.f. genotype effects model (no specific mode of inheritance assumed) was more appropriate (35). We assumed a multiplicative allelic effects model because it was not significantly different from the genotype model. SNPs were modeled as a numerical indicator variable coded 0, 1, or 2, representing the number of occurrences of the minor allele. In the forward logistic regression analysis, we started by assessing the evidence against the most significant SNP being the sole variant in the region (in other words, whether this SNP alone was sufficient to model the association). For the purposes of this analysis, we assumed the genotype (2-d.f.) parameterization for the most associated SNP (A>a) or for any additional SNP with significant independent effects on type 1 diabetes, so genotype risks of A/A and A/a were modeled relative to the a/a genotype. We then used a 1-d.f. test for adding each of the remaining SNPs to the model by assuming multiplicative allelic effects for the additional SNPs.
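The comparison of the 1-d.f. multiplicative model with the 2-d.f. genotype model can be sketched in a few lines of Python. This is a minimal illustration only: it ignores the geographical strata used in the study, it takes hypothetical genotype counts as input, and all function names are our own rather than calls from Stata or R.

```python
import math


def log_likelihood(cases, controls, probs):
    """Binomial log-likelihood given a per-genotype probability of being a case."""
    total = 0.0
    for n_case, n_control, p in zip(cases, controls, probs):
        if n_case:
            total += n_case * math.log(p)
        if n_control:
            total += n_control * math.log(1.0 - p)
    return total


def fit_genotype_model(cases, controls):
    """2-d.f. genotype model: a freely estimated case probability per genotype."""
    probs = [a / (a + c) if (a + c) else 0.5 for a, c in zip(cases, controls)]
    return log_likelihood(cases, controls, probs)


def fit_multiplicative_model(cases, controls, max_iter=100, tol=1e-10):
    """1-d.f. model: logit P(case) = b0 + b1 * (copies of the minor allele).

    Fitted by Newton-Raphson on counts indexed by 0, 1, 2 minor alleles.
    """
    totals = [a + c for a, c in zip(cases, controls)]
    b0, b1 = math.log(sum(cases) / sum(controls)), 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * g))) for g in range(3)]
        resid = [a - n * p for a, n, p in zip(cases, totals, probs)]
        weight = [n * p * (1.0 - p) for n, p in zip(totals, probs)]
        u0, u1 = sum(resid), sum(g * r for g, r in enumerate(resid))
        i00 = sum(weight)
        i01 = sum(g * w for g, w in enumerate(weight))
        i11 = sum(g * g * w for g, w in enumerate(weight))
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * g))) for g in range(3)]
    return b1, log_likelihood(cases, controls, probs)


def departure_from_multiplicative(cases, controls):
    """1-d.f. likelihood ratio test of the genotype model against the multiplicative model."""
    b1, ll_mult = fit_multiplicative_model(cases, controls)
    ll_geno = fit_genotype_model(cases, controls)
    stat = max(0.0, 2.0 * (ll_geno - ll_mult))
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return {"per_allele_or": math.exp(b1), "lrt_chi2": stat, "p_value": p_value}
```

A large P value from `departure_from_multiplicative` corresponds to the situation described above, in which the genotype model does not fit significantly better and the simpler multiplicative coding (0, 1, or 2 minor alleles) is retained.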

To correct the most significant P value (observed test statistic χ²₁ = θ̂) for the addition of rs735074 in the forward logistic regression analysis for multiple testing, we performed a permutation test to estimate the probability of observing a test statistic at least as large as θ̂ when the null hypothesis is true (that is, when no additional SNP significantly adds to the model). We stratified by rs735074 genotype, randomly assigning case-control status within these strata and then adding in each SNP to the model, including the most associated. After recording the largest test statistic achieved under the null hypothesis, we repeated the permutation of disease status and the analysis 300 times. The corrected P value then equals the number of times that the largest test statistic achieved under the null hypothesis is at least as large as θ̂ divided by the number of permutations.
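The permutation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the per-SNP statistic here is the Cochran-Armitage trend chi-square (computed as N times the squared correlation between genotype and case status) instead of the likelihood ratio statistic from the logistic regression, and the function names, stratum labels, and default seed are hypothetical.

```python
import random


def trend_chi2(genotypes, status):
    """Cochran-Armitage trend statistic (1 d.f.): N * r^2 of genotype vs. case status."""
    n = len(status)
    mean_g = sum(genotypes) / n
    mean_s = sum(status) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    sss = sum((s - mean_s) ** 2 for s in status)
    sgs = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, status))
    if sgg == 0 or sss == 0:
        return 0.0
    return n * sgs * sgs / (sgg * sss)


def max_statistic(snps, status):
    """Largest single-SNP statistic across all SNPs considered."""
    return max(trend_chi2(genotypes, status) for genotypes in snps)


def permute_within_strata(status, strata, rng):
    """Shuffle case-control labels separately inside each stratum."""
    permuted = list(status)
    members = {}
    for index, label in enumerate(strata):
        members.setdefault(label, []).append(index)
    for indices in members.values():
        values = [status[i] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            permuted[i] = value
    return permuted


def corrected_p_value(snps, status, strata, n_perm=300, seed=0):
    """Fraction of permutations whose largest statistic reaches the observed maximum."""
    rng = random.Random(seed)
    observed = max_statistic(snps, status)
    hits = 0
    for _ in range(n_perm):
        shuffled = permute_within_strata(status, strata, rng)
        if max_statistic(snps, shuffled) >= observed:
            hits += 1
    return hits / n_perm
```

Taking the maximum over all SNPs within each permutation is what accounts for the number of tests: the observed best statistic is compared with the best statistic that chance alone produces across the same set of correlated SNPs.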

The six SNP haplotypes were assigned to subjects (case and control subjects separately) using SNPHAP and analyzed using logistic regression models for each subject. Haplotype assignments were weighted by their posterior probabilities, and the Huber/White/Sandwich estimate of variance was used to allow for the nonindependence between the possible haplotype assignments to each individual. In addition to testing the overall significance of the six SNP haplotypes, we tested whether any of the SNPs alone were sufficient to model the association or, in addition to the five SNP haplotypes of the remaining SNPs, were independently associated, the latter providing evidence of allelic heterogeneity. We performed this analysis for each SNP by dropping the SNP from the six SNP haplotypes and testing whether the five SNP haplotypes were independently associated when added to the SNP. We also performed the reverse analysis, testing whether the SNP was independently associated when added to the five SNP haplotypes.

Statistical interaction between loci can be tested using a case-only analysis within a regression model (15,36–38). The PTPN22 nsSNP rs2476601/Arg620Trp was put in a regression model as the dependent variable, with the SNP genotypes as predictor variables and geographical region included as strata. A case-only interaction analysis was performed for age at diagnosis and sex. However, for HLA, the number of HLA class II genotypes (>200) that exist in our collection means there are too many parameters to estimate, and therefore we grouped the HLA-DRB1 and HLA-DQB1 genotypes together using three different strategies. The first is a 12-group risk-based model derived using recursive partitioning: individuals are categorized into groups according to their class II genotype such that each group is as homogeneous with respect to case-control status as possible (2). The second grouping classifies individuals as high, medium, and low risk according to the haplotype risk estimates published in Table 3 of Koeleman et al. (39): risks >1 with 95% CIs that did not include 1 were classed as high risk; risks with 95% CIs including 1 were classed as medium risk; and risks <1 with 95% CIs that did not include 1 were classed as low risk (2,39). The third method groups together individuals with HLA-DRB1*03/HLA-DRB1*04 genotypes versus those without. Genotypes positive for HLA-DRB1*0403 and HLA-DQB1*0301 alleles were put in the non-HLA-DRB1*03/HLA-DRB1*04 group.
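The idea behind a case-only test can be illustrated with a simple 2 × 2 table. The Python sketch below is a simplification of the regression described above: it uses carrier status rather than genotype, two HLA groups, no geographical strata, and hypothetical counts, and it assumes that the two loci are independent in the source population. Under that assumption, the odds ratio between carrier status and HLA group among case subjects estimates the departure from a multiplicative joint effect.

```python
import math


def case_only_interaction(carrier_high, noncarrier_high, carrier_low, noncarrier_low, z=1.96):
    """Case-only interaction odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are counts of case subjects only, cross-classified by Trp620
    carrier status and by high- versus low-risk HLA group.
    """
    odds_ratio = (carrier_high * noncarrier_low) / (noncarrier_high * carrier_low)
    std_err = math.sqrt(
        1.0 / carrier_high + 1.0 / noncarrier_high + 1.0 / carrier_low + 1.0 / noncarrier_low
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * std_err)
    upper = math.exp(log_or + z * std_err)
    return odds_ratio, lower, upper


# Hypothetical counts of case subjects, chosen only to illustrate the calculation.
EXAMPLE = case_only_interaction(100, 400, 150, 350)
```

An interaction odds ratio below 1, as in these made-up counts, corresponds to a smaller relative risk for the allele in the high-risk HLA group than in the low-risk group, that is, a joint effect that is less than multiplicative.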

We used a log-linear model to estimate the joint effects of PTPN22 and HLA class II in the logistic regression while maintaining the efficiency of the case-only estimate of the interaction terms (40). We assumed that the HLA and rs2476601 genotypes are conditionally independent in control subjects given geographical region as strata. The data form a four-way contingency table of disease status (D; coded 0, control subject; 1, case subject), rs2476601 genotype (G; coded as a three-level factor, one level for each genotype), HLA group (M; coded as a factor with levels for each group, e.g., two levels for the DR3/4 versus non-DR3/4 grouping), and geographical region (S; coded as a 12-level factor). We fitted the model [equation not reproduced in this text].

Here, E(F) is the expected frequency, using a generalized linear model with Poisson family and log link function (40). b0, b1, b2, b3, b5, and b6 parameterize the independent relationship between HLA group and rs2476601 genotype within control subjects, allowing for strata (geographical region). Specifically, b1 is the log odds of a particular HLA group in control subjects and b5 gives this within strata. b2 is the log odds of having rs2476601-susceptible genotype in control subjects, and b6 gives this within strata. A logistic model is assumed for disease risk such that b8 and b9 assess the HLA and HLA*PTPN22 interaction effects; i.e., b9 gives the log OR of rs2476601 for each HLA group and is reported in Table 4.
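The fitted equation itself is not reproduced in this text. One form that is consistent with the parameter descriptions above is sketched below; it is an assumption and not necessarily the published equation. In particular, the terms with coefficients b4 and b7 (the main effect of disease status and its variation across strata) are not described in the text and are filled in here only so that the numbering runs from b0 to b9.

```latex
\log E(F) = b_0 + b_1 M + b_2 G + b_3 S
          + b_5 (M \cdot S) + b_6 (G \cdot S)
          + b_4 D + b_7 (D \cdot S)
          + b_8 (D \cdot M) + b_9 (D \cdot M \cdot G)
```

In this sketch the absence of an M · G term among the terms not involving D expresses the assumed conditional independence of HLA group and rs2476601 genotype in control subjects, and the last term gives a separate rs2476601 log odds ratio within each HLA group, as described for b9.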

Measures of linkage disequilibrium, D′ and r², were calculated using the Haploview package and were subsequently generated and displayed through T1DBase for Supplementary Fig. 1.

RESULTS


We resequenced all 21 exons of PTPN22 and 3 kb of 5′ and 3′ sequence of the gene using DNA from 32 CEPH individuals (32), identifying 33 SNPs and one indel, 14 of which were novel when compared with genome build 36. Twenty-three SNPs had an MAF >0.05 (Supplementary Table 1). Twenty-four SNPs from dbSNP and 22 SNPs discovered from resequencing were incorporated into a GWA study of nsSNPs using the MIP technology (31,33) for genotyping in up to 2,800 type 1 diabetic case samples and 2,400 control samples (Table 1). Nineteen SNPs had P ≤ 1.86 × 10⁻⁸, with rs2476601/Arg620Trp being the most associated SNP (P = 6.19 × 10⁻³⁷, OR for minor allele Trp620 1.98 [95% CI 1.78–2.21]) (Table 1; Supplementary Fig. 1).

To assess the evidence against rs2476601/Arg620Trp being the sole causal variant in the region, we used a logistic regression analysis to test whether rs2476601/Arg620Trp alone was sufficient to model the association. We assumed no specific mode of inheritance for rs2476601/Arg620Trp and used a 1-d.f. test for adding each SNP to the model. We found that when the number of tests was considered, no SNP convincingly added to rs2476601/Arg620Trp in the logistic regression analysis (minimum [uncorrected] P = 0.0423 for ss73688598; Table 1). We also performed the reverse analysis, adding rs2476601/Arg620Trp to each of the other 46 SNPs; we found that rs2476601/Arg620Trp significantly added to all of the SNPs (Pmax = 1.57 × 10⁻³⁴). This is consistent with rs2476601/Arg620Trp being the sole causal variant in the region.

To assess further the evidence against rs2476601/Arg620Trp being the sole causal variant in the wider PTPN22 region, we selected 111 SNPs from the 400-kb linkage disequilibrium block containing PTPN22 and from 200-kb regions on either side of the linkage disequilibrium block that had been genotyped in 2,000 type 1 diabetic case subjects and 3,000 control subjects as part of the WTCCC GWA study (12). We noted that although the nsSNP rs2476601/Arg620Trp was not included on the GeneChip 500K Mapping Array Set (Affymetrix chip) used for genotyping by the WTCCC GWA study, rs2476601/Arg620Trp was in perfect linkage disequilibrium (r² = 1) with rs6679677, an intergenic SNP between the genes PHTF1 and RSBN1, in the 3,000 British control subjects; this near-perfect linkage disequilibrium has been reported by others (12,17). We genotyped rs6679677 and rs2476601/Arg620Trp in the full case-control collection, with 7,500 case subjects and 7,200 control subjects. We found that we could not distinguish between the effects of the two SNPs on type 1 diabetes (Table 2). We found that only 15 of 14,487 samples had discordant genotypes between the two SNPs, giving an r² of 0.996 in case subjects and 0.994 in control subjects; we ruled out random genotyping error by resequencing these individuals and verifying their genotypes (see research design and methods).
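For readers less familiar with the two linkage disequilibrium measures used here, the following Python sketch computes D′ and r² for a pair of biallelic SNPs from phased haplotype counts. It is a simplified illustration: Haploview estimates haplotype frequencies from unphased genotypes, whereas this function assumes that the four haplotype counts are already known, and the function name and example counts are hypothetical.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r^2) for two biallelic SNPs from the four haplotype counts.

    A/a are the alleles of the first SNP and B/b those of the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0:
        # At least one SNP is monomorphic, so neither measure is defined; report zeros.
        return 0.0, 0.0
    r_squared = d * d / denom
    if d >= 0:
        d_max = min(p_A * (1.0 - p_B), (1.0 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1.0 - p_A) * (1.0 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d_prime, r_squared
```

With made-up counts such as `ld_measures(30, 0, 10, 60)`, D′ equals 1 because one of the four haplotypes is absent, while r² is below 1 because the two allele frequencies differ. An r² close to 1, as reported for rs2476601 and rs6679677, additionally requires that the two SNPs almost always carry matching alleles, which is why their association signals cannot be separated statistically.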

From the WTCCC GWA study, 13 SNPs had P < 4.28 × 10⁻⁸, with rs6679677 being the most associated (P = 1.32 × 10⁻²³, OR for minor allele A 1.88 [95% CI 1.66–2.13]) (Supplementary Table 2; Supplementary Fig. 1). We performed a forward logistic regression analysis on these 110 WTCCC SNPs from the wider PTPN22 region to test whether there was any evidence for a variant associated with type 1 diabetes independently of rs6679677. After correcting the P value of the most associated additional SNP for multiple testing, there was no evidence of any additional independent effects (minimum uncorrected P = 0.00854 for rs735074 located in the flanking linkage disequilibrium block; corrected P = 0.243), including the previously associated MAGI3 SNP rs1343125 (22) (uncorrected P = 0.907), the Japanese-associated SNP rs2488457 (25) (uncorrected P = 0.866), and the rheumatoid arthritis–associated SNP rs3789604 (24) (uncorrected P = 0.401) (Table 1; Supplementary Table 2).

We genotyped an additional five SNPs with TaqMan because they were not covered by either GWA scan: two SNPs (rs1310182 and rs2488458) associated with rheumatoid arthritis from Carlton et al. (24), the rare nsSNP (ss73688585/K750N) associated with type 1 diabetes from Onengut-Gumuscu et al. (23), and two SNPs (rs3789608 and ss73688608) found from our resequencing. Although we found evidence for associations between type 1 diabetes and rs1310182 (P = 9.33 × 10⁻¹³), rs2488458 (P = 3.46 × 10⁻³⁴), and ss73688608 (P = 1.80 × 10⁻⁷), these results were not independent of rs2476601/Arg620Trp (uncorrected P = 0.840, 0.343, and 0.702, respectively). No evidence of an association was found with either rs3789608 or ss73688585/K750N (uncorrected P = 0.0615 and 0.117, respectively).

We also tested the six SNP haplotypes from the Graves’ study by Heward et al. (28) (rs2488458, rs12730735, rs2476601, rs1310182, rs1217413, and rs3811021) in 3,129 case subjects and 3,633 control subjects for which we obtained complete data for all six SNPs. The haplotypes that we identified correlated with the 10 common haplotypes identified in the Graves’ study (28) and also the previous rheumatoid arthritis study (24) (Table 3). We found that the susceptible haplotype (haplotype 2), the only haplotype to carry the T/Trp allele of rs2476601/Arg620Trp, was associated with disease, consistent with the two previous studies (Table 3). We did not, however, obtain any evidence for protective haplotypes in contrast to the Graves’ study, in which haplotype 3 was protective (28), and in the rheumatoid arthritis study, haplotypes 5 and 6 were protective (24) (Table 3). In our study, the Arg620 protective allele-bearing haplotype 3 was slightly increased in frequency in the case subjects, but this is unlikely to be a true result given the multiplicity of tests (OR 1.31 [95% CI 1.00–1.72], P = 0.048). More importantly, we noted that the control frequency for haplotype 3 in the Graves’ study is unexpectedly higher (0.062) than for the control frequency in our study (0.022) or the rheumatoid arthritis study (0.034 and 0.032) (24) and may account for the apparent protective haplotype effect reported previously (28) (Supplementary Table 3). Furthermore, in contrast to the Graves’ study but consistent with the rheumatoid arthritis study, the SNPs used in the haplotype analysis (rs2476601, rs2488458, rs12730735, rs1310182, rs1217413, and rs3811021) were all associated with type 1 diabetes. None of these SNP associations were found to be independent, as expected, of rs2476601/Arg620Trp (Table 3).

Finally, we tested for evidence of a statistical interaction of rs2476601/Arg620Trp with age at type 1 diabetes diagnosis and sex in the case subjects, obtaining no convincing support for interaction in 7,443 case subjects (P = 0.106 and 0.219, respectively). We did, however, find evidence for statistical interaction with the HLA class II genotypes for all three MHC class II groupings considered in up to 2,459 case subjects (Table 4), with a minimum P = 1.36 × 10⁻⁴. Specifically, the relative risk of rs2476601/Trp620 is higher in low-risk HLA genotypes, e.g., non-DR3/4 (RR 2.10), than in the high-risk DR3/4 genotypes (RR 1.59).

DISCUSSION


After extensive resequencing and genotyping of the PTPN22 region, we have found that we are unable to distinguish statistically between two associated SNPs, rs2476601/Trp620 and rs6679677, located 73 kb apart. rs6679677 is located in the intergenic region between PHTF1 and RSBN1 and does not have an obvious potential functional effect, and it does not reside in a known transcription factor binding site, CpG island, or highly conserved sequence block. The TFMATRIX web tool, however, predicts the disruption of a binding site of promoter CCAAT binding factors by this SNP. The nsSNP rs2476601/Arg620Trp changes the amino acid residue 620 from Arg to Trp in the encoded LYP protein, a well-established suppressor of T-cell activation (7,41). Recently, it has been shown that Arg620 is a functional residue in the Pro-rich motif in LYP that binds the SH3 domain of CSK; Trp620 fails to bind CSK (19). Trp620 may, however, lead to a gain of function (more negative regulation of the immune system) such that cells in vitro were reported to be less responsive in individuals with the susceptibility allele rather than the expected overactivation due to loss of PTPN22 function with this minor, disease-associated allotype.

We have provided evidence of statistical interaction between HLA class II genotypes and a non-MHC susceptibility locus in agreement with two previous studies from Hermann et al. (29) and Steck et al. (30). This interaction, which is less than multiplicative, cannot be interpreted biologically (42), and all that we are able to conclude from it is that despite the relative risk of PTPN22 being higher in low-risk HLA genotypes, the combined risk of diabetes for individuals carrying high-risk HLA and PTPN22 genotypes is higher than that for those carrying low-risk HLA genotypes and high-risk PTPN22 genotypes. This finding is also supported by the previous observation that non-MHC loci had smaller relative risks in affected sibpairs (who are “DR3/4” rich) than in single affected sibling families or case/control collections and that epistatic models could explain such observations (2,43). In comparison with PTPN22, we note that for INS, there is no evidence for a statistical interaction with HLA class II genotypes (P ≥ 0.0976) for all three MHC class II groupings considered in the same sample set (not shown).
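The statement about joint effects can be made concrete with the two relative risks reported in the results for the DR3/4 grouping (2.10 in non-DR3/4 and 1.59 in DR3/4 genotypes). In the sketch below, R_H is our own symbol for the relative risk of the high-risk HLA group compared with the low-risk group among subjects without the PTPN22 risk genotype; its value is not given in this text, and all risks are expressed relative to low-risk HLA subjects without the PTPN22 risk genotype.

```latex
\begin{aligned}
\mathrm{RR}(\text{high-risk HLA},\ \text{Trp620}) &= R_H \times 1.59, \\
\mathrm{RR}(\text{low-risk HLA},\ \text{Trp620})  &= 2.10, \\
R_H \times 1.59 > 2.10 \;&\Longleftrightarrow\; R_H > \frac{2.10}{1.59} \approx 1.32 .
\end{aligned}
```

Under these assumptions, the combined risk is greater in the high-risk HLA group whenever the HLA effect alone exceeds a relative risk of about 1.32, even though the PTPN22 relative risk within that group is the smaller of the two.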

Our results support the hypothesis that rs2476601/Arg620Trp is the most likely causal variant for type 1 diabetes and that for this disease, in large sample sizes, there is no evidence for allelic heterogeneity. However, the possibility still remains that an alternative or as yet unidentified variant(s) could have a role in disease susceptibility, as it is not known how many more SNPs are in the region because the entire linkage disequilibrium region has not been resequenced. For example, 73 kb telomeric of rs2476601/Arg620Trp, SNP rs6679677 is in perfect linkage disequilibrium with the PTPN22 functional candidate SNP and genetically and statistically, therefore, is as good a positional candidate as PTPN22 rs2476601/Arg620Trp. Our failure to obtain any evidence for allelic heterogeneity or haplotype effects in type 1 diabetes for this chromosome region suggests that for type 1 diabetes, previous studies (23,25,26) could have been false positives, especially because much smaller sample sizes were studied. We cannot comment on the validity of additional variants in rheumatoid arthritis and Graves’ disease (22,24,28), except that any result should be replicated in additional datasets. Nevertheless, it does appear that the haplotype effect in Graves’ disease (28) may be due to fluctuation in control haplotype frequencies rather than a genuine difference between case and control frequencies.

With the recent availability of next-generation sequencing technologies (44–46), the essential next step in the localization and disease association analysis of all of the variation in this region of chromosome 1p13 will be made more feasible, being less labor-intensive and costly. It will not be surprising if additional variants are found that are in near perfect linkage disequilibrium with rs2476601/Arg620Trp, as is the case for SNP rs6679677, making them genetically plausible candidates for this type 1 diabetes locus. Furthermore, it is possible that an unknown variant exists in the region that is not in complete linkage disequilibrium with rs2476601 and is more strongly associated with type 1 diabetes. Further genetic studies, including populations with different genetic diversity from European populations, combined with targeted functional studies will be needed to confirm unequivocally rs2476601/Arg620Trp as the sole type 1 diabetes locus in the region. Notwithstanding this possibility, our results thus far provide no evidence for allelic heterogeneity and support an etiological role for the rs2476601/Trp620 allele.


Table 1. SNPs from a 140-kb region of chromosome 1p13 containing five genes, including PTPN22, genotyped using molecular inversion probe genotyping technology as part of the nsSNP study in 2,800 type 1 diabetic case subjects and 2,400 control subjects


Table 2. Single-locus test results for rs2476601/Arg620Trp and rs6679677 in the case-control collection


Table 3. Six marker PTPN22 haplotype frequencies in 3,129 type 1 diabetic case subjects and 3,633 control subjects


Table 4. Joint effects of HLA class II genes and the nsSNP rs2476601/Arg620Trp of PTPN22

ACKNOWLEDGMENTS


This work was funded by the Juvenile Diabetes Research Foundation International, the Wellcome Trust, and the National Institute for Health Research, Cambridge Biomedical Centre.

We gratefully acknowledge the participation of all of the patients and control subjects. We acknowledge use of the DNA from the British 1958 Birth Cohort collection, funded by the Medical Research Council and Wellcome Trust. We also thank the Avon Longitudinal Study of Parents and Children laboratory in Bristol and the British 1958 Birth Cohort team, including S. Ring, R. Jones, M. Pembrey, W. McArdle, D. Strachan, and P. Burton, for preparing and providing the control DNA samples. We thank colleagues at Affymetrix for help and advice in genotyping and T. Willis, M. Faham, and P. Hardenbol for the MIP technology. We thank Oliver Burren and Barry Healy for bioinformatics support. DNA samples were prepared by K. Bourget, S. Duley, M. Hardy, S. Hawkins, S. Hood, E. King, T. Mistry, A. Simpson, S. Wood, P. Lauder, S. Clayton, F. Wright, and C. Collins.

This study makes use of data generated by the WTCCC. A full list of the investigators who contributed to the generation of the data is available from


  • Published ahead of print on 27 February 2008. DOI: 10.2337/db07-1131.

  • Additional information for this article can be found in an online appendix at

  • The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

    • Accepted February 23, 2008.
    • Received August 13, 2007.



This Article

  1. Diabetes vol. 57 no. 6 1730-1737