Diabetes 54:333-339, 2005 © 2005 by the American Diabetes Association, Inc.
Identification of Individuals With Insulin Resistance Using Routine Clinical Measurements
1 School of Finance and Applied Statistics, Australian National University, Canberra, Australia
Insulin resistance is a treatable precursor of diabetes and potentially of cardiovascular disease as well. To identify insulin-resistant patients, we developed decision rules from measurements of obesity, fasting glucose, insulin, lipids, and blood pressure and family history in 2,321 (2,138 nondiabetic) individuals studied with the euglycemic insulin clamp technique at 17 European sites; San Antonio, Texas; and the Pima Indian reservation. The distribution of whole-body glucose disposal appeared to be bimodal, with an optimal insulin resistance cutoff of <28 µmol/min · kg lean body mass. Using recursive partitioning, we developed three types of classification tree models: the first, based on clinical measurements and all available laboratory determinations, had an area under the receiver operator characteristic curve (aROC) of 90.0% and generated a simple decision rule: diagnose insulin resistance if any of the following conditions are met: BMI >28.9 kg/m2, homeostasis model assessment of insulin resistance (HOMA-IR) >4.65, or BMI >27.5 kg/m2 and HOMA-IR >3.60. The fasting serum insulin concentrations corresponding to these HOMA-IR cut points were 20.7 and 16.3 µU/ml, respectively. This rule had a sensitivity and specificity of 84.9 and 78.7%, respectively. The second model, which included clinical measurements but no laboratory determinations, had an aROC of 85.0% and generated a decision rule that had a sensitivity and specificity of 78.7 and 79.6%, respectively. The third model, which included clinical measurements and lipid measurements but not insulin (and thus excluded HOMA-IR as well), had a similar aROC (85.1%), sensitivity (81.3%), and specificity (76.3%). Thus, insulin-resistant individuals can be identified using simple decision rules that can be tailored to specific needs.
Address correspondence and reprint requests to Michael P. Stern, MD, Division of Clinical Epidemiology, Department of Medicine, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr., San Antonio, TX 78229-3900. E-mail: stern{at}uthscsa.edu
Abbreviations: aROC, area under the receiver operator characteristic curve; EGIR, European Group for the Study of Insulin Resistance; HOMA-IR, homeostasis model assessment of insulin resistance.
There is abundant evidence that insulin resistance is a precursor of type 2 diabetes (1,2) and perhaps of cardiovascular disease as well (3–5). The latter association, which is independent of diabetes, may be partially a consequence of the relationship between insulin resistance and the "metabolic syndrome," which consists of obesity, particularly abdominal obesity; impaired glucose regulation; dyslipidemia of the high-triglyceride/low–HDL cholesterol type; and hypertension (4,6).
A number of techniques are available for making definitive measurements of insulin resistance, including the hyperinsulinemic-euglycemic clamp technique (7), the frequently sampled intravenous glucose tolerance test (8), and the insulin suppression test (9,10). These techniques, however, are complicated, cumbersome, and, in general, not suitable for large-scale population studies or routine clinical work. For that reason a wide variety of indexes based on simpler, clinical measurements have been proposed for assessing insulin resistance. We recently reviewed a number of these indexes (11). Most have been validated with either the euglycemic clamp or the frequently sampled intravenous glucose tolerance test, but the populations in which these validations have been carried out typically have been small to moderate, ranging from <50 to
Indexes of insulin resistance have acquired new salience with the development of various pharmaceutical agents, specifically metformin and the thiazolidinediones, that sensitize the body to the action of endogenous insulin. Although initially developed for the treatment of diabetes, these agents also have a potential role in reducing the risk of diabetes and perhaps also of cardiovascular disease in insulin-resistant nondiabetic individuals. Moreover, the potential public health impact of such treatment could be large because it has been estimated that in developed countries as many as 25% of the nondiabetic population are as insulin resistant as patients with type 2 diabetes (3). Clinical trials would of course be needed to document the benefits of treating insulin-resistant nondiabetic individuals with insulin-sensitizing agents. Efforts to document the benefits of such treatment, however, have been hampered by the lack of an accepted method for assessing insulin resistance based on routine clinical measurements.
Although a clinical trial could conceivably be performed based on enrolling insulin-resistant patients as defined by one of the definitive tests, translation of the results of such a trial into ordinary clinical practice would be problematic, given the lack of a clinical test for identifying the target population for treatment. In the current study we have assembled what we believe to be the largest collection of euglycemic clamp data in the world from numerous research centers, and we have used recursively partitioned classification trees to develop decision rules for identifying insulin-resistant individuals based on routinely available clinical measurements.
The results of 2,321 (2,138 nondiabetic) euglycemic insulin clamp studies were assembled from several sources, including the European Group for the Study of Insulin Resistance (EGIR) project (n = 1,436), the Pima Indian Study (n = 597), and studies performed in San Antonio (n = 288, of whom 99 were Mexican American). The EGIR studies were performed on Caucasians from 17 European sites (Athens, Greece; Baden, Heidelberg, Kreisha, and Munich, Germany; Belgrade, Serbia; Geneva, Switzerland; Goteborg, Sweden; Helsinki and Kuopio, Finland; Odense, Denmark; and Naples, Padova, Pisa, Rome, Torino, and Verona, Italy). With the exception of the Pima sample, which was population-based, all of the other samples were recruited from clinic populations. The recruitment criteria and procedures and the euglycemic clamp protocols, which were similar at all sites, have all been described in original publications from these studies (14–16). In particular the same insulin infusion rate (40 mU/min · m2, equivalent to 1 mU/min · kg body wt) was used in all studies. All procedures were approved by the institutional review boards of the institutions contributing data to this study, and all participants gave informed consent to the procedures. The response variable was insulin-stimulated whole-body glucose disposal (µmol/min · kg lean body mass). Predictor variables included sex, weight (kg), BMI (kg/m2), lean body mass (kg), waist and hip circumferences (cm), fasting glucose (mmol/l), fasting insulin (pmol/l), total cholesterol (mmol/l), LDL and HDL cholesterol (mmol/l), free fatty acids (µmol/l), triglycerides (mmol/l), systolic and diastolic blood pressure (mmHg), and family history of a first-degree relative with diabetes. We also evaluated certain combined variables, namely, the triglyceride-to-HDL ratio and the HOMA-IR (fasting insulin x fasting glucose/22.5, with fasting insulin expressed in µU/ml and fasting glucose expressed in mmol/l) (13).
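The HOMA-IR definition above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the conversion factor of 6.0 pmol/l per µU/ml is an assumption inferred from the paired insulin values quoted in the Results (for example, 124.5 pmol/l alongside 20.7 µU/ml), not a value stated by the authors.

```python
# Assumed conversion: 1 µU/ml of insulin is taken as roughly 6.0 pmol/l.
# This factor is inferred from the paired values quoted in the text and
# is an assumption, not something the paper states explicitly.
PMOL_L_PER_MICRO_U_ML = 6.0


def insulin_pmol_to_micro_units(insulin_pmol_l: float) -> float:
    """Convert fasting insulin from pmol/l to µU/ml (assumed factor)."""
    return insulin_pmol_l / PMOL_L_PER_MICRO_U_ML


def homa_ir(insulin_micro_u_ml: float, glucose_mmol_l: float) -> float:
    """HOMA-IR = fasting insulin (µU/ml) x fasting glucose (mmol/l) / 22.5."""
    return insulin_micro_u_ml * glucose_mmol_l / 22.5


if __name__ == "__main__":
    # Hypothetical subject: insulin 120 pmol/l, glucose 5.0 mmol/l.
    insulin = insulin_pmol_to_micro_units(120.0)
    print(round(homa_ir(insulin, 5.0), 2))
```

Because fasting insulin was recorded in pmol/l but the HOMA-IR formula expects µU/ml, the unit conversion has to happen before the product is taken.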
Statistical methods
Classification trees. Subjects for whom certain covariate data were missing were retained in the analyses using the method of "surrogate splits" (21,22). This method assigns a secondary, "surrogate" covariate to each split in the tree model, allowing classification of individuals with missing values for the primary covariate to be made on the basis of the associated surrogate covariate. The choice of surrogate covariates is made by identifying the covariate split that most closely matches the actual split among those individuals for whom both the actual and the surrogate covariates are available. There were no missing values for BMI. For the other covariates that figured in the ultimate decision rules, namely HOMA-IR, family history, and triglycerides, the percentages of missing values were 6.1, 28.1, and 34.8%, respectively.
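The idea of choosing a surrogate split can be illustrated with a simplified sketch. It searches for the cut point on a second covariate that best reproduces the primary split among subjects with both values observed. The function name, the data, and the use of a plain agreement fraction are assumptions made for illustration; the software used in the study may rank surrogates differently.

```python
from typing import List, Optional, Tuple


def best_surrogate_split(
    primary: List[Optional[float]],
    threshold: float,
    surrogate: List[Optional[float]],
) -> Tuple[float, bool, float]:
    """Return (cut, same_direction, agreement) for the surrogate covariate.

    `same_direction` is True when "surrogate > cut" mimics
    "primary > threshold", and False when the directions are reversed.
    """
    # Use only subjects for whom both covariates are observed.
    pairs = [
        (p > threshold, s)
        for p, s in zip(primary, surrogate)
        if p is not None and s is not None
    ]
    values = sorted({s for _, s in pairs})
    if len(values) < 2:
        raise ValueError("need at least two distinct surrogate values")

    # Candidate cut points: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    best = (candidates[0], True, -1.0)
    for cut in candidates:
        same = sum((s > cut) == went_right for went_right, s in pairs)
        for same_direction, agree in ((True, same), (False, len(pairs) - same)):
            fraction = agree / len(pairs)
            if fraction > best[2]:
                best = (cut, same_direction, fraction)
    return best


if __name__ == "__main__":
    # Hypothetical data: BMI is the primary covariate, waist the surrogate.
    bmi = [22.0, 25.0, 30.0, 32.0, None, 35.0]
    waist = [75.0, 85.0, 100.0, 105.0, 110.0, None]
    print(best_surrogate_split(bmi, 28.9, waist))
```

A subject whose primary covariate is missing would then be routed by comparing the surrogate value with the returned cut point.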
Covariate selection. In addition to the tree model incorporating all available covariates, two additional tree models were fit based on predictor subsets chosen to reflect various practical considerations, such as the ease of obtaining and the degree of standardization of the covariate measurements. The first of these two additional models was based on routine clinical measurements, excluding any that required obtaining a blood specimen. The second model was fit using these same clinical measurements, but it also incorporated the lipid measurements, though not the insulin measurement.
Development of decision rules.
Optimal cutoff for determination of insulin resistance. Figure 1 presents histograms of the insulin clamp measurements for all 2,321 subjects and for the 2,138 nondiabetic subjects. The histogram excluding diabetic subjects is scaled to an area of 2,138/2,321, the observed prevalence of nondiabetic subjects in the data, allowing more direct comparison with the histogram including diabetic subjects. Both histograms are compatible with bimodality, and it appears that the lower mode, associated with diabetic subjects in the full-data histogram, is mirrored in the histogram with the diabetic subjects removed. The estimated bimodal normal mixture density is overlaid on the plot. A likelihood ratio test comparing the bimodal mixture model to a single distribution model confirmed that the former was preferred (P < 0.0001). The maximum likelihood estimates for the model parameters are shown in Table 1. Also shown in Table 1 for comparison are the mean and standard deviation for the clamp values in the diabetic subjects. Because Pima Indians constituted a sizeable proportion of our dataset (597/2,138 = 27.9%), and because Pima Indians are well known to be very insulin resistant, we checked for bimodality after excluding this group. Again, a bimodal mixture model fit the data significantly better than a single distribution model (P < 0.0001).
In view of the evidence for bimodality in the distribution of clamp measurements, we used this information to assist us in picking a cut point to define insulin resistance, rather than select a purely arbitrary cut point. The "theoretical" prevalence of insulin resistance in the nondiabetic population, as estimated from the bimodal mixture model, was 23.1%, which is in line with previously published estimates (3). The optimal cut point for identifying nondiabetic insulin-resistant individuals, based on maximizing the sum of theoretical sensitivity and specificity, as determined from the fitted bimodal normal mixture distribution, was 28 µmol/min · kg lean body mass. This cut point gave an estimated sensitivity and specificity of 97.3 and 85.6%, respectively, relative to the theoretical prevalence from the bimodal mixture model, and 32.7% of the subjects fell below the cut point, i.e., were considered to be insulin resistant. Using this same cut point, it was found that 92.9% of diabetic subjects were insulin resistant. Thus, the cut point based on the bimodality analysis closely approximates a cut point based on the 95th percentile of the distribution of glucose disposal rates in diabetic subjects.
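The cut point selection described above, maximizing the sum of theoretical sensitivity and specificity under a two-component normal mixture, can be sketched as a simple grid search. The component means and standard deviations used below are hypothetical placeholders, since the Table 1 estimates are not reproduced here, and the function names are illustrative.

```python
import math
from typing import Tuple


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def optimal_cutoff(
    mean_resistant: float,
    sd_resistant: float,
    mean_sensitive: float,
    sd_sensitive: float,
    lo: float = 0.0,
    hi: float = 100.0,
    step: float = 0.1,
) -> Tuple[float, float, float]:
    """Grid-search the cut point maximizing sensitivity + specificity.

    Subjects below the cut point are called insulin resistant, so
    sensitivity is the mass of the lower (resistant) component below the
    cut and specificity is the mass of the upper component at or above it.
    """
    best = (lo, 0.0, 0.0)
    best_sum = -1.0
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        cut = lo + i * step
        sensitivity = normal_cdf(cut, mean_resistant, sd_resistant)
        specificity = 1.0 - normal_cdf(cut, mean_sensitive, sd_sensitive)
        if sensitivity + specificity > best_sum:
            best_sum = sensitivity + specificity
            best = (cut, sensitivity, specificity)
    return best


if __name__ == "__main__":
    # Hypothetical mixture components (µmol/min · kg lean body mass).
    print(optimal_cutoff(20.0, 5.0, 40.0, 5.0))
```

Note that this criterion does not depend on the mixing proportion, only on the two component distributions; the mixing proportion enters when the theoretical prevalence is computed.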
Tree models for predicting insulin resistance using all predictors.
If we declare nodes in which the proportion of insulin-resistant individuals is ≥0.25, i.e., nodes 4–8 in Fig. 2, to be test positive, then the associated decision rule takes the simple form of predicting an individual to be insulin resistant if any of the following conditions are met: 1) HOMA-IR >4.65, 2) BMI >28.9 kg/m2, or 3) HOMA-IR >3.60 and BMI >27.5 kg/m2. The insulin concentrations corresponding to the HOMA-IR cutoffs of 4.65 and 3.60 are 124.5 pmol/l (20.7 µU/ml) and 97.9 pmol/l (16.3 µU/ml), respectively. The insulin concentrations corresponding to the other HOMA-IR cutoffs are given in the legend to Fig. 2. The above prediction rule has an estimated sensitivity and specificity of 84.9 and 78.7%, respectively, obtained by summing the insulin-resistant and the non–insulin-resistant individuals in the nodes declared to be test positive (nodes 4–8, true positives and false positives) and similarly obtaining the true negatives and the false negatives from the remaining nodes (nodes 1–3). Other choices for the predictive cutoff value (i.e., other than 0.25) will lead to different prediction rules and different associated sensitivities and specificities. The results of the random 10-fold cross validation showed a strong degree of consistency in the splitting choices for the 10 development subsets and also satisfactory external validity (as judged by the sensitivities, specificities, and aROCs at the 0.25 prediction cutoff) in the 10 validation subsets (online appendix [available at http://diabetes.diabetesjournals.org]).
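The decision rule above translates directly into code. The sketch below uses the thresholds quoted in the text; the function names are hypothetical, and the helper that tallies sensitivity and specificity is a generic illustration rather than the authors' node-by-node computation.

```python
from typing import Sequence, Tuple


def insulin_resistant_full_model(bmi: float, homa_ir: float) -> bool:
    """Rule from the tree using all predictors (thresholds from the text)."""
    return (
        homa_ir > 4.65
        or bmi > 28.9
        or (homa_ir > 3.60 and bmi > 27.5)
    )


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity and specificity of predictions against a reference."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical subject: BMI 28.0 kg/m2 and HOMA-IR 4.0 meets condition 3.
    print(insulin_resistant_full_model(bmi=28.0, homa_ir=4.0))
```

Here the reference classification would be the clamp-based definition of insulin resistance.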
Models using clinical variables not requiring blood specimens.
Models using clinical variables plus lipid measurements. Figure 4 shows the fitted tree based on the clinical predictors plus lipid measurements. The aROC for this tree was 85.1%, virtually identical to the tree that did not include the lipid variables (aROC of 85.0%). At the 0.25 prediction cutoff level (nodes 3 and 5–8), the decision rule for insulin resistance is to predict an individual to be insulin resistant if any of the following conditions are met: 1) BMI >28.7 kg/m2, 2) BMI >27.0 kg/m2 and family history of diabetes is positive, or 3) family history of diabetes is negative, but triglycerides >2.44 mmol/l. This decision rule has an estimated sensitivity and specificity of 81.3 and 76.3%, respectively.
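The rule for the clinical-plus-lipid model can be written the same way. This is a minimal sketch that implements the three conditions exactly as printed and assumes all three inputs are available; it does not reproduce the surrogate-split handling of missing values, and the function name is hypothetical.

```python
def insulin_resistant_lipid_model(
    bmi: float, family_history: bool, triglycerides_mmol_l: float
) -> bool:
    """Rule from the clinical-plus-lipid tree (thresholds from the text)."""
    if bmi > 28.7:
        return True  # condition 1
    if bmi > 27.0 and family_history:
        return True  # condition 2
    if not family_history and triglycerides_mmol_l > 2.44:
        return True  # condition 3
    return False


if __name__ == "__main__":
    # Hypothetical subject: BMI 27.5 kg/m2 with a positive family history.
    print(insulin_resistant_lipid_model(27.5, True, 1.2))
```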
The current results suggest that the distribution of whole-body glucose disposal rates is bimodal. The presence of bimodality facilitates the choice of a cut point for defining insulin resistance that has some basis in the underlying biology and is not wholly arbitrary. It must be acknowledged, however, that unspecified population differences could have contributed to the appearance of bimodality in the glucose disposal rate distribution. In particular, although the protocols for the euglycemic clamp studies were nominally similar at all sites, methodological differences between the sites could have affected the distribution. Thus, if glucose disposal rates had been systematically underestimated relative to some "true" value in highly insulin-resistant individuals (e.g., Pima Indians) and/or overestimated in more insulin-sensitive individuals (e.g., Caucasians) relative to the same gold standard, such a methodological bias would have tended to drive the Caucasian and the Pima means apart and perhaps have generated the appearance of bimodality. Although there is no evidence that such a bias was operating, we cannot definitively exclude it. We are reassured, however, that the cut point we have chosen based on the bimodality analysis is reasonable because it corresponds to a cut point generated by an independent method. Specifically, the bimodality-driven cut point closely approximates the 95th percentile of the distribution of glucose disposal rates observed in diabetic subjects. Thus, the normal glucose-tolerant individuals who were defined as being insulin resistant had glucose disposal rates corresponding to the bottom 92.9% of clamp values obtained in the individuals with diabetes, a condition known to be associated with moderate to severe insulin resistance (23). 
Interestingly, a trimodal distribution of maximal insulin-stimulated glucose uptake rates has been reported in Pima Indians and has been interpreted as evidence for a single gene effect with a codominant mode of inheritance (24). Unfortunately, these results cannot be directly compared with the current results because the earlier Pima analyses were based on maximal insulin-stimulated glucose uptake that was achieved using a 10-fold higher insulin infusion rate than that used in the studies included in the current analyses.
Using recursively partitioned classification trees, we have developed simple decision rules for identifying individuals deemed insulin resistant by the euglycemic insulin clamp technique. These decision rules are based on routine clinical measurements and appear to have acceptable sensitivity and specificity. Also, by performing a random 10-fold cross validation and by focusing on physiological and biochemical variables, rather than study-specific variables such as ethnicity, we hoped to enhance the ultimate generalizability of the decision rules. Nevertheless, it must be acknowledged that certain populations, such as Asians, may be more insulin resistant for a given BMI than Caucasians (25,26). Thus, the performance of our decision rules may be suboptimal in these populations.
The most accurate decision rule is based on HOMA-IR (which requires a measurement of fasting insulin concentration) and BMI. However, only slightly less accurate rules can be derived that do not require insulin measurements or that do not require obtaining a blood specimen at all. In view of the lack of standardization of insulin assays, these latter rules may be preferred.
The sensitivities and specificities of the decision rules we have presented flow from our decision to declare nodes to be test positive if the proportion of insulin-resistant individuals in them was
Highly specific decision rules, such as those just discussed, might be useful for a clinical trial where one wished to be highly certain that the enrolled participant actually had the condition in question and where sensitivity was a lesser concern. Of course, such a strategy might mean that when the results of the clinical trial, assuming a positive outcome, were translated into clinical practice, the public health impact might be compromised because, owing to the reduced sensitivity of the entry criteria, many who might have benefited from the treatment would not have been deemed eligible for the trial.
Classification trees are a nonparametric alternative to classical statistical discrimination techniques, such as logistic regression. Their advantages include the lack of a required a priori choice of model structure for the predictor scales and interactions, the ease of incorporation of observations with missing covariate values, and the simplicity and interpretability of the resultant prediction rules. Because the partitioning scheme is recursive, each successive split of the predictor space is conditional on all previous splits. Thus, interactions or nonlinear structures in the relationship between the predictors and the response variable can be captured automatically. For example, if a split occurs based on a particular cut point for BMI, it may be that the subsequent split for those above the BMI cut point might be based on HOMA-IR, whereas the subsequent split for those below the BMI cut point might be based on triglycerides.
Such a three-way interaction term would almost never be detected by traditional regression techniques such as multiple logistic regression analysis, where it would rarely, if ever, even be sought in the absence of a powerful prior hypothesis. In addition, because the splits are simple bifurcations along predictor axes, the process is invariant to monotonic rescalings of the predictors. Thus, whether a predictor or, say, its logarithm is used has no effect on the resultant analysis. In the current instance, multiple logistic regression analyses were also performed and gave results, in terms of sensitivities, specificities, and aROCs, that were generally similar to those obtained with the tree-based models (27). Application of logistic regression models, however, requires computing scores that are typically less readily interpretable than decision rules based on tree models. Moreover, the structure of a decision tree often leads to insights into the data that are not as easily gleaned from classic parametric analyses without a more detailed mathematical understanding of their model structure. In addition, decision trees permit some individuals to be classified on the basis of only one, or at most a few, measurements, whereas scores derived from multiple logistic regression models require that all covariates be available. Tree-based models typically make use of a greater percentage of the available data. Logistic regression models, on the other hand, are limited to individuals for whom none of the covariates are missing, unless one imputes values for missing covariates. This practice is often discouraged, however. A disadvantage of imputed values is that they are not "real" data, whereas surrogate splits, used to maximize the amount of the data being utilized, are based on genuine observations.
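The invariance of tree splits to monotonic rescaling, mentioned above, can be checked with a small sketch. It finds the best single split on one predictor using Gini impurity (an assumed splitting criterion chosen for illustration; the study's software may use a different one) and confirms that the same subjects fall on the left branch whether the predictor or its logarithm is used. The data are hypothetical.

```python
import math
from typing import FrozenSet, List


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_left_branch(x: List[float], y: List[int]) -> FrozenSet[int]:
    """Indices sent to the left branch by the best single split on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_left: FrozenSet[int] = frozenset()
    best_score = float("inf")
    for k in range(1, len(order)):
        # A split is only possible between distinct predictor values.
        if x[order[k - 1]] == x[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (
            len(left) * gini([y[i] for i in left])
            + len(right) * gini([y[i] for i in right])
        ) / len(x)
        if score < best_score:
            best_score = score
            best_left = frozenset(left)
    return best_left


if __name__ == "__main__":
    # Hypothetical predictor values and insulin-resistance labels.
    x = [1.2, 2.5, 3.1, 4.8, 6.0, 9.5]
    y = [0, 0, 0, 1, 1, 1]
    raw = best_split_left_branch(x, y)
    logged = best_split_left_branch([math.log(v) for v in x], y)
    print(raw == logged)
```

The cut point itself changes under the transformation, but the partition of subjects, and therefore the fitted tree, does not, because only the ordering of the predictor values matters.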
Recently, McLaughlin et al. (28) reported that among 258 overweight individuals, fasting insulin, triglycerides, and the triglyceride-to-HDL ratio were the best predictors of insulin resistance, as defined by the insulin suppression test. The sensitivities of their cutoff points ranged from 57 to 67% and the specificities from 68 to 85%. These results are not necessarily incompatible with our results because they pertain specifically to overweight individuals (
In conclusion, we have shown that it is possible to identify individuals who are insulin resistant using routine clinical measures, thereby improving the likelihood that recognition of this important harbinger of serious diseases (i.e., diabetes and cardiovascular disease) will be incorporated into clinical trials and ordinary clinical practice.
This work was supported by a grant from the American Diabetes Association and Takeda Pharmaceuticals.
Additional information for this article can be found in an online appendix at http://diabetes.diabetesjournals.org.
Received for publication May 4, 2004 and accepted in revised form October 27, 2004.