# Assessing the Predictive Accuracy of QUICKI as a Surrogate Index for Insulin Sensitivity Using a Calibration Model

## Abstract

The quantitative insulin-sensitivity check index (QUICKI) has an excellent linear correlation with the glucose clamp index of insulin sensitivity (SI_{Clamp}) that is better than that of many other surrogate indexes. However, correlation between a surrogate and reference standard may improve as variability between subjects in a cohort increases (i.e., with an increased range of values). Correlation may be excellent even when prediction of reference values by the surrogate is poor. Thus, it is important to evaluate the ability of QUICKI to accurately predict insulin sensitivity as determined by the reference glucose clamp method. In the present study, we used a calibration model to compare the ability of QUICKI and other simple surrogates to predict SI_{Clamp}. Predictive accuracy was evaluated by both root mean squared error of prediction as well as a more robust leave-one-out cross-validation–type root mean squared error of prediction (CVPE). Based on data from 116 glucose clamps obtained from nonobese, obese, type 2 diabetic, and hypertensive subjects, we found that QUICKI and log (homeostasis model assessment [HOMA]) were both excellent at predicting SI_{Clamp} (CVPE = 1.45 and 1.51, respectively) and significantly better than HOMA, 1/HOMA, and fasting insulin (CVPE = 3.17, *P* < 0.001; 1.67, *P* < 0.02; and 2.85, *P* < 0.001, respectively). QUICKI and log(HOMA) also had the narrowest distribution of residuals (measured SI_{Clamp} − predicted SI_{Clamp}). In a subset of subjects (*n* = 78) who also underwent a frequently sampled intravenous glucose tolerance test with minimal model analysis, QUICKI was significantly better than the minimal model index of insulin sensitivity (SI_{MM}) at predicting SI_{Clamp} (CVPE = 1.54 vs. 1.98, *P* = 0.001). We conclude that QUICKI and log(HOMA) are among the most accurate surrogate indexes for determining insulin sensitivity in humans.

- CVPE, cross-validation–type root mean squared error of prediction
- FSIVGTT, frequently sampled intravenous glucose tolerance test
- HOMA, homeostasis model assessment
- QUICKI, quantitative insulin-sensitivity check index
- RMSE, square root of the mean squared error of prediction

Insulin resistance contributes significantly to the pathophysiology of type 2 diabetes and is a hallmark of obesity, dyslipidemias, hypertension, and other components of the metabolic syndrome (rev. in 1,2). Some therapies for these conditions, including thiazolidinediones, ACE inhibitors, statins, weight reduction, and exercise, significantly improve insulin sensitivity (3–8). Thus, an accurate method for easily evaluating insulin sensitivity and following changes after therapeutic intervention is needed for epidemiological studies, clinical investigations, and clinical practice. The hyperinsulinemic-euglycemic glucose clamp is the reference method for quantifying insulin sensitivity in humans because it directly measures effects of insulin to promote glucose utilization under steady-state conditions in vivo (9). However, the glucose clamp is a complicated, labor-intensive procedure that is best suited for small research studies and is difficult to apply in either large-scale investigations or clinical practice. Therefore, a number of surrogate indexes for insulin sensitivity or insulin resistance have been developed. The simplest indexes are derived from fasting glucose and/or insulin levels and include fasting insulin and homeostasis model assessment (HOMA) (10–12). There are also several insulin sensitivity indexes based on oral glucose tolerance tests that require slightly more effort (13–15). In addition, some surrogate indexes are based on glucose and insulin infusion protocols of various complexity (16). This includes the frequently sampled intravenous glucose tolerance test (FSIVGTT) with minimal model analysis that requires nearly as much effort as the glucose clamp (17,18).

Recently, we developed the quantitative insulin-sensitivity check index (QUICKI) that is determined by a mathematical transformation of fasting glucose and fasting insulin levels (19). In our initial validation studies in nonobese, obese, diabetic, and hypertensive subjects, QUICKI had a significantly better linear correlation with the reference glucose clamp method (SI_{Clamp}) than other surrogates including HOMA and the minimal model insulin sensitivity index SI_{MM} (19–21). In addition, test characteristics of QUICKI including coefficient of variation and discriminant ratio are significantly better than other simple surrogate indexes and comparable with those of the glucose clamp (20). In a number of relevant clinical conditions including type 2 diabetes, gestational diabetes, hypertension, polycystic ovary syndrome, and liver disease, QUICKI can appropriately follow changes in insulin sensitivity after various therapeutic interventions when compared directly with glucose clamp results (8,20–23). Moreover, a large meta-analysis of insulin-resistant subjects demonstrated that QUICKI is among the best surrogate indexes in terms of predictive power for the onset of diabetes (11). In that study, QUICKI was the best fasting index for predicting the onset of diabetes, although other indexes based on glucose tolerance tests were slightly superior in this regard (11).

To date, the best direct validation studies of simple surrogate indexes of insulin sensitivity, including QUICKI, were based on examining correlations with the reference glucose clamp method (19–21,24–27). However, if variability in insulin sensitivity between subjects is large in a given cohort, the linear correlation between surrogate index and “gold standard” may be excellent even when prediction of true values by the surrogate is poor. Therefore, an important component of validation for a surrogate index is evaluation of its predictive accuracy. In the present study, we examined the ability of QUICKI and other surrogates to predict SI_{Clamp} by regressing SI_{Clamp} on each surrogate index in a large group of subjects and fitting these data to a calibration model.

## RESEARCH DESIGN AND METHODS

We used data from 110 subjects ranging in age from 19 to 64 years old who underwent a hyperinsulinemic-isoglycemic glucose clamp at the National Institutes of Health Clinical Center. Six subjects underwent two separate glucose clamps performed at least 10 months apart. Thus, the total number of glucose clamp studies used in our analysis was 116. We also used data from a subset of 78 subjects who underwent an insulin-modified FSIVGTT (28) in addition to the glucose clamp. Data from some of these subjects have been previously reported (19,21). Among the 110 subjects, there were 57 Caucasians, 40 African Americans, 4 Hispanics, and 9 Asians. Nonobese subjects had a BMI < 30 kg/m^{2}, whereas subjects with BMI ≥ 30 were considered obese. Diabetic subjects met the American Diabetes Association criteria for type 2 diabetes (29). Diabetic and hypertensive subjects were studied after their antidiabetes and antihypertensive medication was discontinued for at least 1 week. Diabetic patients whose fasting blood glucose exceeded 300 mg/dl when not taking medication were given medication again and excluded from further study. Hypertensive subjects whose blood pressure exceeded 170/109 mmHg when not taking their medication were given the medication again and excluded from further study. Subjects with thyroid, liver, kidney, or pulmonary disease as well as end-organ damage were excluded from this study. Subjects with a positive pregnancy test were also excluded. Informed consent was obtained from each subject. All clinical studies were approved by the Institutional Review Board of the National Heart, Lung and Blood Institute, and the procedures followed were in accordance with institutional guidelines.

### Hyperinsulinemic-isoglycemic glucose clamp.

The clamp studies were conducted as previously described (19). All studies were performed in the Clinical Center at the National Institutes of Health beginning at ∼8:30 a.m. after an overnight fast of at least 10 h. An insulin solution (Humulin; Eli Lilly) was infused at a rate of 120 mU · m^{−2} · min^{−1} for 4 h using a calibrated syringe pump (model A-99; Razel Industries, Stamford, CT). A solution of potassium phosphate was also infused at the same time (0.23 mEq · kg^{−1} · h^{−1}) to prevent hypokalemia. Blood glucose concentrations were measured at the bedside every 5–10 min using a glucose analyzer (YSI 2700 Select; YSI, Yellow Springs, OH), and an infusion of 20% dextrose was adjusted to maintain the blood glucose concentration at the fasting level. Blood samples were also collected every 20–30 min for measuring plasma insulin concentrations (DPC Immulite 2000; Diagnostic Products, Los Angeles, CA). The steady-state period of the clamp was defined as a ≥60-min period (1–2 h after the beginning of the insulin infusion) where the coefficient of variation for blood glucose, plasma insulin, and glucose infusion rate was <5%. Mean values during the steady-state period were used to calculate SI_{Clamp}. The glucose clamp–derived index of insulin sensitivity (SI_{Clamp}) was defined as M/(G × ΔI) corrected for body weight, where M is the steady-state glucose infusion rate (milligrams per minute), G is the steady-state blood glucose concentration (milligrams per deciliter), and ΔI is the difference between basal and steady-state plasma insulin concentrations (microunits per milliliter).
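
As a rough illustration of the definition SI_{Clamp} = M/(G × ΔI), the following Python sketch computes the index from steady-state means. The function name, argument names, and example numbers are hypothetical. The body-weight correction is assumed here to be a simple division of M by weight in kilograms, which the text does not spell out, and no unit scaling factor is applied.

```python
def si_clamp(m_mg_per_min, g_mg_per_dl, delta_i_uu_per_ml, weight_kg):
    """Return M / (G x delta-I), with M divided by body weight.

    Assumption: "corrected for body weight" means M is expressed per kg.
    """
    if min(g_mg_per_dl, delta_i_uu_per_ml, weight_kg) <= 0:
        raise ValueError("G, delta-I, and weight must be positive")
    m_per_kg = m_mg_per_min / weight_kg  # mg per kg per min
    return m_per_kg / (g_mg_per_dl * delta_i_uu_per_ml)


# Hypothetical steady-state values, not taken from the study data.
example = si_clamp(m_mg_per_min=700.0, g_mg_per_dl=100.0,
                   delta_i_uu_per_ml=100.0, weight_kg=70.0)
print(example)
```

The raw quotient is small; any rescaling to a more convenient range would depend on the units adopted for reporting.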

### FSIVGTT and minimal model analysis.

The studies of insulin-modified FSIVGTT were carried out in the Clinical Center at the National Institutes of Health beginning at ∼8:30 a.m. after an overnight fast as previously described (19). Briefly, a bolus of glucose (0.3 g/kg) was infused intravenously over 2 min. Twenty minutes after initiation of the glucose bolus, an intravenous infusion of insulin (4 mU · kg^{−1} · min^{−1} regular Humulin) was given for 5 min. Blood samples were collected for blood glucose and plasma insulin determinations. A total of 30 blood samples were drawn over 3 h as previously described (19). Data were subjected to minimal model analysis using the computer program MINMOD (generous gift from R.N. Bergman) to calculate the minimal model index of insulin sensitivity (SI_{MM}) (17).

### QUICKI.

QUICKI was calculated as previously defined from fasting glucose and insulin values (19). QUICKI = 1/[log(I_{0}) + log(G_{0})], where I_{0} is fasting insulin (microunits per milliliter) and G_{0} is fasting glucose (milligrams per deciliter). Because QUICKI is the reciprocal of the log-transformed product of fasting glucose and insulin, it is a dimensionless index without units.
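
The QUICKI formula can be sketched in a few lines of Python. The function and argument names are illustrative, and the logarithm is assumed to be base 10, which the text above does not state explicitly.

```python
import math


def quicki(fasting_insulin_uu_per_ml, fasting_glucose_mg_per_dl):
    """QUICKI = 1 / [log(I0) + log(G0)], assuming base-10 logarithms."""
    if fasting_insulin_uu_per_ml <= 0 or fasting_glucose_mg_per_dl <= 0:
        raise ValueError("fasting insulin and glucose must be positive")
    denominator = (math.log10(fasting_insulin_uu_per_ml)
                   + math.log10(fasting_glucose_mg_per_dl))
    if denominator == 0:
        raise ValueError("log(I0) + log(G0) is zero; QUICKI is undefined")
    return 1.0 / denominator


# Hypothetical fasting values chosen so the arithmetic is easy to follow:
# log10(10) + log10(100) = 1 + 2 = 3, so QUICKI is one third.
print(quicki(10.0, 100.0))
```

Because the index is a reciprocal of a sum of logarithms, a higher fasting insulin or glucose lowers QUICKI, consistent with lower insulin sensitivity.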

### HOMA.

HOMA was calculated as G_{0} (mmol/l) × I_{0} (μU/ml)/22.5 (10).
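
A minimal Python sketch of HOMA and the two transformations compared in this study, log(HOMA) and 1/HOMA, follows. The names are illustrative, and the logarithm base is an assumption (base 10); the choice of base only rescales log(HOMA) and so does not affect a linear calibration fit.

```python
import math


def homa(fasting_glucose_mmol_per_l, fasting_insulin_uu_per_ml):
    """HOMA = G0 (mmol/l) x I0 (uU/ml) / 22.5."""
    if fasting_glucose_mmol_per_l <= 0 or fasting_insulin_uu_per_ml <= 0:
        raise ValueError("fasting glucose and insulin must be positive")
    return fasting_glucose_mmol_per_l * fasting_insulin_uu_per_ml / 22.5


def homa_transforms(fasting_glucose_mmol_per_l, fasting_insulin_uu_per_ml):
    """Return HOMA together with log(HOMA) and 1/HOMA."""
    value = homa(fasting_glucose_mmol_per_l, fasting_insulin_uu_per_ml)
    return {
        "HOMA": value,
        "log(HOMA)": math.log10(value),  # base 10 is an assumption
        "1/HOMA": 1.0 / value,
    }


# Hypothetical fasting values: 5.0 mmol/l x 9.0 uU/ml / 22.5 = 2.0
print(homa_transforms(5.0, 9.0))
```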

### Calibration model.

Calibration is inverse regression (30). Consider the model *y* = *f* (*x;θ*) + *ε*, where *x* is the independent variable, *y* is the dependent variable, *θ* is an unknown parameter, and *ε* is the random error. Using an estimated model *y* = *f* (*x;θ̂*) to predict a new *y** for a given *x** is regression. Conversely, predicting a new *x** for a given *y** is calibration. If *x* values are prespecified as part of an experimental design, this is called classical or controlled calibration. If both *x* and *y* are random, the process is called random calibration. Because both QUICKI and the SI_{Clamp} are measured with error from a patient population, random calibration is the more appropriate method to use. In random calibration, there is no difficulty in specifying the conditional distribution of *x* given *y*, so that random calibration is similar to regression in prediction (e.g., just regress SI_{Clamp} on QUICKI). Here, we fitted a calibration model *x*_{i} = α + β*y*_{i} + ε_{i}, where *x*_{i} is the SI_{Clamp}, *y*_{i} is the surrogate index, and ε_{i} is the random error for the *i*th subject. It was assumed that the random error had Gaussian distribution with mean = 0 and a constant variance. Even though SI_{Clamp} is measured with error, it was assumed for our model that the measurement error of SI_{Clamp} (determined from a robust, direct, and data-intensive protocol) is very small relative to that of simple surrogates determined from single fasting measurements (e.g., QUICKI). Therefore, to simplify the analysis, we neglected the measurement error for SI_{Clamp} in our calibration model. For each surrogate index, two types of predicted residuals were considered. The first type of residual is the difference between the measured SI_{Clamp} (*x*_{i}, for the *i*th subject) and the fitted SI_{Clamp} (*x̂*_{i} = α̂ + β̂*y*_{i}). That is, the residual *e*_{i} = *x*_{i} − *x̂*_{i} is derived from the calibration model with all subjects included in the estimation of model parameters α and β. The second type of residual is a cross-validation–type predicted residual *e*_{(i)} = *x*_{i} − *x̂*_{(i)}, where *x*_{i} is still the measured SI_{Clamp}, but *x̂*_{(i)} = α̂_{(i)} + β̂_{(i)}*y*_{i} is the predicted SI_{Clamp} from the calibration model fitted with the *i*th subject excluded, and the subscript (*i*) means “with the *i*th subject deleted.” Then, two criterion functions were used for the evaluation of prediction accuracy: square root of the mean squared error of prediction {RMSE = [Σ*e*_{i}^{2}/(*n* − 2)]^{1/2}} and leave-one-out cross-validation–type root mean squared error of prediction {CVPE = [Σ*e*_{(i)}^{2}/*n*]^{1/2}}. Smaller values of RMSE and CVPE indicate better prediction. However, RMSE is likely to underestimate prediction errors, and CVPE is more robust.
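
The two criterion functions can be illustrated with a short, dependency-free Python sketch. It fits the linear calibration model by ordinary least squares, then computes RMSE from the all-subject fit and CVPE from leave-one-out refits. The function names and the toy data are illustrative assumptions, not the SAS code or the data used in this study.

```python
import math


def fit_line(y, x):
    """Least-squares fit of x = alpha + beta * y; returns (alpha, beta)."""
    n = len(y)
    mean_y, mean_x = sum(y) / n, sum(x) / n
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((a - mean_y) * (b - mean_x) for a, b in zip(y, x))
    beta = sxy / syy
    return mean_x - beta * mean_y, beta


def rmse(y, x):
    """RMSE = sqrt(sum(e_i^2) / (n - 2)) using the fit on all subjects."""
    alpha, beta = fit_line(y, x)
    sse = sum((xi - (alpha + beta * yi)) ** 2 for yi, xi in zip(y, x))
    return math.sqrt(sse / (len(x) - 2))


def cvpe(y, x):
    """CVPE = sqrt(sum(e_(i)^2) / n), refitting with subject i left out."""
    y, x = list(y), list(x)
    sse = 0.0
    for i in range(len(x)):
        alpha, beta = fit_line(y[:i] + y[i + 1:], x[:i] + x[i + 1:])
        sse += (x[i] - (alpha + beta * y[i])) ** 2
    return math.sqrt(sse / len(x))


# Toy data (hypothetical): surrogate values y and measured clamp values x.
surrogate = [0.0, 1.0, 2.0]
clamp = [0.0, 1.0, 0.0]
print(rmse(surrogate, clamp), cvpe(surrogate, clamp))
```

In this toy example the leave-one-out error is larger than the in-sample error, which is the behavior the text describes: each held-out point is predicted by a line that never saw it.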

### Boxplots.

The distribution of residuals for each surrogate index was displayed with a boxplot. This is a graphical representation of the bulk of the data where the lower and upper edges of the box represent the first and third quartiles, respectively. The median is designated by a horizontal line segment inside the rectangle. The “whiskers” extend vertical lines from the center of each edge of the box to the most extreme data values that are no farther than 1.5 × interquartile range (IQR) (the third quartile minus the first quartile) from each edge. All points that are more extreme than the “whiskers” identify potential outliers and are plotted separately on the graph.
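
The quantities drawn in such a boxplot can be computed as in the following Python sketch. The quartile convention varies between software packages; the "inclusive" interpolation method of the standard library is used here as an assumption and may differ slightly from the method used to draw the figures.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, whisker ends, and potential outliers (1.5 x IQR rule)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme data values inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values
                           if v < low_fence or v > high_fence),
    }


# Hypothetical residuals with one extreme value.
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```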

### Statistical analysis.

Student’s *t* tests were used to compare different subgroups of subjects with respect to clinical characteristics when appropriate. To compare the predictive accuracy of QUICKI and other surrogates in terms of CVPE and RMSE, we performed hypothesis testing with the one-sided alternative hypothesis that QUICKI had a smaller RMSE or CVPE than another surrogate using a bootstrap percentile method with 60,000 replications performed for each comparison (31). The bootstrap method is appropriate because the RMSEs (or CVPEs) corresponding to QUICKI and other surrogates were derived from the same group of subjects and thus correlated. The *P* values calculated for comparison of RMSE and CVPE were for pairwise comparisons. For example, when QUICKI and HOMA were compared with respect to the CVPE based on the 116 patients, a bootstrap percentile method with 60,000 replications was used to get a sample of 60,000 differences in CVPE [CVPE (HOMA) − CVPE (QUICKI)], and then a *P* value for one-sided superiority testing was estimated as the proportion of the bootstrap replications less than zero. One-sided hypothesis testing was used because multiple previous studies have demonstrated the superiority of QUICKI as a surrogate index of insulin sensitivity from a variety of perspectives, supporting an a priori expectation. *P* < 0.05 was considered to indicate statistical significance. The software used for statistical analysis and the random calibration model was the SAS System V8 and Resampling Stats 5.0.2.
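
The paired bootstrap comparison described above might be sketched in Python as follows. Subjects are resampled with replacement, the same resampled subjects are used for both surrogates (preserving the correlation between their errors), and the one-sided *P* value is the fraction of replicates in which the CVPE difference falls below zero. All names, the default replication count, the seed, and the handling of degenerate resamples are assumptions of this sketch; it is not the Resampling Stats procedure used in the study.

```python
import math
import random


def _fit(y, x):
    """Least-squares fit of x = alpha + beta * y; returns (alpha, beta)."""
    n = len(y)
    mean_y, mean_x = sum(y) / n, sum(x) / n
    syy = sum((v - mean_y) ** 2 for v in y)
    if syy == 0.0:  # degenerate resample: predict the mean (assumption)
        return mean_x, 0.0
    beta = sum((a - mean_y) * (b - mean_x) for a, b in zip(y, x)) / syy
    return mean_x - beta * mean_y, beta


def cvpe(y, x):
    """Leave-one-out root mean squared error of prediction."""
    total = 0.0
    for i in range(len(x)):
        alpha, beta = _fit(y[:i] + y[i + 1:], x[:i] + x[i + 1:])
        total += (x[i] - (alpha + beta * y[i])) ** 2
    return math.sqrt(total / len(x))


def bootstrap_p_value(clamp, surrogate_a, surrogate_b,
                      replications=2000, seed=0):
    """One-sided P for 'a predicts better than b'.

    Returns the proportion of replicates with CVPE(b) - CVPE(a) < 0.
    """
    rng = random.Random(seed)
    n = len(clamp)
    below_zero = 0
    for _ in range(replications):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects
        x = [clamp[j] for j in idx]
        a = [surrogate_a[j] for j in idx]
        b = [surrogate_b[j] for j in idx]
        if cvpe(b, x) - cvpe(a, x) < 0.0:
            below_zero += 1
    return below_zero / replications
```

With real data, `surrogate_a` would hold QUICKI values and `surrogate_b` a competing index, and the replication count would be raised toward the 60,000 used in the study at the cost of run time.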

## RESULTS

The clinical characteristics of our study subjects are shown in Table 1 for the entire cohort and in Table 2 for the subset of 78 subjects who also underwent insulin-modified FSIVGTT with minimal model analysis. Clinical characteristics for nonobese, obese, diabetic, and hypertensive subgroups were similar between the entire cohort and the subset. The mean BMI, fasting insulin, total cholesterol, and LDL were all significantly higher in the obese, diabetic, and hypertensive subjects when compared with the healthy, nonobese subjects. Note that none of the subjects in the nonobese, obese, and hypertensive groups had diabetes.

### Determinations of insulin sensitivity.

Mean SI_{Clamp}, QUICKI, HOMA, log(HOMA), 1/HOMA, and fasting insulin for each subset of our entire cohort were calculated from data obtained during the hyperinsulinemic-isoglycemic glucose clamp as described in research design and methods (Table 3). During the glucose clamp, steady-state mean blood glucose levels were 85 ± 2, 86 ± 3, 159 ± 7, and 83 ± 2 mg/dl for nonobese, obese, diabetic, and hypertensive subjects, respectively. The steady-state mean plasma insulin levels were 272 ± 24 (nonobese), 334 ± 22 (obese), 280 ± 10 (diabetic), and 302 ± 20 μU/ml (hypertensive). Mean glucose infusion rates at steady state were 870 ± 50 (nonobese), 802 ± 64 (obese), 798 ± 54 (diabetic), and 800 ± 55 mg/min (hypertensive). As determined by SI_{Clamp}, diabetic subjects were the most insulin resistant, followed by obese subjects and hypertensive subjects. As expected, nonobese subjects were the most insulin sensitive (Table 3). All of the simple surrogate indexes of insulin sensitivity, except for fasting insulin, also determined an identical rank order of insulin sensitivity. The degree of insulin resistance in the presence of obesity in the diabetic and hypertensive groups tended to be higher when compared with nonobese diabetic and hypertensive subjects, respectively. However, these tendencies as determined by SI_{Clamp} did not achieve statistical significance (data not shown). The distribution of values for SI_{Clamp} in our cohort is shown in Fig. 1*A*.

In the subset of 78 subjects who underwent FSIVGTT, minimal model analysis was used to generate SI_{MM} (Table 4). As evaluated by SI_{Clamp}, the same rank order of insulin sensitivity among disease groups observed in the entire cohort was also maintained in the subset of 78 subjects (Table 4). In this subset, as with the entire cohort, rank order of insulin sensitivity determined by QUICKI, HOMA, log(HOMA), and 1/HOMA agreed with SI_{Clamp}. Of note, a different rank order of insulin sensitivity was determined by both SI_{MM} and fasting insulin. These results are consistent with our previous studies, demonstrating that the correlation between SI_{Clamp} and QUICKI or log(HOMA) is substantially and significantly better than that between SI_{Clamp} and SI_{MM} (19,21). The distribution of values for SI_{Clamp} in the subset of 78 subjects is shown in Fig. 1*B*.

### Calibration model analysis.

As described in research design and methods, we regressed measured SI_{Clamp} for each subject on each surrogate index and fitted these data to a calibration model. This determined model parameters α and β for each surrogate index in the entire cohort (116 glucose clamps) as well as for the subset of 78 subjects who also underwent FSIVGTT (Table 5). We then used the fitted calibration model (using leave-one-out cross-validation analysis) to generate plots of the values for predicted SI_{Clamp} by each surrogate index as a function of the measured SI_{Clamp} determined from the actual glucose clamp results in our entire cohort (Fig. 2). If a surrogate index perfectly predicted SI_{Clamp}, results for each subject would fall on a straight line with a slope of 1 and a *y*-intercept of 0. By inspection, it is clear that QUICKI and log(HOMA) generated more accurate predictions of SI_{Clamp} (closer to the ideal line) than HOMA, 1/HOMA, or fasting insulin. In addition, a linear least-squares fit between predicted SI_{Clamp} and measured SI_{Clamp} derived from QUICKI and log(HOMA) data also had correlation coefficients (*r* = 0.75 and 0.73, respectively) that were significantly higher than those for HOMA, 1/HOMA, and fasting insulin (*r* = 0.11, 0.66, and 0.15, respectively). This is important because it is possible that a surrogate index may have systematic errors that result in inaccuracy but still have significant positive predictive power if it can correctly rank the degree of insulin sensitivity. The fact that the linear correlation of predicted SI_{Clamp} versus measured SI_{Clamp} was best for QUICKI and log(HOMA) suggests that these surrogates are also likely to have the best predictive power for outcomes related to insulin sensitivity. When we generated data from the calibration model using RMSE analysis, we obtained results similar to those shown in Fig. 2 (data not shown). 
For the subset of 78 subjects who underwent FSIVGTT, we plotted predicted SI_{Clamp} (determined by leave-one-out cross-validation analysis) as a function of measured SI_{Clamp} for both QUICKI and the minimal model SI_{MM} (Fig. 3). Results from QUICKI were closer to ideal than results from SI_{MM}. In addition, the linear correlation between predicted SI_{Clamp} and measured SI_{Clamp} was significantly higher for QUICKI than for SI_{MM} (*r* = 0.75 vs. 0.53).

To quantitatively assess predictive accuracy for each surrogate index, residuals (measured SI_{Clamp} − predicted SI_{Clamp}) generated from random calibration analysis were used to calculate the CVPE and RMSE as described in research design and methods. For the entire cohort, QUICKI and log(HOMA) had comparable CVPEs that were significantly smaller than HOMA, 1/HOMA, and fasting insulin (Table 6). Similar results were observed for the less robust RMSE. For the subset of 78 subjects who underwent FSIVGTT, SI_{MM} had the largest CVPE and RMSE, indicating that the minimal model had the least predictive accuracy. QUICKI and log(HOMA) had the smallest CVPE and RMSE (Table 7). The *P* values shown for comparison of RMSE and CVPE are for pairwise comparisons. For example, when QUICKI and HOMA were compared with respect to the CVPE based on the 116 patients, a bootstrap percentile method with 60,000 replications was used to get a sample of 60,000 differences in CVPE [CVPE (HOMA) − CVPE (QUICKI)], and then a *P* value for one-sided superiority testing was estimated as the proportion of the bootstrap replications less than zero. It is possible that the multiple comparisons could inflate the risk of finding small *P* values just by chance. We did not perform any adjustment for multiple comparisons because this is an exploratory study rather than a confirmatory study. To exclude the presence of leverage and influential points, we performed an analysis of our data to evaluate Cook’s distance for each subject. We found large values for Cook’s distance only for subject 52 (of 116) in the RMSE analysis of HOMA versus SI_{Clamp}, log(HOMA) versus SI_{Clamp}, and fasting insulin versus SI_{Clamp}. For all other patients in all other comparisons, the Cook’s distance was much smaller than 1. Therefore, we repeated our analyses excluding subject 52. The results of these analyses were similar to those of the original analyses. 
That is, QUICKI and log(HOMA) had the smallest RMSEs, which were significantly smaller than those of HOMA, 1/HOMA, and fasting insulin (data not shown).
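
For a straight-line calibration fit, Cook's distance for each subject can be computed from the residual and the leverage, as in the Python sketch below. It uses the standard textbook formula for simple linear regression with two parameters; the function name and the toy data are illustrative assumptions, and the study itself used SAS for this diagnostic.

```python
def cooks_distances(y, x):
    """Cook's distance for each point in the fit x = alpha + beta * y."""
    n = len(y)
    p = 2  # number of fitted parameters: intercept and slope
    mean_y, mean_x = sum(y) / n, sum(x) / n
    syy = sum((v - mean_y) ** 2 for v in y)
    beta = sum((a - mean_y) * (b - mean_x) for a, b in zip(y, x)) / syy
    alpha = mean_x - beta * mean_y
    residuals = [xi - (alpha + beta * yi) for yi, xi in zip(y, x)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    distances = []
    for yi, e in zip(y, residuals):
        leverage = 1.0 / n + (yi - mean_y) ** 2 / syy
        distances.append((e * e / (p * s2)) * leverage / (1.0 - leverage) ** 2)
    return distances


# Toy data (hypothetical): the last point is far from the others in y
# and does not follow their trend, so it dominates the fit.
d = cooks_distances([1, 2, 3, 4, 5, 20], [1, 2, 3, 4, 5, 0])
print([round(v, 3) for v in d])
```

Points whose distance approaches or exceeds 1 are commonly treated as influential, which matches the threshold referred to in the text.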

To further evaluate the predictive accuracy of various surrogate indexes, we used boxplots to display the distribution of residuals. For the entire cohort, QUICKI and log(HOMA) had the narrowest distribution of residuals with a median closer to zero, narrower IQR, and fewer outliers with smaller magnitude when compared with HOMA, 1/HOMA, or fasting insulin (Fig. 4*A*). In the subset of 78 subjects who underwent FSIVGTT, QUICKI also had a more favorable distribution of residuals than SI_{MM} (Fig. 4*B*). As another way to display these data, we plotted residuals versus the predicted SI_{Clamp} (from leave-one-out cross-validation analysis for each surrogate index) (Fig. 5). If a surrogate index predicts SI_{Clamp} well, the residuals should be close to zero with a random pattern. With this analysis, QUICKI, log(HOMA), and 1/HOMA had similar distributions of residuals, whereas the residuals for HOMA and fasting insulin tended to increase more with an increase in predicted SI_{Clamp}. The plot of residuals versus predicted SI_{Clamp} for QUICKI and SI_{MM} for the 78 subjects who underwent FSIVGTT is shown in Fig. 6.

## DISCUSSION

The incidence and prevalence of type 2 diabetes, obesity, and the metabolic syndrome are increasing at an alarming rate in the U.S. and around the world. Insulin resistance is a key pathophysiological marker for all of these major public health problems. Therefore, developing simple, reliable, and accurate methods for quantifying insulin sensitivity in humans is an important goal. The best previous studies evaluating simple surrogate indexes of insulin sensitivity such as QUICKI and HOMA examined the correlation between a surrogate index and the reference standard glucose clamp estimate of insulin sensitivity (10,19–21,24,25,32–36). Some studies have also evaluated the positive predictive power of simple surrogates for some clinical outcome such as the onset of diabetes or carotid artery intima-media thickness (11,37). In the present study, we evaluated, for the first time, the predictive accuracy of various surrogate indexes of insulin sensitivity/resistance using a calibration model.

Our entire cohort was relatively large (116 glucose clamp studies) and contained both normal healthy subjects and subjects with obesity, essential hypertension, and type 2 diabetes. The metabolic and hemodynamic characteristics of these subjects were as expected. In particular, subjects with hypertension, obesity, and diabetes were significantly insulin resistant on average. Note that subjects in the nonobese, obese, and hypertensive groups did not have diabetes. This is important because QUICKI and HOMA indexes include glucose in their calculation, and it is possible that these indexes may reflect glucose tolerance as well as insulin resistance. However, this seems unlikely because both QUICKI and HOMA use fasting glucose levels in their calculations and fasting glucose levels are steady-state levels that are not a reflection of glucose utilization after a glucose load. The rank order of insulin resistance for these groups determined by the reference glucose clamp method was also determined by QUICKI, log(HOMA), HOMA, and 1/HOMA but not by fasting insulin or SI_{MM}. This crude comparison suggests that QUICKI and HOMA (and its various transformations) are superior to fasting insulin and the minimal model in terms of positive predictive power. These results also underscore the importance of comparing surrogate indexes with a reference standard, because comparisons with SI_{MM} alone may lead to erroneous conclusions (38–41). It is also interesting to note that the average values obtained for SI_{MM} in most patient subgroups (nonobese, obese, and hypertensive) were less than the average values for SI_{Clamp} (Table 4). This is consistent with our previous findings that SI_{MM} systematically underestimates SI_{Clamp} (42). 
This was not true for the diabetic subjects because some diabetic subjects had to be excluded because of well-known artifacts in minimal model analysis of subjects with inadequate insulin secretion leading to negative values for SI_{MM} (19). It is also important to note that all indexes of insulin sensitivity including those based on fasting glucose and insulin, oral glucose tolerance tests, intravenous glucose tolerance tests, and the glucose clamp depend, in part, on measuring insulin levels. Because of significant laboratory-to-laboratory variability in insulin determinations and the lack of standardization in insulin assays, it is not possible at this time to determine universal cutoffs that define insulin resistance using QUICKI or any other method of determining insulin sensitivity that depends on insulin measurements.

The distribution of values for insulin sensitivity in our cohort as determined by SI_{Clamp} covered a wide range between 1 and 11. However, these values were not evenly distributed across the entire range. The bulk of the subjects had values between 1 and 5 for both the entire cohort and the subset of subjects who also underwent FSIVGTT. It is possible that this uneven distribution may bias our calibration analysis. However, because the primary utility of QUICKI and other surrogate indexes of insulin sensitivity is to identify and characterize subjects with insulin resistance, the overrepresentation of insulin-resistant subjects in our cohort should not significantly affect the reliability of our calibration analysis results with respect to accuracy in insulin-resistant subjects. In addition, it is likely that the analysis from our entire cohort of 116 subjects is more reliable than that from the subset of 78 subjects simply because more data are included for the calibration analysis.

Because previous studies demonstrated a linear relationship between SI_{Clamp} and QUICKI or SI_{MM} (19,21,43), we chose a standard calibration model to evaluate the predictive accuracy of various simple surrogate indexes of insulin sensitivity. This model was sufficient to demonstrate the predictive accuracy of QUICKI and log(HOMA). When an expensive or laborious but accurate measurement method is replaced by an inexpensive and quick but indirect method, application of a calibration model is particularly appropriate for validating the surrogate index (30). Predictive accuracy was assessed by two criterion functions, a commonly used RMSE and a so-called “leave-one-out” CVPE. CVPE is more robust than RMSE because CVPE uses an estimate that excludes the *i*th subject when predicting results for the *i*th subject. This reflects more closely a clinical situation in which data for each new patient is based on a model obtained from previous patients. CVPE also handles extremes in data in a more rigorous fashion and has less tendency to underestimate error than RMSE. In our study, the values for CVPE and RMSE were similar, suggesting that there were no extreme outliers in our dataset that were biasing our results. When we excluded the only subject with a large value for Cook’s distance and repeated our calibration analysis, results were similar to the original analysis. That is, QUICKI and log(HOMA) had the smallest RMSEs that were significantly better than HOMA, 1/HOMA, and fasting insulin. Among the simple surrogate indexes tested here, QUICKI and log(HOMA) were the most accurate in predicting SI_{Clamp} and had significantly lower CVPE and RMSE than other simple surrogates. Of note, in the subset of 78 subjects who also underwent FSIVGTT, minimal model analysis (SI_{MM}) had the worst predictive accuracy for SI_{Clamp} with the highest CVPE and RMSE. 
Consistent with these findings, QUICKI and log(HOMA) clearly had the narrowest and most favorable distribution of residuals. The plots of residuals versus predicted SI_{Clamp} are useful to explore features of the calibration model. A few relatively large residuals may be an indication of outliers (i.e., subjects for whom the model is somehow inappropriate). An increase in the magnitude of residuals as a function of the magnitude of predicted SI_{Clamp} may indicate nonconstant residual variance or heterogeneity of subjects. Our normal subjects tended to have larger residuals that were more spread out, suggesting that they may have more heterogeneity in their determinants of insulin sensitivity than other subgroups. Our linear calibration model assumes that the standard deviation for SI_{Clamp} is constant. If there is large variability in the standard deviation for SI_{Clamp}, then other calibration models may be more appropriate. In addition, the fact that the more extreme residuals were mostly positive values may reflect a bias in the calibration results because our cohort did not include large numbers of subjects who were very insulin sensitive.

It is important to note that in our subset of 78 subjects, there were only 11 with diabetes. Because changes in glucose do not contribute much to any of the other surrogate indexes in nondiabetic subjects, this helps to explain why fasting insulin performed relatively better in the subset of 78 subjects than it did in the entire cohort of 116 subjects. The relative paucity of diabetic subjects in this subset also helps to explain why the calibration model parameter β changed by approximately fourfold when the value of β determined from the entire cohort is compared with the value determined from the subset.

Previous studies have documented excellent linear correlation between QUICKI and the reference standard glucose clamp in a variety of insulin-resistant diseases including type 2 diabetes, obesity, hypertension, gestational diabetes, and polycystic ovary syndrome (8,19–24,26,27,44). In addition, test characteristics of QUICKI and log(HOMA) including coefficient of variation and discriminant ratio are comparable with those of the glucose clamp and superior to other simple surrogates (20). Moreover, QUICKI can appropriately follow changes in insulin sensitivity after various therapeutic interventions (8,20–23). Taken together with the superior predictive accuracy of QUICKI demonstrated in the present study, these findings help to explain why QUICKI is among the best simple surrogate indexes of insulin sensitivity for predicting the onset of diabetes and increased carotid artery intima-media thickening (11,37). However, future studies are required to investigate whether the superior accuracy of QUICKI demonstrated in the present study translates into a significant clinical benefit.

In summary, using a random calibration model in a relatively large cohort consisting of normal, obese, hypertensive, and diabetic subjects, we demonstrated that QUICKI and log(HOMA) are superior to other simple surrogates of insulin sensitivity in accurately predicting insulin sensitivity determined by the reference SI_{Clamp}. We conclude that QUICKI and log(HOMA) are among the most accurate and useful surrogate indexes for determining insulin sensitivity in humans.

## Footnotes

- Accepted April 4, 2005.
- Received September 11, 2004.

- DIABETES