## Abstract

OBJECTIVE—The transition of an individual from normoglycemia to diabetes has generally been thought to involve either moderate or rapid changes in glucose over time, although few studies have analyzed these changes. We sought to determine whether a general pattern of glucose change exists in most individuals who become diabetic.

RESEARCH DESIGN AND METHODS—We examined longitudinal data from Pima Indians who developed diabetes after several biennial examinations to characterize changes in 2-h plasma glucose. A distinct pattern of glucose change was apparent in the time course of most individuals, an initial linear trend followed by a steeper rise in glucose values. A model consisting of additive linear and exponential functions was hypothesized to account for this pattern and was tested for goodness of fit on 55 individuals who became diabetic after at least 10 previous examinations.

RESULTS—The combined linear and exponential model provided a significantly better fit than linear or exponential models alone in 40 of the 55 cases (*P* < 10^{−38}). Using this model, the timeframe over which glucose values rose suddenly was estimated, having a median time to onset of <4.5 years from the time at which the exponential effect had contributed a modest increase of 10 mg/dl to the initial linear trend.

CONCLUSIONS—We conclude that there are two distinct processes affecting glucose levels in most individuals who progress to type 2 diabetes and that the rapid glucose rise identified in these people may be an important period for physiologic and preventive research.

Type 2 diabetes is a condition of sustained elevated blood glucose concentration. As the disease develops, glucose concentration rises from a state that is considered normal through levels indicative of impaired glucose homeostasis and finally to the hyperglycemia defining diabetes. Although the progression to diabetes is described by various models (1–5a), most of these studies relied on pooled data, rather than on analyses of glucose dynamics in individuals. While these studies have been helpful for describing changes in mean glucose levels of a population, they may not accurately account for the processes involved with glucose change in the individuals themselves.

The pattern of progression to type 2 diabetes in individuals is difficult to quantify for several reasons. First, type 2 diabetes is usually diagnosed in adulthood, leaving a longer pre-onset time span over which the disease may progress than a disease of younger onset. This necessitates the study of individuals for long periods of time over which the disease may eventually unfold. Second, the glucose tolerance tests in common use throughout most epidemiologic studies have relatively large intra-individual variation (6–12). Whether variations in glucose from visit to visit are due to a slowly moving trend or to these random fluctuations may be difficult to ascertain over a short timeframe. Third, glucose is not routinely measured in nondiabetic individuals. There are only a few longitudinal studies with cohorts of healthy individuals in whom glucose is measured over many years or decades. Fourth, it is likely that overall rates of progression and even the patterns of progression themselves may differ among individuals. Hence, an overall trend present in a majority of individuals may be difficult to observe.

To overcome problems of measurement variation and heterogeneous progression, some investigators have examined the mean glucose levels before and after diabetes diagnosis among participants in longitudinal studies who developed the disease during the period of observation. These models suggested a rapid glucose rise as the individuals pass the diabetes threshold (4,5a,13,14). However, because the diagnosis is based on a threshold glucose level, these findings may reflect the dichotomous nature of this diagnosis rather than any physiologic process, since any diagnostic threshold will artifactually produce a rapid rise in the mean of those who exceed the threshold for the first time (appendix). Also, such pooled data produce average rates of glucose change, which may be uncharacteristic of that found in the individuals themselves.

The rate at which hyperglycemia progresses may have potential implications for preventive strategies. For example, different economic analyses of the lifestyle intervention used in the Diabetes Prevention Program reached vastly different conclusions about its cost-effectiveness (15,16), at least partially as a consequence of different assumptions about the rate of progression of hyperglycemia (17). In the present study, we analyze the rate at which 2-h plasma glucose concentrations change during progression from the nondiabetic state to diabetes among Pima Indians who participated in a longitudinal study. A model in which an individual's glucose concentration initially changes linearly followed by a steeper exponential rise is proposed to describe the progression to diabetes. This model, herein named the glucose effects model (GEM), allows for assessment of whether the rapid exponential component is generally responsible for the development of diabetes or whether a single linear process is sufficient. We also conduct simulations to determine the potential bias introduced when the diagnosis is based on a threshold value.

## RESEARCH DESIGN AND METHODS

Since 1965, Pima Indians of the Gila River Indian Community in Arizona have participated in a longitudinal study of diabetes and related conditions. Community residents aged ≥5 years are invited to a research examination every 2 years. These examinations include measurements of plasma glucose from a 75-g oral glucose tolerance test (OGTT). Informed consent was provided by adult participants and parents or guardians of minors.

For the present analysis, diabetes diagnosis was based on the 1985 World Health Organization criterion of 2-h plasma glucose ≥200 mg/dl (11.1 mmol/l). The use of the 1985 criterion, which is based solely on 2-h glucose values, allows for the use of examinations before 1975, when fasting plasma glucose was not measured routinely. Fifty-five individuals who had at least 10 nondiabetic biennial OGTT measurements followed by a 2-h plasma glucose ≥200 mg/dl were selected for the analysis. Individuals who were diagnosed with diabetes before attaining this 1985 World Health Organization criterion, either outside of the study or solely by fasting plasma glucose after 1997, when the additional American Diabetes Association fasting criterion of ≥126 mg/dl (7.0 mmol/l) was instituted, were excluded, as they may have changed behavior and/or begun treatment before the next study exam as a consequence of their diagnosis. The goal of this study was to determine generally unmodified glucose patterns leading to a diabetes diagnosis in individuals; hence, OGTT measurements taken after the initial diagnosis were excluded from the analysis to limit effects of treatment. Four individuals who solely met the American Diabetes Association fasting criterion before 1997 were not excluded from analysis, as they were not diagnosed with diabetes at that time.

### Exploratory data analysis.

To quantify glucose changes and behaviors in individual progressors, we inspected glucose profiles of hundreds of participants who became diabetic over various time courses. In approximately half of these individuals, an apparently moderate linear pattern of glucose increase was followed by a larger jump in glucose values (Fig. 1*A* and *B*). An additional distinctly occurring pattern (∼25%) was that of a linear decline in glucose values followed by a gradual turn upwards that increased in slope, resulting in an overall J shape (Fig. 1*C* and *D*). The remaining individuals followed a more erratic pattern of glucose values with an overall growing trend (Fig. 1*E* and *F*).

### Model construction.

A linear change in glucose values often having a nonzero slope was noted in a majority of individuals early in the time course of progression. Subsequently, the glucose values rose more rapidly. Since an exponential curve has negligible values initially, it was plausible to consider an exponential effect in addition to the previously noted linear effect without distorting the initial linear growth. Such a composite model would then portray an initial period of linear change (having either positive, zero, or negative slope), subsequently followed by a gradual transition to an exponential rate of increase. As multiple processes affecting glucose levels are likely to occur simultaneously (such as long-term changes in insulin resistance and insulin secretory function), the overlap of these effects in the model is physiologically plausible (Fig. 2). This GEM describes the glucose value (*G*) at a given age (*t*) in years as:

$$G(t) = a\,t + b + G_{0}\,e^{c\,(t - d)}$$

where the parameter *a* (mg/dl per year) may be interpreted as the initial linear glucose slope of an individual, the parameters *b* (mg/dl) and *d* (years) are location parameters, and the parameter *c* (year^{−1}) is a scaling factor. G_{0} is defined to be 1 mg/dl, representing the base level scaled by the exponential effect.

### Model testing/statistical analyses.

While no model has previously been tested for describing the overall pattern of glucose change in individuals, mean levels of glucose in population studies appear to progress linearly (16,18), though such a population rate reflects the average of individual trajectories that may or may not be linear in time. We compared the GEM against such a linear model as well as other possible trajectories, namely a linear effect alone (*G*_{lin}), an exponential effect alone (*G*_{exp}), an exponential effect plus a constant (*G*_{expc}), and a continuous piecewise linear spline with a single knot (*G*_{spline}). These models are described as follows:
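
$$G_{\text{lin}}(t) = a\,t + b$$

$$G_{\text{exp}}(t) = G_{0}\,e^{c\,(t - d)}$$

$$G_{\text{expc}}(t) = b + G_{0}\,e^{c\,(t - d)}$$

$$G_{\text{spline}}(t) = \begin{cases} a\,t + b, & t \le q \\ m\,t + n, & t > q \end{cases}$$

where *q* is the knot location defined to be (*n* − *b*)/(*a* − *m*) to make the spline continuous.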
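As an illustration only, the candidate trajectories can be written as plain functions. This sketch assumes the functional forms implied by the parameter definitions in the text (a linear term `a*t + b`, an exponential term `G0*exp(c*(t - d))` with `G0` fixed at 1 mg/dl, and a two-segment spline joined at `q`). The function names and the example parameter values are hypothetical and are not taken from the study.

```python
import math

G0 = 1.0  # mg/dl, base level of the exponential term as defined in the text


def g_lin(t, a, b):
    """Linear effect alone."""
    return a * t + b


def g_exp(t, c, d):
    """Exponential effect alone; contributes exactly G0 at age t = d."""
    return G0 * math.exp(c * (t - d))


def g_expc(t, b, c, d):
    """Exponential effect plus a constant (GEM with slope a = 0)."""
    return b + g_exp(t, c, d)


def gem(t, a, b, c, d):
    """Glucose effects model: additive linear and exponential effects."""
    return g_lin(t, a, b) + g_exp(t, c, d)


def g_spline(t, a, b, m, n):
    """Continuous piecewise linear spline with one knot (requires a != m)."""
    q = (n - b) / (a - m)  # knot location that makes the two segments meet
    return a * t + b if t <= q else m * t + n


# Hypothetical spline parameters chosen so that the knot falls at age 40.
knot_value = g_spline(40.0, 1.0, 90.0, 10.0, -270.0)
```

Written this way, the nesting used for the *F* ratio comparisons is visible directly: setting `a = 0` in `gem` gives `g_expc`, and dropping the exponential term gives `g_lin`.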

Each of these models was fit separately to every individual who met the selection criteria to determine which of the models best described the glucose pattern in a given individual. Extensive initial parameter grid searches were designed, and PROC NLIN (Version 8.02; SAS Institute) was used to perform all model fits. A determination of which model was generally better at describing the glucose patterns in the individuals of the study could then be made.
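The fits in the study were made with PROC NLIN after grid searches for starting values. The sketch below is not that procedure. It is a simplified stand-in that assumes the GEM form `a*t + b + G0*exp(c*(t - d))` and exploits the fact that, for fixed `c` and `d`, the remaining parameters `a` and `b` enter linearly and can be found by ordinary least squares. The grids, the synthetic data, and all function names are hypothetical.

```python
import math

G0 = 1.0  # mg/dl, fixed as in the text


def fit_line(ts, ys):
    """Ordinary least squares for y = a*t + b; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    a = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / stt
    return a, mean_y - a * mean_t


def fit_gem_grid(ts, ys, c_grid, d_grid):
    """Return (sse, a, b, c, d) with the smallest SSE over the (c, d) grid."""
    best = None
    for c in c_grid:
        for d in d_grid:
            expo = [G0 * math.exp(c * (t - d)) for t in ts]
            # With c and d fixed, fit the linear part to what is left over.
            a, b = fit_line(ts, [y - e for y, e in zip(ys, expo)])
            sse = sum((y - (a * t + b + e)) ** 2
                      for t, y, e in zip(ts, ys, expo))
            if best is None or sse < best[0]:
                best = (sse, a, b, c, d)
    return best


# Synthetic, noise-free profile: 11 biennial exams from age 20 to 40.
ages = [20 + 2 * i for i in range(11)]
true_a, true_b, true_c, true_d = 0.75, 93.0, 0.5, 36.0
glucose = [true_a * t + true_b + G0 * math.exp(true_c * (t - true_d))
           for t in ages]

c_grid = [0.1 * k for k in range(1, 11)]    # 0.1 ... 1.0 per year
d_grid = [30 + 0.5 * k for k in range(21)]  # ages 30 ... 40
sse, a, b, c, d = fit_gem_grid(ages, glucose, c_grid, d_grid)
```

A grid result like this would normally serve only as a starting point for a proper nonlinear least-squares routine, which is the role the grid searches play in the text.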

As the linear and exponential alone models were nested in relation to the GEM, goodness-of-fit comparisons for each of these models were made with an *F* ratio test, which adjusts for the decrease in sum of squared error (SSE) due to the increase in total model parameters. (A more complex model will always have an SSE less than or equal to that of its nested models.) To determine which of the models was more likely for an individual, the more complicated model (e.g., GEM vs. linear only) was taken as providing a significantly better fit if the *F* test gave *P* < 0.05. For comparison of the GEM and spline models, which are not nested but have the same number of model parameters, Wilcoxon's signed rank sum test was used to test for any significant difference between the average individual SSE produced by these two models.
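The text does not print the *F* ratio itself. In its standard extra-sum-of-squares form, which is assumed here, the statistic for one individual with *N* examinations is as follows, where the subscripts and the symbols *N* and *p* (number of fitted parameters) are notation introduced for this illustration:

```latex
F = \frac{\left(\mathrm{SSE}_{\text{simple}} - \mathrm{SSE}_{\text{complex}}\right) / \left(p_{\text{complex}} - p_{\text{simple}}\right)}
         {\mathrm{SSE}_{\text{complex}} / \left(N - p_{\text{complex}}\right)}
```

Under the simpler model, this ratio is referred to an *F* distribution with (p_complex − p_simple, *N* − p_complex) degrees of freedom. For the GEM versus the linear model, p_complex = 4 (*a*, *b*, *c*, *d*) and p_simple = 2 (*a*, *b*).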

To determine overall which of the models most adequately described the glucose changes in the average progressor, the number of individuals for which a more complex model was selected was compared with the number of individuals expected to be selected by that model if the simpler model was the true profile. Significance of the difference between the number of observed and expected selections was calculated using the test of binomial proportions. The one-tailed *P* value from this test is determined by the formula:

$$P = \sum_{k=n}^{55} \binom{55}{k}\, E^{k}\, (1 - E)^{55 - k}$$

where *n* is the number of times a model was selected, *E* is the proportion of times the model was expected to be selected due to our chosen level of α, and 55 is the number of model fits compared in this study. For nested model comparisons, *E* was equal to the α level used (0.05). Two-sided *P* values were obtained by multiplying the one-tailed value by a factor of two and are used herein. These *P* values were tabulated using Microsoft Excel 2002.
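The one-tailed test described above can be sketched as an exact upper binomial tail. This is an assumed reading of the test (the study used Microsoft Excel, and the exact spreadsheet formula is not given), and the function names are hypothetical. The selection counts used in the example, 41 and 40 of 55, are those reported in RESULTS.

```python
from math import comb


def binomial_upper_tail(n_selected, expected_proportion, n_fits=55):
    """P(X >= n_selected) for X ~ Binomial(n_fits, expected_proportion)."""
    p = expected_proportion
    return sum(comb(n_fits, k) * p ** k * (1 - p) ** (n_fits - k)
               for k in range(n_selected, n_fits + 1))


def two_sided_p(n_selected, expected_proportion, n_fits=55):
    """Two-sided value obtained by doubling the one-tailed value."""
    return min(1.0, 2 * binomial_upper_tail(n_selected, expected_proportion, n_fits))


p_gem_vs_linear = two_sided_p(41, 0.05)       # GEM selected over linear
p_gem_vs_exponential = two_sided_p(40, 0.05)  # GEM selected over exponential alone
```

Under this reading, both values fall below the bounds quoted in the text (10^{−40} and 10^{−38}, respectively).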

### Simulations.

The *F* ratio test uses an *F* distribution that is only approximate for nonlinear models, although it works quite well for hypothesis testing (19). Another source of bias in comparing growth models occurs when study participants are selected based on their having reached a cutoff value (e.g., intermediate glucose tolerance, diabetes, etc.). Such a cutoff introduces a natural bias on the prior range of the slopes of these participants, the extent of which is related to the prior number of biennial exam measurements. As this applied to the unique subset of the population we were analyzing (2-h plasma glucose <200 mg/dl for at least 10 biennial exams), it was desirable to determine the extent to which this influenced model selection. To estimate the overall bias of these factors, simulations were run to show the likelihood of surpassing a threshold while following a linear pattern with various numbers of biennial exams before the threshold. These simulations were performed using estimates of linear rise in glucose obtained from all individuals participating in the longitudinal study and were used to validate the appropriateness of the methods used herein for model comparisons (20).

From these analyses, an expected value was derived for the proportion of times the more complicated model would be selected if the true effect was linear. The number of individuals for whom the more complicated model was selected in actual data was compared with this “empirically” derived expected value to determine the statistical significance corrected for these sources of bias.

## RESULTS

Characteristics of the 55 individuals are summarized in Table 1. The various proposed models of glucose change were fit to each individual and the SSE calculated. For comparison of the nested models, the *F* ratio test was used to determine whether a given model produced a significantly better fit. This test was performed for each model comparison on each individual to determine whether the relative change in SSE was greater than expected by chance given the difference in total model parameters. Combined results for the number of times a model was selected over an alternative model at an α level of 0.05 for the entire study are shown in Table 2.

Overall model comparison was made by comparing the number of times a given model was selected over an alternative model with the number of times it would be expected to be chosen by chance. The *P* value of this model comparison was determined using the test for binomial proportions, and these results are also listed in Table 2. The GEM produced a significantly lower SSE than the best linear fit in 41 of the 55 individuals analyzed. For α = 0.05, the total number of times the GEM should have been chosen over the linear model by the *F* ratio test due to chance was 2.75 (55 × 0.05). The binomial proportions test indicates that the observed 41 selections differ significantly from the expected value of 2.75 (*P* < 10^{−40}).

The GEM also provided a significantly better fit to individual glucose profiles than the exponential alone model in 40 of the 55 cases (*P* < 10^{−38}). It was also chosen nearly three times as often as expected in preference to a model consisting of an exponential function plus a constant (*P* = 0.012). This nested model corresponds to the GEM with an initial linear slope equal to zero. Thus, the comparison suggests that the initial linear slope differs significantly from zero more often than would be expected by chance. Visual examination of the model fits on the data showed the model's ability to conform to a variety of progressive patterns. Glucose profiles together with the GEM fit for four individuals are shown in Fig. 3*A*–*D*.

While the GEM captured the overall glucose pattern significantly better than less complex models in most individuals, the GEM was not selected for others in whom the fit showed only insignificant improvement. This occurred with some individuals for whom the initial linear trend approached quite near the cutoff level for diabetes before attaining it. While subsequent OGTT exams in some of these individuals had been made and showed continued glucose increase that would have favored the GEM, these data points were not used in fitting the models as specified in research design and methods.

Wilcoxon's signed rank sum test was used for overall comparison of the GEM and single knot spline model fits on the 55 individuals. There was no significant difference in SSE between the models (*P* = 0.70). As both of these models are able to describe two distinct growth rates, their agreement was not surprising. Application of this test to compare the linear and exponential alone models showed the exponential alone model to be favored over the linear model (*P* < 10^{−9}).

Summary statistics for each of the parameters in the glucose effects model for the 55 individuals are given in Table 3. The median value of “*a*” was 0.75 mg/dl per year with an interquartile range of −0.44 to 1.51 mg/dl per year. The median value of “*b*” was 93.03 mg/dl. The median value of the time scaling factor “*c*” for the exponential function was 0.55/year, and the parameter “*d*” had a median value of 35.85 years.

The age at which the exponential effect begins to contribute 1 mg/dl to the overall glucose level of an individual is the value of the parameter “*d*,” and for all but five of the individuals occurred within the scope of the data. Such a time of glucose increase will occur before onset (median age of onset 43.4 years), and the median time between this point of increase and the earliest measure of glucose at a diabetic level was 8.3 years. The age at which this exponential growth had contributed 10 mg/dl can also be solved using the model and was found to be 40.2 years, occurring a median of 4.1 years (interquartile range 1.2–9.2) before diagnosis. Within less than an additional 1.5 years, an extra 10 mg/dl of rise in glucose was contributed by this exponential effect in most of the individuals, bringing the median time from the point of the exponential effect's glucose contribution of 20 mg/dl to diabetes onset to <3 years.
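The ages quoted above follow from inverting the exponential term. A minimal sketch, assuming that term has the form `G0*exp(c*(t - d))` with `G0` = 1 mg/dl, is given below. The function name is hypothetical. Plugging in the median parameter values is for illustration only: the medians reported in the text were computed across individuals, so they need not equal the values obtained from median parameters.

```python
import math

G0 = 1.0  # mg/dl, fixed as in the text


def age_at_exponential_contribution(k, c, d):
    """Age t at which the exponential effect G0*exp(c*(t - d)) equals k mg/dl."""
    return d + math.log(k / G0) / c


c_med, d_med = 0.55, 35.85  # median c (per year) and d (years) from the text

age_1 = age_at_exponential_contribution(1.0, c_med, d_med)   # equals d by construction
age_10 = age_at_exponential_contribution(10.0, c_med, d_med)
age_20 = age_at_exponential_contribution(20.0, c_med, d_med)

# Time for the contribution to double from 10 to 20 mg/dl is ln(2)/c,
# independent of d.
doubling_gap = age_20 - age_10
```

Because the gap between any two contribution levels depends only on their ratio and on *c*, the interval from 10 to 20 mg/dl is ln(2)/*c*, which for the median *c* is shorter than the 1.5 years mentioned in the text.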

Using the GEM, it was also possible to estimate the instantaneous glucose slopes for each individual throughout the time course of the measurements by taking the derivatives of the estimated model functions with respect to time. Figure 4 shows the magnitude of these glucose changes for the individuals studied at various times preceding diagnosis. It is apparent that as an individual is closer to onset, the glucose values begin to increase more rapidly, with a median growth rate of ∼15 mg/dl per year 2 years before diagnosis.
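For reference, differentiating the GEM with respect to age gives the instantaneous slope. The expression below assumes the additive form of the model with the exponential term G_{0}e^{c(t − d)}:

```latex
\frac{dG}{dt} = a + c\,G_{0}\,e^{c\,(t - d)}
```

Early in the time course the exponential term is negligible and the slope is approximately *a*. Later the second term dominates, so even an individual with a negative *a* eventually has a positive overall slope.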

Approximately 6 years before diabetes onset there were no longer any individuals with an overall negative slope, the exponential effect outweighing any initial downward trend. From this point forward, the average glucose rate increased rapidly. Before ∼12 years from onset, the exponential contribution was negligible. A possible explanation is that for most progressors the exponential effect begins near or after such a time and affects the glucose level increasingly thereafter.

As participants in the study were required to have glucose measurements below a cutoff value for at least 10 biennial exams, a downward bias on the initial slopes of the progressors was probable. Also, an additional upward bias in final glucose increase may have been present due to the requirement of attaining this same threshold of 200 mg/dl. The overall effect of these biases on the analysis was calculated using numerical simulations outlined in research design and methods. The simulations estimated the overall bias at the α = 0.05 level to inflate the expectation of the GEM versus the linear model by a factor of 1.08 (i.e., the expected proportion of individuals in whom the more complex model is preferred is 0.054 [1.08 × 0.05], rather than the nominal 0.05, if the true effect is linear). Such an inflation did not alter the statistical significance of the finding between these models (i.e., all the tests favoring the GEM over less complex models were still significant when tested against *E* = 0.054 instead of *E* = 0.05).

## DISCUSSION

The finding that glucose concentration in most individuals who attain hyperglycemia first changes linearly and then grows exponentially gives new insight into the process of diabetes development, including the timeframe over which the disease process occurs. While a period of rapid glucose rise had previously been suspected, the model proposed herein describes the overall pathogenesis of the disease in terms of the glucose concentration on the individual level, while at the same time accounting for the threshold bias that was not addressed in these prior studies. The model also provides a framework to more accurately test hypotheses for disease development.

Determining the rate at which glucose levels change in individuals longitudinally provides a number of insights not possible in cross-sectional analyses. These insights include a successful determination of *1*) the average timeframe over which abnormal glucose change leading to diabetes onset occurs, *2*) whether glucose patterns appear to be the result of a single or multiple effects, and *3*) target time periods for effective preventive strategies. A definitive answer to these questions has been lacking, partly due to the lack of a theoretical framework for describing glucose level changes in individuals.

The model identified in the present longitudinal analysis was used to determine the rate of change of glucose in those who develop diabetes. The insights gleaned from a knowledge of this rate in Pima Indians include the following: *1*) while glucose levels rise steadily in many individuals, diabetic levels are usually attained following a period of rapid increase in glucose, occurring over a relatively short timeframe; *2*) glucose levels in individuals who have risen to a diabetic state are due to at least two different effects; and *3*) preventive strategies that slow the rapid rise of glucose in the exponential effect are likely to be most effective.

An additional analysis was performed on 213 individuals with only six to nine previous nondiabetic biennial measurements before a 2-h plasma glucose of 200 mg/dl to determine how well the model fit on individuals with fewer data points. Similar ratios of observed to expected model selections were found. Also, the median glucose slope (derivative) at baseline was nearly identical to that of our original study group. This comparison suggests that the GEM is representative of the disease process measured over shorter timeframes as well. Figure 3*E* shows the model fit to one of these individuals.

There are individuals whose initial linear rate of rise in glucose is faster than in the individuals with six or more nondiabetic examinations or whose diabetes developed at a younger age that precluded them from having so many nondiabetic measurements. The extent to which patterns described in this article apply to such people is unknown. However, the proportion of individuals with an easily detectable rapid rise in the present study appears quite similar to that found by Ferrannini et al. (4), who compared slope changes over a shorter study period.

We also compared model fits on 220 individuals who had not developed diabetes over the course of at least 11 exam periods. Although the proportion of those with negative slopes was similar between this group and those who developed diabetes, the latter had on average slightly but significantly higher baseline slopes (0.774 mg/dl per year in those developing diabetes and 0.143 mg/dl per year in those who did not develop diabetes; *P* = 0.007, Wilcoxon-Mann-Whitney test). More importantly, however, the GEM was selected over a linear model in only 4.55% of these nondiabetic cases, indicating that the rapid jump was not yet significantly measurable in almost all of these individuals (i.e., a linear model describes their glucose dynamics, Fig. 3*F*). This suggests that while the linear component makes some contribution to development of the disease, the exponential component greatly accelerates progression to diabetes and is responsible for the majority of diabetes cases.

Without the initiation of the exponential effect, the onset of diabetes would have been delayed greatly, if not entirely. Using the model, we estimate that if the initial linear trends were followed without the secondary exponential effect, only 35% of the individuals would have attained diabetes by age 100 years. Of these, the median age of diabetes development would have been 77.1 years. The overall impact of this secondary effect was to shorten the diabetes-free lifetime in these individuals by a median of over 30 years.
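The linear-only projection can be illustrated by solving the linear part of the model for the age at which it reaches the diagnostic level of 200 mg/dl. This is a sketch under the assumption that the linear effect is *a*·*t* + *b* with *t* in years of age. The symbol t_{200} is introduced here, and the substitution of the median parameters is illustrative rather than the per-individual calculation reported above:

```latex
a\,t_{200} + b = 200
\quad\Longrightarrow\quad
t_{200} = \frac{200 - b}{a}, \qquad a > 0

% Illustration with the median parameters (a = 0.75 mg/dl per year, b = 93.03 mg/dl):
t_{200} = \frac{200 - 93.03}{0.75} \approx 142.6 \text{ years}
```

An individual with *a* ≤ 0 would never reach the threshold by the linear effect alone, and a slope near the median places t_{200} well beyond age 100, which is in line with only a minority of individuals attaining diabetes by that age without the exponential effect.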

During the final 4 years preceding diabetes, the glucose values (unsmoothed) for a vast majority of the individuals jump >50% of their total rise since baseline, with a median rise of >200%. As noticed previously (4,5a,13), this jump is considerable and invites additional study to determine the process by which this rise in glucose occurs. The rates of glucose increase determined in this final growth period are possibly unreliable estimates of the true rapid rise, as the intervals between exams were on average 2.7 years. More accurate measurement of the rate and timeframe of this final increase as well as identification of the physiologic determinants of the rapid glucose rise could be obtained through more frequent measurements of glucose and plausible correlates on a subset of normal individuals before development of diabetes. Understanding the physiology of the onset of diabetes will depend on understanding the cause(s) of the transition from the linear to the exponential phase. Interventions to prevent type 2 diabetes might be most effective at or around the time of this transition.

## APPENDIX

Diabetes is one of several diseases diagnosed by the attainment of a cutoff value. The use of any cutoff, whether clinically sensible or not, can obscure statistical interpretations of growth data due to the occurrence of an artificial “jump” in values of this variable before and at the time of diagnosis. Hence, a period of rapid onset cannot be concluded solely based on a dramatic rise in levels when passing a cutoff value.

Figure 5*A* shows the mean data for pooled glucose values of individuals before and following a diabetic measurement (2-h glucose ≥200 mg/dl). While the apparent jump is striking, it is also at least partly artifactual. Figure 5*B* shows the mean data for glucose values of the same population for the scenario that the diagnostic level of diabetes has been changed to various different glucose levels. A “jump” in the data is noticed even for very low threshold values. This highlights the potential artifactuality of the abrupt rise in mean values due to the use of any threshold value for diagnosis.

The explanation for this abrupt rise is that data before reaching the cutoff value are constrained to be in a range entirely below that value. At the time of “diagnosis,” the range of measurements is required to be above the threshold. As there is no overlap between the data ranges, a “jump” will occur. Such a jump may be the result of natural variation alone, while no change in the underlying growth rate has occurred. The study presented in this article attempted to determine the true existence of a rapid effect by modeling glucose profiles of individuals themselves, avoiding some of the pitfalls of this threshold dilemma.
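The artifact described above can be reproduced with a toy simulation. The sketch below is not the simulation used in the study: it assumes flat trajectories with purely random visit-to-visit variation and no underlying trend, and the mean, SD, number of exams, and function name are hypothetical values chosen for illustration rather than estimates from the Pima data.

```python
import random


def simulate_threshold_jump(n_people=2000, n_exams=15, mean=130.0, sd=30.0,
                            threshold=200.0, seed=0):
    """Mean glucose at the exam before, and at, the first threshold crossing.

    Every simulated person has the same constant true level, so any
    difference between the two means is produced by the selection alone.
    """
    rng = random.Random(seed)
    before, at_crossing = [], []
    for _ in range(n_people):
        values = [rng.gauss(mean, sd) for _ in range(n_exams)]
        first = next((i for i, v in enumerate(values) if v >= threshold), None)
        if first is None or first == 0:
            continue  # never "diagnosed", or no earlier exam to compare with
        before.append(values[first - 1])
        at_crossing.append(values[first])
    return (sum(before) / len(before),
            sum(at_crossing) / len(at_crossing),
            len(at_crossing))


mean_before, mean_at, n_crossers = simulate_threshold_jump()
```

By construction, every value at the crossing is at or above the threshold and every earlier value is below it, so the pooled means show a "jump" even though no simulated individual has any change in growth rate.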

While threshold bias will blur comparisons of the glucose effects model with other models in describing diabetes development in individuals, the model's theoretical framework allows for estimating this bias. Simulations showed the existence of a rapid effect to be significant after adjusting for this threshold bias. Without a model, the amount of bias may be difficult to assess.

## Acknowledgments

This research was supported in part by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and the American Diabetes Association. C.C.M. was also supported by the Achievement Rewards for College Scientists (ARCS) Foundation Inc., and the present work was submitted in partial fulfillment of the requirements for the degree of PhD.

The authors thank the members of the Gila River Indian Community whose selfless involvement in this study made these findings possible, the staff of the Diabetes Epidemiology and Clinical Research Section, NIDDK for conducting the examinations, and Dr. Robert G. Nelson for advice.

## Footnotes

Published ahead of print at http://diabetes.diabetesjournals.org on 1 May 2007. DOI: 10.2337/db07-0053.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Received January 18, 2007.
- Accepted April 25, 2007.
