Defining Pancreatic Endocrine Precursors and Their Descendants

  1. Peter White1,
  2. Catherine Lee May1,2,
  3. Rodrigo N. Lamounier1,
  4. John E. Brestelli1 and
  5. Klaus H. Kaestner1
  1. 1Department of Genetics and Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
  2. 2Department of Pathology, The Children's Hospital of Philadelphia Abramson Research Center, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
  1. Address correspondence and reprint requests to Klaus H. Kaestner, PhD, University of Pennsylvania, Medical School, Department of Genetics, 415 Curie Blvd., Philadelphia, PA 19104. E-mail: kaestner{at}


OBJECTIVE—The global incidence of diabetes continues to increase. Cell replacement therapy and islet transplantation offer hope, especially for severely affected patients. Efforts to differentiate insulin-producing β-cells from progenitor or stem cells require knowledge of the transcriptional programs that regulate the development of the endocrine pancreas.

RESEARCH DESIGN AND METHODS—Differentiation toward the endocrine lineage is dependent on the transcription factor Neurogenin 3 (Neurog3, Ngn3). We utilize a Neurog3–enhanced green fluorescent protein knock-in mouse model to isolate endocrine progenitor cells from embryonic pancreata (embryonic day [E]13.5 through E17.5). Using advanced genomic approaches, we generate a comprehensive gene expression profile of these progenitors and their immediate descendants.

RESULTS—A total of 1,029 genes were identified as being temporally regulated in the endocrine lineage during fetal development, 237 of which are transcriptional regulators. Through pathway analysis, we have modeled regulatory networks involving these proteins that highlight the complex transcriptional hierarchy governing endocrine differentiation.

CONCLUSIONS—We have been able to accurately capture the gene expression profile of the pancreatic endocrine progenitors and their descendants. The list of temporally regulated genes identified in fetal endocrine precursors and their immediate descendants provides a novel and important resource for developmental biologists and diabetes researchers alike.

Significant efforts to treat and potentially cure diabetes have been focused on generating renewable sources of insulin-producing β-cells from their progenitors to be used in transplantation (13). This effort is motivated by the increased incidence of both type 1 and type 2 diabetes and the limited effectiveness of pharmaceutical treatments for the disease. The Edmonton cadaveric islets transplantation protocol (4) opened the door to improved treatment of the disease but faces two significant challenges: the need for improved immuno-regulatory measures and the necessity to increase the supply of islets or β-cells by a factor of at least 1,000 to be able to treat even the most severely affected patients. While we have gained significant insights into the transcriptional programs and signaling mechanisms that control the differentiation of endocrine precursors to mature hormone-expressing endocrine cells of the islet (rev. in 59), efforts to direct differentiation of various cells into β-cells have met with only limited success (2).

Differentiation of the pancreatic endocrine lineage is dependent on Neurogenin 3 (Neurog3, Ngn3), a member of the family of basic helix-loop-helix (bHLH) transcription factors. Neurog3-null pancreata lack all five mature endocrine cell types (10,11), and Neurog3 is also required for enteroendocrine cell development in the stomach and intestine (12,13). Notably, regulatory genes marking endocrine precursors, including Pax4, Pax6, Isl1, and Neurod1 (14,15), are not expressed in the absence of Neurog3. Ectopic expression of Neurog3 can initiate differentiation of endocrine cells in mouse (16), humans (17), and pig (18). Furthermore, lineage tracing during mouse embryogenesis has revealed that Neurog3-positive cells differentiate exclusively into islet cells, suggesting that expression of this gene could be used as a marker for isolating these progenitors for further study (19).

During pancreatic development in the mouse, expression of Neurog3 is initiated during pancreatic budding, as early as embryonic day (E)9.5 in the dorsal primordium (20,21). Relatively low levels of Neurog3 expression are maintained until E13.5, when a dramatic upregulation is observed, marking the beginning of the second transition (16). Expression is believed to peak at E15.5 and then rapidly decrease to undetectable levels in juvenile and adult islets (10). Different transcriptional regulators participate in the activation of Neurog3, including Onecut1 (Hnf6) as a direct activator of its transcription and members of the FoxA family (7,22,23). Expression of Neurog3 is restricted to a relatively small population of cells destined for the endocrine lineages (19). After differentiation into hormone-expressing cells has occurred, expression of Neurog3 ceases. This is most likely due to negative feedback, as Neurog3 has been shown to repress its own expression (24). Homeodomain factors such as Pax4 (25) and Nkx2–2 (26) as well as the bHLH factor NeuroD1 (27) are targets for Neurog3 activation. However, the majority of downstream genes dependent on Neurog3 and the mechanisms through which this factor regulates differentiation of endocrine precursors into islet cells are unknown.

In the present study we set out to identify potential novel targets of Neurog3 and to determine the transcriptome of the endocrine lineage during fetal development. Taking advantage of the fact that enhanced green fluorescent protein (EGFP) persists in cells for several days after the promoter driving its expression has been turned off, we were able to sort Neurog3-EGFP endocrine precursors and their immediate descendants from fetal pancreas. By determining the gene expression profile of these developing cell populations, we derived gene signatures that can be used to guide future efforts to enhance differentiation of endocrine precursors.


Heterozygous male mice containing an EGFP-marked null allele of Neurog3 (Neurog3+/EGFP) (12) were mated with CD1 females and pregnant females killed at either 12.5, 13.5, 14.5, 15.5, 16.5, or 17.5 days of gestation. Embryos were rapidly dissected and the pancreata removed and placed into PBS. Neurog3+/EGFP embryos (at the expected 50% Mendelian ratio) were easily identified by their green fluorescence and transferred into a separate buffer ready for dissociation into single cells. A total of 70 pregnant females, producing 785 embryos, were required to produce three biological replicates for each time point. Adult islets were prepared from eight CD1 female mice as described previously (28). Islet preparations from two animals were pooled for RNA isolation using the RNeasy kit (Qiagen), giving a total of four biological replicate samples for expression analysis.

Preparation of single-cell suspensions and flow cytometry.

EGFP+ pancreata were separated from EGFP pancreata and pooled in 500–1,000 μl prewarmed Trypsin at 37°C in a standard scintillation vial. The number of pancreata needed to form a biological replicate depended upon the developmental stage and the number of Neurog3+/EGFP embryos harvested. On average 25 embryos from six females were required for each E13.5 and E14.5 replicate, 19 embryos from five females were required for each E15.5 and E16.5 replicate, and 12 embryos from three females were required for each E17.5 replicate. The pool of pancreata was minced in the trypsin solution by repeatedly chopping the samples using very fine surgical scissors (Fine Science Tools) for 1–2 min. A small stir-bar was added to the vial, and the sample was incubated at 37°C on a stir plate at low speed for 7 to 15 min, until no clumps of cells were visible. The disassociation was terminated with the addition of an equal volume of RPMI-1640 containing 10% FBS, transferred to a 5-ml polystyrene round-bottom Falcon tube through a 35-μm nylon mesh cell strainer cap (BD Biosciences), and placed immediately on ice ready for sorting.

Neurog3+/EGFP-positive and -negative cells were sorted by the University of Pennsylvania Flow Cytometry and Cell Sorting Resource Laboratory using the FACSVantage SE (BD Biosciences). Cells were gated for viability and nonaggregates to achieve a high-purity sort. Both positive and negative cells were separated and collected directly into sterile 1.5-ml microcentrifuge tubes containing 500 μl of the denaturing buffer RLT from the RNeasy mini kit (Qiagen). No more than 70,000 Neurog3+/EGFP-positive cells were collected to prevent over-dilution of the denaturing buffer used for the RNA extraction. Once the sort was complete, samples were snap frozen in liquid nitrogen and stored at −80°C for further processing.

Probe preparation and microarray hybridization.

Total RNA was prepared from the sorted cells or adult islets using the RNeasy Mini Kit (Qiagen) and eluted in water. The quality of each RNA sample was assessed using either the RNA 6000 Pico or Nano LabChip Kit with a 2100 Bioanalyzer (Agilent Technologies). Only samples of high RNA quality with a 28S-to-18S RNA ratio >2 and with RNA integrity number >7 were used.

Approximately 20 ng total RNA from each sample was used for labeling and hybridization. Samples were labeled using the Ovation Aminoallyl RNA Amplification and Labeling System (NuGEN Technologies), which uses the rapid and sensitive Ribo-SPIA RNA amplification process. After amplification, the reference sample was created by pooling amplified cDNA in equal aliquots of each of the samples used in the hybridization. To analyze changes in gene expression occurring over time, a “reference” experimental design was used for the microarray analysis. Cyanine dyes (GE Healthcare Bio-Sciences) were chemically attached (coupled) to 2 μg amplified cDNA in 50 mmol/l sodium bicarbonate, pH 8.5, in the dark at room temperature for 1 h. All test samples were coupled to Cy5 (red), whereas the reference sample was coupled to Cy5 (green). Each test sample was combined with an equal amount of the reference sample and purified using the MinElute Reaction Cleanup Kit (Qiagen). The eluted sample of fluorescently labeled cDNA probe was mixed with hybridization buffer (2.5 μg Cot1 DNA, 2.5 μg oligo-dT, 25% formamide, 5X SSC, and 0.1% SDS), denatured at 95°C for 5 min and applied to the Mouse PancChip 6 spotted cDNA array (29,30), covered with glass coverslips, and incubated in a Corning Hybridization chamber (Corning Incorporated Life Sciences) overnight at 42°C. The Mouse PancChip 6 contains ∼13,000 mouse cDNAs chosen for their expression in various stages of pancreatic development, many of which are not found on commercially available arrays. Detailed information on this array and full protocols are available at After hybridization, the coverslips were removed in 2× SSC, 0.1% SDS, and the arrays were washed for 5 min in 0.2× SSC, 0.1% SDS at 42°C and 5 min in 0.2× SSC at room temperature. The arrays were immediately scanned with the Agilent DNA Microarray Scanner, Model G2565B (Agilent Technologies) at a resolution of 5 μmol/l.

Data processing.

The median Cy5 (red) and Cy3 (green) intensities of each element on the array were determined by processing the array images with GenePix Pro version 5.1 (Molecular Devices). All subsequent steps were performed using scripts we developed for this analysis in the R open-source language environment for statistical computing ( Positive control elements were removed from the dataset, and the expression ratio for each element on the array was calculated in terms of M [log2(red/green)] and A {[log2(red) + log2(Green)]/2}, without local background signal subtraction. The data were normalized by the print tip loess method using the BioConductor package “marray” (31). Quality control diagnostic plots were prepared for each array, and those failing to exhibit high-quality hybridizations were excluded from further analysis. This resulted in the final dataset containing three biological replicates for each embryonic time point and four biological replicates for the adult islet time point, giving a total of 19 samples in the time course.

Quantitative real-time RT-PCR.

Gene expression profiles were confirmed using quantitative real-time RT-PCR (qRT-PCR). cDNA was prepared from each replicate at the five developmental time points, from the adult islet samples, and from E14.5 and E16.5 EGFP cells that were collected along with the EGFP+ cells. cDNA was synthesized from ∼20 ng total RNA using the WT-Ovation RNA Amplification System (NuGEN Technologies). PCRmixes were assembled using the Brilliant SYBR Green qRT-PCR Master Mix (Stratagene) according to the manufacturer's instructions. Reactions were performed using the SYBR Green (with dissociation curve) program on the Mx3000 Multiplex Quantitative PCR System (Stratagene). Cycling parameters were 95°C for 10 min and then 40 cycles of 95°C (30 s), 60°C (1 min), and 72°C (30 s) followed by a melting curve analysis. All reactions were performed with three biological replicates and three technical replicates with reference dye normalization. Actb, Hprt, Gapdh, Tbp, and Ubc were tested for their suitability as a housekeeping gene using the geNorm analysis package (32). Expression of Actb was shown to be unstable across the samples being analyzed and as such was not included in the qRT-PCR normalization set. The median cycle threshold (CT) value was used for analysis, and all CT values were normalized to expression of four housekeeping genes: Tbp, Hprt, Ubc, and Gapdh. Primer sequences are available from the authors upon request.

Statistical analysis.

All statistical analysis of the array data were performed using the normalized M values for all noncontrol elements on the array. This M value represents the ratio of the test sample over the reference sample. Identification of temporally regulated genes was performed using the R-based application Extraction of Differential Gene Expression (EDGE version 1.1.291) (33), which uses the newly developed Optimal Discovery Procedure (ODP) statistical theory (34). For direct comparisons between any two conditions, EDGE was implemented with static settings to identify differential expression. Clustering and principal component analysis was performed using GeneSpring GX (Agilent Technologies).

Time-course analyses of microarray data are considerably more difficult than identification of differentially expressed genes in a two-state comparison, and there are limited tools available to achieve this. Of the available statistical tools, four were compared for their ability to identify temporally regulated genes: ANOVA (GeneSpring GX), SAM (35), PaGE (36), and EDGE (33). It was very clear to us that EDGE was the superior method for identification of temporally regulated genes in this time course (data not shown). EDGE was executed with 1,000 iterations, using a natural cubic spline, with a dimensional basis for the spline of 3. Analysis of the P value versus q value plot and the q value versus the number of significant tests plot clearly indicated that a P value cutoff of 0.05, corresponding to a q value of 0.23, provided an ideal point to return the maximum number of significant genes with fewest false-positives (Supplementary Data Fig. 1 [available in an online appendix at]). A total of 1,170 distinct genes were returned with a P value ≤ 0.05. This list was filtered to remove any elements whose fold change was <1.2 between any two time points, giving a final list of 1,029 genes.

The complete list of targets organized into these broad functional groups can be found in Supplementary Data Table 1. To aid the reader in exploring this results table, Web links to the appropriate Entrez Gene description page are provided by clicking on the identifier. In addition, expression profiles were generated for the significant genes and are available in Supplementary Data Expression Profiles. The natural cubic spline used by EDGE to fit the data are shown in blue.

Gene annotation enrichment analysis.

The resulting gene list was annotated with functional categories using the National Institute of Allergy and Infectious Diseases Database for Annotation, Visualization, and Integrated Discovery (DAVID) bioinformatics resource, using all of those elements on the array annotated with an Entrez Gene ID (37). Enrichment of these functional categories was determined using the Fisher's exact test through a comparison of the categories found in our list when compared with the background list of all genes found on the PancChip. The complete results of this analysis can be found in Supplementary Data Table 4.

Ingenuity pathway analysis.

Biologically relevant networks were drawn from the list of 1,029 temporally regulated genes identified by EDGE analysis. These data were generated through the use of Ingenuity Pathways Analysis, a web-delivered application ( that enables the visualization and analysis of biologically relevant networks to discover, visualize, and explore therapeutically relevant networks. The application and a detailed explanation of significance scoring were described previously (38). Of this list, 950 genes could be mapped to elements in the Ingenuity Pathways Analysis database. To focus the analysis on those genes with most significant changes in expressions, pathway analysis was performed using a subset of 334 genes (focus genes) in this list that showed a fold change >1.5 between any two time points.


Isolation of endocrine precursors.

To profile the gene expression changes in fetal endocrine pancreas precursors and their immediate descendants, we used Neurog3-EGFP knock-in mice (12). Mice homozygous for this allele develop diabetes and die shortly after birth, whereas heterozygous mice (Neurog3+/EGFP) show no apparent differences from wild-type littermates as assessed by overall development, growth characteristics, and glucose homeostasis. Use of a “knock-in” allele is advantageous compared with a Neurog3 promoter-driven transgene, which might be missing essential cis-elements or which might be influenced by integration site effects. Heterozygous male mice were mated with CD1 female mice, and embryos were harvested daily from E13.5 to E17.5, with a total of 377 Neurog3+/EGFP embryos required to complete the study. Figure 1 highlights the development and differentiation of the pancreas throughout the secondary transition, with EGFP levels peaking at E14.5. This figure also illustrates the fact that EGFP is persistent in the fetal pancreas due to the long half-life of the protein. Thus, we were able to isolate the direct descendants of Neurog3+ cells at later stages of pancreas development, i.e., cells in which Neurog3 transcription and new protein synthesis had been extinguished.

FIG. 1.

Development of the endocrine pancreas. Heterozygous male mice containing an EGFP-marked null allele of Neurog3 (Neurog3+/EGFP) were mated with CD1 female mice. Pregnant females were killed at either 13.5, 14.5, 15.5, 16.5, or 17.5 days of gestation. Photographs represent bright field images of wild-type (left) and Neurog3+/EGFP (right) pancreata. Panels on the right show pancreata from Neurog3+/EGFP embryos identified by their green fluorescence. The Neurog3+/EGFP cells are located near the center of the organ.

For optimal preparation of high-quality RNA required for expression analysis, 30–70,000 cells were sorted from pools of 12–25 pancreata to produce each biological replicate. At least three biological replicates were prepared for each time point. The procedure was attempted at E12.5 using 25 Neurog3+/EGFP embryos, but only 1,000 Neurog3+ cells were obtained, representing <0.03% of the total cell number. This low level of expression is consistent with expression studies using in situ hybridization (10,21).

The greatest percentage of cells expressing Neurog3-EGFP occurred on E14.5, with 5.2% of the sorted population positive for EGFP (Fig. 2A). Subsequently, there is a steady decrease in the percentage of positive cells, which likely reflects the differentiation and expansion of the population of non-Neurog3–expressing cells at a higher rate than those expressing the gene (Fig. 2B). The persistence of 3,000 to 4,000 EGFP+ cells up to E18.5 is the result of the long half-life of EGFP; we observed Neurog3 mRNA levels to be considerably reduced in the latter half of the time-course, and protein levels are not detectable at these late stages (10).

FIG. 2.

The proportion of endocrine precursors in the pancreas is highest at E14.5, and descendants of Neurog3-expressing cells can be marked by the persistence of EGFP. Neurog3+/EGFP pancreata were disassociated into a single-cell suspension after trypsinization (research design and methods). The number of biological replicates (individual sorts) used is shown on the x-axis under the developmental stage, and the number below this represents the total number of embryos that were used at that stage. Neurog3+/EGFP-positive and -negative cells were FACS sorted and used for RNA extraction. A: The percentage of EGFP-positive cells was determined in reference to the total number of cells sorted. EGFP expression began at E13.5 and peaked the following day. EGFP expression declined rapidly thereafter, but positive cells could still easily be sorted until E17.5. B: The total number of EGFP-positive cells per pancreas was determined. This figure clearly demonstrates the ability of EGFP to mark descendants of Neurog3+ cells, with the long half-life of EGFP resulting in an accumulation of positive cells. At E18.5, 4 days following the peak in Neurog3 expression, a decline in the number of marked cells was apparent.

Identification of temporally regulated genes.

Total RNA samples isolated from the sorted Neurog3-EGFP cells and passing stringent QC measures were used for expression profiling. Furthermore, unlike previous expression array–based studies in which only one or two arrays were used per time point, we used three or four biological replicates for each time point, giving this study increased statistical power. Time-course analysis of microarray data are considerably more difficult than identification of differentially expressed genes in a two-state comparison, but the EDGE program proved to be powerful in identifying temporally controlled gene expression (research design and methods). Using EDGE, 1,029 temporally regulated genes were identified (complete list in Supplementary Data Expression Profiles). For the purposes of highlighting this data, genes were grouped based upon peak expression at each of the six developmental stages investigated and ranked according to their EDGE P value. The top five most significantly temporally regulated genes with expression levels peaking at each developmental time point are shown in Fig. 3.

FIG. 3.

Expression profiles of the five most significantly temporally regulated genes at each of the six developmental time points studied. A total of 1,029 genes was found to be temporally regulated during development of endocrine precursors. Genes were observed to peak in expression at each of the six time points investigated. The top five genes, ranked by their EGDE P value, at each time point are depicted. Plots are headed with official gene symbol: EDGE rank (P value). The individual data points for each sample are represented by the circles and, left to right, represent E13.5, E14.5, E15.5, E16.5, E17.5 (black), and adult (gray) islets. The natural cubic spline used by EDGE to fit the data is shown by a dashed line. Expression in adult islets is shown for reference. The intent of this figure is to highlight some of the most significant expression profiles; the entire set of temporally expressed genes can be found in Supplementary Data Expression Profiles.

Expression profile data were confirmed via qRT-PCR (Fig. 4). For the 18 genes tested, the results of the RT-PCR analysis closely mirrored those of the array analysis. Little or no transcript was detected in EGFP cells for the majority of the genes tested. The presence of Hmga2, Id1, and Id2 mRNA in these nonendocrine precursor cells may indicate an additional role for these genes in development of the exocrine pancreas. As expected, Neurog3 was not expressed in either the EGFP cells or adult islets, indicating the high level of purity of the sorted cells. Ghrl and Pou3f4 also had no detectable mRNA in the mature islet, with expression profiles closely matching that which had been described previously (16,39).

FIG. 4.

Patterns of temporal regulation are consistently confirmed using qRT-PCR. cDNA was prepared from each replicate used for the analysis (shaded bars) and from E14.5 and E16.5 EGFP-negative cells (open bars) that were collected along with the EGFP+ cells. For the 18 genes tested, the results of this qRT-PCR analysis closely mirrored the results of the array analysis. For the majority of genes tested, little or no transcript was present in EGFP cells.

Clustering analysis.

Hierarchical clustering was performed using the 1,029 genes identified to be temporally regulated (Fig. 5A). Biological replicates for each time point were observed to cluster together, and a clear progression of similarity between time points is apparent (E13.5 most similar to E14.5, which together were similar to E15.5 and so forth). Clustering of the genes revealed that many genes were expressed in similar patterns during the time course. The cluster containing Neurog3 identified 21 genes whose expression profile was strikingly similar to this master regulator (Fig. 5B). Finally, hierarchical clustering was performed on the subset of 64 genes with transcription factor activity (Fig. 5C). Three patterns were evident among these genes: those with increasing expression over time (Sox15, Pax4, Myc), those with peak expression in the middle of the time course (Onecut1, Mafb, Isl1), and then those with decreasing expression over time (Gata4, Neurog3, Foxa2). Neurog3 was observed to cluster most closely with Nkx22, which is known to function downstream of Neurog3 in the specification of the β-cell, and Mycl1, a gene with links to lung and colorectal cancer, but there is no literature with regard to its role in the pancreas.

FIG. 5.

Hierarchical clustering analysis identifies three principal components within the time course and highlights several novel genes that were observed to cluster closely with Neurog3. Hierarchical clustering was performed using the subset of 1,029 genes identified to be temporally regulated using EDGE. Clustering was performed using the average linkage clustering algorithm and the Pearson correlation as a measure of similarity (A). Biological replicates for each time point were observed to cluster together. Clustering of the genes revealed that many genes were expressed in similar patterns during the time course. The cluster containing Neurog3 identified 21 genes whose expression profile was strikingly similar to that of Neurog3 (B). Hierarchical clustering using the subset of 64 genes with transcription factor activity identified several transcription factors whose expression profile closely resembled that of Neurog3 (C). Three principal components were identified that accounted for 95% of the significant variability in the data. As such, K means clustering was performed using the Pearson correlation to cluster the genes into three groups with a high degree of similarity within each cluster and a low degree of similarity between clusters (D). Over half of the genes identified were observed to cluster with Neurog3, with a pattern of decreasing expression over time.

Principal component analysis was performed to identify predominant gene expression patterns. Three principal components were identified that accounted for 95% of the significant variability in the data (Fig. 5D). Over half of the temporally regulated genes were observed to cluster with Neurog3, with a pattern of decreasing expression over time. Of the genes whose expression profile correlated most significantly with Neurog3, 50% were found to be expressed in the nucleus (P < 0.007), and gene ontology (GO) functions of development (P < 0.011) and transcription (P < 0.048) were significantly overrepresented. This is consistent with a model in which subdivision and differentiation of the endocrine lineage is dependent on a cascade of multiple transcription factors.

Differential gene expression in endocrine precursors and their descendants.

In addition to the temporal analysis performed with EDGE, a direct comparison analysis was performed to highlight differences between the endocrine precursors and their immediate descendants. Based on the clustering analysis described above (Fig. 5), the endocrine precursors sorted from E13.5 and E14.5 pancreata had very similar expression profiles. Likewise, the descendants of these precursor cells, which were marked by EGFP and sorted from later time points (E16.5 and E17.5), were also observed to cluster closely together and clearly had an expression profile that was distinct from the precursors. Therefore, to gain statistical power, we performed a direct comparison between the six E13.5 and E14.5 biological replicates, representing endocrine precursors, and the six E16.5 and E17.5 biological replicates, representing their descendants. It should be noted that although we observed Neurog3 mRNA levels to be substantially reduced in this population, levels were not completely extinguished (Fig. 4A). As such, the cells sorted at these latter time points may represent a mixed population of descendants and cells continuing to expressing Neurog3. A total of 648 genes were found to be differentially expressed with an absolute fold change ≥1.3 and a P value <0.05 (Supplementary Data Table 2). Of these genes, 182 (28%) have higher expression in endocrine precursors (E13.5/E14.5) and 466 (72%) higher expression in the descendants (E16.5/E17.5). Over 100 transcriptional regulators were identified in this analysis, 33 of which were expressed at highest levels in endocrine precursors.

Using this approach, an additional 338 differentially expressed genes were discovered that had not been identified in the time course analysis described earlier. These include Insm1, Nars2, Foxp1, Hhex, and Cd14. Among the genes that were most highly expressed in the E16.5/E17.5 Neurog3 descendants were markers of the β-cell (Ins1, Ins2, Glut2) and ε-cell (Ghrl), indicative of differentiation toward the mature endocrine cell types. Overrepresented functions and canonical pathways were determined via pathway analysis (Supplementary Data Fig. 2). Pathways involved in cellular movement, integrin signaling in relation to cytoskeletal rearrangements, protein ubiquitination, and insulin receptor signaling were all observed to be significantly enriched in the progenitor cells when compared with the descendants. Strikingly, carbohydrate metabolism was absent from the progenitors but was observed to be highly enriched in the descendants, along with lipid metabolism and the Pparα/Rxrα activation pathway. These observations reflect the differentiation of endocrine precursors into more mature endocrine cells.

Finally, we compared the expression profile of endocrine precursors (E13.5/E14.5) directly to that of adult islets. Over 2,000 genes were found to be significantly differentially expressed. Of these genes, 103 were found to be highly expressed in the precursor cells, with a fold change more than two when compared with the adult islets (Supplementary Data Table 3). A total of 25 of these genes were classified as transcriptional regulators, many of which have an unknown role in the developing endocrine pancreas, e.g., Nars2, Mycl1, Sox11, Foxb1, and Zfp306.

Gene annotation enrichment analysis.

Temporally regulated genes were mapped to protein families using the Ingenuity Pathways Knowledge Base and to enriched GO biological process and molecular function categories (Supplementary Data Fig. 3). Protein metabolism and biosynthesis, cellular organization and differentiation, and development were highly enriched biological processes in endocrine precursors and their descendants. Gene annotation enrichment analysis using the list of temporally regulated genes revealed that the functions of nucleic acid binding and transcriptional regulation were significantly enriched. Detailed analysis and annotation with GO functions determined that this list contained 69 transcription factors, 71 other transcriptional regulators (such as cofactors and elements of the basal transcriptional machinery), and 97 other genes potentially involved in transcriptional regulation. Together, these genes represented 27% of the entire list of temporally regulated genes. Table 1 lists these 246 transcription factors, transcriptional regulators, and potential transcriptional regulators found to be temporally regulated during development of the endocrine precursors.


Temporally regulated transcriptional genes in the development of the endocrine pancreas

Several of these transcription factors have been shown by genetic means to function in the development of the endocrine pancreas, including Neurod1, Mafa, Mafb, Nkx2–2, Foxa1, and Foxa2 (Fig. 6). Moreover, markers of mature islets cells, such as Ins1, Ppy, Ghrl, and Iapp were observed to have increasing expression levels over time. Interestingly, Gcg, a marker of the mature α-cell, was not significantly differentially expressed in these cells, but as expected high levels were observed in adult islets (Fig. 4E). It was recently demonstrated that the role of Neurog3 in the development of the α-cell lineage occurs much earlier than the time frame studied in the present work, with the induction of Neurog3 in Pdx1+ progenitors at E8.7 resulting in an almost exclusive induction of glucagon-positive cells by E14.5 (40).

FIG. 6.

Temporal expression profiles of transcription factors during development of the endocrine pancreas. The individual data points for each sample are represented by circles and, left to right, represent E13.5, E14.5, E15.5, E16.5, E17.5 (black), and adult (gray) islets. Plots are headed with official gene symbol: EDGE rank (P value). The natural cubic spline used by EDGE to fit the data and determine significance is shown by a dashed line. Expression in adult islets is shown for reference. A: Transcription factors known and speculated to play an important role during development of the endocrine pancreas. B: Expression profiles of immature and mature hormones of the endocrine islet, demonstrating temporal regulation of Ins1, Ghrl, and Ppy, but not Gcg. C: Of the six diabetes-associated loci recently identified in a series of whole-genome association studies (5961), four of the genes were observed to be differentially expressed in our time course: Slc30a8, Tcf7l2, Cdk5rap1, and Igf2bp2.

Identification of regulatory networks.

Gene annotation enrichment analysis provided information regarding categorical changes with regard to the biological function and metabolic activity of the temporally regulated genes in this time course. However, our specific interest in this study was to understand how individual genes were integrated into specific regulatory and signaling networks. This type of analysis has not been reported in microarray studies of the developing endocrine pancreas and revealed several new findings.

Several major networks were identified, but by far the most significant was a complex regulatory network controlling endocrine system development and function centered around Neurog3 (Fig. 7). Many of the transcription factors previously reported to play a role in endocrine cell development, such as Pdx1, Foxa2, Neurod1, Isl1, Nkx22, Mafa, and Pax4, are part of this network derived from our expression data. However, several genes that had limited prior evidence regarding their role in this developmental program were observed, such as Sim1, Insm1, Id2, and Nr3c1 (glucocorticoid receptor). Furthermore, genes with no previously reported role in this development process were identified, such as Id1, H19, Yy1, and Mycl1.

FIG. 7.

Pathway analysis identifies a complex regulatory network controlling endocrine system development and function in the pancreas. The network is displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes). Nodes were colored to highlight those genes whose differential expression was significant in either the temporal regulation analysis or static analysis. Color intensity indicates significance, from the most significant (dark red) to least significant (light pink) genes. Nodes are displayed using various shapes that represent the functional class of the gene product. Edges are displayed with various labels that describe the nature of the relationship between the nodes. The above network was produced by expanding the most significant network (34 focus genes and a score of 60) with additional information in PubMed. To simplify the image, many of those interactions not demonstrated in the pancreatic tissues were removed.

Pathway analysis was also used to examine the relationship between our gene set and existing canonical signaling pathways. Four pathways in particular showed dramatically significant enrichment (Supplementary Data Figs. 4–7). A total of 22 genes were associated with Igf1 signaling (P value = 1.24E-09), 7 with the endoplasmic reticulum stress pathway (P value = 2.65 E-05), 21 with Wnt/β-catenin signaling (P value = 1.54 E-04), and 17 genes with insulin receptor signaling (P value = 4.51 E-04).


Expression profiling of endocrine precursors and their descendants.

We identified over 1,000 genes that are temporally regulated during development of the endocrine pancreas. Strikingly, a large number of the genes identified in our analysis have no prior reports of a role in the developing pancreas and many represent novel genes with no prior art. The most significantly temporally regulated gene was Glypican 3 (Gpc3), with a marked decrease in expression over time. Glypicans are cell-surface heparan sulfate proteoglycans that are bound to the exoplasmic surface and are thought to play a role in the control of cell division and growth regulation, possibly via modulation of Wnt signaling (41). Gpc3 is known to be highly expressed during embryogenesis and was recently demonstrated to be a marker of hepatic progenitor/oval cells with an expression profile remarkably similar to that observed in our analysis of pancreas progenitors (42).

A total of 246 transcription factors, transcriptional regulators, and potential transcriptional regulators were found to be temporally controlled during development of the endocrine precursors, several of which are well established and important regulators of the endocrine pancreas, for instance Neurod1, Mafa, Mafb, Nkx22, Foxa1, and Foxa2. As expected, markers of mature islets cells, such as Ins1, Ppy, Ghrl, and Iapp, were observed to have increasing expression levels over time, highlighting the comprehensiveness of our analysis.

Previous attempts to describe transcriptional regulation of the developing endocrine pancreas have failed to produce the in-depth transcriptional profile of endocrine precursors and their descendants presented in the current study. Gu et al. (43) produced transcriptional profiles from four stages of endocrine pancreas development: E7.5 unspecified endoderm, E10.5 Pdx1-positive cells, E13.5 Neurog3-positive cells, and adult islets. Expression profiling of Neurog3-precursor cells at E13.5 identified 71 genes that were temporally enriched in this population, of which only 7 were transcription factors: Pou3f2 (Brn2), Isl1, Mycl1, Mafb, Myt1, Neurod1, and Pax4. With the exception of Myt1, all of these transcription factors were among the 246 transcription factors, transcriptional regulators, and potential transcriptional regulators identified by our analysis. We observed expression of Myt1 to remain constant throughout the fetal time course, with a twofold reduction of expression in the adult islet (Supplementary Data Table 3). Several factors may explain this marked difference between this present study and the previous report. Gu et al. (43) marked E13.5 endocrine precursors using a Neurog3-EGFP transgene that may be missing essential elements of the Neurog3 promoter and is less likely to represent the true expression pattern of Neurog3 that we obtained through the use of a knock-in approach. Furthermore, only one or two biological replicates were analyzed at each time point, resulting in significant statistical limitations. Finally, their comparisons were performed using different GeneChips for each time point, the most comprehensive of which lacked ∼4,000 of the genes found on the PancChip array (30).

An alternative approach to the use of transgenic mice to mark Neurog3+ cells has been to transduce immortalized cell lines with recombinant viral vectors expressing Neurog3 (44,45). Microarray analysis demonstrated increased expression of 51 genes in an in vitro study where pancreatic duct cell lines were infected with a Neurog3-expressing adenovirus (44). Less than 20% of the genes claimed as Neurog3 targets from adenoviral overexpression were also identified in the present study and were restricted to those genes whose interaction with Neurog3 has been reported elsewhere (Supplemental Data Fig. 8). This striking difference between the two studies most likely reflects a significant disadvantage in studies where Neurog3 gene expression is artificially induced, in a cell line far removed from the fetal pancreas. Furthermore, there is limited evidence to show that pancreatic duct cells can function in vivo as endocrine progenitors or be induced to differentiate into mature islets cells (46).

More recently, microarray analysis was performed using a Neurog3-deficient mouse model in an attempt to identify Neurog3-dependent genes expressed in whole pancreas (47). The use of whole pancreas tissue, as opposed to sorted Neurog3+ cells, severely limited the power of this approach, as at most 5% of the cells in the total pancreas express Neurog3 (Fig. 2). Therefore, the approach of using total pancreas to investigate changes resulting from the deletion of Neurog3 will result in at least a 20-fold reduction in the power to detect alterations in gene expression. Consequently, this study only identified 52 differentially expressed genes, many of which were genes expressed at high levels in mature endocrine cells, like glucagon and insulin, which were already known to be lacking in Neurog3−/− pancreas (10).

Regulatory networks in the developing endocrine pancreas.

Our pathway analysis identified several networks of genes regulating endocrine precursor specification, development, and expansion. The centrality of Neurog3 in this process is highlighted in Fig. 7. The signaling factors involved in specifying the developmental decision between endocrine and exocrine tissue during organogenesis of the pancreas are of considerable interest. Whereas insulin- and glucagon-producing cells are not related to a hormone coexpressing precursor cell, a common origin of endocrine cells does exist at the non–hormone-expressing precursor level, and both α- and β-cells develop from Pdx1- and Neurog3-positive cells (19,48). Exocrine pancreatic cells arise from Ptf1a-expressing precursors, although it was recently demonstrated that mature pancreatic cells develop through a very early common progenitor expressing Pdx1, Ptf1a, cMyc, and Cpa1 (49). In those cells not fated to become part of the islets, the transcriptional repressor Hes1, a main effector of Notch signaling, strongly inhibits Neurog3 gene promoter activity through a mechanism known as lateral inhibition, which occurs via Notch receptor signaling (20,50). Indeed, in mice containing a homozygous mutation of the Hes1 gene, misexpression of Ptf1a throughout the gut epithelium results in ectopic pancreas formation (62).

Temporal regulation of Id1 and Id2 was highly significant and followed a pattern of downregulation similar to that of Neurog3 as development progressed. The Id1 and Id2 proteins contain a helix-loop-helix (HLH) domain but not a basic domain and lack DNA binding activity and therefore can inhibit the DNA binding and transcriptional activation ability of bHLH transcription factors, such as Neurog3 and Neurod1. The spatial and temporal control of the proneural bHLH factors (Neurog1, 2, and 3) and inhibitory HLH factors (Id1 and Hes1) was recently shown to coordinate the timing of differentiation of neurons and glia (51). However, the role of inhibitory HLH factors during development of the pancreas has received little attention. Unlike expression of Id2, expression of Id1 was observed to be highest in the adult islet, and there is evidence to suggest that this gene may play a role in promoting β-cell function (52). Recent in vitro studies demonstrated that Bmp4 promotes the heterodimerization of Id2 to Neurod1, with a resultant decrease in Pax6 expression (53). We observed significant expression of Bmp4 during development of endocrine precursors with a peak in expression at E16.5. Our data support a model in which high expression of Bmp4 at E16.5 triggers the switch that inhibits continued transcriptional activation by bHLH factors (Neurog3, Neurod1) via regulation of inhibitory HLH factors (Id1, Id2). The result of this switch will be to block further differentiation of endocrine precursors and instead promote their expansion into mature islets cells.

In addition to the several factors with well-documented roles in the development of endocrine precursors, our analysis confirmed several of the more recently discovered and less well-documented potential targets of Neurog3. Insulinoma-associated 1 (Insm1 or Ia1) was found to be significantly downregulated in the descendants of endocrine precursors (E16.5/E17.5). This transcriptional repressor was recently reported to have an essential role in β- and α-cell differentiation, acting downstream of Neurog3 and parallel with Neurod1 (45). Moreover, as is shown in Fig. 7, Neurog3 is believed to form a heterodimer with Tcf3 (E47) on the E-box3 of the Insm1 promoter and to recruit the Creb-binding protein (Crebbp) (54). Expression of Sim1, which is involved in the differentiation of neuroendocrine cells of the hypothalamic-pituitary axis (55), was recently shown to be under the control of Neurog3 in a murine embryonic stem cell line in which Neurog3 was overexpressed (56). Our data provide evidence to suggest that this in vitro observation may be biologically relevant during development of the endocrine pancreas in vivo. Although profiling of Sim1 expression by qRT-PCR (Fig. 4M) supports the observations that this gene is expressed in the developing endocrine pancreas, and at much greater levels than in Neurog3 cells, we observed static expression profile across the time course. This may suggest, as does Fig. 7, that Sim1 is not a direct target of Neurog3.

Use of pathway analysis to examine the relationship between our gene set and existing canonical signaling pathways strongly suggested that Igf1 and insulin receptor signaling pathways play a significant role in this process. Strikingly, Grb10 plays a critical role in both of these pathways, and along with Igfbp2 was observed to be one of the genes clustering most closely with Neurog3 (Fig. 5B). This gene encodes a growth factor receptor–binding protein that interacts with insulin receptors and insulin-like growth-factor receptors, and is imprinted in a highly isoform- and tissue-specific manner. Grb10 was shown to form a complex with Nedd4, which was also highly significantly downregulated over the time course (57,58). The role of this protein-protein complex in the Igf1 signaling pathway is in regulating ubiquitination and stability of the Igf1 receptor (Igf1r). In response to Igf1, Grb10 associates with Igf1r and brings Nedd4 into the vicinity of Igf1r, leading to its ubiquitination, which in turn would result in the internalization and degradation of the receptor. Therefore, the high levels of Grb10 and Nedd4 expression we observed at E14.5 could represent a cellular mechanisms used to prevent continuous and enhanced activation in response to Igf1, expression of which was seen to peak at E15.5 (58). Again, this represents a novel observation for the endocrine pancreas, suggesting a mechanism to temporally limit the rate of proliferation of endocrine precursors.


Through the use of the Neurog3-EGFP knock-in model, combined with careful cell sorting, daily sampling throughout the secondary transition and beyond, and the use of powerful statistical and analytical tools, we have been able to accurately capture the gene expression profile of the pancreatic endocrine progenitors and their descendants. Furthermore, through the use of the mouse PancChip array, we have been able to capture information on numerous genes known to be expressed in the pancreas but not found on commercially available arrays, many of which are novel and will lead to further investigation and discovery. The list of temporally regulated genes identified in fetal endocrine precursors and their immediate descendants provides a novel and important resource for developmental biologists and diabetes researchers alike. Furthermore, from our attempt to model the regulatory networks that control development of the endocrine pancreas, it is apparent that the complex interactions between these genes, and their requirements for carefully controlled spatial and temporal expression, goes far beyond what has traditionally been presented in simplified models of transcriptional hierarchy.

The identification of so many transcription factors, which are typically expressed at relatively low levels and as such often not detected in array analysis, lends significant credence to the quality of this dataset and the statistical tools used to analyze it. A series of whole-genome association studies, in which over 380,000 single nucleotide polymorphisms were analyzed in over 1,400 patients with type 2 diabetes, recently identified six novel loci strongly associated with the disease (5961). The authors hypothesize that these six genes have a primary role in the β-cell. Our data provides strong evidence in support of this notion for four of the genes associated with these loci, as we found them to be differentially expressed in our time course: Slc30a8, Tcf7l2, Cdk5rap1, and Igf2bp2 (Fig. 6). Moreover, Igf2bp2 was one of the genes whose expression profile clustered most closely to that of Neurog3 (Fig. 5B). Very little is known about the function of these genes, but our observations of their expression profile in murine endocrine precursors, combined with the genetic findings in humans, provides strong evidence to their playing a key role not only in development of the endocrine pancreas but also in long-term islet function and the development of diabetes.


This study was supported by the U.S. National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health Functional Genomics of the Beta Cell Grant UO1-DK56947 and the University of Pennsylvania Institute of Diabetes, Obesity and Metabolism Diabetes and Endocrinology Research Center Grant P30DK19525.

Microarray data for this study have been deposited at Array Express with the accession number E-CBIL-40. The data and MIAME (Minimum Information About a Microarray Experiment) compliant annotation can also be queried through the user-friendly interface RAD (RNA Abundance Database) ( The final annotated gene lists and numerous additional analysis tools are available through the Betacell Biology Consortium Web site ( via the Endocrine Pancreas Consortium Database (EPConDB:

Thanks to Dr. Joshua R. Friedman for critical reading of the manuscript; Dr. Marco Z. Vatamaniuk for preparation of adult islets; Ryan D. Wychowanec and Dr. Jonni S. Moore, University of Pennsylvania Flow Cytometry & Cell Sorting Facility, and Drs. Elisabetta Manduchi, Joan Mazarelli, Jonathan Schugg, and Chris Stoeckert for their advice on computational analysis.


  • Published ahead of print at on 10 December 2007. DOI: 10.2337/db07-1362.

  • Additional information for this article can be found in an online appendix at

  • The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

    • Received September 26, 2007.
    • Accepted November 30, 2007.


| Table of Contents

This Article

  1. Diabetes vol. 57 no. 3 654-668
  1. Online-Only Appendix
  2. All Versions of this Article:
    1. db07-1362v1
    2. 57/3/654 most recent