DOI: 10.2337/db06-0788 © 2006 by the American Diabetes Association
Comparative Analysis of Insulin Gene Promoters: Implications for Diabetes Research
From the School of Medical Sciences, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen, U.K.
Address correspondence and reprint requests to Dr. K. Docherty, School of Medical Sciences, University of Aberdeen, Institute of Medical Sciences, Aberdeen, AB25 2ZD, U.K. E-mail: k.docherty{at}abdn.ac.uk
Abbreviations:
ChIP, chromatin immunoprecipitation; COUP-TFII, chicken ovalbumin upstream promoter–transcription factor II; CRE, cyclic AMP response element; ECR, evolutionary conserved region; HNF, hepatocyte nuclear factor; ILPR, insulin-linked polymorphic region; PDX-1, pancreatic duodenum homeobox-1
DNA sequences that regulate expression of the insulin gene are located within a region spanning 400 bp that flanks the transcription start site. This region, the insulin promoter, contains a number of cis-acting elements that bind transcription factors, some of which are expressed only in the ß-cell and a few other endocrine or neural cell types, while others have a widespread tissue distribution. The sequencing of the genomes of a number of species has allowed us to examine the manner in which the insulin promoter has evolved over a 450 million–year period. The major findings are that the A-box sites that bind PDX-1 are among the most highly conserved regulatory sequences, and that the conservation of the C1, E1, and CRE sequences emphasizes the importance of MafA, E47/ß2, and cAMP-associated regulation. The review also reveals that of all the insulin gene promoters studied, the rodent insulin promoters are considerably dissimilar to the human, leading to the conclusion that extreme care should be taken when extrapolating rodent-based data on the insulin gene to humans.
The cloning and sequencing of the insulin gene in 1980 (1) was a landmark breakthrough that opened up a new field of research on the mechanisms controlling expression of the gene. This in turn led to the discovery of transcription factors that, in addition to regulating the insulin gene in a tissue-specific and temporal manner, participated in the development of the endocrine pancreas and in the maintenance of islet cell function (2). Some of these transcription factors have been identified as maturity-onset diabetes of the young (MODY) genes (3), and at least one has been associated with type 2 diabetes. They have also been exploited in the development of novel therapies for diabetes based on the differentiation of embryonic or adult stem cells toward a ß-cell–like phenotype (4) and the forced expression of endogenous insulin genes in nonislet cells (5–7).
The early work on characterizing the DNA sequences involved in regulating insulin gene expression focused on the rat insulin 1 gene (8). The reason for this was that at the time there were no available human ß-cell lines and it was felt important to correlate data from transfected promoter constructs with effects on the endogenous insulin gene. As it turned out, most of the studies involved transfecting the rat insulin gene constructs into the Syrian hamster HITm2.2.5 cell line, which transfected much more efficiently using available techniques than the rat RINm5F cell line. There was also a perception that human insulin promoter constructs would not function in transfected rodent cells. However, these worries proved to be unfounded after it was later shown that there is a very high degree of sequence and functional conservation within the transcription factors that regulate the gene (e.g., 89% identity between rat and human PDX-1) and that the human insulin promoter exhibited the expected pattern of activity in transgenic mice (9,10). As a result of the decision to concentrate on a detailed analysis of the rat insulin promoter, most of the literature on the insulin promoter pertains to this promoter. The structure and evolution of the insulin gene have been previously reviewed (11). In this article, we focus on the sequences that lie upstream of, or flank, the transcription start site and are known to affect transcription of the gene. One major conclusion is that the rodent promoters are markedly different from the human promoter, and we urge caution in extrapolating data from rodent promoter studies to the etiology and therapy of diabetes.
Humans, in keeping with the overwhelming majority of species, have a single copy of the insulin gene, which is located on chromosome 11 (p15.5) (12). Of the small number of species with two nonallelic insulin genes, the best known are Xenopus laevis (13) and the popular laboratory rodents rat (14) and mouse (15), with insulin 2 corresponding to the single copy in most animals. In the adult, insulin is expressed almost exclusively in the ß-cells of the pancreatic islets of Langerhans (16), hence its name from Latin insula or "island." Low levels of extrapancreatic insulin have been detected in a number of other tissues (17,18) including brain (19), thymus (20–22), lachrymal glands (23), and salivary glands (24). The role of insulin expression in non-ß-cells is unclear. In some tissues it may play a role in the complex hormonal communication required for the maintenance of overall energy balance (25,26) or in the establishment of immune tolerance (27). Very little is known about the regulatory sequences that control insulin gene expression in nonpancreatic tissue, although the sequence containing variable numbers of tandem repeats (see later) has been implicated in thymus expression of insulin. In the ß-cell, sophisticated mechanisms have evolved to control insulin expression at the correct time and place during embryonic development. In the adult, related mechanisms and a variety of signaling pathways are involved in restricting insulin expression to ß-cells (notwithstanding the low-level extrapancreatic expression about which little is known) and in coordinating insulin expression in response to diverse afferent signals (16).
Positive and negative crosstalk between the various signaling pathways, formation of homo- or heterodimers permitting individual transcription factors to act as activators, nonactivators, or repressors, reversible phosphorylation of transcription factors, multiple isoforms of several transcription factors, and synergistic interactions between certain combinations of transcription factors extend the gamut of signals influencing the regulation of insulin gene expression. Insulin transcriptional control is conferred by cis-acting regulatory sequences believed to be located within 300–400 bp from the transcription start site (28), which bind ß-cell restricted and ubiquitous transcription factors (16). The principal regulatory elements within the human insulin promoter are outlined in Fig. 1. The compact nature of the insulin promoter results in the close proximity of regulatory elements that can bind an extensive range of factors, thereby permitting a multiplicity of outcomes through additive and synergistic interactions between the bound proteins (29–31). In addition, regulatory elements can overlap in certain species (e.g., the A3 and a cAMP response element [CRE] site in humans), introducing another layer of complexity through binding competition between alternative transcription factors.
There is no general approach to interpreting and predicting transcriptional evolution; however, the insulin promoter is one of the most extensively studied, and knowledge of the signals that bear upon insulin transcriptional regulation facilitates our understanding of possible functional consequences of insulin promoter evolutionary differences. By classic convention, the sequences that regulate basal promoters were divided into two classes. These are upstream regulatory elements (UREs) that are often located within 100–200 bp upstream of the site of initiation and display directional qualities, and enhancers that can function over distances of many kilobase pairs, regardless of orientation or whether they lie upstream or downstream of the start site. However, as more promoters and enhancers have been identified and studied, it has become apparent that there is a continuum between these two classes of regulatory elements with promoter and enhancer motifs sharing many physical and functional traits. Therefore, in keeping with current opinion, we have reviewed the cis-regulatory elements within the compact insulin promoter without further categorization. This review has drawn upon publicly available DNA sequences to compare the human insulin promoter sequence (–1,500 to +100) to the insulin promoters in an evolutionarily and taxonomically divergent range of species. Definitive identification of insulin genes and their promoters lags well behind the isolation of the corresponding cDNA sequences; hence, care has been taken to ensure that only unambiguous insulin promoters have been included.
These belong to human (Homo sapiens), great apes (chimpanzee [Pan troglodytes], orangutan [Pongo pygmaeus], and gorilla [Gorilla gorilla]), Old World monkeys (African green monkey [Cercopithecus aethiops] and rhesus macaque [Macaca mulatta]), New World monkey (owl monkey [Aotus trivirgatus]), rodents (rat [Rattus norvegicus] and mouse [Mus musculus]), mammals with diverse diets (carnivorous dog [Canis familiaris], herbivorous cow [Bos taurus], and omnivorous pig [Sus scrofa]), bird (chicken [Gallus gallus]), and fish (zebrafish [Danio rerio]). The promoter sequences of gorilla, orangutan, African green monkey, and owl monkey are currently incomplete, extending upstream to positions –295, –290, –426, and –510, respectively. The phylogenetic relationships based on molecular analyses (32,33) between these species are outlined in Fig. 2.
A preliminary evaluation of the relatedness of homologues can be generated from the number and relative position of introns, and these are shown in Fig. 3 (34). There are minor variations in the sizes of the introns among mammals, while large dissimilarities are seen in the introns of chicken and zebrafish. The insulin 1 genes of rat and mouse have lost the second intron and also contain the remnant of a polydeoxyadenylic acid tract preceding the downstream direct repeat. Together, these structural features have led to the suggestion that the insulin 1 gene is a functional transposon (14) that was generated by an RNA-mediated duplication-transposition event involving a transcript of the insulin 2 gene that was initiated upstream from the normal capping site. This duplication-transposition event clearly preceded separation of rat and mouse 15 million years ago. Along this evolutionary road, additional divergence has taken place, resulting in rat having the two insulin genes residing about 55 Mbp apart on chromosome 1, whereas in the mouse they lie on different chromosomes, namely 6 and 7.
Synteny (i.e., the preserved order of genes between organisms) provides an expedient higher-level assessment of the association between homologues. The identification and annotation of genes in most genomes remains fragmentary; however, it is clear from currently available data that all of the studied insulin genes display remarkable synteny extending all the way back to zebrafish, which diverged from humans 450 million years ago. Not only are the immediate upstream and downstream flanking genes of tyrosine hydroxylase (TH) and insulin-like growth factor 2 (IGF-2) retained, but inspection of 500 Mbp confirms extensive maintenance of synteny of many important genes including syt8, lsp1, tnnt3, mrpL23, cd81, and tssc4. While gene order and direction of transcription are preserved, the spacing between specific genes can vary. This is most dramatically illustrated with the insulin and TH genes, which are separated by 2–22 kbp in all species except mouse and rat, where the insulin 2 gene lies 210 and 230 kbp distant from the TH gene, respectively. Despite evidence of different rates of insertion and deletion mutation within the insulin gene region, maintenance of synteny across vast evolutionary timescales points to a common and vital function for the insulin gene product, which is wholly consistent with the high degree of insulin protein conservation.
It has been estimated from large-scale studies that the number of conserved intergenic sequences is similar to that of coding sequences (35–37), and evolutionary changes in promoters together with their attendant alteration in transcriptional response to physiological and environmental demands have been documented (38,39). This is facilitated by the fact that promoters of protein-encoding genes are laid out into functional modules (40), allowing independent evolutionary selection of distinct characteristics of the overall transcription profile. Promoters are also considered to be more prone to genetic change than coding sequences (41,42) as the constraints typical of coding sequences are absent. In light of different regions of vertebrate genomes diverging at dissimilar rates (43) and this heterotachy being witnessed across different classes of mutation and lineages (44), this study utilized a variety of comparative alignment and transcription factor binding site search techniques with parameters that were appropriate for the evolutionary distances between species in order to detect meaningful evolutionary conserved regions (ECRs). The computational tools included CLUSTAL W (45), T-Coffee (46), GraphAlign (47), ECR Browser (48), Mulan (49), zPicture (50), TRES (51), and TRANSFAC (52). Calculations of homology between the different insulin promoters and the human version were carried out across the region spanning –600 to +1. The downstream 100 bp, which contains two cyclic AMP response element (CRE) sites in human (see the section on CREs below), is comprised mostly of the extremely poorly conserved first intron that unduly influences the overall results. Percentage identity plots (PIPs) comparing the human insulin promoter to those of the other species reveal that, not surprisingly, the most closely related chimpanzee and other great apes share the greatest homology to human, making discernment of conserved regions impossible. 
Mammals that are more distantly diverged from human display several regions of conservation within the first 350 bp upstream, which correspond to the major regulatory elements. There is a clear falloff in homology beyond –350 or –400 bp upstream from the start of transcription, which is especially apparent in rhesus macaque. While PIPs are useful for identifying ECRs, a detailed breakdown of identity values for specific regions can expose the overall relatedness of different insulin promoters (Table 1). Interestingly, the degree of homology does not follow a simple direct correlation with time from divergence. For example, African green monkey and owl monkey diverged from humans 25 and 35 million years ago, but the main regulatory regions of their promoters (–300 to +1) display 90 and 98% identity, respectively. Similarly, most nonprimate mammals have 65–69% identity in this region and 49–55% in the adjacent upstream 50 bp. Dog stands out in having much higher homology with 69 and 75% identity for these two regions, respectively.
Together, these results are in agreement with the opinion that vertebrate genes and immediate upstream flanks are highly constrained and, more important, confirm the accepted demarcation of the insulin promoter. There is no discernible significant homology between human and either chicken or zebrafish insulin promoters, which is in keeping with the view that most human DNA is not alignable to species separated by more than 200 million years. Likewise, there is no homology between chicken and zebrafish insulin promoters. Computational analysis of the insulin promoters for novel evolutionary conserved sequences uncovered a single short region immediately upstream of the A3 box (see A BOXES); however, this region does not appear to contain any currently known transcription factor consensus sequences.
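The percentage identity values discussed above can be illustrated with a minimal sketch. It is not the procedure used by CLUSTAL W, ECR Browser, or the other tools cited; it assumes two sequences that have already been aligned (gaps written as "-"), and the function names and the choice to skip gapped columns are assumptions made for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; other tools may count them as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def window_identity(aligned_a: str, aligned_b: str, window: int = 50) -> list:
    """Percent identity for consecutive, non-overlapping alignment windows."""
    return [
        percent_identity(aligned_a[i:i + window], aligned_b[i:i + window])
        for i in range(0, len(aligned_a), window)
    ]
```

Applied window by window, such a calculation gives a coarse picture of where identity falls off along a promoter, which is the kind of regional breakdown summarized in Table 1.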
Within a promoter, the fundamental component is the 100-bp basal promoter that provides an assembly platform for the RNA polymerase II initiation complex. These modules vary among genes and can contain a TATA box 25–30 bp upstream of the transcription start site, an initiator element lacking the TATA sequence or a null basal promoter containing neither. All of the studied insulin promoters contain a TATA box. However, the chicken promoter is distinct from the others in that at least two isoforms can be transcribed from alternative initiation sites (53). In E1.5 chicken embryo pancreas, the single insulin gene is also transcribed from an upstream secondary promoter to yield an mRNA with an additional 32-bp leader sequence. Inspection of available chicken genome sequence reveals that this alternative start site must be the product of a secondary basal transcription complex, as the transcript includes the genomic sequence from immediately upstream of the TATA box (25 bp upstream from the start of transcription) to the beginning of exon 1. The lack of another TATA box within the promoter and the presence of a C at –1 and an A at +1 of the longer transcript suggest that transcription is most likely established by an initiator element.
Regulatory elements within promoters can originate at different times, and species comparisons indicate that promoters evolve through transcription factor binding site turnover and accretion (54,55). The relative numbers of the principal insulin promoter regulatory elements in the surveyed species are listed in Table 2.
A boxes. A-box sequences containing the TAAT motif bind homeodomain proteins (56), the most important of which is pancreatic duodenum homeobox-1 (PDX-1) (57–61), which has been shown to be a potent stimulator of transcription of rat, mouse, and human insulin genes (62). There are three principal A boxes in the human promoter: A1 (–82), A3 (–216), and A5 (–319) (Fig. 1). PDX-1 stimulates expression at A3 (58,63–65) and mutation of A3 has the most significant effect on transcription (61,65,66). Contrary to the opinion that A3 is not the most conserved (16), this survey has shown that A3 is the only A box present in all the mammals and, therefore, must be considered to be the most conserved and central to PDX-1 stimulation. PDX-1 bound to A1 has been shown to interact synergistically with E47/ß2 in rat insulin 1 (30). As the 4-bp TAAT motif can occur every 256 bp, the ability of PDX-1 to differentiate between potential regulatory elements must be influenced by adjacent sequences. The 3-bp flanking sequences have been shown to make an important contribution to the binding affinity of PDX-1 to TAAT core elements with a concomitant effect on activation. However, variations in these sequences are insufficient to completely explain differences in PDX-1 binding affinities (67). Therefore, the 8-bp flanking regions of all A boxes were assayed for homology (Table 3). The A3 box and 5' flanking region lie within a novel ECR, and this is reflected in the high degree of conservation. The lack of any other regulatory elements within this ECR based on computational analysis raises the intriguing possibility that, while the TAAT motif is symmetrical, binding of PDX-1 to the promoter may be directional. Clear, though less well defined, asymmetrical homology of the other A box flanking regions to the human sequences is also apparent. Regulatory elements present in multiple copies often exist in both orientations (42), thereby increasing potential phenoplasty.
The A3 5' flanking region in rat insulin 1 has two additional TAAT sequences as a consequence of two single base pair changes. This creates the A4 site (29), which is juxtaposed to A3 to generate an additional regulatory element that has been reported to bind other homeodomain transcription factors, some of which have been shown to affect transcription. One of the best studied is hepatocyte nuclear factor (HNF)-1α, which has been reported to activate the rat insulin 1 gene in the HIT cell line (68). Similarly, Isl-1 has been found to bind to this site (69) and to interact with islet cell–specific transcription factor ß2 to stimulate rat insulin 1 expression (70). Other transcription factors reported to bind to the A3/A4 box include cdx-3 (29) and HMGI(Y) (71). Inspection of all other insulin promoters shows that this homeodomain-binding sequence is unique to rat insulin 1. It would, therefore, seem logical to conclude that these transcription factors play no role in other species. However, HNF-1 provides an example of how the promiscuity of transcription factors creates obstacles in predicting insulin promoter effectors. Although the consensus binding sequence is not present in the human insulin promoter, the A3 region is sufficiently similar for the protein to bind, at least in vitro, and stimulate reporter assays (72). On the other hand, in vivo chromatin immunoprecipitation (ChIP) assays have shown that HNF-1 is not necessary for either insulin 1 or 2 expression in mice, which lack A4 (73). Surprisingly, both the 5' and 3' flanking regions of each of the A4 TAAT sequences have higher homology to the human A3 region than rat insulin 1 A3, differing by only 1 bp. This raises the interesting possibility that, although the rat insulin 1 A3 box seems to be the main binding site (67), A4 could also bind and be regulated by PDX-1.
Regardless of the regulatory capacity of the alternative A boxes, the binding kinetics of PDX-1 to the primary A3 regulatory element could be appreciably different in rat insulin 1 compared with humans and other mammals. The greatly diverged chicken and zebrafish insulin promoters lack mammalian A boxes; however, several TAAT motifs are present. The chicken has two at –359 and –386, and zebrafish has three at –142, –347, and –359 plus two more further upstream at –473 and –510. The clustering of TAAT motifs is greater than would be expected from random nucleotide arrangements. While TAAT motifs are targets for a large number of homeodomain transcription factors, it is worth noting that the 5' and 3' flanks of the zebrafish A boxes at –359 and –142 have 3-bp sequences associated with strong PDX-1 binding (67), suggesting a possible role for PDX-1 in regulating these insulin genes. The flanking regions share no homology with human. This is unlikely to reflect divergence of the PDX-1 proteins (rodent, chicken, and zebrafish PDX-1 proteins share 89, 26, and 49% amino acid sequence identity with the human protein, respectively) as the homeodomains are well conserved and there is no evidence of species specificity in DNA binding.
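Locating TAAT cores and reading off their flanking bases, as described for the A boxes, can be sketched as follows. This is an illustrative helper only, not the search method of TRES or TRANSFAC. The function names are hypothetical, reported positions are 0-based offsets into the supplied string (not promoter coordinates relative to the transcription start site), and flanks are given as read on the forward strand.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; non-ACGT characters are left unchanged."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif_with_flanks(sequence: str, motif: str = "TAAT", flank: int = 3) -> list:
    """Return (offset, strand, left_flank, right_flank) for each motif occurrence.

    A "-" strand hit means the reverse complement of the motif was found on the
    forward strand. Motifs equal to their own reverse complement are scanned once.
    """
    seq = sequence.upper()
    motif = motif.upper()
    targets = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:
        targets.append(("-", rc))
    hits = []
    for strand, target in targets:
        start = seq.find(target)
        while start != -1:
            end = start + len(target)
            left = seq[max(0, start - flank):start]
            right = seq[end:end + flank]
            hits.append((start, strand, left, right))
            start = seq.find(target, start + 1)  # allow overlapping hits
    return sorted(hits)
```

Setting `flank=8` would return the longer flanking regions of the kind compared in Table 3; converting offsets to promoter coordinates requires knowing where the transcription start site lies in the input string.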
GG boxes.
Cyclic AMP response element. Comparison of CRE sites between species (Table 4) reveals that only primates have multiple copies of CREs with other mammals containing a single CRE corresponding to CRE2. Of these, only the dog CRE is identical to the conserved human CRE2 site. The multiple CRE sites in primates could be due to several factors, the most likely being dietary. It should be noted that while gorillas are often considered to be predominantly folivorous, it has become apparent that they also consume a significant amount of fruit (83). This is even truer of the Western gorilla (Gorilla gorilla), whose genome is being sequenced for assembly, than of Eastern gorillas (Gorilla beringei). Also, all the primates, especially the great apes, are partly omnivorous since they supplement their diets with birds, eggs, small reptiles, and insects. In comparison to the other mammals studied, only primates consume large quantities of fruit in their diet. However, the number of CRE sites is not in a simple direct correlation with the amount of fruit consumed, as all the studied primates eat large quantities. Another possible reason is that while primates are omnivorous to varying degrees, they often gorge themselves on a single food (e.g., ripe fruit when a tree is in season or meat when a whole carcass is consumed quickly), which would give rise to major alterations in metabolic demands. This would be particularly pertinent to early humans and necessitate an insulin promoter that could respond accordingly. The phenomenon of increased numbers of CREs in primates may be expedited by the fact that primate promoters have an increased rate of evolution (44).
As with other regulatory elements, the chicken and zebrafish insulin promoters do not contain obvious CRE sites. The chicken insulin promoter contains four possible (three overlapping) nonconsensus sequences in the vicinity of the conserved mammalian CRE site, while the zebrafish has two potential nonconsensus octamers at –46 and –226. It is impossible to draw conclusions on the effects of the numerous minor nucleotide changes on CRE site activity, as most regulatory elements can tolerate one or more substitutions without total loss of function (84,85). Therefore, it may be very significant that, even with the variability of the octamer in the conserved CRE site, sequences that include the CRE core along with at least 8 bp of both 5' and 3' flanking regions represent one of the most prominent ECRs in all mammalian insulin promoters. This strongly points to the importance of CRE sites in insulin gene regulation.
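Searching for nonconsensus octamers of the kind mentioned above amounts to scanning for windows that differ from a consensus by a limited number of substitutions. The sketch below assumes the commonly cited palindromic CRE octamer TGACGTCA as the consensus; that sequence, the function names, and the mismatch threshold are assumptions for illustration and are not taken from this review or from the tools it cites.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_with_mismatches(sequence: str, consensus: str = "TGACGTCA",
                         max_mismatches: int = 2) -> list:
    """Return (offset, window, mismatches) for windows close to the consensus.

    Offsets are 0-based positions in the input string. Only the given strand is
    scanned, which suffices when the consensus is its own reverse complement
    (as TGACGTCA is); otherwise the reverse complement should be scanned too.
    """
    seq = sequence.upper()
    consensus = consensus.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        distance = hamming(window, consensus)
        if distance <= max_mismatches:
            hits.append((i, window, distance))
    return hits
```

A sequence-only scan of this kind flags candidate sites but, as the text notes, cannot say whether a given substitution preserves or abolishes activity.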
C elements. The human insulin promoter has a bipartite C2 element (5'CAGGGACAGG) at –252 (94), and rat insulin 1 promoter has been reported to contain a dissimilar, though active, sequence between –329 and –307. The C2 site can bind PAX4 and PAX6, which repress (95) and stimulate (96), respectively. A search of insulin promoters showed that the human C2 site is present in all primates, although African green monkey has a single base pair substitution between the two CAGG motifs. Among nonprimates, dog has two substitutions between the direct repeat and cow has three repeats with the intervening regions containing 1- and 2-bp deletions. It is not immediately apparent from DNA sequence alone whether these latter sites are functional.
E boxes. An unnamed sequence at –232 (5'GGGCCC), which we have tentatively termed G2 in Fig. 1, overlaps the 5' end of the E2 box and binds a factor with limited tissue distribution (101). This sequence, which is known to induce DNA curvature, may serve to bring together proteins that bind at sites flanking this motif. Examination of the other insulin promoters reveals that within the primates, chimpanzee and gorilla contain the G2 sequence at the same location while orangutan, rhesus macaque, and African green monkey share a transition at the first nucleotide. The G2 site is absent from owl monkey; however, this primate has an alternative G2 motif at –453. Among the other mammalian insulin promoters, mouse insulin 2 and cow have a G2 site in the same region while dog, mouse insulin 1, and pig have alternative G2 sequences at –329, –400, and –16, respectively. Since a 6-bp motif would be expected to occur only once every 4,096 bp by chance, the existence of alternative G2 motifs may indicate that G2-facilitated DNA bending abets interactions between proteins binding to the promoter. The G2 motif is absent from the rat insulin paralogues, chicken and zebrafish.
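The chance-occurrence figures quoted in this review (a 4-bp motif every 256 bp, a 6-bp motif every 4,096 bp) follow from 4^k for a motif of length k. The sketch below reproduces that arithmetic and extends it to a window of given length. It assumes equal base frequencies and independent positions, which is a simplification for a GC-rich promoter, and the function names are hypothetical.

```python
def expected_spacing(k: int) -> int:
    """Average spacing (bp) between chance occurrences of a specific k-mer."""
    return 4 ** k


def expected_hits(k: int, length: int, strands: int = 1) -> float:
    """Expected number of chance occurrences of a k-mer in a window of `length` bp.

    Use strands=2 for motifs that differ from their reverse complement.
    """
    return strands * (length - k + 1) / 4 ** k


def prob_at_least_one(k: int, length: int, strands: int = 1) -> float:
    """Approximate probability of at least one chance occurrence in the window.

    Assumption: start positions are treated as independent trials.
    """
    trials = strands * (length - k + 1)
    return 1.0 - (1.0 - 0.25 ** k) ** trials
```

Under these assumptions, a specific palindromic 6-bp motif such as GGGCCC has roughly a 13–14% chance of appearing somewhere in a 600-bp window, so an alternative G2 motif in any one promoter is suggestive rather than conclusive.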
Negative regulatory element.
Insulin-linked polymorphic region.
G1 box.
Enhancer core.
SP1 site.
Ink box.
CCAAT box. Several of the descriptions of the effects of transcription factors on insulin expression are based on results from single species. For example, there is a transcriptionally active CCAAT regulatory element that overlaps the single CRE site in the insulin promoters of both rat and mouse. Expression studies using rat insulin 1 promoter have shown that the combined CRE/CCAAT site shows preferential binding for the nuclear transcription factor-Y (NF-Y), which leads to reduced influence of CRE-associated signaling (116). A search of the other insulin promoters revealed not only that no nonrodent species have a CCAAT site that overlaps with CRE, but that CCAAT sites are totally absent from all of the insulin promoters except zebrafish, which has three at –164, –130, and –85. Therefore, NF-Y signaling, which has an absolute requirement for all five bases in the CCAAT consensus sequence (117), is unique to rodents within mammals and does not typically play a role in insulin regulation.
HNF-4
STAT regulatory element.
COUP-TFII binding element.
The spacing between the individual regulatory elements within the particularly well-conserved cassette of C1, E1, and A1 boxes has been shown to alter the relative stimulatory effects of the transcription factors that bind along with their synergistic interactions (91). Comparison of mammalian insulin promoters in this region showed that the relative spacing of the regulatory elements has been maintained for at least 35 million years, as there is no deviation in the primates. On the other hand, all the rodent insulin promoters contained insertions and deletions between all three sites. In mammals lacking A1, the C1-E1 spacing was maintained in both pig and dog while cow had a one base pair insertion between C1 and E1.
Efficient transcription is the outcome of coordinated dynamic arrangements upon the promoter. ChIP assays using MIN6 ß-cells have shown that PDX-1, MafA, E47, and ß2 bind to the mouse insulin 2 promoter in a cyclical manner with a periodicity of 10–15 min (125). Insulin gene regulation is also influenced by epigenetic factors that include DNA methylation and alterations in histone modifications, which affect the packaging of DNA within chromatin. There are a number of studies on the role of histone acetylation and methylation in the control of insulin gene expression. A key role for histone acetyl transferase (HAT) p300 in insulin promoter regulation has been demonstrated by the observations that PDX-1 and ß2 mediate their effects on the rat insulin 2 gene through an interaction with p300 (31,126,127), while activation of a rat insulin 1 promoter construct in HeLa cells by PDX-1 requires interactions with p300 (128). It has also been shown that the effects of glucose on a rat insulin 1 promoter construct in the mouse MIN6 ß-cell line involved the recruitment by PDX-1 of HAT and histone deacetylase (HDAC) activities. Thus, under low-glucose conditions, PDX-1 associated with HDACs to repress transcription (129), whereas under high-glucose conditions PDX-1 recruited the HAT p300 to activate transcription (130). PDX-1 has also been linked to the presence of methylated histone H3, i.e., H3K4me (nomenclature as per (131)), at the proximal promoter and coding regions of the insulin gene in rodent cells (132). More recently, the histone methyl transferase set9 has been localized to ß-cells in association with the insulin gene (133). Investigations into the role of chromatin accessibility in insulin expression have revealed that PDX-1 shows preferential binding to open chromatin (euchromatin) over condensed chromatin (heterochromatin).
In particular, PDX-1 occupies the endogenous insulin promoter in mouse ßTC3 ß-cells but not in mPAC ductal cells, which do not express insulin. Furthermore, the binding affinity of PDX-1 is strongly influenced by the position of nucleosomes relative to its regulatory element (134). Even within euchromatin, the degree of openness varies as the A3/A4 region (–126 to –296) to which PDX-1 can bind contained the most open chromatin structure based on micrococcal nuclease digestion, whereas the adjacent region (–297 to –460), which is not as crucial for ß-cell–specific insulin transcription, was more condensed. Although it is likely that the insulin gene is embedded in euchromatin in ß-cells and in more condensed heterochromatin in non-ß-cells, it may be of relevance that the synteny studies (see INSULIN GENES) show that the human insulin gene lies only 2 kbp from the transcriptionally active TH gene, whereas this distance is >100-fold greater in rodents. Thus, the diverse efforts to induce insulin expression in non-ß-cells may be less problematic in humans than in rodents.
The extraordinary synteny of insulin genes from zebrafish to human substantiates the key importance of the insulin hormone product. Comparison of insulin promoters spanning 450 million years of evolution has permitted identification of the central regulatory elements as well as several valuable observations.
Investigations based on rodents and their insulin genes have provided invaluable insights into diabetes and the workings of insulin promoters. However, the findings reported here illustrate that notable dissimilarities exist between the human and rodent promoters, which may reflect both divergence and the degree to which these promoters have been studied. The atypical characteristics of rodent insulin promoters are exemplified most manifestly with the rat insulin 1 promoter, whose unusual attributes include an active dominant CCAAT site overlapping the single CRE site; HNF-1α and HNF-4α regulatory elements; a functional Isl-1 binding site at A3/A4; a STAT-3 binding site; a potential COUP-TFII binding site; a consensus-containing E2 site; loss of GG boxes; lower conservation of A3 flanking regions; and changed spacing between regulatory elements in the C1-E1-A1 module leading to alternative synergistic interactions. The most plausible basis for the complexity of rodent insulin promoters is the duplication of their associated genes. Gene duplication can lead to functional divergence of the cis-regulatory elements (135,136) that can be swift even in recently duplicated genes (137). In addition, the signaling pathways regulating an essential gene like insulin will undoubtedly incorporate redundancy to extend responses and to act as a buffer against the consequences of mutation of key components. The fundamental differences in regulatory elements should serve as a salutary warning to be cautious when extrapolating rodent-based data to humans. A major obstacle in diabetes research has been the lack of a human pancreatic ß-cell line that is functionally equivalent to primary ß-cells. It is essential that new human ß-cell lines be developed and widely distributed in order that physiologically and medically relevant studies on the human insulin promoter can be carried out.
This is especially true of in vivo epigenetic and ChIP-based experiments that will accurately map the position and define the role of nucleosomes and undoubtedly help to unravel the precise mechanisms responsible for insulin gene regulation. These are exciting times as genome sequencing progresses rapidly. The availability of insulin genes from a wider range of species will provide tools that will permit the relatively straightforward answering of points raised in this report and allow us to advance our comprehension and appreciation of the subtle and sophisticated insulin promoter.
Received for publication June 8, 2006 and accepted in revised form August 24, 2006