Diabetes 57:783-790, 2008 DOI: 10.2337/db07-0970 © 2008 by the American Diabetes Association
Identification and Replication of a Novel Obesity Locus on Chromosome 1q24 in Isolated Populations of Cilento
1 Institute of Genetics and Biophysics "A. Buzzati-Traverso", CNR, Naples, Italy
Address correspondence and reprint requests to Marina Ciullo, PhD, Institute of Genetics and Biophysics "A. Buzzati-Traverso", CNR, Via Pietro Castellino, 111, 80131 Naples, Italy. E-mail: ciullo{at}igb.cnr.it
Abbreviations:
FWER, family-wise error rate; lFDR, local false discovery rate; LOD, logarithm of odds; POMC, proopiomelanocortin; SNP, single nucleotide polymorphism
OBJECTIVE—Obesity is a complex trait with a variety of genetic susceptibility variants. Several loci linked to obesity and/or obesity-related traits have been identified, but relatively few regions have been replicated. Studying isolated populations can be a useful approach to identify rare variants that will not be detected with whole-genome association studies in large populations.
RESEARCH DESIGN AND METHODS—Random individuals were sampled from Campora, an isolated village of the Cilento area in South Italy, phenotyped for BMI, and genotyped using a dense microsatellite marker map. An efficient pedigree-breaking strategy was applied to perform genome-wide linkage analyses of both BMI and obesity. Significance was assessed with ad hoc simulations for the two traits and with an original local false discovery rate approach to quantitative trait linkage analysis for BMI. A genealogy-corrected association test was performed for a single nucleotide polymorphism located in one of the linkage regions. A replication study was conducted in the neighboring village of Gioi.
RESULTS—A new locus on chr1q24 significantly linked to BMI was identified in Campora. Linkage at the same locus is suggested with obesity. Three additional loci linked to BMI were also detected, including one that contains the INSIG2 gene region. No evidence of association between the rs7566605 variant and BMI or obesity was found. In Gioi, the linkage on chr1q24 was replicated with both BMI and obesity.
CONCLUSIONS—Overall, our results confirm that successful linkage studies can be accomplished in these populations both to replicate known linkages and to identify novel quantitative trait linkages.
Overweight and obesity are by now major epidemic conditions strongly challenging Westernized societies. Several years of studies have resulted in the definition of different forms of obesity such as the rare and monogenic, the syndromic, and the polygenic forms. 
Nevertheless, recent data have demonstrated that even forms assumed to be controlled by a single gene are now thought to be influenced by multiple factors (1). It has also been clearly proved that gene-environment interactions are critical for the regulation of adipose mass function (2). Several loci linked to obesity and/or obesity-related traits have been identified in different populations, and relatively few regions have been replicated in more than one population. Hundreds of genes reported by the Human Obesity Gene map have been found through animal model studies to be able to modulate body weight and adiposity (3). Additional candidate genes have been identified by association studies in chromosome regions derived from genome-wide screens. From such a broad amount of data, a picture emerges in which a large repertoire of predisposing alleles have variable effects. Considering this genetic complexity, studying isolated populations is a potentially powerful complementary strategy to identify new genetic variants. In particular, these populations can be useful to identify rare variants with modest effects that will not be detected with whole-genome association studies in large populations because they are rare or with linkage analysis in large populations because they have a modest effect (4,5). These variants could have become more frequent in the isolates and thus could be more easily discovered. A central question raised by the isolated population approach is that of sample size. By definition, these populations are relatively small and only samples of moderate size can be ascertained. Consequently, the design of strategies maximizing the power of genetic detection is a key issue. Population-based sampling, provided that an important effort focuses on genealogy reconstruction, is a particularly interesting sampling scheme. As genealogy may virtually connect all individuals in the sample, it is eligible for both linkage and association studies. 
The Cilento and Vallo di Diano National Park is a remote hilly region of South Italy where we have initiated the genetic study of small villages. A recent analysis of both genealogical and genetic data in Campora, the first village to be studied, confirmed that it displays the classical features of a young genetic isolate (6). To identify loci involved in obesity, we conducted a genome-wide linkage study on both BMI, the quantitative trait underlying the definition of obesity, and obesity, defined as a binary trait. By doing so, we maximized the number of individuals included in the linkage analysis when considering BMI and, when considering obesity, we focused on extreme phenotypes only. Linkage analysis on these samples, however, was not straightforward. The large pedigrees connecting all the subjects of interest were too complex for linkage analysis software and had to be broken into subpedigrees. Here, we used a procedure that maximizes the linkage signals of interest over a predefined group of subpedigree sets representing different possible splittings of a large pedigree, following a strategy successfully applied for essential hypertension in Campora (7). An original approach to estimate local false discovery rate (lFDR) (8) was applied to identify significantly linked loci. We found that the location of one of the linkage signals exactly corresponds to the INSIG2 gene region, in which the rs7566605 variant was reported to be associated with BMI (9). Therefore, we evaluated whether this association could account for the linkage detected in Campora. Finally, a replication study was conducted in Gioi, another village of Cilento, for both BMI and obesity.
Sampling and BMI measurement. BMI measurements were obtained through a population-based sampling strategy. All adult individuals of Campora and Gioi were invited to participate in the study, and a majority of them could be included. As height <150 or >200 cm may result in unreliable BMI values, we excluded from the analysis the 36 individuals from Campora and the 54 from Gioi with a height <150 cm (none had a height >200 cm). The qualitative phenotype of obesity was defined for all individuals, considering as obese the individuals with BMI >30 kg/m2. Altogether, BMI was available for 394 individuals in Campora, 73 of whom were obese. In Gioi, 531 individuals were phenotyped for BMI, 98 of whom were obese. Among the 394 individuals of Campora, 385 belong to a unique 2,705-member pedigree. In Gioi, 514 individuals are connected through a 3,750-member pedigree.
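The phenotype definitions above can be sketched in a few lines. This is only an illustration of the stated rules (height exclusion below 150 cm or above 200 cm, obesity as BMI strictly above 30 kg/m2); the `Subject` record, its field names, and the function names are hypothetical and do not come from the study's own pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OBESITY_THRESHOLD = 30.0  # kg/m^2, threshold stated in the text
MIN_HEIGHT_CM = 150.0     # heights below this were excluded
MAX_HEIGHT_CM = 200.0     # heights above this would have been excluded


@dataclass
class Subject:
    """Hypothetical record for one phenotyped individual."""
    subject_id: str
    height_cm: float
    weight_kg: float


def bmi(subject: Subject) -> float:
    """Body mass index: weight in kg divided by squared height in m."""
    height_m = subject.height_cm / 100.0
    return subject.weight_kg / height_m ** 2


def phenotype(subject: Subject) -> Optional[Tuple[float, bool]]:
    """Return (BMI, is_obese), or None when the height is out of range."""
    if subject.height_cm < MIN_HEIGHT_CM or subject.height_cm > MAX_HEIGHT_CM:
        return None
    value = bmi(subject)
    return value, value > OBESITY_THRESHOLD
```

Returning `None` for excluded individuals keeps the exclusion explicit, so that the quantitative (BMI) and binary (obesity) analyses draw on the same filtered sample.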
Genotyping and error checking.
Pedigree breaking.
Heritability estimation and linkage analysis. The multipoint quantitative linkage analysis of BMI was performed using the regression-based approach (14) implemented in MERLIN-REGRESS (logarithm of odds [LOD]). Population trait means and variances were computed from all phenotyped individuals in each population separately. The multipoint qualitative trait linkage analysis of obesity was performed with the Zlr statistic (15) based on Spairs with the exponential model using ALLEGRO (version 2.0) (16). To generate genome-wide empirical P values, null distributions of linkage statistics were assessed through simulations taking into account both the particular characteristics of the marker map and the breaking of a large pedigree into subpedigrees. The details of the procedure have previously been published (7). Briefly, the genome-wide P value was computed in two steps. First, the significance of the maximum LOD was estimated with simulations considering all the markers of the chromosome where it is located (referred to as the chromosome-wide P value). Second, the genome-wide significance was extrapolated by a Bonferroni correction of the chromosome-wide P value that takes into account the 21 remaining chromosomes tested (e.g., chromosome 1 represents one-twelfth of the genome genetic length). For each simulation replicate of step 1, alleles at each marker for all the founders of the complete genealogy were randomly drawn using the allele frequencies estimated from the data, assuming independence among markers and Hardy-Weinberg equilibrium. Genotypes of all nonfounder individuals were then randomly drawn, conditional on their parent genotypes and independent from their phenotype. This gene-dropping step was performed with the Genedrop program from the MORGAN 2.6 package (17). The statistic of interest (LOD) was computed on the subpedigree set used in the analysis for all simulated markers, considering the real observed phenotypes. 
The maximum LOD on the chromosome was identified. Finally, the P value was assessed as the proportion of replicates with a maximum LOD greater than or equal to the maximum LOD observed in the data. Ten thousand replicates were performed. To assess replication P values, we defined a replication region of ±25 cM around the locus displaying the maximum LOD. Similar null simulations were performed; however, maximum LODs were defined over the replication region (and no longer over the whole chromosome). The distribution of these "local" maximum LODs was used to compute the empirical replication P value. We performed 1,000 replicates to assess the replication P values.
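The two-step significance procedure and the replication window described above can be illustrated with a minimal sketch. It assumes the gene-dropping simulations have already produced one maximum statistic per null replicate; the function names are hypothetical, and reading the Bonferroni step as a division by the chromosome's share of the genome genetic length is one plausible interpretation of the text, not a confirmed detail of the published procedure (7).

```python
from typing import Sequence


def empirical_p_value(observed_max: float, null_maxima: Sequence[float]) -> float:
    """Proportion of null replicates whose maximum is >= the observed maximum."""
    exceed = sum(1 for value in null_maxima if value >= observed_max)
    return exceed / len(null_maxima)


def genome_wide_p_value(chromosome_p: float, chromosome_fraction: float) -> float:
    """Bonferroni-style extrapolation from one chromosome to the genome.

    Assumption: the chromosome-wide P value is divided by the fraction of the
    genome genetic length carried by that chromosome (about 1/12 for
    chromosome 1 according to the text), and the result is capped at 1.
    """
    return min(1.0, chromosome_p / chromosome_fraction)


def regional_maximum(
    positions_cm: Sequence[float],
    statistics: Sequence[float],
    peak_cm: float,
    half_width_cm: float = 25.0,
) -> float:
    """Largest statistic among markers within peak_cm +/- half_width_cm."""
    in_window = [
        stat
        for position, stat in zip(positions_cm, statistics)
        if abs(position - peak_cm) <= half_width_cm
    ]
    if not in_window:
        raise ValueError("no marker falls inside the replication region")
    return max(in_window)
```

For a replication P value, `regional_maximum` would be applied to the observed data and to each null replicate, and the resulting local maxima passed to `empirical_p_value`.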
Computing genome-wide empirical P values allowed us to control the classical family-wise error rate (FWER), i.e., the probability of falsely rejecting at least one null hypothesis. However, FWER-based procedures are often too conservative, particularly when numerous hypotheses are being tested (18). An interesting alternative is to control the lFDR (19), defined as the posterior probability for the null hypothesis to be true. The lFDR is a variant of the false discovery rate (20) that gives each tested null hypothesis its own measure of significance. Although the false discovery rate has been repeatedly used in linkage analyses (21), the use of lFDR is relatively new in this context. We recently proposed a two-component mixture model to estimate lFDR in variance component linkage analysis (8). Here, we used the same model but in the context of the regression-based approach proposed by Sham et al. (14). Indeed, the only characteristic of the variance component linkage statistic that we used is its theoretical asymptotic distribution, which is a 50:50 mixture of a
Association testing.
For BMI, we used the general two-allele model (GTAM), the generalized linear regression–based test proposed by Abney et al. (25), to test for association while correcting for relatedness with the genealogical information. To analyze obesity, we used the corrected
Use of genomic control to correct for cryptic relatedness was first proposed by Devlin and Roeder (26) for case-control studies and then extended to quantitative trait association analysis by Bacanu et al. (27). Here, we considered all 1,122 microsatellites converted into biallelic markers (with the most frequent allele renumbered 1 and all others pooled) to compute the genomic control–based correction. We used the generalized linear model–based frequentist approach implemented in the GC/GCFR package (28) for both quantitative and qualitative association testing. All analyses but the corrected
Linkage analysis on the Campora sample. The characteristics of the subpedigree set maximizing the number of individuals phenotyped for BMI and included in families are presented in Table 1. It contains 366 phenotyped individuals clustered in 92 families. A maximum LOD of 4.47 is observed on chr1q24 at position 176.38 cM, corresponding to marker D1S452. Interestingly, this subpedigree set maximizes the linkage with 1q24 over a group of seven different subpedigree sets (Table 2). The genome-wide corrected P value of the linkage peak on chr1q24 is 0.01.
Whole-genome LOD values are presented in Fig. 1A. Figure 1B displays the corresponding estimated values of –log10(lFDR) along the 22 chromosomes. Four regions display an lFDR <5%: chr1q24 (lFDR 0.00041 at 176.38 cM), chr2q14.3 (lFDR 0.0148 at 138.30 cM), chr4q35.1 (lFDR 0.0186 at 197.04 cM), and chr6q23.3 (lFDR 0.02 at 142.71 cM). The regions on 1q24, 2q14.3, and 4q35.1 all have several markers with an lFDR <5%. Chromosome 2q14.3 has already been linked to BMI in pedigrees of European descent (29). Interestingly, it corresponds to the position of the INSIG2 region in which the rs7566605 variant was found to be associated with BMI (9). The 6q23.3 region has also been linked to BMI in the Framingham Heart Study (30).
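To relate the reported lFDR values to the –log10(lFDR) scale used in Fig. 1B, the short sketch below converts the four minima quoted in the text and keeps the regions below the 5% threshold. The dictionary and function names are hypothetical; only the numerical values come from the paragraph above.

```python
import math
from typing import Dict

# Minimum lFDR per region, as reported in the text for BMI in Campora.
reported_lfdr: Dict[str, float] = {
    "chr1q24": 0.00041,
    "chr2q14.3": 0.0148,
    "chr4q35.1": 0.0186,
    "chr6q23.3": 0.02,
}


def neg_log10(value: float) -> float:
    """Transform an lFDR into the -log10 scale used for plotting."""
    return -math.log10(value)


def flag_regions(lfdr_by_region: Dict[str, float], threshold: float = 0.05) -> Dict[str, float]:
    """Keep regions whose lFDR is below the threshold, on the -log10 scale."""
    return {
        region: neg_log10(value)
        for region, value in lfdr_by_region.items()
        if value < threshold
    }


flagged = flag_regions(reported_lfdr)
```

On this scale the 5% threshold corresponds to a height of about 1.3, so any marker plotted above that line has an lFDR below 5%.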
The subpedigree set maximizing the linkage on chr1q24 with obesity included 41 affected individuals clustered in 15 families (Table 1). The Zlr (2.46) is located at position 172.57 cM, two markers away from the linkage peak identified for BMI (Table 3). Though this subpedigree set did not maximize the number of affected individuals included, the linkage detection was robust over a group of five different subpedigree sets (Table 2). Apart from the detection of 1q24, this subpedigree set identified a Zlr of 2.46 on chr2q14 at position 132.18 cM, only one marker away from the linkage peak identified for BMI. For comparison, the linkage signals on chromosomes 4q35.1 and 6q23.3 were much lower for obesity than they were for BMI, with Zlr values of 0.72 and 0.32, respectively (Table 3). Two regions undetected with BMI display a Zlr score >2.5: 8p22 (Zlr 2.83 at position 24.4 cM) and 22q11 (Zlr 2.52 at position 17.97 cM).
Association analysis on chromosome 2. To assess whether the linkage signals detected on chr2q14.3 for both BMI and obesity could be explained by rs7566605, we tested the association between this variant and both BMI and obesity in the sample. The SNP was in Hardy-Weinberg equilibrium, and the frequency of allele C (0.24) was very close to its frequency in the CEU HapMap sample (0.27). As can be seen from Table 4, we did not detect association with either BMI or obesity with or without adjustment for sex and age. Interestingly, the results obtained with the genealogy-based (GTAM and corrected χ2) and the genomic-control based (quantitative and qualitative genomic control) tests were very similar.
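As a reminder of what a Hardy-Weinberg check compares, the sketch below derives the genotype counts expected from an allele frequency and computes the usual goodness-of-fit chi-square. It is a simplified illustration: the text does not say which test was used, the function names are hypothetical, and the plain chi-square treats individuals as unrelated, which is only an approximation in a pedigree sample such as this one.

```python
from typing import Tuple

Counts = Tuple[float, float, float]  # (C/C, C/other, other/other)


def hwe_expected_counts(n_individuals: float, freq_c: float) -> Counts:
    """Expected genotype counts under Hardy-Weinberg proportions."""
    freq_other = 1.0 - freq_c
    return (
        n_individuals * freq_c ** 2,
        n_individuals * 2.0 * freq_c * freq_other,
        n_individuals * freq_other ** 2,
    )


def hwe_chi_square(observed: Counts) -> float:
    """Goodness-of-fit chi-square (1 df) of observed genotype counts to HWE.

    The allele frequency is estimated from the observed counts themselves.
    """
    n_cc, n_c_other, n_other = observed
    n_total = n_cc + n_c_other + n_other
    freq_c = (2.0 * n_cc + n_c_other) / (2.0 * n_total)
    expected = hwe_expected_counts(n_total, freq_c)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

With the allele C frequency of 0.24 reported above, the expected genotype proportions are 0.0576, 0.3648, and 0.5776; a chi-square above 3.84 would indicate departure from equilibrium at the 5% level.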
Replication study in Gioi population. The subpedigree set maximizing the linkage signal between chromosome 1 and BMI in Gioi included 488 individuals clustered in 110 families (Table 1). The maximum LOD (2.29) on chromosome 1 is located at position 168.68 cM, and it corresponds to the maximum LOD over the whole genome (Table 5). Though this subpedigree set did not maximize the number of individuals included, the linkage signal was robust over a group of six subpedigree sets (Table 6). This signal had a significant replication P value of 0.006 considering a null replication region between 150 cM and 200 cM. None of the linkages on chromosomes 2, 4, and 6 were replicated. The subpedigree set clustering the obese cases and maximizing the linkage on chromosome 1 included 63 obese case subjects clustered in 23 families (Table 1). Maximum Zlr on chromosome 1 (2.87) was located at position 136.30 cM, 32 cM away from the maximum LOD identified for BMI in Gioi (168.68 cM). However, Zlr was 2.25 at position 168.68 cM (Table 5), and this signal had a significant replication P value of 0.037 (considering again the 150- to 200-cM region to assess replication significance). Again, the linkage is robust over a group of six subpedigree sets (Table 6). No linkage was detected with chromosomes 2, 4, or 6. The maximum Zlr on the genome (3.08 [Table 5]) on chr20q13 at position 100.94 cM was not significant genome wide. However, it overlapped with a region where a quantitative trait locus for BMI and other obesity-related phenotypes has been described by two independent studies (29,31).
Figure 2 presents the combined analysis of the two samples, maximizing the linkage results in Campora and in Gioi, for BMI (Fig. 2A) and obesity (Fig. 2B) for the 1q24 region. The evidence for linkage was increased for obesity (Zlr 3.34 at position 168.68 cM), which was not the case for BMI (LOD 2.29 at position 168.68 cM). This reflects the fact that the maximum LOD for BMI in Campora and Gioi are located two markers away, with very little linkage evidence at the other marker in each population.
In this study, conducted on a random sample of individuals from the isolated population of Campora, we detected a genome-wide significant linkage between BMI and a new locus on chr1q24. Interestingly, this linkage is also detected when focusing on obesity, and it is replicated for both BMI and obesity in the neighboring village of Gioi. However, in these three latter analyses, the linkage is located 7.7 cM away from the initial signal, at a position where no linkage is observed in the first BMI analysis. Whether this suggests the implication of two different loci remains an open question. Following Göring et al. (32), who demonstrated that "the chromosomal position and genotype-phenotype relationship of a locus cannot both be estimated reliably by use of a single data set of current realistic size" in linkage analysis, our results may well be generated by a single locus. We believe that having significant replication P values and detecting a linkage with obesity in Campora at the same marker as the linkage with BMI in Gioi are good elements in favor of the one locus hypothesis. Still, if this is true, then the more accurate estimates of both location and effect size for this locus are those obtained from Gioi. This is the first linkage report of BMI/obesity and chr1q24, a region that harbors interesting candidate genes for overweight and obesity, particularly in the case of the TBX19 gene, located in the 3-Mb region between the two markers (D1S196 and D1S452) displaying the highest linkage signals for BMI and obesity. This gene encodes a T-box factor required for the expression of proopiomelanocortin (POMC) in the neurons of the arcuate nucleus of the hypothalamus (33). POMC is a complex propeptide that gives rise to a range of peptides involved in the leptin/melanocortin pathway regulating satiety and energy homeostasis. 
Mutations in POMC are associated with rare monogenic forms of severe obesity occurring soon after birth and with adrenocorticotropic hormone deficiency (1). In addition, an implication of variants in or near the POMC locus in obesity or obesity-related quantitative traits has been suggested in general populations by a number of independent studies (34,35). Mutations in the TBX19 gene have been identified in patients affected by adrenocorticotropic hormone deficiency (36), showing the role of TBX19 in the POMC-related pathway. Another good candidate in the 1q24 region is the ATP1B1 gene, which encodes the β-1 subunit of the Na+/K+ ATPase protein, as the ATPase activity has been correlated with obesity and energy consumption in several studies (37,38). Additional plausible candidate genes include PRDX6, PLA2G4A, and SOAT1, which are all involved in phospholipid or cholesterol metabolism. Chang et al. (39) have recently detected both linkage and association between the 1q23–32 region and essential hypertension analyzed as a quantitative trait. BMI and hypertension were highly correlated in Campora (P value <0.002), as described in many populations. Still, in our study on essential hypertension (analyzed as a binary trait) performed in this village (7), we did not detect any linkage between essential hypertension and 1q24 (Zlr –0.45 at position 176.38 cM). The highest Zlr on chromosome 1 was located at position 262.68 cM (Zlr 3.05). Many scenarios can explain these results, one of which is a pleiotropic effect of the region on both BMI and hypertension due to multiple variants with various characteristics. Indeed, our essential hypertension study was relatively underpowered to detect the common variants identified by Chang et al. (sample size of 389 adults in Campora vs. 1,862 adults in the study by Chang et al.). But we cannot exclude that rare variants with a main effect on BMI could not be detected in the large population of Chang et al. 
but may have become more frequent and detectable in our villages. When using the Bonferroni procedure to control the FWER (at a 5% threshold) over the whole genome, chr1q24 is the only locus with a significant effect. However, when using the model-based approach proposed by Dalmasso et al. (8) to control the lFDR, which is a less conservative criterion defined as the posterior probability for the null hypothesis to be true, three more loci display an lFDR <5%: chr2q14.3, chr4q35.1, and chr6q23. It is worth noting that two of them have already been linked or associated with obesity or BMI in previous studies (9,30,40). Apart from being a less conservative approach than FWER, this model-based lFDR approach has another advantage for linkage studies in large pedigrees: it is much less computationally demanding. We have shown how important a careful handling of large pedigree breaking is (7) while assessing significance of linkage. However, genome-wide corrected FWERs rely on null simulations of the whole pedigree and are consequently computationally intensive. Instead, our model-based lFDR is much faster, as it is based on an empirical null distribution estimated from the marker data. Interestingly, the locus on chr2q14.3 that displays linkage with both BMI and obesity phenotypes contains the SNP rs7566605 reported to be associated (9) with obesity in the Framingham Heart Study and four separate samples. We found no evidence of association between rs7566605 and either BMI or obesity in our sample. Recent studies also failed to replicate this association in populations of different ethnicity (41,42). Still, a recent survey (43) in nine large cohorts concludes that there is an effect of rs7566605 on BMI that is small and possibly masked in relatively small samples. They also suggest that its effect may be heterogeneous across population samples. We cannot rule out that our sample is too small to detect the modest effect of rs7566605 in our population. 
However, a frequent variant (minor allele frequency 0.24) with a modest effect such as rs7566605 would not be able to explain the strong linkage signal identified in our sample. Consequently, in our study population, it is valid to hypothesize the presence of an effect of other susceptibility polymorphisms in the region. This work is the first analysis of a quantitative trait in the isolated populations of the Cilento National Park. Our results confirm that successful linkage studies can be performed in these populations both to replicate known linkages and to identify novel quantitative trait linkages.
This work was supported by grants from Ente Parco Nazionale del Cilento e Vallo di Diano, the Associazione Italiana per la Ricerca sul Cancro, the Assessorato Ricerca Regione Campania, and the Fondazione Banco di Napoli (to M.G.P.). We thank the populations of Campora and Gioi villages for their kind cooperation. We thank Dr. Andrea Salati, Don Guglielmo Manna, Dr. Leopoldo Errico, and Dr. Giuseppe Vitale for helping in the interaction with the populations; Antonietta Calabria, Raffaella Romano, and Teresa Rizzo for the organization of the study in the villages; Stefano Bracco and Mario Aversano for bioinformatics support; and Laura Dionisi for technical assistance.
Published ahead of print at http://diabetes.diabetesjournals.org on 27 December 2007. DOI: 10.2337/db07-0970. M.G.P. and C.B. contributed equally to this work. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received for publication July 16, 2007 and accepted in revised form December 18, 2007