Intragenomic heterogeneity between multiple 16S ribosomal RNA operons in sequenced bacterial genomes

Tom Coenye, Peter Vandamme
DOI: http://dx.doi.org/10.1016/S0378-1097(03)00717-1 45-49 First published online: 1 November 2003

Abstract

The availability of a large number of completely sequenced bacterial genomes allows the rapid and reliable determination of intragenomic sequence heterogeneity of 16S rRNA genes. In the present study we assessed the intragenomic sequence heterogeneity of 16S rRNA genes in 55 bacterial genomes, representing various phylogenetic groups. The total number of rRNA operons in genomes included ranged from 2 to 13. The maximum number of nucleotides that were different between any pair of 16S rRNA genes within a genome ranged from 0 to 19. The corresponding minimal similarity ranged from 100 to 98.74%. This indicates that the intragenomic heterogeneity between multiple 16S rRNA operons in these genomes is rather limited and is unlikely to have a profound effect on the classification of taxa. Among the multiple copies of the 16S rRNA genes present in the genomes included, 199 mutations were counted with transitions being the dominant type of mutations over the total length of the 16S rRNA gene. Most heterogeneity occurred in variable regions V1, V2, and V6.

Keywords
  • 16S rRNA microheterogeneity
  • Whole-genome sequence
  • Taxonomy

1 Introduction

The comparison of 16S ribosomal RNA gene sequences to infer phylogenetic relationships among bacteria has been widely used for several decades (see for example references [1,2]). 16S rRNA is generally accepted as the ultimate molecular chronometer because it is functionally constant, shows a mosaic structure of conserved and more variable regions, occurs in all organisms and because its length allows easy sequencing [1,3,4]. Nevertheless, it has been shown that the resolution of the 16S rRNA gene is often too low to allow the differentiation of closely related species [3,5]. It was also shown that there may be considerable intraspecific variation in 16S rRNA sequence [6]. Part of this intraspecific diversity is caused by the fact that rRNA genes are often organised as part of a multigene family, with the copy number ranging from 1 to 15 [7]. The rRNA operon copy number reflects the ecological strategy of the organism as there seems to be a correlation between the response rate of the organism to changing conditions and the rRNA operon copy number [8]. In general, members of multigene families tend to coevolve [9,10], but the ultimate degree of sequence polymorphism within the family will depend on the frequency of molecular interaction mechanisms such as gene conversion [9,11]. So far, little attention has been paid to the systematic study of variability in multiple copies of the 16S rRNA gene, although this now has been reported for organisms belonging to various major bacterial lineages, including the Proteobacteria, the Firmicutes and the Actinobacteria (for an overview see reference [12]). Overall, intragenomic sequence heterogeneity seems to be relatively low, although values of up to 6.5% have been reported for some actinomycetes [13,14]. 
Most studies regarding intragenomic sequence heterogeneity of 16S rRNA genes have relied on cloning and sequencing of the individual genes, although separation of the multiple copies by DGGE (denaturing gradient gel electrophoresis) [15] or TGGE (temperature gradient gel electrophoresis) [16] followed by sequencing has been used as well. The availability of an increasing number of completely sequenced bacterial genomes allows for the rapid determination of intragenomic sequence heterogeneity of 16S rRNA genes without the need for further experimental work. In this study we have assessed the intragenomic sequence heterogeneity of 16S rRNA genes in 55 bacterial genomes, representing various phylogenetic groups.

2 Materials and methods

2.1 Whole-genome sequence data

The complete genome sequences used in this study are shown in Table 1. They were downloaded from the GenBank database (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html) or were obtained from the Wellcome Trust Sanger Institute website (http://www.sanger.ac.uk/).

Table 1

Whole-genome sequences used in this study

| Species and strain designation | GenBank accession no. | rRNA copy no. | Max. difference (nucleotides) | Min. similarity (%) | Max. difference in free energy (kcal/mol) |
|---|---|---|---|---|---|
| α-Proteobacteria | | | | | |
| Agrobacterium tumefaciens C58 | AE007869, AE007870 | 4 | 0 | 100 | 0 |
| Brucella melitensis 16M | AE008917, AE008918 | 3 | 0 | 100 | 0 |
| Caulobacter crescentus CB15 | AE005673 | 2 | 0 | 100 | 0 |
| Mesorhizobium loti MAFF303099 | BA000012 | 2 | 0 | 100 | 0 |
| Sinorhizobium meliloti 1021 | AL591688 | 3 | 0 | 100 | 0 |
| β-Proteobacteria | | | | | |
| Bordetella pertussis Tohama I | NC_002929 | 3 | 0 | 100 | 0 |
| Burkholderia cenocepacia J2315 | a | 6 | 3 | 99.80 | 1.2 |
| Burkholderia pseudomallei K96243 | a | 4 | 1 | 99.93 | 1.0 |
| Neisseria meningitidis MC58 | AE002098 | 4 | 0 | 100 | 0 |
| Ralstonia solanacearum GMI1000 | AL646052, AL646053 | 4 | 0 | 100 | 0 |
| γ-Proteobacteria | | | | | |
| Escherichia coli O157:H7 Sakai | BA000007 | 7 | 11 | 99.29 | 1.1 |
| Haemophilus influenzae Rd | L42023 | 6 | 0 | 100 | 0 |
| Pseudomonas aeruginosa PAO1 | AE004091 | 4 | 1 | 99.94 | 3.7 |
| Pseudomonas putida KT2440 | AE015451 | 7 | 3 | 99.81 | 3.3 |
| Pseudomonas syringae DC3000 | NC_004578 | 5 | 0 | 100 | 0 |
| Shewanella oneidensis MR-1 | AE014299 | 9 | 4 | 99.74 | 2.5 |
| Salmonella enterica CT18 | AL513382 | 7 | 2 | 99.87 | 9.3 |
| Vibrio parahaemolyticus RIMD 2210633 | NC_004603, NC_004605 | 11 | 5 | 99.67 | 2.8 |
| Xanthomonas axonopodis 306 | AE008923 | 2 | 0 | 100 | 0 |
| Xanthomonas campestris ATCC 33913 | AE008922 | 2 | 0 | 100 | 0 |
| Xylella fastidiosa 9a5c | AE003849 | 2 | 0 | 100 | 0 |
| ε-Proteobacteria | | | | | |
| Campylobacter jejuni NCTC 11168 | AL111168 | 3 | 0 | 100 | 0 |
| Helicobacter pylori J99 | AE001439 | 2 | 1 | 99.93 | 0.3 |
| Wolinella succinogenes DSM 1740 | NC_005090 | 2 | 0 | 100 | 0 |
| Spirochaetes | | | | | |
| Leptospira interrogans 56601 | NC_004342 | 2 | 0 | 100 | 0 |
| Treponema pallidum Nichols | AE000520 | 2 | 0 | 100 | 0 |
| Firmicutes | | | | | |
| Bacillus anthracis Ames | NC_003997 | 11 | 5 | 99.67 | 1.6 |
| Bacillus cereus ATCC 14579 | NC_004722 | 13 | 3 | 99.81 | 8.0 |
| Bacillus halodurans C-125 | BA000004 | 8 | 5 | 99.66 | 1.9 |
| Bacillus subtilis 168 | AL009126 | 10 | 12 | 99.23 | 13.1 |
| Clostridium acetobutylicum ATCC 824 | AE001437 | 10 | 12 | 99.21 | 5.7 |
| Clostridium perfringens 13 | BA000016 | 10 | 19 | 98.74 | 6.5 |
| Enterococcus faecalis V583 | NC_004668 | 4 | 1 | 99.94 | 2.3 |
| Lactococcus lactis IL1403 | AE005176 | 6 | 1 | 99.94 | 3.4 |
| Lactobacillus plantarum WCFS1 | AL935263 | 5 | 2 | 99.87 | 0.4 |
| Listeria innocua Clip11262 | AL592022 | 6 | 4 | 99.74 | 4.5 |
| Listeria monocytogenes EGD-e | NC_003210 | 6 | 4 | 99.74 | 2.8 |
| Oceanobacillus iheyensis HTE831 | BA000028 | 7 | 17 | 98.92 | 7.4 |
| Staphylococcus aureus Mu50 | BA000017 | 5 | 4 | 99.75 | 2.6 |
| Staphylococcus epidermidis ATCC 12228 | AE015929 | 5 | 11 | 99.29 | 4.8 |
| Streptococcus agalactiae 2603V/R | AE009948 | 7 | 0 | 100 | 0 |
| Streptococcus mutans UA159 | AE014133 | 5 | 3 | 99.81 | 0.2 |
| Streptococcus pneumoniae TIGR4 | AE005672 | 4 | 0 | 100 | 0 |
| Streptococcus pyogenes MGAS8232 | AE009949 | 6 | 0 | 100 | 0 |
| Actinobacteria | | | | | |
| Bifidobacterium longum NCC2705 | AE014295 | 4 | 0 | 100 | 0 |
| Streptomyces avermitilis MA-4680 | BA000030 | 6 | 0 | 100 | 0 |
| Streptomyces coelicolor A3(2) | AL645882 | 6 | 3 | 99.80 | 9.5 |
| Cytophaga–Flavobacterium–Bacteroides group | | | | | |
| Bacteroides thetaiotaomicron VPI-5482 | NC_004663 | 5 | 18 | 98.92 | 36.0 |
| Porphyromonas gingivalis W83 | NC_002950 | 4 | 0 | 100 | 0 |
| Cyanobacteria | | | | | |
| Nostoc sp. PCC7120 | NC_003272 | 4 | 1 | 99.93 | 0 |
| Synechocystis sp. PCC6803 | NC_000911 | 2 | 0 | 100 | 0 |
| Other taxa | | | | | |
| Aquifex aeolicus VF5 | NC_000918 | 2 | 0 | 100 | 0 |
| Chlorobium tepidum TLS | NC_0023932221000 | | | | |
| Deinococcus radiodurans R1 | AE000513 | 3 | 2 | 99.87 | 1.9 |
| Fusobacterium nucleatum ATCC 25586 | NC_003454 | 5 | 2 | 99.86 | 3.3 |

  • a: These sequence data were produced by the Wellcome Trust Sanger Institute and can be obtained from their website (http://www.sanger.ac.uk/).

2.2 Extraction of 16S rRNA gene sequences and phylogenetic analysis

Whole-genome sequences were downloaded, imported in the Kodon 2.0 (Applied Maths) software package and 16S rDNA sequences were extracted. If 16S rRNA genes were not annotated they were located in the genome sequence using BLAST [17]. Multiple 16S rRNA genes extracted from the same genome sequence were aligned and a similarity matrix was constructed using Kodon 2.0.
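
The pairwise comparison described above can be sketched in a few lines of Python. This is only an illustration and not the Kodon 2.0 procedure: it assumes that the 16S rRNA gene copies from one genome are already aligned to equal length (gaps written as "-"), that similarity is expressed relative to the alignment length, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count the alignment columns at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def intragenomic_heterogeneity(aligned_copies):
    """Return (maximum pairwise difference, minimum pairwise similarity in %)."""
    if len(aligned_copies) < 2:
        raise ValueError("at least two gene copies are required")
    length = len(aligned_copies[0])
    max_diff = max(
        count_differences(a, b) for a, b in combinations(aligned_copies, 2)
    )
    # Assumption: similarity is computed over the full alignment length.
    min_similarity = 100.0 * (length - max_diff) / length
    return max_diff, min_similarity


# Toy alignment of four hypothetical gene copies (10 columns each).
copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
max_diff, min_similarity = intragenomic_heterogeneity(copies)
print(max_diff, min_similarity)
```

For real 16S rRNA genes of roughly 1500 nucleotides, a maximum difference of 19 nucleotides corresponds to a minimal similarity of about 98.7%, which is the order of magnitude reported in Table 1.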

2.3 Determination of secondary structure and minimal free energy

Secondary structures of all 16S rRNA genes were obtained through the mfold webserver [18] using the free energy data from Mathews et al. [19]. The conditions for folding were the standard conditions (37°C, 1 M NaCl, no divalent ions), which are equivalent to physiological conditions [18]. mfold was also used to calculate the minimal free energy, ΔG°, which is a measure of the stability of the secondary structure.
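
Once a minimal free energy is available for every 16S rRNA copy of a genome, the maximal difference between any pair of copies is simply the range of those values. The short Python sketch below illustrates this; the function name and the free energy values are hypothetical and are not taken from mfold output.

```python
def max_free_energy_difference(delta_g_values):
    """Largest difference in free energy (kcal/mol) between any two copies.

    The maximum over all pairs equals the largest value minus the smallest.
    """
    if len(delta_g_values) < 2:
        raise ValueError("at least two free energy values are required")
    return max(delta_g_values) - min(delta_g_values)


# Hypothetical minimal free energies (kcal/mol) for three copies in one genome.
operon_delta_g = [-512.3, -510.4, -511.0]
print(round(max_free_energy_difference(operon_delta_g), 1))
```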

2.4 Statistical analysis

Statistical analyses were performed using the SPSS 11.0.1 software package.

3 Results

The rRNA copy number, the maximum pairwise difference in nucleotides between any pair of 16S rRNA operons, the corresponding minimal pairwise similarity and the maximal difference in free energy between any pair of 16S rRNA operons for each genome are given in Table 1. The total number of rRNA operons in genomes included in the present study ranged from 2 to 13 (mean±standard deviation 5.07±2.73). The maximum number of nucleotides that were different between any pair of 16S rRNA genes within a genome ranged from 0 to 19 (mean±standard deviation 2.91±4.78). The corresponding minimal similarity ranged from 100 to 98.74% (mean±standard deviation 99.81±0.31). The maximal difference in free energy between two 16S rRNA operons from the same genome was between 0 and 36.0 kcal/mol (mean±standard deviation 2.57±5.47).

The distribution of different types of mutations (transitions [A↔G and C↔T], transversions [purine↔pyrimidine] and insertions/deletions) in multiple copies of the 16S rRNA gene was mapped. Among the multiple copies of the 16S rRNA genes present in the 55 bacterial genomes included in this study, 199 mutations were counted (insertions/deletions at the 5′ or 3′ end of the 16S rRNA gene were not included, as they may result from errors in predicting the correct borders of the 16S rRNA gene, especially in non-annotated genomes for which BLAST was used to identify the gene). These mutations included 125 transitions (62.8%), 52 transversions (26.1%) and 22 insertions/deletions (11.1%). Transitions are the dominant type of mutation over the total length of the 16S rRNA gene. Insertions and deletions are rare and are even totally absent from regions 401–600, 701–800, 1101–1200 and 1301–1400. The majority of all mutations were located in the first 300 nucleotides of the 16S rRNA gene (n=95, 47.7%) and between positions 1001 and 1100 (n=23, 11.6%) (Fig. 1).
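
The three mutation classes used above can be made explicit with a small Python sketch. The classification rule follows the definitions given in the text (A↔G and C↔T are transitions, purine↔pyrimidine changes are transversions, a gap against a base is an insertion/deletion); the function name is hypothetical, and the counts are the ones reported in this section.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES | {"-"}


def classify_difference(base_a, base_b):
    """Classify one differing alignment column between two gene copies."""
    if base_a not in VALID or base_b not in VALID:
        raise ValueError("expected A, C, G, T or '-'")
    if base_a == base_b:
        raise ValueError("the two positions do not differ")
    if "-" in (base_a, base_b):
        return "indel"
    both_purines = base_a in PURINES and base_b in PURINES
    both_pyrimidines = base_a in PYRIMIDINES and base_b in PYRIMIDINES
    return "transition" if both_purines or both_pyrimidines else "transversion"


# Counts reported in the Results; the percentages are recomputed from them.
counts = {"transition": 125, "transversion": 52, "indel": 22}
total = sum(counts.values())
percentages = {kind: round(100 * n / total, 1) for kind, n in counts.items()}
print(total, percentages)
```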

Figure 1

Distribution of different types of mutations in multiple copies of the 16S rRNA genes of sequenced bacterial genomes. The location of the nine variable regions (V1–V9) is indicated above the bars.

4 Discussion

The results of the present study confirm that the number of rRNA operons is indeed not strictly correlated with phylogeny [7,8], although the mean number is slightly higher in Gram-positive organisms than in Gram-negative organisms (6.64 vs. 3.90, P=0.001). However, it should be noted that the number of taxa investigated is relatively low and that this may influence the statistical analysis. Our data clearly indicate that the intragenomic heterogeneity between multiple 16S rRNA operons in a genome is rather limited. This is in agreement with a preliminary study in which it was shown that the maximum intragenomic 16S rRNA operon diversity within 14 bacterial and archaeal genomes was between 0 and 1.23% [12] and with several other studies in which individual 16S rRNA operons were cloned and sequenced (see references [20–22] for recent examples). There are however examples in which extensive sequence diversity has been reported between multiple 16S rRNA operons within a genome, especially in the actinomycetes; this heterogeneity is most likely caused by recombination and/or horizontal transfer [13,14]. There is no significant difference in intragenomic diversity between the different phylogenetic groups, although the homogeneity of the 16S rRNA operons in the α-Proteobacteria is remarkable. We included several organisms of which the genomes consist of multiple rRNA-containing replicons (Agrobacterium tumefaciens, Brucella melitensis, Burkholderia cenocepacia, Burkholderia pseudomallei, Ralstonia solanacearum and Vibrio parahaemolyticus). We found that the diversity between 16S rRNA operons located on different replicons was not higher than the diversity between 16S rRNA operons located on the same replicon; the 16S rRNA operons of A. tumefaciens and B. melitensis located on different replicons are even identical.
To evaluate the impact of the sequence heterogeneity on the stability of the 16S rRNA we calculated the difference in free energy between all pairs of 16S rRNA operons within a genome. As could be expected these differences were also rather limited, indicating that, at least for the genomes included in the present study, there are no significant differences in secondary structure.

When we mapped the distribution of all mutations, it was obvious that this distribution was unequal (P<0.01). Most heterogeneity occurred in variable regions V1, V2, V6, and, to a lesser extent, V3 and V4 (Fig. 1). When we compared the distribution with known distribution of substitution rates and with secondary and tertiary structure models of rRNA and ribosomes [23,24], it became obvious that most variability occurred in (i) regions which are known to have high intrataxon diversity, (ii) regions that are located further away from the centre in an assembled ribosome, and (iii) regions that are not (or only to a lesser extent) involved in tertiary structure interactions. The location of the differences in the most variable part of the 16S rRNA corroborates that the differences are true differences, and not mere sequencing errors.
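
The text does not state which test produced P<0.01 for the unequal distribution. As a rough plausibility check, the sketch below applies a chi-square goodness-of-fit test to the only regional counts reported in the Results (95 mutations in positions 1–300, 23 in positions 1001–1100, and the remaining 81 elsewhere), against the expectation that mutations are spread uniformly along the gene. The gene length of 1500 nucleotides and the three coarse bins are assumptions made for illustration, so the outcome is not a reproduction of the published analysis.

```python
import math

GENE_LENGTH = 1500      # assumption: approximate 16S rRNA gene length
TOTAL_MUTATIONS = 199   # reported in the Results

# (region length in nucleotides, observed number of mutations)
regions = {
    "1-300": (300, 95),
    "1001-1100": (100, 23),
}
rest_length = GENE_LENGTH - sum(length for length, _ in regions.values())
rest_count = TOTAL_MUTATIONS - sum(count for _, count in regions.values())
regions["remainder"] = (rest_length, rest_count)

# Chi-square statistic against a uniform distribution along the gene.
chi_square = 0.0
for length, observed in regions.values():
    expected = TOTAL_MUTATIONS * length / GENE_LENGTH
    chi_square += (observed - expected) ** 2 / expected

# With three bins there are 2 degrees of freedom, for which the
# chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi_square / 2)
print(round(chi_square, 1), p_value < 0.01)
```

Under these assumptions the statistic lies far above the 1% critical value for two degrees of freedom (about 9.21), which is consistent with the reported P<0.01.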

Our data indicate that, although there is heterogeneity among multiple 16S rRNA operons in bacterial genomes, in general this heterogeneity is rather limited and is unlikely to have a profound effect on the classification of taxa. Although there are several mechanisms that can cause heterogeneity within a multigene family, it seems that there is a strong pressure to maintain a high level of sequence conservation among multiple copies of 16S rRNA genes, probably because of functional and structural constraints.

Acknowledgments

T.C. and P.V. are indebted to the Fund for Scientific Research — Flanders (Belgium) for a position as postdoctoral fellow and research grants, respectively. T.C. also acknowledges the support from the Belgian Federal Government (Federal Office for Scientific, Technical and Cultural Affairs).

References

  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].