Organisation of the S10, spc and alpha ribosomal protein gene clusters in prokaryotic genomes

Tom Coenye, Peter Vandamme
DOI: http://dx.doi.org/10.1016/j.femsle.2004.10.050, pp. 117-126. First published online: 1 January 2005

Abstract

Although it is well known that there is no long-range colinearity in gene order in bacterial genomes, it is thought that several regions are under strong structural constraints during evolution and that gene order in these regions is extremely conserved. One such region is the str locus, containing the S10–spc–alpha operons. These operons contain genes coding for ribosomal proteins and a number of housekeeping genes. We compared the organisation of these gene clusters in 111 sequenced prokaryotic genomes (99 bacterial and 12 archaeal genomes). We also compared the organisation to the phylogeny based on 16S ribosomal RNA gene sequences and the sequences of the ribosomal proteins L22, L16 and S14. Our data indicate that there is much variation in gene order and content in these gene clusters, in bacterial as well as archaeal genomes, and that differential gene loss has occurred on multiple occasions during evolution. We also noted several discrepancies between phylogenetic trees based on 16S rRNA gene sequences and sequences of ribosomal proteins L16, L22 and S14, suggesting that horizontal gene transfer played a significant role in the evolution of the S10–spc–alpha gene clusters.

Keywords
  • Ribosomal protein
  • S10
  • Phylogeny

1 Introduction

Shortly after the completion of the first two prokaryotic genomes (those of Haemophilus influenzae and Escherichia coli) it was noted that there is no long-range colinearity in gene order in bacterial genomes [1,2]. Subsequent studies on more taxa confirmed this initial finding: apparently dynamic rearrangements have occurred frequently enough to break up operon structures [3–5], and although gene order is extremely conserved in closely related taxa, it rapidly becomes less conserved with evolutionary distance [6,7]. However, even in distantly related genomes, several highly conserved regions can be found, probably regions that are under strong structural constraints during evolution [5,7]. Systematic genome comparisons have revealed that functionally related genes tend to be neighbours more often than unrelated genes [8], and this provides strong support for the concept that conserved gene order could be correlated with physical interactions between the encoded proteins [9]. One region in which gene order generally appears to be conserved is the str locus that contains the S10–spc–alpha operons, encoding ribosomal proteins and a number of housekeeping genes [5,10].

In E. coli, 53 ribosomal proteins have been identified [11,12]. Approximately half of these are encoded by genes that are located at the str locus, while the rest are scattered around the genome in clusters of 1–4 genes. The genetic organisation of the ribosomal protein clusters is complex, with many operons containing genes for non-ribosomal proteins. In addition, the organisation of many ribosomal protein operons does not follow the promoter–structural gene–terminator paradigm [12]. The physiological relevance of this complex organisation is at present not entirely clear. In E. coli, the S10 operon contains the genes coding for ribosomal proteins S10, L3, L4, L23, L2, S19, L22, S3, L16, L29 and S17. The spc operon contains the genes coding for ribosomal proteins L14, L24, L5, S14, S8, L6, L18, S5, L15 and L36. In addition, between the genes coding for L15 and L36, the secY gene is found, coding for a preprotein translocase. The alpha operon contains the genes coding for ribosomal proteins S13, S11, S4 and L17, with rpoA (coding for the α-subunit of RNA polymerase) inserted between S4 and L17. Subsequently, the organisation of the S10, spc and alpha gene clusters was determined for a number of other bacterial taxa (including Mycoplasma capricolum [13], Chlamydia trachomatis [14], Bacillus subtilis [15], Synechococcus sp. [16] and Sinorhizobium meliloti [15]), as well as for a number of archaeal species (including Sulfolobus solfataricus [17] and Halobacterium halobium [18]). While for several organisms the organisation of the S10, spc and alpha gene clusters was very similar to the organisation seen in E. coli, deletions or insertions of additional genes and/or translocations of genes were often noticed. For example, in contrast to E. coli and H. influenzae, the spc gene clusters of B. subtilis and Mycoplasma genitalium contain three additional genes coding for non-ribosomal proteins: adk (coding for adenylate kinase), map (coding for methionine aminopeptidase) and infA (coding for translation initiation factor I) [5]. When comparing the gene order conservation in 35 sequenced prokaryotic genomes, Tamames [7] found varying levels of conservation of gene order (15–88%, expressed as the ratio between the number of times the gene is conserved in the run and the total number of times the gene is present) for members of these gene clusters.

Although it has been hypothesised that genes coding for proteins involved in multiple interactions, including ribosomal proteins, are less likely to be horizontally transferred (the complexity hypothesis [19]), horizontal gene transfer has been described for some ribosomal protein genes, including S14 and L27 [20–23]. The case of the S14 gene is especially intriguing, as there seem to have been recurrent transfers of this gene between various bacterial groups [20]. Other studies have demonstrated the importance of ribosomal protein gene duplications and lineage-specific gene loss [21,24]. This suggests that many evolutionary forces are involved in shaping the organisation of ribosomal protein gene clusters.

Now that bacterial genome sequences are published almost weekly, it is possible to compare the organisation of S10, spc and alpha gene clusters in a wide range of taxa. In the present study we compared the organisation of the S10, spc and alpha gene clusters in 99 sequenced bacterial genomes. Twelve archaeal genomes were included for comparison. We also compared the organisation of S10, spc and alpha gene clusters with groupings obtained by comparing 16S ribosomal RNA gene sequences and the sequences of multiple ribosomal proteins.

2 Materials and methods

2.1 Genome sequences

We downloaded 99 bacterial and 12 archaeal genome sequences from the GenBank database. If several strains from a single species were sequenced we only included one. An overview of all taxa included (including strain number and GenBank accession number) is given in Tables S1 and S2 that can be found as supplementary data online at http://allserv.ugent.be/~tcoenye/cepacia/page40.html.

2.2 Sequence alignment and numerical analysis

16S ribosomal RNA gene and amino acid sequences from ribosomal proteins L16, L22 and S14 were extracted from the whole-genome sequence. Sequences were aligned using the emma interface (EMBOSS). Tree construction and bootstrap analyses (100 replicates) were performed using the Bionumerics 3.5 (Applied Maths) and Treecon [25] software packages. Phylogenetic trees were constructed using the neighbour-joining method [26] (no specific substitution model was applied). In the case of the genetic organisation of the S10, spc and alpha gene clusters, all individual genes were considered as multistate characters. Genes were considered to belong to one of the following categories: (i) present in the genome in the same place and order as in the E. coli and/or B. subtilis genomes; (ii) present in the genome but in a different place and/or order than in the E. coli and/or B. subtilis genomes or (iii) absent from the genome. Trees were constructed using the Bionumerics 3.5 software package, using the categorical coefficient. Absence of a gene from a genome was confirmed by performing a BLASTP analysis [27], using the ribosomal protein sequence of the closest relative as the query sequence.
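The character scoring described above can be illustrated with a minimal sketch. It assumes that the categorical coefficient is the percentage of characters having an identical state in two genomes; the state labels, genome names and profiles below are hypothetical and are not data from this study, and the code is not the Bionumerics implementation.

```python
# Hypothetical labels for the three categories (i)-(iii) described in the text.
SAME, MOVED, ABSENT = "same", "moved", "absent"


def categorical_similarity(profile_a, profile_b):
    """Percentage of genes (characters) scored with the same state in both genomes."""
    if len(profile_a) != len(profile_b):
        raise ValueError("both profiles must score the same set of genes")
    matches = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return 100.0 * matches / len(profile_a)


def similarity_matrix(profiles):
    """All pairwise similarities, keyed by (genome, genome)."""
    names = sorted(profiles)
    return {
        (p, q): categorical_similarity(profiles[p], profiles[q])
        for p in names
        for q in names
    }


# Invented toy profiles over five genes (S10, L3, L30, secY, infA).
profiles = {
    "genome_A": [SAME, SAME, SAME, SAME, ABSENT],
    "genome_B": [MOVED, SAME, ABSENT, SAME, SAME],
    "genome_C": [SAME, SAME, SAME, SAME, SAME],
}

matrix = similarity_matrix(profiles)
for (p, q), value in sorted(matrix.items()):
    if p < q:
        print(p, q, value)
```

Because every gene is treated as an unordered multistate character, a gene that has moved and a gene that is absent are simply two different states; neither is considered closer to the E. coli arrangement than the other.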

3 Results and discussion

3.1 Organisation of S10, spc and alpha gene clusters in bacterial genomes

When we compared the organisation of S10, spc and alpha gene clusters in 99 sequenced bacterial genomes, 42 different organisations were observed (see Table S1). Based on this organisation we constructed a dendrogram, using the categorical coefficient (Fig. 1). A schematic overview of the organisations observed is given in Fig. 2. Most variation occurs in the 3′ half of the spc gene cluster and in the alpha gene cluster, while the S10 gene cluster appears to be more conserved. It is worth noting that only 14 organisms showed the same organisation as seen in E. coli.
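As an illustration of how a dendrogram can be derived from pairwise values by unweighted pair group average linkage (the method named in the legend of Fig. 1), the following is a minimal sketch in plain Python. The function name, the genome labels and the distances (taken here as 100 minus a similarity percentage) are assumptions made for the example; this is not the Bionumerics implementation.

```python
from itertools import combinations


def upgma(names, distance):
    """Average-linkage (UPGMA) clustering.

    names: list of labels.
    distance: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (tree, merges): tree is a nested tuple and merges lists
    (left, right, distance at which the two clusters were joined).
    """
    size = {name: 1 for name in names}
    d = {frozenset(pair): distance[frozenset(pair)] for pair in combinations(names, 2)}
    merges = []
    while len(size) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        merges.append((a, b, d[frozenset((a, b))]))
        # Distance to every other cluster is the size-weighted mean.
        for c in size:
            if c in (a, b):
                continue
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
    (tree,) = size
    return tree, merges


# Invented distances between three hypothetical genomes.
toy_distance = {
    frozenset(("genome_A", "genome_B")): 60.0,
    frozenset(("genome_A", "genome_C")): 20.0,
    frozenset(("genome_B", "genome_C")): 40.0,
}
tree, merges = upgma(["genome_A", "genome_B", "genome_C"], toy_distance)
print(tree)
print(merges)
```

In this toy case genome_A and genome_C are joined first, and genome_B is then attached at the mean of its distances to the two.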

Figure 1

Dendrogram derived from the unweighted pair group average linkage of categorical coefficients between the organisation of S10, spc and alpha gene clusters in sequenced bacterial genomes.

Figure 2

Schematic overview of the organisation of some S10, spc and alpha gene clusters in 99 bacterial genomes. The three lateral arrows below the gene names represent the operon organisation in Escherichia coli. +, presence; −, absence; x, found in another position in genome. Numbers in first column refer to organisation of the cluster as indicated in Table S1 (which taxa belong to which group can also be found in Table S1).

Many bacterial genomes do not encode all ribosomal proteins found in the S10–spc–alpha operons in E. coli (Table S1 and Fig. 2). While some of these gene losses appear to be specific for one or more lineages (for example L30 is absent from the genomes of members of the Chlamydiae, the ε-Proteobacteria, the Cyanobacteria and the mycoplasmas), other losses are restricted to one or a few members of a lineage (for example L2 is present in all bacterial genomes, except in that of Streptococcus mutans). Most gene loss is seen in Clostridium tetani: the genome of C. tetani appears to lack the genes that code for 10 ribosomal proteins found in the S10–spc–alpha operons in E. coli.

In a number of genomes, additional genes coding for non-ribosomal proteins can be found in the S10, spc and alpha gene clusters. In several cases the inserted genes encode unknown and/or hypothetical proteins (Table 1). Some of these are very short and it remains to be determined if these are true protein coding genes or open reading frames that occur by chance [28]. However, in several genomes, there is evidence for the insertion of true protein-coding genes in the S10, spc and alpha gene clusters (Table 1).

Table 1

Overview of non-ribosomal proteins inserted in the S10, spc and alpha gene clusters in sequenced bacterial genomes

| Organism | Inserted non-ribosomal genes | Location |
|---|---|---|
| Bacillus halodurans C125 | Hypothetical protein | Between map and infA |
| Bifidobacterium longum NCC2705 | Hypothetical protein | Between S13 and rpoA |
| Corynebacterium diphtheriae NCTC13129 | Putative secreted protein, putative ABC transport system ATP-binding protein and putative ABC transport system integral membrane protein | Between S17 and L14 |
| | Serine transporter, L-serine dehydratase and putative secreted amino acid hydrolase | Between L5 and S8 |
| | Putative transport protein, putative sugar binding secreted protein, putative sugar ABC transport system membrane protein and putative ABC transport system membrane protein | Between L15 and secY |
| | Putative sialidase precursor and putative secreted protein | Between map and infA |
| Corynebacterium efficiens YS-314 | 2 hypothetical proteins and putative glucose-6-phosphate dehydrogenase | Between L5 and S8 |
| | Hypothetical protein | Between map and infA |
| Corynebacterium glutamicum ATCC13032 | 2 hypothetical proteins | Between L5 and S8 |
| | Uncharacterised protein | Between map and infA |
| Coxiella burnetii RSA493 | Hypothetical protein | Between secY and S13 |
| Haemophilus ducreyi 35000HP | InsA and InsB | Between S17 and L14 |
| Helicobacter hepaticus ATCC51449 | Conserved hypothetical protein | Between L5 and S8 |
| Lactococcus lactis IL1403 | Unknown protein | Between S14 and S8 |
| | Hypothetical protein yvfC, IS 1077F transposase, hypothetical protein yvfD and IS 904H transposase | Between adk and infA |
| Mesorhizobium loti | 2 unknown proteins | Between L18 and L15 |
| Mycobacterium bovis AF2122/97 | Possible arylsulfatases ATSa and ATSb, conserved hypothetical protein and conserved transmembrane protein | Between S17 and L14 |
| | Possible protease IV sppA, possible D-xylulose kinase B and conserved hypothetical protein | Between L15 and secY |
| Mycobacterium leprae TN | Arylsulfatase pseudogene and 2 hypothetical proteins | Between S17 and L14 |
| | Possible protease IV sppA, possible D-xylulose kinase B and conserved hypothetical protein | Between L15 and secY |
| Mycobacterium tuberculosis H37Rv | Arylsulfatase and 2 hypothetical proteins | Between S17 and L14 |
| | Possible protease IV sppA, possible D-xylulose kinase B and conserved hypothetical protein | Between L15 and secY |
| Neisseria meningitidis Z2491 | 3 hypothetical proteins | Between S10 and L3 |
| Pirellula sp. strain 1 | Hypothetical protein | Between S10 and L3 |
| Thermoanaerobacter tengcongensis MB4T | Hypothetical protein | Between map and infA |

Multiple genes found in the S10–spc–alpha operons in E. coli are found outside these gene clusters in many of the bacterial genomes investigated (Table S1 and Fig. 2). Two different classes can be distinguished. In several genomes, genes coding for ribosomal proteins found in the S10–spc–alpha operons in E. coli are now found outside these clusters. These genes are not grouped together but are found at different positions, scattered throughout the genome. This is the case for Agrobacterium tumefaciens, Rhodopseudomonas palustris, Rickettsia conorii, Rickettsia prowazekii, S. meliloti, Neisseria meningitidis, Ralstonia solanacearum, Staphylococcus aureus, Pasteurella multocida, Photorhabdus luminescens, Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas syringae, Salmonella enterica, Shewanella oneidensis, Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Xanthomonas axonopodis, Xanthomonas campestris, Xylella fastidiosa, Prochlorococcus marinus, Synechococcus sp., Synechocystis sp., Thermosynechococcus elongatus, Pirellula sp. and Treponema pallidum. However, it appears that in other genomes, multiple genes found in the S10–spc–alpha operons in E. coli have formed novel, separate ribosomal gene clusters (data not shown). For example, in the Campylobacter jejuni genome the genes infA, L36, S13, S11, S4, rpoA and L17 form a separate gene cluster located in a different position of the genome. Whether these genes or gene clusters found outside the S10, spc and alpha operons represent horizontal gene transfer followed by deletion of the original gene or gene clusters in the S10, spc and alpha operons, or are the result of one or more genomic rearrangements within the genome, is at present not clear. In several bacterial genomes, genes located in the S10–spc–alpha operons in E. coli are found in other ribosomal gene clusters (data not shown). For example, the S10 gene is located in the S12 ribosomal gene cluster in the genome of all species of the Chlamydiae and the Cyanobacteria. Similarly, the S4 gene of Mycoplasma gallisepticum is also located in the S12 cluster. There are also several examples of changes in gene order within the S10, spc and alpha gene clusters; for example, in Thermotoga maritima, infA is localised at the 3′ end of the L17 gene, while in Mycoplasma penetrans, S3 is localised between S17 and L29.

The genome of several organisms included in this study consists of multiple replicons (A. tumefaciens, Brucella melitensis, Brucella suis, R. solanacearum, V. cholerae, V. vulnificus, V. parahaemolyticus, Leptospira interrogans and Deinococcus radiodurans). In all these organisms the S10, spc and alpha gene clusters were located on the largest replicon.

3.2 Organisation of S10, spc and alpha gene clusters in archaeal genomes

When we compared the organisation of S10, spc and alpha gene clusters in 12 sequenced archaeal genomes, 10 different organisations were observed (see Table S2). The organisation of the S10, spc and alpha gene clusters of Methanosarcina acetivorans, Pyrococcus furiosus, Archaeoglobus fulgidus, S. solfataricus, Halobacterium sp., Thermoplasma acidophilum and Methanothermobacter thermoautotrophicum is somewhat similar to the organisation in bacterial genomes, with most variation being localised in the 3′ half of the spc gene cluster and in the alpha gene cluster. However, the organisation of the S10, spc and alpha gene clusters of Aeropyrum pernix, Methanocaldococcus jannaschii, Methanopyrus kandleri, Nanoarchaeum equitans and Pyrobaculum aerophilum is totally different (Table S2), and these gene clusters actually appear to be non-existent in P. aerophilum and N. equitans.

We also noted the insertion of several genes coding for other ribosomal proteins between genes localised in the S10, spc and alpha gene clusters (data not shown). For example, the genes coding for ribosomal proteins L32 and L19 were inserted between the genes coding for L6 and L18 in all archaeal genomes (except in A. pernix, N. equitans and P. aerophilum), while L7 was inserted between S5 and L15 in T. acidophilum. The S10 gene is colocalised with the S12 gene cluster in all archaeal genomes (except those of P. furiosus, N. equitans and P. aerophilum) while the S4 gene is located between L24 and L5 in all archaeal genomes (except those of A. pernix, M. thermoautotrophicum, N. equitans and P. aerophilum). There are also several examples of genes normally found in the S10, spc and alpha gene clusters that now form a separate gene cluster on another location in the genome. This is for example the case for the S13, S4 and S11 genes in all archaeal genomes investigated, and the L3, L4 and L23 genes in A. pernix.

3.3 Phylogenies based on amino acid sequences of ribosomal proteins L22, L16 and S14 and comparison with phylogenies based on 16S rRNA gene sequences and on organisation of S10, spc and alpha gene clusters

The 16S rRNA gene has been widely used to infer phylogenetic relationships among prokaryotes. There is however considerable concern that single-gene trees may not adequately reflect phylogenetic relationships, because of the possibility of horizontal gene transfer. For this reason, the sequences of protein coding genes have been used to deduce phylogenetic relationships between organisms, including genes coding for ribosomal proteins [29,30]. Data from the present study indicate that, from the ribosomal proteins encoded by genes localised in the S10, spc and alpha gene clusters, L22 and L16 are the most “stable” genes (i.e. they are present in all bacterial genomes in the same location within the S10, spc and alpha gene clusters). As there is some evidence that the S14 gene might be horizontally transferred [20], we also included the S14 protein in our phylogenetic analysis.

A phylogenetic tree based on 16S rRNA gene sequences is shown in Fig. 3. Overall, the phylogenies derived from L16 and L22 sequences were similar to each other and to the phylogeny derived from the 16S rRNA gene sequences (Table 2, Fig. 4). The main discrepancies between the 16S rRNA gene sequence based tree and the tree based on L16 sequences were: (i) the close relationships between the Actinobacteria and the Firmicutes; (ii) the separate positions of the mycoplasmas and members of the genus Clostridium; (iii) the fact that the β-Proteobacteria appear as a subgroup of the γ-Proteobacteria; (iv) the fact that the δ-proteobacterium Geobacter sulfurreducens does not group with the other Proteobacteria; (v) the separate position of the spirochaete L. interrogans. The main discrepancies between the 16S rRNA gene sequence based tree and the tree based on L22 sequences were: (i) the fact that the β-Proteobacteria appear as a subgroup of the γ-Proteobacteria; (ii) the fact that the δ- and ε-Proteobacteria seem unrelated to each other and the other Proteobacteria; (iii) the separate position of L. interrogans; (iv) the close relationship between the Cyanobacteria and the Actinobacteria. The correlation between the grouping obtained based on 16S rRNA gene sequence similarity and S14 protein sequence similarities was lower, and several differences between both trees can be observed (Figs. 3 and 5). The main discrepancies between the 16S rRNA gene sequence based tree and the tree based on S14 sequences were: (i) the subdivision of the Actinobacteria; (ii) the separate position of Clostridium perfringens and Clostridium acetobutylicum; (iii) the separate position of Streptococcus pneumoniae; (iv) the separate positions of the δ- and ε-Proteobacteria. When comparing the sequence-based trees to the tree based on the organisation of the S10, spc and alpha gene clusters, several differences were noted.
Most remarkable were the diversity of the ε-Proteobacteria, and the positions of the clostridia, the spirochaetes, S. mutans, Gloeobacter violaceus, M. gallisepticum and M. penetrans in the tree based on the organisation of the S10, spc and alpha gene clusters (Fig. 1). The overall Pearson product moment correlation coefficients between organisational similarity and 16S rRNA gene, L16, L22 and S14 sequence similarity were high (86.7%, 77.4%, 76.9% and 88.0%, respectively) (Table 2, Fig. 4).
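A correlation between two similarity matrices, such as the values quoted above, can be computed along the following lines. This is a minimal sketch that assumes the Pearson product moment correlation is taken over the corresponding off-diagonal entries of the two matrices and expressed as a percentage; the function names and the 3 × 3 toy matrices are hypothetical and are not data from this study.

```python
from math import sqrt


def lower_triangle(matrix):
    """Entries below the diagonal of a square matrix given as a list of lists."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def matrix_correlation(m1, m2):
    """Correlation (in %) between the off-diagonal entries of two similarity matrices."""
    return 100.0 * pearson(lower_triangle(m1), lower_triangle(m2))


# Invented similarity matrices (in %) for the same three hypothetical genomes.
organisational = [
    [100, 80, 40],
    [80, 100, 60],
    [40, 60, 100],
]
sequence_based = [
    [100, 90, 70],
    [90, 100, 70],
    [70, 70, 100],
]

print(round(matrix_correlation(organisational, sequence_based), 1))
```

The diagonal is left out in this sketch because every genome is 100% similar to itself in both matrices, which would inflate the correlation without adding information.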

Figure 3

Phylogenetic tree based on 16S rRNA gene sequences. The scale bar represents 10% sequence dissimilarity.

Table 2

Pearson product moment correlation coefficients between the similarity matrices of the different data sets

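| | Organisational similarity | 16S rRNA gene | L16 | L22 | S14 |
|---|---|---|---|---|---|
| Organisational similarity | 100 | | | | |
| Sequence similarity of 16S rRNA gene | 86.7 | 100 | | | |
| Sequence similarity of L16 | 77.4 | 82.3 | 100 | | |
| Sequence similarity of L22 | 76.9 | 80.7 | 79.5 | 100 | |
| Sequence similarity of S14 | 88.0 | 72.7 | 73.9 | 74.5 | 100 |

Figure 4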

Concordance between the phylogeny derived from the organisation of the S10, spc and alpha gene clusters, and the phylogenies derived from the 16S rRNA gene sequences and the sequences of ribosomal proteins L16, L22 and S14.

Figure 5

Phylogenetic tree based on S14 sequences. The scale bar represents 10% sequence dissimilarity.

4 Conclusions

Although it was previously reported that the S10–spc–alpha operon, encoding ribosomal proteins and a number of housekeeping genes, was similar in all bacterial genomes [5,9], data from the present study clearly indicate that there is much variation in gene order and content in these gene clusters. Whether the differences in organisation are partially or entirely due to (i) genomic rearrangements, (ii) lineage-specific gene loss (preceded by gene duplications or not) and/or (iii) horizontal gene transfer is at present not clear. However, evidence for the role of horizontal gene transfer in the evolution of ribosomal proteins was presented before [20–23], and the observed discrepancies between phylogenetic trees based on 16S rRNA gene sequences and sequences of ribosomal proteins L16, L22 and S14 also suggest that horizontal gene transfer may have played a significant role in the evolution of the S10–spc–alpha operon. More detailed studies will be required to confirm this. Our data also indicate that differential gene loss has occurred on multiple occasions during evolution. In addition, the determination of the organisation of the S10, spc and alpha gene clusters can provide additional, sequence-independent information that can be used to deduce phylogenetic relationships between prokaryotes.

Acknowledgements

T.C. and P.V. are indebted to the Fund for Scientific Research – Flanders (Belgium) for a position as postdoctoral fellow and research grants, respectively. T.C. also acknowledges the support from the Belgian Federal Government (Federal Office for Scientific, Technical and Cultural Affairs).

References

  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].
  25. [25].
  26. [26].
  27. [27].
  28. [28].
  29. [29].
  30. [30].