
Phylogenetic reconstruction of Gram-positive organisms based on comparative sequence analysis of molecular chaperones from the ruminal microorganism Ruminococcus flavefaciens FD-1

Dionysios A. Antonopoulos, W. Michael Russell, Bryan A. White
DOI: http://dx.doi.org/10.1016/S0378-1097(03)00597-4 1-7 First published online: 1 October 2003


Primers designed on the basis of nucleotide sequences conserved in DnaK and GroEL from Gram-positive organisms were used to PCR-amplify internal regions of the cognate genes from the anaerobic ruminal cellulolytic bacterium Ruminococcus flavefaciens FD-1. Genome walking was then used to determine the remainder of each gene, together with its upstream and downstream regions. The complete gene encoding the GroES protein (groES) was found directly upstream of groEL. The deduced amino acid sequence of the groEL gene was most similar to that of the Clostridium thermocellum GroEL protein (72% amino acid identity). Similarly, the translated groES nucleotide sequence was most similar to the C. thermocellum GroES protein (61% amino acid identity). Analysis of the region upstream of this chaperonin operon revealed a CIRCE regulatory element 45 bp upstream of the putative start of the groES ORF. The deduced amino acid sequence of the putative dnaK gene was most similar to that of the Clostridium acetobutylicum DnaK protein (68% amino acid identity). Phylogenetic analyses based on the translated sequences support this relationship between R. flavefaciens and the Clostridia. However, when the nucleotide sequences of Gram-positive organisms are analyzed, the relationship between high- and low-G+C Gram-positive organisms shows a topology different from the 16S rRNA-based interpretation.

  • Heat shock
  • Chaperone
  • Ruminococcus
  • DnaK
  • GroES
  • GroEL

1 Introduction

Molecular chaperones undergo increased synthesis in response to various physical and chemical stimuli. Their action and regulation were first modeled in the Escherichia coli/bacteriophage λ system, through work on the isolation of temperature-sensitive mutations, and much of the nomenclature has persisted from these initial studies [1]. Initially described as ‘heat shock proteins’ because their expression increases upon an upshift in temperature, they are now recognized as playing more fundamental roles in cellular physiology: they assist in protein folding during and after translation regardless of the thermal environment [2,3]. Because the response to thermal shifts overlaps with responses to other forms of stress, the heat shock system is a more globally oriented control system than previously thought. In bacteria, chaperones play a central role in the availability of sigma factors, which induce conformational changes in RNA polymerase that increase its binding efficacy to particular sets of promoters, so that entire sets of genes can be coordinately expressed [4]. In a eukaryotic context, it has also been hypothesized that chaperones play a role in basic signaling systems by ‘poising’ intracellular receptors in a conformation that allows maximum sensitivity to their respective signals [5].

The long-term goal of our research has been to advance our understanding of the molecular biology of plant cell wall degradation by Ruminococcus flavefaciens FD-1, which is highly active against higher-order forms of cellulose [6]. This bacterium is one of the predominant cellulolytic bacteria isolated from the rumen [7–9]. Individual cellulase genes (celA, celB, celD, and celE) have been isolated from this strain and characterized, but only preliminary research has been conducted on their regulation [10–14]. Understanding the regulation of cellulases at the post-translational level requires exploring more general regulatory circuits, such as those involving molecular chaperones. Because chaperones assist in general protein folding and stability, in the context of cellulose degradation they may be involved both in the assembly of the cellulase complex and in altering the conformation of cellulase proteins to regulate their activities. More recently, thermostable homologs of the chaperonins GroES and GroEL from Clostridium thermocellum have been cloned and expressed; these were initially described in relation to enhancing yields of cellulosome-related proteins [15]. As a first step towards examining the expression of heat shock proteins relative to the gene products involved in cellulolysis by R. flavefaciens FD-1, we determined the DNA sequences encoding the chaperone proteins DnaK, GroES, and GroEL. Phylogenetic analyses of the deduced amino acid sequences preserve the common ancestry of R. flavefaciens and the Clostridia, in agreement with the 16S rRNA-based phylogeny. However, the GroEL and DnaK sequences indicate an alternative ordering of the larger low-G+C and high-G+C Gram-positive clades with respect to the Proteobacteria. An analysis of the nucleotide sequences of chaperones from Gram-positive organisms reveals large-scale differences that divide them into two distinct clusters not seen in the 16S rRNA-based phylogeny.

2 Materials and methods

2.1 Organisms and culture conditions

R. flavefaciens FD-1 from the Department of Animal Sciences culture collection was used as the source of genomic DNA while E. coli TOP10 cells (Invitrogen, Carlsbad, CA, USA) were used in all cloning applications.

R. flavefaciens FD-1 was cultivated in a defined medium consisting of 5% v/v mineral solutions (Mineral Solutions 1 and 2, per liter [16]), 0.001% w/v hemin, 0.4% w/v Na₂CO₃, 1% v/v VFA solution [17], 0.5% v/v vitamin solution [18], 0.01% w/v dithiothreitol, 0.038% w/v Na₂S, 0.01% w/v resazurin, 20 mM NH₄Cl, and 10 mM cellobiose. Cells were grown at 37°C in crimped, butyl rubber-stoppered bottles (Bellco) under a 95% CO₂/5% H₂ atmosphere.
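The recipe above mixes percentage and molar units. As a minimal sketch, the hypothetical helper below converts the solid components into grams for an arbitrary batch volume. Two assumptions are added here and are not taken from the original: the Na₂CO₃ percentage is read as w/v, and standard molar masses are used for NH₄Cl (53.49 g mol⁻¹) and cellobiose (342.30 g mol⁻¹). The liquid stock solutions are left out because their volumes follow directly from the % v/v values.

```python
# Hypothetical helper: grams of each solid component of the Section 2.1
# defined medium for a given batch volume. Not part of the original methods.

# Standard molar masses (g/mol); supplied here, not taken from the text.
MOLAR_MASS_G_PER_MOL = {"NH4Cl": 53.49, "cellobiose": 342.30}

# (component, amount, unit) as listed in the text.
# Na2CO3 is given only as "0.4%" in the source; w/v is an assumption.
SOLIDS = [
    ("hemin", 0.001, "% w/v"),
    ("Na2CO3", 0.4, "% w/v"),
    ("dithiothreitol", 0.01, "% w/v"),
    ("Na2S", 0.038, "% w/v"),
    ("resazurin", 0.01, "% w/v"),
    ("NH4Cl", 20.0, "mM"),
    ("cellobiose", 10.0, "mM"),
]


def solids_for_batch(volume_ml: float) -> dict[str, float]:
    """Return grams of each solid needed for `volume_ml` of medium."""
    grams: dict[str, float] = {}
    for name, amount, unit in SOLIDS:
        if unit == "% w/v":
            # % w/v means grams per 100 ml
            grams[name] = amount / 100.0 * volume_ml
        elif unit == "mM":
            # mmol/L -> mol in this volume, then multiply by molar mass
            moles = amount / 1000.0 * (volume_ml / 1000.0)
            grams[name] = moles * MOLAR_MASS_G_PER_MOL[name]
        else:
            raise ValueError(f"unsupported unit: {unit}")
    return grams


if __name__ == "__main__":
    for component, g in solids_for_batch(1000.0).items():
        print(f"{component:15s} {g:.4f} g")
```

For 1 litre this gives 4 g of Na₂CO₃, about 1.07 g of NH₄Cl, and about 3.42 g of cellobiose.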

Transformed E. coli cells were grown in LB medium supplemented with 100 µg ml⁻¹ ampicillin (Sigma) for selection and maintenance of plasmids.

2.2 Polymerase chain reaction (PCR) of internal fragments based on conserved sequences

Primers were designed from alignments of Gram-positive GroEL and DnaK nucleotide sequences (Bacillus subtilis MB11, B. subtilis subsp. marburg 168, B. subtilis, B. stearothermophilus NUB36, Clostridium acetobutylicum DSM 1731, C. perfringens, and Lactococcus lactis) and used to PCR-amplify portions of the genes (Table 1; primers GroELFW, GroELRV, DnaKFW, and DnaKRV). PCR amplifications used 10-ng aliquots of R. flavefaciens FD-1 genomic DNA (extracted using an adaptation of a general laboratory protocol [19]) and the following program: 95°C for 5 min; 30 cycles of 95°C for 30 s, 65°C for 1 min, and 72°C for 2 min; a final extension at 72°C for 7 min; and a 4°C hold. All reactions were run on a Perkin Elmer GeneAmp 2400 using TaKaRa Taq polymerase (0.05 U µl⁻¹ final concentration), PCR buffer (1× final concentration), and dNTPs (1 mM final concentration). Primers were used at a final concentration of 1 pmol µl⁻¹. Once the internal sequences had been determined, primers were designed to amplify the upstream and downstream regions of the genes from genome walking libraries (Table 1; primers GROEL5′RV, GROEL3′FW, DNAK5′RV, and DNAK3′FW).
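The internal-fragment program in this section, and the genome-walking program in Section 2.3, can be encoded as data, which makes them easy to compare or total up. The sketch below is illustrative only: the class names are hypothetical, the 72°C step of the internal-fragment program is read as the per-cycle extension step, and ramp times and the indefinite 4°C hold are ignored, so an instrument's actual run time will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees C
    seconds: int    # hold time at that temperature


@dataclass(frozen=True)
class Program:
    initial: tuple[Step, ...]   # run once before cycling
    cycle: tuple[Step, ...]     # repeated `cycles` times
    cycles: int
    final: tuple[Step, ...]     # run once after cycling (4 C hold omitted)

    def hold_seconds(self) -> int:
        """Total programmed hold time; ramping between steps is ignored."""
        once = sum(s.seconds for s in self.initial + self.final)
        return once + self.cycles * sum(s.seconds for s in self.cycle)


# Internal-fragment program (Section 2.2)
INTERNAL_FRAGMENT = Program(
    initial=(Step(95, 300),),
    cycle=(Step(95, 30), Step(65, 60), Step(72, 120)),
    cycles=30,
    final=(Step(72, 420),),
)

# Genome-walking program (Section 2.3)
GENOME_WALK = Program(
    initial=(Step(95, 300),),
    cycle=(Step(95, 30), Step(65, 30), Step(72, 60)),
    cycles=30,
    final=(Step(72, 420),),
)

if __name__ == "__main__":
    for name, prog in [("internal", INTERNAL_FRAGMENT), ("walk", GENOME_WALK)]:
        print(name, prog.hold_seconds() / 60, "min")
```

Under these assumptions the internal-fragment program totals 117 min of hold time and the genome-walking program 72 min.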

Table 1

Chaperone-derived primers utilized in PCR and genome walking

Primer name | Primer sequence | Primer location relative to deposited sequence

2.3 Construction of genome walking libraries

Genomic DNA was extracted from R. flavefaciens FD-1 using an adaptation of a general laboratory protocol [19]. Genome walking libraries were constructed using the Universal GenomeWalker Kit (Clontech, CA, USA). Genomic DNA of R. flavefaciens FD-1 was digested according to the manufacturer's protocol with each of nine restriction enzymes separately: AatII, BsmI, DraI, EcoRV, HpaI, PvuII, ScaI, SspI, and StuI. Notably, EcoRV was the only enzyme that completely digested R. flavefaciens FD-1 DNA at 50 ng µl⁻¹.
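To anticipate how a sequenced region will be fragmented in each library, an in silico digest can be run. The sketch below is a simplification added for illustration. It covers only the seven blunt-cutting enzymes in the list (DraI, EcoRV, HpaI, PvuII, ScaI, SspI, StuI), each of which recognizes a 6-bp palindrome and cuts between its third and fourth bases; AatII and BsmI leave overhangs and are omitted. It assumes complete digestion of a linear sequence, so it cannot reproduce the incomplete digestion observed with most enzymes here, and the demonstration sequence is made up.

```python
import re

# Recognition sites (5'->3') of the blunt cutters; all are palindromic,
# so searching one strand finds every site, and all cut after base 3.
BLUNT_SITES = {
    "DraI": "TTTAAA",
    "EcoRV": "GATATC",
    "HpaI": "GTTAAC",
    "PvuII": "CAGCTG",
    "ScaI": "AGTACT",
    "SspI": "AATATT",
    "StuI": "AGGCCT",
}
CUT_OFFSET = 3


def cut_positions(seq: str, site: str) -> list[int]:
    """0-based cut positions in a linear sequence (overlapping sites allowed)."""
    return [m.start() + CUT_OFFSET for m in re.finditer(f"(?={site})", seq.upper())]


def fragment_lengths(seq: str, enzyme: str) -> list[int]:
    """Fragment sizes after complete digestion with one enzyme."""
    bounds = [0, *cut_positions(seq, BLUNT_SITES[enzyme]), len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    demo = "CCCC" + "GATATC" + "GGGGGG" + "GATATC" + "AA"   # made-up sequence
    for enzyme in BLUNT_SITES:
        print(enzyme, fragment_lengths(demo, enzyme))
```

In the demonstration, only EcoRV cuts, yielding fragments of 7, 12, and 5 bp; every other enzyme leaves the 24-bp sequence intact.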

The restriction digests were then ligated to adaptor sequences (GenomeWalker Adaptors), producing pools of adaptor-ligated genomic DNA fragments (‘libraries’). Regions flanking the PCR-amplified internal gene fragments were amplified using a primer based on the internal fragment sequence together with primers from the GenomeWalker Kit (Table 1). PCR amplifications used 10-ng aliquots of each library and a basic three-step program: 95°C for 5 min; 30 cycles of 95°C for 30 s, 65°C for 30 s, and 72°C for 1 min; a final extension at 72°C for 7 min; and a 4°C hold. The thermal cycler and the Taq polymerase, buffer, dNTP, and primer concentrations were as described in Section 2.2. Once these flanking sequences had been determined, a further primer was designed from them to amplify the remaining C-terminal portion of the dnaK gene from the other libraries (Table 1; primer DNAK1771CL).
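Each genome-walking product overlaps the already known sequence at its gene-specific primer, so extending the known region amounts to joining two sequences on their shared end. The hypothetical helper below sketches that step as a longest exact suffix/prefix overlap. Dedicated tools, such as the Sequencher package listed in Section 2.4, also tolerate sequencing errors, which this exact-match version does not. The sequences in the example are made up.

```python
def merge_on_overlap(known: str, walked: str, min_overlap: int = 20) -> str:
    """Append `walked` to the 3' end of `known` using the longest exact
    overlap between the end of `known` and the start of `walked`."""
    known, walked = known.upper(), walked.upper()
    longest = min(len(known), len(walked))
    # Try the longest overlap first so short internal repeats do not cause a wrong join.
    for k in range(longest, min_overlap - 1, -1):
        if known.endswith(walked[:k]):
            return known + walked[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


if __name__ == "__main__":
    internal = "ACGTACGTTTGACCA"      # made-up known sequence
    downstream = "TTGACCAGGGCCC"     # made-up 3' walk product
    print(merge_on_overlap(internal, downstream, min_overlap=5))
```

For a 5′ (upstream) walk, the same function applies with the arguments swapped, `merge_on_overlap(walk_product, known)`, once the walk product has been put in the same orientation as the known sequence.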

2.4 Cloning and sequence analysis methods

Gene fragments initially obtained by PCR were cloned using the Original TA Cloning Kit and the TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA, USA). Clones were sequenced by the Core Sequencing Facility of the W. M. Keck Center for Comparative and Functional Genomics on the UIUC campus, using a Perkin-Elmer cycler and dye terminator chemistry; sequences were read on an automated sequencer (Model 377, Applied Biosystems). Sequence analysis, including editing, alignment, and design of primers for further amplification and sequencing, was conducted with GeneWorks 2.5.1 for Macintosh (IntelliGenetics, Inc., Mountain View, CA, USA) and Sequencher (Gene Codes Corporation, Ann Arbor, MI, USA). ClustalX 1.8 was used to align sequences and to construct neighbor-joining trees [20]. Maximum parsimony and maximum likelihood analyses were performed with PHYLIP 3.6a3 (Seqboot and Consense in conjunction with Dnapars and Protpars) and fastDNAml 1.2.2 (using Seqboot for input and Consense for output from PHYLIP) [21,22,23]. Component 2.0 was used to determine correlations between trees constructed by different methods [24]. TreeView and MacDraw were used to prepare publication-quality tree figures [25]. The ARB package was used to analyze 16S rRNA sequences [26]. The R. flavefaciens FD-1 chaperone sequences have been deposited in GenBank under accession numbers AY189822 (dnaK) and AY189823 (groES and groEL).
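Neighbor joining, used here through ClustalX, builds a tree from a matrix of pairwise distances by repeatedly joining the pair of nodes that minimizes the Q-criterion. The sketch below is the textbook Saitou–Nei procedure, not ClustalX's own code, which may differ in distance correction and tie-breaking. The function name is hypothetical, and the example matrix is a standard additive teaching example rather than data from this study.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns an unrooted tree in Newick format with branch lengths.
    Ties in the Q-criterion are broken by the first pair encountered.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # D[x][y]: current distance between (possibly composite) nodes x and y
    D = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    active = list(names)
    while len(active) > 3:
        n = len(active)
        r = {a: sum(D[a][b] for b in active if b != a) for a in active}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        D[u] = {}
        for c in active:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        active = [c for c in active if c not in (a, b)] + [u]
    # Join the last three nodes at a single central node
    a, b, c = active
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    lb = 0.5 * (D[a][b] + D[b][c] - D[a][c])
    lc = 0.5 * (D[a][c] + D[b][c] - D[a][b])
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, matrix))
```

Because this matrix is additive, neighbor joining recovers its tree exactly and prints `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. Bootstrap support, as in the figures below, comes from repeating this on resampled alignments and summarizing the resulting trees.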

3 Results and discussion

Degenerate oligonucleotide primers based on conserved amino acid sequences of molecular chaperones were used to PCR-amplify internal regions of dnaK and groEL from R. flavefaciens FD-1 (Table 1; primers GroELFW, GroELRV, DnaKFW, and DnaKRV). Genome walking was then used to clone and sequence the entire groEL and dnaK genes. The deduced amino acid sequence of the dnaK gene (622 amino acids, 1866 bp) was most similar to that of the C. acetobutylicum DnaK protein (68% amino acid identity according to BlastX). The only regulatory feature readily identified in the 2553-bp cloned sequence is a putative terminator, an inverted repeat (GGGCGAAAtccgTTTCGCCC) located 28 bp downstream of the putative stop codon. This inverted repeat is followed by five TAAT motifs of unknown function that do not translate into repeating amino acid motifs.
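The terminator called here is recognized by its stem: the first eight bases, GGGCGAAA, are the reverse complement of the last eight, TTTCGCCC, separated by a 4-base loop. A minimal exact-match scan for such stem-loops is sketched below. The function names and loop-length limits are illustrative assumptions; dedicated terminator predictors also score stem stability and tolerate mismatches, which this sketch does not.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq: str, stem: int, min_loop: int = 3, max_loop: int = 10):
    """Return (start, stem, loop) for every exact inverted repeat in `seq`
    whose arms are `stem` bases long and separated by a loop of
    `min_loop`..`max_loop` bases."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the right-hand arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, stem, loop))
    return hits


if __name__ == "__main__":
    terminator = "GGGCGAAAtccgTTTCGCCC"   # dnaK downstream inverted repeat
    print(find_stem_loops(terminator, stem=8))
```

Applied to the reported repeat, the scan prints `[(0, 8, 4)]`: an 8-bp stem starting at the first base with a 4-base loop.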

A 2178-bp DNA fragment containing the groES and groEL genes of R. flavefaciens FD-1 was also cloned and sequenced. The deduced amino acid sequence of the groES gene (90 amino acids, 270 bp) was most similar to the C. thermocellum GroES protein (61% identity), and that of the groEL gene (542 amino acids, 1626 bp) was most similar to the C. thermocellum GroEL protein (72% identity). We also identified a CIRCE (controlling inverted repeat of chaperone expression) regulatory element 45 bp upstream of this chaperonin operon. This conserved element is an inverted repeat (AGCACTCaaagaaagaGAGTGCT) and is one of the few regulatory features of R. flavefaciens FD-1 shared with other model systems. In C. acetobutylicum the inverted repeat reads TTAGCACTCaagattaacGAGTGCTAA, and in B. subtilis TTAGCACTCtttagtgctGAGTGCTAA. In B. subtilis, regulation of the groESL operon (co-transcription of groES and groEL) has been shown to involve a σA-like promoter and this signature sequence [27]. Owing to its inverted-repeat structure, CIRCE acts as a negative regulatory element: when bound by its repressor, it tightly represses transcription of the downstream genes [28]. Under heat shock conditions, the repressor is inactivated and the genes downstream of CIRCE are expressed at high levels.
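The three CIRCE elements quoted above share a 7-bp arm, AGCACTC, and its reverse complement, GAGTGCT, separated by a 9-bp spacer. A sketch of a scan for that pattern, allowing a small number of mismatches in the arms, follows. The 9-bp spacer and the mismatch threshold are assumptions drawn from these three examples rather than a published CIRCE definition, and the names are hypothetical. Because the two arms are reverse complements of each other, the pattern reads the same on both strands, so scanning one strand suffices.

```python
LEFT_ARM = "AGCACTC"    # core half-site shared by the three quoted elements
RIGHT_ARM = "GAGTGCT"   # reverse complement of LEFT_ARM
SPACER = 9              # spacer length in all three quoted examples


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_circe(seq: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (start, mismatch_count) for CIRCE-like matches in `seq`.

    Mismatches are counted over both arms; spacer bases are unconstrained.
    """
    seq = seq.upper()
    span = len(LEFT_ARM) + SPACER + len(RIGHT_ARM)
    hits = []
    for i in range(len(seq) - span + 1):
        left = seq[i:i + len(LEFT_ARM)]
        right = seq[i + len(LEFT_ARM) + SPACER:i + span]
        mm = mismatches(left, LEFT_ARM) + mismatches(right, RIGHT_ARM)
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


if __name__ == "__main__":
    examples = {
        "R. flavefaciens FD-1": "AGCACTCaaagaaagaGAGTGCT",
        "C. acetobutylicum": "TTAGCACTCaagattaacGAGTGCTAA",
        "B. subtilis": "TTAGCACTCtttagtgctGAGTGCTAA",
    }
    for organism, element in examples.items():
        print(organism, scan_circe(element))
```

Each quoted element yields a single perfect match, at position 0 for R. flavefaciens and at position 2 for the two longer elements.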

Heat shock proteins have also been shown to be expressed constitutively at low levels, where they assist in the proper folding of proteins as they are translated. Because of their intimate involvement with the translational machinery of the cell, their highly conserved sequences lend themselves to phylogenetic scrutiny and have proven useful in resolving disputes over 16S rDNA sequence-based phylogenies at the species level. Based on amino acid similarity, the molecular chaperones GroES, GroEL, and DnaK from R. flavefaciens FD-1 cluster with the Clostridia (Fig. 1). At the level of large groups (as opposed to specific genera), this relationship matches the prediction of the 16S rRNA-based phylogeny: the Proteobacteria and the low- and high-G+C Gram-positive bacteria each form cohesive clusters. However, compared with the GroEL- and DnaK-derived trees, the GroES-based phylogeny has low bootstrap support and little power to establish the branching order of individual genera, let alone species (data not shown). This appears to owe more to the shorter overall length of GroES than to a lack of conserved informational content.
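The suggestion that GroES resolves poorly because it is short can be illustrated with the mechanics of the bootstrap itself. Bootstrap replicates resample alignment columns with replacement, so the spread of any estimate across replicates shrinks roughly as 1/√L for an alignment of L columns. The sketch below uses synthetic protein pairs (not the R. flavefaciens sequences) with a fixed 30% difference at the GroES (90) and GroEL (542) lengths reported in the text; all function names are hypothetical.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def p_distance(a: str, b: str, columns) -> float:
    """Fraction of the given alignment columns at which a and b differ."""
    columns = list(columns)
    return sum(a[c] != b[c] for c in columns) / len(columns)


def bootstrap_p_distances(a: str, b: str, replicates: int = 1000, seed: int = 0) -> list[float]:
    """p-distance for each bootstrap replicate (columns resampled with replacement)."""
    rng = random.Random(seed)
    n = len(a)
    return [p_distance(a, b, [rng.randrange(n) for _ in range(n)]) for _ in range(replicates)]


def synthetic_pair(length: int, frac_diff: float, seed: int = 1) -> tuple[str, str]:
    """Made-up aligned protein pair differing at round(frac_diff * length) sites."""
    rng = random.Random(seed)
    a = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    b = list(a)
    for i in rng.sample(range(length), round(frac_diff * length)):
        b[i] = rng.choice([x for x in AMINO_ACIDS if x != a[i]])
    return "".join(a), "".join(b)


if __name__ == "__main__":
    # 90 and 542 residues: the GroES and GroEL lengths reported in the text
    for length in (90, 542):
        a, b = synthetic_pair(length, 0.30)
        reps = bootstrap_p_distances(a, b)
        print(length, round(statistics.mean(reps), 3), round(statistics.stdev(reps), 3))
```

The shorter alignment should show a standard deviation roughly √(542/90) ≈ 2.5 times larger, which in turn tends to make tree topologies less consistent across replicates.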

Figure 1

Comparison of phylogenies based on the predicted GroEL and DnaK amino acid sequences from R. flavefaciens FD-1. The ClustalX alignments used to construct these phylogenies were 511 and 574 amino acid positions long, respectively. Phylogenies are neighbor-joining trees with 1000 bootstrap replicates. Maximum parsimony analysis with Protpars strongly supports the neighbor-joining GroEL phylogeny (based on quartet analysis of the neighbor-joining and maximum parsimony datasets by Component 2.0).

When phylogenetic comparisons are restricted to eubacteria, as in Fig. 1, the relationship between the Proteobacteria and the low- and high-G+C Gram-positive organisms is not well resolved. The neighbor-joining trees shown are not well supported: bootstrap values at the major branch points are low, and other methods of phylogenetic inference, such as maximum likelihood, do not support them either. When the nucleotide sequences are used for phylogenetic reconstruction instead, the trees collapse into star-like arrangements of clades (data not shown). As previously demonstrated for other heat shock protein-based phylogenies built with alternative methods, the order of the major groups is disrupted relative to the 16S rRNA-based phylogeny [29]. Gupta et al. [29] have used the presence or absence of indels (insertions–deletions) as an additional character of relatedness, on the premise that an insertion present in an ancestor of a group of organisms will be maintained in its descendants. Using this approach, they have demonstrated a tree topology different from the 16S rRNA-based interpretation [30].
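The indel criterion of Gupta et al. can be reduced, for illustration, to a presence/absence call over a fixed window of alignment columns. The sketch below is a simplified assumption, not their procedure: the taxa and alignment are made up, the function names are hypothetical, and the real approach also requires the indel to be flanked by conserved regions, which is not checked here.

```python
def indel_states(alignment: dict[str, str], start: int, end: int) -> dict[str, str]:
    """Call the indel in alignment columns [start, end) for each sequence:
    'present' if no gaps, 'absent' if all gaps, otherwise 'ambiguous'."""
    states = {}
    for name, seq in alignment.items():
        window = seq[start:end]
        if "-" not in window:
            states[name] = "present"
        elif set(window) == {"-"}:
            states[name] = "absent"
        else:
            states[name] = "ambiguous"
    return states


def group_by_state(states: dict[str, str]) -> dict[str, list[str]]:
    """Invert the calls so that taxa sharing the character are listed together."""
    groups: dict[str, list[str]] = {}
    for name, state in states.items():
        groups.setdefault(state, []).append(name)
    return groups


if __name__ == "__main__":
    toy = {  # made-up aligned fragments; columns 3-5 hold the candidate insertion
        "taxonA": "MKVGDELLT",
        "taxonB": "MKIGNELLS",
        "taxonC": "MKV---LIT",
        "taxonD": "MRV-D-LLT",
    }
    print(group_by_state(indel_states(toy, 3, 6)))
```

Here taxonA and taxonB share the insertion, taxonC lacks it, and taxonD is left unresolved because its window is only partly gapped.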

When phylogenetic analysis is performed at the nucleotide level on Gram-positive organisms only, two clusters emerge that do not appear in a 16S rRNA-based analysis of the same organisms. We attribute this to the greater conservation of the 16S rRNA molecules (a slower ‘molecular clock’) compared with the groEL and dnaK sequence sets. Many of the coherent 16S rRNA-based groupings (Fig. 2A), including that of the Clostridia, are disrupted in the chaperone nucleotide phylogenies. C. perfringens falls in the same main cluster as the Bacilli, as do the R. flavefaciens FD-1 sequences, whereas C. tetani does not. In the groEL phylogeny, C. acetobutylicum groups with C. tetani, whereas in the dnaK phylogeny it groups with C. perfringens. In neither case do the R. flavefaciens FD-1 chaperone genes cluster closely with the clostridial representatives.
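One way to see why a faster-evolving gene set behaves differently from the slowly evolving 16S rRNA is through a correction for multiple substitutions at the same site. As a hedged illustration, and not necessarily the distance measure used for the trees in Fig. 2, the Jukes–Cantor (JC69) correction converts an observed proportion of differing sites p into an estimate of substitutions per site, d = −(3/4) ln(1 − 4p/3). The gap between d and p widens as p rises, so highly divergent sequences approach saturation. The helper names are hypothetical.

```python
import math


def nucleotide_p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """JC69 estimate of substitutions per site from an observed p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.05, 0.15, 0.25, 0.40):
        print(f"p = {p:.2f}  ->  d = {jukes_cantor(p):.3f}")
```

At p = 0.05 the correction is small, but at p = 0.40 the estimated distance is roughly 0.57, over 40% larger than the raw proportion.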

Figure 2

Comparison of phylogenies of primarily Gram-positive organisms based on (A) the canonical 16S rRNA tree, (B) groEL nucleotide sequences, and (C) dnaK nucleotide sequences, including those from R. flavefaciens FD-1. The 16S rRNA phylogeny is drawn according to the broader universal phylogeny available in the ARB package; when the phylogeny is based on an alignment of these organisms only, the tree collapses into a star phylogeny with no apparent natural ordering. Chaperone phylogenies are neighbor-joining trees with 1000 bootstrap replicates. Maximum parsimony analysis with Dnapars strongly supports the neighbor-joining groEL phylogeny, whereas maximum likelihood analysis with fastDNAml strongly supports the neighbor-joining dnaK phylogeny (>90% confidence based on quartet analysis of the datasets by Component 2.0).
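Component 2.0's quartet comparison is not reproduced here. As a simpler and different measure of agreement between trees built by different methods, the sketch below computes the Robinson–Foulds distance: the number of non-trivial bipartitions (splits) found in one tree but not the other. Trees are written as nested tuples, the helper names are hypothetical, and the example topologies are made up.

```python
def leaves(tree) -> frozenset:
    """All leaf names in a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree) -> set:
    """Non-trivial splits, each stored as the side that excludes a fixed anchor taxon."""
    taxa = leaves(tree)
    anchor = min(taxa)
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if anchor in clade else clade
        if 1 < len(side) < len(taxa) - 1:   # both sides need at least two taxa
            found.add(side)
        return clade

    walk(tree)
    return found


def robinson_foulds(t1, t2) -> int:
    """Number of splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must contain the same taxa")
    return len(splits(t1) ^ splits(t2))


if __name__ == "__main__":
    tree_a = ((("A", "B"), "C"), ("D", "E"))   # made-up topologies
    tree_b = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(tree_a, tree_b))
```

Identical topologies give 0; the two example trees, which differ only in the placement of B and C, give 2. Unlike quartet methods, this measure counts whole splits, so a single misplaced taxon can change several of them.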

We are interested in determining whether the heat shock system of R. flavefaciens FD-1 is involved in regulating cellulase expression. Our reasoning stems from observations in other organisms that the heat shock system and its constituent proteins are upregulated not only under thermal duress but also under other stressful environmental conditions, such as acidic conditions [31]. Their role in maintaining protein conformation is further generalized by their involvement in translation, which becomes especially important in the assembly of multi-protein complexes such as cellulosomes (multi-enzyme complexes involved in fiber degradation). By identifying features with precedents in other well-described systems (e.g. the CIRCE regulatory element), we hope to uncover, indirectly, those involved in cellulase regulation. Recent work on heat shock proteins in C. thermocellum that are putatively associated with cellulosomes further supports the idea that these regulatory circuits intersect [15]. Reports identifying cellulolytic enzymes and their interactions within larger multi-protein structures are numerous in the literature, but nothing is known about how these enzymes are transported or excreted to the cell surface. How their structural integrity is maintained, or how they are kept in a form conducive to transport, is a fundamental question; with the sequence information now in hand, expression studies of heat shock transcripts can be conducted to investigate the influence of these proteins on cellulose degradation by R. flavefaciens FD-1.


The authors would like to acknowledge Cynthia Murphy for helpful discussions in the early stage of the sequencing work. This project was supported by U.S. Department of Agriculture grants no. ILL35-0389, ILL35-0524 and ILL35-0538 and by the Agricultural Experimental Station of the University of Illinois.


  [1]
  [2]
  [3]
  [4]
  [5]
  [6]
  [7]
  [8]
  [9]
  [10]
  [11]
  [12]
  [13]
  [14]
  [15]
  [16]
  [17]
  [18]
  [19]
  [20]
  [21]
  [22]
  [23]
  [24]
  [25]
  [26]
  [27]
  [28]
  [29]
  [30]
  [31]