
Novel modular enzymes encoded by a cellulase gene cluster in Cellvibrio mixtus

Maria S.J. Centeno, Arun Goyal, José A.M. Prates, Luís M.A. Ferreira, Harry J. Gilbert, Carlos M.G.A. Fontes
DOI: http://dx.doi.org/10.1111/j.1574-6968.2006.00464.x Pages 26-34. First published online: 1 December 2006


Hydrolysis of plant cell wall polysaccharides, a process which is of intrinsic biological and biotechnological importance, requires the concerted action of an extensive repertoire of microbial cellulases and hemicellulases. Here, we report the identification of the gene cluster unk16A, regA and cel5B in the aerobic soil bacterium Cellvibrio mixtus, encoding a family 16 (CmUnk16A) glycoside hydrolase (GH), an AraC/XylS transcription activator (CmRegA) and a family 5 (CmCel5B) endo-glucanase, respectively. CmUnk16A is a modular enzyme comprising, in addition to the catalytic domain, two family 32 carbohydrate-binding modules (CBMs), termed CBM32-1 and CBM32-2, a CBM4 and a domain of unknown function. We show that CBM32-2 binds weakly to laminarin and pustulan. CmRegA is also a modular protein containing a highly hydrophobic N-terminal domain and a C-terminal DNA-binding domain of the AraC/XylS family. The role of the identified enzymes in the hydrolysis of cell wall polysaccharides by aerobic bacteria is discussed.

  • Cellvibrio
  • glycoside hydrolase (GH)
  • carbohydrate-binding module (CBM)
  • AraC/XylS transcription activator


Plant cell wall polysaccharides represent the most abundant reservoir of organic carbon in the biosphere (Coughlan, 1985). In nature, bacterial and fungal enzyme systems play a major role in recycling photosynthetically fixed carbon and these biocatalysts are, therefore, of considerable biological and biotechnological importance (Gilbert et al., 2002). Cellulases and hemicellulases are industrially important enzymes, widely used in the paper/pulp, animal feed, fruit juice, detergent and textile sectors. Understanding the mechanism by which these enzyme systems degrade their complex insoluble and highly recalcitrant substrates could improve the efficiency with which plant biomass is converted to bioethanol, a renewable and environmentally friendly fuel (Schell et al., 2004).

The diversity and complexity of polysaccharides comprising the plant cell wall limits the accessibility of enzymes that degrade this macromolecular composite structure (Brett & Waldren, 1996). To improve the efficiency of the degradative process, plant cell wall-specific glycoside hydrolases (GH) have evolved complex modular architectures comprising both catalytic modules and noncatalytic carbohydrate-binding modules (CBMs). By mediating an intimate and prolonged association of the enzyme with its substrate, CBMs enhance the activity of the catalytic module against the recalcitrant polysaccharides (Boraston et al., 2003). Thus, CBMs play a major role in potentiating the capacity of cellulases and hemicellulases to degrade the plant cell wall. Similar to GHs, CBMs have been grouped into 45 sequence-based families (Coutinho & Henrissat, 1999; see http://afmb.cnrs-mrs.fr/CAZY). In addition, based on the topology of the binding sites, which reflects the structure of the target ligand, these noncatalytic modules have been grouped into three types. CBMs that interact with the flat surfaces of crystalline polysaccharides contain a planar hydrophobic carbohydrate-binding site and are referred to as type A CBMs (Kraulis et al., 1989; Tormo et al., 1996; Raghothama et al., 2000). In contrast, type B CBMs interact with several sugars, and type C CBMs with one or two saccharides, of individual glycan chains (Charnock et al., 2000; Carvalho et al., 2004; Pires et al., 2004; Najmudin et al., 2006). Characteristically, type C CBMs lack the extended binding-site grooves of their type B counterparts.

In a previous study, we identified a modular cellulase in Cellvibrio mixtus, termed CmCel5B, comprising an N-terminal GH5 catalytic module, followed by two family 6 CBMs (Fontes et al., 1998). Here, we report the cloning and the preliminary biochemical characterization of two novel genes, named unk16A and regA, encoding modular proteins involved in the degradation of plant cell wall polysaccharides. The two genes are located downstream of cel5B, which encodes CmCel5B. CmUnk16A, encoded by unk16A, is a modular GH16 enzyme containing, in addition to a GH16 catalytic domain, two family 32 CBMs, designated CBM32-1 and CBM32-2, a family 4 CBM and a domain of unknown function. CBM32-2 binds weakly to laminarin and pustulan. CmRegA is a putative regulatory protein encoded by regA, which is located between cel5B and unk16A. The significance of these findings in relation to the mechanism by which C. mixtus mobilizes organic carbon is discussed.

Materials and methods

Bacterial strains, plasmids and culture conditions

The Escherichia coli strains used in this study were BL21(DE3), Tuner (Novagen) and XL1-Blue (Stratagene). The bacteriophages and plasmids used in this work were λZAPII (Stratagene), pBluescript SK (Stratagene), pGEM-T (Promega), pET21a (Novagen) and pET22b (Novagen). Escherichia coli XL1-Blue cells were cultured at 37°C in Luria broth (LB) or on LB agar plates. Escherichia coli cells used to propagate bacteriophage were grown in LB supplemented with 10 mM MgSO4 and 0.2% maltose, and were plated out on NZYM top agar (0.7%). The recombinant proteins, encoded by pET21a and pET22b derivatives, were expressed in E. coli BL21 and Tuner strains, as described below.

Sources of carbohydrates

Polysaccharides were purchased from Megazyme International (Bray County Wicklow, Ireland), with the exception of oat spelt xylan and hydroxyethylcellulose, which were obtained from Sigma.

DNA procedures

General recombinant DNA procedures were carried out as described by Sambrook (1989). The C. mixtus genomic library was constructed in λZAPII using the approach described by Centeno (2006). The library was plated on NZYM top agar at a density of three plaques per cm² (Millward-Sadler et al., 1995) and screened by DNA hybridization utilizing the fluorescein system from Amersham using the DNA insert of plasmid pHP2C34 as the probe. Two inverse PCR amplifications were performed to clone the full-length sequence of the genes located downstream of cel5B (which encodes CmCel5B). Briefly, for the first inverse PCR experiment, C. mixtus genomic DNA was digested with the enzyme SalI, and the resulting DNA fragments ranging from 2.0 to 2.5 kb were eluted from agarose gels and religated. In the second experiment, the enzyme used was NdeI, and the fragments ranging from 4.3 to 5.5 kb were eluted and ligated following the same procedures. The primer pairs used in the first and second amplifications were as follows: first amplification (SalI), 5′-GAGGATGTGCCCAATCTATAC-3′ and 5′-CATTTGTTGCGGTTGATGCTC-3′; second amplification (NdeI), 5′-GTCATGTGCTGAAAATGC-3′ and 5′-GTCAATTGGTGGAAGTGG-3′. PCRs were performed using 1 U of the thermostable DNA polymerase Pfu Turbo (Stratagene) following the manufacturer's instructions. The two PCR products were cloned into pGEM-T (Promega), and the resulting plasmids were named pZC5 and pZC6, respectively. The nucleotide sequence of DNA was determined with an ABI Prism Ready Reaction DyeDeoxy terminator cycle sequencing kit and an Applied Biosystems 377A sequencing system. The complete sequence of the DNA (both strands) was determined using a series of custom-made primers. Sequences were compiled and ordered using the computer software DNAsis (Hitachi).
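The size-selection step of the inverse PCR strategy and the basic properties of the four primers can be illustrated with a minimal Python sketch. The recognition sequences for SalI (GTCGAC) and NdeI (CATATG) are standard; the helper names and the toy sequence are hypothetical, and fragment length is approximated as the distance between consecutive site starts on a linear molecule. This is an illustration, not the procedure actually used to design the experiment.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"SalI": "GTCGAC", "NdeI": "CATATG"}

# Primers as printed in the text (5' to 3').
PRIMERS = {
    "SalI_1": "GAGGATGTGCCCAATCTATAC",
    "SalI_2": "CATTTGTTGCGGTTGATGCTC",
    "NdeI_1": "GTCATGTGCTGAAAATGC",
    "NdeI_2": "GTCAATTGGTGGAAGTGG",
}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def site_positions(dna, site):
    """Start index of every occurrence of a recognition site."""
    dna = dna.upper()
    hits, i = [], dna.find(site)
    while i != -1:
        hits.append(i)
        i = dna.find(site, i + 1)
    return hits

def fragments_in_window(dna, site, min_len, max_len):
    """Site-to-site fragment lengths that fall inside a size window."""
    pos = site_positions(dna, site)
    lengths = [b - a for a, b in zip(pos, pos[1:])]
    return [n for n in lengths if min_len <= n <= max_len]

if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, len(primer), "nt, GC fraction", round(gc_fraction(primer), 2))
    # Hypothetical toy sequence with three SalI sites (not C. mixtus DNA).
    toy = SITES["SalI"] + "A" * 10 + SITES["SalI"] + "A" * 30 + SITES["SalI"]
    print(fragments_in_window(toy, SITES["SalI"], 10, 20))
```

With real genomic sequence, the windows would be 2000 to 2500 bp for SalI and 4300 to 5500 bp for NdeI, as stated above.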

Expression and purification of recombinant proteins

To express the five modules of CmUnk16A in E. coli, DNA encoding these regions was amplified by PCR from C. mixtus genomic DNA using the thermostable DNA polymerase Pfu Turbo (Stratagene). The forward and reverse primers used for the amplifications incorporated, respectively, NheI and XhoI restriction sites, which were used for subsequent cloning into pET21a (Table 1). PCR products were initially inserted into pGEM-T and sequenced to ensure that no mutations had occurred during PCR. The recombinant derivatives of pGEM-T were digested with NheI and XhoI and the excised Cellvibrio DNA fragments were cloned into the similarly digested pET21a, such that the recombinant proteins contained C-terminal His6-tags. To insert the signal peptide encoded by the expression vector pET22b into the various pET21a derivatives generated in this study, the DNA containing the T7 promoter and encoding the pET22b signal peptide was amplified by PCR from 20 ng of pET22b using the primers indicated in Table 1. After cloning into pGEM-T and sequencing, the pET22b fragment was excised through digestion with BglII and NheI and cloned in the similarly restricted pET21a derivatives.
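Directional cloning with NheI and XhoI only works cleanly if each site occurs once in the amplicon, with the NheI site upstream of the XhoI site. The following Python sketch shows one way such a check could be written. The recognition sequences (GCTAGC and CTCGAG) are standard, whereas the function names and the toy amplicons are hypothetical and do not correspond to the Table 1 primers.

```python
NHEI = "GCTAGC"  # NheI recognition sequence
XHOI = "CTCGAG"  # XhoI recognition sequence

def find_all(seq, site):
    """Start index of every (possibly overlapping) occurrence of site in seq."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits

def cloning_ready(amplicon):
    """True if the amplicon has exactly one NheI site followed by one XhoI site."""
    seq = amplicon.upper()
    nhe = find_all(seq, NHEI)
    xho = find_all(seq, XHOI)
    return len(nhe) == 1 and len(xho) == 1 and nhe[0] < xho[0]

if __name__ == "__main__":
    # Hypothetical amplicons: flanking bases + NheI + insert + XhoI + flanking bases.
    good = "CTC" + NHEI + "ATGGCAACC" * 3 + XHOI + "CAC"
    bad = "CTC" + NHEI + "ATGGCA" + XHOI + "ACC" + XHOI + "CAC"  # internal XhoI site
    print(cloning_ready(good))
    print(cloning_ready(bad))
```

An amplicon that fails this check would be cut internally during the NheI/XhoI digest, so the module would need a different pair of sites.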

Table 1

Primers used to hyperexpress CBM32-1 and CBM32-2 in Escherichia coli

  • These primers were used to amplify the pET22b DNA fragment containing the T7 promoter and the DNA sequence encoding the vector signal peptide. The amplified fragment was subcloned into the BglII–NheI sites of pET21a containing unk16A truncated derivatives, to produce recombinant proteins fused to the pET22b encoded signal peptide.

  • The NheI, BglII and XhoI restriction sites are given in bold.

Escherichia coli strains BL21 and Tuner harbouring the pET21a derivatives were cultured in LB containing 100 µg mL−1 ampicillin at 37°C to the mid-exponential phase (A550 nm 0.6), at which point isopropyl-β-d-thiogalactopyranoside (IPTG) was added to a final concentration of 1 mM (BL21) or 0.2 mM (Tuner) and the cultures were incubated for a further 16 h at 16°C. Soluble recombinant proteins were purified by immobilized metal ion affinity chromatography as described previously (Dias et al., 2004) and buffer exchanged into 20 mM Tris-HCl buffer, pH 7.5, containing 100 mM NaCl and 5 mM CaCl2 (Buffer A).
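The induction and buffer concentrations above translate into bench quantities by simple dilution arithmetic. The sketch below is illustrative only: the 1 M IPTG stock, the 1 L culture and buffer volumes, and the choice of Tris base and calcium chloride dihydrate as the weighed reagents are assumptions that are not stated in the text, and the molecular weights are standard catalogue values.

```python
def stock_volume_ml(culture_ml, final_mm, stock_mm):
    """Volume of stock (mL) to add so that the mixture reaches final_mm.

    Solves stock_mm * v / (culture_ml + v) = final_mm for v.
    """
    return final_mm * culture_ml / (stock_mm - final_mm)

def grams_needed(conc_mm, mw_g_per_mol, volume_l):
    """Mass of solute (g) for a given millimolar concentration and volume."""
    return conc_mm / 1000.0 * mw_g_per_mol * volume_l

STOCK_MM = 1000.0    # assumption: 1 M IPTG stock
CULTURE_ML = 1000.0  # assumption: 1 L culture

# Final IPTG concentrations taken from the text.
INDUCTION_MM = {"BL21": 1.0, "Tuner": 0.2}

# Buffer A components from the text; reagent forms and MWs are assumptions.
BUFFER_A = [
    ("Tris base", 20.0, 121.14),
    ("NaCl", 100.0, 58.44),
    ("CaCl2 dihydrate", 5.0, 147.01),
]

if __name__ == "__main__":
    for strain, final_mm in INDUCTION_MM.items():
        v = stock_volume_ml(CULTURE_ML, final_mm, STOCK_MM)
        print(f"{strain}: add {v:.3f} mL of IPTG stock")
    for name, conc_mm, mw in BUFFER_A:
        print(f"{name}: {grams_needed(conc_mm, mw, 1.0):.3f} g per litre")
```

The pH adjustment to 7.5 with HCl is not modelled here.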

Binding studies

The capacity of CBM32-1 and CBM32-2 to bind soluble polysaccharides was assessed qualitatively by affinity gel electrophoresis (AGE) as described previously (Tomme et al., 2000), using bovine serum albumin (BSA) as the nonbinding protein. Binding to insoluble polysaccharides was assessed qualitatively as described by Carvalho (2004).

Results and discussion

Cloning and sequencing of novel Cellvibrio genes

Previously, we reported the cloning and sequencing of a C. mixtus gene (here termed cel5B) encoding a modular cellulase, CmCel5B, comprising an N-terminal GH5 catalytic domain, followed by two CBM6 modules organized in tandem. Sequencing of Cellvibrio DNA downstream of cel5B revealed a gene, designated regA, encoding a protein, termed CmRegA, which displays homology to the DNA-binding domains of the AraC/XylS family of prokaryotic transcription factors (Fig. 1). The data also revealed an incomplete ORF, designated unk16A, downstream of regA that encodes a protein displaying similarity to GH16 enzymes. To obtain the full sequence of unk16A, a combination of library screening (which generated the plasmid pZC1; see Fig. 1) and inverse PCR was used. This strategy led to the cloning of 8360 nt of Cellvibrio DNA, which was sequenced in both strands. This genomic region contains, in addition to cel5B, two novel genes of 1176 (regA) and 3150 (unk16A) nucleotides (Fig. 1). The proteins encoded by these genes were designated CmRegA and CmUnk16A, respectively, and displayed predicted Mr values of 45 138 and 114 754. The nucleotide sequence of the 8.4 kb region characterized in this report appears in GenBank with the accession number AF003697. The codon usage of the two genes is similar to that of other Cellvibrio genes, and the proposed ATG translational start codons of the genes are preceded (7–12 bp) by typical prokaryotic ribosome-binding sequences. A region of dyad symmetry was identified 16 nucleotides downstream of the putative translational stop codon of unk16A and has the potential to form a substantial stem–loop structure, characteristic of ρ-independent transcription terminator sequences. Thus, the cloning strategy used in these experiments identified two novel C. mixtus genes downstream of cel5B. The two GH-encoding genes are in the same orientation and flank regA, which encodes a putative regulatory protein (Fig. 1).
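The gene lengths and predicted Mr values quoted above can be cross-checked with simple arithmetic. The sketch below assumes that each quoted gene length includes the stop codon and uses the common rule of thumb of about 110 Da per residue; both are assumptions, so the estimates are expected to differ somewhat from the reported values, which were calculated from the actual amino acid composition.

```python
AVERAGE_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)

# Gene lengths (nt) and predicted Mr values as reported in the text.
GENES = {
    "regA": {"length_nt": 1176, "reported_mr": 45138},
    "unk16A": {"length_nt": 3150, "reported_mr": 114754},
}

def residues_from_orf(length_nt, includes_stop=True):
    """Number of encoded residues for an ORF of the given length."""
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length_nt // 3
    return codons - 1 if includes_stop else codons

def approximate_mr(n_residues):
    """Rough molecular mass estimate from residue count."""
    return n_residues * AVERAGE_RESIDUE_DA

if __name__ == "__main__":
    for gene, info in GENES.items():
        n = residues_from_orf(info["length_nt"])
        est = approximate_mr(n)
        per_res = info["reported_mr"] / n
        print(f"{gene}: {n} residues, estimate {est:.0f} Da, "
              f"reported {info['reported_mr']} Da ({per_res:.1f} Da per residue)")
```

Under these assumptions both reported values fall within roughly 5% of the rule-of-thumb estimate.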

Figure 1

Restriction map and organization of the Cellvibrio gene cluster encoding CmCel5B, CmRegA and CmUnk16A. Plasmids containing the DNA fragments that allowed the assembly of the 8.4 kb sequence containing the three genes identified in this work are displayed. The position and orientation of the genes cel5B, regA and unk16A encoding, respectively, the proteins CmCel5B, CmRegA and CmUnk16A are displayed. Abbreviations: NdI, NdeI; SpI, SphI; S, SalI; H, HindIII; C, ClaI; E1, EcoRI; E5, EcoRV; P, PstI.

Comparison of the primary sequences of CmRegA and CmUnk16A with protein databanks, searched by BLAST (http://www.ncbi.nlm.nih.gov/BLAST), revealed that both proteins display a modular architecture. CmRegA comprises a 106-residue C-terminal DNA-binding domain of the AraC/XylS family of transcription activators. Sequence alignments (Fig. 2a) show that the DNA-binding domain displays the highest identity (>45%) with the AraC/XylS domain homologues present in nine putative regulatory proteins from Saccharophagus degradans (accession numbers ABD83063, ABD81534, ABD82185, ABD81755, ABD80653, ABD82256, ABD81751, ABD82282 and ABD80064). The C-terminal domain displayed lower identity scores with proteins from Flavobacterium, Reinekea, Flavobacteriales and Marinobacter. Interestingly, the N-terminal domain of CmRegA is extremely rich in hydrophobic residues. The Kyte–Doolittle hydropathy plot of the N-terminal domain of CmRegA reveals the presence of seven regions with hydropathy scores above 1.8, which, therefore, are likely to constitute transmembrane regions (Fig. 2b; Kyte & Doolittle, 1982). In addition, this hydrophobic region displays considerable identity (>30%) with the N-terminal domain of the above-mentioned protein regulators, ABD83063 and ABD81755, from S. degradans (Fig. 2a). However, the N-terminal regions of the Saccharophagus proteins are considerably less hydrophobic. Although the hydrophobic character of the N-terminal domain of CmRegA suggests that this region constitutes a transmembrane anchor, it is also possible that this region functions as a protein homodimerization domain. Indeed, most transcription activators from the AraC/XylS family are modular proteins in which the N-terminal domain is responsible for the effector/signal recognition in addition to mediating protein dimerization (Gallegos et al., 1997; Egan, 2002).
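A Kyte–Doolittle profile of the kind used above is a sliding-window average of per-residue hydropathy values. The following Python sketch uses the published Kyte–Doolittle scale and the 1.8 cut-off quoted in the text; the 19-residue window, the function names and the toy sequence are assumptions, since the window size used for Fig. 2b is not stated.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def hydrophobic_regions(seq, window=19, threshold=1.8):
    """Runs of consecutive windows whose mean exceeds the threshold.

    Each run is returned as (first_window_start, last_window_start),
    using 0-based indices of the window start positions.
    """
    regions, start = [], None
    profile = hydropathy_profile(seq, window)
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(profile) - 1))
    return regions

if __name__ == "__main__":
    # Hypothetical toy sequence: a leucine-rich stretch flanked by lysines.
    toy = "K" * 20 + "L" * 25 + "K" * 20
    print(hydrophobic_regions(toy))
```

Counting the runs returned for a real N-terminal domain gives the number of candidate transmembrane segments under the chosen window and threshold.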

Figure 2

Alignment of CmRegA with homologous proteins from Saccharophagus degradans (a) and the Kyte–Doolittle hydropathy plot of the N-terminal module of CmRegA (b). In (a), the S. degradans proteins are identified by the accession numbers (GenBank) and the DNA-binding module is highlighted. In (b), positive values represent increased hydrophobicity and peaks with scores greater than 1.8 (horizontal line) indicate possible transmembrane regions. The alignment was prepared using CLUSTAL W (Thompson et al., 1994).

While CmRegA is a membrane-bound or intracellular protein, the deduced N-terminal 26 residues of CmUnk16A exhibited features typical of a signal peptide. This observation suggests that CmUnk16A constitutes an extracellular enzyme in common with most bacterial cellulases and hemicellulases (Fontes et al., 1995). The N-terminal domain, following the signal peptide, displays considerable sequence identity (>30%) with GH16 enzymes from the genera Saccharophagus (accession numbers ABD80706 and ABD82135), Reinekea (accession number EAR10101) and Marinobacter (accession number EAQ65099). Interestingly, amino acid identity extends through the majority of the other modules of these five proteins and, as can be seen in Fig. 3a, the molecular architecture of these enzymes is rather similar. Thus, the GH16 catalytic domain of CmUnk16A is followed by a family 32 CBM (named CBM32-1), a CBM4 and a 224-residue domain of unknown function. All these modules present the highest identity scores with the equivalent domains from the four Saccharophagus, Reinekea and Marinobacter enzymes referred to previously. However, the C-terminal region of CmUnk16A varies when compared with its closest homologues. Thus, downstream of the domain of unknown function in CmUnk16A, we have identified a second CBM32 (designated CBM32-2), while the proteins with the accession numbers ABD82135 and EAR10101 contain two C-terminal copies of this domain (Fig. 3a). By contrast, EAQ65099 from Marinobacter sp. lacks the C-terminal CBM32 domain, while ABD80706 from S. degradans contains a nonrelated C-terminal domain of unknown function. Conservation of the molecular architecture of these proteins suggests a common but still unknown role for these enzymes in polysaccharide hydrolysis.

Figure 3

Molecular architecture of CmUnk16A and its protein homologues (a) and alignment of the CBM32 domains located in proteins from panel a (b). In (a), modules belonging to the same protein family have similar visual representations. In (b), residues that are completely conserved in the modules are highlighted in grey. CBM32 homologues from Saccharophagus degradans are identified by the accession numbers. Other CBMs are from Marinomonas sp. MED121 (EAQ65099), Reinekea sp. MED297 (EAR10101) and Hahella chejuensis KCTC 2396 (ABC29869). The amino acids in Micromonospora viridifaciens CBM32 (BAA00852) that interact with galactose are highlighted with grey arrowheads. Arrows above the sequence represent β-strands in M. viridifaciens CBM32. The protein alignment was prepared with CLUSTAL W (Thompson et al., 1994).

A close inspection of the CmUnk16A GH16 catalytic domain reveals, in addition to the Saccharophagus, Reinekea and Marinobacter enzymes, considerable identity with the catalytic domains of bacterial β-1,3-glucanases from the marine bacterium Pseudomonas sp. PE2 (BAC16332), Nocardiopsis sp. F96 (BAE54302) and Streptomyces sp. AP77 (BAD20955). Alignments with GH16 enzymes suggest that CmUnk16A Glu179 and Glu184 constitute, respectively, the enzyme nucleophile and general acid/base catalytic residues (not shown). In addition, the presence of three extra amino acids before Glu179 and an inserted Ile between the nucleophile and Glu184 suggest that CmUnk16A belongs to the β-1,3-glucanase subfamily of GH16 enzymes (Planas, 2000). Interestingly, the β-1,3-glucanase from Pseudomonas sp. PE2 was recently shown to constitute an essential enzyme for the degradation of fungal cell walls (Kitamura & Kamei, 2006). Moreover, CmUnk16A CBM4, here termed CmCBM4, displays strong identity with the β-1,3-glucan-specific family 4 CBMs of Lam16A of Thermotoga maritima and Thermotoga neapolitana (Boraston et al., 2002). An alignment of CmCBM4 with the β-1,3-glucan and β-1,3-1,4-glucan CBM4 binders of T. maritima and Cellulomonas fimi, respectively, of known structure, demonstrates that the CmCBM4-equivalent residue to Tyr32 in T. maritima is conserved, suggesting a putative β-1,3-glucan-binding specificity for CmCBM4. Surprisingly, two out of the other three conserved residues involved in carbohydrate recognition are absent in CmCBM4. Members of CBM4 demonstrate a large array of ligand specificities and, therefore, it is possible that this lack of conservation in key residues relates to a different mode of interaction in CmCBM4, assuming that CmUnk16A is functional.

An alignment of CBM32 modules from CmUnk16A and its highly conserved homologues is displayed in Fig. 3b. The structure of Micromonospora viridifaciens sialidase A CBM32, which was the founder of the CBM32 family, displays a β-sandwich topology and its primary sequence was also included in the alignment (Fig. 3b; Gaskell et al., 1995). The M. viridifaciens CBM32 has been partially characterized and was shown to bind galactose, lactose and sialic acid in a lectin-like pocket, which is characteristic of type C CBMs (Gaskell et al., 1995; Boraston et al., 2003). Three residues, corresponding to Asp944, Trp951 and Tyr987 of CBM32-2, are conserved in the 13 CBM32 modules analysed (Fig. 3b). Inspection of the M. viridifaciens 3D structure demonstrates that all three residues fulfil a structural role in the protein, suggesting a general role for these amino acids in the structural organization of CBM32 members. The residues that bind galactose in M. viridifaciens CBM32, Arg572, Glu578, His539, Ser575 and Trp542 (depicted with grey arrowheads in Fig. 3b), are not conserved in the other members of the CBM32 family. This observation is not completely surprising as residues involved in carbohydrate recognition were shown to vary among members of the same CBM family displaying differences in ligand specificity (Boraston et al., 2004). Finally, in contrast to CBM32-2, the alignment suggests that CBM32-1 and its closely related homologues (ABD82135, ABD80706-1, EAQ65099 and EAR10101) may constitute distant members of the CBM32 family.
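Completely conserved positions, such as the three residues discussed above, are the columns of a multiple alignment in which every sequence carries the same amino acid. A minimal Python sketch of that check follows; the function name and the toy alignment are hypothetical and are not taken from Fig. 3b, and gap-only or gap-containing columns are treated as not conserved.

```python
def conserved_columns(alignment, gap="-"):
    """Return 0-based indices of columns with one identical, non-gap residue.

    `alignment` is a list of equal-length aligned sequences.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        residues = {row[col].upper() for row in alignment}
        if len(residues) == 1 and gap not in residues:
            conserved.append(col)
    return conserved

if __name__ == "__main__":
    # Hypothetical toy alignment of three short sequences.
    toy = ["ADW-Y", "ADWKY", "GDWRY"]
    for col in conserved_columns(toy):
        print(col, toy[0][col])
```

Mapping a conserved column back to a residue number in one protein requires counting the non-gap characters of that sequence up to the column and adding the start position of the module.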

Binding studies

Initially, we attempted to express all five modules of CmUnk16A as discrete entities in E. coli. All proteins, except CBM32-1 and CBM32-2, formed inclusion bodies when expressed in E. coli BL21, Tuner and Origami using several different expression regimes. CBM32-2 was exported to the periplasm via the vector-encoded signal peptide. Attempts to refold the proteins that formed inclusion bodies were unsuccessful. In contrast, CBM32-1 and CBM32-2 were purified to electrophoretic homogeneity (Fig. 4) and, therefore, their capacities to bind to a range of cell wall carbohydrates were explored. AGE revealed that CBM32-2 displays a modest retardation in the electrophoretic migration in the presence of laminarin and pustulan, suggesting that the protein binds to these polysaccharides. The degree of retardation was too small to enable detailed affinity constants to be determined (Fig. 5). No electrophoretic retardation of CBM32-2 was detected in the presence of carob galactomannan, potato galactan, pullulan, xyloglucan, barley β-glucan, lichenan, hydroxyethylcellulose (HEC), konjac glucomannan or oat spelt xylan. Interestingly, CBM32-1 displayed no affinity for the range of polysaccharides tested. As expected, the two proteins were unable to interact with the insoluble cellulose preparations Avicel and acid-swollen cellulose (data not shown). The apparent limited binding of CBM32-2 to β-1,3 and β-1,6 glucose polysaccharides suggests that the target ligand for CBM32-2 is present at lower concentrations in these β-glucans. As the carbohydrate concentration in the gel is relatively high (0.4%), the data suggest that binding might be occurring at the polysaccharide chain ends, which are relatively infrequent in polymers with a significant degree of polymerization. Recently, CBMs displaying exquisite selectivity for the nonreducing ends of polysaccharide chains have been described, confirming that this mechanism of binding occurs in nature (van Bueren et al., 2005; Henshaw et al., 2006).
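In AGE, retardation is usually judged by comparing the migration of the test protein, normalized to the nonbinding reference (BSA here), in gels with and without polysaccharide. The Python sketch below illustrates that comparison. All migration distances are hypothetical numbers chosen for illustration and are not measurements from this study, and the function names are likewise invented.

```python
def relative_mobility(protein_mm, reference_mm):
    """Migration of the test protein relative to the nonbinding reference."""
    if reference_mm <= 0:
        raise ValueError("reference migration must be positive")
    return protein_mm / reference_mm

def retardation(control, with_ligand):
    """Drop in relative mobility caused by the polysaccharide.

    Each argument is a (protein_mm, reference_mm) pair measured on one gel.
    A value near zero indicates no detectable interaction.
    """
    return relative_mobility(*control) - relative_mobility(*with_ligand)

if __name__ == "__main__":
    # Hypothetical migration distances in mm: (test protein, BSA).
    control_gel = (30.0, 40.0)
    gels = {
        "noninteracting polysaccharide": (30.0, 40.0),
        "weakly interacting polysaccharide": (28.0, 40.0),
    }
    for name, distances in gels.items():
        print(f"{name}: retardation {retardation(control_gel, distances):.3f}")
```

Normalizing to the reference protein corrects for gel-to-gel differences in run length, which matters when the shift is as small as the one described for CBM32-2.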

Figure 4

Hyperexpression and purification of recombinant CBM32-1 (a) and CBM32-2 (b). BL21 cells were transformed with the respective recombinant plasmids and induced with IPTG as described in Materials and methods. Soluble extracts from uninduced (lane 1) and induced (lane 2) cells were prepared and the recombinant proteins were purified through IMAC [lane 4 in (a) and lane 5 in (b)]. Lane 3 represents unbound protein and lane 4 in (b) represents proteins washed off from the purification column. S represents the protein molecular mass standards, of which proteins with 14 and 20 kDa are depicted.

Figure 5

Interaction of CBM32-2 with xylan, laminarin and pustulan analysed by AGE. The electrophoretic mobility of CBM32-2 is not affected by the presence of xylan, which is shown as an example of a noninteracting polysaccharide. BSA and CBM32-2 were subjected to nondenaturing electrophoresis in gels containing 0.4% (w/v) of the specified polysaccharide.


This report describes the cloning of two GH-encoding genes that are located in the same orientation and flank regA (which encodes a modular regulatory protein termed CmRegA) in the genome of C. mixtus. The clustering of these three genes is intriguing and suggests a role for CmRegA in the regulation of the GH-encoding genes. Owing to protein insolubility in E. coli, only the biochemical role of the C-terminal domain of CmUnk16A was determined. Data presented in this report suggest that this family 32 CBM, termed CBM32-2, binds weakly to laminarin (β-1,3 glucose polymer) and pustulan (β-1,6 glucose polymer). Together with chitin, β-1,3 and β-1,6 glucose polysaccharides are important components of the fungal cell wall (Cid et al., 1995) and therefore it is possible that CmUnk16A is involved in the hydrolysis of fungal cell wall polysaccharides. This is consistent with the substrate specificities displayed by GH16 enzymes, which mainly hydrolyse β-1,3 glucans, and is supported by a recent report demonstrating that a close homologue of the CmUnk16A GH16 domain is highly active in fungal cell wall hydrolysis (Kitamura & Kamei, 2006). However, lack of soluble protein precludes the biochemical characterization of the CmUnk16A GH16 catalytic domain. The role of CmUnk16A in a bacterium that is specialized in the hydrolysis of plant cell wall polysaccharides is, therefore, intriguing. Considering that fungi are the first colonizers of plant biomass, it is possible that saprophytic bacteria could have developed fungal lytic mechanisms to compete with, or utilize as a nutrient, fungal species that also attack the plant cell wall. These possibilities remain, however, to be evaluated in the Cellvibrio genus.


This work was supported by grant POCI/CVT/61162/2004 from the Fundação para a Ciência e a Tecnologia, Portugal.


  • Editor: Marco Moracci

