
Proteomic analysis of the insoluble subproteome of Clostridium difficile strain 630

Shailesh Jain , Robert L.J. Graham , Geoff McMullan , Nigel G. Ternan
DOI: http://dx.doi.org/10.1111/j.1574-6968.2010.02111.x 151-159 First published online: 1 November 2010


Clostridium difficile, a Gram-positive spore-forming anaerobe, causes infections in humans ranging from mild diarrhoea to potentially life-threatening pseudomembranous colitis. The availability of genomic information for a range of C. difficile strains affords researchers the opportunity to better understand not only the evolution of these organisms but also their basic physiology and biochemistry. We used proteomics to characterize the insoluble subproteome of C. difficile strain 630. Gel-based LC-MS analysis led to the identification of 2298 peptides; provalt analysis with a false discovery rate set at 1% reduced this list to 560 unique peptides, resulting in 107 proteins being positively identified. These were functionally classified and physicochemically characterized and pathway reconstruction identified a variety of central anaerobic metabolic pathways, including glycolysis, mixed acid fermentation and short-chain fatty acid metabolism. Additionally, the metabolism of a variety of amino acids was apparent, including the reductive branch of the leucine fermentation pathway, from which we identified seven of the eight enzymes. Increasing proteomics data sets should – in conjunction with other ‘omic’ technologies – allow the construction of models for ‘normal’ metabolism in C. difficile 630. This would be a significant initial step towards a full systems understanding of this clinically important microorganism.

  • multidimensional
  • proteomics
  • GeLC/MS
  • membrane associated
  • leucine


The Gram-positive spore-forming anaerobe Clostridium difficile, first described by Hall & O'Toole (1935), has become recognized as the leading cause of infectious diarrhoea in hospital patients worldwide over the last three decades (Riley, 1998; Sebaihia et al., 2007). Two factors are significant in the increased prevalence of C. difficile infection (CDI): the increase in the use of broad-spectrum antibiotics, including cephalosporins and aminopenicillins (Poutanen & Simor, 2004), and the widely reported contamination of the hospital environment by C. difficile spores (Durai, 2007).

Antibiotic-associated diarrhoea and colitis were well established soon after antibiotics became available, with C. difficile being identified as the major cause of antibiotic-associated diarrhoea and as the nearly exclusive cause of potentially life-threatening pseudomembranous colitis in 1978 (Bartlett, 2006). Clostridium difficile's well-documented antibiotic resistance results in its persistence when the normal gut microbial communities are disturbed or eradicated by antibiotic therapy, following which C. difficile spores germinate, producing vegetative cells, which, upon proliferation, secrete the organism's two major virulence factors – toxin A and toxin B. As the major virulence factors, the toxins have been studied extensively in order to dissect C. difficile virulence mechanisms and they are the primary markers for the diagnosis of CDI (reviewed extensively elsewhere – e.g. Voth & Ballard, 2005; Jank et al., 2007; Lyras et al., 2009). The toxins lead to the development of symptoms associated with CDI, ranging from mild, self-limiting watery diarrhoea, to mucosal inflammation, high fever and pseudomembranous colitis (Bartlett & Gerding, 2008).

Recently, a new epidemic of C. difficile, associated with the emergence of a single hypervirulent strain of C. difficile characterized as toxinotype III, North American pulsed-field gel electrophoresis type 1 (NAP1), restriction-endonuclease analysis group type BI and PCR-ribotype 027 (Pepin et al., 2005; Green et al., 2007; Marcos & DuPont, 2007), has come to light. The strain carries the binary toxin gene CdtB, and has an 18-base-pair deletion in the toxin repressor gene, tcdC, which means that it generates approximately 16–23 times more toxin than other strains (Warny et al., 2005). Infection is associated with a high risk of acute clinical deterioration and a poor response to metronidazole therapy (Spigaglia & Mastrantonio, 2002; Pepin et al., 2005), making it a major concern for healthcare worldwide. Clostridium difficile ribotype 027 was initially rare in the United Kingdom; however, when outbreaks at Stoke Mandeville and the Royal Devon and Exeter Hospitals were investigated in 2004–2005, type 027 was found to predominate in their cases (Anon, 2006), and this ribotype has now been detected in the majority of countries around the world (Kuijper et al., 2007).

It is clear, then, that C. difficile is a significant burden on the healthcare profession and patients. With the ever-increasing availability of genomic information, however, greater insight into the evolution and variation of C. difficile genomes is now possible (Stabler et al., 2006, 2009; He et al., 2010). The Clostridb database (http://xbase.bham.ac.uk/clostridb/) (Chaudhuri & Pallen, 2006), an excellent publicly accessible resource for those interested in comparative genomics of the genus Clostridium, currently contains genome sequences of 18 strains of clostridia, including two genomes of C. difficile, namely C. difficile 630 and C. difficile qcd32_g58, a representative of the predominant NAP1/BI/027 strain in Quebec (Loo et al., 2005). The 4.29 Mb genome of C. difficile strain 630 and its 7.8 kb plasmid encode a remarkable number of genes associated with resistance to antimicrobial agents, as well as virulence factors, host adherents and surface structures (Sebaihia et al., 2007). Genome sequences have been generated recently for a further six strains, including CD196, an early, nonepidemic, ribotype 027 strain (Stabler et al., 2009), the R20291 isolate responsible for the UK Stoke Mandeville outbreak, and 21 other hypervirulent ribotype 027 strains isolated over the past two decades (He et al., 2010). A further six hypervirulent isolates associated with the Quebec outbreak and a reference ATCC43255 strain are at the draft genome sequence stage (McGill University and Génome Québec Innovation Centre), while the human microbiome project at Baylor College of Medicine has draft genome sequences for two strains (NAP07, NAP08) at the time of writing.

These genomic data, along with recently developed tools for Clostridial functional genomics (Heap et al., 2009), make it possible for researchers to adopt a systems approach to the dissection of the physiology and biochemistry of this pathogen. To meet the challenges of systems biology, there must be a comprehensive analysis of individual organisms, linking data from various genome-wide approaches with that generated from proteomic investigations (Romijn et al., 2003). Our laboratory, among others (Graham et al., 2006a, b, 2007; Beck et al., 2009), has adopted an approach in which fractionation of whole bacterial cell proteomes into subproteomes reduces sample complexity and increases the robustness of protein identifications as the proteome of even a subcellular fraction remains too complex for complete analysis by one dimension of LC-MS (Fang et al., 2010). We have previously characterized the insoluble proteomes of the Gram-positive bacteria Geobacillus thermoleovorans T80 and Oceanobacillus iheyensis HTE831 (Graham et al., 2006b, 2007). These studies have affirmed, postgenomically, the expression within these organisms of the protein machinery that allows cells to interact with their environment, with functions including cell–cell signalling, adhesion and stress response, and have shown that bacteria can express stress-related proteins even under ‘optimal’ laboratory conditions (O'Toole et al., 2010). A number of stress-related proteins, including molecular chaperones, play a role in virulence and adhesion in certain pathogens, including, for example, Helicobacter pylori and Salmonella enterica (Henderson et al., 2006).

The proteomic characterization of bacterial-insoluble subproteomes has been previously proven to be an effective strategy in the generation of important physiological and biochemical information. Therefore, we wished to identify and characterize this fraction of the C. difficile strain 630 proteome. This approach will provide an insight into the metabolic processes of actively growing C. difficile cells and furthermore will complement existing proteomic data sets from spore and cell-wall subfractions from this organism.

Materials and methods

Microorganism and culture conditions

Clostridium difficile strain 630 was a kind gift from Dr Peter Mullany of the Eastman Dental Institute (London, UK) and was routinely maintained on brain–heart infusion (BHI) agar (Oxoid) in a MACS MG500 Anaerobic workstation (Don Whitley Scientific, UK) in an 80 : 10 : 10 atmosphere of N2 : H2 : CO2, at 37 °C. Liquid culture (1 L in glass bottles) was performed in BHI broth (Oxoid) with resazurin (1 mg L−1) added as an anaerobic indicator. Overnight cultures in BHI broth were inoculated with a single colony and used as inocula at 5% (v/v). Culture growth was followed as attenuance (D) at 650 nm vs. uninoculated BHI broth.

Cell harvest and protein extraction

Mid log-phase cells (D650 nm=0.5) were harvested from duplicate 1 L cultures by transferring to two 500 mL centrifuge bottles in the anaerobic cabinet. Bottle lids were screwed down tightly and cells were harvested (9000 g, 10 min, 3–5 °C, Beckman J2-HS centrifuge/JA10 rotor). The supernatant was removed inside the anaerobic cabinet and ice-cold 10 mM phosphate-buffered saline (PBS) (pH 7.8) was added to resuspend the cells; a second centrifugation washed the cells. Cells were resuspended in PBS at a ratio of 1 g cells to 2 mL buffer inside the anaerobic cabinet.

Cell suspension (1 mL) was added to Lysing Matrix E tubes (MP Biomedicals, UK) inside the cabinet and the cells were lysed mechanically by treatment in a Fastprep150 instrument for 2 min (4 × 30 s treatments, with 2-min cooling on ice). Homogenates were first centrifuged at 25 000 g to remove unbroken cells and debris and the resultant supernatants were subsequently ultracentrifuged (150 000 g, 2 h, 3–5 °C, Beckman L8-M centrifuge/70.1 Ti rotor) to pellet the insoluble proteins, following which the supernatant was removed.

The insoluble pellet was resolubilized by gentle sonication in resolubilization buffer (1 mL) as described previously (Graham et al., 2006b), and the protein concentration was measured using the Bradford (1976) assay. Samples were reduced and alkylated before electrophoresis and protein (42 μg) from each duplicate was electrophoresed and stained (Graham et al., 2006b). Lanes were excised from the gel and cut into seven fractions based on molecular mass and an in-gel tryptic digest was carried out as described previously (Graham et al., 2006b).

LC-MS and database searching

LC-MS of peptide samples was performed as described by Graham et al. (2006a, b) using a 60-min nano-LC gradient. Protein identification was carried out using an internal mascot server (version 1.9; Matrix Science, London, UK) searching against a combined C. difficile genomic DNA and plasmid database (Reference sequence NC_009089 and NC_008226, respectively) downloaded from NCBI (20 June 2007) and containing 3573 sequences in total. Peptide tolerance was set at 1.2 Da with an MS/MS tolerance of 0.6 Da and the search set to allow for one missed tryptic cleavage. To expedite the curation of the identified protein list from mascot, the resultant mascot output files were reanalysed against the extracted C. difficile database using provalt (Weatherly et al., 2005), which takes multiple mascot results and identifies matching peptides. Redundant peptides are removed and related peptides are grouped together, associated with their predicted matching protein. provalt also uses peptide matches from a random database (in this case, the C. difficile database was randomized) to calculate the false-discovery rates (FDR) for protein identifications as described previously by Weatherly et al. (2005). In the current work, FDR was set at 1%; thus, 99% of the proteins identified should be correct.
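The decoy-based FDR estimate described above can be illustrated with a short sketch. This is not provalt's actual algorithm, which is described by Weatherly et al. (2005); it is a generic target-decoy calculation under the assumption that each identification carries a single numeric score. The function names, the score lists and the threshold search are hypothetical and serve only to show the principle.

```python
def fdr_at_threshold(target_scores, decoy_scores, threshold):
    """Estimate FDR as (decoy hits) / (target hits) at or above a score threshold."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # no target hits accepted, so nothing can be a false discovery
    return decoys / targets


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    max_fdr=0.01 mirrors the 1% setting used in this study; returns None
    when there are no scores to evaluate.
    """
    for threshold in sorted(set(target_scores) | set(decoy_scores)):
        if fdr_at_threshold(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None
```

With scores from a real search against the genuine and the randomized database, `lowest_threshold_for_fdr(target, decoy)` would give the score cut-off above which roughly 1 in 100 accepted identifications is expected to be spurious.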

Results and discussion

Characteristics of the C. difficile-insoluble subproteome

The workflow used in our gel-based analysis firstly isolated the insoluble fraction of the proteome from duplicate C. difficile cultures by ultracentrifugation, yielding a protein concentration of 22.4 mg mL−1. Because of the complex nature of the peptide mixtures being analysed and the chance nature of automated selection of peptides for MS analysis (Graham et al., 2006a, b), the separation capabilities of the LC-MS system can often be exceeded. However, using 1D SDS-PAGE as a prefractionation step yields high-quality and reproducible separation of hydrophobic protein mixtures (Supporting Information, Table S1) and concomitantly reduces sample complexity before the MS analysis (Cottingham, 2010), further aiding proteome coverage. For the C. difficile peptide fractions analysed in this investigation (Fig. 1), the number of unique proteins identified in a sample did not increase significantly after three replicate injections (Fig. S1) and therefore all peptide samples were injected and analysed three separate times to maximize the overall protein identification. Stringent automated curation of the data set using provalt set with an FDR of 1% yielded a total of 560 uniquely identified peptides, corresponding to 107 uniquely identified proteins. The average MOWSE score was 240; the average number of peptides per protein was five and the average protein coverage was 24% (Tables S1 and S2).
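As a quick consistency check on the figures above, the ratio of unique peptides to unique proteins can be computed directly. The sketch below uses only numbers quoted in this article; the variable names are illustrative, and the simple ratio is only an approximation of the per-protein average reported in Tables S1 and S2.

```python
peptides_identified = 2298  # peptides before provalt curation (from the abstract)
unique_peptides = 560       # unique peptides after curation at 1% FDR
unique_proteins = 107       # proteins positively identified

peptides_per_protein = unique_peptides / unique_proteins
retained_fraction = unique_peptides / peptides_identified

print(f"{peptides_per_protein:.1f} unique peptides per protein")        # 5.2
print(f"{retained_fraction:.1%} of identified peptides kept as unique")  # 24.4%
```

The ratio of about 5.2 agrees with the reported average of five peptides per protein.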


Fig. 1. Fractionation of the Clostridium difficile strain 630 insoluble subproteome by 1D SDS-PAGE: correlation of molecular mass of identified proteins with a gel slice.

The proteins identified had widely varying physicochemical characteristics, with the most acidic protein being a conserved hypothetical protein (CD2522; pI 4.57) and the most basic being 50S ribosomal protein L20 (pI 11.48). The lowest molecular mass protein identified was 50S ribosomal protein L36 (Mr 4277 Da) and the highest was a hypothetical protein (CD0590; Mr 197 241 Da). We could functionally categorize all except for three of the proteins identified in this study according to the SubtiList functional category list (Graham et al., 2006a, b, 2007) (Table 1). The largest category of identified proteins was that involved in protein synthesis (45.8%), followed by that involved in the metabolism of amino acids and related molecules (10.3%). Of the three ‘uncategorizable’ proteins identified, those encoded by CD2552 (iojap-like protein) and CD1711 may be part of the bacterial core genome, a concept proposed by Mulkidjanian et al. (2006) and further developed in the recent work of Callister et al. (2008). Homologues of these proteins are also found in other species of saccharolytic and fermentative clostridia, in addition to other known gut bacteria including Roseburia intestinalis and Faecalibacterium prausnitzii (Aminov et al., 2006). The third, CD0590, encodes a conserved hypothetical protein that has an N-terminal Mg2+/GTP-binding motif as identified by blastp analysis. Interestingly, and in contrast to the other two hypothetical proteins identified in this study, CD0590 appears to be absent from all other Clostridia species and indeed yields no significant homology matches with any other organism in the NCBI database. The exception to this appears to be a protein encoded by the adjacent gene, CD0589, which shares significant homology and appears to represent a duplication of the N-terminal Mg2+/GTP-binding region of CD0590. All publicly available C. difficile genomes also appear to contain homologues of both CD0590 and CD0589.

As regards a possible function for protein CD0590, O'Connor et al. (2006) reported that CD0589 and CD0590 belonged to an operon consisting of four ORFs, CD0587–CD0590, that were positively regulated by rgaR, a C. difficile protein similar to the VirR toxin gene regulator of C. perfringens. Comparative phylogenomic analysis of C. difficile strains, by Stabler et al. (2009), showed that the deletion of five specific genes, including CD0590, was characteristic of a toxin A−/B+ subclade of C. difficile strains; therefore, it may be hypothesized that the protein encoded by CD0590 is in some way important for toxin A production by C. difficile. However, under the conditions of our study, neither toxin A nor toxin B was detected.


Table 1. Functional categorization of proteins identified within the insoluble subproteome of Clostridium difficile strain 630.

In a previous study of cell-surface proteins (as distinct from the insoluble proteins reported here) from C. difficile, Wright et al. (2005) identified a total of 11 proteins from a glycine extract of whole cells and a further 42 proteins from a lysozyme digest of their peptidoglycan layer, resulting in a total of 47 uniquely identified proteins. It is to be expected that different experimental approaches, including sample types and extraction methods, will lead to the identification of different proteomic data for the same organism. For example, the hypothetical proteins identified by us were distinct from those detected by Lawley et al. (2009) in the C. difficile spore proteome. When we compared data from our current investigation with the previous work of Wright et al. (2005), 20 proteins were common to both studies, 27 were unique to Wright and colleagues and 87 were unique to our work. The larger number of proteins identified by our bottom-up GeLC-MS approach confirms that this experimental strategy can yield significant and important biological information to further our understanding of a microorganism.
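The comparison above is simple set arithmetic, and the reported counts can be checked against each other. The sketch below uses only the totals quoted in the text; the variable names are illustrative, and the combined total is derived here rather than reported in the article.

```python
wright_total = 47       # unique proteins reported by Wright et al. (2005)
this_study_total = 107  # unique proteins identified in this study
shared = 20             # proteins common to both studies

wright_only = wright_total - shared          # unique to Wright and colleagues
this_study_only = this_study_total - shared  # unique to this study
combined = shared + wright_only + this_study_only  # distinct proteins across both

print(wright_only, this_study_only, combined)  # 27 87 134
```

The derived values of 27 and 87 match the counts given in the text.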

An important step towards understanding the function of a protein is the determination of its subcellular localization, and in recent years, a number of bioinformatic tools have been developed to assist with this (Emanuelsson et al., 2007). Knowledge of Gram-positive bacterial protein targeting/secretion is essentially restricted to the model organism Bacillus subtilis (Tjalsma et al., 2000, 2004), and indeed, Desvaux et al. (2005) state that protein secretion by clostridia in general is ‘poorly understood’. Because the insoluble proteome might be expected to contain proteins that are associated with, or targeted to, either the cell membrane or the extracellular milieu, and that could thus play a role in virulence, we used psortb (Gardy et al., 2005), signalp (Bendtsen et al., 2004) and secretomep (Bendtsen et al., 2005) to guide our efforts to assign a subcellular location for each protein.

All 107 proteins identified in this study were analysed and assigned a putative or a predicted cellular localization as shown in the workflow depicted in Fig. 2. Within the subset of proteins predicted to be secreted, 23 were identified as possessing an N-terminal signal peptide (Table 2). Three cell wall-associated proteins were identified including two SlpA variants and a recently characterized cysteine protease, Cwp84, which Kirby et al. (2009) have shown is required for maturation of the S-layer, but that is not essential for virulence. Of the two proteins classified as ABC transporters, neither conformed to the expected architecture for such a protein, namely, a leader peptide containing an N- and C-domain completely lacking an intervening hydrophobic domain, in addition to a double-glycine motif N-terminal of the signal peptide cleavage site. All the other ‘transport’ proteins identified contained a significant hydrophobic domain between the N- and the C-domain of the predicted signal peptide, in addition to a number of other motifs usually associated with the twin arginine translocation or Sec secretion pathways. None of the 23 proteins contained any C-terminal cell wall anchor motifs commonly found in Gram-positive bacteria, such as LPxTG, NPQTN or TLxTC (Dramsi et al., 2005; Desvaux et al., 2006).
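A search for the cell wall anchor motifs named above can be sketched with regular expressions, treating x as any residue. This is an illustrative approach, not the procedure used in this study; the function name, the 50-residue C-terminal window and the dictionary layout are assumptions.

```python
import re

# Anchor motifs named in the text; "x" (any residue) is written as "." in the regex.
ANCHOR_MOTIFS = {
    "LPxTG": re.compile(r"LP.TG"),
    "NPQTN": re.compile(r"NPQTN"),
    "TLxTC": re.compile(r"TL.TC"),
}


def find_anchor_motifs(sequence, c_terminal_window=50):
    """Return {motif name: [1-based start positions]} for motifs found in the
    last `c_terminal_window` residues (window size is an assumed default)."""
    seq = sequence.upper()
    offset = max(0, len(seq) - c_terminal_window)
    tail = seq[offset:]
    hits = {}
    for name, pattern in ANCHOR_MOTIFS.items():
        positions = [offset + m.start() + 1 for m in pattern.finditer(tail)]
        if positions:
            hits[name] = positions
    return hits
```

Applied to each of the 23 signal peptide-containing sequences, an empty dictionary for every protein would correspond to the result reported above.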


Fig. 2. Bioinformatics workflow for the prediction of Clostridium difficile strain 630 protein subcellular location.


Table 2. Proteins identified within the insoluble subproteome of Clostridium difficile strain 630 with predicted export signals.

As in our previous work, we used the pathway reconstruction tool biocyc (Karp et al., 2005) to analyse pathways inferred from our proteomics dataset. The snapshot of C. difficile metabolism presented here reflects the nutritional complexity of BHI broth, which contains glucose, proteose peptone and bovine BHI solids. We could, therefore, reconstruct a number of key central metabolic pathways (Djordjevic et al., 2003) that would be expected to be active in clostridial cells including glycolysis, mixed acid fermentation and fermentation of amino acids (Gottschalk, 1979) (see Figs. S1-S3). The metabolic processes we have identified in C. difficile are, therefore, broadly similar to those described in a recent proteomic investigation of the Gram-negative gut anaerobe, Fusobacterium varium. Potrykus et al. (2008) report that F. varium may play both beneficial and pathogenic roles in the human gut. While the antics of C. difficile left unchecked have given it a deservedly bad reputation (Heap et al., 2009), its ability to produce butyrate (Fig. S3), as is known to occur in F. varium, could mean that in asymptomatic carriers of C. difficile, the organism has the potential to contribute to colonocyte health. Such a counterintuitive hypothesis highlights the need, not only from a basic science perspective but also from a position of concern for public health, to know the frequency of asymptomatic C. difficile carriers within the general population: therefore, we see an urgent requirement to develop a better understanding of C. difficile biology within the human microbiome.

The pathogenicity of C. difficile is dependent on a combination of toxin synthesis, p-cresol production and a diverse range of amino acid fermentations (Kim et al., 2008). Leucine is reported to be indispensable for the growth of this organism and may be metabolized by a reductive pathway, to isocaproate, or by means of an alternative oxidative pathway in which isovalerate and ammonia are produced. Thus, unlike the typical Stickland reaction, here, leucine may serve both as an oxidant and as a reductant (Kim et al., 2006). In the present study, we identified seven of the eight proteins necessary for the reductive branch of the leucine fermentation pathway (Fig. 3), with the sole exception of the ATP-dependent activator protein, HadI (Kim et al., 2005). While leucine fermentation is of fundamental importance to C. difficile growth and pathogenesis, the pathway is also of significant scientific interest as it involves a novel mechanism to generate the necessary radicals for the dehydration of 2-hydroxyisocaproyl-CoA to 2-isocaprenoyl-CoA, which does not depend on the typical radical generators such as oxygen, coenzyme B12 or S-adenosyl methionine (Kim et al., 2008). Clostridia are hypothesized to have emerged some 2.34 billion years ago and C. difficile between 1.1 and 85 million years ago (He et al., 2010), thus supporting the hypothesis put forward by Kim et al. (2008) that these reactions, which proceed via a novel allylic ketyl radical intermediate, represent an evolutionarily ancient means for radical formation in bacteria. Given the organismal and scientific importance of this pathway and our success in the identification of the majority of its proteins, it should be possible, in conjunction with other ‘omic’ technologies, to develop a model for leucine metabolism within C. difficile. This would represent one step towards the development of a systems understanding of this microorganism.


Fig. 3. Genomic context of genes in Clostridium difficile strain 630 encoding the protein machinery of the reductive branch of the leucine fermentation pathway.

Concluding remarks

In this study, our GeLC-MS proteomics approach identified C. difficile 630 proteins expressed during mid-log phase growth in BHI broth. This extends the proteomics information available for C. difficile and allowed the reconstruction of several central metabolic pathways, including the reductive branch of the leucine fermentation pathway. The clostridial research community is now in a position where the increasing availability of genomic, transcriptomic and proteomic information for C. difficile should enable the generation of datasets that are sufficiently robust for systems biologists to develop metabolic models for this clinically important microorganism. This should allow predictions to be made regarding the roles and expression of key virulence determinants and lead to the rapid identification of cellular targets for therapeutic purposes.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Appendix S1. Overview of, and commentary on metabolic pathways active in Clostridium difficile strain 630.

Fig. S1. Number of unique Clostridium difficile strain 630 proteins identified in a mixed protein sample with repeated injection to LC-MS.

Fig. S2. Glycolysis and pentose phosphate pathway: showing proteins (boxed) identified in this investigation.

Fig. S3. Mixed acid fermentation: showing proteins (boxed) identified in this investigation.

Fig. S4. GABA metabolism: showing proteins (boxed) identified in this investigation.

Table S1. Excel Spreadsheet with details of all proteins identified in this investigation, including molecular mass, pI, mowse score, signal peptide analysis etc.

Table S2. provalt html output file with details of all peptides identified for each protein in this investigation, including number of spectra, sequences, mowse scores, % coverage, etc.

Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.


  • Editor: André Klier

  • Present address: Robert L.J. Graham, The Proteome Exploration Laboratory, California Institute of Technology, Beckman Institute, Pasadena, CA 91125, USA.

