
Lateral gene transfer of O1 serogroup encoding genes of Vibrio cholerae

Sol González Fraga, Mariana Pichel, Norma Binsztein, Judith A. Johnson, John Glenn Morris Jr, Oscar Colin Stine
DOI: http://dx.doi.org/10.1111/j.1574-6968.2008.01251.x 32-38 First published online: 1 September 2008


In Gram-negative bacteria, the O-antigen-encoding genes may be transferred between lineages, although mechanisms are not fully understood. To assess possible lateral gene transfer (LGT), 21 Argentinean Vibrio cholerae O-group 1 (O1) isolates were examined using multilocus sequence typing (MLST) to determine the genetic relatedness of housekeeping genes and genes from the O1 gene cluster. MLST analysis revealed that 4.4% of the nucleotides in the seven housekeeping loci were variable, with six distinct genetic lineages identified among O1 isolates. In contrast, MLST analysis of the eight loci from the O1 serogroup region revealed that 0.24% of the 4943 nucleotides were variable. A putative breakpoint was identified in the JUMPstart sequence. Nine conserved nucleotides differed by a single nucleotide from a DNA uptake signal sequence (USS) also found in Pasteurellaceae. Our data indicate that genes in the O1 biogenesis region are closely related even in distinct genetic lineages, indicative of LGT, with a putative DNA USS identified at the defined boundary for the DNA exchange.

  • Vibrio cholerae
  • lateral gene transfer
  • multilocus sequence typing


The extreme polymorphism of the O-antigen among Gram-negative bacteria is evidence for strong diversifying selection. Such selection may favor mechanisms of lateral gene transfer (LGT) that could increase the diversity. The majority of bacterial genes are transmitted vertically from mother to daughter cells. Occasionally, genes are found in genome sequences that are genetically more similar to a gene from a distant genus or species than to sequences from closer species and genera. These genes are assumed to have resulted from LGT. While we assume that LGT is the result of conjugation, transformation or phage transduction, the rarity of LGT events makes them difficult to study systematically. One approach to studying the systematic uptake of DNA is to identify DNA uptake signal sequences (USSs), such as the one identified in Pasteurellaceae that is overrepresented and favors transformation in competent cells (Bakkali et al., 2004; Redfield et al., 2006). Another method to systematically examine LGT is to study the phylogenetic relatedness of housekeeping genes representing vertical transmission as compared with other loci that may undergo LGT. O-antigen genes have previously been associated with horizontal transfer events (both intra- and interspecies) in other Gram-negative bacteria (Reeves, 1993; Pacinelli et al., 2002; Samuel et al., 2004) and are therefore a good choice for this approach.

A change in serogroup may have a selective advantage, permitting the pathogen to escape immune detection. In Vibrio cholerae, several distinct O-antigens (O1, O37, O139, O27, O53 and O65) have been found among the genetically related strains that have been associated with epidemic disease and that carry the gene for cholera toxin (Bik et al., 1995; Comstock et al., 1996; Stroeher et al., 1997; Beltran et al., 1999; Li et al., 2002). The genes at the ends of the transferred segment have been suggested to be regions of enhanced recombination, although the breakpoints were not precisely defined (Bik et al., 1995; Li et al., 2002; Blokesch & Schoolnik, 2007). Others have suggested an increased level of recombination in conjunction with a possible Chi site in Escherichia coli and Klebsiella (Sugiyama et al., 1997) and with the JUMPstart sequence, so named because it was Just Upstream of Many Polysaccharide regions and postulated to be involved in regulation and movement (Hobbs & Reeves, 1994).

A third method to examine systematic LGT is to examine multiple isolates with the same serogroup from distinct genetic lineages. This method may be able to define the breakpoint, if the mobile group is sufficiently conserved. In Argentina, different V. cholerae O1 strains have been recovered from patients and the environment. First, O1 El Tor isolates were recovered from many patients during the cholera outbreaks in the 1990s. These isolates were identified as representatives of the typical epidemic South American strain by the presence of the virulence genes ctxA (subunit A of the cholera toxin) and tcpA El Tor (TCP colonization factor), and a pulsed-field gel electrophoresis (PFGE) pattern similar to other Latin American isolates (Pichel et al., 2003). Second, the O1 Tucumán variant isolates were recovered from patients with moderate to severe cases of diarrhoea. These isolates did not have ctxA and their NotI-PFGE pattern differed from the epidemic clone (Pichel et al., 2003). Third, another clinical strain, recovered in 2005, was ctxA+ but had a unique PFGE pattern. In addition, several human and environmental O1 isolates with diverse PFGE patterns have been found (unpublished).

In this paper, the genetic relatedness of housekeeping genes and genes within the O1 biogenesis region was compared among isolates of V. cholerae. As evidence of LGT events was found, the junction regions of the O biogenesis genes were examined for evidence of a mechanism of LGT.

Materials and methods

Bacterial strains

Twenty-eight V. cholerae isolates from Argentina were selected for this study (Table 1), including epidemic El Tor isolates, the Tucumán Variant isolates, strain CH2283, and other isolates including O1, O139 and non-O1 strains. In addition, strain NRT-36S (Chen et al., 2007) (Table 1), isolated from a Japanese patient with travelers' diarrhoea, was included. The isolates were grown on thiosulfate–citrate–bile-salt–sucrose plates and colonies were transferred to Luria–Bertani broth. DNA was prepared from overnight cultures using PrepMan Ultra (Applied Biosystems Inc., Foster City, CA).

Table 1

Selection of Vibrio cholerae isolates and their housekeeping alleles; sequence type (ST) and clonal complex (CC)

Isolate | Isolation year | Province of isolation | Source | Serogroup | ctxA | tcpA El Tor | dnaE lap recA pgm gyrB cat chi ST CC
SF1902 | 1998 | Santa Fe | Human | O1 | + | + | 1111111641
SF366 | 1999 | Santa Fe | Human | O1 | + | + | 1111111641
SJ179W | 1993 | San Juan | Environmental | O1 |  |  | 132551381197
BA312 | 1994 | Buenos Aires | Human | ND |  |  | 1123617106144
  • * ND: grouped as non-O1 non-O139.

Sequence typing

First, seven housekeeping genes were selected for multilocus sequence typing (MLST): gyrB, pgm, dnaE, cat, lap, chi and recA. The primers were described previously (Garg et al., 2003). Second, six pairs of primers amplified overlapping fragments from the wbe (O-antigen) region of 21 V. cholerae O1 isolates extending into parts of the junction genes gmhD and rjg (Comstock et al., 1996) (Table 2). These primers amplified c. 4-kb fragments (Fig. 2a) and were used to sequence the ends producing partial sequence data for manB, wzm, wzt, wbeN, the IS element, wbeP, wbeT and wbeV. The products were amplified by PCR using 2 mM MgCl2 at 50 °C as the annealing temperature for the housekeeping genes and as in Table 2 for the O1 region fragments, with 30-s cycles. The presence of amplified products was confirmed on 1% agarose gels in Tris-borate-EDTA buffer. Purification of the products was performed using Millipore filters. The purified PCR products were sequenced in both directions with the same primers used for amplification. Amplified fragments were sequenced using Big Dye cycle sequencing kit (ABI) in accordance with the manufacturer's instructions. The fluorescently labeled products were separated and detected using an ABI 3730xl Automatic Sequencer (ABI). The trace files were read by phred (Ewing & Green, 1998; Ewing et al., 1998) and assembled by phrap (available at http://www.washington.edu). Low-quality sequence at the ends was trimmed, and the contigs from each individual isolate were aligned using clustalx (Jeanmougin et al., 1998). Variable nucleotides were identified and confirmed manually using analyseclustal5%, a program that calculates distance matrices (available on request from Dr Stine). When assigning allele type numbers, if the sequence of an allele in our study was identical to a previously reported allele sequence (Garg et al., 2003; Lee et al., 2006), the same number was designated. 
The sequences of new housekeeping alleles were deposited in GenBank (EU101394–EU101455); new alleles from the O1 region (different from the published N16961 genome) were deposited as gapped sequences of the wbe operon (EU169449–EU169455).
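The allele-numbering rule described above (an allele identical to a previously reported sequence keeps the published number, otherwise it is new) can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' pipeline: the function name `assign_allele_number`, the rule of giving a new allele the next unused number, and the shortened toy sequences are all hypothetical.

```python
def assign_allele_number(sequence, known_alleles):
    """Return the allele number for `sequence` at one locus.

    `known_alleles` maps previously reported allele sequences to their
    numbers. An identical sequence reuses the published number; a new
    sequence receives the next unused number (an assumed convention)
    and is recorded so later isolates get the same number.
    """
    seq = sequence.upper()
    if seq in known_alleles:
        return known_alleles[seq]
    new_number = max(known_alleles.values(), default=0) + 1
    known_alleles[seq] = new_number
    return new_number


# Hypothetical, shortened sequences for a single locus (not real data).
known = {"ACGTACGT": 1, "ACGTACGA": 2}
numbers = [
    assign_allele_number(s, known)
    for s in ["ACGTACGT", "ACGAACGT", "acgtacga", "ACGAACGT"]
]
print(numbers)  # [1, 3, 2, 3]
```

Applying the same lookup at each of the seven loci gives the allele profile from which a sequence type is read.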

Table 2

Primers and PCR conditions used to amplify and sequence the O1 antigen synthesis gene cluster from its junction region

Primer | Sequence | [Mg] (mM) | Ta (°C), text (min) | Size (bp) | Reference
manB-F | GAATTTGAAGGATGGCGT | 2.5 | 55, 4 | 4167 | This study
wzm-F | TCGGAACGTGCTGATTCTTC | 1.75 | 52, 4 | 4224 | This study
wbeN-F | GCTCGCCCTTTTGAATTATC | 2 | 50, 4 | 4210 | This study
IS-F | TGGACATGCTTCACGACTTC | 2 | 50, 4 | 4058–2800 | This study
  • [Mg], Mg concentration; Ta, annealing temperature; text, extension time.

Figure 2

Analyses of O1 antigen synthesis gene cluster. (a) Pictogram of the entire region. Genes are represented by thick arrows on the line. The parallel lines with double-headed arrows pointing inwards represent the 3–4-kb amplified fragments. The ends of each fragment were sequenced and labeled s1–s10. (b) Graph of the number of variable nucleotides per 100 bases for s1, the junction region at the gmhD end. The arrow identifies the putative breakpoint. The percentages refer to the proportion of variable nucleotides on either side of the putative breakpoint. (c) Clustal alignment of the JUMPstart region from O1, O5, O8, O31, O37 and O108. The asterisks indicate nucleotides that are invariant in all 11 species. The arrow indicates the putative breakpoint. The putative DNA uptake signal sequence (USS) is identified.

Statistical analysis

The Mann–Whitney two-tail test (Sokal & Rohlf, 1981) was used to evaluate the significance of the difference of variable nucleotides in housekeeping vs. O-antigen synthesis genes. Pearson's χ2 test (Sokal & Rohlf, 1981) was used to evaluate the significance of the difference of allelic changes in housekeeping vs. O-antigen synthesis genes.

Results and discussion

Housekeeping genes

MLST showed that all of the loci were variable and 18 sequence types were identified in the 28 studied isolates (Table 1). The percentage of variable nucleotides among the alleles of housekeeping genes ranged from 2% (lap) to 10% (cat) (Table 3). We identified four genetically related groups, each referred to as a clonal complex and composed of isolates that shared at least five alleles (Table 1), and eight singletons, i.e. isolates unrelated to any of the others. One group consisted of nine isolates: six clinical and one environmental isolate identified as O1 El Tor, the ctxA+ CH2283 with a distinct PFGE type, and a non-O1, non-O139 isolate, Me160. When these Latin American El Tor isolates were compared with O139 and O1 isolates from India and Mozambique (Garg et al., 2003; Lee et al., 2006), they all shared at least five alleles and thus were members of the same group. The second group in our study contained seven clinical and one environmental O1 Tucumán Variant. The third group contained two ctxA- clinical isolates, T1437 and T717. The fourth contained BA312 from Argentina and NRT-36S from Japan. Three of the other eight isolates with unique sequence types that were unrelated to any others were serogroup O1: T2001, RB1 and SJ179W (Table 1). Thus, there were six genetically distinct lineages among the O1 isolates.
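The grouping rule used above, in which isolates sharing at least five of the seven housekeeping alleles fall into one clonal complex, can be sketched as follows. This is an illustrative sketch only: the text does not state how chains of related isolates were resolved, so single linkage (an isolate joins a group if it meets the threshold with any member) is an assumption, and the isolate names and allele profiles are hypothetical.

```python
from itertools import combinations


def shared_alleles(profile_a, profile_b):
    """Count loci at which two allele profiles carry the same allele."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


def clonal_complexes(profiles, min_shared=5):
    """Group isolates linked by sharing at least `min_shared` alleles.

    Single linkage is assumed. Returns (groups, singletons), where
    groups is a sorted list of sorted member lists.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    groups = sorted(sorted(m) for m in clusters.values() if len(m) > 1)
    singletons = sorted(m[0] for m in clusters.values() if len(m) == 1)
    return groups, singletons


# Hypothetical seven-locus allele profiles (not taken from Table 1).
profiles = {
    "iso_A": (1, 1, 1, 1, 1, 1, 1),
    "iso_B": (1, 1, 1, 1, 1, 2, 3),  # shares five alleles with iso_A
    "iso_C": (1, 1, 1, 4, 5, 2, 3),  # shares five with iso_B, three with iso_A
    "iso_D": (7, 8, 9, 4, 5, 6, 6),  # shares at most two with any other
}
groups, singletons = clonal_complexes(profiles)
print(groups)      # [['iso_A', 'iso_B', 'iso_C']]
print(singletons)  # ['iso_D']
```

The number of loci at which two profiles differ (seven minus `shared_alleles`) is the allelic distance of the kind drawn between sequence types in Fig. 1.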

Table 3

Number and % of variable nucleotides (nt) in housekeeping genes among O1 isolates

Variable nt | 10 | 22 | 17 | 9 | 12 | 32 | 51
Total nt | 333 | 524 | 498 | 458 | 500 | 644 | 509
% Variable | 3 | 4.2 | 3.4 | 2 | 2.4 | 5 | 10
Range nt differences between alleles | 1–9 | 2–15 | 6–9 | 1–5 | 4–11 | 1–24 | 2–39
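As a check on the arithmetic in Table 3, the per-locus percentages and the pooled figure can be recomputed from the counts. The sketch below assumes each row of the table holds seven values, one per housekeeping locus, read as 10, 22, 17, 9, 12, 32, 51 variable nucleotides out of 333, 524, 498, 458, 500, 644, 509; the variable names are illustrative.

```python
# Counts read from Table 3 (one value per housekeeping locus).
variable_nt = [10, 22, 17, 9, 12, 32, 51]
total_nt = [333, 524, 498, 458, 500, 644, 509]

# Percentage of variable nucleotides at each locus.
per_locus = [round(100 * v / t, 1) for v, t in zip(variable_nt, total_nt)]

# Pooled percentage across all seven loci.
pooled = 100 * sum(variable_nt) / sum(total_nt)

print(per_locus)  # [3.0, 4.2, 3.4, 2.0, 2.4, 5.0, 10.0]
print(sum(variable_nt), sum(total_nt), round(pooled, 1))  # 153 3466 4.4
```

The pooled value (153 of 3466 nucleotides) corresponds to the 4.4% quoted for the housekeeping loci.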

The genetic relatedness of the O1 isolates is diagrammed in Fig. 1 (top portion). Each sequence type is represented by a circle, with its number from Table 1 on it. The isolate name or the clonal complex designation appears between the circles and the rectangles. The distance between each sequence type is measured by the number of allelic changes, following an earlier example (Salim et al., 2005). Each node indicates a change in the allele. The alleles differ by at least one and up to 39 nucleotides (Table 3). This representation of the genetic relatedness does not distinguish between point mutations and recombination events that are known to alter alleles in V. cholerae (Farfán et al., 2002; Garg et al., 2003; Salim et al., 2005).

Figure 1

The genetic relatedness of housekeeping genes and genes from the O1 region. Housekeeping genes are diagrammed in the upper portion, while O1 genes are diagrammed in the bottom half. Each housekeeping-gene-based sequence type is represented by a circle numbered according to the sequence type in Table 1. The names between the circles and the rectangles identify the clonal complex or the isolate name. Each distinct type of O1 region is represented by a rectangle. The distance between each sequence type is measured by the number of allelic changes. Each node in the connecting line segments indicates a change in an allele.

O1 antigen synthesis gene cluster

DNA amplified from six pairs of primers produced fragments of the expected size (Fig. 2a), implying that the organization of the wbe loci was the same as published (Stroeher et al., 1992; Heidelberg et al., 2000). The region amplified by primers IS-F and wbeV-R, which contains a transposon, produced fragments of two expected sizes, 4 and 2.8 kb, with and without the two subunits of the transposase OrfAB. When the ends of the amplified fragments were sequenced (sequences s2–s9), little divergence was observed in the 21 V. cholerae O1 isolates. Sequences s3 and s4 were identical for all isolates and the other six sequences differed by four or fewer nucleotides in the c. 600 bases analyzed, with only one mutation that produced an amino acid substitution. The proportion of variable nucleotides ranged from 0% (wzm and wzt) to 0.44% (IS element-wbeP) (Table 4).

Table 4

Number and % of variable nucleotides (nt) in the O1 biogenesis region

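Sequence | manB (s2) | wzm (s3) | wzt (s4) | wbeN (s5) | ISR-wbeP (s6) | ISF-wbeT (s7) | wbeVR (s8) | wbeVF (s9)
Variable nt | 1 | 0 | 0 | 1 | 3 | 1 | 2 | 4
Total nt | 575 | 563 | 655 | 595 | 680 | 615 | 600 | 660
% Variable | 0.19 | 0 | 0 | 0.17 | 0.44 | 0.16 | 0.33 | 0.61
Range nt differences between alleles | 1 | 0 | 0 | 1 | 1–2 | 1 | 1–2 | 1–4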

The O1 antigen regions are very similar. Two loci, wzm and wzt, showed no variation among the O1 isolates. All isolates had at least two identical alleles among the eight loci in the region. There were a maximum of four alleles at any locus and most alleles differed from the others by a single mutation. The bottom portion of Fig. 1 shows the genetic relatedness of the O1 regions. Each distinct type of O1 region is represented by a rectangle. The allelic differences between the genetic types are represented by a node in the connecting line segments.

Comparison of housekeeping genes to O1 antigen synthesis gene cluster

Among O1 serogroup V. cholerae isolates, the housekeeping genes were significantly more variable than the O1-antigen-encoding region genes. There were five to six alleles at each housekeeping locus, while there were only one to four alleles in the O1 region loci. In the housekeeping genes, the total number of nucleotides was 3466, of which 153 (4.4% across all seven loci) were variable. In contrast, the O1 region genes were very similar. The total number of nucleotides was 4943, of which only 12 (0.24%) were variable (P=0.0003, Mann–Whitney). An alternative method of determining relatedness is to count the number of allelic changes. Among the housekeeping genes, there were 35 allelic changes (Fig. 1), while among the O1 region genes, there were 18 changes, which is significantly fewer (χ2=5.44, P=0.019). Because the O1 biogenesis genes were more closely related than the housekeeping genes, they are inferred to have moved from one genetic lineage to the next via LGT.
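The two comparisons above can be approximated with a short sketch. The inputs to the published tests are not stated, so two assumptions are made here: that the Mann–Whitney test compared the per-locus percentages of variable nucleotides (seven housekeeping loci from Table 3 against eight O1 region loci from Table 4), and that the χ2 test compared the two counts of allelic changes against an equal expectation. All function names are illustrative. Under these assumptions the exact two-tailed Mann–Whitney P is 2/6435, about 0.0003, and the χ2 statistic is about 5.45 with P close to 0.02, near the reported values.

```python
from itertools import combinations
from math import erfc, sqrt


def midranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in order[i:j]:
            ranks[k] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def mann_whitney_exact(x, y):
    """Two-tailed exact P: enumerate every way to assign ranks to x."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    centre = n1 * (n + 1) / 2  # expected rank sum of x under H0
    observed = abs(sum(ranks[:n1]) - centre)
    hits = total = 0
    for subset in combinations(ranks, n1):
        total += 1
        if abs(sum(subset) - centre) >= observed - 1e-9:
            hits += 1
    return hits / total


def chi_square_equal(a, b):
    """Chi-square (1 d.f.) for two counts against an equal expectation."""
    expected = (a + b) / 2
    stat = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))  # upper tail of chi-square, 1 d.f.


housekeeping_pct = [3, 4.2, 3.4, 2, 2.4, 5, 10]              # Table 3
o1_region_pct = [0.19, 0, 0, 0.17, 0.44, 0.16, 0.33, 0.61]   # Table 4

p_mw = mann_whitney_exact(housekeeping_pct, o1_region_pct)
chi2, p_chi2 = chi_square_equal(35, 18)
print(f"Mann-Whitney exact P = {p_mw:.4f}")  # 0.0003
print(f"chi-square = {chi2:.2f}, P = {p_chi2:.3f}")
```

Because every housekeeping percentage exceeds every O1 region percentage, the observed ranking is the most extreme one possible, which is why the exact P reduces to two arrangements out of the 6435 ways of choosing seven ranks from fifteen.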

Junction regions

The sequences from the junction regions starting in gmhD and rjg had more variable nucleotides than loci in the O1 region. Despite sequencing these regions in all 21 V. cholerae O1 isolates representing six genetically distinct lineages, there were only three distinct alleles at each end. There were 15 variable bases (2% of 740) in the rjg end (s10), with the variable nucleotides evenly distributed across the region. Thus no breakpoint could be identified. Although the gmhD end (s1) had a similar percentage of variable bases, 3% (24/790), the distribution of variable nucleotides was not uniform (Fig. 2b). Starting in gmhD, the first 400 bases had 5.75% (23/400) variable bases, a value similar to the housekeeping genes (4.4%), while the next 390 bases had 0.25% (one) variable bases, consistent with the average 0.24% in the O1 region. A putative breakpoint was identified as the last variable base in the first 400 bases of the sequence. Next to this base was the conserved sequence AAGGGCGGTAGC in all 21 V. cholerae O1 isolates. These bases are part of the JUMPstart region. The last variable base is just 3′ of the stem loop region shown to be involved in transcriptional regulation (Leeds & Welch, 1997). These bases are conserved in other V. cholerae JUMPstart regions, e.g. O37, O31, O5, O8 and O108 (Fig. 2c).
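The breakpoint analysis above rests on locating variable columns in an alignment and comparing their density on either side of a boundary. A minimal sketch of that idea follows; it is not the analyseclustal program used in the study, the function names are illustrative, and the three 20-base sequences are hypothetical stand-ins for the roughly 790-base s1 alignment (so a window of 10 is used instead of 100).

```python
def variable_positions(alignment):
    """Return 0-based alignment columns holding more than one base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(length)
            if len({seq[i] for seq in alignment}) > 1]


def variable_per_window(positions, length, window=100):
    """Count variable columns in consecutive windows of `window` bases."""
    counts = [0] * ((length + window - 1) // window)
    for pos in positions:
        counts[pos // window] += 1
    return counts


def percent_either_side(positions, length, boundary):
    """Percent variable bases before and after `boundary` (0-based)."""
    left = sum(1 for p in positions if p < boundary)
    right = len(positions) - left
    return 100 * left / boundary, 100 * right / (length - boundary)


# Hypothetical toy alignment: variable first half, conserved second half.
alignment = [
    "ACGTACGTACGGCGGTAGCA",
    "ATGTACCTACGGCGGTAGCA",
    "ACGAACGTAAGGCGGTAGCA",
]
positions = variable_positions(alignment)
print(positions)                                       # [1, 3, 6, 9]
print(variable_per_window(positions, 20, window=10))   # [4, 0]
print(percent_either_side(positions, 20, boundary=10)) # (40.0, 0.0)
```

In this toy case the last variable column (index 9) marks the switch from a variable to a conserved stretch, which is the pattern reported for the gmhD end.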

At the 5′ end of the 12 conserved bases at the putative junction point for the O1 sequences were the nine bases AAGGGCGGT. These were similar to the experimentally identified DNA USS from Haemophilus influenzae (Bakkali et al., 2004). The H. influenzae sequence is AAGcGCGGT, differing at one base, identified by the lowercase letter c in the center. We propose these sequences have related, but not identical, functions. In V. cholerae of the Vibrionaceae, there is specificity for O-antigen DNA uptake driven by the selection for novel serogroups, while in H. influenzae of the neighboring Pasteurellaceae family, the USS is a generalized DNA uptake signal. Recent work has suggested that competence existed in the ancestor of all Pasteurellaceae (Redfield et al., 2006) and many of the genes are also found in V. cholerae (Meibom et al., 2005), thus leading to the conclusion that competence is the ancestral state. The H. influenzae Rd genome has 1471 copies of the USS, far more than the number expected by chance (Smith et al., 1999). In contrast, V. cholerae N16961 has only 23 copies of AAGGGCGGT, about the expected number (25) for a nonamer with its base composition. This difference is consistent with the constitutive expression of competence in H. influenzae, whereas in V. cholerae competence is tightly regulated, being dependent on the presence of chitin, high density and stress (Meibom et al., 2005). These latter two conditions are also the conditions in which frequency-dependent selection (the type that favors novel serotypes) is strongest. Of note, the putative V. cholerae DNA USS was in the region identified as the JUMPstart sequence conjectured to be involved in LGT (Hobbs & Reeves, 1994), and when O-antigen gene switching was demonstrated to occur between strains of V. cholerae in the laboratory, the left junction occurred within or upstream of gmhD based on microarray data (Blokesch & Schoolnik, 2007). Our sequence data, more precise than microarray data, and the similarity to the USS provide circumstantial evidence that part of the JUMPstart sequence is involved in LGT. The membrane receptors and the mechanism by which the DNA is integrated into the chromosome are unknown. In summary, our data and analyses are consistent with a possible mechanism that significantly increases the rate of LGT of O-antigen genes, with the conclusion that the genes encoding the O1 serogroup have undergone numerous LGT events.
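The comparison of observed and expected nonamer copies can be sketched as follows. How the published counts were obtained is not stated, so counting on both strands and an expectation based on independent base frequencies are assumptions, the function names are illustrative, and the 24-base "genome" is a hypothetical toy sequence; a real genome sequence and its base frequencies would have to be supplied to reproduce the reported numbers.

```python
from math import prod

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(genome, motif):
    """Count overlapping occurrences of `motif` on both strands."""
    def count_one(target):
        return sum(1 for i in range(len(genome) - len(target) + 1)
                   if genome.startswith(target, i))

    rc = reverse_complement(motif)
    forward = count_one(motif)
    # A motif equal to its own reverse complement is counted once.
    return forward if rc == motif else forward + count_one(rc)


def expected_count(genome_length, base_freq, motif):
    """Expected copies on both strands, assuming independent bases."""
    positions = genome_length - len(motif) + 1
    rc = reverse_complement(motif)
    p_forward = prod(base_freq[b] for b in motif)
    p_reverse = 0.0 if rc == motif else prod(base_freq[b] for b in rc)
    return positions * (p_forward + p_reverse)


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy sequence: one forward copy and one reverse-strand copy.
toy_genome = "TTAAGGGCGGTCCACCGCCCTTAA"
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(count_motif(toy_genome, "AAGGGCGGT"))   # 2
print(mismatches("AAGGGCGGT", "AAGCGCGGT"))   # 1
print(expected_count(len(toy_genome), uniform, "AAGGGCGGT"))
```

An observed count close to the expectation, as reported for V. cholerae N16961, indicates no enrichment of the nonamer, in contrast to the strongly overrepresented USS of H. influenzae.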


This work was supported in part by an ASM Fellowship for Latin America granted to S.G.-F. and by the grant PICTR2000-00010 from the Agencia Nacional de Promoción Científica y Tecnológica, Argentina.

We gratefully acknowledge the Argentinean Laboratory Network for Cholera and Diarrhoeal Diseases for sending the V. cholerae isolates.


  • Editor: Mark Schembri

