
Epidemiological study of Vibrio cholerae using variable number of tandem repeats

Raikamal Ghosh, G. Balakrish Nair, Li Tang, J. Glenn Morris, Naresh C. Sharma, Mamatha Ballal, Pallavi Garg, Thandavarayan Ramamurthy, O. Colin Stine
DOI: http://dx.doi.org/10.1111/j.1574-6968.2008.01352.x, pp. 196-201. First published online: 1 November 2008


By conventional genetic methods, including pulsed-field gel electrophoresis and multilocus sequence typing, most pathogenic, cholera toxin-positive O1 and O139 isolates of Vibrio cholerae cannot be distinguished. We evaluated relationships among 173 V. cholerae isolates collected between 1992 and 2007 from different geographic areas in India by analyzing five variable number of tandem repeat (VNTR) loci. Each VNTR locus was highly variable, with between 5 and 19 alleles. eburst analysis revealed four large groups of genetically related isolates. Two groups contained genotypes of isolates with the O139 serogroup (which emerged for the first time in epidemic form in 1992), and the other two groups contained O1 strains. In subsequent analysis, it was possible to track the spread of specific genotypes across time and space. Our data highlight the utility of the methodology as an epidemiologic tool for assessing the spread of isolates in both epidemic and endemic settings.

  • VNTR
  • evolution
  • Vibrio cholerae
  • epidemiology
  • MLVA


The ability to distinguish rigorously within and among various species of pathogenic bacteria is critical for evaluating hypotheses concerning the epidemiology of the diseases they cause. Therefore, when a new method for rigorously differentiating pathogenic bacteria, for example, characterizing their variable number of tandem repeat (VNTR) loci, becomes available, it is worthwhile to determine whether it may be useful during epidemiological studies. Toxigenic strains of Vibrio cholerae cause cholera, a major cause of illness in the ‘third world.’ The WHO recently reported (http://www.who.int/mediacentre/factsheets/fs107/en/index.html) that 236 896 cases of cholera occurred in 52 countries during 2006, a 79% increase over the number of cases that occurred in 2005. Actually, the number of cases during 2006 is probably at least one to two orders of magnitude higher because several countries where cholera is known to be endemic did not report cases of that disease to the WHO. In addition to endemic cases, V. cholerae has caused seven recognized pandemics since 1818, with the sixth and seventh pandemics caused by cholera toxin-positive (ctx+) isolates with the O1 serogroup. In 1992, a major epidemic of cholera caused by a new serogroup, O139, was recognized in the Indian subcontinent (Nair et al., 1994; Kaper et al., 1995).

Most ctx+ O1 and O139 isolates are genetically indistinguishable by (1) restriction fragment length polymorphism analysis of rRNA genes (Faruque et al., 2000) and the CTX element (Basu et al., 2000; Mukhopadhyay et al., 2001), (2) pulsed-field gel electrophoresis (PFGE) of genomic restriction fragments (Kurazono et al., 1996; Basu et al., 2000), (3) determination of the nucleotide sequence of recA (Stine et al., 2000) and (4) multilocus sequence typing (MLST) (Farfan et al., 2002; Garg et al., 2003; Salim et al., 2005). Most recently, 17 VNTR loci that could be used to differentiate strains indistinguishable by PFGE were identified, and two of the loci were demonstrated to be stable during serial passaging under culture conditions (Danin-Poleg et al., 2007). In addition, studies of a second group of VNTR loci revealed that environmental and clinical isolates from distinct villages in Bangladesh could be differentiated by analyzing five VNTR loci (Stine et al., 2008).

In the current study, we applied the VNTR typing methodology to a large group of V. cholerae isolates from India, collected from different geographic areas and at different times, and including both ‘new’ O139 and ‘old’ O1 isolates. We sought to assess the variability of VNTR loci in this diverse isolate population, and to evaluate the potential utility of the technique in defining relationships among isolates from different locations, times and serogroups.

Materials and methods

Bacterial isolates

The study included three groups of isolates. The first group consists of 56 serogroup O1 isolates from 15 cities throughout India. These were collected between 2004 and 2007 and sent to the National Institute of Cholera and Enteric Diseases (NICED) in Kolkata. The second group contains 25 serogroup O139 isolates collected between 2001 and 2006 in Delhi. The third group consists of 92 serogroup O139 isolates from Kolkata, collected between 1992 and 2000 at the Infectious Disease Hospital, Kolkata, as part of a systematic, 2% sampling program. Vibrio cholerae isolates were picked from thiosulfate–citrate–bile salts–sucrose agar (Eiken, Tokyo, Japan) inoculated with stool samples. All isolates were examined for the oxidase reaction, and the identity of V. cholerae O1 and O139 was confirmed by serotyping with polyvalent O1 and monospecific Inaba, Ogawa and O139 antisera raised at the NICED (Mukhopadyay et al., 1995). Vibrio cholerae strains were examined for sensitivity/resistance to ampicillin (10 µg), chloramphenicol (30 µg), cotrimoxazole (25 µg), ciprofloxacin (5 µg), furazolidone (100 µg), gentamicin (10 µg), neomycin (30 µg), nalidixic acid (30 µg), norfloxacin (10 µg), streptomycin (10 µg) and tetracycline (30 µg), with commercial disks (Hi Media, Bombay, India). Characterization of strains as sensitive, intermediately resistant, or resistant was based on the size of the growth inhibition zones around each disk, according to the manufacturer's instructions, which matched the interpretive criteria recommended by the WHO (World Health Organization, 1993). PCR-based tests for tcpA, ctxA, ctxB and rstR were performed as described previously (Basu et al., 2000).


DNA was prepared, from overnight cultures, using PrepMan Ultra (ABI). Each locus was amplified by PCR using VNTR-specific primers (Table 1) selected from the sequence of strain N16961 (GenBank NC002505 and NC002506). VNTR loci were identified by the gene in which they occur: VC0147, VC0436-7 (intergenic), VC1650, VCA0171 and VCA0283. The first and last loci were previously demonstrated to be stable during serial passaging (Danin-Poleg et al., 2007). Presence of amplified products was confirmed by agarose gel electrophoresis, and then the purified products were sequenced using the same primers used for amplification by the Big Dye cycle sequencing kit (ABI), in accordance with the manufacturer's instructions. Fluorescently labeled products were separated and detected with a model 3730xl Automatic Sequencer (ABI). Trace files were read using sequencher (AGCT Gene Codes, Ann Arbor, MI). The number of repeats was determined using the program tandem repeat finder (Benson, 1999) (http://tandem.bu.edu/trf/trf.html), which permits multiple fasta formatted sequences to be submitted and accurately determines the number of repeats in each of the sequences. Alleles were distinguished from each other by the number of tandem repeats. Each isolate was assigned an allele (designated by a number) for each locus. The alleles at the five loci we examined were placed in order, to generate a genotype. For example, genotype 2,1,7,20,20 indicates that the isolate has alleles 2, 1, 7, 20, and 20 at loci 1, 2, 3, 4 and 5, respectively. The allele assignments were consistent with our previous work in Bangladesh (Stine et al., 2008). The isolates' genetic relatedness was assessed with the eburst (http://eburst.mlst.net/) (Feil et al., 2004) and splitstree (http://www.splitstree.org/) (Huson & Bryant, 2006) programs.
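The repeat-counting and genotype-assembly steps described above can be illustrated with a minimal sketch. It is not the Tandem Repeats Finder algorithm: it only counts the longest run of exact, consecutive copies of a known motif. The motif, the flanking sequence and the function names are hypothetical, chosen for illustration, and the mapping from repeat number to allele designation (Table S1) is not reproduced here.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the length, in motif copies, of the longest run of exact
    consecutive copies of `motif` found anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Extend the run for as long as another exact copy follows directly.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


def format_genotype(alleles) -> str:
    """Join the allele designations of the five loci, in locus order."""
    return ",".join(str(a) for a in alleles)


# Hypothetical read: seven copies of a made-up 6-bp motif between flanks.
read = "GGTC" + "AACAGA" * 7 + "TTCG"
repeats = count_tandem_repeats(read, "AACAGA")

# The example genotype quoted in the text, written in locus order.
genotype = format_genotype((2, 1, 7, 20, 20))
```

Real trace data contain imperfect repeats and sequencing errors, which is why a dedicated program was used in the study; the exact-match counter above would undercount in those cases.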

Table 1

Primers and characteristics of VNTR loci

Locus | Sequence | Motif size (bp) | Motif | Range

Results and discussion

Genetic variation

We observed extensive genetic variation among the five loci we examined: 6, 5, 12, 19 and 19 alleles were detected at loci VC0147, VC0436-7 (intergenic), VC1650, VCA0171 and VCA0283, respectively. The number of repeat units ranged from 2 to 29, and the number for each observed allele is listed in Supporting Information Table S1. When the assigned alleles at each locus were collected in order for each isolate, there were 105 genotypes among the 173 strains. Isolates with the same genotype often did not have the same antibiogram (Table S2), which is consistent with previous observations that antibiotic resistance genes are carried on mobile elements (Garg et al., 2000).

Genetic relatedness of isolates

Genetic relatedness of all 105 genotypes of the 173 isolates was determined using eburst. Genotypes were defined as members of a genetic group or clonal complex, when the genotypes were related to each other by an allelic change at a single locus. This definition yielded four large groups containing 44, 22, 13 and 7 genotypes (Fig. 1); three small groups containing 2, 2 and 3 genotypes; and 12 singletons or genotypes that were unrelated to other genotypes, i.e. they differed at two or more loci from all other genotypes. Three of the 12 singleton genotypes occurred in multiple isolates from the same city. Two of the large groups were composed predominantly of O139 isolates, while the other two contained only O1 isolates.
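The grouping rule stated above (genotypes belong to the same group when they are linked by a change at a single locus) can be sketched as connected components of a single-locus-variant graph. This is only an illustration of the linking rule, not the eburst program itself, which also predicts founders; the function names are hypothetical, and the genotypes are the handful quoted elsewhere in this article rather than the full data set.

```python
from itertools import combinations


def loci_differing(a, b) -> int:
    """Number of loci at which two genotypes carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def single_locus_groups(genotypes):
    """Group genotypes that are connected, directly or through intermediates,
    by single-locus differences. Returns groups sorted largest first."""
    unique = list(dict.fromkeys(genotypes))  # drop duplicates, keep order
    parent = {g: g for g in unique}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(unique, 2):
        if loci_differing(a, b) == 1:
            parent[find(a)] = find(b)

    groups = {}
    for g in unique:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values(), key=len, reverse=True)


# Genotypes quoted in the text (a small subset, for illustration only).
example = [
    (2, 1, 7, 25, 22), (2, 1, 7, 3, 22), (3, 1, 7, 25, 22), (3, 1, 7, 3, 22),
    (5, 1, 7, 22, 12), (2, 1, 7, 2, 12), (3, 5, 3, 6, 6),
]
groups = single_locus_groups(example)
group_sizes = [len(g) for g in groups]
```

On this subset the four Delhi genotypes form one group and the remaining three are singletons, because each differs from every other genotype in the list at two or more loci.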

Figure 1

Diagrams of genetic relatedness of O139 and O1 Vibrio cholerae from India. Genetic relatedness was determined using eburst. Each genotype is represented by a node in the diagram. For Groups 1 and 2, the year(s) of isolation of each genotype is given; isolates from Delhi are identified with a ‘D’, and those from Kolkata do not have a letter. For Groups 3 and 4, the location in India is given.

Group 1 in Fig. 1 contained 44 genotypes representing 82 isolates, 80 with serogroup O139 and two with O1. The founder determined by eburst was a genotype identified in isolates collected during 1992, 1993 and 1994 in Kolkata. The 10 genotypes radiating from the founder were collected in Kolkata from 1992 through 1995. Two of these genotypes differentiated further; the genotypes connected to these two were found in isolates collected through 1997 in one case and through 1999 in the other. Further differentiation occurred in a double-locus variant of the founder; among these isolates are those collected in Kolkata from 1993 through 1998 and those collected in Delhi from 2001 through 2006. Our data may be interpreted as showing the differentiation of a founder genotype into a more diverse present population, consistent with the existing epidemiological data (Nair et al., 1994), PFGE (Kurazono et al., 1996; Basu et al., 2000) and MLST data (Garg et al., 2003) that indicate a single origin for all O139 isolates. The two O1 isolates were considered to have arisen by convergence of their genotypes on one related to the O139 genotypes. The genotypes of the O139 isolates differed at two or more loci from those of 55 of the 57 O1 isolates.

Group 2 in Fig. 1 was composed of 25 serogroup O139 isolates. The founding genotype, determined by eburst, was one recovered in 1999 in Kolkata, although the oldest isolate was recovered in 1997. Group 2 may be related to Group 1, because a single intermediary genotype could connect the two groups by single-locus variants. Of note, at the VC0436-7 locus, all of the O139 genotypes have allele 1, as do all of the O139 genotypes found in Bangladesh (Stine et al., 2008). In contrast, at VC0147, 12 of 13 Group 2 genotypes have allele 4, whereas none of the 44 Group 1 genotypes do; and at VCA0283, 11 of 13 Group 2 genotypes have allele 5, 23 or 6, whereas the Group 1 genotypes include a single instance of allele 6.

The genotypes in the other two large genetic groups were found only in O1 isolates. Group 3 (Fig. 1) consisted of 22 genotypes accounting for 30 isolates that were collected from patients in 15 cities spread over the entire subcontinent (Fig. 2, black dots). Group 4 (Fig. 1) contained seven genotypes among 11 isolates that were collected from five western, eastern and southern cities (Fig. 2, squares).

Figure 2

Map of India indicating the source of genetically related O1 isolates. The source of genetically related isolates from the same clonal complex is indicated by the same symbol: either a circle or a square.

Local variation among serogroup O139 isolates in Delhi and Kolkata

Because VNTR loci differ by the gain or loss of repeats, it is likely that the specific number of repeats in an allele may be generated more than once at a locus. The mathematical algorithm underlying eburst does not allow for this possibility; however, network analysis can accommodate multiple possible pathways for the generation of a genotype. Based on our observation in the eburst analysis that the genotypes appear to have evolved, we elected to include the date of isolation in our network analysis. We also analyzed the O139 isolates from Delhi and Kolkata separately because they differ in both geography and date of isolation. The O139 Delhi isolates were related to each other: 18 isolates had genotypes that were either identical to or differed at only a single locus from that of another O139 isolate from Delhi. In Fig. 3a, isolates collected in different years that presented the same genotype are connected by double lines, while isolates with genotypes that differed at a single locus are connected by a single line; all possible connections were made. The network analysis revealed that the isolates from each year were related to each other by a change at a single locus, and those from subsequent years were related by changes at one or more loci. The network analysis preserves the ambiguity inherent in the possibility of generating the same allele multiple times. For example, in Fig. 3a, the top four isolates 2,1,7,25,22; 2,1,7,3,22; 3,1,7,25,22 and 3,1,7,3,22 have all combinations of alleles 2 and 3 at locus 1, and of alleles 25 and 3 at locus 4. A tree analysis, whether eburst or a traditional bifurcating tree algorithm, must select one of the four connections to remove (the choice will not change the length of the tree) without an a priori reason for the choice. The ambiguity cannot be resolved with the current data.
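The ambiguity among the four genotypes named above can be checked directly. The sketch below, with hypothetical helper names, builds every single-locus connection among them and then counts how many three-connection subsets still join all four genotypes; each such subset corresponds to one tree that a tree-building method could return.

```python
from itertools import combinations

# The four genotypes from the top of Fig. 3a, as quoted in the text.
genotypes = [
    (2, 1, 7, 25, 22),
    (2, 1, 7, 3, 22),
    (3, 1, 7, 25, 22),
    (3, 1, 7, 3, 22),
]

# Network edges: every pair of genotypes differing at exactly one locus.
edges = [
    (a, b)
    for a, b in combinations(genotypes, 2)
    if sum(x != y for x, y in zip(a, b)) == 1
]


def is_connected(nodes, edge_subset) -> bool:
    """True if the given edges join all nodes into a single component."""
    adjacency = {n: set() for n in nodes}
    for a, b in edge_subset:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)


# A tree on four nodes has three edges, so one network edge must be dropped.
trees = [
    subset
    for subset in combinations(edges, len(genotypes) - 1)
    if is_connected(genotypes, subset)
]
```

The network has four connections forming a closed loop, and removing any one of them leaves a valid tree of the same length, so there are four equally short trees and no basis in these data for preferring one.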

Figure 3

(a) Pictogram of a network of genetically related O139 Vibrio cholerae genotypes from Delhi (2001–2006). Each of the 11 genotypes is identified by the alleles at each of the five loci. For those genotypes observed more than once during a single year, the number of isolates is identified by ‘n’ beneath the genotype. The genetic relatedness of the genotypes is indicated by (1) double lines, if the identical genotype was collected in different years and (2) an arrow, if the genotypes differ by a mutation at a single locus. (b) Pictogram of a network of O139 V. cholerae genotypes from Kolkata (1992–2000).

Seven other unrelated O139 genotypes or singletons differing at two or more loci were identified in Delhi. These seven genotypes were not randomly distributed during the years they were obtained. The collection contained eight isolates from 2001, 2004 and 2006, but six singletons were identified in 2001, one in 2004 and none in 2006.

Figure 3b diagrams the genetic relatedness of 80 O139 isolates from Kolkata. Isolates with the same genotype occurred in a temporally consistent manner. Genotypes that were seen more than once were clustered by the year or years in which they were observed (Table 2). Sixteen genotypes were found more than once; 14 of the 16 were observed multiple times in a single year, and nine were found in strains collected during two or more years. In 10 of 12 instances, the same genotype was observed in sequential years. One exception was 5,1,7,22,12, which was detected in strains isolated during 1993, 1994, 1995 and 1997, and the other was 2,1,7,2,12, detected in 1996 and 1998. These two were the only genotypes whose occurrence was not repeated within a year or during sequential years. Their absence in the intervening years may be a result of our small sample size (n<14 per year). Singletons were relatively unrelated and found in every year except 1992.
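The check for sequential years can be written as a short helper that lists the years missing between a genotype's first and last observation. The function name is hypothetical, and the years are those quoted above for the two exceptions; the third entry is an invented contrast case with no gap.

```python
def missing_years(years):
    """Return the years between the first and last observation of a
    genotype in which it was not observed."""
    observed = sorted(set(years))
    if not observed:
        return []
    return [y for y in range(observed[0], observed[-1] + 1) if y not in observed]


# Years of isolation quoted in the text for the two exceptions.
gaps = {
    "5,1,7,22,12": missing_years([1993, 1994, 1995, 1997]),
    "2,1,7,2,12": missing_years([1996, 1998]),
    # Invented example of a genotype seen in three sequential years.
    "hypothetical": missing_years([1992, 1993, 1994]),
}
```

An empty list means the genotype was seen in every year between its first and last isolation.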

Table 2

Occurrence of genotypes among serogroup O139 isolates in Kolkata


Widespread geographical variation of VNTR loci

As was expected based on the known epidemiology for the O139 isolates, the isolates in the four large genetic groups appear to be genetically related isolates rather than isolates that possess similar genotypes simply by chance. The probability that two isolates have identical genotypes solely by chance may be computed based on the number of alleles at each locus. If the alleles occur at random, the chance that two isolates will have the same allele at a single locus is 1/(number of alleles at the locus). Thus, the probability that two isolates are identical at all five loci is the product of 1/(number of alleles) for all five loci, i.e. 2 × 10⁻⁵. The probability that four of five loci or three of five loci are identical may be estimated in a similar manner. If the least variable loci are used, the minimum probability obtained is 3 × 10⁻⁴ for four of five loci and 4 × 10⁻³ for three of five loci. If the calculation is extended to additional (>2) independent isolates, the probability is the chance of two isolates being related raised to the power n − 1, where n is the number of isolates. Thus, for the 11 and 30 isolates found in the two O1 clonal complexes, the probability that they are related by chance is extremely small. The implication is that these isolates were related by descent and that there was a single source for the initial strain. Therefore, in order to find related members in cities throughout India, there must be widespread dissemination.
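The product rule described above can be sketched with exact fractions. The allele counts are those reported in the Genetic variation section; the function names are hypothetical, and the sketch assumes equally frequent alleles and independent loci, as the argument does. Its outputs need not reproduce the rounded figures quoted in the text, which may rest on details not stated here.

```python
from fractions import Fraction
from math import prod

# Number of alleles observed at each locus (from the Genetic variation section).
allele_counts = {
    "VC0147": 6,
    "VC0436-7": 5,
    "VC1650": 12,
    "VCA0171": 19,
    "VCA0283": 19,
}


def chance_identical(counts) -> Fraction:
    """Probability that two isolates match by chance at every listed locus,
    assuming equally frequent alleles and independent loci."""
    return prod(Fraction(1, k) for k in counts)


def chance_for_group(pair_probability: Fraction, n: int) -> Fraction:
    """Extend the pairwise probability to n independent isolates by raising
    it to the power n - 1, as described in the text."""
    return pair_probability ** (n - 1)


counts = sorted(allele_counts.values())
p_five_loci = chance_identical(counts)
p_four_least_variable = chance_identical(counts[:4])
p_three_least_variable = chance_identical(counts[:3])

# The smaller O1 clonal complex described in the text contained 11 isolates.
p_eleven_isolates = chance_for_group(p_five_loci, 11)
```

Using exact fractions avoids floating-point underflow when the pairwise probability is raised to a high power; `float(p_five_loci)` converts a result for display.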

Five isolates from West Bengal (Malda, Garulia and Midnapur) in our largest O1 clonal complex were collected during cholera outbreaks. All of them had the same genotype (3,5,3,6,6), even though they exhibited four antibiograms: all five isolates were resistant to furazolidone and nalidixic acid, four were also resistant to cotrimoxazole and streptomycin, two were also resistant to ampicillin and one was also resistant to chloramphenicol. Three nonoutbreak isolates from West Bengal had two different, unrelated genotypes. Our observations, made with a limited number of West Bengal strains, are consistent with the hypothesis that rapid expansion may be associated with a geographically widespread clonal complex. If these outbreaks were to occur across the country, then the widespread strain may be considered an epidemic strain within a region where V. cholerae is endemic.


Analysis of VNTR loci is a new method for epidemiological studies of V. cholerae. The loci were highly variable among clinical isolates, and there was variation within and between the years the isolates were collected from a single location and between locations. The results of our genetic relatedness studies indicated that the variation was ordered, i.e. many genotypes were clustered by year and place. Some genotypes were unrelated to any others, consistent with the presence of local forms in some cities with little exchange. In contrast, some isolates were related to other isolates in places spanning the subcontinent. The latter strains most likely had a common origin, which indicates widespread dissemination of that genotype. VNTR analysis, if carried out on a wide scale, could be useful to follow changes in strain populations across time and space, as well as to track the spread of newly introduced strains into endemic strain populations.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Table S1. Identification of the number of repeats for each allele.

Table S2. Isolates, genotypes and antibiograms.


The authors thank Arnold Kreger for reading and editing drafts of the manuscript.


  • Editor: Ross Fitzgerald

