
Identification of novel virulence-associated loci in uropathogenic Escherichia coli by suppression subtractive hybridization

Liisa Johanna Sorsa, Severin Dufke, Sören Schubert
DOI: http://dx.doi.org/10.1016/S0378-1097(03)00893-0 203-208 First published online: 1 January 2004


To identify novel virulence-associated genes in uropathogenic Escherichia coli (UPEC) strains, a suppression subtractive hybridization strategy was applied to genomic DNA of four clinical UPEC isolates from patients suffering from cystitis or pyelonephritis. The genomic DNA of the four isolates (tester strains) was subtracted from the DNA of two different driver strains, the well characterized UPEC strain CFT073 and the non-pathogenic E. coli K-12 strain MG1655. We determined the sequence of 172 tester strain-specific DNA fragments, 86 of which revealed only low or no homology to nucleotide sequences of public databases. We further determined the virulence association of the 86 novel DNA fragments using each DNA fragment as a probe in Southern hybridizations of a reference strain collection consisting of 60 extraintestinal pathogenic E. coli isolates and 30 non-virulent E. coli strains from stool samples. From this, 19 novel DNA fragments were demonstrated to be significantly associated with virulent strains and thus may represent new virulence traits. Our results support the idea of a considerable genetic variability among UPEC strains and suggest that novel genomic determinants might contribute to virulence of UPEC.

  • Uropathogenic Escherichia coli
  • Suppression subtractive hybridization
  • Genome diversity
  • Virulence factor

1 Introduction

Urinary tract infections (UTI) are among the most common infectious diseases, resulting in over 7 million clinic visits annually in the USA alone and causing significant morbidity and mortality. The majority of UTI are due to ascending infections of the urinary tract caused by uropathogenic Escherichia coli (UPEC), leading to asymptomatic bacteriuria, cystitis, acute pyelonephritis or urosepsis. Knowledge of virulence mechanisms and virulence traits has accumulated over the last decades, and UPEC strains have been shown to express a variety of virulence factors, e.g. siderophores, adhesins, toxins, invasins and capsules [1], which enable adaptation to the hostile environment of the urinary tract and circumvention of the host immune defenses [1–4]. Currently, however, no known virulence gene or set of genes can clearly define a prototypic uropathogenic E. coli, and UPEC strains differ considerably with regard to the number of known virulence factors [5], indicating a significant genetic diversity among UPEC strains. Consistently, the size of E. coli genomes varies from 4.6 to 5.3 Mb, and their genetic composition differs by up to 40%, with a common genome backbone of an estimated 3.9 Mb shared by virulent and non-virulent E. coli strains [6–9]. It is therefore likely that different UPEC strains carry as yet unknown virulence factors which are not represented in the completely determined genome sequences of well characterized UPEC strains [6].

In order to determine the genotypic variation between UPEC strains and to detect new potential virulence traits, we applied suppression subtractive hybridization (SSH) to four UPEC isolates from patients with different clinical symptoms. For this, the archetypical UPEC strain CFT073 [6] and the non-pathogenic E. coli K-12 strain MG1655 [8] were collectively used as driver strains. Previous studies have used SSH cloning to determine genomic differences between uropathogenic and non-virulent E. coli strains [10–12]. In contrast to these reports, we here determined the genomic diversity among different UPEC strains by subtracting the respective genomic DNA of clinical UPEC isolates (tester strains) from genomic DNA of both the well defined UPEC strain CFT073 and the non-pathogenic E. coli K-12 strain MG1655 (combined driver). The entire genome sequences of these two driver strains are available in public nucleotide databases. In this study we sequenced and characterized a total of 384 SSH fragments derived from the four distinct UPEC strains HE300, JS304, JS299 and JS322. Of these fragments, 86 revealed no or only low homology to sequences of nucleotide databases. By determining the distribution of the subtracted fragments among uropathogenic and non-pathogenic E. coli strains, 19 of the novel DNA fragments turned out to be significantly associated with a virulent phenotype.

2 Materials and methods

2.1 Bacterial strains, plasmids and growth conditions

Bacterial strains used in this study are listed in Table 1. The three E. coli strains HE300, JS322 and JS304 were isolated from patients with acute pyelonephritis, and E. coli strain JS299 was collected from the urine of a patient with cystitis. Furthermore, 90 additional E. coli strains used in the dot-blot hybridizations were isolated either from urine samples and blood cultures of patients, or from stool samples of healthy volunteers at the diagnostic laboratory of the Max von Pettenkofer-Institut, Munich, Germany. The UPEC strain CFT073 is a previously described isolate from a patient with acute pyelonephritis [6,13,14]. The apathogenic E. coli K-12 strain MG1655 was used as the second driver strain for subtractive hybridization and as a control for polymerase chain reactions (PCR) [8]. The complete sequences of these two E. coli strains have previously been determined [6,8]. Both tester and driver strains were classified into phylogenetic groups by PCR amplification [15], and the tester strains were screened for the presence of virulence factors associated with UTI by multiplex PCR [5] (data not shown). Bacteria were routinely grown at 37°C in Luria–Bertani broth supplemented with kanamycin (50 μg ml⁻¹) or ampicillin (100 μg ml⁻¹) when required.

Table 1. E. coli strains used in the study

Tester strain/Subtraction library | Pathotype | Phylogenetic group | Total tester strain-specific sequences in subtraction library | High homology sequences (>85%) (a) | Low homology sequences (<85%) (a) | Sequences with no homology | Virulence-associated sequences (b)

  • (a) Percentage of homology regarding the total sequence length.

  • (b) Subset of sequences with low or no homology (‘novel DNA fragments’) which were shown to be virulence-associated DNA fragments by means of dot-blot hybridization assays.

2.2 Recombinant DNA techniques and genomic subtraction

Isolation of plasmids and genomic DNA as well as cloning of DNA fragments were carried out using standard techniques [16]. Genomic SSHs between tester strain and driver strains were performed using the PCR-Select Bacterial Genome Subtraction Kit (BD Biosciences Clontech, Heidelberg, Germany) according to the manufacturer's protocol. The subtracted PCR fragments were cloned into pCR4-TOPO vector (Invitrogen, Karlsruhe, Germany) for subsequent sequencing of the subcloned SSH fragments.

2.3 Sequence analysis of subtracted DNA fragments

The subtracted PCR fragments were sequenced using the ABI Prism BigDye Terminator Sequencing Kit according to the manufacturer's instructions and a model 377 DNA sequencing system (Applied Biosystems, Weiterstadt, Germany). Management and analysis of nucleotide sequence data were performed using the Lasergene sequence analysis software system (DNASTAR, Madison, WI, USA). Homology searches were performed by comparing the sequences with the public DNA databases using the BlastN algorithm of the National Center for Biotechnology Information (NCBI) database at the website http://ncbi.nlm.nih.gov [17], and using the Fasta3 algorithm of the European Bioinformatics Institute (EBI) at the website http://www.ebi.ac.uk.

2.4 Virulence association studies

Using the dot-blot hybridization technique, the subtracted DNA fragments with low or no homology to known sequences were tested for association with a virulent phenotype. To this end, an E. coli reference strain collection comprising 60 isolates from urine, renal puncture or blood culture and 30 isolates from stool samples of healthy volunteers was subjected to dot-blot hybridization assays using each of the 86 novel SSH fragments as a DNA probe. Chromosomal DNA (200 ng) of each E. coli reference strain was dotted onto Hybond N+ membrane (Amersham) and incubated with DNA probes generated from subcloned SSH fragments using universal oligonucleotide M13 primers. Labeling, hybridization and detection were carried out by means of the ECL random prime labeling and detection system (Amersham). The association of the SSH fragments with extraintestinal pathogenic E. coli (ExPEC) strains was determined and quantified by the χ² test and calculations of odds ratios using SigmaStat software (Version 2.03, SPSS, Richmond, CA, USA). The threshold for statistical significance was a P value of <0.05.
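The association statistics described above can be illustrated with a minimal sketch. It is not the SigmaStat procedure used in the study: it assumes a plain Pearson χ² test on a 2×2 table (one degree of freedom), and the source does not state whether a continuity correction was applied, so that is left as an option. The function names and the hybridization counts in the example are hypothetical and chosen only to match the sizes of the two reference groups (60 ExPEC and 30 stool isolates).

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test (df = 1) for the 2x2 table [[a, b], [c, d]].

    a, b: probe-positive / probe-negative ExPEC isolates
    c, d: probe-positive / probe-negative stool isolates
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a marginal total is zero; the test is undefined")
    diff = abs(a * d - b * c)
    if yates:
        # Optional Yates continuity correction (an assumption, not from the source)
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c); returns infinity when the denominator is zero."""
    if b * c == 0:
        return math.inf
    return (a * d) / (b * c)


# Hypothetical counts: probe hybridizes with 30 of 60 ExPEC and 5 of 30 stool isolates
chi2, p = chi2_2x2(30, 30, 5, 25)
ratio = odds_ratio(30, 30, 5, 25)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}, odds ratio = {ratio:.1f}")
print("significant at P < 0.05:", p < 0.05)
```

With these illustrative counts the fragment would pass the P<0.05 threshold; a probe that hybridizes with similar proportions of both groups would not.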

2.5 Nucleotide sequence accession number

The partial nucleotide sequences of 19 novel SSH fragments that revealed significant association with ExPEC strains are deposited in the GenBank database under the accession numbers AY423556 and AY428145–AY428161.

3 Results

3.1 Suppression subtraction cloning of DNA fragments from different UPEC strains

In this study we performed suppression subtractive cloning of genomic DNA from different UPEC strains in order to detect new virulence-associated determinants and to gain further insight into the virulence mechanisms of UPEC. To accomplish this, we used four UPEC strains, HE300, JS304, JS299 and JS322, as tester strains in separate experiments together with E. coli strains CFT073 and MG1655 as combined driver strains. The tester strains were isolated from urine samples of patients suffering from pyelonephritis or cystitis, and were identified as causative agents based on the clinical symptoms and the laboratory finding that the sample contained more than 10⁵ CFU of E. coli per ml. A total of 384 SSH fragments with sizes of 0.5–2.0 kb were isolated in four separate subtraction experiments and were cloned into the vector pCR4-TOPO for sequencing. A total of 172 (45%) SSH fragments revealed sequences specific for the four tester strains. Of the 172 tester-specific SSH fragments, 86 were shown to be highly homologous (>85% identity) to sequences deposited in the nucleotide databases (Table 1). These 86 sequences represented parts of (i) known pathogenicity islands (PAIs I536, II536 and III536) of E. coli strain 536 and the Shigella resistance locus (SRL PAI) of Shigella flexneri (n=25), (ii) bacteriophages and transposable elements (n=11), (iii) plasmids found in other E. coli and Salmonella strains (n=28), (iv) the O-islands of enterohemorrhagic E. coli strain EDL933 (n=14), and (v) other genomic loci (n=8) [9,18].
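The fragment counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable names and category labels are ours.

```python
# Counts quoted in Section 3.1
TOTAL_SSH_FRAGMENTS = 384
TESTER_SPECIFIC = 172

# High-homology fragments grouped by the origin of the homologous sequence
HIGH_HOMOLOGY_BY_ORIGIN = {
    "PAIs of strain 536 and SRL PAI": 25,
    "bacteriophages and transposable elements": 11,
    "plasmids": 28,
    "O-islands of EDL933": 14,
    "other genomic loci": 8,
}

high_homology_total = sum(HIGH_HOMOLOGY_BY_ORIGIN.values())
tester_specific_pct = 100 * TESTER_SPECIFIC / TOTAL_SSH_FRAGMENTS
# Tester-specific fragments not assigned to the high-homology group
novel_fragments = TESTER_SPECIFIC - high_homology_total

print(f"high-homology fragments: {high_homology_total}")          # 86
print(f"tester-specific share:   {tester_specific_pct:.1f}%")     # 44.8%
print(f"low/no-homology fragments: {novel_fragments}")            # 86
```

The five categories sum to the 86 high-homology fragments, leaving 86 fragments with low or no homology, which is the set carried forward into the virulence association analysis.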

3.2 Subtracted fragments sharing high homology with known nucleotide sequences

Interestingly, each of the four different UPEC strains used as tester strains gave a distinct pattern of SSH fragments with regard to the origin of the respective homologue. Pyelonephritogenic E. coli tester strain HE300, belonging to the phylogenetic group A, revealed almost exclusively SSH fragments highly homologous (>95% identity) to the genes of diverse E. coli plasmids and Salmonella typhimurium plasmid R64 encoding proteins involved in plasmid replication, segregation and mating. In contrast, subtraction of the E. coli tester strain JS299, which belongs to the phylogenetic group B2 and caused cystitis, essentially generated PAI-associated sequences with 22 sequences being highly homologous (>95% identity) to the three PAIs (I–III) of the UPEC strain 536 [18]. Three of these fragments, homologous to open reading frames (ORFs) 16, 20 and 21 of PAI I536, encoded two hypothetical proteins and a putative F17-like fimbrial usher protein. PAI II536 was represented by 15 subtraction fragments. Four subtraction fragments were homologous to genes of the known adhesins Hek (96%) and PrfG, the adhesin of P-related fimbriae (97%). Two subtracted DNA fragments were 99% identical to ORFs 50 and 63 of PAI III536. Further subtraction fragments of tester strain JS299 revealed high homology to the genes for cytotoxic necrotizing factor (cnf1), type I fimbriae and gspH and gspI, encoding hypothetical type II secretion proteins.

The subtraction library of pyelonephritogenic E. coli tester strain JS304, belonging to phylogenetic group B2, consisted of DNA fragments with homology to genomic DNA of Shigella sp. and enterohemorrhagic E. coli. Three SSH fragments shared homology to ORFs 4 (82%), 42 (93%) and 43 (98%) of the S. flexneri 2a SRL PAI (Shigella resistance locus) [19], and eight subtraction fragments were homologous to the putative phage or IS sequences originating either from the O-islands of strain EDL933 or genomic DNA of Salmonella enterica. Finally, the subtraction library of the pyelonephritogenic E. coli strain JS322, which belongs to phylogenetic group D, consisted almost exclusively of sequences homologous to the genome of enterohemorrhagic E. coli. Seven sequences shared high homology (>91% identity) with genes present in the O-islands of EDL933, three of the SSH fragments exhibited more than 92% homology to phage- or IS-associated sequences of E. coli EDL933, and two further fragments resembled genes of a type IV fimbrial gene cluster of Shiga toxin-producing E. coli.

3.3 Subtracted fragments sharing no or only low homology with known nucleotide sequences

Eighty-six tester-specific SSH clones displayed either low (between 55% and 82% identity; n=77) or no homology (n=9) to the sequences in the public nucleotide databases (Table 1). The fragments representing the low homology fraction exhibited short sequence stretches related to the chromosomal DNA of E. coli strain CFT073 (n=10), and to putative genes from bacteriophages, prophages and O-islands of E. coli strain EDL933 (n=7). However, the majority of tester-specific SSH fragments with low homology (n=60) shared homology with DNA sequences of 39 different bacterial species, both Gram-negative and Gram-positive. Among these fragments of low homology, some shared homology with genes involved in regulation and initiation of gene transcription, e.g. clone SPL176 revealed homology with a transcriptional regulator of the GntR family found in Agrobacterium tumefaciens, clone SPL582 with the two-component sensor histidine kinase CPE0207 of Clostridium perfringens, and clone SAK450 (accession number AY428160) shared homology with the blr0961 gene encoding a putative translation initiation factor of Bradyrhizobium japonicum. Other tester strain-specific SSH fragments shared homology with genes encoding different enzymes, e.g. clone SPL373 shared homology with the scrB gene encoding a sucrose hydrolase in Erwinia amylovora, clone SPL265 (AY428155) showed homology to the enolase gene eno-2 of Pseudomonas syringae, and clone SAK494 carried sequences related to the hemN gene encoding the oxygen-independent coproporphyrinogen III oxidase HemN of Campylobacter jejuni. A third group of SSH fragments with low homology to known genes comprised clones with homology to different membrane transporters, e.g. clone SPL251 carried DNA related to the ammonium transporter gene amt of Methanococcus jannaschii, clone SPL272 revealed homology to the putative anaerobic C4-dicarboxylate transporter gene dcuA of C. jejuni, and clone SPL346 shared nucleotide homology with the preprotein translocase subunit gene secY of Treponema pallidum.
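The grouping of fragments into high, low and no homology can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the 85% cut-off is taken from the column headings of Table 1, the handling of a value of exactly 85% is our choice (the source gives only ">85%" and "<85%"), and the fragment names and identity values in the example are hypothetical.

```python
from collections import Counter


def classify_fragment(percent_identity, threshold=85.0):
    """Assign a fragment to a homology class from its best database hit.

    percent_identity: identity over the total sequence length, or None
                      when the search returned no hit.
    """
    if percent_identity is None:
        return "no homology"
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    # Assumption: exactly 85% is counted as low homology
    if percent_identity > threshold:
        return "high homology"
    return "low homology"


# Hypothetical best-hit identities for four illustrative fragments
best_hits = {
    "fragment_1": 96.0,
    "fragment_2": 82.0,
    "fragment_3": 55.0,
    "fragment_4": None,
}

classes = {name: classify_fragment(pid) for name, pid in best_hits.items()}
tally = Counter(classes.values())
for label in ("high homology", "low homology", "no homology"):
    print(f"{label}: {tally[label]}")
```

In the study itself, fragments falling into the low and no homology classes together form the 86 novel DNA fragments.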

3.4 Virulence association of subtracted fragments sharing no or only low homology with known nucleotide sequences

Sequence analysis of the four subtraction libraries revealed 86 SSH fragments with low or no homology to sequences in public nucleotide databases. We next determined the prevalence of these novel SSH fragments among ExPEC by means of dot-blot hybridization analysis. The E. coli reference strain collection consisted of 60 strains isolated from blood cultures and urine samples of patients suffering from septicemia and UTI, respectively, and of 30 strains from stool samples of healthy volunteers as negative control strains. The degree of association with virulence was measured by statistical analyses using the χ² test. As shown in Table 2, a total of 19 SSH fragments were significantly associated with a virulent phenotype (χ²=3.59–27.30; P<0.05). Interestingly, the highest association with a virulent phenotype could be attributed to the DNA fragments with homology to S. typhimurium plasmid R64 and bacteriophages, as well as to different DNA fragments without significant homology to the sequences in the databases.

Table 2. Fasta3 homology search of SSH fragments specific for E. coli tester strains and associated with a virulent phenotype

SSH fragment | Derived from tester strain | Size (bp) | Nucleotide sequence homology or designation | Source | Expect value (a) | Accession number (b) | P (c) | χ² (c)
SAK432 | JS322 | 232 | nikA/nikB, oriT region | S. typhimurium plasmid R64 | 4.9e−25 | AP005147 | 0.001 | 27.30
SPL234 | JS304 | 286 | ECs0225, hypothetical protein | E. coli O157:H7 | 1.3e−26 | AP002550 | 0.001 | 11.20
SPL241 | JS304 | 612 | no library sequences | | | | 0.001 | 10.94
SPL592 | JS299 | 593 | YPO2274, putative phage protein | Y. pestis | 1e−86 | AJ414151 | 0.003 | 8.99
SPL107 | JS300 | 156 | lin0414, hypothetical protein | L. innocua | 0.37 | AL596164 | 0.011 | 6.52
SPL379 | JS299 | 248 | intergenic region | C. acetobutylicum | 0.46 | AE007587 | 0.012 | 6.39
SPL301 | JS299 | 546 | dnaJ, heat shock protein | M. penetrans | 0.047 | AP004174 | 0.012 | 6.31
SPL317 | JS299 | 580 | hsdR, endonuclease | N. europaea | 4.6e−11 | BX321864 | 0.021 | 5.26
SPL345 | JS299 | 580 | intergenic region | V. cholerae | 0.29 | AE004369 | 0.027 | 4.92
SPL318 | JS299 | 498 | atpB, ATP synthase subunit a | B. aphidicola | 0.86 | AF008210 | 0.027 | 4.91
SPL282 | JS304 | 456 | no library sequences | | | | 0.027 | 4.91
SPL265 | JS304 | 543 | eno-2, enolase | P. syringae | 2.9e−15 | AE016872 | 0.028 | 4.83
SPL243 | JS304 | 268 | intergenic region | E. coli plasmid p300 | 1.4e−66 | AY205565 | 0.028 | 4.81
SPL343 | JS299 | 606 | YPO2274, putative phage protein | Y. pestis | 2.8e−88 | AJ414151 | 0.039 | 4.25
SPL360 | JS299 | 496 | intergenic region | C. ruminantium | 0.032 | AF308664 | 0.041 | 4.17
SAK452 | JS322 | 520 | ECs2763, hypothetical protein | E. coli O157:H7 | 7.2e−08 | AP002559 | 0.043 | 4.08
SPL313 | JS299 | 481 | ECs1093, putative phage protein | E. coli O157:H7 | 4.5e−20 | AP002554 | 0.046 | 3.66
SPL371 | JS299 | 455 | R02_orf140, hypothetical protein | M. pneumoniae | 0.15 | AE000008 | 0.048 | 3.59
SAK450 | JS322 | 347 | blr0961, hypothetical protein | B. japonicum | 6.5e−34 | AP005938 | 0.048 | 3.59

  • (a) The E value represents the significance of sequence matches calculated with the Fasta3 algorithm.

  • (b) Accession number of the homologous gene.

  • (c) The significance of association of SSH fragments with ExPEC strains was determined by the χ² test. P<0.05 was considered statistically significant. A higher χ² value indicates a closer association of the respective SSH fragment with virulent E. coli (ExPEC) strains.

4 Discussion

ExPEC comprise a group of pathogens causing urinary tract infections, sepsis and newborn meningitis. Recent whole-genome analyses of E. coli strains revealed an extensive variation in genome size and gene content, with pathogenic E. coli strains possessing genomes up to 1 Mb larger than non-pathogenic laboratory strains [6]. Accordingly, pathogenic E. coli have obtained a significant proportion of their genetic diversity through the acquisition of DNA from distantly related organisms [20]. The amount of DNA transferred to ExPEC is largely undefined, and it is likely that as yet unknown virulence or fitness factors are encoded by horizontally transferred DNA fragments. As whole-genome sequencing requires great effort and cost, SSH is an alternative approach that allows comprehensive genome surveys of closely related strains. The present study is, to our knowledge, the first report of an extensive genome survey of different UPEC strains to identify new potential urovirulence DNA loci by using combined driver strains, comprising an archetypical UPEC strain and a non-pathogenic laboratory strain.

About 45% of the 384 subtracted fragments sequenced were tester-specific. Half of the tester-specific fragments were highly homologous (>85% identity) to the known sequences in the public nucleotide databases. These sequences represented parts of the known PAIs, other urovirulence genes, and parts of transferable elements, such as bacteriophages and plasmids. The large fraction of sequences homologous to mobile genetic elements may reflect their contribution to dispersing putative virulence traits and to the ongoing rearrangements of genetic islands. In particular, the subtraction of pyelonephritogenic strains HE300 and JS304 resulted in the detection of diverse plasmid sequences which were most significantly associated with UPEC strains. These observations are in contrast to previous reports describing UPEC to generally lack plasmids, and corroborate our recent findings that plasmid-encoded determinants contribute to the virulence of UPEC strains [21]. Interestingly, several subtracted fragments represented parts of O-islands, DNA regions present in enterohemorrhagic E. coli strain EDL933, but absent from the genome of non-pathogenic E. coli strain MG1655. The spectrum of known virulence genes and novel virulence-associated DNA fragments was rather strain-specific, with subtractions of some strains largely resulting in PAI-associated SSH fragments, whereas other strains gave DNA fragments homologous to the genomic DNA of enterohemorrhagic E. coli strain EDL933. Of note, half of the tester-specific sequences (86 sequences) displayed either only low homology (<85%) or no homology to the known sequences. As mere sequence data do not provide any information with regard to the virulence association, subtracted tester-specific DNA fragments with no or only low sequence homology to known genes were screened for their prevalence among the ExPEC strains. 
Unexpectedly, 19 out of 86 DNA fragments with low homology to known genes were shown to be significantly more prevalent among ExPEC strains than among non-pathogenic E. coli strains. These novel virulence-associated DNA fragments may be exploited as further genetic markers of urovirulence and some may even represent novel virulence traits.

In summary, this study provides further evidence for the considerable diversity of the genomes within the UPEC group. We detected 86 novel DNA fragments of UPEC genomes, 19 of which are highly virulence-associated and may represent new targets for diagnostic purposes or prevention measures. Further experiments are necessary to characterize the entire structure of the novel virulence-associated DNA loci and to define the potential role of these loci for virulence of ExPEC.


We would like to thank Kirsten Weinert for her excellent technical assistance. This study was supported by a grant from the Bundesministerium für Bildung und Forschung (Kompetenznetzwerk PathoGenoMik) to S.S.


  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].