
Atypical enteropathogenic Escherichia coli genomic background allows the acquisition of non-EPEC virulence factors

Silvia Y. Bando, Fernanda B. Andrade, Beatriz E.C. Guth, Waldir P. Elias, Carlos A. Moreira-Filho, Antônio F. Pestana de Castro
DOI: http://dx.doi.org/10.1111/j.1574-6968.2009.01735.x 22-30 First published online: 1 October 2009


Atypical enteropathogenic Escherichia coli (aEPEC) has been associated with infantile diarrhea in many countries. The clonal structure of aEPEC is the object of active investigation but few works have dealt with its genetic relationship with other diarrheagenic E. coli (DEC). This study aimed to evaluate the genetic relationship of aEPEC with other DEC pathotypes. The phylogenetic relationships of DEC strains were evaluated by multilocus sequence typing. Genetic diversity was assessed by pulsed-field gel electrophoresis (PFGE). The phylogram showed that aEPEC strains were distributed in four major phylogenetic groups (A, B1, B2 and D). Cluster I (group B1) contains the majority of the strains and other pathotypes [enteroaggregative, enterotoxigenic and enterohemorrhagic E. coli (EHEC)]; cluster II (group A) also contains enteroaggregative and diffusely adherent E. coli; cluster III (group B2) has atypical and typical EPEC possessing H6 or H34 antigen; and cluster IV (group D) contains aEPEC O55:H7 strains and EHEC O157:H7 strains. PFGE analysis confirmed that these strains encompass a great genetic diversity. These results indicate that aEPEC clonal groups have a particular genomic background – especially the strains of phylogenetic group B1 – that probably made possible the acquisition and expression of virulence factors derived from non-EPEC pathotypes.

  • atypical enteropathogenic Escherichia coli
  • diarrheagenic Escherichia coli
  • multilocus sequence typing
  • pulsed-field gel electrophoresis


Endemic diarrhea is a major public health concern around the world, especially in developing countries, and Escherichia coli has been described as an important bacterial etiologic agent of this pathology (Nataro & Kaper, 1998; Kaper et al., 2004). The strains associated with intestinal infections are known as diarrheagenic E. coli (DEC) and are classified into six pathotypes according to their pathogenicity mechanisms, virulence markers, adhesion patterns to cultured epithelial cells and the clinical symptoms that they provoke (Kaper et al., 2004): enteropathogenic E. coli (EPEC), enterotoxigenic E. coli (ETEC), enterohemorrhagic E. coli (EHEC), enteroinvasive E. coli (EIEC), enteroaggregative E. coli (EAEC) and diffusely adherent E. coli (DAEC). Since 1996, EPEC has been divided into two categories, typical EPEC (tEPEC) and atypical EPEC (aEPEC): aEPEC differs from tEPEC by the absence of the EPEC adherence factor (EAF) plasmid and from EHEC by not producing Shiga toxin (Kaper, 1996).

aEPEC strains have been found in association with endemic diarrhea in children and diarrhea outbreaks in adults in developing and industrialized countries (Viljanen et al., 1990; Hedberg et al., 1997; Scaletsky et al., 1999; Afset et al., 2004; Robins-Browne et al., 2004; Cohen et al., 2005; Bueris et al., 2007; Moreno et al., 2008; Estrada-Garcia et al., 2009). This pathotype appears to have emerged recently and is among the leading causes of childhood diarrhea in many countries (Afset et al., 2004; Robins-Browne et al., 2004; Cohen et al., 2005; Franzolin et al., 2005; Bueris et al., 2007; Estrada-Garcia et al., 2009). These strains are very heterogeneous: some belong to the classical EPEC O serogroups (O26, O55, O86, O111, O114, O119, O125, O126, O127, O128 and O158), and frequently the O and H antigens cannot be determined (World Health Organization, 1987; Vieira et al., 2001; Trabulsi et al., 2002; Dulger et al., 2003; Gomes et al., 2004; Franzolin et al., 2005; Afset et al., 2008; Abe et al., 2009). Moreover, aEPEC strains adhere to cultured epithelial cells in a pattern distinct from the localized adherence pattern expressed by tEPEC. aEPEC strains adhere to HEp-2 cells mainly in the localized adherence-like (LAL) pattern, although aggregative or diffuse adherence may also be found (Pelayo et al., 1999; Vieira et al., 2001; Dulger et al., 2003; Gomes et al., 2004). In terms of virulence factors, aEPEC strains often carry factors associated with other DEC pathotypes, such as the enterohemolysin- and EAEC heat-stable enterotoxin-encoding genes (Pelayo et al., 1999; Vieira et al., 2001; Trabulsi et al., 2002; Dulger et al., 2003).

Besides intestinal disorders, some E. coli strains may cause extraintestinal infections (known as ExPEC) such as urinary tract infection, meningitis and bacteremia (Nataro & Kaper, 1998). The commensal and pathogenic E. coli (DEC and ExPEC) strains fall into four main phylogenetic groups, named A, B1, B2 and D (Selander et al., 1987; Wirth et al., 2006). A recent work showed six major phylogenetic groups of E. coli (A, B1, B2, C, D and E) where commensal and diarrheagenic strains were present in all groups and ExPEC strains only in groups B2 and D (Escobar-Páramo et al., 2004).

Some studies have shown aEPEC clonality with tEPEC and EHEC (Reid et al., 2000), but investigations on how aEPEC strains are genetically related to other DEC pathotypes are lacking. Here, we analyzed the genetic structure of aEPEC strains presenting large phenotypic heterogeneity and determined their relationship with other pathotypes (tEPEC, EHEC, ETEC, EAEC and DAEC). Genetic relationships were evaluated on the basis of the nucleotide sequences of seven housekeeping genes. Genetic diversity was analyzed by pulsed-field gel electrophoresis (PFGE), and phylogenetic grouping was determined by triplex PCR (Clermont et al., 2000).

Materials and methods

Bacterial strains

The 43 strains studied here are listed in Table 1, according to their previously determined pathotypes, serotypes, country and year of isolation. The 26 aEPEC strains have been previously characterized by the presence of eae and the absence of the EAF probe sequence and of the Shiga toxin-encoding genes (stx1 and stx2), as well as by the ability to cause the attaching-effacing lesion in vitro (Trabulsi et al., 2002; Abe et al., 2009). The four tEPEC, two EAEC and one EHEC strains were also described previously (Campos et al., 1994; Elias et al., 2002; Trabulsi et al., 2002). The five E. coli strains devoid of DEC virulence markers were studied by Monteiro et al. (2009). These strains belong to the E. coli collection kept at the Laboratório de Bacteriologia (Instituto Butantan, Brazil). The other E. coli strains used here are the following DEC prototype strains: E2348/69 (tEPEC), 042 (EAEC), H10407 (ETEC), C1845 (DAEC) and EDL933 (EHEC) (Evans et al., 1975; Levine et al., 1978; Wells et al., 1983; Nataro et al., 1985; Bilge et al., 1989).


Table 1. Comparison of the clustering obtained from MLST and PFGE data and phylogenetic group determination

Strain number | Serotype | Country of isolation (year) | Pathotype | MLST cluster | PFGE cluster | Phylogenetic group
USP-029 | O111:H2 | Brazil (1958–1963) | tEPEC | I | 5 | B1
3157 | O119:H2 | Brazil (2000–2002) | aEPEC | I | 4 | B1
4013 | O88:HNM | Brazil (2000–2002) | aEPEC | I | 4 | B1
C505-60 | O119:H2 | Denmark (1960) | aEPEC | I | 4 | B1
852 | O88:H25 | Brazil (2000–2002) | aEPEC | I | 4 | B1
EC29/84 | O128:H2 | Brazil (1984) | aEPEC | I | 4 | B1
C194-65 | O111:H8 | Denmark (1965) | EHEC | I | 4 | B1
EPM3121 | O111:H9 | Brazil (1986) | aEPEC | I | 7 | B1
EC11/93 | O128:H35 | Brazil (1993) | EAEC | I | 2 | B1
1887 | O111:H38 | Brazil (2000–2002) | aEPEC | I | 6 | B1
2459 | O26:H11 | Brazil (2000–2002) | aEPEC | I | 7 | B1
2103 | O26:H11 | Brazil (2000–2002) | aEPEC | I | 4 | B1
148B2 | O26:H11 | Brazil (1984) | aEPEC | I | 7 | B1
462 | O51:H40 | Brazil (2000–2002) | aEPEC | II | X | A
92 | O2:H16 | Brazil (2000–2002) | aEPEC | II | 7 | A
4192 | O111:H25 | Brazil (2000–2002) | aEPEC | II | 4 | A
558 | O111:H40 | Brazil (2000–2002) | aEPEC | II | 3 | A
209/85 | O126:H27 | Brazil (unknown) | EAEC | II | 6 | A
4009 | O114:H25 | Brazil (2000–2002) | aEPEC | II | 3 | A
3414 | ONT:H6 | Brazil (2000–2002) | aEPEC | III | 1 | B2
C54-58 | O55:H6 | Guiana (1958) | tEPEC | III | 7 | B2
4182 | O125:H6 | Brazil (2000–2002) | aEPEC | III | 1 | B2
9100-83 | O55:H7 | Peru (1983) | aEPEC | III | 3 | B2
3970 | O55:HNM | Brazil (2000–2002) | aEPEC | III | 7 | B2
C292/84 | O125:H6 | Brazil (unknown) | aEPEC | III | 1 | B2
2791 | O119:H19 | Brazil (2000–2002) | aEPEC | III | 5 | B2
289 | O119:H15 | Brazil (2000–2002) | aEPEC | III | 5 | B2
30/88-72 | O119:H6 | Brazil (1972) | tEPEC | III | 2 | B2
47/1 | O86:H34 | Brazil (1991) | tEPEC | III | 6 | B2
21 | O55:HNM | Brazil (1981) | aEPEC | IV | 5 | D
4147 | O55:H7 | Brazil (2000–2002) | aEPEC | IV | 6 | D
320 | O55:H7 | Brazil (2000–2002) | aEPEC | IV | 6 | D
1381PSHC | O55:H7 | Brazil (1981) | aEPEC | IV | 2 | D
84 | NT | Brazil (2000–2002) | E. coli | V | 7 | D
251 | NT | Brazil (2000–2002) | E. coli | V | 5 | D
895 | NT | Brazil (2000–2002) | E. coli | V | 2 | D
268 | NT | Brazil (2000–2002) | E. coli | V | 5 | D
936 | NT | Brazil (2000–2002) | E. coli | VI | 4 | B2
  • * HNM, nonmotile.

  • † aEPEC strains identified by the absence of the EAF probe sequence, but which presented BFP expression (Abe et al., 2009). NT, not tested. X, profile not obtained.

Phylogenetic group determination

Phylogenetic grouping was determined using the triplex PCR for chuA, yjaA genes and the TSPE4.C2 anonymous fragment, as described by Clermont et al. (2000).
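The dichotomous key of Clermont et al. (2000) that underlies this assignment can be sketched in a few lines. The function name and the boolean inputs (True when the corresponding PCR band is present) are illustrative and not part of the published protocol.

```python
def clermont_group(chuA: bool, yjaA: bool, tspE4C2: bool) -> str:
    """Assign an E. coli phylogenetic group from triplex PCR band presence.

    Follows the two-step decision tree of Clermont et al. (2000):
    chuA is read first, then yjaA (if chuA is present) or the
    TSPE4.C2 fragment (if chuA is absent).
    """
    if chuA:
        return "B2" if yjaA else "D"
    return "B1" if tspE4C2 else "A"


# Hypothetical band patterns, for illustration only.
print(clermont_group(chuA=True, yjaA=True, tspE4C2=False))    # B2
print(clermont_group(chuA=False, yjaA=False, tspE4C2=True))   # B1
```

Note that yjaA is only informative when chuA is present, and TSPE4.C2 only when chuA is absent, so two of the three bands decide each assignment.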

Multilocus sequence typing (MLST)

The primers used for amplification and sequencing of the seven housekeeping genes used for phylogenetic analysis of pathogenic E. coli strains were described previously (http://www.shigatox.net/stec/mlst). Each strain was grown on MacConkey agar overnight at 37 °C. A single colony was transferred to 3 mL of Luria–Bertani broth and grown overnight with shaking (250 r.p.m.) at 37 °C. Chromosomal DNA was obtained using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI). Each reaction was performed in a final volume of 30 μL containing 20 ng of chromosomal DNA, 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 2 mM MgSO4, 150 μM each of dATP, dCTP, dGTP and dTTP, 0.4 μM of each primer and 1.0 U of Platinum Taq DNA Polymerase High Fidelity (Invitrogen, Carlsbad, CA). Amplification consisted of an initial denaturation step at 95 °C for 5 min, followed by 35 cycles of 95 °C for 45 s, 55 or 57 °C for 45 s and 72 °C for 2 min, and a final extension step at 72 °C for 7 min. The amplification products were purified using the GFX PCR DNA and Gel Band Purification Kit (GE Healthcare, Amersham, UK). The concentration of the purified products was estimated by agarose gel electrophoresis.
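The cycling program described above can be written down as a small data structure, which also gives a lower bound on the run time. This is only a sketch: the class and function names are illustrative, ramp times between temperatures are ignored, and the annealing temperature is left as a parameter because the text gives two values (55 or 57 °C) without stating which gene uses which.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time at that temperature


def mlst_pcr_program(annealing_c: float = 55.0, cycles: int = 35) -> list:
    """Build the list of hold steps for the PCR program in the text."""
    if annealing_c not in (55.0, 57.0):
        raise ValueError("the text specifies annealing at 55 or 57 degrees C")
    program = [Step(95.0, 5 * 60)]                 # initial denaturation
    for _ in range(cycles):
        program += [
            Step(95.0, 45),                        # denaturation
            Step(annealing_c, 45),                 # annealing
            Step(72.0, 2 * 60),                    # extension
        ]
    program.append(Step(72.0, 7 * 60))             # final extension
    return program


def hold_time_minutes(program) -> float:
    """Sum of hold times only; real runs are longer because of ramping."""
    return sum(step.seconds for step in program) / 60


program = mlst_pcr_program(annealing_c=57.0)
print(len(program), hold_time_minutes(program))    # 107 steps, 134.5 min
```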

Amplicons and primers were sent to the Centro de Estudos do Genoma Humano, Universidade de São Paulo (São Paulo, Brazil) for nucleotide sequencing.

Sequence analysis

Multiple-sequence alignment of the nucleotide sequences was performed with ClustalX and GeneDoc. Phylogeny was based on a supergene constructed by concatenating the seven genes. The phylogenetic tree was constructed by the neighbor-joining method using a maximum-likelihood model of nucleotide substitution. The homologous nucleotide sequences of EAEC strain 55989 (GenBank accession number: NC_011748), ETEC strain E24377A (GenBank accession number: NC_009801) and EHEC strain Sakai (GenBank accession number: NC_002695) were obtained from GenBank and included in the phylogenetic analysis.
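The supergene construction can be illustrated with a minimal sketch. The gene and strain names below are placeholders, the sequences are toy data, and the distance shown is a plain p-distance (proportion of differing sites), used here as a simple stand-in for the maximum-likelihood distances of the actual analysis.

```python
def concatenate(alignments: dict, gene_order: list) -> dict:
    """Join per-gene aligned sequences into one supergene per strain.

    alignments maps gene name -> {strain name -> aligned sequence}.
    Only strains present for every gene are kept.
    """
    strains = set.intersection(*(set(alignments[g]) for g in gene_order))
    return {s: "".join(alignments[g][s] for g in gene_order)
            for s in sorted(strains)}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping positions with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy alignments with hypothetical names.
toy = {
    "geneA": {"s1": "ATGC", "s2": "ATGA"},
    "geneB": {"s1": "GG-T", "s2": "GGCT"},
}
supergenes = concatenate(toy, ["geneA", "geneB"])
print(supergenes["s1"], supergenes["s2"])
print(p_distance(supergenes["s1"], supergenes["s2"]))
```

Concatenating in a fixed gene order matters: every strain's supergene must place the same gene at the same coordinates for the column-wise comparison to be meaningful.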


PFGE

All 43 strains were processed according to the protocol described by Gautom (1997). Briefly, the chromosomal DNA was digested with XbaI and the fragments were separated by electrophoresis on the CHEF-DR III System (Bio-Rad Laboratories, Hercules, CA). The electrophoretic conditions were as follows: initial and final switch times, 2.1 and 35.0 s, respectively; run time, 14 h; angle, 120°; gradient, 6.0 V cm−1; temperature, 14 °C. Data analysis was performed using the BioNumerics software. The PFGE profiles of the strains were compared using the Dice coefficient with 1.0% tolerance and 0.8% optimization. The resulting similarity matrix was used to construct a dendrogram by the unweighted pair group method with arithmetic averages (UPGMA).
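For readers unfamiliar with the Dice coefficient, the sketch below shows the idea on two band patterns. It is a simplified illustration, not the software's implementation: band positions are assumed to be expressed as fractions of the lane length, matching is greedy, and the optimization setting is not modeled.

```python
def dice_similarity(bands_a, bands_b, tolerance=0.01):
    """Dice coefficient 2*shared / (n_a + n_b) for two band patterns.

    Two bands count as shared when their positions differ by no more
    than `tolerance`; each band can be matched at most once.
    """
    total = len(bands_a) + len(bands_b)
    if total == 0:
        raise ValueError("both patterns are empty")
    unmatched = sorted(bands_b)
    shared = 0
    for pos in sorted(bands_a):
        for i, other in enumerate(unmatched):
            if abs(pos - other) <= tolerance:
                shared += 1
                del unmatched[i]    # a band in B can be used only once
                break
    return 2 * shared / total


# Hypothetical lanes: three of four bands coincide within tolerance.
lane_a = [0.10, 0.25, 0.40, 0.80]
lane_b = [0.105, 0.25, 0.60, 0.80]
print(dice_similarity(lane_a, lane_b))   # 0.75
```

A wider tolerance makes more bands count as shared and therefore raises the similarity, which is why the tolerance value is reported alongside the coefficient.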


Results

Phylogenetic group determination

Triplex PCR showed that DEC strains are distributed in all phylogenetic groups (Table 2). The aEPEC strains were distributed among group B1 (10 strains), A (five strains), B2 (four strains) and D (four strains).


Table 2. Distribution of phylogenetic groups among the 43 DEC strains studied

Pathotype | Strains (n) | B1 | B2 | D | A
E. coli* | 5 | 0 | 1 | 4 | 0
Total | 43 | 14 (32.6%) | 12 (27.9%) | 10 (23.3%) | 7 (16.3%)
  • * Escherichia coli strains isolated from children but devoid of genetic virulence markers of DEC pathotypes.


MLST

The 43 DEC strains presented the same PCR fragment sizes for the seven genes analyzed. However, nucleotide sequence alignment revealed a codon insertion in icdA in aEPEC strain 462 (serotype O51:H40). This insertion was treated as a single gap for phylogenetic inference. The alignment showed that the seven genes studied were highly conserved, with nucleotide diversity per site between 0.008 and 0.034 (Table 3). The fadD gene showed the greatest variability in nucleotide sequence (13.5%); however, only 0.6% of this variation results in amino acid substitutions. All nucleotide sequences are available in the GenBank database (http://www.ncbi.nlm.nih.gov/Genbank/index.html, accession numbers: FJ693717–FJ694010).


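Table 3. Sequence diversity of seven housekeeping genes among 43 diarrheagenic Escherichia coli

Gene | aspC | clpX | fadD | icdA | lysP | mdh | uidA
Polymorphic sites | 18 | 48 | 64 | 54 | 29 | 39 | 19
Diversity (Pi) | 0.00768 | 0.02197 | 0.03371 | 0.02267 | 0.00930 | 0.02729 | 0.00979
Fragment length | 465 | 483 | 474 | 471/474* | 498 | 309 | 333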
  • * Strain no. 462 possesses a codon insertion.
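The two statistics reported in Table 3 measure different things: the number of polymorphic sites counts alignment columns with any variation, whereas Pi averages the per-site differences over all pairs of sequences. The sketch below computes both on a toy alignment; it uses the standard textbook definitions and is not the software pipeline used in the study.

```python
from itertools import combinations


def polymorphic_sites(seqs) -> int:
    """Number of alignment columns with more than one nucleotide (gaps ignored)."""
    return sum(1 for column in zip(*seqs) if len(set(column) - {"-"}) > 1)


def nucleotide_diversity(seqs) -> float:
    """Pi: mean proportion of differing sites over all sequence pairs."""
    pair_values = []
    for a, b in combinations(seqs, 2):
        sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        pair_values.append(sum(x != y for x, y in sites) / len(sites))
    return sum(pair_values) / len(pair_values)


# Toy alignment of three sequences, six sites each.
toy = ["ATGCAT", "ATGCAA", "ATGAAT"]
print(polymorphic_sites(toy))       # 2 variable columns out of 6
print(nucleotide_diversity(toy))    # 4/18, about 0.222
```

Because most variable columns differ in only a few strains, Pi is usually much smaller than the fraction of polymorphic sites, which is consistent with a gene showing 13.5% variable sites while Pi stays at or below 0.034.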

The phylogenetic inference was based on a total of 3033 nucleotides and 271 variable sites from the seven genes after concatenation. The tree presented in Fig. 1 shows the DEC strains fitting into six main clusters. Clusters I and II are very heterogeneous: cluster I contains 10 aEPEC, two EAEC, two ETEC, one EHEC and one tEPEC strain; cluster II contains five aEPEC strains, one EAEC and one DAEC strain. Cluster III contains only EPEC strains (seven aEPEC and four tEPEC strains), the majority of which present the H6 flagellar antigen. Cluster IV encompasses only aEPEC belonging to serotypes O55:H7 and O55:HNM as well as EHEC O157:H7. However, this O55:HNM strain presents the H7 flagellar antigen, according to the fliC restriction analysis (Botelho et al., 2003). Cluster V is composed of EAEC (prototype strain 042) and four E. coli strains isolated from cases of diarrhea but devoid of any virulence markers that define the DEC pathotypes. Finally, cluster VI is represented by one E. coli strain without DEC virulence markers.


Fig. 1. Phylogenetic tree of 46 diarrheagenic Escherichia coli strains. Phylogeny was constructed by the neighbor-joining algorithm based on a maximum-likelihood nucleotide substitution model. The numbers at the nodes represent the bootstrap values based on 1000 replications.


PFGE

PFGE profiles were obtained for 42 of the 43 strains studied here. Only one strain (462) did not present any restriction profile for XbaI. The 42 distinct electrophoresis profiles were used for dendrogram construction (Fig. 2). The dendrogram showed seven clusters at a cutoff of 60% similarity. When a cutoff of over 80% similarity was adopted, 39 different clusters were found, indicating high genetic variability among the strains. The three clusters with high genetic similarity contained only strains that share the same O antigen (O88 and O119) or the same serotype (O55:H7).
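How the number of clusters depends on the similarity cutoff can be seen in a small average-linkage (UPGMA-style) sketch. The four strains and their similarity values are invented for illustration, and the function is a plain reimplementation of the idea rather than the procedure of the analysis software: clusters are merged in order of decreasing average similarity until no pair reaches the cutoff.

```python
from itertools import combinations


def upgma_clusters(similarity: dict, cutoff: float) -> list:
    """Group items by average linkage, stopping at a similarity cutoff.

    similarity maps frozenset({a, b}) -> similarity in [0, 1] for every pair.
    """
    labels = sorted({x for pair in similarity for x in pair})
    clusters = [[label] for label in labels]

    def avg_sim(c1, c2):
        # Unweighted mean over all cross-cluster pairs of original items.
        values = [similarity[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise Dice similarities for four strains.
pairs = {("a", "b"): 0.90, ("a", "c"): 0.50, ("a", "d"): 0.40,
         ("b", "c"): 0.55, ("b", "d"): 0.45, ("c", "d"): 0.70}
sim = {frozenset(k): v for k, v in pairs.items()}
print(upgma_clusters(sim, cutoff=0.8))   # [['a', 'b'], ['c'], ['d']]
print(upgma_clusters(sim, cutoff=0.6))   # [['a', 'b'], ['c', 'd']]
```

Raising the cutoff can only split clusters, never join them, which is why the same dendrogram yields few clusters at 60% and many at 80%.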


Fig. 2. Dendrogram of 42 diarrheagenic Escherichia coli strains constructed from PFGE data.


Discussion

The classification of aEPEC strains used in this study was defined by Kaper (1996). This definition divides EPEC into typical and atypical based on the presence or absence of the EAF plasmid, respectively. This plasmid encodes the bundle-forming pilus (BFP) and carries the perABC locus, which encodes regulators of the bfp operon and of the LEE pathogenicity island of EPEC (Nataro & Kaper, 1998). According to this definition, we initially had 26 strains of aEPEC and five strains of tEPEC. However, during the course of this study, BFP expression was observed in three of the aEPEC strains (strain 3970/O55:HNM, strain 289/O119:H15 and strain 2791/O119:H19) (Abe et al., 2009). Therefore, we considered these strains tEPEC here.

This is the first study showing the genetic relationships of aEPEC with other DEC pathotypes such as tEPEC, EHEC, EAEC, ETEC and DAEC. Strains of some pathotypes were represented by their prototypes, as aEPEC strains frequently present virulence markers of other DEC (Whittam & McGraw, 1996; Reid et al., 2000; Bando et al., 2007). Moreover, we also included five strains isolated from cases of diarrhea but devoid of virulence markers that define a DEC pathotype.

The aEPEC strains were distributed in four phylogenetic groups: most of them fell into group B1 (cluster I), followed by group A (cluster II), group B2 (cluster III) and group D (cluster IV). Similar results were also found by Afset et al. (2008), studying aEPEC strains isolated from children with and without diarrhea in Norway. We observed two clusters in group D (clusters IV and V). This is in accordance with the work of Escobar-Páramo et al. (2004), who named cluster IV as group E.

Recent investigations of the phylogenetic relationships of aEPEC found close relations with tEPEC (Lacher et al., 2007; Afset et al., 2008). Lacher et al. (2007) showed that EPEC are distributed in four main clusters: EPEC 1 includes only tEPEC strains possessing the H6 flagellar antigen, EPEC 2 includes tEPEC and aEPEC possessing the H2 antigen, EPEC 3 contains tEPEC and aEPEC possessing the H34 antigen and EPEC 4 encompasses tEPEC and aEPEC possessing the H6 antigen. The present work showed that tEPEC and aEPEC belonging to classical EPEC O serogroups were distributed in clusters that closely correlated with the clonal groups described by Lacher et al. (2007). The aEPEC strains analyzed here that do not belong to EPEC O serogroups (Abe et al., 2009) were arranged in clusters separate from the classical EPEC clusters obtained by Lacher et al. (2007).

The grouping of aEPEC and tEPEC strains in clusters III and I (Fig. 1) corroborates the clonality found within them in other studies (Whittam & McGraw, 1996; Reid et al., 2000; Lacher et al., 2007), and the phylogram gives a better understanding of their relationship with other pathotypes. Cluster III has three subclusters: subcluster IIIa contains aEPEC and tEPEC strains that are arranged in EPEC 1, subcluster IIIc corresponds to EPEC 3 and subcluster IIIb includes strains of tEPEC and aEPEC that correspond to EPEC 4. An interesting finding in this latter subcluster was that two strains (serotypes O119:H15 and O119:H19) were EAF negative but expressed BFP, suggesting that this group contains exclusively tEPEC. Finally, cluster I has subcluster Ia, which encompasses aEPEC corresponding to EPEC 2.

These results showed that strains belonging to subclusters IIIa, IIIb and IIIc are clonally close, whereas strains grouped in subcluster Ia are genetically distant from the EPEC strains belonging to cluster III. On the other hand, aEPEC strains of subcluster Ia have genetic similarity with EHEC, ETEC and EAEC.

It is interesting to observe that the majority of aEPEC strains (65%), those belonging to clusters I and II, share genetic traits with other pathotypes. Subcluster Ib contains aEPEC strains that harbor the hemolysin gene detected by PCR, or express this toxin (three strains) (R.M.F. Piazza, unpublished data), and other genes encoding virulence factors such as sat (serotype O111:H38), efa/lifA (serotype O26:H11) and toxB (serotype O26:H11) (M.P. Sircili, unpublished data). Furthermore, aEPEC strains of subcluster Ib showed genetic similarity with ETEC (H10407 and E24377A). Some of the aEPEC strains of cluster II carry virulence factor-encoding genes such as astA, efa and ldaH (M.P. Sircili, unpublished data). These findings suggest that aEPEC strains might have a genetic background that allows the acquisition of virulence factor-encoding genes from other pathotypes.

The clonal relationship between O55:H7 aEPEC strains and the O157:H7 EHEC strain (cluster IV) has already been reported (Whittam & McGraw, 1996; Reid et al., 2000; Bando et al., 2007), and it was confirmed here. In addition, we found that O55:H7 and O157:H7 share a common ancestor. One exception was strain 9100-83, previously typed as O55:H7 (Rodrigues et al., 1996), which grouped in subcluster IIIa and belongs to phylogenetic group B2. Based on these results, we decided to confirm the H antigen typing by restriction analysis of the flagellin gene. The electrophoresis profile was identical to that of serotype O55:H6 (data not shown), confirming that this strain does not belong to cluster IV. Serotype O55:H6 has been classified as tEPEC (Trabulsi et al., 2002); however, this strain does not harbor the EAF plasmid or express BFP, and it displayed the diffuse adherence pattern on HEp-2 cells (Rodrigues et al., 1996). This strain probably lost or never acquired the EAF plasmid. Future investigations of the genetic structure of this strain might explain how horizontal transfer of the EAF plasmid occurs within E. coli.

Interestingly, the four E. coli strains lacking DEC virulence markers fell into the EAEC prototype strain group. All of these strains have at least one EAEC virulence gene (Monteiro et al., 2009). This suggests that these E. coli strains may belong to a hitherto uncharacterized pathotype, or pathotypes. Alternatively, these strains could be commensals carrying EAEC virulence factors. A similar situation was described by Rasko (2008) in a comparative genomic study conducted in commensal and pathogenic E. coli strains.

The PFGE analysis showed great genetic diversity among the 42 strains studied here and was more discriminatory than MLST and phylogenetic grouping. However, the PFGE clustering does not correspond to the MLST phylogram clusters. Afset et al. (2008) reported a similar finding when they analyzed aEPEC strains from a case/control study. This result confirms that PFGE is a useful tool for evaluating highly similar strains and comparing strains belonging to the same or closely related serotypes.

The genomic analyses conducted here show that aEPEC strains are distributed in all four main E. coli phylogenetic groups (B1, A, B2 and D). Moreover, our data showed that aEPEC possess at least two distinct genomic backgrounds, here named clusters I and III, that possibly permitted the acquisition and expression of virulence factors. Cluster I (group B1) strains present virulence factors associated with severe diarrhea, such as that caused by EHEC and ETEC. Cluster III (group B2) strains present virulence factors associated with mild diarrhea, such as that caused by EPEC. This interpretation is in accordance with the work of Escobar-Páramo et al. (2004), who have already suggested the existence of two different genomic backgrounds in commensal and pathogenic E. coli strains.

This study showed that phylogenetic analysis combining MLST and virulence factor markers constitutes a useful tool for identifying E. coli clonal groups. This approach could be combined with other comparative genomic analysis methods to elucidate the relationship between genomic background and pathogenicity in E. coli, as well as its evolutionary process.


Acknowledgements

This work is dedicated to the memory of Prof. Luiz R. Trabulsi. We thank Maria Cecília Cergole-Novella and Fernanda Marques for excellent technical assistance. This work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (grant 04/12136-5 to W.P.E. and a postdoctoral fellowship to S.Y.B.). C.A.M.-F. is the recipient of a career development award from CNPq (grant 300966/2006-7).


  • Editor: David Clarke

