
The phylogeny of Sodalis-like symbionts as reconstructed using surface-encoding loci

Anna K. Snyder, Cynthia M. McMillen, Peter Wallenhorst, Rita V.M. Rio
DOI: http://dx.doi.org/10.1111/j.1574-6968.2011.02221.x, pp. 143-151. First published online: 1 April 2011


Phylogenetic analyses of 16S rRNA support close relationships between the Gammaproteobacteria Sodalis glossinidius, a tsetse (Diptera: Glossinidae) symbiont, and bacteria infecting diverse insect orders. To further examine the evolutionary relationships of these Sodalis-like symbionts, phylogenetic trees were constructed for a subset of putative surface-encoding genes (i.e. ompA, spr, slyB, rcsF, ycfM, and ompC). The ompA and ompC loci were used toward examining the intra- and interspecific diversity of Sodalis within tsetse, respectively. Intraspecific analyses of ompA support elevated nonsynonymous (dN) polymorphism with an excess of singletons, indicating diversifying selection, specifically within the tsetse Glossina morsitans. Additionally, interspecific ompC comparisons between Sodalis and Escherichia coli demonstrate deviation from neutrality, with higher fixed dN observed at sites associated with extracellular loops. Surface-encoding genes varied in their phylogenetic resolution of Sodalis and related bacteria, suggesting conserved vs. host-specific roles. Moreover, Sodalis and its close relatives exhibit genetic divergence at the rcsF, ompA, and ompC loci, indicative of initial molecular divergence. The application of outer membrane genes as markers for further delineating the systematics of recently diverged bacteria is discussed. These results increase our understanding of insect symbiont evolution, while also identifying early genome alterations occurring upon integration of microorganisms with eukaryotic hosts.

  • symbiosis
  • insect
  • phylogeny
  • Sodalis


Symbiosis enables the utilization of environments that would otherwise be rendered inhospitable and, as such, is recognized as an important source of biological innovations, particularly with regard to the radiation of the Class Insecta (Blochmann, 1887; Buchner, 1965). The evolutionary trajectory of symbiosis towards obligate mutualism may develop through a parasitism to mutualism continuum through processes such as the attenuation of host fitness penalties (Jeon, 1972) and the conversion of horizontal transmission to a purely vertical mode (Ewald, 1987). Such a route is exemplified by ancient endocellular symbionts of various insect hosts, such as Buchnera aphidicola in aphids (Homoptera: Aphididae), which are thought to have evolved from less specialized but more prevalent microbial relations such as those involving general insect pathogens (Dale et al., 2001; Hosokawa et al., 2010).

The gamma-proteobacterium, Sodalis glossinidius, is the secondary symbiont of the tsetse fly (Diptera: Glossinidae). Tsetse flies have medical significance as obligate vectors of the parasitic Trypanosoma brucei ssp., the etiological agents of African trypanosomiasis. In contrast to the primary symbiont Wigglesworthia glossinidia, which has a strict localization to the tsetse bacteriome and an extensive coevolutionary history with its host (Chen et al., 1999), Sodalis exhibits a wider tissue tropism including the host midgut, hemolymph, and muscle (Cheng & Aksoy, 1999) with the symbiosis being of relatively recent origin (Weiss et al., 2006). The functional role of Sodalis within tsetse remains relatively unknown, although influences on enhancing host longevity (Dale & Welburn, 2001) and vector competency (Welburn et al., 1993; Farikou et al., 2010) have been demonstrated.

Recent studies have shown that symbionts harbored within several host insect orders including Diptera, Coleoptera, Phthiraptera, and Hemiptera are highly related to Sodalis based on 16S rRNA gene sequences (Weiss et al., 2006; Fukatsu et al., 2007; Novakova & Hyspa, 2007; Grunwald et al., 2010; Kaiwa et al., 2010; Toju et al., 2010). These analyses indicate that this group of bacteria shares a recent common ancestor, despite now infecting a broad taxonomic range of hosts.

Selection pressures unique to ecological niches drive evolutionary diversification, with genomic alterations facilitating the adaptation to new habitats by bacteria. Outer membrane proteins, with known immunogenic properties, represent initial points of interspecific contact. Moreover, symbiont cell surfaces have been shown to be pivotal toward the homeostasis of host–bacterial relations (Weiss et al., 2008; Nyholm et al., 2009). Among related microorganisms, genes encoding surface-associated proteins are likely to represent preliminary examples of divergence due to host background differences and consequential symbiont adaptation. We believe that surface-encoding genes, often representing hypervariable genes (Wimley, 2003; Zheng et al., 2003), may prove to be significant markers not only in deciphering the evolutionary distance between recently diverged microorganisms such as the Sodalis-allied bacteria, but also toward identifying preliminary molecular alterations associated with inhabiting diverse hosts.

For this study, we extend molecular phylogenetic analyses for this specific clade of Sodalis-like insect symbionts, particularly focusing on the symbionts of the tsetse fly species Glossina morsitans, Glossina brevipalpis, Glossina fuscipes, and Glossina pallidipes, the slender pigeon louse Columbicola columbae (Phthiraptera: Philopteridae), and the bloodsucking hippoboscid fly Craterina melbae (Diptera: Hippoboscidae). We aim to further our understanding of their relatedness and identify initial effects associated with the colonization of different host species. The goals of the current study are: to assess intra/interspecies diversity of Sodalis, to provide 16S rRNA gene phylogenetic analysis of all ‘Sodalis-allied’ microorganisms described to date, and to compare the ability of surface-encoding genes to systematically resolve relationships within this symbiont lineage.

Materials and methods


Tsetse flies, G. morsitans and G. brevipalpis, were maintained at West Virginia University within the Department of Biology insectary as described previously (Snyder et al., 2010).

Interspecific diversity analyses

DNA isolation (C. melbae, G. morsitans, G. fuscipes, G. pallidipes, and G. brevipalpis) was performed using the Holmes–Bonner protocol (Holmes & Bonner, 1973). Nucleic acid extraction for C. columbae was performed using the QIAamp tissue mini kit (Qiagen, Valencia, CA). All samples were resuspended in 1 × Tris-EDTA following DNA isolation. DNA samples were subjected to PCR amplification of genes encoding putative outer membrane components: ompA, encoding the outer membrane protein A; ompC, encoding the osmoporin protein C; and rcsF, ycfM, slyB, and spr, encoding various outer membrane lipoproteins. PCR annealing temperatures, primers, and respective amplicon sizes are included in Supporting Information, Table S1. Notably, amplification reactions of ycfM from C. columbae and C. melbae and of rcsF and slyB from C. columbae were not successful. Negative controls were included in each set of amplification reactions. The amplification products were analyzed by agarose gel electrophoresis and visualized with Kodak 1D image analysis software. The amplicons were purified using the QIAquick PCR purification kit (Qiagen) and subjected to DNA sequencing at the West Virginia University Department of Biology Genomics Center on an ABI 3130xl analyzer (Applied Biosystems, Foster City, CA) using a 3.1 BigDye protocol (Applied Biosystems). For each sample, three to five amplicons were sequenced in both directions and contigs were assembled using Ridom Trace Edit (Ridom GmbH, Würzburg, Germany).

Assessing Sodalis intraspecies diversity within tsetse

The Sodalis ompA gene was amplified from two G. morsitans, G. fuscipes, G. brevipalpis, and G. pallidipes individuals. Amplicons were ligated into pGEM-T vector (Promega) and Escherichia coli JM109 cells were transformed. Four colonies per individual tsetse were verified for an ompA insertion and sequenced as described above.

Molecular phylogenetic analyses

All analyses included sequence data collected in this study or publicly available at NCBI GenBank. DNA sequences were aligned using the Clustal X algorithm with default settings, and refined manually when necessary. Maximum parsimony (MP) and neighbor joining (NJ) analyses were performed with 1000 replicates in PAUP 4.0 (Swofford, 2002). MP heuristic searches utilized the tree-bisection-reconnection (TBR) branch-swapping algorithm with a maximum of 200 trees (MaxTrees), and starting trees were created by stepwise addition. All MP analyses were performed twice, where gaps were treated either as ‘missing data’ or as a ‘fifth character state,’ with no differences noted between the results. NJ analyses implemented Kimura's two-parameter model (Kimura, 1980). Lineage support was measured by calculating nonparametric bootstrap values (n=1000) (Felsenstein, 1985).
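As an illustration of the distance underlying the NJ analyses, the following minimal sketch computes Kimura's two-parameter distance for a pair of aligned sequences. It is not the PAUP implementation; the function name `k2p_distance`, the pairwise-deletion handling of gaps, and the toy sequences are assumptions made here for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper: sites with gaps or ambiguous bases are skipped
    (pairwise deletion), which is an assumption of this sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy sequences (not data from this study): one C/T transition in 10 sites.
print(round(k2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

The distance exceeds the raw proportion of differing sites because the model corrects for multiple substitutions at the same site, weighting transitions and transversions separately.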

The evolutionary models used for Bayesian analyses were determined using the Akaike Information Criterion in MrModeltest 2.3 (Nylander, 2004). Bayesian analyses were performed in MrBayes 3.1.2 (Ronquist & Huelsenbeck, 2003), and the number of categories used to approximate the gamma distribution was set at four. Additionally, six Markov chains (Larget & Simon, 1999) were run for 3000000 generations for 16S rRNA gene and for 1000000 generations for surface-encoding genes. Posterior probability (PP) values were subsequently calculated. Stabilization of model parameters (burn-in) occurred around 2400000 and 800000 generations for 16S rRNA and surface-encoding genes, respectively. Every 100th tree after stabilization (burn-in) was sampled to calculate a 50% majority-rule consensus tree. All trees were constructed using the program FigTree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).
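The sampling scheme can be made concrete with a short calculation of how many post-burn-in trees would contribute to each consensus. This sketch assumes that "every 100th tree" corresponds to one sampled tree per 100 generations within a single run; the function name `retained_samples` is hypothetical, and the actual MrBayes settings may have differed.

```python
def retained_samples(total_generations: int,
                     burn_in_generations: int,
                     sample_every: int) -> int:
    """Number of post-burn-in samples kept for the consensus tree.

    Assumes one sample per `sample_every` generations (an assumption of
    this sketch, not a setting confirmed by the text).
    """
    if burn_in_generations > total_generations:
        raise ValueError("burn-in cannot exceed the run length")
    return (total_generations - burn_in_generations) // sample_every


# Run lengths and burn-in values as stated in the text.
print(retained_samples(3_000_000, 2_400_000, 100))  # 16S rRNA gene: 6000
print(retained_samples(1_000_000, 800_000, 100))    # surface-encoding genes: 2000
```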

Genetic divergence analyses

DnaSP (Librado & Rozas, 2009) was used to calculate synonymous (dS) and nonsynonymous (dN) rates and two common measures of nucleotide variation, π and θW, for determining ompA intraspecies variation within Glossina. Neutrality tests were also performed in DnaSP. The McDonald–Kreitman test and neutrality index (NI) were calculated by comparing the ratio of dS to dN mutations within either individual Glossina species for ompA, or among Glossina isolates for ompC, and an E. coli outgroup. The outgroup was composed of ecologically diverse E. coli representatives NC_000913, NC_008253, and NC_002655. These adaptive evolution tests have been shown to be most powerful when taxa are closely related (Clark et al., 2003). We chose E. coli as our representative outgroup because it is a close relative of Sodalis, and has a wide representation of publicly available genome strains.
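The two measures of nucleotide variation can be sketched as follows. This is a simplified per-site calculation for gap-free alignments, not the DnaSP implementation; the function names and the toy alignment are assumptions introduced here for illustration.

```python
from itertools import combinations


def segregating_sites(seqs: list[str]) -> int:
    """Count alignment columns that carry more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: average pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_differences / (len(pairs) * length)


def watterson_theta(seqs: list[str]) -> float:
    """theta_W per site: S / (a1 * L), with a1 = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / (a1 * length)


# Toy alignment (hypothetical, not ompA data): 4 sequences, 8 sites.
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "ACGAACGT"]
print(nucleotide_diversity(toy))       # 0.125
print(round(watterson_theta(toy), 4))  # 0.1364
```

When rare variants dominate, θW tends to exceed π, which is the pattern that drives negative values in tests such as Tajima's D.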

Nucleotide sequence accession numbers

The nucleotide sequences determined in this study have been deposited in the NCBI GenBank database under accession numbers HM626140-HM626149 and HQ914651-HQ914697.

Results and discussion

Phylogenetic placement of tsetse fly secondary symbionts (Sodalis) based on 16S rRNA gene analyses

To examine the evolutionary relationships of the newly identified Sodalis-like symbionts, we constructed phylogenetic trees based on 16S rRNA gene sequences. Bayesian analysis supports the monophyly of Gammaproteobacteria symbionts isolated from diverse insect orders (i.e. Diptera, Coleoptera, Hemiptera, and Phthiraptera) (Fig. 1). In general, there is a tight clustering of symbionts with respective insect host Order. Our Bayesian analysis also suggests the closer relationship of hippoboscid symbionts to weevil and pigeon louse symbionts, rather than to Sodalis, despite a common ancestry of their respective hosts within the Hippoboscoidea (Petersen et al., 2007), thus further substantiating a previous hypothesis of independent symbiont acquisition events by these hosts (Novakova & Hyspa, 2007). However, there is only moderate Bayesian support for this relationship (PP=77, data not shown) that is further decreased (PP=51) when symbionts of the recently reported chestnut weevil Curculio sikkimensis (Toju et al., 2010) and the stinkbug Cantao ocellatus (Kaiwa et al., 2010) are included in the analyses. Analyses were unable to resolve the relationships of the symbionts harbored within the hippoboscid, chestnut weevil, and stinkbug, indicative of relatively recent establishments and inadequate time for 16S rRNA gene diversification, or alternatively the transfer of these symbionts within these insect orders. With Bayesian analysis, symbiont relationships within the Sitophilus clade are highly resolved in comparison with that of Sodalis, where the scattering of host species (i.e. not reflective of Sitophilus speciation; Conord et al., 2008) suggests independent acquisition within species. It is possible that horizontal transmission, in addition to the previously described vertical route (Heddi et al., 1999), may also contribute to this phylogenetic patterning of symbionts; this warrants further study. Interestingly, although bacterial endosymbiosis is believed to be old within weevils (dating back approximately 125 Myr), symbiont replacement is believed to have occurred multiple times in Sitophilus weevils with causative factors remaining speculative (Conord et al., 2008).

Figure 1

Molecular phylogenetic tree of 16S rRNA gene sequences from Sodalis and allied bacteria. A Bayesian analysis tree created from 1509 aligned nucleotides is shown; NJ analyses gave essentially identical results (data not shown). Branches in bold were constrained with MP analysis. PP (shown as %, i.e. 95% represents a PP value of 0.95) and bootstrap values >50% are indicated at the nodes (−, <50% bootstrap), respectively. The branch lengths are measured in expected substitutions per site. Sequence accession numbers are provided. Host species are indicated for symbiotic bacteria, with colors representing insect orders. PS, primary symbiont; SS, secondary symbiont.

Sodalis isolated from in vitro culture maintained through serial passage formed its own monophyletic clade, supporting diversification from current Glossina isolates. While culture isolates were grouped together based on the 16S rRNA gene, Sodalis obtained from the same host species did not follow this pattern (i.e. symbionts within G. fuscipes, G. austeni, and G. palpalis) suggesting either no diversity between tsetse fly isolates or the lack of resolution due to the conserved nature of this locus. Distance analyses of the 16S rRNA gene also support the higher similarity of bacteria within the Sodalis clade, relative to that housing the Sitophilus symbionts (data not shown), which may explain why analyses were unable to further resolve these relations (Fig. 1). Importantly, many branches could not be robustly resolved warranting the need for additional inquiries utilizing genes that are typically associated with higher evolutionary rates such as those encoding surface-exposed molecules.

Phylogenetic placement of Sodalis-like symbionts based on surface-encoding proteins

To further our understanding of the divergence of ‘Sodalis-allied’ bacteria, particularly those found within various Glossina spp., C. columbae, and C. melbae, and to also assess the application of these surface-encoding genes in future analyses extending into other related symbionts, we reconstructed their phylogeny using six putative outer membrane-encoding genes: rcsF, slyB, ompA, spr, ompC, and ycfM. With only a few exceptions (all spr and Glossina vs. C. melbae slyB comparisons), the genetic distances of surface-encoding loci between symbionts localized within hosts of different orders were greater in comparison with 16S rRNA gene.

With regard to the spr, slyB, and ycfM loci, although sufficient sequence similarities resulted in the Sodalis-like isolates forming a monophyletic clade within the Gammaproteobacteria distinct from many free-living members of this group, deeper taxonomic resolution was lacking (data not shown). The low phylogenetic signal provided by these loci suggests that they may not be involved in adapting to particular host species and/or may be structurally constrained. For example, comparative analyses of the spr lipoprotein amino acid sequence demonstrated that the residues forming a unique Cys–His–His catalytic triad, which is believed to form a substrate-binding cleft within the active site of this protein (Aramini et al., 2008), are conserved among the examined Sodalis isolates and the C. melbae and C. columbae symbionts.

The ompA, ompC, and rcsF loci (Fig. 2) appear to be more informative toward the phylogenetic resolution of the Sodalis-like symbiont clade. With rcsF, sufficient phylogenetic signal was provided to enable clustering of the Glossina symbionts, with strong support, separate from the C. melbae symbiont (Fig. 2b). Interestingly, rcsF in E. coli has been shown to be involved in signal transduction of perturbations and/or environmental cues from the cell surface (Majdalani et al., 2005). Diversification between Sodalis and C. melbae isolates may indicate functional adaptations, such as differences in the type of signaling encountered within the host species background. The Sodalis symbionts also formed a distinct clade with the ompC phylogeny, with most mutations noted outside of the seven putative extracellular loops (Basle et al., 2006) of the different Glossina isolates. The one exception occurred in extracellular loop 4, where host interspecies diversity was observed with Sodalis isolates.

Figure 2

Molecular phylogenetic analyses of putative outer membrane-encoding gene sequences from Sodalis-allied symbionts, which support diversification. Bayesian trees inferred from (a) 1164 unambiguously aligned nucleotides of the ompA gene and (b) 426 nucleotides of the rcsF gene. Significance values are indicated in the order Bayesian PP/MP bootstrap/NJ bootstrap. Branch lengths are measured in expected substitutions per site and depicted under each tree. (c) MP tree inferred from 1227 nucleotides of the ompC gene, shown with support values in the order MP bootstrap/Bayesian PP/NJ bootstrap. Branch lengths depict the number of substitutions. Bold lines indicate discrepancies in tree renditions between analyses. Accession numbers are provided in parentheses. Host species are indicated for symbiotic bacteria; SS, secondary symbiont.

Relative to the other surface-encoding genes analyzed in this study, the ompA gene exhibited the greatest diversity among symbionts due to a combination of point mutations and indels. The best-studied ompA gene variant, that of E. coli K-12, encodes a 325 amino acid polypeptide (Chen et al., 1980). The N-terminal domain forms an eight-stranded β-barrel in the outer membrane, creating four surface-exposed loops (Pautsch & Schulz, 1998), while the C-terminus is periplasmic (Klose et al., 1988). Amino acid variations within outer membrane proteins mainly occur in the domains located in the extracellular regions, while interspaced residues making up the β-strands tend to be conserved. In our analyses, relative to Glossina symbionts, a total of nine nonsynonymous mutations were observed among C. melbae, C. columbae, and Sitophilus (i.e. Sitophilus oryzae primary symbiont, SOPE) symbionts occurring in loops 1–4 of the OmpA protein. Differences noted in the ompA sequence between the Glossina symbionts were localized outside of the extracellular regions, similar to our observations with ompC. In relation to ompA, the C. columbae symbiont exhibited the greatest nucleotide divergence resulting in its sister taxon placement relative to the other symbionts of interest with strong MP bootstrap support. MP, Bayesian, and NJ analyses all grouped Glossina symbionts within their own clade indicative of diversification potentially arising from host adaptation processes.

Molecular evolution of Sodalis-like symbionts

The Sodalis ompA gene demonstrated a wide nucleotide variation (π) within tsetse species (Table 1), with the highest π exhibited within G. morsitans (π=0.11) and the lowest within G. brevipalpis (π=0.001). This observation is not unprecedented as evidence of endosymbiont genomes (e.g. Wolbachia) undergoing either purifying or diversifying selection when examined from different host species has also been described with cell envelope component genes (Brownlie et al., 2007).

Table 1

Sodalis ompA nucleotide diversity within tsetse species and tests for neutral models of evolution

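| Symbiont host species | π (total) | θw | ω (dN/dS) | NI | Tajima's D | Fu and Li's D* | Fu and Li's F* | Fu and Li's D | Fu and Li's F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Glossina morsitans | 0.112 | 0.161 | 2.03 | 4.331 | −1.75 | −1.86 | −2.04 | −2.39 | −2.72 |
| Glossina pallidipes | 0.046 | 0.068 | 0.828 | 0.627 | −1.83 | −1.97 | −2.16 | −2.19 | −2.56 |
| Glossina fuscipes | 0.004 | 0.006 | 0 | 1.000 | −1.13 | −1.3 | −1.37 | −1.49 | −1.65 |
| Glossina brevipalpis | 0.001 | 0.001 | 0 | 0.311 | −1.05 | −1.13 | −1.20 | −1.26 | −1.41 |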

The neutrality index (NI) was calculated using the McDonald–Kreitman test; ω is the ratio of nonsynonymous to synonymous substitutions (dN/dS). Neutrality was examined within tsetse isolates (Tajima's D, Fu and Li's D*, and Fu and Li's F*) and also compared with the outgroup Escherichia coli (accession number NC_000913) using Fu and Li's D and Fu and Li's F.
Statistical significance:

π, average pairwise nucleotide diversity; θw, segregating sites per haploid genome.

Tests of neutrality (Tajima's D, Fu and Li's D* and F*, and Fu and Li's D and F) indicate a significant excess of young, rare alleles for Sodalis ompA within G. morsitans and G. pallidipes. In summation, three indices (π, dN/dS, and NI) support diversifying selection due to an abundance of low frequency Sodalis ompA haplotypes within G. morsitans. These observations may reflect the well-supported phenomenon of enhanced sequence evolution in endosymbiotic bacteria (Clark et al., 1999; Canback et al., 2004; Fry & Wernegreen, 2005). Similar to other endosymbionts, the small effective population size of Sodalis, a consequence of severe population bottlenecks during maternal transmission (Rio et al., 2006), predicts a larger proportion of nonsynonymous mutations due to drift that will generate higher dN to dS ratios (Ohta, 1972; Woolfit & Bromham, 2003).
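For readers unfamiliar with how an excess of rare alleles translates into a negative test statistic, Tajima's D can be sketched from three summary quantities: the number of sequences, the number of segregating sites, and the mean number of pairwise differences. The code below follows the standard formula (Tajima, 1989) rather than reproducing the DnaSP output; the function name and the input values are hypothetical and are not taken from Table 1.

```python
import math


def tajimas_d(n: int, seg_sites: int, mean_pairwise_diff: float) -> float:
    """Tajima's D from sample size, segregating sites (S), and mean
    pairwise differences (k), all counted over the whole alignment."""
    if n < 3 or seg_sites == 0:
        raise ValueError("need at least 3 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two theta estimators, scaled by its
    # estimated standard deviation.
    numerator = mean_pairwise_diff - seg_sites / a1
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return numerator / math.sqrt(variance)


# Illustrative values only: 10 sequences, 16 segregating sites,
# mean pairwise difference of 3.888889.
print(round(tajimas_d(10, 16, 3.888889), 3))
```

A negative D arises when the mean pairwise difference is smaller than expected from the number of segregating sites, that is, when most variants are present at low frequency.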

Deviation from neutrality was also observed with Sodalis ompC isolates, as supported by a significant MK test (G=13.42, P=0.00025) when compared with E. coli. A high abundance of fixed dN substitutions within all Sodalis isolates provides strong evidence for positive selection at particular sites of the ompC gene. Notably, upon comparison of Sodalis with E. coli isolates, greater ompC amino acid sequence variation was observed at putative surface-exposed loops suggesting their significance in adaptive evolution toward ecological niches.
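The logic of the MK test can be sketched as a G-test of independence on a 2 × 2 table of fixed vs. polymorphic and nonsynonymous vs. synonymous changes. The sketch below is an uncorrected G-test with one degree of freedom, which may differ from the exact options used in DnaSP; the function names and the counts passed to `neutrality_index` are hypothetical. Only the G value of 13.42 is taken from the text.

```python
import math


def g_test_2x2(dn: int, ds: int, pn: int, ps: int) -> float:
    """G statistic for the MK table: fixed (dn, ds) vs. polymorphic (pn, ps).

    No continuity or Williams correction is applied (an assumption here).
    """
    observed = [[dn, ds], [pn, ps]]
    total = dn + ds + pn + ps
    row_sums = [dn + ds, pn + ps]
    col_sums = [dn + pn, ds + ps]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += 2 * o * math.log(o / expected)
    return g


def chi2_p_value_1df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(statistic / 2))


def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> float:
    """NI = (Pn / Ps) / (Dn / Ds)."""
    return (pn / ps) / (dn / ds)


# The reported G of 13.42 corresponds to a P value of about 0.00025.
print(round(chi2_p_value_1df(13.42), 5))

# Hypothetical counts, for illustration of the index only.
print(neutrality_index(dn=2, ds=8, pn=4, ps=4))  # 4.0
```

Under this definition, NI below 1 reflects an excess of fixed nonsynonymous differences, whereas NI above 1 reflects an excess of nonsynonymous polymorphism.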

Here, we describe early genetic modifications likely involved in host adaptation within Sodalis-allied bacteria, specifically divergence in symbiont surface-encoding genes. In general, this particular class of loci exhibited greater genetic distances among Sodalis-like bacteria than the 16S rRNA gene traditionally used in phylogenetic analyses. Nevertheless, not all the surface-encoding genes examined in this study proved equivalent in their ability to resolve phylogenetic relations. Differences in selective pressures arising from distinct host physiologies and feeding lifestyles (Rio et al., 2003; Toh et al., 2006), as well as the influence of other host microbiota members (Snyder et al., 2010) have been shown to affect symbiont genome evolution. Future studies should extend the phylogenetics of these surface-encoding loci, specifically rcsF, ompC, and ompA, to other recently identified Sodalis-related symbionts to enhance phylogenetic resolution. Functional assays should be pursued also to examine the relevance of surface-encoding loci toward the process of endosymbiotic adaptation and to determine whether the described differences are sufficient to constrict host species colonization.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Table S1. Primers, annealing temperatures (Ta), and resulting amplicon sizes.



We thank Baneshwar Singh and Drs Mariam Lekveishvili, Beckie Symula and Olga Zhaxybayeva for technical assistance. We are grateful to Drs Pierre Bize, Vaclav Hyspa, and Takema Fukatsu for providing C. melbae and C. columbae samples, respectively. We thank the Slovak Academy of Science and IAEA for tsetse pupae. We acknowledge the funding support of NASA NNX07AL53A, NIH R03AI081701 and NSF-REU DBI-0849917.


  • Editor: Ross Fitzgerald

