Certain classes of pathogenic bacteria secrete virulence proteins in a Sec-independent manner, by a mechanism known as type III secretion. The main body of the export apparatus specific for virulence proteins is identified as a needle complex, which has a similar structural organization to flagella. The two structures share several proteins with highly homologous amino acid sequences. Even where the sequence identity is low among flagellar proteins from various species, the physico-chemical properties of each protein remain homologous. Therefore, by comparing the physico-chemical properties of unidentified proteins, it is possible to find homologs among type III secretion systems.
1 Type III secretion system
Bacteria have several life styles: alone, symbiotic, commensal or parasitic. Lone cells swim by means of flagella, looking for better environments to survive. Gregarious cells interact with themselves or with host tissues by means of pili or fimbriae, forming colonies or biofilms to be prosperous. Aggressive cells attack higher organisms by means of virulence secretion, directly injecting toxic molecules (mostly proteins) into their hosts either to invade or to kill. Tactical cells enter plant roots by a similar mechanism as invasive pathogenic cells employ, living in concord to form nodules.
The manner of protein secretion in pathogenic bacteria has been extensively studied as the virulence factor secretion system, which is divided into several types [1,2]. Six types have been reported so far; the first three types are well defined, but the rest are still controversial. The third type of secretion system, which I am going to discuss in this article, is called a type III secretion system or TTSS for short. For other types, please see more extensive reviews [3,4].
Despite the superficially well-organized naming of TTSS (actually, this handy name should be regarded as a tentative nomenclature for the classification of innumerable patterns of pathogenicity), there still remains an ambiguity in the definition, which has led us to the intriguing question, ‘Do flagella belong to type III or not?’ Differences in function between these two systems are obvious: the flagellum is for motility, whereas type III is specific for host cell infection, or more properly, for interactive communication between bacteria and higher organisms, as typified by symbiotic Rhizobium. However, beyond historical backgrounds and phenomenological differences, they are similar; when we pay attention to the core structure of these systems, the features common to the two become more evident. Here, I will simply regard these systems as a protein export system and discuss how closely they are related. First we define this unified system as follows: (1) Proteins to be exported do not have the Sec-dependent signal sequences. (2) The export apparatus, consisting of many protein components, spans both the inner and outer membranes (or the cell wall).
Here, proteins to be exported in (1) include not only virulence factors but also flagellar proteins and effector proteins for nodulation. Some of them could have signal sequences, in case they are components of the export apparatus in (2), which is a supramolecular structure such as the flagellum or needle complex (NC) and contains both inner and outer membrane proteins. Although NCs have so far been found only in Salmonella typhimurium (see footnote 1) [6–8] and Shigella flexneri [9–11], the genetic information strongly suggests that other pathogenic species would retain NCs as the type III export apparatus, as will be seen below.
The fruitful outcome of recent genome projects has provided us with abundant sequence data of TTSS genes from many pathogenic and symbiotic bacteria. Genes with high sequence identity to known genes have been easily recognized by homology searches of databanks. The results have been well summarized in reviews, such as the stimulating reviews by Galan [12,13], a masterpiece by Hueck, or a more recent review.
However, many genes still remain unidentified. Although biochemical analysis of supramolecular structures provides more reliable data, it is laborious and time-consuming. Genetic information from many more pathogenic bacteria floods into databanks year after year. Therefore, there is an urgent need to establish methods to identify genes that were overlooked by conventional homology searches. This article presents a novel approach to identifying genes based on their physico-chemical properties, a method largely learned from the analysis of flagellar proteins.
2 Physico-chemical properties of flagellar proteins
The flagellar system of S. typhimurium, the most extensively studied species, consists of 22 structural proteins, six cytoplasmic chaperones, four structural chaperones, and three regulatory proteins [15–17]. Many flagellar proteins from other species have been identified by homology with Salmonella proteins. Among flagellar homologs, it is evident that not only their amino acid sequences but also their physico-chemical properties are well maintained, as shown below. Physico-chemical parameters may be deduced directly from amino acid sequences using the ExPASy tool ProtParam (http://www.expasy.ch/tools/protparam.html). For this study the parameters are defined as MS (molecular size in the number of amino acids), pI (isoelectric point), GRAVY (grand average of hydropathicity), AI (aliphatic index), and II (instability index).
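Three of these parameters follow from simple published formulas, and the short sketch below shows how they could be computed from a one-letter sequence. It is an illustration, not the ProtParam implementation: the function names are hypothetical, GRAVY uses the Kyte-Doolittle scale, and AI uses the standard formula AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], with X in mole percent. pI and II are left out because they depend on lookup tables (pK values and dipeptide weights) that are not reproduced here.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale, on which GRAVY is defined.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def clean(seq):
    """Upper-case a one-letter sequence and drop all whitespace."""
    return "".join(seq.split()).upper()


def molecular_size(seq):
    """MS as used in the text: the number of amino acid residues."""
    return len(clean(seq))


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    s = clean(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in s) / len(s)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    s = clean(seq)
    counts = Counter(s)

    def mole_percent(aa):
        return 100.0 * counts[aa] / len(s)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

For real work the values reported by ProtParam itself should be preferred; the sketch only makes explicit what MS, GRAVY, and AI measure.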
Both GRAVY and AI indicate whether a protein is membranous or not; hence these parameters are not independent of each other. II distinguishes stable proteins from unstable ones. This index value is calculated from the amino acid sequence by a method deduced from experimental data. It is loosely related to protein half-life and gives a theoretical indication of protein stability. A protein whose II is smaller than 40 is predicted to be stable, whereas a value above 40 predicts that the protein may be unstable (http://www.expasy.ch/tools/protpar-ref.html). In Fig. 1, the II value of each component protein of a flagellum is shown in color; the smaller the index value, the colder the color, as shown in the color bar. Extracellular structures such as the flagellar hook and filament are bluish, whereas cytoplasmic proteins are reddish, implying that extracellular proteins are more stable than cytoplasmic proteins. This tendency is conserved among other flagella and NCs.
Fig. 1. Schematic drawing of a flagellum. The II of each component is indicated in color. The color bar from blue to red is linearly divided into IIs from 0 to 60.
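The two rules used to read the figure, the stability threshold at II = 40 and the linear color bar from 0 to 60, can be written down as a small sketch. The function names are hypothetical, and the direct blue-to-red interpolation is an assumption: the actual color bar may pass through intermediate hues. Values of exactly 40 are not covered by the rule quoted above and are treated here as possibly unstable.

```python
def stability_class(ii, threshold=40.0):
    """Apply the II rule: below the threshold a protein is predicted stable."""
    return "stable" if ii < threshold else "possibly unstable"


def ii_to_rgb(ii, lo=0.0, hi=60.0):
    """Map an II value linearly onto a blue (low) to red (high) color.

    Values outside [lo, hi] are clamped to the ends of the bar.
    """
    t = min(max((ii - lo) / (hi - lo), 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1.0 - t)))
```

With these rules a hook protein with an II near 22 is classed as stable and drawn toward the blue end of the bar.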
In order to characterize proteins, I chose four independent parameters: MS, pI, II, and AI, as described above. As an example, these parameters of FlgE, the hook protein, from various strains are listed in Table 1. The % identity was calculated with respect to Salmonella FlgE (SALTY_FLGE), and although for most of the homologs the identity is less than 30%, the parameter values stay quite similar. Averaged values are: 430 (for MS), 4.73 (pI), 21.95 (II), and 78.20 (AI), indicating that FlgE is a very stable, acidic, and soluble protein. There are some variations in the molecular size: Bacillus spp. have smaller FlgEs than Salmonella FlgE, whereas Campylobacter, Caulobacter, and Helicobacter pylori have larger FlgEs. The reasons for the differences have not been elucidated yet. Other flagellar proteins have also been characterized in this way (data not shown), and those averaged parameters have been used as moulds for identifying homologs.
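As an illustration of how an averaged parameter set can serve as a mould, the sketch below compares a candidate protein with the FlgE averages quoted above. The relative-deviation test, the 15% tolerance, and the candidate values are assumptions made for the example; the text does not specify a numerical matching rule.

```python
# Averaged FlgE parameters quoted in the text (MS, pI, II, AI).
FLGE_MOULD = {"MS": 430, "pI": 4.73, "II": 21.95, "AI": 78.20}


def relative_deviations(candidate, mould):
    """Relative deviation of each candidate parameter from the mould."""
    return {key: abs(candidate[key] - mould[key]) / abs(mould[key])
            for key in mould}


def matches_mould(candidate, mould, tolerance=0.15):
    """True if every parameter lies within the (hypothetical) tolerance."""
    deviations = relative_deviations(candidate, mould)
    return all(dev <= tolerance for dev in deviations.values())


# Hypothetical candidates, for illustration only.
close_candidate = {"MS": 402, "pI": 4.60, "II": 23.0, "AI": 80.0}
distant_candidate = {"MS": 250, "pI": 9.10, "II": 48.0, "AI": 110.0}
```

Here `matches_mould(close_candidate, FLGE_MOULD)` is true and `matches_mould(distant_candidate, FLGE_MOULD)` is false. A stricter or looser tolerance per parameter may be more appropriate, since MS is known to vary between some genera.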
Table 1. Physico-chemical properties of Salmonella FlgE and homologs
a Protein sequence data were extracted by Entrez of NCBI (National Center for Biotechnology Information). Names of bacterial species are according to the SWISS-PROT nomenclature: SALTY, Salmonella enterica serovar Typhimurium; ECOLI, Escherichia coli; AQUAE, Aquifex aeolicus; BACHA, Bacillus halodurans; BACSU, Bacillus subtilis; BORBU, Borrelia burgdorferi; BRUAB, Brucella abortus; BUCAI, Buchnera aphidicola; CAMJE, Campylobacter jejuni; CAUCR, Caulobacter crescentus; HELPY, Helicobacter pylori; HELMU, Helicobacter mustelae; PSEAE, Pseudomonas aeruginosa; RHIME, Sinorhizobium meliloti; RHOSP, Rhodobacter sphaeroides; TREDE, Treponema denticola; TREPA, Treponema pallidum; TREPH, Treponema phagedenis; VIBCH, Vibrio cholerae; VIBPA, Vibrio parahaemolyticus; ZYMMO, Zymomonas mobilis.
b Numbers in bold deviated far from the average and are not included in the calculation.
c Percentage of identity is calculated with respect to S. typhimurium proteins, using the ExPASy SIM alignment program for proteins with the following parameters: gap open penalty: 8; gap extension penalty: 0; comparison matrix: BLOSUM 30. The number of alignments computed was 10, and the % identity of the first alignment was chosen.
d Av/S.D. indicates the average value and the standard deviation. When the S.D. is smaller than 10% of the average value, the parameter is regarded as being well conserved.
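The conservation criterion in the last footnote can be sketched as follows. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the footnote does not say which estimator was used. Values marked as outliers (bold in the table) are expected to be removed by the caller before the test is applied.

```python
from statistics import mean, stdev


def is_well_conserved(values, fraction=0.10):
    """True if the S.D. is smaller than the given fraction of the average.

    Uses the sample standard deviation (an assumption); needs at least
    two values.
    """
    average = mean(values)
    return stdev(values) < fraction * abs(average)


# Hypothetical MS values for a set of homologs, for illustration only.
tight = [400, 410, 420, 430, 440]
scattered = [100, 200, 300]
```

With these made-up numbers `is_well_conserved(tight)` is true and `is_well_conserved(scattered)` is false.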
Before going into details of protein characterization by this method, I have to mention the chaotic situation of the terminology of virulence genes. Salmonella and Shigella share common names for some virulence factors (Spa), but many others are different. Other pathogenic bacteria have their own names for virulence factors (e.g. ysc/yop for Yersinia, esc/esp for enteropathogenic Escherichia coli). Many of them try to follow the terminology of Yersinia sp. at least for alphabetical assignment of each gene name (e.g. yscR∼escR), but it is still quite confusing (as seen later in Table 2). Therefore, for the time being, it will be more practical for us to employ the terminology of the flagellar system, because the flagellum (mostly of S. typhimurium and E. coli) is well understood and its unified terminology is employed for all other motile species.
Table 2. Physico-chemical properties of Salmonella FliP homologs and components of the proto-channel
a Bacterial species employed are S. enterica serovar Typhimurium (SALTY), Yersinia enterocolitica (YEREN), S. flexneri (SHIFL), enteropathogenic E. coli (ECOLI), Pseudomonas aeruginosa (PSEAE), and Ralstonia solanacearum (RALSO). For proteins in the lower rows, the same set of bacterial species as in the upper rows was used.
b The pI values often split into two, either acidic or basic. The reason remains to be elucidated.
c A combination of these MS and AI values should be compared.
Among all flagellar proteins and virulence factors, FliP, FliQ, FliR, and their homologs are the most conserved and hence easily identified by homology searches (identity is nearly 40% or higher). They form the core of the central channel in the export apparatus, the C rod of flagella [21,22]. Physico-chemical parameters of FliP and its homologs are listed in Table 2A. It can be seen that their MS, II, and AI are highly conserved with a few exceptions. Likewise, averaged values of these parameters for FliQ and FliR are listed. This triplet FliP/Q/R consists of membrane proteins and is easily distinguished from other proteins due to their high AI values. Interestingly, combinations similar to the FliP/Q/R triplet can be found in other systems, for example the Sec system (Sec F/E/Y) and F0F1-ATPase (ATP 5/8/6) as seen in Table 2B, suggesting the triplet is a primitive and universal structure for membrane channeling. Hereafter, I will call this universal structure the proto-channel.
FlhA, FlhB, and FliI, accessory proteins in the flagellar export apparatus, are as well conserved as the triplet. FlhA and FlhB are membrane proteins, probably forming the C rod together with FliP/Q/R in the flagellum. FliI is a cytoplasmic ATPase, prominently resembling the β-subunit of F0F1-ATPase. The flagellar FliI forms a complex with the corresponding inhibitor FliH and a chaperone FliJ, roaming around for secreted proteins and eventually pushing them out through the proto-channel by utilizing ATP.
4 Analysis of type III components from sequence data
In order to identify other NC components that were overlooked by homology searches, I employed the MS and II parameters to make an initial screen of potential flagellar protein homologs. For example, gene products of Salmonella flagellar proteins and Salmonella pathogenicity island 1 (SPI1) are plotted in maps of MS vs. II (Fig. 2). Dots are widely distributed between both axes, giving a pattern like a star chart in which dot patterns from the two species look similar. Dots at close positions are corresponding proteins; proto-channel proteins mentioned above locate at similar positions as shown in numbers in Fig. 2. By selecting dots around a position of interest (relative positions against surrounding dots are important), several homologs for known proteins have been selected. The number of candidates can then be narrowed down by comparing other parameters or topological information, provided by programs such as DAS transmembrane prediction server (http://www.sbc.su.se/~miklos/DAS/) or SOSUI (Classification and Secondary Structure Prediction of Membrane Proteins) (http://sosui.proteome.bio.tuat.ac.jp/~sosui/).
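The first screening step, picking the dots that lie closest to a known protein on the MS vs. II map, can be sketched as a nearest-neighbour search. Everything specific in the sketch is an assumption made for illustration: the function name, the axis scaling factors, and the open reading frame names and values. The sketch also ignores the relative positions of surrounding dots, which the text notes are important.

```python
import math


def nearest_candidates(reference, candidates, k=3,
                       ms_scale=100.0, ii_scale=10.0):
    """Return the k candidate names closest to the reference on the map.

    reference  -- (MS, II) of the known protein, e.g. a flagellar protein
    candidates -- dict mapping a gene name to its (MS, II) pair
    The scales (hypothetical) make the two axes comparable in size.
    """
    ref_ms, ref_ii = reference

    def distance(item):
        _, (ms, ii) = item
        return math.hypot((ms - ref_ms) / ms_scale,
                          (ii - ref_ii) / ii_scale)

    ranked = sorted(candidates.items(), key=distance)
    return [name for name, _ in ranked[:k]]


# Hypothetical gene products from a pathogenicity island.
island = {
    "orfA": (250, 34.0),
    "orfB": (600, 50.0),
    "orfC": (90, 36.0),
    "orfD": (240, 20.0),
}
shortlist = nearest_candidates((245, 35.0), island, k=2)
```

The shortlist would then be narrowed down with the other parameters (pI, AI) and with the predicted transmembrane topology.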
Fig. 2. Dot patterns of flagellar proteins and type III proteins measured by MS and II. Six proteins whose physico-chemical properties are well conserved are indicated by number: (1) FliP/SpaP, (2) FliQ/SpaQ, (3) FliR/SpaR, (4) FlhB/SpaS, (5) FlhA/InvA, and (6) FliI/InvC (see text).
Table 3 summarizes virulence proteins thus identified from S. typhimurium (SALTY/SPI1 and SPI2; there are two sets of TTSS on the chromosome), Yersinia enterocolitica (YEREN/Ysc and Ysa; there are also two sets of TTSS, the former on a plasmid and the latter on the chromosome), S. flexneri (SHIFL), enteropathogenic E. coli (EPEC), Pseudomonas aeruginosa (PSEAE), and Ralstonia solanacearum (RALSO). Many of them (bold letters) have already been identified or indicated by other methods. Since many candidates can be picked by my method, only a few of the strongest candidates are indicated. From the similarity of these gene products, it is very likely that both animal and plant pathogens harbouring TTSS have the NC as a secretion apparatus.
Proteins in bold letters have already been identified or indicated by other methods, and the others are candidates picked by my method. Those with the question mark have a few more candidates, but only the strongest candidate(s) are indicated. When candidates are too many to mention, the column was left blank with a question mark.
a S. typhimurium has two sets of TTSS (SPI1 and SPI2) on the chromosome. Y. enterocolitica also has two sets of TTSS: the ysc genes on a plasmid and the ysa genes on the chromosome.
b Chaperones and effector proteins are not complete, because the number differs from species to species.
c Neither SpaO nor InvE has been found in the preparation of NC.
d InvH is responsible for the penetration of the outer membrane.
e PrgJ is a minor protein of NC. It could be a rod of NC.
f InvJ is not a structural protein, but it is responsible for length control of the needle. For details, see .
5 Several components distinguishable from others
Membrane proteins are easily recognized by their high AIs (more than 100). The topological profiles of the membrane proteins obtained by DAS or SOSUI show several patterns of transmembrane (TM) segments, which are distinguishable from others (data not shown).
FliF, a component of the MS ring complex (MS in this context refers to the flagellar rings, not to molecular size), is predicted to have two TM segments in the terminal regions, one each at the N- and C-terminus. This profile imposes a strict restriction on the selection of candidates, and in most cases only one candidate was picked (Table 4A). In general, the molecular size (MS) is well conserved among homologs. However, it is evident that FliF homologs are divided into two groups: flagellar protein (FliF) and virulence protein (represented by PrgK). The MS of FliF is double that of PrgK. This agrees with the fact that the diameter of the MS ring complex is larger than that of the NC basal ring despite the similarity between the whole structures. In addition, the pI values of the two groups differ: FliF is acidic (pI=5.8), while PrgK is basic (pI=8.1). The meaning remains to be elucidated.
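The topological restriction described for FliF, exactly two TM segments with one near each terminus, can be expressed as a simple filter on predicted segments. The sketch assumes that the TM predictions (for example from DAS or SOSUI) have already been parsed into (start, end) residue positions; the function name, the 20% terminal window, and the example coordinates are hypothetical.

```python
def has_terminal_tm_pair(length, tm_segments, window_fraction=0.2):
    """True if there are exactly two TM segments, one at each terminus.

    length          -- protein length in residues
    tm_segments     -- list of (start, end) positions, 1-based
    window_fraction -- fraction of the length counted as a terminal
                       region (hypothetical cut-off)
    """
    if len(tm_segments) != 2:
        return False
    (_, first_end), (second_start, _) = sorted(tm_segments)
    window = window_fraction * length
    return first_end <= window and second_start >= length - window


# Hypothetical predictions for three candidates of 560 residues.
terminal_pair = [(25, 45), (455, 475)]
internal_pair = [(200, 220), (455, 475)]
three_segments = [(25, 45), (300, 320), (455, 475)]
```

Only the first of the three example profiles passes the filter, which mirrors how this criterion leaves very few candidates.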
a Numbers in bold deviated far from the average and are not included in the calculation.
b The average of FlgE homologs is taken from Table 1.
The needle is a distinctive structure: straight, thin, and short. The length is well regulated (45±3 nm) [8,9], and therefore it corresponds to the hook rather than the flagellar filament despite the difference in shape. Consistent with this, the needle length can be changed in the same manner as the hook. In fliK mutants, the hook is elongated to undefined lengths, giving rise to so-called polyhooks. The virulence gene invJ, a fliK homolog, is also responsible for length control; invJ mutants give rise to uncontrolled long needles, polyneedles. Overproduction of the hook protein also gives rise to polyhooks under certain conditions [24,26]. Overproduction of Shigella MxiH, a component protein of the needle, gave rise to polyneedles, confirming the similarity of length control mechanisms in the two systems. It has been suggested that the physical capacity of the export apparatus is involved in the mechanism of length control. The needle is thinner than the hook (8 nm vs. 20 nm), so naturally the MS of needle component proteins is smaller than that of the hook proteins (Table 4B). The II is as low as that of the hook (averaged II=28.34), indicating that the structure is highly stable.
In order to penetrate the outer membrane, bacteria often employ ring structures. The flagellar PL ring complex, also called outer rings, is composed of FlgH and FlgI; secretins in the type II system are PulD and PulS; and the outer ring of type III is InvG. All these component proteins have Sec-dependent signal peptides and one TM segment in the N-terminus. In spite of the differences in their MS and the consequent different diameters, these rings appear quite similar in structure, showing a perfect roundness. Although the inner diameter of the PL ring is a little larger than that of the PulD/InvG rings, the MS values of the component proteins show the opposite trend: for FlgI, MS=369, and for PulD/InvG, MS=572. Both have an II in the range 30–40 and an AI of 95–105.
Generally speaking, cytoplasmic proteins and secreted proteins have low AI values and high II values (that is, extremely soluble and unstable), like the majority of proteins, and therefore, it is not easy to uniquely identify soluble proteins. A combination of MS, pI, AI and II values still allows us to select homologs, but their identification awaits direct proof by other methods. Many of the soluble proteins in this system are effector proteins (also called exoproteins) and their corresponding chaperones. Since these proteins are specific for their own hosts, their properties deviate further from each other. It is noteworthy that some flagellar chaperones share homology with cytoplasmic virulence factors, which are probably chaperones. Consequently, their substrates resemble each other: flagellar filamentous proteins and effector proteins as indicated in Table 3. This homology has obviously nothing to do with their existing function but might indicate their structural origin.
6 Origin of flagella
‘Which is older, flagella or NC?’ is a big question being argued at the moment. A prevailing opinion claims that “Bacterial virulence had to wait long until eukaryotes appeared on the earth. Besides, TTSS is found only in a limited number of Gram-negative pathogenic bacteria, whereas flagella prevail in a much wider range of bacterial species, including both Gram-negative and -positive ones. Flagella must have come first.”
Against this kind of a posteriori justification, one may come up with an a priori reasoning: ‘There must have been organisms (already extinct!) other than bacteria even before the birth of eukaryotes. The predator–prey relationship has been continuing since the creation of life. Even today, certain bacteria prey on other bacteria. Bacteria are not ascetic but avaricious. Flagella and virulence could have evolved in parallel.’ I favor the latter (of course, this is a matter of opinion at this moment). The flagellum is a beautifully designed architecture almost completed in evolution. Why should those sophisticated skills be abandoned to go back to boring soluble proteins?
I have discussed the structural characteristics of the type III protein export apparatus in relation to both flagella and the NC. Analysis has shown that physico-chemical properties (molecular size, isoelectric point, instability index, and aliphatic index) of the component proteins are well maintained, even when the sequence homology is low. Therefore, the combination of those parameters is useful for the identification of functional homologs.
Here, plant pathogens were regarded in a similar way to animal pathogens, because they retain a set of genes necessary for NC formation. We learned that flagella and the type III secretion system consist of homologous component proteins with common physico-chemical properties, suggesting these two systems could have evolved in parallel.
I thank Jorge Galan, Tomoko Kubori, Ariel Blocker, Christine Josenhans, and Sarah Daniell for their critical reading of the manuscript. I also thank Sho-taro Ishii for figures.
1 The new formal name is Salmonella enterica serovar Typhimurium.