Bacterial virulence: can we draw the line?

Trudy M. Wassenaar, Wim Gaastra
DOI: http://dx.doi.org/10.1111/j.1574-6968.2001.tb10724.x, pages 1-7, first published online: 1 July 2001


The molecular approach to microbial pathogenesis has resulted in an impressive amount of data on bacterial virulence genes. Bacterial genome sequences rapidly add candidate virulence genes to electronic databases. The interpretation of this overwhelming information is obscured because every gene involved in pathogenicity is called a virulence gene, regardless of its function in the complex process of virulence. This review summarizes the changing concept of bacterial virulence and the detection and identification strategies followed to recognize virulence genes. A refined definition of virulence genes is proposed in which the function of the gene in the virulence process is incorporated. We propose to include the life-style of bacteria in the assessment of their putative virulence genes. A universal nomenclature in analogy to the EC enzyme numbering system is proposed. These recommendations would lead to a better insight into bacterial virulence and a more precise annotation of (putative) virulence genes, which would enable more efficient use of electronic databases.

  • Bacterial virulence
  • Virulence gene nomenclature
  • Standardization
  • Comparative genomics

1 The concept of bacterial virulence

In 1890 Robert Koch postulated guidelines to establish a standard for evidence of causation in infectious disease. His postulates remained the gold standard for defining microbial virulence for over 100 years, despite limitations to their experimental application for a number of microorganisms. Revisions of Koch's postulates were introduced to address those limitations by adding immunological and/or epidemiological proof of causation (see [1] for a recent review). With the development of molecular biological techniques, it became possible to identify the genes encoding the factors responsible for virulence. This gave rise to molecular microbiology, in which the role and function of specific genes (and the factors they encode) in (bacterial) virulence are the subject of investigation.

The quest for virulence genes evolved together with the technical development of molecular biology and genetic modification of microorganisms. In the beginning of molecular microbiology, genes were identified that encoded virulence factors of known reputation and these were used as probes to find analogs in other organisms. The function of individual genes and the factors they encode in virulence could be determined by random and targeted mutagenesis. Later, identified genes with unknown function were tested for their role in virulence. At present the challenge is to filter out virulence genes from complete bacterial genomes, which can now be sequenced faster than the time needed to establish the role of one single gene in virulence. To give such evidence, a molecular form of Koch's postulates was defined [2]: (i) the phenotype or property under investigation should be associated with pathogenic members of a genus or pathogenic strains of a species; (ii) specific inactivation of the gene(s) associated with the suspected virulence trait should lead to a measurable loss in pathogenicity or virulence; and (iii) reversion or allelic replacement of the mutated gene should lead to restoration of pathogenicity. An alternative postulate was added in case genetic manipulation was not possible: (iv) the induction of specific antibodies to a defined gene product should neutralize pathogenicity. This addition is sometimes taken alone in that when antibodies against a certain molecule protect an animal from disease, this is accepted as sufficient to call such a factor a virulence factor.
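The molecular postulates listed above can be read as a decision rule. The following is a minimal sketch of that reading, not a published algorithm: the record type `GeneEvidence`, its field names, and the function name are hypothetical, and the choice to keep requiring postulate (i) when falling back on postulate (iv) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEvidence:
    """Observations for one candidate gene (hypothetical record type)."""
    associated_with_pathogenic_strains: bool             # postulate (i)
    inactivation_reduces_virulence: Optional[bool]       # postulate (ii); None = not testable
    complementation_restores_virulence: Optional[bool]   # postulate (iii); None = not testable
    antibodies_neutralize_pathogenicity: Optional[bool] = None  # alternative postulate (iv)


def satisfies_molecular_postulates(ev: GeneEvidence) -> bool:
    """Apply postulates (i)-(iii), or (i) plus (iv) when genetics is not possible."""
    if not ev.associated_with_pathogenic_strains:
        return False
    genetics_possible = (
        ev.inactivation_reduces_virulence is not None
        and ev.complementation_restores_virulence is not None
    )
    if genetics_possible:
        return bool(
            ev.inactivation_reduces_virulence
            and ev.complementation_restores_virulence
        )
    # Genetic manipulation was not possible: fall back on postulate (iv).
    return bool(ev.antibodies_neutralize_pathogenicity)
```

A gene that passes only through the fallback branch is exactly the case in which postulate (iv) is "taken alone", so it may be worth recording which branch produced the result.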

2 Detection and identification of virulence genes

The molecular approach to study bacterial virulence has resulted in a number of techniques that are based on different principles (Fig. 1). Firstly, genetic methods are used to obtain phenotypic evidence for a role in virulence. The approaches of gene inactivation and gene complementation are based on the second and third molecular postulates given above. Both principles are applicable to one-step mechanisms. For example, when the production of a bacterial toxin results in cellular damage, mutants with inactivated toxin genes no longer cause damage, and expression of these genes in another organism introduces toxigenic properties. However, pathogenic bacteria can employ such complex processes that the results of inactivation/complementation studies may be hard to interpret. Even under simplified in vitro conditions, a presumably straightforward process such as bacterial invasion is driven and regulated by multiple genes and gene loci, which work in concert or are complementary. Inactivation of one link of the chain may eliminate invasiveness, but complementation in a heterologous system may require several genetic loci. On the other hand, inactivation of a factor may be overcome by alternative factors so that loss of virulence is not observed, but complementation in a different genetic environment may have strong phenotypic effects.

Figure 1

Different approaches to identify virulence genes and virulence-associated genes.

Virulence factors are often immunogenic, thus when acquired immunity protects against disease, protective antibodies are frequently directed against virulence gene products (see Fig. 1). But the argument cannot be reversed: not all antigens are virulence factors. Some are structural components of the bacterial cell, and although these can have virulent properties (for instance lipopolysaccharide (LPS)), their function for the bacteria is primarily structural. Chaperonins can give rise to antibodies, but their function in protein folding and stress repair is of greater significance than their virulence properties. On the other hand, virulence factor candidates are sometimes discarded because they are not immunogenic during infection. The value of such reasoning remains disputable, for instance for intracellular pathogens.

As indicated in Fig. 1, a third approach to identify novel virulence genes comes from the observation that many virulence genes display antigenic polymorphisms, presumably to evade the selection pressure of the host immune system [4]. The correlation between polymorphism and virulence is so strong that the presence of mechanisms to produce polymorphic factors is indirect evidence for a role of that factor in virulence. With the high throughput of sequencing data, it becomes possible to identify putative virulence properties for genes based on the polymorphic nature of their predicted translation products [5]. Although most known examples of polymorphisms in bacteria are virulence genes, it is likely that this mechanism is employed for different functions also, for instance for adaptation to environmental conditions, as may be the case for contingency genes of certain bacterial species.

In addition to these approaches, several techniques have been developed to identify and characterize bacterial genes that are induced during in vivo infection and, potentially, may play a role in pathogenesis [6,7]. The (transcriptional) regulation of these genes may be a reflection of the new environment that bacteria have to adapt to after entering a host. Such gene products may not be involved in pathogenesis directly although they are a requirement for survival in the host.

A relatively new approach is to deduce information from comparative genetics [8]. The annotation of newly sequenced genes that are now rapidly generated by high-throughput genome sequencing is based on sequence similarity. To apply this to virulence genes is risky for two reasons. First, an acceptable level of sequence conservation is interpreted as conservation of function. However, genes may have a niche-adapted function in a particular organism, and this may be reflected in the role in virulence. Even a high degree of genetic conservation must be experimentally tested to demonstrate functional conservation. Second, sequence similarity searches have resulted in a new phenomenon for which the term ‘putativism’ would be appropriate. Sequence similarity to an experimentally defined virulence gene results in an entry of a ‘putative virulence gene’, and every subsequent significant similarity to that gene will pass on this term to new genes, which may have little identity with the virulence gene that initiated the linkage. In a more severe case, the original entry of a ‘virulence factor’ in public databases may not be backed up with published experimental evidence. For example, the database entry of Salmonella typhimurium ‘virulence factor MviN’ has resulted in the identification of homologs of this gene in many species, such as Campylobacter jejuni, Escherichia coli K12, Neisseria meningitidis, Streptomyces coelicolor, Deinococcus radiodurans, Thermotoga maritima, Helicobacter pylori, Rickettsia prowazekii, Treponema pallidum, and many others. It is hard to imagine that this gene encodes a virulence factor when it is present in organisms with such diverse life-styles, some of which are not even pathogens. In all cases the annotation of the homolog mentions ‘(putative) virulence gene/factor’ although no published record could be identified describing experimental evidence for a role of MviN in virulence in S. typhimurium. Mis-annotation based on ‘putativism’ is quite common, and contaminates electronic databases with misinformation.
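The propagation of a ‘putative’ label through successive similarity hits can be made explicit by recording, for every entry, which entry its annotation was copied from. The sketch below is illustrative only: the `Entry` record, the `inherited_from` field, and the accession names in `toy_db` are hypothetical and do not correspond to real database fields or records.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """One database record (hypothetical structure)."""
    annotation: str
    inherited_from: Optional[str]   # accession the annotation was copied from
    experimental_evidence: bool     # True only if published experiments exist


def trace_annotation(db: Dict[str, Entry], accession: str) -> Tuple[List[str], bool]:
    """Follow the chain of inherited annotations back to its origin.

    Returns the chain of accessions and whether the origin of the chain
    is backed by experimental evidence.
    """
    chain = [accession]
    seen = {accession}
    entry = db[accession]
    while entry.inherited_from is not None:
        parent = entry.inherited_from
        if parent in seen or parent not in db:
            # Circular or dangling reference: the label cannot be verified.
            return chain, False
        chain.append(parent)
        seen.add(parent)
        entry = db[parent]
    return chain, entry.experimental_evidence


# Toy data: a label passed down twice from an origin without evidence.
toy_db = {
    "origin": Entry("virulence factor", None, False),
    "homolog_1": Entry("putative virulence factor", "origin", False),
    "homolog_2": Entry("putative virulence factor", "homolog_1", False),
}
```

In this toy data, `trace_annotation(toy_db, "homolog_2")` walks back through `homolog_1` to `origin` and reports that the label rests on no experimental evidence, which is the situation described above.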

Two other pitfalls of comparative genetics are facing opposite directions. On the one hand different genes that share no sequence homology can have identical functions, as is demonstrated for actA of Listeria monocytogenes and icsA in Shigella flexneri whose gene products recruit host cell actin (discussed in [3]). On the other hand sequence homology does not always predict function, or functional domains may not be conserved, as illustrated by comparison of calmodulin genes in Saccharomyces cerevisiae and vertebrates [9].

Ideally, for the identification of virulence genes, several approaches should lead to the same gene or set of genes, and a virulence gene should have more than one of the characteristics listed in Fig. 1. Even then, a controversy remains whether a gene is interpreted as being a housekeeping gene or a virulence gene. This situation prompts a more restricted definition of virulence genes, and more precise terms than those currently used.

3 How to define virulence genes

The definitions of bacterial virulence, and virulence factors, that have been in use over time have been summarized elsewhere [10]. The number of genes nominated ‘virulence genes’ depends on the definition used, as illustrated in Fig. 2. Most investigators draw a line somewhere between circles 2 and 3 of Fig. 2, since genes involved in basic cellular metabolism (‘housekeeping genes’) are not regarded as virulence genes. However, housekeeping genes can be identified as virulence genes in a screen when their inactivation results in attenuation of virulence. This will be recognized for those genes for which a function in cellular metabolism is known, such as aroA. When the function of a gene product is not known, attenuation after inactivation results in the application of the term ‘(putative) virulence gene’. This problem is compounded now that a total-genome mutagenesis approach is followed for pathogens whose complete genome sequence is available but whose virulence genes are still a mystery: every predicted (putative) open reading frame can now be mutated (provided the genetic tools are available) and tested for attenuation. In the definition of virulence genes one could include the requirement that the gene should be absent in non-pathogens (or non-pathogenic strains). If we accept this, LPS cannot be a virulence factor, since LPS genes are present in both pathogens and non-pathogens.

Figure 2

Depending on the definition of virulence, more or fewer genes are called ‘virulence genes’. The number of virulence genes and virulence-associated genes included in a given definition are represented by concentric circles. In collection 1, only those virulence factors are included that are directly involved in causing disease (‘true virulence genes’). The addition of ‘virulence-associated genes’ increases the number of identified virulence genes and thus the size of circle 2. The gene pool identified by inactivation and phenotypic characterization (see Fig. 1) includes all genes that lead to an attenuated phenotype as ‘virulence life-style genes’ (circle 3). The remaining genes are other housekeeping genes, structural genes, and essential genes. The border between collection 3 and the remainder cannot be exactly defined.

In order to exclude housekeeping genes from the set of virulence genes, the requirement is often added to Falkow's molecular postulates that virulence genes should not be expressed outside the host. This would again exclude many well-characterized and generally accepted virulence genes, once more including the genes encoding LPS-producing enzymes: they are expressed under most if not all circumstances. Moreover, the lack of expression outside the host may be a reflection of the applied culture conditions. In conclusion, the border between virulence-associated genes and housekeeping genes remains poorly defined.

A solution to this problem is to distinguish genes directly involved in the pathogenesis of an organism from those genes that are required for a pathogenic life-style. The properties of pathogenic bacteria that are required to survive, multiply, and cause damage to a host are: the capacity to compete with other bacteria in the host; to gain a foothold within a specific host; to avoid normal host defence mechanisms; to multiply once established; and in the course of this process to produce damage to the host. This would be a definition of a pathogenic life-style. Virulence life-style factors would be all factors that are essential for this life-style; their genes could collectively be called virulence life-style genes. They make up a Pandora's box containing all genes related to virulence of a particular pathogen (Fig. 3). A subset of virulence life-style genes, in which virulence-associated genes are excluded, are the true virulence genes. They must be absent in non-pathogenic bacteria; their gene products must be involved in interactions with the host, and must be directly responsible for the pathological damage during infection.

Figure 3

The multi-compartment virulence Pandora's box. This box contains all virulence life-style genes. They consist of true virulence genes and virulence-associated genes. The true virulence genes are directly responsible for pathological damage and are absent in non-pathogens. Virulence-associated genes can be genes whose products process virulence factors (by post-translational modification, folding, secretion, etc.); genes encoding auxiliary virulence factors (that are needed for virulence factors to be active); genes that regulate expression of virulence genes; and housekeeping genes that encode enzymes needed for the metabolic processes required for the pathogenic life-style of the organism. More classes are listed in Table 1. Virulence-associated genes can be present in non-pathogens that live in association with a host (commensals, opportunists).

The proposed definition of true virulence genes excludes those genes that are involved in survival and multiplication in the host, and genes involved in expression, processing, or secretion of virulence factors; these genes could be defined in subclasses of virulence-associated genes as indicated in Table 1. All virulence life-style genes can either be structural genes (directly encoding the virulence life-style factors), or encode the enzymes to produce such factors.

Table 1

Definitions for subclasses of virulence life-style genes

First digit: life-style of organism

  • PA: virulence genes from bacteria that are exclusively pathogenic
  • HS: virulence genes from bacteria displaying host-dependent pathogenicity
  • OP: virulence genes from opportunistic pathogens

Second digit: gene class

  1. True virulence genes
     Definition: Their gene products are directly involved in interactions with the host and are directly responsible for the pathological damage. These genes are exclusively expressed in pathogens.
     Examples of this class: Cholera toxin, anthrax toxin, botulinum toxin, shiga toxin, Bordetella adenylate cyclase toxin, etc.
     Evidence, comments: The pathological damage is induced by purified gene products and the gene is the structural gene for these products.
     Third and further digits (subclasses): Subclasses according to gene families, for instance, .1: RTX toxins, .2: enterotoxins, etc.

  2. Colonization genes
     Definition: Their gene products enable colonization of a host and determine the localization of the infection.
     Examples of this class: Adhesins, fimbriae, intimin, invasins.
     Evidence, comments: Inactivation will result in decrease in colonization potential. The factors make contact at the site of colonization.
     Third and further digits (subclasses): .1: adhesins, .2: intimins, .3: invasins, .4: accessory genes of 2.1–2.3 (e.g. fimbrial subunits)

  3. Defense system evasion genes
     Definition: Their gene products are involved in evasion of the host immune system.
     Examples of this class: Immunoglobulin-specific proteases, cytotoxins directed against immune cells, surface layers, slime polysaccharide.
     Evidence, comments: The role of these genes must be established for each pathogen.
     Third and further digits (subclasses): Subclasses according to the specific function

  4. Processing virulence genes
     Definition: Their gene products are involved in the biosynthesis of virulence life-style factors by enzymatic processing.
     Examples of this class: Specific proteases, methylases, chaperonins, glycosyltransferases, with virulence life-style genes as a substrate.
     Evidence, comments: The enzymatic activity of the gene product must be proven. This activity must not be solely directed towards virulence life-style factors.
     Third and further digits (subclasses): Subclasses according to the type of processing, e.g. .1: chaperonins, .2: methylases, .3: glycosyltransferases, etc.

  5. Secretory virulence genes
     Definition: Their gene products are responsible for secretion of virulence life-style factors.
     Examples of this class: Type III secretion machinery, type I secretion machinery.
     Evidence, comments: The role of the gene products in secretion of virulence life-style factors must be proven. Their activity may not be solely directed towards these factors.
     Third and further digits (subclasses): .1: type I secretion machinery genes, .3: type III secretion machinery genes

  6. Virulence housekeeping genes
     Definition: Their gene products provide nutrients during colonization, improve competition with other microbes, or provide the proper microenvironment.
     Examples of this class: Urease, catalase, superoxide dismutase, siderophores, proteinase inhibitors. Flagella could also belong to this class although they are strictly speaking structural components of the organism.
     Evidence, comments: Inactivation will result in decrease in colonization potential although a direct role in colonization or immune evasion is absent. These genes are likely to be present in non-pathogens as well.
     Third and further digits (subclasses): Subclasses according to function

  7. Regulatory genes
     Definition: Their gene products are involved in regulation of virulence life-style gene expression.
     Examples of this class: Alternative sigma factors, global regulators, specific transcription activators, regulators of phase variation by gene/promoter inversion.
     Evidence, comments: The role of these genes in virulence must be established for each pathogen.
     Third and further digits (subclasses): .1: two-component regulators, .2: global regulators, .3: alternative sigma factors, etc.

  • Proposed classification and numbering system for bacterial genes encoding virulence factors. For simplicity only human pathogens are considered. The first digit of a given number would be PA, HS, or OP, depending on the life-style of the organism. The second digit is determined by the function of the gene in virulence. Genes belonging to classes 1–3 are true virulence genes; those belonging to classes 4–7 are virulence-associated genes. Third and further digits refine the system. For instance, Vibrio cholerae enterotoxin would be a PA:1.2 factor. Uropathogenic E. coli minor fimbrial subunit would be an OP:2.4 factor. The numbering system could be refined with more digits to provide a shorthand for each unique virulence gene in analogy to the EC enzyme nomenclature.
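To show how such a code could serve as a machine-readable shorthand, the following minimal sketch parses codes written in the form used in the examples above (e.g. `PA:1.2`). The life-style codes and class names are taken from Table 1; the parser itself, the `VirulenceCode` type, and the function names are hypothetical, and the exact syntax (colon after the life-style code, dots between digits) is assumed from the two examples given.

```python
import re
from typing import NamedTuple, Tuple

# First "digit": life-style of the organism (from Table 1).
LIFE_STYLES = {
    "PA": "exclusively pathogenic",
    "HS": "host-dependent pathogenicity",
    "OP": "opportunistic pathogen",
}

# Second digit: gene class (from Table 1).
GENE_CLASSES = {
    1: "true virulence genes",
    2: "colonization genes",
    3: "defense system evasion genes",
    4: "processing virulence genes",
    5: "secretory virulence genes",
    6: "virulence housekeeping genes",
    7: "regulatory genes",
}


class VirulenceCode(NamedTuple):
    life_style: str
    gene_class: int
    subclasses: Tuple[int, ...]


# Assumed syntax: <life-style>:<class>[.<subclass>[.<subclass>...]]
_CODE_RE = re.compile(r"^(PA|HS|OP):([1-7])((?:\.\d+)*)$")


def parse_code(code: str) -> VirulenceCode:
    """Split a code such as 'PA:1.2' into its components."""
    match = _CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a valid virulence gene code: {code!r}")
    subclasses = tuple(int(part) for part in match.group(3).split(".") if part)
    return VirulenceCode(match.group(1), int(match.group(2)), subclasses)


def describe(code: str) -> str:
    """Return a readable description of the life-style and gene class."""
    parsed = parse_code(code)
    return f"{GENE_CLASSES[parsed.gene_class]} ({LIFE_STYLES[parsed.life_style]})"
```

With this sketch, `parse_code("PA:1.2")` yields life-style `PA`, class 1, and subclass 2, so a database search could filter on any prefix of the code, much as EC numbers are searched by class.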

Every pathogen possesses a diverse and unique set of genes to allow it to cause disease. Focusing on the individual role of each of these genes is essential to understand the complex mechanisms behind disease. In addition, the role of the host and its immune status in the outcome of disease needs to be considered. Each pathogen has evolved a pathogenic strategy combining one or more mechanisms that operate in a concerted manner. Understanding the life-style of pathogenic and other bacteria can help to identify the genes relevant to pathogenicity.

4 Bacterial life-styles

Our focus on pathogenicity and virulence factors sometimes leads to the incorrect concept that pathogenic bacteria exist to cause disease in their host. Like every organism, pathogens have adapted to occupy an ecological niche. Their close association with a host causes damage to their host. Often this damage is ‘coincidental’, but it may even be beneficial to the survival or spreading of the pathogen (for example liberation of nutrients by cell damage, or enabling contagion of the next host by inducing coughing or diarrhea). The degree of damage is dependent on the equilibrium that results from the interplay of pathogen and host. The conditions that result in disease can vary between individuals, and between host species. Disease can be the result of the micro-organism being in the ‘wrong’ host, while it lives as a commensal in other hosts. The distinction between ‘pathogen’ and ‘non-pathogen’ is not sharp, and the border between ‘virulence genes’ and all other genes is also fuzzy.

In the discussion of virulence gene definition, it is important to take the life-style of microorganisms into account, with emphasis on the probability for that organism to cause disease in a host. Bacterial life-styles can be ordered with an increasing probability to cause disease, varying from extremophiles (cryophile/thermophile, halophile, etc.), to non-colonizing bacteria (soil bacteria, marine bacteria, etc.), to commensal colonizers, to opportunistic pathogens and to exclusive pathogens. Evolutionary steps such as horizontal transfer of genetic information would more likely have an effect on virulence (and more likely be fixed in the population) when the transfer occurs between bacteria with a common life-style, or at the most with life-styles ordered near to each other on the scale of increasing probability of causing disease. True virulence genes would be found in the class of exclusive pathogens only, but other virulence life-style genes may be present in opportunistic pathogens and in other colonizers, since these genes are required for a life-style in close association with a host. Virulence life-style genes present in non-colonizing bacteria are a contradictio in terminis and imply that the gene has a different function in that particular organism.
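The ordering of life-styles described above lends itself to a simple consistency check on annotations. The sketch below is one possible encoding, with hypothetical names throughout; in particular, treating ‘near to each other’ as a distance of at most one step on the scale is an assumption of this sketch, not a threshold given in the text.

```python
from enum import IntEnum


class LifeStyle(IntEnum):
    """Life-styles ordered by increasing probability of causing disease."""
    EXTREMOPHILE = 0
    NON_COLONIZING = 1
    COMMENSAL_COLONIZER = 2
    OPPORTUNISTIC_PATHOGEN = 3
    EXCLUSIVE_PATHOGEN = 4


def plausible_label(style: LifeStyle, label: str) -> bool:
    """Check whether a gene label is consistent with the organism's life-style."""
    if label == "true virulence gene":
        # Reserved for organisms with an exclusively pathogenic life-style.
        return style == LifeStyle.EXCLUSIVE_PATHOGEN
    if label == "virulence life-style gene":
        # Requires a life-style in close association with a host.
        return style >= LifeStyle.COMMENSAL_COLONIZER
    raise ValueError(f"unknown label: {label!r}")


def transfer_likely_to_matter(donor: LifeStyle, recipient: LifeStyle) -> bool:
    """Assumed rule: same life-style or adjacent positions on the scale."""
    return abs(donor - recipient) <= 1
```

Under this encoding, a ‘virulence life-style gene’ label on a gene from a thermophile is flagged as implausible, which signals that the gene probably has a different function in that organism.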

Pathogens are constantly evolving, because the bacterial and the host population, as well as the ecological conditions that provide the interplay of both, undergo constant changes. Pathogens emerge and lose significance over time. Emerging infectious diseases are most likely caused by organisms that are already opportunistic or true pathogens and that have acquired additional DNA elements encoding a ‘true virulence determinant’, e.g. toxin-converting bacteriophage encoding cholera toxin or the Shiga toxins. Thus, a shift towards pathogenicity can be caused by changes in the bacteria, or, alternatively by a change in the susceptible hosts, or in the success of bacterial survival and contamination routes ex vivo. Bacterial factors of opportunists that are directly responsible for damage of susceptible hosts could be defined as ‘opportunistic virulence factors’ to differentiate them from true virulence factors, a term reserved for those organisms with an exclusively pathogenic life-style.

5 Virulence genes – what's in a name?

How important is it which genes we call virulence genes and how we further subdivide or classify this group? After all, the potency of the gene lies in its function, as dictated by its sequence, not in its name. For that reason genes are submitted to electronic databases with a description. Suppose one would like to get an overview of our current knowledge on bacterial virulence genes and virulence factors. By entering the key words ‘virulence factor AND bacteria’ in PubMed one would get over 1000 hits, and this number increases with time. Searching the Protein database of PubMed with these key words decreases the number of hits to 580, and searching the Nucleotide database gives ‘only’ 370 hits. Are these all genes encoding virulence factors? And can their function be learned from the annotation? Here are some examples to illustrate how genes are currently annotated.

  1. a ‘virulence factor’ homolog MviB is described in Aquifex aeolicus, a thermophile.

  2. gene XF2420 from Xylella fastidiosa is described as: product=‘virulence factor’ without further evidence why this is so.

  3. gene b1121 from E. coli K12 is described as: product=‘homolog of virulence factor’ and its function=‘putative factor; Not classified’.

  4. An ‘outer membrane virulence protein’ of V. cholerae is involved in the early steps of iron uptake. Its expression is iron-regulated, but does that suffice to name this a virulence gene?

  5. A gene encoding a cAMP binding protein from Pseudomonas aeruginosa is a probable DNA binding regulator that is required for production of exotoxin A and protease. Should we call this a virulence factor of an opportunistic pathogen?

PubMed databases are not generally used in the way described above. Suppose a more practical scenario. A cosmid library of a pathogen of which virulence genes are not yet characterized was screened in an in vitro model for virulence. Positive clones were sequenced and the identified open reading frames were compared with entries of the public databases. Would the scientific insight increase with the hits mentioned above?

These examples illustrate why we need a better annotation than the general term ‘virulence factor’.

6 Conclusions: where to draw the line

The handling of complex information requires simplification by classification. Unfortunately, in nature there is no black and white, only shades of gray. This applies to the ‘pathogenicity’ of microorganisms as well as to the ‘virulent’ properties of their genes. If one can differentiate within the collection of virulence genes, at least for electronically stored annotation, the potential of database-generated research will increase. A refinement is required that recognizes shades of gray. In this contribution it is proposed to define subclasses of virulence genes that would give weight to their function in pathogenicity. Examples of such subclasses are given in Table 1. A nomenclature could be developed in analogy to the EC enzyme numbering nomenclature, in which a code defines the life-style of the organism, and the function of established virulence genes. Such a code would simplify database entry and retrieval of information. Our current concept may not be perfect; however, a start must be made to reconsider the label ‘virulence’ that is so eagerly attached to genes.

Comparative genomics should include multiple alignment analysis of genes with significant similarity scores, to prove conservation of recognizable domains, before annotation is accepted. A reference to an entry in the database of a homolog for which experimental evidence is available should be included. The challenge for genome sequence projects is to predict a function for as many genes as possible. However, wrongly annotated genes are worse than ‘hypothetical protein’ entries because annotation spreads through databases. When virulence genes are more precisely annotated, this knowledge can be extrapolated and used for more exact entries of newly sequenced genes.
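The recommendation above amounts to a conservative acceptance rule for transferring an annotation. A minimal sketch of such a rule follows; the function name, its parameters, and the wording of the returned annotation are hypothetical and are not taken from any existing annotation pipeline.

```python
from typing import Optional


def propose_annotation(
    homolog_annotation: str,
    similarity_significant: bool,
    domains_conserved: bool,
    evidence_reference: Optional[str],
) -> str:
    """Transfer an annotation only when every recommended check is met.

    evidence_reference is the accession of a homolog with published
    experimental evidence, or None if no such entry is known.
    """
    if similarity_significant and domains_conserved and evidence_reference:
        return f"{homolog_annotation} (by similarity to {evidence_reference})"
    # Falling back is preferred over spreading an unsupported label.
    return "hypothetical protein"
```

The fallback reflects the point made above: a ‘hypothetical protein’ entry does less harm than a wrong annotation that later similarity searches will copy.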

With the shades of gray in virulence and virulence genes, the final conclusion would be that we cannot draw a clear line between virulence genes and all others. The best we can do is to subdivide genes according to their function. Until the role of a newly discovered ‘putative virulence gene’ is assessed for the organism (at the species or even subspecies level) predictions based on sequence similarity must be treated with a healthy amount of suspicion. If those genes whose function has been experimentally investigated are annotated more precisely, we can make optimal use of electronic databases and comparison of genetic data.


This study was commissioned by the Netherlands Ministry of Housing, Physical Planning and the Environment (VROM).


  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].