Novel domains of the prokaryotic two-component signal transduction systems

Michael Y. Galperin , Anastasia N. Nikolskaya , Eugene V. Koonin
DOI: http://dx.doi.org/10.1111/j.1574-6968.2001.tb10814.x, pp. 11-21. First published online: 1 September 2001

Abstract

The archetypal two-component signal transduction systems include a sensor histidine kinase and a response regulator, which consists of a receiver CheY-like domain and a DNA-binding domain. Sequence analysis of the sensor kinases and response regulators encoded in complete bacterial and archaeal genomes revealed complex domain architectures for many of them and allowed the identification of several novel conserved domains, such as PAS, GAF, HAMP, GGDEF, EAL, and HD-GYP. All of these domains are widely represented in bacteria, including 19 copies of the GGDEF domain and 17 copies of the EAL domain encoded in the Escherichia coli genome. In contrast, these novel signaling domains are much less abundant in bacterial parasites and in archaea, with none at all found in some archaeal species. This skewed phyletic distribution suggests that the newly discovered complexity of signal transduction systems emerged early in the evolution of bacteria, with subsequent massive loss in parasites and some horizontal dissemination among archaea. Only a few proteins containing these domains have been studied experimentally, and their exact biochemical functions remain obscure; they may include transformations of novel signal molecules, such as the recently identified cyclic diguanylate. Recent experimental data provide the first direct evidence of the participation of these domains in signal transduction pathways, including regulation of virulence genes and extracellular enzyme production in the human pathogens Bordetella pertussis and Borrelia burgdorferi and the plant pathogen Xanthomonas campestris. Gene-neighborhood analysis of these new domains suggests their participation in a variety of processes, from mercury and phage resistance to maintenance of virulence plasmids. It appears that the real picture of the complexity of phosphorelay signal transduction in prokaryotes is only beginning to unfold.

Keywords
  • Signal transduction
  • Genome sequencing
  • Conserved motif
  • Domain organization
  • Histidine kinase
  • Phosphorylation
  • Phosphodiesterase
  • Cyclic nucleotide

1 Introduction

The availability of the complete sequences of more than 40 microbial genomes representing eight of the 10 main bacterial phyla and both major branches of archaea (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/new_micr.html) increasingly impacts our understanding of microbiology, providing ample data for the analysis of the metabolism, cell organization, and evolution of prokaryotes [1,2]. Cross-genome comparisons improve our understanding of each particular genome, allowing prediction of general (biochemical) functions of uncharacterized genes based on their conserved domain organization, gene neighborhood (operon organization), and phylogenetic patterns (presence in some species but not others) [3–5].

Domain architecture has proven particularly informative for analyzing multi-domain proteins involved in signal transduction. In sensor histidine kinases, several new domains have been described, such as the phosphotransfer Hpt domain [6], the heme- and flavin-binding PAS domain [7,8], the extracellular ligand-binding Cache domain [9], the cGMP-binding GAF domain [10,11], and the HAMP linker domain [12] (see [13–17] for recent reviews). Analysis of the downstream signal transduction module revealed an even greater diversity. In addition to well-known response regulators, which consist of a CheY-like phosphoacceptor domain and a helix–turn–helix (HTH) DNA-binding domain, bacterial genomes were found to encode a variety of response regulators with unusual domain organization, featuring still poorly characterized domains, such as GGDEF [18–20], EAL [19,21], and HD-GYP [22,23].

Because of their complex domain organization, signaling proteins are often poorly annotated in sequence databases such as GenBank, most often just as ‘sensor protein’ or ‘response regulator’. However, detailed sequence and structure analyses of these novel domains have been performed and sequence alignments are currently available in several protein domain databases, including SMART [24], COGs [4], and Pfam [25] (Table 1).

Table 1. Conserved domains of the bacterial signal transduction systems

| Domain name | Length (aa) | Function | Structure | SMART entry (a) | COG entry (a) | Pfam entry (a) | Reference |
|---|---|---|---|---|---|---|---|
| Sensor module (b) | | | | | | | |
| FliY | ∼220 | amino acid binding | 2lao | PBPb | 3438 | PF00497 | [26] |
| Cache | ∼120 | small ligand binding | | | 3290 (d) | PF02743 | [9] |
| MHYT | ∼200 | metal binding? | transmembrane | | 3300 | | |
| PAS | ∼100 | FAD, heme, and cinnamic acid binding | 2phy | PAS | 2202 | PF00989 | [13] |
| | | | | PAC | | PF00785 | |
| GAF | ∼150 | cGMP binding, photopigment binding | 1f5m | GAF | 2203 | PF01590 | [10,11] |
| HAMP | ∼50 | dimerization? | 1joy | HAMP | 2770 | PF00672 | [12] |
| His kinase 1 | ∼80 | phosphoacceptor, dimerization | 1joy | HisKA | 0642 (d) | PF00512 | [14,15,17] |
| His kinase 2 | ∼120 | phosphorylation of His kinase 1 domain | 1bxd | HATPase | 0642 (d) | PF02518 | [14,15,17] |
| Hpt | ∼100 | phosphoacceptor | 2a0b | HPT | 2198 | PF01627 | [6,14] |
| Response module (b) | | | | | | | |
| CheY | ∼120 | phosphoacceptor | 2che | REC | 0784 | PF00072 | [15,51] |
| HTH | ∼120 | DNA binding | | (c) | (c) | (c) | [15,17] |
| AAA | ∼300 | σ54-binding ATPase | 1d2n | AAA | 2204 | PF00158 | [52] |
| GGDEF | ∼170 | c-diGMP formation? | similar to 1cjv? | DUF1 | 2199 | PF00990 | [19,33] |
| EAL | ∼250 | c-diGMP hydrolysis? | | DUF2 | 2200 | PF00563 | [19] |
| HD-GYP | ∼170 | phosphodiesterase? | | | 2206 | | [22] |
  • (a) Protein domain databases that contain sequence alignments of these signaling domains include SMART (http://smart.embl-heidelberg.de [24]), COGs (http://www.ncbi.nlm.nih.gov/cog [4]), and Pfam (http://www.sanger.ac.uk/Software/Pfam [25]).

  • (b) GGDEF, EAL and HD-GYP domains occur in fusions with sensor module domains as well as response module domains (CheY). Therefore, their classification as parts of response modules is based on their predicted functions and requires experimental verification.

  • (c) HTH domains of response regulators are not listed as separate domains in SMART, COGs, or Pfam.

  • (d) Members of this COG contain more than one domain.

Here, we briefly review the diversity of newly discovered prokaryotic signaling domains and discuss the emerging complex picture of prokaryotic signal transduction.

2 Recently discovered prokaryotic signaling domains

The archetypal two-component signal transduction systems include a sensor module, which consists of an extracytoplasmic or membrane-associated sensor input domain and a cytoplasmic histidine kinase domain with an ATPase and phosphoacceptor subdomains (Table 1), and a response regulator, which consists of a receiver CheY-like domain and a DNA-binding domain. Functions of these domains and of the PAS domain, commonly found in the sensor module, have been reviewed recently [13–17] and will not be considered here in detail.

2.1 Extracytoplasmic ligand-binding sensor domains

Periplasmic (or, in Gram-positive bacteria, extracytoplasmic) ligand-binding sensor domains are extremely diverse. The most common type of such domains (FliY-type, Table 1) is homologous to the periplasmic solute-binding protein components of the ATP-dependent transport systems [26]. Several sensor kinases, for example Escherichia coli EvgS, contain duplicated FliY-type domains followed by a transmembrane segment that anchors them to the membrane. Another periplasmic ligand-binding domain, Cache, is found in sensor kinases and in the extracytoplasmic parts of methyl-accepting chemotaxis proteins, such as Bacillus subtilis McpA and McpB [9]. There are many other types of ligand-binding sensor domains that are apparently specific for the recognition of narrow groups of substrates, such as metals, citrate, or nitrate.

Despite their diversity, many sensor modules have the same domain architecture with an N-terminal transmembrane segment (likely uncleavable signal peptide), a relatively large (100–300 aa) periplasmic domain, and a second transmembrane segment, followed by a HAMP domain and a cytoplasmic signal-transducing domain. In addition to extracytoplasmic ligand-binding domains, membrane-bound signaling domains also exist, as exemplified by the recently identified MHYT, a predicted metal-binding, redox-sensing domain (MYG, T.A. Gaidenko, A.Y. Mulkidjanian, and C.W. Price, submitted for publication). The diversity of sensor domains probably reflects the wide range of environmental stimuli that elicit regulatory responses in bacterial cells.

2.2 GAF domain

The GAF domain was originally described as a non-catalytic cGMP-binding domain conserved in cyclic nucleotide phosphodiesterases [27]. Subsequently, this domain was recognized in cyanobacterial adenylate cyclases and, finally, in histidine kinases and certain other proteins [10]. In spite of limited sequence similarity, the structure of the GAF domain turned out to be very similar to that of the PAS domain [11], indicating their common ancestry. In bacterial and plant phytochromes, the GAF domain contains a small insertion with a conserved Cys residue that serves for covalent attachment of photopigments [28,29]. GAF domains have also been found in association with a variety of other protein domains, such as PEP-dependent phosphotransferase (PTS Enzyme I [30]), PP2C-type protein phosphatase, NtrC-type ATPase, GGDEF, and EAL [10].

2.3 GGDEF domain

The GGDEF domain (Fig. 1A) was first discovered in the response regulator PleD that controls cell differentiation in the swarmer-to-stalked cell transition in Caulobacter crescentus [18]. PleD and its cognate histidine kinase PleC were first described as members of a typical two-component signal transduction system. However, instead of a typical CheY-HTH domain organization, PleD was found to consist of a CheY domain and a previously uncharacterized domain that was dubbed GGDEF based on its conserved sequence motif (Fig. 1A) [18]. This observation attracted little attention until it turned out that GGDEF is encoded in many bacterial genomes (Table 2), including 19 copies in E. coli and four copies in B. subtilis [22].

Fig. 1. Consensus sequences of the recently discovered signaling domains. A: GGDEF domain. B: EAL domain. C: HD-GYP domain. The residue numbering is from T. maritima protein TM0107 (A), E. coli YdiV (B), and A. aeolicus aq_2027 (C). The GGDEF motif comprises residues 114–118 in A, the EAL motif comprises residues 29–31 in B, and the HD-GYP motif corresponds to residues 54–55 and 115–117 in C. Consensus sequences of the 218 GGDEF domains, 128 EAL domains, and 36 HD-GYP domains encoded in complete microbial genomes (Table 2) were drawn using the SeqLogo program [60] in the WWW-based implementation by Steven Brenner (http://www.bio.cam.ac.uk/seqlogo). The letters in each position represent amino acid residues found in that position; the height of each letter reflects the fraction of sequences with the corresponding amino acid residue in that position (the degree of conservation). The total height of each column indicates statistical importance of the given position. The residues are colored as follows: N, Q – green; K, R, H – blue; D, E – red; F, L, I, M, V – yellow; the rest – purple.
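The relationship between letter height and column height described in the legend can be illustrated with a short Python sketch. It assumes the usual sequence-logo convention, in which the total height of a column is its information content in bits (log2 of the alphabet size minus the Shannon entropy of the column) and each letter is scaled by its frequency. The small-sample correction is omitted, the function name `logo_heights` is hypothetical, and the toy alignment is invented for illustration; it is not taken from the alignments behind Fig. 1.

```python
import math
from collections import Counter

def logo_heights(alignment, alphabet_size=20):
    """Return one dict per alignment column mapping residue -> height in bits."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(alphabet_size)  # about 4.32 bits for 20 amino acids
    columns = []
    for i in range(length):
        # Gaps are ignored when estimating residue frequencies (an assumption).
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            columns.append({})
            continue
        total = len(residues)
        freqs = {aa: n / total for aa, n in Counter(residues).items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy  # total column height
        columns.append({aa: f * info for aa, f in freqs.items()})
    return columns

# Invented four-sequence alignment of a five-residue motif.
toy_alignment = ["GGDEF", "GGDEF", "GGEEF", "GGDEF"]
heights = logo_heights(toy_alignment)
```

In this toy case the invariant first column reaches the maximum height, whereas the third column, where D and E both occur, is shorter and is split between the two letters in proportion to their frequencies.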

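Table 2. Inventory of signaling domains in complete prokaryotic genomes

Sensor module (b) columns: Cache, PAS, GAF, HisKin, CheY (c), Hpt. Response module (b) columns: CheY (d), GGDEF, EAL, HD-GYP.

| Species (a) | Genome size (kb) | Total number of proteins | Cache | PAS | GAF | HisKin | CheY (c) | Hpt | CheY (d) | GGDEF | EAL | HD-GYP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bacteria | | | | | | | | | | | | |
| Mesorhizobium loti | 7036 | 6752 | 2 | 36 | 10 | 62 | 15 | 2 | 57 | 32 | 18 | 1 |
| Pseudomonas aeruginosa | 6264 | 5565 | 6 | 42 | 9 | 63 (e) | 19 | 11 | 75 | 33 | 21 | 3 |
| Escherichia coli | 4639 | 4289 | 3 | 14 | 9 | 28 (e) | 5 | 6 | 32 | 19 | 17 | 0 |
| Bacillus subtilis | 4215 | 4100 | 10 | 14 | 3 | 33 (e) | 0 | 1 | 36 | 4 | 3 | 0 |
| Bacillus halodurans | 4202 | 4066 | 8 | 15 | 5 | 36 (e) | 2 | 1 | 48 | 4 | 2 | 2 |
| Mycobacterium tuberculosis | 4412 | 3918 | 0 | 2 | 3 | 15 | 0 | 0 | 13 | 1 | 2 | 0 |
| Vibrio cholerae | 4033 | 3827 | 20 | 30 | 5 | 41 (e) | 11 | 10 | 49 | 41 | 22 | 9 |
| Caulobacter crescentus | 4017 | 3737 | 3 | 26 | 6 | 62 | 28 | 2 | 45 | 11 | 10 | 0 |
| Synechocystis sp. | 3573 | 3169 | 2 | 26 | 28 | 42 (e) | 17 | 7 | 41 (e) | 23 | 13 | 2 |
| Mycobacterium leprae | 3268 | 2720 | 0 | 3 | 1 | 5 | 0 | 0 | 5 | 3 | 2 | 0 |
| Xylella fastidiosa | 2679 | 2766 | 0 | 4 | 1 | 14 | 5 | 2 | 20 | 3 | 3 | 1 |
| Deinococcus radiodurans | 2649 | 2580 | 0 | 7 | 3 | 21 | 5 | 0 | 25 | 16 | 5 | 4 |
| Lactococcus lactis | 2365 | 2266 | 0 | 2 | 1 | 7 | 0 | 0 | 7 | 0 | 0 | 0 |
| Neisseria meningitidis | 2184 | 2121 | 0 | 1 | 1 | 5 | 0 | 0 | 5 | 0 | 0 | 0 |
| Thermotoga maritima | 1861 | 1846 | 6 | 4 | 2 | 9 | 0 | 1 | 12 | 9+2 (f) | 0 | 9 |
| Haemophilus influenzae | 1830 | 1709 | 0 | 1 | 0 | 4 | 0 | 2 | 6 | 0 | 0 | 0 |
| Campylobacter jejuni | 1641 | 1654 | 5 | 4 | 0 | 7 | 1 | 1 | 11 | 1+1 | 0 | 0 |
| Helicobacter pylori | 1668 | 1566 | 1 | 0 | 0 | 4 | 1 | 1 | 9 | 0 | 0 | 0 |
| Aquifex aeolicus | 1551 | 1522 | 1 | 7 | 4 | 4 | 0 | 0 | 5 | 11 | 8 | 1 |
| Chlamydia pneumoniae | 1230 | 1052 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 |
| Treponema pallidum | 1138 | 1031 | 1 | 0 | 2 | 1 | 0 | 1 | 4 | 1+1 | 0 | 3 |
| Chlamydia trachomatis | 1042 | 894 | 0 | 2 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 |
| Borrelia burgdorferi | 911 | 850 | 0 | 1 | 0 | 4 | 1 | 2 | 6 | 1 | 1 | 1 |
| Rickettsia prowazekii | 1111 | 834 | 0 | 0 | 0 | 4 | 0 | 0 | 5 | 1 | 1 | 0 |
| Mycoplasma pneumoniae | 816 | 677 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Ureaplasma urealyticum | 752 | 611 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Buchnera sp. APS | 641 | 564 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mycoplasma genitalium | 580 | 467 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Archaea | | | | | | | | | | | | |
| Archaeoglobus fulgidus | 2178 | 2420 | 0 | 25 | 5 | 14 | 0 | 1 | 11 | 0 | 0 | 0 |
| Halobacterium sp. NRC-1 | 2014 | 2058 | 0 | 15 | 7 | 14 | 3 | 1 | 6 | 0 | 0 | 0 |
| M. thermoautotrophicum | 1751 | 1869 | 0 | 15 | 4 | 16 | 3 | 0 | 8 | 0 | 0 | 0 |
| Pyrococcus abyssi | 1765 | 1765 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| Pyrococcus horikoshii | 1739 | ∼1750 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| Aeropyrum pernix | 1670 | ∼1720 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Methanococcus jannaschii | 1665 | 1715 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Thermoplasma acidophilum | 1565 | 1478 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |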
  • (a) The names and data for non-obligate parasites are in bold. Complete genome sequences and corresponding references are available in the NCBI Entrez Genome division at http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html.

  • (b) Each number represents the number of proteins in a given genome that contain the corresponding domain; multiple occurrences of the same domain (e.g., PAS, CheY) on a single polypeptide chain are counted as one. The numbers are from the COG database (http://www.ncbi.nlm.nih.gov/COG, [4]) and/or from the results of iterative PSI-BLAST searches of domain-specific profiles against a database of proteins encoded in completely sequenced microbial genomes [38,53,54]. These numbers were also compared against those in SMART [24] and those reported in [41]. Complete lists of proteins that contain each particular domain are available at ftp://ncbi.nlm.nih.gov/pub/galperin/TwoCompCensus.html.

  • (c) CheY-like domains of the ‘hybrid’ sensor kinases (found on the same polypeptide chain with the His kinase domain, see [39]).

  • (d) CheY-like domains found in the response regulators (associated with HTH, ATPase, CheB, GGDEF, or HD-GYP domains), as well as stand-alone CheY-like domains.

  • (e) These numbers do not include signaling proteins of the LytS family (COG3275, COG2972), predicted to be divergent His kinases [40]. The discrepancies with the numbers reported in [39] are due to the addition of Synechocystis sp. His kinase slr1212 and response regulators slr0687, sll1544, sll1879, and slr2041 and the exclusion of stand-alone Hpt domain slr0073 from the list of histidine kinases.

  • (f) These domains are likely to be inactivated.
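The counting rule stated in footnote (b), under which a protein contributes once to a domain's tally regardless of how many copies of that domain it carries, can be sketched in Python as follows. This is only an illustration of the rule, not the procedure used to build the table: the function name `domain_census` and the input layout are assumptions, and the two example architectures are those given for Cph2 and Dos in Table 3.

```python
from collections import defaultdict

def domain_census(proteins):
    """Count proteins per (genome, domain).

    proteins: iterable of (genome, protein_id, [domains in N-to-C order]).
    A protein with several copies of a domain is counted once for that domain.
    """
    members = defaultdict(set)
    for genome, protein_id, domains in proteins:
        for domain in domains:
            # Using a set of protein identifiers collapses repeated domains.
            members[(genome, domain)].add(protein_id)
    return {key: len(ids) for key, ids in members.items()}

examples = [
    ("Synechocystis sp.", "Cph2", ["GAF", "GAF", "GGDEF", "EAL", "GAF", "GGDEF"]),
    ("Escherichia coli", "Dos", ["PAS", "GGDEF", "EAL"]),
]
counts = domain_census(examples)
```

Under this rule Cph2 adds one to each of the GAF, GGDEF, and EAL tallies for Synechocystis sp., even though it contains three GAF and two GGDEF domains.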

Recently, it was shown that PleD mutants with an intact CheY domain but lacking the GGDEF domain are defective in flagellar degradation and stalk formation during cell differentiation in C. crescentus [20]. These data directly demonstrate the involvement of the GGDEF domain in signal transduction, a role that was previously proposed on the basis of the association of this domain with CheY and PAS domains [18] in multi-domain proteins.

Although the functions of most of the GGDEF domain-containing proteins remain uncharacterized, some clues have emerged from a study of the regulation of the biosynthesis of extracellular cellulose in Acetobacter xylinum (recently renamed Gluconacetobacter xylinus) [19]. An extensive study of this process by Benziman and colleagues showed that it is regulated by cyclic diguanylate (c-diGMP, bis(3′,5′)-cyclic diguanylic acid), a novel effector molecule that consists of two cGMP moieties bound head-to-tail [31]. The exact mechanism remains unclear, but it apparently involves c-diGMP binding to some membrane proteins that activate expression and/or secretion of cellulose synthetase [32]. The search for the enzymes that synthesize and hydrolyze c-diGMP resulted in the identification of six open reading frames (ORFs) with an almost identical domain composition [19]. All of these proteins contain N-terminal PAS domains, followed by the GGDEF domain and another uncharacterized domain, which was dubbed EAL based on its conserved sequence motif (see below). Based on the properties of mutants in which these ORFs were inactivated by insertions, Tal et al. concluded that the GGDEF domain of each of these proteins was responsible for its diguanylate cyclase activity [19].

This conclusion has not been directly verified by demonstrating the enzymatic activity of the recombinant protein, so there remains a (remote) possibility that the ORFs described by Tal et al. [19] just regulate expression of diguanylate cyclases and phosphodiesterases. However, the notion that the GGDEF domain is a diguanylate cyclase has recently received support from a detailed analysis of its sequence. Using iterative PSI-BLAST searches and threading, Pei and Grishin aligned the GGDEF domain with eukaryotic adenylate cyclases [33]. Although the level of sequence similarity between the two domains was low, conservation of the proposed nucleotide-binding loop, which corresponds to the GGDEF motif, was compatible with the cyclase activity of the GGDEF domain [33].

2.4 EAL domain

The EAL domain (Fig. 1B) was originally described in a study of BvgR protein in Bordetella pertussis [21]. Under the conditions in which the genes encoding the major virulence factors are activated, virulence-repressed (vrg) genes are turned off in a BvgR-dependent fashion. Both these processes are under the control of a two-component regulatory system, BvgAS [21]. These observations established BvgR as a component of the signal transduction system in B. pertussis. A direct interaction of BvgR with DNA was suggested based on the similar expression patterns and molecular masses of BvgR and a putative transcriptional regulator, previously demonstrated to bind to a regulatory sequence within the coding region of vrg genes [34]. However, no direct experimental evidence for such an interaction has been provided. Sequence comparisons indicated the presence of a BvgR-like domain in a number of other poorly characterized proteins from diverse bacteria [21].

The same BvgR-like domain, dubbed EAL after its conserved residues, was independently discovered in tandem with the GGDEF domain in putative diguanylate cyclases and phosphodiesterases that regulate cellulose synthesis in A. xylinum [19]. Since diguanylate cyclase activity has been assigned to the GGDEF domain (see above), the EAL domain emerged as a good candidate for the role of a diguanylate phosphodiesterase. Indeed, the sequence of this domain contains several conserved acidic residues that could participate in metal binding and potentially might form a phosphodiesterase active site (Fig. 1B).

Other experimentally characterized proteins containing the EAL domain are listed in Table 3. It should be noted that YuxH (ComB) protein from B. subtilis, which was originally described as a transcriptional regulator of late competence genes, was later shown not to be required for this regulation [35]. The rtn gene of Proteus vulgaris, originally identified through its effect on the infection by phages λ and N4 [36], was later found in E. coli. The effect of rtn mutation was suppressed in cells grown on maltose, indicating that it might affect expression or membrane localization of LamB, which participates both in maltose transport and attachment of λ and N4.

Table 3. Partly characterized proteins containing GGDEF, EAL, and HD-GYP domains

| Organism, protein name | GenBank accession number | Domain organization (a) | Function, operon structure | Reference |
|---|---|---|---|---|
| Cell differentiation | | | | |
| C. crescentus PleD | L42554 | CheY-xCheY-GGDEF | required for swarmer-to-stalked cell transition | [18,20] |
| Biosynthesis of extracellular polysaccharides | | | | |
| A. xylinum DGC1 | AF052517 | PAS-GGDEF-EAL | required for cellulose biosynthesis | [19] |
| Rhizobium leguminosarum CelR2 | AF121341 | CheY-GGDEF | required for cellulose biosynthesis | [55] |
| X. campestris RpfG | AJ251547 | CheY-HD-GYP | required for biosynthesis of extracellular polysaccharide | [23] |
| Virulence, biosynthesis of extracellular proteins, adhesion | | | | |
| B. pertussis BvgR | AF071567 | EAL | regulates transcription of vrgs | [21] |
| Klebsiella pneumoniae FimK | AAA25064 | HTH-EAL | follows the operon encoding type 1 (mannose-sensitive) fimbriae; is required for their adhesiveness, but not for their formation | S. Clegg |
| V. cholerae MshH (VC0398 or YhdA) | AF079406 | PER-GGDEF-EAL | precedes the operon encoding type IV (mannose-sensitive) fimbriae, but is not required for its expression | [56] |
| V. cholerae VieA | AAC38449 | CheY-EAL-xCheY-HTH | forms an operon with a gene specifically induced in infection | [57] |
| Salmonella typhimurium AdrA | AJ271071 | TM-GGDEF | required for intercellular adhesion (biofilm formation), biosynthesis of extracellular polysaccharide | [47,48] |
| Resistance to phages, toxic metals | | | | |
| E. coli Rtn | U83404 | PER-EAL | overexpression confers resistance to phages λ and N4 | [36] |
| Pseudomonas stutzeri Urf2 (TnpM) | AAC38223 | EAL | encoded on mercury resistance transposons, but not required for resistance; appears to affect transposition rate | [45,46] |
| Aeromonas jandaei RrpX | U67070 | xCheY-CheY-GGDEF | permits overexpression in E. coli of A. jandaei β-lactamase AsbB1, but not of β-lactamases AsbA1 or AsbM1 | [58] |
| Response to oxygen and light | | | | |
| E. coli Dos | BAA15160 | PAS-GGDEF-EAL | reversibly binds O2, CO, and NO with high affinity | [59] |
| Synechocystis sp. Cph2 | BAA10536 | GAF-GAF-GGDEF-EAL-GAF-GGDEF | phytochrome, dark-induced and repressed by light | [29] |
  • (a) Only readily discernible domains are listed. TM, multiple (four or more) transmembrane segments; xCheY, a CheY domain that is inactivated by mutations; PER, a periplasmic sensor module of typical topology followed by a single transmembrane segment.

2.5 HD and HD-GYP domains

Although the predicted phosphodiesterase activity of the EAL domain has not yet been demonstrated, some (predicted) signal transduction proteins do contain bona fide phosphodiesterase domains similar to the ones found in eukaryotic cyclic-nucleotide phosphodiesterases [37]. These domains belong to the recently identified superfamily of metal-dependent phosphohydrolases, designated the HD superfamily after the principal conserved residues implicated in metal binding and catalysis. This superfamily also includes such enzymes as bacterial dGTP triphosphohydrolase and the ppGpp(p) hydrolase SpoT [37]. The version of the HD-type domain that is fused with a CheY domain in response regulator-like proteins from several organisms (Table 4) has many additional highly conserved residues, including a conserved GYP motif (Fig. 1C); this domain was therefore dubbed HD-GYP [22]. Like the GGDEF and EAL domains, the HD-GYP domain was originally implicated in signal transduction on the basis of its association with CheY-like and other signaling domains [22]. Recently, its role in signaling has been demonstrated experimentally. In the plant pathogen Xanthomonas campestris, response regulator RpfG, which contains a CheY-like and an HD-GYP domain, has been shown to activate the synthesis of extracellular enzymes and the extracellular polysaccharide [23]. In-frame deletion of the rpfG gene abolished production of extracellular endoglucanase and significantly decreased the levels of polygalacturonate lyase and extracellular polysaccharide [23]. Increased expression of the cognate histidine kinase RpfC stimulated the production of these extracellular enzymes and even overcame the effect of the rpfG mutation. This latter result suggests that response regulators of the CheY-HD-GYP class, like RpfG, represent important, but not the only, output modules for their corresponding sensor kinases.

Table 4. Diversity of domain fusions in bacterial response regulators

| Domain organization | Experimentally studied examples | Examples identified solely from genomic sequences |
|---|---|---|
| Simple response regulators | | |
| CheY-HTH | E. coli ArcA, CitB, FimZ, UvrY, KdpE, NarL, NarP, OmpR, PhoB | E. coli YgiX, YedH, YlcA |
| | B. subtilis CitT, ComA, GerE, PhoP, ResD | B. subtilis YdbG, YdfI, YfiK, YhcZ, YocG, YufM, YvfU, YvqC, YxjL |
| CheY-AAA-HTH | E. coli AtoC, GlnG, HydG | E. coli YfhA, Cj1024c, HP0703, RP562, CT468, CPn0586, TP0519, BB0763 |
| CheY-GGDEF | C. crescentus PleD, R. leguminosarum CelR2 | BB0419, VC1086, RP237, Cj0643 |
| CheY-EAL | | slr1588, PA3947, VC1652 |
| CheY-HD-GYP | X. campestris RpfG | PA2572, PA4781, slr2100, sll1624, TM0186, TM1147, VC1087, VC1348, VCA0210, XF1113 |
| CheY-GGDEF-EAL | Thiocystis violacea ORF5 (S54369) (a) | XF0401 |
| HD-GYP-GGDEF | | aq_2027, DRA0342 |
| Response regulators with extracytoplasmic sensor domains | | |
| FliY-GGDEF | | VC1067, VCA0557 |
| FliY-HD-GYP | | TM1170 |
| PER-GGDEF | | VC2285, VC2454, VCA1082, PA0847 |
| PER-EAL | | |
| PER-GGDEF-EAL | | YhjK, PA2072, PA1433, slr2077 |
| PER-HD-GYP | | TM1682, VCA0895 |
| Response regulators with cytoplasmic sensor domains | | |
| PAS-GGDEF-EAL | A. xylinum DGC1, PHE | aq_1442 |
| CheY-PAS-GGDEF-EAL | | slr1305, PA4959, XF2624 |
| GAF-GGDEF | | E. coli YeaP, DRB0044, slr1143, sll0048, PA2771 |
| CheY-GAF-GGDEF | | slr0687 |
| GAF-GGDEF-EAL | Rhizobium etli ORF1 (AF034831) | Rv1354c, VCA0080, VCA0785, PA2567, ML1750 |
| GAF-PAS-GGDEF-EAL | Azorhizobium caulinodans YntC (X63841) | PA5017 |
| Fusions of sensory transduction domains from various signaling systems | | |
| GAF-PtsI | E. coli PtsP (AAB40476), Azotobacter vinelandii PtsP (Y14681) | PA0337, VC0672, mll3436 |
| CheY-RsbU | | PA3346, VCA1086, slr1983 |
| PAS-RsbU | B. subtilis RsbP (YvfP) | Rv1364c |
  • (a) For experimentally studied proteins not listed in Table 3, GenBank accession numbers are given in parentheses. The names of proteins encoded in completely sequenced genomes are listed exactly as in genome annotations; the corresponding sequences can be retrieved from the NCBI WWW site at http://www.ncbi.nlm.nih.gov/Entrez/ or http://www.ncbi.nlm.nih.gov/COG/. Protein names shown in italics indicate presence of other domains in addition to those listed in the first column.

If the HD-GYP domain is indeed a phosphatase or a phosphodiesterase, its highly conserved sequence suggests high substrate specificity. Notably, at least two proteins, Aquifex aeolicus aq_2027 and Deinococcus radiodurans DRA0342, contain an HD-GYP-GGDEF domain combination [22]. Thus, the HD-GYP domain may be involved in the metabolism of cyclic diguanylate or in dephosphorylation of a phosphotransfer domain.

A modified version of the HD-GYP domain is fused to the C-terminus of the EAL domain in the ComB (YuxH) protein from B. subtilis, two A. aeolicus proteins, and three Vibrio cholerae proteins. This version lacks the conserved distal portion of the HD-GYP domain (Fig. 1C) and has certain substitutions in the characteristic metal-binding residues of the HD superfamily phosphohydrolases [37], which likely render the domain catalytically inactive.

3 Census of signaling domains in completely sequenced prokaryotic genomes

With the complete sequences of over 30 bacterial and archaeal genomes currently available, we were interested in obtaining accurate counts of the number of signaling domains in each of them. Sequence profiles were constructed for each of these domains (see Fig. 1) and compared using iterative PSI-BLAST searches [38] against a database of protein sequences encoded in each of the completely sequenced genomes. The results obtained (Table 2) are very close to those reported earlier for E. coli, Synechocystis sp., and C. crescentus [39–42], and reveal several interesting trends. First, some variations notwithstanding, these domains are abundant in the genomes of all free-living bacteria but are much less common in obligate parasites. This difference is particularly striking in the case of the GGDEF, EAL, and HD-GYP domains (compare the data in Table 2 for A. aeolicus and Helicobacter pylori, two bacteria with nearly the same number of genes, but with a free-living and parasitic life style, respectively). It appears that these newly discovered domains might be particularly important for sensing the more diverse environmental stimuli encountered by free-living or non-obligatory parasitic bacteria. The minimal genomes of mycoplasmas and Buchnera do not encode any signaling proteins at all (Table 2). Second, the signaling domains are generally less abundant and less evenly distributed in archaea than they are in bacteria. Although certain archaeal species, such as Archaeoglobus fulgidus and Methanobacterium thermoautotrophicum, encode a significant number of histidine kinases, CheY-like response domains, and PAS domains, the GGDEF, EAL, and HD-GYP domains have not been detected in any of the archaeal genomes sequenced thus far (Table 2). This result is particularly surprising because all of these domains are abundant in the hyperthermophilic bacteria A. aeolicus and Thermotoga maritima, which appear to have undergone horizontal gene exchange with the archaea on a massive scale [43,44]. Furthermore, two of the sequenced archaeal genomes, those of Aeropyrum pernix (the only sequenced representative of the Crenarchaeota, one of the two major archaeal branches) and Methanococcus jannaschii, do not appear to encode any of the currently recognized signaling domains (Table 2). This uneven phyletic distribution lends credence to a scenario whereby the two-component signaling system and the potential c-diGMP signaling system emerged early during bacterial evolution, with some of the components subsequently acquired by certain archaeal lineages via horizontal gene transfer.

4 Genomic context of the new signaling domains

The abundance of the uncharacterized signaling domains (Table 2) was one of the most unexpected features of bacteria revealed by genome sequencing. Indeed, none of the 19 copies of the GGDEF domain and 17 copies of the EAL domain encoded in the E. coli genome belongs to an experimentally characterized protein (see, e.g., COG2199 and COG2200 in the COG database, http://www.ncbi.nlm.nih.gov/COG [4]). The number of these domains in the recently sequenced genomes of P. aeruginosa and V. cholerae is even higher, and again the functions of the encoded proteins are as obscure as they probably are diverse (Table 3). Gram-positive bacteria appear to encode fewer of these signaling domains; there are only four proteins with the GGDEF domain and three proteins with the EAL domain in B. subtilis (Table 2), and none of them has been characterized either. Therefore, it is becoming increasingly clear that we have been missing major aspects of the regulatory circuits present in bacterial cells.

In the absence of direct experimental data, some clues to the range of functions of these newly described domains could be revealed by their genomic context, including operon structures and conserved domain fusions [5]. Unfortunately, the GGDEF-, EAL-, and HD-GYP-encoding genes are seldom found in operons, let alone conserved ones. Indeed, of the 29 E. coli genes that encode a GGDEF domain, an EAL domain, or both, only one, yfiN, forms a potential operon with another uncharacterized gene. Six more are paired into potential operons yddVU, yeaIJ, and yliEF. Thus, the majority of the genes coding for these domains in E. coli (and in most other bacteria) are not predicted to be in operons. Furthermore, in many cases the orientation of these genes is opposite to that of their nearest neighbors.
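A gene-neighborhood check of this kind can be sketched in Python. The heuristic below, which groups adjacent genes that lie on the same strand and are separated by a short intergenic gap, is a common simplification and is not necessarily the criterion used for the predictions above. The `Gene` record, the function name `candidate_operons`, the 100-bp default threshold, and the gene names and coordinates are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"

def candidate_operons(genes, max_gap=100):
    """Group adjacent same-strand genes separated by at most max_gap bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    operons, current = [], []
    for gene in ordered:
        same_unit = (
            current
            and gene.strand == current[-1].strand
            and gene.start - current[-1].end <= max_gap
        )
        if same_unit:
            current.append(gene)
        else:
            # Close the previous group; single genes are not reported.
            if len(current) > 1:
                operons.append([g.name for g in current])
            current = [gene]
    if len(current) > 1:
        operons.append([g.name for g in current])
    return operons

# Invented coordinates: geneC is oriented opposite to both of its neighbors.
genes = [
    Gene("geneA", 100, 1000, "+"),
    Gene("geneB", 1050, 2000, "+"),
    Gene("geneC", 2100, 3000, "-"),
    Gene("geneD", 3050, 4000, "+"),
]
operons = candidate_operons(genes)
```

In this toy layout only geneA and geneB are grouped; geneC, like many of the genes discussed above, is oriented opposite to its nearest neighbors and is therefore left unpaired.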

An interesting feature of many GGDEF and EAL domain-encoding genes is their presence in various transposons. For example, an EAL-encoding Urf2 has been found at the end of the mer operons in transposons Tn21 and Tn501, although it is not required for mercury resistance and is deleted in Tn5053 [45]. A part of this gene, named tnpM for transposition modulator, has been shown to enhance Tn21 transposition by activating transposase expression and decreasing resolvase expression [46]. Whether the full-length Urf2 has the same activity is unknown. Genes for stand-alone EAL and GGDEF domains have also been found in the Lactococcus lactis transposon Tn5481.

4.1 Conserved fusions of novel signaling domains

Two-domain fusion proteins consisting of a phosphoacceptor CheY-like domain and either a GGDEF or an EAL domain were classified as response regulators even before the abundance of such fusions became apparent [18,39,40]. A systematic analysis of domain organization of signaling proteins in completely sequenced bacterial genomes shows numerous domain fusions that pair novel output domains (GGDEF, EAL, and HD-GYP) not just with CheY-like domains but also with extracytoplasmic ligand-binding sensor domains or with cytoplasmic PAS and GAF sensor domains (Table 4). The variety of these multi-domain proteins seems to mirror that of sensor kinases. This circumstance apparently reflects an underlying uniformity of the mechanisms of signal transduction in the cell, from an N-terminal sensor domain to a transmitter domain to a C-terminal response output domain, and suggests that the novel domains comprise a distinct signaling (cyclic diguanylate-based?) system that complements the classical two-component system.
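The grouping of fusions by input and output module can be made concrete with a small Python sketch that parses architecture strings written in the notation of Table 4. The mapping of domain names to input classes follows the groupings used in Table 4 (CheY as receiver; FliY and PER as extracytoplasmic sensors; PAS and GAF as cytoplasmic sensors); the function names and the returned dictionary layout are assumptions made for illustration.

```python
OUTPUT_DOMAINS = {"GGDEF", "EAL", "HD-GYP"}
INPUT_CLASSES = {
    "CheY": "receiver",
    "FliY": "extracytoplasmic sensor",
    "PER": "extracytoplasmic sensor",
    "PAS": "cytoplasmic sensor",
    "GAF": "cytoplasmic sensor",
}

def parse_architecture(text):
    """Split an N-to-C architecture string, keeping 'HD-GYP' as one domain."""
    tokens = text.replace("HD-GYP", "HD_GYP").split("-")
    return [t.replace("HD_GYP", "HD-GYP") for t in tokens]

def describe(text):
    """Summarize which input classes and novel output domains a fusion has."""
    domains = parse_architecture(text)
    inputs = sorted({INPUT_CLASSES[d] for d in domains if d in INPUT_CLASSES})
    outputs = [d for d in domains if d in OUTPUT_DOMAINS]
    return {"domains": domains, "inputs": inputs, "outputs": outputs}

rpfg_like = describe("CheY-HD-GYP")
mixed = describe("CheY-PAS-GGDEF-EAL")
```

For example, `describe("CheY-PAS-GGDEF-EAL")` reports both a receiver and a cytoplasmic sensor as inputs and GGDEF and EAL as outputs, which is the kind of combined architecture the paragraph above refers to.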

Indeed, in several independent cases [18,20,21,23] it has been shown that predicted response regulators containing these novel domains are regulated by, and act in parallel with, the ‘standard’ (CheY-HTH) response regulators. They seem to provide an additional output module and, potentially, a means of feedback control. The systems that are regulated by these novel response regulators include those responsible for the interaction of the bacteria with the environment (fimbriae, extracellular proteins, virulence) and with each other (biofilm formation [47,48], Table 3). Although such systems are extremely important in vivo, they may act in response to stimuli that have not yet been replicated in vitro.
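The fusion patterns discussed in this section can be summarized in a small sketch that assigns a role to each domain of a protein and checks whether its input domains precede its output domains. The role table follows the domains named in the text; the label `"Periplasmic"` for an extracytoplasmic ligand-binding sensor domain, the function names, and the ordering rule itself are assumptions introduced for illustration, not the classification procedure used for Table 4.

```python
from typing import Dict, List

# Roles as described in the text. "Periplasmic" is a hypothetical label for
# an extracytoplasmic ligand-binding sensor domain.
DOMAIN_ROLES: Dict[str, str] = {
    "Periplasmic": "sensor",
    "PAS": "sensor",
    "GAF": "sensor",
    "CheY": "receiver",
    "GGDEF": "output",
    "EAL": "output",
    "HD-GYP": "output",
}


def describe_architecture(domains: List[str]) -> List[str]:
    """Map a domain list (ordered from N- to C-terminus) to roles.

    Domains missing from DOMAIN_ROLES are labeled "other".
    """
    return [DOMAIN_ROLES.get(d, "other") for d in domains]


def has_input_before_output(domains: List[str]) -> bool:
    """True if the protein has at least one input (sensor or receiver) domain
    and at least one output domain, with every input preceding every output.
    """
    roles = describe_architecture(domains)
    inputs = [i for i, r in enumerate(roles) if r in ("sensor", "receiver")]
    outputs = [i for i, r in enumerate(roles) if r == "output"]
    if not inputs or not outputs:
        return False
    return max(inputs) < min(outputs)
```

For example, `has_input_before_output(["CheY", "GGDEF"])` and `has_input_before_output(["PAS", "GGDEF", "EAL"])` both hold, whereas a stand-alone `["GGDEF"]` protein, which has no input domain, does not.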

4.2 Cross-talk between different signaling systems

The variety of fusions between signaling domains discussed above extends to fusions of these domains with components of other signaling pathways, creating a complex network of regulatory interactions. Perhaps the most interesting case is a fusion of a GAF domain to Enzyme I of the PEP-dependent sugar:phosphotransferase system, first described for the E. coli PtsP protein [30]. PtsP and similar proteins encoded in the genomes of P. aeruginosa, V. cholerae, and Mesorhizobium loti might modulate the activity of the phosphotransferase system in response to the levels of cGMP or some other ligand that interacts with their N-terminal GAF domains.

Another notable example of cross-talk between different regulatory systems (Table 4) is the fusion of CheY-like and GAF domains with phosphatase domains of the PP2C type, found in RsbU-like regulators of the σB subunit, which participate in the stress response in B. subtilis and many other bacteria [49,50]. Such fusion proteins should be able to couple stress responses directly to perturbations in oxygen and/or cGMP levels.

5 Conclusions and perspectives

Comparative analysis of complete microbial genomes reveals a network of regulatory interactions that is much more complex than was assumed previously. This complexity is mostly limited to free-living bacteria, whereas parasites with degraded genomes have few, if any, sensory transduction systems. Functions of some of the signaling domains are already known, whereas the functions of others remain to be discovered. If the GGDEF and EAL domains indeed function as a diguanylate cyclase and a c-diGMP phosphodiesterase, respectively [19], c-diGMP could emerge as a major cell regulator in bacteria but, remarkably, not in archaea. Experimental characterization of the functions of these domains will significantly advance our understanding of the principles and mechanisms governing the prokaryotic regulatory machinery.

Note added in proof

While this paper was under review, the c-diGMP phosphodiesterase activity of the Acetobacter xylinum DGC1-like protein (see Table 3) was shown to be regulated by oxygen [61]. We also became aware of data implicating GGDEF-containing proteins in hemin storage in Yersinia pestis [62] and in flagellar function in E. coli [63].

References

  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].
  25. [25].
  26. [26].
  27. [27].
  28. [28].
  29. [29].
  30. [30].
  31. [31].
  32. [32].
  33. [33].
  34. [34].
  35. [35].
  36. [36].
  37. [37].
  38. [38].
  39. [39].
  40. [40].
  41. [41].
  42. [42].
  43. [43].
  44. [44].
  45. [45].
  46. [46].
  47. [47].
  48. [48].
  49. [49].
  50. [50].
  51. [51].
  52. [52].
  53. [53].
  54. [54].
  55. [55].
  56. [56].
  57. [57].
  58. [58].
  59. [59].
  60. [60].
  61. [61].
  62. [62].
  63. [63].