Recurrent intragenomic recombination leading to sequence homogenization during the evolution of the lipoyl-binding domain

Marina V. Omelchenko, Kira S. Makarova, Eugene V. Koonin
DOI: http://dx.doi.org/10.1111/j.1574-6968.2002.tb11140.x, pp. 255-260. First published online: 1 April 2002


The lipoyl-binding domain is often present, in one or several copies, in the E2 subunit and, less often, in the E1 and E3 subunits of 2-oxo acid dehydrogenase complexes. Phylogenetic analysis shows evidence of multiple, independent intragenomic recombination events between different versions of the lipoyl-binding domain in various bacteria and eukaryotic mitochondria, leading to homogenization of the sequences of the lipoyl-binding domain within the same enzymatic complex in several bacterial lineages. This appears to be the first case of sequence homogenization at the level of an individual domain in prokaryotes.

Key words
  • Lipoyl-binding domain
  • Protein evolution
  • Intragenomic recombination
  • Sequence homogenization
  • Duplication

1 Introduction

2-Oxo acid dehydrogenase multisubunit complexes perform the oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA derivatives and have an essential role in energy metabolism in most organisms [1]. The pyruvate dehydrogenase complex (PDHC) links glycolysis with the tricarboxylic acid (TCA) cycle through the oxidative decarboxylation of pyruvate to acetyl-CoA. The 2-oxoglutarate dehydrogenase complex catalyzes the conversion of 2-oxoglutarate to succinyl-CoA in the TCA cycle itself. The branched-chain 2-oxo acid dehydrogenase complex participates in the catabolism of branched-chain amino acids (valine, leucine and isoleucine). The acetoin catabolism complex catalyzes the cleavage of acetoin into two dicarbon compounds [2,3].

All these enzymatic complexes share similar subunit and domain architectures. They consist of multiple copies of three enzymes: E1 (substrate-specific dehydrogenase), E2 (dihydrolipoamide acyltransferase) and E3 (dihydrolipoamide dehydrogenase). The catalysis by these dehydrogenase complexes requires three prosthetic groups, thiamin diphosphate, lipoic acid and FAD, and two cofactors, NAD and CoA [4]. The E1 subunit catalyzes oxidative decarboxylation of a 2-oxo acid and reductive acylation of lipoic acid. The lipoic acid moiety is covalently bound to a specific lysine residue in the amino-terminal lipoyl-binding domain (LBD) of E2. The core of all 2-oxo acid dehydrogenase complexes is formed by aggregation of E2 components that catalyze the transfer of the acyl group from lipoic acid to CoA resulting in the formation of acyl-CoA and dihydrolipoamide-E2. Finally, the E3 component catalyzes NAD-dependent oxidation of the lipoyl group of E2, thus completing the catalytic cycle of E2.

Typically, the N-terminal LBD is present in each E2 subunit. Duplications of the LBD in E2 and fusions of LBD with E1 and E3 subunits were previously discussed [4–8]. The LBDs are linked to each other and to the remaining parts of the corresponding proteins via long, flexible segments that are typically enriched in alanine, proline and charged residues. These flexible linkers could be important for interactions of LBDs with active sites and for rapid transfer of acyl groups between the lipoyl moieties [4]. The LBD is also present in the glycine cleavage protein H and is homologous to the biotin-binding domain present in some carboxyl transferases, e.g. pyruvate carboxylases and CoA carboxylases (http://www.expasy.org/cgi-bin/nicedoc.pl?PDOC00168).

Here, we reconstruct the evolutionary history of the LBD duplications and present evidence that intragenomic recombination is the most likely explanation for the independent homogenization of lipoyl-binding domain copies in 2-oxo acid dehydrogenase multisubunit complexes in many prokaryotes.

2 Materials and methods

Amino acid and nucleotide sequences of lipoyl-binding domain-containing proteins were identified using the gapped BLAST program [9] to search the non-redundant protein sequence database (NCBI, NIH, Bethesda, MD, USA). Multiple sequence alignments were constructed using the ClustalW program [10]. Evolutionary distances were calculated using the Dayhoff PAM model as implemented in the PROTDIST program of the PHYLIP package [12]. Distance trees were constructed using the least-squares method [11] as implemented in the FITCH program of PHYLIP [12]. Maximum likelihood trees were constructed using the ProtML program of the MOLPHY package [13], with the JTT-F model of amino acid substitutions [13,14], by optimizing the least-squares trees with local rearrangements. Bootstrap analysis (10 000 replications) was performed for each maximum likelihood tree as implemented in MOLPHY using the resampling of estimated log-likelihoods (RELL) method [13,15,16]. Alternative placements of selected clades in maximum likelihood trees were compared using the rearrangement optimization method (the Kishino–Hasegawa test) as implemented in the ProtML program [13].
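The RELL bootstrap and the Kishino–Hasegawa comparison both operate on per-site log-likelihoods of competing trees. The following is a minimal Python sketch of the two calculations, not the MOLPHY code itself; the function names and the toy per-site values are assumptions made for illustration, and in practice the per-site log-likelihoods would come from the maximum likelihood program.

```python
import math
import random


def kh_statistics(site_lnl_best, site_lnl_alt):
    """Return (Diff lnL, S.E.) of an alternative tree relative to the best tree.

    Both arguments are lists of per-site log-likelihoods over the same
    alignment columns (assumed inputs, hypothetical here).
    """
    n = len(site_lnl_best)
    diffs = [alt - best for alt, best in zip(site_lnl_alt, site_lnl_best)]
    total = sum(diffs)
    mean = total / n
    # Variance of the summed difference, estimated from the per-site spread.
    variance = sum((d - mean) ** 2 for d in diffs) * n / (n - 1)
    return total, math.sqrt(variance)


def rell_bootstrap(site_lnl_by_tree, replicates=10_000, seed=0):
    """Return the fraction of replicates in which each tree is the best one.

    Sites are resampled with replacement and the stored per-site
    log-likelihoods are re-summed, so no tree is re-optimized.
    """
    rng = random.Random(seed)
    n_sites = len(site_lnl_by_tree[0])
    wins = [0] * len(site_lnl_by_tree)
    for _ in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in sample) for tree in site_lnl_by_tree]
        wins[totals.index(max(totals))] += 1
    return [w / replicates for w in wins]


# Toy per-site log-likelihoods for two trees over four sites (made-up values).
best_tree = [-1.0, -2.0, -1.5, -1.0]
alt_tree = [-1.5, -2.5, -2.0, -1.5]
diff_lnl, std_err = kh_statistics(best_tree, alt_tree)
support = rell_bootstrap([best_tree, alt_tree], replicates=1000)
print(diff_lnl, std_err, support)
```

A large negative Diff lnL relative to its standard error, together with a low RELL bootstrap probability, indicates that the alternative placement is poorly supported.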

3 Results and discussion

When comparing the sequences of LBDs from various prokaryotes and eukaryotes, we made the surprising observation that, on several occasions, paralogs from the same species, e.g. the LBDs from E2a and E3a subunits from Streptococcus pyogenes, were more similar to each other than they were to the corresponding orthologs from other species. This suggested the possibility of unusual events during the evolution of the LBD and prompted a more detailed phylogenetic analysis.
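The comparison behind this observation can be sketched as a pairwise identity calculation over aligned sequences. The Python snippet below is only an illustration: the species labels and the short sequences are invented, not taken from the actual LBD alignment, and the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned fragments keyed by (species, domain); not real LBD data.
toy_alignment = {
    ("speciesA", "E2a"): "MKIEVPDLGE",
    ("speciesA", "E3a"): "MKIEVPDLGD",
    ("speciesB", "E2a"): "MEIKLPSLGE",
}

paralog_identity = percent_identity(
    toy_alignment[("speciesA", "E2a")], toy_alignment[("speciesA", "E3a")]
)
ortholog_identity = percent_identity(
    toy_alignment[("speciesA", "E2a")], toy_alignment[("speciesB", "E2a")]
)
# The pattern described in the text corresponds to paralogs within one
# species scoring higher than orthologs across species.
print(paralog_identity > ortholog_identity)
```

Raw identity is only a first indication; the phylogenetic analysis described below is what tests whether the pattern reflects the actual branching order.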

Since the total number of available LBD sequences is already very large, for the purpose of phylogenetic analysis, we used the sequences from those species that showed an LBD duplication within at least one of the 2-oxo acid dehydrogenase complexes (Fig. 1). Inclusion of LBD sequences from species that had no duplications (bacteria of the Bacillus/Clostridium group, archaea, rickettsia) did not substantially affect the tree topology (data not shown). Despite the small number of alignment positions (75 amino acids), the phylogenetic tree of LBDs had a stable topology as indicated by strong bootstrap support for most branches and the convergence of the results produced with different methods for the tree reconstruction. Specifically, nearly identical tree topologies were obtained when the tree was built on the basis of either nucleotide sequences or amino acid sequences of LBD by neighbor-joining and Fitch–Margoliash methods (data not shown). Most of the LBD sequences included in this analysis belonged to the PDHC subunits. The LBDs from the acetoin catabolism complex formed a distinct branch within the PDHC tree, suggesting that this complex has evolved from specific PDHC components (Fig. 2). The tree contained an α-proteobacterial branch, including a distinct cluster of eukaryotic sequences of mitochondrial origin, the γ/β-proteobacterial branch, low G+C Gram-positive bacteria and the actinobacterial branch; strong bootstrap support was obtained for each of these branches (Fig. 2). The same branches were observed when a tree was reconstructed for E2 subunit sequences without the LBD ([7]; data not shown). Most of these branches contained LBDs found in different contexts, which indicated that the duplications leading to these forms of the LBD occurred independently in different lineages (Fig. 2).

Figure 1

Genomic context of 2-oxo acid dehydrogenase complexes in bacteria and the location of lipoyl-binding domains. The subunits and domains of the 2-oxo acid dehydrogenase complexes and additional genes present in the same operons are color-coded as indicated at the bottom of the figure. The orthologous genes are aligned; genes connected by lines belong to the same (predicted) operon. The acetoin catabolism complex is shown for S. pyogenes and C. magnum, and the pyruvate dehydrogenase complex is shown for all other species.

Figure 2

A maximum likelihood unrooted tree of lipoyl-binding domains. The tree was reconstructed and bootstrap probabilities were computed as described in Section 2. The bootstrap values greater than 70% are indicated at the corresponding forks. The encircled numbers indicate branches whose alternative locations in the tree were examined using the Kishino–Hasegawa test (Table 1). α-Proteobacteria are colored red, eukaryotes (mitochondrial sequences) are colored red and emboldened, the rest of the proteobacteria are colored blue, actinobacteria are colored green. The acetoin complex subunits are shown in black and emboldened, the rest of the sequences are from pyruvate dehydrogenase complexes. Arrows indicate domains that probably have been subject to intragenomic recombination. Duplicated domains are designated as follows: E1α, E1β, E2, E3 indicate the subunit containing the domain; a, b, c indicate domain position starting with the N-terminal domain (so that c is the domain adjacent to the catalytic domain). GI numbers and complete organism names are indicated for each sequence.

On many occasions, orthologous LBDs, i.e. those from the same enzyme encoded in different bacteria, grouped together within each of the aforementioned taxonomically coherent branches (Fig. 2). However, each branch also showed a comparable number of cases when paralogous domains from different enzymes grouped together. This ‘anomalous’ clustering of paralogous domains was observed among α-proteobacteria (in Caulobacter and Zymomonas), in the mitochondrial cluster (Arabidopsis and Dictyostelium), low G+C Gram-positive bacteria (two Listeria species, Lactococcus lactis, Enterococcus faecalis, S. pyogenes), and actinobacteria (Streptomyces coelicolor, Mycobacterium spp.).
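One way to flag such 'anomalous' clustering in a tree is to look for sister leaves that come from the same species but from different domains. The Python sketch below assumes a tree written as nested tuples with leaf labels of the form "species|domain"; this representation, the function names and the toy topology are illustrative assumptions and do not reproduce the tree in Fig. 2.

```python
def cherries(tree):
    """Yield pairs of leaves that are each other's sister in a nested-tuple tree."""
    if isinstance(tree, str):
        return  # a single leaf has no sister pair below it
    children = list(tree)
    if len(children) == 2 and all(isinstance(c, str) for c in children):
        yield (children[0], children[1])
    for child in children:
        yield from cherries(child)


def paralog_cherries(tree):
    """Return sister pairs from the same species but different domains."""
    pairs = []
    for leaf_a, leaf_b in cherries(tree):
        species_a, domain_a = leaf_a.split("|")
        species_b, domain_b = leaf_b.split("|")
        if species_a == species_b and domain_a != domain_b:
            pairs.append((leaf_a, leaf_b))
    return pairs


# Toy topology (hypothetical): one species pairs its own two domains,
# while two other species group by domain as expected for orthologs.
toy_tree = (
    (("Ccre|E1b", "Ccre|E2"), ("Mlot|E1b", "Smel|E1b")),
    ("Mlot|E2", "Smel|E2"),
)
print(paralog_cherries(toy_tree))
```

Sister pairs are the simplest signal; clusters of three or more paralogous domains would need a check on larger clades rather than on pairs alone.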

Within the γ/β-proteobacterial branch, the scenario of LBD evolution appears to have been particularly complicated. The LBD is present in two copies in the E2 subunit in all γ- and β-proteobacteria. Moreover, these bacteria have identical sets of subunits in PDHC and similar operon structures (Fig. 1). Taken together, this indicates that the duplication probably occurred prior to the divergence of the γ- and β-proteobacterial lineages. A subsequent duplication apparently occurred in the ancestral lineage of Pseudomonas, Azotobacter, Vibrio, enterobacteria, Pasteurella and Haemophilus whereby the E2b domain was duplicated again giving rise to the E2c domain, which has been independently lost in the Haemophilus, Yersinia and Pseudomonas lineages (Fig. 2). Another duplication mapped to the ancestor of the β-proteobacterial lineage and gave rise to an additional LBD in the E3 subunit (Fig. 1). Xylella fastidiosa, the most deeply rooted γ-proteobacterial lineage [17], has the same domain arrangement as β-proteobacteria, which might be explained by a xenologous gene displacement [18] after a horizontal transfer of PDHC from a β-proteobacterium to the Xylella lineage. Given that E2a and E2b domains appear to have evolved prior to the divergence of γ- and β-proteobacteria, they are expected to form separate clusters. However, this is not always the case. For example, all three LBDs of Neisseria meningitidis and X. fastidiosa, and E2a and E2b domains of Ralstonia eutropha cluster together. These observations suggest that, at some stage of evolution of each of these lineages, intragenomic recombination led to the homogenization of LBD sequences. In some cases, it is possible to determine the directionality of this postulated recombination event. Specifically, in the Neisseria lineage, the ‘master copy’ probably originated from the E3 domain, whereas in the Xylella lineage, it was one of the E2 domains.

In contrast, no clear indications of intragenomic recombination between LBDs were observed among γ-proteobacteria. Although the expected grouping of orthologs was not seen for all species in the maximum likelihood tree (Fig. 2), this clustering was predominant in the tree constructed using the FITCH method (data not shown). Furthermore, an alternative tree, in which the E2a domain from Yersinia pestis joined the branch of E2a domains from Vibrio cholerae and enterobacteria, had a high likelihood as shown using the Kishino–Hasegawa test (Table 1), indicating that clustering of orthologs probably accurately depicts the evolution of this group.

Table 1

Log-likelihood analysis of possible placements of selected branches of maximum likelihood trees for the analyzed lipoyl-binding domains

Tree (a)          Diff lnL (b)    S.E. (c)    RELL-BP (d)
1→2; 3→4          −
7→8; 9→6          −58.8           17.8        0.0001
10→11; 12→13      −40.5           11.7        0.0001
16→17; 18→19      −40.5           11.7        0.0001

  • (a) The numbers refer to local rearrangements of the tree as indicated on the corresponding figure.

  • (b) Difference of the log-likelihoods relative to the best tree.

  • (c) Standard error of Diff lnL.

  • (d) Bootstrap probability of the given tree calculated using the RELL method (resampling of estimated log-likelihoods) [15].

All known α-proteobacteria have LBDs on the E1β and E2 subunits of PDHC, which indicates that the duplication of the LBD probably occurred in the common ancestor of the α-proteobacterial clade (Fig. 1). Orthologous domains from Mesorhizobium loti, Sinorhizobium meliloti and Agrobacterium tumefaciens grouped together as expected (Fig. 2). In contrast, clustering of paralogous domains was observed in Caulobacter crescentus (Fig. 2). All mitochondrial sequences show a different placement of LBDs, which apparently originated early in the evolution of mitochondria, but after their divergence from the α-proteobacterial ancestor. Both mitochondrial LBDs are located in the E2 subunit of PDHC. The expected grouping of orthologous domains was observed in rats, pigs and humans, whereas ‘anomalous’ clustering of paralogous domains was seen in Arabidopsis thaliana and Dictyostelium discoideum.

In Gram-positive bacteria, LBDs apparently underwent independent duplications in different classes of 2-oxo acid dehydrogenase complexes, i.e. PDHC in Deinococcus radiodurans, the Lactococcus–Enterococcus–Listeria branch, the actinobacterial branch, and Acholeplasma laidlawii and Mycoplasma capricolum, and acetoin catabolism complex in S. pyogenes and Clostridium magnum. Paralogous LBDs clearly cluster together in the majority of Gram-positive bacterial lineages (Fig. 2).

The observations summarized above suggest recurrent intragenomic recombinations between regions coding for LBDs in different 2-oxo acid dehydrogenase complexes, resulting in the homogenization of LBD sequences. These findings are unlikely to be artifacts of tree construction, as indicated by the strong bootstrap support for most of the corresponding clusters (Fig. 2) and by the results of the Kishino–Hasegawa test, in which the likelihood of alternative trees with clustered orthologs was assessed (Table 1). The proposed intragenomic recombination between lipoyl-binding domains resembles gene conversion, the phenomenon of regular sequence homogenization via non-reciprocal recombination, which has been observed in a variety of multigene families in eukaryotes [19] and on several occasions in bacteria and archaea, e.g. among rRNA genes [20] and genes coding for elongation factor Tu [21–23], nitrogenases [24], and surface antigens of pathogenic bacteria [25,26]. However, the evolutionary pattern observed for LBD is different. Postulated homogenization events apparently occurred sporadically and independently in several evolutionary lineages, but did not recur repeatedly to maintain sequence identity, which is characteristic of gene conversion. The specific mechanism of homogenization of the LBD sequences most likely involved direct recombinational replacement of one version by another. The exact boundaries of recombination could not be determined because the sequences of the homogenized copies of the LBD in the same species are not identical and, furthermore, they are connected to each other or to the corresponding enzymatic domains by variable flanking sequences that could not be aligned to assess the possibility of their involvement in the postulated recombination events.

An alternative evolutionary scenario can also be considered whereby the ‘master copy’ is duplicated and the other version is subsequently eliminated; even under this scenario, however, the relocation of the duplicate to the precise location of the eliminated LBD-coding sequence remains to be explained.

4 Conclusions

The analysis of the evolutionary history of LBDs reveals several interesting trends. Firstly, duplication of LBD is a specific feature of 2-oxo acid dehydrogenase complexes. Apparently, these duplications involve LBDs located in all subunits of these complexes. Secondly, LBD duplications appear to have occurred independently during the evolution of different 2-oxo acid dehydrogenase complexes and in different lineages of bacteria. Thirdly, and most unexpectedly, recurrent intragenomic recombination seems to have resulted in homogenization of LBD sequences, an event that occurred independently and at different stages of evolution in several lineages. Together with the fact that, in each case, only the sequences of LBDs and not the flanking sequences are homogenized, this suggests that homogenization of the LBD sequences confers a selective advantage on the corresponding organisms, but the nature of the underlying selective forces remains enigmatic.


M.V.O. is supported by the Department of Energy grant DE_FG02_01ER63220 from the Microbial Cell Program.


  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].
  25. [25].
  26. [26].