
Clusters of diverse genes existing as multiple, sequence-variable mosaics in a phytoplasma genome

Rasa Jomantiene, Robert E. Davis
DOI: http://dx.doi.org/10.1111/j.1574-6968.2005.00057.x, pp. 59-65. First published online: 1 February 2006


Phytoplasmas are cell wall-less prokaryotes living as obligate parasites and pathogens of plants and insects, making them attractive subjects for studies to gain a greater understanding of transkingdom parasitism and pathogenicity. During a study of two phytoplasma genomes, we obtained evidence for previously unreported clustering of genes, pseudogenes, mobile genetic elements, intergenic repeat units, and repetitive extragenic palindromes that occur in multiple, homologous clusters in some phytoplasma genomes. The clusters represent previously unrecognized mosaics, possibly assembled through multiple events of targeted mobile element attack, duplication, recombination, and rearrangement. Multiple clusters could conceivably afford potential for genome reduction through homologous recombination. Differences in the sizes and multiplicity of such clusters possibly account for some of the previously reported but unexplained variations in genome size among closely related phytoplasma strains.

  • mollicutes
  • genome size
  • gene cluster
  • transposase
  • insertion sequence


Phytoplasmas comprise a group of cell wall-less bacteria that cause economically significant diseases of plants and cannot be isolated in culture (Lee & Davis, 1986; Lee et al., 2000). These obligately parasitic plant pathogens possess the smallest genomes known among nonsymbiotic bacteria (Marcone et al., 1999), having discarded biosynthetic pathways for nutrients supplied by their hosts (Davis et al., 2003; Oshima et al., 2004). Yet, genome sizes can vary considerably among closely related strains (Marcone et al., 1999), a finding that still lacks explanation. In plants, phytoplasmas are restricted to phloem tissue, largely inhabiting enucleate, living sieve cells, and they are transmitted from plant to plant by phloem-feeding insect vectors. These features make phytoplasmas attractive subjects for genomic studies to gain a greater understanding of transkingdom parasitism and pathogenicity.

Phytoplasma genomics is still in the early stages of development. Although analysis of one complete phytoplasma genome sequence has been published (Oshima et al., 2004) and parts of other genome sequences have become available in the GenBank database or on the internet, various salient features of these genomes have gone unreported. The present study uncovered multiple clusters of functionally unrelated genes and pseudogenes. The tight linkage of the sequences in the clusters led us to hypothesize that the clusters represent a newly recognized phytoplasmal genomic mosaic that is present in multiple, but sequence-varied, copies in some phytoplasma genomes, and that they may account in part for variations in genome sizes among closely related strains. A brief synoptic report of the findings has been published in abstract form (Jomantiene & Davis, 2005).

Materials and methods

DNA cloning, sequencing, and hybridizations

Genomic DNA of clover phyllody (CPh) phytoplasma was extracted from phytoplasma-enriched preparations of sieve cells, cloned using the pBluescript SK(+) phagemid vector in Escherichia coli DH5α, and sequenced by primer walking to achieve at least sixfold coverage per base position on both strands of the phytoplasma DNA; the nucleotide sequence data were assembled as described previously (Davis et al., 2003). Southern blot analysis was performed as described previously (Davis et al., 2003). The membrane was probed with a digoxigenin-labeled DNA product from a PCR. Three primer pairs were used in separate PCRs in which the template was cloned genomic segment CPh66: fliAF1, 5′-TAA CCA TAA TTG ATG AGG-3′ and fliAR1, 5′-AAT TGA ACT TCA TTT ACC-3′, for the fliA gene probe [product size, 316 nucleotides (nt)]; phgF1, 5′-ATC AAA CTA TCA CAA CGT-3′ and phgR1, 5′-TGA GAA CAG TAG AGT TGA C-3′, for the phage-related protein gene probe (product size, 444 nt); and tmkF1, 5′-AAC TGT ATC AAG GTT TAG G-3′ and tmkR1, 5′-TGA TCG CCA TTT GAT AGT G-3′, for the tmk gene probe (product size, 530 nt).
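The relationship between a primer pair and its expected product size can be illustrated with a minimal sketch. The primer sequences are those given above for the fliA probe; the template is a short hypothetical sequence built only for illustration (it is not the CPh66 sequence), and the function name product_size is likewise an assumption of this sketch. Only exact primer matches on the top strand are considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def product_size(template, forward, reverse):
    """Return the amplicon length in nt, or None if a primer has no exact match.

    Both primers are written 5'->3'. The forward primer is searched on the
    top strand; the reverse primer anneals to the top strand, so its
    reverse complement is searched downstream of the forward site.
    """
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, start + len(forward))
    if rev_start == -1:
        return None
    return rev_start + len(rev_site) - start

# Primers from the text (spaces removed).
FLIA_F1 = "TAACCATAATTGATGAGG"
FLIA_R1 = "AATTGAACTTCATTTACC"

# Hypothetical toy template: 2 nt flank, forward site, 20 nt spacer,
# reverse-primer site, 2 nt flank. Not the real CPh66 sequence.
toy_template = "GG" + FLIA_F1 + "ACGT" * 5 + reverse_complement(FLIA_R1) + "CC"
print(product_size(toy_template, FLIA_F1, FLIA_R1))  # prints 56 for this toy template
```

Applied to the deposited CPh66 sequence (GenBank accession no. DQ111953), the same calculation would be expected to give the product sizes stated above, provided the primers match the template exactly.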

Nucleotide and amino-acid sequence analyses

Basic local alignment search tool (BLAST) searches (Altschul et al., 1990) were carried out at the National Center for Biotechnology Information (NCBI) website, http://www.ncbi.nlm.nih.gov. Potential protein coding region (ORF) analyses were carried out at the NCBI website, http://www.ncbi.nlm.nih.gov/gorf/orfig.cgi. Multiple alignments were constructed using the Megalign program of Lasergene. The structure of putative proteins was analyzed using the SMART program at website http://smart.embl-heidelberg.de/.
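As an illustration of what a potential coding region analysis involves, the following is a minimal sketch of an ORF scan. It is not the NCBI tool used in this study. It assumes ATG as the only start codon, TAA, TAG, and TGA as stop codons, and scans the three forward reading frames only; the function name find_orfs and the min_codons threshold are hypothetical choices for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons

def find_orfs(seq, min_codons=30):
    """Scan the three forward frames for ATG...stop open reading frames.

    Returns a sorted list of (start, end, n_codons) tuples, where start and
    end are 1-based inclusive coordinates (end includes the stop codon) and
    n_codons counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)

# Hypothetical toy sequence: one short ORF of five codons plus a stop.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))
```

To cover both strands, the same function can be run on the reverse complement of the sequence, with coordinates mapped back to the original strand.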

Results and discussion

While analyzing the genome sequence of CPh phytoplasma, a member of the 16S rRNA gene restriction fragment length polymorphism group 16SrI subgroup C, we identified a DNA segment (CPh66, GenBank accession no. DQ111953) containing a gene cluster that was repeated in several phytoplasma genomes. The segment contained nine putative ORFs and a pseudogene region (Table 1 and Fig. 1). Some ORFs were suggestive of mobile genetic element attack in the cluster region. For example, the ORF3- and ORF6-encoded proteins exhibited similarity (43% identity over 130 amino acids, E value 2e-23, and 43% identity over 171 amino acids, E value 4e-36, respectively) to a hypothetical protein encoded by extrachromosomal (EC) DNA of group 16SrIII phytoplasmas. The ORF3- and ORF6-encoded proteins also exhibited similarity (from 32% identity over 115 amino acids, E value 8e-07, to 29% identity over 200 amino acids, E value 3e-12) to phage-related proteins encoded by the genomes of diverse walled bacteria, e.g. Xylella fastidiosa (ZP00040545, COG5377), Providencia rettgeri (AAM08015), and Vibrio cholerae (AAL59709). BLAST searches failed to reveal ORF3- and ORF6-related sequences in the genomes of ancestral relatives Bacillus and Clostridium species, raising the question of whether they were acquired by phytoplasmas through horizontal gene transfer.

Table 1

 Features in clover phyllody (CPh) phytoplasma genomic DNA segment CPh66

Element | Endpoints (nucleotides) | G+C (%) | Product size (amino acids, kDa) | Best BLAST hits (% identity over amino acids) | Predicted function/similar protein/feature
--- | --- | --- | --- | --- | ---
ORF1 | 1 … 467 | 23 | Partial | NP_950615 (98/155) | DNA-directed RNA polymerase σ subunit, FliA; COG1191
IRU1-TIU2 | 511 … 553 | 14 | | Intergenic regions in: AP006628, AY497459, AY497461, AY270153 | Putative transposase-associated intergenic unit
ψ | 553 … 949 | 27 | | NP_950946 (36/149) | Hypothetical protein
ψ | 1261 … 2019 | 23 | | NP_950945 (51/253) | Chromosome segregation ATPase; SMC, COG041
ψ | 2841 … 3176 | 24 | | NP_950944 (80/110) | Hypothetical protein; a member of AAA+ superfamily
ψ | 4003 … 4537 | 29 | | NP_950943 (46/101) | Hypothetical protein, similar to ATP-dependent Zn protease
ORF2 | 4559 … 5044 | 20 | 162 (19.2) | NP_950942 (48/84) | Hypothetical protein
ORF3 | 6035 … 6424 | 30 | 129 (15.1) | NP_950597 (84/115) | Hypothetical protein, COG5377: phage-related, predicted endonuclease
IRU1-TIU1 | 6424 … 6774 | 16 | | Intergenic regions in: AY364444, AY497459, AY497461, AY270153, AP006628 | Putative transposase-associated intergenic unit TIU1 (IRU1-TIU1), conserved in AY1, NJAY, PnWB, and 12 intergenic regions of the OY-M phytoplasma chromosome
ψTra5 | 6478 … 6524 | 13 | | NP_950658 (39/48) | Possible transposase Tra5 pseudogene remnant
ORF4 | 6775 … 7281 | 34 | 168 (19.9) | NP_950300 (80/161) | Hypothetical protein, similar to ATP-dependent Zn protease
ORF5 | 7313 … 7873 | 17 | 186 (22.3) | NP_950258 (34/189) | Hypothetical membrane protein; COG1268
IRU2 | 7874 … 8418 | 24 | | AP006628 | Intergenic region, homologous with repeated OY-M intergenic sequences
ORF6 | 8419 … 9057 | 33 | 212 (25.3) | NP_950900 (91/212) | Hypothetical protein, COG5377: phage-related, predicted endonuclease
ORF7 | 9132 … 9557 | 27 | 141 (17.2) | NP_950653 (86/133) | Hypothetical protein, similar to a plasmid-encoded protein YP_214982
ORF8 | 9554 … 10183 | 30 | 209 (24.4) | BAA31455 (93/209) | Thymidylate kinase
ORF9 | 10355 … 10981 | 26 | 208 (24.7) | NP_950898 (89/202) | Hypothetical protein
  • * ψ, possible pseudogene sequence; IRU1, intergenic repeat unit 1; TIU1 and TIU2, transposase-associated sequences; IRU2, intergenic repeat unit 2; ORF, open reading frame.

  • G+C, rounded to the nearest whole number.

  • Gene sequences used as probes in hybridizations.

  • § Protein encoded by ORF5 contains a signal peptide (residues 1 … 32) and five transmembrane regions (residues 64–83, 90–107, 122–144, 151–173, and 183–205) predicted by SMART, and exhibits similarity (28% identity over 159 amino acids) to dimethyladenosine transferase (NP_950558).

  • The tmk gene start codon lies within the 3′-end of the upstream gene (ORF7); the two genes may be coordinately regulated, integrated plasmid sequences.

  • BLAST, basic local alignment search tool; OY-M, onion yellows-Mild.

Figure 1

 Diagrammatic representation of features in cloned DNA fragment CPh66 from the genome of clover phyllody (CPh) phytoplasma. Horizontal bar represents the 10 999 bp nucleotide sequence. ORFs are represented by numbered block arrows. Bracket indicates the region of probable pseudogenes (ψ) in sequence CPh66. Phytoplasmal repetitive extragenic palindromes (PhREPs) in the CPh66 sequence are indicated by red vertical arrowheads. Solid bars indicate intergenic repeated units (IRUs) in CPh66; IRU1-TIU2, adjacent to ORF1, is a 42-base segment similar to the IRU1-TIU1 sequence flanked by ORF5 and ORF6. The gene orders in four examples of similar gene cluster regions in the genome of onion yellows-Mild (OY-M) phytoplasma are shown for comparison. Empty arrows in CPh66 represent ORFs not exhibiting any similarity to any ORF in the OY-M clusters illustrated; those in the OY-M clusters exhibited no similarity with ORFs in CPh66. Homologous ORFs are the same color. *Gene sequence used as a probe in Southern hybridizations. End points of sequences in the OY-M genome were as follows: OY-M1, 390,393 … 397,973; OY-M2, 610,466 … 616,658; OY-M3, 788,254 … 799,687 complement; OY-M4, 84,325 … 93,516. TIU, transposase-associated sequences.

We found that the genome segment also contained two intergenic regions (termed conserved intergenic regions) and five extragenic palindromes that occur as repeated sequences (termed phytoplasmal repeated extragenic palindromes, PhREPs) in the completed genome sequence (GenBank no. AP006628) of onion yellows-Mild (OY-M) phytoplasma, a member of group 16SrI subgroup B (Figs 1 and 2 and Table 2). The intergenic regions designated intergenic repeat unit-1-transposase-associated sequence-1 (IRU1-TIU1) and IRU1-TIU2 were each found as multiple homologous copies frequently associated with mobile genetic elements (transposase genes, insertion sequences, IS) in the completed OY-M phytoplasma genome and in the genomes of aster yellows (AY1), New Jersey aster yellows (NJAY), and peanut witches' broom (PnWB) phytoplasmas (Table 1 and Fig. 3), suggesting the presence of highly conserved core sequences and possible targeting of IS to the gene clusters. Presumably, these sequence features and the repeated clusters went unrecognized as such prior to the present study, and thus were previously unreported for phytoplasma genomes.

Figure 2

 Alignment of phytoplasmal repeated extragenic palindromes (PhREPs) from clover phyllody (CPh) phytoplasma with PhREPs in the onion yellows-Mild (OY-M) phytoplasma genome (GenBank no. AP006628). Conserved palindromic sequences are boxed. Asterisks indicate palindromic sequences within ORFs in the annotated OY-M genome sequence. Sequences flanking conserved regions are included for context and are not aligned. Numbers indicate the endpoints of the sequences in the OY-M genome sequence.

Table 2

 Nucleotide sequences of phytoplasmal repeated extragenic palindromes in phytoplasma genomic segment CPh66

Palindrome | Endpoints | Nucleotide sequence§
  • * PhREP sequences were found in cloned DNA fragment CPh66 derived from CPh phytoplasma; multiple copies of each PhREP were found in the completely sequenced genome of OY-M phytoplasma.

  • PhREP2 and PhREP3 are embedded in intergenic repeat units, IRU1 and IRU2, respectively.

  • Base positions in cloned genomic DNA segment CPh66 (GenBank no. DQ111953).

  • § Palindromic sequences are underlined. Pyrimidine (T)-rich sequence characteristically flanks the palindrome.

  • PhREPs, phytoplasmal repeated extragenic palindromes; IRU, intergenic repeat unit; OY-M, onion yellows-Mild.

Figure 3

 Alignment of nucleotide sequences, from diverse phytoplasmas, that are similar to the transposase-associated intergenic repeated unit (IRU1) in clover phyllody (CPh) phytoplasma cloned DNA fragment CPh66 (GenBank no. DQ111953). Partial IRU1 sequences from clone CPh66, and partial homologs from other phytoplasma genomes, are shown. CPh IRU1-TIU2 is a 42-base homolog (bases 511 … 552) of IRU1-TIU1 in clone CPh66. CPh IRU1-TIU1 is a 109-base segment (bases 6527 … 6637) in cloned DNA fragment CPh66. Other intergenic sequences were from AY1, aster yellows (GenBank no. AY497459, bases 2751 … 2859); NJAY, New Jersey aster yellows (GenBank no. AY497461, bases 2551 … 2659); PnWB, peanut witches' broom (GenBank no. AY270153, bases 2789 … 2898); and OY-M, onion yellows-Mild (GenBank no. AP006628) phytoplasmas. Base positions of the intergenic regions in the OY-M genome were: (1), 652023 … 652131; OY-M (2), 735096 … 735204; OY-M (3), 824343 … 824450; OY-M (4), 429795 … 429901; OY-M (5), 459169 … 450278; OY-M (6), 781189 … 781299; OY-M (7), 783029 … 783134; OY-M (8), and 383035 … 383144 (9), respectively. TIU, transposase-associated sequences.

Southern hybridizations, using probes amplified from genomic DNA segment CPh66, were performed to determine whether the CPh phytoplasma genome contained multiple paralogues of coding sequences found in the CPh66 DNA segment. The results revealed multiple paralogues of FliA (DNA-directed RNA polymerase specialized σ factor FliA), thymidylate kinase, and phage-related protein coding sequences that were, respectively, localized on the same CPh DNA HindIII restriction fragments, in relatively close association with one another (Fig. 4). We concluded that these gene sequences were clustered at multiple sites in the CPh phytoplasma genome, and we envisioned that a similar gene clustering may be evident in other phytoplasma genomes. Our reasoning led to the hypothesis that genomic DNA segment CPh66 represents an unusual, multiple-copy gene cluster common to the genomes of other phytoplasmas. To test this hypothesis, we mapped the locations of CPh66 gene and pseudogene homologues on the completed chromosome sequence of OY-M phytoplasma.

Figure 4

 Composite of Southern hybridizations of clover phyllody (CPh) phytoplasma genomic DNA digested with HindIII and probed with labeled tmk (tk), phage-related (ph), or fliA (fl) gene sequences, respectively. Genomic DNA segment CPh66 contained no recognition sites for HindIII. Molecular weight marker was DNA molecular weight marker III.

A total of 16 fliA homologues, 11 tmk homologues, 12 phage-related protein gene homologues, 10 homologues of the clone CPh66 pseudogene region (bases 468 … 4559) and three OY-M annotated smc genes, 19 ATP-dependent Zn protease homologues, eight ORF9 homologues, 12 homologues of intergenic region IRU1, and 17 homologues of intergenic region IRU2 were located and mapped on the completed chromosomal DNA sequence of OY-M phytoplasma (Fig. 5). Some sequences that mapped on the OY-M chromosome were apparently intact genes, while others were truncated genes or possible pseudogenes. Remarkably, homologues of each of these sequences were clustered in just four regions dispersed over approximately two-thirds of the OY-M chromosome. Their nonrandom distribution and close proximity to one another show that these diverse, functionally unrelated gene sequences are frequently clustered together in the OY-M genome. The mosaic assemblies in OY-M exhibit a gene order similar to that in DNA segment CPh66 of CPh phytoplasma (Fig. 1). Their repeated occurrence and sequence variability are likely in part owing to duplication and recombination/rearrangement events.
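The notion of homologues falling into a few regions of a circular chromosome can be made concrete with a minimal sketch that groups mapped positions into clusters whenever neighbouring positions lie within a chosen distance. This is an illustration only, not the mapping procedure used here; the function name, the max_gap threshold, the genome length, and the positions in the example are all hypothetical.

```python
def cluster_positions(positions, max_gap, genome_length):
    """Group positions on a circular chromosome into clusters.

    Neighbouring positions no more than max_gap apart are placed in the
    same cluster. Because the chromosome is circular, the last and first
    clusters are joined when the gap across the origin is small enough.
    """
    pts = sorted(p % genome_length for p in positions)
    if not pts:
        return []
    clusters = [[pts[0]]]
    for p in pts[1:]:
        if p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    # Wrap-around check across the chromosome origin.
    if len(clusters) > 1 and pts[0] + genome_length - pts[-1] <= max_gap:
        clusters[0] = clusters.pop() + clusters[0]
    return clusters

# Hypothetical positions (kb) on a hypothetical 1000-kb circular chromosome.
example = cluster_positions([5, 90, 95, 400, 410, 995], max_gap=20, genome_length=1000)
print(example)
```

The choice of max_gap determines how many regions are reported, so any clustering obtained this way should be read alongside the mapped positions themselves (Fig. 5).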

Figure 5

 Maps of the circular chromosome of onion yellows-Mild (OY-M) phytoplasma (GenBank no. AP006628) showing locations of nucleotide sequences homologous to sequences in the clover phyllody (CPh) phytoplasma DNA genomic segment CPh66. Sequence locations are indicated by short radial lines. Homologous gene sequences or intergenic regions, and their copy numbers (in parentheses), in the OY-M genome were: (a) fliA (16); (b) putative ATP-dependent Zn protease gene homolog (19); (c) phage-related protein gene (12); (d) tmk (11); (e) CPh66 ORF9 (8); (f) 10 homologs of CPh66 pseudogene region (bases 468 … 4559) and three annotated smc genes (13); (g) intergenic region IRU1 (12); (h) intergenic IRU2 (17). Four regions of gene clustering are shown: I, II, III, and IV, respectively. Numbers indicate distance, in kilobases, from start of the chromosomal replication initiator protein gene (dnaA). IRU, intergenic repeat unit.

The PhREPs (Fig. 2) appear to represent a new family of repetitive extragenic palindromes (REPs) that bear an obvious resemblance to some REPs described in other bacteria (Shyamala et al., 1990; Espeli et al., 2001; Aranda-Olmedo et al., 2002; Tobes & Ramos, 2005); notably, previously reported REPs differ among species, exhibit sequence variability within a species, may be partially palindromic, and may appear intragenic in annotated genomes (Tobes & Pareja, 2005). However, Tobes & Pareja and Tobes & Ramos reported that recent work failed to find REPs in the completely sequenced genomes of several species of the genus Mycoplasma, and REPs were not noted in the OY-M genome by Oshima et al. (2004). To our knowledge, this is the first published report of REP-like sequences in a phytoplasma. While the role of REPs in bacterial genomes remains unclear, their features, including potential for forming stem–loop structures in DNA and RNA, have suggested possible influences on genome stability. The potential for stem–loop formation, and the stretch of thymidines at the 3′-end of each REP found in phytoplasma genomes, resemble DNA-encoding structural features of ρ-independent terminators (Farnham & Platt, 1981; d'Aubenton-Carafa et al., 1990), and some phytoplasma REPs may play roles in transcription termination as well as other functions. REPs in other systems may function as hot spots for transposition, genome rearrangements, and recombination, resulting in sequence deletions, and/or may play a role in DNA sequence mobility, protection of cellular transcripts from exonuclease degradation, gene expression, and targeting of transposases and exogenous transforming DNA (Sharples & Lloyd, 1990; Oggioni & Claverys, 1999; Mazzone et al., 2001; Wilde et al., 2001; Wilde et al., 2003). It seems plausible that PhREPs may have similar roles in phytoplasmas, contributing to genome size reductions and to the plasticity of their small, AT-rich genomes.
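The two structural features discussed above, an inverted repeat capable of forming a stem–loop and a run of thymidines at its 3′-end, can be searched for with a minimal sketch such as the following. It is an illustration under stated assumptions, not the procedure by which the PhREPs were identified: the stem length, loop range, and T-run length are hypothetical defaults, only perfect stems are detected, and the example sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8, t_run=4):
    """Find perfect inverted repeats that could form a stem-loop.

    Returns (start, end, has_t_tail) tuples with 1-based inclusive
    coordinates of the whole stem-loop; has_t_tail is True when the
    structure is immediately followed by t_run thymidines.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the right arm
            right = seq[j:j + stem]
            if len(right) < stem:
                break                    # ran off the end of the sequence
            if right == reverse_complement(left):
                tail = seq[j + stem:j + stem + t_run]
                hits.append((i + 1, j + stem, tail == "T" * t_run))
    return hits

# Invented example: 6-nt stem, 4-nt loop, followed by a TTTT run.
toy = "CC" + "GCCGAT" + "TTCG" + "ATCGGC" + "TTTT" + "A"
print(find_stem_loops(toy))
```

Because previously reported REPs may be only partially palindromic, a search restricted to perfect stems would be expected to miss some candidates; allowing mismatches in the stem is a natural extension.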

Taken together, the data support the concept that the linked sequences noted here represent a previously unrecognized feature of phytoplasma genome architecture constituting a multiple-copy, sequence-variable cluster containing PhREPs, conserved intergenic units, and diverse, functionally unrelated gene sequences. Other studies have focused on chromosomal duplications and clustering of functionally related genes (Lawrence & Roth, 1996; Reams & Neidle, 2004), but models proposed to explain the origins of those clusters may not explain the sequence clustering reported here, because the phytoplasmal-clustered sequences appear to be functionally unrelated. Although their role cannot be definitively identified at this time, multiple clusters may afford potential for genome reduction through homologous recombination. It would be interesting to know whether differences in sizes and multiplicity of such clusters account for some of the previously reported (Marcone et al., 1999) but unexplained variations in genome size among closely related phytoplasma strains.


We gratefully acknowledge Jonathan Shao for excellent technical assistance and Yan Zhao for helpful suggestions on the manuscript.

