In silico comparison of pKLC102-like genomic islands of Pseudomonas aeruginosa

Dieco Würdemann, Burkhard Tümmler
DOI: http://dx.doi.org/10.1111/j.1574-6968.2007.00891.x
Pages 244-249. First published online: 1 October 2007

Abstract

The genomic island pKLC102 first detected in Pseudomonas aeruginosa clone C strains can cross species barriers and exhibits the highest mobilization rate of a genomic island known to date. Homologous genomic islands of 81–108 kb in size were identified in the completely sequenced P. aeruginosa strains PA7, PA14, 2192, C3719 and PACS2, but not in strains PAO1 and LES. All pKLC102-like genomic islands are integrated in chromosomal tRNALys genes and share a syntenic set of more than 70 homologous ORFs, part of which are related to DNA replication or mobility genes. The conserved backbone has predilection sites for the uptake of island-specific gene cassettes. A major difference between the islands is the organization of the origin of replication oriV.

Keywords
  • Pseudomonas aeruginosa
  • mobile genomic island
  • pKLC102
  • PAPI-1

Introduction

Genomic islands are large blocks of chromosomally integrated DNA, mostly detected in the vicinity of tRNA genes and typically flanked by direct repeats. They were acquired by horizontal gene transfer and typically confer traits that increase fitness, adaptation to specific habitats, metabolic proficiency or virulence (Dobrindt et al., 2004).

The Pseudomonas aeruginosa chromosome contains three hypervariable regions (Römling et al., 1995, 1997). Two of these regions are the sites for the integration and excision of genomic islands into the two copies of a tRNALys gene at positions PA0976.1 and PA4541.1 of the PAO1 reference genome (Kiewitz et al., 2000; Stover et al., 2000). The 104-kb genomic island pKLC102 (Klockgether et al., 2004) is integrated in PA4541.1. pKLC102 is the most mobile genomic island known to date (Klockgether et al., 2007). It is a hybrid of phage and plasmid elements, and its spontaneous excision rate from the host chromosome is at least 1 × 10^−1. The copy number of extrachromosomal circular pKLC102 in the cell varies between one and 30 during bacterial growth. Qiu et al. (2006) have recently shown that the pKLC102-like island PAPI-1 can be transferred by conjugation into other P. aeruginosa strains, whereby it integrates into one tRNALys gene of the recipient chromosome.

pKLC102-like islands are abundant in the P. aeruginosa population (Klockgether et al., 2007). Hybridization of genomic DNA from 71 strains onto pKLC102 macroarrays indicated that pKLC102-like genomic islands consist of a syntenic backbone of conserved ORFs and a variable set of gene cassettes. Sequence similarity between individual ORFs in the investigated strain panel was judged from the hybridization signal intensity (Klockgether et al., 2007).

During the last year the genome sequence of several P. aeruginosa strains has become available (Lee et al., 2006) in addition to that of strain PAO1 (Stover et al., 2000), providing the opportunity to compare pKLC102-like islands at the nucleotide sequence level. The major findings of the comparative sequence analysis of the pKLC102-like islands in strains SG17M, PA7, PA14, 2192, C3719 and PACS2 are summarized in this report.

Materials and methods

All analysed DNA sequences were retrieved from GenBank (http://www.ncbi.nlm.nih.gov) or from the Pseudomonas Genome Database V2 (http://www.pseudomonas.com/). Putative ORFs were identified by the programs artemis (Rutherford et al., 2000; http://www.sanger.ac.uk/Software/Artemis/), genemark or genemark.hmm (Lukashin & Borodovsky, 1998; Besemer & Borodovsky, 1999). Predicted ORFs were reviewed individually for the assignment of start and stop codons. Public databases were searched for homologous sequences by blastn algorithms using the programs ‘blast two sequences’ (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) or ‘blast with microbial genomes’ (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi) with preset settings.

Informative multilocus 17-marker single nucleotide polymorphism (SNP) genotypes were identified by either low-resolution microarray genotyping (Wiehlmann et al., 2007) or directly from the genome sequence. Strain PA7 could not be classified by SNP genotype due to its high sequence divergence from the other strains. Parsimony analyses were performed with the program ‘pars’ from the package ‘phylip3.66’ (http://evolution.genetics.washington.edu/phylip.html; J. Felsenstein, 2002; Department of Genome Sciences, University of Washington, Seattle). Strains were compared by the SNP code of the core genome (Wiehlmann et al., 2007) and pKLC102-like islands were compared by presence or absence of homologs to pKLC102 ORFs as reference. Homologs were called present if the sequence was more than 75% identical with that of the respective pKLC102 ORF.
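The presence/absence coding described above can be illustrated with a minimal Python sketch. It assumes that a percent identity to the respective pKLC102 ORF is already available for each island (for example from a blastn comparison); the island names, ORF subset and identity values below are hypothetical and serve only to show the 75% cutoff and the count of differing ORFs between two islands. It is not the parsimony analysis performed with ‘pars’.

```python
from itertools import combinations

IDENTITY_THRESHOLD = 75.0  # percent; a homolog is called present only above this value


def call_presence(identities, threshold=IDENTITY_THRESHOLD):
    """Map {orf: percent identity or None} to {orf: True/False}."""
    return {orf: ident is not None and ident > threshold
            for orf, ident in identities.items()}


def differing_orfs(profile_a, profile_b):
    """Count reference ORFs that are present in exactly one of two islands."""
    return sum(profile_a[orf] != profile_b[orf] for orf in profile_a)


# Hypothetical identities (percent) for three reference ORFs in two islands;
# None means that no hit was found at all.
toy_hits = {
    "island_A": {"CP1": 100.0, "CP10": 74.9, "CP103": 92.3},
    "island_B": {"CP1": 98.7, "CP10": 88.0, "CP103": None},
}

profiles = {name: call_presence(hits) for name, hits in toy_hits.items()}
for a, b in combinations(sorted(profiles), 2):
    print(a, b, differing_orfs(profiles[a], profiles[b]))
```

Because the criterion is "more than 75%", a hit with exactly 75% identity is coded as absent in this sketch.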

Results and discussion

Genetic organization of sequenced pKLC102-like islands

pKLC102-like genomic islands were identified in the completely sequenced genomes of P. aeruginosa strains PA7, PA14, 2192, C3719 and PACS2, but not in strains PAO1 and LES (Table 1). Strain PA14 harbors the previously described pKLC102-like genomic island PAPI-1 (He et al., 2004).

Table 1

General characteristics of pKLC102-like genomic islands in completely sequenced Pseudomonas aeruginosa strains

Name    Origin                         Size of pKLC102-like gene island (kb)   pKLC102 homologous ORFs   GI
C3719   Clinical isolate, Manchester   81.121                                  75/103                    84322226
PACS2   Clinical isolate, Washington   106.457                                 98/103                    107105289
2192    CF-isolate, Boston             85.108                                  83/103                    84328724
PA7     Wound isolate, Buenos Aires    85.554                                  78/103                    94418214
PA14    Wound isolate, Boston          107.899                                 75/103                    116048575

The genomic islands are all integrated into one of the two copies of the tRNALys gene. The pKLC102-like island of strain PA7 is incorporated into the tRNALys gene adjacent to PA0976, whereas in all other strains the island is inserted into the tRNALys gene PA4541.1. The islands vary in size from 81 to 108 kb, have a direct repeat of the 3′ end of the tRNALys gene at their right border, and carry integrase (CP103) and chromosome partitioning (CP1) genes at their ends. The synteny of pKLC102 homologs is conserved in all islands.

Homologs exist for 75–98 of the 103 pKLC102 genes in the sequenced strains (Table 1). The highest number of homologous ORFs was detected in strain PACS2. Sixty-eight of the 98 homologs have 100% sequence identity.

Thirty-four of the 103 pKLC102 ORFs are likely to account for the conjugative transfer, recombination of DNA and autonomous replication of the mobile island (Fig. 1). A subset of these genes encodes a novel type IV secretion system (Juhas et al., 2007). Further conserved genes are four helicases and a pilin gene cluster (pilI to pilM). The soj gene at one boundary of the pKLC102-like islands has been demonstrated in PAPI-1 to be essential for mobilization from the chromosome and maintenance of the circular episomal form (Qiu et al., 2006). These 34 genes are conserved in all analysed pKLC102-like gene islands. The presence of the complete set in all sequenced islands indicates that they have retained the ability of mobilization, extrachromosomal replication and horizontal transfer, as has been experimentally demonstrated for pKLC102 (Klockgether et al., 2007) and PAPI-1 (Qiu et al., 2006). In addition, 11 (C3719) or 12 (all other strains) of the 19 pathogenicity factors of PAPI-1 (He et al., 2004) are conserved in the sequenced pKLC102-like islands (Fig. 1).

Figure 1

pKLC102 homologs in sequenced pKLC102-like genomic islands. Presence of a homolog of a pKLC102 ORF (Klockgether et al., 2004) is indicated by a cross (+). ORFs marked in dark gray were ascribed a role in conjugation, recombination and transfer of DNA (Klockgether et al., 2004; Juhas et al., 2007). Homologs of ORFs marked in light gray are known to play a role in virulence of strain PA14 (He et al., 2004). ORFs marked in black exhibit both features.

Seventy-two homologs are present in all islands, the majority of which are conserved hypotheticals of yet unknown function. The syntenic arrangement of these 72 genes defines the backbone. Island-specific gene cassettes are nestled within this backbone at predilected sites (supplementary Figs. S1 and S2). Twenty-six pKLC102 homologs are variably present in the other sequenced islands (CP10, CP23-26, CP31-32, CP44-45, CP57-62, CP84-86, CP94-101) (Fig. 1).
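As a quick consistency check of these counts, the listed ORF ranges can be expanded and tallied. The following Python sketch is illustrative only; the helper name expand_cp_ranges is hypothetical, and the final figure assumes that the 72 backbone ORFs and the 26 variably present ORFs are disjoint sets.

```python
def expand_cp_ranges(spec):
    """Expand 'CP10, CP23-26' into ['CP10', 'CP23', 'CP24', 'CP25', 'CP26']."""
    orfs = []
    for item in spec.split(","):
        item = item.strip()
        if not item.startswith("CP"):
            raise ValueError(f"unexpected ORF label: {item!r}")
        first, _, last = item[2:].partition("-")
        start = int(first)
        stop = int(last) if last else start
        orfs.extend(f"CP{n}" for n in range(start, stop + 1))
    return orfs


# Ranges exactly as listed in the text for the variably present homologs
VARIABLE = "CP10, CP23-26, CP31-32, CP44-45, CP57-62, CP84-86, CP94-101"
variable_orfs = expand_cp_ranges(VARIABLE)

BACKBONE = 72    # homologs present in all islands
TOTAL = 103      # ORFs annotated in pKLC102

print(len(variable_orfs))                     # variably present homologs
print(BACKBONE + len(variable_orfs))          # ORFs with a homolog in at least one island
print(TOTAL - BACKBONE - len(variable_orfs))  # ORFs without a homolog in any island
```

Under the stated assumption, the ranges expand to 26 ORFs, so that 98 pKLC102 ORFs have a homolog in at least one sequenced island, which matches the maximum of 98 homologs found in strain PACS2, and five pKLC102 ORFs have no homolog in any of them.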

Island-specific genes preferentially integrated into the same sites of the pKLC102 backbone (Fig. 2). PAPI-1 harbors the largest portion of 26 kb of island-specific genes. The island-specific part in the other four islands varies from 4.9 kb (strain 2192) to 8.7 kb (strain C3719) (Fig. 2).

Figure 2

pKLC102-like genomic islands in completely sequenced Pseudomonas aeruginosa strains. The 103 pKLC102 ORFs are numbered sequentially at equal distance on the abscissa. The islands in strains 2192, PACS2, PA7, PA14 and C3719 exhibit synteny with the pKLC102 prototype of clone C strain SG17M (Klockgether et al., 2004). Homologs of the pKLC102 ORFs are depicted by a continuous line. Gaps in the line indicate that a homologous gene of this ORF is missing in this strain. The map positions of strain-specific insertions are shown by triangles. The number (see gene annotation in supplementary Fig. S1) and the size of each insertion in bp are given beneath the triangle. Numerous insertion sites are shared by two or more strains, indicating that there are hot spots for the insertion and release of strain-specific sequence.

With the exception of the already characterized pathogenicity island PAPI-1 in PA14, the annotation could ascribe a potential function to only five of the total 25 ORFs of the 14 island-specific inserts in strains 2192, C3719, PACS2 and PA7 (supplementary Fig. S1). Figure 3 visualizes the sequence similarity between the island-specific cassettes that are absent in pKLC102. Homologous sequences either encompass the complete gene cassette (cassette 1 in 2192 and cassette 3 in PA7) or are adjacent to one boundary (all other cases with the exception of cassette 5 in PA7) (Fig. 3, supplementary Fig. S2). Cassette 3 in C3719 encodes a transposase, a homolog of which is also present in cassette 5 of PA7. The localization of homologous sequences at one end of the gene cassettes indicates that the pKLC102-like genomic islands diversified from an ancestor by multiple recombinations at predilected segments giving rise to a mosaic of absolutely conserved, optionally conserved and unique sequences.

Figure 3

Comparison of strain-specific insertions in pKLC102-like genomic islands. The insertions of each gene island are displayed in light gray and numbered as they appear in the respective gene island sequence (see gene annotation in supplementary Fig. S1). The boundaries of the syntenic set of pKLC102 homologs are indicated by black diagonal slashes. Double light gray diagonal slashes indicate large parts of PAPI-1 with no homology to ORFs of the cargo regions in the other strains. Map position and size of homologous sequences that are absent in pKLC102, but present in two or more other sequenced islands, are visualized by gray stripes.

Similarity of pKLC102 islands is not associated with genetic relatedness of strains

The genetic relatedness of the sequenced strains, classified by SNP patterns representative of the conserved core genome (Wiehlmann et al., 2007; Fig. 4a), was compared with the relatedness of their pKLC102-like islands (Fig. 4b). The parsimony analysis revealed no correlation between the relatedness of the SNP patterns and that of the pKLC102 composition (Fig. 4). This finding is in accordance with the analysis of 240 P. aeruginosa strains of diverse habitats and geographic origin, which showed that pKLC102 represents the most promiscuous part of the P. aeruginosa genome, with the least association between core and accessory genome (see Fig. 6 in Wiehlmann et al., 2007).

Figure 4

Dendrograms derived from parsimony analysis. (a) Genetic relatedness of Pseudomonas aeruginosa strains based on a genome-wide 17-marker SNP genotype that is representative for the core genome of P. aeruginosa (Wiehlmann et al., 2007). Strain PA7 was excluded because of its high sequence diversity and anomalous SNP pattern. (b) Genetic relatedness of pKLC102-like genomic islands. The scalebars indicate the number of different SNP genotypes (a) or of nonhomologous ORFs (b).

The oriV region of pKLC102-like islands

The region between CP18 and CP19 in pKLC102 was considered by bioinformatic analysis as the possible origin of replication (oriV) (Klockgether et al., 2004). The left part of this region consists of an A+T-rich region preceded by four palindromic sequences 15–27 bp in length, which can form loops (Fig. 5a). The core consists of the 15-bp palindromic sequence GTTCGGCATCCGAAC (complementary arms GTTCGG and CCGAAC). Figure 5a shows that the left parts of oriV are virtually identical in the five sequenced pKLC102-like islands. The only subtle differences are one exchange in the A+T-rich region, in total five nucleotide substitutions in 24 loop regions and the deletion of loop 2 in strain 2192.

Figure 5

Comparison of the oriV region of pKLC102-like genomic islands. For each island, the accession number and the positions of the first and last nucleotides are given. (a) Left part of oriV. The A+T-rich region is displayed in yellow, complementary palindromic sequences are marked in either green or light blue. Nucleotide exchanges are highlighted in grey. (b) Right part of oriV. The conserved backbone structure is highlighted in gray. The variable part consists of modules of variable size (2–7 bp). Identical sequences at the same position within the repeat share the same color (except for unique blocks highlighted in pink). Single nucleotide exchanges within the backbone or a module are colored in yellow (for cytosine), green (for guanine), light blue (for adenine) and red (for thymidine). Deletions and insertions are depicted in white.

The right part of the oriV region in pKLC102 is made up of 16 highly conserved 57-bp direct repeats (Fig. 5b). All direct repeats are flanked at both termini by the 19-bp palindrome GTGGTGCCAGTGGCACCAC (complementary arms GTGGTGCCA and TGGCACCAC) (Fig. 5b). Each of the 16 direct repeats consists of a core (marked in gray) that is always present in the same position with a nucleotide identity of nearly 100% and a set of modules with similar but not identical sequence. The modules are 2–7 bp in length and appear at the same position in all repeats. This repeat structure is found in all analysed gene island sequences (Fig. 5b).
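The palindromic character of the two sequences quoted in this section can be checked computationally. The Python sketch below is an illustration and not part of the original analysis; the function names are hypothetical. It determines how many terminal nucleotides of a sequence are the reverse complement of the opposite end, and how many central nucleotides remain unpaired.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_and_loop(seq):
    """Return (arm length, center length) of the longest terminal inverted repeat."""
    arm = 0
    for k in range(1, len(seq) // 2 + 1):
        # The first k bases must pair with the last k bases read in reverse
        if seq[:k] == reverse_complement(seq[-k:]):
            arm = k
        else:
            break
    return arm, len(seq) - 2 * arm


for name, seq in [("left oriV core", "GTTCGGCATCCGAAC"),
                  ("direct-repeat flank", "GTGGTGCCAGTGGCACCAC")]:
    arm, center = stem_and_loop(seq)
    print(name, len(seq), seq[:arm], seq[len(seq) - arm:], center)
```

For the 15-bp sequence the sketch reports 6-bp complementary arms around a 3-bp center, and for the 19-bp sequence 9-bp arms around a single central nucleotide, so both are inverted repeats able to form a stem-loop rather than perfect palindromes over their full length.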

The right oriV region of strain PACS2 is identical to that of pKLC102 (Fig. 5b). oriV of strain PA7 has only 12 complete 57-bp repeats. The ninth repeat is interrupted by 285 bp of additional noncoding sequence that destroys the integrity of the right part of oriV. In strains 2192 and PA14, only six and three direct repeats are present, respectively. The repeat region is most truncated and distorted in strain C3719 (Fig. 5b). In summary, the oriV regions of the six sequenced pKLC102-like islands share a similar organization, whereby the left part is more conserved than the right part with its complex arrangements of modules that give rise to similar but not identical 57-bp direct repeats.

Conclusion

The comparison of pKLC102-like genomic islands in sequenced P. aeruginosa strains revealed that all islands share a large set of homologs, the majority of which encodes elements necessary for mobilization and transfer of the island. Gene cassettes nestled within this backbone confer the strain-specific genetic repertoire of the island.

Supplementary material

The following supplementary material is available for this article:

Fig. S1. Gene annotation, homologous genes and the coordinates and direction of transcription of hypothetical ORFs in the island-specific gene cassettes of the Pseudomonas aeruginosa strains 2192, C3719, PA7, PA14 and PACS2.

Fig. S2. Coordinates of homologous sequences between the island-specific cassettes that are absent in pKLC102.

This material is available as part of the online article from: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1574-6968.2007.00891.x (This link will take you to the article abstract).


Acknowledgements

D.W. has been a recipient of stipends of the Christiane Herzog Stiftung and the DFG-supported International Research Training Group ‘Pseudomonas: Pathogenicity and Biotechnology’. The authors gratefully acknowledge the public access to P. aeruginosa genome sequence data, which made it possible to search for pKLC102-like genomic islands. The Sanger Institute Pathogen Sequencing Unit is sequencing the genome of P. aeruginosa Liverpool Epidemic Strain (LES) in collaboration with Dr Craig Winstanley and Prof. C. Anthony Hart, University of Liverpool, Dr Robert E.W. Hancock, University of British Columbia, and Dr Fiona S.L. Brinkman, Simon Fraser University, British Columbia. The clinical isolate PACS2 has been sequenced by the University of Washington Genome Center, Seattle. The Microbial Sequencing Center at the Broad Institute, Cambridge, MA, has sequenced the cystic fibrosis isolates 2192 (mucoid isolate, Boston) and C3719 (Manchester epidemic strain). This project is being led by Dr Stephen Lory, Harvard Medical School, Boston. The serotype O12 multidrug-resistant strain PA7 has been sequenced by The Institute for Genomic Research (TIGR), Rockville, MD, with Prof. Paul Roy, Université Laval, Quebec City, as external collaborator.

Footnotes

  • Editor: Ross Fitzgerald
