Integrons are genetic elements known for their role in the acquisition and expression of genes conferring antibiotic resistance. Such acquisition is mediated by an integron-encoded integrase, which captures genes that are part of gene cassettes. To test whether integrons occur in environments with no known history of antibiotic exposure, PCR primers were designed to conserved regions of the integrase gene and the gene cassette recombination site. Amplicons generated from four environmental DNA samples contained features typical of the integrons found in antibiotic-resistant and pathogenic bacteria. The sequence diversity of the integrase genes in these clones was sufficient to classify them within three new classes of integron. Since they are derived from environments not associated with antibiotic use, integrons appear to be more prevalent in bacteria than previously observed.
Integrons are genetic elements commonly found in multidrug-resistant and pathogenic bacteria [1,2] and are characterized by their ability to integrate and excise gene cassettes by site-specific recombination (Fig. 1) [3–5]. Integrons possess a gene, intI, which encodes a site-specific recombinase [1,6] and an attachment site, attI, into which individual gene cassettes are inserted [5,7,8]. Gene cassettes are among the smallest mobile elements known and include only an open reading frame (ORF) and a recombination site, known as a 59-base element (59-be), that is recognized by the integron-encoded integrase, IntI [8,9].
Fig. 1. Integron structure. The arrows indicate binding sites for HS298 and HS286. A: A generalized class 1 integron showing the relative positions of the intI1 gene, attI1 and Pc. 59-be sites are shown as ovals. The example of a two-cassette array is shown, with the first cassette being inserted/excised at attI1. B: Representation of the structure of the three integron classes from environmental samples. The relative positions of the respective intI genes, putative Pc promoters and putative attI sites are indicated, as are the positions of the open reading frames in each of Pg2 (class 6) and Pg11 (class 7).
Over 60 gene cassettes have been identified in multidrug-resistant bacteria (S. Partridge and R. Hall, personal communication). The recombination sites associated with these cassettes vary in both sequence and length, and each 59-be is unique. Nonetheless, these elements share a number of features, including about 25 bp at each end that conform to consensus sequences that are imperfect inverted repeats of each other (Fig. 2A). The structure of each outer end is also similar to that of a simple site commonly used by the tyrosine recombinases and comprises a pair of inversely oriented integrase binding domains separated by a spacer of 7 or 8 bp. The recombination crossover has been localized between the G and first T of a seven-base core site (region 1R in Fig. 2A) located in the right-hand simple site.
Fig. 2. Structure of 59-be and attI sites. A: The consensus sequence for 59-be sites and adjacent sequence is shown. Numbers between the outer ends indicate the range of numbers of additional bases that make up the two halves, with the overlying arrows highlighting the complementarity within this region. Consensus bases in upper case are present in two-thirds or more of the 59-be sites and bases in lower case are present in half or more of the 59-be sites. The likely extents of the two simple sites are shown and putative IntI binding domains are boxed. The vertical arrow indicates the recombination crossover point. The horizontal arrow underlying the sequence within the left-hand simple site indicates the position of HS286 and the direction of polymerization from this primer. The figure is adapted from previously published work. B: The 7-bp core site (region 1 and analogous to 1R in a 59-be) and sequences related to the consensus (GTTRRRY) for this region are in bold and numbered, with the orientation indicated by the arrows under the sequence. Sites 1 and 2 constitute part of a potential simple site and sites 3 and 4 are core site-like direct repeats. The region overlined in the Bal3 sequence indicates the likely binding site for the primer HS286. The vertical arrow indicates the known crossover point for insertion of a gene cassette.
Integrons from different isolates are highly variable with respect to the number, order and type of gene cassettes they possess. However, the intI gene and attI site in many integrons recovered from multidrug-resistant strains are identical, or nearly so, and these are now designated class 1 integrons [1,2]. Class 1 integrons also possess a promoter for cassette-associated genes, Pc [1,11], as shown in Fig. 1.
Apart from the class 1 integrons, four other integron classes have been identified based on sequence differences between the integrases that they encode. Members of integron classes 2 and 3, like class 1 integrons, contain antibiotic resistance gene cassettes [9,12,13]. The class 4 integron is found on the small chromosome of Vibrio cholerae and the associated gene cassettes include genes other than antibiotic resistance genes [14,15]. The partial sequence for a fifth intI gene, intI5, associated with Vibrio mimicus, has been deposited in the databases (accession number AF180939) and has been reported as having 75% identity with intI4.
The integron-encoded (IntI) integrases are part of a large and diverse family of site-specific recombinases. All members of this larger tyrosine family of recombinases nonetheless have characteristic features including the presence of two conserved regions, designated box 1 and box 2, within which are located four very highly conserved residues. The tyrosine recombinases also possess three shorter conserved regions, designated patch I, II, and III. One of the invariant residues, a tyrosine within 20 or so amino acids of the carboxy-terminus, is directly involved in DNA strand scission. IntI integrases form a relatively tight homology group within the tyrosine recombinase family, with proteins from the different integron classes displaying 41–57% pairwise identity. In addition, IntI1–4 all possess an insert of about 16 amino acids between patch II and patch III that is not found in other tyrosine recombinases.
The attI sites of the respective integron classes are also different, with the only common feature being a core site that conforms to the same consensus, GTTRRRY, as that seen in the core site (1R) of 59-be sites. For attI1, it has been shown that the recombination crossover is between the G and first T of this site, as is the case for 59-be sites. Apart from a core site, attI sites also display little identity with 59-be sites [20–23]. For attI1, 2 and 3, a single potential simple site can be identified, although this structure is not present in attI4. All integron classes do, however, possess a common architecture, with the respective attI site being located near the 5′ end of their intI gene, and with genes that are associated with inserted cassettes being transcribed in the opposite direction to intI, as shown in Fig. 1 [2,14].
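The core-site consensus GTTRRRY uses the standard IUPAC ambiguity codes (R = A or G, Y = C or T), so candidate core sites can be located with a simple pattern search. The following is a minimal sketch only, not the procedure used in this study; the function names and the test sequences are hypothetical, and a match to the consensus is of course no evidence of recombination activity.

```python
import re

# IUPAC codes needed for the core-site consensus (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def reverse_complement(seq):
    """Reverse complement of an unambiguous (A/C/G/T) DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_core_sites(seq, consensus="GTTRRRY"):
    """Return sorted (position, strand, site) tuples for consensus matches.

    Positions are 0-based offsets of the leftmost base on the forward strand.
    A lookahead is used so that overlapping matches are also reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Convert the reverse-strand offset back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - len(consensus), "-", m.group(1)))
    return sorted(hits)


# Hypothetical test sequence containing one core site on the forward strand.
print(find_core_sites("AAGTTAGGCAA"))
```

Both strands are searched because the orientation of a site relative to intI is informative when judging whether it could form part of an attI site or a 59-be.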
The PCR strategy used here takes advantage of conservation of sequences near the 3′ end of intI genes and sequence similarity in 59-base elements, as well as of the conserved structure of integrons. Using primers designed to target these regions, integron sequences could be directly recovered from environmental DNA. These near complete integrons belong to three previously undescribed classes.
2 Materials and methods
2.1 DNA template isolation
Soil samples were collected from Balmain Power Station and Homebush Bay (Sydney, Australia) and Pulgamurtie in Sturt National Park (western New South Wales, Australia). DNA was isolated from 400±20 mg soil.
2.2 PCR amplification
Primers for regions corresponding to a conserved C-terminal sequence in IntI integrases (HS298, 5′-TGGATCCCACRTGNGTRTADATCATNGT-3′) and to conserved sequences in 59-base elements (HS286, 5′-GGGATCCTCSGCTKGARCGAMTTGTTAGVC-3′) were designed. Each primer carries a 5′ linker that includes a BamHI site (GGATCC). The PCR was carried out using standard techniques with the following cycling program: [(94°C×2′30″)]×1, [(94°C×30″)(50°C×30″)(72°C×2′30″)]×35, [(72°C×2′30″)]×1. The PCR product (1 μl) from the Balmain template was re-amplified using the same conditions in order to increase product yield.
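Both primers are degenerate, i.e. each is a pool of related oligonucleotides. As an illustration only (this calculation is not part of the original methods), the sketch below uses the standard IUPAC ambiguity codes to count the distinct sequences in each pool and to locate the BamHI recognition sequence in the linker. The helper names degeneracy and expand are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in Section 2.2 (5' to 3').
HS298 = "TGGATCCCACRTGNGTRTADATCATNGT"
HS286 = "GGGATCCTCSGCTKGARCGAMTTGTTAGVC"
BAMHI_SITE = "GGATCC"


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Generate every unambiguous sequence encoded by a degenerate primer."""
    return ("".join(bases) for bases in product(*(IUPAC[b] for b in primer)))


for name, primer in (("HS298", HS298), ("HS286", HS286)):
    print(name, len(primer), "nt, degeneracy", degeneracy(primer),
          ", BamHI site at offset", primer.find(BAMHI_SITE))
# HS298 is 28 nt with 192-fold degeneracy; HS286 is 30 nt with 48-fold degeneracy.
```

In both primers the BamHI site starts at the second base, so the degenerate positions all lie in the portion designed to anneal to the target.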
2.3 Ligation and transformation
PCR product was gel purified using the BRESAclean DNA Purification Kit (Geneworks, Australia) as per the manufacturer's instructions. Purified DNA (approximately 100 ng) was ligated into the pGEM-T Easy Vector (Promega, Madison, WI, USA) following the manufacturer's instructions. The ligation mixture was transformed by heat shock into JM109 Escherichia coli competent cells (cat. #L2001, Promega) following the manufacturer's protocol.
2.4 Plasmid isolation and sequencing
Plasmid from clones containing insert was isolated from 3 ml overnight cultures using the Wizard Plus Miniprep DNA Purification System (Promega) as per the manufacturer's instructions. DNA sequencing of cloned inserts was performed at the Macquarie Sequencing Facility (Macquarie University, Australia) using an ABI Prism 377 (Perkin-Elmer Biosystems), using primers flanking the insert region, pGEMF (5′-CCGACGTCGCATGCTCC-3′) and pGEMR (5′-CTCCCATATGGTCGACCTG-3′).
The vector in all cases was pGEM-T Easy. pMAQ675 contains the Pg2-type insert, pMAQ676 the Pg11-type insert and pMAQ677 the Bal3-type insert. Database accession numbers are: AF314191 (Pg2), AF314190 (Pg11) and AF314189 (Bal3).
2.6 Sequence retrieval and analysis
Sequence analyses were performed using programs available through the BioNavigator package (eBioinformatics Pty. Ltd., http://www.eBioinformatics.com) unless otherwise stated. ORFs showing sequence relationship to IntI1 were aligned to a representative set of tyrosine recombinase proteins using Clustal W and the alignment manually optimized in GeneDoc. Sequences from Treponema denticola, Geobacter sulfurreducens and Shewanella putrefaciens unfinished genomes were identified by tblastn searches of microbial genomes through the NCBI web site http://www.ncbi.gov/Microb_blast/unfinishedgenome.html using IntI1 as the query sequence. The peptide sequences used in subsequent analyses are inferred from preliminary sequence data from The Institute for Genomic Research website at http://www.tigr.org. Accession numbers listed for these sequences (Table 1) refer to the TIGR sequence fragment.
Table 1. Sequence identity relationships (a) for integron integrases and environmental integrase sequences
1. Bal3 IntI8
2. Pg2 IntI6
3. Pg11 IntI7
8. Sputr IntI
9. Gsulf IntI
10. Tdent IntI
(a) Pairwise sequence identity was calculated in GeneDoc using the length of the shorter sequence as the denominator.
(b) Peptides 1–3 are from this study, peptides 4–6 have been described previously, and peptides 8–10 are from unfinished genomes (see Section 2). IntI4 is the class 4 integron integrase from Vibrio cholerae. Accession numbers for sequences obtained from database records are: 4, D50438; 5, A42646; 6, L10818; 7, AF055586; 8, Sputr 6407; 9, Gsulf 51; 10, Tdent 7312.
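The identity measure described in the table footnote (identical aligned residues divided by the length of the shorter sequence) can be sketched as follows. This is an assumed reading of that definition, not GeneDoc's actual implementation; the function name and the toy peptides are hypothetical.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences.

    The denominator is the ungapped length of the shorter sequence, as
    described in the table footnote. Columns where both rows hold a gap
    are not counted as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Toy example: 4 identical positions, shorter sequence has 5 residues.
print(pairwise_identity("MKTAYLV", "MRT--LV"))  # 80.0
```

Using the shorter sequence as the denominator prevents a partial sequence, such as an integrase truncated by the position of a PCR primer, from being penalized for residues that were never recovered.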
To recover integrons from environmental DNA by PCR, two primers were designed. HS298 is specific for the coding region of a conserved domain located 14–22 amino acids from the carboxy-termini of IntI1–4 (Fig. 3). The second primer, HS286, was designed to bind within the left-hand simple site of 59-be sites (Fig. 2A). Since the structure of integrons is preserved across the known classes, the use of these primers in a PCR should allow the recovery of at least one near complete gene cassette, the attI site, and all but about the last 20 codons of intI (Fig. 1). In control experiments with class 1 integrons this was the case (data not shown).
Fig. 3. Protein alignment of environmental integrases with known integron-encoded integrases. Residues present in all seven integron-encoded integrases or that fall into functionally equivalent groups are shaded. The four very highly conserved residues from across the entire tyrosine family of recombinases are printed white on black. The boxed region indicates the IntI patch. The overlined region indicates the amino acid residues from which the primer sequence for HS298 was deduced. The residues are numbered for IntI1.
Using HS286 and HS298, environmental DNA from four sites, Pulgamurtie, Balmain, and two locations at Homebush, was sampled for the presence of integrons by PCR. Products greater than 1000 bp were obtained from all four sites. A total of 12 recombinants were recovered and sequenced from three of the four sites, with eight of these derived from Pulgamurtie, two from Balmain, and two from Homebush. Six of the eight clones from Pulgamurtie (Pg11-type, hereafter called Pg11) had identical inserts of 1329 bp, while the remaining two clones (Pg2) were different from these, but identical to each other, with an insert of 1297 bp. The inserts in the two clones derived from Balmain DNA (Bal3) were both 1197 bp in length and were 99% identical, differing at 11 positions, four of which change the sequence of the predicted IntI protein. It is likely that these clones differ as a result of substitutions introduced during the PCR. Two non-identical clones of about 900 bp were obtained from Homebush. Pg2, Pg11, Bal3 and the two Homebush clones were subjected to database searching. Neither clone from Homebush had significant ORFs or revealed similar sequences in the databases, and so they were not analyzed further. In contrast, Pg2, Pg11 and Bal3 all included an ORF, the predicted product of which showed highly significant matches to DNA integrases exclusively (data not shown) and integron integrases in particular (Fig. 3; Table 1).
The predicted integrase proteins encoded by Pg2, Pg11, and Bal3 were named IntI6, IntI7, and IntI8 respectively, and are aligned with, and compared to, IntI1–4 (Fig. 3). In pairwise comparisons, the level of amino acid identity between the environmental integrases (IntI6, 7, and 8) and the known IntI integrases (42–50%) was within the range of values seen for comparisons between IntI1–4 (41–57%) (Table 1). In contrast, the identity between the environmental integrases and non-IntI integrases was less than 25% (data not shown). These data suggest that the intI genes carried by Pg2, Pg11, and Bal3 may each be part of an integron. This notion is supported by the fact that the three environmental integrases possess the additional motif of amino acids that is unique to the IntI integrases (boxed in Fig. 3). It is further noteworthy that this motif is relatively conserved, with seven of the 16 amino acids invariant across the seven proteins (Fig. 3).
Integrons are characterized by their ability to capture mobile gene cassettes. Consequently, if the environmental integrases are part of integrons, it is likely that a gene cassette would be present at the HS286 end of each clone. Although the location of the primer precludes the recovery of a complete 59-be, and therefore a complete cassette, the presence of a cassette could be inferred by the presence of an ORF extending up to, or through, the primer sequence. On examination of the sequences, an ORF of 131 bp was identified in Pg2 and an ORF of 200 bp was identified in Pg11. Notably, the predicted termination codon for both ORFs, a TAA, is within the site 1L (Fig. 2A) to which HS286 was, in part, designed. This sequence is highly conserved in 59-be sites and is commonly a termination codon for cassette-associated genes. Neither ORF matched any sequences in the databases (data not shown) and no obvious ORF could be found in Bal3.
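The software and settings used to identify these ORFs are not stated, so the following is only a minimal sketch of the kind of scan involved. It assumes ATG start codons, the three standard stop codons, and a search of the three reading frames of a single strand; find_orfs, min_len and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=90):
    """Return (start, end, frame) for ATG-initiated ORFs on one strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Only the first ATG upstream of each stop is used as the start.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Hypothetical 16-nt sequence with one short ORF (ATG AAA CCC TAA) in frame 2.
print(find_orfs("CCATGAAACCCTAAGG", min_len=9))  # [(2, 14, 2)]
```

For a clone such as Pg2 or Pg11, the scan would be applied to the strand read towards the HS286 end, and the stop-codon position compared with the location of the primer sequence.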
Another characteristic of integrons is the presence of the attI site essential for the insertion of gene cassettes. In known integrons this site is located at, or near, the start codon of cassette-associated ORFs as it represents a site of cassette insertion [5,20]. Accordingly, Pg2 and Pg11 were examined in this region for attI-like sequences. As shown in Fig. 2B, both clones possess sequences suggestive of attI sites. In Pg2, characteristic features include a putative simple site and adjacent direct repeats. This structure and arrangement is very similar to that seen for attI1 and, in attI1, these sequences are important in both IntI binding and recombination activity [20,22,23]. While Pg11 lacked obvious direct repeats, a putative simple site was present, an arrangement analogous to that seen for attI3. In Bal3 the distance of the HS286 primer binding site from the start of the intI gene, about 200 bp, is consistent with the position of attI sites in known integrons (Fig. 2B). As HS286 was designed to a simple site structure, it is possible that, in Bal3, the sequence of this putative attI site has allowed priming to occur from this location. No other obvious attI-like sequence could be found between the HS286 end of the Bal3 clone and the start of intI.
Most gene cassettes do not include a promoter. Cassette-associated genes are therefore dependent on an integron-located promoter, Pc, which has been identified in class 1 integrons [11,27] and the presence of which is inferred in the other classes. Each of the three putative environmental integrons was examined for possible cassette promoters. In each case, a possible promoter was identified, and their relative positions are indicated in Fig. 1. The sequences of the −35 and −10 regions, with the spacing between them given in parentheses, were: TATAAA(17)TATACT (Pg2), TCGAGG(16)TAAAAT (Pg11), and CTGTAA(17)AAGAAT (Bal3).
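One simple way to gauge such candidates is to count matches to a reference promoter. The sketch below compares each putative Pc with the widely cited Escherichia coli σ70 consensus, TTGACA(17)TATAAT. The choice of this consensus is an assumption made for illustration; the criteria actually used to identify the putative promoters are not stated.

```python
# E. coli sigma-70 consensus hexamers (assumed reference for illustration).
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"

# Putative Pc promoters as reported in the text: (-35, spacer length, -10).
PUTATIVE_PC = {
    "Pg2": ("TATAAA", 17, "TATACT"),
    "Pg11": ("TCGAGG", 16, "TAAAAT"),
    "Bal3": ("CTGTAA", 17, "AAGAAT"),
}


def matches(hexamer, consensus):
    """Count positions at which a hexamer agrees with the consensus."""
    return sum(1 for a, b in zip(hexamer, consensus) if a == b)


def score_promoters(promoters):
    """Map each clone to (-35 matches, spacer length, -10 matches)."""
    return {
        name: (matches(m35, CONSENSUS_35), spacer, matches(m10, CONSENSUS_10))
        for name, (m35, spacer, m10) in promoters.items()
    }


for clone, (s35, spacer, s10) in score_promoters(PUTATIVE_PC).items():
    print(f"{clone}: -35 {s35}/6, spacer {spacer} bp, -10 {s10}/6")
# Pg2: -35 3/6, spacer 17 bp, -10 5/6
# Pg11: -35 3/6, spacer 16 bp, -10 5/6
# Bal3: -35 3/6, spacer 17 bp, -10 4/6
```

On this measure the −10 regions are closer to the reference than the −35 regions, although sequence similarity alone cannot establish promoter activity.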
The presence of novel intI genes together with putative gene cassettes, and putative attI sites in at least two of the three clonal types recovered, strongly implies the presence of previously uncharacterized integrons. Consequently we have designated the integrons represented by Pg2, Pg11, and Bal3 class 6, 7, and 8 respectively.
From a sampling of three environments, we have been able to recover DNA fragments that include features characteristic of known integrons. In two cases, this includes evidence of an inserted gene cassette at a site possessing the architecture of known attI sites. Consequently, we conclude that these two recovered sequences are derived from complete integrons. As these environmental integrons are significantly different from those previously described, they are designated class 6 and class 7. While the third clone, recovered from Balmain, did not possess a partial gene cassette, other features are suggestive of it being part of a complete integron. These include the presence of a putative cassette promoter and sequences resembling an attI site at the region of binding of one of the PCR primers used to generate the amplicons. On the weight of evidence therefore, we conclude that this clone is derived from a third novel integron, which we have designated class 8.
Recent genome sequencing initiatives have highlighted the fact that lateral gene transfer is more prevalent in the bacterial world than previously realized and that elements such as plasmids, transposons and integrons are obvious candidates for facilitating such transfer. The recovery of new environmental integrons from two of the three locations sampled is consistent with the hypothesis that integrons are a common feature of bacterial populations and that they are not confined to pathogenic and multidrug-resistant bacteria. The two environments from which the new integron classes were recovered are quite disparate. Pulgamurtie is a site within a semi-arid National Park relatively free from anthropogenic disturbance and Balmain is an urban site that has suffered long-term industrial contamination. Neither site has had any known exposure to antibiotics. The fact that integrons are present in these environments reinforces the conclusion, first drawn from the class 4 integron, that these elements are involved in the mobilization of genes other than those involved in antibiotic resistance.
In further support of the notion that integrons are more widespread than previously realized, we have identified integrase sequences in the T. denticola, G. sulfurreducens, and S. putrefaciens genomes (Table 1) that are not of classes 1–4. As they are derived from genome sequencing initiatives that are only partly completed, it is not yet possible to identify whether other integron components such as an attI site are present.
The stretch of about 16 amino acids located between patch II and patch III (Fig. 3), which is present in IntI1–4 and 6–8, is relatively conserved, with seven of the 16 residues invariant. As this motif appears to be unique to the IntI integrases we designate it the IntI integrase-specific patch (IntI patch). It is noteworthy that this IntI patch is present in the integrases in T. denticola, G. sulfurreducens, and S. putrefaciens (data not shown).
The technique described here is capable of recovering most of the genetic system known to be responsible for the site-specific capture of gene cassettes as well as the captured genes themselves. It is a novel method for surveying the prevalence of this gene capture system in the microbial world. We are presently using this technique to sample a greater range of environment types than those sampled here. It is also apparent to us that, if indeed this system has a general role in the transfer of genes between species, this approach may be a practical one for the recovery of new genes that are unlikely to be identified by whole genome sequencing strategies given that such ‘floating’ genes may not reside in the chromosome of a defined organism.
This work was supported by a Research Innovation Fund grant from Macquarie University. We thank Dr. Ruth Hall for a critical appraisal of the manuscript. This is publication number 335 from the KCBB.