
Cloning and sequence analysis of the putative rifamycin polyketide synthase gene cluster from Amycolatopsis mediterranei

Thomas Schupp, Christiane Toupet, Nathalie Engel, Stephen Goff
DOI: http://dx.doi.org/10.1111/j.1574-6968.1998.tb12861.x 201-267 First published online: 1 February 1998


The 54-kbp Type I polyketide synthase gene cluster, most probably involved in rifamycin biosynthesis by Amycolatopsis mediterranei, was cloned in E. coli and completely sequenced. The DNA encodes five closely packed, very large open reading frames reading in one direction. As expected from the chemical structure of rifamycins, ten polyketide synthase modules and a CoA ligase domain were identified in the five open reading frames which contain one to three polyketide synthase modules each. The order of the functional domains on the DNA probably reflects the order in which they are used because each of the modules contains the predicted acetate or propionate transferase, dehydratase, and β-ketoacyl-ACP reductase functions, required for the respective step in rifamycin biosynthesis.

Key words
  • Amycolatopsis mediterranei
  • Rifamycin
  • Ansamycin
  • Polyketide synthase
  • Polyketide synthase gene cluster
  • Cloning
  • Sequence analysis

1 Introduction

Rifamycins form an important group of macrocyclic antibiotics which inhibit bacterial DNA transcription very specifically by interacting with the DNA-dependent RNA polymerase [1]. These antibiotics are most effective against Gram-positive bacteria and are widely used clinically against Mycobacterium tuberculosis infections. Some rifamycin derivatives have been found to be effective against difficult-to-treat M. avium and M. intracellulare infections in patients with AIDS [2], and against pneumococci with decreased susceptibility to penicillin [3]. Rifamycins with lipophilic side chains were found to be active at higher concentrations against RNA tumor viruses [4].

The starting compound for semisynthetic and therapeutically useful rifamycins is rifamycin B, which is produced by the soil bacterium Amycolatopsis mediterranei, belonging to the order Actinomycetales [5]. Rifamycin B is a macrocyclic polyketide containing a naphthoquinone chromophore spanned by a long aliphatic bridge. Incorporation studies with 13C-enriched precursors and biosynthetic analysis have demonstrated that 3-amino-5-hydroxybenzoic acid, derived from the shikimate pathway, and eight propionate and two acetate units are condensed to build both the chromophore and the aliphatic bridge of the rifamycin B molecule [5]. Based on these data it was concluded that rifamycin B and the other rifamycins produced by A. mediterranei are synthesized by a polyketide synthase (PKS) that uses a 3-amino-5-hydroxybenzoic acid starter unit and sequentially adds acetate and propionate units. Ten successive condensation steps and the correct processing of the keto groups lead to the formation of the rifamycin ring system.

To obtain information on the genes involved in rifamycin synthesis and their organization, we undertook the cloning and DNA sequence analysis of the putative rifamycin PKS gene cluster, which we describe here. This information will be important for improving our understanding of rifamycin biosynthesis and for the future application of molecular genetic techniques to rationally manipulate rifamycin biosynthesis, either to further increase the productivity of industrial strains or to produce modified rifamycins for clinical applications.

2 Materials and methods

2.1 Bacterial strains and culture conditions

A. mediterranei LBG A3136 wild type (from the Ciba-Geigy collection) was used in this work [5]. The strain was cultivated at 28°C in liquid medium NL148G [6] without glycine for vegetative growth and with 2.5 g l−1 glycine for the isolation of chromosomal DNA. Escherichia coli HB101 and DH5α were used for cloning of DNA and propagation of plasmids, and E. coli XL-1BlueMR (Stratagene, La Jolla, USA) for phage lambda transfections. E. coli was grown at 37°C in Luria broth or on Luria agar [7] supplemented with the appropriate antibiotics when needed.

2.2 DNA isolation and cloning

Standard genetic techniques for in vitro DNA manipulation and for work with E. coli were as described [7]. Genomic DNA of A. mediterranei was isolated from cells that had been grown for 48 h in 50 ml NL148G with 2.5 g l−1 glycine, centrifuged at 3000×g for 30 min, and resuspended in 5 ml SET buffer (75 mM NaCl, 25 mM EDTA, 20 mM Tris, pH 7.5). High molecular weight DNA was isolated from these cells as described [8]. Southern blot analysis and cloning of DNA fragments were as described earlier [7, 9]. Nick translation was done with the Gibco/BRL Kit (Life Technologies AG, Basel, CH) according to the manufacturer's instructions. To clone the 13-kbp BglII genomic A. mediterranei fragment, BglII fragments in the size range of 12–16 kbp were isolated from a 0.8% agarose gel by electroelution [7] and cloned into the E. coli positive selection vector pIJ4642, which was derived from pIJ666 [10] by removal of the BglII-BclI fragment containing the neo gene. Chloramphenicol-resistant colonies of E. coli HB101 transformed with this gene library were analysed by colony hybridization [7], using as probe the 32P-dCTP labeled 3.8-kbp PvuI fragment of the plasmid p98/1 encoding part of the soraphen PKS gene cluster [11]. Positive clones were analysed by Southern blot hybridization. A 5.7-kbp KpnI fragment, internal to the 13-kbp BglII fragment of A. mediterranei, was isolated and used as a probe to screen the A. mediterranei cosmid library.

2.3 Construction of an A. mediterranei genomic library

Genomic DNA of A. mediterranei was partially digested with Sau3A1 and size fractionated on a sucrose density gradient. DNA fragments of 35 to 45 kbp were ligated to the BamHI-cut pWE15 cosmid vector (Stratagene). The ligated DNA was packaged into lambda phage particles by using the in vitro packaging kit from Stratagene, and transfected into E. coli XL-1BlueMR.

2.4 DNA sequencing

Intact cosmid clones or isolated plasmid subclone inserts were fragmented using an Aero-Mist Nebulizer (CIS-US, Bedford, MA, USA) with a nitrogen pressure of 1–2 pounds per square cm. These random fragments were treated with bacteriophage T4-DNA polymerase, T4 DNA kinase, and E. coli DNA polymerase in the presence of dNTPs to generate 5′ phosphorylated blunt-ended fragments. Fragments were then fractionated on 0.8% low-melting temperature agarose, and 1.5–2-kbp fragments were isolated from the agarose via warm phenol extraction [7]. These fragments were subcloned into blunt-ended, dephosphorylated pBlueScript KS+ (Stratagene) and introduced by transformation into E. coli DH5α. Plasmid DNA for sequencing was isolated from overnight liquid cultures grown in 96-well 2-ml deepblock plates [12]. Sequencing reactions were performed using the ‘Perkin Elmer/Applied Biosystems Dye Terminator cycle sequencing ready reaction premix’ (Catalog number 402122) and 20 cycles of linear amplification in a thermocycler. Sequencing reactions were ethanol precipitated, resuspended in formamide loading buffer, and run on an Applied Biosystems (ABI) Model 377 automated DNA sequencer according to the recommendations of the manufacturer. Gel files were tracked and extracted using ABI Analysis Software. Extracted chromatogram files were transferred to a SUN UltraSparc workstation, and sequence assembly and editing were accomplished using the PHRED/CROSS-MATCH/PHRAP/CONSED suite (Phillip Green, University of Washington; Brent Ewing, Washington University in Saint Louis; and David Gordon, Washington University in Saint Louis) and the program GAP [13]. Following assembly of the initial sequencing reactions into multiple contigs, the remaining gaps between contigs were closed by primer-walking, longer reads, and opposite strand sequencing. Low quality areas were improved by additional sequencing coverage.
DNA and protein sequences were analysed with the University of Wisconsin Genetics Computer Group programs [13].

3 Results and discussion

3.1 Cloning of a polyketide synthase (PKS) gene cluster from A. mediterranei

PKS genes were identified in the chromosome of A. mediterranei by hybridization to DNA of the Type I PKS gene cluster of Sorangium cellulosum responsible for the biosynthesis of the macrocyclic polyketide antibiotic soraphen A [11]. To clone these genes, a 13-kbp BglII fragment from A. mediterranei that gave a hybridization signal in a Southern blot probed with a 3.8-kbp PvuI DNA fragment from the soraphen PKS gene cluster was isolated and cloned in E. coli (see Section 2). A cosmid library of A. mediterranei genomic DNA was constructed and screened for homology to the 13-kbp BglII fragment. Three cosmid clones that gave very strong hybridization signals were identified. Restriction and DNA hybridization analysis revealed that the three cosmid clones overlap and cover a region of about 61 kbp of the A. mediterranei genomic DNA. A continuous stretch of 54 kbp of this region from the A. mediterranei chromosome has significant DNA homology to the PKS genes of Sorangium cellulosum and also to the PKS genes (eryA) of Saccharopolyspora erythraea governing the biosynthesis of erythromycin [14]. A restriction map of this region in the A. mediterranei chromosome is shown in Fig. 1A.

Figure 1

53.8-kbp Type I polyketide synthase (PKS) gene cluster of A. mediterranei involved in rifamycin biosynthesis. A: Restriction map of the genomic region and overlapping inserts of the three cosmid clones used for cloning and sequencing. B: Organization of the ten Type I PKS modules, order of the functional domains and postulated biosynthetic intermediates of the rifamycin PKS. Enzymatic activities are indicated as follows: CL, CoA ligase; ACP, acyl carrier protein; KS, β-ketoacyl-ACP synthase; AT, acyltransferase; DH, dehydratase; KR, β-ketoacyl-ACP reductase. Shaded domains are possibly inactive. C: Structures of rifamycin B, protorifamycin I, protorifamycin (postulated direct product of the rifamycin PKS), and the early intermediate P8/1-OG.

3.2 Sequence analysis of the 54 kbp PKS gene cluster

The cloned 54-kbp chromosomal region of A. mediterranei with homology to PKS genes from S. cellulosum and S. erythraea was completely sequenced (see Section 2). The nucleotide sequence obtained (EMBL No. xyz) was analyzed for open reading frames (ORFs) using the computer program Codonpreference [13]. This analysis revealed five closely packed, very large ORFs reading in the same direction (see Fig. 1B). All five ORFs showed a strong bias towards G or C in the third codon position typical for translated genes in high G+C DNA. The overall G+C content in the translated region is 74%. The most probable translational start sites for the five ORFs were determined by identification of plausible ribosomal binding sites (RBS) with similarity to the 3′ end of the S. lividans 16S rRNA sequence.
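The third-position bias mentioned above can be illustrated with a minimal sketch. This is not the Codonpreference program used in this work; the function names and the short toy sequence are invented for illustration, and the sequence is not taken from the A. mediterranei cluster.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(orf: str) -> float:
    """Fraction of codons whose third base is G or C.

    Assumes the sequence starts in frame at the first codon.
    """
    third_bases = orf.upper()[2::3]  # bases at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Toy in-frame ORF (hypothetical, for illustration only)
toy_orf = "ATGGCCGACCTGGCGTCCGGCTGA"

print(f"overall G+C:        {gc_fraction(toy_orf):.3f}")
print(f"third-position G+C: {gc3_fraction(toy_orf):.3f}")
```

In a translated reading frame of high G+C DNA, the third-position value is expected to lie well above the overall G+C content, which is the signal used here to support the ORF assignments.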

The deduced amino acid sequences of the five ORFs are highly repetitive. Each repeating unit corresponds to a catalytic module responsible for a specific round of polyketide chain extension. Ten well defined rifamycin PKS modules were identified by comparison with the PKS biosynthetic domains of the 6-deoxyerythronolide B synthase (DEBS) of S. erythraea [15]. ORF1 and ORF2 each contain three modules, ORF3 and ORF4 each code for a single module, and ORF5 encodes two modules. This matches exactly the number of condensation steps required to build up the rifamycin polyketide chain [5].

3.3 Organization of the enzymatic domains in the PKS gene cluster

The localisation of the enzymatic domains in the multifunctional proteins encoded by the five ORFs of the polyketide synthase gene cluster was determined by computer assisted comparison with the well defined active domains of DEBS of S. erythraea [15]. The order of the active domains found in the 10 modules (Table 1) is exactly the same as that found in other Type I PKS [15, 16]. The ranges of homology of the different active domains, compared to the domains of DEBS responsible for the biosynthesis of erythromycin, were as follows: KS 62%–65%, AT 38%–56%, DH 41%–49%, KR 43%–52%, ACP 50%–54% (% amino acid identity over the domain length).

Table 1

Deduced functions of the five ORFs in the PKS gene cluster

ORF (amino acids)    Position in gene cluster (start–end point)    Proposed functions
ORF1 (4735)          1336–15543
  Loading domain                                                   CoA ligase, ACP
  Module 1                                                         KS, AT(P), KR, ACP
  Module 2                                                         KS, AT(A), ACP
  Module 3                                                         KS, AT(P), ACP
ORF2 (5069)          15550–30759
  Module 4                                                         KS, AT(P), DH, KR, ACP
  Module 5                                                         KS, AT(P), KR, ACP
  Module 6                                                         KS, AT(P), DH, KR, ACP
ORF3 (1763)          30769–36060
  Module 7                                                         KS, AT(P), DH, KR, ACP
ORF4 (1728)          36139–41325
  Module 8                                                         KS, AT(P), DH, KR, ACP
ORF5 (3413)          41373–51614
  Module 9                                                         KS, AT(A), DH, KR, ACP
  Module 10                                                        KS, AT(P), DH, KR, ACP
  • Predicted PKS enzymatic activities are indicated as follows: KS, β-ketoacyl-ACP synthase; AT(P), acyltransferase incorporating a propionate unit; AT(A), acyltransferase incorporating an acetate unit; DH, dehydratase; DH, dehydratase, probably not functional; KR, β-ketoacyl-ACP reductase; ACP, acyl carrier protein.
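The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below copies the ORF coordinates, ORF lengths and AT specificities from the table; the assumption that each end coordinate includes the stop codon is ours and is not stated in the table, and the variable and function names are purely illustrative.

```python
# (ORF name, amino acids, start, end), copied from Table 1
ORFS = [
    ("ORF1", 4735, 1336, 15543),
    ("ORF2", 5069, 15550, 30759),
    ("ORF3", 1763, 30769, 36060),
    ("ORF4", 1728, 36139, 41325),
    ("ORF5", 3413, 41373, 51614),
]

# AT specificity per module, copied from Table 1 (P = propionate, A = acetate)
AT_SPECIFICITY = {1: "P", 2: "A", 3: "P", 4: "P", 5: "P",
                  6: "P", 7: "P", 8: "P", 9: "A", 10: "P"}


def predicted_amino_acids(start: int, end: int) -> int:
    """Amino acids encoded by an ORF spanning start..end (inclusive).

    Assumption: the end coordinate includes the stop codon.
    """
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return nucleotides // 3 - 1  # drop the stop codon


for name, listed, start, end in ORFS:
    print(name, "listed:", listed, "predicted:", predicted_amino_acids(start, end))

units = list(AT_SPECIFICITY.values())
print("propionate units:", units.count("P"), "acetate units:", units.count("A"))
```

Under that assumption, the predicted lengths agree with the listed lengths for all five ORFs, and the AT assignments give eight propionate and two acetate extender units, as expected for the rifamycin polyketide chain.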

Two classes of AT domains were found. Modules 2 and 9 contain a sequence motif typical for AT domains catalysing the transacylation of malonyl-CoA, and thus being responsible for incorporation of acetate extender units [17]. The other 8 modules contain AT domains with sequence motifs typical for AT domains catalysing the transacylation of methylmalonyl-CoA, resulting in the incorporation of propionate extender units into the growing polyketide chain.

Eight modules, namely numbers 1 and 4 to 10, contain KR domains with a high degree of identity to the KR domains of DEBS from S. erythraea. All these domains contain a potential motif for NADP(H) binding (GxGxxGxxxA) between amino acids 9 and 20, the same as was found in the KR domains of DEBS for erythromycin biosynthesis [15] and the KR domain of the soraphen PKS of S. cellulosum [11]. Module 3 contains a KR domain with lower identity to the KR domains of DEBS, and has an imperfect NADP(H) binding motif (GAEGLGRHAS). Therefore the KR domain of module 3 is probably inactive. No ketoreductase domain is present in module 2, which is the smallest module in the PKS, containing only the core enzymatic functions KS, AT, ACP (Table 1).
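To show why the module 3 sequence counts as an imperfect match, the consensus GxGxxGxxxA can be written as a regular expression and compared position by position. This is a minimal sketch and not the analysis software used in this work; the function names are illustrative, and the "conforming" example segment is invented rather than taken from the cluster.

```python
import re

# Consensus NADP(H) binding motif from the text; x = any residue
CONSENSUS = "GxGxxGxxxA"
MOTIF_RE = re.compile(CONSENSUS.replace("x", "."))  # -> G.G..G...A


def matches_motif(segment: str) -> bool:
    """True if a 10-residue segment fits the consensus exactly."""
    return MOTIF_RE.fullmatch(segment) is not None


def mismatches(segment: str) -> list:
    """List (position, expected, found) for each conserved position that differs."""
    return [
        (i + 1, expected, found)
        for i, (expected, found) in enumerate(zip(CONSENSUS, segment))
        if expected != "x" and expected != found
    ]


module3 = "GAEGLGRHAS"       # from the KR domain of module 3 (see text)
hypothetical = "GAGGLGGLLA"  # invented segment that fits the consensus

print(matches_motif(module3), mismatches(module3))
print(matches_motif(hypothetical), mismatches(hypothetical))
```

For the module 3 segment, the comparison reports deviations at positions 3 (E instead of G) and 10 (S instead of A), which are the two conserved positions that make the motif imperfect.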

Eight of the ten modules contain a domain which is 41–49% identical to the DH domain of DEBS. Six of these have a region with good similarity to the postulated DH active site motif (Table 2). Interestingly, modules 6, 7, and 8 seem to contain DH domains that are intact, even though the corresponding positions are hydroxylated in rifamycin B (Fig. 1). Enoyl reduction is probably not required for rifamycin biosynthesis, and none of the 10 modules contains a typical ER domain.

Table 2

DH domains in the ten modules of the putative rifamycin PKS: analysis and alignment of the active site motifs in comparison to the DH domain of DEBS

Module    Activity required for rifamycin biosynthesis    Identity to DH domain of DEBS    Active site motif H.G.P    Comment
4         Yes                                             48%                              HAIGGVVLIP                 Active
6         No                                              44%                              HAVGGVVILP                 Active?
7         No                                              49%                              HTIGGVVLFP                 Active?
8         No                                              49%                              HTLEDLVVVP                 Active?
9         Yes                                             49%                              HVIGGVVLVA                 Active
10        Yes                                             47%                              HAVRDVVIVP                 Active

3.4 Loading domains in the N-terminal region of the ORF1 deduced protein

The protein deduced from ORF1 has at its N-terminal end a long amino acid extension not assigned to module 1. This leading region contains two enzymatic domains which are probably involved in starter unit activation and attachment to the PKS for rifamycin biosynthesis. The first domain, 507 amino acids at the N-terminal end, has 52% identity to the starter unit activation domain of the rapamycin PKS (RAPS) from S. hygroscopicus [16], which also has homology to ATP-dependent carboxylic acid-CoA ligases (CL). This similarity suggests that the CL domain in Fig. 1B may form the activated starter unit, 3-amino-5-hydroxybenzoyl-CoA [5]. The region downstream of CL resembles ACP domains of Type I PKS; its role may be the binding of the activated aromatic starter unit to the PKS for rifamycin biosynthesis.

3.5 Evidence for involvement of the described PKS in rifamycin biosynthesis

Attempts to disrupt the sequenced PKS genes in A. mediterranei have so far been unsuccessful, probably because of low transformation and recombination frequencies. It was therefore not possible to test the involvement of these genes in rifamycin biosynthesis by this functional test. However, the described gene cluster of A. mediterranei and the deduced protein sequences closely match the features expected for the rifamycin PKS gene cluster. (i) There are 10 repetitive Type I PKS modules, as required for rifamycin biosynthesis. (ii) The positions of the acetate incorporating AT domains in modules 2 and 9, and of the propionate incorporating AT domains in the other eight modules, are exactly as required for the synthesis of the rifamycin polyketide chain. (iii) The domain composition predicts correct processing of the β-keto groups for 7 of the 10 modules. (iv) The functional modules deduced from the PKS gene cluster are colinear with the order of the respective enzymatic steps needed for biosynthesis of the rifamycin ring system (see Fig. 1). (v) There is a specific loading domain, similar to that of RAPS, for the activation of the aromatic starter unit. (vi) An early intermediate of rifamycin biosynthesis, P8/1-OG, detected in a mutant of A. mediterranei [18], is identical to the postulated intermediate of a mutated PKS in which only modules 1 to 3 are active (Fig. 1C). The potential DH activities in modules 6, 7 and 8 are unexpected. It is possible that some of these may be active and that the corresponding hydroxyl groups are introduced later by the action of hydroxylases. Similar discrepancies exist in the case of rapamycin [16].

Protorifamycin, the postulated direct product of the rifamycin PKS encoded by the described gene cluster, has not so far been found in fermentations of A. mediterranei. However, protorifamycin I, detected in a mutant of A. mediterranei [19], differs only by the absence of a hydroxyl group in the naphthoquinone ring and the oxidation of one methyl group of the aliphatic ring (Fig. 1C). It can be assumed that in rifamycin B biosynthesis protorifamycin is transformed in two enzymatic reduction/oxidation steps into protorifamycin I. From protorifamycin I, the biosynthetic pathway to rifamycin B, with the intermediates rifamycin W and rifamycin S, is well established by biosynthetic and genetic studies [5, 20].


We thank T. Kieser and J. Ligon for helpful discussions and critical reading of the manuscript. We are grateful to R. Amstutz and U. Regenass for supporting this work.

