Sample sequencing of a Salmonella typhimurium LT2 lambda library: comparison to the Escherichia coli K12 genome

Rita M.-Y. Wong, K.K. Wong, Nicholas R. Benson, Michael McClelland
DOI: http://dx.doi.org/10.1111/j.1574-6968.1999.tb13533.x, pp. 411–423. First published online: 1 April 1999

Abstract

As part of the ongoing sequencing of the complete Salmonella typhimurium LT2 genome, a partly ordered set of 416 lambda clones has been developed, representing over 90% of the genome. The average insert size is 17 kb. Sequences were obtained from both ends of each clone in this set. A total of over 600 kb of sequence has been deposited in the genome survey sequence section of GenBank. This resource of clones is available from the Salmonella Genome Stock Center. A preliminary comparison with the Escherichia coli K12 genome indicates that there are likely to be many hundred insertion/deletion events, encompassing more than one gene, that distinguish these genomes. Fully 30% of the S. typhimurium sequences have no close homologs in the GenBank database.

Keywords
  • Sample sequencing
  • Salmonella
  • Bacteriophage lambda library
  • Comparative genomics

1 Introduction

As part of a project to complete the genomic sequence of Salmonella typhimurium LT2, we have constructed a bacteriophage lambda library. Sequencing the ends of a sample of these lambda clones ensures the correct melding of the genomic sequence by confirming linkage over many kilobases, while also contributing information towards the complete sequence of the genome. This library is also a resource for closing gaps in the sequence.

The M13 clones used in the sequencing project will not be maintained after the end of the project. However, the lambda clones will be maintained as a permanent resource of clones from the genome. This manuscript describes the lambda resource, which is being made available prior to the completion of the sequencing project.

2 Materials and methods

Genomic DNA from Salmonella typhimurium LT2 strain AZ1516 was partially digested with Sau3A and the 15–20-kb size class was cloned in a lambda DASHII vector. A total of over 2000 clones were examined for overlap by previously described restriction mapping methods or by deriving radiolabeled riboprobes from one end of an insert in a clone and hybridizing this probe to an array of the clones [1]. The preparation of the library and the methods used to order clones are described in more detail elsewhere [1].

Phage were prepared using standard procedures [2]. DNA was purified from each of these bacteriophage using the Bio101 quick spin kit (www.bio101.com, La Jolla, CA). Five micrograms of each DNA was sequenced from both ends using a Li-Cor sequencer (www.licor.com/bio/, Lincoln, NE) and two vector primers, bearing different infrared fluors, located in the T3 and T7 promoters flanking the cloning site. This strategy allowed the sequences from both ends of the clones to be obtained from four lanes on a sequencing gel (www.licor.com/bio/Posters/GenSeq97/GSAabs.htm).

3 Results and discussion

A total of 416 clones that had minimal or no overlap with each other were selected for sequencing. These clones are estimated to represent well over 90% of the Salmonella genome, after taking overlap into account. An average of about 900 bases of readable sequence was obtained from each successful sequencing reaction. Approximately 600 kb of sequence from 836 reads have been deposited in the genome survey sequence division of the GenBank database (www4.ncbi.nlm.nih.gov/dbGSS/index.html) with accession numbers AF003831–AF003833, AF029406–AF036003, AF075756–AF076018 and AF120033–AF120089.

The sequence data are also part of the sequencing project web site (http://genome.wustl.edu/gsc/bacterial/Salmonella.shtml). The latter web site contains a Blast server at http://genome.wustl.edu/gsc/bacterial/bacterial_blast_server.html. This server searches the sequences presented here and the melded M13 sequences from the ongoing sequencing project, currently amounting to over 3 Mb of sequence.

Each sequence from the lambda clones was compared to the complete Escherichia coli K12 genome [3] using BlastN [4] (http://www.ncbi.nlm.nih.gov/BLAST/). Homologous regions between Salmonella and E. coli are generally about 85% identical at the nucleotide level [5]. Thus, a probability threshold of P<e−50 in BlastN was chosen as the definition of putative orthologs because this generally indicated a more than 80% match spanning at least 400 bases. These data are summarized in Table 1.
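The ortholog definition above amounts to a simple cutoff on the best BlastN probability for each read. The following is a minimal sketch of that rule, not the authors' pipeline: the function name, the example read names and the example probabilities are hypothetical, and "P<e−50" is assumed to mean a probability below 10^-50.

```python
from typing import Optional

# Assumption: "P<e-50" in the text is read as a probability below 10^-50.
ORTHOLOG_P_THRESHOLD = 1e-50


def is_putative_ortholog(best_p: Optional[float],
                         threshold: float = ORTHOLOG_P_THRESHOLD) -> bool:
    """Return True when the best E. coli K12 hit for a read passes the cutoff.

    best_p is the BlastN probability of the best hit, or None when the
    read produced no hit at all.
    """
    return best_p is not None and best_p < threshold


# Hypothetical reads: a strong match, a weak match, and no match.
example_reads = {"read_a": 1e-120, "read_b": 3e-12, "read_c": None}
calls = {name: is_putative_ortholog(p) for name, p in example_reads.items()}
print(calls)
```

Under this sketch, only the first of the three hypothetical reads would be scored as a putative ortholog.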

Table 1 (presented as a series of images in the original; not reproduced here)

High homology with the E. coli K12 genome was seen for both ends of a clone in 222 cases. Among the clones that matched E. coli at both ends, we determined the insert size in 106 cases. Forty of these 106 insert sizes differed in size by more than 4000 bases when compared to the corresponding apparently orthologous region in E. coli, indicating there may be a relatively large net insertion/deletion event in these clones (marked in bold in column E of Table 1). Nine more clones matched the E. coli K12 genome at both ends, but at very widely divergent positions in the E. coli genome. These clones are marked with an asterisk in column C of Table 1. Some of these clones may represent true rearrangements between the Salmonella and E. coli genomes, whereas others may indicate paralogous comparisons with sequences that are not adjacent in the E. coli genome. One hundred and twenty-nine clones matched E. coli K12 only at one end; 65 clones matched E. coli at neither end.
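The grouping of clones and the 4000-base criterion described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names and example coordinates are hypothetical, a missing E. coli position stands for an end read with no match, and the E. coli span is taken as the plain distance between the two matched positions, ignoring the circularity of the chromosome.

```python
from typing import Optional

# Size difference treated in the text as a relatively large net insertion/deletion.
INDEL_THRESHOLD_BP = 4000


def classify_clone(t3_pos: Optional[int], t7_pos: Optional[int]) -> str:
    """Group a clone by how many of its two end reads matched E. coli K12."""
    matched = sum(pos is not None for pos in (t3_pos, t7_pos))
    return ("neither end", "one end", "both ends")[matched]


def net_indel_bp(insert_size_bp: int, t3_pos: int, t7_pos: int,
                 threshold_bp: int = INDEL_THRESHOLD_BP) -> Optional[int]:
    """Return the insert size minus the E. coli span if it exceeds the threshold.

    A positive value means the Salmonella insert is longer than the
    corresponding E. coli region; None means the difference is within
    the threshold. Simplification: no wrap-around at the genome origin.
    """
    ecoli_span_bp = abs(t7_pos - t3_pos)
    difference = insert_size_bp - ecoli_span_bp
    return difference if abs(difference) > threshold_bp else None


# Hypothetical clones with a 17-kb insert.
print(classify_clone(1_000_000, None))
print(net_indel_bp(17_000, 1_000_000, 1_012_000))
print(net_indel_bp(17_000, 1_000_000, 1_016_000))

# The three groups reported in the text account for all 416 clones.
print(222 + 129 + 65)
```

The text gives no numerical cutoff for "very widely divergent positions", so the sketch does not attempt to flag the nine candidate rearrangements.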

One hundred and fifty-eight of the 836 sequences were highly homologous or identical to sequences from various Salmonella strains already in the GenBank database (P<e−50 in BlastN), reflecting the amount of sequence already available from Salmonella genomes (marked in bold italics in columns F and G of Table 1). The 836 S. typhimurium sample sequences were also compared to the rest of the GenBank database, and a few sequences shared their best homology with sequences other than E. coli K12 or Salmonella. Homologies with a significance of P<e−9 are indicated in Table 1. Further details of the genes involved are presented in Table 2. In many of these cases, there is a close match with E. coli K12 at one end and a close match with a different genome at the other end of the clone. There are cases where bacteriophage or plasmid sequences are the best homologs in the database for one end of a clone. It is possible that these sequences are from previously unknown extrachromosomal phage or plasmids. However, they are more likely to be from genes that are integrated in the genome of LT2 (such as the FELS prophage [6, 7]) but are related to genes found on phage or plasmids in other bacteria.

Table 2

Novel S. typhimurium LT2 genome sequences with higher homology to sequences other than the E. coli K12 genome

Clone number | Organism with best homology with clone end | Accession number | Gene | BlastN
T3
1049 | K. oxytoca | AF017781 | ddrA, ddrB | 0e+0
B235 | Klebsiella pneumoniae | L41068 | hpaA | 1e−138
A350 | E. coli plasmid R100-1 | AF005044 | traV | 6e−88
968 | E. coli | M55249 | retron Ec67 | 6e−50
A78 | Klebsiella aerogenes | L01114 | nac | 2e−48
163 | E. coli bacteriophage N15 | AF064539 | gp13 | 5e−34
270 | K. pneumoniae | U19581 | ramA | 6e−32
175 | P. putida | X58483 | hutU | 7e−29
A365 | E. coli plasmid pRSD2 | U82290 | rafY | 9e−15
1234 | E. coli plasmid F | M59763 | traG | 5e−13
A173 | Bacteriophage PA-2 | J02580 | RZ | 6e−10
T7
716 | Citrobacter freundii | D28594 | hyaA | 1e−149
560 | E. coli Bacteriophage P2 | P25479 | terminase | 1e−91
B44 | P. putida | M35140 | hutH | 4e−83
741 | E. coli hpa gene | Z37980 | hpaG | 1e−64
B220 | E. coli prophage CP4-57 | P32053 | integrase | 7e−63
178 | Klebsiella sp. | U32616 | asst | 1e−45
1248 | E. coli F plasmid | M97768 | gene 32 | 1e−32
419 | E. coli Bacteriophage lambda | J02459 | vhsJ | 4e−18
  • A BlastX score.

In 836 sequence reads from around the S. typhimurium genome, we detected 259 sequence reads that were not homologous to E. coli K12. This represents about 30% of the sequences. Thus, based on a genome size of about 5 Mb, it is estimated that there may be 1.5 Mb of non-homologous sequences present in S. typhimurium and absent in the E. coli K12 genome. In each case, such genes may have been introduced into Salmonella after divergence from the common ancestor with E. coli, or these genes may have been deleted in the E. coli lineage.
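The estimate above is a direct proportion, and the arithmetic can be checked in a few lines. This is a sketch using only the figures quoted in the text; the variable names are illustrative. The last check assumes that every clone matching at one end contributes one non-homologous read and every clone matching at neither end contributes two, which is consistent with the 259 reads reported.

```python
# Figures quoted in the text.
total_reads = 836
reads_without_ecoli_homolog = 259
genome_size_mb = 5.0  # approximate S. typhimurium genome size used in the text

# Fraction of sampled reads with no E. coli K12 homolog (about 0.31).
fraction_non_homologous = reads_without_ecoli_homolog / total_reads

# The text rounds this to about 30% before scaling to the whole genome.
estimated_non_homologous_mb = 0.30 * genome_size_mb

# Cross-check against the clone counts: 129 clones matched at one end only
# (one unmatched read each) and 65 matched at neither end (two each).
unmatched_reads_from_clone_counts = 129 * 1 + 65 * 2

print(round(fraction_non_homologous, 3))
print(estimated_non_homologous_mb)
print(unmatched_reads_from_clone_counts)
```

The extrapolation assumes that the sampled clone ends are representative of the genome as a whole, as the text itself implies.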

The large number of S. typhimurium sequences that showed little or no homology with the E. coli K12 genome indicates that these two genomes are rather more different than might be suggested by the considerable concordance in their genetic maps [8]. DNA–DNA hybridization studies estimated the amount of non-homologous sequence to be 30–40% of these genomes [8–10], which may more accurately reflect the number of regions in these genomes that do not share homology. The proportion of non-homologous sequences observed in the sample we present here (30%) is similar to these DNA–DNA hybridization estimates and is also similar to the proportion of non-homologous sequences we obtained when we compared sample sequences from Salmonella typhi with the complete E. coli K12 genome (38%) [11]. The difference between the 30% and 38% divergence estimates may be attributed to the different lengths of the sequence reads in the two studies and to the different BlastN threshold for scoring a homolog that this difference in read length required.

The number of insertion/deletion events that distinguish the S. typhimurium and E. coli genomes must be very high. We noted that 40 clones (38%) of the 106 clones of known size that matched E. coli at both ends showed insertion/deletion events of over 4000 bases (Table 1, column E). These clones represent about 1.8 Mb of the genome (106×17 kb), so by extrapolation, perhaps there are well over 100 insertion/deletion events of over 4000 base pairs (40×5 Mb/1.8 Mb=111). This latter estimate is similar to the estimate we obtained for the S. typhi versus E. coli genome, which was determined using a very different approach: the rate of detection of putative junctions between homologous and unique DNA in a set of sample sequences [11].
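The extrapolation in the preceding paragraph can be reproduced as below. This is a sketch of the stated arithmetic only; the function name is illustrative, and it assumes, as the text does, that the 106 sized clones are a representative and non-overlapping sample of a genome of about 5 Mb.

```python
def extrapolate_indel_events(clones_with_indel: int, clones_sized: int,
                             mean_insert_kb: float, genome_size_mb: float) -> float:
    """Scale the number of observed large indels up to the whole genome."""
    sampled_mb = clones_sized * mean_insert_kb / 1000.0  # 106 x 17 kb, about 1.8 Mb
    return clones_with_indel * genome_size_mb / sampled_mb


estimated_events = extrapolate_indel_events(
    clones_with_indel=40, clones_sized=106, mean_insert_kb=17, genome_size_mb=5.0
)
percent_with_indel = 100 * 40 / 106

print(round(estimated_events))    # about 111 events of over 4000 bases
print(round(percent_with_indel))  # about 38% of the sized clones
```

Because each clone reports only a net size difference, several smaller events within one insert, or an insertion offset by a deletion, would not be counted separately by this kind of estimate.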

The sequences we report in this paper and the associated lambda clones have already proved useful as a source of DNA for complementation studies [12] and are a vital component for completion of the Salmonella typhimurium LT2 genome sequence (http://genome.wustl.edu/gsc/bacterial/salmonella.shtml). We previously published an additional set of restriction mapped clones covering the region from about 4 250 000 to about 4 500 000 in E. coli [1]. End sequences from some of these clones have been obtained and are included in Table 1. The resource of over 2000 lambda clones is deposited at the Salmonella stock center (www.ucalgary.ca/~kesander/intro.html).

Acknowledgements

This work was supported by grants AI-34829 and AI-43283 from the United States National Institute of Allergy and Infectious Diseases. We thank Ken Sanderson, Rick Wilson, and the bioinformatics staff at the Genome Sequencing Center of Washington University, St. Louis, for many helpful discussions and for maintaining the web sites.

References

  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].