Original language | English |
---|---|
Pages (from-to) | 548-552 |
Number of pages | 5 |
Journal | Genomics |
Volume | 91 |
Issue | 6 |
DOIs | 10.1016/j.ygeno.2008.04.005 |
Publication status | Published - June 2008 |
In: Genomics, Vol. 91, No. 6, 06.2008, p. 548-552.
Research output: Letter › Peer-review
TY - JOUR
T1 - Genome sequences of Halobacterium species
AU - Ng, Wailap Victor
AU - Berquist, Brian R.
AU - Coker, James A.
AU - Capes, Melinda
AU - Wu, Timothy H.
AU - DasSarma, Priya
AU - DasSarma, Shiladitya
N1 - Funding Information: Genome sequences of Halobacterium species. Wailap Victor Ng (a), Brian R. Berquist (b), James A. Coker (c), Melinda Capes (c), Timothy H. Wu (a), Priya DasSarma (c), Shiladitya DasSarma (c, *). (a) Department of Biotechnology and Laboratory Science in Medicine, Institute of Biomedical Informatics, National Yang Ming University, Taipei 112, Taiwan, ROC; (b) Unit of Structure and Function in Base Excision Repair, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA; (c) University of Maryland Biotechnology Institute, Center of Marine Biotechnology, Baltimore, MD 21202, USA. * Corresponding author.
The last decade of the 20th century was an exciting period of genomic research, with the appearance of genome sequences for several dozen diverse microorganisms. Not surprisingly, Science magazine chose sequenced genomes as the breakthrough of the year in 2000 [1], and the American Society for Microbiology published a timeline of completed microbial genomes on the cover of the June 2001 issue of ASM News [2]. Among these notable achievements was the genome sequence of a representative halophilic archaeon, a member of the third branch of life that could grow in saturated concentrations of sodium chloride. In fact, this haloarchaeon, Halobacterium sp. NRC-1, required brine several times more concentrated than seawater for viability, survived desiccation within salt crystals, and tolerated ultraviolet and ionizing radiation at intensities thousands of times higher than most cells can withstand [3–5]. The genome sequence of Halobacterium sp. strain NRC-1 was published on October 24, 2000, in a paper co-authored by 44 scientists belonging to an international consortium of 12 research groups [6]. This genome has served as the prototype haloarchaeal genome ever since.
The publication of the NRC-1 genome, together with the development of a facile knockout system using the selectable and counter-selectable ura3 gene, provided ample research opportunities for the community of microbiologists, geneticists, and evolutionary biologists interested in this class of archaea [7]. Subsequent development of whole-genome DNA microarrays showed the value of the completed sequence and led to analyses of the organism's response to a wide variety of environmental changes [8]. These resources, coupled with the bioinformatic database and tool HaloWeb [9], which has been publicly available since the genome sequence was published in 2000 and is recognized by Thomson and indexed in ISI Web of Knowledge Current Web, have catapulted this archaeal strain into a model system used in laboratories worldwide and employed for teaching in schools and colleges [10]. Not surprisingly, the original genome paper, published in the Proceedings of the National Academy of Sciences USA, has been cited over 300 times to date, and two related publications in Genome Research, “Sequencing of the NRC-1 small megaplasmid” [11] and “Computational analysis of the complete genome sequence” [12], have been cited over 100 times. One of the most anticipated follow-ups to the Halobacterium sp. NRC-1 genome sequence was the complete genome sequence of strain R-1. Although this genome was reported as complete at many meetings over the years, and touted as essentially identical to NRC-1 on a private website (HaloLex [13]) as far back as 2002, the nucleotide sequence was not made publicly available to the community. In fact, the purported R-1 genome sequence was not deposited in any public database and was not available for download from NCBI until July 27, 2007.
By comparison, the NRC-1 genome had been shared with all 12 collaborating laboratories during the draft phase, and the final sequence was subsequently deposited for the entire international research community on July 14, 2000. In the interim between the public release of the first and the second Halobacterium sp. genome sequences, over 600 other genomes were reported, including several other haloarchaeal genomes. One reason the R-1 genome was so eagerly anticipated was that the classic papers by Sapienza and Doolittle in Nature [14,15] had reported that the genomes of both strains were highly unusual. Based on probing with random fragments cloned with EcoRI, HindIII, and BamHI, these authors reported that the genomes were full of repeated elements, estimated to belong to more than 50 families. They also reported that more than 4 × 10⁻³ recombinational events occur per family per generation and that the two daughter cells produced by a single cell division have only an 80% chance of bearing identical genomes. Although the genome sequence of NRC-1 showed only 91 IS elements belonging to 12 families (the original results having been skewed by the high GC content and the choice of restriction enzymes, especially EcoRI and HindIII, which have relatively AT-rich recognition sequences), the unusual genome structure and instability were still remarkable. A comparison of the NRC-1 genome with that of R-1, which was reportedly isolated as a spontaneously occurring gas vesicle-deficient variant of the wild-type NRC-1 strain, was therefore still of great interest [16,17]. However, after a wait of nearly 8 years, the paper by Pfeiffer et al. reporting the R-1 genome sequence in a recent issue of Genomics [18] is, unfortunately, extremely disappointing because of what we believe are subjective and unfounded claims regarding the NRC-1 genome assembly.
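Two quantitative points in the paragraph above can be checked with simple arithmetic. First, under a random-sequence model (independent bases, a simplification real genomes violate), the expected spacing of a 6-bp recognition site depends on base composition, so the AT-rich EcoRI (GAATTC) and HindIII (AAGCTT) sites become rarer in a GC-rich genome while BamHI (GGATCC) becomes more frequent. Second, if rearrangements occur independently at a rate r per family per generation across F families, the chance that a division produces no rearrangement is (1 - r)^F; with r = 4 × 10⁻³ and F = 50 this is about 0.82, in line with the reported 80%. The 0.68 GC fraction and the independence assumptions are ours for illustration, not the method of the original papers.

```python
# Illustrative arithmetic only: a random-sequence model with independent
# bases, not the analysis used in the original papers.

SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def site_probability(site: str, gc: float) -> float:
    """Chance that a random position starts `site`, with P(G)=P(C)=gc/2
    and P(A)=P(T)=(1-gc)/2 at every position."""
    p = 1.0
    for base in site:
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def expected_spacing(site: str, gc: float) -> float:
    """Approximate mean distance (bp) between occurrences of `site`."""
    return 1 / site_probability(site, gc)


def p_no_rearrangement(rate_per_family: float, families: int) -> float:
    """Chance that one division yields no event in any repeat family,
    assuming events are independent across families (an assumption)."""
    return (1 - rate_per_family) ** families


for name, site in SITES.items():
    balanced = expected_spacing(site, 0.50)
    gc_rich = expected_spacing(site, 0.68)  # illustrative GC-rich value
    print(f"{name}: ~{balanced:.0f} bp at 50% GC, ~{gc_rich:.0f} bp at 68% GC")

print(f"P(identical daughters) ~ {p_no_rearrangement(4e-3, 50):.2f}")
```

Under this model the EcoRI and HindIII sites are roughly three times sparser at 68% GC than at 50% GC, which is one way a probing survey built on those enzymes could sample a GC-rich genome unevenly.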
Instead of focusing on the science with a comprehensive comparison of the similarities and differences between the NRC-1 and R-1 genomes, the authors insinuate that the NRC-1 genome was not properly assembled, while ignoring the fact that a decade of cloning, mapping, and sequencing had clearly established both the structures and the rearrangements of the extrachromosomal replicons in strain NRC-1. These results have been documented in over a dozen peer-reviewed publications [11,19–33]. In so doing, the authors not only ignore our group's published record but also ignore the resequencing of the NRC-1 genome by the 454 company, published over 2 years ago, which confirmed the original conclusions [34]. Pfeiffer et al. characterize the differences between strains R-1 and NRC-1 as “vanishingly” small, and object to the organism's designation as a species of Halobacterium, since it suggests to them that it is a “distinct species.” In fact, our choice was to drop the species designation in the absence of definitive genomic data on Halobacterium isolates, as was clearly stated in the title-page footnote of the NRC-1 genome paper [6]: “Halobacterium species are referred to in the literature by a variety of designations, including H. halobium, H. cutirubrum, H. salinarium, and H. salinarum. The precise relationships among these organisms and Halobacterium sp. strain NRC-1 are not entirely clear. Strain NRC-1 was a gift from W.F. Doolittle, Dalhousie University, Halifax, Canada. The strain has been deposited with the American Type Culture Collection, Manassas, VA (reference no. ATCC700922).” Contrary to the suggestion of Pfeiffer et al., the purpose of removing the species designation was to clarify the differences between these related but distinct isolates.
Given the long history of isolation of haloarchaea from salted food and leather, and the presence of many repeated elements in the genomes of these strains, it is not surprising that there are many related but distinct isolates. A recent study of prokaryotic taxonomy at the American Type Culture Collection indicated that there is as much diversity within the Halobacterium genus (among strains of Halobacterium ‘salinarum’) as in the entire family Halobacteriaceae, and that NRC-1 is more distantly related to R-1 than are some strains belonging to other genera [35,36]. Because there is no universally accepted standard for prokaryotic species designation, some clades encompass much more diversity than others. Among halophiles, some isolates have multiple divergent 16S rRNA genes in the same genome, while other, diverse isolates have similar or identical 16S rRNA genes [37,38]. Because the true identities of NRC-1 and R-1 are quite unclear (see, e.g., [16,36]; also R.D. Simon and W.F. Doolittle, personal communications), our designation for NRC-1 was simply a way to exercise caution and objectivity and to avoid the confusion of naming and renaming a widely used strain. One of the most important results of the R-1 genome sequence reported by Pfeiffer et al. [18] is that it has reaffirmed the high quality of the Halobacterium sp. NRC-1 genome sequence completed by our consortium in 2000 [6,34]. Across the 2,014,239-bp large chromosome of NRC-1, only 4 single nucleotide polymorphisms, 5 single-base indels, and 3 large indels were found between the chromosomes of the 2 strains. Although not cited by Pfeiffer et al., these similarities had already been established over a dozen years ago by extensive macrorestriction mapping [30,39] and by cosmid mapping studies [40,41]. Although there are variations in the genetic arrangements and sequences of the plasmids, we consider these to be natural variations.
Because transpositions of insertion sequence (ISH) elements can result in insertions, deletions, and other types of DNA rearrangements, it would not be surprising if such events had occurred in the ISH-rich plasmids; moreover, the data do not clearly establish whether both strains originated from the same isolate. As detailed above, DNA variations, including structural differences in plasmid DNA, have already been well documented in previous studies. On the question of assembly and misassembly, the Phred, Phrap, and Consed program suite used for the NRC-1 genome has repeatedly been shown to be very reliable for genome sequence assembly [42,43]. Because an ordered library and a detailed restriction map were available for pNRC100, misassemblies caused by ISH elements were noted and reported in our early assembly of the pNRC100 shotgun sequences [11]. We circumvented this problem with a simple but highly effective strategy: using the vector sequence masking function in Phrap to mask repetitive elements allowed the sequences to be assembled correctly. We thus eliminated the misassembly problem by combining mapping with a multistep assembly approach. We named this method, developed to assemble the NRC-1 genome, the “hide-and-seek” sequence assembly strategy, and also applied it successfully to the Haloarcula marismortui genome in 2004 [6,36]. For the rapidly evolving megaplasmids of strain NRC-1 (also called minichromosomes or extrachromosomal replicons), pNRC100 and pNRC200, we decided that the plasmid maps and sequences should be completed first in our genome project.
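The "hide" half of the masking idea described above can be sketched as follows, assuming exact matches against a known repeat library. Once a shared repeat is masked, two reads from different loci no longer share sequence that an assembler could mistake for an overlap, while read lengths (and therefore map coordinates) are preserved for a later "seek" step that places the repeats back using the physical map. Phrap's own masking is not reproduced here; the function names, the 10-bp "IS element", and the reads are hypothetical.

```python
# Simplified "hide" step: exact-match masking of known repeats.
# Hypothetical names and sequences; not Phrap's implementation.

MASK = "X"


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mask_repeats(read: str, repeats: list[str]) -> str:
    """Replace each exact occurrence of a repeat, on either strand, with MASK,
    keeping the read length (and so its coordinates) unchanged."""
    masked = list(read)
    for rep in repeats:
        for motif in {rep, reverse_complement(rep)}:
            start = read.find(motif)
            while start != -1:
                masked[start:start + len(motif)] = MASK * len(motif)
                start = read.find(motif, start + 1)
    return "".join(masked)


def shared_kmers(a: str, b: str, k: int) -> set[str]:
    """k-mers found in both reads, ignoring any k-mer that touches a mask;
    a crude stand-in for the overlap evidence an assembler would use."""
    def kmers(s: str) -> set[str]:
        return {s[i:i + k] for i in range(len(s) - k + 1) if MASK not in s[i:i + k]}
    return kmers(a) & kmers(b)


IS_ELEMENT = "GGCTTAACCG"  # hypothetical repeat, far shorter than a real ISH
read1 = "AAAAAC" + IS_ELEMENT + "TTTTTG"
read2 = "CCCGGA" + IS_ELEMENT + "GTGTGT"

# True: before masking, the repeat alone makes the two reads look overlapping.
print(len(shared_kmers(read1, read2, 4)) > 0)
# set(): after masking, only the unique flanks remain and they share nothing.
print(shared_kmers(mask_repeats(read1, [IS_ELEMENT]),
                   mask_repeats(read2, [IS_ELEMENT]), 4))
```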
Thus the pNRC100 restriction map was constructed by extensive restriction mapping of the purified plasmid and its recombinant HindIII fragment clones [44]. In the 1980s, using pulsed-field gel electrophoresis, we showed the presence of a 35,000- to 38,000-bp inverted repeat (IR) sequence [24]. Inversion isomers (Fig. 1) of pNRC100 had been demonstrated by Southern hybridization analysis using rare-cutting restriction enzymes (AflII and SfiI), which cut asymmetrically within the intervening small single-copy and large single-copy regions, respectively, but not within the large IRs. In this regard, the pNRC100 structure resembled some chloroplast and mitochondrial genomes, which also contain large IRs [25]. In a deletion derivative of pNRC100 lacking one copy of the IRs, no inversion isomers were observed, indicating that both copies are required for inversion, which is likely mediated by recombination between the large inverted repeats [31]. These early studies also established the identities and approximate positions of nearly a dozen IS elements in pNRC100, using Southern hybridization and limited nucleotide sequence analysis across the IS element-target site junctions. Four prevalent classes of IS elements were initially found: ISH2, a 0.5-kb element; ISH3, a heterologous family of 1.4-kb elements; ISH4, a 1.0-kb element; and ISH8, a 1.4-kb element. Together these represented a relatively small fraction of all the elements in the genome. The large IRs of pNRC100 terminated at an ISH2 element at one end and an ISH3 element at the other. Subsequently, pNRC100 was sequenced using shotgun libraries of the plasmid and ordered HindIII fragment clones of pNRC100. This sequence analysis showed the presence of important genes on pNRC100, such as trxAB, coding for thioredoxin and thioredoxin reductase, involved in protein reduction, and cydAB, specifying cytochrome oxidase, important in aerobic respiration [11].
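Why asymmetric cut sites reveal inversion isomers can be sketched with a toy model, assuming a circular plasmid laid out as IR, large single-copy region (LSC), inverted IR, small single-copy region (SSC), and a digest with one site in each single-copy region. Recombination between the IRs flips one single-copy region relative to the other; an off-center site then moves, producing isomer-specific fragment sizes, whereas a site at the exact center of its region would not distinguish the isomers. The lengths and site offsets below are hypothetical round numbers (the IR length is merely chosen within the 35- to 38-kb range above), not pNRC100 coordinates, and the one-site-per-region digest scheme is an assumption.

```python
from dataclasses import dataclass


@dataclass
class IRPlasmid:
    """Circular plasmid laid out as IR_a | LSC | IR_b | SSC (lengths in bp)."""
    ir: int
    lsc: int
    ssc: int

    @property
    def total(self) -> int:
        return 2 * self.ir + self.lsc + self.ssc


def digest(p: IRPlasmid, lsc_site: int, ssc_site: int, inverted: bool) -> list[int]:
    """Fragment sizes from one cut in the LSC and one in the SSC.

    Site positions are offsets from the start of each single-copy region.
    In the inverted isomer the SSC is flipped, so its site moves to ssc - offset.
    """
    ssc_offset = p.ssc - ssc_site if inverted else ssc_site
    x = p.ir + lsc_site                   # absolute position of the LSC cut
    y = 2 * p.ir + p.lsc + ssc_offset     # absolute position of the SSC cut
    d = y - x
    return sorted([d, p.total - d])


plasmid = IRPlasmid(ir=36_000, lsc=100_000, ssc=20_000)  # hypothetical sizes
print(digest(plasmid, 30_000, 5_000, inverted=False))  # [81000, 111000]
print(digest(plasmid, 30_000, 5_000, inverted=True))   # [71000, 121000]
```

A mixed population of both isomers would therefore show four bands instead of two, which is the kind of signature a Southern blot of such a digest can detect.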
However, no rRNA genes were located in the pNRC IRs or elsewhere on the replicons. In the publication reporting the complete sequence of pNRC100 [11], an evolutionary model for how such a plasmid structure may have arisen, possibly in the laboratory or possibly in nature (Fig. 1), was provided and featured on the cover of Genome Research. For pNRC200, a similar but less laborious approach was undertaken in the mid-to-late 1990s: cosmid clones of pNRC200 were mapped and end-sequenced to establish its identity and structure. A HindIII clone library was created for pNRC200, and the structure of this plasmid was validated by sequencing [6,45]. The authors of the R-1 paper write as if they were unaware of the extensive previous work on the NRC-1 plasmids and of the assembly strategy used for pNRC100 and the entire NRC-1 genome. Their comments on the fidelity of the NRC-1 plasmid sequences are therefore not justified and should be reevaluated, especially given the extensive published and unpublished work performed to analyze the structures and sequences of both megaplasmids. We stand firmly behind our published, peer-reviewed assembly of the strain NRC-1 megaplasmids and have already put forth a viable interpretation of the “creation” of pNRC100 from three smaller “hypothetical” plasmids, including a segment from the large chromosome. Painstaking work was performed to understand and interpret the genome structure of Halobacterium sp. strain NRC-1, and it should not be undervalued, as Pfeiffer et al. attempt to do. Using a comparative genomics approach to enhance gene annotation is a powerful and appropriate strategy for a second genome and underscores the importance of having multiple related species available for genome analysis. While ORPHEUS may arguably not be the best gene prediction program, Pfeiffer et al. were able to eliminate most, if not all, of the “spurious” ORFs (6517) it predicted in the R-1 genome by using a comparative genomic and proteomic approach.
Without the completed NRC-1 and other halophile genomes, selecting the R-1 ORFs might have been very challenging, even with the aid of proteomics data. In contrast, the Glimmer program efficiently predicted most of the Halobacterium sp. NRC-1 genes, with a limited number of spurious predictions, in the absence of comparative genomic data (see Supplementary Table S5 in [18]). Accurate prediction of translation start sites for some genes, however, is a difficult and largely unavoidable problem that persists in commonly used gene prediction programs [46–48]. Because NRC-1 was the first completed halophile genome, extensive gene curation using a comparative genomic approach was not possible when the initial genome sequence was released in 2000. The gene prediction problem is likely not restricted to NRC-1 but extends to most of the initial genomes sequenced. However, it is the early release of the NRC-1 genome sequence, and not the R-1 sequence, that has nurtured the development of several important postgenomic and systems biology studies [8,10,49–56]. Correct prediction of all true translation start sites would be helpful, but it is not an absolute prerequisite for proteome analysis and for mapping N-terminal peptides by mass spectrometry. Thus, the claim by Pfeiffer et al. that “identification of N-terminal peptides by proteomics would be hampered if the start codon is misassigned” is moot. Proteomic analysis need not be deferred, because this issue can easily be addressed by searching MS/MS spectra against a six-frame-translated protein sequence database, a built-in function of programs such as SEQUEST [57]. It therefore remains feasible to identify the N-terminal peptides and curate the ORF sequences. Regarding the proteome analysis, the statement that “a whole class of proteins was discovered that has not been studied in Hbt. salinarium or any other species yet” is controversial.
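A minimal sketch of six-frame translation shows why a peptide can be located without relying on any start-codon call: every reading frame on both strands is translated, and the peptide is searched against all six. This illustrates the idea only and is not SEQUEST's implementation. It assumes the standard genetic code (NCBI translation table 1, whose amino acid assignments match table 11 used for archaea; the tables differ only in alternative start codons), and the toy sequence and function names are hypothetical.

```python
# Standard genetic code; codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict[str, str]:
    """Translations of the three forward (+1..+3) and reverse (-1..-3) frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq[f:])
        frames[f"-{f + 1}"] = translate(rc[f:])
    return frames


def frames_containing(peptide: str, seq: str) -> list[str]:
    """Frames whose translation contains `peptide`; no start codon is assumed."""
    return [name for name, prot in six_frame(seq).items() if peptide in prot]


dna = "ATGGCTAAACGTTGA"                # toy sequence
print(six_frame(dna)["+1"])            # MAKR*
print(frames_containing("MAKR", dna))  # ['+1']
```

Because the search space covers every frame, a peptide lying upstream of a misassigned start codon would still be found in the translated sequence, which then points to the correct start.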
An earlier Halobacterium sp. NRC-1 proteome analysis had identified more than 400 proteins in the membrane and soluble proteomes in 2003 [49], and a subsequent in-depth soluble proteome analysis that identified approximately 888 proteins was documented in 2006 [52]. Because we wanted to respond rapidly to what we believe are shortcomings in the R-1 genome sequence report of Pfeiffer et al., we have not yet fully analyzed strain R-1 and cannot comment on the validity of its genome assembly or annotation. To be cautious, we have not commented on sequence variations between the NRC-1 and R-1 strains unless third-party sequences were available. We have, however, analyzed Table 2 of the R-1 paper, and readers should be careful about the row “insertion/deletion 3,” which describes “113 additional bases in NRC-1 affecting rRNA promoter region.” At least 2 sequences covering this region, from H. halobium NRC817 (Accession No. X03407) and H. cutirubrum (X03285), had been determined and deposited in GenBank. A multiple sequence alignment performed with the ClustalX 2.0.3 program [58] made it clear that these additional bases present in NRC-1 are also conserved in these two sequences (Fig. 2). Because this segment lies within the promoter region, it may be useful to reconsider what might have happened to this DNA segment in the R-1 genome assembly. As a whole, research groups interested in the biology and evolution of halophilic archaea should be excited and overjoyed to have a second complete genome sequence available to the scientific community. As one such research group, we are pleased with the research opportunities a second Halobacterium sp. genome provides, although we are quite dismayed by the disparaging, subjective, and inappropriate comments regarding the first genome sequence of a haloarchaeon, Halobacterium sp. strain NRC-1. The authors here believe that science should be, and is, critical, precise, and objective.
Here, we have presented these comments and addressed the concerns raised by Pfeiffer et al. regarding the genome assembly procedures, the gene prediction methodologies, and the proteomic pitfalls for Halobacterium sp. strain NRC-1. As discussed above, the lineages and histories of specific haloarchaeal isolates have been muddled and continue to be confusing and a point of contention for many laboratories studying this group of organisms [36]. Specifically, neither we nor Pfeiffer et al. can be certain of the origins of Halobacterium sp. strain NRC-1, and therefore we should all withhold judgment regarding the lineages and histories of strains R-1 and NRC-1. Clearly, both genome sequences can be placed within the genus Halobacterium (based on nearly identical large chromosome sequences). Both genome sequences also make it apparent that the makeup and structure of the extrachromosomal replicons differ between these two strains. Whether this is the result of evolution inside or outside a laboratory setting is unknown, but it is certainly not the result of misassembly of the Halobacterium sp. strain NRC-1 genome sequence. We believe that this point has already been amply demonstrated and will continue to be borne out over time by additional research in the haloarchaeal community.
PY - 2008/6
Y1 - 2008/6
UR - http://www.scopus.com/inward/record.url?scp=44449095705&partnerID=8YFLogxK
U2 - 10.1016/j.ygeno.2008.04.005
DO - 10.1016/j.ygeno.2008.04.005
M3 - Letter
C2 - 18538726
AN - SCOPUS:44449095705
SN - 0888-7543
VL - 91
SP - 548
EP - 552
JO - Genomics
JF - Genomics
IS - 6
ER -