| In [[genetics]], '''shotgun sequencing''', also known as '''shotgun cloning''', is a method used for [[sequencing]] long [[DNA]] strands. It is named by analogy with the rapidly expanding, quasi-random firing pattern of a [[shotgun]].
| | |
| Since the [[chain termination method]] of [[DNA sequencing]] can only be used for fairly short strands (100 to 1000 basepairs), longer sequences must be subdivided into smaller fragments, and subsequently re-assembled to give the overall sequence. Two principal methods are used for this: [[chromosome walking]], which progresses through the entire strand, piece by piece, and shotgun sequencing, which is a faster but more complex process, and uses random fragments.
| |
| | |
| In shotgun sequencing,<ref name="Staden">{{cite journal
| |
| | last = Staden
| |
| | first = R
| |
| | coauthors =
| |
| | title = A strategy of DNA sequencing employing computer programs
| |
| | journal = Nucleic Acids Research
| |
| | volume = 6
| |
| | issue = 7
| |
| | pages = 2601–10
| |
| | year = 1979
| |
| | pmid = 461197
| |
| | doi = 10.1093/nar/6.7.2601
| |
| | pmc = 327874}}</ref><ref>{{cite journal
| |
| | last = Anderson
| |
| | first = S
| |
| | coauthors =
| |
| | title = Shotgun DNA sequencing using cloned DNase I-generated fragments
| |
| | journal = Nucleic Acids Research
| |
| | volume = 9
| |
| | issue = 13
| |
| | pages = 3015–27
| |
| | year = 1981
| |
| | pmid = 6269069
| |
| | doi = 10.1093/nar/9.13.3015
| |
| | pmc = 327328}}</ref>
| |
| DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain ''reads''. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence.<ref name="Staden" />
| |
| | |
Shotgun sequencing was one of the precursor technologies that enabled [[full genome sequencing]].
| |
| | |
| ==Example==
| |
| For example, consider the following two rounds of shotgun reads:
| |
| {| class="wikitable"
| |
| |-
| |
| ! Strand
| |
| ! Sequence
| |
| |-
| |
| | Original
| |
| | <code><big>AGCATGCTGCAGTCATGCTTAGGCTA</big></code>
| |
| |-
| |
| | First shotgun sequence
| |
| | <code><big>AGCATGCTGCAGTCATGCT-------<br/>-------------------TAGGCTA</big></code>
| |
| |-
| |
| | Second shotgun sequence
| |
| | <code><big>AGCATG--------------------<br/>------CTGCAGTCATGCTTAGGCTA</big></code>
| |
| |-
| |
| | Reconstruction
| |
| | <code><big>AGCATGCTGCAGTCATGCTTAGGCTA</big></code>
| |
| |}
| |
| | |
| In this extremely simplified example, none of the reads cover the full length of the original sequence, but the four reads can be assembled into the original sequence using the overlap of their ends to align and order them. In reality, this process uses enormous amounts of information that are rife with ambiguities and sequencing errors. Assembly of complex genomes is additionally complicated by the great abundance of [[Repeated sequence (DNA)|repetitive sequence]], meaning similar short reads could come from completely different parts of the sequence.
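The reconstruction in the table can be illustrated with a minimal greedy overlap sketch. It is not the algorithm of any particular assembler: the function names <code>overlap</code> and <code>greedy_assemble</code> and the minimum overlap of 3 bases are assumptions made for illustration, and the sketch ignores sequencing errors, reverse-complement reads, and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads wholly contained in another read add no new sequence.
    contigs = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


reads = [
    "AGCATGCTGCAGTCATGCT", "TAGGCTA",          # first shotgun sequence
    "AGCATG", "CTGCAGTCATGCTTAGGCTA",          # second shotgun sequence
]
print(greedy_assemble(reads))  # ['AGCATGCTGCAGTCATGCTTAGGCTA']
```

Here the two short reads are contained in the two long ones, and the long reads share a 13-base overlap, so a single merge recovers the original strand.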
| |
| | |
| Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the [[Human Genome Project]], most of the human genome was sequenced at 12X or greater ''coverage''; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the ([[Euchromatin|euchromatic]]) human genome.{{Citation needed|date=November 2011}}
| |
| | |
| ==Whole genome shotgun sequencing==
| |
Whole genome shotgun sequencing for small (4000 to 7000 basepair) genomes was already in use in 1979.<ref name=Staden /> Broader application benefited from [[DNA_sequencing_theory#Pairwise_end-sequencing|pairwise end sequencing]], known colloquially as ''double-barrel shotgun sequencing''. As sequencing projects began to take on longer and more complicated DNAs, multiple groups began to realize that useful information could be obtained by sequencing both ends of a fragment of DNA. Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, the knowledge that the two sequences were oriented in opposite directions and were about the length of a fragment apart from each other was valuable in reconstructing the sequence of the original target fragment. The first published description of the use of paired ends was in 1990<ref>{{cite journal
| last = Edwards
| first = A
| coauthors = Voss, H.; Rice, P.; Civitello, A.; Stegemann, J.; Schwager, C.; Zimmerman, J.; Erfle, H.; Caskey, T.; Ansorge, W.
| title = Automated DNA sequencing of the human HPRT locus
| journal = Genomics
| volume = 6
| issue = 4
| pages = 593–608
| year = 1990
| pmid = 2341149
| doi = 10.1016/0888-7543(90)90493-E }}</ref>
as part of the sequencing of the human [[Hypoxanthine-guanine phosphoribosyltransferase|HGPRT]] locus, although the use of paired ends was limited to closing gaps after the application of a traditional shotgun sequencing approach. The first theoretical description of a pure pairwise end sequencing strategy, assuming fragments of constant length, was in 1991.<ref>{{cite journal
| last = Edwards
| first = A
| coauthors = Caskey, T
| title = Closure strategies for random DNA sequencing
| journal = Methods: A Companion to Methods in Enzymology
| volume = 3
| issue = 1
| pages = 41–47
| year = 1991
| doi = 10.1016/S1046-2023(05)80162-8 }}</ref> At the time, there was community consensus that the optimal fragment length for pairwise end sequencing would be three times the sequence read length. In 1995 Roach et al.<ref>{{cite journal
| |
| | last = Roach
| |
| | first = JC
| |
| | coauthors = Boysen, C; Wang, K; Hood, L
| |
| | title = Pairwise end sequencing: a unified approach to genomic mapping and sequencing
| |
| | journal = Genomics
| |
| | volume = 26
| |
| | pages = 345–353
| |
| | year = 1995
| |
| | pmid = 7601461
| |
| | doi = 10.1016/0888-7543(95)80219-C
| |
| | issue = 2 }}</ref>
| |
| introduced the innovation of using fragments of varying sizes, and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets. The strategy was subsequently adopted by [[The Institute for Genomic Research]] (TIGR) to sequence the genome of the bacterium ''[[Haemophilus influenzae]]'' in 1995,<ref>{{cite journal
| |
| | last = Fleischmann
| |
| | first = RD
| |
| | coauthors = et al.
| |
| | title = Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
| |
| | journal = Science
| |
| | volume = 269
| |
| | issue = 5223
| |
| | pages = 496–512
| |
| | year = 1995
| |
| | pmid = 7542800
| |
| | doi = 10.1126/science.7542800 |bibcode = 1995Sci...269..496F }}</ref> and then by [[Celera Genomics]] to sequence the ''[[Drosophila melanogaster]]'' (fruit fly) genome in 2000,<ref>{{cite journal
| |
| | last = Adams
| |
| | first = MD
| |
| | coauthors = et al.
| |
| | title = The genome sequence of Drosophila melanogaster
| |
| | journal = Science
| |
| | volume = 287
| |
| | issue = 5461
| |
| | pages = 2185–95
| |
| | year = 2000
| |
| | pmid = 10731132 | doi = 10.1126/science.287.5461.2185
| |
| | bibcode=2000Sci...287.2185.}}</ref>
| |
and subsequently the human genome.
| | |
To apply the strategy, high-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and [[clone (genetics)|clone]]d into an appropriate [[vector DNA|vector]]. The clones are then sequenced from both ends using the [[chain termination method]], yielding two short sequences. Each sequence is called an ''end-read'' or ''read'', and two reads from the same clone are referred to as ''[[Paired-end Tags|mate pairs]]''. Since the chain termination method can usually produce only reads between 500 and 1000 bases long, [[Paired-end Tags|mate pairs]] will rarely overlap in all but the smallest clones.
| | |
The original sequence is reconstructed from the reads using sequence assembly [[software]]. First, overlapping reads are collected into longer composite sequences known as ''[[contig]]s''. Contigs can be linked together into ''scaffolds'' by following connections between [[Paired-end Tags|mate pairs]]. The distance between contigs can be inferred from the [[Paired-end Tags|mate pair]] positions if the average fragment length of the library is known and has a narrow window of deviation. Depending on the size of the gap between contigs, different techniques can be used to find the sequence in the gaps. If the gap is small (5–20 kb), the region is amplified by PCR and then sequenced. If the gap is large (>20 kb), the large fragment is cloned in special vectors such as [[bacterial artificial chromosome]]s (BACs), and the cloned insert is then sequenced.
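The inference of the distance between two contigs from mate-pair positions can be sketched as follows. This is only an illustration under simplifying assumptions: the function name <code>estimate_gap</code> and its coordinate convention are hypothetical, every fragment is assumed to have exactly the library's average length, and each link is assumed to run from contig A (forward read) into contig B (reverse read).

```python
from statistics import mean


def estimate_gap(insert_size, len_a, links):
    """Estimate the gap between contig A and contig B.

    insert_size: average fragment length of the library (assumed known).
    len_a:       length of contig A.
    links:       list of (start_on_a, end_on_b) pairs, where start_on_a is
                 where the fragment begins on contig A and end_on_b is
                 where it ends on contig B (0-based offsets).

    Each fragment covers (len_a - start_on_a) bases of A, then the gap,
    then end_on_b bases of B, so:
        gap = insert_size - (len_a - start_on_a) - end_on_b
    """
    return mean(insert_size - (len_a - s) - e for s, e in links)


# Two mate pairs from a 2 kb library bridging a 5 kb contig and its neighbour.
print(estimate_gap(2000, 5000, [(4400, 900), (4600, 1100)]))  # 500
```

Averaging over several linking mate pairs reduces the effect of fragment-length variation, which is why a library with a narrow size distribution gives more reliable gap estimates.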
| |
| | |
| Proponents of this approach argue that it is possible to sequence the whole [[genome]] at once using large arrays of sequencers, which makes the whole process much more efficient than more traditional approaches. Detractors argue that although the technique quickly sequences large regions of DNA, its ability to correctly link these regions is suspect, particularly for genomes with repeating regions. As [[sequence assembly]] programs become more sophisticated and computing power becomes cheaper, it may be possible to overcome this limitation.{{Citation needed|date=February 2007}}
| |
| | |
| ===Coverage===
| |
| | |
Coverage (read depth or depth) is the average number of reads representing a given [[nucleotide]] in the reconstructed sequence. It can be calculated from the length of the original genome (''G''), the number of reads (''N''), and the average read length (''L'') as <math>N\times L/G</math>. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2× redundancy (8 × 500 / 2,000 = 2). This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (sometimes also called coverage). A high coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly. The subject of [[DNA sequencing theory]] addresses the relationships of such quantities.
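The coverage formula, and one way of estimating the fraction of the genome covered by reads, can be written out as a short sketch. The second function is an assumption that goes beyond the text above: it uses the idealized Poisson (Lander–Waterman) model, in which reads start at uniformly random positions, so real data with cloning or sequencing biases will deviate from it. Both function names are hypothetical.

```python
import math


def coverage(num_reads, read_len, genome_len):
    """Average read depth: N * L / G."""
    return num_reads * read_len / genome_len


def expected_fraction_covered(depth):
    """Expected fraction of bases hit by at least one read.

    Assumes read starts are uniformly random (Poisson model), so the
    probability that a base is missed by every read is exp(-depth).
    """
    return 1.0 - math.exp(-depth)


c = coverage(8, 500, 2000)
print(c)                                        # 2.0
print(round(expected_fraction_covered(c), 4))   # 0.8647
```

Under this model, even 2× coverage is expected to leave roughly 14% of bases unsequenced, which illustrates why much higher redundancy is sought in practice.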
| |
| | |
| Sometimes a distinction is made between sequence coverage and physical coverage. Sequence coverage is the average number of times a base is read (as described above). Physical coverage is the average number of times a base is read or spanned by mate paired reads.<ref name="MeyersonFig1">{{cite doi|10.1038/nrg2841}}</ref>
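The difference between the two quantities can be made concrete with a small per-base count. The sketch below is illustrative only: the function name <code>sequence_and_physical_coverage</code> is hypothetical, fragments are given as half-open intervals on a linear genome, and each fragment is assumed to be read for a fixed length from both ends.

```python
def sequence_and_physical_coverage(genome_len, fragments, read_len):
    """Return (sequence coverage, physical coverage) as per-base averages.

    fragments: list of (start, end) half-open intervals, one per clone.
    read_len:  number of bases read from each end of every fragment.
    """
    seq_depth = [0] * genome_len
    phys_depth = [0] * genome_len
    for start, end in fragments:
        # Physical coverage: every base spanned by the fragment counts.
        for i in range(start, end):
            phys_depth[i] += 1
        # Sequence coverage: only bases inside the two end reads count.
        for i in range(start, start + read_len):
            seq_depth[i] += 1
        for i in range(end - read_len, end):
            seq_depth[i] += 1
    return sum(seq_depth) / genome_len, sum(phys_depth) / genome_len


frags = [(0, 50), (25, 75), (50, 100)]
print(sequence_and_physical_coverage(100, frags, 10))  # (0.6, 1.5)
```

In this toy case the end reads touch only 60 base positions in total, while the fragments themselves span the genome 1.5 times over, so physical coverage exceeds sequence coverage.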
| |
| | |
==Hierarchical shotgun sequencing==
| |
| | |
| [[File:Whole genome shotgun sequencing versus Hierarchical shotgun sequencing.png|thumb| In whole genome shotgun sequencing (top), the entire genome is sheared randomly into small fragments (appropriately sized for sequencing) and then reassembled. In hierarchical shotgun sequencing (bottom), the genome is first broken into larger segments. After the order of these segments is deduced, they are further sheared into fragments appropriately sized for sequencing.]]
| |
Although shotgun sequencing can in theory be applied to a genome of any size, its direct application to the sequencing of large genomes (for instance, the [[Human Genome]]) was limited until the late 1990s, when technological advances made practical the handling of the vast quantities of complex data involved in the process.<ref name="genome sequencing">Dunham, I. ''Genome Sequencing''. Encyclopedia of Life Sciences, 2005. {{doi|10.1038/npg.els.0005378}}</ref> Historically, full-genome shotgun sequencing was believed to be limited by both the sheer size of large genomes and the complexity added by the high percentage of repetitive DNA (greater than 50% for the human genome) present in them.<ref name="venter">Venter, J. C. ''Shotgunning the Human Genome: A Personal View''. Encyclopedia of Life Sciences, 2006.</ref> It was not widely accepted that a full-genome shotgun sequence of a large genome would provide reliable data. For these reasons, other strategies that lowered the computational load of sequence assembly had to be used before shotgun sequencing was performed.<ref name="venter" />
| |
| In hierarchical sequencing, also known as top-down sequencing, a low-resolution [[Gene mapping#Physical Mapping|physical map]] of the genome is made prior to actual sequencing. From this map, a minimal number of fragments that cover the entire chromosome are selected for sequencing.<ref name="textbook">Gibson, G. and Muse, S. V. ''A Primer of Genome Science''. 3rd ed. P.84</ref> In this way, the minimum amount of high-throughput sequencing and assembly is required.
| |
| The amplified genome is first sheared into larger pieces (50-200kb) and cloned into a bacterial host using [[Bacterial artificial chromosome|BACs]] or [[P1-derived artificial chromosome|PACs]]. Because multiple genome copies have been sheared at random, the fragments contained in these clones have different ends, and with enough coverage (see section above) finding a '''scaffold''' of [[Contig#BAC contigs|BAC contigs]] that covers the entire genome is theoretically possible. This scaffold is called a '''tiling path'''.[[File:Tiling path.png|thumb|A BAC contig that covers the entire genomic area of interest makes up the tiling path.]] Once a tiling path has been found, the BACs that form this path are sheared at random into smaller fragments and can be sequenced using the shotgun method on a smaller scale.
| |
Although the full sequences of the BAC contigs are not known, their orientations relative to one another are known. There are several methods for deducing this order and selecting the BACs that make up a tiling path. The general strategy involves identifying the positions of the clones relative to one another and then selecting the fewest clones required to form a contiguous scaffold that covers the entire area of interest. The order of the clones is deduced by determining the way in which they overlap.<ref name="genome map">Dear, P. H. ''Genome Mapping''. Encyclopedia of Life Sciences, 2005. {{doi|10.1038/npg.els.0005353}}.</ref> Overlapping clones can be identified in several ways. A small radioactively or chemically labeled probe containing a [[sequence-tagged site]] (STS) can be hybridized onto a microarray upon which the clones are printed.<ref name="genome map" /> In this way, all the clones that contain a particular sequence in the genome are identified. The end of one of these clones can then be sequenced to yield a new probe and the process repeated in a method called chromosome walking. Alternatively, the BAC [[BAC library#Genomic libraries|library]] can be restriction-digested. Two clones that have several fragment sizes in common are inferred to overlap because they contain multiple similarly spaced restriction sites in common.<ref name="genome map" /> This method of genomic mapping is called restriction fingerprinting because it identifies a set of restriction sites contained in each clone. Once the overlap between the clones has been found and their order relative to the genome is known, a scaffold of a minimal subset of these contigs that covers the entire genome is shotgun-sequenced.<ref name="textbook" />
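Once clone positions are known, selecting the fewest clones that cover the region is an interval-covering problem. The sketch below uses the standard greedy approach and is not taken from the cited sources: the function name <code>tiling_path</code>, the clone names, and the assumption that each clone's start and end coordinates on the map are already known (as half-open intervals) are all hypothetical.

```python
def tiling_path(clones, region_start, region_end):
    """Pick a minimal set of clones covering [region_start, region_end).

    clones: dict mapping clone name -> (start, end) map coordinates.
    Returns the list of chosen clone names in order, or None if the
    clones leave part of the region uncovered.
    """
    path = []
    pos = region_start
    while pos < region_end:
        # Among clones that begin at or before the current position,
        # choose the one that extends furthest to the right.
        candidates = [(name, iv) for name, iv in clones.items() if iv[0] <= pos]
        best = max(candidates, key=lambda item: item[1][1], default=None)
        if best is None or best[1][1] <= pos:
            return None  # nothing extends the path: there is a gap
        path.append(best[0])
        pos = best[1][1]
    return path


bacs = {
    "A": (0, 120), "B": (50, 150), "C": (100, 260),
    "D": (140, 200), "E": (240, 400),
}
print(tiling_path(bacs, 0, 400))  # ['A', 'C', 'E']
```

Clones B and D are skipped because they are made redundant by their neighbours. In practice clone positions come from fingerprinting or hybridization data and carry uncertainty, so a real tiling path is usually chosen with some deliberate overlap between adjacent clones.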
| |
Because it involves first creating a low-resolution map of the genome and requires extensive BAC library creation and tiling path selection, hierarchical shotgun sequencing is slower and more labor-intensive than whole-genome shotgun sequencing, but it relies less heavily on computer algorithms for genome assembly. Now that the technology is available and the reliability of the data demonstrated,<ref name="venter" /> the speed and cost efficiency of whole-genome shotgun sequencing have made it the primary method for genome sequencing.
| |
| | |
==Shotgun and next-generation sequencing==
| |
Classical shotgun sequencing was based on the Sanger sequencing method, which was the most advanced technique for sequencing genomes from about 1995 to 2005. The shotgun strategy is still applied today, but with other sequencing technologies, collectively called [[DNA_sequencing#Next-generation_methods|next-generation sequencing]]. These technologies produce shorter reads (anywhere from 25 to 500 bp) but yield many hundreds of thousands or millions of reads in a relatively short time (on the order of a day).<ref>{{cite journal
| |
| last = Voelkerding
| first = KV
| coauthors = et al
| |
| | title = Next Generation Sequencing: From Basic Research to Diagnostics
| |
| | journal = Clinical Chemistry
| |
| | volume = 55
| |
| | issue = 4
| |
| pages = 641–658
| |
| | year = 2009
| |
| | pmid = 19246620
| |
| | doi = 10.1373/clinchem.2008.112789 }}</ref>
| |
This results in high coverage, but the assembly process is much more computationally expensive. These technologies offer far higher throughput than Sanger sequencing, owing to the high volume of data produced and the relatively short time needed to sequence a whole genome.<ref>{{cite journal
| |
| | last = Metzker
| |
| | first = Michael L.
| |
| | title = Sequencing technologies - the next generation
| |
| | journal = Nat Rev Genet
| |
| | volume = 11
| |
| | issue = 1
| |
| | pages = 31–46
| |
| | year = 2010
| |
| | pmid = 19997069
| |
| | doi = 10.1038/nrg2626 }}</ref>
| |
| | |
| ==See also==
| |
| *[[DNA sequencing theory]]
| |
| | |
| ==References==
| |
| {{Reflist|30em}}
| |
| | |
| ===Further reading===
| |
| {{Refbegin}}
| |
| *{{cite web | title=Shotgun sequencing comes of age | work=The Scientist | url=http://www.the-scientist.com/news/20021231/06 | accessdate=December 31, 2002}}
| |
| *{{cite web | title=Shotgun sequencing finds nanoorganisms - Probe of acid mine drainage turns up unsuspected virus-sized Archaea
| |
| | work=SpaceRef.com| url=http://www.spaceref.com/news/viewpr.html?pid=21532
| |
| | accessdate=December 23, 2006}}
| |
| *{{cite web | title=Genomic shotgun sequencing | work=biology science | url=http://www.cd-genomics.com/gene/shotgun.htm | accessdate=April 11, 2009}}
| |
| {{Refend}}
| |
| | |
| ==External links==
| |
| {{NCBI-handbook}}
| |
| | |
| {{DEFAULTSORT:Shotgun Sequencing}}
| |
| [[Category:Molecular biology]]
| |
| [[Category:DNA sequencing]]
| |