The sequencing of nucleic acids, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), is a fundamental part of biological discovery. Such sequencing, and/or the detection of such nucleic acids, is useful for a variety of purposes and is often used in scientific research as well as in medical advancement. For instance, the genomics and bioinformatics fields are concerned with the application of information technology and computer science to the field of molecular biology. In particular, bioinformatics techniques can be applied to process and analyze various genomic data, such as data from an individual, so as to determine quantitative and qualitative information about that data. Such information can then be used by various practitioners in the development of diagnostic, prophylactic, and/or therapeutic methods for detecting, preventing, or at least ameliorating diseased states, thereby improving the safety, quality, and effectiveness of health care. The need for such diagnostic, therapeutic, and prophylactic advancements has led to a high demand for low-cost sequencing, which in turn has driven the development of high-throughput sequencing, termed next-generation sequencing (NGS).
Generally, approaches to DNA and/or RNA analysis, such as for genetic diagnostics and/or sequencing, involve nucleic acid hybridization and detection. A typical hybridization and detection approach includes the following steps. For genetic analysis, a DNA or RNA sample of a subject to be analyzed may be isolated and immobilized on a substrate. In such instances, the immobilized genetic material acts as a template for new nucleic acid synthesis. A probe of known sequence identity, e.g., a disease marker, may be labeled and washed across the substrate. If the disease marker is present, a binding event, e.g., hybridization, will occur, and because the probe has been labeled, the hybridization event can be detected. Detection or non-detection of the label thereby indicates the presence or absence of the disease marker in the subject's sample.
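By way of illustration only, the probe-detection logic described above can be sketched in Python as a string-matching problem. The sketch assumes idealized, perfect Watson-Crick pairing: a probe is treated as hybridizing when the sample strand contains the probe's reverse complement. Real hybridization tolerates some mismatches and depends on temperature, salt concentration, and probe length, none of which is modeled here; the function names and example sequences are hypothetical.

```python
# Idealized hybridization check (illustrative assumption, not a physical model):
# a probe "binds" only where the sample strand contains its exact reverse
# complement, reflecting antiparallel Watson-Crick base pairing.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def probe_binds(sample_strand: str, probe: str) -> bool:
    """True if the probe can pair perfectly with some stretch of the sample strand."""
    return reverse_complement(probe) in sample_strand.upper()


if __name__ == "__main__":
    sample = "GGATCCTTAGCATGCA"   # hypothetical immobilized sample strand
    marker_probe = "TGCTAAGG"     # hypothetical labeled probe for a hypothetical marker
    print(probe_binds(sample, marker_probe))
```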
For DNA sequencing, an unknown nucleic acid sequence to be identified, e.g., a single-stranded sequence of DNA of a subject composed of a combination of unknown nucleotides, e.g., As, Cs, Gs, and Ts, is first isolated, amplified, and immobilized on the substrate. Next, a known nucleotide labeled with an identifiable tag is contacted with the unknown nucleic acid sequence in the presence of a polymerase. When the labeled nucleotide is complementary to the next unpaired base of the unknown sequence immobilized on the surface of the substrate, the polymerase incorporates it opposite that base. This binding event can then be detected, e.g., optically or electrically. These steps are repeated, position by position, until the entire DNA sample has been sequenced, a process known as sequencing by synthesis. Typically, these steps are performed by a next-generation sequencer, in which thousands to millions of sequences may be produced concurrently.
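The sequencing-by-synthesis loop described above can be sketched, under heavily idealized assumptions (exactly one base incorporated per cycle, no sequencing errors, no loss of synchrony between strands), as a per-position complement lookup. The function names are hypothetical; the sketch only illustrates why detecting the incorporated base reveals the template base.

```python
# Idealized sequencing by synthesis: one base per cycle, no errors.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template: str) -> str:
    """Return the bases incorporated opposite `template`, in cycle order."""
    synthesized = []
    for template_base in template.upper():
        # The polymerase can only incorporate the base complementary to the next
        # unpaired template base; that base's label is what the detector sees.
        synthesized.append(COMPLEMENT[template_base])
    return "".join(synthesized)


def infer_template(detected: str) -> str:
    """Recover the template bases from the sequence of detected labels."""
    return "".join(COMPLEMENT[base] for base in detected)


if __name__ == "__main__":
    detected = sequence_by_synthesis("ATGGCA")  # hypothetical template
    print(detected, infer_template(detected))
```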
A central challenge in DNA sequencing is assembling full-length genomic sequence data, e.g., of chromosomal sequences, from a sample of genetic material obtained from a subject. Such assembly involves one or more genomic analysis protocols, such as a mapping and/or aligning algorithm, by which fragments of identified sample sequence are mapped and aligned to a reference genome. This yields sequence data in a format that can be compared to a reference genomic sequence, such as to determine the variants in the sampled full-length genomic sequences. Assembly is necessary because the methods employed in sequencing protocols do not produce full-length chromosomal sequences of the sample DNA.
Rather, in a typical sequencing protocol, sequence fragments, typically from 100 to 1,000 nucleotides in length, are produced without any indication of where in the genome they map and align. Therefore, in order to generate full-length chromosomal genomic constructs, or to determine their variance with respect to a reference genomic sequence, these fragments of DNA sequence need to be mapped, aligned, merged, and/or compared to the reference genomic sequence. Through such processes, the variants of the sample genomic sequences relative to the reference genomic sequences may be determined.
However, because the human genome comprises approximately 3.1 billion base pairs, and each sequence fragment is typically only from about 100 to about 1,000 nucleotides in length, the time and effort required to build such full-length genomic sequences and determine the variants therein is considerable, often requiring several different computer resources applying several different algorithms over prolonged periods of time. In a particular instance, thousands, millions, or even billions of DNA sequence fragments are generated, mapped, aligned, and merged in order to construct a genomic sequence that approximates a chromosome in length. A step in this process may include comparing the sequenced DNA fragments to a reference sequence so as to determine where in the genome the fragments align.
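As an illustration of the comparison step, the following Python sketch places a read on a reference sequence using a simple seed-and-extend strategy: a hash index of all length-k substrings (k-mers) of the reference is built once, the read's first k bases are looked up in it, and each candidate placement is scored by counting mismatches. Mismatched positions in the best placement are reported as candidate variants. This is a generic textbook technique chosen for illustration, not the mapping or aligning algorithm of any particular pipeline; it ignores insertions and deletions, reverse-strand reads, and reads whose seed itself contains a variant, and all names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to its 0-based start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 2) -> Optional[tuple[int, list[int]]]:
    """Seed-and-extend placement of a read (substitutions only, 0-based positions).

    Returns (start, mismatch_positions) for the placement with the fewest
    mismatches, or None if no placement has at most max_mismatches mismatches.
    """
    best: Optional[tuple[int, list[int]]] = None
    for start in index.get(read[:k], []):       # seed: exact match of the first k bases
        window = reference[start:start + len(read)]
        if len(window) < len(read):              # read would run off the reference end
            continue
        # Extend: compare the whole read against the reference window.
        mismatches = [start + i for i, (r, g) in enumerate(zip(read, window)) if r != g]
        if len(mismatches) <= max_mismatches and (best is None or len(mismatches) < len(best[1])):
            best = (start, mismatches)
    return best


if __name__ == "__main__":
    reference = "AAACCCGGGTTTACGTAC"  # hypothetical reference sequence
    idx = build_kmer_index(reference, k=4)
    print(map_read("CCGGATTT", reference, idx, k=4))
```

For example, mapping the hypothetical read `CCGGATTT` against the hypothetical reference with k = 4 places the read at position 4 and reports reference position 8 (G in the reference, A in the read) as a candidate variant.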
In such instances, the raw genetic material must be processed so as to derive usable genetic sequence data therefrom. This processing may be done manually or via an automated sequencer. Typically, such processing involves obtaining a biological sample from a subject, such as through venipuncture, a hair sample, etc., and treating the sample to isolate the DNA therefrom. Once isolated, the DNA may be denatured and strand separated, and/or portions of the DNA may be multiplied, e.g., via polymerase chain reaction (PCR), so as to build a library of replicated strands that are ready to be sequenced, e.g., read, such as by an automated sequencer. Such a sequencer is configured to “read” the replicated strands, e.g., by synthesis, and thereby determine the nucleotide sequences that make up the DNA. Further, in various instances, such as in building the library of replicated strands, it may be useful to provide for over-coverage when preprocessing a given portion of the DNA. Performing this over-coverage, e.g., using PCR, may require additional sample preparation resources and time, and therefore be more expensive, but it often increases the probability that the end result is accurate.
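The amplification step can be illustrated with the idealized textbook model of PCR in which each cycle multiplies the number of copies by (1 + E), where E is the per-cycle amplification efficiency (E = 1 corresponds to perfect doubling). The sketch below assumes a constant efficiency and ignores reagent depletion and amplification bias, both of which cause real reactions to plateau; the function name is hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of idealized PCR.

    Each cycle multiplies the copy number by (1 + efficiency); efficiency = 1.0
    means perfect doubling. Plateau effects (reagent depletion) are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare perfect doubling with a hypothetical 90% per-cycle efficiency.
    for eff in (1.0, 0.9):
        print(f"efficiency {eff:.0%}: {pcr_copies(1, 30, eff):.3g} copies after 30 cycles")
```

Under this model a single template molecule yields 2^10 = 1,024 copies after ten perfect cycles, so a modest number of cycles suffices to build a library with many copies of each strand.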
More particularly, once the library of replicated strands has been generated, the strands may be injected into an automated sequencer that may then “read” them, such as by synthesis, so as to determine their nucleotide sequences. For instance, the replicated single-stranded DNA may be attached to a glass bead and inserted into a test vessel, e.g., an array. All the components necessary for replicating its complementary strand, including labeled nucleotides, are also added to the vessel, but in a sequential fashion. For example, labeled “A”, “C”, “G”, and “T” nucleotides are added, either one at a time or all together, to see which nucleotide will bind at position one. After each addition, a light, e.g., a laser, is shone on the array. If the composition fluoresces, an image is produced indicating which nucleotide bound at the subject location. More particularly, where the nucleotides are added one at a time, if a binding event occurs, its indicative fluorescence will be observed. If a binding event does not occur, the test vessel may be washed and the procedure repeated until the appropriate one of the four nucleotides binds to its complement at the subject location and its indicative fluorescence is observed. Where all four nucleotides are added at the same time, each may be labeled with a different fluorescent indicator, and the nucleotide that binds to its complement at the subject position may be determined, such as by the color of its fluorescence. This greatly accelerates the synthesis process, since each position then requires a single addition rather than up to four.
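The one-nucleotide-at-a-time procedure described above can be sketched as a loop that, for each template position, adds labeled nucleotides in a fixed order until fluorescence is observed. The flow order `ACGT`, the assumption that the order restarts at every position, and the assumption that exactly one base is incorporated per successful addition are all simplifications made for illustration; the function and constant names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
FLOW_ORDER = "ACGT"  # hypothetical fixed order of single-nucleotide additions


def one_at_a_time_flows(template: str) -> tuple[str, int]:
    """Return (synthesized strand, total number of nucleotide additions)."""
    synthesized = []
    additions = 0
    for template_base in template.upper():
        for nucleotide in FLOW_ORDER:
            additions += 1  # add one labeled nucleotide, then illuminate
            if nucleotide == COMPLEMENT[template_base]:
                synthesized.append(nucleotide)  # fluorescence observed: base called
                break  # move on to the next template position
            # No fluorescence: wash the vessel and try the next nucleotide.
    return "".join(synthesized), additions


if __name__ == "__main__":
    strand, n = one_at_a_time_flows("TGCA")  # hypothetical template
    print(strand, n)
```

For the hypothetical template `TGCA`, the sketch needs 1 + 2 + 3 + 4 = 10 additions to read four positions, whereas adding all four differently labeled nucleotides together would need only four, one per position, which illustrates the speed-up noted above.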
Once a binding event has occurred, the complex is washed and the synthesis steps are repeated for position two. For example, a labeled nucleotide “A” may be added to the mix to determine whether the complement at that position is a “T”; if so, all the sequences having a “T” at that position will bind the labeled “A” and will therefore fluoresce. The samples are then washed; where binding has occurred, the bound nucleotide is not washed away. This is repeated for all positions until all the over-sampled nucleic acid segments, e.g., reads, have been sequenced and the data collected. Alternatively, where all four nucleotides are added at the same time, each labeled with a different fluorescent indicator, only one nucleotide will bind to its complement at the subject position and the others will be washed away, such that after the vessel has been washed, a laser may be shone on the vessel and the nucleotide that bound to its complement may be determined, such as by the color of its fluorescence. This continues until the entire strand has been replicated in the vessel.
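When all four nucleotides are added together, each carrying a different fluorescent label, calling the base at each position reduces to identifying the brightest color channel in that cycle's image. The following sketch assumes one intensity value per channel per cycle, a hypothetical channel-to-base assignment, and a hypothetical `min_signal` threshold at or below which no call is made (reported as `N`). Real base callers additionally correct for spectral crosstalk between dyes and for strands falling out of phase, which this sketch omits.

```python
from collections.abc import Iterable, Sequence

CHANNEL_TO_BASE = ("A", "C", "G", "T")  # hypothetical dye-channel assignment


def call_bases(cycle_intensities: Iterable[Sequence[float]], min_signal: float = 0.0) -> str:
    """Call one base per cycle from four-channel fluorescence intensities.

    Each element of cycle_intensities holds (A, C, G, T) channel intensities for
    one cycle. The call is the brightest channel, or "N" if even that channel
    does not exceed min_signal.
    """
    calls = []
    for intensities in cycle_intensities:
        if len(intensities) != 4:
            raise ValueError("expected one intensity per dye channel")
        brightest = max(range(4), key=lambda i: intensities[i])
        calls.append(CHANNEL_TO_BASE[brightest] if intensities[brightest] > min_signal else "N")
    return "".join(calls)


if __name__ == "__main__":
    cycles = [(0.9, 0.1, 0.0, 0.05), (0.1, 0.2, 0.8, 0.1), (0.0, 0.0, 0.0, 0.0)]
    print(call_bases(cycles))
```

Note that each call is the incorporated nucleotide, i.e., the complement of the template base at that position.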
A typical length of a sequence replicated in this manner is from about 100 to about 500 base pairs, such as from about 150 to about 400 base pairs, including from about 200 to about 350 base pairs, such as from about 250 to about 300 base pairs, depending on the sequencing protocol being employed. Further, the length of these segments may be predetermined, e.g., engineered, to accord with any particular sequencing machinery and/or the protocol by which it is run. The end result is a readout, or read, comprising a replicated DNA segment, e.g., from about 100 to about 1,000 nucleotides in length, that has been labeled in such a manner that every nucleotide in the sequence, e.g., read, is known because of its label. Hence, since the human genome comprises about 3.1 billion base pairs, and various known sequencing protocols usually produce labeled replicated sequences, e.g., reads, from about 100 or 101 bases to about 250, about 300, or about 400 bases in length, the total number of segments that need to be sequenced, and consequently the total number of reads generated, can be anywhere from about 10,000,000 to about 40,000,000, such as from about 15,000,000 to about 30,000,000, depending on the length of the labeled replicated sequences. Therefore, where the read length is 100 nucleotides, the sequencer may typically generate about 30,000,000 reads so as to cover the genome once.
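The read counts above follow from dividing the genome size by the read length; more generally, the number of reads N needed to reach an average coverage C over a genome of G bases with reads of length L is N = C × G / L. Under the classic Lander-Waterman approximation, which assumes reads start at uniformly random positions, the expected fraction of the genome left uncovered at average coverage C is about e^(-C). The sketch below uses the approximate genome size of 3.1 billion bases stated above; the function names are hypothetical.

```python
import math

HUMAN_GENOME_BASES = 3_100_000_000  # approximate size stated in the text


def reads_needed(read_length: int, coverage: float = 1.0,
                 genome_size: int = HUMAN_GENOME_BASES) -> int:
    """Number of reads N = C * G / L needed for average coverage C (rounded up)."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases receiving no read: e^(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    for length in (100, 250, 300, 400):
        print(f"{length}-nt reads: {reads_needed(length):,} reads for 1x coverage")
    print(f"expected fraction uncovered at 1x: {fraction_uncovered(1.0):.1%}")
```

With 100-nucleotide reads, `reads_needed(100)` gives 31,000,000 reads for a single pass over the genome, consistent with the roughly 30,000,000 reads stated above. At that 1x average coverage, however, `fraction_uncovered(1.0)` is about 0.37, i.e., roughly a third of the genome would be expected to receive no read at all, which illustrates why over-coverage can improve the accuracy of the end result.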
However, due in part to the need for optically detectable, e.g., fluorescent, labels in the sequencing reactions being performed, the instrumentation required for such high-throughput sequencing is bulky, costly, and not portable. For this reason, a number of new approaches for direct, label-free detection of DNA hybridization reactions have been proposed. Among these new approaches are detection methods based on various electronic analytic devices. Such direct electronic detection methods have several advantages over the typical NGS platform. For example, the detector may be incorporated in the substrate itself, such as in a biosystem-on-a-chip device, e.g., a complementary metal-oxide-semiconductor (CMOS) device. More particularly, when a CMOS device is used in genetic detection, the output signal representative of a hybridization event can be acquired and processed directly on a microchip. In such an instance, automatic recognition is theoretically achievable in real time and at a lower cost than is currently achievable using NGS processing. Moreover, standard CMOS devices may be employed for such electronic detection, making the process simple, inexpensive, and portable.
However, for next-generation sequencing to become widely used as a diagnostic in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, such as complementary metal-oxide-semiconductor (CMOS) chip fabrication, which is the current pinnacle of large-scale, high-quality, low-cost manufacturing of high technology. Ideally, the entire sensory apparatus of the sequencer would then be embodied in a standard semiconductor chip, manufactured in the same fabrication facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Thermo-Fisher, Inc. However, the full promise of this idea has not been realized commercially due to the fundamental limits of applying a metal-oxide-semiconductor field-effect transistor, or MOSFET, as a biosensor. When a MOSFET is used in solution as a biosensor, it is referred to as an ion-sensitive field-effect transistor, or ISFET. A particular limitation is a lack of sensor sensitivity and poor signal-to-noise characteristics as the semiconductor node scales down to smaller transistor geometries (gate lengths).
More particularly, a field-effect transistor (FET) typically includes a source electrode, a drain electrode, and a gate electrode, together with a channel region connecting the source and drain electrodes. The FET may also include an insulating barrier separating the gate from the channel. The operation of a conventional FET relies on control of the channel conductivity, and thus of the drain current, by a voltage, VGS, applied between the gate and the source. For high-speed applications, and for the purpose of increasing sensor sensitivity, FETs should respond quickly to variations in VGS. This, however, requires short gates and fast carriers in the channel.
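The dependence of drain current on VGS, and the motivation for short gates, can be illustrated with the textbook long-channel square-law model of a MOSFET in saturation, I_D = 0.5 × μC_ox × (W/L) × (VGS - VT)^2, whose transconductance g_m = dI_D/dVGS = μC_ox × (W/L) × (VGS - VT) grows as the gate length L shrinks. This model is used here only as an assumption for illustration: it neglects subthreshold conduction and, importantly, the short-channel effects discussed below, so it captures the benefit of a short gate but not its drawbacks. The function names and parameter values are hypothetical.

```python
def drain_current_sat(vgs: float, vt: float, mu_cox: float, width: float, length: float) -> float:
    """Saturation drain current (A) in the long-channel square-law model.

    I_D = 0.5 * mu_cox * (W / L) * (VGS - VT)**2, where mu_cox is carrier
    mobility times gate-oxide capacitance per unit area (A/V^2).
    """
    overdrive = vgs - vt
    if overdrive <= 0.0:
        return 0.0  # idealized: channel off below threshold (subthreshold current ignored)
    return 0.5 * mu_cox * (width / length) * overdrive ** 2


def transconductance_sat(vgs: float, vt: float, mu_cox: float, width: float, length: float) -> float:
    """g_m = dI_D/dVGS = mu_cox * (W / L) * (VGS - VT) in the same model (A/V)."""
    overdrive = vgs - vt
    if overdrive <= 0.0:
        return 0.0
    return mu_cox * (width / length) * overdrive


if __name__ == "__main__":
    # Hypothetical device: mu*Cox = 200 uA/V^2, W = 1 um, VGS = 1.0 V, VT = 0.5 V.
    for gate_length in (1e-6, 0.5e-6):
        gm = transconductance_sat(1.0, 0.5, 200e-6, 1e-6, gate_length)
        print(f"L = {gate_length * 1e6:.2f} um -> g_m = {gm * 1e6:.1f} uA/V")
```

In this model, halving the gate length doubles the transconductance, i.e., the change in drain current produced by a given change in VGS, which is the sense in which a shorter gate yields a more responsive sensor.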
Unfortunately, FETs with short gates frequently suffer from degraded electrostatics and other problems, collectively known as short-channel effects, such as threshold-voltage roll-off, drain-induced barrier lowering, and impaired drain-current saturation, which result in decreased sensor sensitivity. However, scaling theory predicts that a FET with a thin barrier and a thin gate-controlled region (measured in the vertical direction) will be robust against short-channel effects down to very short gate lengths (measured in the horizontal direction). Nevertheless, in practice, short-channel effects make such technologies difficult to employ in sequencing reactions.
Accordingly, channels that are very thin in the vertical dimension would allow for high-speed transmission of carriers as well as for increased sensor sensitivity and accuracy. What is needed, therefore, is a FET device configured to include a shorter gate than is currently achievable in present FET applications, which would allow such technologies to be fully deployed in sequencing reactions. Hence, a solution that includes such a FET device designed for use in biological applications, such as nucleic acid sequencing and/or genetic diagnostics, would be especially beneficial.