Field of the Disclosure
The present disclosure generally relates to field effect transistors and methods of making and using the same for sequencing, diagnostics, and bioinformatics processing. More specifically, the present disclosure relates to one-dimensional and two-dimensional field effect transistors useful for chemical and biological analysis.
Description of the Related Art
The detection and sequencing of nucleic acids, such as deoxyribonucleic acid (DNA), is a fundamental part of biological discovery. Detection and/or sequencing are useful for a variety of purposes, and are often used in scientific research, drug discovery, medical diagnostics, and in the prevention, monitoring, and treatment of disease. For instance, the genomics and bioinformatics fields, which rely on nucleic acid detection and sequencing techniques, are concerned with the application of information technology and computer science to the field of molecular biology. In particular, bioinformatics techniques can be applied to process and analyze various genomic data, such as from an individual so as to determine qualitative and quantitative information about that data that can then be used by various practitioners in the development of diagnostic, prophylactic, and/or therapeutic methods and products for detecting, preventing, treating, or at least ameliorating disease states, thus improving the safety, quality, and effectiveness of health care. The need for such diagnostic, therapeutic, and prophylactic advancements has led to a high demand for low-cost nucleic acid detection and sequencing methods, devices, and reagents, which in turn have driven, for example, the development of high-throughput sequencing, termed as Next Generation Sequencing (NGS).
Generally, the approach to DNA analysis, such as for genetic diagnostics and/or sequencing, involves nucleic acid hybridization and detection. For example, various conventional hybridization and detection approaches include the following steps. For genetic analysis, an RNA or DNA sample obtained from a subject to be analyzed is isolated and immobilized on a substrate. A detectable probe of a known genetic sequence, e.g., having a nucleotide sequence that corresponds to a disease marker (e.g., a marker evidencing a bacterial, fungal, or viral infection, a single nucleotide polymorphism (SNP) associated with a particular disease such as cancer, an autoimmune disease, etc.) is then added to the substrate, typically in a reaction mixture containing the requisite reagents to allow the probe to interact with its target, if present in the sample. If the disease marker is present, a binding event, e.g., hybridization, will occur, and because the probe is detectable (e.g., via the inclusion in the probe of a detectable label such as a fluorescent dye), the presence or absence of the hybridization event can be detected, thereby indicating the presence or absence of the disease marker in the subject's sample.
For DNA/RNA sequencing and/or detection, first, an unknown nucleic acid sequence to be identified, e.g., a single-stranded sequence of DNA/RNA from a subject, is isolated, amplified, and immobilized on a substrate. Next, in the presence of a primer complementary to a portion of the isolated nucleic acid sequence to be sequenced and/or identified, (preferably labeled) nucleotides, and a suitable DNA polymerase, a nucleic acid sequencing and/or detection reaction may take place. In such an instance, where the primer recognizes a corresponding sequence of the isolated and/or bound nucleic acid sequence, the polymerase can begin to add one or more labeled nucleotides to extend the primer in the presence of the unknown nucleic acid sequence, using the unknown nucleic acid sequence as the template. When the primer is extended, the most recently added labeled nucleotide hybridizes via hydrogen bonding to its complementary base in the unknown sequence immobilized on the surface of the substrate, and that nucleotide's addition can then be detected, e.g., optically or electrically. These steps are then repeated until the entire DNA/RNA molecule has been completely sequenced. Typically, these steps are performed on a next-generation sequencer, wherein thousands to millions of DNA fragments can be sequenced concurrently in the NGS process.
As will be appreciated, a central challenge in DNA sequencing based on the sequencing of numerous short DNA fragments is assembling full-length genomic sequences, e.g., chromosomal sequences, from a sample of genetic material, as the sequencing methods used in NGS processes do not produce full-length gene or chromosomal sequences from the sample DNA that can then be used for a desired genetic analysis, e.g., SNP genotyping, assessment of genetic variation or identity between the subject's sample and a reference gene, genome, etc. Rather, sequence fragments, typically from 100 to 1,000 nucleotides in length, are produced without any indication as to where in the genome they reside. Therefore, in order to generate full-length gene or chromosomal genomic constructs, or determine variants with respect to a reference genomic sequence, such DNA sequence fragments need to be mapped, aligned, merged, and/or compared to a reference genomic sequence. Through such processes, the variants of the sample genomic sequences from the reference genomic sequences may be determined by suitable bioinformatics approaches, such as by implementing a suitable variant calling application.
Even so, as the human genome comprises approximately 3.1 billion base pairs, and as each sequence fragment in an NGS process is typically only from about 100 to about 1,000 nucleotides in length, the time and effort that goes into building full-length genomic sequences and determining the genetic variants therein is quite extensive, often requiring the use of several different computer resources applying several different algorithms over prolonged periods of time. This is because in a given NGS analysis, thousands, millions, or even billions of DNA sequences are generated, which sequences must then be aligned and merged in order to construct a genomic sequence that approximates a chromosome or genome in size. A step in this process often includes comparing the DNA fragment sequences to a reference sequence to determine where in the genome the fragments reside.
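By way of illustration only, the mapping of sequence fragments to a reference and the determination of variants described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular mapping or variant calling application: the function names map_read and call_variants, the brute-force sliding comparison, and the toy sequences are all hypothetical, and practical pipelines rely on indexed alignment and statistical variant calling instead.

```python
def map_read(reference: str, read: str) -> tuple[int, int]:
    """Slide the read along the reference and return (position, mismatches)
    for the placement with the fewest mismatches (first one wins on ties)."""
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for ref_base, read_base in zip(window, read)
                         if ref_base != read_base)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def call_variants(reference: str, reads: list[str]) -> list[tuple[int, str, str]]:
    """Report (reference_position, reference_base, read_base) wherever a
    mapped read disagrees with the reference. Hypothetical toy logic only."""
    variants = set()
    for read in reads:
        pos, _ = map_read(reference, read)
        for offset, read_base in enumerate(read):
            ref_base = reference[pos + offset]
            if ref_base != read_base:
                variants.add((pos + offset, ref_base, read_base))
    return sorted(variants)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"   # toy reference, not real genomic data
    reads = ["GTTAGC", "CGATTGG"]          # second read carries one substitution
    print(call_variants(reference, reads))
```

Because every read is compared against every reference position, the cost of this naive approach grows with the product of the reference length and the total read length, which illustrates why mapping billions of short reads to a genome of roughly 3.1 billion base pairs is computationally demanding.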
In order to perform an NGS analysis, genetic material from a subject must be pre-processed. This preprocessing may be done manually or via an automated sequencer. Typically, preprocessing involves obtaining a biological sample from a subject, such as through venipuncture (blood, plasma, serum), buccal swab, urine, saliva, etc., and treating the sample to isolate the DNA therefrom. Once isolated, the DNA is then fragmented and denatured. The DNA (or portions thereof) may then be amplified, e.g., via polymerase chain reaction (PCR), so as to build a library of replicated strands that are now ready to be sequenced, such as by an automated sequencer. The sequencing machine is configured to sequence the amplified DNA strands, e.g., by synthesis of new, complementary strands that include labeled nucleotides, from which the nucleotide sequences that make up the DNA in the sample can be determined.
Further, in various instances, such as in building the library of amplified strands, it may be useful to provide for over-coverage or over-representation when preprocessing a given portion of the DNA. To provide this over-representation, increased sample preparation may be required, thus making the process more expensive, although such steps often yield an enhanced probability of the end result being more accurate.
Once a library of amplified DNA strands has been generated, the strands may be injected into an automated sequencer that can then determine the nucleotide sequences of the strands, such as by synthesis. For instance, amplified single-stranded DNA can be attached to a nano- or micro-bead and inserted into a test vessel, e.g., an array. All the necessary components for synthesis of its complementary strand, including labeled nucleotides (for adenine (A), cytosine (C), guanine (G), and thymine (T)), are also added to the vessel but in a sequential fashion. In some instances, one or more of the nucleotides that are added, e.g., “A”, “C”, “G”, and “T”, may be configured as reversible terminators, e.g., such that, once incorporated into a growing strand being synthesized, they cause the synthesis reaction for that particular strand to be terminated at the point of incorporation, thereby producing several strands of terminated sequences that collectively represent the entire template nucleic acid sequence. Hence, in performing a nucleic acid synthesis or detection reaction, all of the necessary nucleotide reactants are added, either one at a time or all together, to see which of the nucleotides is used to extend a primer molecule.
Particularly, after each addition, unincorporated nucleotides are washed away and a light, e.g., a laser, is then shone on the array. If the reaction fluoresces, that fluorescence can be detected, thereby indicating which nucleotide has been added and, due to the nature of the genetic code, which complementary nucleotide was present in the template DNA fragment in the subject location. In processes where labeled nucleotides are added one at a time, if extension occurs, then its indicative fluorescence will be observed. If extension does not occur, the test vessel may be washed and the procedure repeated until the appropriate one of the four nucleotides binds to its complement and is incorporated by the polymerase into the growing DNA strand at the subject location such that its indicative fluorescence can be detected.
Where all four reversible terminator nucleotides are added at the same time, each may be labeled with a different fluorescent indicator; when the complementary labeled nucleotide binds to its complement in the template DNA strand such that it is then added by the polymerase during the elongation step, the identity of the added, labeled nucleotide at the subject position can then be determined, such as by the color of its fluorescence. As will be appreciated, the use of all four labeled nucleotides in a given reaction greatly accelerates the synthesis process.
After each elongation reaction, the complex is then washed and the synthesis steps are repeated for the next position. This process of elongation and detection is then repeated for all nucleotides for as many positions as are present in the input DNA fragments or for so long as the sequencing machine directs (e.g., 100, 500, 1,000, or more cycles), thereby generating “sequence reads” of the over-sampled nucleic acid segments. The resulting sequence data is collected.
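For illustration, the two modes of sequencing by synthesis described above, i.e., adding labeled nucleotides one at a time versus adding all four differently labeled reversible terminators together, may be modeled with the following sketch. It is a simplified model under stated assumptions rather than a description of any actual sequencer: the fixed A, C, G, T addition order, its restart at every position, the idealized error-free incorporation, and the function names are all hypothetical, and strand orientation is ignored.

```python
# Watson-Crick complement of each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Assumed order in which single nucleotides are offered (hypothetical).
ADDITION_ORDER = "ACGT"


def sequence_one_at_a_time(template: str) -> tuple[str, int]:
    """Offer one labeled nucleotide per addition, restarting the order at
    each position, until the complement of the template base is incorporated.
    Returns (read, number_of_additions)."""
    read, additions = [], 0
    for template_base in template:
        for nucleotide in ADDITION_ORDER:
            additions += 1  # each addition is followed by a wash and a detection
            if nucleotide == COMPLEMENT[template_base]:
                read.append(nucleotide)  # fluorescence observed: base identified
                break
    return "".join(read), additions


def sequence_four_color(template: str) -> tuple[str, int]:
    """Offer all four differently labeled terminators together, so the color
    detected in each cycle identifies the incorporated base.
    Returns (read, number_of_cycles)."""
    read = [COMPLEMENT[template_base] for template_base in template]
    return "".join(read), len(template)


if __name__ == "__main__":
    template = "TGCA"  # toy template fragment
    print(sequence_one_at_a_time(template))
    print(sequence_four_color(template))
```

In this model, both modes yield the same read, but the one-at-a-time mode may require up to four additions per position while the four-color mode requires exactly one cycle per position.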
The typical length of a sequence replicated in this manner is from about 100 to about 500 or about 1,000 base pairs, such as from about 150 to about 400 base pairs, including from about 200 to about 350 base pairs, such as about 250 base pairs to about 300 base pairs, dependent on the sequencing protocol being employed. Further, the length of these segments may be predetermined, e.g., engineered, to accord with any particular sequencing machinery and/or protocol by which it is run. In any event, the end result is a readout, or “read”, that is comprised of an extended DNA fragment synthesized from an input DNA fragment.
Extended DNA fragments typically range from about 100 to about 1,000 nucleotides in length, and each nucleotide is labeled in such a manner that every nucleotide in the sequence can be identified because of its label. Hence, since the human genome is comprised of about 3.1 billion base pairs, and various known sequencing protocols usually result in labeled replicated sequences, e.g., reads, from about 100 or 101 bases to about 250 or about 300 or about 400 bases, the total number of segments that need to be sequenced, and consequently the total number of reads generated, for single read coverage can be anywhere from about 10,000,000 to about 40,000,000, such as about 15,000,000 to about 30,000,000, dependent on how long the labeled replicated sequences are.
Therefore, the sequencer may typically generate about 30,000,000 reads, such as where the read length is 100 nucleotides, so as to cover the genome once. However, to ensure the accuracy of a particular base call (e.g., A, C, G, or T) at a particular nucleotide position, it is desirable that copies of each fragment in a sample be sequenced 5, 10, 20, 30, or more times, in some cases up to 500 or more times. Such over-sampling thus results in even more reads, thereby requiring more analysis. Fragment amplification in the pre-processing phase helps to facilitate such redundancy.
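The read-count arithmetic above may be illustrated with the following short sketch. It assumes, for simplicity, that reads tile the genome end to end with no overlap or loss, which real sequencing does not guarantee; the function name reads_required and its parameters are hypothetical.

```python
def reads_required(genome_length: int, read_length: int, coverage: int = 1) -> int:
    """Minimum number of reads needed to cover a genome `coverage` times,
    assuming idealized, non-overlapping, full-length reads."""
    reads_per_pass = -(-genome_length // read_length)  # ceiling division
    return reads_per_pass * coverage


if __name__ == "__main__":
    genome = 3_100_000_000  # approximate human genome size in base pairs
    for read_length in (100, 250, 300):
        print(read_length, reads_required(genome, read_length))
    # Over-sampling each position 30 times with 100-nucleotide reads:
    print(reads_required(genome, 100, coverage=30))
```

Under these assumptions, 100-nucleotide reads give 31,000,000 reads for single coverage, in line with the approximately 30,000,000 noted above, and sequencing each position 30 times raises that to 930,000,000 reads.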
However, in part, due to the need for the use of optically detectable, e.g., fluorescent, labels in the sequencing reactions being performed, the required instrumentation for performing such high throughput sequencing is bulky, costly, and not portable. For this reason, a number of new approaches for direct, label-free DNA sequencing have been proposed. For instance, among the new approaches are detection methods that are based on the use of various electronic analytic devices. Such direct electronic detection methods have several advantages over the conventional NGS platform. For example, the detector may be incorporated in the substrate itself, such as employing a biosystem-on-a-chip device, such as a complementary metal oxide semiconductor (“CMOS”) device.
More particularly, in using a CMOS device in genetic detection, the output signal representative of a nucleotide's addition in a DNA sequencing reaction can be directly acquired and processed on a microchip. In such an instance, automatic recognition is achievable in real time and at a lower cost than is currently achievable using conventional NGS processes and equipment. Moreover, standard CMOS devices may be employed for such electronic detection, making the process simple, inexpensive, and portable.
Particularly, in order for NGS methods to become widely used for diagnostic and therapeutic applications in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, such as CMOS chip fabrication, which is the current pinnacle of high technology large scale, high quality, low-cost manufacturing. To achieve this, ideally, the entire sensory apparatus of the sequencing device should be embodied in a standard semiconductor chip, manufactured in the same fabrication (“Fab”) facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Thermo-Fisher, Inc. The promise of this idea has not been realized commercially, however, due to the fundamental limits of applying a metal oxide semiconductor field effect transistor, or MOSFET, as a biosensor. In particular, when a MOSFET is used in solution as a biosensor, it is referred to as an ISFET (ion sensitive field effect transistor). Particular limitations of ISFET devices include a lack of sensor sensitivity and signal-to-noise characteristics as the semiconductor node scales down to lower geometries of the transistor (gate length).
As is known, a field effect transistor (FET) typically includes a gate, a channel region connecting source and drain electrodes, and an insulating barrier separating the gate from the channel. The operation of a conventional FET relies on the control of the channel's conductivity, and thus the drain current, by a voltage, designated VGS, applied between the gate and source. For high-speed applications, and for the purposes of increasing sensor sensitivity, FETs should respond quickly to variations in VGS. However, this requires short gates and fast carriers in the channel. Unfortunately, FETs with short gates frequently suffer from degraded electrostatics and other problems (collectively known as short channel effects), such as threshold-voltage roll-off, drain-induced barrier lowering, and impaired drain-current saturation, which result in a decrease in sensor sensitivity. Nevertheless, scaling theory predicts that a FET with a thin barrier and a thin gate-controlled region (measured in the vertical direction) should be robust against short-channel effects down to very short gate lengths (measured in the horizontal direction).
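The scaling argument above may be made concrete with the following sketch. The disclosure does not state a formula, so the expression used here is an assumption: it is the commonly cited "natural length" estimate for a single-gate thin-body FET, lambda = sqrt((eps_channel / eps_barrier) * t_channel * t_barrier), below which short-channel effects are generally expected to become significant. The function name and the example material values (relative permittivities of about 11.7 for silicon and about 3.9 for silicon dioxide) are illustrative only.

```python
import math


def natural_length_nm(eps_channel: float, eps_barrier: float,
                      t_channel_nm: float, t_barrier_nm: float) -> float:
    """Assumed single-gate scaling estimate (not taken from the disclosure):
    lambda = sqrt((eps_channel / eps_barrier) * t_channel * t_barrier).
    Thicknesses are in nanometers; permittivities are relative values."""
    return math.sqrt((eps_channel / eps_barrier) * t_channel_nm * t_barrier_nm)


if __name__ == "__main__":
    # Illustrative values: silicon channel, silicon dioxide barrier 1 nm thick.
    for t_channel in (20.0, 5.0, 1.0):
        lam = natural_length_nm(11.7, 3.9, t_channel, 1.0)
        print(f"channel {t_channel:4.1f} nm -> natural length {lam:.2f} nm")
```

Under this assumed model, thinning the gate-controlled region or the barrier shrinks the natural length, which is consistent with the statement that a FET with a thin barrier and a thin channel should tolerate shorter gates.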
Accordingly, the possibility of having channels that are very thin in the vertical dimension would allow for high-speed transmission of carriers as well as for increased sensor sensitivity and accuracy. What is needed, therefore, is a FET device that is configured in such a manner as to include a shorter gate than is currently achievable in present FET applications. A solution that includes such a FET device designed for use in biological applications, such as for nucleic acid detection, sequencing, and/or other diagnostic applications, would be especially beneficial.