The present disclosure relates generally to the field of data gathering and analysis related to biological samples. More particularly, the disclosure relates to techniques for interacting with a cloud computing environment to share, store, and analyze biologically related information (e.g., biological data, protocols, analysis methods, etc.).
Genetic sequencing has become an increasingly important area of genetic research, promising future uses in diagnostic and other applications. In general, genetic sequencing involves determining the order of nucleotides for a nucleic acid such as a fragment of RNA or DNA. Relatively short sequences are typically analyzed, and the resulting sequence information may be used in various bioinformatics methods to logically fit fragments together to reliably determine the sequence of much more extensive lengths of genetic material from which the fragments were derived. Automated, computer-based examinations of characteristic fragments have been developed and have been used more recently in genome mapping, identification of genes and their function, and so forth. However, existing techniques are highly time-intensive, and resulting genomic information is accordingly extremely costly.
A number of alternative sequencing techniques are presently under investigation and development. In several techniques, single nucleotides or strands of nucleotides (oligonucleotides) are typically introduced and permitted or encouraged to bind to the template of genetic material to be sequenced. Sequence information may then be gathered by imaging the sites. In certain current techniques, for example, each nucleotide type is tagged with a fluorescent tag or dye that permits the nucleotide attached at a particular site to be determined by analysis of image data. Although such techniques show promise for significantly improving throughput and reducing the cost of sequencing, further progress in speed, reliability, and efficiency of data handling is needed.
For example, in certain sequencing approaches that use image data to evaluate individual sites, large volumes of image data may be produced during sequential cycles of sequencing. In systems relying upon sequencing by synthesis (SBS), for example, dozens of cycles may be employed for sequentially attaching nucleotides to individual sites. Images formed at each step result in a vast quantity of digital data representative of pixels in high-resolution images. These images are analyzed to determine what nucleotides have been added to each site at each cycle of the process. Other images may be employed to verify de-blocking and similar steps in the operations.
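By way of illustration only, the per-cycle determination described above can be sketched as follows. The sketch assumes that image analysis has already reduced each site to four fluorescence intensities per cycle, one per nucleotide type, and that the brightest channel indicates the nucleotide added in that cycle. The channel order, the function name `call_bases`, and the sample values are hypothetical. An actual system would typically also register images and correct the signal before making a call.

```python
from typing import Sequence

# Assumed channel order: one fluorescence channel per nucleotide type (hypothetical).
BASES = "ACGT"


def call_bases(intensities: Sequence[Sequence[float]]) -> str:
    """Return one base per cycle for a single site.

    intensities[cycle] holds the four channel intensities measured
    at this site during that sequencing cycle.
    """
    calls = []
    for cycle in intensities:
        if len(cycle) != len(BASES):
            raise ValueError("expected one intensity per nucleotide type")
        # Pick the channel with the strongest signal in this cycle.
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        calls.append(BASES[brightest])
    return "".join(calls)


# Made-up intensities for one site over three cycles.
site = [
    [0.90, 0.10, 0.05, 0.10],
    [0.10, 0.20, 0.80, 0.10],
    [0.05, 0.10, 0.10, 0.70],
]
print(call_bases(site))
```

Because one such set of intensities exists for every site in every cycle, the volume of data grows with both the number of sites imaged and the number of cycles performed.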
In many sequencing approaches, the image data is important for determining the proper sequence data for each individual site. While the image data may be discarded once the individual nucleotides in a sequence are identified, certain information about the images, such as information related to image or fluorescence quality, may be maintained to allow researchers to confirm base identification or calling. The image quality data in combination with the base identities for the individual fragments that make up a genome will become unwieldy as systems become capable of more rapid and large-scale sequencing. There is a need, therefore, for improved techniques in the management of such data during and after the sequencing process.
Besides the data gathered during and after sequencing, the genomic analysis workflow from sample extraction to reporting of the data analysis may involve the generation of a significant amount of paper-based information such as lab tracking forms, user guides, and various manifests for tracking sample and content information. All of the paper-based information may complicate the genomic analysis workflow for both individuals and larger entities performing genomic analysis. Thus, there is a need for improved techniques in the management of such information before, during, and after the genomic analysis workflow.
Further, certain steps within the genomic analysis workflow may be subject to a great deal of variability due to different individuals and entities performing the steps. For example, sample preparation involves a high degree of diversity (e.g., in number of steps, processing time, and specific chemistry needed for specific genomic analysis applications). Also, sample preparation has historically been the least automated and integrated part of the genomic analysis workflow, while exhibiting the highest user-to-user and site-to-site variability. Thus, there is a need for improved techniques to create a more tightly integrated workflow from sample extraction to reporting, while making the genomic analysis workflow more accessible to individuals and larger entities and promoting sharing between these individuals and entities.
Yet further, certain sample preparation cartridges used in preparing samples for genomic analysis (e.g., the sequencing described above) may not serve the specific needs (e.g., specific application) of the user. Additionally, individuals or entities with lower-throughput needs and lacking resources may not utilize an automated sample preparation system and/or application-specific sample preparation cartridges, but instead utilize self-derived assays. Thus, there is a need for providing a customizable sample preparation system for use with an automated sample preparation system by those individuals or entities with lower-throughput needs and/or lacking resources.