The present invention relates to metadata in cloud storage systems and networks, and more particularly, this invention relates to generation and management of metadata driven by genomic sequence analysis and genomic sequence data processing workflows.
To date, metadata have been applied in limited contexts, e.g., to allow human users to manually annotate image data shared via a social network. These limitations give rise to a technical gap between conventional user-mediated metadata applications and the restrictive, regimented constraints imposed by data storage, management, and/or processing environments common to high-throughput data processing centers, high-volume data storage solutions, and related systems that operate using large volumes of data, high-volume data processing operations, and/or related data storage and retrieval solutions. A primary example of such high-volume data management/processing applications is genome data processing and genome sequence analysis, which conventionally includes processing data on the order of millions to trillions of units (e.g., image files, sequence reads, etc.) in a single operation or series of operations.
Accordingly, it would be beneficial to provide systems, techniques, and computer program products that manage and organize metadata in a workflow-centric manner rather than the conventional annotation-centric manner.
For example, workflow-centric metadata management and organization techniques, systems, and computer program products can provide advantageous features and functionalities including but not limited to: (1) seamlessly coupling metadata with workflows, especially coupling workflow and/or metadata creation, access, replication, migration, distribution and/or consumption; (2) scalability to accommodate data processing performance and/or capacity requisite to enable real-time data acquisition and/or data processing for systems handling data volumes on the order of billions of data objects or more; (3) applicability to high-performance-computing and/or cloud-computing environments and associated resources; (4) flexibility to design and deploy custom metadata as provenance for workflow(s); and/or (5) facile adaptability to genomic data acquisition and analysis tools and techniques.