The present invention relates generally to the field of data processing, and more particularly to assigning storage locations for information resources utilized by a graph workload.
Advances in information technology (IT) enable individuals, organizations, enterprises, and agencies to access and process ever-increasing volumes of stored information. Unstructured data is not well suited for storage in well-known relational databases. Parsing and analyzing unstructured data for storage in a relational database consumes computing resources and can mask contextual references related to the generation of the unstructured data. In addition, some generated data is “data in motion,” which is more valuable, if not critical, for decision making when utilized in real time, such as for weather forecasting or threat detection, as opposed to after-the-fact or forensic analysis of an outcome based on the generated data.
Unstructured data can include numerous relationships that relate different portions or instances of the unstructured data as a web of interactions with other computing resources, files, and databases. In addition, software applications (i.e., applications), such as transaction processing, utilize data from many sources. In the example of transaction processing, a software application generates a workload that comprises other workloads that can be distributed within a networked computing environment. In this example, a transaction process can include workloads, such as analytics that provide fraud detection, certification of payment of the transaction by a financial institution, analytics of a merchant that can customize incentives for a customer engaged in the transaction, software of the merchant to receive and process the transaction of the customer, and queries to a shipping company to cost and schedule delivery of an item associated with the transaction.
Some types of workloads can execute independently and directly return results, whereas other types of workloads are interdependent and can require data from one or more sources, and each source can execute additional workloads to obtain data required by a preceding workload. This web of relationships among workloads and data can be modeled by a graph structure, such as a graph database, where the data associated with a workload exists at nodes/vertices of the graph structure, and the relationships and interactions among data resources are described by edges of the graph structure that link nodes/vertices. A graph workload is defined as queries or analytics represented by a graph structure, together with the data used by the workload. Graph workloads and graph databases are useful for storing, traversing, and processing highly complex relationships. Graph workloads and graph databases can improve intelligence, predictive analytics, social network analysis, and decision and process management. Natural language processing (NLP) is another example of a graph workload.
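By way of illustration only, the graph structure described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the class name WorkloadGraph, its methods, and the vertex and relationship names are hypothetical, chosen to mirror the transaction processing example. Vertices hold the data associated with a workload, edges record the relationships among them, and a breadth-first traversal lists every resource a workload depends on, directly or indirectly.

```python
from collections import deque


class WorkloadGraph:
    """Toy directed graph: vertices hold data, edges hold relationships.

    All names here are hypothetical and serve only as an illustration.
    """

    def __init__(self):
        self.data = {}    # vertex name -> data associated with that vertex
        self.edges = {}   # vertex name -> list of (neighbor, relationship)

    def add_vertex(self, name, data=None):
        self.data[name] = data
        self.edges.setdefault(name, [])

    def add_edge(self, src, dst, relationship):
        # Create missing endpoints so an edge can be added in one call.
        for vertex in (src, dst):
            if vertex not in self.data:
                self.add_vertex(vertex)
        self.edges[src].append((dst, relationship))

    def dependencies(self, start):
        """Return every vertex reachable from start, in breadth-first order."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            for neighbor, _relationship in self.edges.get(vertex, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    order.append(neighbor)
                    queue.append(neighbor)
        return order


# Hypothetical transaction workload composed of other workloads.
graph = WorkloadGraph()
graph.add_edge("transaction", "fraud_detection", "requires")
graph.add_edge("transaction", "payment_certification", "requires")
graph.add_edge("transaction", "shipping_query", "requires")
graph.add_edge("fraud_detection", "customer_history", "reads")

print(graph.dependencies("transaction"))
```

In this sketch, the traversal from the transaction vertex reaches the customer history data only through the fraud detection workload, which is the kind of indirect dependency that makes such workloads interdependent.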
In addition, graph workloads can execute within different IT (e.g., data processing) systems distributed among various locations that communicate via networks, such as enterprise systems, data centers, private cloud environments, and public cloud environments. Within an IT system, information and data can be stored on various types (e.g., storage tiers) of storage devices ranging from cheaper, slower (e.g., high latency) storage of a magnetic-tape-based system; to more expensive, fast non-volatile storage, such as solid-state disk drives; to faster, volatile system-based memory. Some forms and structures of data can readily be compressed to reduce storage requirements. Other data may utilize encryption software included on a storage device. Based on the performance requirements of a workload, data supporting the workload is stored in various storage tiers.
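As a simplified illustration of storing data in a tier based on a performance requirement, the following Python sketch selects the cheapest tier whose access latency satisfies a stated bound. It is a sketch under stated assumptions: the StorageTier type, the choose_tier function, and every latency and cost figure are hypothetical placeholders, not measurements of any real device, and latency and cost are the only factors considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageTier:
    name: str
    latency_ms: float    # hypothetical typical access latency
    cost_per_gb: float   # hypothetical relative cost


# Illustrative placeholder values only; not measurements of real devices.
TIERS = [
    StorageTier("magnetic_tape", 10000.0, 0.01),
    StorageTier("solid_state_disk", 0.1, 0.10),
    StorageTier("system_memory", 0.0001, 5.00),
]


def choose_tier(max_latency_ms, tiers=TIERS):
    """Return the cheapest tier whose latency meets the requirement."""
    eligible = [t for t in tiers if t.latency_ms <= max_latency_ms]
    if not eligible:
        raise ValueError("no storage tier meets the latency requirement")
    return min(eligible, key=lambda t: t.cost_per_gb)


print(choose_tier(60000.0).name)  # archival data tolerates slow access
print(choose_tier(1.0).name)      # interactive workload needs fast access
```

A workload that tolerates long delays is assigned to the cheaper, slower tier, while a latency-sensitive workload is assigned to a faster, more expensive one.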
However, not all data is equal in terms of usage, cost sensitivity, and security/regulatory compliance. Some data accessed by a workload may be subject to specific controls, for example, encryption and isolation based on a standard, such as ISO 27001:2013. Other data may be subject to Health Insurance Portability and Accountability Act (HIPAA) compliance regulations. Current solutions for modeling graph workloads, partitioning graph workloads, and assigning storage locations for data utilized by graph workloads do not incorporate the plurality of constraints (e.g., properties), such as networking delays, input/output (I/O) bandwidth, regulatory compliance, security, data compression ratios, and workload interactions, as well as known factors, such as cost and performance, within a single solution.