The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Database systems such as relational databases and object databases typically implement an index to facilitate efficient and rapid search and retrieval of records from the database. However, the metadata managed in different database systems, and different kinds of indexing methods, may have different strengths and weaknesses. Certain queries may be time-consuming or inefficient when submitted to a traditional database index. Examples include queries that specify a path of connections between endpoint entities. Performing a path-based search may require submitting a first query to the database, receiving a result set of data objects with properties and links, performing many more queries to retrieve other data objects by following the links, and repeating the process iteratively until all links have been followed. This approach requires many cross-network message round trips and cannot scale to large numbers of queries against databases that have large numbers of objects with complex relationships to other objects.
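The iterative link-following pattern described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each call to a hypothetical `fetch` function retrieves one data object and costs one network round trip. Under that simplification, the number of round trips grows with the number of objects reachable along the links rather than staying constant per search. The names `follow_links`, `fetch`, and `store` are hypothetical and do not describe any particular database API.

```python
from collections import deque


def follow_links(fetch, start_id):
    """Breadth-first link following. Each fetch() call stands in for one
    query round trip to a hypothetical remote database (an assumption)."""
    seen = {start_id}
    queue = deque([start_id])
    roundtrips = 0
    while queue:
        obj_id = queue.popleft()
        obj = fetch(obj_id)  # one query, one network round trip per object
        roundtrips += 1
        # Follow every link to an object not yet retrieved.
        for linked_id in obj["links"]:
            if linked_id not in seen:
                seen.add(linked_id)
                queue.append(linked_id)
    return seen, roundtrips


# Hypothetical in-memory stand-in for a remote object store.
store = {
    "a": {"links": ["b", "c"]},
    "b": {"links": ["d"]},
    "c": {"links": ["d"]},
    "d": {"links": []},
}
reached, trips = follow_links(store.__getitem__, "a")
print(sorted(reached), trips)  # ['a', 'b', 'c', 'd'] 4
```

Even in this four-object example, one search costs four sequential round trips; in a large dataset with densely linked objects, the same loop issues one query per reachable object.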
Certain graph database systems are optimized to organize metadata and conduct searches using logical graphs consisting of nodes and edges, with graph traversal algorithms that can, in some cases, outperform traditional relational database indexes. Unfortunately, the data object schema implemented by a graph database is typically considerably different from the schema or ontology of a non-graph database system.
Analytics applications generally access large datasets to perform analytic operations. When a user wishes to perform an operation on a dataset, the user identifies where the dataset is stored and the analytics application sends a query to the server storing the dataset. The server computer system storing the dataset executes the query against the dataset and returns the requested information to the analytics application. Depending on the type of query, executing the query against the dataset can be extremely inefficient. Path-based queries are an example. If the result set for a query comprises data items with paths representing relationships among the data items, executing the query directly against a relational or columnar database requires the server computer system to check each row to determine if the row satisfies the query. Indexes of the relational or columnar database typically cannot be used as a source of relationship information between one record and another.
Therefore, relational and columnar databases typically require unacceptable amounts of time to produce result sets for path-oriented searches in large or complex datasets.
Consequently, there is a need in the technical field of distributed databases for new, efficient approaches for retrieving results for path-oriented searches when large-scale data repositories are involved.