1. Relevant Field
Systems and methods consistent with the present invention generally relate to the storage and management of data structures, preferably within databases. More particularly, systems and methods consistent with the invention relate to the assignment or storage of data structures in hosts in response to a database query.
2. Background Information
Businesses and other organizations generate and/or receive a variety of data items and electronic information (broadly referred to hereafter as “data structures”) during the course of their operation. These data structures may be generated and/or received from various entities located in different regions and/or countries. To organize and manage operations of the organization, data structures may be stored in storage devices located in a variety of locations. The storage devices may be referred to as hosts for particular data structures. These storage devices may include databases that store the data structures as tables of data, where the tables may represent relations between the stored data. These relations may be based on attributes of the stored data. The tables may consist of rows and columns, where a row or column may be referred to as a tuple.
Tables in databases may implement relations by using rows of a table to indicate tuples and using columns to represent a tuple's attribute values. During operation, customers or employees of an organization may need to extract and/or use the data structures stored in related databases that are distributed over a variety of storage devices. Mainly, an entity may enter a query in a search engine including a logical condition to extract sets of tuples that meet the logical condition. Tuples may contain values from more than one table, and a table representing one relation (relation A) can include a column for the same attribute as another table representing another relation (relation B). To extract the tuples, the table representing relation A may be joined with the table representing relation B. The tables may be joined by logically matching rows using the values of the attributes stored in the columns of the tables.
Prior to joining the tables, join conditions for each join may need to be evaluated and each join condition J may include two pair lists, one for each of the relation represented by the join. Each pair list may represent a plurality of tuples reflecting attributes associated with the stored data.
For example, if a join condition J may reference two relations A and B. The evaluation of J may then include determining all pairs (a, b), where “a” may represent values in a row of the table representing relation A and “b” may represent values in a row of the table representing relation B. The values may be selected such that they satisfy the condition J by joining A and B. If J is a condition represented by A.x=B.y, where “x” may be an attribute of relation A and “y” may be an attribute of relation B, then the pair (a, b) of rows should satisfy the condition a.x=b.y.
Further, two pair lists PL (A, J) and PL (B, J) may be generated during evaluation of join conditions. PL (A, J) may consist of pairs (a0, x0), where the first component “a0” may be an identifier for a row “a” of the table representing the relation A and “x0” may be the value a.x. Similarly, PL (B, J) may consist of pairs (b0, y0), where the first component “b0” may be an identifier for a row “b” of the table representing the relation B and “y0” may be the value b.y. When a total join consists of n join condition, there may be 2*n pair lists and these pair lists need to be reduced to decrease the size of the pair lists, by assigning, or allocating one pair list (or parts thereof) from the host where it is located to the host of another pair list. This process of reduction may require resource intensive computations.
Once the reduction is performed, the remaining join evaluations may include allocating or assigning all pair lists a common host. For example, evaluating join J to satisfy condition A.x=B.y may include matching two reduced pair lists PL (A, J) and PL (B, J) by forming a pair (a, x) in PL (A, J) and a pair (b, y) in PL (B, J), such that x is equal to y and a new reduced pair (a, b) may be generated.
Extracting tuples and assigning tuples to a host in response to entity queries may require numerous resource intensive computations, increasing the time and costs associated with responding to a query. Conventional techniques including hash join, a sort merge join, a join via semijoin, and nested loop join are well known in the art, but may still require resource intensive computations.
In view of the foregoing, it is desirable to provide methods and systems for reducing the time and computations required to extract tuples in response to a query. For example, there is a need for improved methods and systems to execute joins and assign hosts more efficiently and by using less resource intensive techniques.