A database is an organized collection of logically related data, typically in digital form. The data are usually organized in a schema that is intended to model relevant aspects of how the data (for example, the availability of rooms in hotels) may be used. Thus the database may be defined and/or structured in a way that supports processes requiring the stored information (for example, finding a hotel with vacancies). The data in a database can be viewed as a part of a larger database system that includes (1) a data model, which defines how the data in the database are organized and interrelated, and (2) a database management system (DBMS), which is a software package that controls the creation, maintenance and use of the database. The DBMS effectively acts as a shell, surrounding the data and controlling the interactions between the data and the outside world.
Each database stores and organizes data according to a particular data structure. Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address, which is a bit string that can itself be stored in memory and manipulated by the program. An extremely simple data structure, the “record” data structure, stores values in one or more of a fixed number of fields. For instance, a date record may have a year field, a month field and a day field. An “array” data structure stores data as collections of elements indexed by one or more identifying keys. An array differs from a record in that arrays typically do not have a fixed number of fields, the fields are identified by index rather than by name, and each field generally must contain the same type of data (e.g., integer, text string, etc.). The access and manipulation of data stored in record and array type data structures is enabled by the computation of the addresses of data items using arithmetic operations.
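As an illustration of the address arithmetic described above, the following minimal sketch packs several date records into a contiguous buffer and locates one of them by computing its offset. The record layout (a 2-byte year followed by 1-byte month and day fields) and the name `element_address` are assumptions chosen for the example, not part of any particular database.

```python
import struct

# Hypothetical fixed-size date record: year (2 bytes), month (1 byte), day (1 byte).
DATE_RECORD = struct.Struct("<HBB")


def element_address(base: int, index: int, element_size: int) -> int:
    """Return the address of element `index` in an array starting at `base`."""
    return base + index * element_size


# An array of records: three date records stored back to back.
dates = [(2011, 3, 14), (2012, 7, 4), (2013, 12, 25)]
buffer = b"".join(DATE_RECORD.pack(*d) for d in dates)

# Locate the record at index 2 purely by arithmetic, then read its named fields.
offset = element_address(0, 2, DATE_RECORD.size)
year, month, day = DATE_RECORD.unpack_from(buffer, offset)
```

Because every element has the same size, the position of any element follows from one multiplication and one addition, which is the property the paragraph above relies on.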
When discussing data structures, it can be helpful to describe them with reference to certain identifying characteristics or qualities. One such characteristic is the relative “tightness” or “looseness” of the association(s) between data in a data structure. The concept of tightly or loosely associated data structures is similar in some respects to the computer science concept of tight and loose coupling. In a “tightly” coupled system, changes in one area can have significant impact and cause changes in another area. In a “loosely” coupled system, changes in one area have minimal or negligible impact on another area. For example, given a tightly coupled pair of objects, changes to internal functions of the first object (A) would alter object A's data or functional return values in a manner that would require adjustment of the internal functions of the second object (B) in order to maintain existing inter-object functionality. In contrast, in a loosely coupled pair of objects, changes to the internal functions of object A would have little or no impact on the internal functions of object B.
Referring to FIG. 1A (which is a schematic diagram illustrating an exemplary data record in a “tightly associated” data structure), in a typical database configuration, members of a data structure 102, referred to as elements or nodes, include a data field 104, where the actual information of use to a user is stored, and up to three fields to identify the members' relationship with other members of the data structure, such as parent 108, child 112, and group 116 identifiers. This creates a “tight” relationship structure (relatively speaking) between members of the data set that can be easily queried, but requires additional storage space in the database and the use of more complex queries and query processing. Intersecting branches, where nodes of one data structure reference nodes of additional data structures one or more times, can also become problematic, as some nodes in one or more of the collection of data structures may require parsing multiple times, while other nodes may cross reference back to a node in one or more data structures that precedes the intersecting nodes, creating a loop.
In such a “tightly” associated data structure or database, each node stored in a database record may contain multiple association references to additional database records of other nodes. These association references establish relationships between different entities. Modifying the association references of a given node would then also require modifications of the other database records storing the associated nodes in order to maintain accurate association references and enable efficient use of the data structures and database.
For example, given a node with parent, child, and group associations, moving the node from one group to another may require as many as ten database transactions to ensure accurate records. To begin, the database is queried to find the originating node's record, which provides the associated parent and child record identifiers, for a total of one database transaction. The database is then queried for the child node record, which is then updated to point to the originating node's parent record, for a cumulative total of three transactions. The database is then queried to find the originating node's parent's record, which is then updated to point its child association to the originating node's associated child record, for a cumulative total of five transactions. The database is then queried to find the originating node's new parent record, which is then updated to point its child association to the originating node's record, for a cumulative total of seven transactions. The database is then queried for the new parent record's original associated child record, which is then updated to point its parent association to the originating node's record, for a cumulative total of nine transactions. Finally, the originating node is updated with its new parent, child, and group association values, for a grand total of ten database transactions. As should be apparent, as the number of nodes and possible associations increases, a substantial amount of resources may be expended doing little more than maintaining a database so that it can continue to be used.
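The transaction count in this example can be traced with a minimal sketch. The in-memory `CountingStore` class, the `move_node` function, and the five sample records are hypothetical and exist only to count the queries and updates. The sketch assumes the worst case described above, in which the moved node has both a parent and a child, and the new parent already has a child.

```python
class CountingStore:
    """Toy stand-in for a database that counts each query and update."""

    def __init__(self, records):
        self.records = {key: dict(value) for key, value in records.items()}
        self.transactions = 0

    def query(self, key):
        self.transactions += 1
        return dict(self.records[key])  # return a copy, like a fetched row

    def update(self, key, **fields):
        self.transactions += 1
        self.records[key].update(fields)


def move_node(store, node_id, new_parent_id, new_group):
    node = store.query(node_id)                         # transaction 1
    store.query(node["child"])                          # 2
    store.update(node["child"], parent=node["parent"])  # 3
    store.query(node["parent"])                         # 4
    store.update(node["parent"], child=node["child"])   # 5
    new_parent = store.query(new_parent_id)             # 6
    store.update(new_parent_id, child=node_id)          # 7
    old_child = new_parent["child"]  # child held by the new parent before step 7
    store.query(old_child)                              # 8
    store.update(old_child, parent=node_id)             # 9
    store.update(node_id, parent=new_parent_id,
                 child=old_child, group=new_group)      # 10


# Hypothetical records: chain P1 -> X -> C1 in group 1, chain P2 -> C2 in group 2.
store = CountingStore({
    "P1": {"group": 1, "parent": None, "child": "X"},
    "X": {"group": 1, "parent": "P1", "child": "C1"},
    "C1": {"group": 1, "parent": "X", "child": None},
    "P2": {"group": 2, "parent": None, "child": "C2"},
    "C2": {"group": 2, "parent": "P2", "child": None},
})
move_node(store, "X", "P2", 2)
```

After the call, `store.transactions` is 10, and four records other than the moved node have been rewritten solely to keep the association references consistent.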
Table 1 shows a 26-node database table, organized by row and column, with each row configured in a manner similar to the data structure depicted in FIG. 1A. The ‘Node’ column identifies each node by the alphabetic label referenced in FIG. 2. The ‘Address’ column represents the memory address of each node in the data structure. The ‘Data Fields’ column represents the data contained in each node. Each node may be associated with one or more nodes based on the value of several database columns. The ‘Group’ column represents one of six groups with which the node may be associated. The ‘Parent’ column represents the alphabetic label of the parent node that is associated with this node. The ‘Parent Address’ column references the memory address that can be used to find the associated parent node in the data structure. The ‘Child’ column represents the alphabetic label of the child node that is associated with this node. The ‘Child Address’ column references the memory address that can be used to find the associated child node in the data structure. For example, node J is the parent of node E, so its ‘Child’ and ‘Child Address’ columns are set to the values ‘E’ and ‘4n’, respectively. Node J is also a child of node O, so its ‘Parent’ and ‘Parent Address’ columns are set to the values ‘O’ and ‘14n’, respectively. The relationships between the nodes represented in Table 1 are shown graphically in FIG. 2.
Compared to a “loosely” associated data structure, and using a standard integer-based identifier, a “tightly” associated data structure requires one to two extra columns per database row, for example the ‘Child’ and ‘Child Address’ columns, which require an additional four to eight bytes of data storage. If the fields are indexed (as suggested by element 120 of FIG. 1A) to improve performance, then this can add a further 8 bytes per row. Altogether, a typical scenario would require an additional 12 to 16 bytes per database row for the columns required to tightly associate the data structure with a single child.
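The arithmetic above can be checked with a short sketch. It assumes, as the figures in the text imply, a 4-byte integer identifier per extra column and a flat 8 bytes of index overhead per row. The function name and constants are illustrative only.

```python
ID_BYTES = 4     # assumed width of one integer-based identifier column
INDEX_BYTES = 8  # per-row index overhead stated in the text


def extra_bytes_per_row(extra_columns: int, indexed: bool) -> int:
    """Extra storage per row needed to tightly associate a single child."""
    return extra_columns * ID_BYTES + (INDEX_BYTES if indexed else 0)


low = extra_bytes_per_row(1, indexed=True)   # one extra column, indexed
high = extra_bytes_per_row(2, indexed=True)  # two extra columns, indexed

# Scaled to a 26-row table such as Table 1.
ROWS = 26
table_overhead = (ROWS * low, ROWS * high)
```

Under these assumptions the overhead is 12 to 16 bytes per row, or 312 to 416 bytes for a 26-row table.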
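TABLE 1

| Node | Address | Data Fields | Group | Parent | Parent Address | Children | Child Addresses |
|---|---|---|---|---|---|---|---|
| A | 0 | aaa | 1 | F | 5n | N/A | Null |
| B | n | bbb | 1 | F | 5n | N/A | Null |
| C | 2n | ccc | 1 | G | 6n | N/A | Null |
| D | 3n | ddd | 1 | Q | 16n | N/A | Null |
| E | 4n | eee | 2 | J | 9n | N/A | Null |
| F | 5n | fff | 2 | L | 11n | A, B | 0, n |
| G | 6n | ggg | 2 | L | 11n | C | 2n |
| H | 7n | hhh | 2 | P | 15n | N/A | Null |
| I | 8n | iii | 2 | O | 14n | N/A | Null |
| J | 9n | jjj | 2 | O | 14n | E | 4n |
| K | 10n | kkk | 3 | P | 15n | N/A | Null |
| L | 11n | lll | 3 | P | 15n | F, G | 5n, 6n |
| M | 12n | mmm | 3 | Q | 16n | N/A | Null |
| N | 13n | nnn | 3 | Q | 16n | N/A | Null |
| O | 14n | ooo | 3 | T | 19n | I, J | 8n, 9n |
| P | 15n | ppp | 4 | U | 20n | H, K, L | 7n, 10n, 11n |
| Q | 16n | qqq | 4 | U | 20n | D, M, N | 3n, 12n, 13n |
| R | 17n | rrr | 4 | Y | 24n | N/A | Null |
| S | 18n | sss | 4 | X | 23n | N/A | Null |
| T | 19n | ttt | 4 | X | 23n | O | 14n |
| U | 20n | uuu | 5 | Null | 13n | P, Q | 15n, 16n |
| V | 21n | vvv | 4 | Y | 24n | N/A | Null |
| W | 22n | www | 4 | Z | 25n | N/A | Null |
| X | 23n | xxx | 5 | Z | 25n | S, T | 18n, 19n |
| Y | 24n | yyy | 5 | Z | 25n | R, V | 17n, 21n |
| Z | 25n | zzz | 6 | O | 14n | W, X, Y | 22n, 23n, 24n |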
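The looping problem noted earlier for intersecting branches can be seen by following the parent references of Table 1. The sketch below is illustrative only: the `PARENT` mapping is transcribed from the ‘Parent’ column of Table 1 (with the ‘Null’ entry for node U represented as `None`), and `walk_to_root` is a hypothetical helper that stops when it reaches a node without a parent or revisits a node it has already seen.

```python
# Parent label of each node, transcribed from the 'Parent' column of Table 1.
PARENT = {
    "A": "F", "B": "F", "C": "G", "D": "Q", "E": "J", "F": "L", "G": "L",
    "H": "P", "I": "O", "J": "O", "K": "P", "L": "P", "M": "Q", "N": "Q",
    "O": "T", "P": "U", "Q": "U", "R": "Y", "S": "X", "T": "X", "U": None,
    "V": "Y", "W": "Z", "X": "Z", "Y": "Z", "Z": "O",
}


def walk_to_root(start):
    """Follow parent references from `start`.

    Returns (path, loop_node), where loop_node is the first node reached a
    second time, or None if the walk ends at a node with no parent.
    """
    path, seen = [], set()
    node = start
    while node is not None:
        if node in seen:
            return path, node  # the chain cross-references an earlier node
        seen.add(node)
        path.append(node)
        node = PARENT[node]
    return path, None


path_a, loop_a = walk_to_root("A")  # ends at U, which has no parent
path_e, loop_e = walk_to_root("E")  # re-enters the chain at O
```

Starting from node A the walk terminates at U, whereas starting from node E it passes through O, T, X and Z and then returns to O, so a traversal without the `seen` set would never terminate.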
Embodiments of the invention are directed toward reducing the required data storage, simplifying the associations between nodes, and more efficiently iterating over a “loosely” associated data structure when compared to conventional approaches using a “tightly” associated structure. Embodiments of the invention solve these and other problems both individually and collectively.