This invention relates to data structures and sorting algorithms that are used in computer programs such as databases, spreadsheets, organizers, word processors and any other computer program that employs sorting.
Computer programs such as databases organize data that may be in the form of stock prices, addresses, money, dates, statistics, phone numbers, recipes, probabilities, etc. Each individual piece of data, standing alone, is generally not very informative. Database applications make data more useful because they help users organize and process the data. The database application allows the user to compare, sort, order, merge, separate and interconnect the data. As a result, the database user is able to generate useful information from the data.
Many other computer programs also involve sorting and other forms of database management. For example, electronic spreadsheets calculate complicated mathematical formulas and sort and extract mathematical and textual data. Accounting and inventory programs are also specialized databases. Even word processors use sorting and other database functions in their spell check and mail merge operations.
The time that a computer application requires to perform fundamental data operations such as sorting often determines the overall speed of the application, because these operations are performed frequently. The relevant measures include: the time required to insert data into a container; the time required to locate the data that is associated with a key; the time required to retrieve data from the container using the key; the time required to count the number of data entries having the same key (where duplicate keys are allowed); and the time required to write data from the container to an array in sorted order based upon the keys. As can be appreciated, any improvement in the performance of these fundamental operations can significantly improve the overall performance of the computer application.
Natural numbers expressed conventionally (base 10 or base 2) as a string of digits have a word length proportional to log n, where n is the number expressed. When using string or numerical keys to retrieve data objects, the key length must grow as log n if duplicate keys are to be avoided. In many practical situations, however, the key length is arbitrarily bounded. Duplicates are still avoided because the number of objects (n) in the database is bounded by practical considerations. For example, 9-digit social security numbers (SSNs) are used to uniquely identify people in the United States. Bounding the key size allows the implementation of an order n sort function for objects with either numerical or general string keys.
As set forth in Cormen, Leiserson, and Rivest, "Introduction to Algorithms", MIT Press, (1990), the theoretical best time complexity for sorting a set of objects with unique keys by comparison is Omega(n log n). By the definition of a set, duplicates are not allowed. As n increases, the length of the key (field width or number of digits in the key) increases as log n. This would apply, for example, to an application that stores a large set of objects that are keyed to serial numbers. In Nilsson, "The Fastest Sorting Algorithm?", Dr. Dobb's Journal, April 2000, an O(n log log n) integer sorting algorithm is disclosed that allows duplicate keys.
Many practical applications have a fixed key length, such as the 9-digit social security number (SSN), name string (last+first+middle), or other property of the stored object. If, as in a relational database, duplicate keys are not allowed, the fixed key length puts an upper limit on storage capacity that is not usually reached in practice. Known O(n) sorting algorithms such as counting sort, radix sort, and bucket sort impose an upper bound on n.
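For illustration only, counting sort, one of the known O(n) algorithms named above, might be sketched as follows. The function name counting_sort and the parameter max_key are assumptions introduced for this example; the sketch assumes non-negative integer keys whose largest possible value is known in advance, which is the kind of bound these algorithms rely on.

```python
def counting_sort(keys, max_key):
    """Sort non-negative integer keys that lie in the range 0..max_key."""
    # One counter per possible key value; memory grows with the key
    # range, not with the number of keys.
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1  # duplicate keys simply raise the count

    out = []
    for value, count in enumerate(counts):
        # Emit each key value as many times as it occurred.
        out.extend([value] * count)
    return out
```

The running time of this sketch is proportional to the number of keys plus max_key, so it is linear in n only while the key range stays bounded.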
Therefore, data structures and sorting algorithms that will improve the speed and efficiency of the fundamental data operations of computer applications would be desirable.
A data structure according to the invention is created in memory of a computer. The data structure stores data objects for access by a computer application that is executed by the computer. Each of the data objects is associated with a key made up of digits or characters. The data structure includes a root node object and a first node object that is a first child of the root node object. The first node object includes a first node ID property that is assigned a first character of a first key that is associated with a first data object. A second node object is a first child of the first node object and includes a second node ID property that is assigned a second character of the first key that is associated with the first data object. A third node object is a second child of the root node object and includes a third node ID property. The third node ID property is assigned a first character of a second key that is associated with a second data object if the first character of the first key is different from the first character of the second key.
According to other features of the invention, the first, second and third node objects each include a parent pointer property that points to the parent of that node object. The first, second and third node objects each include a child pointer property that points to the children of that node object. The first and third node objects are at a first node level and the second node object is at a second node level. The first data object is stored in a data queue property of a node object at an nth level, where n is equal to the number of characters in the first key.
According to other features, the first, second and third node objects include a data flag property that indicates whether a node object is associated with a data object. The first, second and third node objects include a data queue property that stores a data object. A depth-first traversal of the data structure encounters the data objects in alphabetical order of the keys and a breadth-first traversal encounters the data objects in numerical order of the keys.
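The node properties and traversals described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the invention: the class name Node, the functions insert, depth_first and breadth_first, and the use of a dictionary for the child pointers (visited in sorted order of node ID) are all assumptions made for the example.

```python
from collections import deque


class Node:
    """One node object; property names loosely follow the description."""

    def __init__(self, node_id=None, parent=None):
        self.node_id = node_id      # one character of a key (None for root)
        self.parent = parent        # parent pointer property
        self.children = {}          # child pointers, indexed by node ID
        self.data_flag = False      # True when data objects are stored here
        self.data_queue = deque()   # data objects whose key ends at this node


def insert(root, key, data):
    """Walk or create one node per key character, then queue the data."""
    node = root
    for ch in key:
        child = node.children.get(ch)
        if child is None:
            child = Node(ch, node)
            node.children[ch] = child
        node = child
    node.data_flag = True
    node.data_queue.append(data)


def depth_first(node):
    """Yield data objects in alphabetical order of their keys."""
    if node.data_flag:
        yield from node.data_queue
    for ch in sorted(node.children):
        yield from depth_first(node.children[ch])


def breadth_first(root):
    """Yield data objects level by level, so shorter keys come first."""
    pending = deque([root])
    while pending:
        node = pending.popleft()
        if node.data_flag:
            yield from node.data_queue
        for ch in sorted(node.children):
            pending.append(node.children[ch])
```

With the keys "10", "9", "100" and "2" inserted, the depth-first walk of this sketch yields them as 10, 100, 2, 9 and the breadth-first walk as 2, 9, 10, 100. The breadth-first order matches numerical order here only because the keys are digit strings without leading zeros.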
In another aspect of the invention, a data structure according to the invention is created in memory of a computer for storing data objects for access by a computer application that is executed by the computer. Each of the data objects is associated with a key. The data structure includes a root node object and a first node object that is a first child of the root node object. The first node object includes a first node ID property that is assigned a first character of a first key that is associated with a first data object. A second node object is a first child of the first node object and includes a second node ID property that is assigned a second character of the first key that is associated with the first data object. A third node object is a second child of the first node object and includes a third node ID property. The third node ID property is assigned a second character of a second key that is associated with a second data object if the first character of the first key is the same as the first character of the second key.
In other objects of the present invention, the data structure is used to sort the data objects.
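As a rough sketch of how a character-indexed tree of this kind could be used for sorting, the following example stores each record under the characters of its key and then reads the records back in key order. The function name trie_sort, the nested-dictionary representation and the END marker are hypothetical choices made for brevity; they are not taken from the specification.

```python
def trie_sort(records):
    """Return the data of (key, data) pairs in alphabetical key order."""
    END = ""  # hypothetical marker; real key characters are never empty
    root = {}

    # Insertion: one nested dictionary per key character.
    for key, data in records:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        # Records with duplicate keys share one list, in arrival order.
        node.setdefault(END, []).append(data)

    out = []

    def walk(node):
        # Data stored at this node sorts before any longer key below it.
        out.extend(node.get(END, []))
        for ch in sorted(c for c in node if c != END):
            walk(node[ch])

    walk(root)
    return out
```

Each record is handled in a number of steps proportional to its key length, so with a bounded key length and a bounded character set the total work in this sketch grows linearly with the number of records.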
Still other objects, features and advantages will be apparent from the specification, the claims and the drawings.