1. Field of the Invention
The invention relates generally to methods and apparatus for determining memory utilization in digital computer systems. More particularly, the invention relates to methods and apparatus for identifying the actual population of data within computer memory utilized to support a relational data base, where memory is defined herein as being "populated" if it is both allocated and actually in use. By identifying the actual population of data (thereby identifying unpopulated areas of memory), rather than simply making a determination of memory allocation without regard to how it is being used, utilities can be designed to recover memory resources, data base management techniques can be revised to more conservatively allocate memory, etc. Accordingly, memory resources can be more efficiently used and in certain instances the cost of adding memory to enhance a system can be delayed or eliminated.
2. Description of the Related Art
Computer data bases can be classified into three main categories: Hierarchical, Network, and Relational. This invention concerns itself with methods and apparatus for analyzing relational data bases which store data in independent tables based on various mathematical algorithms.
"The Dictionary of Computers, Information Processing, and Telecommunications" (second edition, 1987, published by John Wiley and Sons) defines a relational data base as "A data base in which relationships between data items are explicitly specified as equally accessible attributes". An alternate definition, set forth in "A Guide to DB2", (by C. J. Date, published in 1984 by Addison-Wesley) describes a relational data base as a data base in which "The data is perceived by the user as tables (and nothing but tables)".
Accordingly, a relational data base may simply be thought of as a collection of relations where each relation is a table.
Relational data bases accommodate systems that store large amounts of data, providing rapid access for data retrieval, convenient updating, and economic storage. They can represent real world information structures, be reliable, afford privacy, and maintain integrity.
Having defined a relation as the data structure that corresponds to a table, further definitions may now be made of terms used herein to describe the invention.
The term "data page" (or "page"), as used herein refers to a fixed-length area of contiguous blocks of memory used to store a relation. Relations that occupy more space than available on one data page may have additional pages assigned as required.
Each data table, as indicated hereinabove, is called a relation, while each row (record) of a relation will be referred to hereinafter as a "tuple". The entries in a row (the columns) are defined herein as "attributes". The range of possible data that may populate an attribute is defined as the attribute's "domain". The attribute domain serves to define the valid entries that may be made for each attribute within the tuple, within the relation.
As an example, if a relational data base were to be constructed to allow description of food location within supermarkets, then each aisle might be considered a relation and every packed shelf would be a tuple within that relation. Each shelf position would have a domain indicating that only vegetables go here, and only fruits there. A specific can of 10 ounce sliced XYZ brand pineapple located on a particular shelf would be an attribute with which the shelf tuple has been populated.
To provide access to specific tuples, one or more attributes are designated as the "key" for the relation. This means that not more than one tuple with the same key exists in the relation. When specific tuples are to be accessed, the appropriate key must be given which will identify a unique tuple in the relation. A key may consist of more than one attribute and in this case, all attributes of the key must be provided to access this tuple.
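By way of illustration only, the terminology set forth hereinabove (relation, tuple, attribute, domain, and key) can be sketched in a few lines of Python. The class name `Relation`, its methods, and the sample supermarket data are hypothetical and appear nowhere in the art being described; the sketch merely shows one way the key uniqueness and domain rules could be enforced.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Relation:
    """A table whose rows (tuples) are identified by a unique key."""
    attributes: Tuple[str, ...]                 # column names, in order
    key: Tuple[str, ...]                        # attributes forming the key
    domains: Dict[str, Callable[[Any], bool]]   # attribute -> validity check
    rows: Dict[tuple, dict] = field(default_factory=dict)

    def insert(self, row: dict) -> None:
        # Every attribute value must lie within that attribute's domain.
        for name in self.attributes:
            if not self.domains[name](row[name]):
                raise ValueError(f"{row[name]!r} is outside the domain of {name!r}")
        key_value = tuple(row[name] for name in self.key)
        # Not more than one tuple with the same key may exist in the relation.
        if key_value in self.rows:
            raise KeyError(f"duplicate key {key_value!r}")
        self.rows[key_value] = dict(row)

    def lookup(self, **key_parts: Any) -> dict:
        # All attributes of the key must be provided to access a tuple.
        if set(key_parts) != set(self.key):
            raise KeyError("all key attributes must be provided")
        return self.rows[tuple(key_parts[name] for name in self.key)]


# Hypothetical supermarket aisle: each shelf is a tuple, keyed by shelf number.
VEGETABLES = {"peas", "corn"}
FRUITS = {"pineapple", "peaches"}

aisle = Relation(
    attributes=("shelf", "left", "right"),
    key=("shelf",),
    domains={
        "shelf": lambda v: isinstance(v, int),
        "left": lambda v: v in VEGETABLES,    # only vegetables go here
        "right": lambda v: v in FRUITS,       # only fruits go there
    },
)
aisle.insert({"shelf": 1, "left": "peas", "right": "pineapple"})
```

In this sketch a second insertion with shelf number 1 would be rejected as a duplicate key, and an attempt to place a fruit in the vegetable position would be rejected as lying outside the attribute's domain.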
Relational data bases use multiple data access and storage methods, such as indexing, linear-sequential, and hashing, to name a few. All of the methods require mathematical manipulation of the key. The particular type of data access and storage methods employed with respect to a given relation will be shown hereinafter to be a useful input, according to a preferred embodiment of the invention, in determining memory population. For the sake of completeness, various data access and storage methods, although well known to those skilled in the art, will be described hereinafter in the Detailed Description of the invention.
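As a rough illustration of how a key may be manipulated to locate a tuple, the following Python sketch contrasts a hashing method with a linear-sequential method. The function names and the use of a CRC-32 checksum as the hash function are assumptions chosen for the example; they are not taken from any particular data base manager.

```python
import zlib


def hash_page(key: tuple, num_pages: int) -> int:
    """Hashing: compute the page number directly from the key."""
    # CRC-32 is used only because it is deterministic; any hash would do.
    return zlib.crc32(repr(key).encode("utf-8")) % num_pages


def linear_find(tuples: list, key_index: int, key_value):
    """Linear-sequential: compare the key against each stored tuple in order."""
    for position, row in enumerate(tuples):
        if row[key_index] == key_value:
            return position
    return None  # no tuple with this key is stored


stored = [("a", 1), ("b", 2), ("c", 3)]
position_of_b = linear_find(stored, 0, "b")   # scans until the key matches
page_of_b = hash_page(("b",), 8)              # one computation, no scan
```

The hashing method reaches a page in a single computation, whereas the linear-sequential method may examine every stored tuple, which is one reason the access method in use bears on how a relation's pages come to be populated.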
Finally, with respect to a relational data base, it should be noted that data page designs can be categorized into two types. First, all data pages can have the same size (for example, 32 contiguous blocks, each providing for the storage of 256 bytes of data), and as a particular relation needs more memory, a data base manager system allocates it another page.
Alternatively, the data base can have different classes of data page sizes that depend on the expected sizes of relations. In this type of relational data base, each relation has a fixed data page size that is independent of other relations' data page sizes. Within a relation, however, each data page has a fixed uniform size. Data pages are then allocated in multiples of the basic data block. For example, if the basic block is 256 bytes, data pages can be one block (256 bytes), 2 blocks (512 bytes), four blocks (1024 bytes), thirty-two blocks (8192 bytes), etc.
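The page-size arithmetic of the foregoing example can be checked with a short Python sketch. The 256-byte block size comes from the example above; the function names and the 20,000-byte relation are hypothetical.

```python
BLOCK_BYTES = 256  # basic block size from the example above


def page_bytes(blocks_per_page: int) -> int:
    """Size of one data page, given how many basic blocks it spans."""
    return blocks_per_page * BLOCK_BYTES


def pages_needed(relation_bytes: int, blocks_per_page: int) -> int:
    """Whole pages required to hold relation_bytes (ceiling division)."""
    size = page_bytes(blocks_per_page)
    return -(-relation_bytes // size)


sizes = [page_bytes(n) for n in (1, 2, 4, 32)]   # [256, 512, 1024, 8192]
pages = pages_needed(20_000, 32)                 # 3 pages of 8192 bytes
```

Because pages are allocated whole, the hypothetical 20,000-byte relation is allocated 24,576 bytes, so some allocated memory is necessarily left without data.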
As indicated hereinbefore, the relational data base structure, outlined hereinabove for background purposes, is known to enable a computer system to store and access data base data relatively quickly. This is especially useful for computers that are transaction oriented, such as computers employed in modern day telephone switching systems.
Having described a relational data base and the terminology associated therewith, examples of known methods, models, etc., for monitoring memory resources, particularly in a relational data base, will now be set forth.
There exist, in computer systems associated with telephone switching systems, many well known methods and apparatus for monitoring memory resources. For example, in the commercially available 5ESS ("5ESS" is a registered trademark of AT&T) switching system, tools exist to report, on demand, allocated system data memory, data memory used by a particular processor, the system memory available (which is the total system memory less allocated memory), etc. However, such tools do not provide information on individual data structures within the data base. Accordingly, no indication is provided regarding unused memory space within the reportedly "used" section of the data base.
Again in the context of the 5ESS switch, a tool commonly referred to as an "Office Data base Editor" provides reports, on demand, of a data structure's design specifications. It can also count the number of individual items of information (previously referred to herein as "tuples") stored in a data structure. This tool is used to help a telephone company engineer locate a corrupted item of information in the data base and correct it manually. This tool is used on a structure by structure basis, and requires as input a structure's internal identification number. A single data base query can take over an hour depending on the telephone traffic the switching system processor is processing at the time of the data base query. Therefore, it is not a practical method for collecting data on the entire data base.
Still another known tool, a 5ESS "Access Editor", provides reports, on demand, of a data structure's design specifications. This includes a structure's internal identification number, which is needed when accessing the Office Data base Editor, and the address of the data structure's master directory page. Like the Office Data base Editor, this tool is used on a structure by structure basis. A single data base query can take longer than 10 minutes depending on the traffic load of the switching system at the time. The higher the telephone traffic, the less resource the switching system processor can devote to a data base query. Again, the tool is not a practical method for collecting data on the entire data base.
Further yet, internal utility routines exist that can be used in the 5ESS switch for monitoring memory resources, such as programs which automate the manual terminal key strokes that are needed to query a data base using the Office Data base Editor and the Access Editor. However, internal system utilities are known to frequently cause data base reads to fail.
Outside the context of telephone switching systems per se, commercially available utilities, such as Norton Utilities, report, for example, the amount of memory used on a hard drive. Other memory management features are incorporated within such utilities; however, no methods or apparatus are provided which enable the user to determine the amount of unpopulated data in otherwise allocated memory.
Theoretical models for data base management are also known. For example, Gio Wiederhold in "File Organization for Data base Design" (McGraw-Hill, 1987) presents theoretical and mathematical models for discussing data base population. However, he does not teach how to determine memory utilization in a real physical sense.
None of the above referenced tools, utilities, or theoretical models teaches, claims, or even suggests how to actually determine the amount of data populated within a "used" portion of memory. For example, a memory manager could have allocated a block of memory which to Norton Utilities would appear as "used", while only a portion of the block is actually populated with data.
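The distinction between allocated ("used") memory and populated memory can be made concrete with a small Python sketch. It is offered only as an illustration at tuple granularity and is not the method of the invention; the function name, the assumption of fixed-size tuples, and the omission of page headers and other overhead are all simplifications introduced for the example.

```python
def population_report(pages: int, page_bytes: int,
                      tuples: int, tuple_bytes: int) -> dict:
    """Compare allocated memory with memory actually populated by tuples.

    Assumes fixed-size tuples and ignores any per-page overhead.
    """
    allocated = pages * page_bytes      # what an allocation report calls "used"
    populated = tuples * tuple_bytes    # what actually holds data
    if populated > allocated:
        raise ValueError("tuples cannot occupy more than the allocated memory")
    return {
        "allocated": allocated,
        "populated": populated,
        "unpopulated": allocated - populated,
        "utilization": populated / allocated if allocated else 0.0,
    }


# One 8192-byte page holding ten hypothetical 64-byte tuples.
report = population_report(pages=1, page_bytes=8192, tuples=10, tuple_bytes=64)
```

Here an allocation-level report would show all 8192 bytes as "used", while only 640 bytes are populated and 7552 bytes remain unpopulated.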
In view of the known art, it would be desirable to provide methods and apparatus for very quickly and automatically analyzing a data base, in particular, a relational data base, structure by structure, to identify the amount of data actually populated (with at least tuple granularity) within the used portion of memory.
Furthermore, it would be desirable if such methods and apparatus could identify unused space, and compile statistics per structure and per data base.
Still further, it would be desirable if such methods and apparatus could operate with minimal user involvement, neither depend on nor interfere with the operation of the computer system's transaction processing, be error free, etc.