Computer systems generate and store vast amounts of information. This information is generally persistently stored on storage devices, such as computer disks. A collection of such information is often referred to as an "object." Since a storage device may contain many objects, each stored object is assigned a unique identifier. When an object is to be retrieved from the storage device, the unique identifier is used to identify the object to be retrieved. For example, an object (i.e., a file) that is created by a file system is given a unique filename as its identifier. To subsequently access the object, a computer program provides the filename to the file system. The file system uses this filename to identify and locate the object on the storage device.
Although a computer system can generate identifiers (e.g., filenames) that are unique among the objects that it creates, such identifiers may not be unique when other computer systems are considered. In particular, another computer system may generate the same identifiers for its objects. For example, two computer systems may have objects named "autoexec.bat," which contain very different information. Thus, the filename "autoexec.bat" does not uniquely identify one object. Rather, it identifies two objects on two different computer systems. This, of course, is not a problem if each computer system only uses the objects that it creates. It can be a problem, however, if computer systems are networked together and another computer system asks to retrieve the object identified as "autoexec.bat." The identifier "autoexec.bat" does not uniquely identify which object should be retrieved.
These objects could be uniquely identified by both a unique identification of the computer system and a unique identifier of the object within that computer system. Two problems, however, have arisen with such an approach. First, there is no standardized mechanism for assigning a unique identifier to the computer systems themselves. Thus, two computer systems may have the same identifier. As a result, the combination of the computer system identifier and the object identifier still may not be unique. Moreover, even within a single computer system, the object identifiers may not actually be unique. If the computer system runs various computer programs, then each computer program may generate its own identifiers for objects, especially for objects that are not stored by the file system, and those identifiers may not be unique.
To solve these problems, the Open Software Foundation (OSF) created the Universally Unique Identifier (UUID). The UUID is a 128-bit value that is defined so that the chance of two computer systems generating a UUID with the same 128-bit value is extremely small. FIG. 1 illustrates the format of the UUID as defined by the OSF. The UUID 110 contains three fields: a 48-bit node ID field 120, a 16-bit clock sequential/variant field 130, and a 64-bit clock/version field 140. The node ID field comprises bits 0-47; the clock sequential/variant field comprises bits 48-63; and the clock/version field comprises bits 64-127.
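The three-field layout described above can be sketched in code. The following is a minimal illustration, not the OSF's implementation; it assumes that bit 0 is the least significant bit of a 128-bit integer (the text does not state the bit order), and the function names split_uuid and join_uuid are hypothetical.

```python
# Field widths taken from the description of FIG. 1.
NODE_BITS = 48            # bits 0-47
SEQ_VARIANT_BITS = 16     # bits 48-63
CLOCK_VERSION_BITS = 64   # bits 64-127


def split_uuid(value: int) -> dict:
    """Split a 128-bit integer into the three fields.

    Assumption: bit 0 is the least significant bit of the integer.
    """
    if not 0 <= value < (1 << 128):
        raise ValueError("a UUID must fit in 128 bits")
    return {
        "node_id": value & ((1 << NODE_BITS) - 1),
        "clock_seq_variant": (value >> 48) & ((1 << SEQ_VARIANT_BITS) - 1),
        "clock_version": (value >> 64) & ((1 << CLOCK_VERSION_BITS) - 1),
    }


def join_uuid(node_id: int, clock_seq_variant: int, clock_version: int) -> int:
    """Inverse of split_uuid: assemble the 128-bit value from its fields."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node ID must fit in 48 bits")
    if not 0 <= clock_seq_variant < (1 << SEQ_VARIANT_BITS):
        raise ValueError("clock sequential/variant must fit in 16 bits")
    if not 0 <= clock_version < (1 << CLOCK_VERSION_BITS):
        raise ValueError("clock/version must fit in 64 bits")
    return (clock_version << 64) | (clock_seq_variant << 48) | node_id
```

Splitting a value and joining the resulting fields returns the original 128-bit value, which is a convenient check that the bit ranges do not overlap.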
The node ID field contains a node identifier that uniquely identifies the computer system that created the UUID. By convention, manufacturers of network access cards assign a 48-bit unique identifier to each network access card that they create. Consequently, if a computer system has a network access card, then the node ID field is set to the value of that network access card identifier. However, if a computer system does not have a network access card, then the computer system randomly generates a value that it uses as its unique identifier and sets the node ID field to that value. Because of the large size (48 bits) of the node ID field, the probability that a computer system will randomly generate the same identifier as that of another computer system is extremely small. Moreover, if a computer system did randomly generate the same identifier as that of another computer system, it was originally expected that the resulting duplicate UUID would be unlikely to be used by any computer system other than the one that created it, since the computer system that randomly generated the identifier has no network access card.
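The choice of node identifier described above might be sketched as follows. This is only an illustration under stated assumptions: the function name choose_node_id is hypothetical, the caller is assumed to supply the network access card identifier when one exists, and Python's secrets module stands in for whatever random source a real system would use.

```python
import secrets
from typing import Optional

NODE_BITS = 48


def choose_node_id(card_identifier: Optional[int] = None) -> int:
    """Return the 48-bit node ID for this computer system.

    If a network access card identifier is supplied, it is used as is;
    otherwise a 48-bit value is generated at random.
    """
    if card_identifier is not None:
        if not 0 <= card_identifier < (1 << NODE_BITS):
            raise ValueError("card identifier must fit in 48 bits")
        return card_identifier
    # No network access card: fall back to a random 48-bit value.
    return secrets.randbits(NODE_BITS)
```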
The clock/version field is divided into a 60-bit clock subfield and a 4-bit version subfield. The clock represents time since Oct. 15, 1582 (beginning of the Gregorian calendar usage) in increments of 100 nanoseconds. The 60-bit clock subfield is further divided into a 32-bit low part 141, a 16-bit medium part 142, and a 12-bit high part 143. The 12-bit high part is further divided into an 8-bit low subpart 143a and a 4-bit high subpart 143b. The 60 bits of the computer system clock 150 are stored in the clock subfield in the following way. Bits 0-31 of the system clock are stored in the low part (bits 96-127). Bits 32-47 of the system clock are stored in the medium part (bits 80-95). Bits 48-55 of the system clock are stored in the low subpart (bits 72-79) of the high part, and bits 56-59 of the system clock are stored in the high subpart (bits 64-67) of the high part.
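The mapping of system clock bits into the clock/version field can be made concrete with a short sketch. It is an illustration under stated assumptions rather than a definitive implementation: bit 0 is taken to be the least significant bit, offsets are given relative to the start of the field (bit 64 of the UUID), and the version subfield is assumed to occupy the four bits (68-71) that the mapping above leaves unused. The names clock_ticks, pack_clock_version, and unpack_clock_version are hypothetical.

```python
from datetime import datetime, timezone

# Start of the clock: Oct. 15, 1582 (proleptic Gregorian calendar, UTC assumed).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def clock_ticks(moment: datetime) -> int:
    """Number of 100-nanosecond increments from Oct. 15, 1582 to `moment`.

    `moment` must be timezone-aware.
    """
    delta = moment - GREGORIAN_EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 10_000_000 + delta.microseconds * 10


def pack_clock_version(clock: int, version: int = 0) -> int:
    """Place a 60-bit clock and 4-bit version into the 64-bit field."""
    if not 0 <= clock < (1 << 60):
        raise ValueError("clock must fit in 60 bits")
    low = clock & 0xFFFFFFFF          # clock bits 0-31  -> UUID bits 96-127
    medium = (clock >> 32) & 0xFFFF   # clock bits 32-47 -> UUID bits 80-95
    high_low = (clock >> 48) & 0xFF   # clock bits 48-55 -> UUID bits 72-79
    high_high = (clock >> 56) & 0xF   # clock bits 56-59 -> UUID bits 64-67
    # Assumption: the version occupies the remaining bits 68-71.
    return ((low << 32) | (medium << 16) | (high_low << 8)
            | ((version & 0xF) << 4) | high_high)


def unpack_clock_version(field: int) -> tuple:
    """Inverse of pack_clock_version: return (clock, version)."""
    low = (field >> 32) & 0xFFFFFFFF
    medium = (field >> 16) & 0xFFFF
    high_low = (field >> 8) & 0xFF
    version = (field >> 4) & 0xF
    high_high = field & 0xF
    clock = low | (medium << 32) | (high_low << 48) | (high_high << 56)
    return clock, version
```

For example, clock_ticks(datetime.now(timezone.utc)) yields a value that fits comfortably within 60 bits and can be passed directly to pack_clock_version.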
The clock sequential/variant field is subdivided into a 12- or 13-bit clock sequential subfield and a 3- or 4-bit variant subfield. The variant subfield is a 3- or 4-bit, zero-terminated subfield that identifies the format of the universally unique identifier. One format is the OSF-defined format. The use of a different value in the variant subfield for each format ensures that a UUID in one format will not be a duplicate of a UUID in another format. The clock sequential subfield is used to ensure uniqueness of the UUID in the event that a computer system generates two UUIDs with the same value in the clock/version field. Thus, whenever there is a possibility that the clock for the computer system may generate a duplicate time, the clock sequential subfield is incremented. For example, when a clock is set back an hour to account for the transition from daylight saving time to standard time, the clock will generate the same times that it generated during the preceding hour. Thus, there is a possibility that the clock/version field of two UUIDs generated an hour apart would have the same value. Consequently, the clock sequential subfield is incremented when the clock is set back to ensure that the combination of the clock subfield and clock sequential subfield will be unique at all times at each computer system.
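The role of the clock sequential subfield can be illustrated with a small sketch. It assumes a 12-bit subfield and detects a possible duplicate time by comparing each clock reading with the previous one, which is one plausible way to notice that a clock has been set back; the text does not prescribe a detection method. The class name ClockSequenceTracker is hypothetical.

```python
from typing import Optional, Tuple


class ClockSequenceTracker:
    """Pairs each clock reading with a clock sequential value."""

    def __init__(self, seq_bits: int = 12, start: int = 0) -> None:
        self._modulus = 1 << seq_bits
        self._clock_seq = start % self._modulus
        self._last_clock: Optional[int] = None

    def next_pair(self, clock: int) -> Tuple[int, int]:
        """Return (clock, clock_seq) for a new UUID.

        If the clock has not advanced past the previous reading (for
        example, because it was set back), the sequence is incremented so
        that the pair differs from any pair issued before the change.
        Note: the sequence wraps around after 2**seq_bits increments.
        """
        if self._last_clock is not None and clock <= self._last_clock:
            self._clock_seq = (self._clock_seq + 1) % self._modulus
        self._last_clock = clock
        return clock, self._clock_seq
```

With this sketch, a reading taken before the clock is set back and an identical reading taken afterward receive different sequence values, so the combined pair remains distinct.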
Although the definition of the UUID makes the chance of generating duplicate UUIDs extremely small, there are several instances in which this chance is greatly increased. First, if a computer system has multiple processors, then duplicate UUIDs may be generated by two processors at the same time. In particular, assuming that the processors share the same network access card and thus have the same node identifier, there is a possibility, albeit small, that two processors may generate a UUID at exactly the same time. To prevent the generation of duplicate UUIDs, such multiple processor systems use a centralized allocator for generating UUIDs. When a processor needs to generate a new UUID, the processor requests the UUID from a central allocator that is shared by all processors. The central allocator generates a UUID and returns it to the processor. Because the UUIDs are sequentially generated by the central allocator at one processor, there is no chance of a duplicate UUID being created for this computer system. However, the overhead of requesting a UUID from a central allocator each time a new UUID is needed may be unacceptable. Consequently, each processor may request a range (e.g., 256) of UUIDs from the central allocator with each request. The central allocator returns a UUID with the low-order bits of the clock subfield set to 0. The requesting processor can then assign the UUIDs from the range to its objects by incrementing the clock subfield value once for each UUID. FIG. 2 illustrates the allocation of ranges of UUIDs. A local UUID generator 202 at one processor may request a range of 256 UUIDs from the central UUID allocator 201. The central UUID allocator then allocates 256 UUIDs for that processor. The local UUID generator then generates UUIDs from that range.
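The range-based scheme of FIG. 2 might be sketched as follows. This is a simplified model that tracks only the clock subfield value rather than whole UUIDs, and it is not a definitive implementation. The class names CentralAllocator and LocalGenerator, the injected clock_source callable, and the choice to raise an error when the clock has not yet advanced past the previous range are all assumptions made for illustration.

```python
from typing import Callable

RANGE_SIZE = 256  # UUIDs handed out per request, as in the example above


class CentralAllocator:
    """Hands out non-overlapping ranges of clock subfield values."""

    def __init__(self, clock_source: Callable[[], int]) -> None:
        self._clock_source = clock_source  # returns the clock in 100 ns ticks
        self._next_free = 0

    def allocate_range(self) -> int:
        """Return a base clock value whose low-order 8 bits are 0."""
        base = self._clock_source() & ~(RANGE_SIZE - 1)
        # A new range may start only once the clock has passed the last one.
        if base < self._next_free:
            raise RuntimeError("clock has not advanced past the previous range")
        self._next_free = base + RANGE_SIZE
        return base


class LocalGenerator:
    """Per-processor generator that consumes one range at a time."""

    def __init__(self, allocator: CentralAllocator) -> None:
        self._allocator = allocator
        self._next = 0
        self._end = 0

    def next_clock_value(self) -> int:
        if self._next == self._end:  # range exhausted (or never requested)
            base = self._allocator.allocate_range()
            self._next, self._end = base, base + RANGE_SIZE
        value = self._next
        self._next += 1  # increment the clock subfield once per UUID
        return value
```

A local generator contacts the central allocator only once per 256 values, which is the overhead reduction the range scheme is meant to provide.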
If, however, a local UUID generator 203 for another processor requests 256 UUIDs, the central UUID allocator may not be able to allocate the UUIDs immediately. In particular, because the UUIDs are clock-based, the central UUID allocator should wait for at least 256 × 100 nanoseconds to pass before allocating the next UUID range to the other local UUID generator. It would be desirable to avoid this waiting.
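The waiting period can be worked out with a few lines of arithmetic. The sketch below assumes the clock is expressed in 100-nanosecond ticks and that each range reserves 256 consecutive tick values; the function name remaining_wait_ns is hypothetical.

```python
TICK_NS = 100      # one clock increment, in nanoseconds
RANGE_SIZE = 256   # clock values reserved by each range


def remaining_wait_ns(previous_base: int, now_ticks: int,
                      range_size: int = RANGE_SIZE) -> int:
    """Nanoseconds until a range after `previous_base` may be allocated.

    The next range cannot begin before the clock reaches
    previous_base + range_size; if the clock is already there, no wait
    is needed.
    """
    earliest_next_base = previous_base + range_size
    return max(0, earliest_next_base - now_ticks) * TICK_NS
```

When a second request arrives at the same instant as the first, the full wait applies: 256 ticks of 100 nanoseconds each, or 25,600 nanoseconds (25.6 microseconds).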
Second, the chance of generating duplicate UUIDs has increased recently because many computers, especially home personal computers, do not have network access cards. Consequently, each such computer would randomly generate its own node identifier. Of course, the more computers that randomly generate a node identifier, the greater the possibility of duplicate node identifiers being generated. In addition, since these computers are being increasingly interconnected via the Internet, the possibility that duplicate UUIDs will cause problems also increases.
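The effect of the number of computers on the likelihood of duplicate node identifiers can be estimated with the standard birthday approximation. The sketch below assumes that each node identifier is drawn uniformly and independently from all 48-bit values, an idealization the text does not guarantee; the function name duplicate_node_probability is hypothetical. Note that a duplicate node identifier is not by itself a duplicate UUID, since the clock fields must also coincide.

```python
import math

NODE_BITS = 48


def duplicate_node_probability(computers: int, bits: int = NODE_BITS) -> float:
    """Approximate probability that at least two of `computers` randomly
    chosen node identifiers are equal.

    Uses the birthday approximation p ~ 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    if computers < 2:
        return 0.0
    space = 2 ** bits
    exponent = computers * (computers - 1) / (2 * space)
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-exponent)
```

The probability grows roughly with the square of the number of computers, which is why a large population of machines without network access cards erodes the margin that a 48-bit field provides for any single pair.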
The large size, 128 bits, of the UUID may be problematic in certain situations. For example, each row of a table in a database may be an object that is assigned its own UUID. Such rows may average 100 bytes of information. Since a UUID is 16 bytes in length, there would be a 16% storage overhead in storing a UUID along with each row in the table. FIG. 3 illustrates the overhead of UUIDs in a database table. The table 320 contains a UUID column 321 and a data column 322. Each row 323 contains a UUID and data. The large size of the UUID also results in very large indexes into the table. The index 310, which is used to rapidly locate a row with a given UUID, contains an entry for each row in table 320. Each entry of the index contains a copy of the UUID that is in a row and a row identifier (RID) that points to the corresponding row in table 320. Because each UUID is thus stored twice, there is a 32% storage overhead associated with UUIDs in such situations. It would be desirable to reduce this overhead.
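The overhead figures above follow from simple arithmetic, shown here as a short sketch. It assumes, as the example does, that overhead is measured relative to the 100 bytes of row data and that the row identifier stored in each index entry is not counted; the function name uuid_overhead is hypothetical.

```python
UUID_BYTES = 16  # 128 bits


def uuid_overhead(row_bytes: int = 100, copies: int = 1) -> float:
    """Storage overhead of UUIDs as a fraction of the row data size.

    `copies` is the number of times each UUID is stored: 1 for the table
    alone, 2 when an index also holds a copy.
    """
    return copies * UUID_BYTES / row_bytes


table_only = uuid_overhead(copies=1)   # 16 / 100
with_index = uuid_overhead(copies=2)   # 32 / 100
```

Smaller rows make the relative overhead worse: under the same assumptions, a 50-byte row with an indexed UUID would carry 64% overhead.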