A hash table is a technique for storing and retrieving data, which associates calculated index values with stored data. Hash tables are particularly efficient for data lookup operations, because the index value of an entry is calculated directly from a key value rather than being found by searching through the stored data. In order to store a data value, a hash function is applied to a key value associated with the data value, thereby determining a hash value. This hash value is used as the index value to an entry in a table in which the data value is stored. The key value is also stored, in a location related to the entry used by the data value. For example, the key value may be stored in the same entry as the data value, or a pointer to another storage location may be stored with the data value.
In order to retrieve the data value, a query key value is subjected to the hash function, in order to determine the location in the table of the associated data value. The data value can then be returned. In some implementations, the query key value is compared with the stored key value in order to determine that the same key value was used to store the data value. If the key values match, then the data value is returned.
As an example of the use of a hash table, consider the storage of telephone numbers for contacts. The contact name is used as the key value, and the telephone number is the data value. In order to populate the storage table, the contact name is subjected to a hash function, and the resulting hash value used to indicate the table entry into which the contact name and associated telephone number are to be stored. In order to retrieve a telephone number, a contact name is submitted as a query key value, and is subjected to the hash function. The contact name and telephone number are retrieved from the table entry having the resulting hash value as an index value. The contact names are compared, and if they match, the telephone number is returned.
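The telephone number example above can be sketched in a few lines of Python. This is a minimal illustration only: the table size, the use of CRC-32 as the hash function, and the names `hash_index`, `store` and `retrieve` are all assumptions made for the sketch, and conflicts between different keys are not handled (a later entry simply overwrites an earlier one).

```python
import zlib

TABLE_SIZE = 8  # assumed, illustrative number of table entries

# Each entry holds either None or a (contact name, telephone number) pair.
table = [None] * TABLE_SIZE


def hash_index(key: str) -> int:
    """Return the table index for a key (CRC-32 is an assumed stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % TABLE_SIZE


def store(name: str, number: str) -> None:
    """Store the key and the data value in the entry selected by the hash."""
    table[hash_index(name)] = (name, number)


def retrieve(name: str):
    """Return the stored number only if the stored key matches the query key."""
    entry = table[hash_index(name)]
    if entry is not None and entry[0] == name:
        return entry[1]
    return None


store("Alice", "555-0100")
print(retrieve("Alice"))  # 555-0100
print(retrieve("Bob"))    # None: either an empty entry or a key mismatch
```

The key comparison in `retrieve` is what prevents a query for one contact from returning another contact's number when both names happen to hash to the same index.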
The table entry may be referred to as an entry, location or bucket, and each entry has a unique index value.
One particular use of a hash lookup table is in Ethernet networks for data packet forwarding. Ethernet is rapidly becoming the most commonly used data network for connecting computers and their peripherals together. Part of the attraction of Ethernet to network administrators is the ease of connecting and addressing these network attached devices or “end stations” as they are often called.
FIG. 1 of the accompanying drawings is a block diagram illustrating a simple computer network. The network 1 includes a number of personal computers 11a, 11b, 11c and a server device 12, which communicate with one another using Ethernet communications via an Ethernet bridge 14. The Ethernet bridge device 14 includes a plurality of ports 15a, 15b, 15c and 15d, and a routing unit 16. The routing unit operates to route data packets (or “frames”) between a source device and a destination device connected to ports of the bridge device 14.
Ethernet bridges have the capability to learn automatically the addresses of the End Stations they are connected to directly or via other Intermediate Stations or bridges which provide the interconnection between Ethernet network segments. The learning process is dynamic and performed automatically every time a bridge is powered up.
Ethernet data is encapsulated into Ethernet frames when it is sent between End Stations. A frame is illustrated in FIG. 2 of the accompanying drawings, and has a header containing a pair of End Station addresses used to direct the frame between a source and destination. Ethernet End Station addresses are defined by the IEEE 802 standard Medium Access Control (MAC) address space, each network attached device having a unique MAC address assigned during manufacture.
Commonly, the Ethernet bridge device 14 will also support multiple logical Ethernet network segments by utilising IEEE 802.1Q VLAN tagging whereby the MAC address is supplemented with a VLAN (virtual local area network) tag to indicate with which logical Ethernet network segment a particular frame is associated. The remainder of the frame consists of a payload (the data being transferred) and a CRC (cyclic redundancy check) for maintaining data integrity.
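As a rough illustration of the header fields described above, the following Python sketch extracts the destination address, source address and VLAN identifier from the start of a frame. It assumes the standard layout (6-byte destination address, 6-byte source address, and an optional IEEE 802.1Q tag introduced by the value 0x8100 whose lower 12 bits carry the VLAN identifier); FIG. 2 is not reproduced here, the function name `parse_header` is hypothetical, and the CRC is not checked.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier that marks an IEEE 802.1Q tag


def parse_header(frame: bytes):
    """Return (destination MAC, source MAC, VLAN id or None, EtherType)."""
    # Destination address comes first, then source address, then a 16-bit type.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    vlan_id = None
    if ethertype == TPID_8021Q:
        # Tagged frame: 16 bits of tag control information, then the real type.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF  # lower 12 bits hold the VLAN identifier
    return dst, src, vlan_id, ethertype


# Hand-built tagged header: VLAN 10, EtherType 0x0800, followed by a payload.
dst = bytes.fromhex("020000000002")
src = bytes.fromhex("020000000001")
tagged = dst + src + bytes.fromhex("8100") + bytes.fromhex("000a0800") + b"payload"
print(parse_header(tagged))
```

The pair (MAC address, VLAN identifier) recovered in this way is the kind of key that a bridge could present to its forwarding database.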
The Ethernet bridge device 14 contains a forwarding database which is used to route Ethernet frames from an ingress port of the bridge to the correct egress port of the bridge in order to deliver the frame closer to a destination End Station. In the simple example shown in FIG. 1, this results in data packets being delivered between personal computers and the server 12. In more complex network arrangements, the data packet is routed to the next bridge until the destination is reached.
This forwarding database uses the hash table technique described above and is populated as frames are routed through the bridge between the Ethernet segments from which the network is constructed, which join end stations through intermediate points. When a frame is received by the bridge at a port (the ingress port for that frame), the bridge 14 learns the end station source details (MAC address, VLAN tag) for that frame, and maps those details to the port concerned. The mapping is stored in the forwarding database for subsequent use in routing, by subjecting the source details to the hash function of the forwarding database, and storing the port information in the table at the entry corresponding to the resulting index. For each later incoming frame, the bridge needs to determine the egress port to which the frame is to be routed. The destination details of the frame (MAC address, VLAN tag) are used to determine the correct egress port by performing a hash lookup in the forwarding database.
The size and construction of the forwarding database must be carefully considered because, if a port entry cannot be found in the database for a frame, the Ethernet bridge device 14 is required to route that frame to all other possible egress ports. Ideally, this occurs only before the forwarding database has learnt the MAC address of the end station concerned, which it learns when that end station acts as a source. However, if a subsequent MAC address requires storage in the same place in the forwarding database, then it will displace the earlier address and lead to unnecessary flooding of frames.
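The learning, lookup and flooding behaviour described above can be sketched as follows. This is a simplified model under stated assumptions, not the implementation of any particular bridge: the class `ForwardingDatabase`, the function `forward`, the table size, and the XOR-folding hash (chosen here only because it is easy to follow by hand) are all hypothetical, and each table entry holds a single mapping so that a conflicting address displaces the earlier one.

```python
class ForwardingDatabase:
    """Single-entry-per-index hash table mapping (MAC, VLAN) to a port."""

    def __init__(self, size: int = 1024):
        self.size = size
        self.entries = [None] * size

    def _index(self, mac: bytes, vlan: int) -> int:
        # Assumed toy hash: XOR the key together 16 bits at a time.
        key = mac + vlan.to_bytes(2, "big")
        folded = 0
        for i in range(0, len(key), 2):
            folded ^= int.from_bytes(key[i:i + 2], "big")
        return folded % self.size

    def learn(self, src_mac: bytes, vlan: int, ingress_port: str) -> None:
        # Overwrites whatever was stored at this index (displacement).
        self.entries[self._index(src_mac, vlan)] = (src_mac, vlan, ingress_port)

    def lookup(self, dst_mac: bytes, vlan: int):
        entry = self.entries[self._index(dst_mac, vlan)]
        if entry is not None and entry[0] == dst_mac and entry[1] == vlan:
            return entry[2]
        return None


def forward(fdb, ports, ingress_port, src_mac, dst_mac, vlan):
    """Learn the source, then return the list of egress ports for the frame."""
    fdb.learn(src_mac, vlan, ingress_port)
    egress = fdb.lookup(dst_mac, vlan)
    if egress is None:
        # Unknown destination: flood to every port except the ingress port.
        return [p for p in ports if p != ingress_port]
    return [egress]


ports = ["15a", "15b", "15c", "15d"]
pc = bytes.fromhex("020000000001")      # made-up address for a computer
server = bytes.fromhex("020000000002")  # made-up address for the server
fdb = ForwardingDatabase()
print(forward(fdb, ports, "15a", pc, server, 1))  # ['15b', '15c', '15d']
print(forward(fdb, ports, "15d", server, pc, 1))  # ['15a']
```

In this sketch the first frame is flooded because the server's address has not yet been learnt, while the reply is sent to a single port. Constructing the database with `size=1` forces every address to the same index, which reproduces the displacement problem: each newly learnt address evicts the previous one, and lookups for the evicted address fall back to flooding.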
When the hash function generates the same index for two different keys, a hash conflict occurs and a mechanism is needed to resolve it. The probability of this occurring depends on the size of the hash table relative to the number of entries being held in it, and on how well the hash function distributes the keys amongst the slots of the hash table. As the hash table becomes fuller, the probability of a hash conflict increases non-linearly.
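One way to make the non-linear growth concrete is to compute the probability that at least one conflict has occurred after a given number of distinct keys have been inserted. The sketch below assumes an idealised hash function that places each key uniformly and independently at random, which is an assumption of this example rather than a property of any real hash function; the function name `collision_probability` is hypothetical.

```python
def collision_probability(num_keys: int, num_slots: int) -> float:
    """Probability that at least two of num_keys distinct keys share a slot,
    assuming each key is hashed uniformly at random over num_slots slots."""
    if num_keys > num_slots:
        return 1.0  # more keys than slots guarantees a conflict
    p_no_conflict = 1.0
    for already_used in range(num_keys):
        # The next key must land in one of the slots that is still free.
        p_no_conflict *= (num_slots - already_used) / num_slots
    return 1.0 - p_no_conflict


# Probability of at least one conflict in a 1024-slot table as it fills.
for keys in (8, 16, 32, 64, 128):
    print(keys, round(collision_probability(keys, 1024), 3))
```

Under this assumption a conflict becomes more likely than not once the number of keys is only a little larger than the square root of the number of slots, long before the table is close to full.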
Common mechanisms for handling hash conflicts are: to form a list of conflicting entries, attached to the hash table slot, which is searched in some manner; to perform a second attempt with a different hash function; or to have a content addressable overflow area. It is common for there to be some limit on the number of conflicts which can be handled, either in total or for a particular hash entry.
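The first of these mechanisms, a list of conflicting entries attached to each slot, together with a limit on the number of conflicts per slot, might look like the following sketch. The class name `ChainedHashTable`, the slot count, the limit `max_chain` and the use of CRC-32 are assumptions made for illustration; the other mechanisms (a second hash function, a content addressable overflow area) are not shown.

```python
import zlib


class ChainedHashTable:
    """Hash table in which each slot holds a bounded list of (key, value)."""

    def __init__(self, num_slots: int = 8, max_chain: int = 4):
        self.slots = [[] for _ in range(num_slots)]
        self.max_chain = max_chain  # assumed per-slot conflict limit

    def _index(self, key: bytes) -> int:
        return zlib.crc32(key) % len(self.slots)

    def insert(self, key: bytes, value) -> bool:
        """Insert or update a key; return False if the slot's list is full."""
        chain = self.slots[self._index(key)]
        for i, (stored_key, _) in enumerate(chain):
            if stored_key == key:
                chain[i] = (key, value)  # update an existing entry in place
                return True
        if len(chain) >= self.max_chain:
            return False  # conflict limit reached; the entry cannot be stored
        chain.append((key, value))
        return True

    def lookup(self, key: bytes):
        """Search the slot's list linearly for a matching key."""
        for stored_key, value in self.slots[self._index(key)]:
            if stored_key == key:
                return value
        return None


# A single slot forces every key to conflict, which exercises the limit.
t = ChainedHashTable(num_slots=1, max_chain=2)
print(t.insert(b"a", 1), t.insert(b"b", 2), t.insert(b"c", 3))  # True True False
print(t.lookup(b"b"), t.lookup(b"c"))                            # 2 None
```

The rejected third insert shows how such a table can refuse an entry as soon as one slot's list reaches its limit, regardless of how much space remains elsewhere in a larger table.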
In all cases it is highly desirable to keep the number of conflicting entries low, and this means that the hash function should distribute the keys evenly across the table entries. However, there will always be pathological cases where all keys hash to the same hash value and the table is unable to handle that level of hash conflict.
Since the distribution of MAC addresses differs between the environments in which an Ethernet bridge is used, it is unlikely that the hash function will always provide an optimal distribution to hash table entries, and it is likely that the hash table will be effectively full (i.e. an entry being unable to be inserted) well before all hash entries in the table have been allocated.