In existing messaging systems such as electronic mail systems, key performance metrics are human interactive response time and message throughput. Interactive response time is the time it takes the system to respond to the demands placed on it by a user. Message throughput is how efficiently the system can process demands from the user while also handling all asynchronous interactions, such as receiving mail from and sending mail to other users.
To improve human response time, current systems utilize indexing. An index is an auxiliary access structure of the database that physically organizes part of the data in such a way that it can be quickly and efficiently accessed in a certain pattern. Messaging data is accessed in many different patterns, such as displaying the contents of a folder containing recently received mail (inbox), searching for mail from a certain person, or looking for mail that has been previously classified into some category or stored in a folder.
Because there are many different access patterns for messaging data, there may be many different indexes over the same data. Moreover, an index that provides fast access to data in the inbox most often will not provide fast search for all mail from a certain person. As the amount of data stored in a messaging system grows, indexes become essential for maintaining acceptable response times.
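The relationship between access patterns and indexes can be illustrated with a minimal sketch. The record layout, the sample data, and the names `build_index` and `scan_for_sender` are assumptions made for illustration only; they do not describe any particular messaging system.

```python
from collections import defaultdict

# Hypothetical message records: (message_id, folder, sender, received_time)
MESSAGES = [
    (1, "inbox", "alice", 100),
    (2, "inbox", "bob", 105),
    (3, "archive", "alice", 90),
    (4, "inbox", "alice", 110),
]

def build_index(messages, key_fn):
    """Group message ids by the value that key_fn extracts from each record."""
    index = defaultdict(list)
    for record in messages:
        index[key_fn(record)].append(record[0])
    return index

# One index per access pattern over the same underlying data.
by_folder = build_index(MESSAGES, lambda r: r[1])
by_sender = build_index(MESSAGES, lambda r: r[2])

def scan_for_sender(messages, sender):
    """Fallback when no sender index exists: every record must be examined."""
    return [r[0] for r in messages if r[2] == sender]
```

In this sketch, `by_folder["inbox"]` answers the inbox query directly, but it offers no help in finding all mail from one sender. Without `by_sender`, that query falls back to `scan_for_sender`, whose cost grows with the total number of stored messages.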
While indexes are essential, they suffer from several serious drawbacks. First, to keep the indexes for the messaging data current, the indexes are synchronously maintained as data is added, deleted, or modified. For example, as data is added to the messaging data, the indexes are updated to reflect the new data; as data is deleted from the messaging data, the indexes are updated to reflect the removed data; and, as the messaging data is updated, the indexes are updated to reflect the updated values of the data. As the rate of change of the messaging data increases, or as the number of indexes to be maintained increases, the cost of keeping the indexes current becomes problematic. Current messaging systems often spend over 40% of their input/output (I/O) operations doing nothing more than maintaining these indexes.
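The maintenance cost described above can be sketched as follows. The class `IndexedStore`, its counters, and the dictionary-based rows are hypothetical and chosen only to make the bookkeeping visible. The sketch also assumes, as a simplification, that an update rewrites every index entry even when the indexed value has not changed.

```python
from collections import defaultdict

class IndexedStore:
    """Toy message store that maintains every index synchronously."""

    def __init__(self, key_fns):
        self.rows = {}                      # message_id -> row
        self.key_fns = key_fns              # index name -> function(row) -> key
        self.indexes = {name: defaultdict(set) for name in key_fns}
        self.data_writes = 0                # writes to the message data itself
        self.index_writes = 0               # writes spent only on index upkeep

    def _index_add(self, msg_id, row):
        for name, fn in self.key_fns.items():
            self.indexes[name][fn(row)].add(msg_id)
            self.index_writes += 1

    def _index_remove(self, msg_id, row):
        for name, fn in self.key_fns.items():
            self.indexes[name][fn(row)].discard(msg_id)
            self.index_writes += 1

    def add(self, msg_id, row):
        self.rows[msg_id] = row
        self.data_writes += 1
        self._index_add(msg_id, row)

    def delete(self, msg_id):
        row = self.rows.pop(msg_id)
        self.data_writes += 1
        self._index_remove(msg_id, row)

    def update(self, msg_id, new_row):
        old_row = self.rows[msg_id]
        self.rows[msg_id] = new_row
        self.data_writes += 1
        # Simplification: remove and re-add in every index, changed or not.
        self._index_remove(msg_id, old_row)
        self._index_add(msg_id, new_row)
```

With three indexes, each added or deleted message in this sketch costs one data write and three index writes, and each update costs one data write and six index writes. The ratio worsens as indexes are added, which is the effect the paragraph above describes.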
Second, the access pattern at data modification time, when the indexes are updated, cannot match all the access patterns indexed. Therefore, the I/O necessary to maintain those indexes is often effectively random relative to the data modification itself.
Third, indexes are maintained even if they are not being used. For example, suppose a user wants to sort their data on three different properties (who a message was from, when it was received, and what its subject is). Such a sort allows the user to find a particular message quickly, because they remember who it was from and roughly when it was received relative to other messages from the same person, and they will recognize the subject when they see it. To sort the data in that way, an index may be created, used once, and never used again. Nevertheless, for some period after it is created, the index continues to be maintained.
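A three-property sort of this kind can be sketched as a sorted list of composite keys. The field names, the sample records, and the functions `build_sort_index` and `messages_from` are illustrative assumptions rather than a description of any existing system.

```python
import bisect

# Hypothetical message records.
MESSAGES = [
    {"id": 1, "sender": "bob", "received": 105, "subject": "Lunch"},
    {"id": 2, "sender": "alice", "received": 110, "subject": "Budget"},
    {"id": 3, "sender": "alice", "received": 90, "subject": "Agenda"},
]

def build_sort_index(messages):
    """Build an index ordered by sender, then received time, then subject."""
    return sorted(
        (m["sender"], m["received"], m["subject"], m["id"]) for m in messages
    )

def messages_from(index, sender):
    """Return ids of one sender's messages in received order."""
    # A one-element tuple sorts before every longer tuple sharing its prefix,
    # so this locates the first entry for the given sender.
    start = bisect.bisect_left(index, (sender,))
    ids = []
    for entry in index[start:]:
        if entry[0] != sender:
            break
        ids.append(entry[3])
    return ids

SORT_INDEX = build_sort_index(MESSAGES)
```

Once built, `SORT_INDEX` serves the lookup `messages_from(SORT_INDEX, "alice")` efficiently. If the user never repeats that query, every later change to the messages would still have to be reflected in the index for as long as it exists.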
Lastly, indexes must be explicitly created, modified, or destroyed by a knowledgeable user, because the absence of a needed index causes unacceptable performance characteristics (e.g., interactive response time and message throughput), while the presence of an unused index places unacceptable load on the system. Such users are very expensive because significant knowledge and experience are necessary to balance on the knife edge of ‘too few is bad, but too many is bad’; hence the people who can do it are rare and in high demand.