1. Field of the Invention
The present invention relates generally to the field of computer software and client-server applications. In particular, it relates to methods of accessing data in a distributed computing environment.
2. Discussion of Related Art
The accelerated growth of network computing in the 1990s has been accompanied by an increasingly prevalent form of communication most commonly referred to as e-mail. As more individuals, whether at home, in corporations, small businesses, academic institutions, or government organizations, have access to computers connected to some type of computer network, electronic mail is quickly becoming (and in many settings already is) a preferred mode of communication. People find e-mail an efficient and effective way to communicate whether they are sending a simple one-time message or carrying on a long-term discussion or conversation.
While e-mail has been used for years within large organizations such as corporations and universities for sending messages within the organization's internal networks, typically based on proprietary formats and protocols, the Internet is bringing e-mail out of the realm of large enterprises and into the mainstream. Since the Internet is a publicly accessible, global computer network, it is increasingly being used for its e-mail capability. In addition, TCP/IP ("transmission control protocol/internet protocol"), the Internet's communication layers, is being used to develop computer networks within private entities, known as intranets, that are based on TCP/IP instead of proprietary formats and protocols. This approach allows, for example, a corporation or university to have an internal computer network that is compatible with the Internet and has all the features of the Internet, including Web sites, the ability to hyperlink, and, of course, the ability to send and receive e-mail.
With regard to e-mail, the explosive growth of the Internet and the growing attraction of intranets have led to a proliferation of e-mail messages. Typically, e-mail messages are received and stored on network servers or on the hard drives of client and stand-alone machines. There is a growing practice and, in many cases, a need to save e-mail messages electronically and to retrieve them easily when desired. This can be important, for example, in any type of research setting where messages containing ideas, comments, or analysis are sent among researchers, possibly in different countries, over a period of several years. It is foreseeable that a certain message, sent at a particular time two years ago between two researchers who are no longer available, has to be retrieved. Of course, this capability could also be an important and useful feature in a business environment or in other settings.
The proliferation of e-mail and the increasing number of messages being saved, coupled with the growing demand for retrieving saved messages, have exposed problems with current indexing schemes and message storage areas (message stores). There is a growing trend to save messages on servers instead of on client machines. A mail server acts as a central repository of messages and has the advantage of being backed up regularly, maintained by an administrator, and repaired quickly (in most cases) when broken (e.g., when it crashes). Thus, when a user makes a request, the request is handled by the server and the result is delivered to the client.
The composition of an e-mail message today can vary widely as can the type of request. In a simple case, an e-mail message can contain, in addition to required headers, a simple message portion consisting of a few lines of text. On the other hand, an e-mail can have several attachments that may include complex graphics, audio segments, video, animation, text having special encoding (e.g. if in a non-Latin based language), and even other entire e-mail messages.
Requests for messages can also vary in terms of depth and breadth of information desired. For example, a request can be for the entire content of a single message sent several years ago between two individuals. Or, a request can be for a list of recipients of any message sent regarding a particular subject in the last 24 hours, the list distinguishing those recipients who have opened the message(s) from those who have not. In sum, e-mail messages and requests for e-mail message data have grown more complex, thereby exposing weaknesses of present mail servers in handling message storage and retrieval.
Most mail servers presently used for the type of message storage and retrieval discussed above are configured according to the Internet Message Access Protocol, or IMAP. The IMAP protocol is a collection of commands for manipulating messages and indexes for sorting and storing all the information associated with messages and actions performed on them. In order for an IMAP-configured server to take full advantage of the IMAP protocol, information related to users on the network and to messages, which includes message content and metadata regarding the message, must be stored in a manner that takes advantage of IMAP indexing. While IMAP servers store data according to IMAP indexing to some degree, none does so in a manner that optimizes quick, reliable, and non-contentious retrieval and storage of data.
Present IMAP servers experience contention problems and other inefficiencies resulting in poor performance. Although they handle message data as a collection of fields that make up a record, i.e., they are record-based, writing a new message to a user's inbox will very likely result in locking out the user from performing other operations in the inbox. The message stores of these IMAP servers were not designed to efficiently utilize the indexing available in IMAP. For example, a user may only desire information regarding certain fields (e.g., date, recipients, subjects, etc.) from all messages in a mailbox. IMAP servers are likely to retrieve more information than is needed to satisfy typical user requests for data. Thus, in order to simply get the number of messages sent to a particular user regarding a specific subject, an IMAP server may read from disk the entire content of all the messages in order to derive the number of messages. Present IMAP servers also lack the strong integrity and consistency checking capabilities possible in IMAP.
Other mail protocols and operating systems require that an entire message be read or copied regardless of what type of information regarding the message is being requested. For example, servers configured based on the Post Office Protocol (POP) deliver the entire message in their operations. This is similar to VARMAIL, an older file-based mail environment in the UNIX operating system, in which delivery of a message locked out all write operations to a mail folder. This default procedure made the mail delivery system considerably slow. In addition, the VARMAIL environment required that multiple copies of the same e-mail message be stored in the client machine's memory.
Therefore, what is needed is a method of accessing and manipulating mail messages in a server-based message store that minimizes locking contention and, by taking advantage of the organization of the message store, provides high speed access to mail messages. The methods should take advantage of the high level of indexing and message data provided in the message store to allow for memory-efficient, high-speed retrieval and manipulation of messages in a high-end user, high-volume, distributed computing environment.
To achieve the foregoing, and in accordance with the purpose of the present invention, methods, apparatus, and computer readable medium for accessing data in a message store are provided. In accordance with one aspect of the present invention, a method of accessing data in a message store in a multi-threaded system is disclosed. The system determines whether a process is available for accepting a new connection and responsibility for that connection is transferred to that process, which includes one or more threads. One thread is selected and initialized, and then manages client requests for accessing messages or data in the message store. The thread is terminated when a termination request is received or when a predetermined condition has been met.
In one embodiment of the present invention, a client connection is established by receiving a request from a client request queue. A selected process is informed of the connection and retrieves information regarding the connection from a shared memory accessible by the selected process and other processes in the system. In yet another embodiment, a selected thread is initialized by allocating a cell in a shared memory for storing a process and thread identifier and associating the thread with an input polling thread, thereby placing the thread in a wait state. In yet another embodiment, the selected process is alerted of the incoming data by an input polling thread and the incoming data is routed to the process and a thread within the process. In yet another embodiment, critical signals directed to a connection thread in the selected process are handled by a critical signal thread. The critical signal thread prevents the entire selected process from terminating (and all the connection threads within such process from abruptly ending) and shuts down in an orderly fashion only the connection thread that caused the critical signals.
In another aspect of the invention, a method of duplicating a message in a message store is disclosed. A reference counter in the message store associated with the message is updated to indicate that an additional user folder is referencing the message. Access to a destination user folder storing the message is limited while a duplicate user folder cell associated with the message is appended to a destination user folder, wherein the user folder cell contains information on the message but does not contain the actual contents of the message. In yet another aspect of the invention, it is determined whether there are other locks on the destination user folder before closing the folder. In yet another aspect of the invention, access to an index directory cell corresponding to the message is temporarily restricted while the reference counter is being updated.
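By way of illustration only, the duplication method described above can be sketched in a few lines of Python. This is a minimal sketch and not the claimed implementation: the names `IndexCell`, `FolderCell`, `UserFolder`, and `copy_message` are hypothetical, and in-process locks stand in for whatever locking the message store actually uses.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class IndexCell:
    """Hypothetical index directory cell; one exists per stored message."""
    message_id: str
    ref_count: int = 1
    lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass
class FolderCell:
    """Hypothetical user folder cell: information on a message, not its contents."""
    message_id: str
    subject: str


@dataclass
class UserFolder:
    """Hypothetical user folder holding folder cells."""
    name: str
    cells: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)


def copy_message(index_cell, source_cell, destination):
    """Duplicate a message into `destination` without copying its contents."""
    # Restrict access to the index cell only while the counter is updated.
    with index_cell.lock:
        index_cell.ref_count += 1
    # Limit access to the destination folder only while the cell is appended.
    with destination.lock:
        duplicate = FolderCell(source_cell.message_id, source_cell.subject)
        destination.cells.append(duplicate)
    return duplicate
```

In this sketch each lock is held only for a single counter update or a single append, which reflects the stated goal of minimizing locking contention; the message contents are never read or copied.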
In yet another aspect of the invention, a method of accessing a specific portion of a message contained in the message store including user folders, index folders, and data buckets is disclosed. A mail folder cell associated with the message is examined to obtain the location of a corresponding index folder cell. The index folder cell is then examined to obtain information needed to locate a specific section of the message, the message being contained in a data bucket. The specific section of the message is then retrieved. In another embodiment, the date the message was written to the mail folder cell is used to identify an index directory and an index file. An index directory cell is used to locate an index file cell associated with the message where the index file cell contains information leading to the specific section of the message being sought.
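The two-step lookup described above may be easier to follow with a small example. The following Python sketch is illustrative only and rests on several assumptions not stated in the text: the names `Section`, `IndexFolderCell`, `MailFolderCell`, and `fetch_section` are hypothetical, dictionaries stand in for index folders and data buckets, and a section is assumed to be addressable by a byte offset and length.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Assumed representation of a message section: a byte range in a bucket."""
    offset: int
    length: int


@dataclass
class IndexFolderCell:
    """Hypothetical index folder cell: names the bucket and its sections."""
    bucket_id: str
    sections: dict  # section name -> Section


@dataclass
class MailFolderCell:
    """Hypothetical mail folder cell: holds only the location of the index cell."""
    index_key: str


def fetch_section(folder_cell, index_folder, buckets, section_name):
    """Return one section of a message without reading the rest of it."""
    # Step 1: the mail folder cell gives the location of the index folder cell.
    index_cell = index_folder[folder_cell.index_key]
    # Step 2: the index folder cell locates the section inside a data bucket.
    section = index_cell.sections[section_name]
    bucket = buckets[index_cell.bucket_id]
    # Step 3: retrieve only the requested bytes.
    return bucket[section.offset:section.offset + section.length]
```

Only the requested byte range is sliced from the bucket, so a request for, say, a header section does not require reading the body or any attachments.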
In another aspect of the invention, a computer system for accessing messages in a message store is disclosed. Client requests for connecting to a message store are received and routed by a request router. The request router is connected to one or more multi-threaded request handlers, each handler having one or more connection threads. Associated with the request router is a shared memory containing request handler identifiers and connection thread identifiers. The shared memory is accessible by all the request handlers connected to the request router.
In one embodiment, the request router includes a request handler generator capable of creating new request handlers when it is determined that there are no request handlers available to maintain new connection threads. In yet another embodiment, the request router manages one or more of the request handlers connected to it. In yet another embodiment, each request handler includes an input polling thread for detecting an input event directed to an active connection thread in the request handler. In yet another embodiment, each request handler includes a critical signal thread for detecting critical signals directed to a particular active connection thread in the request handler. The critical signal thread terminates only the particular active connection thread that caused the critical signal and thereby keeps other active connection threads in the request handler functioning. In yet another embodiment, the shared memory allocated by the request router includes thread specific data cells associated with connection threads that contain request handler identifiers and connection thread identifiers.
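The routing behavior described above, in which a request router assigns connections to request handlers and creates a new handler when none is available, can be sketched as follows. This Python sketch is a simplified, single-process illustration under stated assumptions: the names `RequestHandler`, `RequestRouter`, and `route` are hypothetical, the per-handler limit `MAX_THREADS_PER_HANDLER` is an arbitrary assumed value, an ordinary dictionary stands in for the shared memory, and no real processes or threads are started.

```python
from dataclasses import dataclass, field
from itertools import count

MAX_THREADS_PER_HANDLER = 2  # assumed limit, chosen only for illustration


@dataclass
class RequestHandler:
    """Hypothetical stand-in for a multi-threaded request handler process."""
    handler_id: int
    thread_ids: list = field(default_factory=list)

    def has_capacity(self):
        return len(self.thread_ids) < MAX_THREADS_PER_HANDLER


class RequestRouter:
    """Hypothetical router that assigns each connection to a handler thread."""

    def __init__(self):
        self.handlers = []
        # Stand-in for the shared memory cells:
        # connection id -> (request handler id, connection thread id).
        self.shared_cells = {}
        self._handler_ids = count(1)
        self._thread_ids = count(1)

    def route(self, connection_id):
        # Look for an existing handler that can maintain another thread.
        handler = next((h for h in self.handlers if h.has_capacity()), None)
        if handler is None:
            # Request handler generator: create a handler when none is available.
            handler = RequestHandler(next(self._handler_ids))
            self.handlers.append(handler)
        thread_id = next(self._thread_ids)
        handler.thread_ids.append(thread_id)
        # Record both identifiers in a cell visible to all handlers.
        self.shared_cells[connection_id] = (handler.handler_id, thread_id)
        return self.shared_cells[connection_id]
```

A deployed system would place the cells in memory shared among separate processes; the dictionary here only shows which identifiers each cell is described as holding.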