1. Field of the Invention
The present invention relates to computer hardware and software, and more particularly to a system and method for maintaining a history database of newsfeeds to a Usenet server.
2. Description of the Prior Art
Usenet is a worldwide collaboration network of servers that support newsgroups. There are many thousands of newsgroups, each covering a particular topic of interest to users. Each server administrator can decide which newsgroups to support, usually based on the requests of the local users who wish to read and contribute material to particular newsgroups. Postings on newsgroups can consist of any form of digital data, but are referenced herein generically as “articles” or “news articles.”
In contrast to the World Wide Web, Usenet is a forum that allows many people to collaborate with many of their peers in the same interest group. Instead of a single user downloading a web page, a Usenet participant can observe a thread of discussion from many different people. When a Usenet participant posts an article, that article will similarly be accessible to each of the participants in the newsgroup.
Usenet is also distinguishable from e-mail transmissions and chat rooms. Newsgroups allow readers to choose their topics of interest. Unwanted articles and messages do not clutter mail in-boxes. As people post new articles or respond to previous articles, those new postings get added below the prior articles to form a stream of discussion.
When a new article is posted on a newsgroup, the originating server sends a copy of the article to each networked server that has requested “newsfeeds” in that particular newsgroup. Since Usenet is a worldwide network, it is possible that a new article could be copied thousands of times and migrate to distant servers. Many Usenet servers are networked to multiple servers, and might therefore receive newsfeeds containing the same article from different sources.
Usenet is generally transmitted using a protocol called NNTP (Network News Transfer Protocol). Special newsreaders are required to post, distribute, and retrieve Usenet articles. Newsreaders are widely available in freeware, shareware, and commercial versions, and are included in certain versions of Microsoft Internet Explorer and Netscape Navigator.
Internet Service Providers (ISPs) have been under popular pressure to provide access to Usenet. The volume of newsfeeds to servers has increased dramatically, making it technologically difficult for ISPs to maintain appropriate levels of service to users. High performance servers are now required, along with innovative algorithms, in order to handle the volume of articles that are posted on the various newsgroups.
One of the most difficult challenges relates to a system and method for maintaining an index of articles that are stored on a particular server. This index is herein called a history database because it is a database that maintains a historical record of articles that have been offered by newsfeeds as downloads to the server. As previously mentioned, a single server may be receiving newsfeeds from dozens or hundreds of other servers. Each newsfeed sends a steady stream of queries regarding the status of newly posted articles. If the article is not yet resident on a local server, the newsfeed will download the article so that each local server has an updated discussion thread.
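The query-and-download exchange described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class name `HistoryIndex` and the method `offer` are hypothetical, and the numeric replies are assumed to follow the NNTP IHAVE convention (335 to request the article, 435 to decline it). The sketch records an article as soon as it is offered, consistent with a history of articles "offered by newsfeeds."

```python
# Hypothetical sketch: HistoryIndex and offer() are illustrative names,
# not taken from the source document.

SEND_ARTICLE = 335        # NNTP IHAVE reply: send the article
ARTICLE_NOT_WANTED = 435  # NNTP IHAVE reply: article not wanted


class HistoryIndex:
    """In-memory stand-in for a history database keyed by message ID."""

    def __init__(self):
        self._offered = set()

    def offer(self, message_id: str) -> int:
        """Answer a newsfeed's offer of an article.

        Returns SEND_ARTICLE the first time a message ID is offered and
        ARTICLE_NOT_WANTED for every later offer of the same ID.
        """
        if message_id in self._offered:
            return ARTICLE_NOT_WANTED
        self._offered.add(message_id)
        return SEND_ARTICLE
```

Under these assumptions, a second newsfeed offering the same message ID is declined without the article being transferred again.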
Servers must continuously find storage space for the new articles that arrive through their newsfeeds. Once the storage capacity of a server is filled, the alternatives are to add another storage device to the server, or to delete older news articles or less popular newsgroups. Due to the expense of adding large amounts of storage, the usual practice is to delete older news articles, as appropriate, to free storage for the new incoming articles. The history database is updated continuously to reflect these changes.
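The practice of deleting older articles to make room for new ones can be illustrated with a short sketch. It assumes a simple oldest-first policy and a byte-count capacity; the class `ArticleStore` and its fields are hypothetical, and a real server would apply its own expiration criteria.

```python
from collections import OrderedDict

# Hypothetical sketch: ArticleStore is an illustrative name, and the
# oldest-first policy is an assumption, not a detail from the source.


class ArticleStore:
    """Fixed-capacity article store that expires the oldest articles first."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._articles = OrderedDict()  # message_id -> body, oldest first

    def add(self, message_id: str, body: bytes) -> list:
        """Store an article and return the IDs expired to make room for it."""
        if len(body) > self.capacity_bytes:
            raise ValueError("article is larger than the whole store")
        if message_id in self._articles:
            return []  # already resident; nothing to store or expire
        expired = []
        while self.used_bytes + len(body) > self.capacity_bytes:
            old_id, old_body = self._articles.popitem(last=False)
            self.used_bytes -= len(old_body)
            expired.append(old_id)
        self._articles[message_id] = body
        self.used_bytes += len(body)
        return expired
```

The returned list of expired IDs is what a history database would need in order to stay consistent with the articles actually held on disk.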
Since articles are passed from server to server in unpredictable ways, it is common to have the same article offered by multiple newsfeeds. Therefore, it is important for each news server to maintain a history database of articles that are currently resident on the server. In that way, servers can refrain from continuously downloading articles that have already been provided by other newsfeeds. However, the process of reading, writing, and maintaining such a database has been a challenge to software engineers. What has been desired by server administrators, and provided by the present invention, is a system for maintaining a history database in a Usenet server that allows continuous high speed, low latency access for read and write operations.
A system for storing and operating a history database is disclosed that is particularly suited to Usenet servers. The history database is thread-hot, synchronized, and highly parallel. In addition, the database structure enables high speed read/write activity with low latency search processes. The database is statically sized, self-expiring, and self-repairing. No throttling or down-time is required in the normal course of operations.
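One way to picture a statically sized, self-expiring structure is a fixed ring of slots in which each new entry overwrites the oldest one. The sketch below is only an in-memory illustration of that idea under stated assumptions; the class `FixedHistory` is hypothetical, and it omits the locking, on-disk layout, and self-repair behavior that the disclosed database provides.

```python
# Hypothetical sketch: FixedHistory is an illustrative name. The ring
# buffer is an assumed mechanism, not the structure disclosed in the source.


class FixedHistory:
    """Fixed number of slots; adding to a full ring expires the oldest entry."""

    def __init__(self, slots: int):
        self._ring = [None] * slots  # allocated once, never resized
        self._index = {}             # message_id -> slot number
        self._next = 0               # slot that the next entry will occupy

    def add(self, message_id: str) -> None:
        if message_id in self._index:
            return
        evicted = self._ring[self._next]
        if evicted is not None:
            del self._index[evicted]  # oldest entry expires implicitly
        self._ring[self._next] = message_id
        self._index[message_id] = self._next
        self._next = (self._next + 1) % len(self._ring)

    def __contains__(self, message_id: str) -> bool:
        return message_id in self._index
```

Because expiration happens as a side effect of insertion, no separate cleanup pass or down-time is needed in this simplified model.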
The invention comprises a “key-value” database, several pointers, linked lists, locks, and queues. Portions of the structure are sometimes known as a “hash table on disk,” although the present invention is an improvement on such previously known data structures. All of these elements are arranged to operate in a synergistic manner to achieve a highly efficient history database. Under normal conditions, most of the queries from newsfeeds can be satisfied from a cache of the latest history database entries because many of the newsfeeds will offer the same articles as the other newsfeeds. The same cache also provides space in which to store and aggregate the latest additions to the database such that both “read” and “write” operations to the disk are optimized.
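The dual role of the cache, answering repeated queries and aggregating new entries before they reach the disk, can be sketched as follows. This is a simplified model under stated assumptions: the class `CachedHistory`, the `flush_threshold` parameter, and the read/write counters are hypothetical, a plain dictionary stands in for the on-disk hash table, and cache eviction and locking are omitted.

```python
# Hypothetical sketch: CachedHistory, flush_threshold, and the counters
# are illustrative names. A dict stands in for the on-disk key-value store.


class CachedHistory:
    """Recent-entry cache that batches writes to a slower backing store."""

    def __init__(self, flush_threshold: int = 3):
        self._disk = {}     # stand-in for the on-disk hash table
        self._cache = {}    # latest entries, kept for fast lookups
        self._dirty = set() # cached keys not yet written to disk
        self.flush_threshold = flush_threshold
        self.disk_reads = 0
        self.disk_writes = 0

    def contains(self, key: str) -> bool:
        if key in self._cache:
            return True     # satisfied from the cache, no disk access
        self.disk_reads += 1
        return key in self._disk

    def add(self, key: str, value) -> None:
        self._cache[key] = value
        self._dirty.add(key)
        if len(self._dirty) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write all pending entries to disk as one aggregated operation."""
        if not self._dirty:
            return
        self._disk.update({k: self._cache[k] for k in self._dirty})
        self._dirty.clear()
        self.disk_writes += 1
```

In this model, a message ID offered by several newsfeeds in quick succession touches the disk at most once for writing, and every repeated query is answered from memory.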