Web communities are virtual spaces on the web where people can freely discuss and comment on essentially any topic and view other people's discussions and comments. An example of a web community service is Usenet. In general, Usenet is a worldwide, distributed bulletin board system that can be accessed through the Internet and many online services. Usenet typically comprises thousands of forums, called newsgroups, which are commonly used for community discussions. In particular, people generally use newsgroups to post questions and/or answers or to take part in discussions.
Many users who interact with online communities (e.g., Usenets) do so passively, browsing and/or searching archived discussions (e.g., collections of related information) rather than participating in discussions directly. Searching discussions is therefore highly desirable. However, when performing a text search over a collection of documents, it is often not enough simply to return every document that includes the specified search terms. For instance, if a user searches the World Wide Web with the search terms “Disney vacations,” a randomly ordered list of all documents containing the two words “Disney” and “vacations” will likely be of little use to that user. To provide search results that are more useful to the querier, many search utilities employ techniques that rank search results, for example, by estimating how likely a document is to be desirable to a user and/or relevant to a query.
Many of these ranking techniques take into account one or more factors such as search term proximity, search term frequency, and metadata. With term proximity, given search results that include the search terms “Disney” and “vacations,” a document in which the search terms are close together (e.g., contiguous) can be rated more desirable than documents in which the search terms are separated by more terms, longer terms, additional punctuation, particular intervening terms, etc. With search term frequency, a document in which a search term appears more often can be deemed more desirable (and given a higher rating) than a document in which the search term appears less often. Metadata can be used to indicate characteristics of search terms within a document that may be important to the query and/or querier. For example, metadata can indicate whether a search term is located within a document title and/or is specially formatted (e.g., in bold font and/or in a font size that is large relative to the rest of the document), and/or whether one or more other documents link to the document.
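The three ranking factors described above can be combined into a single score, as in the minimal sketch below. It is illustrative only and not the method of any particular search engine: the `Doc` structure, the `bold_terms` field standing in for formatting metadata, the simple lowercase tokenizer, and all weight values are hypothetical assumptions. Proximity is measured as the length of the shortest token window that contains every query term, frequency as the count of query-term occurrences, and metadata as bonuses for query terms that appear in the title or in bold. Inlink metadata is not modeled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    body: str
    # Hypothetical metadata: terms that were rendered in bold in the original document.
    bold_terms: set = field(default_factory=set)


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(tokens, terms):
    # Total number of occurrences of any query term.
    return sum(1 for t in tokens if t in terms)


def min_span(tokens, terms):
    """Length in tokens of the shortest window containing every query term, or None."""
    need = set(terms)
    if not need:
        return None
    counts, have, best, left = {}, 0, None, 0
    for right, tok in enumerate(tokens):
        if tok in need:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                have += 1
        # Shrink the window from the left while it still covers every term.
        while have == len(need):
            width = right - left + 1
            if best is None or width < best:
                best = width
            ltok = tokens[left]
            if ltok in need:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    have -= 1
            left += 1
    return best


def score(doc, query, w_prox=1.0, w_tf=0.5, w_title=2.0, w_bold=1.0):
    # All weights are illustrative assumptions, not tuned values.
    terms = set(tokenize(query))
    tokens = tokenize(doc.body)
    s = w_tf * term_frequency(tokens, terms)
    span = min_span(tokens, terms)
    if span is not None:
        # Contiguous terms give span == len(terms), i.e. the full proximity bonus.
        s += w_prox * len(terms) / span
    s += w_title * len(terms & set(tokenize(doc.title)))
    s += w_bold * len(terms & {t.lower() for t in doc.bold_terms})
    return s


def rank(docs, query):
    # Highest-scoring documents first.
    return sorted(docs, key=lambda d: score(d, query), reverse=True)
```

For example, with the query “Disney vacations,” a document titled “Disney vacations guide” whose body contains the two terms side by side scores 6.0, while a document titled “Theme parks” whose body separates the terms by five other words scores about 1.29, so `rank` places the first document ahead of the second. Both documents contain each term once, so the difference comes entirely from proximity and title metadata.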
Although such techniques are commonly used with collections of documents, they do not map well to the domain of community archives. For example, compared to web pages, newsgroup articles are typically shorter and lack rich markup (e.g., Usenet postings are typically formatted as plain ASCII text) that could help determine importance to the query and/or user. In addition, newsgroup messages generally have a very different topological relation to other messages in a collection, making cues such as inlink-derived PageRank scores and anchor text virtually impossible to use in this context.