Various approaches to providing computer-generated news Web sites exist. One approach aggregates article headlines from news sources worldwide and groups similar articles together based upon shared keywords. In some cases, the articles may be grouped into a handful of broad, statically defined categories, such as Business, Sports, Entertainment, and the like. Such approaches may be ineffective at grouping articles around more fine-grained concepts, such as individual people or specific events.
Other approaches may use traditional clustering algorithms, such as k-means or hierarchical clustering, to group articles based on keywords. A k-means approach groups articles into a predetermined number of clusters, k. In the news context, it may be difficult to determine the correct number of clusters a priori. Consequently, the k-means approach may yield clusters that are over-inclusive, in that a cluster may include articles that are not particularly relevant to the event described by the other articles in the cluster. Similarly, k-means may yield clusters that are under-inclusive, in that a cluster may exclude an article that is relevant to the event described by the other articles in the cluster. Alternatively, hierarchical clustering approaches may be used to determine and present a hierarchy of articles. As with k-means clustering, some clusters generated by hierarchical techniques will be under- or over-inclusive. For example, clusters near the top of the hierarchy will tend to include many articles that have little to do with one another, while clusters near the bottom of the hierarchy will tend to leave out potentially relevant articles.
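The effect of a predetermined k on k-means can be illustrated with a small sketch. It assumes hypothetical articles already reduced to two-dimensional feature vectors that form three well-separated topics (A near x=0, B near x=9, C near x=20). For simplicity and determinism, the sketch uses plain Lloyd's k-means with the first k points as the initial centroids; real systems would use richer keyword features and a different initialization.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means. Initial centroids are the first k points
    (a simplification chosen here so the illustration is deterministic)."""
    centroids = list(points[:k])
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels

# Hypothetical articles: topic A near x=0, topic B near x=9, topic C near x=20.
articles = [(0, 0), (20, 0), (9, 0), (0, 1), (20, 1), (9, 1)]
print(kmeans(articles, 2))  # [0, 1, 0, 0, 1, 0]: topics A and B share cluster 0 (over-inclusive)
print(kmeans(articles, 3))  # [0, 1, 2, 0, 1, 2]: one cluster per topic
print(kmeans(articles, 4))  # [0, 1, 2, 3, 1, 2]: topic A is split across clusters 0 and 3 (under-inclusive)
```

Only when k happens to match the true number of topics does each cluster correspond to exactly one topic. With too few clusters, unrelated articles are merged. With too many, related articles are separated.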