With the advent of personal computers and the proliferation of other digital electronic devices, the amount of digital data has grown significantly in recent years. It is not uncommon for a database to contain hundreds of thousands or even millions of documents. In many business and scientific settings, it is necessary to generate a list of the common themes or topics contained in a corpus of documents. Despite the many sophisticated text clustering and taxonomy generation algorithms available today, producing worthwhile results remains difficult for many types of data.