Organizations are generating and collecting an ever-increasing amount of data. Data may be directly or indirectly generated from disparate parts of the organization, such as consumer activity, manufacturing activity, customer service, quality assurance, or the like. For various reasons, it may be inconvenient for such organizations to effectively utilize their vast collections of data. In some cases, the sheer quantity of data may make it difficult to use the collected data to improve business practices. In other cases, the data collected by different parts of an organization may be stored in different formats, stored in different locations, organized arbitrarily, or the like. Further, employees within the organization may not be aware of the purpose or content of the various data collections stored throughout the organization. Accordingly, it may be difficult to discover relevant relationships, such as similarity, among portions of the data collections. Thus, it is with respect to these considerations and others that the invention has been made.