Inexpensive computer and networking technologies have made increasingly large quantities of digital content available electronically via wired and wireless networks, resulting in information overload as users have access to significantly more information than they can consistently and reliably locate.
Previously, physical content collections, such as private and public libraries of printed publications, including, but not limited to, books, coins, maps, and drawings, have been managed by human librarians who developed taxonomic structures describing the collection's content and then imposed the structure on the physical collection by examining each content item and assigning it to a relevant category or categories. While this manual process may be manageable when the content collection grows slowly, the current, rapid proliferation of digital content, both textual and multimedia, can overwhelm the editorial staff of any content collection holder.
The resulting proliferation and commoditization of information search and retrieval technologies have created an increasing number of proprietary commercial data, media, and text collections, each independently indexed and maintained by its content source. These content sources have limited economic incentive to make their digital content fully accessible for indexing by public search engines, and the public search engines attain more economic benefit by having the sources as advertisers than by providing search engine users with direct access to the actual content.
In addition to traditional content access via stationary computers, there has been an explosive proliferation of Internet access using mobile computing devices such as laptops, personal digital assistants (PDAs), and mobile telephones. This proliferation of mobile devices is markedly changing the nature of content availability as publishers reformat and reorganize their content for mobile Internet access.
While a desktop computer user can comfortably search for information, using multiple tries and browsing, mobile computing users are generally limited by small screen and input ergonomics, location-specificity, and their own mobility. Due to these constraints, mobile computing users are less likely to want to receive all possibly relevant results, and more likely to want specific information immediately.
The changing nature of content access by a mobile population plays a large part in increasing the value of information retrieval precision over recall. New search and retrieval processes therefore emphasize the highest possible precision for the first five to ten entries of the result set. For the same reasons, mobile users also require the shortest path to their desired content. Publishers consequently have a greater incentive to organize their content in information architectures that facilitate access to groups or categories of content.
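The distinction between precision over the first few results and overall recall can be illustrated with a minimal sketch. The function names, document identifiers, and relevance judgments below are hypothetical and serve only as an illustration; they are not part of any described system.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked results that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def recall(ranked, relevant):
    """Fraction of all relevant items that appear anywhere in the results."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


# Hypothetical ranked result list and set of truly relevant documents.
ranked_results = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant_docs = {"d1", "d2", "d4", "d11", "d12", "d13"}

p_at_5 = precision_at_k(ranked_results, relevant_docs, 5)
overall_recall = recall(ranked_results, relevant_docs)
print(f"precision@5 = {p_at_5:.2f}, recall = {overall_recall:.2f}")
```

In this illustration, a mobile user who views only the first five entries is served by the precision of those entries, regardless of how many relevant documents the full result set misses.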
As an alternative to single-publisher search engines or large-scale public search engines, federated searching across multiple content sources improves the chance that a user will get a relevant response to a query. However, content publishers may have organized their content using different information architectures or “taxonomies.”
Generally, a taxonomy may be a controlled vocabulary organized hierarchically to represent relationships between terms in the controlled vocabulary. A taxonomy category may be a labeled vocabulary term or group of related vocabulary terms. For example, a set of product vendor names might be the controlled vocabulary for a department store, and the categories may be the names of the store departments (e.g., Shoes, Housewares, Appliances).
Different taxonomies can be created from the same controlled vocabulary, depending on how the vocabulary is grouped into categories and how the resulting categories are arranged with respect to each other.
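By way of illustration only, the following minimal sketch groups one controlled vocabulary into two different sets of categories. The vendor names, the "Apparel" and "Home" categories, and the helper function build_taxonomy are hypothetical assumptions introduced for this example; only the department names Shoes, Housewares, and Appliances come from the example above.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: product vendor names.
VOCABULARY = [
    "Acme Footwear",
    "Brightline Cookware",
    "CoolAir Appliances",
    "Dapper Shoes",
]

# Two hypothetical assignments of the same terms to categories.
TERM_TO_DEPARTMENT = {
    "Acme Footwear": "Shoes",
    "Dapper Shoes": "Shoes",
    "Brightline Cookware": "Housewares",
    "CoolAir Appliances": "Appliances",
}
TERM_TO_DIVISION = {
    "Acme Footwear": "Apparel",
    "Dapper Shoes": "Apparel",
    "Brightline Cookware": "Home",
    "CoolAir Appliances": "Home",
}


def build_taxonomy(vocabulary, term_to_category):
    """Group vocabulary terms under the category each term is assigned to."""
    taxonomy = defaultdict(list)
    for term in vocabulary:
        taxonomy[term_to_category[term]].append(term)
    return dict(taxonomy)


by_department = build_taxonomy(VOCABULARY, TERM_TO_DEPARTMENT)
by_division = build_taxonomy(VOCABULARY, TERM_TO_DIVISION)
print(by_department)
print(by_division)
```

Both results cover the same four vocabulary terms, yet the categories differ, which is the sense in which one controlled vocabulary can yield more than one taxonomy.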
Significant practical and commercial value has been provided by automated taxonomy development and classification technologies, the goal of which is to organize the information in a given content collection into groups of similar content, label and arrange each group appropriately, and display the group organization of greatest utility to a user accessing the collection.
The resulting proliferation of taxonomy management and classification technologies has generated an increasing number of public taxonomies, used primarily as navigation directories or “browsing search,” such as those found on the Yahoo!, Amazon, and eBay websites, to facilitate access to the proprietary content available from content publishers, centralized public search engines, or content aggregators.
The present invention relates in particular to methods and a system for improved taxonomy management that leverage pre-existing taxonomies and categorized content to automatically create, maintain, and manage new taxonomies with minimum effort and greater control by information architects and content publishers.