As the quantity of electronically available information continues to grow, it becomes ever more difficult for users to identify and retrieve relevant subsets of that information. There are already several solutions in place, but none of them is entirely adequate.
Perhaps the most popular solutions are the Internet text search engines, including, for example, Yahoo!™, Google™, and Microsoft™. These, along with numerous fee-for-service search engines such as Lexis™ and Westlaw™, index substantially every word in a document. While very useful in searching through large databases, such systems have trouble filtering out irrelevant contexts, especially where words have many meanings (e.g., “chip” can relate to electronics, gaming, farming, timber, etc.).
Some systems seek to improve the accuracy of text searching by attempting to discern the context intended by the author or searcher. For example, U.S. patent application 2005/0038781 to Ferrari et al., “Method and system for interpreting multiple-term queries,” claims methods of interpreting queries by ranking candidate interpretations. Unfortunately, such systems can be extremely complex, and still do not adequately accommodate the varying conceptual perspectives of different users.
Other systems seek to improve the accuracy of text searches through the use of keywords or other metadata. Unfortunately, addition of metadata generally requires human analysis, and is therefore expensive to implement. The Westlaw Key™ system was an early adoption of metadata, and proved very useful over the years. But the system has not always been expanded to keep pace with new concepts and distinctions, and the vast amount of effort involved in remaining current in even the legal field demonstrates the impracticality of that approach for a database that could cover essentially all fields. In addition, it is impossible for Westlaw, or any provider for that matter, to implement a formal taxonomy that would be equally suitable to all users. Any two people will necessarily view the very same information from different perspectives.
Still other systems focus on segmenting or otherwise parameterizing the information. For example, Lexis™ segment searching classifies case law into segments (opinion by, name, date, court, counsel, etc.), and allows users to search for text within specified segments. In the product world, many retailers, including Walmart.com™, Home Depot™, Kmart™, and Circuit City™, are reported to use the Endeca™ InFront™ software package to parameterize their product offerings. The problem with these systems is that the classification systems are imposed in a top-down format. Unless the system designer, operator, or other agent updates the classification lists, the system is stagnant. And even if sufficient time is put into the system to make frequent changes, such systems can suffer from the fact that there is only a single classification (albeit possibly a very complex one) for any given item of information. The difficult fact is that there are often as many valid classifications as there are people viewing the data.
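The idea of searching for text only within specified segments can be illustrated with a minimal sketch. The records, segment names, and the function `segment_search` below are hypothetical and chosen only for illustration; they are not drawn from Lexis™ or any other actual product.

```python
# Hypothetical case records, each divided into named segments.
CASES = [
    {"name": "Smith v. Jones", "court": "Ninth Circuit",
     "opinion_by": "Judge Adams", "counsel": "Baker for appellant"},
    {"name": "Adams v. Baker", "court": "Second Circuit",
     "opinion_by": "Judge Lee", "counsel": "Jones for appellee"},
]


def segment_search(records, segment, text):
    """Return the records whose named segment contains `text` (case-insensitive)."""
    needle = text.lower()
    return [r for r in records if needle in r.get(segment, "").lower()]


# "adams" appears in both records, but in different segments.
by_judge = segment_search(CASES, "opinion_by", "adams")
by_party = segment_search(CASES, "name", "adams")
print([r["name"] for r in by_judge])
print([r["name"] for r in by_party])
```

Note that the set of segments is fixed by whoever defines the records. A search on a segment that the designer did not anticipate simply returns nothing, which is the stagnation problem described above.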
In my earlier patents, U.S. Pat. Nos. 6,035,294, 6,195,652, and 6,243,699 (which, along with any other citations referenced herein, are incorporated by reference in their entirety), I disclosed systems and methods for operating a self-evolving electronic marketplace in which substantially all goods and services could be described and located using sets of parameter/value pairs. That technology provided a self-evolving solution for classifying data, but only for structured databases, only for users to classify their own information, and only for information actually stored on the database. Indeed, one of the principal purposes of the technology was to implode the Internet by storing virtually all data on a single database.
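Describing and locating items by sets of parameter/value pairs can be sketched roughly as follows. This is an illustrative assumption only, not the implementation disclosed in the cited patents; the listings and the helper names `find_listings` and `known_parameters` are hypothetical.

```python
# Hypothetical listings, each described by a free-form set of parameter/value pairs.
LISTINGS = [
    {"id": 1, "params": {"category": "bicycle", "color": "red", "frame size": "54 cm"}},
    {"id": 2, "params": {"category": "bicycle", "color": "blue"}},
    {"id": 3, "params": {"category": "piano lessons", "city": "Austin"}},
]


def find_listings(listings, wanted):
    """Return ids of listings that match every requested parameter/value pair."""
    return [
        item["id"]
        for item in listings
        if all(item["params"].get(p) == v for p, v in wanted.items())
    ]


def known_parameters(listings):
    """Collect every parameter name used so far, so later users can reuse it."""
    return sorted({p for item in listings for p in item["params"]})


print(find_listings(LISTINGS, {"category": "bicycle", "color": "red"}))
print(known_parameters(LISTINGS))
```

In this sketch the list of available parameters is not fixed in advance. It grows as users describe new items, which is one simple way to picture a classification that evolves with use.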
The evolution in web pages and other non-structured data files went in just the other direction. Instead of allowing authors to tag data with their own metadata designators, various attempts were made to impose a top-down structure of available metatags. The Dublin Core metadata project (http://www.dublincore.org), for example, has sought to persuade users to add metadata to their pages in a consistent, reliable way. But as with all top-down systems, the “official” metatagging system can never keep pace with the needs of a widely varied user base. Indeed, many users prefer a “folksonomy” approach, in which users are encouraged to develop and implement tags without strict adherence to any particular guidelines. The idea is that over time users will tend to adopt tags that are used most frequently by others, and that less frequently used tags will eventually fall by the wayside. Some electronic bulletin boards, such as Gather, allow individuals to record their own content, and to categorize the content according to both a hierarchical topics tree and user-originated tags. A recent listing on www.gather.com, for example, showed 134 pages of entries for the topic of World Events|International Events, with one author tagging his text with multiple tags (valour, japan, second world war, bravery, history, war, marines, American marines, and politics), at least some of which may have originated with that author.
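The folksonomy dynamic, in which frequently used tags tend to be adopted by others while rarely used tags fade, can be pictured with a short sketch. The tagged posts below are made-up sample data that reuse a few of the tag names mentioned above, and `tag_usage` and `suggest_tags` are hypothetical helpers, not features of Gather or any other service.

```python
from collections import Counter

# Made-up sample data: the tags that four hypothetical authors applied to their posts.
TAGGED_POSTS = [
    ["history", "war", "japan"],
    ["history", "politics"],
    ["war", "history", "valour"],
    ["politics", "war"],
]


def tag_usage(posts):
    """Count how many posts use each tag (a tag counts once per post)."""
    return Counter(tag for tags in posts for tag in set(tags))


def suggest_tags(posts, limit=3):
    """Suggest the most frequently used tags, breaking ties alphabetically."""
    counts = tag_usage(posts)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:limit]]


print(suggest_tags(TAGGED_POSTS))
```

Offering such suggestions to the next author nudges usage toward the popular tags without forbidding new ones.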
Thus, it is known for authors to classify their own works in both structured databases (e.g., bulletin boards) and unstructured files (e.g., web pages), and it is known for database providers to classify the work of others according to the database provider's own, often proprietary, classification system (e.g., Lexis™ and Westlaw™). In some cases the classifications are fixed, and in other cases the classifications can be modified by the users. But all of those solutions fall short because they fail to account for the overriding facts that there are many valid ways to classify something, and that no one entity can figure out what would be useful for different people. What is still needed are systems and methods that encourage users to classify the same information with at least some of the same designators, while allowing different users to classify it in different, even inconsistent, ways. Ideally, such users could also add their own designators, and would be guided in selection of designators by historical comparisons of previous usage.