1. Field of the Invention
This application relates to vector-based information storage and retrieval systems. More particularly, this application relates to a system for extracting Human Generated Lists from an electronic database and constructing from them a relationship network that can be used to return objects related to a user query.
2. Description of the Related Art
Phrase-based or keyword searching is a common method of searching electronic data. Keyword searching scans an information database for instances of the words in the search query. Keyword searching does not, however, rank results by relevance; search results often include items that have no relationship to one another beyond containing a word from the search query. For example, a user intending to search for products by the technology company Apple may enter the search query “Apple.” The search results, however, would likely include items relating to the apple fruit, songs released by the music label Apple, and so on. Consequently, the results of phrase-based searching often have nothing in common with the user's search intent.
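The failure mode described above can be illustrated with a minimal sketch of purely lexical matching. The catalog entries and the function name `keyword_search` are hypothetical, chosen only to mirror the “Apple” example; real keyword search engines add tokenization, indexing, and ranking, but the core behavior of matching on word occurrence alone is the same.

```python
# Hypothetical toy catalog; item names and descriptions are illustrative only.
CATALOG = {
    "iPhone": "Smartphone designed by Apple Inc.",
    "Granny Smith": "A tart green apple variety",
    "Abbey Road": "Album released on the Apple Records label",
    "Banana": "A yellow tropical fruit",
}


def keyword_search(query, catalog):
    """Return names of items whose description contains every query term.

    Matching is purely lexical (case-insensitive word occurrence); no notion
    of relevance or of relationships between items is used.
    """
    terms = query.lower().split()
    results = []
    for name, description in catalog.items():
        words = set(description.lower().split())
        if all(term in words for term in terms):
            results.append(name)
    return results


print(keyword_search("Apple", CATALOG))
```

A user looking for the technology company's products receives the phone, an apple variety, and a record album in the same result set, because the only thing the three items share is the word “apple.”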
Search methods that relate one object to another are often used in place of keyword searching in order to return results relevant to the searcher's intent. Such relationship-based methods vary widely in approach, in precision, and in the quality and quantity of the relationships they produce, ranging from precise methods to general catch-all approaches. For example, Caid et al., in U.S. Pat. No. 5,619,709, titled “System and Method of Context Vector Generation and Retrieval,” rely on context vector generation and dated neural network approaches rather than more advanced auto-associative approaches. Weissman et al., in U.S. Pat. No. 6,816,857, use distance calculations to determine relationships for the purpose of placing meaning-based advertising on websites or rating document relevance in currently used search engines.
These relationship-based searches do not, however, simulate the process a human would use to analyze relevant information and relate objects to one another. Starting with an object of interest, a researcher typically works within certain contexts and forms relationships between pieces of information gathered while reading and analyzing literature. During this flexible process, the context of interest may change, become refined, or shift in a new direction depending on the information found or the researcher's own thought processes. When the research is finished, the researcher is left with a valuable collection of information related to a specific theme or context of interest. For example, if the researcher's object of interest were a period of music and the context were the Baroque style, the researcher might relate compositions to one another, to their composers, or to geographical locations or time periods. Common relationship-based searches do not simulate this process because they are both inflexible and non-interactive: they neither allow a user to define and control the context and individual relationships during the search, nor allow the user to determine and visualize the quality and quantity of relationships interactively.
Furthermore, these searches do not take advantage of relationship information intrinsic to certain types of documents, such as Human Generated Lists (HGLs). HGLs are collections of non-randomly ordered objects compiled by humans. For example, a compilation CD contains a collection of songs that its creator believed were related in some way. The relationship in this example may be that all the songs are performed by the same artist or belong to the same genre. Such an HGL contains intrinsic intelligence because its objects were chosen on the basis of an existing relationship known at least to the list's creator. Documents containing this type of intrinsic intelligence may provide more valuable relationship information than other documents.
However, in the absence of large-scale collections of such documents, analysis is not statistically meaningful. In large-scale collections, relationships are reinforced across many documents, and the collection itself may capture context. Large-scale collections of HGLs were not practical until HGLs began appearing on the internet and in other electronic forms, a relatively recent phenomenon. It is now common to find web pages listing different individuals' favorite movies in a particular genre, music playlists created for electronic media players, and other HGLs. Existing search methods neither identify these HGLs effectively nor determine the quantity and quality of relationships between the objects they contain.
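The idea that relationships become reinforced in a large collection can be made concrete by counting how many lists each pair of objects appears in together. The sketch below is a generic co-occurrence count, not the method claimed in this application; the function name `count_cooccurrences` and the sample playlists are hypothetical, and the sketch ignores the ordering of objects within each list as a simplifying assumption.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(lists):
    """Count how many lists contain each unordered pair of objects."""
    pair_counts = Counter()
    for objects in lists:
        # Deduplicate within a list so a repeated item is not double-counted,
        # and sort so each pair has one canonical (alphabetical) order.
        unique = sorted(set(objects))
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical playlists standing in for HGLs collected from the web.
playlists = [
    ["Bach: Brandenburg Concerto No. 3", "Vivaldi: Spring", "Handel: Water Music"],
    ["Vivaldi: Spring", "Bach: Brandenburg Concerto No. 3"],
    ["Bach: Brandenburg Concerto No. 3", "Vivaldi: Spring", "Queen: Bohemian Rhapsody"],
]

counts = count_cooccurrences(playlists)
print(counts.most_common(1))
```

A single playlist pairing Bach with Queen says little, but the Bach and Vivaldi pieces co-occur in all three lists, so that relationship accumulates the highest count. With only a handful of lists such counts are noisy, which is why the collection must be large before they become statistically meaningful.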
Existing analysis of HGLs is generally confined to limited analysis of formatted lists. For example, an internet website may ask users to rate or rank movies and then compare the user ratings to make recommendations. However, these applications do not reveal hidden and non-obvious relationships. These systems also do not take advantage of HGL content available in non-standard formats, which is easier to acquire than formatted data. Consequently, these systems require users to perform a substantial amount of work before the information becomes useful.