The present invention is related to data management systems. More particularly, the present invention is a collection management system that categorizes a group of objects, content or organisms for maximum accessibility. The physical or perceptual properties which are accessed are user defined.
Current data management systems and search engines are able to perform low-level sorts and superficial searches on a collection of data. For example, most search engines receive input from a user in the form of one or more search terms, and the search engine searches websites across the Internet for words which match those search terms. Although Boolean operators may be used to further define searches, ultimately the searches are simple word matching techniques, and classification methods are strictly hierarchical taxonomies built from classical categorization principles.
Terms are words used for some particular thing. Words, however, have a three-sided character: 1) conceptual—the meaning of a word; 2) physical—how to pronounce or read a word; and 3) syntactic—the grammatical context in which a word is used. When a word is spoken, the speaker is essentially unaware of anything but meaning. Yet, the search terms of current data management systems only use the physical side of the character of a word. Word matching and metatag labeling typify this approach. Reliance on the physical properties of a word impoverishes searching and limits data management systems to binary operations, in that either a word form is correct or it is not. Database inquiry is much the same.
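The binary character of matching on word form can be illustrated with a minimal sketch. The function names and the sample document below are hypothetical and are not taken from any particular search engine; the sketch assumes that a word is any run of letters or digits.

```python
import re

def words_in(text):
    """Return the set of lowercase word forms found in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_all(text, terms):
    """Boolean AND: every search term must appear as an exact word form."""
    found = words_in(text)
    return all(term.lower() in found for term in terms)

def matches_any(text, terms):
    """Boolean OR: at least one search term must appear as an exact word form."""
    found = words_in(text)
    return any(term.lower() in found for term in terms)

# Hypothetical document used only for illustration.
document = "Used automobiles for sale in Ohio"

and_hit = matches_all(document, ["automobiles", "sale"])   # both forms present
or_hit = matches_any(document, ["car", "sale"])            # "sale" is present
concept_miss = matches_any(document, ["car", "cars"])      # same concept, different form
```

In this sketch the document is plainly about cars, yet a search for "car" or "cars" finds nothing, because only the physical form of each word is compared and the outcome is either a match or no match.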
Other data management systems attempt to create document summaries by extracting significant phrases via parts-of-speech taggers and simple grammar. Similar to word matching, significant phrases are identified by frequency information in the document or database. This approach concentrates on the syntactic character of a word. However, only when communication becomes distorted or confusing does a speaker pay any attention to grammatical context. A syntactic approach is also binary, in that a word is either correctly used in a sentence or not. Document summary techniques incorporate two sides of a word's character: physical and syntactic.
The simplistic nature of current search engines and data management systems is emphasized by the use and misuse of metatags. Metatags are terms that are used within a web site to increase the likelihood that a search engine will select a particular web site for presenting to the user. Since search engines look for words which match the search terms, those web sites having a higher frequency of matches will be noted as the most relevant of the search results presented to the user. Misuse of metatags occurs when website operators try to “drive” traffic to a website by repeating certain metatags hundreds, or even thousands, of times in order to drive that particular website toward the most relevant of the search results. Often the words used in the metatags bear little or no relation to the websites in which they are implemented. In this manner, search engines are misdirected to present websites which are not responsive to the search terms input by the users.
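The effect of metatag repetition on frequency-based ranking can be sketched as follows. This is a simplified illustration under stated assumptions, not the ranking method of any actual search engine: relevance is assumed to be the raw count of matching words, and the page names and page texts are hypothetical.

```python
import re
from collections import Counter

def match_score(page_text, search_terms):
    """Count how many times the search terms appear in the page text."""
    counts = Counter(re.findall(r"[a-z0-9]+", page_text.lower()))
    return sum(counts[term.lower()] for term in search_terms)

def rank_pages(pages, search_terms):
    """Order page names from the highest match count to the lowest."""
    return sorted(
        pages,
        key=lambda name: match_score(pages[name], search_terms),
        reverse=True,
    )

# Hypothetical pages: one is responsive to the search, the other repeats
# an unrelated metatag 500 times.
pages = {
    "relevant": "A guide to growing tomatoes: soil, water and sunlight for tomatoes.",
    "stuffed": "Cheap watches for sale. " + "tomatoes " * 500,
}

ranking = rank_pages(pages, ["tomatoes"])
```

Under this counting scheme the page that merely repeats the term is ranked ahead of the page that is actually about the subject of the search.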
Categorization attempts to reduce the limitless variations of reality into manageable proportions. Current data management systems use classical theories of categorization to structure a database in terms of necessary and sufficient features for membership. To decide whether X is a member of Y, the properties of X are compared to the essential features of Y. Knowledge of this set of features encompasses what is known of Y.
Classical categorization techniques, however, can be inadequate when required to handle a continuum of information that is not easily categorized. Similar to the current computer file/folder metaphor for desktop organization, classical categories are binary and comprise an all-or-nothing approach. Information is either present or it is not, and everything is potentially available since all information has equal status.
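The all-or-nothing character of classical categorization can be sketched with a simple set comparison. The category, its essential features, and the example items below are hypothetical and chosen only for illustration.

```python
def is_member(properties, essential_features):
    """Classical test: an item belongs only if it has every essential feature."""
    return essential_features <= properties

# Hypothetical category defined by necessary and sufficient features.
BIRD = {"has_feathers", "lays_eggs", "flies"}

robin = {"has_feathers", "lays_eggs", "flies", "sings"}
penguin = {"has_feathers", "lays_eggs", "swims"}

robin_is_bird = is_member(robin, BIRD)
penguin_is_bird = is_member(penguin, BIRD)
```

In this sketch the penguin is excluded outright because it lacks a single essential feature. The test has no way to express that the penguin is a marginal member of the category rather than a non-member.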
An alternative theory of categorization organizes the world by degree of membership. Known as prototypical categorization, it involves a “criss-crossing network of similarities”. Unlike classical categories that are strictly hierarchical, prototype categories have a core and a periphery. Core members share more attributes in common than more marginal members. Ludwig Wittgenstein in Philosophical Investigations (1945) anticipated the importance of prototypes in linguistic categorization when he used the metaphor of a “family resemblance” to describe the structure of the category “game”, where there is no common set of properties that distinguishes a game from a non-game.
Prototypical categories are useful because they more fully exploit the real-world correlation of attributes and are better able to handle a continuum of information. With prototypical categorization, new entries do not cause restructuring of the category system, and marginal membership is permitted. Another advantage of prototypical categorization is that prototypes can change over time and are thought to contain cultural dimensions of meaning. Prototypes can be understood in two ways: either as a cluster of core members of a category (i.e., focal exemplars), or as a representation of the conceptual core of a category. Exemplars are “good examples” of a particular range of something.
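Degree of membership can be sketched as a score between 0 and 1 rather than a yes-or-no answer. The measure used here, the fraction of a prototype's attributes that an item shares, is only one assumed possibility among many; the prototype and the example items are hypothetical.

```python
def membership_degree(properties, prototype):
    """Fraction of the prototype's attributes that the item shares (0.0 to 1.0)."""
    if not prototype:
        return 0.0
    return len(properties & prototype) / len(prototype)

# Hypothetical prototype describing the conceptual core of a category.
BIRD_PROTOTYPE = {"has_feathers", "lays_eggs", "flies", "sings"}

items = {
    "robin": {"has_feathers", "lays_eggs", "flies", "sings"},
    "penguin": {"has_feathers", "lays_eggs", "swims"},
    "bat": {"flies", "has_fur"},
}

degrees = {name: membership_degree(props, BIRD_PROTOTYPE) for name, props in items.items()}
# Items ordered from the core of the category to its periphery.
core_to_periphery = sorted(degrees, key=degrees.get, reverse=True)
```

In this sketch the robin sits at the core of the category, the penguin is a marginal member, and the bat lies at the far periphery. A new item can be scored against the prototype without restructuring anything that is already stored.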
A well-known precedent for the usefulness of exemplars is established by color categorization. Humans are capable of seeing 7.5 million discernible color differences. However, approximately eleven universal “focal” colors reference this vast range of color. That is to say, eleven different exemplars (basic color terms) maximize access to a very large collection.
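The way a small set of exemplars can index a very large range may be sketched by assigning any color to its nearest focal color. The RGB coordinates chosen for the eleven basic color terms and the use of straight-line distance in RGB space are assumptions made for illustration only; they are not a perceptual model of color.

```python
import math

# Illustrative RGB coordinates for eleven basic color terms (assumed values).
FOCAL_COLORS = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "yellow": (255, 255, 0),
    "blue": (0, 0, 255),
    "brown": (139, 69, 19),
    "purple": (128, 0, 128),
    "pink": (255, 192, 203),
    "orange": (255, 165, 0),
    "gray": (128, 128, 128),
}

def nearest_focal_color(rgb):
    """Return the name of the focal color closest to the given RGB triple."""
    return min(FOCAL_COLORS, key=lambda name: math.dist(rgb, FOCAL_COLORS[name]))

crimson_term = nearest_focal_color((220, 20, 60))
near_black_term = nearest_focal_color((20, 20, 30))
```

Any of the roughly 16.7 million possible 8-bit RGB triples is reached through one of only eleven entries, which mirrors how a few good examples give access to a large collection.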
Studies have investigated color categories in different languages and found that if a person is asked to name the range of a color, for example red, there is cross-language and within-language variability. The same person may even select different shades of red at different times. Such variability supports the notion that the assignment of word meaning is arbitrary.
However, such variability often disappears when a person is asked to select a “good example” of a basic color term. In that case, a high degree of agreement occurs. Therefore, paying attention to the denotational range of a color term highlights the language specificity of the terminology. Eliciting good examples of color terms highlights what is common between languages. These findings cast doubt on the idea that all linguistic signs are arbitrary. Exemplars represent a level of categorization that is cognitively and linguistically more salient than other levels of categorization.
Although current search engines and data management systems operate adequately, oversimplified and unrealistic as they may be, these systems are adept only at managing discrete data. Reality-as-a-continuum remains difficult for them to describe and categorize. The use of exemplars to supply structure and meaning could provide a powerful advantage over current data management techniques.