§1.1 Field of the Invention
The present invention concerns techniques for enhancing the interaction between people and computers. In particular, the present invention concerns techniques for representing, filtering, classifying, and linking semantic data, as well as techniques for rendering semantic data in an intuitive way. Thus, the present invention basically concerns techniques for enhancing the way people find and access stored data.
§1.2 Related Art
§1.2.1 Migration from Data Creation and Processing, to Data and Information Access
The way in which people use computing machines has evolved over the last 50 or so years. Initially, these machines were typically used as information processors, and in particular, for performing mathematical operations on numbers. People interacted with such early computing machines by punching and ordering cards to effect a sequence of commands, then by setting switches and viewing light emitting diodes to enter commands, later by entering lines of commands, and finally by using a keyboard and mouse to manipulate icon metaphors of the real world.
To reiterate, early personal computers were typically used to perform mathematical operations, from engineering applications to accounting applications (e.g., spreadsheets). In addition, such early personal computers were used to enter, store, and manipulate information, such as with word processing applications for example, and to effectively access stored information, such as with relational database applications for example. However, in addition to using computers for data entry, storage, and manipulation, people are using computers to access information to an ever increasing degree.
In recent decades, and in the past five (5) to ten (10) years in particular, computers have become interconnected by networks to an ever increasing extent; initially, via local area networks (or “LANs”), and more recently via LANs, private wide area networks (or “WANs”) and the Internet. The proliferation of networks, in conjunction with the increased availability of inexpensive data storage means, has afforded computer users unprecedented access to a wealth of content. Such content may be presented to a user (or “rendered”) in the form of text, images, audio, video, etc.
The Internet is one means of inter-networking local area networks and individual computers. The popularity of the Internet has exploded in recent years. Many feel that this explosive growth was fueled by the ability to link (e.g., via hyper-text links) resources (e.g., World Wide Web pages) so that users could seamlessly transition among various resources, even when such resources were stored at geographically remote resource servers. More specifically, the hyper-text markup language (or “HTML”) permits documents to include hyper-text links. These hyper-text links, which are typically rendered in a text file as text in a different font or color, include network address information to related resources. Each hyper-text link has an associated uniform resource locator (or “URL”), which is an Internet address at which the linked resource is located. When a user activates a hyper-text link, for example by clicking a mouse when a displayed cursor coincides with the text associated with the hyper-text link, the related resource is accessed, downloaded, and rendered to the user. The related resource may be accessed from the same resource server that provided the previously rendered resource, or may be accessed from a geographically remote resource server. Such transiting from resource to resource, by activating hyper-text links for example, is commonly referred to as “surfing”.
Thus, although people continue to use computers to enter information, manipulate information, and store information, in view of the foregoing developments people are using computers to access information to an ever increasing extent. Although the information people want to access might have been created by them (which would typically reside on the person's desktop computer), it is often information that was not created by them, or even by a company or group to which that person belongs (which would typically reside on a storage server, accessible via a local area network). Rather, given the world wide breadth of the Internet, the information people want to access may likely be created by unrelated third parties (or content providers).
New user interfaces should therefore help people find information that they want, or that they might want. Unfortunately, the very vastness of available data can overwhelm a user; desired data can become difficult to find, and search heuristics employed to locate desired data often return unwanted data (also referred to as “noise”).
Various concepts have been employed to help users locate desired data. In the context of the Internet for example, some services have organized content based on a rigid hierarchy of categories. A user may then navigate through a series of hierarchical menus to find content that may be of interest to them. An example of such a service is the YAHOO™ World Wide Web site on the Internet. Unfortunately, content, in the form of Internet “web sites” for example, must be organized by the service and users must navigate through a predetermined hierarchy of menus. If a user mistakenly believes that a category will be of interest or include what they were looking for, but the category turns out to be irrelevant, the user must backtrack through one (1) or more hierarchical levels of categories. In the context of personal computers, people often store and retrieve data using a fixed hierarchy of directories or “folders”. While a person who created their own hierarchy is less likely to mis-navigate through it, changes to the hierarchy to reflect refinements or new data or insights are not automatic; the person must manually edit the hierarchy of directories or folders. Further, if a particular file should be classified into more than one (1) of the directories or folders, the person must manually copy the file into each of the desired directories or folders. This copying must be done each time the file is changed.
Again in the context of the Internet for example, some services provide “search engines” which search databased content or “web sites” pursuant to a user query. In response to a user's query, a rank ordered list is returned. The list includes brief descriptions of the uncovered content, as well as hypertext links to the uncovered content (that is, text having associated Internet address information (also referred to as a “uniform resource locator” or “URL”) which, when activated, commands a computer to retrieve content from the associated Internet address). The rank ordering of the list is typically based on a match between words appearing in the query and words appearing in the content. Unfortunately, however, present limitations of search heuristics often cause irrelevant content (or “noise”) to be returned in response to a query. Again, unfortunately, the very wealth of available content impairs the efficacy of these search engines since it is difficult to separate irrelevant content from relevant content. In the context of files stored on a personal computer, computer programs such as Tracker Pro from Enfish, Inc. of Pasadena, Calif., Alta Vista Discovery from Compaq, Inc. of Houston, Tex., and Sherlock, from Apple, Inc. of Cupertino, Calif. permit people to organize, filter, and search files on their personal computer. Unfortunately, it is believed that these programs merely group (or organize) and cross-reference files based on a word or phrase (or “tracker”) in the files. Thus, the name (or “tracker”) “John Smith” might group word processing files (e.g., letters, memos, etc.) having the name “John Smith”, e-mail files to, from, or having a message containing “John Smith”, etc.
These programs are believed to be too unsophisticated to derive a higher meaning (e.g., what was the purpose of the e-mail to John Smith) from the computer files. These programs can filter files based on simple criteria such as the file type or date, but are believed to be too unsophisticated to filter files based on some higher meaning (e.g., all e-mail that scheduled a meeting to discuss project X that was attended by John Smith). Similarly, these programs can sort files based on a simple property such as file name, file type, date the file was created or modified, file location, file author, file size, etc., but are believed to be too unsophisticated to sort or classify files based on some higher meaning.
In the foregoing, the term “information” referred to content, such as text files, audio files, image files, video files, etc. However, information can be more broadly thought of as actions taken by a user or users, or as tasks performed by a user or users. For example, content type information may be a text file of a movie review, while a task may be actions taken by a person to plan a date with dinner and a movie. Thus, users may want to perform tasks that they have already performed in the past (such as scheduling a meeting, for example), or tasks that are similar to those that they have performed in the past (such as scheduling a meeting with different attendees at a different location, for example), much as they may want to revisit a favorite Internet site.
§1.2.2 Information Storage and Access Utilities
A number of utilities provide a foundation for storing, locating, and retrieving information. Such utilities may be provided for searching or for filtering information, classifying information, and relating (or linking) information. Such utilities are introduced below.
§1.2.2.1 Searching for (Filtering) Information
As discussed in §1.2.1 above, hierarchical directories and search engines are available to help people navigate to desired information. Forms may be employed to restrict the type or range of the information returned. For example, a prospective home buyer may want to search for homes with at least three (3) bedrooms, but under $200,000. Alternatively, natural language query engines may be employed to restrict the range of information returned. Further, information may be sorted to help people navigate to desired information. For example, the Outlook™ contact management program (from Microsoft Corporation of Redmond, Wash.) allows users to sort a list of sent e-mail messages by date sent, recipient, importance, etc.
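The form-based restriction in the home buyer example can be sketched as a simple field filter. This is only an illustrative sketch: the listing records, the field names (`bedrooms`, `price`), and the helper `filter_homes` are hypothetical and do not come from any particular product.

```python
# Hypothetical listings; addresses, bedroom counts, and prices are invented.
homes = [
    {"address": "12 Elm St", "bedrooms": 3, "price": 185_000},
    {"address": "40 Oak Ave", "bedrooms": 2, "price": 150_000},
    {"address": "7 Pine Rd", "bedrooms": 4, "price": 240_000},
    {"address": "3 Birch Ln", "bedrooms": 5, "price": 199_000},
]


def filter_homes(listings, min_bedrooms, max_price):
    """Keep listings with at least min_bedrooms and a price under max_price."""
    return [
        home
        for home in listings
        if home["bedrooms"] >= min_bedrooms and home["price"] < max_price
    ]


# At least three (3) bedrooms, but under $200,000.
matches = filter_homes(homes, min_bedrooms=3, max_price=200_000)
matching_addresses = [home["address"] for home in matches]
```

Each field is tested independently, so a filter of this kind restricts the range of results but takes no account of how the fields relate to one another.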
In each of the foregoing examples, the searching (or filtering) techniques were for searching content type information (such as a text file of a movie review, for example), not task or action based information (such as planning a date with dinner and a movie).
Further, in each of the foregoing examples, information must either be arranged in a predetermined hierarchy or searched or sorted using some indices or fields. Unfortunately, arranging information in a hierarchy requires that the information be classified, either manually or automatically. Further, the efficiency with which information can be located using a hierarchy may depend heavily on predefined classes and sub-classes. Moreover, searching or sorting information using some indices or fields does not consider relationships among the fields, even though such relationships may be useful for focusing the search. Thus, improved searching utilities are needed.
§1.2.2.2 Classifying Information
Information is often classified so that it may be more easily found when needed later. For example, books may be classified based on the Dewey Decimal system, files may be classified by an account number, etc. As discussed in §1.2.1 above, web pages may be arranged in a classification hierarchy, and computer files may be arranged in directories or folders. Such simple classification schemes work well when the type of information so classified is constrained (such as only books, only invoices, only business telephone directories), but may become useless when trying to classify across different information types. Further, such simple classification schemes may become cumbersome when extensive cross-referencing between classifications is required.
As mentioned in §1.2.1 above, the efficiency with which information can be located using a hierarchy may depend heavily on predefined classes and sub-classes. The classes and sub-classes may be defined manually, or automatically, such as by using clustering algorithms for example. Manually defining classes takes time and expertise to be done well. Automatically defining classes is often done based on features of the information, but typically will not consider relationships between such features. Thus, improved classification utilities are needed.
§1.2.2.3 Relating (Linking) Information
In addition to navigating to desired information, people may be interested in the relationship(s) between different pieces of information. For example, two (2) restaurants may be related by the type of food they serve, a review rating, a price range, their location, etc., or two (2) meetings may be related if they occurred in the same room, occurred at the same time, had at least some common attendees, were scheduled by the same person, etc. Information may be represented by a list of features, also referred to as a “feature vector”. For example, a textual file may be represented by a list of the number of times certain words appear in the file. If information is represented by a feature vector, relationships between various pieces of information may be inferred by finding common features in their feature vectors. Unfortunately, defining exactly what a common feature is, is somewhat subjective; that is, determining whether features must exactly match or just fall under a more general common category is subjective. Furthermore, relationships between different features are not considered when determining relationships between the information represented by the features. Thus, improved utilities for uncovering the relationships between information are needed.
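The word-count feature vector and the exact-match notion of a common feature described above can be sketched as follows. This is a minimal sketch under stated assumptions: the two sample texts and the helper names `feature_vector` and `common_features` are hypothetical, and tokenization is simply lowercase runs of letters.

```python
import re
from collections import Counter


def feature_vector(text):
    """Count how many times each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def common_features(vector_a, vector_b):
    """Return the words present in both vectors, using exact matching only."""
    return set(vector_a) & set(vector_b)


# Two invented restaurant descriptions.
review_a = "Thai restaurant with spicy noodles and low prices"
review_b = "Italian restaurant with fresh pasta and low prices"

shared = common_features(feature_vector(review_a), feature_vector(review_b))
```

Exact matching finds the shared words but misses that "thai" and "italian" both name a type of food; whether such features should count as common under a more general category is the subjective choice noted above.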
§1.2.3 Representing Usage Data
Information, such as usage data for example, may be represented in many different ways. As mentioned above, information may be represented by values associated with a list of features (or a “feature vector”). Usage data may be represented by a “click stream”; that is, as a stream of discrete user inputs. Alternatively, information may be represented as related entities. An example of such a semantic representation is an entity relationship diagram (or “ERD”). Entity relationship diagrams were introduced in the paper, Peter Pin-Shan Chen, “The Entity Relationship Model-Toward a Unified View of Data,” International Conference on Very Large Databases, Framingham, Mass. (Sep. 22-24, 1975), reprinted in Readings in Database Systems, Second Edition, pp. 741-754, edited by Michael Stonebraker, Morgan Kaufmann Publishers, Inc., San Francisco, Calif. (1994).
§1.2.3.1 Limitations Imposed When Analyzing Only Usage Statistics
Without any semantic information, relationships among users' activities cannot be determined with certainty from usage logs. Log files consisting of click stream information may be used to answer simple questions such as “who accessed a file?”, “when did they access the file?”, “from where did they access the file?”, and “how many users accessed the file yesterday?”. However, questions relating to “why” the file was accessed are more difficult. For example, knowing that users transition from URL=52 to URL=644 more frequently than to URL=710 gives an association related to frequency. (URLs, or Uniform Resource Locators, serve as unique indexes to content on the Internet.) However, this analysis gives no indication as to “why” users made these particular page transitions. A more “human” understanding can be gained through interpreting semantic relationships. For example, semantic relationships may reveal that URL=52 is a sports news web page, URL=644 is a web page that posts team results, and URL=710 is a link to local weather.
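The frequency association described above can be sketched by counting page-to-page transitions in a click stream. The session data and the helper `transition_counts` are hypothetical; only the URL indexes 52, 644, and 710 are taken from the example.

```python
from collections import Counter


def transition_counts(sessions):
    """Count each (from_url, to_url) pair of consecutive page views."""
    counts = Counter()
    for pages in sessions:
        # zip pairs every page with the page viewed immediately after it.
        counts.update(zip(pages, pages[1:]))
    return counts


# Invented sessions; each list is one user's ordered sequence of URL indexes.
sessions = [
    [52, 644],
    [52, 644, 710],
    [52, 710],
    [52, 644],
]

counts = transition_counts(sessions)
to_results = counts[(52, 644)]
to_weather = counts[(52, 710)]
```

The counts show that one transition is more frequent than the other, but nothing in the log indicates what the pages are about, which is the limitation the paragraph above describes.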
§1.2.3.2 Limitations Imposed When Analyzing Only Schema
“Schema analysis” loosely refers to analyzing authored metadata, schema, and instance data. Schema analysis may be used to model semantic associations and instance frequencies but contains no application usage information.
The problem with a strict schema analysis approach is highlighted by the following example. The entities “Redmond Cineplex-8” and “The Amigone Funeral Parlor of Redmond” are both businesses in Redmond and might be connected on a graph representing a city directory schema by, for example, four (4) links (relationships). On the other hand, the link separation between a particular person and his most recent e-mail instance in an e-mail application schema could be, for example, twenty (20). As shown by these examples, schema distances (as defined by the number of relationships separating two (2) entities) may be uncorrelated with typical usage.
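Schema distance, as defined above, is the number of relationships on the shortest path between two entities, and it can be sketched with a breadth-first search. The directory graph below is a hypothetical illustration: the intermediate category nodes are invented so that the two businesses are four (4) links apart, as in the example.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def link_distance(graph, start, goal):
    """Return the fewest links between start and goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical city directory schema; the category names are assumptions.
directory = build_graph([
    ("Redmond Cineplex-8", "Movie Theaters"),
    ("Movie Theaters", "Businesses in Redmond"),
    ("Businesses in Redmond", "Funeral Services"),
    ("Funeral Services", "The Amigone Funeral Parlor of Redmond"),
])

distance = link_distance(
    directory, "Redmond Cineplex-8", "The Amigone Funeral Parlor of Redmond"
)
```

The search uses only the structure of the schema, so two entities that are rarely used together can still come out close, and closely used entities can come out far apart.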
Therefore, neither of these two (2) approaches (that is, statistics and schema), taken alone, can lead to a deep understanding of users' tasks and goals and/or of data generated by users. Thus, a new framework for representing and analyzing information, such as computer usage information for example, is needed.
§1.2.4 Unmet Needs
A goal of the present invention is to make information more relevant to users and to make it easier for applications to build reusable and broadly deployable services. To this end, users and applications may obtain, modify, monitor, and annotate information. Another goal of the present invention is to support and offer services that facilitate mapping higher-level semantic interactions to physical data, to thereby liberate users and applications from understanding detailed schemas and the query languages of specific information stores and foreign applications. Consequently, cognitive work required by users to map their mental model of tasks and goals into the framework of applications will be reduced. That is, higher-level semantics will allow users to interact with applications in a more natural way.
Conversely, richly annotated data describing the user's interactions allow models that connect intention with action to be discovered and constructed. Viewed as semantic annotations, these models are available for broadly deployed services whose purpose is to make information more relevant to users.
A further goal of the present invention is to help build models based on usage information. Such models will help applications behave in a more intelligent, personal, predictable, and adaptive manner. The present invention provides automated searching, classifying, linking, and analyzing utilities for situations in which such utilities surpass, or work well with, heuristics. These situations include cross-domain or application modeling and situations where adaptive models outperform static models.
Thus, it is a goal of the present invention to provide machine understandable representations of users' tasks and/or of data generated by users so that computer performance and human-computer interactions may be improved.
The present invention defines a pattern lattice data space as a framework for analyzing data in which both schema-based and statistical analysis are accommodated. In this way, the limitations of schema-based and statistical analysis, when used alone (recall §1.2.3 above), are overcome.
Since the computational complexity of representing all possible permutations of related entities on a pattern lattice data space becomes enormous, the present invention may function to manage the size of the lattice structures in the pattern lattice data space. That is, in most instances, only sections or fragments of the pattern lattice data space are generated.
The present invention may also function to classify or cluster, search (find similar data), or relate data using lattice fragments in the pattern lattice data space.
The present invention also defines a superpattern cone or lattice generation function which may be used by the classification and clustering functions. In addition, the present invention defines a subpattern cone or lattice generation process which may be used by the search (find similar data) and data relating functions.
The present invention may also function to label, in readily understandable “pidgin”, categories which classify information.
Finally, the present invention may function to provide a user interface to cluster, classify, search and link usage information.