Businesses, governmental organizations and other entities are increasingly interested in understanding the context of the information they possess so that they can evaluate current circumstances and plan for the future. That is, they wish to gain knowledge from information to better achieve their goals. To this end, the field of knowledge management has emerged. Knowledge management tools allow users to organize, search and present information in a manner that provides a better understanding of the information (i.e., puts pieces of information in the context of other information). As one example, an entity that provides a technical support call center may store individual “trouble tickets” for customer problems. Knowledge management tools allow users to search the trouble tickets to determine, for example, whether a customer is having a common problem with a product, how many trouble tickets were entered in a specific time period, and so on. In this case, knowledge management can put a single trouble ticket in the context of other trouble tickets, helping the entity identify common problems and solutions. Additionally, knowledge management tools can allow users to collaborate on projects by sharing documents and reports, so that one knowledge worker can efficiently distribute knowledge to other knowledge workers.
One of the primary mechanisms for providing knowledge to end users is the reporting tool, which allows a user to aggregate information based on user criteria. The user can use such tools to generate ad hoc reports that correlate data according to the user's specifications. In these systems, the information is generally saved as structured data in a relational database, and traditional reporting tools query that data using a formal query language. For non-technical end users, the reporting tools typically provide a graphical user interface that offers a more user-friendly mechanism for entering the formal queries. These systems may, for example, let the user graphically select which columns of various tables should be presented in a report. Based on the user input, these reporting tools generate queries in the formal query language (e.g., SQL queries) to produce the report.
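As a rough illustration of the query generation described above, the following sketch turns a set of user selections (a table, the columns to display, and equality filters) into a parameterized SQL statement. The table name `tickets`, its columns, and the helper names `build_report_query` and `run_report` are assumptions made for this example only; they are not taken from any particular reporting product.

```python
import sqlite3

# Hypothetical whitelist of the tables and columns a reporting GUI exposes.
ALLOWED_COLUMNS = {
    "tickets": {"id", "product_type", "opened_on", "summary"},
}


def build_report_query(table, columns, filters):
    """Translate GUI selections into a parameterized SQL SELECT statement."""
    allowed = ALLOWED_COLUMNS.get(table)
    if allowed is None:
        raise ValueError(f"unknown table: {table}")
    # Identifiers cannot be bound as parameters, so validate them explicitly.
    for name in list(columns) + list(filters):
        if name not in allowed:
            raise ValueError(f"unknown column: {name}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for name, value in filters.items():
            clauses.append(f"{name} = ?")
            params.append(value)  # values are bound, never interpolated
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def run_report(conn, table, columns, filters):
    """Issue the generated query and return the rows of the report."""
    sql, params = build_report_query(table, columns, filters)
    return conn.execute(sql, params).fetchall()
```

Note that the generator can only name columns that already exist in the schema, which is why such a tool is tied to the structure of the underlying database.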
Returning to the example of a technical support call center, the data for trouble tickets can be stored using a structured database schema. If the user wishes to search for all trouble tickets related to voice over IP phones, for example, the user can enter a product type (assuming product types are defined in the database) and the reporting tool will convert this to a formal query (e.g., a SQL query), issue the query and display the results. These tools operate over the structured data stored according to a fixed schema to return results. If the company wishes to change the data collected in trouble tickets, say by adding a problem code, the database schema must be changed to accommodate the new field. Additionally, the reporting tool must be reprogrammed to issue SQL queries seeking particular problem codes.
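The dependence on a fixed schema can be seen in a minimal sketch using an in-memory SQLite database. The table layout, the sample rows, and the problem code value `E42` are hypothetical and serve only to illustrate the point: a query on a problem code cannot run until the schema itself is altered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets ("
    " id INTEGER PRIMARY KEY,"
    " product_type TEXT NOT NULL,"
    " summary TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO tickets (product_type, summary) VALUES (?, ?)",
    [("voip_phone", "No dial tone"), ("router", "Drops connection")],
)


def tickets_for_product(conn, product_type):
    """The kind of query a reporting tool issues for a product type."""
    return conn.execute(
        "SELECT id, summary FROM tickets WHERE product_type = ?",
        (product_type,),
    ).fetchall()


def try_problem_code_query(conn, code):
    """Return matching ticket ids, or None if the schema lacks the column."""
    try:
        return conn.execute(
            "SELECT id FROM tickets WHERE problem_code = ?", (code,)
        ).fetchall()
    except sqlite3.OperationalError:
        return None  # the fixed schema has no problem_code column


voip_tickets = tickets_for_product(conn, "voip_phone")
before_change = try_problem_code_query(conn, "E42")  # column not yet defined

# Collecting the new field requires changing the schema itself ...
conn.execute("ALTER TABLE tickets ADD COLUMN problem_code TEXT")
conn.execute("UPDATE tickets SET problem_code = 'E42' WHERE id = 1")
# ... and only then can a query on problem codes succeed.
after_change = try_problem_code_query(conn, "E42")
```

In this sketch the schema change and the new query are separate manual steps, mirroring the two changes (database and reporting tool) that the paragraph above describes.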
Reporting tools that rely on a well-defined database schema suffer an additional shortcoming in that they are not well suited to dealing with unstructured or semistructured data. Unstructured data is data that has no predefined internal structure, such as text documents and image data. For example, a typical word processing document includes unstructured data since there is no predefined internal structure to the text and images embedded in the document. Semistructured data is a mix of structured data with unstructured data. The structured data is metadata, such as the author, date, title and so forth, that provides a loose structure for the data. The metadata fields themselves may hold structured or unstructured data; for example, date information can be represented in a way that has a predefined internal structure (day, month, year), whereas the title of a document has no structure other than being text. The metadata can also be static or dynamic. Static metadata is more like structured data in that it provides predefined elements of a data record that typically remain the same across records. Dynamic metadata, on the other hand, allows the definition of fields to change from record to record.
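One possible way to picture a semistructured record is sketched below. The `Record` class, its field names, and the sample values are assumptions chosen for illustration; the sketch simply separates static metadata (present on every record), unstructured content, and dynamic metadata (fields that vary per record).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict


@dataclass
class Record:
    # Static metadata: the same fields appear on every record.
    author: str
    title: str      # a predefined field whose value is unstructured text
    created: date   # a value with internal structure (year, month, day)
    # Unstructured content, e.g. the body text of a document.
    body: str = ""
    # Dynamic metadata: the set of fields can differ from record to record.
    extra: Dict[str, Any] = field(default_factory=dict)

    def field_names(self):
        """Return the metadata field names defined for this record."""
        return {"author", "title", "created"} | set(self.extra)


# Two hypothetical records whose dynamic metadata fields do not match.
ticket = Record(
    author="J. Smith",
    title="No dial tone",
    created=date(2005, 1, 3),
    body="Customer reports that the handset has no dial tone.",
    extra={"problem_code": "E42"},
)
memo = Record(
    author="A. Jones",
    title="Support staffing",
    created=date(2005, 2, 14),
    body="Notes on staffing levels for the support desk.",
    extra={"department": "support", "reviewed_by": "J. Smith"},
)
```

Because `ticket.field_names()` and `memo.field_names()` differ, no single fixed table definition describes both records without either dropping fields or leaving columns empty.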
Traditional database query methods offer extremely limited pattern matching against unstructured data. Information retrieval tools, on the other hand, provide matching against patterns in unstructured data, such as word matching. However, traditional information retrieval technologies do not allow for structured queries against dynamic metadata. Consequently, neither information retrieval tools nor structured database query tools provide a satisfactory mechanism for searching and retrieving data records structured using dynamic metadata.
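The gap described above can be illustrated with a small sketch. The two sample records, the single fixed column `product`, and the function names are hypothetical; the sketch is not a model of any specific retrieval or database product, only of the two styles of search.

```python
# Hypothetical records: free-text body plus per-record (dynamic) metadata.
records = [
    {"body": "Customer reports error E42 after firmware update.",
     "metadata": {"product": "voip_phone", "problem_code": "E42"}},
    {"body": "Follow-up call. See ticket about E42 for background.",
     "metadata": {"product": "router", "priority": "high"}},
]

# The only column known to the fixed schema in this sketch.
FIXED_COLUMNS = ("product",)


def keyword_search(records, word):
    """Information-retrieval style: match a word anywhere in the text."""
    word = word.lower()
    return [i for i, r in enumerate(records)
            if word in r["body"].lower().split()]


def fixed_schema_query(records, column, value):
    """Structured-query style: only columns defined in advance are queryable."""
    if column not in FIXED_COLUMNS:
        raise KeyError(f"column not in schema: {column}")
    return [i for i, r in enumerate(records)
            if r["metadata"].get(column) == value]
```

Here `keyword_search(records, "E42")` matches both records, even though only the first actually carries the problem code `E42` as metadata, while `fixed_schema_query(records, "problem_code", "E42")` fails outright because `problem_code` is not part of the fixed schema. Neither style answers the question "which records have a problem code equal to E42".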