Modern document management systems employ computers and storage devices to store and track electronic documents, images of paper documents, and other types of digital content. Typical functions of electronic document management (EDM) involve the creation, storage, organization, transmission, retrieval, manipulation, update, and eventual disposition of documents to fulfill an organizational purpose. A successfully implemented document management system (DMS) can improve communication among people and groups of people, which is especially valuable to large enterprises where a huge amount of information is documented and shared among a number of internal and/or external users on a continuing basis.
Many aspects of document management (e.g., organization and retrieval) rely on document metadata, which are descriptive or informational data concerning various aspects of the underlying documents. With respect to each document, a corresponding set of metadata usually provides information about the document itself, its change or versioning history, related users, storage location, access or distribution restrictions, and any other information that might facilitate the understanding, use, or management of the document. One of the most common uses of document metadata is to facilitate document retrieval. For example, a single document may be directly retrieved from a DMS if a user provides a unique identifier of the document such as a document number. A list of documents may be retrieved by running a structured query language (SQL) search, which typically identifies relevant documents whose metadata match the specified search criteria.
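By way of illustration only, the two retrieval paths described above (direct lookup by a unique identifier and an SQL search over metadata) might be sketched as follows. The table name, column names, and sample rows are assumptions made for this sketch and do not reflect the schema of any particular DMS.

```python
import sqlite3

# Hypothetical schema: one row of metadata per document (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " doc_number INTEGER PRIMARY KEY,"
    " title TEXT, author_id TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        (1001, "Q3 collections summary", "jdoe", "Account Recovery"),
        (1002, "Card fraud memo", "asmith", "Credit Risks"),
        (1003, "Recovery vendor contract", "jdoe", "Account Recovery"),
    ],
)


def get_by_number(doc_number):
    """Directly retrieve a single document by its unique document number."""
    return conn.execute(
        "SELECT doc_number, title FROM documents WHERE doc_number = ?",
        (doc_number,),
    ).fetchone()


def search_by_metadata(subject, author_id):
    """Return every document whose metadata match all given criteria."""
    return conn.execute(
        "SELECT doc_number, title FROM documents"
        " WHERE subject = ? AND author_id = ?"
        " ORDER BY doc_number",
        (subject, author_id),
    ).fetchall()


single = get_by_number(1002)
matches = search_by_metadata("Account Recovery", "jdoe")
```

In this sketch, the search succeeds only because the searcher supplies exactly the subject string that was stored when the documents were created.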
While a typical DMS can automatically generate some metadata for a document (e.g., detect its application type and assign a document number), a significant portion of the metadata is created by the human user who creates the document and/or by other users who are entitled to access the document. For example, when creating a new document in a DMS, a user (e.g., author or typist) may be prompted to enter information regarding the document, thereby establishing an initial set of metadata for that document. Other users who are authorized to make changes to that document may subsequently update it or create newer version(s) thereof, thereby changing the metadata of that document.
FIG. 1 shows a typical “New Document Profile” form that a DMS user may be required to fill in when creating a new document. The “New Document Profile” form may include a number of metadata fields which may be populated with text entries, pop-out menus, or pull-down menus. As shown, the metadata fields cover several aspects of document information, such as basic document identification (e.g., Document Title, Client/Subject, and Matter/Transaction), creator information (e.g., Author ID, Typist ID, and Dept./Team), storage and retention (e.g., Database Location and Length of Retention), and security and access control (e.g., Sharing and Security Level). A similar form associated with an existing document, such as a “Document Profile” or “Document Properties” form, may be displayed to an authorized user to make changes to one or more properties of the document.
The above-described human involvement in the creation or modification of document metadata can cause problems for subsequent document retrieval. For various reasons, each individual user of a DMS may have his or her own personal preferences or habits in describing documents. That is, with respect to the same document, one user may choose a set of metadata (e.g., text strings and menu items) that is substantially different from another user's choice of metadata. For example, one user may be in the habit of using a set of aliases and/or acronyms in document title fields that are unknown or make no sense to another user. Even when two users happen to choose the same text string or menu item in a metadata field, the text string or menu item may have quite different meanings for the two users. Take, for example, a document related to credit account collections that should be categorized under the subject matter of “Account Recovery.” A first user may instead categorize that document under “Credit Risks” because this user considers almost all credit card matters to be related to credit risks. A second user may categorize that same document under “Delinquencies” because collections logically follow delinquencies. A third user may even categorize the credit account collections document under “Customer Relations” for reasons that seem equally sound to that user.
In many instances, the differences in choosing or entering document metadata arise from human laziness. For example, in order to avoid populating several metadata fields every time a new document is created, a lazy user may keep choosing substantially the same set of metadata to describe different documents, varying the metadata only slightly (e.g., in the title field). Over time, this kind of practice will generate a large number of documents with essentially the same combination of metadata, making it difficult for a traditional SQL query to distinguish one document from another.
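By way of illustration only, the effect of reusing one set of metadata across many documents might be observed with a grouping query such as the one sketched below. The schema, the user identifiers, and the fifty near-identical sample rows are all assumptions made for this sketch.

```python
import sqlite3

# Hypothetical schema and sample data (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " doc_number INTEGER PRIMARY KEY,"
    " title TEXT, author_id TEXT, subject TEXT, doc_type TEXT)"
)
# One user files fifty documents under the same metadata, varying only the title.
rows = [(n, f"Memo {n}", "jdoe", "General", "Memo") for n in range(1, 51)]
rows.append((51, "Collections policy", "asmith", "Account Recovery", "Policy"))
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?)", rows)


def metadata_collisions(min_count=2):
    """List metadata combinations shared by at least min_count documents."""
    return conn.execute(
        "SELECT author_id, subject, doc_type, COUNT(*) AS n"
        " FROM documents"
        " GROUP BY author_id, subject, doc_type"
        " HAVING COUNT(*) >= ?"
        " ORDER BY n DESC",
        (min_count,),
    ).fetchall()


collisions = metadata_collisions()

# A query on that shared combination cannot narrow the result any further.
same_profile = conn.execute(
    "SELECT doc_number FROM documents"
    " WHERE author_id = ? AND subject = ? AND doc_type = ?",
    ("jdoe", "General", "Memo"),
).fetchall()
```

Here any query built from the author, subject, and document type fields returns all fifty memos together, so those fields no longer help a searcher tell one document from another.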
Similar examples of user-specific naming patterns abound.
As a result, while a user who knows his or her own preferences or habits may have no trouble retrieving his or her own documents, other users might not be able to locate those documents with traditional SQL queries formulated based on those users' own understanding or interpretation of keywords and categories. This can become especially troublesome when an employee who has unique ways of naming documents leaves a company. Colleagues of that employee or successors to that employee's position may have no clue where the documents have effectively been “hidden.”
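By way of illustration only, the retrieval failure described above might be reproduced as sketched below, using the credit account collections example. The schema, the user identifiers, and the sample titles are assumptions made for this sketch.

```python
import sqlite3

# Hypothetical schema and sample data (illustrative only). All four documents
# concern credit account collections, but each user chose a different subject.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " doc_number INTEGER PRIMARY KEY,"
    " title TEXT, author_id TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        (2001, "Collections procedure", "user_a", "Credit Risks"),
        (2002, "Coll. proc. v2", "user_b", "Delinquencies"),
        (2003, "CAC follow-up letters", "user_c", "Customer Relations"),
        (2004, "Charge-off recovery report", "user_d", "Account Recovery"),
    ],
)


def find_by_subject(subject):
    """Exact-match metadata search on the subject field."""
    return conn.execute(
        "SELECT doc_number, title FROM documents"
        " WHERE subject = ? ORDER BY doc_number",
        (subject,),
    ).fetchall()


# A searcher who expects the "Account Recovery" category finds one document.
hit_numbers = [row[0] for row in find_by_subject("Account Recovery")]

# The remaining relevant documents are never returned by that query.
all_numbers = [
    row[0]
    for row in conn.execute("SELECT doc_number FROM documents ORDER BY doc_number")
]
missed_numbers = [n for n in all_numbers if n not in hit_numbers]
```

In this sketch, three of the four relevant documents are missed, and nothing in the query result tells the searcher which other subject values to try.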
Currently, there are no known adequate solutions for document retrieval problems caused by user-specific naming patterns. One natural approach is to simply broaden a document query to make it over-inclusive. However, without any knowledge of a prior user's unconventional ways of naming documents, it is difficult, if not impossible, to know how broadly to expand a search to ensure coverage of that prior user's documents. If the search is overbroad, it may return a list of hundreds of documents. Similarly, it is inefficient to attempt to browse through all documents created by a prior user. Besides, if the prior user named a relevant document in a cryptic way (e.g., using acronyms and shorthand spellings), a subsequent searcher probably will not recognize it as a hit simply by glancing at the document title.
Another approach is to always conduct a full-text content search in combination with or in addition to the usual metadata search. However, full-text searches are slow, wasteful of system resources, and infeasible for image files or other non-text formats.
Yet another approach is to further break down each metadata field and supply as many standardized menu items as possible for users to choose from. This approach also suffers from several drawbacks. First, it is often difficult to anticipate all possible kinds of documents and provide standardized menu items in advance and down to the most granular level. Second, a larger selection of menu items can slow down both document creation and query formulation. Third, there is a limit as to how many menu items an ordinary user is willing to browse through. Overly granular menus can be confusing for ordinary users, who may be reluctant to spend the time to pick the most appropriate menu item.
In view of the foregoing, it may be understood that there are significant problems and shortcomings associated with current document management technologies.