1. Field of the Invention
The present invention relates generally to repositories of data, and more particularly to organizing repositories of semi-structured documents such as email.
2. Description of the Related Art
Computer users increasingly generate repositories of semi-structured documents such as emails and Web pages. Such documents are stored in folders, and the folders can be arranged in a tree-like hierarchy. The documents, however, are not considered part of the hierarchy. A document can be present in more than one folder.
In the case of emails, many users simply receive these documents into a single inbox, and the inbox can become quite full. It will readily be appreciated that, with the hundreds and perhaps thousands of emails that many users now receive, it is important to organize the email repository, e.g., by subject matter or other convention, so that a user can efficiently manage the documents.
With this in mind, some email systems such as Lotus Notes permit a user to create a repository folder structure. Specifically, a user can define a folder hierarchy, name the folders, and then move documents into and out of folders as desired to keep the repository effectively arranged. Unfortunately, this requires the user initially to move each document manually into the appropriate folders and sub-folders, and then to continue moving new documents into the appropriate folders as they arrive from, e.g., the mail server. Accordingly, while the resulting structure is an effective document management tool, in that the documents are arranged as desired by the user, the user must spend considerable time and effort to arrive at the desired organization and then to maintain it.
Accordingly, the present invention recognizes a need to provide automation in defining and populating an organizational structure of document folders. Furthermore, the present invention recognizes a need for allowing a person to interactively define and populate, with ease and efficiency, an organizational structure of document folders.
The invention is a general purpose computer programmed according to the inventive steps herein to organize document folders in response to classification indicia provided by a user. The invention can also be embodied as an article of manufacture (a machine component) that is used by a digital processing apparatus and which tangibly embodies a program of instructions that are executable by the digital processing apparatus to execute the logic disclosed below. This invention is realized in a critical machine component that causes a digital processing apparatus to perform the inventive method steps herein.
In accordance with the present invention, a computer includes at least one computer input device and means for receiving, from the input device, at least one signal representative of user-selected document classification indicia. Also, the computer includes means for determining a profile of at least one folder based on the user-selected document classification indicia.
In a preferred embodiment, the user-selected document classification indicia includes at least one sample document representing a user-desired example of documents in a user-selected folder. Also, the document classification indicia can include classification rules.
As disclosed in detail below, the computer preferably includes means for receiving one or more folder establishing signals from the input device. As intended by the present invention, the document classification indicia represent a user-desired profile of at least some folders. Means are provided for automatically moving one or more documents into the folders, based on the means for determining a profile.
The preferred means for determining a profile includes means for determining, for each document, respective folder probabilities. Each folder probability represents the probability of the document fitting the profile of the respective folder. Also, means define a destination of a document to be the folder associated with the highest folder probability, and means further define a confidence of a document properly being in a destination to be the ratio of the highest folder probability for the document to a second-highest folder probability for the document.
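By way of non-limiting illustration, the destination and confidence determination described above might be sketched as follows. The function name `choose_destination`, the folder names, and the handling of a single folder or a zero second-highest probability are assumptions made for this example only; the folder probabilities themselves are assumed to have been computed already by whatever profile-determining means is used.

```python
from typing import Dict, Tuple


def choose_destination(folder_probs: Dict[str, float]) -> Tuple[str, float]:
    """Return (destination folder, confidence) for one document.

    folder_probs maps each folder name to the probability that the
    document fits that folder's profile.
    """
    if not folder_probs:
        raise ValueError("at least one folder probability is required")

    # Rank folders from most to least probable.
    ranked = sorted(folder_probs.items(), key=lambda kv: kv[1], reverse=True)
    best_folder, best_p = ranked[0]

    # Assumption for this sketch: with only one folder, or a zero
    # second-highest probability, the confidence is treated as unbounded.
    if len(ranked) == 1 or ranked[1][1] <= 0:
        return best_folder, float("inf")

    # Confidence is the ratio of the highest to the second-highest probability.
    return best_folder, best_p / ranked[1][1]
```

For example, a document with folder probabilities of 0.6 for "Projects", 0.3 for "Travel", and 0.1 for "Personal" would have "Projects" as its destination with a confidence of 2.0. A confidence near 1 indicates that the two most probable folders are nearly indistinguishable for that document.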
If desired, the process of learning a user's desired folder profile can be iterative. Accordingly, means automatically move one or more test documents into the folders, based on the means for determining a profile, with means then presenting the test documents, along with the associated folders, on a computer output device that is associated with the computer, such that a user can observe the test documents with folders and determine the efficacy of the profile determination. The user can add more sample documents or move test documents between folders to refine the profiles. Desirably, additional user-defined document classification indicia can be requested to help refine the profile learning step. Additional applications of the invention include generating a folder, based on the user-defined document classification indicia, and grouping documents into sub-folders, based on the user-defined document classification indicia. The documents can be email documents, and the computer can route incoming email documents into one or more folders, based on the user-defined document classification indicia.
If the user so requests, the present invention can also organize folders automatically by discovering topics in the documents.
In another aspect, a computer-implemented method is disclosed for organizing semi-structured documents such as email documents in a database into one or more folders. The method includes receiving one or more sample documents from the user. The sample documents are a relatively small subset of the documents in the database. Also, the method includes automatically associating substantially all of the documents in the database with one or more folders, based on the sample documents.
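The method of this aspect might be sketched, purely as an illustration, as shown below. The summary above does not prescribe a particular classifier; the word-count profile with add-one smoothing and equal folder priors used here is an assumption chosen for brevity, and the names `build_profiles`, `folder_probabilities`, and `organize` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Simplistic whitespace tokenizer, assumed for this sketch.
    return text.lower().split()


def build_profiles(samples: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Build one word-count profile per folder from the user's sample documents."""
    return {
        folder: Counter(word for doc in docs for word in tokenize(doc))
        for folder, docs in samples.items()
    }


def folder_probabilities(doc: str, profiles: Dict[str, Counter]) -> Dict[str, float]:
    """Estimate, for each folder, the probability that doc fits its profile."""
    vocabulary = set().union(*profiles.values())
    log_scores = {}
    for folder, counts in profiles.items():
        # Add-one smoothing so unseen words do not zero out a folder.
        total = sum(counts.values()) + len(vocabulary)
        log_scores[folder] = sum(
            math.log((counts[word] + 1) / total) for word in tokenize(doc)
        )
    # Normalize the scores into probabilities that sum to one.
    top = max(log_scores.values())
    weights = {folder: math.exp(score - top) for folder, score in log_scores.items()}
    norm = sum(weights.values())
    return {folder: weight / norm for folder, weight in weights.items()}


def organize(database: List[str], samples: Dict[str, List[str]]) -> Dict[str, str]:
    """Associate every document in the database with its most probable folder."""
    profiles = build_profiles(samples)
    assignment = {}
    for doc in database:
        probs = folder_probabilities(doc, profiles)
        assignment[doc] = max(probs, key=probs.get)
    return assignment
```

In this sketch the user supplies only a few sample documents per folder, and `organize` then associates each remaining document in the database with a folder. A practical system would likely also use structured fields of the email, such as sender and subject, rather than body words alone.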
In still another aspect, a computer program device includes a computer program storage device that is readable by a digital processing apparatus, and a program means is on the program storage device. The program means includes instructions that can be executed by the digital processing apparatus to perform method steps for organizing semi-structured documents into folders. The method steps include receiving, from a computer input device, at least one user-generated sample signal representing one or more sample documents, and based thereon, establishing a profile for one or more document folders. Based on the profile, documents are moved from a database into the folders.