Organizations that routinely handle a large number of digitally processed texts, such as funding agencies, legal reporter services, governmental patent offices, scientific or technical journal publishers, or large corporations, are often faced with the task of classifying newly received or generated documents into one or more of a plurality of different classes, typically for purposes of further document processing.
For example, a large funding agency, such as the NIH or the NSF, will typically receive thousands of grant proposals for research funding each month. Before a proposal can be evaluated, it must be classified as to general funding area, so that it can be forwarded to the appropriate grant-review group.
As another example, a patent office may receive thousands of electronically filed patent applications each week. Before these documents can undergo substantive examination, they must first be classified as to patent class and, optionally, subclass, and forwarded to the appropriate art unit or technical center for examination.
Publishers of legal texts or scientific and technical reports or papers may receive thousands of publication submissions per week, and these must first be classified according to legal area or scientific or technical area before further archiving or publication, or for forwarding to an appropriate editorial review group.
It would thus be desirable to provide an automated text classification system that is able to classify electronic documents accurately, that is, within some defined performance limits, into one or more different classes or categories, typically as a first step in document processing or for purposes of document control.