A general problem that arises in many systems for automating the handling of documents is the need to assign a particular document to a class or category dependent on the subject matter discussed in its content.
Most commercial systems for achieving this rely on a human creating lists of terms that are used to categorize the document: if the terms appear in a document, this fact is taken as evidence that the document should be assigned to a particular category. Such approaches are inaccurate and require a lot of manual work to set up and maintain the term lists and the associated Boolean operator combinations.
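As an illustration only, the term-list approach described above might be sketched as follows. The category names, the term lists and the rule layout (AND-groups of terms that are OR-ed together) are hypothetical and are not taken from any particular commercial system.

```python
import re

# Hypothetical hand-built rules: each category maps to a list of AND-groups.
# A document matches a category if ALL terms of ANY one group appear in it.
TERM_RULES = {
    "finance": [{"bank", "loan"}, {"interest", "rate"}],
    "sport": [{"football"}, {"goal", "match"}],
}


def tokenize(text):
    """Lower-case the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, rules=TERM_RULES):
    """Return the sorted list of categories whose Boolean rule matches the text."""
    words = tokenize(text)
    return sorted(
        category
        for category, groups in rules.items()
        if any(group <= words for group in groups)  # OR over AND-groups
    )


if __name__ == "__main__":
    print(categorize("The bank approved the loan."))
    print(categorize("The river bank flooded."))
```

Every rule in such a table has to be written and revised by hand, and a document that discusses the subject in different words is missed, which is the maintenance and accuracy burden noted above.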
There is a need for accurate, self-maintaining and automatically created categorization systems.
Although there have been attempts to create self-structuring systems in other fields, these have usually come at the expense of unimodality and have required adaptation algorithms with very high computational loads, such as simulated annealing, to remove even the simplest redundancies. See Qiuzhen Xue, Yu Hen Hu and Paul Milenkovic, Analysis of the hidden units of the multi-layer perceptron and its application in acoustic-to-articulatory mapping, Proc. ICASSP90, April 1990. Other approaches based on singular value decomposition (SVD) have been applied, but these have removed only a subset of the redundant terms. See Schetzen, The Volterra and Wiener Theories of Non-linear Systems, New York, N.Y.: John Wiley, 1980. These methods utilise SVD to reduce the recognition space to a more compact form by use of the singular values.
(1) P. Rayner and M. R. Lynch, A new connectionist model based on a non-linear adaptive filter, Proc. ICASSP89, April 1989.
(2) M. J. D. Powell, Radial Basis Function approximations to polynomials, Proc. Department of Applied Mathematics and Theoretical Physics.
(3) A. Ivakhnenko, Heuristic self-organisation problems of engineering cybernetics, Automatica, Vol. 6, 1970, pp. 207-209.
(4) Qiuzhen Xue, Yu Hen Hu and Paul Milenkovic, Analysis of the hidden units of the multi-layer perceptron and its application in acoustic-to-articulatory mapping, Proc. ICASSP90, April 1990.
(5) Schetzen, The Volterra and Wiener Theories of Non-linear Systems, New York, N.Y.: John Wiley, 1980.
(6) S. Haykin, Adaptive Filter Theory, Englewood Cliffs, N.J.: Prentice-Hall, 1986.
(7) V. Klema and A. Laub, The Singular Value Decomposition: Its computation and some applications, IEEE Trans. AC, Vol. AC-25, No. 2, April 1980, pp. 164-176.
A method and apparatus for document categorization are described. In one embodiment, the method comprises automatically selecting one or more discriminant term combinations and using the one or more discriminant term combinations for document categorization.
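By way of a minimal sketch, and not as a statement of the claimed method, automatic selection of discriminant term combinations could look like the following. Everything here is an assumption made for illustration: combinations are limited to term pairs, the scoring rule is the difference in document frequency inside and outside a category, and the names select_combinations, categorize and top_k are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_combinations(labelled_docs, top_k=3):
    """Pick, per category, the term pairs that best separate it from the rest.

    labelled_docs is a list of (text, category) tuples. The score of a pair
    is its document frequency inside the category minus its document
    frequency outside it (an assumed, illustrative criterion).
    """
    pair_counts = {}      # category -> Counter of term pairs
    doc_counts = Counter()  # category -> number of training documents
    for text, category in labelled_docs:
        doc_counts[category] += 1
        pairs = combinations(sorted(tokenize(text)), 2)
        pair_counts.setdefault(category, Counter()).update(pairs)

    selected = {}
    for category, counts in pair_counts.items():
        n_inside = doc_counts[category]
        n_outside = sum(n for c, n in doc_counts.items() if c != category)

        def score(pair):
            inside = counts[pair] / n_inside
            hits_outside = sum(
                pair_counts[c][pair] for c in pair_counts if c != category
            )
            outside = hits_outside / n_outside if n_outside else 0.0
            return inside - outside

        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(counts, key=lambda p: (-score(p), p))
        selected[category] = ranked[:top_k]
    return selected


def categorize(text, selected):
    """Assign the category with the most matching selected pairs, or None."""
    words = tokenize(text)
    scores = {
        category: sum(1 for pair in pairs if set(pair) <= words)
        for category, pairs in selected.items()
    }
    best = max(sorted(scores), key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    training = [
        ("interest rate rises at the bank", "finance"),
        ("bank cuts interest rate", "finance"),
        ("late goal wins football match", "sport"),
        ("football match ends without a goal", "sport"),
    ]
    chosen = select_combinations(training)
    print(chosen)
    print(categorize("the bank kept its interest rate unchanged", chosen))
```

The point of the sketch is only that the term combinations are derived from example documents rather than written by hand; a practical system would need a more robust selection criterion and far more training data.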