The present invention, in some embodiments thereof, relates to structural document classification and, more specifically, but not exclusively, to structural document classification based on document feature analysis.
Document classification, or document categorization, is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. Known methods classify documents either manually or algorithmically. The manual (intellectual) classification of documents has mostly been the province of library science, while the algorithmic classification of documents is used mainly in information science and computer science.
Content-based classification is classification in which the weight given to particular subjects in a document determines the class to which the document is assigned. It is, for example, a rule in library classification that at least 20% of the content of a book should be about the class to which the book is assigned. In automatic classification, the weight could be the number of times given words appear in a document.
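By way of non-limiting illustration only, the word-count form of content-based classification described above may be sketched as follows. The keyword lists, the function name `classify`, and the use of a 20% share of words as the assignment threshold are assumptions made for this example and are not taken from any particular known system; in particular, applying the 20% figure to word counts is only a loose analogy to the library rule.

```python
import re
from collections import Counter

# Hypothetical keyword lists per class, chosen only for illustration.
CLASS_KEYWORDS = {
    "finance": {"invoice", "payment", "tax"},
    "medicine": {"patient", "dose", "clinic"},
}


def classify(text, class_keywords=CLASS_KEYWORDS, min_share=0.20):
    """Return every class whose keywords make up at least min_share of the words."""
    # Tokenize into lowercase alphabetic words.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    assigned = []
    for name, keywords in class_keywords.items():
        # Weight of a class = number of occurrences of its keywords.
        hits = sum(counts[w] for w in keywords)
        if hits / total >= min_share:
            assigned.append(name)
    return assigned


if __name__ == "__main__":
    print(classify("The invoice lists the payment and the tax due"))
```

Because the function returns a list, a document may be assigned to one class, to several classes, or to none, consistent with the task of assigning a document to one or more classes.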
Data management systems that use content-based classification have traditionally existed as either enterprise software or managed service solutions. Enterprise software is typically deployed and maintained on an enterprise server.
Managed service solutions, such as the management of archival databases that contain customer information, are typically operated and maintained by a managed service provider within that provider's managed service environment.