Existing data mining techniques may be frustrated by large document warehouses, which often cannot be searched rapidly or readily because of their sheer volume. An organization may hold several million documents that have never been searched and that lie beyond the reach of present data mining techniques.
Current networking and research technologies make it possible to obtain large document data warehouses via internet data transfer. Searching these documents, however, poses many challenges because of the size of the warehouse and the constant influx of new documents.
Accordingly, a need exists for improved methods and systems for creating, searching, and classifying the documents contained in large data collections.