1. Technical Field
The present invention relates to document indexing and more particularly to systems and methods for multi-thread, multi-core processing of documents.
2. Description of the Related Art
Stream computing research is attracting growing interest in academia and industry, especially at terascale and petascale levels. Indexing large numbers of real-time streams with high data rates on the order of 1-2 GB/s is a challenging problem. Such streams are encountered in backbone network routers, sensor networks and other domains such as the financial services industry. Indexing them necessitates sustained aggregate indexing rates of around 50-100 GB/s or more. Current multi-core architectures cannot sustain these high aggregate indexing rates.
This limitation also holds for similar multi-core architectures that may be employed in the future, even though those architectures might have a large number of cores. Current software indexing algorithms do not exploit fine-grain parallelism at the intra-document level and are not optimized for cache hierarchies in which L1 caches and shared L2/L3 caches serve many threads per core and many cores. Therefore, the scalability of text indexing with an increasing number of simultaneous multi-threaded (SMT) threads per core and an increasing number of cores is an important concern.
Simultaneous multithreading (SMT) is a processor design that combines hardware multithreading with superscalar processor technology to allow multiple threads to issue instructions each cycle. SMT permits all thread contexts to simultaneously compete for and share processor resources.