Field
The invention is related to data management, discovery, and organization within voluminous data repositories.
Description of the Related Art
The rapid increase in data creation, together with the capability to store vast volumes of data cheaply and reliably, has brought increasing complexity in organizing, searching, and discovering data elements within large data repositories. One result is that traditional techniques for searching data for needed elements, such as keyword searching, Boolean operators, and enhanced search, are insufficient to cull wanted data from large data repositories. Even a small mismatch between, for example, a keyword and the data included in a document may result in the document being omitted from the search results. Conversely, the presence of a keyword in too many documents within a data stream may result in over-inclusive searching, producing search results that are too voluminous for a human to review in an acceptable amount of time. Further, a keyword match may lack intelligence and produce data query results that combine documents simply on the basis of sharing a word (e.g., “state”), even though that keyword has substantively different meanings in the documents (e.g., “solid state” and “state of mind”). Also, individuals may have a strong intuitive sense of what information is valuable within a set of results, but may not be able to develop keywords that properly reflect that intuition. Therefore, a need exists for document and data discovery methods and systems that are capable of being trained, that are capable of representing intuitive review processes, that are scalable, and that may be deployed within large data repositories.