A great deal of so-called unstructured data exists in the world. As the name implies, unstructured data typically lacks a predefined organization, which makes it difficult to work with. Perhaps a good way to understand unstructured data is by contrast with structured data. Structured data, by its very nature, is typically easily indexed and searched.
As an example, consider the following. In many cases, governments, corporations, and other large entities can have many thousands of documents to deal with. These documents constitute knowledge in the sense that they contain information that might be useful to the particular entity. Yet, because of the sheer number of documents and the fact that such documents may be in a generally unstructured state, this knowledge is not readily accessible to these entities. Even if such an entity were to have, for example, an intranet, a searcher would have to know specifically what to search for and what the retrieved information means.
Thus, one of the difficulties in working with unstructured data is that of building knowledge from it. Put another way, one of the challenges with unstructured data pertains to disambiguating the data so that it can be the subject of meaningful information processing techniques.
Some past approaches to disambiguating unstructured data utilize so-called knowledge architects. Knowledge architects are typically very highly skilled professionals who craft knowledge based on the data. The techniques and approaches that these individuals use tend to be very expensive, owing to the highly skilled nature of the individual(s) architecting the system. Additionally, the specific systems that such individuals put in place tend not to be easily repeatable in different scenarios or environments. Thus, these approaches tend to be expensive and narrowly directed to the particular problem at hand. As such, there remains a need, in the area of data disambiguation, for systems that are less complex to implement and deploy. In addition, there is a need for such systems that do not require a highly specialized professional to set up and deploy them.