Since the advent of large-scale, persistent storage and compute capabilities across homes, corporations, and governments, large amounts of text data can be, and have been, stored with few barriers to retrieval or dispersal. The ability to extract information from text data has, as a consequence, assumed increasing significance. Applications that use forms of text mining may be found in fields ranging from business intelligence to academia, where they are used for analysis of patent and academic literature, indexing, clustering, search, and information extraction.
Existing techniques in the field of text mining, or value extraction from a set of text data, may involve supervised or unsupervised machine-learning methods. However, the extraction of exact attribute values from unstructured data remains an open problem, with the most accurate methods dependent on a large amount of user input or training data. In order to circumvent such a requirement, or to improve accuracy, some existing methods may additionally apply classification techniques to the dataset. However, data classification techniques carry a significant risk of ignoring some data which may, in turn, contain valid attribute values.
What is needed, then, is a reliable and accurate off-the-shelf solution for attribute or value extraction from text data that works without any need for sample input or training. It is additionally important that any such solution be domain-independent, that is, capable of functioning on structured or unstructured text in any domain.