Information Extraction (IE) is the process of extracting structured information from unstructured or semi-structured machine-readable text. It is a basic building block, and often a critical component, of many enterprise applications, including regulatory compliance, social media analytics, and search. Such applications tend to require information extraction programs with very high accuracy and coverage. At least in such settings, developing an information extraction program (also referred to herein as an “extractor”), together with its associated rules, can be an extremely labor-intensive process.
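By way of illustration only, the notion of an extractor can be sketched as a small rule-based program that turns free text into structured records. The sketch below is not drawn from any particular system: the `Mention` record, the `RULES` table, the two regular expressions, and the `extract` function are all hypothetical names and choices made for this example, and hand-written patterns of this kind are far simpler than the rules an enterprise-grade extractor would need.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One structured record pulled out of the text (hypothetical schema)."""
    kind: str   # which rule produced the record
    text: str   # the matched span
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


# Illustrative hand-written rules; real extractors typically need many more,
# which is where the labor-intensive effort goes.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
}


def extract(text: str) -> list[Mention]:
    """Apply every rule to the text and return the mentions in document order."""
    mentions = []
    for kind, pattern in RULES.items():
        for match in pattern.finditer(text):
            mentions.append(Mention(kind, match.group(), match.start(), match.end()))
    return sorted(mentions, key=lambda m: m.start)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com about the $1,250.00 invoice."
    for mention in extract(sample):
        print(mention)
```

Each added concept requires another rule and further tuning against real data, which suggests why reaching high accuracy and coverage by hand can be costly.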
Conventional web information extraction programs may permit extractors to be built visually. However, such construction tends to be limited to a specific type of extractor (e.g., a “wrapper”). Among other shortcomings, conventional arrangements lack the ability to translate a visual representation of an arbitrarily complex concept into optimized extractors that are both executable and human-readable.