The present invention is related to techniques and mechanisms for extracting information from web pages or the like.
Information extraction techniques, such as Named Entity Recognition (NER) or any other suitable list extraction technique, locate parts of documents and classify them into pre-defined categories. For instance, categories may include people, locations, and organizations. Unfortunately, conventional systems for performing information extraction are often difficult to manage, troubleshoot, and scale across different types of corpora.
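By way of illustration only, the locating and classifying of document parts described above can be sketched with a simple dictionary lookup. The gazetteer entries, the category labels, and the function name `extract_entities` are hypothetical and chosen for this example; practical systems generally rely on statistical or learned models rather than a fixed table.

```python
import re

# Hypothetical gazetteer mapping surface forms to pre-defined categories.
# Assumption for illustration: a fixed lookup table stands in for a real model.
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "London": "LOCATION",
    "Royal Society": "ORGANIZATION",
}


def extract_entities(text):
    """Return (start, end, surface, category) tuples sorted by position."""
    found = []
    for surface, category in GAZETTEER.items():
        # \b anchors keep a match from starting or ending inside a longer word.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.end(), surface, category))
    return sorted(found)


sample = "Ada Lovelace addressed the Royal Society in London."
entities = extract_entities(sample)
for start, end, surface, category in entities:
    print(f"{start}-{end}: {surface} [{category}]")
```

Each returned tuple records where a part of the document was located and the category it was assigned to, which corresponds to the two steps named above.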