1. Field of the Invention
The present invention relates generally to databases. More particularly, the present invention relates to a system and method for extracting content from unstructured sources.
2. Description of the Prior Art
Solutions for extracting content from unstructured sources, such as web pages on websites, are known. Such solutions aim to collect data from websites by parsing web pages so that the data can be used in other applications.
Traditional solutions for extracting unstructured content from web pages on a website suffer from several problems. One problem is the inability to accurately extract specific entities from among all the unstructured data found on a web page. Another problem is the inability to effectively identify and remove poor-quality data collected through the parsing process. A still further problem is the inability to automatically determine the location of specific entities based on their position within a website.