1. Field of the Invention
Embodiments herein present a system, method, etc. for identifying and delivering domain-specific unstructured content for advanced business analysis.
2. Description of the Related Art
In just a few years, the Internet has made large amounts of unstructured information instantly and ubiquitously available, and it has had a tremendous impact on society and business. Breakthrough technologies provide Web-scale text mining and discovery platforms that contain very large, distributed repositories (i.e., clusters) with valuable metadata that is automatically extracted from billions of documents, including Web pages, Web logs, bulletin boards, newspapers, etc.
Despite such advances, Web-scale repositories contain large amounts of unstructured information with no domain-specific context associated with it. An automated process is desired to create and maintain a domain-specific, contextual data repository of unstructured content for different business or logical domains. An “on-topic store” for a domain consists of Web pages and associated contextual metadata, such as dates, geographic locations, and company names, that are relevant to that domain. The on-topic store may be used as the starting point for running more complex analytics specific to that domain. The more topic-focused the store is, the faster and more efficient the domain-specific analytics can be.
Furthermore, there is a high cost associated with the support and deployment of custom text analytics applications. There are at least two situations where this occurs: first, in an application delivery model in which analytics applications access huge volumes of data by talking directly to the cluster, and second, in a model that relies on a standard database or data feed of limited, application-specific mined data. There are no known solutions to these problems.