The Internet is a publicly accessible worldwide system of interconnected computer networks that provides access to Internet-connected servers and the information that resides thereon. A related service, the World Wide Web, includes the universe of Internet-accessible information and encompasses the documents that reside on Internet servers. Consequently, the Internet can provide practically instant information on most topics.
Researching subject matter (e.g., products, brands, topics) online can involve searching for and identifying information sources from which information can be obtained and reviewed. In many cases, such research can require that an online researcher manually extract and collate information from the identified information sources. Identified information sources can include, but are not limited to, web pages.
Online researchers who wish to research products, brands, or topics using conventional web search systems must read through many individual web pages in order to gain sufficient knowledge to form an informed opinion about the subject that they are researching. This can be a time-consuming process. Moreover, even with a significant investment of time (e.g., reading individual documents), a researcher can be left with an incomplete picture of the sentiment expressed on the web about the research subject, or about the individual features of that subject that should be considered.
One automated process used to extract information from identified web sources is called sentiment extraction. Sentiment extraction is typically performed as a batch process that involves a large corpus of documents. While conventional sentiment extraction can be effective in some contexts, it can be unsuitable for a web-based document-indexing pipeline for various reasons.
One reason such an approach can be unsuitable for a web-based document-indexing pipeline is that it operates on multiple documents instead of a single document, and the size of the task involved severely taxes processing resources and reduces the effectiveness of extraction operations. Moreover, the amount of processing power and the time required to perform the calculations involved make the process unsuitable, for performance reasons, for integration into a document-level indexing pipeline.