The World Wide Web (WWW), or simply the “Web”, is the well-known collection of interlinked hypertext documents hosted at a vast number of computer resources (“hosts”) that are communicatively coupled to one another over networks of computer networks known as the Internet. These documents, which may include text, multimedia files and images, are typically viewed as Web pages with the aid of a Web browser, which is a software application running on a user's computer system. Collections of related Web pages that can be addressed relative to a common uniform resource locator (URL) are known as websites and are typically hosted on one or more Web servers accessible via the Internet.
Websites featuring User Generated Content (UGC), which is content created and posted to websites by owners of and, sometimes, visitors to those sites, have become increasingly popular. UGC spans a wide variety of content, including news, gossip, audio-video productions, photography, and social commentary, to name a few. Of interest to the present inventors is UGC that expresses opinions (usually, but not necessarily, those of the person posting the UGC), for example, opinions of products, services, or combinations thereof (herein, the term “product” refers to any or all such products and/or services). Social media sites in particular have become popular places for users of those sites to post UGC that includes opinion information.
The opinions and commentary posted to social media sites have become highly influential, and many people now make purchasing decisions based on such content. Unfortunately, people seeking out such content to inform prospective purchasing decisions and the like do not always find the task easy. Blogs, micro-blogs, and social networking sites are replete with ever-changing content, and, even if one can locate a review or similar post of interest, such reviews typically include much information that is of little or no relevance to the topic and/or to the purpose for which the review is being read. Further, while UGC and opinion information can be of great value to advertisers, retailers, and others, such information is extremely burdensome to collect and analyze in any systematic way. It is even more difficult to extract therefrom meaningful commentary or opinions that can form the basis for appropriate responses or informed decisions.
Extracting sentiment from phrases, words, or combinations of words continues to present challenges in text analytics, particularly when a given passage of text has multiple sentiment-bearing phrases in different sentences. A passage of text can contain numerous sentiment-bearing phrases, and those phrases may be bound to different categories, which makes an accurate read on the overall sentiment of the entire passage more challenging. One approach is to examine each individual nugget in the passage, sentence by sentence, and draw a clue from each one, for example, positive for a first nugget, positive for a second nugget, but negative for a third nugget. The per-nugget clues are then summed to arrive at the overall sentiment of the passage.
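The sentence-by-sentence summation approach can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each sentence is one nugget, that a tiny hypothetical word lexicon supplies polarities, and that each nugget contributes +1, -1, or 0 according to the sign of its lexicon score; none of these choices is taken from any particular system.

```python
import re

# Hypothetical toy lexicon (illustrative assumption): word -> polarity.
LEXICON = {
    "great": 1, "love": 1, "excellent": 1,
    "poor": -1, "broke": -1, "terrible": -1,
}


def split_nuggets(passage: str) -> list[str]:
    """Naively treat each sentence (split on . ! ?) as one nugget."""
    return [s.strip() for s in re.split(r"[.!?]+", passage) if s.strip()]


def nugget_polarity(nugget: str) -> int:
    """Return +1, -1, or 0 from the sign of the nugget's lexicon score."""
    words = re.findall(r"[a-z']+", nugget.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    return (score > 0) - (score < 0)


def overall_sentiment(passage: str) -> tuple[list[int], int]:
    """Draw one clue per nugget, then sum the clues into an overall value."""
    clues = [nugget_polarity(n) for n in split_nuggets(passage)]
    return clues, sum(clues)


if __name__ == "__main__":
    text = "The screen is great. I love the camera. The battery broke after a week."
    print(overall_sentiment(text))
```

For the three-sentence example, the clues are positive, positive, and negative (`[1, 1, -1]`), which sum to an overall value of `1`. Note that the simple sum ignores which category each nugget is bound to, which is exactly why passages whose phrases concern different categories are difficult to summarize accurately this way.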
Conventional text analytics solutions require methods of inputting training data into a computer database where supervised machine learning algorithms can access and process the training data. To accelerate the collection of training data, researchers have been using Web browser-based applications to interact with people and present training samples to them, so that those people can tag the training samples with associated descriptive information. A widely used Web-based platform for loading and presenting samples and gathering tagged information from people is Amazon's Mechanical Turk. In Amazon's Mechanical Turk, users log into the Web-based application, browse for jobs to process, and, in return for their work, are paid for completing a specific task, such as categorizing text by choosing from a set of multiple-choice answers.
Accordingly, it is desirable to have a system and method that provide a more effective hybrid human-machine learning platform.