As use of the Internet proliferates, it increasingly serves as a vehicle for buying and selling goods and services. Vendors who sell their goods or services on the Internet are often referred to as “online merchants.” These online merchants primarily use a website to advertise their products and services and to secure orders from consumers for their offerings. For instance, a typical online merchant webpage may display relevant information about a product, including but not limited to its price, sales tax, shipping and warranty information, and details about the product. The information displayed on a website is often encoded in a digital form. Such information may be generally referred to as “digital information.” One of the most difficult tasks for a consumer is to determine which online merchant is offering the best bargain for a particular product or service. To this end, consumers often engage in bargain hunting, where they may visit online merchants and bookmark the web pages displaying information about the product or service of interest to them. In some instances, the consumer may add some information of his own, e.g., metadata, to each product web page so bookmarked. This activity of adding additional information is sometimes referred to as “annotation” or “tagging.”
The idea of tagging is not limited to textual information. Any type of digital information, e.g., audio, video, graphics, etc., may be tagged. For example, a person watching or having watched a video online may annotate the video or the location of the video.
However, conventional techniques for gathering and analyzing such digital information often involve manual processing of the tags and annotations. For example, a webpage or a Web resource on the World Wide Web (“Web”) is most often identified by a Uniform Resource Identifier (URI) or Uniform Resource Locator (URL). Other digital information available on a storage medium or a network may be identified by a “recall handle” similar to a URI or URL. Like the URI or URL, each recall handle is unique and is associated with only one item or page of information. Conventional techniques allow relevant keywords and phrases to be associated with a webpage, which is otherwise contextually uncertain. A user may thus be able to group contextually related web pages for later use. However, in some instances, web pages that share the same or similar annotation keywords and phrases may be grouped together even if they are contextually different. In addition, if a number of the related web pages in a group lack specific information of interest to the user, it becomes difficult for the user to judge the relevancy of the web pages in light of the desired data to be analyzed. For example, a user may bookmark several web pages during his research that he believes provide the information of interest to him. The user may group these pages together as relating to the same item of interest. However, these bookmarked pages may not contain information that is actually relevant to what the user is searching for. Subsequently, if the user attempts to extract relevant information from each of these bookmarked pages, he will have difficulty evaluating the merits and relevancy of the information they contain. The level of difficulty encountered by the user increases with the number of web pages bookmarked.
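The grouping problem described above can be illustrated with a minimal sketch. It is not taken from any particular bookmarking tool: the URLs, the tags, and the function name `group_by_tag` are hypothetical and chosen only for illustration. Each recall handle (here a URL used as a dictionary key, so it is unique) carries user-supplied tags, and pages are grouped purely by shared tags.

```python
from collections import defaultdict

# Hypothetical bookmarks: each unique recall handle (a URL) maps to the
# keywords a user attached to it. All names and URLs are illustrative.
bookmarks = {
    "https://shop-a.example/camera-x100": {"camera", "x100", "price"},
    "https://shop-b.example/camera-x100": {"camera", "x100", "warranty"},
    "https://blog.example/x100-road-trip": {"camera", "x100", "travel"},
}


def group_by_tag(tagged_pages):
    """Map each tag to the sorted list of recall handles that carry it."""
    groups = defaultdict(list)
    for handle, tags in tagged_pages.items():
        for tag in tags:
            groups[tag].append(handle)
    return {tag: sorted(handles) for tag, handles in groups.items()}


groups = group_by_tag(bookmarks)

# All three pages share the tag "x100", so they land in one group even
# though the blog post is a travel story rather than a product offer.
x100_group = groups["x100"]
```

In this sketch the travel blog ends up in the same group as the two merchant pages, which is the kind of contextually mixed grouping the paragraph above describes. Keyword overlap alone says nothing about whether a page actually contains the price or warranty data the user wants to compare.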
Therefore, there is a need in the art for a method, process and system for efficient annotation or identification of digital information.
Moreover, in the era of the Internet, online reviews, including opinions, ratings and votes, are influential in determining the success of a product or seller. However, the current approaches to collecting such reviews either scatter knowledge about the same product or seller, or yield data of questionable reliability. For instance, some websites limit sources of reviews to well-known commentators, select reviewers, and people who have purchased the product from the websites. While this approach alleviates the potential inaccuracy of including reviews and ratings from people who may not have actually experienced or otherwise engaged with the product or seller, it excludes input from people who may have bought the product or interacted with the seller on another website, or at a brick-and-mortar store. Inevitably, if these people are to provide their opinions, they need to do so elsewhere, thereby creating multiple destinations for reviews of the same product or seller. To be more inclusive, some other websites, for example, allow any member to provide reviews of a product or seller. However, the readers of these reviews and ratings do not know whether they came from a real customer. To achieve a compromise between such reliability and inclusivity, some websites allow reviews and ratings from non-customers, while marking reviews associated with their own customers with an indicator. However, this approach could discriminate against reviews from people who did purchase the product elsewhere. The present invention addresses these problems, and provides other benefits.
Furthermore, the World Wide Web (the Web) is an open distributed online repository of digital resources available through the Internet, mostly in the form of web pages linked to one another through hypertext (or more broadly, hypermedia) links. Publication or retrieval of digital resources on the Web may be made by anyone via a server capable of accepting and handling HTTP (Hypertext Transfer Protocol) requests at a specific TCP/IP (Transmission Control Protocol/Internet Protocol) port over the Internet. (The default or well-known TCP port for the Web is port 80, a network port number.) Because of the vast amount of digital resources available on the Web, tools are available for online users or consumers to locate relevant digital resources quickly, such as via a search engine. These tools may mostly be automated (e.g., crawling and indexing webpages) to collect information useful for this purpose. However, such efforts have so far met with limited success in terms of accuracy. Although both digital resources (such as webpages) and requests for information (e.g., queries for digital resources relevant to a certain interest) may often belong to or otherwise be associated with a certain primary semantic context, there is a lack of reliable and effective tools or schemes to establish the contexts of digital resources and match them against requests consistent with those contexts.
For instance, one type of information pervasive on the Web is advertising. An ad may appear on a webpage whose primary content may be regarded as belonging to another type or context, such as an ad about a mobile phone appearing alongside a journalistic article about health. In general, websites may exhibit third-party ads in exchange for revenue paid, for example, by ad sponsors. An ad sponsor is one who is responsible for the cost of an ad placement. In comparison, an ad exhibitor is one that presents ads, such as an ad-carrying website. An ad content provider is one that prepares and produces ad content. On the other hand, a digital resource (or simply a resource) such as a webpage may comprise primarily content of an advertising nature, such as the webpages made available by a shopping website.
A user or consumer may often use search engines to research or otherwise discover information of some specific interest, such as looking up medical studies or research publications, shopping for a car, or planning a trip. A search engine may be regarded as having three components: (a) a component that combs or crawls the Web for content, and indexes the content for suitable storage and optimal lookup; (b) a component that stores and maintains the indexed content; and (c) a component that accepts user queries, such as search words or phrases, performs lookup against the indexed content, and returns search results to the users. (Often these search results comprise indications of digital resources, such as URLs (Uniform Resource Locators) of webpages and URIs (Uniform Resource Identifiers) of resources. Indications of digital resources may also be regarded as digital resources.) The last component may be available to a user in the form of a webpage. In contrast, there may be websites that collect online resources of some specific interest, and allow users to provide queries against or submissions to these collections. For example, a shopping website may allow a seller to submit its individual products and their prices in certain data formats via a submission portal. Yet these seemingly more context-certain websites do not replace the use of search engines for context-specific information dissemination and discovery, because the former may only capture a small portion of the relevant resources that the Web would have, while the latter not only have the Web as their target for information capture, but also impose no website-specific formats or interfaces on content providers as a prerequisite for making resources available to such information capture. For instance, any digital resources accessible via HTTP (Hypertext Transfer Protocol) may be made available on the Web.
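The three components described above can be sketched in a few lines. This is a simplified illustration and not a description of how any actual search engine is built: the page texts and URLs are hypothetical, "crawling" is reduced to reading an in-memory dictionary, and the function names `tokenize`, `build_index` and `search` are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Component (a): read each page and index its words by URL.

    The returned dictionary stands in for component (b), the store
    that holds and maintains the indexed content.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Component (c): return the sorted URLs containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Hypothetical page contents standing in for crawled webpages.
pages = {
    "https://cars.example/sedan": "New sedan for sale, low price",
    "https://news.example/election": "Election news. Ad: sedan for sale",
    "https://trips.example/rome": "Planning a trip to Rome",
}

index = build_index(pages)
results = search(index, "sedan sale")
```

Under these assumptions, the query "sedan sale" returns both the car listing and the election news page, because the lookup matches words only and has no notion of each page's primary context.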
However, because the Web is context-ignorant, any kind of information may be published, including but not limited to political news, personal blogs and entertainment. In addition, a single webpage may comprise content of possibly incompatible contexts, such as a news report about a political election with an ad about a product or service for travel. Such contextual uncertainty or ambiguity poses a substantial challenge to search engines that comb the Web for resources consistent with a certain interest or specific to a certain semantic context, such as ads for products and services. For example, a search for a particular product or service could result in web pages that simply contain the search words but are totally irrelevant to the user's intent. In addition, some content providers may deliberately put popular but contextually inconsistent terms or content in their digital resources (e.g., on their webpages) so as to increase their apparent relevancy to queries that would otherwise find them irrelevant.
Embodiments of the present invention would not only provide remedies to the above problems, but also make possible context-aware communications for dissemination and retrieval of digital resources.