Search Engines
Through the use of the Internet and the World Wide Web (“the web”), individuals have access to billions of items of information, a number that will surely continue to grow. For example, the web provides access to items such as web pages, pictures, songs, videos, bookmark sets, white page listings, people, etc., generally and collectively referred to herein as “searchable items.” However, a significant drawback of using the web is that, because the web has so little organization, it can at times be extremely difficult for users to locate the particular items that contain the information of interest to them. To address this problem, a mechanism known as a “search engine” has been developed to index a large number of searchable items and to provide an interface that can be used to search the indexed information by entering certain words or phrases to be queried. These search terms are often referred to as “keywords”. A search engine is a computer program designed to find searchable items stored in a computer system, such as the web or a user's desktop computer. The search engine's tasks typically include finding searchable items, analyzing searchable items, and building a search index that supports efficient retrieval of searchable items.
Indexes used by search engines are conceptually similar to the indexes typically found at the end of a book, in that both kinds of indexes comprise an ordered list of information accompanied by the location of the information. An “index word set” of a document is the set of words that are mapped to the document in an index. For example, an index word set of a web page is the set of words that are mapped to the web page in a search index. For items that are not indexed, the index word set is empty.
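The index and index word set described above can be illustrated with a minimal sketch. The function names `build_index` and `index_word_set`, the sample URLs, and the whitespace-based word splitting are assumptions made for illustration only; they do not describe how any particular search engine builds its index.

```python
from collections import defaultdict


def build_index(documents):
    """Build an ordered mapping from each word to the locations containing it."""
    index = defaultdict(set)
    for location, text in documents.items():
        # Assumption: words are lowercased and split on whitespace.
        for word in text.lower().split():
            index[word].add(location)
    # Order the words, and the locations under each word, like a book index.
    return {word: sorted(locations) for word, locations in sorted(index.items())}


def index_word_set(index, location):
    """Return the set of words that the index maps to the given item."""
    return {word for word, locations in index.items() if location in locations}


# Hypothetical searchable items, keyed by location (URL).
documents = {
    "http://example.com/a": "search engines index web pages",
    "http://example.com/b": "web crawlers locate pages",
}
index = build_index(documents)
```

In this sketch, `index["web"]` lists both example pages, and `index_word_set` returns an empty set for any item that was never indexed, consistent with the definition given above.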
Although there are many popular Internet search engines, they are generally constructed using the same three common parts. First, each search engine has at least one, but typically more than one, “web crawler” (also referred to as a “crawler”, “spider”, or “robot”) that “crawls” across the Internet in a methodical and automated manner to locate searchable items of information from around the world. Upon locating an item, the crawler stores the item's URL, and follows any hyperlinks associated with the item to locate other items. Second, each search engine contains information extraction and indexing mechanisms that extract and index certain information about the items that were located by the crawler. In the context of a web page, for example, index information is generated based on the contents of the HTML file associated with the web page. The indexing mechanism stores the index information in large databases that can typically hold an enormous amount of information. Third, each search engine provides a search tool that allows users, through a user interface, to search the databases in order to locate specific searchable items that contain information of interest to them, along with the location of those items on the web (e.g., a URL).
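The crawling behavior described above, in which the crawler stores an item's URL and follows its hyperlinks to locate other items, might be sketched as follows. The in-memory `PAGES` table stands in for the web, and the `crawl` function, the `fetch` parameter, and the breadth-first visiting order are all assumptions made for illustration; a real crawler would fetch items over the network and parse hyperlinks out of HTML.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (item text, outgoing hyperlinks).
PAGES = {
    "http://example.com/": ("home page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", ["http://example.com/", "http://example.com/missing"]),
}


def crawl(seed, fetch):
    """Locate items starting from a seed URL by following hyperlinks."""
    located = []          # URLs of the items located so far, in visit order
    seen = {seed}         # URLs already queued, so no item is visited twice
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue      # the link points at an item that cannot be reached
        _text, links = page
        located.append(url)            # store the located item's URL
        for link in links:             # follow hyperlinks to other items
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return located


located = crawl("http://example.com/", PAGES.get)
```

The `seen` set keeps the crawl methodical: pages that link back to one another, as the home page and page b do here, are each visited only once.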
The search engine interface allows users to specify their search criteria (e.g., keywords) and, after performing a search, provides an interface for displaying the search results. Typically, the search engine orders the search results prior to presenting the search results to the user. The order usually takes the form of a “ranking”, where the searchable item with the highest ranking is the item considered most likely to satisfy the interest reflected in the search criteria specified by the user. Once the matching searchable items have been determined, and the display order of those items has been determined, the search engine sends to the user that issued the search a “search results page” that presents information (e.g., URLs, titles, summaries, etc.) about the matching searchable items in the determined display order.
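The ordering of search results by ranking might be sketched as follows. The scoring rule used here, a simple count of keyword occurrences, is an assumption chosen only to keep the example small; the text above does not specify how a ranking is computed, and the function name `rank` and the sample items are likewise hypothetical.

```python
def rank(query, documents):
    """Return (location, score) pairs for matching items, highest ranking first."""
    keywords = query.lower().split()
    results = []
    for location, text in documents.items():
        words = text.lower().split()
        # Assumed scoring rule: total occurrences of the keywords in the item.
        score = sum(words.count(keyword) for keyword in keywords)
        if score > 0:                  # keep only items that match the criteria
            results.append((location, score))
    # Highest score first; ties are broken by location for a stable display order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Hypothetical searchable items, keyed by location (URL).
documents = {
    "http://example.com/a": "web search engines index web pages",
    "http://example.com/b": "crawlers locate web pages",
    "http://example.com/c": "podcasts are audio files",
}
results = rank("web pages", documents)
```

A search results page would then present information about each item in `results` in the order returned, with non-matching items such as the third example omitted.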
Search engines must generate both relevant and comprehensive search results. Comprehensiveness is generally achieved by crawling web sites and by contracting with content providers to supply content feeds. However, this approach fails to capture most of the world's data, which resides on individual computing devices such as personal computers, rather than on the Internet. Such data is usually inaccessible to any search engine unless the user takes explicit, proactive steps to make the data available on the Internet, such as by uploading the data to an indexable website or some other publicly accessible location. Accessibility is generally defined as the ability for content to be reached and downloaded, or otherwise consumed, by any user who has a web browser and an Internet connection.
Exposing Personal Content to the General Public
Users that want to expose personal content to the general public or to a subset of the general public must confront two challenges: how to expose such content for receipt by others and how to publicize the availability of such content. A user might expose personal content in numerous ways. The most basic, and currently the most flexible, way to make personal content available on the web is to set up a website and to post content on a publicly accessible web page. Other examples of how to expose content to the public may include (1) using a paid web hosting or web log (also referred to as a “blog”) account on a service that offers hosting to thousands of different customers distributed over hundreds of different physical web servers; (2) using a free hosting service for a web site or a web log, which removes the cost component of web publishing but sacrifices flexibility and utility because of constraints on the file sizes and file types accepted for posting; (3) uploading content to specialized community sites which, while vibrant and active forums for the distribution of information, often require time to join and to learn the community protocols, and on which it may be difficult to find the right forum in which to share particular content; (4) using a “digital marketplace” to make content available, where a fee is charged for access to download the source file; (5) using podcasts, which are typically MP3 audio files that are queued for convenient download by users and are synced with the users' portable MP3 players or played directly on the users' computers, and which require the content contributor to provide a suitable hosting environment and to bear bandwidth and storage costs, and require the content user to have a suitable device on which to download and replay the content; (6) using newsgroups (e.g., via NNTP), which are typically not mainstream, are often not archived, have a limited audience, are duplicative, and carry information that is ephemeral and generally cannot be easily searched; (7) using FTP servers which, by definition, are not part of the web, have limited access via a web browser, require an FTP client and an understanding of server volume structures to use, and are difficult to search; and (8) using the Gopher protocol, which by definition is not part of the web, and which offers some features not natively supported by the web but imposes a much stronger hierarchy on stored information and therefore requires careful classification of information so that users can navigate pre-determined file menus to find information.
Furthermore, simply making content accessible is not sufficient. Users also need a means to publicize that such content is available. Examples of how to publicize content to the public may include (1) direct marketing, such as when a user sends emails to other users (personal contacts or members of a distribution list) with links to the website containing the shared content, which is typically successful only in very limited applications and, unless the mail is passed along (viral marketing), reaches only a small fraction of the potential audience for the content; (2) linking, by which a user may contact the creator of another web site and request that a link to the user's site (or shared content) be included on a frequently visited web page; (3) submission to search services, by which users may seed search services directly with the web address of the site or content to be shared, with the expectation that the service will eventually send a crawler to the site to index the content and include it in the public search index; and (4) search-engine optimization, an approach that attempts to improve a page's ranking in the query results through a multitude of techniques in page design, keyword vocabulary, cross-linking, and meta-data.
In view of the foregoing, there is a need for better approaches to enable users to expose and publicize personal content to the public or to some subset of the public via the Internet.
Any approaches that may be described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.