This specification relates to privacy protection.
The Internet provides access to a wide variety of resources, for example, webpages, image files, audio files, and videos. A search system can select one or more resources in response to receiving a search query. A search query is data that a user submits to a search engine to satisfy the user's informational needs. The search queries are usually in the form of text, e.g., one or more query terms. The search system selects and scores resources based on their relevance to the search query and on their importance relative to other resources, and provides search results that link to the selected resources. The search results are typically ordered according to the scores and presented according to this order.
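The selection, scoring, and ordering described above can be illustrated with a minimal sketch. It is not the scoring method of any particular search system. The `Resource` type, the term-overlap `relevance` measure, the `importance` field, and the choice to multiply the two scores are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Resource:
    url: str
    text: str
    importance: float  # hypothetical query-independent score, e.g. in [0, 1]


def relevance(query: str, resource: Resource) -> float:
    """Toy relevance: fraction of query terms found in the resource text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(resource.text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def search(query: str, resources: list[Resource]) -> list[tuple[str, float]]:
    """Score each resource and return (url, score) pairs, highest score first."""
    # Assumed combination rule: relevance multiplied by importance.
    scored = [(r.url, relevance(query, r) * r.importance) for r in resources]
    # Select only resources with a nonzero score, then order by descending score.
    selected = [pair for pair in scored if pair[1] > 0]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)
```

In this sketch a resource that matches every query term can still rank below a partial match if its importance is low enough, which reflects the statement that both factors contribute to the score.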
Each underlying resource that a search result references includes content, such as text, images, audio and/or video content. The content may be controlled by a publisher (e.g., an owner or manager of a particular website) or may be user contributed (e.g., blog posts, discussion threads, etc.). Some of the content that is made available in a resource may be personally identifiable information (PII). Personally identifiable information is information that can be used to distinguish or trace an individual's identity, either alone (e.g., a name, Social Security number, or biometric records) or when combined with other personal or identifying information that is linked or linkable to a specific individual (e.g., date and place of birth or mother's maiden name).
The publication of certain types of personally identifiable information can be innocuous. For example, a person may voluntarily publish personally identifiable information on a social network page, such as the person's full name, age, gender, and city and state of residence. However, the publication of other types of personally identifiable information may be harmful. For example, the publication of a person's name, Social Security number, bank account number, and a password to electronically access the bank account exposes the person to the risk of identity theft and monetary theft. Typically, people do not voluntarily publish this latter type of personally identifiable information.
Unfortunately, malefactors may gain access to such information and offer this information for sale over the Internet. When offering and publishing such information, malefactors sometimes use publicly available websites at which users may freely publish information. Examples of such websites include social network websites, community bulletin boards, newsgroups, and the like. The malefactor may register as a user and post contact information at which the malefactor can be reached and through which the personally identifiable information of individuals can be provided.
The resources that are available through these websites, and most other websites, are often processed by search systems (e.g., indexed for search processing and optionally cached by the search system) so that the resources can be identified in response to search queries. Thus, it is possible that users will be provided with search results that link to underlying webpages that include personally identifiable information.