This specification relates to evaluating sources that collect information, for example, to determine if the information includes spam.
Search engines can be configured to receive search queries provided by users and to provide search results that satisfy the search queries. A search engine can communicate with multiple client devices, for example, computers, through one or more networks, such as the Internet. In some situations, the search engine can search an index of resources when it receives a search query. The index can be created by crawling multiple resources, each of which is stored, for example, in one or more computer systems. In addition to identifying the resources, the search engine can rank the resources included in the search results according to a relevance of each search result to the search query.
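For illustration only, the indexing and ranking described above can be sketched as a minimal inverted index. This sketch assumes that crawled resources are available as plain text keyed by a resource ID, and it uses a simple term-count score as a hypothetical stand-in for relevance. The names `build_index` and `search` are illustrative and are not part of any particular search engine.

```python
from collections import Counter, defaultdict


def build_index(resources):
    """Map each term to {resource ID: occurrence count}.

    `resources` is a hypothetical dict of resource ID -> text, standing in
    for content gathered by crawling.
    """
    index = defaultdict(dict)
    for resource_id, text in resources.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][resource_id] = count
    return index


def search(index, query):
    """Return IDs of resources matching any query term, most relevant first.

    The score is a deliberately simple stand-in for relevance: the sum of
    the query terms' occurrence counts in each resource.
    """
    scores = Counter()
    for term in query.lower().split():
        for resource_id, count in index.get(term, {}).items():
            scores[resource_id] += count
    # Sort by descending score, breaking ties by resource ID for determinism.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))
```

A production search engine would use far richer relevance signals; the term-count score here only shows where ranking fits once the index has identified matching resources.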
The resources can include information describing business entities that offer services, products, or both. In some situations, a business entity can have a dedicated resource, for example, a website addressed by a unique uniform resource locator (URL), which the search engine can crawl and index. As an alternative or in addition, a business entity can provide its business information to one or more of several sources that collect information about such entities, for example, the Yellow Pages. A source can store the information, including information describing business entities, in a computer system. A search engine can crawl and index the stored information. Alternatively, or in addition, the source can provide the stored information to the search engine. When the search engine receives a search query that references a product, a service, or both, the search engine can identify, from the index, relevant business information collected and stored by the source.