In Internet advertising, Internet advertising alliances typically provide website owners with advertising code (usually JavaScript code). The website owners embed the advertising code in their web pages and display advertisements provided by the Internet advertising alliances, which take a share of the earnings from the displayed advertisements. When a user browses a web page that includes the advertising code, the user's client equipment runs the advertising code and sends an advertisement request to advertisement front-end servers. The advertisement front-end servers record information relating to the advertisement request in an advertisement request log, execute an advertisement selection algorithm, and send back an advertisement segment. After the browser renders the advertisement segment, it is displayed in a fixed position within an advertisement display zone on the web page. Typically, the advertisement display zone is a rectangular zone. The rectangular zone contains text, pictures, multimedia, and other information presented to the user and is called an advertising space.
The “advertisement request” refers to a data exchange between the client equipment and the advertisement front-end servers via the Hypertext Transfer Protocol (HTTP). The advertisement request includes an advertising space identification (ID), the advertising space width and height, the advertisement request source page, and other such information. After receiving the advertisement request, the advertisement front-end servers record the information relating to the advertisement request in an advertisement request log and respond by sending advertisement data back to the client equipment.
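As an illustration of the fields carried by such a request, the following minimal sketch builds an advertisement request URL and parses it back into a log record. The server address and the parameter names (`space_id`, `w`, `h`, `src`) are assumptions made for this example only; actual alliances define their own formats.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical front-end server address, used only for illustration.
AD_SERVER = "http://ads.example.com/request"


def build_ad_request(space_id, width, height, source_page):
    """Encode the fields named in the text as HTTP query parameters."""
    params = {"space_id": space_id, "w": width, "h": height, "src": source_page}
    return AD_SERVER + "?" + urlencode(params)


def parse_ad_request(url):
    """Recover a log record from a request URL built by build_ad_request."""
    query = parse_qs(urlsplit(url).query)
    return {
        "space_id": query["space_id"][0],
        "width": int(query["w"][0]),
        "height": int(query["h"][0]),
        "source_page": query["src"][0],
    }
```

A front-end server would append one such record to the advertisement request log for every request it receives.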
After the client equipment receives the advertisement data sent back by the advertisement front-end servers, a browser renders the advertisement data and displays it based on the relevant parameters in the advertisement request. In a normal display, nothing in the displayed web page blocks or conceals the corresponding advertisement display zone.
Typically, by counting the number of advertisement requests in the advertisement request log, the advertising alliance generates page view volume reports of different granularities (for example, by website, advertising space, or source page), and these page view volume reports guide the subsequent apportionment of earnings. The page view volume reports are therefore the basic data on which the Internet advertising alliances and the website owners settle accounts. To increase earnings, some website owners employ improper technical means to inflate the page view volumes of website advertisements, resulting in serious discrepancies between the number of advertisement requests and the number of advertisement displays. These discrepancies harm the interests of the website owners and the Internet advertising alliances.
The technical means that some website owners employ to raise website advertisement page view volumes mainly include the following:
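The counting described above can be sketched as follows. This is a minimal illustration that assumes each log record is a dictionary with hypothetical `space_id` and `source_page` keys, and that the website is identified by the host part of the source page URL.

```python
from collections import Counter
from urllib.parse import urlsplit


def page_view_reports(log_records):
    """Count advertisement requests at three granularities.

    Each record is assumed to be a dict with 'space_id' and
    'source_page' keys (names chosen for this example).
    """
    by_website = Counter()
    by_space = Counter()
    by_source_page = Counter()
    for record in log_records:
        by_website[urlsplit(record["source_page"]).netloc] += 1
        by_space[record["space_id"]] += 1
        by_source_page[record["source_page"]] += 1
    return {
        "website": by_website,
        "advertising_space": by_space,
        "source_page": by_source_page,
    }
```

Because the reports count requests rather than actual displays, any technique that generates requests without a visible advertisement inflates them directly.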
1. No advertising code is embedded in the page, and an advertisement request is automatically counterfeited by a program.
2. An advertising code is embedded in the page, but the advertising space is concealed using an iframe or some other technique. For example, page A has a fixed flow but no idle advertising space, while page B has idle advertising space but no flow. Here, flow refers to ad traffic or ad page views. Page A conceals page B within itself using an iframe. When a user visits page A, the visit triggers a page B advertisement request. However, because the entire page B is concealed, the advertisement requested for page B is never displayed.
3. An advertising code is repeatedly embedded in a page to duplicate flow. For example, when a user visits a page, a plurality of advertisement requests are triggered, but only one advertisement display is generated.
4. Page advertisements are stacked on top of one another, and some advertisements are blocked by other advertisements, so that the intended advertising result is not achieved.
5. Advertising space positions are falsely reported. For example, the website owner declares to the advertising alliance that the advertising space is displayed on the browser home screen when in fact the advertisement is displayed somewhere other than on the browser home screen.
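Items 4 and 5 above come down to simple rectangle geometry once the actual on-screen position of an advertising space is known. The sketch below is an illustration only: the `Rect` type and function names are hypothetical, coordinates are assumed to be in pixels relative to the top-left corner of the page, and coverage is computed against a single overlapping rectangle (several overlapping rectangles would require a union computation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float

    @property
    def area(self):
        return self.width * self.height


def intersection_area(a, b):
    """Area shared by two axis-aligned rectangles (0 if they are disjoint)."""
    w = min(a.left + a.width, b.left + b.width) - max(a.left, b.left)
    h = min(a.top + a.height, b.top + b.height) - max(a.top, b.top)
    return max(w, 0) * max(h, 0)


def covered_fraction(ad, other):
    """Fraction of the advertising space hidden behind one other rectangle."""
    if ad.area <= 0:
        return 0.0
    return intersection_area(ad, other) / ad.area


def on_first_screen(ad, viewport):
    """True if the advertising space lies entirely inside the initial viewport."""
    return ad.area > 0 and intersection_area(ad, viewport) == ad.area
```

An advertising space with zero area, as can result from a concealed iframe, is never reported as being on the first screen.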
The current advertising volume on the Internet is quite large. Discovering advertisement display problems through manual sampling therefore requires a large expenditure of manpower and time. In addition, manual sampling has rather poor coverage and is inefficient. Two main solutions have been used to increase the coverage and efficiency of advertisement display sampling:
In a first solution, the browser executes JavaScript code provided by background servers, collects advertisement page information (such as the advertiser ID and the display space size), and adds the collected advertisement page information as parameters to an advertisement request. Advertisement front-end servers record information relating to the advertisement request and perform data mining on the recorded advertisement request information to uncover abnormalities in the advertisement requests. For example, the advertisement front-end servers compile statistics on the time intervals between occurrences of the same advertisement request corresponding to the same advertiser ID. When the time intervals fall below a threshold value, a determination is made that the advertisement display has become abnormal.
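The interval check in this example can be sketched as follows. This is a minimal illustration, not the method of any particular alliance: it assumes that "the same advertisement request" means requests sharing an advertiser ID and an advertising space ID, that timestamps are in seconds, and that a single interval below the threshold is enough to flag the pair.

```python
from collections import defaultdict


def find_abnormal_requests(records, min_interval_seconds):
    """Return (advertiser_id, space_id) pairs whose requests repeat too quickly.

    records: iterable of (timestamp_seconds, advertiser_id, space_id) tuples.
    """
    timestamps = defaultdict(list)
    for ts, advertiser_id, space_id in records:
        timestamps[(advertiser_id, space_id)].append(ts)

    abnormal = set()
    for key, stamps in timestamps.items():
        stamps.sort()
        # Compare each request with the next one for the same key.
        for earlier, later in zip(stamps, stamps[1:]):
            if later - earlier < min_interval_seconds:
                abnormal.add(key)
                break
    return abnormal
```

A production system would more likely apply the threshold to aggregate statistics per key rather than to a single pair of requests.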
In a second solution, advertising page addresses (e.g., uniform resource locators (URLs) of advertising pages) are extracted from an advertisement request log, and a crawling technique is employed to capture advertising pages and advertising page scripts and to identify problems by restoring the page layout.
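The first step of this solution, extracting advertising page addresses from the log, might look like the sketch below. It assumes, purely for illustration, that the advertisement request log is stored as one JSON object per line with a hypothetical `source_page` field; the crawling and layout restoration steps are not shown.

```python
import json


def extract_advertising_page_urls(log_lines):
    """Collect distinct source page URLs from a JSON-lines request log.

    Malformed lines and records without a usable URL are skipped.
    """
    seen = set()
    urls = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        url = record.get("source_page")
        if (
            isinstance(url, str)
            and url.startswith(("http://", "https://"))
            and url not in seen
        ):
            seen.add(url)
            urls.append(url)
    return urls
```

The resulting list is the input a crawler would work through, page by page.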
The above conventional solutions include at least the following limitations.
In the first solution, JavaScript code is executed by a browser to acquire DOM (Document Object Model) node information. The JavaScript code has a certain page layout-acquiring capability and can roughly locate the advertising space. However, because of browser security restrictions, JavaScript code executed inside multi-level nested iframes has only limited access to the top-level page, so restoring the actual layout of the page is not possible. Moreover, because the technology used to build web pages is complex, assessing blocking or concealment from the XHTML, HTML, CSS, and JavaScript used to build a page is relatively inaccurate.
In the second solution, crawling and capturing advertising pages and scripts of advertising pages requires a powerful browser core engine to render HTML (Hypertext Markup Language) and CSS (Cascading Style Sheets) and to execute page scripts correctly. Because of the complexity, diversity, multi-level nesting, Ajax requests, Flash media, browser core engine compatibility, and other such problems involved in the technologies employed by web pages, relying solely on analysis of page code is a relatively inaccurate way of restoring the actual display condition of advertisements.