This disclosure relates generally to online systems, and more specifically to processing data received at a data processing system of an online system.
Online systems, such as social networking systems, allow users to connect to and to communicate with other users of an online system. Users may create profiles on an online system that are tied to their identities and include information about the users, such as interests and demographic information. The users may be individuals or entities such as corporations or charities. Online systems allow users to easily communicate and to share content with other online system users by providing content to an online system for presentation to other users. Content provided to an online system by a user may be declarative information provided by a user, status updates, check-ins to locations, images, photographs, videos, text data, or any other information a user wishes to share with additional users of the online system. An online system may also generate content for presentation to a user, such as content describing actions taken by other users on the online system.
Additionally, many online systems allow publishing users (e.g., businesses) to sponsor presentation of content on an online system to gain public attention for a publishing user's products or services or to persuade other users to take an action regarding the publishing user's products or services. Content that an online system presents to users in exchange for compensation is referred to as “sponsored content.” Many online systems receive compensation from a publishing user for presenting online system users with certain types of sponsored content provided by the publishing user. Frequently, online systems charge a publishing user for each presentation of sponsored content to an online system user or for each interaction with sponsored content by an online system user. For example, an online system receives compensation from a publishing user each time a content item provided by the publishing user is displayed to another user on the online system, each time another user is presented with the content item on the online system and interacts with the content item (e.g., selects a link included in the content item), or each time another user performs one or more particular actions after being presented with the content item (e.g., visits a website or physical location associated with the publishing user who provided the content item).
An online system that presents sponsored content to its users may provide the publishing user who supplied the sponsored content with various metrics describing certain actions performed by other users of the online system after being presented with the sponsored content. These metrics describe the effectiveness of the sponsored content at eliciting those actions. For example, an online system presents users with a content item and, based on information received from client devices on which users interact with the content item, maintains a number of users who select a link included in the content item or a number of times the users visit a website associated with the content item during a particular time interval. Based on the number of users who selected the link included in the content item or the number of times the users visited the website associated with the content item after being presented with the content item, the online system determines a metric and includes the metric in a report describing the content item's effectiveness. The report is provided to a publishing user associated with the content item.
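The metric determination described above can be illustrated with a minimal Python sketch. The `Event` record, its field names, the `"link_click"` action label, and the `click_metrics` function are hypothetical and chosen only for illustration; the disclosure does not specify a record layout or a particular metric.

```python
from collections import namedtuple

# Hypothetical event record; the field names are illustrative assumptions.
Event = namedtuple("Event", ["user_id", "content_id", "action", "timestamp"])


def click_metrics(events, content_id, start, end):
    """Count link selections for one content item within [start, end)."""
    clicks = [
        e for e in events
        if e.content_id == content_id
        and e.action == "link_click"
        and start <= e.timestamp < end
    ]
    return {
        "total_clicks": len(clicks),
        # Distinct users who selected the link at least once.
        "unique_users": len({e.user_id for e in clicks}),
    }


# Example events as they might be received from client devices.
events = [
    Event("u1", "c1", "link_click", 10),
    Event("u1", "c1", "link_click", 20),
    Event("u2", "c1", "link_click", 30),
    Event("u3", "c2", "link_click", 15),   # different content item
    Event("u2", "c1", "site_visit", 40),   # different action
    Event("u4", "c1", "link_click", 120),  # outside the time interval
]
report = click_metrics(events, "c1", start=0, end=100)
```

In this sketch, the resulting `report` dictionary stands in for the metric that would be included in a report provided to the publishing user.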
Determining metrics describing actions performed by users of an online system often involves performing complex, resource-intensive operations on large amounts of data in short periods of time to accurately extract, analyze, and process information necessary for the production of meaningful reports. For example, to effectively generate metrics describing events associated with various content items presented on an online system during various time intervals, the online system quickly and accurately receives, formats, reads, analyzes, organizes, aggregates, stores, and presents the required information for various events, content items, and time intervals. To efficiently process the vast amount of information required to generate such metrics, online systems often use data processing systems capable of processing an incoming stream of data in a short amount of time. For example, a data processing system includes a network of data processing elements that perform a series of operations in a distributed manner among the various data processing elements to quickly process incoming data. In various implementations, a data processing system includes components operating on different computing devices and in different locations.
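As a rough illustration of a series of operations applied to an incoming stream of data, the following Python sketch chains three processing stages. It runs in a single process for simplicity; in a distributed system each stage could run on a separate data processing element. The stage names (`parse`, `keep_action`, `aggregate`) and the comma-separated record format are assumptions made for this example only.

```python
from collections import Counter


def parse(raw_lines):
    """Stage 1: format each raw line into a (content_id, action) pair."""
    for line in raw_lines:
        content_id, action = line.strip().split(",")
        yield content_id, action


def keep_action(records, action):
    """Stage 2: pass along only the records describing the given action."""
    for content_id, rec_action in records:
        if rec_action == action:
            yield content_id


def aggregate(content_ids):
    """Stage 3: aggregate a count of records per content item."""
    return Counter(content_ids)


# Hypothetical incoming stream of raw records.
raw = ["c1,link_click", "c2,link_click", "c1,site_visit", "c1,link_click"]
counts = aggregate(keep_action(parse(raw), "link_click"))
```

Because the stages are generators, each record flows through the chain one at a time, which loosely mirrors how a stream is handed from one processing element to the next.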
However, in some circumstances, data received at a data processing system may be lost or erroneously altered by the data processing system before or during processing, causing inaccurate determination of metrics. For example, a data processing system includes multiple data processing elements that each perform a specified process on individual pieces of data as a data analysis process is applied to the data, and the data processing system loses a piece of data during processing (e.g., a data processing element fails to process the piece of data because of a power failure or a logic error). If data is lost or erroneously altered (i.e., “corrupted”) by the data processing system and the loss or alteration is not detected, metrics based on the data may be incomplete or inaccurate. For example, if the online system performs a series of additive operations on data being processed by a data processing system to measure a number of times a user interacts with a particular content item after being presented with the content item, the measurement is inaccurately low if data describing user interactions with the content item is lost or corrupted by the data processing system during processing. Accordingly, metrics based on such data will also be inaccurately low if the online system does not detect and correct for the lost or corrupted data when determining the metrics. Hence, undetected loss or corruption of data by a data processing system of the online system may cause the online system to generate metrics that inaccurately describe performance of various content items presented on the online system.
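The undercounting effect described above can be shown with a small Python sketch. The `count_interactions` function and its `drop_index` parameter are hypothetical; `drop_index` only simulates a data processing element failing to process one record and does not model any particular failure mode.

```python
def count_interactions(records, content_id, drop_index=None):
    """Additively count interactions with a content item.

    If drop_index is given, the record at that position is skipped
    to simulate a piece of data lost during processing.
    """
    total = 0
    for i, (rec_content_id, count) in enumerate(records):
        if i == drop_index:
            continue  # simulated loss: this record is never processed
        if rec_content_id == content_id:
            total += count
    return total


# Hypothetical records: (content_id, number of interactions).
records = [("c1", 1), ("c1", 1), ("c2", 1), ("c1", 1)]

expected = count_interactions(records, "c1")                # no loss
observed = count_interactions(records, "c1", drop_index=1)  # one record lost
```

Here `observed` is lower than `expected` by exactly the lost record's contribution, and nothing in the additive count itself signals that a record is missing.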