With the development of cloud computing and big data, various computation models have been developed in the field of big data computation to perform processing and computation in different data computing scenarios. Filtering useful data out of massive user data is increasingly common. In particular, when a large batch of filtering requirements is input at one time, the target user group that meets each filtering requirement in the batch needs to be filtered from the massive user data.
A conventional solution is implemented by using a Map-Reduce framework program. Map-Reduce is a software framework for parallel computation over large volumes of data, which may process billions of data inputs in a few hours. Map-Reduce includes two basic stages: Map and Reduce. The Map stage mainly includes: (1) reading a large batch of filtering requirements, parsing the expressions included therein, establishing a correspondence relationship between the expressions and Map tables, obtaining the atomic expressions related to the Map tables, and performing deduplication on them; (2) reading, piece by piece, the massive user data recorded in the Map tables, and cyclically evaluating each piece of user data against the atomic expressions; and (3) according to the identifier (ID) of the user in the user data output from the Map tables, outputting, in the form of a list, the at least one atomic expression that the user satisfies. The Reduce stage mainly includes: (1) reading the large batch of filtering requirements, parsing the expressions included therein, establishing a correspondence relationship between the expressions and the Map tables, and obtaining the atomic expressions that each filtering requirement needs to satisfy, to form an atomic expression list for each filtering requirement; (2) reading the user data in the Map tables, combining the user data of each user across the Map tables, and, after the combination, obtaining the plurality of atomic expressions that each user satisfies, to form an atomic expression list for each user; and (3) combining the results obtained in (1) and (2) to obtain a correspondence relationship between the users and the filtering requirements, and outputting that correspondence relationship.
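The Map and Reduce stages described above can be illustrated with a minimal in-memory sketch. It is not the conventional implementation itself: the tuple format of an atomic expression, the names `REQUIREMENTS`, `MAP_TABLES`, `map_stage`, and `reduce_stage`, the sample data, and the assumption that a filtering requirement is satisfied only when all of its atomic expressions are satisfied are all hypothetical choices made for illustration. The correspondence between expressions and Map tables is approximated here by checking whether a record contains the field an expression refers to.

```python
from collections import defaultdict
import operator

# Hypothetical atomic-expression format: (field, comparison, value).
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

# Hypothetical filtering requirements; each is assumed to be an AND of its atoms.
REQUIREMENTS = {
    "req_a": [("age", ">", 30), ("city", "==", "Paris")],
    "req_b": [("age", ">", 30)],
}

# Hypothetical Map tables: data for one user may be spread over several tables.
MAP_TABLES = [
    [{"id": "u1", "age": 35}, {"id": "u2", "age": 25}],
    [{"id": "u1", "city": "Paris"}, {"id": "u2", "city": "Paris"}],
]


def map_stage(requirements, tables):
    """Emit (user_id, [satisfied atomic expressions]) for every record."""
    # Map step (1): collect the atomic expressions and deduplicate them.
    atoms = {atom for exprs in requirements.values() for atom in exprs}
    emitted = []
    for table in tables:
        for record in table:  # Map step (2): read the data piece by piece.
            satisfied = [
                (field, op, value)
                for (field, op, value) in atoms
                if field in record and OPS[op](record[field], value)
            ]
            # Map step (3): output the satisfied atoms keyed by user ID.
            emitted.append((record["id"], satisfied))
    return emitted


def reduce_stage(requirements, emitted):
    """Return {requirement name: sorted list of matching user IDs}."""
    # Reduce step (2): combine the per-record lists of each user.
    satisfied_by_user = defaultdict(set)
    for user_id, atoms in emitted:
        satisfied_by_user[user_id].update(atoms)
    # Reduce steps (1) and (3): compare each requirement's atom list with
    # each user's atom list to relate users to requirements.
    matches = defaultdict(list)
    for user_id, satisfied in satisfied_by_user.items():
        for name, exprs in requirements.items():
            if set(exprs) <= satisfied:
                matches[name].append(user_id)
    return {name: sorted(users) for name, users in matches.items()}


result = reduce_stage(REQUIREMENTS, map_stage(REQUIREMENTS, MAP_TABLES))
print(result)
```

In this sample data, user `u1` satisfies both atomic expressions and is therefore matched to both requirements, while `u2` satisfies only the city expression and is matched to neither.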
The method for filtering data objects provided in the above conventional techniques has obvious deficiencies.
The method provided in the conventional techniques is implemented based on the Map-Reduce framework program. After a large batch of filtering requirements is input at one time, a very large amount of data computation is required. Assuming that the number of filtering requirements is R, the average number of expressions in each filtering requirement is E, and the number of users is N, the total amount of data computation needed to screen and classify the users is R*E*N, which is a very large amount of computation and leads to a long computation time. In addition, because the amount of computation is proportional to R, the time required to screen and classify large volumes of data grows with every filtering requirement that is added, and therefore cannot meet the service requirements of filtering large volumes of data.
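The R*E*N estimate can be checked with a small counting sketch. The function `count_evaluations` and the sample sizes are hypothetical and chosen only to keep the numbers small; the sketch assumes that every expression of every filtering requirement is evaluated once per user, with no sharing of results between requirements.

```python
def count_evaluations(requirements, users):
    """Count expression evaluations when every requirement is checked per user."""
    evaluations = 0
    for _user in users:
        for expressions in requirements:
            for _expression in expressions:
                evaluations += 1  # one evaluation of one expression for one user
    return evaluations


# Hypothetical sizes: R requirements, E expressions each, N users.
R, E, N = 4, 3, 5
requirements = [[f"expr_{r}_{e}" for e in range(E)] for r in range(R)]
users = range(N)

total = count_evaluations(requirements, users)
doubled = count_evaluations(requirements * 2, users)  # twice as many requirements
print(total, R * E * N, doubled)
```

The counted total equals R*E*N, and doubling the number of requirements doubles the count. As a purely illustrative scale, R = 1,000, E = 5, and N = 100,000,000 would give 5 × 10^11 expression evaluations.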