The present invention generally relates to the field of data analysis, and more specifically, to a method and system for cross-site data analysis.
At present, networks have become a common medium through which people access, browse, store, and exchange information on a daily basis. From the perspective of an end user, interaction with network information may be performed through a site on the network (or simply a “website”). With the development of network technology, more and more sites can mine and study user features, for example, interaction habits, preferences, and interests, using technologies such as data analysis, and on this basis provide personalized and/or customized information services to users. For example, a video service site can infer from a user's browsing history and previous interactive behaviors which type of information the user potentially prefers, and recommend or prominently display video clips related to this type of information.
However, different sites, and even different columns of the same site, might adopt different algorithms and mechanisms to perform data analysis about the user, which hinders improvement of the user experience and of operation efficiency. Specifically, suppose a site has accumulated knowledge about a user through a certain period of analysis and study, and can thereby provide customized information services. When the user accesses another site, the knowledge accumulated at the previous site cannot be utilized by the current site, possibly even when the two sites are run by the same provider. Therefore, upon interaction at the new site, the user cannot directly obtain customized, personalized services, but has to wait for the site to learn the user's features from the beginning using data analysis.
A feasible approach to addressing the above problem is to leverage the usernames of the user at different sites. It will be appreciated that many sites require a user to register as a member before allowing the user to use the functions of the site. A username at a site is generally selected by the user and is composed of, for example, letters, digits, and certain specific symbols. The prior solution is generally based on the following supposition: if the same username appears at two sites, the username is deemed to correspond to the same user. Correspondingly, the user knowledge and analysis results related to the username may be shared between the two sites.
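By way of illustration only, the username-based supposition may be sketched as follows. This is a minimal sketch and not a description of any particular prior system: the function name `share_by_username`, the representation of a site's knowledge as a mapping from username to a set of inferred interests, and the sample data are all assumptions introduced for this example.

```python
from typing import Dict, Set

# Hypothetical representation of one site's accumulated knowledge:
# username -> set of interests inferred for that username.
SiteProfiles = Dict[str, Set[str]]


def share_by_username(site_a: SiteProfiles, site_b: SiteProfiles) -> SiteProfiles:
    """Combine the knowledge of two sites for every username present at both.

    Rests on the supposition that an identical username denotes the same user.
    """
    shared: SiteProfiles = {}
    # Only usernames that appear at both sites are treated as one user.
    for username in site_a.keys() & site_b.keys():
        shared[username] = site_a[username] | site_b[username]
    return shared


# Illustrative data only.
video_site = {"alice_01": {"documentaries"}, "bob": {"sports"}}
news_site = {"alice_01": {"science"}, "bobby77": {"football"}}

merged = share_by_username(video_site, news_site)
print(merged)
```

In this sketch only `alice_01` is registered at both sites, so only that username's knowledge is shared; `bob` and `bobby77` are never linked, whether or not they belong to one person.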
However, the same user might have different usernames at different sites. First, the user naming mechanisms of sites operated by different providers are usually isolated from each other, and different providers may adopt different username registration mechanisms. Moreover, the user may adopt different usernames at different sites for various other reasons, for example, because the desired username has already been registered by another user, or simply by personal choice. Therefore, an approach that performs cross-site data analysis relying merely on identical usernames still has defects in reliability and stability.