A supplier of services and/or content generates traffic in a telecommunication network. This traffic is received by a user terminal. The services and content covered in this description may be telephone-based, such as services from a customer care department or an account statement read by a speech generator; internet-based, such as HTML, XML or Java code from a web server; or video-based, such as real-time multicast, broadcast and video on demand. The various types of data sources are naturally adapted to the service and/or content delivered, but for the purpose of this description any device that sends data on a network is referred to as a “content source.” The term “user terminal” is similarly used to describe any device that receives traffic, e.g. services and/or content, from, for example, a telecommunication network. In this description the term “user terminal” therefore includes phones, mobile phones, computers with web browsers, VCRs and any other device that can receive data from a telecommunication network.
Please note that certain user terminals, e.g. a computer with a browser, can have several features in common with certain content sources, such as a computer with a web server. In these cases the terms “user terminal” and “content source” specify which role the device plays in the network, and any known technical device or other type of equipment sending and/or receiving data is covered by these terms.
A supplier of services and/or content will from time to time need to analyze traffic related to the service and/or content provided. The purpose of a traffic analysis could be, for example, to measure performance in one form or another, to dimension the content source and/or modify traffic based on measured performance, and/or to measure user response to new content or a new service. “Performance” should be interpreted broadly and may include, e.g., response times, readability (code quality), a user's subjective perception of a web page, etc. In some applications it is therefore relevant to measure users' subjective perception of the content or service in addition to technical parameters.
Any statistical analysis collects data representing one or several variables or parameters in a number, hereinafter N, of mutually independent experiments. Different hypotheses may then, e.g., be tested and/or the effect of each parameter on the overall result be estimated.
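As an illustration only, the estimation step described above can be sketched as follows. The sketch assumes that the N observations are independent measurements of a single parameter (here a hypothetical response time in milliseconds) and uses a normal approximation for the confidence interval; the function name, the sample values and the 1.96 factor are assumptions introduced for this example and are not taken from the description.

```python
import math
import statistics


def summarize(samples):
    """Return (n, mean, half_width) for an approximate 95% confidence interval.

    Assumes the samples are mutually independent observations of one parameter.
    The factor 1.96 is the normal approximation; for small N a Student-t
    quantile would give a wider, more appropriate interval.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two independent observations")
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    return n, mean, 1.96 * std_err


# Hypothetical response times (ms) reported by N = 6 user terminals.
response_times_ms = [120.0, 135.0, 128.0, 110.0, 142.0, 125.0]
n, mean, half_width = summarize(response_times_ms)
print(f"N={n}, mean={mean:.1f} ms, approx. 95% CI +/- {half_width:.1f} ms")
```

The width of the interval shrinks roughly with the square root of N, which is one reason a sufficiently large and representative sample of user terminals matters for the analysis.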
Telecommunication traffic represents a significant amount of data, especially if several parameters for each session between a content source and one or multiple (simultaneous) user terminals/users are to be registered, stored and analyzed.
A first high-level example of a service is a website or a cable-TV provider delivering a video on demand service. Assume that the provider wants the video to be displayed to the user within a predetermined time limit, and that the practical data speed through the network is sufficient for the user to watch the video uninterrupted while the remaining part of the video is downloaded in the background. A provider of such services will be interested in measuring a variety of parameters, e.g. initial loading time and number of interruptions. These and other parameters may be obtained from the user terminal without the user's involvement. There is a need to obtain data from a representative sample of user terminals in order to compile relevant statistics.
A second high-level service example is “social media” services, where users themselves provide name, age, gender and other information to the service. This information is later used to present targeted advertising. The value of the ads increases for advertisers because the ads are presented to selected, targeted users, in effect ensuring that, e.g., teenage girls do not see the same ads as middle-aged men and vice versa, so that the users perceive the advertising messages as more relevant than they would otherwise have been. In order to attract readers in high volumes, the service and website need to be perceived as being as relevant, useful and/or attractive as possible. The goal of attractiveness will not be met if the website contains a lot of irrelevant advertisements and other elements perceived by users as noise. The service provider may in this case want to measure how long the average user stays on the site, how the site actually looks in different browsers and/or a user's subjective perception of advertising and other site content.
Google Analytics is a tool for logging how users use a website. The logs can in turn be analyzed in several dimensions, for example showing where users come from, which pages the users visited on the site, how long users stayed on any given web page, etc. This system is today one of the most common web analytics tools, providing quantitative analysis of web users' use of websites based on technical tracking logs.
Analyzing high-level traffic data poses inherent challenges. For example, registering which web pages were sent to which user, recording telephone conversations and logging other related user behaviour may conflict with both national regulations and the supplier's privacy policy. At the same time, it is important to ensure a representative sample of sufficient size from a population in order to provide reliable analyses.
Unauthorized tracking of a user's Internet behaviour using so-called spyware, for the purpose of analysis and of sending unsolicited advertising, so-called spam, is a widely known problem. Such unauthorized user tracking is at best perceived negatively and can be used for fraud or other criminal purposes. A professional supplier of content and/or services will usually not want to be associated with such activities.
A number of the dynamic parameters involved in high-level traffic are not necessarily relevant when analyzing the traffic. For example, analyzing traffic to and from a particular website may be relevant in a given study, while other, older and/or more proven websites may be irrelevant. To simplify the analysis and improve precision, such irrelevant parameters may be filtered out in the collection phase of the study, rather than first saving the data and then discarding it in the analysis phase.
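A minimal sketch of such filtering in the collection phase is given below, purely as an illustration. It assumes that each observed event carries a URL and a dictionary of measured values; the host names, parameter names and the function `collect` are hypothetical and introduced only for this example.

```python
from urllib.parse import urlsplit

# Hypothetical study definition: which sites and parameters are relevant.
RELEVANT_HOSTS = {"shop.example.com"}
RELEVANT_PARAMETERS = {"response_time_ms", "interruptions"}


def collect(events, hosts=RELEVANT_HOSTS, parameters=RELEVANT_PARAMETERS):
    """Yield only records relevant to the study.

    Irrelevant events and parameters are dropped here, before storage,
    instead of being saved and discarded later in the analysis phase.
    """
    for event in events:
        host = urlsplit(event["url"]).hostname
        if host not in hosts:
            continue  # traffic to or from an irrelevant site is never stored
        kept = {k: v for k, v in event["values"].items() if k in parameters}
        if kept:
            yield {"host": host, "values": kept}


# Hypothetical events observed at a user terminal.
events = [
    {"url": "https://shop.example.com/cart",
     "values": {"response_time_ms": 210, "scroll_depth": 0.4}},
    {"url": "https://old.example.org/",
     "values": {"response_time_ms": 95}},
]
stored = list(collect(events))
print(stored)
```

In this sketch only the host name and the selected parameter values are kept, so the amount of stored data is reduced at the same point where the irrelevant parameters are removed.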
In the following, the term “one question” is used to describe a request for a given (statistical) parameter for a given web page, website or service. Requests for the same parameter, e.g. response time, from two different services are therefore two different questions. Similarly, requests for two different parameters of one service, e.g. response time and a user's subjective perception, are two separate questions. A collection of such questions is in the following called a “questionnaire”. It should be noted that some parameter values asked about in such a “survey”, for example response time and other time values, code quality, etc., may be filled in by the user terminal without human intervention, while other parameter values, such as subjective perceptions of the content on a web page, must be supplied by a user. In both cases the returned response originates from the user terminal.
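The terminology above can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class names `Question` and `Questionnaire`, the field `needs_user` and the service and parameter names are hypothetical and are not defined in the description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Question:
    """One question: one parameter requested for one page, site or service."""
    service: str
    parameter: str
    needs_user: bool = False  # True if a human must supply the value


@dataclass
class Questionnaire:
    """A collection of distinct questions."""
    questions: list = field(default_factory=list)

    def add(self, question):
        # The same parameter for the same service is the same question.
        if question not in self.questions:
            self.questions.append(question)

    def automatic(self):
        """Questions the user terminal can answer without human intervention."""
        return [q for q in self.questions if not q.needs_user]

    def manual(self):
        """Questions that must be answered by a user."""
        return [q for q in self.questions if q.needs_user]


survey = Questionnaire()
survey.add(Question("service-a", "response_time"))
survey.add(Question("service-b", "response_time"))  # same parameter, other service
survey.add(Question("service-a", "subjective_perception", needs_user=True))
survey.add(Question("service-a", "response_time"))  # duplicate, ignored
print(len(survey.questions), len(survey.automatic()), len(survey.manual()))
```

Making `Question` immutable and comparable by value means that two requests count as the same question only when both the service and the parameter coincide, which mirrors the definition given above.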
One purpose of the present invention is therefore to provide a solution that avoids the aforementioned problems of known technology, namely a solution for obtaining representative data on high-level traffic between a content source and a user terminal while addressing privacy and/or anonymity concerns.
Another aspect of the present invention is to increase the accuracy of the analysis by removing noise from irrelevant parameters.