Streamed media, particularly video, represents an increasingly large percentage of the data delivered over the Internet and other networks. As the popularity of media streaming increases, network providers, content providers, and other service providers are increasingly evaluated by their customers based on their ability to deliver media at a high standard of quality. As such, it is ineffective to treat media as simply another category of Internet traffic, represented merely in standard network metrics such as gigabits per second; rather, it behooves service providers to obtain an accurate understanding of the customer's experience by analyzing how individual media sessions were delivered to the customer. However, tracking all the possible ways media can be delivered to a device is impractical due to time and resource constraints.
A media session can be defined as a single, specific instance of an end user viewing a streaming video or audio clip on a device, as perceived by the end user, independent of how the content was delivered over the network. For example, an end user might view a short 30-second video on a media site such as YouTube, or watch a full-length movie on another site such as Netflix. Both of these are examples of a media session, even though the specifics of how the content was delivered might be very different.
Conventional tools built for analyzing network traffic are of limited use for media session analysis. Initially, media streaming over the Internet was relatively simple; most streaming services used a single Transmission Control Protocol (TCP) flow and a single Hypertext Transfer Protocol (HTTP) request/response (that is, an "interaction") to obtain the entire media for a session. Therefore, there was a one-to-one correspondence between TCP flows and HTTP interactions, and a one-to-one correspondence between HTTP interactions and media sessions. However, streaming services have evolved to frequently use parallel TCP flows to optimize delivery, and within each flow multiple HTTP interactions are used to stream the media content in smaller fragments. In such configurations, there is a one-to-many correspondence between TCP flows and HTTP interactions, and a many-to-one correspondence between HTTP interactions and media sessions. As such, to accurately model a modern media session, a conventional analytical tool would have to search through the entirety of the HTTP interactions issued by a subscriber, locate all the fragments related to the specific session, and combine all the fragments in order to complete the analysis. This is a complex procedure that must be performed without error, as a single missing fragment will render a streaming media session useless. To complicate matters further, streaming services often utilize multiple servers for delivering media content, which results in an individual media session often being composed of fragments from multiple servers, each of which uses multiple flows and multiple interactions.
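The many-to-one relationship described above can be illustrated with a minimal Python sketch. It is not drawn from any particular tool: the `HttpInteraction` record, its field names, and in particular the `session_key` label are hypothetical. The sketch assumes each interaction already carries a key tying it to a media session, which is exactly the information a conventional tool does not have and would need to infer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpInteraction:
    """One HTTP request/response observed for a subscriber (hypothetical record)."""
    subscriber: str
    server: str          # server that delivered the fragment
    flow_id: int         # TCP flow that carried the interaction
    session_key: str     # assumed label linking the fragment to a media session
    fragment_index: int  # position of the fragment within the media


def group_into_sessions(interactions):
    """Collect interactions into media sessions, ordered by fragment index."""
    sessions = defaultdict(list)
    for interaction in interactions:
        sessions[(interaction.subscriber, interaction.session_key)].append(interaction)
    for fragments in sessions.values():
        fragments.sort(key=lambda f: f.fragment_index)
    return dict(sessions)


def missing_fragments(fragments):
    """Return fragment indices absent between 0 and the highest index seen."""
    seen = {f.fragment_index for f in fragments}
    if not seen:
        return []
    return [i for i in range(max(seen) + 1) if i not in seen]


# Illustrative data: one session spread over two servers and two flows,
# with fragment 2 never observed, plus an unrelated short clip.
interactions = [
    HttpInteraction("sub-1", "server-a", 1, "movie", 0),
    HttpInteraction("sub-1", "server-b", 2, "movie", 3),
    HttpInteraction("sub-1", "server-a", 1, "movie", 1),
    HttpInteraction("sub-1", "server-a", 3, "clip", 0),
]

sessions = group_into_sessions(interactions)
movie = sessions[("sub-1", "movie")]
movie_servers = {f.server for f in movie}
movie_flows = {f.flow_id for f in movie}
movie_gaps = missing_fragments(movie)
```

Even with the session key given, the sketch has to scan every interaction for the subscriber, and the gap check shows how a single unobserved fragment leaves the reconstructed session incomplete.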