The Internet is increasingly used by users and consumers for sharing data such as personal photos or videos. Multimedia resources available via the Internet are often associated with metadata. For example, a video file may have associated metadata such as subtitles, comments, annotations and ratings. An Internet trend, particularly in relation to social networks, is to comment on, rank, share and recommend online videos to other users. As a consequence, more and more metadata associated with online multimedia data such as videos is generated, made available online and later aggregated with the online multimedia data. Other examples of metadata associated with multimedia data such as video data include output data provided by object content recognition modules or information obtained from sensors embedded in a video recording device. When retrieving a part of a recorded and analysed video, a user may be interested in obtaining, in addition to the video data, the analysis results for different purposes (augmented display, training, monitoring, etc.).
In 2008, the W3C initiated a standardization process to specify an addressing scheme based on the uniform resource identifier (URI) mechanism. A URI identifies a resource on the Internet and also provides a feature for addressing subparts of a resource, referred to as a fragment identifier. Based on the fragment identifier, the Media Fragments working group is defining a syntax to address specific parts of media resources available on the Internet. This syntax is referred to as Media Fragments URI. The specification defines the fragment syntax as well as the communication protocol that enables efficient transmission, between a client and a server, of media subparts addressed using the Media Fragments scheme. Indeed, a Media Fragments-aware server will interpret the Media Fragments request and serve only the relevant part of the resource, thus optimizing the usage of network bandwidth and saving user processing resources. Moreover, this addressing scheme enables requests to be expressed independently of the media representation format.
The Media Fragments URI specification defines syntax for addressing specific parts of multimedia resources on the Web. Several kinds of addressing co-exist: temporal, spatial, track-based and id-based.
Temporal addressing relates to time-varying resources, such as video or audio streams, and enables a specific time segment or starting point in a media resource to be referenced. The following are examples of temporal Media Fragment addressing in a video resource:
    http://contentServer/video.mp4#t=10,20
    http://contentServer/video.mp4#t=35
The first URI references a temporal segment starting at the tenth second and ending at the twentieth second (a duration of 10 seconds), extracted from the “video.mp4” resource located on “contentServer”.
The second URI references, for the same video resource, a temporal segment starting at time t=35 seconds and running until the end of the resource.
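As an illustration of the temporal syntax described above, the following minimal Python sketch extracts the start and end times from such a URI. The function name parse_temporal_fragment is hypothetical, and the sketch assumes times expressed in plain seconds only; it is not a complete implementation of the Media Fragments grammar.

```python
from urllib.parse import urlsplit


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds for a '#t=start,end' fragment.

    end is None when only a start time is given (segment runs to the end).
    Returns None when the URI carries no temporal dimension.
    Assumption: times are plain seconds, e.g. 't=10,20' or 't=35'.
    """
    fragment = urlsplit(uri).fragment
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name == "t":
            start_text, _, end_text = value.partition(",")
            start = float(start_text) if start_text else 0.0
            end = float(end_text) if end_text else None
            return start, end
    return None


first = parse_temporal_fragment("http://contentServer/video.mp4#t=10,20")
second = parse_temporal_fragment("http://contentServer/video.mp4#t=35")
```

Here first holds the pair (10.0, 20.0), while second holds (35.0, None), the missing end time standing for "until the end".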
Spatial addressing enables a spatial area in a video or in a picture to be referenced. The syntax is simple: it specifies the position of the top-left corner (x,y) followed by the width and height of the selected area. The following are examples of spatial Media Fragments:
    http://contentServer/video.mp4#xywh=45,70,480,340
    http://contentServer/picture1.jpg#xywh=percent:25,25,50,50
The first URI references a spatial area whose top-left corner is at point (45, 70), with a size of 480×340 pixels, extracted from the “video.mp4” resource located on “contentServer”.
The second URI references, for a picture, a spatial area whose top-left corner lies at 25% of the original width and 25% of the original height, with a size of 50% of the original size in each dimension.
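The two spatial forms above (pixel and percent) can be sketched in Python as follows. The helper names parse_spatial_fragment and to_pixels, and the 800×600 picture size used in the usage lines, are assumptions introduced only for illustration.

```python
from urllib.parse import urlsplit


def parse_spatial_fragment(uri):
    """Return (unit, (x, y, w, h)) for a '#xywh=...' fragment, else None.

    unit is 'pixel' (the default when no prefix is given) or 'percent'.
    """
    fragment = urlsplit(uri).fragment
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name == "xywh":
            unit, separator, numbers = value.partition(":")
            if not separator:  # no 'unit:' prefix, so values are pixels
                unit, numbers = "pixel", value
            x, y, w, h = (int(n) for n in numbers.split(","))
            return unit, (x, y, w, h)
    return None


def to_pixels(region, frame_width, frame_height):
    """Convert a parsed region to pixel coordinates for a given frame size."""
    unit, (x, y, w, h) = region
    if unit == "percent":
        return (x * frame_width // 100, y * frame_height // 100,
                w * frame_width // 100, h * frame_height // 100)
    return (x, y, w, h)


video_area = parse_spatial_fragment(
    "http://contentServer/video.mp4#xywh=45,70,480,340")
picture_area = parse_spatial_fragment(
    "http://contentServer/picture1.jpg#xywh=percent:25,25,50,50")
# Assumed picture size of 800x600 pixels, for illustration only.
picture_pixels = to_pixels(picture_area, 800, 600)
```

With the assumed 800×600 picture, the percent-based area resolves to a 400×300 region whose top-left corner is at (200, 150).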
Track addressing enables one or more tracks of the resource to be referenced, provided that the list of tracks composing the multimedia resource is known, as illustrated in the examples below:
    http://contentServer/movie.mp4#track=video
    http://contentServer/movie.mp4#track=video&track=audio_fr
The first URI references the video track of the resource called “movie.mp4”, while the second references both the video and audio_fr tracks.
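The condition that the track list must be known can be made concrete with a short Python sketch. The function names and the list of available tracks below are hypothetical; a real client would obtain the track names from the resource itself or from its description.

```python
from urllib.parse import parse_qs, urlsplit


def requested_tracks(uri):
    """Return the track names listed in the fragment, in order of appearance."""
    return parse_qs(urlsplit(uri).fragment).get("track", [])


def select_tracks(uri, available_tracks):
    """Check the requested tracks against the known tracks of the resource."""
    wanted = requested_tracks(uri)
    unknown = [name for name in wanted if name not in available_tracks]
    if unknown:
        raise ValueError(f"unknown track(s): {unknown}")
    return wanted


# Hypothetical track list for 'movie.mp4', assumed known in advance.
movie_tracks = ["video", "audio_en", "audio_fr"]
selection = select_tracks(
    "http://contentServer/movie.mp4#track=video&track=audio_fr", movie_tracks)
```

A request naming a track that is absent from the known list raises an error here, which reflects why track addressing is only usable once the composition of the resource is known.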
Id-based addressing enables a specific part of a multimedia resource that has been indexed a priori to be referenced. Example:
    http://contentServer/movie.mp4#id=Georges_kissing_Catherine
This URI references a temporal segment of the “movie.mp4” resource, potentially combined with a track selection and/or a spatial area (for example to focus on the characters), that the author has indexed with the identifier “Georges_kissing_Catherine”.
However, applying a URI with the fragment identifier “#” to request Media Fragments referenced in this way does not, by itself, inform the media server of the specific part requested by a user. Indeed, the content of the fragment identifier is by definition processed on the user side, since it is not transmitted from the client to the server. As a result, the Media Fragments working group defined, in its specification, a transport protocol for Media Fragments requests and responses, for example over HTTP. This consists in mapping the Media Fragments URI, i.e. the fragment identifier, onto the existing HTTP Range header. The client then issues a conventional HTTP GET request to the server, with the Range header filled in this way, to inform the media server of the requested parts of the media resource. On receiving the request, the server translates the HTTP Range parameters into extraction parameters to apply to the resource, for example the time interval from 10 seconds to 20 seconds. The server then extracts the requested content (or a subset, depending on the random accessibility of the resource, or the whole resource if no extraction is possible) and sends it back to the client in an HTTP response carrying a specific new HTTP header, Content-Range-Mapping, which indicates what has actually been extracted together with its correspondence in byte ranges. This provides an efficient way to exchange pieces of multimedia resources between a client and a server.
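The client-side mapping described above can be sketched in Python as follows. The function name build_range_request is hypothetical, and the Range value format “t:npt=start-end” is assumed here from the Media Fragments drafts; the exact header syntax should be checked against the specification. The sketch only builds the request line target and headers, and does not contact any server.

```python
from urllib.parse import urlsplit, urlunsplit


def build_range_request(uri):
    """Split a Media Fragments URI into (resource URI, HTTP headers).

    The fragment is removed from the URI, since it is never sent to the
    server, and its temporal dimension is copied into a Range header.
    Assumption: the Range value has the form 't:npt=start-end'.
    """
    parts = urlsplit(uri)
    resource = urlunsplit(parts._replace(fragment=""))
    headers = {}
    for pair in parts.fragment.split("&"):
        name, _, value = pair.partition("=")
        if name == "t":
            start, _, end = value.partition(",")
            headers["Range"] = f"t:npt={start}-{end}"
    return resource, headers


resource, headers = build_range_request(
    "http://contentServer/video.mp4#t=10,20")
# resource -> 'http://contentServer/video.mp4'
# headers  -> {'Range': 't:npt=10-20'}
```

The resulting pair could then be passed to any HTTP client to issue the GET request; a server that is not Media Fragments-aware would be expected to ignore the unknown range unit and return the whole resource.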
For XML metadata, the XPath language has been specified by the W3C. This language defines a syntax for writing queries that extract specific parts of an XML document. The XPath 1.0 language defines 4 data types and 7 node types. The XPath syntax also defines a grammar for building expressions that are applied to an XML document in order to extract parts of interest. XPath expressions can be grouped into 2 categories as follows:
1. «Navigation Expressions»:
These are expressions whose evaluation returns an ordered set of nodes: essentially LocationPath and Step expressions, which specify a path to resolve in a tree representation of an XML document.
2. «Computation Expressions»:
    a. Expressions returning a Boolean: OrExpr, AndExpr, RelationalExpr, EqualityExpr.
    b. Expressions returning a number: AdditiveExpr, MultiplicativeExpr.
    c. Expressions returning any kind of type: FilterExpr and FunctionCall.
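The navigation category above can be illustrated with a short Python sketch using the standard xml.etree.ElementTree module, which supports only a limited subset of XPath (location paths and simple predicates). The sample document, its element names and its attribute names are hypothetical. Computation expressions such as function calls are not supported by this module and would require a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document describing annotations on a video.
DOC = """
<annotations>
  <annotation start="5" end="12"><comment>Opening scene</comment></annotation>
  <annotation start="15" end="18"><comment>First dialogue</comment></annotation>
</annotations>
"""

root = ET.fromstring(DOC)

# Navigation expression: a location path made of two steps.
comments = root.findall("./annotation/comment")
comment_texts = [node.text for node in comments]

# Location path with a predicate comparing an attribute to a value.
selected = root.findall("./annotation[@start='15']/comment")
selected_texts = [node.text for node in selected]
```

Both paths can only be written because the element names “annotation” and “comment” and the attribute “start” are known beforehand.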
In order to be able to extract an XML fragment from an XML document, the content of the XML document should be known at the time of writing the XPath expression. This can be achieved through knowledge of the XML schema information or by means of a user interface displaying the structure of the document so that a user can select parts of interest.
The Media Fragments URI techniques discussed above do not however focus on metadata addressing, except when metadata is embedded into the media resource and reachable via #track addressing. Metadata distinct from the media resource is the most common case on the Internet.
Moreover, considering the variety of metadata and user-generated content on the Internet that may be related to a video file (e.g. comments, ratings, annotations, subtitles etc.), an XPath technique alone is not sufficient to extract Media Fragments from metadata documents, since XPath techniques do not enable “blind” addressing (i.e. addressing without knowledge of the document organization).
When a user requests specific parts of a video or multimedia presentation from an online server, it would be desirable to be able to filter the metadata associated with the video or multimedia data by extracting only the relevant parts that correspond to the requested part in the video or multimedia presentation. This would avoid the transmission of useless information over the network and would limit the buffering requirements at the client-side.