The Internet is a worldwide, publicly accessible network of interconnected computer networks that transmit data by packet switching using the standard Internet Protocol (“IP”). The “network of networks” consists of many smaller domestic, academic, business, and government networks that together enable various services, such as electronic mail, online chat, file transfer, and the interlinked web pages and other documents of the World Wide Web.
It has become very popular to distribute video and audio over the Internet. Especially since broadband Internet access has become more common, media clips are often embedded in or linked to web pages. Currently, a multitude of media clips is available online, with new websites frequently springing up to offer online media to users. One of the most popular of these sites is YouTube, provided by Google, Inc. of Mountain View, Calif., which features both media produced by established media sources and media produced by small groups and amateurs. Between March and July 2006, YouTube alone grew from 30 million to 100 million video views per day.
In addition to dedicated video sharing sites such as YouTube, many existing radio and television broadcasters provide Internet ‘feeds’ of their live audio and video streams (for example, the British Broadcasting Corporation). Broadcasters may also allow users to time-shift their viewing or listening.
Because there are so many websites providing online media, an Internet-connected device, such as a computer, game console, set-top box, handheld computer, cell phone, or other device, can be used to access online media in much the same way as was previously possible only with a television or radio receiver.
One of the most common formats used to distribute media on the Internet is the Flash Video or FLV format. Other common formats include Windows Media Video, RealMedia, QuickTime, and DivX. Online media encoded in many of these formats, including Flash Video, can either be streamed to a web browser or other client for online viewing or be downloaded to a storage device. Many users prefer to download pieces of media rather than stream them for a variety of reasons: a user may prefer to watch or listen to the media at a time when he or she may be offline; a user may prefer to archive a copy so that he or she will be able to watch or listen to the media in the future even if the website currently hosting it goes down; a user may prefer to share the file itself with others, rather than share a link to the file; or a user may prefer to download the media for a myriad of other reasons.
There are many ways to download media content from a web site. If the site provides a direct link to the file on the rendered page, downloading a media file may be as easy as right-clicking and selecting “Save Link As . . . ” or some similar command. There are also well-known methods for extracting explicit links from the HTML source of a web page, even if a web site does not render a direct link. However, it is a common practice for websites to make it difficult for users to download media files. One technique to make downloads difficult is to link to or embed a “wrapper” in a web page rather than directly linking to or embedding a media clip. Common examples of media wrappers include JavaScript media players, Flash media players, ActiveX media players, VBScript players, and the like.
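As an illustration of the well-known link-extraction approach mentioned above, the following is a minimal sketch only, not a description of any particular tool. It assumes that a "direct link" is an `href` or `src` attribute whose path ends in a recognizable media extension; the extension list and the names `MediaLinkParser` and `extract_media_links` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed, illustrative list of media file extensions.
MEDIA_EXTENSIONS = (".flv", ".wmv", ".rm", ".mov", ".avi", ".mpg", ".mp3")


class MediaLinkParser(HTMLParser):
    """Collects href/src attribute values whose path ends in a media extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                url = urljoin(self.base_url, value)
                # Ignore any query string when checking the extension.
                path = url.lower().split("?", 1)[0]
                if path.endswith(MEDIA_EXTENSIONS):
                    self.links.append(url)


def extract_media_links(html, base_url):
    """Return absolute URLs of explicit media links found in the HTML source."""
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Note that a page which embeds only a wrapper (for example, a reference to a player file with the clip identified in a query parameter) yields no links under this sketch, which is precisely why wrappers make downloads difficult.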
JavaScript is the name of the implementation, by the Mozilla Foundation of Mountain View, Calif., of the ECMAScript standard, a scripting language based on the concept of prototype-based programming. The language is best known for its use in websites (as client-side JavaScript), but is also used to enable scripting access to objects embedded in other applications.
Adobe Flash, or simply Flash, refers to the Adobe Flash Player from Adobe Systems Inc. of San Jose, Calif. The Flash Player is a client application available in most common web browsers. It features support for vector and raster graphics, a scripting language called ActionScript, and bi-directional streaming of audio and video. There are also versions of the Flash Player for mobile phones and other non-PC devices such as Internet Tablets, personal digital assistants, the Kodak EasyShare One camera from Eastman Kodak Co. of Rochester, N.Y., and the PlayStation Portable from Sony Corporation of Japan, among others.
ActiveX is a term that is used to denote reusable software components that are based on the Microsoft Component Object Model (COM) from Microsoft Corp. of Redmond, Wash. ActiveX controls provide encapsulated reusable functionality to programs and they are typically but not always visual in nature. Example ActiveX controls include: Adobe Reader and Adobe Flash Player from Adobe Systems Inc. of San Jose, Calif.; QuickTime Player from Apple Inc. of Cupertino, Calif.; Microsoft Windows Media Player from Microsoft Corp. of Redmond, Wash.; RealPlayer from RealNetworks, Inc. of Seattle, Wash.; and Java Virtual Machine from Sun Microsystems, Inc. of Santa Clara, Calif.
VBScript (short for Visual Basic Scripting Edition) is an Active Scripting language interpreted via Windows® Script Host from Microsoft Corp. of Redmond, Wash. When employed in Microsoft Internet Explorer, VBScript is very similar in function to JavaScript—it processes code embedded in HTML. VBScript can also be used to create stand-alone HTML applications (file extension .hta).
Popular web sites such as YouTube, Vimeo, and Grouper currently make use of wrappers to obfuscate media assets. There are methods known in the art to download media assets that are obfuscated behind a wrapper, but existing methods all fall short in a number of ways.
One known method identifies media assets by comparing the content on a given web page with a catalog detailing the technical methods used by certain web sites to obfuscate media assets. If the web page in question is hosted by a known web site, then this method may be able to identify and allow the user to download a media asset that is obfuscated on that page. For example, the catalog might allow one to deduce that the web page http://hiddenvids.com/video/foo would contain a video asset located at http://secret.hiddenvids.com/video/foo.mpg. This method may be implemented on a client, for example, as a browser plug-in, or it may be implemented using a proxy server that serves as an intermediary between the client and the target web site.
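The catalog lookup described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the implementation of any known product: it assumes each catalog entry pairs a regular expression for a page URL with a template for the corresponding asset URL. The `CATALOG` structure and the `locate_asset` function are hypothetical names, and the single entry simply encodes the hiddenvids.com example from the text.

```python
import re

# Hypothetical catalog: each entry pairs a page-URL pattern with a template
# for the URL of the obfuscated asset. Only the example from the text is listed.
CATALOG = [
    (
        re.compile(r"^http://hiddenvids\.com/video/(?P<id>[^/?#]+)$"),
        "http://secret.hiddenvids.com/video/{id}.mpg",
    ),
]


def locate_asset(page_url):
    """Return the deduced asset URL for a cataloged page, or None if unknown."""
    for pattern, template in CATALOG:
        match = pattern.match(page_url)
        if match:
            # Fill the template with the named groups captured from the page URL.
            return template.format(**match.groupdict())
    return None
```

Under this sketch, `locate_asset("http://hiddenvids.com/video/foo")` yields `http://secret.hiddenvids.com/video/foo.mpg`, while a page on any site without a catalog entry yields `None`. The same lookup could run in a browser plug-in or in a proxy server.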
This method has two main disadvantages: each individual web site must be cataloged, and each catalog entry must be continually monitored so that it can be updated when the web site operator changes its obfuscation strategy. For example, this method would have no way of discovering media assets on a new web site until that site's obfuscation schemes were analyzed and entered into the catalog. In addition, this method would fail to identify media assets even on known web sites if the web site operator changed its obfuscation scheme in even a simple way, such as changing the name of the host that stores the media assets.
Another known method is “stream ripping.” Using this method, a client intercepts the data stream corresponding to a streaming media asset, allowing the user to redirect that data to a file on a storage medium. One disadvantage to this method is that the media asset may be streamed in real time, meaning that it would take up to thirty minutes to capture a thirty-minute media asset.
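The redirection step of stream ripping can be sketched as a simple copy loop. This is a minimal sketch that assumes the intercepted stream is already available as a file-like object with a `read` method; how the stream is intercepted is outside its scope, and the `rip_stream` name and `chunk_size` parameter are hypothetical.

```python
import io


def rip_stream(stream, sink, chunk_size=4096):
    """Copy bytes from stream to sink as they arrive; return the byte count."""
    total = 0
    while True:
        # read() blocks until data is available, so the loop can run
        # no faster than the source delivers the stream.
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        sink.write(chunk)
        total += len(chunk)
    return total


# Usage with an in-memory stand-in for a network stream and an output file.
source = io.BytesIO(b"x" * 10000)
captured = io.BytesIO()
byte_count = rip_stream(source, captured)
```

Because the loop only advances when the source supplies more data, a stream that is paced in real time takes as long to capture as it takes to play.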