Many Web applications have a file-sharing feature that allows users to share files by uploading them to, and downloading them from, a Web-accessible file repository. Some Web applications provide file-sharing as their only purpose, while others provide file-sharing as one of several collaboration tools. Examples of the former include those provided by My Docs Online (available at mydocsonline.com) and Box.net. Examples of the latter include Yahoo Groups (available at groups.yahoo.com) and a file-sharing application provided by Webex (available at webex.com).
A Web application with a file-sharing feature may be implemented using a single computer, or multiple interconnected computers. One or more computers used by the Web application are Web server computers that accept HTTP requests sent over the Internet, either in plain text, or over an encrypted SSL/TLS connection. Such a Web server computer is hereinafter referred to as a “front server” (even if the application is implemented using a single computer). To download a file from the file repository of the application, a user uses a client computer equipped with a Web browser to send an HTTP request to a front server of the application. The request is addressed to a Uniform Resource Locator (URL). The URL has a hostname portion that is mapped by the Internet domain name system (DNS) to an Internet Protocol (IP) address, causing the request to be routed to a front server to which the IP address has been assigned. By configuring or programming a name server within the DNS, the application provider controls which requests are routed to which front servers.
Web applications with a file-sharing feature commonly have multiple instances. Each instance has its own file repository and its own group of instance users who are allowed to access the file repository. For example, an instance of the Yahoo Groups application is a particular Yahoo Group. (Such instances are sometimes referred to as “virtual application instances”, to distinguish them from “physical application instances”, which are independent implementations of the same application; but physical application instances need not be further discussed herein, and the term “application instance” hereinafter refers to a “virtual application instance”.)
Henceforth the term “user”, unqualified, shall mean “application instance user”; the term “user file” shall refer to a file uploaded by a user to the file repository of an application instance; and the term “shared file” shall be synonymous with “user file”.
When a user downloads a user file from an instance repository, using a Web browser running on a client computer, the file may be saved to disk, opened by an application running on the client computer, or displayed by the browser. HTML files, in particular, may be displayed by the browser. An HTML file may contain client-side code, such as scripts written in JavaScript, Java applets, or ActiveX controls. Client-side code contained in an HTML file may be executed when the file is displayed by the browser.
The fact that HTML user files can be directly displayed by the browser and may contain client-side code is a valuable feature. But it is also dangerous, because it makes the Web application vulnerable to attacks through client-side code contained in user files uploaded by malicious users (“cross-user attacks”). There are two types of cross-user attacks: cross-user spoofing attacks, and cross-user scripting attacks.
When an HTML user file is displayed, the address box of the browser (a.k.a. the address bar or the location bar) shows a URL used to download the file. Since the file is downloaded from a Web server belonging to the organization that provides the application, the URL has a hostname portion that belongs to a DNS domain owned by the provider. This may allow a malicious user to upload an HTML file that, when downloaded by a second user, masquerades as a page pertaining to the Web interface of the application (an “application page”) or to the Web site of the provider (a “site page”). It could, for example, masquerade as a login page, tricking the second user into entering his/her login credentials, which may then be sent to the malicious user by client-side code in the HTML user file. This type of attack is hereinafter referred to as a “cross-user spoofing attack”.
The “domain” of a file or Web page downloaded from the Web is defined by the entire hostname portion of the URL used to download the file. The file or page is said to belong to, or originate from, that domain. Client-side code contained in the file is also said to belong to, or originate from, the domain of the file.
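As a minimal sketch of this definition, the following Python fragment extracts the domain of a file from the URL used to download it. The function names `domain_of` and `same_domain` and the example hostnames are hypothetical. The comparison follows the hostname-only definition given above; actual browsers also take the URL scheme and port number into account.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the entire hostname portion of a URL (lowercased by urlsplit)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no hostname portion: {url!r}")
    return host


def same_domain(url_a: str, url_b: str) -> bool:
    """True if both URLs have exactly the same hostname portion."""
    return domain_of(url_a) == domain_of(url_b)


# Hypothetical URLs: an application page and a user file on the same host.
print(same_domain("https://app.example.com/login",
                  "https://app.example.com/files/report.html"))
```

Under this definition, a user file served from `app.example.com` belongs to the same domain as every application page served from that hostname.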
The “same-origin policy” followed by Web browsers (cf. http://en.wikipedia.org/wiki/Same_origin_policy) prevents client-side code contained in an HTML user file of an application instance from accessing the contents of files or Web pages originating from a domain other than its own. But, if no precautions are taken, the HTML user file may belong to the same domain as site pages, application pages and other user files of the same application instance or other instances of the same application.
Thus, an attacker who is a user of an application instance may upload an HTML user file containing malicious client-side code (“malicious file”). When the malicious file is downloaded by a second user of the same instance (“victim user”), the malicious client-side code may take advantage of one or more authentication cookies set in the victim user's browser to run under the identity and with the privileges of the victim user for the benefit of the attacker (“cross-user scripting attack”).
The malicious code may be able to access application pages and obtain data that the victim user is entitled to see but the attacker is not, and may send such data to the attacker; in particular, it may be able to obtain the values of so-called hidden input fields of HTML forms used by the application to protect against cross-site request forgery (CSRF) attacks (cf. http://en.wikipedia.org/wiki/Cross-site_request_forgery), and then use those values to submit forms to the application under the identity and with the privileges of the victim user (“cross-user impersonation attack”).
If the Web application implements a system of file permissions, the malicious code may be able to access a protected file that the victim user, but not the attacker, is permitted to access, and may communicate the contents of the protected file to the attacker (“protected file attack”).
The malicious file may be downloaded, and thus the attack may continue, even after the attacker's authorization to use the application instance has been revoked and the attacker is no longer a user of the instance (“ex-user scripting attack”). Furthermore, if the second user is also a user of a second instance, the attacker may be given access to that instance as well by the malicious code uploaded to the first instance (“cross-instance scripting attack”).
Files other than HTML files may also contain client-side code that is automatically executed by a Web browser when such a file is displayed by the browser. This is the case, e.g., for PDF files containing JavaScript code, or SWF files containing ActionScript code. One way of preventing cross-user scripting attacks is to limit file sharing to types of files, such as image files, that do not contain automatically executed client-side code. This, however, is a severe limitation. Furthermore, the limitation may be ineffective for browsers that practice “content sniffing”, notably Internet Explorer (IE). To “sniff the content” of a file means to heuristically determine the type of the file by examining its content, ignoring the type explicitly declared in an HTTP Content-Type header by the front server that downloads the file and the type implied by the file name extension. A malicious file containing HTML data including malicious client-side code may circumvent the limitation by masquerading as an image file: the malicious user declares the file to be an image file when the file is uploaded; the front server that downloads the file declares it to be an image file; but the browser that receives the file sniffs its content, decides that it is an HTML file, and executes the malicious script. (The latest version of Internet Explorer, IE8, has introduced an HTTP header, “X-Content-Type-Options”, that addresses this issue: the front server can add this header with value “nosniff” to prevent content sniffing by the browser.)
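The masquerading scenario can be illustrated with a toy model of a sniffing browser. This is a sketch only: the marker list, the 256-byte window and the function names `looks_like_html` and `effective_type` are assumptions made for illustration and do not reproduce the heuristics of Internet Explorer or any other browser.

```python
# Byte sequences that this toy model treats as evidence of HTML content.
HTML_MARKERS = (b"<html", b"<script", b"<body", b"<!doctype html")


def looks_like_html(body: bytes, window: int = 256) -> bool:
    """Crude sniffing heuristic: look for HTML markers near the start."""
    head = body[:window].lower()
    return any(marker in head for marker in HTML_MARKERS)


def effective_type(declared_type: str, body: bytes, nosniff: bool) -> str:
    """Type the modeled browser acts on, given the declared Content-Type."""
    if not nosniff and looks_like_html(body):
        return "text/html"   # sniffing overrides the declared type
    return declared_type     # declared type is honored


payload = b"<html><script>alert(1)</script></html>"
print(effective_type("image/png", payload, nosniff=False))  # text/html
print(effective_type("image/png", payload, nosniff=True))   # image/png
```

In this model, a file declared as an image is treated as HTML unless the response carries “X-Content-Type-Options” with value “nosniff”.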
Another way of preventing cross-user scripting attacks is to prevent browsers from displaying user files as they are downloaded. This is, however, a severe limitation on the functionality of the Web application, as files such as HTML or PDF files are meant to be displayed directly by the browser, and users expect this to happen. Furthermore, it is not possible to reliably prevent all browsers from displaying a file as it is downloaded. The front server that downloads the file can prevent IE8 from displaying the file by using a “Content-Disposition” HTTP header with value “attachment” and an “X-Download-Options” HTTP header with value “noopen”; but the “X-Download-Options” header is not recognized by earlier versions of IE.
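A front server might build the two headers mentioned above roughly as follows. This is a sketch under stated assumptions: the function name `download_headers` is hypothetical, and the `filename` parameter of the Content-Disposition header, together with the simple character replacement applied to it, is an addition not discussed in the text.

```python
def download_headers(filename: str) -> list[tuple[str, str]]:
    """Headers asking the browser to save a user file instead of displaying it."""
    # Replace characters that could break out of the quoted filename or
    # inject an extra header line (assumed, minimal sanitization).
    safe_name = "".join("_" if c in '"\\\r\n' else c for c in filename)
    return [
        ("Content-Disposition", f'attachment; filename="{safe_name}"'),
        ("X-Download-Options", "noopen"),  # recognized by IE8, not earlier IE
    ]


for name, value in download_headers("report.html"):
    print(f"{name}: {value}")
```

As noted above, browsers that do not recognize “X-Download-Options” may still offer to open the file directly, so these headers reduce the risk but do not eliminate it.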
Cross-user scripting attacks can be considered to be a special case of cross-site scripting (XSS) attacks. The traditional defense against an XSS attack is to remove malicious scripts from HTML content, or to disable the scripts by HTML-encoding certain characters (e.g., the character “<” is HTML-encoded by replacing it with the string “&#x3C;”). These defenses, however, cannot be used to prevent cross-user scripting attacks through shared files, because that would require the Web application to modify shared files, which it is not expected to do.
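The character-encoding defense can be sketched with the Python standard library. The function name `neutralize` and the sample string are hypothetical. `html.escape` emits named character references such as `&lt;`, which browsers treat the same way as the equivalent numeric references.

```python
import html


def neutralize(untrusted: str) -> str:
    """HTML-encode markup characters so embedded scripts are displayed, not run."""
    return html.escape(untrusted)


sample = '<script>alert("x")</script>'
print(neutralize(sample))
# &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;

# Named and numeric references decode to the same character.
print(html.unescape("&lt;") == html.unescape("&#x3C;") == "<")  # True
```

The encoded output differs from the input, which illustrates why this defense does not fit shared files: the stored file would no longer be the file that the user uploaded.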
An alternative defense against XSS has been proposed for the case where HTML content that may contain malicious scripts (“untrusted content”) cannot be modified. The alternative defense consists of using a different hostname to download untrusted content; i.e., to address requests for untrusted content to a URL whose hostname portion is different from the hostname portion of a URL used to download other content. This alternative defense is briefly mentioned in a few places on the Web:
Jesse Ruderman suggests it in a Web page entitled “Security Tips for Web Developers”, available at http://www.squarefree.com/securitytips/web-developers.html;
Brian Eaton suggested it in an email message to the Web Security mailing list dated May 18, 2007, available in the list archive at http://www.webappsec.org/lists/websecurity/archive/2007-05/msg00087.html; and
Simon Bohlin suggested it in a comment dated May 19, 2008 on a Google article entitled “ArticleUntrustedDownloads” (sic), which was evidently incorporated into the article; the article and the comment can be found at http://code.google.com/p/doctype/wiki/ArticleUntrustedDownloads.
In the context of a Web application with a file sharing feature, the alternative defense would consist of using different hostnames to download user files and application pages. Malicious code in a user file would then have no access to application pages. It would still, however, have access to other user files. Furthermore, the traditional means of authenticating requests using an authentication cookie would not work for user files, since a Web browser would not send the authentication cookie along with a request that targets a URL with a different hostname portion. User files would thus be unauthenticated and directly accessible by an attacker without any need to embed a malicious script in a user file.
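The loss of cookie-based authentication can be sketched with a simplified model of a host-only cookie, i.e., a cookie that the browser returns only to the hostname that set it. The function name `browser_sends_cookie` and the hostnames are hypothetical, and the model deliberately ignores the cookie Domain, Path and Secure attributes.

```python
from urllib.parse import urlsplit


def browser_sends_cookie(cookie_host: str, request_url: str) -> bool:
    """Host-only cookie model: sent only when the request hostname matches."""
    return urlsplit(request_url).hostname == cookie_host.lower()


# Authentication cookie set by the hostname that serves application pages.
AUTH_COOKIE_HOST = "app.example.com"

# Request for an application page: the cookie accompanies the request.
print(browser_sends_cookie(AUTH_COOKIE_HOST, "https://app.example.com/inbox"))

# Request for a user file on a different hostname: no cookie is sent, so
# the front server cannot authenticate the request by this means.
print(browser_sends_cookie(AUTH_COOKIE_HOST, "https://files.example.net/f/report.html"))
```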
A variation on the alternative defense is to use a different port number instead of a different hostname. This was done in the Acmemail Webmail project in 2001, as reported by Peter Watkins in an email message to the Web Security mailing list, available in the list archive at http://www.webappsec.org/lists/websecurity/archive/2008-11/msg00032.html.
This inventor conducted an informal survey of Web applications having a file sharing feature in May 2007. The following applications (identified by the names of the domains where they can be found) were included in the survey: groups.yahoo.com, sharefile.com, webex.com, box.net, mydocsonline.com, httpd-com.com, filesanywhere.com, file-works.com, punchwebgroups.com, and biscu.com. The survey found the following:
The Yahoo Groups application at groups.yahoo.com uses the hostname groups.yahoo.com to download application pages, while using a variety of domain names, all ending in “.yahoofs.com”, to download user files. The application is protected against cross-user impersonation attacks; but access to user files is not authenticated, even though the user files of a group are supposed to be available only to members of the group. To mitigate the risk that non-group-members may gain access to user files, the URL of a user file contains a very long path segment that is hard to guess. This, however, does not eliminate the risk; and it has usability drawbacks: the long path segment makes it awkward to use the URL to construct a link to the file, and makes it impossible to remember the URL.
Two of the applications, sharefile.com and webex.com, were found to use the same hostname portion in user file URLs and application pages URLs, while embedding an application instance ID in the hostname. Because of the application instance ID, these applications should be protected against cross-instance attacks. On the other hand, because the same hostname is used for user files and application pages, it is likely that they are not protected against other cross-user scripting attacks, including ex-user attacks, cross-user impersonation attacks and, if applicable, protected file attacks; nor against cross-user spoofing attacks.
The remaining seven applications in the survey were found to use a single hostname for the URLs of user files and application pages of all application instances. They are thus likely to be unprotected against all the types of cross-user attacks described above.        
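Two of the URL conventions observed in the survey can be sketched as follows. The survey does not state how the surveyed applications actually generate their URLs, so the function names, the token length and the example hostnames are all assumptions made for illustration.

```python
import secrets


def unguessable_file_url(host: str, filename: str, nbytes: int = 32) -> str:
    """URL with a long random path segment, used in place of authentication."""
    token = secrets.token_urlsafe(nbytes)  # URL-safe random text
    return f"https://{host}/{token}/{filename}"


def instance_hostname(instance_id: str, base_domain: str) -> str:
    """Hostname that embeds an application instance ID as its leftmost label."""
    return f"{instance_id}.{base_domain}"


print(unguessable_file_url("files.example.net", "report.html"))
print(instance_hostname("acme", "example.com"))  # acme.example.com
```

The first function shows the usability drawback noted above: the resulting URL is too long to remember. The second shows why an instance ID in the hostname separates instances from one another but not user files from application pages within one instance.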
The present embodiments enable protection of a Web application having a file sharing feature against cross-user attacks without sacrificing usability. They have evolved over a period of time, and elements of them have been published and publicly discussed:
a first Provisional Patent Application, No. 60/934,270, filed on Jun. 12, 2007 but not made public at that time, disclosed some elements of the present embodiments;
a first white paper, based on the first Provisional Patent Application, entitled “Collaborative File Sharing Security: Attacks and Countermeasures” was published on Jan. 17, 2008, by being posted on the Pomcor Web site (www.pomcor.com);
a second Provisional Patent Application, No. 61/061,107 (cross-referenced above by this patent application) was filed on Jun. 12, 2008;
a second white paper entitled “Protecting a Web Application Against Attacks Through HTML Shared Files”, based on the second Provisional Patent Application, was published in November 2008;
an email message announcing the second white paper was sent by this inventor to the Web Security mailing list (websecurity@webappsec.com) and the Bugtraq mailing list (bugtraq@securityfocus.com) on Nov. 7, 2008; this resulted in an online discussion of the merits of the present embodiments and of related prior art. The complete discussion can be found in the November 2008 archive of the Web Security mailing list, available at http://www.webappsec.org/lists/websecurity/archive/2008-11/, under the thread name “countermeasure against attacks through HTML shared files”.