An online synchronized content management system, such as Dropbox® from Dropbox Inc. of San Francisco, Calif., allows its users to store data in cloud-based storage and synchronize it across multiple client devices. Thus, a user may upload a personal folder to the content management system, and then share the folder on multiple user devices by having duplicate copies of the folder on each of the devices. The instances of the shared folder, though they may reside on different devices, can be kept synchronized. In other words, through the process of synchronization, the contents of the shared folder on multiple client devices can be kept identical. Even the slightest modification made by the user to one of the instances of the folder can automatically be replicated in the other instances of the folder in a matter of seconds.
However, data synchronization across multiple client devices has traditionally been handled by a central server such as the content management system. More specifically, a change in content of a shared folder on one client device would be first replicated in the instance of the folder stored on the central content management system. The central server would then synchronize the shared folder with every other client device. Although this method of data synchronization may be relatively straightforward and intuitive to implement, it has several drawbacks.
First, too much burden can be placed on the central distributor. In other words, since every single client device must rely on the content management system to perform the task of data distribution, the content management system must bear the brunt of the workload. Therefore, the content management system may become overburdened by an excessive amount of data processing and an overwhelming number of data transfers. Second, the stability and robustness of the system can be compromised. Since every data synchronization job needs to go through the central distributor, the entire system can be paralyzed if the central distributor suddenly becomes unavailable. Third, connecting to and communicating with the central distributor can be costly. Client devices often need to connect to the central server via a wide area network (WAN) such as the Internet. Wide area networks can be, in general, less reliable and lower in performance than a local area network (LAN).
Accordingly, some content management systems, such as Dropbox®, allow their client devices to synchronize with each other, especially over a LAN. Synchronizing data over a LAN connection (also called a “LAN sync”) may have several benefits over synchronizing through a central distributor. Due to the relatively smaller geographical area that a typical LAN occupies, a LAN generally offers better performance and reliability than a WAN. A LAN, in general, can also be more configurable and customizable. Moreover, communicating over a LAN may be more cost-effective because there is no need to pay additional bandwidth or subscription fees for Internet communication.
However, in order for a client device to synchronize with another client device over a LAN, the client device needs to know which other client devices may be available for communication on the local network and which shared folders may be available for synchronization on each of those client devices. In other words, clients may want to find other clients on the local network that share the same shared folders. One way to obtain such information is for each client device to broadcast the availability information for the shared folders. Thus, each client device would essentially announce to every other client device which folders are available for synchronizing. Sometimes each client device on the local network may announce its shared folders by broadcasting a list of the namespace identifiers (ns_id) corresponding to those shared folders. This can allow each client device on the network to assemble a full list of all the shared folders available on the local network.
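The cleartext announcement scheme described above can be sketched as follows. This is an illustrative sketch only, not the actual message format of any content management system: the JSON encoding, the field names (`host`, `port`, `ns_ids`), and the helper names are all assumptions introduced here, and the network transport is omitted so that only the announce-and-assemble logic is shown.

```python
import json


def build_announcement(host, port, ns_ids):
    """Serialize a cleartext announcement of the namespaces a peer hosts.

    The JSON layout is hypothetical; the point is that the namespace
    identifiers travel unencrypted and unauthenticated.
    """
    message = {"host": host, "port": port, "ns_ids": sorted(ns_ids)}
    return json.dumps(message).encode("utf-8")


def record_announcement(directory, payload):
    """Merge one received announcement into a map of ns_id -> set of peers."""
    message = json.loads(payload.decode("utf-8"))
    peer = (message["host"], message["port"])
    for ns_id in message["ns_ids"]:
        directory.setdefault(ns_id, set()).add(peer)
    return directory


def peers_for(directory, local_ns_ids):
    """Return, for each locally held namespace, the peers that announced it."""
    return {ns_id: directory.get(ns_id, set()) for ns_id in local_ns_ids}
```

Every listener that processes the broadcasts ends up with the same complete directory, regardless of which namespaces it is actually entitled to access.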
However, broadcasting folder information in the clear may have drawbacks. The first concern is privacy. Particularly, a client device may discover those shared folders on the network that a user of the client device does not have permission to access. This may allow an attacker to track individual user accounts and learn which devices host the same shared folders as the attacker.
The second concern is the integrity of the broadcast messages. In particular, a malicious client device could collect the full list of ns_ids and rebroadcast the list in order to induce other client devices to connect to the malicious client device. The malicious actor may also be able to spoof a shared folder announcement by broadcasting a fake yet valid announcement message with an arbitrary set of namespace identifiers. This can allow the attacker to direct LAN synchronization traffic to the attacker-controlled machine. Once a connection is established through such deception, the malicious device may become a vector that enables additional attacks against the unsuspecting peers, such as mining data, tampering with data, injecting malicious code, etc. Third, since a peer initiating a peer-to-peer (P2P) connection (i.e., client peer) does not authenticate the peer it is connecting to (i.e., server peer), an attacker-controlled server peer can see hashes of files requested by the other peers and can tell if the peers are requesting certain known files.
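The rebroadcast attack described above can be illustrated with a short sketch. It assumes a hypothetical cleartext JSON announcement with `host`, `port`, and `ns_ids` fields; none of these names come from an actual protocol. The sketch shows only that, without authentication, a receiver has no basis for telling a forged announcement from an honest one.

```python
import json


def parse_announcement(payload):
    """Decode a hypothetical cleartext announcement into (peer, ns_ids)."""
    message = json.loads(payload.decode("utf-8"))
    return (message["host"], message["port"]), set(message["ns_ids"])


def forge_announcement(observed_payloads, attacker_host, attacker_port):
    """Collect every ns_id seen on the network and re-announce the whole
    list under the attacker's own address."""
    collected = set()
    for payload in observed_payloads:
        _, ns_ids = parse_announcement(payload)
        collected |= ns_ids
    forged = {"host": attacker_host, "port": attacker_port,
              "ns_ids": sorted(collected)}
    return json.dumps(forged).encode("utf-8")


def candidate_peers(payloads, wanted_ns_id):
    """List the peers a naive client would consider for one namespace.

    Nothing in the message proves that the sender really hosts the
    namespace, so forged entries are accepted alongside honest ones.
    """
    return [peer for peer, ns_ids in map(parse_announcement, payloads)
            if wanted_ns_id in ns_ids]
```

In this sketch the forged message passes through exactly the same parsing path as the honest ones, which is why the attacker-controlled address ends up among the candidate peers for every namespace on the network.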