Recently, in the shape analysis field, increasing effort has been devoted to obtaining a functional understanding of 3D objects from their geometries and interactions. In this setting, the functionality of an object is learned by analyzing how humans or virtual agents may interact with the object and how nearby objects are related to it geometrically. Typically, such knowledge is acquired from static snapshots of the object and its surroundings, e.g., a chair with a human sitting on it or a table with several objects on top. In a first attempt, Pirk et al. [1] describe object functionalities by capturing and analyzing dynamic object trajectories, e.g., the motion of a moving agent attempting to sit on a chair. Yet, in all of these previous works, the central object maintains its rigidity.
To address the above-mentioned issues, existing techniques perform object functionality analysis from static snapshots. Several specific examples follow. Affordance-based methods simulate a human agent to predict the functionality of objects [2][3], or to recognize the regions of a scene that enable a human to perform certain actions [4][5]. The interaction context descriptor [6] and functionality models learned for categories [7] consider more general object-object interactions.
Although some of those methods [2][3][4][5] involve the dynamics of human interactions, they do not extend to more general types of object affordances. In the methods [6][7] mentioned above, object-object interactions are static in nature.
To address the above-mentioned issues, existing techniques perform object functionality analysis from dynamic interactions. The recent work of Pirk et al. [1] performs functionality inference from dynamic interaction data. The key difference from our work is that they characterize functionalities of static objects by analyzing dynamic interactions, e.g., how a cup can be used in the dynamic action of drinking coffee. However, the analyzed objects are not dynamic themselves. As a consequence, their analysis is performed at the object level, and not at the part level as in our work. Moreover, a line of work in the literature targets the capture of dynamic interactions. Kry and Dinesh [8] propose a method to acquire the details of hand interactions. Their work focuses on the use of specialized hardware for acquiring the interactions, and does not leverage the motion information to represent the functionality of objects. Recent work in computer vision aims at capturing the functionality of tools [9] or representing general human interactions [10]. However, the focus of these works is on recognition, and thus the derived functionality representations are not intended for grouping or transferring part mobility.
To address the above-mentioned issues, existing techniques perform part mobility analysis from indoor scenes. The approach of Sharf et al. [11] builds a mobility tree that summarizes the support relations between objects or parts in a scene, and their relative mobility. First, the input scene is analyzed in search of repeated instances of objects or parts. Next, given a repeated model detected in distinct configurations, the method discovers the possible motions that the model can undergo.
One limitation of this approach is that it relies on the occurrence of repeated models in the input scene, appearing in different states of motion, e.g., open and closed drawers. Thus, the detected mobility cannot be easily transferred to objects that do not appear in the scene, since the motion is discovered separately for each instance.
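To make the idea of discovering motion from repeated instances more concrete, the following is a minimal illustrative sketch, not the method of Sharf et al. [11]. It assumes that the same part has already been detected in two states with known point correspondences, and it only tests for the simplest case, a pure translation such as a sliding drawer. The function name `estimate_translation` and the tolerance parameter `tol` are hypothetical and introduced here for illustration only.

```python
import math


def estimate_translation(points_a, points_b, tol=1e-6):
    """Hypothetical helper: return (axis, distance) if points_b equals
    points_a shifted by a single vector, otherwise None.

    points_a, points_b: lists of (x, y, z) tuples in corresponding order,
    sampled from the same part observed in two different states.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point sets must be non-empty and of equal size")

    # Per-point displacement between the two observed states.
    disps = [
        tuple(b - a for a, b in zip(pa, pb))
        for pa, pb in zip(points_a, points_b)
    ]
    n = len(disps)
    mean = tuple(sum(d[i] for d in disps) / n for i in range(3))

    # A pure translation moves every point by the same vector.
    for d in disps:
        if math.dist(d, mean) > tol:
            return None

    length = math.hypot(*mean)
    if length <= tol:
        return None  # the two states coincide, so no motion is observed

    axis = tuple(c / length for c in mean)
    return axis, length


# Example: a "drawer" seen closed and then pulled out by 0.5 along z.
closed_state = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
open_state = [(x, y, z + 0.5) for x, y, z in closed_state]
result = estimate_translation(closed_state, open_state)
```

Note that such a per-instance estimate only exists when both states are observed in the same scene, which is exactly the limitation discussed above: a part seen in a single state yields no motion estimate at all.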