State-of-the-art human-computer interaction (HCI) systems include a wide range of input systems that focus on human speech recognition to enable human users to interact with computerized systems. In some environments, however, a computing system receives input from non-human users. For example, the so-called “Internet of Things” (IoT) provides computing and networking services to a wide range of objects that interact with each other in different environments. One use of the IoT is to monitor the activity of users within an environment and the status of multiple objects in that environment, such as appliances in a kitchen or power tools in a workshop. One drawback of traditional IoT implementations is that they require a large number of “smart” devices. Each “smart” device is a computing device, typically built into an appliance, power tool, or other device, that incorporates one or more sensors to monitor the operation of that device and to communicate with other smart devices. However, many objects that are regularly used in different environments do not fit the definition of a “smart” device. Additionally, even environments that include smart devices may require monitoring of events that fall outside the sensing and communication capabilities of those smart devices.
One approach to monitoring an environment is to deploy different sensors, such as audio and video sensors, in the environment. Closed-circuit camera systems are commonly used for security monitoring, but intrusive video monitoring is undesirable in many situations, such as in private homes. Monitoring sounds to identify the events that occur in an environment can be less intrusive than video monitoring. However, prior art audio monitoring systems focus on detecting very narrow classes of actions for only a single object in an environment. For example, many alarm systems use glass break sensors that are specifically configured to detect the event of glass breaking, and each glass break sensor often monitors only a single window. Existing systems are not capable of identifying more complex events that involve the interaction of multiple objects in an environment and that may occur over prolonged periods of time. Consequently, improvements to audio monitoring systems that analyze events based on sounds from multiple non-human physical objects would be beneficial.