In today's society, it can be difficult for people to spend quality time with their loved ones. Busy consumers want personalized recommendations for creative ways to spend more time with their family and friends and to create memorable family experiences. They would like support in planning, creating, and managing unique experiences for family members within limited time, effort, and resources.
Day-to-day family conversations provide important acoustic context about what a family likes to do and about opportune times for recommendations. However, in most conventional activity and acoustic event detection solutions, it is often cumbersome or inaccurate to identify an event that has not been preconfigured and has no prior labels or tags. For example, many methods use Hidden Markov Model (HMM) and Nonnegative Matrix Factorization (NMF) approaches to detect sound events, and these approaches can be very unreliable. Additionally, some scene-independent systems attempt to use a single deep learning model that applies across different home scenarios and user environments. However, such models are based on feed-forward networks, which lack time and frequency invariance. Moreover, their temporal context is limited to the short time window of the spectrogram.