The growing capability of computer systems to detect how their users are feeling is changing the way humans and computers interact. More and more computer applications utilize their users' affective responses, as expressed via sensor measurements of physiological signals and/or behavioral cues, in order to determine the users' emotions. Applications with this capability, often referred to as affective computing applications, can determine how users feel about various aspects of their interactions with computers and use that information to improve the user experience.
Users often spend a lot of time interacting with computer systems through various platforms (e.g., desktops, laptops, tablets, smartphones, augmented reality systems). Through their numerous interactions with computers, users may be exposed to a wide array of content: communications with other users (e.g., video conversations and instant messages), communications with a computer (e.g., interaction with a user's virtual agent), and/or various forms of digital media (e.g., internet sites, television shows, movies, and/or interactive computer games). Affective computing systems can measure a user's affective response to the content the user consumes, and analyze that information. This enables affective computing applications to improve the user experience, for example by selecting and/or customizing content according to the user's liking.
A factor upon which the success of affective computing applications often hinges is the availability of models that can be used to accurately translate affective response measurements into corresponding emotional responses. Creating such models often requires the collection of ample training data, which includes samples taken from situations similar to those in which the models are to be used. For example, the samples for training may each include a pair of (i) affective response measurements obtained by measuring a user with a sensor, and (ii) a label describing the emotional response of the user while the measurements were taken. The samples may then be utilized, for instance by machine learning methods, in order to create a model for predicting emotional response from affective response measurements.
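The pairing of sensor measurements with emotion labels described above can be sketched in code. The following is a minimal, illustrative example only; the feature choices (heart rate and skin conductance), the emotion labels, and the use of a nearest-neighbor rule are all hypothetical placeholders standing in for whatever machine learning method an actual system would employ.

```python
import math

# Each training sample pairs (i) hypothetical sensor measurements
# (heart rate in bpm, skin conductance in microsiemens) with
# (ii) a label describing the user's emotional response at the time.
training_samples = [
    ((72.0, 0.31), "calm"),
    ((74.0, 0.35), "calm"),
    ((98.0, 0.80), "excited"),
    ((102.0, 0.85), "excited"),
    ((88.0, 0.60), "tense"),
]

def predict_emotion(measurement, samples):
    """Predict an emotional response for a new measurement using a
    1-nearest-neighbor rule -- a stand-in for any learned model that
    maps affective response measurements to emotional responses."""
    nearest = min(samples, key=lambda s: math.dist(s[0], measurement))
    return nearest[1]

print(predict_emotion((100.0, 0.82), training_samples))  # -> excited
```

In practice the measurements would be richer (e.g., facial-expression features or multiple physiological channels), and a model would be trained rather than memorizing samples, but the sample structure, measurements coupled with labels, is the same.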
A general principle often observed when creating models from data is that the more data available, the more accurate the models created from it become. In addition, the more similar the training data is to the instances on which the model is to be applied, the more accurate the predictions using the model are likely to be.
For example, a model that predicts emotional response from facial expressions may be trained on a set of samples that include images of faces coupled with labels describing the emotions expressed in the images. Generally, such a model would be better if it were trained on a large set of samples that capture a diverse set of faces and emotions, rather than on a significantly smaller set of samples that capture a much more limited set of faces and emotions. In addition, if used to predict emotions from the face of a specific user, the model would likely be better if at least some of the training samples involved that specific user; this could help the model account for unique characteristics such as the shape of the user's face, specific mannerisms, and/or types of facial expressions. Furthermore, if used primarily in day-to-day situations, the model would probably perform better if trained on samples acquired spontaneously during those day-to-day situations rather than samples acquired in a controlled, non-spontaneous manner. The conditions of the spontaneous day-to-day samples (e.g., environment, background, lighting, angle of view) might be significantly different from the settings of controlled sample acquisition (e.g., when the user sits down and is prompted to express emotions). In addition, the expressions of emotion in day-to-day scenarios are likely to be more genuine than those expressed in controlled sample acquisition. For instance, facial expressions (such as micro-expressions) are difficult to produce accurately on cue, and thus may not be the same as spontaneous expressions.
A problem with collecting samples of affective response that correspond to spontaneous expressions of emotion in day-to-day scenarios is that such data is often difficult to obtain. It is not always possible to know when spontaneous expressions of emotion are likely to occur, and when expressions of emotion do occur, it is not always possible to tell what the circumstances are or what type of emotion is expressed. Thus, collecting such training data may require a certain amount of manual curation in order to determine what type of emotion, if any, is expressed. This may become impractical on a large scale, such as when collecting a large body of training data from a single user or collecting data from a large group of users. In addition, it may inconvenience users, for instance, if they need to be actively involved in selecting the samples by providing appropriate labels for them.
The aforementioned limitations emphasize the need for an automated method of collecting samples that include measurements of a user's affective response coupled with labels describing the user's corresponding emotional response.