Studies have shown that a healthy diet can significantly reduce the risk of disease. This may provide a motivation, either self-initiated or prompted by a doctor, to monitor and assess dietary intake in a systematic way. It is known that individuals do a poor job of assessing their true dietary intake. In the kitchen, when preparing a meal, one can estimate the total caloric content of the meal by consulting food labels and calculating portion sizes, given a recipe listing the amounts of ingredients. At a restaurant, estimating the caloric content of a meal is more difficult. A few restaurants may list in their menus the calorie values of certain low-fat or diet-conscious meals, but the majority of meals are much higher in calories and are not listed. Even dieticians need to perform complex lab measurements to accurately assess the caloric content of foods.
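By way of illustration only, the kitchen calculation described above (label values combined with measured portion sizes) may be sketched as follows. The function name, the ingredient amounts, and the per-100 g calorie values are hypothetical and chosen solely for the example; they are not taken from any particular food label.

```python
def meal_calories(ingredients):
    """Sum calories over (grams_used, kcal_per_100g) pairs.

    Each pair combines a measured portion (grams) with the energy
    density printed on a food label (kcal per 100 g).
    """
    return sum(grams * kcal_per_100g / 100.0
               for grams, kcal_per_100g in ingredients)


# Hypothetical recipe: amounts and label values are illustrative assumptions.
recipe = [
    (200.0, 130.0),  # 200 g of a grain at 130 kcal per 100 g
    (150.0, 165.0),  # 150 g of a protein at 165 kcal per 100 g
    (10.0, 884.0),   # 10 g of oil at 884 kcal per 100 g
]

total_kcal = meal_calories(recipe)
servings = 2
kcal_per_serving = total_kcal / servings
```

The calculation is only as accurate as the portion measurements supplied to it, which is why it is of little help when the amounts of the ingredients are unknown, as at a restaurant.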
Human beings are good at identifying food, such as the individual ingredients of a meal, but are known to be poor at volume estimation. Even if one had the total volume of a meal, it is nearly impossible to estimate the volume of the individual ingredients, which may be mixed and either seen or unseen. It is difficult for an individual to measure nutritional consumption in an easy yet quantitative manner. Several software applications, such as CalorieKing™, CaloricCounter™, etc., are of limited value since they perform a simple calculation based on portion size, which cannot be accurately estimated by users. Veggie Vision™ claims to automatically recognize fruits and vegetables in a supermarket environment during food checkout. However, there are few, if any, published technical details about how this is achieved.
Automatic image analysis techniques of the prior art are more successful at volume computation than at food item identification. Automated and accurate food recognition is particularly challenging because there are a large number of food types that people consume. A single category of food may have large variations. Moreover, diverse lighting conditions may greatly alter the appearance of food to a camera which is configured to capture food appearance data. In F. Zhu et al., “Technology-assisted dietary assessment,” SPIE, 2008 (hereinafter “Zhu et al.”), Zhu et al. use intensity-based segmentation and classification of each food item using color and texture features. Unfortunately, the system of Zhu et al. does not estimate the volume of food needed for accurate assessment of caloric content. State-of-the-art object recognition methods, such as the methods described in M. Everingham et al., “The PASCAL Visual Object Classes Challenge 2008 (VOC2008),” are unable to operate on a large number of food classes.
Recent success in recognition is largely due to the use of powerful image features and their combinations. Concatenated feature vectors are commonly used as input for classifiers. Unfortunately, this is feasible only when the features are homogeneous, e.g., the concatenation of two histograms (HOG and IMH) in N. Dalal et al., “Human detection using oriented histograms of flow and appearance,” ECCV, 2008. Linear combinations of multiple non-linear kernels, each of which is based on one feature type, are a more general way to integrate heterogeneous features, as in M. Varma and D. Ray, “Learning the discriminative power invariance tradeoff,” ICCV, 2007. However, both the vector concatenation and the kernel combination based methods require computation of all of the features.
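By way of illustration only, the two integration strategies discussed above may be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited reference: the feature names ("color", "texture"), the choice of a chi-square kernel and an RBF kernel, the fixed weights, and all numeric values are hypothetical. In the cited kernel combination approach the weights are learned rather than fixed by hand.

```python
import math


def concat_features(*vectors):
    """Concatenate feature vectors into one classifier input.

    Sensible only when the features are homogeneous, e.g. histograms
    with comparable scale and meaning.
    """
    out = []
    for v in vectors:
        out.extend(v)
    return out


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel on real-valued vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-square kernel, commonly used on histograms."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * dist)


def combined_kernel(fx, fy, kernels, weights):
    """Weighted sum of per-feature kernels (one kernel per feature type)."""
    return sum(weights[name] * kernels[name](fx[name], fy[name])
               for name in kernels)


# Hypothetical samples: a color histogram and a texture descriptor each.
sample_a = {"color": [0.2, 0.5, 0.3], "texture": [1.0, -0.5]}
sample_b = {"color": [0.1, 0.6, 0.3], "texture": [0.8, -0.2]}

kernels = {"color": chi2_kernel, "texture": rbf_kernel}
weights = {"color": 0.6, "texture": 0.4}  # assumed fixed for the sketch

similarity = combined_kernel(sample_a, sample_b, kernels, weights)
stacked = concat_features(sample_a["color"], sample_a["texture"])
```

In both cases every feature type must be computed for every sample before a single similarity value or classifier input is available, which is the cost noted above.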
Accordingly, what would be desirable, but has not yet been provided, is a system and method for effective and automatic food volume estimation for large numbers of food types and variations under diverse lighting conditions.