Most of us shop carefully for the foods we serve our families. We look for products that are high in nutritional value so our families can have a balanced diet. We are especially careful about the foods we buy for younger and older family members. We understand that eating the right things every day is important to our health and longevity.
A medium-sized supermarket in America is a wonder of the modern world. Supermarkets offer a wider variety of foods than ever before. There, we can find beef raised in Texas, seafood caught off the coast of Washington State, oranges from Florida, kiwis from New Zealand, other foods from all corners of the globe, and a tremendous variety of prepared and packaged foods. As the wealth of products to choose from increases, it becomes more difficult to make healthy and sensible choices for ourselves and our families. Often, we wish we had more guidance.
Advertising is a valuable source of information about new food products we might like to buy and serve to our families. Food product manufacturers are interested in accurately advertising their products and in formulating new products that will appeal to consumers and meet their nutritional needs. For example, there is good evidence that diets rich in whole-grain foods and other plant foods and low in saturated fat and cholesterol may reduce the risk of heart disease. It is important for consumers to know this type of information because it helps them choose and serve more healthful foods.
One way that those involved in food marketing research can ascertain how to improve the diet of a population by better meeting nutritional needs is to collect and record detailed data about what people eat. Food consumption data collected for marketing research purposes provides an in-depth, continuous record of the national population's food intake. The food industry has traditionally used such detailed food records to track consumption of specific branded food items and monitor growth of food categories.
One of the entities that has long been involved in collecting food consumption data is the U.S. Department of Agriculture (“USDA”). The USDA has collected such data since 1965 and most recently conducted a Continuing Survey of Food Intakes by Individuals (CSFII) in 1989–91 and 1994–96. The resulting data sets provide information on two-day food and nutrient intakes by approximately 20,000 individuals of all ages nationwide. The USDA survey data set includes, for example, the kinds and amounts of foods consumed by individuals on each of two non-consecutive days as well as other information (e.g., the source of food, whether the food was consumed at home or away from home, and other information including demographics of the survey participants). The USDA provides the resulting data sets on a CD-ROM along with SAS® statistical analysis programs which read the data files into SAS® and create SAS® data files for statistical analysis. See, for example, brochure, “What's On The CSFII 1994–96, 1998 CD-ROM” (USDA June 2000).
Another useful source of information concerning what we eat is the National Eating Trends® (NET) database generated by the NPD Group of Rosemont, Ill. The NPD National Eating Trends® service collects food consumption data from 2,000 households annually (approximately 5,000 people) through the use of 14-day food diaries. The NPD/NET data is continuously collected throughout the year to account for seasonal changes in food intake, and provides detailed descriptions of each food consumed including brand names and descriptive nutritional attributes. This data is collected from a population group that is demographically matched and balanced by age, gender, income, race, household size, female employment status and other factors, to reflect the U.S. Census.
Panelists participating in the NPD survey record food consumed at-home and away-from-home during a 14-day period. The NET database provides consumption patterns and trends of more than 4,000 unique food and beverage products, and identifies a variety of different information including, for example:
the demographics of people who use the products (e.g., gender, age, sex, geographical region, etc.),
household demographics,
number of consumers using the product,
frequency of consumption (trended),
life cycle,
nutritional segments,
appliance used in product preparation,
when and how the product is consumed,
meal occasion associated with the foods consumed,
ingredients used,
toppings and additives added,
foods and beverages eaten alongside product,
whether product was a main dish, side dish, appetizer, dessert or snack,
where product is consumed (e.g., in home versus carried from home versus away from home),
other foods and beverages more likely to be consumed by product users.
NPD provides a range of delivery methods to present NET data to its customers. For example, electronic data delivery offers access to trended consumption information on a PC using proprietary NPD Power View® software, a Windows-based system designed for interactive multi-dimensional data analysis. Customized reports and special issue analyses are available to shed light on why consumers do what they do.
The NPD/NET data set is useful for ascertaining what foods American households are eating. The emphasis on the household makes sense given that foods are often purchased by one member of a household for the entire household, and meals are generally eaten more or less together within a given household. There is also a practical reason: a single member of the household (e.g., the person in charge of food preparation) generally records the required survey data for the entire household. One of the shortcomings of this emphasis on household data recording is that the diaries record how much of a given food item was served to the entire household, but do not require or permit each household member to record how much of the food he or she consumes individually.
In more detail, the survey form/diary filled out by each household asks the participant to specify how much of the food item was served to the household, how much of the amount served was actually eaten by the household, and who in the household ate the particular food. See, for example, Sample Daily Meal Diary published by NPD Group, incorporated herein by reference. This is a typical procedure for panel surveys to minimize the amount of information recorded and thus increase reliability. One reason for not requiring individual portion size recording is that attempting to require all participants to record how much food was consumed by each over a 14-day period is burdensome and might compromise the accuracy of the recording.
Another potential shortcoming of the NPD/NET dietary intake data set for certain purposes relates to the amount of nutritional information the data set provides. NPD does not attempt to provide detailed nutritional information on each food recorded in its survey. Such detailed nutritional information analysis is typically the work of food research scientists, and is not supplied in the NPD/NET data set. On the other hand, for some food research applications, it would be desirable to provide detailed information concerning the amount of each of over 100 different nutrients (including, for example, individual amino and fatty acids) we consume every day. For example, a company interested in formulating or reformulating a food product to ensure that Americans receive appropriate essential nutrients in their daily diets may want to know how much of each nutrient is consumed each day by each of the various demographic categories of individuals in the United States. Food product manufacturers and providers may also wish to obtain evidence for making advertising claims that their products should be part of your daily diet. Health specialists may wish to analyze nutrient consumption or nutrient consumption trends in the population overall, by demographically-specific segments of the overall population, or by household and/or individual, in order to try to discern correlations between nutrient consumption and disease. Many other applications call for detailed knowledge of the amount of nutrients consumed every day as well as tracking intake over specific time periods. These issues are not adequately addressed by the NPD/NET data set.
There are good data sources of nutritional analysis for the foods we eat. Several different research entities, including for example, the University of Minnesota, have compiled the nutritional content of many foods. University of Minnesota's Nutrition Data System for Research (NDS-R) software provides detailed nutrient information for more than 18,000 foods, including over 8,000 brand-name products. However, while a wealth of data exists concerning America's eating and consumption habits and corresponding nutritional information, the information resides in a number of discrete data sets developed by different entities (some governmental, some corporate, and some academic). These different data sets are largely incompatible with one another and are generally designed and developed to achieve different overall goals.
The present invention efficiently makes use of this wealth of otherwise-incompatible data by automatically and efficiently integrating plural different data sets. Such capabilities, for example, provide a unique methodology utilizing 14-day food diaries to determine the impact of food consumption patterns on nutrient intake. The resulting integrated database can be analyzed by a conventional statistical analysis package such as SAS® for dynamic analysis and reporting.
In accordance with an aspect provided by an illustrative and exemplary embodiment, a data integration procedure is performed on three independent, special purpose food research related data sets. One data set contains food consumption data based on 14-day diaries. A second data set contains portion size data for a large number of (e.g., over 8,000) different food types. A third data set contains nutrient data for a large number of (e.g., over 18,000) uniquely identified food constituents. The resulting integrated data set can be analyzed using conventional statistical analysis procedures.
In accordance with a further aspect provided by an illustrative and exemplary embodiment, a first data set is analyzed and processed to determine mean age and sex specific serving weights of a certain number of food items. These portion size weights are matched to each food recorded in a second data set representing 14-day food intake. Complete nutrient profiles are assigned to each food in the survey based on a third, nutrient data set. The information from these three data sets is combined in a database, and nutrient intake reports are processed using a conventional statistical reporting interface. This flexible system allows users to categorize the population based on usual consumption of food categories, specific foods and/or specific brands of foods, and to determine dietary differences versus their “non-using” counterparts.
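The three-way combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the coding procedure of the embodiment: the table names (`PORTION_GRAMS`, `NUTRIENTS_PER_100G`, `DIARY`), the food codes, the age band label, and all numeric values are hypothetical, and nutrient content is assumed to be expressed per 100 g.

```python
from collections import defaultdict

# Hypothetical portion-size table: (food, sex, age band) -> mean grams per serving.
PORTION_GRAMS = {
    ("oat_cereal", "F", "19-50"): 40.0,
    ("oat_cereal", "M", "19-50"): 55.0,
    ("whole_milk", "F", "19-50"): 180.0,
    ("whole_milk", "M", "19-50"): 240.0,
}

# Hypothetical nutrient table: food -> nutrient amounts per 100 g (made-up values).
NUTRIENTS_PER_100G = {
    "oat_cereal": {"energy_kcal": 380.0, "calcium_mg": 50.0},
    "whole_milk": {"energy_kcal": 60.0, "calcium_mg": 120.0},
}

# Hypothetical diary rows: one row per food eaten; no portion size is recorded.
DIARY = [
    {"person": "p1", "sex": "F", "age_band": "19-50", "food": "oat_cereal"},
    {"person": "p1", "sex": "F", "age_band": "19-50", "food": "whole_milk"},
    {"person": "p2", "sex": "M", "age_band": "19-50", "food": "whole_milk"},
]


def nutrient_intake(diary):
    """Sum estimated nutrient intake per person across all diary rows."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in diary:
        # Portion size comes from the demographic table, not from the diary.
        grams = PORTION_GRAMS[(row["food"], row["sex"], row["age_band"])]
        for nutrient, per_100g in NUTRIENTS_PER_100G[row["food"]].items():
            totals[row["person"]][nutrient] += per_100g * grams / 100.0
    return {person: dict(nutrients) for person, nutrients in totals.items()}


intake = nutrient_intake(DIARY)
```

The resulting per-person totals could then be handed to a statistical package for grouping and reporting, in the manner the text describes for SAS®.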
In one non-limiting exemplary and illustrative embodiment, information is integrated from three particular data sources:
a food intake data set (e.g., multiple years of NPD's National Eating Trends® 14-day food intake data),
a portion-size data set (which may be obtained for example from multiple years of the USDA's CSFII data set), and
a nutrient data set (e.g., from a nutrient profile data set provided by the University of Minnesota's NDS-R).
In an example embodiment, the data integration procedure assigns nutrient data and portion size data for each uniquely identified food within the food consumption survey data. This assignment is performed by linking together the three different data sets using a special coding procedure that stores the result as a SAS® data file. SAS® provides an easily accessible and flexible system for reporting the data, performing statistical procedures and producing graphical reports. The data can also be reported for populations selected on any combination of various variables including, for example:
demographics,
number of reporting days,
day of the week,
meal occasion,
use of a specific food/foods,
specific nutrient intake level,
Recommended Daily Allowance (RDA) level,
respondent Body Mass Index (BMI),
other criteria.
There is substantial value to such dietary intake research. For example:
the dietary research results can be used to build credibility in scientific and food policy communities;
the techniques provided by the illustrative preferred embodiment allow the data sets to be explored for new information, trends and themes that are transformational and can stimulate product development, help create marketing programs, and suggest strategies (e.g., BMI and cereal consumption, diet modeling to meet three whole grains per day, etc.). These techniques may also be useful in connection with food product marketing and public relations (e.g., sugar defense, whole grain intake, impact of breakfast cereal on diet, calcium intake, breakfast patterns, cereal portion sizes, eating patterns of children, seniors and other demographic groups, etc.).
The techniques herein may also be useful for new product development and existing product reformulation (e.g., by identifying nutrient needs in a population such as, for example, calcium fortification, folate fortification and enrichment, etc.).
The information provided by the exemplary and illustrative embodiment may also be useful in a regulatory environment to help with claims documentation, policy strategy development, and to provide data for regulatory comments, fortification review and justification.
The information may also be useful to prepare scientific journal manuscripts and abstracts, augment internal and external clinical and laboratory research projects, and for other scientific value.
Additionally, the information provided by an illustrative and exemplary embodiment may be useful to provide data for speeches and presentations, public relations facts, advertising copy, and consumer information.
In accordance with a more detailed aspect of an exemplary and illustrative preferred embodiment of our invention, we use a food descriptor reduction algorithm that reduces the massive amount of food item data provided by a 14-day dietary intake database into a smaller amount of data useful for identifying the nutrients in the foods actually consumed by dietary intake study participants. In accordance with a specific embodiment, a particular advantageous subset of available data fields is used to uniquely identify on the order of over 5,000 food items from over a billion theoretical possibilities. This data field subset may comprise, for example, 8-dimensional coordinates representing food item identification (e.g., type, form, characteristic, flavor, classification, preparation method, packaging type, and special label code). The preferred exemplary and illustrative embodiment combines many of the codes for each type and groups them according to dietary factors that relate to the nutrient makeup of the foods. These combined and grouped codes are ultimately mapped into nutrient values based on portion size and food nutrient content profiles.
In the example and illustrative embodiment, the groupings are performed based on a lookup table using four keys:
a combo (combination) key comprising a unique sequential value identifying a portion of a unique character code,
a category code identifying a general type of food group (e.g., cereals, milk, baby food, etc.),
a column number pointing to a column in the food intake database, and
a column value designating one or more values that apply to this column and category.
In the exemplary and illustrative embodiment, the food descriptor reduction mapping process proceeds by scanning a data reduction table to determine whether the particular food within the dietary intake data has been defined within the table and has a corresponding combination key. Multiple iterative scans yield additional combination keys that may be combined together to provide a combination code for the particular food item identified in the food intake data. The resulting combination code food descriptor is located within a food-portion link data file where foods have been previously defined by combining a portion size data set with a nutrition data set for this particular food descriptor code. If the food descriptor combination code is found within the food portion link file, it is mapped to a simpler unique food designator (for storage space considerations) in the example embodiment. If the code is not found (meaning, for example, that a new food item is being reported in the dietary intake data set), an exception is generated so that a dietary intake scientist can dynamically update the appropriate lookup tables to include the new item. The process can be performed iteratively to interactively define new food items as they are introduced to the population and begin appearing in dietary intake data.
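The scan-and-lookup flow just described can be sketched as follows. This is a minimal illustration, assuming a tiny reduction table with only two descriptor columns; the table contents, the `category:key-key` code format, the designator numbers, and the names `REDUCTION_TABLE`, `FOOD_PORTION_LINK`, and `UnknownFoodError` are all hypothetical rather than taken from the embodiment.

```python
# Hypothetical reduction table rows: (combo key, category, column, column values).
REDUCTION_TABLE = [
    ("A1", "cereal", "form", {"flakes", "puffs"}),
    ("A2", "cereal", "form", {"hot"}),
    ("B1", "cereal", "flavor", {"plain", "unsweetened"}),
    ("B2", "cereal", "flavor", {"frosted", "honey"}),
]

# Descriptor columns scanned, in order, to build the combination code.
DESCRIPTOR_COLUMNS = ("form", "flavor")

# Hypothetical food-portion link file: combination code -> compact food designator.
FOOD_PORTION_LINK = {"cereal:A1-B1": 1001, "cereal:A1-B2": 1002}


class UnknownFoodError(LookupError):
    """Raised when a combination code has no entry in the link file."""


def scan(category, column, value):
    """One scan of the reduction table; returns a combo key or None."""
    for combo_key, row_category, row_column, values in REDUCTION_TABLE:
        if (row_category, row_column) == (category, column) and value in values:
            return combo_key
    return None


def combination_code(record):
    """Scan once per descriptor column and join the combo keys that were found."""
    keys = [scan(record["category"], col, record.get(col)) for col in DESCRIPTOR_COLUMNS]
    found = [key for key in keys if key is not None]
    return record["category"] + ":" + "-".join(found)


def food_designator(record):
    """Map a diary record to its compact designator, or flag it for review."""
    code = combination_code(record)
    try:
        return FOOD_PORTION_LINK[code]
    except KeyError:
        # A new food item: surface it so the lookup tables can be updated.
        raise UnknownFoodError(code) from None
```

Many detailed descriptor values collapse onto one combo key (here "flakes" and "puffs" both map to `A1`), which is what shrinks the space of possible codes. An `UnknownFoodError` corresponds to the exception case in the text, where a scientist would add the new item to the tables and rerun the mapping.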
In accordance with another aspect provided by the exemplary and illustrative embodiment, a household master analysis is performed to allow tracking of individual consumers, even through multiple intake surveys from different time periods. While household-based data is enough for many food research and marketing analyses, individual food and nutrient intake is important for certain other research objectives. The preferred and exemplary illustrative embodiment of this invention is able to track individual person dietary intake from dietary intake data sets that are generally designed on the household level but, as it turns out, include sufficient data to provide individual tracking if that data is handled appropriately.
For example, more accurate dietary intake results can sometimes be obtained by using dietary intake data sets from surveys conducted at different times. Often, such surveys will survey the eating patterns of the same households and the same individuals within the same households. However, households can change in their makeup (e.g., when students go off to college), and different people within a household may serve as reporters/diarists for different survey periods. The exemplary embodiment can determine when the same household and/or individual is included in multiple food intake surveys. In the exemplary embodiment, each individual is assigned a unique individual ID that is different from the designator(s) used to code the participant within the food intake data set. Unique individual IDs may then be keyed to the same individuals reporting on different dietary intake surveys to allow for individual long term dietary intake tracking. By analyzing the food intake survey results based on individuals, the exemplary embodiment achieves more accurate results since the eating patterns and dietary intake of an individual reported on multiple different surveys can be weighted as pertaining to the same individual. In addition, significant advantages and flexibility can result from the ability to track individual consumption over an extended time period such as a number of years. For example, much valuable information can be obtained by determining how a person's eating habits change with age.
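One way to picture the individual ID assignment is the sketch below. The text does not say which fields are used to recognize the same person across surveys, so the matching rule here (household identifier, sex, and birth year) is purely an assumption for illustration, as are the field names and the function `assign_individual_ids`.

```python
from itertools import count


def assign_individual_ids(surveys):
    """Attach a survey-independent individual ID to every participant row.

    `surveys` is a list of surveys, each a list of participant rows (dicts).
    ASSUMPTION: a person is recognized by (household, sex, birth_year); the
    per-survey `member_code` is deliberately ignored because it may change.
    """
    next_id = count(1)
    ids_by_identity = {}
    keyed_rows = []
    for survey_index, rows in enumerate(surveys):
        for row in rows:
            identity = (row["household"], row["sex"], row["birth_year"])
            if identity not in ids_by_identity:
                ids_by_identity[identity] = next(next_id)
            keyed_rows.append(
                {**row, "survey": survey_index, "individual_id": ids_by_identity[identity]}
            )
    return keyed_rows


# Hypothetical example: the same household surveyed twice, with the member
# codes swapped because a different person kept the diary the second time.
survey_a = [
    {"household": "H1", "member_code": 1, "sex": "F", "birth_year": 1970},
    {"household": "H1", "member_code": 2, "sex": "M", "birth_year": 1995},
]
survey_b = [
    {"household": "H1", "member_code": 1, "sex": "M", "birth_year": 1995},
    {"household": "H1", "member_code": 2, "sex": "F", "birth_year": 1970},
]
keyed = assign_individual_ids([survey_a, survey_b])
```

With IDs keyed this way, diary rows from different survey periods can be grouped by `individual_id` rather than by the survey's own member designator. A real matching rule would also need to handle ties (e.g., twins) and changes in household makeup, which this sketch does not attempt.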
In accordance with a further aspect of an exemplary and illustrative embodiment, data is combined to develop demographic-based (e.g., age and sex) categories for portion size determinations.
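A minimal sketch of deriving age- and sex-specific mean serving weights from individual portion records follows. The age band boundaries, field names, and sample values are hypothetical; the text specifies only that the categories are demographic-based (e.g., age and sex).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical age bands: (lowest age, highest age, label).
AGE_BANDS = [(0, 5, "0-5"), (6, 18, "6-18"), (19, 50, "19-50"), (51, 200, "51+")]


def age_band(age):
    """Return the label of the band containing `age`."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def mean_portions(observations):
    """Average observed serving weights per (food, sex, age band)."""
    grouped = defaultdict(list)
    for obs in observations:
        key = (obs["food"], obs["sex"], age_band(obs["age"]))
        grouped[key].append(obs["grams"])
    return {key: mean(grams) for key, grams in grouped.items()}


# Hypothetical portion records of the kind a two-day intake survey might hold.
observations = [
    {"food": "oat_cereal", "sex": "F", "age": 30, "grams": 30.0},
    {"food": "oat_cereal", "sex": "F", "age": 40, "grams": 50.0},
    {"food": "oat_cereal", "sex": "M", "age": 10, "grams": 35.0},
]
portions = mean_portions(observations)
```

The resulting table is keyed in the same way a diary row can be keyed (food plus the eater's sex and age band), so it can supply a portion size for each food that the diary records without a quantity per person.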
In accordance with yet another aspect of a preferred and exemplary embodiment, recipe files are used to extract nutrient information from food descriptors. In more detail, once a particular food item has been identified in the food intake data set, it is desirable to be able to determine what nutrients are obtained from eating that particular food in the particular portion size corresponding to the individual who has consumed that food. Since the food intake survey data set in the example embodiment does not report an individual's actual portion size, portion size information is obtained from a different data set based on age, sex and other demographics of the individual who consumed the food item. Once the food item and portion size are known, the preferred exemplary embodiment uses recipes to determine (or estimate) the nutrients that the consumer obtained from eating that food product.
The nutrient data within the nutrient data set does not necessarily provide a comprehensive nutrient profile for each and every one of the thousands of food products that may be identified. As an example, the nutrient data set may not specify the nutrients obtained from eating a mixture, although the nutrient data set might have complete information concerning constituent components of such foods (e.g., flour, milk, butter, oil and other components of a pancake recipe). In accordance with this aspect of the preferred and exemplary illustrative embodiment, recipe files are maintained and may be used to break down particular identified food items into their component constituent parts. The nutrients within each component part may then be identified from the nutrient data set to provide dynamically an overall nutrient content for the particular food item described by the food descriptor.
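The recipe breakdown can be sketched as a short recursive lookup. The recipe proportions and nutrient values below are made up for illustration, and the sketch assumes that a recipe is stored as weight fractions of its ingredients and that nutrient content is stored per 100 g; neither storage format is stated in the text.

```python
# Hypothetical recipe file: food -> ingredient weight fractions (summing to 1).
RECIPES = {
    "pancake": {"flour": 0.40, "milk": 0.45, "butter": 0.05, "oil": 0.10},
}

# Hypothetical nutrient data set: food -> nutrient amounts per 100 g (made-up values).
NUTRIENTS_PER_100G = {
    "flour": {"energy_kcal": 360.0},
    "milk": {"energy_kcal": 60.0},
    "butter": {"energy_kcal": 720.0},
    "oil": {"energy_kcal": 880.0},
}


def nutrients_for(food, grams):
    """Nutrients in `grams` of `food`, using a recipe when no direct profile exists."""
    if food in NUTRIENTS_PER_100G:
        profile = NUTRIENTS_PER_100G[food]
        return {name: per_100g * grams / 100.0 for name, per_100g in profile.items()}
    totals = {}
    # No direct profile: split the food into ingredients and sum their nutrients.
    for ingredient, fraction in RECIPES[food].items():
        for name, amount in nutrients_for(ingredient, grams * fraction).items():
            totals[name] = totals.get(name, 0.0) + amount
    return totals


pancake = nutrients_for("pancake", 100.0)
```

Because the function calls itself for each ingredient, an ingredient that is itself a mixture would be broken down in turn. A food found in neither table raises `KeyError`, which plays the same role as the exception case for unrecognized food descriptors.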