The present invention relates to a method and apparatus to optimize the amount of information represented by entries in a database. The invention has particular application to databases in which the entries represent both the attributes of individuals who access the database for advice and the advice given to those individuals.
The Internet provides a channel for supplying advice in a wide variety of fields that call for expert knowledge not normally available to most members of the general public. Such advice can include advice on medical problems and on the purchase of financial products such as mortgages, pensions and shares.
Expert advice obtained from an automated system must be trustworthy. It is therefore important for the supplier of information and advice to obtain accurate predictions of the best advice to give any particular customer. Statistical prediction methods typically involve fitting a statistical model to a database or other data source. The database has entries recording the attributes of a collection of people or data subjects, real or imaginary, and the advice given by an expert or group of experts to each person or data subject.
The database must therefore include a large number of people or data subjects, and one or more experts are required to recommend, for each individual in the database, the particular piece of advice (e.g. the purchase of a particular financial product) considered best to suit that individual's needs. Advice for any future individual is then derived by reference to the database.
The size of the database may be limited by including only a selection of all the possible entries. Such a limitation improves the efficiency of the database, but a database of limited size may give less accurate advice to the individuals who access it. The accuracy depends on how well the entries to be included in the database are selected.
It is an aim of the present invention to optimize the information represented by the entries in a database.
According to the present invention, there is provided a method of operating a data processing apparatus to optimize the amount of information represented by entries in a database, the method comprising:
forming prior distributions over the parameters of entries in the database;
subjecting a potential entry for the database to a process of evaluation, the evaluation including:
(a) sampling parameters from the prior distributions;
(b) calculating a utility function from the samples and from the potential entry; and
(c) calculating a utility value from the utility function;
repeating the process of evaluation for each of a plurality of further potential entries; and
selecting from the potential entries the one having the highest calculated utility value.
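The method leaves the model, the prior distributions and the utility function unspecified. The sketch below is one possible instance under assumptions that are not part of the source: each entry is an attribute vector `x` paired with a binary piece of advice; the advice is modelled by logistic regression whose weights have independent Gaussian priors; and the utility value of a potential entry is a Monte Carlo estimate of the expected information gain (mutual information) between the expert's advice on that entry and the model parameters. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_prior(means: Sequence[float], sds: Sequence[float],
                 rng: random.Random) -> List[float]:
    # (a) Draw one parameter vector from independent Gaussian priors (assumed prior form).
    return [rng.gauss(m, s) for m, s in zip(means, sds)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_entropy(p: float) -> float:
    # Entropy (in nats) of a Bernoulli(p) advice label.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def utility_value(x: Sequence[float], means: Sequence[float], sds: Sequence[float],
                  n_samples: int, rng: random.Random) -> float:
    probs = []
    for _ in range(n_samples):
        w = sample_prior(means, sds, rng)
        # (b) Utility-function term per sample: predicted probability that the
        # expert gives advice "1" for attributes x under parameters w.
        probs.append(sigmoid(sum(wi * xi for wi, xi in zip(w, x))))
    # (c) Utility value: Monte Carlo estimate of the mutual information
    # H[mean prediction] - mean(H[per-sample prediction]).
    mean_p = sum(probs) / len(probs)
    return binary_entropy(mean_p) - sum(binary_entropy(p) for p in probs) / len(probs)


def select_entry(candidates: Sequence[Sequence[float]], means: Sequence[float],
                 sds: Sequence[float], n_samples: int = 2000,
                 seed: int = 0) -> Tuple[int, List[float]]:
    # Repeat the evaluation for every potential entry, then pick the highest utility.
    rng = random.Random(seed)
    scores = [utility_value(x, means, sds, n_samples, rng) for x in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    candidates = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]]
    best, scores = select_entry(candidates, means=[0.0, 0.0], sds=[1.0, 1.0])
    print(best, [round(s, 4) for s in scores])
```

In this toy run the entry `[0.0, 0.0]` carries essentially no information (every parameter sample predicts 0.5), whereas `[3.0, 0.0]` is the entry about which the prior models disagree most, so it is selected. Other utility functions, such as expected reduction in prediction error, would slot into the same loop.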
Further according to the present invention, there is provided a data processing apparatus to optimize the amount of information represented by entries in a database, the apparatus comprising:
means for forming prior distributions over the parameters of entries in the database;
evaluating means for subjecting a potential entry for the database to a process of evaluation, the evaluation including:
(a) sampling parameters from the prior distributions;
(b) calculating a utility function from the samples and from the potential entry; and
(c) calculating a utility value from the utility function;
means to cause the evaluating means to repeat the process of evaluation for each of a plurality of further potential entries; and
selecting means to select from the potential entries the one having the highest calculated utility value.