The instant invention relates to information search and retrieval tasks, especially to finding electronically stored images in electronic data bases.
In these days of the internet, with the bulk of information growing vastly and rapidly, there is an ever increasing demand for purposive and effective search and retrieval of information held in data bases. The problems involved in so-called information retrieval of data stored electronically in data bases may be outlined as follows: A given data base comprises n data sets x1 . . . xn (n ≥ 2). The search for pictures and their retrieval is carried out by a special method of searching in data bases. The objects of the search are data sets x1 . . . xn which embody n pictures in electronic form. What must be found in the data base is a subset Drel of relevant data sets (electronic images). This subset Drel is the set of data sets relevant to answering a specific question by a user. In an example of searching for a picture, these might be pictures of a beach on the coast of a Hawaiian island.
When applying known searching methods and means to retrieve pictures from data bases, attempts are first made to describe the relevant subset Drel by catchwords; subsequently the catchwords are drawn upon to make a search request. The user of the data base presents his request in textual form, usually without having knowledge of the full list of catchwords listed in the data base. In the example chosen, the user's query might include the words "beach Hawaii". The words of this query are compared with catchwords which are stored for the pictures in the data base. Often in these cases the so-called Boolean search method is applied. This method offers the user the opportunity to link the catchwords by AND, OR, and NOT. Some methods and means additionally permit these three operations to be given a respective weighting.
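The Boolean linking of catchwords described above can be illustrated with a minimal Python sketch. The catalogue contents, the picture identifiers and the function name `boolean_search` are hypothetical and serve only as an illustration; weighting of the operations is not modeled.

```python
# Hypothetical catalogue: picture id -> set of stored catchwords (illustrative data).
CATALOGUE = {
    "img1": {"beach", "hawaii", "palm"},
    "img2": {"beach", "australia"},
    "img3": {"mountain", "hawaii"},
}

def boolean_search(catalogue, all_of=(), any_of=(), none_of=()):
    """Return the ids of pictures whose catchwords satisfy the query.

    all_of  -> every listed catchword must be present (AND)
    any_of  -> at least one listed catchword must be present (OR)
    none_of -> no listed catchword may be present (NOT)
    """
    hits = []
    for picture_id, words in catalogue.items():
        if not set(all_of) <= words:
            continue  # an AND term is missing
        if any_of and not (set(any_of) & words):
            continue  # no OR term matched
        if set(none_of) & words:
            continue  # a NOT term is present
        hits.append(picture_id)
    return sorted(hits)

# The query "beach AND hawaii" from the example in the text.
hawaii_beaches = boolean_search(CATALOGUE, all_of=["beach", "hawaii"])
```

Note that identical queries always yield identical results here, since the outcome depends only on the query text and the stored catchwords.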
The following difficulties may have to be overcome when searching in a picture data base:
(1) How can the subset Drel needed for the search be described systematically with words when the data sets xi are (digitized) images?
(2) The data base often comprises a very large number of pictures (n >> 100,000) and, therefore, the user cannot review and judge all those n pictures.
Fundamentally, a distinction may be made between two different approaches in the search for pictures. In one case the picture is digitized and features are extracted from the digitized image. That begins with the simplest description, using gray levels or color levels of each pixel (so-called low level features), i.e., for a picture having 1,000 × 1,000 pixels a total of 1,000,000 different features per picture are extracted. It ends with features referred to as high level features, such as the number of edges and corners or number of surfaces etc. The use of simple features has the advantage of permitting quick calculation. However, it is disadvantageous that such features are not very well suited to describe relevant search quantities of the picture. Although more complex features thus would be much better suited, their extraction at present still involves such great expenditure that it is almost impossible, for practical reasons, to make use of them in connection with data bases containing more than 10,000 pictures.
In another known method a human being provides catchwords to describe a picture, i.e., for each picture a list of catchwords is drawn up which refer to what is represented in the picture. This complex extraction of features has the advantage that it simplifies the characterization of relevant pictures by linking the catchwords. Technically speaking, a picture x is represented by a vector x ∈ {0, 1}^s (s is the number of all the catchwords possible). If the ith catchword is contained in the list of catchwords pertaining to the picture the ith component xi of vector x is 1, otherwise it is 0. Operations, such as conjunction (AND) or disjunction (OR) in this case may be represented by mathematical operations, like multiplication or addition.
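The binary catchword vector and the arithmetic form of AND and OR may be sketched as follows. The vocabulary, the helper names and the capping of the sum at 1 for OR are assumptions made for this illustration; the text itself only states that multiplication and addition may be used.

```python
# Hypothetical vocabulary of s = 4 possible catchwords (illustrative only).
VOCABULARY = ["beach", "hawaii", "palm", "mountain"]

def to_vector(catchwords, vocabulary=VOCABULARY):
    """Map a catchword list to a vector in {0, 1}^s: component i is 1 if the
    ith catchword of the vocabulary is in the list, otherwise 0."""
    present = set(catchwords)
    return [1 if word in present else 0 for word in vocabulary]

def and_match(x, i, j):
    """Conjunction of components i and j as a product (1 only if both are 1)."""
    return x[i] * x[j]

def or_match(x, i, j):
    """Disjunction of components i and j as a sum, capped at 1 (an assumption)."""
    return min(1, x[i] + x[j])

x = to_vector(["beach", "hawaii"])
beach_and_hawaii = and_match(x, 0, 1)   # both catchwords present
beach_or_mountain = or_match(x, 0, 3)   # only "beach" present
```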
Once the search has been started, the picture search machine calculates a system relevance for each of the electronically stored picture data sets x1 . . . xn in respect of the search request. This calculation of the respective system relevance is an essential property of each picture search machine or picture search method. The effectiveness and quality of the calculation of the respective system relevance are of essential importance for the success of the search system. Two approaches, based on differing principles, have become generally accepted with catchword search methods for calculating the system relevance:
If the textual search request comprising only catchwords, as generated by the user, is interpreted as a vector q ∈ {0, 1}^s, the similarity between the textual search request and the respective catchword list of the pictures or picture data sets in the data base can be calculated, based on the lists of catchwords available for the electronically stored pictures. This similarity then may be used as a measure of system relevance. This approach, known as the "vector space model", is described, for instance, by G. Salton in "Automatic Information Organization and Retrieval", McGraw-Hill, New York, 1968.
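As an illustration of the vector space approach, the following Python sketch computes a similarity between a query vector q and a picture vector x. The cosine measure is used here as one common choice; the text does not prescribe a particular similarity measure, so this choice and the function name are assumptions.

```python
import math

def cosine_similarity(q, x):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (no catchwords at all).
    """
    dot = sum(a * b for a, b in zip(q, x))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in x))
    return dot / norm if norm else 0.0

# Hypothetical vectors over a vocabulary of s = 4 catchwords.
q = [1, 1, 0, 0]           # query "beach hawaii"
full_match = [1, 1, 0, 0]  # picture tagged with both catchwords
half_match = [1, 0, 0, 0]  # picture tagged with "beach" only
no_match = [0, 0, 1, 1]    # picture sharing no catchword with the query

relevances = [cosine_similarity(q, x) for x in (full_match, half_match, no_match)]
```

The resulting values decrease from the full match to the picture with no shared catchword, so they can serve directly as system relevances for ordering.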
With another approach, a probability model is applied to the catchwords in relevant documents (estimated on the basis of the textual search request which contains nothing but catchwords), allowing the probability to be calculated that a picture is contained in the subset Drel, and this probability then may be taken as the measure of system relevance.
On the basis of the system relevances found for all the pictures in the data base, the pictures are put into order in accordance with the system relevances calculated and thus are presented to the user. Many times in practice, it is sufficient to find just the 100 pictures having the highest values of system relevance, a task which can be resolved much more quickly than sorting a huge number of, for instance, 1,000,000 pictures.
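The remark that finding only the best-ranked pictures is cheaper than a full sort can be illustrated with Python's standard `heapq.nlargest`, which keeps a bounded heap of k entries instead of sorting all n. The function name `top_k` and the sample scores are hypothetical.

```python
import heapq

def top_k(relevance, k=100):
    """Return the k (picture_id, score) pairs with the highest system
    relevance, highest first, without sorting the whole collection."""
    return heapq.nlargest(k, relevance.items(), key=lambda item: item[1])

# Hypothetical system relevances for a tiny data base.
scores = {"img1": 0.1, "img2": 0.9, "img3": 0.5, "img4": 0.7}
best_two = top_k(scores, k=2)
```

For n pictures and a small k, this takes on the order of n log k comparisons, compared with n log n for a complete sort.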
If the user of the data base still should not be satisfied with the search result he will have to revert to his query and change the text, for example, by restricting it further. Some systems offer the user a possibility of "feedback" by way of choosing a picture which he thinks is "very similar" or "close" to the relevant documents Drel.
Such methods have an essential disadvantage in that search queries based on identical text entries by the user in connection with a certain stock of pictures always will provide the same search result. This means that the users in search of a picture are the ones who must adapt to the catchword system of the data base in order to be able to model the individual preferences and characteristics of the data base, because the only possible means of "communication" between the data base user and the search system is the textual search query. As a rule, that requires intensive and time consuming "exploration" of the specifics of the respective data base chosen by the data base user.
It is the object of the invention to provide an improved method and apparatus for searching for a relevant subset of data sets from a quantity of data sets, especially picture data sets which are stored electronically in a data base and, at the same time, to improve the efficiency and quality of the search as well as its user friendliness.
According to one aspect of the invention a method is provided of automatically searching for relevant picture data sets in a quantity of n (n ≥ 2) picture data sets electronically stored in a memory device, picture attributes for each of the n picture data sets being stored electronically in the memory device, and the n picture data sets as well as the stored picture attributes being adapted to be processed electronically by a processor, said method comprising:
(a) providing a first selection of picture data sets from the n picture data sets with the aid of the processor to be output with the aid of a display device;
(b) outputting several of the picture data sets of the first selection of picture data sets with the aid of the display device;
(c) electronically recording a respective evaluation by a user for at least one relevant picture data set of the plurality of picture data sets output according to (b); and
(d) providing a second selection of m (m ≤ n) picture data sets from the n picture data sets in a sequence which depends on a respective system relevance of the m picture data sets to be output by the display device;
a machine learning process being carried out for electronically determining a decision function f to provide the second selection of the m picture data sets; the picture attributes electronically stored in the memory device for the at least one relevant picture data set constituting a training quantity for the machine learning process; the respective system relevance being determined for k (k ≥ m) picture data sets with the aid of the decision function f and the respective electronically stored picture attributes; and the k picture data sets comprising at least part of the m picture data sets of the second selection.
According to another aspect of the invention a picture search apparatus is provided, comprising:
a memory device for electronically storing n (n ≥ 2) picture data sets and respective picture attributes each associated respectively with the n picture data sets;
a display device for outputting a first selection of picture data sets from the n picture data sets;
a recording device for electronically recording a respective evaluation by a user for at least one relevant picture data set of the first selection of picture data sets output;
a processor for automatically carrying out a machine learning process to determine a decision function f in consideration of the picture attributes stored electronically in the memory device for the at least one relevant picture data set and for electronically determining a respective system relevance for m (m ≤ n) picture data sets of the n picture data sets stored electronically in the memory device with the aid of the decision function f and the respective electronically stored picture attributes; and
a device for providing a second selection of picture data sets in a sequence which depends on the respective system relevance to be output by the display device, the second selection of picture data sets comprising k (k ≥ m) picture data sets of the m picture data sets.
The invention comprises the essential fundamental concept of making use of a subset of relevant pictures, or the corresponding picture data sets selected by the user of a picture search device, as a training quantity for a machine learning process of the picture search device. In the course of the electronically accomplished machine learning process the picture attributes associated with the pictures of the training quantity are processed electronically. In this electronic processing, mathematical operations are applied to locate relevant picture data sets in a data base, in accordance with a recorded search query, and to make them available for further processing, especially for being output by way of a display.
The invention offers the substantial advantage of replacing the uncomfortable search that requires the input of catchwords with a simple evaluation of search results, for example, by clicking mouse buttons. In this manner the user of the search system is relieved of the chore of having to describe the picture (by digital features and/or catchwords), as in the prior art.
It is another advantage of the invention that the search system adapts itself to the user by way of the machine learning process, rather than the other way around where the user of the data base must learn the specific catchword system of the data base. While the known search system starts from the assumption that the picture search system operates (internally) with a search query q, this paradigm is replaced by the machine learning process of a decision function f. In practice, this means that no generally applicable measure of similarity need be found between a text inquiry q and pictures x.
The novel method may be used together with existing electronically stored descriptions of pictures and requires no expensive preprocessing of picture data bases. Moreover, the method is adapted to make use of digitized features (which are easy to extract) to optimize the adaptation of the system relevance to the user relevance (as expressed by the quantity Drel which is unknown to the system).
If the user of the novel search method or apparatus wishes to repeat the procedure he is free, after each output of search results, to decide once more in favor of relevant pictures; in other words, he can make a new evaluation irrespective of the preceding one, an opportunity offered to support what is called "creative drifting". With reference to the example chosen, this means that a user who at first looked for pictures of Hawaiian beaches may decide otherwise in his renewed evaluation. If the user discovers in the search result displayed an element which better meets his desire, such as a beach in Australia, he can designate pictures of Australian beaches as exclusively relevant. In this manner the original concept of a Hawaiian coast is ignored in the renewed evaluation.
In accordance with a convenient further development of the invention the first selection of picture data sets from among the n picture data sets is made by means of a catchword search, whereby the novel method can be combined in a very simple manner with known methods of catchword searching.
Another embodiment of the invention, preferred in terms of user friendliness, provides for the electronic recording of the respective evaluation by the user for the at least one relevant picture data set according to (c) to comprise the recording of an actuation of an electronic selector device, especially a mouse device which cooperates with the display device. Especially when using the mouse device, a method is realized which permits the user, when selecting picture data sets, to focus on the data sets displayed and thus make his choice in an easy way.
A preferred embodiment of the invention, especially devised to minimize electronic calculating expenditure and to accelerate processing, provides for the respective evaluation by the user for the at least one relevant picture data set according to (c) to be recorded electronically as a binary evaluation so that each picture data set evaluated may be recorded as a relevant picture data set and each picture data set not evaluated may be recorded as a non-relevant picture data set.
Minimizing of the processing time may be achieved by a convenient implementation of the method according to the invention by which the decision function f is determined in the context of the machine learning process, whereby the electronic determination of the respective system relevance is optimized in terms of the time period needed to achieve it.
The machine learning process can be carried out conveniently by means of a perceptron learning method.
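A minimal Python sketch of how a perceptron might learn a decision function f from a binary evaluation is given below. It assumes the classic perceptron update rule, a linear decision function f(x) = w·x + b, binary catchword vectors as picture attributes, and labels of +1 for pictures the user marked and -1 for displayed pictures left unmarked. All names and data are hypothetical; the text does not specify these details.

```python
def decision_function(w, b, x):
    """Linear decision function f(x) = w . x + b, used as system relevance."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(samples, labels, epochs=20):
    """Classic perceptron training: on each misclassified sample, move the
    weights toward (label +1) or away from (label -1) that sample."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * decision_function(w, b, x) <= 0:  # wrong side or on boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training picture is classified correctly
            break
    return w, b

# Hypothetical attributes over the catchwords (beach, hawaii, australia, mountain)
# for three displayed pictures; the user marked only the first as relevant.
displayed = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1]]
labels = [+1, -1, -1]

w, b = train_perceptron(displayed, labels)

# Order the pictures by descending system relevance f(x).
ranking = sorted(displayed, key=lambda x: decision_function(w, b, x), reverse=True)
```

In a full system the learned f would be evaluated on the stored attributes of pictures throughout the data base, not only on the displayed ones, to produce the second selection.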
A preferred embodiment of the invention permits the machine learning process to be carried out in combination with existing data bases because the picture attributes stored comprise catchwords suitable for electronic evaluation by the processor.