In computerized search systems, applications that deal with a plurality of attributes may require metadata to be represented in a plurality of fields. For example, an application that deals with multiple languages would require each metadata field to have a representation for each of the languages, and the language-specific values must be indexed as separate metadata fields.
One way to meet these requirements is to create a distinct metadata field for each required representation. Thus, if representations for multiple languages are required, a search index would have a “French color” field, an “English color” field, a “German color” field, and so forth, instead of a single “color” metadata field. Accordingly, when a search engine indexes a car maintenance report having “color” as a field, there may actually be one field for an English value (red), one for a French value (rouge), one for a German value (rot), and so forth. This approach has several problems. First, it creates too many metadata fields, which is inefficient and difficult to manage. Second, it lacks a clear solution for situations in which the language attribute is unknown or unspecified. Third, users formulating a search query must know which language and metadata field to search; if they do not, they must use a complex query that searches every one of the related fields.
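By way of illustration only, the field-per-language approach can be sketched in Python as follows. The field names (`color_en`, `color_fr`, `color_de`), the sample documents, and both helper functions are hypothetical and are not taken from any particular search system; the sketch merely shows why a user who does not know the language must search every related field.

```python
from typing import Dict, List

# Hypothetical index: every language gets its own metadata field.
documents: Dict[str, Dict[str, str]] = {
    "report-1": {"color_en": "red", "color_fr": "rouge", "color_de": "rot"},
    "report-2": {"color_en": "blue"},
    "report-3": {"color_fr": "rouge"},
}

# The user (or the application) must know this list of related fields.
COLOR_FIELDS: List[str] = ["color_en", "color_fr", "color_de"]


def search_field(field: str, value: str) -> List[str]:
    """Return IDs of documents whose given field equals the value."""
    return sorted(
        doc_id for doc_id, fields in documents.items()
        if fields.get(field) == value
    )


def search_any_language(value: str) -> List[str]:
    """The 'complex query': probe each language-specific field in turn."""
    hits = set()
    for field in COLOR_FIELDS:
        hits.update(search_field(field, value))
    return sorted(hits)


print(search_field("color_fr", "rouge"))
print(search_any_language("rouge"))
```

In this sketch, each additional language adds another entry to `COLOR_FIELDS` and another field to every query, and a value stored without a known language has no obvious field to occupy.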
Another approach is to encode the language attributes into the values which are indexed. In this case, only one metadata field “color” is used, but the values might be: “[=English] red”, “[=French] rouge”, or “[=German] rot”. With this approach, a query can find a value from any of the languages. However, users constructing these queries need to use special syntax, and there is no easy way to sort search results by language. The encoding also fails in the unlikely event that the actual metadata to be indexed contains a value that coincidentally has the same structure as the convention used for encoding the attributes. Put another way, an English value “red”, which is stored as “[=English] red”, cannot be differentiated from an actual metadata value that is literally the string “[=English] red”. Note that this encoding is completely arbitrary and is used simply for illustration purposes; many types of encoding may be possible.
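The encoding convention described above, and the collision it permits, can be sketched as follows. The `encode` and `decode` helpers and the regular expression are assumptions made purely for illustration; as noted, the “[=Language] value” convention is itself arbitrary.

```python
import re
from typing import Optional, Tuple

# Matches the illustrative "[=Language] value" convention.
_PREFIX = re.compile(r"^\[=(\w+)\] (.*)$")


def encode(language: str, value: str) -> str:
    """Fold the language attribute into the indexed value."""
    return f"[={language}] {value}"


def decode(stored: str) -> Tuple[Optional[str], str]:
    """Split a stored value back into (language, value).

    Values that do not match the convention are treated as
    having no language attribute.
    """
    match = _PREFIX.match(stored)
    if match:
        return match.group(1), match.group(2)
    return None, stored


encoded_english_red = encode("English", "red")

# A raw metadata value that merely happens to look like an encoded one.
literal_value = "[=English] red"

# Both decode identically, so the two cases cannot be told apart.
print(decode(encoded_english_red) == decode(literal_value))
```

Any convention of this kind would need an escaping rule to avoid the collision, which further complicates the special query syntax that users must learn.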
Another issue with conventional computerized search systems is that search results cannot be ordered or sorted based on metadata values where the metadata is sparse (that is, does not exist for every object). For example, a French user may desire to sort the results based on values in the French language. However, this is not possible because not every object may have a French value.
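The difficulty can be illustrated with a short sketch. The records, the `color_fr` field name, and the `sort_by_sparse_field` helper are hypothetical. The sketch shows one possible workaround, namely placing objects that lack the value at the end; that placement is an arbitrary policy chosen for illustration and is not a behavior of any particular search system.

```python
from typing import Dict, List

# Hypothetical result set: the French value is missing for object "b".
records: List[Dict[str, str]] = [
    {"id": "a", "color_fr": "rouge"},
    {"id": "b"},
    {"id": "c", "color_fr": "bleu"},
]


def sort_by_sparse_field(
    items: List[Dict[str, str]], field: str
) -> List[Dict[str, str]]:
    """Sort by a field that may be absent from some objects.

    The key is a tuple: objects that have the field sort first
    (False < True), then by the field's value. Objects without
    the field are grouped at the end in their original order.
    """
    return sorted(items, key=lambda r: (field not in r, r.get(field, "")))


ordered = sort_by_sparse_field(records, "color_fr")
print([r["id"] for r in ordered])
```

Whether objects without a French value belong at the end, at the beginning, or under a fallback language is exactly the decision for which a sparse field provides no basis.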
Yet another issue relates to potential types of values for a single metadata field. For example, if a “name” metadata field has a value in French, one may wish to know whether this value was originally in French or was translated from another language. Note that although “language” is used to describe example issues faced by conventional computerized search systems, these issues are not language-specific and can be unrelated to languages. For example, for performance reasons, some search systems may include only a single field for a part number and provide no additional information on whether that part number is an original value or a substitute value. Other search systems may include that information, at the cost of having discrete metadata fields, one for an original part number and another for a substitute part number. Again, additional field(s) would need to be created, and a user who is interested in this information would need to know which specific metadata fields to search or use a complex query that searches every one of the related fields.
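The part number trade-off can be sketched in the same manner. The object identifiers, part numbers, and field names below are invented for illustration only.

```python
from typing import Dict, List, Tuple

# Single-field layout: compact, but nothing records whether a part
# number is an original value or a substitute value.
single_field: Dict[str, Dict[str, str]] = {
    "widget-1": {"part_number": "PN-100"},
    "widget-2": {"part_number": "PN-200"},
}

# Discrete-field layout: the distinction is kept, at the cost of one
# extra metadata field per kind of value.
discrete_fields: Dict[str, Dict[str, str]] = {
    "widget-1": {"original_part_number": "PN-100"},
    "widget-2": {
        "original_part_number": "PN-150",
        "substitute_part_number": "PN-200",
    },
}

# The user must know every related field name in order to search them.
PART_FIELDS: List[str] = ["original_part_number", "substitute_part_number"]


def find_part(part_number: str) -> List[Tuple[str, str]]:
    """Return (object ID, matching field) pairs across all related fields."""
    hits = []
    for obj_id, fields in sorted(discrete_fields.items()):
        for field in PART_FIELDS:
            if fields.get(field) == part_number:
                hits.append((obj_id, field))
    return hits


print(find_part("PN-200"))
```

With the single-field layout, a search for “PN-200” would find “widget-2” but could not indicate that the value is a substitute; with the discrete-field layout, that indication is available only to a query that enumerates both fields.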
Given the deficiencies in conventional computerized search systems, including those issues described above, there is always room for innovations and improvements.