Pairing two documents as “similar” is a difficult problem when the documents are short. Longer documents give a searcher a more extensive vocabulary to compare, but when the vocabulary is limited (for example, by the size of short documents), standard methods of comparison are inefficient and produce unreliable results. In many cases, pairing short documents relies on manual comparison by a user. This method may lead to high-quality results, but becomes impractical when the number of documents is large.
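The unreliability of vocabulary-based comparison on short documents can be illustrated with a minimal sketch. It assumes a plain bag-of-words cosine similarity as the "standard method"; the text above does not name a specific method, and the three service descriptions and the function names `tokens` and `cosine` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(doc_a, doc_b):
    """Bag-of-words cosine similarity between two documents (0.0 to 1.0)."""
    counts_a, counts_b = Counter(tokens(doc_a)), Counter(tokens(doc_b))
    # Only words present in both documents contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical short service descriptions (illustrative assumptions only).
backup = "Nightly backup of customer databases to offsite storage"
paraphrase = "Daily copy of client data stores to a remote archive"
lookalike = "Nightly backup of office printers to offsite storage"

if __name__ == "__main__":
    print("same service, different wording:", round(cosine(backup, paraphrase), 3))
    print("different service, similar wording:", round(cosine(backup, lookalike), 3))
```

In this sketch the paraphrase of the same service shares only function words with the original and scores lower than a description of a different service that happens to reuse the same wording. With so few words per document, a single substitution shifts the score substantially, which is the sense in which such comparisons become unreliable.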
One application in which determining the similarity of short documents is beneficial is the construction of service offerings from pre-built services. Service providers provide customers with services defined by service agreements. These services are, in many cases, a collection of smaller services. The smaller services are often generic or semi-generic pre-built services that may be repeated several times within the same service offering or even be used within multiple agreements for multiple customers. A typical service offering may include several generic or semi-generic services along with one or more custom or semi-custom services.
Services, such as the generic services described above, often have accompanying descriptions. For example, a service description may describe the inputs, function, and outputs of the service. These service descriptions are typically quite short. In many cases, the service descriptions are fewer than 500 words. In some cases, the service descriptions are under 50 words. Some service descriptions are arranged hierarchically, with levels that correspond to the significance of the associated text and that describe relationships between terms in the description.
Efficiently constructing a service offering often depends on breaking down the service offering into as few custom or semi-custom services as possible. By using previously defined services to meet the requirements of the service offering, the service provider may avoid unnecessary repetition in defining and implementing new services. To construct the service offering, a user must often consult a catalog of services to locate previously defined services that may be used as constituent parts of the service offering.
Some catalogs of services for large service providers are very large, containing hundreds or thousands of service descriptions. When using a large service catalog to construct a service offering, a user may have difficulty selecting the optimal pre-defined services. The level of experience and skill required to efficiently construct a service offering often grows with the size of the service catalog. Consequently, constructed service offerings often fail to use the best pre-defined services, and the best service offerings require the employment of a highly skilled and experienced user.