In the field of internet search technology, e-commerce search engines on e-commerce websites generally search for products by using attributes possessed by many products, such as price and date of manufacture. However, targeted attributes such as “inside diameter” and “outside diameter” may be relevant to special products such as “bearings” but irrelevant to other products. Therefore, targeted attributes are difficult to use for generic queries.
Generally, a search engine system stores a plurality of web pages. Each web page includes a comprehensive description of standard products, generic attributes such as price, header, and date of manufacture, and user-defined (non-generic) attributes. An attribute possessed by all products, such as price or place of origin, is referred to as a generic attribute. An attribute possessed only by certain specific products, such as inside diameter, outside diameter, or thickness, is referred to as a non-generic attribute.
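The generic/non-generic distinction can be computed directly from each product's attribute set. The following is a minimal sketch, assuming each product is represented as a Python dict mapping attribute names to values. The data mirrors products A, B, and C from the example in the next paragraph, and the function name `split_attributes` is hypothetical.

```python
# Hypothetical product records: attribute name -> value.
products = {
    "A": {"price": 100, "date": "2001-1-1"},
    "B": {"price": 200, "date": "2002-1-1"},
    "C": {"price": 300, "date": "2003-1-1", "inside diameter": 50},
}

def split_attributes(products):
    """Return (generic, non_generic) sets of attribute names.

    Generic: possessed by every product (intersection of all attribute sets).
    Non-generic: possessed by at least one product but not by all of them.
    """
    attr_sets = [set(attrs) for attrs in products.values()]
    generic = set.intersection(*attr_sets)
    non_generic = set.union(*attr_sets) - generic
    return generic, non_generic

generic, non_generic = split_attributes(products)
print(sorted(generic))      # ['date', 'price']
print(sorted(non_generic))  # ['inside diameter']
```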
Some attribute-based product retrieval technologies retrieve attributes stored on web pages. The attribute information stored on the web pages can be formatted using XML so that every web page records the same number of attributes. For example, assume that web pages A, B, and C describe products A, B, and C, respectively. Products A and B each have two attributes, price and date of manufacture, while product C has three: price, date of manufacture, and inside diameter. Because product C has the non-generic attribute “inside diameter,” which products A and B do not possess, formatted storage of the attribute information for products A, B, and C requires adding an “inside diameter” field to web pages A and B. This field is given the value “0” to indicate that the product described on the corresponding web page does not possess the attribute. The formatted storage information for products A, B, and C can then be as follows:
Web page A: “price”, 100; “date”, 2001-1-1; “inside diameter”, 0;
Web page B: “price”, 200; “date”, 2002-1-1; “inside diameter”, 0; and
Web page C: “price”, 300; “date”, 2003-1-1; “inside diameter”, 50.
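The padding step above can be sketched as follows. This is a minimal illustration, assuming a fixed, ordered schema of attribute names and using `0` as the "not possessed" sentinel, as in the records listed above. The names `format_records`, `MISSING`, and the dict-based input layout are hypothetical. They are not taken from any particular XML storage system.

```python
# Sentinel meaning "the product does not possess this attribute",
# matching the value 0 used in the formatted records.
MISSING = 0

def format_records(raw, schema):
    """Pad every raw record so it carries exactly the fields in `schema`."""
    return {
        page: [(attr, attrs.get(attr, MISSING)) for attr in schema]
        for page, attrs in raw.items()
    }

schema = ["price", "date", "inside diameter"]
raw = {
    "A": {"price": 100, "date": "2001-1-1"},
    "B": {"price": 200, "date": "2002-1-1"},
    "C": {"price": 300, "date": "2003-1-1", "inside diameter": 50},
}
formatted = format_records(raw, schema)
for page, fields in formatted.items():
    print(page, fields)
```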
Under the formatted storage scheme described above, a product retrieval based on a certain attribute can be performed by using that attribute as the query entry. For example, for a retrieval based on the non-generic attribute “inside diameter,” index ranges such as “1-50” and “50-100” can be established for that attribute, and the “inside diameter” field recorded on each web page is then queried against these index ranges.
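A range-based query of this kind can be sketched as a small bucket index. This is a minimal sketch, assuming records are stored as dicts keyed by attribute name. Treating each index range as half-open `[low, high)` is an assumption, because the ranges “1-50” and “50-100” as written both include the value 50. The function name `build_range_index` is hypothetical.

```python
# Hypothetical formatted records for web pages A, B, and C;
# 0 is the sentinel for "attribute not possessed".
formatted = {
    "A": {"price": 100, "date": "2001-1-1", "inside diameter": 0},
    "B": {"price": 200, "date": "2002-1-1", "inside diameter": 0},
    "C": {"price": 300, "date": "2003-1-1", "inside diameter": 50},
}

# Index ranges for "inside diameter", each treated as [low, high)
# (an assumption, since "1-50" and "50-100" share the value 50).
RANGES = [(1, 50), (50, 100)]

def build_range_index(records, attr, ranges):
    """Map each (low, high) range to the pages whose `attr` value falls in it."""
    index = {r: [] for r in ranges}
    for page, attrs in records.items():
        value = attrs[attr]
        for low, high in ranges:
            if low <= value < high:
                index[(low, high)].append(page)
                break  # ranges are disjoint, so stop at the first match
    return index

index = build_range_index(formatted, "inside diameter", RANGES)
print(index)  # {(1, 50): [], (50, 100): ['C']}
```

Because the ranges start at 1, the `0` placeholders on web pages A and B fall outside every range and are never returned by the query.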
Under this storage scheme, when a web page newly stored in the system possesses an attribute that the previously stored web pages do not, a field recording that attribute is added to each existing web page in order to maintain the formatted storage and enable retrieval based on the new attribute. For example, product D, newly stored in the system, has the attributes “price”, “date”, “inside diameter”, and “outside diameter”, the last of which is not recorded on web pages A, B, and C. Thus, an additional field for the attribute “outside diameter” is added to web pages A, B, and C and assigned the value “0” to indicate that the product described on the corresponding web page does not possess the attribute. The resulting records are as follows:
Web page A: “price”, 100; “date”, 2001-1-1; “inside diameter”, 0; “outside diameter”, 0;
Web page B: “price”, 200; “date”, 2002-1-1; “inside diameter”, 0; “outside diameter”, 0;
Web page C: “price”, 300; “date”, 2003-1-1; “inside diameter”, 50; “outside diameter”, 0;
Web page D: “price”, 400; “date”, 2004-1-1; “inside diameter”, 60; “outside diameter”, 100.
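The schema-widening step can be sketched as follows. This is a minimal illustration, assuming an in-memory dict per web page, a shared ordered schema list, and `0` as the sentinel. The function `add_page` and the constant `MISSING` are hypothetical names. The sketch counts how many placeholder fields a single insertion forces the system to write.

```python
MISSING = 0  # sentinel: "product does not possess this attribute"

def add_page(store, schema, page, attrs):
    """Store `attrs` for `page` under the fixed schema, widening it if needed.

    Returns the number of placeholder (MISSING) fields written as a result.
    """
    placeholders = 0
    for attr in attrs:
        if attr not in schema:
            schema.append(attr)
            # Backfill: every previously stored page gets the new field.
            for record in store.values():
                record[attr] = MISSING
                placeholders += 1
    store[page] = {attr: attrs.get(attr, MISSING) for attr in schema}
    placeholders += sum(attr not in attrs for attr in schema)
    return placeholders

schema = ["price", "date", "inside diameter"]
store = {
    "A": {"price": 100, "date": "2001-1-1", "inside diameter": 0},
    "B": {"price": 200, "date": "2002-1-1", "inside diameter": 0},
    "C": {"price": 300, "date": "2003-1-1", "inside diameter": 50},
}
written = add_page(store, schema, "D", {
    "price": 400, "date": "2004-1-1",
    "inside diameter": 60, "outside diameter": 100,
})
print(schema)   # ['price', 'date', 'inside diameter', 'outside diameter']
print(written)  # 3
```

Adding one page with one new attribute writes a placeholder into every existing page. The cost of each insertion therefore grows with the number of pages already stored.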
Because a web page newly stored in the system may possess an attribute that the previously stored web pages do not, a field for each new attribute is added to every existing web page. As a result, the system may store a large number of fields that carry no actual attribute information, which causes data redundancy and unnecessary use of system resources.
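The redundancy in the four records above can be quantified with a short count. This is a minimal sketch, assuming the records are held as dicts and that `0` marks a field the product does not possess. The variable names are hypothetical.

```python
MISSING = 0  # sentinel: "product does not possess this attribute"

# Records on web pages A through D after "outside diameter" was backfilled.
store = {
    "A": {"price": 100, "date": "2001-1-1", "inside diameter": 0, "outside diameter": 0},
    "B": {"price": 200, "date": "2002-1-1", "inside diameter": 0, "outside diameter": 0},
    "C": {"price": 300, "date": "2003-1-1", "inside diameter": 50, "outside diameter": 0},
    "D": {"price": 400, "date": "2004-1-1", "inside diameter": 60, "outside diameter": 100},
}

# Every stored field, including placeholders.
total_fields = sum(len(record) for record in store.values())
# Fields whose value is the "not possessed" sentinel.
placeholder_fields = sum(
    value == MISSING for record in store.values() for value in record.values()
)
print(total_fields, placeholder_fields)            # 16 5
print(f"{placeholder_fields / total_fields:.2%}")  # 31.25%
```

Even in this four-page example, 5 of the 16 stored fields (31.25%) hold no actual attribute information.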