This invention relates to search algorithms on the Internet. More specifically, the invention relates to a method for extracting product-related information from different websites to facilitate online shopping.
The Internet has enabled online shopping, which is more time-efficient and convenient than conventional shopping. Search algorithms on the Internet help people find desired products among the multitude of products available. Conventional search algorithms use keyword-based searches to find webpages relating to products. These algorithms generally index all keywords on a webpage rather than product-related information. In addition, they do not clearly distinguish the parts of a webpage that contain information about the product from the parts that do not. However, product-related information such as the product image, product title and product price largely determines whether a purchase is made, and these parameters are not taken into consideration when the webpages are indexed. To include these product-related parameters, the information must be extracted from each webpage on the Internet. Typically this is done manually or with semi-automatic information extraction techniques such as Wrapper Induction. However, because there are millions of webpages in many different formats, this process is time-consuming and inefficient. Further, conventional search algorithms do not take into account the attributes of product-related information on a webpage, such as attributes of the product image, the product title and the like. As a result, the search may yield irrelevant results.
In light of the aforementioned shortcomings, there is a need for an information extraction method that automatically, accurately and efficiently extracts product-related information from the millions of webpages on the Internet, so that a search algorithm can take into account the attributes of product-related information on those webpages and return highly relevant results.