The Internet is a vast source of information. Some of this information can be very useful to researchers, scientists, and other professionals, who would like to collect the relevant data and process it. Unfortunately, this task is difficult because the information is spread over trillions of webpages and is presented in different formats from one webpage to another. Moreover, even if one can select the particular webpages containing the relevant information and download their entire content, the downloaded content is likely to include a large amount of material that is irrelevant to the particular project.