The World Wide Web (WWW) is a collection of inter-linked computer files, known as Web pages, that businesses, governments, groups, and individuals throughout the world maintain on an expansive network of interconnected computers. Users navigate these pages by means of software programs commonly known as Internet browsers. Web sites and Web pages on the WWW present a wide range of data to users in varying formats. This data, such as weather information or stock quotes, may be useful to the Internet user in many ways beyond simply viewing it in the form and location in which it is displayed on the Internet.
Transferring data found on Web sites into a database format would make the data management and manipulation capabilities of standard database software available for that data. However, there are currently no effective methods for capturing that data and replicating it within a database application. A user can, of course, transfer the data viewed on the Internet into a database manually. When the transfer must be repeated regularly, however, as when recording stock prices at intervals, manual transfer from the Web site to the user's database becomes inefficient, labor intensive, and prone to transcription errors. This is particularly true when large amounts of data from many different Web pages must be retrieved periodically, for example every 10 minutes, every 45 minutes, or daily.
Existing systems transfer Internet data to databases in order to archive Web content as a source file together with its visual representation, so that the data can be viewed at a later date. Although these systems allow the user to capture and archive data, they provide no mechanism for parsing individual data elements, for replicating that data in a relational database format, or for collecting data automatically in a manner specified by the user.
Another system, described in U.S. Pat. No. 6,078,924 to Ainsbury et al., retrieves data from a wide range of document formats, including HTML documents, and converts that data into a common format and location. This system can also collect data automatically in a manner specified by the user. However, although it collects and replicates the data in a central format and location, such as a spreadsheet, it does not effectively address the issues involved in replicating data in a relational database.
Thus, there is a need for a data collection and retrieval system and method that allow data displayed on Web sites to be manipulated and analyzed efficiently. The system and method should be able to retrieve Internet data automatically and to assign that data to the appropriate places in a database structure defined by the user. For data that changes regularly, such as stock quotes or weather data, the system should retrieve the data automatically according to a user-defined schedule. The need for such a system and associated method has heretofore remained unsatisfied.