1. Field of the Invention
The present invention relates to the field of databases. In particular, the present invention relates to methods and systems for automatically partitioning schema objects, such as large database tables.
2. Description of the Related Art
Launching a Web browser, pointing it at a Web site of interest, and viewing the site's content by clicking on the links that are presented on each page are now familiar concepts. A Web page may appear to be a monolithic logical unit when it is viewed, yet it can be further decomposed. For example, a typical Web page may include some HTML text and multiple images. Other pages may contain running applets that provide streaming audio or video. At other times, the user is not viewing a page in the ordinary sense at all, but may instead be interacting with an application that is running on a server somewhere. The Web server that provides such content may not have the logical concept of a page. From the Web server's point of view, it is merely responding to requests from browsers that connect to it, including requests for HTML text, images, JavaServer Pages (JSP) and the like.
Web servers usually maintain logs of all such received requests. These log files, therefore, constitute an audit trail that provides detailed information about the activities on a site. This trail is sometimes referred to as the “clickstream” of the site. Every time someone views a page from a Web site, the Web server writes one or more entries in the log file. Moreover, every page view recorded in a Web server log file corresponds to one entry therein. The entries in the log file may include attributes such as measures (for example, byte count, dwell time and time to serve) and dimension table foreign keys. The data in the log file usually adheres to one of the standard log file formats unless the format has been customized. Most Web servers support at least one of three open log file formats: NCSA Common Log File Format (CLF), NCSA Extended Log File Format (ECLF), or W3C Extended Log File Format (ExLF).
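By way of illustration only, the following Python sketch shows how a single entry in the NCSA Common Log File Format might be parsed into named attributes. The names `CLF_PATTERN` and `parse_clf_line` and the sample log line are hypothetical and are not taken from any particular Web server; the sketch assumes the usual CLF field order of host, identity, user, timestamp, request line, status code and byte count.

```python
import re
from datetime import datetime

# Hypothetical pattern for one CLF entry:
# host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of attributes for one CLF entry, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # CLF timestamps look like 10/Oct/2000:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" byte count means no content was returned; treat it as zero here.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Illustrative sample entry (not real traffic data).
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_clf_line(sample)
```

Once parsed in this manner, each entry exposes a timestamp and a byte count, which are the kinds of attributes on which the reports and queries discussed below may operate.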
Because even simple pages typically require multiple requests before they can be fully rendered, Web server log files can quickly grow very large. For example, a small site with only a few hundred page views a day can easily generate log files with thousands of entries on a daily basis. A large and popular site may generate a log file to which millions of entries are added every day. In the early days of the Web, when there was not much activity, it might have been possible for an administrator to manually inspect the log files and gain some rough understanding of the magnitude and nature of the traffic on a Web site. The sheer size and complexity of the log files in use today preclude such an approach. Today's log file volumes require automated methods of turning the raw log file data into useful business information.
Whatever the format, the Web server log file may be used as the raw data to generate various reports related to the Web site's effectiveness, traffic patterns and other usage and performance metrics. Conventional Web analytic applications may have a set of predetermined report engines that query the Web server log file and build a report based upon the results of the queries. However, as such conventional tools do not persistently store the Web log raw data, they do not have the ability to execute ad-hoc and dynamic queries of the log file data. Such queries must be formulated within the context of a new report, which will then go back to the Web server log file to execute the queries necessary to build the requested report.
As noted above, the log file may grow by potentially millions of new entries each day. To facilitate queries on such large data sets, the log file data may be loaded into a database table or tables. However, queries on such large tables tend to perform poorly, as the query may have to traverse a potentially large number of rows to access the needed data. Partitioning is often used to logically break such large tables into more manageable units. Typically, all partitions of a given table will have the same structure and each partition will contain a subset of the range of total rows of the table. A convenient way to partition Web log files, for example, is by day. That is, all records created in a given day or a predetermined range of days will be assigned to a predetermined partition. Partitioning is now a standard feature of most databases. Conventionally, the burden of setting up and maintaining partitions falls upon technically trained persons such as the database administrator (DBA). Often, a database schema contains many large tables, each of which must be partitioned to optimize queries thereon. Partitioning can, therefore, become a substantial administrative burden.
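By way of illustration only, the following Python sketch shows the idea of partitioning log records by day. It is an in-memory analogy and not the partitioning mechanism of any particular database; the names `partition_by_day` and `query_day`, and the record layout (a dict with a `time` attribute), are hypothetical. The point of the sketch is that a query restricted to a single day need only examine the rows of one partition rather than traverse every row.

```python
from collections import defaultdict
from datetime import datetime, date

def partition_by_day(records):
    """Group records into partitions keyed by the calendar day of their timestamp."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record["time"].date()].append(record)
    return dict(partitions)

def query_day(partitions, day):
    """Return the rows for one day, touching only that day's partition."""
    return partitions.get(day, [])

# Illustrative records (hypothetical data, same structure in every partition).
records = [
    {"time": datetime(2000, 10, 10, 13, 55), "url": "/index.html", "bytes": 2326},
    {"time": datetime(2000, 10, 10, 14, 2), "url": "/logo.gif", "bytes": 512},
    {"time": datetime(2000, 10, 11, 9, 30), "url": "/index.html", "bytes": 2326},
]

partitions = partition_by_day(records)
rows = query_day(partitions, date(2000, 10, 10))
total_bytes = sum(row["bytes"] for row in rows)
```

In this sketch, each partition holds rows of the same structure and contains only the subset of rows falling within its day, mirroring the arrangement described above. Someone must still decide on the partition key and create each new day's partition, which is the administrative step that conventionally falls to the DBA.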
It is clear, therefore, that a need exists for automating the partitioning process and making that process accessible to non-technically trained persons. Such an accessible and automated partitioning method would find utility not only within the context of managing and using Web server log files but in many other instances in which very large database tables are created.