The present invention relates to generating summary data such as traffic summary information from a plurality of records. A preferred arrangement includes obtaining event information such as traffic summary information for a requested period of time from multiple tables of differing temporal resolutions. Whilst the invention is applicable to handling of different types of information in multiple tables, we will describe as a preferred arrangement a method of obtaining traffic summary information for a requested period of time from multiple database tables of differing temporal resolutions, the traffic information being in respect of the traffic carried by a computer network.
The monitoring of computer or communications network usage is very important in modern computer networks, including local area networks (LANs) and wide area networks (WANs), as it allows the network manager to see how the network is performing. This allows the network manager to perform traffic flow analysis, determine bandwidth requirements, enforce company policies and ensure that the security of the network has not been compromised. However, it will be understood that the invention may be applied to other systems, for example, where the information relates to road traffic, or telephone network traffic, or a railway system, or an electrical supply network (for example, where the relevant information may be the power or current).
Network monitoring applications often collect information on some or all of the individual traffic flows between network elements. This information may then be stored in a plurality of tables usually on databases held on computers connected to the network, which computers are accessible to the network manager. The information may be analysed at a later date.
A common technique used to control the amount of storage space required for a database table is to accumulate into a single record all data covering a known time interval (the record time interval). The interval has a known record start time, a known record end time, and a known record period, although only two of these three are required, as the third can be deduced from them. For example, a single database table record may represent all data on a given traffic flow for a known record period of one hour or one day (a database table storing daily data having a lower temporal resolution than a database table storing hourly data).
Thus different temporal resolutions of data may be stored and updated in different database tables; for example, one table of records may hold data in a format where each record contains the sum of a defined hour of data on a particular traffic flow while another table of records holds data in a format where each record contains the sum of a defined day of data on a particular traffic flow. Each temporal resolution table may store different numbers of records, for example, an hourly resolution table may only store one week's worth of data at high resolution, whilst a daily resolution table may store one month's worth of data at lower resolution.
Thus in a first table, traffic data in respect of a number of links in a computer network may be stored in successive records for successive record periods of one hour, for example: 12 am to 1 am; 1 am to 2 am; 2 am to 3 am; etc., and in another table, for successive record periods of one day, for example: start (12 am) day 1 to end day 1; start day 2 to end day 2; start day 3 to end day 3; etc.
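By way of illustration only, the two tables described above might be represented as follows. This is a minimal sketch: the names Record, build_table, hourly_table and daily_table, the link identifier "link-A", the dates and the traffic figures are all assumptions made for the example and do not come from the described arrangement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    flow: str        # hypothetical identifier of the traffic flow or link
    start: datetime  # record start time
    end: datetime    # record end time
    total: int       # traffic summed over the record period (e.g. bytes)


def build_table(flow, first_start, period, count, per_record):
    """Build `count` successive records, each covering one `period`."""
    return [
        Record(flow, first_start + i * period, first_start + (i + 1) * period, per_record)
        for i in range(count)
    ]


day1 = datetime(2024, 1, 1)  # 12 am at the start of day 1 (arbitrary date)

# First table: successive one-hour records (12 am to 1 am, 1 am to 2 am, ...).
hourly_table = build_table("link-A", day1, timedelta(hours=1), 72, 10)

# Second table: successive one-day records (start day 1 to end day 1, ...).
daily_table = build_table("link-A", day1, timedelta(days=1), 3, 240)
```

In this sketch each daily record holds the same total as the twenty-four hourly records it spans, which is the property the later steps rely on when one table is substituted for another.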
From time to time, it is desirable to provide traffic information in respect of a requested period of time, defined by a requested start time and a requested end time, which spans a number of record periods.
A particular problem arises in that it is desirable to reduce the number of calculations required to provide the traffic information for the requested period. It is therefore desirable to use, as the basis of traffic summary information for the requested period, the database table having the lowest temporal resolution, since that will require the least computation. However, problems arise in that it may be required to provide traffic information in respect of a requested period of time that does not match the record periods of the database table with the lowest temporal resolution. For example, it may be desired to determine the traffic data in respect of a requested period of time starting at 1 pm on day 1 and ending at 1 pm on day 3. Hitherto, in order to provide that information, one could not use a daily table but would have to use an hourly table, which means that 48 separate records would have to be processed, whereas if the requested period had been a 48 hour period starting at the start of day 1 and ending at the end of day 2, then only two daily records would need to be retrieved and processed.
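The record counts in this example can be checked with a short calculation. The following is an illustrative sketch only; the function name records_needed, the constant ORIGIN and the calendar dates are assumptions introduced here, and the sketch assumes that record periods begin at whole multiples of the period counted from midnight.

```python
from datetime import datetime, timedelta

# Assumed alignment origin: records are taken to start at whole multiples
# of their period counted from midnight on an arbitrary reference day.
ORIGIN = datetime(2000, 1, 1)


def records_needed(start, end, period):
    """Return how many records of length `period` exactly tile [start, end),
    or None if either boundary does not fall on a record boundary."""
    aligned = all((t - ORIGIN) % period == timedelta(0) for t in (start, end))
    return (end - start) // period if aligned else None


HOUR, DAY = timedelta(hours=1), timedelta(days=1)

# 1 pm on day 1 to 1 pm on day 3: only the hourly table matches the boundaries.
offset_hourly = records_needed(datetime(2024, 1, 1, 13), datetime(2024, 1, 3, 13), HOUR)
offset_daily = records_needed(datetime(2024, 1, 1, 13), datetime(2024, 1, 3, 13), DAY)

# Start of day 1 to end of day 2: the daily table matches the boundaries.
aligned_daily = records_needed(datetime(2024, 1, 1), datetime(2024, 1, 3), DAY)
```

Here offset_hourly is 48 and offset_daily is None, while aligned_daily is 2, matching the figures given in the text.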
The present invention relates to a method of reducing or minimising this problem. We will describe a method and an algorithm that, given a requested time interval, will generate summary data (the total amount of data over the requested time interval) for, e.g., requested traffic flows by combining data from different records from, for example, different database tables of differing temporal resolutions and time periods (as described above).
The present invention provides a method of generating summary data for a requested time period having a requested start time and a requested end time from a plurality of data records, each data record comprising a summary of the relevant data between a record start time and a record end time, said method comprising;
(a) setting a current start time equal to the requested start time;
(b) selecting a record with a start time equal to the current start time and an end time furthest from the start time but before or equal to the requested end time;
(c) processing the data from this record;
(d) setting the current start time equal to the record end time of that record;
(e) if the current start time is earlier than the requested end time, repeating steps (b) to (e) and if the current start time is not earlier than the requested end time, finishing.
In this way, the data to be processed is reduced. The processing may comprise summing of the data.
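Steps (a) to (e) above can be sketched as follows. This is a minimal illustration and not a definitive implementation: the names Record, make_records and summarise are hypothetical, the records are assumed to be held together in a single in-memory list, the processing is assumed to be summing, and the guard against records of zero length and the error raised when no record fits are additions made for the sketch.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    start: datetime  # record start time
    end: datetime    # record end time
    total: int       # summary of the relevant data for the record period


def make_records(first_start, period, count, per_record):
    """Hypothetical helper: `count` successive records of length `period`."""
    return [Record(first_start + i * period, first_start + (i + 1) * period, per_record)
            for i in range(count)]


def summarise(records, requested_start, requested_end):
    """Return (summed data, number of records processed)."""
    current = requested_start                          # step (a)
    total, used = 0, 0
    while current < requested_end:                     # step (e)
        # Records starting at the current start time and ending no later than
        # the requested end time; `current < r.end` (an assumption of this
        # sketch) excludes empty records that would never advance the loop.
        candidates = [r for r in records
                      if r.start == current and current < r.end <= requested_end]
        if not candidates:
            raise LookupError(f"no record starts at {current}")
        best = max(candidates, key=lambda r: r.end)    # step (b): furthest end
        total += best.total                            # step (c): sum the data
        used += 1
        current = best.end                             # step (d)
    return total, used


day1 = datetime(2024, 1, 1)  # arbitrary date standing in for "day 1"
records = (make_records(day1, timedelta(hours=1), 72, 10)
           + make_records(day1, timedelta(days=1), 3, 240))

# Requested period: 1 pm on day 1 to 1 pm on day 3.
total, used = summarise(records, datetime(2024, 1, 1, 13), datetime(2024, 1, 3, 13))
```

For the request from 1 pm on day 1 to 1 pm on day 3, the sketch processes eleven hourly records, then the single daily record for day 2, then thirteen hourly records, so that used is 25 rather than 48 and total is 480.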
Where there is more than one type of data, the step (b) includes selecting a record with the relevant data. In one arrangement, the relevant data is traffic information, for example the information relating to traffic on a computer or communications network.
The data may be stored in a plurality of tables, and commonly each table stores data of different temporal resolution, that is, each record in a particular table will store data for a period of time of the same length and the relevant period will be different for different tables. Thus each record in one table may include data in respect of one hour and each record in another table may include data in respect of one day.
According to another aspect, the present invention provides a method of generating summary information over a requested time period having a requested start time and a requested end time by combining data from a plurality of tables of different temporal resolutions, said method comprising;
(a) determining the requested type of data, requested start time, and requested end time for the requested summary information;
(b) setting a current start time equal to the requested start time;
(c) selecting an appropriate temporal resolution table based on current start time and requested end time;
(d) processing data records from the selected table;
(e) setting the current start time equal to the end time of selected time interval;
(f) if the current start time is earlier than the requested end time, repeating steps (c) to (f) and if the current start time is not earlier than the requested end time, finishing.
Preferably, in step (c) the appropriate table is the table with the minimum temporal resolution which includes a record having a record start time corresponding to the current start time.
Preferably in step (d), the processing includes summing the data from the relevant records.
Preferably, step (c) comprises;
selecting the table with the lowest temporal resolution;
determining if the table contains a record start time corresponding to the current start time and the record end time is earlier than or equal to the requested end time;
if the table does include such a record, selecting that table;
if the table does not include such a record, determining if there are any more tables with higher temporal resolution;
if there are such tables, selecting the table with the next highest temporal resolution and returning to the first determining step;
if there are no such tables, selecting the table which has the greatest overlap with the current start time and requested end time.
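The table selection just described can be sketched as follows. The names Record, make_table, overlap_fraction and select_table are hypothetical. The text does not define how overlap is to be measured, so this sketch assumes one possible reading: the fraction of the period of the record covering the current start time that lies between the current start time and the requested end time.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    start: datetime
    end: datetime
    total: int


def make_table(first_start, period, count, per_record):
    """Hypothetical helper: `count` successive records of length `period`."""
    return [Record(first_start + i * period, first_start + (i + 1) * period, per_record)
            for i in range(count)]


def overlap_fraction(record, current_start, requested_end):
    """Fraction of the record period lying inside the remaining window."""
    inside = min(record.end, requested_end) - max(record.start, current_start)
    return max(inside, timedelta(0)) / (record.end - record.start)


def select_table(tables, current_start, requested_end):
    """`tables` must be ordered from lowest to highest temporal resolution."""
    # Work from the lowest resolution upwards, taking the first table with a
    # record that starts at the current start time and ends in time.
    for table in tables:
        if any(r.start == current_start and r.end <= requested_end for r in table):
            return table

    # No exact fit: fall back to the table whose record covering the current
    # start time overlaps the window best (assumed measure, see lead-in).
    def best_fraction(table):
        return max((overlap_fraction(r, current_start, requested_end)
                    for r in table if r.start <= current_start < r.end),
                   default=0.0)

    return max(tables, key=best_fraction)


day1 = datetime(2024, 1, 1)  # arbitrary date standing in for "day 1"
daily = make_table(day1, timedelta(days=1), 3, 240)
hourly = make_table(day1, timedelta(hours=1), 72, 10)
tables = [daily, hourly]  # lowest temporal resolution first
```

With these tables, a current start time at the start of day 2 and a requested end time of 1 pm on day 3 selects the daily table, whereas a current start time of 1 pm on day 1 selects the hourly table because no daily record starts at that time.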
Preferably step (d) comprises;
selecting the relevant data from the selected record;
determining if the selected record time interval lies completely between the current start time and the requested end time;
if it does so, adding all of the selected data from that record to the summary for that relevant data;
if in the first determining step the selected record time interval does not lie completely between the current start time and the requested end time, adding pro-rata the part of the data for that record which overlaps with the current start time and requested end time to the summary for that data.
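The pro-rata step can be sketched as follows. The names Record and contribution are hypothetical, and the sketch assumes that the data in a record is spread evenly over the record period, which is the assumption implicit in any pro-rata calculation and the reason it may reduce accuracy.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    start: datetime
    end: datetime
    total: int


def contribution(record, current_start, requested_end):
    """Amount of the record's data to add to the summary."""
    # Record lies completely inside the window: add all of its data.
    if record.start >= current_start and record.end <= requested_end:
        return float(record.total)
    # Otherwise add only the share proportional to the overlapping time,
    # assuming the data is spread evenly over the record period.
    inside = min(record.end, requested_end) - max(record.start, current_start)
    if inside <= timedelta(0):
        return 0.0  # no overlap at all
    return record.total * (inside / (record.end - record.start))


# An hourly record from 12 pm to 1 pm holding 10 units of traffic.
rec = Record(datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 13), 10)

whole = contribution(rec, datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 18))
half = contribution(rec, datetime(2024, 1, 1, 12, 30), datetime(2024, 1, 1, 18))
```

In this example whole is 10.0, since the record lies entirely within the window, and half is 5.0, since only the second half hour of the record overlaps the window.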
According to another aspect, the present invention also provides a computer program on a computer readable medium or embodied in a carrier wave for generating summary data for a requested time period having a requested start time and a requested end time from a plurality of data records, each data record comprising a summary of the relevant data between a record start time and a record end time, said computer program comprising:
(a) a program step for setting a current start time equal to the requested start time;
(b) a program step for selecting a record with a start time equal to the current start time and an end time furthest from the start time but before or equal to the requested end time;
(c) a program step for processing the data from this record;
(d) a program step for setting the current start time equal to the record end time of that record;
(e) a program step for repeating steps (b) to (e) if the current start time is earlier than the requested end time, and, finishing if the current start time is not earlier than the requested end time.
According to another aspect, the present invention also provides a computer program on a computer readable medium or embodied in a carrier wave for generating summary information over a requested time period having a requested start time and a requested end time by combining data from a plurality of tables of different temporal resolutions, said program comprising;
(a) a program step for determining the requested type of data, requested start time, and requested end time for the requested summary information;
(b) a program step for setting a current start time equal to the requested start time;
(c) a program step for selecting an appropriate temporal resolution table based on current start time and requested end time;
(d) a program step for processing data records from the selected table;
(e) a program step for setting the current start time equal to the end time of selected time interval;
(f) a program step for repeating steps (c) to (f) if the current start time is earlier than the requested end time and finishing if the current start time is not earlier than the requested end time.
The time taken to retrieve data is reduced by minimising the number of database table records that have to be retrieved and processed. For example, if a single day's worth of data on a given traffic flow is represented by both a single daily record and twenty-four hourly records, the daily record will be retrieved in preference (assuming that the entire day is spanned by the requested time interval).
Additionally, if the requested start time and end time do not map exactly to the start time and end time of record time intervals available in the various temporal tables, the algorithm will aim for the best accuracy by choosing the temporal resolution and time interval that has the best overlap at that boundary. The use of pro-rata calculations is avoided where possible, as these may lead to a reduction in the accuracy of the results.
We will describe a method of generating traffic summary information for a requested period by combining data from multiple database tables of differing temporal resolutions. It allows traffic summaries to be generated for requested time intervals that span the data available in multiple tables and describes a method of reducing the number of records that have to be processed.