This invention relates to the operation of a data network, and in particular to the efficient management of resources by monitoring for network elements that are not being used and shutting them down. This allows the resources that would be required to maintain those elements in operational condition, and to monitor their correct operation, to be used more effectively elsewhere in the network.
In particular, the invention is concerned with monitoring the use of network servers which have been installed to support the operation of specific applications, for example in the information technology operation of a large business enterprise, or a service made available to the public over a public telecommunications network.
In many cases the application, and the supporting server, are initially set up to support a specific task. However, when that task is complete, and the personnel who installed the server have moved on to other tasks, the existence of the server may be overlooked or, if its existence is properly recorded, its purpose may not be well documented. There may be no-one charged with the responsibility of decommissioning it. Moreover, it cannot be assumed that the completion of the task for which the server was originally installed implies that the server is now redundant, as other applications may have subsequently been developed which also rely on it. Furthermore, some essential applications may only be used occasionally, for example applications relating to annual or seasonal events (e.g. annual pay reviews, crop harvesting, or handling of extreme weather conditions).
A server may be embodied in a dedicated item of hardware. In most cases the server functions will not take up the entire capacity of the installed hardware. For more efficient use of resources, it is common to install the necessary functions in general-purpose server hardware which can host a number of server functions. These are known as “virtual servers”. They are accessed using different IP addresses on the hardware.
In order to save resources, it is desirable to close down a server if it is not being used. As a precaution, sufficient data may be stored to reinstate the server should it be discovered subsequently that there is a requirement for the application it embodies, so that a duplicate of the original server can be created. This process of shutting down a server in such a way that it (or a duplicate) can be reinstated if required is referred to in this specification as “hibernation”. Such a system is described in United States Patent Application US2012/227038.
It is known to use simple policy trigger thresholds on a suitable metric to initiate shut down or hibernation of servers. However, most existing systems require significant manual configuration to determine suitable trigger points. This involves the application and infrastructure administrators identifying what behaviour is indicative of a server that is, or is not, still active in order to identify suitable threshold values. It also requires some knowledge of the functions and intended use of the servers in order to determine suitable trigger points.
Simple threshold values of the volume of bits carried are used in European Patent specification 1742416 and in United States Application US2010/0287390, which monitors and manages application flows on a network with an objective of increasing end user quality of experience and reducing the need to purchase expensive additional WAN bandwidth. Such volume metrics are an unreliable indicator of CPU utilisation, especially if use is intermittent. In particular, a relatively small volume of bits transmitted to and from the server may nevertheless require a considerable amount of CPU power: for example, a complex CPU process may result from a simple binary input such as an alarm, or may result in a simple binary output (go/no go). The United States application referenced also allows network elements to be put into a “standby” mode, but operation of a server in such a standby mode does not release its resources for other uses as in a full “hibernation”.
It is known, for example from Karagiannis et al (“Profiling the End Host”, Proceedings of the Passive & Active Measurement Conference, 2007, page 186), to monitor the data flow to and from end users (clients) of a system, for example to determine whether service quality parameters are being met. However, such data is not useful for determining whether a connection or terminal is in use. In particular, a terminal that is not in use will not appear in the data, so the data could not be used as the input source for decommissioning analysis.
According to the invention, there is provided a method of operating a data network comprising a plurality of servers, the servers having respective network addresses associated with respective application functions, comprising the steps of:
    monitoring data flows to and from the servers;
    classifying servers according to their data flow patterns;
    identifying servers classified as having data flow patterns indicative of low usage;
    retrieving programming instructions and data relating to the identified servers classified as having data flow patterns indicative of low usage;
    shutting down the servers classified as having data flow patterns indicative of low usage;
    storing the retrieved programme instructions and data in a storage medium from which the stored data relating to each server may subsequently be retrieved to create a corresponding virtual server replicating the server to which it relates by recovering and installing the stored programme instructions and data such that further data requests can be fulfilled by the virtual server.
The invention also extends to an apparatus for controlling the operation of a plurality of data servers connected to a data network, the servers having respective network addresses associated with respective application functions, the apparatus comprising:
    a data flow monitor to monitor data traffic to and from the servers;
    a server classification system for identifying servers having a data flow pattern associated with low usage of the servers;
    a server management system for accessing programming instructions and data from servers identified as having low-usage flow patterns, and shutting down the operation of such data servers;
    a server hibernation store comprising data storage for the programming instructions and data accessed from the low-usage servers;
    a server virtualisation system comprising a programmable server having means for retrieving, from the server hibernation store, programming instructions and data relating to a server and installing the programming instructions and data in a programmable server in order to generate a virtual server replicating the server in respect of which they were originally retrieved.
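By way of illustration only, the method steps and the apparatus elements set out above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `Server`, `HibernationStore` and `hibernate_low_usage`, the use of a dictionary to stand in for the programme instructions and data, and the `classify` callable returning the labels "low" or "active" are all hypothetical. The sketch stores the retrieved image before marking the server as shut down.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    address: str          # network address associated with an application function
    image: dict           # stands in for programme instructions and data (assumption)
    running: bool = True


@dataclass
class HibernationStore:
    """Hypothetical stand-in for the server hibernation store."""
    images: dict = field(default_factory=dict)

    def save(self, address, image):
        # Keep a copy so later changes to the server do not alter the stored image.
        self.images[address] = dict(image)

    def restore(self, address):
        # Create a virtual server replicating the server that was shut down.
        return Server(address=address, image=dict(self.images[address]))


def hibernate_low_usage(servers, flows, classify, store):
    """Hibernate every server whose flow pattern is classified as low usage.

    flows maps a server address to its monitored flow records.
    classify is a hypothetical callable returning "low" or "active".
    Returns the addresses of the servers that were hibernated.
    """
    hibernated = []
    for server in servers:
        label = classify(flows.get(server.address, []))
        if label != "low":
            continue
        store.save(server.address, server.image)  # retrieve and store
        server.running = False                    # shut down
        hibernated.append(server.address)
    return hibernated
```

A later call to `store.restore(address)` corresponds to generating a virtual server from the hibernation store when a requirement for the application is subsequently identified.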
It will be appreciated that the mere volume of data handled by an application may not be indicative of the use or utility of the application itself. For example, an application may be gathering and storing a large volume of data, but this may be of no purpose if the need for that data has passed and no-one is accessing the results. Conversely, some data input and/or extraction patterns may be highly seasonal, so that instantaneous usage volumes may be unrepresentative. The identification of data flow patterns indicative of low use of an application may be related to the flow patterns expected by the designers of the applications, but in the preferred embodiment each server is classified by comparison with the classification of exemplars from a model database of server data flow patterns. After classification the server characteristics may be added to the exemplars in the model database.
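The comparison with exemplars from a model database might, purely as an example, take the following form. The specification does not state how the comparison is made; the nearest-neighbour rule with Euclidean distance used here is an assumed technique chosen for brevity, and the feature vectors, the labels and the function names are hypothetical.

```python
import math


def classify_by_exemplar(features, exemplars):
    """Return the label of the exemplar closest to the given feature vector.

    exemplars is a list of (feature_vector, label) pairs standing in for
    the model database. Euclidean distance is an assumption of this sketch.
    """
    best_label, best_distance = None, math.inf
    for vector, label in exemplars:
        distance = math.dist(features, vector)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def classify_and_learn(features, exemplars):
    """Classify a server, then add its characteristics to the exemplars."""
    label = classify_by_exemplar(features, exemplars)
    exemplars.append((tuple(features), label))
    return label
```

In such a sketch the feature vector could summarise the flow pattern over a period long enough to capture seasonal use, rather than an instantaneous volume.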
After its retrieved programme instructions and data have been stored, a server can be shut down, subject to a criticality override factor. A virtual server is created replicating the server that has been shut down by recovering and installing the stored programme instructions and data such that further data requests can be fulfilled by the virtual server. Data flows to the virtual server can be monitored and classified in the same way, and the virtual server shut down if it is classified as having data flow patterns indicative of low usage. The criteria for triggering a shut down of the virtual server may be different from the criteria applied to the original server from which the data was replicated.
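The criticality override and the differing criteria for original and virtual servers might be expressed, as a sketch only, in the following way. The numeric low-usage score, the two threshold values and the function name are assumptions made for the example; the specification does not prescribe how the criteria are represented.

```python
def should_shut_down(low_usage_score, critical, is_virtual,
                     original_threshold=0.8, virtual_threshold=0.6):
    """Decide whether a server classified as low usage may be shut down.

    low_usage_score is a hypothetical value in [0, 1], higher meaning
    stronger evidence of low usage. Both thresholds are illustrative.
    """
    if critical:
        # Criticality override: a critical server is never shut down automatically.
        return False
    # A virtual server may be judged against different criteria from the original.
    threshold = virtual_threshold if is_virtual else original_threshold
    return low_usage_score >= threshold
```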
The invention may be embodied as a computer program or suite of computer programs stored on a non-transitory computer-readable storage medium which upon execution by a computer system performs the invention.
The process uses network flow patterns to identify servers that are not being used, to allow such servers to be put into a dormant state (referred to herein as “hibernation”) in which the resources they use can be released so as to reduce operational data centre costs and allow reuse of server hardware.
The servers to be monitored may be embodied in physical hardware, or may be “virtualised” servers, running software on a general-purpose computer that emulates a physical server. The network flows to and from these types of server are similar, and it is only necessary for the process to identify the relevant IP address, and not the nature of the associated hardware.
The term “hibernation” is used to refer to a process in which the resources required to operate the server can be restored if a requirement for it is subsequently identified, for example by the application owner attempting to access it. Once the application is hibernated, the application's operating parameters are backed up to long-term storage, so that if they are required in the future they can be provisioned in the virtualised datacentre. The user can restore the application data to a virtual data centre using a web-based self-service catalogue.
The process may identify some servers as being appropriate for immediate decommissioning rather than hibernation, and servers that have already been hibernated may be decommissioned after a predetermined period of non-use. In either case this may be subject to human intervention.
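The decommissioning of a hibernated server after a predetermined period of non-use, subject to human intervention, might be sketched as below. The one-year default period, the `approved` flag representing the human decision, and the function name are hypothetical and serve only to illustrate the check.

```python
from datetime import date, timedelta


def due_for_decommissioning(hibernated_on, today,
                            retention=timedelta(days=365), approved=False):
    """Return True if a hibernated server may now be decommissioned.

    retention is the predetermined period of non-use (illustrative default).
    approved stands in for the human intervention mentioned above.
    """
    unused_for = today - hibernated_on
    return approved and unused_for >= retention
```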