The present invention relates to a method for recovery procedures in a mobile communication system, and more precisely, within a network node of a mobile communication system. The invention further relates to a network node arrangement for performing recovery operations in a communication system providing mobility for the users thereof.
Various mobile communication network systems, such as the digital GSM, D-AMPS or PDC systems or the analogue NMT or AMPS systems, are well known to the skilled person. In general it can be said that they have been developed to provide increased freedom and mobility for the users of mobile stations capable of communicating over a radio interface with the network system. These systems are often referred to as Public Land Mobile Networks (PLMN).
It is also possible to use various data network services, such as accessing the TCP/IP Internet and using services provided through the Internet, by means of a mobile station (MS) provided with data processing capability. Mobile data transmission is supported both by the digital mobile telephone systems, such as GSM (Global System for Mobile communications), and by the analogue mobile systems.
The development in the area of mobile communications is leading towards even more powerful and flexible solutions which allow the users thereof to send and receive, in addition to speech and text messages, various kinds of data transmissions, such as high resolution images or even video images. These improved solutions include the General Packet Radio Service (GPRS), which can be referred to as a 2nd generation mobile telecommunication system and has its basis in the GSM system, and the Universal Mobile Telecommunications System (UMTS), which can be referred to as a 3rd generation mobile telecommunication system. Both GPRS and UMTS are fairly new services defined e.g. by ETSI (European Telecommunications Standards Institute). IMT2000 is a further example of the improved radio service solutions.
The operational environment of a developed mobile communication system, such as the GPRS or the UMTS, can be defined as consisting of one or several radio networks interconnected by a core network. The core network provides access to a data network, such as the Internet, for the users of the mobile stations communicating with the PLMN. The mobile network comprises a plurality of packet data service nodes (SN). Each SN (or Serving GPRS Support Node; SGSN) is connected to the radio network in such a way that it is capable of providing the mobile stations provided with data processing facilities with the packet data service via base stations of the radio network. The intermediate radio network then provides packet switched data transmission between the current SN and the MS. The different mobile networks or PLMNs are connected to an external data network (e.g. the global connectionless TCP/IP Internet or a Packet Switched Public Data Network; PSPDN) via suitable linking apparatus, such as a Gateway GPRS Support Node (GGSN) or several GGSNs. Thus it is possible to provide packet data transmission between MSs and external data networks by means of the GPRS or a corresponding packet radio service, wherein the mobile network operates as an access network for the mobile user.
In the GPRS or in the proposed UMTS, the MS may have different operating states: an idle state, a standby state and an active state. It is possible for the MS to remain continuously in the standby state, i.e. "always on". In other words, it is possible to switch the power on, register the MS in a GPRS or UMTS network and remain connected even for several weeks without sending any data over the radio access bearer connection between the MS and the network apparatus. This registration can be active for weeks, but resources on the air interface (physical devices such as base stations, logical radio access bearers etc.) may be torn down after a certain period of inactivity.
The radio connections between a plurality of MSs and the mobile radio networks are controlled by a Radio Network Controller (RNC), a Base Station Controller (BSC) or a similar node arranged to control the connections. From the point of view of the RNC, the number of standby users (i.e. users who have not transferred any data during the last few minutes or hours) may be much higher than the number of active users (i.e. users currently transferring data).
An RNC node comprises several user plane processors handling user plane traffic and the related tasks (e.g. so-called layer 2 processing, re-transmission over the air interface, ciphering etc.). Each of the user plane processors is dedicated to handling a small portion of the entire user plane traffic. The allocation of the user plane traffic connections to different processors can be made by a resource handler. One possibility for dividing the user plane traffic between the different processors is to measure the load in every processor and to select the processor with the smallest load for a new connection.
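The load-based allocation described above can be illustrated with a minimal sketch. The class name `UserPlaneProcessor`, the function `allocate_connection`, the normalised load figure and the per-connection `cost` are hypothetical names and values introduced only for illustration; the text does not specify how load is measured or represented.

```python
from dataclasses import dataclass, field


@dataclass
class UserPlaneProcessor:
    """Hypothetical model of one user plane processor in the node."""
    name: str
    load: float = 0.0  # assumed normalised load measurement
    connections: list = field(default_factory=list)


def allocate_connection(processors, connection_id, cost=0.01):
    """Assign a new connection to the processor with the smallest load.

    `cost` is an assumed load increment per connection, used here only
    so that repeated allocations spread over several processors.
    """
    target = min(processors, key=lambda p: p.load)
    target.connections.append(connection_id)
    target.load += cost
    return target


processors = [
    UserPlaneProcessor("up0", load=0.5),
    UserPlaneProcessor("up1", load=0.2),
    UserPlaneProcessor("up2", load=0.7),
]
chosen = allocate_connection(processors, "conn-1")
print(chosen.name)
```

In this sketch the resource handler role is reduced to a single function; a real node would presumably refresh the load measurements rather than estimate them from a fixed increment.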
In addition to the user plane processors, the RNC node comprises routing processors. The routing processors transfer user data between Exchange Terminal (ET) cards and an appropriate user plane processor. The ET cards are used for connecting the node to the transfer network. One network node may contain more than one ET card.
In addition to routing tasks, the routing processor may also handle tasks such as protocol termination. One routing processor may serve several user plane processors simultaneously. There may be more than one connection between the routing processor and the user plane processor. The RNC node may be provided with one or several routing processors. Each of the routing processors may have an IP address (Internet Protocol address) of its own, or several routing processors may share an address.
In case a failure occurs in one of the routing processors, all connections implemented in the user plane should be immediately transferred to one or several of the remaining (and still operating) routing processors. Since one routing processor may have thousands of connections, the transfer thereof to another routing processor takes a substantially long time, and a problem lies in arranging the control of the transfer proceedings in an appropriate manner.
The time required for the transfer is an essential disadvantage since it directly affects the service level or quality experienced by the users. If a processor fault occurs in a routing processor and nothing is done, all users having at least one of their user plane connections through this faulted processor will discover the fault, since no data can get through the node due to this faulted processor. Therefore the users must first disconnect the current ongoing data transfer session and then immediately establish a new session. To be able to do this it is assumed that the system has noticed the faulted processor and can allocate a new one; otherwise it might try to use the same faulted processor for the new session as well, and it would be impossible to initiate a new session.
In addition, those subscribers who are in the standby state may not become aware of the faulted processor, but believe that everything is in order. In a case where somebody else tries to reach the subscriber terminal connected to the faulted processor, the connection cannot be established.
Therefore there is a need to solve the problem relating to relocation of the user plane connections from a faulted processor to another processor such that a reasonable overall system load can be maintained. The overall performance of the system should not deteriorate, and the user plane traffic on other processors should not be hindered due to the faulted processor.
From the point of view of the end user, the processor fault should be as invisible as possible. All symptoms caused by a processor fault should be such that they could be compared to the general symptoms caused by problems in Internet performance (slow transfer of data, lost datagrams etc.), so that the upper layers (the TCP (Transmission Control Protocol) layer and the application layers) could take care of them.
One of the most problematic situations is an instance where each and every user notices the system fault more or less simultaneously. This will happen if the users must close their sessions and restart the connections regardless of whether they were in an active state or not. This can lead to an overload in the RNC node, and more precisely, to an overload situation in the control plane (i.e. in the resource handler or a similar facility) of the node, which may happen if there are thousands of connections and if they have to be relocated immediately after a processor fault regardless of the status of the connections (active or standby). The same will occur if the node or system automatically initiates a relocation process for all user plane connections from the faulted processor. In addition, the signalling network between various nodes may also become overloaded due to a fault in one processor in one of the nodes of the network system. In more general terms, it is important to be able to avoid any situations which could result in a restart of the entire network node.
Therefore it is an object of the present invention to overcome the disadvantages of the prior art solutions and to provide a new type of solution for recovery procedures in a network node. A further object is to limit the recovery procedures to occur only within the network node including the faulted processor.
According to a first aspect, the invention provides a method of recovering from a processor fault in a mobile communication network node provided with a plurality of processors, wherein connections are established between the network node and mobile stations for packet data communication between the network node and the mobile stations, comprising: classifying the connections into priority order on the basis of predefined classifying parameters; monitoring the working condition of at least one of the processors of the network node; and in case of detecting a processor fault, relocating user plane connections within the network node from the faulted processor to another processor in accordance with the classified priority order of the connections.
According to a further aspect, the invention provides a network node in a communication system serving a plurality of mobile stations via a radio interface, wherein connections are established between the mobile stations and the network node, comprising: control means arranged to classify the connections into a priority order; a plurality of processors; means for monitoring the working condition of at least one of the processors; wherein the arrangement is such that in case of detecting a processor fault in the node, user plane connections within the node are relocated from the faulted processor to another processor of the node in accordance with said priority order.
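The classify, monitor and relocate steps of the two aspects above can be sketched as follows. This is only an illustration under stated assumptions: the names `Connection`, `RoutingProcessor` and `relocate_on_fault`, the integer `priority` field (lower value relocated earlier), the `healthy` flag standing in for the monitoring means, and the choice of the target processor by connection count are all hypothetical and not taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    conn_id: str
    priority: int  # assumption: lower value means relocated earlier


@dataclass
class RoutingProcessor:
    name: str
    healthy: bool = True  # assumed result of the monitoring step
    connections: list = field(default_factory=list)


def relocate_on_fault(processors):
    """Move connections off faulted processors in priority order.

    Returns the relocation order as (conn_id, target_name) pairs.
    The target is the healthy processor currently holding the fewest
    connections, which is an assumed policy for this sketch.
    """
    healthy = [p for p in processors if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy processor available in the node")
    order = []
    for faulted in (p for p in processors if not p.healthy):
        for conn in sorted(faulted.connections, key=lambda c: c.priority):
            target = min(healthy, key=lambda p: len(p.connections))
            target.connections.append(conn)
            order.append((conn.conn_id, target.name))
        faulted.connections.clear()
    return order


rp0 = RoutingProcessor(
    "rp0",
    healthy=False,
    connections=[Connection("a", 2), Connection("b", 0), Connection("c", 1)],
)
rp1 = RoutingProcessor("rp1")
rp2 = RoutingProcessor("rp2", connections=[Connection("x", 0)])
order = relocate_on_fault([rp0, rp1, rp2])
print(order)
```

Because the relocation stays inside the list of processors of one node, the sketch also reflects the stated object of limiting the recovery procedure to the node containing the faulted processor.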
According to additional embodiments, the invention provides a method wherein the classifying parameters are based on Quality of Service (QoS) parameters. According to one aspect, real-time connections are relocated first, and connections which do not have any strict real-time requirements have lower priority. In one alternative, connections with no strict real-time requirements are defined as higher priority connections and are relocated first, and real-time connections are disconnected after the detection of the processor fault. The method may further comprise monitoring the status of the connections between the mobile stations and the network node, wherein the classifying parameters can be based on connection activity status.
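The two classification alternatives described above may be sketched as below. The names `Conn` and `classify`, the boolean `real_time` and `active` fields, and in particular the combination of the QoS-based and activity-based parameters into a single ordering are assumptions made for illustration; the text only states that either kind of parameter may be used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    conn_id: str
    real_time: bool  # assumed stand-in for a QoS parameter
    active: bool     # assumed stand-in for connection activity status


def classify(connections, real_time_first=True):
    """Return (connections to relocate, in order; connections to disconnect).

    real_time_first=True: real-time connections are relocated first.
    real_time_first=False: non-real-time connections are relocated and
    real-time connections are disconnected.
    Within each class, active connections precede standby ones
    (an assumption of this sketch).
    """
    if real_time_first:
        ordered = sorted(
            connections, key=lambda c: (not c.real_time, not c.active)
        )
        return ordered, []
    keep = [c for c in connections if not c.real_time]
    drop = [c for c in connections if c.real_time]
    return sorted(keep, key=lambda c: not c.active), drop


conns = [
    Conn("web-idle", real_time=False, active=False),
    Conn("voice", real_time=True, active=True),
    Conn("web-active", real_time=False, active=True),
]
ordered, dropped = classify(conns)
alt_ordered, alt_dropped = classify(conns, real_time_first=False)
print([c.conn_id for c in ordered])
print([c.conn_id for c in alt_ordered], [c.conn_id for c in alt_dropped])
```

Python's sort is stable, so connections with equal keys keep their original relative order in either alternative.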
Several advantages are obtained by means of the present invention, since the solution provides a faster recovery from processor faults in mobile communication system nodes, such as UMTS RNCs, SGSNs or GGSNs, than the prior art solutions. The solution provides a transfer of connections from a faulted processor to another processor that is transparent, or that deteriorates the service as little as possible, and thus the active users will notice only a minor drop (if any) in the service quality. A standby user may not notice anything, but will use a new processor when becoming active again. In addition, there is no need to re-establish the connection; it will continue through another routing processor. The solution makes it possible to limit the consequences of a processor fault so that they are visible only within one node, while the connections between several nodes (e.g. between an RNC and an SGSN) are not affected.
In the following the present invention and the other objects and advantages thereof will be described in an exemplifying manner with reference to the annexed drawings, in which similar reference characters throughout the various figures refer to similar features.