Cellular Massive Multiple-Input, Multiple-Output (MIMO) based on reciprocity-based channel acquisition is becoming a very attractive candidate for future radio access technologies. This is due to its promise of very large increases in throughput per unit area, especially when used over dense (small cell) deployments. Massive MIMO is also envisioned as a candidate for addressing large variations in user load, including user-traffic hotspots. One challenge of such deployments is load balancing, that is, associating users with cells based not only on the relative signal strength from each cell, but also on the relative user traffic in the vicinity of each cell, with the goal of optimizing network-wide performance. Load balancing is even more challenging in emerging deployments. First, load balancing is even more important with small cells, as these are inherently less planned, and thus less regular, than macro deployments, with large variability in effective coverage area. Furthermore, emerging networks are multi-tier networks comprising tiers whose base stations (BSs) differ vastly in coverage area. Indeed, load balancing algorithms need to exploit the fact that each user can be served by multiple BSs, from multiple tiers, in order to effectively balance the network load across all tiers, taking into account the fact that BSs from each tier cover different areas.
Non-uniform load distribution is considered to be a major challenge in small cell networks. If the load cannot be balanced efficiently, the performance gains that are expected as a result of the increased density of network access points (due to use of small cells) may be distributed in a very non-uniform manner within the user population. Various load balancing techniques are proposed for dynamically arranging user load across small cells. These techniques are generally designed considering traditional physical (PHY) layer approaches, where one BS serves at most one user at a certain frequency and time resource. But it is well accepted now that major gains in PHY layer are expected due to MU-MIMO and especially Massive MIMO.
Conventional downlink MU-MIMO schemes have been at the forefront of investigations in the past decade. These schemes promise spectral efficiency increases by using multiple antennas at the base-station and serving multiple users simultaneously without the need for multiple antennas at the user terminals. This is achieved by using knowledge of the channel state information (CSI) between each user and the transmitting base-station. Having CSIT (CSI available at the transmitter) allows the transmitter to precode the user streams so that each user terminal (UE) sees only its own stream. Given a base station with M transmit antennas, K single-antenna user terminals can be served simultaneously, giving roughly a multiplexing gain equal to min(M, K) with respect to a system serving a single terminal.
For the transmitter to achieve this operation reliably, it needs to have sufficiently accurate CSIT, i.e., the transmitter needs to know the channels between itself and each of the users sufficiently accurately. The techniques used for acquiring CSIT fall into two categories. The first class employs M pilots (one per base-station transmit antenna) in the downlink to allow each user terminal to estimate the channel coefficients between the user terminal's own antenna(s) and those of the base-station. This operation provides CSI at each receiving user terminal (CSIR) regarding the channel between each base-station transmit antenna and the user-terminal receive antennas. The CSIR, i.e., the CSI available at each user terminal, is then fed back to the transmitter by use of uplink transmissions to provide CSIT, i.e., CSI at the transmitting base-station. This class of CSIT acquisition schemes has two overheads: (i) a downlink pilot overhead, which scales linearly with M (the number of antenna elements at the transmitting base-station); and (ii) an uplink feedback overhead, responsible for making available to the base-station the channels between each user terminal and each base-station antenna. In the case where each user terminal has a single antenna, the uplink feedback is responsible for providing to the base-station the MK channel coefficients (complex scalars), one coefficient for each channel between a user-terminal antenna and a base-station antenna. Although the uplink overhead could in principle be made to grow linearly with min(M, K), with the methods used in practice this overhead grows as the product of M and K. The downlink overhead limits the size of the antenna array, M, that can be deployed. Similarly, the uplink overhead limits both M and K, as it grows very fast with increasing M and K.
The second class of CSIT acquisition techniques is referred to as reciprocity-based training schemes. They exploit a property of the physical wireless channel, known as channel reciprocity to enable, under certain suitably chosen (M, K) pairs, very high-rate transmission with very efficient CSIT training. In particular, pilots are transmitted in the uplink by each user (K pilots are needed, but more could be used) and the corresponding pilot observations at the base-station are directly used to form the precoder for downlink transmission. If the uplink training and the following downlink data transmission happen close enough in time and frequency (within the coherence time and the coherence bandwidth of the channel), then the uplink training provides directly the required (downlink channel) CSI at the transmitter, since the uplink and the downlink channels at the same time and frequency are the same. In this class of techniques, the uplink overheads scale linearly with K, i.e., with the number of user terminals that will be served simultaneously. These schemes are also typically envisioned as relying on TDD (Time Division Duplex) in order to allow uplink training and downlink transmission within the coherence bandwidth of the user terminal channel with a single transceiver shared for uplink and downlink data transmission.
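The scaling of the two classes of training overhead can be illustrated with a minimal sketch. The function names and the dictionary keys below are hypothetical, and the counts are the idealized ones stated above (M downlink pilots plus MK fed-back coefficients versus K uplink pilots); practical systems may use more pilots or compress the feedback.

```python
def feedback_overhead(M, K):
    """Idealized resource counts for feedback-based CSIT acquisition.

    Downlink pilots scale with M; the uplink feedback carries M*K
    complex coefficients when each user terminal has a single antenna.
    """
    return {"dl_pilots": M, "ul_feedback_coeffs": M * K}


def reciprocity_overhead(M, K):
    """Idealized count for reciprocity-based training.

    One uplink pilot per user; the count does not depend on M.
    """
    return {"ul_pilots": K}


if __name__ == "__main__":
    K = 8  # users served simultaneously (illustrative value)
    for M in (8, 64, 256):
        print(M, feedback_overhead(M, K), reciprocity_overhead(M, K))
```

Under these assumptions, growing M from 8 to 256 multiplies both feedback-based counts by 32, while the reciprocity-based count stays at K.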
One attractive aspect of reciprocity-based training schemes is that one can keep on increasing the size of the transmit antenna array, M, making it “Massive”, without incurring any increase in the training overheads. Although with M>K, increasing M does not increase the number of simultaneously multiplexed streams, K (i.e., K streams are simultaneously transmitted, one to each user), increasing M induces significant “beamforming” gains on each stream (which translate to a higher rate per stream), at no additional cost in training. Alternatively, increasing M allows reducing the transmit power required to yield a target rate to a user terminal, thereby allowing for greener transmission schemes. Another advantage of Massive MIMO is the hardening of the user rates, i.e., with a large number of antennas, the rate that a user gets does not change significantly due to small-scale fading. This property allows practical load balancing and scheduling techniques for Massive MIMO deployments.
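The hardening effect can be illustrated numerically. The sketch below is an illustration only: it assumes i.i.d. Rayleigh fading with unit-variance coefficients and looks at the normalized single-user beamforming gain ||h||^2 / M rather than a full multi-user rate; the function names, trial count, and seed are hypothetical choices.

```python
import random
import statistics


def beamforming_gain(M, rng):
    """Return ||h||^2 / M for one i.i.d. Rayleigh channel draw with M antennas."""
    total = 0.0
    for _ in range(M):
        # Real and imaginary parts each have variance 1/2, so E|h_i|^2 = 1.
        re = rng.gauss(0.0, 0.5 ** 0.5)
        im = rng.gauss(0.0, 0.5 ** 0.5)
        total += re * re + im * im
    return total / M


def relative_spread(M, trials=500, seed=0):
    """Standard deviation of the normalized gain divided by its mean."""
    rng = random.Random(seed)
    gains = [beamforming_gain(M, rng) for _ in range(trials)]
    return statistics.pstdev(gains) / statistics.mean(gains)


if __name__ == "__main__":
    for M in (4, 16, 64, 256):
        print(M, round(relative_spread(M), 3))
```

Under this fading model the relative spread shrinks roughly as one over the square root of M, which is the sense in which the per-user gain, and hence the rate, becomes insensitive to small-scale fading.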
This work considers instantaneous CSIT acquisition by reciprocity-based training schemes. The challenge with reciprocity-based training schemes is that the “compound” uplink and downlink channels at the same time and frequency are not the same. Specifically, although the uplink and downlink physical channel components are the same, each compound channel between a “source node” (responsible for transmitting an information-bearing signal from the transmit antenna) and a “destination node” (attached to the receive antenna) includes additional impairments due to the transmitter (the circuitry at the transmitter) and the receiver (the circuitry at the receiver). When the transmitter and receiver roles are interchanged, different impairments occur at each node, thereby rendering the two compound channels non-reciprocal. There exist various calibration techniques to calibrate receivers and transmitters so that the compound DL and UL channels are approximately reciprocal. In the following, it is assumed that reciprocity is perfectly established.
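A small numerical sketch may help to see why the compound channels differ even though the physical channel is shared. It assumes a simple multiplicative model in which each compound channel is the product of a transmit-chain coefficient, the physical channel, and a receive-chain coefficient; all coefficient values are made up, and the ideal calibration factor is computed from the true hardware coefficients, which is not how any particular calibration technique operates.

```python
import cmath


def compound_channels(g, t_bs, r_bs, t_ue, r_ue):
    """Return (downlink, uplink) compound channels for one antenna pair.

    g is the reciprocal physical channel; t_* and r_* are the assumed
    transmit-chain and receive-chain coefficients of the BS and the UE.
    """
    h_dl = r_ue * g * t_bs  # BS transmits, UE receives
    h_ul = r_bs * g * t_ue  # UE transmits, BS receives
    return h_dl, h_ul


# Illustrative (made-up) coefficients in polar form: magnitude, phase.
g = cmath.rect(0.8, 1.1)
t_bs, r_bs = cmath.rect(1.2, 0.3), cmath.rect(0.9, -0.7)
t_ue, r_ue = cmath.rect(1.0, 0.5), cmath.rect(1.1, -0.2)

h_dl, h_ul = compound_channels(g, t_bs, r_bs, t_ue, r_ue)

# Ideal calibration factor mapping the uplink compound channel to the
# downlink one under the multiplicative model above.
calib = (r_ue * t_bs) / (r_bs * t_ue)
h_dl_from_ul = calib * h_ul

if __name__ == "__main__":
    print("uncalibrated mismatch:", abs(h_dl - h_ul))
    print("calibrated mismatch:  ", abs(h_dl - h_dl_from_ul))
```

The uncalibrated mismatch is non-zero because the hardware coefficients do not cancel, while the calibrated mismatch vanishes up to rounding, which corresponds to the perfect-reciprocity assumption made in the text.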
Small cell deployment in heavy-traffic areas, often referred to as “hot spots,” is considered a promising solution for coping with surging traffic demands. In some deployment scenarios, the small cell layer might co-exist with the macro cell layer. Another complementary promising direction towards coping with heavy traffic demands in a power- and bandwidth-efficient manner is Massive MIMO. In Massive MIMO, the number of antennas serving users is much larger than the number of users being served. In downlink Massive MIMO, for instance, many users can be served at the same time using either Zero-Forcing Beamforming or the even simpler Conjugate Beamforming, exploiting the fact that the number of users served is far smaller than the number of antennas. As the number of antennas gets large, transmission beams get sharper, thereby achieving the desired received signal level with much lower transmitted power levels. Furthermore, with large antenna arrays, the achieved user rates harden, i.e., the variance in user rate due to fast (e.g., Rayleigh) fading becomes effectively negligible.
In traditional macro cells deployments, user terminals associate themselves with the macro BS with the largest power. Although variations can arise in the traffic-load of different nearby BSs, such as e.g., in the case of hot-spots, in general such variations are relatively small due to the size and planning of the cells. In the case of smaller less-planned cells, the traffic load from one small cell to the next can exhibit much larger variations. As a result, much larger variability of the load can arise across different small BSs, if users simply associate with the BS from which they receive the strongest signal. Clearly, many BSs may be over-loaded while other nearby BSs might be serving much fewer users. Fortunately, in the case of small cell deployments, many users may be able to receive signals from several BSs as there are more BSs (on average) in proximity. As a result, many recent works consider load balancing and association methods in order to make the best out of the available resources brought by small cells.
The problem of designing good load balancing and user association techniques becomes in general more challenging in cases where more than one user is scheduled on the same time and frequency resources, i.e., with multi-user transmission schemes. Indeed, the rate each user receives in the context of a multi-user transmission scheme, such as Linear Zero-Forcing Beamforming (LZFBF), depends not only on the user's own channel, but also on the number of other users scheduled together with the user for such multi-user transmission, as well as on the channels of these users. The problem of scheduling user sets to maximize the sum of user rates when LZFBF precoding is applied has been considered, and a greedy algorithm for user selection in a single cell with a single BS has been proposed. Such a technique has been used as a building block to schedule cellular and cluster MU-MIMO transmissions in cellular networks, applying proportional fairness at each BS. These techniques can also be systematically extended to include a broader range of fairness conditions within the framework of virtual queues.
All of the above scheduling methods are local, in that they assume that user to BS association has already taken place, so that the fairness framework can be applied locally at each BS. Predicting a priori the effect that different user-BS associations have on the network-wide fairness provided across the network by these “locally fair” schedulers is in general non-trivial. However, when the number of antennas at the BS is much larger than the number of users instantaneously served by a BS in each transmission resource element, the instantaneous user rates “harden” (show much lower variability) and can be accurately predicted by just knowing the size of the serving set at the BS.
Reciprocity-Based Massive MU-MIMO
Consider the problem of enabling MU-MIMO transmission from an array of M transmit antennas to U single-antenna user terminals. The downlink (DL) channel between the i-th base-station transmitting antenna and the k-th user terminal is given by
$$\overrightarrow{y}_{ki} = \overrightarrow{h}_{ki}\,\overrightarrow{x}_{i} + \overrightarrow{z}_{ki}$$
where $\overrightarrow{x}_{i}$, $\overrightarrow{h}_{ki}$, $\overrightarrow{y}_{ki}$, $\overrightarrow{z}_{ki}$ denote the transmitted signal from base-station antenna i, the compound DL channel between the two antennas, and the observation and noise at the receiver of user terminal k, respectively. This model is applicable at any resource block. In general, the variables in the above equation are resource-block dependent. This dependency is omitted from the notation for convenience; with a slight abuse of notation, it will be made explicit when time-sharing across various resource blocks is considered. The amplitude and phase shifts introduced by the RF-to-baseband conversion hardware (e.g., gain control, filters, mixers, A/D, etc.) at the receiver of user terminal k, as well as the amplitude and phase shifts introduced by the baseband-to-RF conversion hardware (e.g., amplifiers, filters, mixers, D/A, etc.) at the transmitter generating the signal to be transmitted by base-station antenna i, are all included in the DL compound channel.
Similarly, the uplink channel between the k-th user terminal and the i-th base-station antenna is given by
$$\overleftarrow{y}_{ik} = \overleftarrow{h}_{ik}\,\overleftarrow{x}_{k} + \overleftarrow{z}_{ik}$$
where $\overleftarrow{x}_{k}$, $\overleftarrow{h}_{ik}$, $\overleftarrow{y}_{ik}$, $\overleftarrow{z}_{ik}$ denote the transmitted signal from user terminal k, the compound uplink (UL) channel between the two antennas, and the observation and noise at the receiver of base-station antenna i, respectively. The scalar (complex) coefficient $\overleftarrow{h}_{ik}$ contains the amplitude and phase shifts introduced by the RF-to-baseband conversion hardware (e.g., gain control, filters, mixers, A/D, etc.) at the receiver of base-station antenna i, as well as the amplitude and phase shifts introduced by the baseband-to-RF conversion hardware (e.g., amplifiers, filters, mixers, D/A, etc.) at the transmitter generating the signal to be transmitted by user terminal k; all of these are included in the compound UL channel.
In the uplink, the following model may be used:
$$\overleftarrow{y} = \overleftarrow{H}\,\overleftarrow{x} + \overleftarrow{z}$$
where $\overleftarrow{x}$ is the vector of dimension U×1 (i.e., U rows by 1 column) comprising the user symbols on subcarrier n at symbol time t, $\overleftarrow{H}$ is the M×U channel matrix that includes the constant carrier phase shifts and the frequency-dependent, constant-in-time phase shifts due to the relative delays between the timing references of the different terminals, and $\overleftarrow{y}$ and $\overleftarrow{z}$ are the received signal vector and the noise at the base-station.
In the downlink, the following model may be used:
$$\overrightarrow{y} = \overrightarrow{x}\,\overrightarrow{H} + \overrightarrow{z}$$
where $\overrightarrow{x}$ is the 1×M (row) vector of precoded user symbols transmitted from the base-station antennas on subcarrier n at symbol time t, $\overrightarrow{H}$ is the M×U channel matrix that includes the constant carrier phase shifts and the frequency-dependent, constant-in-time phase shifts due to the relative delays between the timing references of the different terminals, and $\overrightarrow{y}$ and $\overrightarrow{z}$ are the received signal (row) vector and the noise at the user terminals. Other BSs at sufficiently close distance cause interference, as network MIMO, joint transmission, CoMP, or any other interference mitigation techniques are not considered. Interference from the other access points is included in the noise term.
Assuming perfect calibration, the compound UL and DL channels become reciprocal, so that
$$\overleftarrow{H} = \overrightarrow{H}$$
For simplicity, the thermal noise is neglected. In order to estimate the downlink channel matrix, the U user terminals send a block of U orthogonal frequency division multiplexing (OFDM) symbols, such that the uplink-training phase can be written as
$$\overleftarrow{Y}_{\mathrm{tr}} = \overleftarrow{H}\,\overleftarrow{X}_{\mathrm{tr}} + \text{noise}$$
where $\overleftarrow{X}_{\mathrm{tr}}$ is a scaled unitary matrix. Hence, the base-station can obtain the channel matrix estimate
$$\hat{H} = \overleftarrow{H} + \text{noise}$$
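As a worked step, the estimate follows by correlating the received training block with the training matrix. The notation here is introduced for illustration only: the U×U training matrix is written $\overleftarrow{X}_{\mathrm{tr}}$, the received M×U block $\overleftarrow{Y}_{\mathrm{tr}}$, the noise matrix $N$, and the scaling constant $c > 0$ is defined by $\overleftarrow{X}_{\mathrm{tr}}\overleftarrow{X}_{\mathrm{tr}}^{H} = c\,I$.

$$
\begin{aligned}
\hat{H} &= \frac{1}{c}\,\overleftarrow{Y}_{\mathrm{tr}}\,\overleftarrow{X}_{\mathrm{tr}}^{H}
= \frac{1}{c}\left(\overleftarrow{H}\,\overleftarrow{X}_{\mathrm{tr}} + N\right)\overleftarrow{X}_{\mathrm{tr}}^{H} \\
&= \overleftarrow{H} + \frac{1}{c}\,N\,\overleftarrow{X}_{\mathrm{tr}}^{H}
\end{aligned}
$$

The second term is the residual noise in the estimate; because the training matrix is a scaled unitary matrix, the correlation step does not color the noise across users.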
In order to perform downlink beamforming, the compound channel downlink matrix {right arrow over (H)} is used.
The ZFBF precoding matrix is calculated as
$$W = \Lambda^{1/2}\left[\overrightarrow{H}^{H}\overrightarrow{H}\right]^{-1}\overrightarrow{H}^{H}$$
where $\Lambda$ is a diagonal matrix with the $\lambda_m$'s as diagonal elements, which imposes on each row $w_m$ of the matrix W the row normalization $\|w_m\|^2 = 1$, for all m.
Hence, the ZFBF precoded signal in the downlink, with equal power for each beam and taking into account a distance-dependent pathloss model through the diagonal matrix G, whose diagonal elements are the $g_i$'s, is given by
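$$
\begin{aligned}
\overrightarrow{y} &= \overrightarrow{u}\,p^{1/2}G^{1/2}\,W\,\overrightarrow{H} + \overrightarrow{z} \\
&= \overrightarrow{u}\,p^{1/2}G^{1/2}\,\Lambda^{1/2}\left[\overrightarrow{H}^{H}\overrightarrow{H}\right]^{-1}\overrightarrow{H}^{H}\,\overrightarrow{H} + \overrightarrow{z} \\
&= \overrightarrow{u}\,p^{1/2}G^{1/2}\,\Lambda^{1/2} + \overrightarrow{z}
\end{aligned}
$$
where
$$\lambda_k = \frac{1}{\left[\left(\overrightarrow{H}^{H}\overrightarrow{H}\right)^{-1}\right]_{k,k}}$$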
Notice that the resulting channel matrix is diagonal, provided that the number of simultaneously served users, S, satisfies S ≤ M.
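The diagonalization and the row normalization can be checked numerically. The following is a minimal sketch restricted to two users so that the 2×2 Gram matrix can be inverted in closed form; the channel entries are random complex Gaussian values chosen for illustration, the M×2 orientation of the channel matrix is an assumption of the sketch, and the helper names are hypothetical.

```python
import random


def zfbf_two_users(H):
    """ZFBF precoder for an M x 2 channel H (list of M rows, 2 complex entries each).

    Returns (W, lam): W is the 2 x M precoder Lambda^{1/2} (H^H H)^{-1} H^H
    and lam[k] = 1 / [(H^H H)^{-1}]_{k,k}.
    """
    M = len(H)
    # Gram matrix A = H^H H (2 x 2, Hermitian).
    A = [[sum(H[i][j].conjugate() * H[i][k] for i in range(M)) for k in range(2)]
         for j in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    lam = [1.0 / A_inv[k][k].real for k in range(2)]
    # Row k of W is sqrt(lam[k]) times row k of (H^H H)^{-1} H^H.
    W = [[lam[k] ** 0.5 * sum(A_inv[k][j] * H[i][j].conjugate() for j in range(2))
          for i in range(M)] for k in range(2)]
    return W, lam


def matmul(W, H):
    """Product of a U x M matrix W and an M x U matrix H."""
    return [[sum(W[k][i] * H[i][l] for i in range(len(H))) for l in range(len(H[0]))]
            for k in range(len(W))]


rng = random.Random(1)
M = 8  # number of base-station antennas (illustrative)
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(M)]
W, lam = zfbf_two_users(H)
effective = matmul(W, H)  # should equal diag(sqrt(lam))
row_norms = [sum(abs(w) ** 2 for w in row) for row in W]

if __name__ == "__main__":
    print("effective channel:", effective)
    print("row norms:", row_norms)
```

In this sketch the off-diagonal entries of the effective channel vanish up to rounding and each precoder row has unit norm, so every user sees only its own stream scaled by the square root of its λ.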
Prior Art on MU-MIMO User Scheduling
Although there are several methods available in the literature for scheduling multi-user MIMO transmissions at the BS, a widely accepted class of methods involves scheduling policies which, at any given scheduling instant at the BS, schedule the subset of users that would yield the highest expected weighted sum-rate. Each user's expected rate in each set scheduled for transmission is a function of the instantaneous channels of all the users in the scheduled set. Indeed, assuming LZFBF transmission as described in the preceding section, at any given resource block the coefficients λk depend on the instantaneous channel matrix of all users in the scheduling set (served by ZFBF), and in particular, they can be expressed as
$$\lambda_{k,S}(t) = \frac{1}{\left[\left(\overrightarrow{H}_{k,S}^{H}(t)\,\overrightarrow{H}_{k,S}(t)\right)^{-1}\right]_{k,k}},$$
where $\overrightarrow{H}_{k,S}(t)$ denotes the compound downlink channel matrix for UE k in the user set S at the t-th resource block. Clearly, since the choice of the user set S and/or the resource block t affects λk, the expected user rates are a function of both the scheduling set and the instantaneous channel realization. Fixing the scheduling time instance, and assuming LZFBF transmission, the problem of choosing the subset S that maximizes the weighted sum-rate is combinatorial in the number of antennas, as the number of possible subsets S that can be considered for scheduling grows exponentially fast with the maximum number of users that can be considered for joint scheduling.
One solution to this problem has been proposed that relies on a greedy algorithm for user set selection, with at most quadratic complexity.
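The general shape of a greedy user-set selection can be sketched as follows. This is a generic greedy sketch, not the specific algorithm referred to above. The rate model `hardened_rates` is an assumption made for illustration: it takes the ZF rate of user k in a set of size n to be log2(1 + power · g_k · (M − n + 1) / n), so that it depends on the set only through its size; all names and numerical values are hypothetical.

```python
import math


def hardened_rates(S, gains, M, power=1.0):
    """Assumed ZF rate model: depends on S only through its size and large-scale gains."""
    n = len(S)
    return {k: math.log2(1.0 + power * gains[k] * (M - n + 1) / n) for k in S}


def greedy_select(users, weights, rate_fn, max_size):
    """Grow the scheduled set one user at a time while the weighted sum-rate improves."""
    selected, best = [], 0.0
    candidates = set(users)
    while candidates and len(selected) < max_size:
        best_user, best_value = None, best
        for k in sorted(candidates):
            trial = selected + [k]
            rates = rate_fn(trial)
            value = sum(weights[j] * rates[j] for j in trial)
            if value > best_value:
                best_user, best_value = k, value
        if best_user is None:
            break  # no remaining candidate improves the weighted sum-rate
        selected.append(best_user)
        candidates.remove(best_user)
        best = best_value
    return selected, best


# Illustrative inputs: large-scale gains and equal weights for three users.
gains = {"a": 1.0, "b": 0.5, "c": 0.01}
weights = {"a": 1.0, "b": 1.0, "c": 1.0}
M = 64
selected, best = greedy_select(gains, weights, lambda S: hardened_rates(S, gains, M), max_size=M)

if __name__ == "__main__":
    print(selected, round(best, 3))
```

Each round evaluates at most one candidate set per remaining user, so the number of objective evaluations is at most quadratic in the number of candidate users, in contrast to the exponential number of subsets in an exhaustive search. With the inputs above, the weak user "c" is left out because adding it lowers the rates of the other two by more than it contributes.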
Another important factor defining the scheduling assignments that are produced by the scheduling policy is the method by which the “user weights” are chosen at each scheduling instant prior to performing the weighted sum-rate maximization. Although many methods exist for choosing these weights, a widely accepted class of methods (because of their ability to yield nearly optimal performance with respect to a fairness criterion belonging to a broad class of fairness criteria) is one that relies on the use of “virtual queues” to determine the instantaneous user weights in the weighted sum-rate optimization.
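A minimal sketch of how virtual queues can produce the user weights is given below. It assumes one common form of the framework: each user keeps a virtual backlog that is drained by the rate actually served and filled by a virtual arrival chosen to maximize V·log(a) − Q·a (a proportional-fairness utility), and the backlog is used directly as the weight. The parameter values V and a_max, the function names, and the fixed service pattern are all hypothetical.

```python
def proportional_fair_arrivals(queues, V=10.0, a_max=5.0):
    """Virtual arrivals maximizing V*log(a) - Q*a over 0 < a <= a_max."""
    return {k: a_max if q <= 0 else min(V / q, a_max) for k, q in queues.items()}


def update_virtual_queues(queues, served, arrivals):
    """Apply Q_k <- max(Q_k - r_k, 0) + a_k for every user k."""
    return {k: max(queues[k] - served.get(k, 0.0), 0.0) + arrivals.get(k, 0.0)
            for k in queues}


# Two users; only "u1" is served (rate 2.0 per slot) for three slots.
queues = {"u1": 0.0, "u2": 0.0}
for _ in range(3):
    arrivals = proportional_fair_arrivals(queues)
    queues = update_virtual_queues(queues, {"u1": 2.0}, arrivals)

# The backlogs act as the weights in the next weighted sum-rate maximization.
weights = dict(queues)

if __name__ == "__main__":
    print(weights)
```

In this toy run the unserved user accumulates a larger backlog, and therefore a larger weight, which is the mechanism by which a weighted sum-rate scheduler is steered towards fairness over time.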
Prior Art on Load Balancing
Traditionally, association in cellular networks has been user-terminal based. Users measure their signal-level with respect to the beacons of the nearby BSs and associate to the base-station with the strongest received signal. A generalization of this principle has been used in heterogeneous networks. In the case of comparing signal strengths from a macro and a small cell, a user-terminal can also apply a “bias” to favor association to the small cell (with respect to the macro cell).
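The biased strongest-signal rule described above can be sketched in a few lines. The function name, the BS labels, and the power and bias values are hypothetical; received powers are taken in dBm and biases in dB.

```python
def associate(rx_power_dbm, bias_db=None):
    """Return the BS with the largest received power after adding its bias.

    rx_power_dbm maps BS name -> received beacon power (dBm);
    bias_db maps BS name -> association bias (dB), defaulting to 0.
    """
    bias_db = bias_db or {}
    return max(rx_power_dbm, key=lambda bs: rx_power_dbm[bs] + bias_db.get(bs, 0.0))


rx = {"macro": -80.0, "small": -86.0}
unbiased_choice = associate(rx)
biased_choice = associate(rx, {"small": 9.0})

if __name__ == "__main__":
    print(unbiased_choice, biased_choice)
```

Without a bias the user attaches to the macro cell; a 9 dB bias towards the small cell is enough to flip the decision in this example, offloading the user even though the small cell's signal is weaker.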
As traffic-load imbalances are far more pronounced in small cells, there has been some recent work on load balancing in small cells. Indeed, small cells are much more sensitive to the cell association policy because of the non-uniformity of cell size and the smaller average number of users they serve. This non-uniformity can result in an extremely imbalanced traffic load under max-SINR cell association. The prior art in this area mainly involves methods of exchanging information between each user and nearby BSs, which attempt to balance their load using signaling exchanges with nearby users. Another related technique, referred to as “cell breathing,” relies on dynamically changing (contracting or expanding) the coverage area depending on the load situation (over-loaded or under-loaded) of the cells by adjusting the cell transmit power. Also note that these works focus on small cells scheduling only single-user transmissions.
Limitations of the Previous Works
The methods described above have important limitations. First, given that the user rates in a MU-MIMO transmission are not simply a function of the large-scale signal-to-interference-plus-noise ratio (SINR), but in general depend on the scheduling set and the channel realization, the resulting load-balancing techniques are not extendable in any straightforward, resource-efficient manner. Furthermore, the nature of reciprocity-based Massive MIMO TDD makes the large-scale SINR of the link between a user and every BS in proximity available from a single uplink pilot broadcast by the user. In this context, the association (if any) of a user to a BS can be performed by centralized processing among the BSs without involving exchanges with the users. Using a central controller to fully perform load balancing among the BSs as well as to schedule transmission at each of the BSs is, however, computationally intractable.
Techniques for load balancing and scheduling by use of a combination of processing and information sharing between a central controller and a set of BSs have been put forth. The methods leverage properties of “massive MIMO” type transmission to achieve near-optimal scheduling and load-balancing performance in Massive MIMO cellular transmission. These methods are suitable for performing load balancing by use of a central controller balancing the load across a group of base-stations. However, these techniques are not readily area-scalable, as they rely on a single central controller allocating resources over the network over which load balancing is performed. To illustrate the fact that these schemes are not readily scalable, assume for a moment that the average densities of the BSs and of the user population are fixed. Increasing the area over which load balancing is performed results, at best, in a linear increase of the number of optimization variables with coverage area. This is because increasing the coverage area results in a linear increase in the number of users that need to be associated with the various base-stations covering the area. Hence, these techniques are limited to balancing the load in a given, confined geographical area.