Spread spectrum transmission solutions are becoming increasingly important, for instance in global navigation satellite systems (GNSS). Presently, the Global Positioning System (GPS; U.S. Government) is the dominant standard; however, alternative standards exist and are expected to gain importance in the future. So far, the Galileo system (the European programme for global navigation services) and the Global Orbiting Navigation Satellite System (GLONASS; Russian Federation Ministry of Defense) constitute the alternative standards. Due to the different signal formats and frequency bands of these standards, a navigation receiver adapted for one GNSS, say GPS, is not able to receive and process signals from a satellite that belongs to a different GNSS, say the Galileo system.
In order to enable this type of system flexibility, a multi-mode receiver is required. However, including multiple receiver chains in a single device is not only expensive, it also renders the unit bulky and heavy, particularly if more than two signal formats are to be processed. Instead, a programmable software receiver solution is desired, wherein the signal processing principles may be altered according to which signals are presently to be received and processed.
A software-based receiver is also desirable in cases where the GNSS receiver is intended to share a processing platform with other radio signal receivers and/or signal processing devices.
Various software solutions are already known for processing GNSS signals. The patent document WO2004/036238 describes a spread spectrum signal processing solution according to which data words are formed containing one or more consecutive sample values based on received spread spectrum signals. The data words are then correlated with pre-generated code vectors to produce resulting decoded data in a processing-efficient manner.
Akos, D. et al., “Tuning In to GPS: Real-Time Software Radio Architectures for GPS Receivers”, GPS World, July 2001, describes a receiver architecture through which IF signal samples are fed directly from a radio front-end to a programmable processor for continued processing. The article mentions the possibility of using single instruction multiple data (SIMD) instructions to process multiple data samples in parallel.
Dovis, F. et al., “Design and Test-Bed Implementation of a Reconfigurable Receiver for Navigation Applications”, Electronics Department, Politecnico di Torino, Navigation Signal Analysis and Simulation Group, Spring of 2002, relates to the design of a reconfigurable GNSS receiver which is capable of fusing data from two or more different GNSSs. The document sketches an architecture which, in addition to a radio front-end, includes a Field Programmable Gate Array (FPGA) and a Digital Signal Processor (DSP).
Hence, the prior art includes various examples of software-based GNSS receivers. Nevertheless, in order to meet the growing mass market's demands in terms of high flexibility, low cost and upgradeability, software receivers are desired that have even further enhanced power efficiency. Namely, in order to be included in a platform of a handheld device, such as a mobile phone or a Personal Digital Assistant (PDA), the processing load caused by the GNSS receiver's software baseband engine should be as low as possible, i.e. a low MIPS requirement must be fulfilled (MIPS: millions of, or Mega, Instructions Per Second). This is because a low processing load, and hence a low power consumption, enables implementation in weaker microprocessor systems and/or co-existence with other processing-intensive applications running on the same platform.
Moreover, the time-critical memory requirement should be minimized, since the mass-market embedded platforms are generally weak in terms of bus bandwidth, cache sizes and memory latencies (i.e. read/write stalls). In a software baseband receiver a trade-off can normally be made between memory usage and MIPS usage (i.e. between what is pre-computed and stored in tables and vectors and what is generated on the fly). Consequently, in order to be optimal, a software baseband solution should be well adapted to the microprocessor architecture both with respect to algorithm design and implementation, i.e. use as few operations as possible with an optimized memory usage and access.
A GNSS receiver performing a continuous tracking procedure (e.g. required for in-car navigation) must be capable of processing a high-bandwidth data stream in real-time. The software solutions currently available for real-time tracking in embedded architectures are predominantly single-bit operand solutions. Here, the Doppler shift removal and correlation operations must be executed with operands restricted to single bit binary values in order to lower the internal data stream bandwidths and processing load. This imposes a significant sensitivity loss (up to 6 dB). Furthermore, in a typical use case (e.g. inside a car), the navigation device is often placed such that there is no direct line of sight between the receiving antenna and the satellites. This causes additional signal power degradation by 6-10 dB.
We will now discuss the relationship between the digital baseband processing approach used and the resulting quality of the decoded signal. Assume that the antenna is of good quality, that the radio frequency conditioning unit (i.e. the analog part of the receiver block that demodulates and samples the signal) is provided with a high-quality low-noise amplifier (LNA) and has a sufficiently large analog bandwidth (and sampling frequency), and that the local oscillator driving the front-end has an adequate frequency stability. Under these assumptions, the digital baseband processing essentially determines the receiver's total noise figure.
Further, if circumstances external to the receiver, such as interference, multi-path fading and signal obscuration are disregarded, any signal power loss is caused by optimizations in the quantizing of correlation operands, the quantizing of the tracking error in the time delay of code replicas and the frequency error in the Doppler estimation.
Generally, a spread spectrum receiver may compensate for a weak (low power) signal by performing longer coherent and non-coherent correlation operations. This averages out a larger amount of noise (through the summing performed in correlation) and hence renders the signal more easily detectable. However, when the integration time (i.e. the time spent on correlation before investigating the correlation result) is prolonged, any loop filters used for tracking the code and the carrier frequency and phase are updated less frequently. This generally degrades the performance and stability of these filters, especially in terms of dynamic performance.
The tracking loops aim at matching the incoming signal with respect to code, carrier frequency and phase. This matching is performed by repeatedly adjusting the frequency of the locally generated replica code and carrier Doppler shift. As soon as the relevant discriminators (error functions) indicate no (or a sufficiently low) difference between incoming signal and a locally produced signal, the incoming signal and local replica are considered to be aligned. At this point, the receiver has a best possible estimate of carrier Doppler shift and code start (the parameters used for position, velocity and time computations, as well as for determining a strongest possible signal power retrieved from the correlation process).
In order to enable decoding of a GNSS signal, the timing error for the replica code must be within ±1 chip; otherwise no detectable signal power can be produced. If multi-path effects and cross-correlation effects are considered, a lower timing error is typically required.
To determine the carrier Doppler shift, the frequency error must be less than the inverse of the integration time. Otherwise, any resulting Doppler shift cancels out the correlation gain.
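The frequency-error condition above can be illustrated with the standard sinc-shaped loss of an ideal coherent integrator. The following is only a sketch under that textbook model; the function name `coherent_loss_db` and its parameters are hypothetical and do not come from the cited documents.

```python
import math


def coherent_loss_db(freq_error_hz: float, integration_time_s: float) -> float:
    """Correlation power loss in dB for a residual frequency error.

    Assumes an ideal coherent integrator, whose output amplitude scales
    as |sin(pi*df*T) / (pi*df*T)| for a frequency error df and an
    integration time T.
    """
    x = math.pi * freq_error_hz * integration_time_s
    if x == 0.0:
        return 0.0  # no frequency error, no loss
    amplitude = abs(math.sin(x) / x)
    if amplitude == 0.0:
        return float("-inf")
    return 20.0 * math.log10(amplitude)


if __name__ == "__main__":
    t_int = 1e-3  # 1 ms, one GPS C/A code epoch
    for df in (0.0, 250.0, 500.0, 1000.0):
        print(f"df = {df:6.1f} Hz -> loss = {coherent_loss_db(df, t_int):.2f} dB")
```

With a 1 ms integration time the loss grows from zero at no error to a practically complete cancellation at 1000 Hz, which is the inverse of the integration time.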
The PRN codes used for spreading and despreading are only two-valued (+1, −1). Therefore, these codes may be represented with binary values without any correlation loss. As for the sampled incoming data, a single-bit value representation works, which gives CDMA systems in general (and GNSSs in particular) a remarkable robustness. Even though each sample mostly contains noise (or undesired signal energy), an adequate correlation process is still able to restore the signal.
An increase from 1-bit data to 2-bit results in a C/No (carrier-to-noise, bandwidth independent signal power metric) gain of about 2.5 dB-Hz, and an increase from 2-bit representation to 4-bit representation accomplishes another 1 dB-Hz gain. However, further increases of the number of bits only provide insignificant quality enhancements, and are therefore not justified in commercial applications.
As for the carrier Doppler frequency compensation, the sinusoid amplitude values are usually quantized with 1- to 5-bit values, depending on the quality of the receiver. Use of single-bit values instead of a 3-bit representation results in a signal power loss of about 2 dB-Hz. A single-bit representation is also unfavorable because it introduces unwanted signal properties. Namely, the 1-bit quantized sinusoid is actually a square wave, which is relatively remote from the carrier waveform used on the transmitter side (i.e. in the satellites). The transmitter normally modulates a carrier wave by means of phase shift keying, such as Binary Phase Shift Keying (BPSK).
Nevertheless, since the single bit representation allows for the least complex hardware implementation this is the standard approach in low-end GNSS receivers. More advanced (and expensive) receivers often use multi-bit data and multi-bit carrier Doppler representations.
When it comes to software baseband implementation, the least complex implementation usually coincides with the least addressable unit (LAU) that is supported by the microprocessor system architecture. Usually, the LAU is 8-bit or 16-bit valued (byte or half word/word registers). Moreover, high-performance instructions, such as single-cycle MACs (multiply-accumulate) in digital signal processors (DSPs) or dedicated SIMD instructions, tend to use LAU operands as input.
Today's most MIPS-efficient implementations of software baseband solutions use XOR instructions with 1-bit operands. This can be explained by the fact that most modern microprocessor architectures support 32-bit XOR instructions, which in turn enables 32 parallel multiplications of 1-bit operands in one instruction. This is possible because the product of a 1-bit by 1-bit multiplication never expands beyond one bit. The XOR operation simply updates the sign.
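The XOR trick described above can be sketched as follows. The bit encoding (bit 0 for +1, bit 1 for −1) and the helper names `pack_signs` and `correlate_1bit` are assumptions made for this illustration only; Python integers stand in for 32-bit registers.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an integer, one bit per value.

    Assumed encoding: bit = 0 means +1, bit = 1 means -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v < 0:
            word |= 1 << i
    return word


def correlate_1bit(data_word: int, code_word: int, width: int = 32) -> int:
    """Sum of `width` products of 1-bit operands using a single XOR.

    The XOR yields the sign bit of every product at once; counting the
    set bits gives the number of -1 products.
    """
    mask = (1 << width) - 1
    product_signs = (data_word ^ code_word) & mask
    negatives = bin(product_signs).count("1")
    return width - 2 * negatives


if __name__ == "__main__":
    data = [1, -1] * 16
    code = [1, 1, -1, -1] * 8
    fast = correlate_1bit(pack_signs(data), pack_signs(code))
    slow = sum(d * c for d, c in zip(data, code))
    print(fast, slow)
```

The comparison against the element-wise sum is there only to show that the packed form computes the same correlation value.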
The MIPS requirement for a continuous tracking GNSS software receiver is almost entirely determined by the performance of the carrier Doppler removal and replica code correlation. The reason behind this is that the baseband processing is performed on a sample basis (i.e. in the MHz-domain), whereas the tracking loop updates, the navigation data decoding and the position computations are carried out at a higher system level (i.e. in the kHz- and Hz-domain respectively). Therefore, the latter signal processing is less time critical.
In the light of this, the baseband algorithm design and its implementation are of vital importance to the performance of the software receiver. Thus, using XOR instructions is simply not sufficient to achieve a good processing efficiency.
Instead, the efficiency of the following operations/steps also determines the overall performance: loading of sampled signal data; loading/generation of local I/Q Doppler operands; multiplication of data with Doppler operands to compensate for a carrier Doppler shift; loading/generation of local replica PRN code operands; multiplication of baseband data with replica operands; accumulation of individual results for producing correlation outputs; and storing of results.
In order to attain a basic implementation efficiency, the operands should be vectorized and pre-computed as much as possible given a reasonable trade-off between desired accuracy and memory requirements, for example as is proposed in the International Patent Application WO2004/036238.
The generalized baseband processing can be described by complex vector operations, using in-phase (I) and quadrature-phase (Q) notation, as:
$$A_\tau = \sum_{k=0}^{L-1} \left[ \left( d_I[k] + j \cdot d_Q[k] \right) \cdot \left( s_I[k] + j \cdot s_Q[k] \right) \right] \cdot p_\tau[k] \tag{1}$$

where
L is the vector length (typically one code epoch in samples),
d[k] is a complex sampled data vector, with in-phase part $d_I[k]$ and quadrature-phase part $d_Q[k]$,
$s_I[k]$ is an in-phase part of a complex carrier Doppler vector,
$s_Q[k]$ is a quadrature-phase part of a complex carrier Doppler vector,
$p_\tau[k]$ is a τ-delayed real-valued local PRN code replica, and
$A_\tau$ is a complex correlation result with respect to a delay given by τ.
The most common number of replica delays, τ, is three: early $A_E$ (τ=E), prompt $A_P$ (τ=P) and late $A_L$ (τ=L). Provided that three delays are used, in total six accumulator values will be produced (i.e. three complex accumulators). Alternatively, a combined early-minus-late approach may be used, which produces a total of four accumulator values ($A_{P,I}$, $A_{P,Q}$, $A_{E-L,I}$ and $A_{E-L,Q}$).
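The text does not spell out how the early-minus-late accumulator is formed. One plausible reading, given here purely as an assumption, is that it follows from the linearity of the correlation sum, with b[k] denoting the Doppler-compensated baseband sample:

```latex
b[k] = \left( d_I[k] + j \cdot d_Q[k] \right) \cdot \left( s_I[k] + j \cdot s_Q[k] \right),
\qquad
A_{E-L} = \sum_{k=0}^{L-1} b[k] \cdot \left( p_E[k] - p_L[k] \right) = A_E - A_L
```

Under this reading the combined replica $p_E[k] - p_L[k]$ takes only the values −2, 0 and +2, and the real and imaginary parts of $A_{E-L}$ are the two accumulators $A_{E-L,I}$ and $A_{E-L,Q}$.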
Preferably, the baseband version of the sampled data is reused between the different delays, τ.
Equation (1) is valid both for I/Q-sampling and IF-sampling. In the latter case, d[k] is real-valued (i.e. all dQ[k] values are zero), and s[k] also includes the IF frequency in addition to the Doppler shift.
Assuming that the operands are vectorized, pre-computed and rapidly accessible from memory, a straight-forward baseband processing results in the following pseudo code complexity for computing the accumulator values for three replica delays of a single sample value d[k] (represented by dI[k] and dQ[k] respectively in complex notation):
for (k=0:L−1){
    bI = dI[k]·sI[k] − dQ[k]·sQ[k]
    bQ = dI[k]·sQ[k] + dQ[k]·sI[k]
    AE,I = AE,I + bI·pE[k]
    AE,Q = AE,Q + bQ·pE[k]
    AP,I = AP,I + bI·pP[k]
    AP,Q = AP,Q + bQ·pP[k]
    AL,I = AL,I + bI·pL[k]
    AL,Q = AL,Q + bQ·pL[k]
}
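For reference, the pseudo code above can be written as a small runnable Python function. This is a direct, unoptimized transcription; the function name `correlate_three_delays` and the dictionary-based return value are choices made for this sketch and are not part of the original description.

```python
def correlate_three_delays(d_i, d_q, s_i, s_q, p_e, p_p, p_l):
    """Serial early/prompt/late correlation over one block of samples.

    d_i, d_q : in-phase / quadrature sampled data
    s_i, s_q : in-phase / quadrature carrier Doppler operands
    p_e, p_p, p_l : early, prompt and late PRN code replicas (+1/-1)
    Returns a dict mapping "E", "P", "L" to complex accumulators.
    """
    acc = {"E": [0, 0], "P": [0, 0], "L": [0, 0]}
    replicas = (("E", p_e), ("P", p_p), ("L", p_l))
    for k in range(len(d_i)):
        # Carrier Doppler removal: complex multiply, 4 mults and 2 adds.
        b_i = d_i[k] * s_i[k] - d_q[k] * s_q[k]
        b_q = d_i[k] * s_q[k] + d_q[k] * s_i[k]
        # Code correlation: 2 mults and 2 adds per replica delay.
        for name, code in replicas:
            acc[name][0] += b_i * code[k]
            acc[name][1] += b_q * code[k]
    return {name: complex(i, q) for name, (i, q) in acc.items()}
```

The baseband sample (b_i, b_q) is computed once per k and reused for all three delays, which is why the per-sample cost is 10 multiplications and 8 additions rather than 18 and 12.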
Hence, in addition to unavoidable load and store operations, 10 multiplications and 8 additions are required. The processing of a single code epoch (1 millisecond) of a GPS C/A signal using I/Q sampling would require an L-value of approximately 2000. Assuming in total 25 instructions (serial LAU processing) per sample value gives roughly 50000 instructions per channel and millisecond, i.e. a processing load of 50 MIPS. Thus, in an implementation wherein all vector operands are pre-generated, a fully parallel twelve-channel software receiver would cause a processing load of approximately 600 MIPS. If, instead, IF sampling were employed, fewer instructions per pass would be required. In this case, however, the L-value must be doubled (i.e. around 4000), which results in an equivalent overall processing load. Naturally, such a MIPS requirement is unsuitable for today's handheld devices.
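The load estimate above is plain arithmetic and can be reproduced with a few lines of Python. The helper `mips_load` is a hypothetical name introduced for this sketch; the figures passed to it are the ones stated in the text.

```python
def mips_load(samples_per_epoch: int,
              instructions_per_sample: float,
              epochs_per_second: int = 1000,
              channels: int = 1) -> float:
    """Processing load in MIPS for serial per-sample baseband processing."""
    instructions_per_second = (samples_per_epoch * instructions_per_sample
                               * epochs_per_second * channels)
    return instructions_per_second / 1e6


if __name__ == "__main__":
    # One channel, I/Q sampling: L = 2000 samples per 1 ms code epoch.
    print(mips_load(2000, 25))               # per-channel load
    # Twelve fully parallel channels.
    print(mips_load(2000, 25, channels=12))  # total receiver load
```

The default of 1000 epochs per second corresponds to the 1 millisecond code epoch of the GPS C/A signal.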
By dividing the L sample values representing a code epoch into smaller blocks, SIMD instructions may be applied to these blocks and several passes can be computed in parallel. Furthermore, if single-bit data is used it is possible to lower the processing burden down to less than 10 MIPS per channel by applying XOR operations and summation look-up tables (LUT).
However, to achieve such performance with acceptable accuracy, the receiver must have access to a relatively large memory having a high bandwidth and low latency. Namely, as mentioned initially, the parameters memory usage, MIPS and accuracy can all be traded against one another depending on the application and target architecture.
Increasing the number of bits used for estimating the carrier Doppler shift and/or digitizing the incoming data stream may attain an improved sensitivity in the baseband processing. Nevertheless, this causes a performance loss in the above-mentioned packed processing SIMD approach, since the microprocessor register widths are fixed and fewer samples can then be computed in parallel per pass with XOR operations. If both carrier Doppler shift estimation and the incoming data stream are multi-bit valued the processing becomes very complex, and difficult to perform efficiently because the representation of the intermediate products will inevitably expand.
For 2-bit valued data and carrier Doppler shift estimation, a decent implementation can be designed by using a sign and magnitude representation and a separate processing of these parts. However, also in this case the performance penalty compared to single-bit processing is still considerable. In implementations with more than 2 bits per data value, the additional logic operations required for combining individual sign and magnitude parts become a serious bottleneck.
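To make the cost of separate sign and magnitude processing concrete, the following sketch correlates two 2-bit vectors held as packed sign and magnitude words. The value set {−3, −1, +1, +3}, the encoding (sign bit 1 for negative, magnitude bit 1 for 3) and the names `pack_sign_magnitude` and `correlate_2bit` are all assumptions chosen for this illustration.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def pack_sign_magnitude(values):
    """Pack values from {-3, -1, +1, +3} into (sign_word, magnitude_word)."""
    sign = mag = 0
    for i, v in enumerate(values):
        if v < 0:
            sign |= 1 << i
        if abs(v) == 3:
            mag |= 1 << i
    return sign, mag


def correlate_2bit(sign_a, mag_a, sign_b, mag_b, width=32):
    """Sum of element-wise products of two packed 2-bit vectors."""
    mask = (1 << width) - 1
    negative = (sign_a ^ sign_b) & mask  # product signs, as in the 1-bit case
    groups = (
        (1, ~(mag_a | mag_b) & mask),    # 1 * 1
        (3, (mag_a ^ mag_b) & mask),     # 1 * 3 or 3 * 1
        (9, (mag_a & mag_b) & mask),     # 3 * 3
    )
    total = 0
    for weight, members in groups:
        # Each magnitude group needs its own masking and bit counting.
        total += weight * (popcount(members) - 2 * popcount(members & negative))
    return total
```

Even in this small case, three magnitude groups each require extra AND operations and bit counts on top of the sign XOR, which hints at why the overhead grows quickly with more bits per value.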