DSPs power the race to 4G
Next-generation base stations speed mobile connectivity with SDRs, hard-coded accelerators, and multicore CPUs.
Mike Demler, Technical Editor -- EDN, April 21, 2011
The evolution of wireless mobile broadband technology from 3G (third-generation) to 4G (fourth-generation) networks has given rise to considerable marketing hype. The “official” 4G technologies, which the ITU (International Telecommunication Union) designates, are LTE (long-term-evolution)-Advanced and WirelessMAN (metropolitan-area-network)-Advanced, which is perhaps better known as WiMax (worldwide interoperability for microwave access), or IEEE 802.16m. Nevertheless, these facts haven’t stopped a few wireless operators (AT&T and T-Mobile in the United States, for example) from appropriating the designation for their new and improved 3G networks. The companies are boosting their network speed with HSPA+ (evolved high-speed-packet-access) services. The manufacturers generally base these pseudo-4G mutants on software upgrades to 3G base stations, usually with some additional enhancements to the backhaul connection.
Verizon Wireless, backing LTE, and Sprint and Clearwire, backing WiMax, have begun the transition to the all-IP (Internet Protocol) network architectures that are mandatory for a legitimate 4G system, as has TeliaSonera in Europe, albeit with considerably lower data rates than the 1-Gbps download speed that the ITU-endorsed versions specify. Regardless, if you are designing cellular base stations for either system, you know the reality: 4G’s higher data rates and lower communications latency place much heavier demands on the DSP (digital-signal-processing) components that execute the underlying communication algorithms. LTE specifies a maximum download data rate of 150 Mbps versus HSPA+’s 42 Mbps, and the maximum upload rate increases from 11 to 75 Mbps in LTE.
Industry-standards organization 3GPP (Third Generation Partnership Project) denotes the enhanced 4G base stations as eNodeB (Evolved NodeB, Reference 1). To meet the coverage, capacity, and throughput demands of 4G, wireless operators must build heterogeneous networks by linking base stations of various sizes, from small femtocells that support just a few users in a residential or an enterprise setting, to whole-building picocells, to wide-area microcells and macrocells that can simultaneously support hundreds or thousands of users. Operators are also managing the 4G rollout by installing multimode base stations with SDRs (software-defined radios) that simultaneously support both 3G and 4G. The scalability of your base-station-DSP design is therefore critical.
Fortunately, as a designer of an eNodeB system, you have a lot to choose from when it comes to DSP components. If you are developing an LTE SOC (system on chip), several silicon-IP (intellectual-property) vendors can provide most of the building blocks that you need in a customizable form that you can fine-tune for your ASIC. Alternatively, you can find off-the-shelf DSP ASSPs (application-specific standard products) that support LTE or WiMax processing. At the 2011 MWC (Mobile World Congress) in Barcelona, Spain, several chip companies made competing claims to having developed the first complete LTE base station on a single chip. A third option is also available: new FPGAs offer much of the customizability of an ASIC with the DSP-hardware performance of an ASSP. It pays to compare before you decide which 4G DSP vehicle is best for your application.
4G DSP building blocks
To build a 4G modem, you must start with the PHY (physical) layer, Layer 1 or the radio-interface layer. To exploit the high data rate and spectral efficiency of 4G radio technologies, which are similar for LTE and WiMax, designers apply sophisticated DSP for the OFDMA (orthogonal-frequency-division/multiple-access) modulation with as much as 64-QAM (64-state quadrature-amplitude modulation); the interface to MIMO (multiple-input/multiple-output) antennas with adaptive beam forming; and a host of sophisticated techniques for packet processing, error control, and QOS (quality of service). The 3GPP industry-standard organization is now up to Release 9 of its LTE specification, and more changes should emerge as the organization further develops Release 10 and beyond for LTE-Advanced.
Tensilica has developed the Atlas LTE reference-architecture platform, which implements a complete 3GPP LTE Layer 1 PHY with components of the ConnX DSP family (Figure 1). You can modify the fully programmable Atlas SDR after manufacture of an SOC to accommodate changes in the LTE standard. The ConnX BBE (baseband engine) 16 contains 16 18×18-bit MAC (multiply/accumulate) units that can perform FFTs (fast Fourier transforms) or other digital-filter functions and an eight-way SIMD (single-instruction/multiple-data), three-issue, VLIW (very-long-instruction-word) vector-processing pipeline.
The Atlas often offloads bit-manipulation functions to the ConnX BSP3, which targets 16-, 20-, 32-, and 40-bit vector operations, and performs computations on 128-bit-wide vector files that it can load and store as four 32-bit words, eight 16-bit words, or 16 8-bit words. The ConnX Turbo16 performs the LTE turbo-coding function on data streams operating as fast as 150 Mbps, and it is programmable so that you can modify the software algorithms for various data sources and formats. The 16-way SIMD ConnX SSP16 arithmetic processor processes streams of soft bits to perform functions such as LTE HARQ (hybrid-automatic-repeat-request) error-correction coding. You can add optional specialized functions, such as a Viterbi-accelerator module, to the SSP16.
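The three ways of viewing a 128-bit vector can be illustrated with Python's standard struct module. This is only a sketch of how one 128-bit value is reinterpreted as lanes of different widths. The little-endian byte order and the function name are assumptions, and the code says nothing about how the ConnX hardware actually stores vectors.

```python
import struct


def lane_views(vector_bytes):
    """Reinterpret one 128-bit (16-byte) vector as 4x32-, 8x16-, or 16x8-bit lanes."""
    if len(vector_bytes) != 16:
        raise ValueError("a 128-bit vector is exactly 16 bytes")
    return {
        32: struct.unpack("<4I", vector_bytes),   # four 32-bit words
        16: struct.unpack("<8H", vector_bytes),   # eight 16-bit words
        8: struct.unpack("<16B", vector_bytes),   # sixteen 8-bit words
    }


views = lane_views(bytes(range(16)))
print(len(views[32]), len(views[16]), len(views[8]))  # 4 8 16
```

The same 16 bytes feed every view, so narrower data types let one vector instruction touch more elements.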
Tensilica recently extended the BBE family with the ConnX BBE64-128, increasing performance to more than 100 billion MAC operations per second to meet future requirements for LTE-Advanced (Reference 2). The BBE64-128 enables 128 MAC operations per cycle for maximum throughput and minimum energy consumption. Modeless switching to Tensilica’s smaller standard 16- and 24-bit instructions enables high code density for nonvector algorithms.
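As a back-of-the-envelope check, the two figures together imply a minimum clock rate. The calculation assumes that the 100-billion figure means MAC operations per second and that all 128 MAC units are busy on every cycle. Both assumptions are illustrative; the article does not state a clock frequency.

```python
def min_clock_hz(macs_per_second, macs_per_cycle):
    """Lowest clock rate that sustains the given MAC throughput at full utilization."""
    return macs_per_second / macs_per_cycle


clock = min_clock_hz(100e9, 128)
print(f"{clock / 1e6:.2f} MHz")  # 781.25 MHz
```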
The high-performance Ceva-XC323 licensable IP core for 4G SDR base-station applications features dual vector-processing engines (Figure 2). Ceva’s DSP core integrates an eight-way VLIW SIMD architecture in a 2×256-bit configuration for as many as four parallel operations in each processor, with 32 MAC operations per cycle. The core also has built-in native support for complex arithmetic. The Ceva-XC323 is scalable for base stations from femtocells to macrocells, and the architecture supports 3G standards, such as WCDMA (wideband-code division/multiple access) and HSPA, as well as 4G WiMax, LTE, and LTE-Advanced. The XC323 software supports nonvectorized operations, and the instruction sets cover a full range of Layer 1 PHY requirements such as DFT (discrete Fourier transform), FFT, channel estimation, MIMO detectors, an interleaver, a deinterleaver, and optional support for Viterbi decoding. Ceva based the XC323’s GCU (general computation unit) on the Ceva-X1641. The device provides four-issue SIMD operation, four 16×16-bit two’s-complement MAC units, and four 40-bit ALUs (arithmetic-logic units).

The XC323’s PSU (power-scaling unit) has built-in static- and dynamic-power management and supports multiple voltage domains for the various functional units. To conserve energy, you can operate the XC323 core in multiple modes ranging from full operation to memory-retention mode to complete power shutoff. The full-duplex AXI bus also contains low-power features, including the ability to shut down when no data traffic is present.
Base stations on chips
Competing claims emerged at the 2011 MWC for the first complete base-station-on-chip design. In live demonstrations with partners AirWalk Communications and Lime Microsystems, Mindspeed Technologies showed working production silicon for the Transcede 4000 SOC, which the company initially announced with backing from three presilicon customers in 2010 (Reference 3). Mindspeed describes the Transcede 4000, which is a finalist for the 2010 EDN Innovation Awards in the ASSP category, as the first commercially available single-chip eNodeB product for LTE small cells (Figure 3 and Reference 4). Mindspeed is manufacturing the 300-million-transistor chip in a 40-nm TSMC (Taiwan Semiconductor Manufacturing Co) process.

The complex, heterogeneous Transcede 4000 SOC integrates 26 programmable processors. The PHY layer includes 10 instances of Ceva’s 1641 DSP-IP core and 10 Mindspeed DSP accelerators in the SPU (signal-processing-unit) cluster. The microcoded processors accelerate fixed functions, and the Ceva cores handle general-purpose programmable-DSP functions. Mindspeed can remap the microcoded accelerators to suit various applications if necessary.
“It’s not just a matter of throwing some network processors and some general-purpose DSPs onto a chip,” says Taylor. You must know the architecture so that you know the required speed across the whole system, and you must ensure that you have the right amount of memory and use a nonblocking architecture, he adds. The Transcede 4000 intelligently allocates memory with smart DMA (direct-memory-access) engines, which perform dynamic allocations between the PHY and Layer 2 switch-level or MAC (media-access-controller) functions.
The system cluster performs control functions and data-packet processing, using a combination of dual- and quad-core ARM Cortex-A9 processors in an SMP (symmetric-multiprocessing) configuration. The control plane requires RISC (reduced-instruction-set-computer) processors because they enable the completion of instructions in one cycle. A task dispatcher that runs on one of the ARM cores performs dynamic load balancing. The dispatcher assigns a list of tasks to the next available DSP to run in the local DSP memory. This approach reduces the need for complex software design and makes the architecture modular and extensible. You can replace Mindspeed-defined tasks with your own differentiated algorithms. Because the dispatcher recognizes the hardware architecture, your software scales with the number of processors in a system. The ARM RISC processors perform packet-processing functions that rely on features such as branch prediction, Taylor says, and these functions don’t fit well with DSP architectures that are built for tight algorithm loops and deep vector processing.
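The idea of handing each task list to the next available DSP can be sketched in a few lines. This is a simplified model under stated assumptions, not Mindspeed's dispatcher: the cycle costs, task names, and the dispatch function are hypothetical, and a real dispatcher would also manage local DSP memory.

```python
import heapq


def dispatch(task_lists, num_dsps):
    """Assign each task list to the DSP that becomes free first.

    task_lists: iterable of (name, cycles) pairs; cycles is a hypothetical cost.
    Returns a list of (name, dsp_id, start_cycle, end_cycle) tuples.
    """
    # Min-heap of (cycle_when_free, dsp_id); every DSP is idle at cycle 0.
    free_at = [(0, dsp_id) for dsp_id in range(num_dsps)]
    heapq.heapify(free_at)
    schedule = []
    for name, cycles in task_lists:
        start, dsp_id = heapq.heappop(free_at)   # next available DSP
        end = start + cycles
        schedule.append((name, dsp_id, start, end))
        heapq.heappush(free_at, (end, dsp_id))   # DSP is busy until `end`
    return schedule


jobs = [("fft", 40), ("chan_est", 25), ("mimo", 30), ("demap", 10)]
schedule = dispatch(jobs, num_dsps=2)
for entry in schedule:
    print(entry)
```

Calling the same function with num_dsps=10 needs no change to the job list, which is the sense in which application software can scale with the number of processors.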
Integrated architecture
Freescale Semiconductor’s new QorIQ Qonverge architecture integrates communications-processing, DSP, and wireless-acceleration technologies in various SOC configurations that can work in 3G and 4G femtocell, picocell, metrocell, and macrocell base stations. Qonverge combines the Power Optimization with Enhanced RISC Architecture CPU core of the original QorIQ communication processor, which Freescale (formerly part of Motorola) developed through collaboration with IBM, with a StarCore DSP and MAPLE (multiaccelerator-platform-engine) multimode baseband accelerator. According to Preet Virk, leader of the global-networking-segment marketing division at Freescale, the company has been shipping multicore silicon for the QorIQ since 2009.
Freescale also announced the PSC9132, a configuration of the QorIQ Qonverge for picocell and enterprise-femtocell applications. The PSC9132 incorporates two e500 Power Architecture cores and two StarCore SC3850 DSPs, and it extends performance to the full LTE maximum download and upload speeds of 150 and 75 Mbps, respectively. You can also use the PSC9132 for HSPA+ or in a WiMax 802.16e application to deliver download and upload speeds of as much as 50 and 13 Mbps, respectively, for as many as 64 users. The device also includes a CPRI (Common Public Radio Interface) and a MIMO accelerator.
The company is planning to make the PSC9130/31 and PSC9132 available in the second half of this year. Freescale will manufacture the picocell and femtocell products in a 45-nm process. To extend the family for metrocell and macrocell applications, Freescale plans to use a 28-nm process and to have chips available in the early part of 2012. The company did not disclose details of the architecture or the number of embedded cores that future 28-nm designs will integrate. The targeted specifications for the metrocell and macrocell devices support hundreds of users in a single sector of an LTE-Advanced base station or multiple sectors of LTE with 20-MHz channels and as many as eight receiving and transmitting antennas per sector.
TI adds accelerators
At the 2011 MWC, Texas Instruments announced that its new TCI6618 wireless-base-station SOC doubles the performance of the TCI6616, which the company released just six months earlier. According to Kathy Brown, TI’s wireless-base-station-product manager, the TCI6618 targets use in the high-end-base-station market for macrocell or compact-macrocell applications (Figure 4). The TCI6618 integrates four 1.2-GHz C66x DSP cores that support both fixed- and floating-point arithmetic operations, which is unique to TI’s base-station devices, according to Brown. Floating-point operation is especially useful for increasing precision of matrix inversions, she says, resulting in higher spectral efficiency. The TCI6618 is available for sampling, and the company plans volume production for this year. TI will also later announce a device targeting small femtocells and picocells.

The TCI6618 has 15 coprocessing accelerators to complement the four DSP cores, performing 95% of the LTE Layer 1 processing. Three TCP3ds (turbo-decoder coprocessors) perform turbo decoding, and a TCP3e performs encoding. The product also has four VCP2s (Viterbi-decoder coprocessors) and three FFTCs (fast-Fourier-transform coprocessors). The TCI6618’s BRC (bit-rate coprocessor) performs uplink and downlink bit processing for WCDMA/HSPA+, TDSCDMA (time-division-synchronous-code division/multiple access), LTE, and WiMax. It also supports GSM (global-system-for-mobile) communications and CDMA. For LTE, the device supports a data rate as high as 914 Mbps.
A network coprocessor enables communications between the TCI6618 and the network, with a packet accelerator and a security accelerator for autonomous packet-to-packet processing. The network coprocessor handles acceleration for Layer 2 and transport processing. According to TI, the network coprocessor, together with the DSP cores and other acceleration for layers 1 and 2 processing, eliminates the need for a RISC processor.
The TCI6618’s multicore navigator decreases programming effort by allowing you to create an abstraction of tasks with priorities that the packet-based manager in the multicore navigator then dispatches to the DSP cores as they become available. This approach is similar in concept to the task dispatcher that runs on one of the ARM cores in Mindspeed’s SOC.
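Priority-based dispatch of this kind can be modeled with a simple priority queue. The sketch below is a software analogy only, assuming that a lower number means a more urgent task. The class, its methods, and the task names are hypothetical and do not represent TI's Multicore Navigator API, which is implemented in hardware.

```python
import heapq
import itertools


class PriorityTaskQueue:
    """Hand the most urgent pending task to whichever core asks next."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within one priority

    def submit(self, priority, task):
        # Lower number = more urgent (an assumed convention).
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def next_task(self):
        """Called by a core when it becomes idle; returns None if nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = PriorityTaskQueue()
queue.submit(2, "log_statistics")
queue.submit(0, "harq_retransmit")
queue.submit(1, "channel_estimate")
queue.submit(0, "turbo_decode")
order = [queue.next_task() for _ in range(4)]
print(order)
```

Because the application only submits tasks and priorities, it does not need to know which core ends up running each task.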
TI’s KeyStone architecture, a common platform for scaling C66x designs, includes the TeraNet switch fabric, which provides 2 Tbps of bandwidth for data transfer within the SOC (Figure 5). The MSMC (multicore-shared-memory controller) enables the cores to directly access shared memory without using any of TeraNet’s bandwidth. Peripherals in the TCI6618 include a six-lane AIF2 SERDES (serializer/deserializer)-based antenna interface that operates as fast as 6.144 Gbps and that complies with the OBSAI (Open Base Station Architecture Initiative) RP3 (Reference Point 3) and CPRI standards. Four lanes of SRIO (Serial RapidIO) 2.1 and two lanes of PCIe (Peripheral Component Interconnect Express) Generation 2 I/Os support operation as fast as 5 Gbps. A 64-bit DDR3 interface allows you to connect to external memory at speeds as high as 1333 MHz.
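For a rough sense of scale, the peripheral figures can be turned into aggregate raw rates. The sketch assumes that the quoted serial speeds are per lane, that they are raw line rates before any encoding or protocol overhead, and that the DDR3 figure means 1333 million transfers per second. None of these readings is confirmed by the article.

```python
def aggregate_gbps(lanes, gbps_per_lane):
    """Raw aggregate line rate, ignoring encoding and protocol overhead."""
    return lanes * gbps_per_lane


aif2_raw = aggregate_gbps(6, 6.144)   # antenna interface
srio_raw = aggregate_gbps(4, 5.0)     # Serial RapidIO 2.1
pcie_raw = aggregate_gbps(2, 5.0)     # PCIe Generation 2

# 64-bit bus, assumed 1333 million transfers per second -> peak gigabytes per second
ddr3_peak_gbytes = 1333e6 * 64 / 8 / 1e9

print(f"AIF2 {aif2_raw:.3f} Gbps, SRIO {srio_raw:.0f} Gbps, PCIe {pcie_raw:.0f} Gbps")
print(f"DDR3 peak {ddr3_peak_gbytes:.3f} GB/s")
```

Usable throughput will be lower than these raw numbers once line coding and protocol headers are accounted for.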

FPGAs embed ARM cores
According to Mark Quartermain, senior manager for the communications business unit at Xilinx, the traditional role of FPGAs in baseband processing is to provide specialized coprocessing accelerators for functions that the DSP does not perform well, such as the turbo decoder, and in connectivity, such as CPRI. He points out that this support is insufficient for 4G, however, for which a single device must provide all the processing to meet data-rate and latency requirements, and instead proposes a linear processing flow with one dedicated FPGA per sector. By replacing the older Virtex-6 FPGAs with the new Kintex-7, you can achieve a 50% reduction in power and cost, says Quartermain.
Xilinx offers a TDP (targeted design platform) for a complete LTE-baseband channel card. You can use the uplink and downlink reference designs, which Xilinx built from a library of optimized prebuilt and prevalidated LogicCore components. The LTE library components include a 3GPP channel decoder and estimator, an LTE FFT, an LTE MIMO, and turbo encoder/decoders. For the radio-interface layer, Xilinx offers the multimode-radio TDP, which you can apply to LTE, TDSCDMA, WiMax, WCDMA, CDMA2000, or GSM applications. LogicCore functions include a crest-factor-reduction block, a digital predistortion filter, digital upconverters and downconverters, and CPRI- and SRIO-interface blocks.
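Crest-factor reduction lowers a signal's peak-to-average power ratio (PAPR) so that the power amplifier can run more efficiently. The sketch below uses plain hard clipping, the simplest possible stand-in, to show the effect. It is not the algorithm in the Xilinx block, which the article does not describe, and the sample values and function names are made up for illustration.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def hard_clip(samples, max_magnitude):
    """Limit each sample's magnitude to max_magnitude while preserving its phase."""
    clipped = []
    for s in samples:
        mag = abs(s)
        clipped.append(s * (max_magnitude / mag) if mag > max_magnitude else s)
    return clipped


signal = [0.5 + 0.5j, -0.3 + 0.2j, 2.0 - 1.5j, 0.1 - 0.4j]  # made-up samples
reduced = hard_clip(signal, max_magnitude=1.0)
print(f"before: {papr_db(signal):.2f} dB, after: {papr_db(reduced):.2f} dB")
```

Hard clipping distorts the signal and spreads energy outside the channel, which is why practical crest-factor-reduction blocks use more careful peak-shaping methods.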

You can reach Technical Editor Mike Demler at 1-408-384-8336 and mike.demler@ubm.com.