DSPs power the race to 4G
Mike Demler, Technical Editor - April 21, 2011
The evolution of wireless mobile broadband technology from 3G (third-generation) to 4G (fourth-generation) networks has given rise to considerable marketing hype. The “official” 4G technologies, which the ITU (International Telecommunication Union) designates, are LTE (long-term-evolution)-Advanced and WirelessMAN (metropolitan-area-network)-Advanced, which is perhaps better known as WiMax (worldwide interoperability for microwave access), or IEEE 802.16m. Nevertheless, these facts haven’t stopped a few wireless operators—AT&T and T-Mobile in the United States, for example—from appropriating the designation for their new and improved 3G networks. The companies are boosting their network speed with HSPA+ (evolved high-speed-packet-access) services. The manufacturers generally base these pseudo-4G mutants on software upgrades to 3G base stations, usually with some additional enhancements to the backhaul connection.
Verizon Wireless, backing LTE, and Sprint and Clearwire, backing WiMax, have begun the transition to the all-IP (Internet Protocol) network architectures that are mandatory for a legitimate 4G system, as has TeliaSonera in Europe, albeit with considerably lower data rates than the 1-Gbps download speed that the ITU-endorsed versions specify. Regardless, if you are designing cellular base stations for either system, you know the reality: 4G’s higher data rates and lower communications latency place much heavier demands on the DSP (digital-signal-processing) components that execute the underlying communication algorithms. LTE specifies a maximum download data rate of 150 Mbps versus HSPA+’s 42 Mbps, and the maximum upload rate increases from 11 to 75 Mbps in LTE.
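To put those peak rates in perspective, the ratios are worked out below. This is plain arithmetic on the figures quoted above, not a throughput model. Real cell throughput also depends on channel bandwidth, MIMO configuration, and radio conditions.

```python
# Peak rates quoted in the article, in Mbps.
HSPA_PLUS = {"downlink": 42, "uplink": 11}
LTE = {"downlink": 150, "uplink": 75}


def speedup(new: dict, old: dict) -> dict:
    """Ratio of the new peak rate to the old one for each link direction."""
    return {direction: new[direction] / old[direction] for direction in new}


for direction, ratio in speedup(LTE, HSPA_PLUS).items():
    print(f"{direction}: {ratio:.2f}x")  # downlink: 3.57x, uplink: 6.82x
```

The uplink peak rate grows by nearly 7x, compared with roughly 3.6x for the downlink.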
Industry-standards organization 3GPP (Third Generation Partnership Project) denotes the enhanced 4G base stations as eNodeB (Evolved NodeB, Reference 1). To deliver the coverage, capacity, and throughput demands of 4G, wireless operators must build heterogeneous networks by linking base stations of various sizes, from small femtocells that support just a few users in a residential or an enterprise setting, to whole-building picocells, to wide-area microcells and macrocells that can simultaneously support hundreds or thousands of users. Operators are also managing the 4G rollout by installing multimode base stations with SDRs (software-defined radios) that simultaneously support both 3G and 4G. The scalability of your base-station-DSP design is therefore critical.
Fortunately, as a designer of an eNodeB system, you have a lot to choose from when it comes to DSP components. If you are developing an LTE SOC (system on chip), several silicon-IP (intellectual-property) vendors can provide most of the building blocks that you need in a customizable form that you can fine-tune for your ASIC. Alternatively, you can find off-the-shelf DSP ASSPs (application-specific standard products) that support LTE or WiMax processing. At the 2011 MWC (Mobile World Congress) in Barcelona, Spain, several chip companies made competing claims to having developed the first complete LTE base station on a single chip. A third option is also available: new FPGAs that offer much of the customizability of an ASIC along with the DSP-hardware performance of an ASSP. It pays to compare before you decide which 4G DSP vehicle is best for your application.
4G DSP building blocks
To build a 4G modem, you must start with the PHY (physical) layer, also called Layer 1 or the radio-interface layer. To exploit the high data rate and spectral efficiency of 4G radio technologies, which are similar for LTE and WiMax, designers apply sophisticated DSP to the OFDMA (orthogonal-frequency-division/multiple-access) modulation, with constellations as dense as 64-QAM (64-state quadrature-amplitude modulation); to the interface with MIMO (multiple-input/multiple-output) antennas and adaptive beam forming; and to a host of techniques for packet processing, error control, and QOS (quality of service). 3GPP is now up to Release 9 of its LTE specification, and more changes should emerge as the organization develops Release 10 and beyond for LTE-Advanced.
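The core of OFDM transmission fits in a few lines. You map groups of bits onto QAM constellation points, place one point on each subcarrier, take an inverse DFT, and prepend a cyclic prefix. The sketch below is illustrative, not production code. It assumes a toy 16-subcarrier symbol, a natural-binary (not Gray-coded) 64-QAM mapping, and a naive O(N^2) DFT. Real LTE uses Gray-coded constellations, FFT sizes up to 2048, and null guard subcarriers, and OFDMA additionally assigns groups of subcarriers to different users.

```python
import cmath
import math

QAM64_SCALE = 1 / math.sqrt(42)  # normalizes 64-QAM mean symbol energy to 1


def qam64_map(bits):
    """Map each group of 6 bits to one 64-QAM point.

    Three bits select the in-phase level and three the quadrature level,
    using natural binary indexing onto levels -7, -5, ..., 7 (LTE uses a
    Gray-coded mapping instead; this is simplified for illustration).
    """
    if len(bits) % 6:
        raise ValueError("bit count must be a multiple of 6")
    symbols = []
    for i in range(0, len(bits), 6):
        b = bits[i:i + 6]
        level_i = 2 * (4 * b[0] + 2 * b[1] + b[2]) - 7
        level_q = 2 * (4 * b[3] + 2 * b[4] + b[5]) - 7
        symbols.append(complex(level_i, level_q) * QAM64_SCALE)
    return symbols


def ofdm_modulate(subcarriers, cp_len):
    """Inverse DFT of the subcarrier values, then cyclic-prefix insertion."""
    n = len(subcarriers)
    time = [
        sum(x * cmath.exp(2j * math.pi * k * t / n) for k, x in enumerate(subcarriers)) / n
        for t in range(n)
    ]
    return time[n - cp_len:] + time  # copy the last cp_len samples to the front


def ofdm_demodulate(samples, n, cp_len):
    """Discard the cyclic prefix and take a forward DFT."""
    body = samples[cp_len:cp_len + n]
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(body))
        for k in range(n)
    ]


if __name__ == "__main__":
    N, CP = 16, 4
    bits = [(i * 5 + i // 3) % 2 for i in range(6 * N)]  # arbitrary test pattern
    tx_symbols = qam64_map(bits)
    waveform = ofdm_modulate(tx_symbols, CP)
    rx_symbols = ofdm_demodulate(waveform, N, CP)
    err = max(abs(a - b) for a, b in zip(tx_symbols, rx_symbols))
    print(len(waveform), f"{err:.1e}")  # 20 samples: 16 plus a 4-sample prefix
```

The round trip recovers the transmitted symbols up to floating-point rounding. The cyclic prefix is what lets a real receiver absorb multipath delay spread without breaking this DFT relationship.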
Tensilica has developed the Atlas LTE reference-architecture platform, which implements a complete 3GPP LTE Layer 1 PHY with components of the ConnX DSP family (Figure 1). Because the Atlas SDR is fully programmable, you can modify it after manufacture of an SOC to accommodate changes in the LTE standard. The ConnX BBE16 (baseband engine) contains 16 18×18-bit MAC (multiply/accumulate) units that can perform FFTs (fast Fourier transforms) or other digital-filter functions, along with an eight-way SIMD (single-instruction/multiple-data), three-issue VLIW (very-long-instruction-word) vector-processing pipeline.
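MAC units like these spend most of their cycles in loops such as the one below, a direct-form FIR filter. Each output sample is a sum of coefficient-times-sample products gathered in a wide accumulator. This is a scalar Python sketch, not ConnX code. The 18-bit operand width comes from the text. The Q1.17 number format, round-to-nearest rescaling, and saturation are assumptions chosen for illustration. A SIMD engine would compute several taps or output samples per instruction.

```python
FRAC_BITS = 17                                  # assumed Q1.17 format
MIN18, MAX18 = -(1 << 17), (1 << 17) - 1        # signed 18-bit range


def to_q17(x: float) -> int:
    """Quantize a real value to a saturated 18-bit Q1.17 integer."""
    return max(MIN18, min(MAX18, round(x * (1 << FRAC_BITS))))


def fir_fixed(samples, coeffs):
    """Direct-form FIR, y[n] = sum over k of c[k] * x[n - k], one MAC per tap."""
    out = []
    for n in range(len(samples)):
        acc = 0  # models a wide accumulator; Python ints never overflow
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]       # multiply/accumulate
        # Q1.17 x Q1.17 products are Q2.34; round and shift back to Q1.17.
        y = (acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
        out.append(max(MIN18, min(MAX18, y)))   # saturate to 18 bits
    return out


if __name__ == "__main__":
    taps = [to_q17(v) for v in (0.25, 0.5, 0.25)]   # simple low-pass filter
    impulse = [to_q17(0.5), 0, 0, 0]
    print(fir_fixed(impulse, taps))  # [16384, 32768, 16384, 0]
```

Feeding in an impulse of 0.5 returns the taps scaled by 0.5 (0.125, 0.25, 0.125 in Q1.17), which is a quick sanity check on the scaling.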
The Atlas platform often offloads bit-manipulation functions to the ConnX BSP3. The BSP3 targets 16-, 20-, 32-, and 40-bit vector operations and performs computations on 128-bit-wide vector register files that it can load and store as four 32-bit words, eight 16-bit words, or 16 8-bit words. The ConnX Turbo16 processor performs the LTE turbo-coding function on data streams operating as fast as 150 Mbps. Because it is programmable, you can modify its software algorithms for various data sources and formats. The 16-way SIMD ConnX SSP16 arithmetic processor operates on streams of soft bits to perform functions such as LTE HARQ (hybrid-automatic-repeat-request) error-correction processing. You can add optional specialized functions, such as a Viterbi-accelerator module, to the SSP16.
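HARQ soft combining is one example of this kind of soft-bit stream processing. In its simplest form, chase combining, the receiver keeps the soft bits (log-likelihood ratios) from a transmission that failed its CRC. It then adds them to the soft bits of the retransmission before decoding again. The sketch assumes 8-bit signed soft bits with saturating addition, a format chosen for illustration because the article does not state the SSP16's format. It also does not model LTE's incremental-redundancy retransmissions, which carry different redundancy versions. The Python loop stands in for what a 16-way SIMD engine would do 16 soft bits at a time.

```python
SOFT_MIN, SOFT_MAX = -128, 127  # assumed 8-bit signed soft-bit format


def saturate(value: int) -> int:
    """Clamp a sum to the soft-bit range instead of letting it wrap."""
    return max(SOFT_MIN, min(SOFT_MAX, value))


def chase_combine(stored, retransmitted):
    """Element-wise saturating add of buffered and newly received soft bits."""
    if len(stored) != len(retransmitted):
        raise ValueError("soft-bit vectors must have equal length")
    return [saturate(a + b) for a, b in zip(stored, retransmitted)]


def hard_decision(soft):
    """Sign convention assumed here: non-negative -> bit 0, negative -> bit 1."""
    return [0 if s >= 0 else 1 for s in soft]


if __name__ == "__main__":
    first = [20, -5, 3, -90]     # weak, partly unreliable first reception
    second = [15, -30, -2, -60]  # retransmission of the same coded bits
    combined = chase_combine(first, second)
    print(combined, hard_decision(combined))  # [35, -35, 1, -128] [0, 1, 0, 1]
```

In the third position the two receptions disagree in sign, so the combined evidence, not either reception alone, settles the bit. The last position shows saturation at the bottom of the 8-bit range.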
Tensilica recently extended the BBE family with the ConnX BBE64-128, increasing performance to more than 100 billion MAC operations per second to meet future requirements for LTE-Advanced (Reference 2). The BBE64-128 executes 128 MAC operations per cycle for maximum throughput and minimum energy consumption. Modeless switching to Tensilica’s smaller standard 16- and 24-bit instructions provides high code density for nonvector algorithms.
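Together, the throughput figure and the per-cycle figure imply a minimum clock rate. The division below is derived here from the numbers in the text and is not a Tensilica specification. It assumes every MAC unit is busy on every cycle, which real code rarely achieves.

```python
MACS_PER_CYCLE = {"BBE16": 16, "BBE64-128": 128}   # from the text
TARGET_MACS_PER_SECOND = 100e9                      # "more than 100 billion"

for core, macs in MACS_PER_CYCLE.items():
    min_clock_ghz = TARGET_MACS_PER_SECOND / macs / 1e9
    print(f"{core}: needs at least {min_clock_ghz:.3f} GHz at full utilization")
```

At 128 MACs per cycle, the target is reachable at roughly 781 MHz. A 16-MAC engine would need 6.25 GHz to hit the same target, which shows why widening the datapath is the more practical route than raising the clock.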
The high-performance Ceva-XC323 licensable IP core for 4G SDR base-station applications features dual vector-processing engines (Figure 2). Ceva’s DSP core integrates an eight-way VLIW SIMD architecture in a 2×256-bit configuration for as many as four parallel operations in each processor, with 32 MAC operations per cycle. The core also has built-in native support for complex arithmetic. The Ceva-XC323 is scalable for base stations from femtocells to macrocells. The architecture supports 3G standards, such as WCDMA (wideband-code division/multiple access) and HSPA, as well as 4G WiMax, LTE, and LTE-Advanced. The XC323 also supports nonvectorized operations, and its instruction set covers the full range of Layer 1 PHY requirements, such as DFT (discrete Fourier transform), FFT, channel estimation, MIMO detection, interleaving, deinterleaving, and optional Viterbi decoding. Ceva based the XC323’s GCU (general computation unit) on the Ceva-X1641. The GCU provides four-issue SIMD operation, four 16×16-bit two’s-complement MAC units, and four 40-bit ALUs (arithmetic-logic units).
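Native complex arithmetic matters because each complex multiply hides four real multiplies and two additions. The sketch below computes a complex dot product with conjugation in Q15 fixed point. This is the kernel behind correlating received samples against known pilots in channel estimation. The 16-bit operand width matches the 16×16-bit MAC units mentioned above. The Q15 format, the rounding, and the unbounded accumulator are assumptions, and the (re, im) integer-pair representation is purely illustrative, not Ceva's data layout.

```python
Q = 15  # assumed Q15 format: 16-bit signed values in [-1, 1)


def q15(x: float) -> int:
    """Quantize a real value to a saturated 16-bit Q15 integer."""
    return max(-32768, min(32767, round(x * (1 << Q))))


def complex_dot_conj(a, b):
    """Return sum(a[i] * conj(b[i])) for lists of Q15 (re, im) pairs, in Q15."""
    acc_re = acc_im = 0
    for (ar, ai), (br, bi) in zip(a, b):
        # (ar + j*ai) * (br - j*bi) = (ar*br + ai*bi) + j*(ai*br - ar*bi)
        acc_re += ar * br + ai * bi   # two real MACs
        acc_im += ai * br - ar * bi   # two more real MACs
    half = 1 << (Q - 1)               # round to nearest when rescaling Q30 -> Q15
    return (acc_re + half) >> Q, (acc_im + half) >> Q


if __name__ == "__main__":
    rx = [(q15(0.5), q15(0.5))]        # received sample 0.5 + 0.5j
    pilot = [(q15(0.5), 0)]            # known pilot 0.5 + 0j
    print(complex_dot_conj(rx, pilot))  # (8192, 8192), i.e. 0.25 + 0.25j
```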
By exploiting the instruction-level parallelism in the XC323, you can execute as many as eight control and vector instructions in parallel. You can construct a homogeneous multicore system by connecting multiple instances of the XC323 through the built-in AXI (advanced-extensible-interface) bus. A snooping mechanism in the data-memory subsystem removes handshake overhead by detecting external-device accesses to internal memory buffers. The data-memory subsystem also contains a message queue for synchronization and system control, and it controls access to external memories so that multiple cores can share data. Designers can take advantage of the multilayer AXI concept to implement programmable arbitration between masters and slaves.
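The AXI protocol defines the interface rather than an arbitration policy, so "programmable arbitration" typically means a policy with software-set parameters. The weighted round-robin arbiter below is one such policy, offered as a hypothetical illustration. The article does not say which scheme the XC323 supports. In this scheme, each master may receive up to its weight in consecutive grants before priority rotates to the next requester.

```python
class WeightedRoundRobinArbiter:
    """Hypothetical weighted round-robin arbiter for bus masters."""

    def __init__(self, weights):
        if not weights or any(w < 1 for w in weights):
            raise ValueError("each master needs a weight of at least 1")
        self.weights = list(weights)   # programmable: grants per turn
        self.current = 0               # master that currently holds priority
        self.credits = self.weights[0]

    def grant(self, requests):
        """Return the index of the master granted this cycle, or None."""
        n = len(self.weights)
        if len(requests) != n:
            raise ValueError("one request flag per master expected")
        # Keep serving the current master while it requests and has credit left.
        if requests[self.current] and self.credits > 0:
            self.credits -= 1
            return self.current
        # Otherwise rotate to the next requesting master (possibly the same one).
        for step in range(1, n + 1):
            candidate = (self.current + step) % n
            if requests[candidate]:
                self.current = candidate
                self.credits = self.weights[candidate] - 1  # this grant uses one
                return candidate
        return None


if __name__ == "__main__":
    arbiter = WeightedRoundRobinArbiter([2, 1])   # master 0 gets 2 of every 3
    print([arbiter.grant([True, True]) for _ in range(6)])  # [0, 0, 1, 0, 0, 1]
```

Changing the weights at run time shifts bandwidth between masters without changing the fairness guarantee: every requesting master is eventually served.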
The XC323’s PSU (power-scaling unit) has built-in static- and dynamic-power management and supports multiple voltage domains for the various functional units. To conserve energy, you can operate the XC323 core in multiple modes ranging from full operation to memory-retention mode to complete power shutoff. The full-duplex AXI bus also contains low-power features, including the ability to shut down when no data traffic is present.
Base stations on chips
Competing claims emerged at the 2011 MWC for the first complete base-station-on-chip design. In live demonstrations with partners AirWalk Communications and Lime Microsystems, Mindspeed Technologies showed working production silicon for the Transcede 4000 SOC. The company initially announced the chip with backing from three presilicon customers in 2010 (Reference 3). Mindspeed describes the Transcede 4000, a finalist for the 2010 EDN Innovation Awards in the ASSP category, as the first commercially available single-chip eNodeB product for LTE small cells (Figure 3 and Reference 4). Mindspeed is manufacturing the 300-million-transistor chip in a 40-nm TSMC (Taiwan Semiconductor Manufacturing Co) process.
Alan Taylor, marketing director for wireless-baseband products at Mindspeed, says that a targeted, multicore SOC is the best way to minimize power and cost while meeting the high-performance DSP requirements of 4G baseband processing. He also notes that scalability is essential as network operators migrate from 3G and as the LTE standards continue to evolve. General-purpose DSPs lack the needed precision, Taylor says, and performance requirements drive the need for dedicated fixed-function processors that eliminate the overhead of implementing functions such as FFTs in software. He believes that offloading fixed functions to FPGAs, as has been common in base-station designs, adds power consumption and cost.
The complex, heterogeneous Transcede 4000 SOC integrates 26 programmable processors. The PHY layer includes 10 instances of the Ceva-X1641 DSP-IP core and 10 Mindspeed DSP accelerators in the SPU (signal-processing-unit) cluster. The microcoded processors accelerate fixed functions, and the Ceva cores handle general-purpose programmable-DSP functions. Mindspeed can remap the microcoded accelerators to suit various applications if necessary.
“It’s not just a matter of throwing some network processors and some general-purpose DSPs onto a chip,” says Taylor. You must understand the architecture well enough to know the required speed across the whole system, and you must ensure that you have the right amount of memory and use a nonblocking architecture, he adds. The Transcede 4000 allocates memory with smart DMA (direct-memory-access) engines, which perform dynamic allocations between the PHY and the Layer 2 switch-level or MAC (medium-access-control) functions.
The system cluster performs control functions and data-packet processing, using a combination of dual- and quad-core ARM Cortex-A9 processors in an SMP (symmetric-multiprocessing) configuration. The control plane requires RISC (reduced-instruction-set-computer) processors because they enable the completion of instructions in one cycle. A task dispatcher that runs on one of the ARM cores performs dynamic load balancing. It assigns each task in a list to the next available DSP, which runs the task in its local memory. This approach reduces the need for complex software design and makes the architecture modular and extensible. You can replace Mindspeed-defined tasks with your own differentiated algorithms, and because the dispatcher recognizes the hardware architecture, your software scales with the number of processors in a system. The ARM RISC processors handle branch-heavy packet-processing code, which benefits from features such as branch prediction, Taylor says. Such code doesn’t fit well with DSP architectures that are optimized for tight algorithm loops and deep vector processing.
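As described, the dispatcher's policy amounts to greedy list scheduling: take tasks in order and hand each one to whichever DSP frees up first. The sketch below simulates that policy with task durations in cycles. The function name, the cycle counts, and the in-order task list are hypothetical, and the article does not describe Mindspeed's actual dispatcher API. Running the same task list on more DSPs illustrates the scaling property Taylor describes: the software stays the same, and only `num_dsps` changes.

```python
import heapq


def dispatch(task_cycles, num_dsps):
    """Greedy list scheduling: each task, in order, goes to the DSP that
    becomes free earliest. Returns ([(task, dsp, start_cycle), ...], makespan)."""
    # Min-heap of (cycle when the DSP becomes free, DSP id); ties favor lower ids.
    free_at = [(0, dsp) for dsp in range(num_dsps)]
    heapq.heapify(free_at)
    assignments = []
    for task, cycles in enumerate(task_cycles):
        start, dsp = heapq.heappop(free_at)
        assignments.append((task, dsp, start))
        heapq.heappush(free_at, (start + cycles, dsp))
    makespan = max(t for t, _ in free_at)   # when the last DSP goes idle
    return assignments, makespan


if __name__ == "__main__":
    tasks = [5, 3, 2, 4]                    # hypothetical task lengths, in cycles
    for dsps in (1, 2, 4):
        _, total = dispatch(tasks, dsps)
        print(f"{dsps} DSP(s): finished at cycle {total}")  # 14, 9, 5
```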
Freescale Semiconductor’s new QorIQ Qonverge architecture integrates communications-processing, DSP, and wireless-acceleration technologies in various SOC configurations for 3G and 4G femtocell, picocell, metrocell, and macrocell base stations. Qonverge combines three elements. The first is the Power Architecture (Performance Optimization With Enhanced RISC) CPU core of the original QorIQ communications processors, an architecture that Freescale (formerly part of Motorola) developed in collaboration with IBM. The second is a StarCore DSP, and the third is a MAPLE (multiaccelerator-platform-engine) multimode baseband accelerator. According to Preet Virk, leader of the global-networking-segment marketing division at Freescale, the company has been shipping multicore QorIQ silicon since 2009.
Freescale also announced the PSC9132, a configuration of the QorIQ Qonverge for picocell and enterprise-femtocell applications. The PSC9132 incorporates two e500 Power Architecture cores and two StarCore SC3850 DSPs, and it extends performance to the full LTE maximum downlink and uplink speeds of 150 and 75 Mbps, respectively. You can also use the PSC9132 for HSPA+ or in a WiMax 802.16e application to deliver downlink and uplink speeds as high as 50 and 13 Mbps, respectively, for as many as 64 users. The device also includes a CPRI (Common Public Radio Interface) and a MIMO accelerator.
The company is planning to make the PSC9130/31 and PSC9132 available in the second half of this year. Freescale will manufacture the picocell and femtocell products in a 45-nm process. To extend the family for metrocell and macrocell applications, Freescale plans to use a 28-nm process and to have chips available in the early part of 2012. The company did not disclose details of the architecture or the number of embedded cores that future 28-nm designs will integrate. The targeted specifications for the metrocell and macrocell devices support hundreds of users in a single sector of an LTE-Advanced base station or multiple sectors of LTE with 20-MHz channels and as many as eight receiving and transmitting antennas per sector.
TI adds accelerators
At the 2011 MWC, Texas Instruments announced that its new TCI6618 wireless-base-station SOC doubles the performance of the TCI6616, which the company had released just six months earlier. According to Kathy Brown, TI’s wireless-base-station-product manager, the TCI6618 targets the high-end base-station market for macrocell or compact-macrocell applications (Figure 4). The TCI6618 integrates four 1.2-GHz C66x DSP cores that support both fixed- and floating-point arithmetic, a capability that Brown says is unique to TI’s base-station devices. Floating-point operation is especially useful for increasing the precision of matrix inversions, she says, resulting in higher spectral efficiency. The TCI6618 is now sampling, and TI plans volume production later this year. The company also plans to announce a device targeting small femtocells and picocells.
According to Ramesh Kumar, TI’s worldwide business manager for multicore and media-infrastructure DSPs, customers are increasingly adopting the C66x processor’s double-precision floating-point capability for higher dynamic range across many DSP applications. You can switch from fixed to floating point on a cycle-by-cycle basis, depending on your algorithm’s needs. Kumar points out that this flexibility is not a “natural” function of FPGAs (Reference 5).
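A small experiment shows why precision matters for matrix inversion. MIMO detection inverts channel matrices, and when a matrix is nearly singular (ill-conditioned), small input errors are magnified in the inverse. The sketch below takes a hypothetical 2×2 real matrix and rounds its entries to a 16-bit Q15 grid. It then inverts both versions in double precision and measures how far each inverse is from undoing the original matrix. It isolates input quantization only. A complete fixed-point inversion would add rounding at every intermediate step, and real MIMO channels are complex-valued.

```python
def inv2x2(m):
    """Closed-form inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def identity_error(m_inv, m):
    """Largest deviation of m_inv @ m from the identity matrix."""
    p = matmul2(m_inv, m)
    return max(abs(p[i][j] - (1.0 if i == j else 0.0)) for i in range(2) for j in range(2))


def quantize_q15(m):
    """Round each entry to the nearest multiple of 2**-15 (16-bit Q15 grid)."""
    return [[round(v * 32768) / 32768 for v in row] for row in m]


if __name__ == "__main__":
    H = [[0.5, 0.49], [0.49, 0.48]]   # hypothetical, nearly singular matrix
    float_err = identity_error(inv2x2(H), H)
    q15_err = identity_error(inv2x2(quantize_q15(H)), H)
    print(f"double precision: {float_err:.1e}, Q15 inputs: {q15_err:.1e}")
```

With double-precision inputs, the product differs from the identity only in the last few digits. With Q15 inputs, it misses by about 0.1 in its worst entry, even though each input moved by less than 2^-16. That amplification is the dynamic-range problem floating-point hardware helps with.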
The TCI6618 has 15 coprocessing accelerators to complement the four DSP cores, and they perform 95% of the LTE Layer 1 processing. Three TCP3d turbo-decoder coprocessors perform turbo decoding, and a TCP3e turbo-encoder coprocessor performs encoding. The product also has four VCP2s (Viterbi-decoder coprocessors) and three FFTCs (fast-Fourier-transform coprocessors). The TCI6618’s BRC (bit-rate coprocessor) performs uplink and downlink bit processing for WCDMA/HSPA+, TDSCDMA (time-division-synchronous-code division/multiple access), LTE, and WiMax. It also supports GSM (global-system-for-mobile) communications and CDMA. For LTE, the device supports a data rate as high as 914 Mbps.
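The encoding side of the turbo coprocessors can be sketched compactly. An LTE turbo encoder runs the input bits through one recursive systematic convolutional (RSC) encoder and runs an interleaved copy through a second, identical encoder. It then sends the systematic bits plus both parity streams. The constituent polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) and the quadratic permutation polynomial (QPP) interleaver follow 3GPP TS 36.212. The sketch omits trellis termination, rate matching, and CRC attachment. It also does not show decoding, which is the compute-intensive part.

```python
def qpp_interleaver(k, f1, f2):
    """QPP interleaver: output position i reads input bit (f1*i + f2*i*i) mod k."""
    return [(f1 * i + f2 * i * i) % k for i in range(k)]


def rsc_parity(bits):
    """Parity output of one constituent RSC encoder (8 states, no termination)."""
    s1 = s2 = s3 = 0                  # shift register, s1 = most recent
    parity = []
    for x in bits:
        a = x ^ s2 ^ s3               # feedback polynomial 1 + D^2 + D^3
        parity.append(a ^ s1 ^ s3)    # feedforward polynomial 1 + D + D^3
        s1, s2, s3 = a, s1, s2        # shift the register
    return parity


def turbo_encode(bits, f1, f2):
    """Rate-1/3 turbo encoding: (systematic, parity 1, parity 2)."""
    perm = qpp_interleaver(len(bits), f1, f2)
    interleaved = [bits[p] for p in perm]
    return list(bits), rsc_parity(bits), rsc_parity(interleaved)


if __name__ == "__main__":
    K, F1, F2 = 40, 3, 10             # smallest LTE block size and its QPP parameters
    block = [(i * 3 + 1) % 2 for i in range(K)]  # arbitrary test bits
    sys_bits, p1, p2 = turbo_encode(block, F1, F2)
    print(len(sys_bits) + len(p1) + len(p2))  # 120 coded bits for 40 input bits
```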
A network coprocessor enables communications between the TCI6618 and the network, with a packet accelerator and a security accelerator for autonomous packet-to-packet processing. The network coprocessor handles acceleration for Layer 2 and transport processing. According to TI, the network coprocessor, together with the DSP cores and the other Layer 1 and Layer 2 accelerators, eliminates the need for a RISC processor.
The TCI6618’s multicore navigator decreases programming effort by allowing you to create an abstraction of tasks with priorities that the packet-based manager in the multicore navigator then dispatches to the DSP cores as they become available. This approach is similar in concept to the task dispatcher that runs on one of the ARM cores in Mindspeed’s SOC.
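The sketch below models the scheduling idea as an event-driven simulation: prioritized tasks are dispatched to cores as the cores free up. It does not model the navigator's hardware queues or packet DMA. The task tuples, the priority convention (a lower number is more urgent), and the cycle counts are assumptions for illustration, not TI's API. Unlike a strict in-order list, this scheme lets a late-arriving urgent task overtake queued bulk work.

```python
import heapq


def priority_dispatch(tasks, num_cores):
    """Simulate priority dispatch of tasks to cores as they become free.

    tasks: (arrival_cycle, priority, cycles, name) tuples; lower priority
    numbers are more urgent. Returns (name, core, start_cycle) in dispatch order.
    """
    pending = sorted(tasks)                           # ordered by arrival time
    ready = []                                        # heap: (priority, arrival, name, cycles)
    cores = [(0, core) for core in range(num_cores)]  # heap: (free_at, core id)
    heapq.heapify(cores)
    schedule, i = [], 0
    while i < len(pending) or ready:
        now, core = heapq.heappop(cores)              # earliest free core
        if not ready and pending[i][0] > now:
            now = pending[i][0]                       # core idles until next arrival
        while i < len(pending) and pending[i][0] <= now:
            arrival, prio, cycles, name = pending[i]
            heapq.heappush(ready, (prio, arrival, name, cycles))
            i += 1
        prio, arrival, name, cycles = heapq.heappop(ready)
        schedule.append((name, core, now))
        heapq.heappush(cores, (now + cycles, core))
    return schedule


if __name__ == "__main__":
    jobs = [(0, 2, 10, "bulk"), (1, 0, 2, "ctrl"), (0, 1, 3, "data")]
    print(priority_dispatch(jobs, num_cores=1))
    # [('data', 0, 0), ('ctrl', 0, 3), ('bulk', 0, 5)]
```

The dispatch is non-preemptive: "ctrl" arrives at cycle 1 but waits for "data" to finish, then runs ahead of "bulk", which had been queued since cycle 0.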
TI’s KeyStone architecture, a common platform for scaling C66x designs, includes the TeraNet switch fabric, which provides 2 Tbps of bandwidth for data transfer within the SOC (Figure 5). The MSMC (multicore-shared-memory controller) enables the cores to directly access shared memory without using any of TeraNet’s bandwidth. Peripherals in the TCI6618 include a six-lane AIF2 SERDES (serializer/deserializer)-based antenna interface that operates as fast as 6.144 Gbps and that complies with the OBSAI (Open Base Station Architecture Initiative) RP3 (Reference Point 3) and CPRI standards. Four lanes of SRIO (Serial RapidIO) 2.1 and two lanes of PCIe (Peripheral Component Interconnect Express) Generation 2 I/O support operation as fast as 5 Gbps per lane. A 64-bit DDR3 interface allows you to connect to external memory at data rates as high as 1333 MT/s (DDR3-1333).
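The interface figures above are per-lane line rates. Multiplying by lane count gives aggregate raw bandwidth. The sketch assumes the 8b/10b line coding used by CPRI, OBSAI RP3, Serial RapidIO 2.x, and PCIe Gen 2, under which about 80% of the line rate carries data; that is an assumption about the physical layer, not a figure from TI. The DDR3 line assumes DDR3-1333 operation, that is, 1333 million transfers per second on a 64-bit bus. These are peak numbers, and protocol framing lowers usable throughput further.

```python
# (lanes, per-lane line rate in Gbps) as quoted for the TCI6618.
SERIAL_LINKS = {
    "AIF2 antenna interface": (6, 6.144),
    "SRIO 2.1": (4, 5.0),
    "PCIe Gen 2": (2, 5.0),
}
CODING_EFFICIENCY = 8 / 10  # assumed 8b/10b line coding on every link


def aggregate_gbps(lanes: int, line_rate_gbps: float):
    """Return (raw, post-coding) aggregate bandwidth in Gbps."""
    raw = lanes * line_rate_gbps
    return raw, raw * CODING_EFFICIENCY


for name, (lanes, rate) in SERIAL_LINKS.items():
    raw, payload = aggregate_gbps(lanes, rate)
    print(f"{name}: {raw:.3f} Gbps raw, {payload:.3f} Gbps after coding")

# 64-bit DDR3 bus: 8 bytes per transfer at 1333e6 transfers per second.
ddr3_peak_gbytes_per_s = 8 * 1333e6 / 1e9
print(f"DDR3-1333 x64 peak: {ddr3_peak_gbytes_per_s:.2f} GB/s")  # 10.66 GB/s
```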
The KeyStone architecture allows for an ARM core as the CPU alongside the C66x DSP cores. Texas Instruments has designed the switch-fabric infrastructure and shared-memory architecture to accommodate both ARM cores and DSP cores in future devices, but the company will not announce the first device with ARM cores until mid-2011.
FPGAs embed ARM cores
According to Mark Quartermain, senior manager for the communications business unit at Xilinx, FPGAs have traditionally played two roles in baseband processing. They serve as coprocessing accelerators for functions that the DSP does not perform well, such as turbo decoding, and they provide connectivity for interfaces such as CPRI. He points out, however, that this support is insufficient for 4G, for which a single device must provide all the processing to meet data-rate and latency requirements. He instead proposes a linear processing flow with one dedicated FPGA per sector. By replacing older Virtex-6 FPGAs with the new Kintex-7 devices, you can achieve a 50% reduction in power and cost, says Quartermain.
Xilinx offers a TDP (targeted design platform) for a complete LTE-baseband channel card. You can use its uplink and downlink reference designs, which Xilinx built from a library of optimized, prebuilt, and prevalidated LogiCORE components. The LTE library components include a 3GPP channel decoder and estimator, an LTE FFT, LTE MIMO processing, and turbo encoders and decoders. For the radio-interface layer, Xilinx offers the multimode-radio TDP, which you can apply to LTE, TDSCDMA, WiMax, WCDMA, CDMA2000, or GSM applications. LogiCORE functions include a crest-factor-reduction block, a digital predistortion filter, digital upconverters and downconverters, and CPRI- and SRIO-interface blocks.
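Crest-factor reduction exists because OFDM waveforms have high peak-to-average power ratios (PAPR), which force power amplifiers to back off from their efficient operating region. The simplest form of CFR is hard clipping, which limits each sample's magnitude while keeping its phase. The sketch below shows that baseline along with a PAPR measurement. It is not the LogiCORE algorithm. Production CFR generally uses refined methods, such as peak windowing or peak cancellation, to limit the spectral regrowth that hard clipping causes.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of complex baseband samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def hard_clip(samples, max_magnitude):
    """Limit each sample's magnitude to max_magnitude, preserving its phase."""
    clipped = []
    for s in samples:
        mag = abs(s)
        clipped.append(s * (max_magnitude / mag) if mag > max_magnitude else s)
    return clipped


if __name__ == "__main__":
    burst = [1 + 0j, 1j, -1 + 0j, 4 + 0j]          # toy signal with one large peak
    reduced = hard_clip(burst, max_magnitude=2.0)
    print(f"{papr_db(burst):.2f} dB -> {papr_db(reduced):.2f} dB")  # 5.27 dB -> 3.59 dB
```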
With the introduction of the Zynq 7000 family of EPP (extensible-processing-platform) devices, Xilinx allows you to combine baseband processing and the radio interface in one chip for an enterprise femtocell (Reference 6 and Figure 6). You use the FPGA fabric to implement the uplink- and downlink-channel functions and the Layer 1 PHY. You use the Zynq 7000’s embedded dual 800-MHz ARM Cortex-A9 MPCore to execute Layer 2 and higher-layer transport functions. Xilinx based the current LTE TDP reference design on the 32-bit MicroBlaze RISC Harvard-architecture soft-processor core. The Zynq 7000 allows you to offload some of the higher-layer processor’s functions to the FPGA fabric, accelerating MAC-layer processing in hardware.
You can reach Technical Editor Mike Demler at 1-408-384-8336 and email@example.com.