Design Idea
Routine yields fast bit reversing for DSP algorithms
Edited by Bill Travis and Anne Watson Swager
Mohammed Aziz, University of Leeds, UK -- EDN, 9/13/2001
If you need efficient real-time performance in DSP applications, you need an efficient bit-reversing routine. For several FFT programs, data permutation can take 10 to 50% of the computation time, depending on the input-data dimensions and length. The idea behind bit reversing is to shuffle the data by flipping the address bits around the middle of the address length so that if the data length is N=16, four bits from 0000 to 1111 represent the address. You achieve data shuffling by swapping the address bits around the middle so that B3B2B1B0 becomes B0B1B2B3, which represents the new data location. Note that this operation is not byte flipping unless the input-data length happens to be N=255 elements. Look-up-table techniques are inefficient, because the input data may be very long, and memory space is limited. For real-time DSP applications, you bit-reverse a data array by swapping each position in the data array with the position of its corresponding bit-reversed address by using DSP architectural features. You implement the method shown in Listing 1 using the SHARC DSP chip.
The routine uses the data-memory segment and program-memory segments for input and output. This feature of the DSP-memory architecture allows reading from the input array "in-arry" and writing to the bit-reversed output array "out-arry" in parallel to double the speed. It then uses circular buffers and indirect addressing to go through the N elements, lending itself to simple and straightforward data moving. Existing techniques, such as Evans and Gold-Rader algorithms, do data checking and bit reversing before moving the data, thereby incurring overhead. Table 1 and Figure 1 give the runtime results for the different algorithms (Listing 1, Evans, and Gold-Rader) simulated on a 40-MHz SHARC DSP (ADSP-21060) from Analog Devices (www.analog.com). The runs represent three lengths of 32, 64, and 128 elements and involve 10,000 iterations each in simulating the 2-D case.
Is this the best Design Idea in this issue? Vote at www.ednmag.com.
















