Figure 6
A more efficient way to implement a distributed-arithmetic FIR filter is to replace the shift registers with RAM for all but the first word.
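As a rough behavioral illustration of the idea in this caption, the Python sketch below models a bit-serial distributed-arithmetic FIR filter whose sample history lives in an addressed memory (a circular buffer) rather than in a chain of shift registers. It is a software model only, not the circuit in the figure: it does not treat the first word separately, and the names `build_lut`, `da_fir`, `ram`, and `bits` are hypothetical, chosen for illustration. It assumes integer coefficients and input samples that fit in `bits`-bit two's complement.

```python
def build_lut(coeffs):
    """Precompute the sum of coefficients selected by each address bit pattern."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_fir(samples, coeffs, bits=8):
    """Bit-serial distributed-arithmetic FIR (behavioral sketch).

    Assumes integer coefficients and samples within the signed `bits`-bit range.
    """
    n = len(coeffs)
    lut = build_lut(coeffs)
    ram = [0] * n              # circular buffer standing in for the RAM
    wr = 0                     # write pointer: only the address moves, not the data
    mask = (1 << bits) - 1
    out = []
    for x in samples:
        ram[wr] = x & mask     # store the two's-complement bit pattern
        acc = 0
        for j in range(bits):  # one bit plane per pass, LSB first
            addr = 0
            for k in range(n):
                word = ram[(wr - k) % n]       # k = 0 is the newest sample
                addr |= ((word >> j) & 1) << k
            term = lut[addr] << j
            # The sign bit of two's complement carries negative weight.
            acc += -term if j == bits - 1 else term
        out.append(acc)
        wr = (wr + 1) % n
    return out
```

Because the lookup table holds every partial sum of the coefficients, each output needs only `bits` table reads and shifted additions, with no multiplications. Advancing the write pointer stands in for shifting every stored word by one position.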