Pipelining (DSP implementation)
Pipelining is an important technique used in several applications, such as digital signal processing (DSP) systems and microprocessors. The idea originates from a water pipe, into which water is sent continuously without waiting for the water already in the pipe to come out. Pipelining shortens the critical path in most DSP systems, so it can either increase the clock speed or reduce the power consumption at the same speed.
Conceptually, pipelining lets different function units work in parallel. In computer architecture, it usually refers to an implementation technique in which the execution of multiple instructions is overlapped. For example, in Fig(), a function (F) consists of three sub-function units (F0, F1 and F2). Assume that three independent tasks (T0, T1 and T2) are each processed by these three units in turn. Each unit takes the same time to complete its part of a task and occupies one slot in the schedule. If the tasks are processed strictly one after another, completing all of them takes nine slots (three tasks times three units). If T0 to T2 are pipelined, so that each unit starts on the next task as soon as it has finished the current one, the total time drops to five slots. A well-designed pipeline can therefore achieve a significant speed-up.
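The slot counts for such a schedule can be checked with a short sketch. The function names (sequential_slots, pipelined_schedule) are illustrative and not taken from any library. The sketch assumes that every unit takes exactly one slot per task and that the tasks are independent.

```python
def sequential_slots(num_tasks, num_stages):
    """Slots needed when each task finishes all stages before the next starts."""
    return num_tasks * num_stages


def pipelined_schedule(num_tasks, num_stages):
    """Return one dict per slot, mapping stage index -> task index.

    Assumption: task t enters stage s in slot t + s, so different stages
    work on different tasks during the same slot.
    """
    total_slots = num_tasks + num_stages - 1
    schedule = []
    for slot in range(total_slots):
        active = {}
        for stage in range(num_stages):
            task = slot - stage
            if 0 <= task < num_tasks:
                active[stage] = task
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    schedule = pipelined_schedule(num_tasks=3, num_stages=3)
    for slot, active in enumerate(schedule):
        row = ", ".join(f"F{s}:T{t}" for s, t in sorted(active.items()))
        print(f"slot {slot}: {row}")
    print("sequential slots:", sequential_slots(3, 3))
    print("pipelined slots:", len(schedule))
```

In the middle slot all three units are busy at once (F0 on T2, F1 on T1, F2 on T0), which is where the saving over the sequential order comes from.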
Costs and Disadvantages
Pipelining does not decrease the processing time of a single task: each task still requires the same amount of work as in a fully sequential design. Its advantage is higher throughput when the system processes a stream of tasks. However, adding too many pipeline stages usually increases latency, because a task takes longer to propagate through the whole pipe. Furthermore, a pipelined system typically requires more resources (buffers, circuits, processing units, memory, etc.), since the reuse of resources across different stages is restricted.
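The trade-off between throughput and latency can be illustrated with a simple timing model. This is only a sketch under stated assumptions: the logic delay is split evenly across the stages, and every stage adds a fixed register overhead. The function pipeline_timing and its parameters are hypothetical names chosen for this example.

```python
def pipeline_timing(logic_delay, num_stages, register_overhead):
    """Return (clock_period, latency, throughput) for an idealized pipeline.

    Assumptions (not from the article): the logic is split evenly across
    the stages and each stage adds the same register overhead.
    """
    clock_period = logic_delay / num_stages + register_overhead
    latency = num_stages * clock_period   # time for one task to pass through
    throughput = 1.0 / clock_period       # tasks completed per unit time
    return clock_period, latency, throughput


if __name__ == "__main__":
    for stages in (1, 2, 4, 8):
        period, latency, throughput = pipeline_timing(8.0, stages, 0.5)
        print(f"{stages} stage(s): period={period:.2f}, "
              f"latency={latency:.2f}, throughput={throughput:.3f}")
```

Under this model the throughput rises with the number of stages, while the latency of a single task grows by one register overhead per added stage.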
Comparison with Parallel Approaches
Another technique for improving efficiency is parallel processing, which is often confused with pipelining. The core difference is that parallel techniques duplicate function units and distribute the input tasks among the copies. They can therefore complete more tasks per unit time, but at a high resource cost. For the previous example in Fig(), the parallel technique adds two more copies of each function unit, so that all three tasks can be processed simultaneously by identical units. The time to complete the three tasks is then reduced to three slots.
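A rough comparison of the two approaches, in terms of function units and slots, might look as follows. The function compare_designs is a hypothetical helper. It assumes one slot per unit per task, and that the parallel design uses one full copy of the function for each task.

```python
def compare_designs(num_tasks, num_stages):
    """Function-unit count and completion time (in slots) for each design.

    Assumptions: every unit takes one slot per task, and the parallel
    design uses one full copy of the function (all stages) per task.
    """
    return {
        "sequential": {"units": num_stages,
                       "slots": num_tasks * num_stages},
        "pipelined": {"units": num_stages,
                      "slots": num_tasks + num_stages - 1},
        "parallel": {"units": num_stages * num_tasks,
                     "slots": num_stages},
    }


if __name__ == "__main__":
    for name, cost in compare_designs(num_tasks=3, num_stages=3).items():
        print(f"{name:10s} units={cost['units']} slots={cost['slots']}")
```

For three tasks and three sub-function units, the parallel design finishes in three slots but needs nine units, whereas the pipelined design keeps the original three units.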
Pipelining in FIR Filters
Consider a 3-tap FIR filter, y(n) = a*x(n) + b*x(n-1) + c*x(n-2), as shown in Fig(). Let Tm be the computation time of a multiplication unit and Ta that of an addition unit.
The critical path, which sets the minimum time required to process a new sample, passes through one multiplication unit and two addition units. Therefore, the sample period is bounded by
T_sample >= Tm + 2*Ta.
However, this structure may not be suitable for designs that require high speed. To reduce the sample period, extra pipelining registers can be introduced along the critical data path. The structure is then partitioned into two stages: the data produced in the first stage is stored in the new registers and reaches the second stage one clock later. The data in the first three clocks are recorded in Table().
Under this pipelined structure, the sample period is reduced to
T_sample >= Tm + Ta.
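The effect of the pipelining registers can be sketched in software. The register placement used here is an assumption, since the figure is not reproduced: one register after the first adder and one after the multiplier for c, so that the first stage contains one multiplication and one addition and the second stage contains one addition. The function names are illustrative only.

```python
def fir_direct(samples, a, b, c):
    """y(n) = a*x(n) + b*x(n-1) + c*x(n-2), with zero initial state."""
    x1 = x2 = 0.0
    out = []
    for x in samples:
        out.append(a * x + b * x1 + c * x2)
        x2, x1 = x1, x
    return out


def fir_pipelined(samples, a, b, c):
    """Two-stage version with assumed pipeline registers reg_sum, reg_prod."""
    x1 = x2 = 0.0
    reg_sum = reg_prod = 0.0  # pipeline registers, initially zero
    out = []
    for x in samples:
        # Stage 2: final addition on the values latched one clock earlier.
        out.append(reg_sum + reg_prod)
        # Stage 1: multiplications and first addition, then latch.
        reg_sum = a * x + b * x1
        reg_prod = c * x2
        x2, x1 = x1, x
    return out


def critical_path(tm, ta, pipelined):
    """Longest register-to-register delay under the assumed placement."""
    return tm + ta if pipelined else tm + 2 * ta


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    print("direct:   ", fir_direct(x, 1, 2, 3))
    print("pipelined:", fir_pipelined(x, 1, 2, 3))
    print("critical path:", critical_path(10, 2, False), "->",
          critical_path(10, 2, True))
```

The pipelined output equals the direct output delayed by one clock, which is the latency cost paid for the shorter critical path. The delays of 10 and 2 time units are arbitrary example values.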
Pipelining in 1st-Order IIR Filters
Consider the 1st-order IIR filter with transfer function
H(z) = 1/(1 - a*z^(-1)).
The output y(n) can be computed from the input u(n) and the previous output:
y(n) = a*y(n-1) + u(n).
In a straightforward implementation of this function, the sample rate of the recursive filter is limited by the computation time of one multiply-add operation.
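The recursion can be written out directly, as in the following minimal sketch. The function name iir_first_order and the zero initial condition are assumptions made for this example.

```python
def iir_first_order(inputs, a, y_init=0.0):
    """Compute y(n) = a*y(n-1) + u(n) sample by sample.

    y_init is the assumed value of y(-1).
    """
    y_prev = y_init
    out = []
    for u in inputs:
        # One multiply-add inside the feedback loop: y(n) cannot be
        # started until y(n-1) is available.
        y_prev = a * y_prev + u
        out.append(y_prev)
    return out


if __name__ == "__main__":
    # Impulse response for a = 0.5: successive powers of a.
    print(iir_first_order([1, 0, 0, 0], 0.5))
```

Because each output depends on the one before it, the multiply-add in the loop body has to finish within one sample period, which is the limit described above.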
Other Pipelined DSP Systems
- Pipelined Walsh-Fourier transform
- Pipelined unitary transforms
- Pipelined DFT processor
- Pipelined FFT processor
- etc.