An Assessment of the Suitability of FPGA-Based Systems for Digital Signal Processing: translation of a foreign-language paper (FPGA, CPLD, VHDL, DSP, digital signal processing)

I. English original

An Assessment of the Suitability of FPGA-Based Systems for use in Digital Signal Processing

Russell J. Petersen and Brad L. Hutchings
Brigham Young University, Dept. of Electrical and Computer Engineering, 459 CB, Provo UT 84602, USA

Abstract.
FPGAs have been proposed as high-performance alternatives to DSP processors. This paper quantitatively compares FPGA performance against DSP processors and ASICs using actual applications and existing CAD tools and devices. Performance measures were based on actual multiplier performance with FPGAs, DSP processors and ASICs. This study demonstrates that FPGAs can provide an order of magnitude better performance than DSP processors and can in many cases approach or exceed ASIC levels of performance.

1 Introduction

To meet the intensive computation and I/O demands imposed by DSP systems, many custom digital hardware systems utilizing ASICs have been designed and built. Custom hardware solutions have been necessary due to the low performance of other approaches such as microprocessor-based systems, but have the disadvantage of inflexibility and a high cost of development. The DSP processor attempts to overcome the inflexibility and development costs of custom hardware. The DSP processor provides flexibility through software instruction decoding and execution while providing high-performance arithmetic components such as fast array multipliers and multiple memory banks to increase data throughput. The FPGA has also recently generated interest for use in implementing digital signal processing systems due to its ability to implement custom hardware solutions while still maintaining flexibility through device reprogramming [2]. Using the FPGA, it is hoped that a significant performance improvement can be obtained over the DSP processor without sacrificing system flexibility. This paper is an attempt to quantify the ability of the FPGA to provide an acceptable performance improvement over the DSP processor in the area of digital signal processing.

2 Multiplication and digital signal processing

A core operation in digital signal processing algorithms is multiplication. Often, the computational performance of a DSP system is limited by its multiplication performance, hence the multiplication rate of the system must be maximized. Custom hardware systems based on ASICs and DSP processors maximize multiplication performance by using fast parallel-array multipliers either singly or in parallel. FPGAs also have the ability to implement multipliers singly or in parallel according to the needs of the application. Thus, in order to understand the performance of the FPGA relative to the ASIC and the DSP processor, a comparison of FPGA multiplication alternatives and their performance relative to custom multiplier solutions is needed. This section presents the basic alternatives for multiplier implementations and their performance when implemented on FPGAs.

2.1 Multiplier architecture alternatives

When implementing multipliers in hardware, two basic alternatives are available. The multiplier can be implemented as a fully parallel-array multiplier or as a fully bit-serial multiplier, as shown in Figure 1. The advantage of the fully parallel approach is that all of the product bits are produced at once, which generally results in a faster multiplication rate. The multiplication rate for a parallel multiplier is set solely by the delay through the combinational logic. However, parallel multipliers also require a large amount of area to implement. Bit-serial multipliers, on the other hand, generally require only 1/Nth the area of an equivalent parallel multiplier but take 2N bit times to compute the entire product (N is the number of bits of multiplier precision). This often leads one to believe that the bit-serial approach is thus 2N times slower than an equivalent parallel multiplier
but this is not true. The bit times (clock cycles for synchronous bit-serial multipliers) are very short in duration due to the reduced size, and hence shorter propagation paths, of the multiplier. This results in a bit-serial multiplier achieving about the multiplication rate of an equivalent parallel multiplier on average, even exceeding the performance of the parallel multiplier in some cases.

Fig. 1. Block diagrams of basic multiplier alternatives

2.2 FPGA multiplication results

Table 1 lists the performance of several multipliers implemented on three different FPGAs. The FPGAs used were a Xilinx 4010, an Altera Flex8000 81188, and a National Semiconductor CLAy31. The first two FPGAs can be characterized as medium-grained architectures and are approximately equivalent in logic density, while the last FPGA is a fine-grained architecture utilizing smaller but more numerous cells. The multiplication
rate of each multiplier is listed in MHz, along with the percentage of the FPGA required to implement the multiplier. For the bit-serial multipliers, both the clock rate (bit rate) and the effective multiplication rate (clock rate/2N) are listed.

2.3 Multiplier table contents

The majority of the multipliers in this study used common architectures such as the Baugh-Wooley two's complement parallel-array multiplier [5] and pipelined versions of the bit-serial multiplier [6] shown in Figure 1. In addition, several custom parallel multipliers were built that take advantage of the special features available on the Altera and Xilinx FPGAs. These are intended to represent close to the absolute maximum multiplier performance achievable with these current FPGAs. These specific customizations will be discussed below.

Table 1. FPGA Multiplier Performance Results

Type of Multiplier                    # CLB/LCs   % of FPGA   Mult. Speed

Altera 81188 Parallel Multipliers
8-bit unsigned fast-adder             133         13          14.8 MHz
8-bit signed fast-adder               150         14          12.8 MHz
8-bit unsigned synthesis              129         12          7 MHz
8-bit signed synthesis                135         13          6.84 MHz
8-bit signed complex synthesis        584         57          5.86 MHz
16-bit unsigned fast-adder            645         63          3.34 MHz
16-bit unsigned synthesis             519         51          3.66 MHz
16-bit signed synthesis               535         53          3.4 MHz

Altera 81188 Bit-Serial Multipliers
8-bit unsigned                        29          3           84.03/5.25 MHz
8-bit signed                          91          9           69/4.6 MHz
16-bit unsigned                       61          7           68.49/2.14 MHz
16-bit signed                         186         18          64/2 MHz

National Semiconductor CLAy Parallel Multipliers
8-bit unsigned                        329         11          7.9 MHz
8-bit signed                          338         11          7.2 MHz
16-bit unsigned                       1425        45          3.6 MHz
16-bit signed                         1446        46          3.53 MHz

National Semiconductor CLAy Bit-Serial Multipliers
8-bit unsigned                        48          1.5         32.2/2.01 MHz
8-bit signed                          48          1.5         32.2/2.01 MHz
16-bit unsigned                       96          3           29.2/.91 MHz
16-bit signed                         96          3           29.2/.91 MHz

Xilinx 4010 Parallel Multipliers
8-bit unsigned                        64          16          8.54 MHz
16-bit signed                         259         65          4.35 MHz
8-bit unsigned synthesis              61          15          9 MHz
8-bit signed synthesis                61          15          8 MHz
8-bit signed complex synthesis        266         66          7.3 MHz
16-bit unsigned synthesis             242         60          3.8 MHz
16-bit signed synthesis               250         63          3.7 MHz

Xilinx 4010 Bit-Serial Multipliers
8-bit unsigned                        17          4           73.1/4.6 MHz
8-bit signed                          32          8           52/3.3 MHz
16-bit unsigned                       33          8           62/1.9 MHz
16-bit signed                         64          16          50/1.6 MHz

Xilinx 4010 Parallel Constant Multipliers
8-bit unsigned ROM                    22          5.5         21.7 MHz
16-bit unsigned ROM                   84          21          11.36 MHz
8-bit unsigned RAM                    39          9.75        17.86 MHz
16-bit unsigned RAM                   117         29.3        10.4 MHz

Several of the multipliers listed in the table have the label "synthesis" attached. This label indicates that the multipliers were created by synthesizing simple high-level hardware language (VHDL) design statements (z = a * b). These multipliers were included to allow a comparison between hand-placed multipliers designed with schematics and multipliers designed in a high-level language. The table results show that
the synthesized multipliers performed very favorably, as shown in the Xilinx 4010 parallel multiplier section of the table. The 8- and 16-bit unsigned and signed array multipliers listed first were designed with schematics and were hand-placed onto the FPGA. However, their performance was nearly identical, in terms of both speed and area required, to that of the multipliers synthesized from VHDL.

2.3.1 Fast carry-logic based parallel multipliers

The Altera 81188 based multipliers labeled "fast-adder" use the fast carry logic available on the Altera FPGAs to make fast ripple-carry adders. These adders are then used to build fast multipliers by adding the successive partial-product rows. This technique results in multipliers that are approximately twice as fast on the FPGAs as those not implemented with special logic. The disadvantage of this approach is the resulting difficulty of placing the multiplier onto the FPGA. The FPGA router is only able to place three of the unsigned 8-bit multipliers on an 81188 FPGA even though they only utilize 13% of the total FPGA resources each.

2.3.2 Constant multipliers and distributed arithmetic

The use of constants (
constant multiplicand) in multiplication can significantly reduce the size of a parallel multiplier array. This is because the presence of zeros in the constant can result in the elimination of many partial-product terms in the multiplication array. This technique is especially useful in DSP systems since many of the multiplications to be performed can be specified in terms of constant multipliers. For example, with an FIR filter each tap of the filter can be implemented using a multiplier with a constant tap coefficient.

The use of constants in multiplication also makes available another technique that can result in a significant multiplier performance increase. This technique is called the distributed arithmetic approach to multiplication and can be implemented by the Xilinx FPGAs due to their ability to provide small blocks of distributed RAM to be used as partial-product lookup tables.
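
The two constant-multiplication ideas in Section 2.3.2 can be illustrated with a minimal software sketch. It is not taken from the paper: the function names, the example constant 68, and the choice of 4-bit table indices are all assumptions made for illustration. The first function adds one shifted partial product per 1-bit of the constant, so zero bits contribute no rows. The second uses a small lookup table of precomputed partial products, one lookup per 4-bit slice of the input, which is one plausible reading of the partial-product lookup tables mentioned above rather than the authors' actual circuit.

```python
from typing import List, Tuple


def shift_add_constant_multiply(x: int, constant: int) -> Tuple[int, int]:
    """Multiply x by a fixed non-negative constant.

    Returns (product, rows_used), where rows_used counts the partial-product
    rows actually added. Zero bits in the constant add no row, which models
    why a constant multiplier array can be smaller than a general one.
    """
    product = 0
    rows_used = 0
    bit_position = 0
    remaining = constant
    while remaining:
        if remaining & 1:
            product += x << bit_position  # one shifted partial-product row
            rows_used += 1
        remaining >>= 1
        bit_position += 1
    return product, rows_used


def build_nibble_table(constant: int) -> List[int]:
    """Precompute constant * n for every 4-bit value n (16 entries).

    Hypothetical helper: the 4-bit index width is an assumption.
    """
    return [constant * n for n in range(16)]


def lut_constant_multiply(x: int, table: List[int], width: int = 8) -> int:
    """Multiply an unsigned `width`-bit x by the table's constant.

    Each 4-bit slice of x indexes the table; the looked-up partial
    products are shifted into place and summed.
    """
    if not 0 <= x < (1 << width):
        raise ValueError("x does not fit in the given width")
    product = 0
    for shift in range(0, width, 4):
        nibble = (x >> shift) & 0xF
        product += table[nibble] << shift
    return product


if __name__ == "__main__":
    coefficient = 68  # 0b1000100: only two 1-bits, so only two rows
    table = build_nibble_table(coefficient)
    print(shift_add_constant_multiply(200, coefficient))
    print(lut_constant_multiply(200, table))
```

For an 8-bit input, the table-based version needs two lookups and one addition regardless of the constant's bit pattern, whereas the shift-add version's cost depends on how many 1-bits the constant has. In an FIR filter, each tap coefficient would get its own table.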