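An Assessment of the Suitability of FPGA-Based Systems for Digital Signal Processing: translation of a foreign-language paper (FPGA, CPLD, VHDL, DSP, digital signal processing)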


I. English Original

An Assessment of the Suitability of FPGA-Based Systems for use in Digital Signal Processing

Russell J. Petersen and Brad L. Hutchings

Brigham Young University, Dept. of Electrical and Computer Engineering, 459 CB,

Provo UT 84602, USA

Abstract. FPGAs have been proposed as high-performance alternatives to DSP processors. This paper quantitatively compares FPGA performance against DSP processors and ASICs using actual applications and existing CAD tools and devices. Performance measures were based on actual multiplier performance with FPGAs, DSP processors and ASICs. This study demonstrates that FPGAs can provide an order of magnitude better performance than DSP processors and can in many cases approach or exceed ASIC levels of performance.

1 Introduction

To meet the intensive computation and I/O demands imposed by DSP systems, many custom digital hardware systems utilizing ASICs have been designed and built. Custom hardware solutions have been necessary due to the low performance of other approaches such as microprocessor-based systems, but have the disadvantage of inflexibility and a high cost of development. The DSP processor attempts to overcome the inflexibility and development costs of custom hardware. The DSP processor provides flexibility through software instruction decoding and execution while providing high performance arithmetic components such as fast array multipliers and multiple memory banks to increase data throughput. The FPGA has also recently generated interest for use in implementing digital signal processing systems due to its ability to implement custom hardware solutions while still maintaining flexibility through device reprogramming [2]. Using the FPGA it is hoped that a significant performance improvement can be obtained over the DSP processor without sacrificing system flexibility. This paper is an attempt to quantify the ability of the FPGA to provide an acceptable performance improvement over the DSP processor in the area of digital signal processing.

2 Multiplication and digital signal processing

A core operation in digital signal processing algorithms is multiplication. Often, the computational performance of a DSP system is limited by its multiplication performance, hence the multiplication rate of the system must be maximized. Custom hardware systems based on ASICs and DSP processors maximize multiplication performance by using fast parallel-array multipliers either singly or in parallel. FPGAs also have the ability to implement multipliers singly or in parallel according to the needs of the application. Thus, in order to understand the performance of the FPGA relative to the ASIC and the DSP processor, a comparison of FPGA multiplication alternatives and their performance relative to custom multiplier solutions is needed. This section presents the basic alternatives for multiplier implementations and their performance when implemented on FPGAs.

2.1 Multiplier architecture alternatives

When implementing multipliers in hardware two basic alternatives are available. The multiplier can be implemented as a fully parallel-array multiplier or as a fully bit-serial multiplier as shown in Figure 1. The advantage of the fully parallel approach is that all of the product bits are produced at once, which generally results in a faster multiplication rate. The multiplication rate for a parallel multiplier is determined solely by the delay through the combinational logic. However, parallel multipliers also require a large amount of area to implement. Bit-serial multipliers on the other hand generally require only 1/N of the area of an equivalent parallel multiplier but take 2N bit times to compute the entire product (N is the number of bits of multiplier precision). This often leads one to believe that the bit-serial approach is thus 2N times slower than an equivalent parallel multiplier, but this is not true. The bit-times (clock cycles for synchronous bit-serial multipliers) are very short in duration due to the reduced size and hence propagation paths of the multiplier. This results in a bit-serial multiplier achieving about […] the multiplication rate of an equivalent parallel multiplier on average, even exceeding the performance of the parallel multiplier in some cases.
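The 2N bit-time behaviour of a bit-serial multiplier can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the function name `bit_serial_multiply`, the unsigned operands, and the LSB-first serial-parallel organisation are illustrative choices, not a description of the circuits in Figure 1.

```python
def bit_serial_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Multiply two unsigned n-bit values, emitting one product bit per clock.

    Hypothetical behavioural model (assumption): operand `a` is held in
    parallel, operand `b` is fed in serially, least significant bit first.
    Returns (product, clock_cycles_used).
    """
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")

    acc = 0       # running partial-product sum held in the adder row
    product = 0   # product bits collected as they are shifted out
    cycles = 0
    for cycle in range(2 * n):
        # One multiplier bit per clock; zeros are fed for the last n
        # clocks while the upper half of the product drains out.
        b_bit = (b >> cycle) & 1 if cycle < n else 0
        if b_bit:
            acc += a
        product |= (acc & 1) << cycle  # emit the next product bit
        acc >>= 1
        cycles += 1
    return product, cycles


print(bit_serial_multiply(13, 11, 8))  # (143, 16)
```

The model takes 2N = 16 clocks for an 8-bit multiply, whereas a parallel array delivers all 16 product bits after a single, much longer, combinational delay.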

Fig. 1. Block diagrams of basic multiplier alternatives

2.2 FPGA multiplication results

Table 1 lists the performance of several multipliers implemented on three different FPGAs. The FPGAs used were a Xilinx 4010, an Altera Flex 8000 81188, and a National Semiconductor CLAy31. The first two FPGAs can be characterized as medium-grained architectures and are approximately equivalent in logic-density while the last FPGA is a fine-grained architecture utilizing smaller but more numerous cells. The multiplication rate of each multiplier is listed in MHz as well as the percentage of the FPGA required to implement the multiplier. The bit-serial multipliers have listed both their clock rate (bit-rate) and their effective multiplication rate (clock rate / 2N).
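The effective multiplication rate quoted for the bit-serial entries follows directly from the formula clock rate / 2N. A minimal sketch, where the helper name `effective_multiplication_rate` is an illustrative assumption and the input is the 8-bit unsigned Altera bit-serial clock rate from Table 1:

```python
def effective_multiplication_rate(clock_mhz: float, n_bits: int) -> float:
    """Return multiplications per microsecond (MHz) for a bit-serial
    multiplier that needs 2N clock cycles per N-bit product."""
    return clock_mhz / (2 * n_bits)


# 8-bit unsigned bit-serial multiplier clocked at 84.03 MHz.
rate = effective_multiplication_rate(84.03, 8)
print(f"{rate:.2f} MHz")  # 5.25 MHz
```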

2.3 Multiplier table contents

The majority of the multipliers in this study used common architectures such as the Baugh-Wooley two's complement parallel-array multiplier [5] and pipelined versions of the bit-serial multiplier [6] shown in Figure 1. In addition, several custom parallel multipliers were built that take advantage of the special features available on the Altera and Xilinx FPGAs. These are intended to represent near the absolute maximum possible multiplier performance that can be achieved with these current FPGAs. These specific customizations will be discussed below.

Table 1. FPGA Multiplier Performance Results
(Bit-serial speeds are given as clock rate / effective multiplication rate. "--" marks a cell that is missing from this copy.)

Type of Multiplier                 # CLB/LC's   % of FPGA   Mult. Speed

Altera 81188 Parallel Multipliers
8-bit unsigned fast-adder          133          --          14.8 MHz
8-bit signed fast-adder            150          --          12.8 MHz
8-bit unsigned synthesis           129          --          7 MHz
8-bit signed synthesis             135          --          6.84 MHz
8-bit signed complex synthesis     584          --          5.86 MHz
16-bit unsigned fast-adder         645          --          3.34 MHz
16-bit unsigned synthesis          519          --          3.66 MHz
16-bit signed synthesis            535          --          3.4 MHz

Altera 81188 Bit-Serial Multipliers
8-bit unsigned                     --           --          84.03 / 5.25 MHz
8-bit signed                       --           --          69 / 4.6 MHz
16-bit unsigned                    --           --          68.49 / 2.14 MHz
16-bit signed                      186          --          64 / 2 MHz

National Semiconductor CLAy Parallel Multipliers
8-bit unsigned                     329          --          7.9 MHz
8-bit signed                       338          --          7.2 MHz
16-bit unsigned                    1425         --          3.6 MHz
16-bit signed                      1446         --          3.53 MHz

National Semiconductor CLAy Bit-Serial Multipliers
8-bit unsigned                     --           1.5         32.2 / 2.01 MHz
8-bit signed                       --           1.5         32.2 / 2.01 MHz
16-bit unsigned                    --           --          29.2 / 0.91 MHz
16-bit signed                      --           --          29.2 / 0.91 MHz

Xilinx 4010 Parallel Multipliers
8-bit unsigned                     --           --          8.54 MHz
16-bit signed                      259          --          4.35 MHz
8-bit unsigned synthesis           --           --          9 MHz
8-bit signed synthesis             --           --          8 MHz
8-bit signed complex synthesis     266          --          7.3 MHz
16-bit unsigned synthesis          242          --          3.8 MHz
16-bit signed synthesis            250          --          3.7 MHz

Xilinx 4010 Bit-Serial Multipliers
8-bit unsigned                     --           --          73.1 / 4.6 MHz
8-bit signed                       --           --          52 / 3.3 MHz
16-bit unsigned                    --           --          62 / 1.9 MHz
16-bit signed                      --           --          50 / 1.6 MHz

Xilinx 4010 Parallel Constant Multipliers
8-bit unsigned ROM                 --           5.5         21.7 MHz
16-bit unsigned ROM                --           --          11.36 MHz
8-bit unsigned RAM                 --           9.75        17.86 MHz
16-bit unsigned RAM                117          29.3        10.4 MHz

Several of the multipliers listed in the tables have the label synthesis attached. This label indicates that the multipliers were created by synthesizing simple high-level hardware language (VHDL) design statements (z <= a*b). These multipliers were included so as to allow a comparison between hand-placed multipliers using schematics and high-level language designed multipliers. The table results show that the synthesized multipliers performed very favorably, as shown in the Xilinx 4010 parallel multiplier table section. The 8 and 16-bit unsigned and signed array multipliers listed first were designed with schematics and were hand placed onto the FPGA. However, their performance was nearly identical in terms of both speed and area required to the multipliers synthesized from VHDL.

2.3.1 Fast carry-logic based parallel multipliers

The Altera 81188 based multipliers labeled fast adder refer to the use of the fast carry-logic available on the Altera FPGAs to make fast ripple-carry adders. These adders are then used to build fast multipliers by using the adders to add the successive partial product rows. This technique results in multipliers that are approximately twice as fast on the FPGAs as those not implemented with special logic. The disadvantage of this approach is the resulting difficulty that arises with the placement of the multiplier onto the FPGA. The FPGA router is only able to place three of the unsigned 8-bit multipliers on an 81188 FPGA even though they only utilize 13% of the total FPGA resources each.
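The structure described here, one ripple-carry adder per partial-product row, can be modelled in software. The sketch below is a behavioural illustration only: the function names are hypothetical, operands are assumed unsigned, and it says nothing about how the Altera carry chain is actually wired or timed.

```python
def ripple_carry_add(x: int, y: int, width: int) -> int:
    """Add two unsigned values bit by bit, rippling the carry upward."""
    result = 0
    carry = 0
    for i in range(width):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        result |= (xi ^ yi ^ carry) << i              # full-adder sum bit
        carry = (xi & yi) | (carry & (xi ^ yi))       # full-adder carry out
    return result  # the carry out of the top bit is dropped


def adder_row_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit multiply using one adder pass per partial-product row."""
    total = 0
    for row in range(n):
        # Row `row` is the multiplicand ANDed with one multiplier bit,
        # shifted into position.
        partial = (a << row) if (b >> row) & 1 else 0
        total = ripple_carry_add(total, partial, 2 * n)
    return total


print(adder_row_multiply(200, 150, 8))  # 30000
```

A 2N-bit adder width is sufficient because the product of two unsigned N-bit values always fits in 2N bits.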

2.3.2 Constant multipliers and distributed arithmetic

The use of constants (constant multiplicand) in multiplication can significantly reduce the size of a parallel multiplier array. This is because the presence of zeros in the constant can result in the elimination of many partial product terms in the multiplication array. This technique is especially useful in DSP systems since many of the multiplications to be performed can be specified in terms of constant multipliers. For example, with an FIR filter each tap of the filter can be implemented using a multiplier with a constant tap coefficient.
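The effect of zeros in the constant can be shown with a short sketch: only the one-bits of the coefficient contribute a partial-product row, so a constant with few one-bits needs few adders. The helper `make_constant_multiplier` and the example coefficient are illustrative assumptions, not values from the paper.

```python
def make_constant_multiplier(coefficient: int):
    """Build a multiplier for a fixed non-negative coefficient.

    Returns (multiply, rows) where `rows` lists the shift amounts of the
    partial-product rows that survive; rows for zero bits are eliminated.
    """
    if coefficient < 0:
        raise ValueError("this sketch handles non-negative constants only")
    rows = [i for i in range(coefficient.bit_length()) if (coefficient >> i) & 1]

    def multiply(x: int) -> int:
        # Sum only the surviving shifted copies of the input.
        return sum(x << shift for shift in rows)

    return multiply, rows


# Hypothetical 8-bit tap coefficient 0b00001010 (decimal 10).
times_10, rows = make_constant_multiplier(0b00001010)
print(rows)         # [1, 3]
print(times_10(7))  # 70
```

In this example two of the eight possible rows remain, so a single adder replaces the full eight-row array.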

The use of constants in multiplication also makes available another technique that can result in a significant multiplier performance increase. This technique is called the distributed arithmetic approach to multiplication and can be implemented by the Xilinx FPGAs due to their ability to provide small blocks of distributed RAM to be used as partial-product lookup tables.
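One common way to use a lookup table for constant multiplication, which may or may not match the exact scheme the authors implemented, is to precompute the product of the constant with every possible small slice of the input and then add the shifted table outputs. The sketch below assumes 4-bit slices (16-entry tables) and unsigned inputs; the function names and the coefficient 37 are hypothetical.

```python
def build_partial_product_table(coefficient: int, slice_bits: int = 4) -> list[int]:
    """Precompute coefficient * v for every possible slice value v."""
    return [coefficient * v for v in range(1 << slice_bits)]


def lut_constant_multiply(x: int, table: list[int], n_bits: int,
                          slice_bits: int = 4) -> int:
    """Multiply unsigned n-bit x by the constant baked into `table`.

    Each slice of x addresses the table; the looked-up partial products
    are shifted to their slice position and summed.
    """
    mask = (1 << slice_bits) - 1
    total = 0
    for shift in range(0, n_bits, slice_bits):
        total += table[(x >> shift) & mask] << shift
    return total


table = build_partial_product_table(37)      # hypothetical constant
print(lut_constant_multiply(200, table, 8))  # 7400
```

With 4-bit slices an 8-bit input needs two table lookups and one addition, in place of an eight-row partial-product array.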

The distribute
