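An Assessment of the Suitability of FPGA-Based Systems for Digital Signal Processing: translated foreign-language paper (topics: FPGA, CPLD, VHDL, DSP, digital signal processing)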
I. English original
An Assessment of the Suitability of FPGA-Based Systems for use in Digital Signal Processing
Russell J. Petersen and Brad L. Hutchings
Brigham Young University, Dept. of Electrical and Computer Engineering, 459 CB,
Provo UT 84602, USA
Abstract. FPGAs have been proposed as high-performance alternatives to DSP processors. This paper quantitatively compares FPGA performance against DSP processors and ASICs using actual applications and existing CAD tools and devices. Performance measures were based on actual multiplier performance with FPGAs, DSP processors and ASICs. This study demonstrates that FPGAs can provide an order of magnitude better performance than DSP processors and can in many cases approach or exceed ASIC levels of performance.
1 Introduction
To meet the intensive computation and I/O demands imposed by DSP systems, many custom digital hardware systems utilizing ASICs have been designed and built. Custom hardware solutions have been necessary due to the low performance of other approaches such as microprocessor-based systems, but have the disadvantage of inflexibility and a high cost of development. The DSP processor attempts to overcome the inflexibility and development costs of custom hardware. The DSP processor provides flexibility through software instruction decoding and execution while providing high performance arithmetic components such as fast array multipliers and multiple memory banks to increase data throughput. The FPGA has also recently generated interest for use in implementing digital signal processing systems due to its ability to implement custom hardware solutions while still maintaining flexibility through device reprogramming [2]. Using the FPGA it is hoped that a significant performance improvement can be obtained over the DSP processor without sacrificing system flexibility. This paper is an attempt to quantify the ability of the FPGA to provide an acceptable performance improvement over the DSP processor in the area of digital signal processing.
2 Multiplication and digital signal processing
A core operation in digital signal processing algorithms is multiplication. Often, the computational performance of a DSP system is limited by its multiplication performance, hence the multiplication rate of the system must be maximized. Custom hardware systems based on ASICs and DSP processors maximize multiplication performance by using fast parallel-array multipliers either singly or in parallel. FPGAs also have the ability to implement multipliers singly or in parallel according to the needs of the application. Thus, in order to understand the performance of the FPGA relative to the ASIC and the DSP processor, a comparison of FPGA multiplication alternatives and their performance relative to custom multiplier solutions is needed. This section presents the basic alternatives for multiplier implementations and their performance when implemented on FPGAs.
2.1 Multiplier architecture alternatives
When implementing multipliers in hardware, two basic alternatives are available. The multiplier can be implemented as a fully parallel-array multiplier or as a fully bit-serial multiplier as shown in Figure 1. The advantage of the fully parallel approach is that all of the product bits are produced at once, which generally results in a faster multiplication rate. The multiplication rate for a parallel multiplier is set by the delay through the combinational logic. However, parallel multipliers also require a large amount of area to implement. Bit-serial multipliers on the other hand generally require only 1/N-th the area of an equivalent parallel multiplier but take 2N bit times to compute the entire product (N is the number of bits of multiplier precision). This often leads one to believe that the bit-serial approach is thus 2N times slower than an equivalent parallel multiplier, but this is not true. The bit-times (clock cycles for synchronous bit-serial multipliers) are very short in duration due to the reduced size and hence propagation paths of the multiplier. This results in a bit-serial multiplier achieving about 1/2 the multiplication rate of an equivalent parallel multiplier on average, even exceeding the performance of the parallel multiplier in some cases.
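The 2N bit-time behaviour of a bit-serial multiplier can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: it models a simple unsigned serial/parallel multiplier (one operand held in parallel, the other fed in LSB first), not the pipelined design of [6], and the function name bit_serial_multiply is hypothetical.

```python
def bit_serial_multiply(a, b, n):
    """Behavioural model of an unsigned n-bit bit-serial multiplier.

    Assumption: operand a is held in parallel while operand b arrives one bit
    per clock, LSB first. One product bit is emitted per clock, so the full
    2n-bit product takes 2n clocks (bit times).
    """
    acc = 0       # running partial sum held inside the multiplier
    product = 0   # product bits collected as they are shifted out
    for cycle in range(2 * n):
        # The serial operand is zero-padded after its n real bits.
        b_bit = (b >> cycle) & 1 if cycle < n else 0
        if b_bit:
            acc += a
        # The LSB of the partial sum is final, so it leaves as a product bit.
        product |= (acc & 1) << cycle
        acc >>= 1
    return product
```

Each loop iteration stands for one short bit time; a parallel array would instead produce all 2N product bits after a single, much longer combinational delay.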
Fig. 1. Block diagrams of basic multiplier alternatives
2.2 FPGA multiplication results
Table 1 lists the performance of several multipliers implemented on three different FPGAs. The FPGAs used were a Xilinx 4010, an Altera Flex 8000 81188, and a National Semiconductor CLAy31. The first two FPGAs can be characterized as medium-grained architectures and are approximately equivalent in logic-density while the last FPGA is a fine-grained architecture utilizing smaller but more numerous cells. The multiplication rate of each multiplier is listed in MHz as well as the percentage of the FPGA required to implement the multiplier. The bit-serial multipliers have listed both their clock rate (bit-rate) and their effective multiplication rate (clock rate/2N).
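As a quick check of the clock rate/2N relation, the effective multiplication rate of a bit-serial multiplier can be computed directly. The helper name below is hypothetical; the sample figures are the 8-bit and 16-bit unsigned Altera bit-serial entries from Table 1.

```python
def effective_mult_rate_mhz(clock_mhz, n_bits):
    """Effective multiplication rate of a bit-serial multiplier.

    A full product needs 2 * n_bits bit times, so the rate is the
    bit (clock) rate divided by 2N.
    """
    return clock_mhz / (2 * n_bits)


# 8-bit unsigned at an 84.03 MHz bit rate: about 5.25 MHz
rate_8 = effective_mult_rate_mhz(84.03, 8)
# 16-bit unsigned at a 68.49 MHz bit rate: about 2.14 MHz
rate_16 = effective_mult_rate_mhz(68.49, 16)
```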
2.3 Multiplier table contents
The majority of the multipliers in this study used common architectures such as the Baugh-Wooley two's complement parallel-array multiplier [5] and pipelined versions of the bit-serial multiplier [6] shown in Figure 1. In addition, several custom parallel multipliers were built that take advantage of the special features available on the Altera and Xilinx FPGAs. These are intended to represent near the absolute maximum possible multiplier performance that can be achieved with these current FPGAs. These specific customizations will be discussed below.
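For readers unfamiliar with the Baugh-Wooley scheme, the following sketch shows the arithmetic idea in software. It uses the common textbook formulation (complement the partial products that involve exactly one sign bit, then add two correction constants); it is not taken from this paper's schematics, and the function name is hypothetical.

```python
def baugh_wooley_multiply(a, b, n):
    """Multiply two n-bit two's complement integers (n >= 2) using
    Baugh-Wooley style partial products, so that every row of the array
    can be summed with ordinary unsigned adders."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask                   # raw n-bit patterns
    a_bits = [(ua >> i) & 1 for i in range(n)]
    b_bits = [(ub >> j) & 1 for j in range(n)]

    total = 0
    for i in range(n):
        for j in range(n):
            pp = a_bits[i] & b_bits[j]
            # Terms that involve exactly one sign bit carry negative weight;
            # Baugh-Wooley complements them instead of subtracting.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            total += pp << (i + j)

    # Correction constants that compensate for the complemented terms.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1                   # keep 2n product bits

    # Reinterpret the 2n-bit pattern as a signed value.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

Because no row needs a subtractor, the array stays regular, which is one reason this form suits cell-based FPGA layouts.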
Table 1. FPGA Multiplier Performance Results
(For bit-serial multipliers, Mult. Speed is given as clock rate / effective multiplication rate.)

Type of Multiplier                    # CLB/LC's   % of FPGA   Mult. Speed

Altera 81188 Parallel Multipliers
8-bit unsigned fast-adder                133          13        14.8 MHz
8-bit signed fast-adder                  150          14        12.8 MHz
8-bit unsigned synthesis                 129          12        7 MHz
8-bit signed synthesis                   135          13        6.84 MHz
8-bit signed complex synthesis           584          57        5.86 MHz
16-bit unsigned fast-adder               645          63        3.34 MHz
16-bit unsigned synthesis                519          51        3.66 MHz
16-bit signed synthesis                  535          53        3.4 MHz

Altera 81188 Bit-Serial Multipliers
8-bit unsigned                           29           3         84.03/5.25 MHz
8-bit signed                             91           9         69/4.6 MHz
16-bit unsigned                          61           7         68.49/2.14 MHz
16-bit signed                            186          18        64/2 MHz

National Semiconductor CLAy Parallel Multipliers
8-bit unsigned                           329          11        7.9 MHz
8-bit signed                             338          11        7.2 MHz
16-bit unsigned                          1425         45        3.6 MHz
16-bit signed                            1446         46        3.53 MHz

National Semiconductor CLAy Bit-Serial Multipliers
8-bit unsigned                           48           1.5       32.2/2.01 MHz
8-bit signed                             48           1.5       32.2/2.01 MHz
16-bit unsigned                          96           3         29.2/.91 MHz
16-bit signed                            96           3         29.2/.91 MHz

Xilinx 4010 Parallel Multipliers
8-bit unsigned                           64           16        8.54 MHz
16-bit signed                            259          65        4.35 MHz
8-bit unsigned synthesis                 61           15        9 MHz
8-bit signed synthesis                   61           15        8 MHz
8-bit signed complex synthesis           266          66        7.3 MHz
16-bit unsigned synthesis                242          60        3.8 MHz
16-bit signed synthesis                  250          63        3.7 MHz

Xilinx 4010 Bit-Serial Multipliers
8-bit unsigned                           17           4         73.1/4.6 MHz
8-bit signed                             32           8         52/3.3 MHz
16-bit unsigned                          33           8         62/1.9 MHz
16-bit signed                            64           16        50/1.6 MHz

Xilinx 4010 Parallel Constant Multipliers
8-bit unsigned ROM                       22           5.5       21.7 MHz
16-bit unsigned ROM                      84           21        11.36 MHz
8-bit unsigned RAM                       39           9.75      17.86 MHz
16-bit unsigned RAM                      117          29.3      10.4 MHz

Several of the multipliers listed in the tables have the label synthesis attached. This label indicates that the multipliers were created by synthesizing simple high-level hardware language (VHDL) design statements (z <= a * b). These multipliers were included so as to allow a comparison between hand-placed multipliers using schematics and high-level language designed multipliers. The table results show that the synthesized multipliers performed very favorably as shown in the Xilinx 4010 parallel multiplier table section. The 8 and 16-bit unsigned and signed array multipliers listed first were designed with schematics and were hand placed onto the FPGA. However, their performance was nearly identical in terms of both speed and area required to the multipliers synthesized from VHDL.
2.3.1 Fast carry-logic based parallel multipliers
The Altera 81188 based multipliers labeled fast adder refer to the use of the fast carry-logic available on the Altera FPGAs to make fast ripple-carry adders. These adders are then used to build fast multipliers by using the adders to add the successive partial product rows. This technique results in multipliers that are approximately twice as fast on the FPGAs as those not implemented with special logic. The disadvantage of this approach is the resulting difficulty that arises with the placement of the multiplier onto the FPGA. The FPGA router is only able to place three of the unsigned 8-bit multipliers on a 81188 FPGA even though they only utilize 13% of the total FPGA resources each.
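The structure described above, a multiplier assembled from ripple-carry adders that sum successive partial-product rows, can be sketched in software as follows. This is a behavioural illustration only; it says nothing about timing or the Altera carry chain itself, and both function names are hypothetical.

```python
def ripple_carry_add(x, y, width):
    """Add two unsigned values one bit position at a time, handing the
    carry from each full adder to the next, as a ripple-carry adder does."""
    carry = 0
    result = 0
    for i in range(width):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        result |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    # The carry out of the top bit is dropped; the caller picks a width
    # large enough that it is always zero.
    return result


def adder_row_multiply(a, b, n):
    """Unsigned n-bit multiply built from one ripple-carry addition per
    partial-product row."""
    width = 2 * n                 # the product of two n-bit values fits in 2n bits
    acc = 0
    for j in range(n):
        # Row j is the multiplicand shifted by j, gated by bit j of b.
        row = (a << j) if (b >> j) & 1 else 0
        acc = ripple_carry_add(acc, row, width)
    return acc
```

In hardware the inner loop of ripple_carry_add is the critical path, which is why dedicated carry logic pays off so directly in multiplier speed.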
2.3.2 Constant multipliers and distributed arithmetic
The use of constants (constant multiplicand) in multiplication can significantly reduce the size of a parallel multiplier array. This is because the presence of zeros in the constant can result in the elimination of many partial product terms in the multiplication array. This technique is especially useful in DSP systems since many of the multiplications to be performed can be specified in terms of constant multipliers. For example, with an FIR filter each tap of the filter can be implemented using a multiplier with a constant tap coefficient.
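The saving from zeros in the constant can be made concrete with a short sketch: only the set bits of the constant contribute a partial-product row, so the array shrinks accordingly. The function names and the example coefficient (144, binary 10010000) are illustrative assumptions, not values from the paper.

```python
def constant_multiplier_rows(constant, n_bits):
    """Shift amounts of the partial-product rows that survive when one
    operand is a fixed constant; zero bits contribute no row at all."""
    return [i for i in range(n_bits) if (constant >> i) & 1]


def multiply_by_constant(x, constant, n_bits):
    """Multiply x by the constant using only the surviving rows."""
    return sum(x << shift for shift in constant_multiplier_rows(constant, n_bits))


# The 8-bit constant 144 has two set bits, so only two of the
# eight rows of a general 8-bit array are needed.
rows = constant_multiplier_rows(144, 8)
```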
The use of constants in multiplication also makes available another technique that can result in a significant multiplier performance increase. This technique is called the distributed arithmetic approach to multiplication and can be implemented by the Xilinx FPGAs due to their ability to provide small blocks of distributed RAM to be used as partial-product lookup tables.
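A minimal sketch of the lookup-table idea: the variable operand is split into small groups of bits, each group addresses a table holding the constant times that group's value, and the looked-up partial products are shifted and summed. The 4-bit group width is an assumption chosen to match a 16-entry table, and the function names are hypothetical.

```python
def build_partial_product_table(constant, lut_bits=4):
    """Precompute constant * k for every possible lut_bits-wide chunk value.
    With lut_bits=4 (an assumption) the table has 16 entries."""
    return [constant * k for k in range(1 << lut_bits)]


def lut_constant_multiply(x, table, n_bits, lut_bits=4):
    """Multiply an unsigned n_bits-wide x by the constant baked into table,
    using one table lookup per chunk of x plus shifted additions."""
    chunk_mask = (1 << lut_bits) - 1
    product = 0
    for shift in range(0, n_bits, lut_bits):
        chunk = (x >> shift) & chunk_mask
        # Each lookup replaces a whole group of partial-product rows.
        product += table[chunk] << shift
    return product


table_201 = build_partial_product_table(201)
```

An 8-bit operand then needs only two lookups and one addition, in place of eight gated rows.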
The distribute