Currently Popular Commercial Chips
1 AMD Athlon
The AMD Athlon is a clone of the Intel Pentium chip with respect to Intel's x86 Instruction Set Architecture and, like the Pentium, it is frequently used in clusters. We therefore discuss this processor here, although it is not presently used in integrated parallel systems.
The Athlon processor has many features that are also present in modern RISC processors: it supports out-of-order execution, has multiple floating-point units, and can issue up to 9 instructions simultaneously. A block diagram of the processor is shown in Figure 7.
Figure 7: Block diagram of the AMD Athlon processor.
It shows that the processor has three pairs of Integer Execution Units and Address Generation Units that, via an 18-entry Integer Scheduler, take care of the integer computations and address calculations. Both the Integer Scheduler and the Floating-Point Scheduler are fed by the 72-entry Instruction Control Unit, which receives the decoded instructions from the instruction decoders. An interesting feature of the Athlon is the pre-decoding of x86 instructions into fixed-length macro-operations that can be stored in a Pre-decode Cache. This enables a faster and more constant instruction flow to the instruction decoders. As in RISC processors, there is a Branch Prediction Table to assist in branch prediction.
The floating-point units allow out-of-order execution of instructions via the FPU Stack Map & Rename unit. It receives the floating-point instructions from the Instruction Control Unit and reorders them if necessary before handing them over to the FPU Scheduler. The Floating-Point Register File is 88 elements deep, which approaches the number of registers available on RISC processors.
The floating-point part of the processor contains three units: a Floating Store unit that stores results to the Load/Store Queue Unit, and Floating Add and Multiply units that can work in superscalar mode, resulting in two floating-point results per clock cycle. Because of the compatibility with Intel's Pentium III processors, the floating-point units are also able to execute Intel MMX instructions and AMD's own 3DNow! instructions. However, there is the general problem that such instructions are not accessible from higher-level languages like Fortran 90 or C(++). Both instruction sets are meant for massive processing of visualisation data and only allow 32-bit precision to be used. The system bus complementing the Athlon processor is also faster than what is standardly available for Intel PIII processors: 200 MHz instead of 133 MHz. AMD claims the bus speed can be scaled to over 400 MHz.
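To give a feel for what a restriction to 32-bit precision means in practice, the following minimal sketch rounds a 64-bit value to the nearest 32-bit IEEE 754 value and reports the relative error. It uses only Python's standard `struct` module; the helper name `to_float32` and the sample value 0.1 are illustrative assumptions, and nothing here invokes MMX or 3DNow! instructions themselves.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit IEEE 754 value."""
    # Packing with format "f" stores the value in 4 bytes (single precision);
    # unpacking converts those 4 bytes back to a Python float.
    return struct.unpack("<f", struct.pack("<f", x))[0]


x = 0.1                      # illustrative sample value (assumption)
x32 = to_float32(x)
rel_err = abs(x32 - x) / abs(x)

print(f"64-bit value  : {x!r}")
print(f"32-bit value  : {x32!r}")
print(f"relative error: {rel_err:.2e}")
```

For round-to-nearest, the relative error of a single 32-bit rounding is bounded by 2^-24 (about 6e-8), which is adequate for visualisation data but often too coarse for long-running numerical computations.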
With clock frequencies of 1-1.33 GHz for the current processors, the Athlon is an interesting alternative to many of the RISC processors that are available at this moment.
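The figures quoted above (two floating-point results per clock cycle at 1-1.33 GHz) imply a theoretical peak floating-point rate, which can be sketched as follows. The function name `peak_gflops` is a hypothetical helper, and the result is an upper bound under the assumption that both the add and the multiply unit deliver a result every cycle; it is not a measured performance.

```python
def peak_gflops(clock_ghz: float, results_per_cycle: int = 2) -> float:
    """Theoretical peak in Gflop/s: clock frequency times results per cycle."""
    return clock_ghz * results_per_cycle


# Clock frequencies quoted in the text for the current Athlon processors.
for clock_ghz in (1.0, 1.33):
    print(f"{clock_ghz:.2f} GHz -> {peak_gflops(clock_ghz):.2f} Gflop/s peak")
```

Under these assumptions the peak lies between roughly 2 and 2.66 Gflop/s; sustained rates on real code will be lower.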
2 Intel Pentium 4
Although Pentium processors are not applied in integrated parallel systems these days, they play a major role in the cluster community, as most compute nodes in Beowulf clusters are of this type. We therefore also briefly discuss this type of processor.
Unfortunately, Intel provides only scant information on this new processor, not even enough to put together a reliable block diagram. Still, there are a number of distinctive features with respect to the earlier Pentium generations. There are two main ways to increase the performance of a processor: by raising the clock frequency and by increasing the number of instructions per cycle (IPC). These two approaches are generally in conflict: when one wants to increase the IPC, the chip becomes more complicated. This has a negative impact on the clock frequency because more work has to be done and organised within the same clock cycle. Chip designers very seldom succeed in raising both clock frequency and IPC simultaneously, and in the Pentium 4 this could not be done either. Intel has opted for a high clock speed (initially about 40% higher than that of the Pentium III with the same fabrication technology) while the IPC decreased by 10-20%. This still gives a net performance gain, even if no other changes had been made to the processor.
To sustain the very high clock rate of the present processors, currently 1.7 GHz, a very deep instruction pipeline is required. The instruction pipeline has no less than 20 stages, double the number of stages in that of the Pentium III. Although this favours a high clock rate, the penalty for a pipeline miss (e.g., a branch mis-predict) is much heavier, and Intel has therefore improved the branch prediction by increasing the size of the Branch Target Buffer from 0.5 to 4 KB. In addition, the Pentium 4 has an execution trace cache which holds partly decoded instructions of former execution traces that can be drawn upon, thus forgoing the instruction decode phase that might produce holes in the instruction pipeline.
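The trade-off described above can be made concrete with a small back-of-the-envelope sketch. The first part uses only the figures from the text (about 40% higher clock, 10-20% lower IPC). The second part is a deliberately simple model of the mis-prediction penalty: the branch fraction, the mis-prediction rate, and the assumption that the penalty equals the pipeline depth are all hypothetical values chosen for illustration, not Intel data.

```python
def relative_performance(clock_ratio: float, ipc_ratio: float) -> float:
    """Performance is proportional to clock frequency times IPC."""
    return clock_ratio * ipc_ratio


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction including branch mis-prediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Part 1: figures from the text (clock +40%, IPC -10% to -20%).
CLOCK_RATIO = 1.40
for ipc_loss in (0.10, 0.20):
    gain = relative_performance(CLOCK_RATIO, 1.0 - ipc_loss)
    print(f"IPC loss {ipc_loss:.0%}: net performance x{gain:.2f}")

# Part 2: hypothetical workload parameters (assumptions, not from the text).
BRANCH_FRACTION = 0.20   # assumed share of instructions that are branches
MISPREDICT_RATE = 0.05   # assumed share of branches that are mis-predicted
for stages in (10, 20):  # 20 stages, and half of that for the Pentium III
    cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, stages)
    print(f"{stages}-stage pipeline: effective CPI {cpi:.2f}")
```

With the text's figures the net gain comes out between about 1.12 and 1.26. In the toy penalty model, doubling the pipeline depth doubles the stall contribution, which is why a better branch predictor matters more for a deeper pipeline.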
The primary cache is quite small by today's standards: 8 KB. This is again to accommodate the high clock spee