; addition operation
Finish:
        ;* Finish, write result into stdout
        sd      Result,f4
        addi    r14,r0,PrintPar
        trap    5               ; system trap: print the result
        addi    r2,r2,4
        subi    r20,r20,1
        bnez    r20,Loop
        ;* End
        trap    0

2. Run results

Standard I/O window:
Vector = 2.000000 4.000000 6.000000 8.000000 10.000000 12.000000 14.000000 16.000000 18.000000 20.000000 22.000000 24.000000 26.000000 28.000000 30.000000 [last value illegible]

Statistics window:
Total:
    381 Cycle(s) executed.
    ID executed by 181 Instruction(s).
    2 Instruction(s) currently in Pipeline.
Hardware configuration:
    Memory size: 32768 Bytes
    faddEX-Stages: 1, required Cycles: 2
    fmulEX-Stages: 1, required Cycles: 5
    fdivEX-Stages: 1, required Cycles: 19
    Forwarding disabled.
Stalls:
    RAW stalls: 130 (34.12% of all Cycles)
    WAW stalls: 0 (0.00% of all Cycles)
    Structural stalls: 0 (0.00% of all Cycles)
    Control stalls: 15 (3.94% of all Cycles)
    Trap stalls: 54 (14.17% of all Cycles)
    Total: 199 Stall(s) (52.23% of all Cycles)
Conditional Branches:
    Total: 16 (8.84% of all Instructions), thereof:
        taken: 15 (93.75% of all cond. Branches)
        not taken: 1 (6.25% of all cond. Branches)
Load-/Store-Instructions:
    Total: 49 (27.07% of all Instructions), thereof:
        Loads: 33 (67.35% of Load-/Store-Instructions)
        Stores: 16 (32.65% of Load-/Store-Instructions)
Floating point stage instructions:
    Total: 16 (8.84% of all Instructions), thereof:
        Additions: 16 (100.00% of Floating point stage inst.)
        Multiplications: 0 (0.00% of Floating point stage inst.)
        Divisions: 0 (0.00% of Floating point stage inst.)
Traps:
    Traps: 18 (9.94% of all Instructions)

5.1 Program hazard analysis results

(1) Observe the data, control and structural hazards that occur in the program, and identify the instruction combinations that cause them.

RAW (data hazard) stalls account for 34.12% of all cycles. A data hazard arises when an instruction needs a source register whose value has not yet been written back by one of the preceding instructions.
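The percentages in the statistics window can be rechecked from the raw counts. The following is a minimal sketch that only redoes the arithmetic on the numbers reported above; the variable and function names are illustrative and are not part of WinDLX.

```python
# Counts copied from the WinDLX statistics window of the 4.1 run.
TOTAL_CYCLES = 381
TOTAL_INSTRUCTIONS = 181

stalls = {
    "RAW": 130,
    "WAW": 0,
    "Structural": 0,
    "Control": 15,
    "Trap": 54,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of all cycles lost to each stall type.
stall_share = {name: percent(count, TOTAL_CYCLES) for name, count in stalls.items()}
total_stalls = sum(stalls.values())
total_share = percent(total_stalls, TOTAL_CYCLES)

# Average cycles per instruction for the whole run.
cpi = TOTAL_CYCLES / TOTAL_INSTRUCTIONS

for name, share in stall_share.items():
    print(f"{name:<10} {stalls[name]:>4} stalls  {share:6.2f}% of cycles")
print(f"{'Total':<10} {total_stalls:>4} stalls  {total_share:6.2f}% of cycles")
print(f"CPI = {cpi:.3f}")
```

The recomputed shares agree with the window: RAW stalls are about a third of all cycles, and more than half of the run is spent stalled.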
[Screenshot: pipeline diagram of a stalled instruction pair. The legible fragments suggest subi r20,r20,1 followed by bnez r20,Loop.]

No structural hazards occurred.

Control stalls account for 3.94% of all cycles. The simulator proceeds as if the branch will not be taken: right after fetching the branch it fetches the sequentially next instruction, trap.

[Screenshot: "Information about trap 0x5" window. Legible fields: Code: 0x44000005; Terminated successfully; First Cycle: -13; Last Cycle: -6; Total Cycles: 8; 3 Stall(s) because of Trap-Pipeline-Clearing; System call executed; No Stalls required.]

(2) Examine the effect of adding floating-point units on performance.

The run was repeated with 3 floating-point units and with 1 floating-point unit.
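                                      1 FP unit          3 FP units
Total cycles executed                 381                381
ID executed by (instructions)         181                181
Instructions currently in pipeline    2                  2
Memory size (bytes)                   32768              32768
faddEX-Stages / required cycles       1 / 2              3 / 2
fmulEX-Stages / required cycles       1 / 5              3 / 5
fdivEX-Stages / required cycles       1 / 19             3 / 19
Forwarding                            disabled           disabled
RAW stalls                            130 (34.12%)       130 (34.12%)
WAW stalls                            0 (0.00%)          0 (0.00%)
Structural stalls                     0 (0.00%)          0 (0.00%)
Control stalls                        15 (3.94%)         15 (3.94%)
Trap stalls                           54 (14.17%)        54 (14.17%)
Total stalls                          199                199
FP additions                          16 (100.00%)       16 (100.00%)
FP multiplications                    0 (0.00%)          0 (0.00%)
FP divisions                          0 (0.00%)          0 (0.00%)
Traps                                 18 (9.94%)         18 (9.94%)

The same code was run with the same steps in both configurations. The comparison shows that the number of floating-point units has no effect on execution efficiency: none of the statistics change. The likely reason is that this program never has two instructions contending for a floating-point unit, which is consistent with structural stalls being 0 even with a single unit.

(3) Examine the effect of adding the forwarding unit on performance.

[Screenshots: statistics without forwarding and with forwarding.]

With forwarding enabled, the same code takes roughly 100 fewer clock cycles than without it. Since there are no structural hazards, the gain comes mainly from a marked drop in RAW stalls, whose share of the total cycles also falls; control stalls are not affected. In short, forwarding reduces both the total cycle count and the data-hazard stalls, and pipeline performance improves considerably.

(4) Observe the pipeline cost of branch instructions when the branch is taken and when it is not taken.

Conditional Branches:
    Total: 16 (8.84% of all Instructions), thereof:
        taken: 15 (93.75% of all cond. Branches)
        not taken: 1 (6.25% of all cond. Branches)

In this experiment the branch is taken most of the time: of the 16 conditional branches executed, only one is not taken. Because the simulator proceeds as if the branch will fall through, each time the branch turns out to be taken the work already done on the prefetched trap instruction is discarded and execution continues at the branch target. This matches the statistics: 15 taken branches and 15 control stalls.

4.2 Double-precision floating-point vector addition: code listing and comments

1. Source code for the double-precision floating-point addition

        .align  2
PrintfPar:      .word   PrintfFormat
r:      .space  200             ; r is the vector that holds the sums
        .text
        .global main
        addi    r1,r0,0         ; r1 counts the additions done so far
        addui   r4,r0,8         ; r4 holds the constant 8
loop:                           ; loop that computes the vector sum
        subi    r2,r1,20        ; when r1 = 20, jump to finish
        beqz    r2,finish
        multu   r3,r1,r4        ; r3 = offset of the current element from the vector base (8 bytes per element)
        ld      f0,a(r3)        ; load element r1 of a
        ld      f2,b(r3)        ; load element r1 of b
        addd    f4,f0,f2        ; put the sum in f4
        sd      r(r3),f4        ; store the sum into the result vector r
        addi    r1,r1,1         ; next element
        j       loop
finish:                         ; print the result vector
        addi    r14,r0,PrintfPar

2. Run results

Standard I/O window:
The result is
3.000000 5.000000 7.000000 9.000000 11.000000 13.000000 15.000000 17.000000 19.000000 21.000000 23.000000 25.000000 27.000000 29.000000 31.000000 33.000000 35.000000 37.000000 39.000000 41.000000

Statistics window:
Total:
    489 Cycle(s) executed.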
    ID executed by 195 Instruction(s).
    2 Instruction(s) currently in Pipeline.
Hardware configuration:
    Memory size: 32768 Bytes
    faddEX-Stages: 1, required Cycles: 2
    fmulEX-Stages: 1, required Cycles: 5
    fdivEX-Stages: 1, required Cycles: 19
    Forwarding disabled.
Stalls:
    RAW stalls: 263 (53.78% of all Cycles)
    WAW stalls: 0 (0.00% of all Cycles)
    Structural stalls: 0 (0.00% of all Cycles)
    Control stalls: 21 (4.29% of all Cycles)
    Trap stalls: 9 (1.84% of all Cycles)
    Total: 293 Stall(s) (59.92% of all Cycles)
Conditional Branches:
    Total: 21 (10.77% of all Instructions), thereof:
        taken: 1 (4.76% of all cond. Branches)
        not taken: 20 (95.24% of all cond. Branches)
Load-/Store-Instructions:
    Total: 65 (33.33% of all Instructions), thereof:
        Loads: 40 (61.54% of Load-/Store-Instructions)
        Stores: 25 (38.46% of Load-/Store-Instructions)
Floating point stage instructions:
    Total: 40 (20.51% of all Instructions), thereof:
        Additions: 20 (50.00% of Floating point stage inst.)
        Multiplications: 20 (50.00% of Floating point stage inst.)
        Divisions: 0 (0.00% of Floating point stage inst.)
Traps:
    Traps: 3 (1.54% of all Instructions)

5.2 Program hazard analysis results

Identify the instruction combinations that cause the hazards observed in the program.

RAW (data hazard) stalls account for 53.78% of all cycles. A data hazard arises when an instruction needs a source register whose value has not yet been written back by one of the preceding instructions. The main instruction combinations that cause data hazards are:

1) addi r1,r0,0
   subi r2,r1,20
2) subi r2,r1,20
   (its result r2 is read by the beqz r2,finish that follows it in the listing)
3) multu r3,r1,r4
   ld f0,a(r3)
4) ld f2,b(r3)
   addd f4,f0,f2

[Screenshot: pipeline diagram showing the stalls for these instruction pairs.]
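The pairs listed above can be found mechanically by checking whether an instruction reads a register that one of its close predecessors writes. The sketch below is an illustration only: the register sets are entered by hand from the listing (it is not a DLX parser), the name raw_pairs is hypothetical, and the window of two preceding instructions is an assumption about how far back a dependence still causes a stall without forwarding. It also looks only at straight-line order, so the dependence carried around the loop by r1 is not modelled.

```python
# Each entry: (instruction text, destination register or None, source registers).
# The register sets are filled in by hand from the 4.2 listing.
PROGRAM = [
    ("addi r1,r0,0",   "r1", {"r0"}),
    ("addui r4,r0,8",  "r4", {"r0"}),
    ("subi r2,r1,20",  "r2", {"r1"}),
    ("beqz r2,finish", None, {"r2"}),
    ("multu r3,r1,r4", "r3", {"r1", "r4"}),
    ("ld f0,a(r3)",    "f0", {"r3"}),
    ("ld f2,b(r3)",    "f2", {"r3"}),
    ("addd f4,f0,f2",  "f4", {"f0", "f2"}),
    ("sd r(r3),f4",    None, {"r3", "f4"}),
    ("addi r1,r1,1",   "r1", {"r1"}),
    ("j loop",         None, set()),
]


def raw_pairs(program, window=2):
    """Return (producer, consumer, register) triples where the consumer reads
    a register written by one of the `window` instructions before it.

    `window` is an assumed hazard distance, not a value taken from WinDLX.
    """
    pairs = []
    for i, (text, _dest, sources) in enumerate(program):
        for j in range(max(0, i - window), i):
            producer_text, producer_dest, _ = program[j]
            # r0 is hard-wired to zero, so writes to it never create a hazard.
            if producer_dest not in (None, "r0") and producer_dest in sources:
                pairs.append((producer_text, text, producer_dest))
    return pairs


for producer, consumer, reg in raw_pairs(PROGRAM):
    print(f"{producer:<16} -> {consumer:<16} via {reg}")
```

Under these assumptions the scan reports the four combinations named in the analysis, plus the dependence of sd r(r3),f4 on addd f4,f0,f2 and the second load's dependence on multu.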
There are no structural hazards.

Control stalls account for 4.29% of all cycles. The simulator proceeds as if the branch will not be taken and fetches the next sequential instruction right after the branch; the mechanism is the same as for the code in 4.1.

The run was repeated with 1 floating-point unit and with 3 floating-point units:

                                      1 FP unit          3 FP units
Total cycles executed                 489                489
ID executed by (instructions)         195                195
Instructions currently in pipeline    2                  2
Memory size (bytes)                   32768              32768
faddEX-Stages / required cycles       1 / 2              [unclear: 1 or 3] / 2
fmulEX-Stages / required cycles       1 / 5              3 / 5
fdivEX-Stages / required cycles       1 / 19             3 / 19
Forwarding                            disabled           disabled
RAW stalls                            263 (53.78%)       263 (53.78%)
WAW stalls                            0 (0.00%)          0 (0.00%)
Structural stalls                     0 (0.00%)          0 (0.00%)
Control stalls                        21 (4.29%)         21 (4.29%)
Trap stalls                           9 (1.84%)          9 (1.84%)
Total stalls                          293 (59.92%)       293 (59.92%)
Conditional branches                  21 (10.77%)        21 (10.77%)
    taken                             1 (4.76%)          1 (4.76%)
    not taken                         20 (95.24%)        20 (95.24%)
Load-/Store-Instructions              65 (33.33%)        65 (33.33%)
    Loads                             40 (61.54%)        40 (61.54%)
    Stores                            25 (38.46%)        25 (38.46%)
Floating point stage instructions     40 (20.51%)        40 (20.51%)
    Additions                         20 (50.00%)        20 (50.00%)
    Multiplica…
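To put the two experiments side by side, the totals reported for 4.1 and 4.2 can be reduced to cycles per instruction (CPI) and to the share of cycles lost to stalls. This is a minimal sketch over the figures quoted in the statistics windows; the dictionary layout and the summarize helper are illustrative names, not WinDLX output.

```python
# Totals copied from the statistics windows (forwarding disabled in both runs).
RUNS = {
    "4.1": {"cycles": 381, "instructions": 181, "stalls": 199},
    "4.2": {"cycles": 489, "instructions": 195, "stalls": 293},
}


def summarize(run):
    """Return CPI, stall share in percent, and the cycles that are neither
    an issued instruction nor a stall."""
    cpi = run["cycles"] / run["instructions"]
    stall_share = round(100.0 * run["stalls"] / run["cycles"], 2)
    residual = run["cycles"] - run["instructions"] - run["stalls"]
    return cpi, stall_share, residual


summary = {name: summarize(run) for name, run in RUNS.items()}

for name, (cpi, stall_share, residual) in summary.items():
    print(f"{name}: CPI = {cpi:.3f}, stalls = {stall_share:.2f}% of cycles, "
          f"residual cycles = {residual}")
```

In both runs the cycle count equals instructions plus stalls plus one, so the reported stall counts account for essentially all of the cycles beyond one per instruction. The double-precision loop in 4.2 has the higher CPI, in line with its larger RAW stall share.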