project.doc
project
; Exercise 1
; This is an assembly version of the following C code (assuming a, b and c already declared)
;
; for (int i = 0; i < 6; i++) {
;     a[i] = a[i] + b[i] + c[i];
; }
.data
a:
.space 48
b:
.word 10,11,12,13,0,1
c:
.word 1,2,3,4,5,6
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)       ; element of a
lw r6,0(r2)       ; element of b
lw r7,0(r3)       ; element of c
dadd r8,r5,r6     ; a[i]+b[i]
dadd r9,r7,r8     ; a[i]=a[i]+b[i]+c[i]
sw r9,0(r1)       ; store value in a[i]
daddi r1,r1,8     ; increment memory pointers
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1    ; i++
bnez r4,Loop
end:
halt
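To know in advance what ex1.s should leave in memory, the C loop it implements can be modelled directly. This Python sketch assumes that the 48 bytes reserved with `.space 48` for a start out as zero; the names `B`, `C` and `reference_result` are illustrative only.

```python
# Reference model of the C loop in ex1.s: a[i] = a[i] + b[i] + c[i].
# Assumption: a starts as six zero words (the area reserved by ".space 48").
B = [10, 11, 12, 13, 0, 1]
C = [1, 2, 3, 4, 5, 6]


def reference_result(b, c):
    """Return the contents of a after the loop, starting from all zeros."""
    a = [0] * len(b)
    for i in range(len(b)):
        a[i] = a[i] + b[i] + c[i]
    return a


if __name__ == "__main__":
    print(reference_result(B, C))  # [11, 13, 15, 17, 5, 7]
```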
1) Load ex1.s into the memory of MIPS64 and disable the forwarding logic, the delay slot and the Branch target buffer from the Configure menu in the main toolbar. Before running the program, try to predict where stalls occur, how many clock cycles they will take, and for what kind of hazards they occur. Then compare your prediction with the simulation results.
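Answer:
Number of clock cycles = 19 × 6 + 4 + 4 = 122 (each iteration takes 19 cycles: 11 instructions, 7 RAW stalls and 1 branch-taken stall).
RAW data-hazard stalls = 7 × 6 = 42.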
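The prediction can be reproduced with a small timing model. This is a simplified sketch, not WinMIPS64 itself; it assumes that operands are read in ID, that a register written back in WB can be read by an ID in the same cycle, that bnez is resolved in ID, and that a taken branch costs one extra cycle (no delay slot, no BTB). Under these assumptions it gives the 7 stalls per iteration and the 122 cycles counted above.

```python
# In-order 5-stage pipeline (IF ID EX MEM WB) WITHOUT forwarding.
# Each entry: (destination register or None, source registers, is_branch).
INIT = [("r1", [], False), ("r2", [], False), ("r3", [], False), ("r4", [], False)]
LOOP = [
    ("r5", ["r1"], False),        # lw    r5,0(r1)
    ("r6", ["r2"], False),        # lw    r6,0(r2)
    ("r7", ["r3"], False),        # lw    r7,0(r3)
    ("r8", ["r5", "r6"], False),  # dadd  r8,r5,r6
    ("r9", ["r7", "r8"], False),  # dadd  r9,r7,r8
    (None, ["r9", "r1"], False),  # sw    r9,0(r1)
    ("r1", ["r1"], False),        # daddi r1,r1,8
    ("r2", ["r2"], False),        # daddi r2,r2,8
    ("r3", ["r3"], False),        # daddi r3,r3,8
    ("r4", ["r4"], False),        # daddi r4,r4,-1
    (None, ["r4"], True),         # bnez  r4,Loop
]
HALT = [(None, [], False)]


def simulate(init, loop, tail, iterations):
    """Return (total_cycles, raw_stall_cycles) for init + loop*iterations + tail."""
    trace = init + loop * iterations + tail
    loop_start, n = len(init), len(loop)
    wb_cycle = {}   # register -> cycle in which its newest value is written back
    next_id = 2     # the first instruction is fetched in cycle 1, decoded in cycle 2
    id_cycle = 0
    raw_stalls = 0
    for k, (dest, srcs, is_branch) in enumerate(trace):
        ready = max((wb_cycle.get(r, 0) for r in srcs), default=0)
        id_cycle = max(next_id, ready)      # wait in ID until every source is written back
        raw_stalls += id_cycle - next_id
        if dest is not None:
            wb_cycle[dest] = id_cycle + 3   # EX, MEM and WB follow ID
        next_id = id_cycle + 1
        iteration = (k - loop_start) // n
        if is_branch and iteration < iterations - 1:
            next_id += 1                    # taken branch: the fall-through fetch is discarded
    return id_cycle + 3, raw_stalls         # the last instruction still needs EX, MEM, WB


print(simulate(INIT, LOOP, HALT, 6))  # (122, 42)
```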
The simulator results are:
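2) The program of ex1.s consists of a loop plus some other instructions outside it. After running the program to completion, estimate the CPI using the Statistics window. In the case of a program containing a "hotspot" (i.e. an internal loop whose instructions are executed much more frequently than all the other instructions) the CPI can be roughly estimated just using the asymptotic CPI, i.e.
$$\mathrm{CPI} \approx \mathrm{CPI}_{as} = \lim_{L \to \infty} \frac{N_{out} + S_{out} + L\,(IC_{hot} + S_{hot})}{N_{out} + L\,IC_{hot}} = \frac{IC_{hot} + S_{hot}}{IC_{hot}}$$
where $N_{out}$ and $S_{out}$ are the number of instructions and the number of stalls outside the "hotspot", respectively, $L$ is the number of loop cycles of the innermost loop, and $IC_{hot}$ and $S_{hot}$ are the number of instructions and the number of stalls of the "hotspot". Compare the asymptotic CPI with the value resulting from simulations. Are results compatible?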
Answer:
Simulator CPI = ((11 + 7) × 6 + 4 + 5 + 4 + 1)/(11 × 6) = 122/66 ≈ 1.848
CPI asymptotic = (11 + 7)/11 ≈ 1.636
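To see how the whole-program CPI approaches the asymptotic value, the formula from the question can be evaluated for a growing number of loop iterations. In this sketch N_out = 5 (four initialising daddi plus halt) and S_out = 4 are illustrative values, not simulator measurements; IC_hot = 11 and S_hot = 7 are the numbers used in the answer above.

```python
def cpi(n_out, s_out, loops, ic_hot, s_hot):
    """Whole-program CPI for a program whose hot spot runs `loops` times."""
    cycles = n_out + s_out + loops * (ic_hot + s_hot)
    instructions = n_out + loops * ic_hot
    return cycles / instructions


def asymptotic_cpi(ic_hot, s_hot):
    """Limit of cpi(...) as the number of loop iterations grows without bound."""
    return (ic_hot + s_hot) / ic_hot


# Illustrative outside-the-loop values (assumed): 5 instructions, 4 stall cycles.
N_OUT, S_OUT = 5, 4
# Hot spot of ex1.s without forwarding, as in the answer: 11 instructions, 7 stalls.
IC_HOT, S_HOT = 11, 7

for loops in (6, 60, 600, 6000):
    print(loops, round(cpi(N_OUT, S_OUT, loops, IC_HOT, S_HOT), 4))
print("asymptotic:", round(asymptotic_cpi(IC_HOT, S_HOT), 4))
```

With only 6 iterations the instructions outside the loop still carry noticeable weight, which is one reason the simulated and asymptotic values differ.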
The execution results are as follows:
3) Enable the forwarding logic and execute the code again. Compute the CPI again. Justify the remaining stalls and comment why some of them occur after ID stage rather than after IF.
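Answer:
Simulator CPI = ((11 + 1) × 6 + 5 + 5 + 4)/(11 × 6) = 86/66 ≈ 1.303
CPI asymptotic = (11 + 1)/11 ≈ 1.091
The two values are not the same.
Because forwarding is enabled, the remaining stall in each iteration comes from bnez r4,Loop. bnez is resolved in the ID stage, where it reads r4 from the register file, but at that point the register file still holds the old value of r4, because daddi r4,r4,-1 has only just reached EX. bnez needs the updated value, so it cannot accept the daddi result in that cycle: it waits in ID until the value computed in EX can be forwarded to it, instead of waiting for WB as it would without forwarding. Since the instruction has already been fetched and is held while it waits for its operand, the stall appears after ID rather than after IF.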
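The remaining stall can be expressed as a small timing rule. This is a simplified model, not taken from the simulator; it assumes the branch reads its operand in ID, an ALU result can be forwarded to ID the cycle after the producer's EX, and a load result the cycle after its MEM. Under these assumptions, daddi r4,r4,-1 followed immediately by bnez r4,Loop costs one stall with forwarding, which matches the (11 + 1) used above.

```python
def branch_stalls(producer, forwarding):
    """Stall cycles for a branch decoded right after `producer` ('alu' or 'load')."""
    producer_id = 0                     # cycle in which the producer is in ID
    if forwarding:
        # The result exists at the end of EX (ALU) or MEM (load); the branch's ID
        # stage can receive it through forwarding one cycle later.
        ready = producer_id + (2 if producer == "alu" else 3)
    else:
        # Without forwarding the value is read from the register file in the
        # producer's WB cycle (written in the first half, read in the second).
        ready = producer_id + 3
    unstalled_id = producer_id + 1      # where the branch's ID would fall with no hazard
    return max(0, ready - unstalled_id)


for fwd in (False, True):
    print("forwarding" if fwd else "no forwarding",
          "daddi->bnez:", branch_stalls("alu", fwd),
          "lw->bnez:", branch_stalls("load", fwd))
```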
4) Disable the forwarding logic and assume that the MIPS hardware cannot detect hazards. Modify the source code by inserting NOPs where appropriate without reordering the code (NOP stuffing technique). Check with the simulator that no stall occurs, and check whether the CPI has changed. Do we have better performance?
Answer:
Code with NOPs inserted:
.data
a:
.space 48
b:
.word 10,11,12,13,0,1
c:
.word 1,2,3,4,5,6
.text
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
NOP
dadd r8,r5,r6
NOP
NOP
dadd r9,r7,r8
NOP
NOP
sw r9,0(r1)
daddi r1,r1,8
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1
NOP
NOP
bnez r4,Loop
end:
halt
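After the NOPs are inserted, no data-hazard stalls occur and the CPI changes (it drops, because the NOPs are counted as executed instructions), but performance does not get better: each NOP occupies the cycle that a stall used to take.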
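The effect on CPI versus execution time can be checked with a few lines of arithmetic. This sketch assumes that every NOP costs exactly the one cycle the stall it replaces used to cost, so the cycle count stays at 122, and it follows the convention of the question 2 answer, which divides by the 66 loop instructions.

```python
# NOP stuffing: same clock cycles, more counted instructions.
CYCLES = 122                  # clock cycles from the answer to question 1
LOOP_INSTRUCTIONS = 11 * 6    # denominator used in the question 2 answer
NOPS = 7 * 6                  # 7 NOPs per iteration, 6 iterations

cpi_before = CYCLES / LOOP_INSTRUCTIONS
cpi_after = CYCLES / (LOOP_INSTRUCTIONS + NOPS)   # NOPs now count as instructions

print(f"CPI without NOPs: {cpi_before:.3f}")
print(f"CPI with NOPs:    {cpi_after:.3f}")
print(f"clock cycles:     {CYCLES} in both cases")
```

A lower CPI obtained this way does not mean a faster program, because execution time depends on the number of cycles, which is unchanged.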
5) Reschedule the instructions (code moving technique) in order to avoid stalls without modifying the program semantics (check the final result to see if after moving the instructions the result is the same). Recompute the normal and asymptotic CPI values.
Answer:
The rescheduled code and its execution results are as follows:
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
daddi r2,r2,8
dadd r8,r5,r6
daddi r3,r3,8
dadd r9,r7,r8
daddi r4,r4,-1
sw r9,0(r1)
daddi r1,r1,8
bnez r4,Loop
end:
halt
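To check that moving the daddi instructions preserves the program semantics, as the question asks, the rescheduled loop can be executed by a tiny interpreter and the final contents of a compared with the values the C loop produces (11, 13, 15, 17, 5, 7). This is a hypothetical helper, not part of WinMIPS64: it models only the opcodes used here, treats memory as a dictionary of 8-byte words, and assumes a, b and c are laid out back to back at addresses 0, 48 and 96.

```python
# Rescheduled ex1.s as (opcode, operands...) tuples; addresses of a, b, c assumed.
PROGRAM = [
    ("daddi", "r1", "r0", 0),     # a
    ("daddi", "r2", "r0", 48),    # b
    ("daddi", "r3", "r0", 96),    # c
    ("daddi", "r4", "r0", 6),
    ("lw", "r5", "r1", 0),        # Loop: (index 4)
    ("lw", "r6", "r2", 0),
    ("lw", "r7", "r3", 0),
    ("daddi", "r2", "r2", 8),
    ("dadd", "r8", "r5", "r6"),
    ("daddi", "r3", "r3", 8),
    ("dadd", "r9", "r7", "r8"),
    ("daddi", "r4", "r4", -1),
    ("sw", "r9", "r1", 0),
    ("daddi", "r1", "r1", 8),
    ("bnez", "r4", 4),            # branch back to Loop
    ("halt",),
]


def run(program, memory):
    """Execute the program sequentially (no pipeline timing) and return memory."""
    regs = {f"r{i}": 0 for i in range(32)}
    pc = 0
    while program[pc][0] != "halt":
        op, *args = program[pc]
        pc += 1
        if op == "daddi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "dadd":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "lw":
            rd, base, off = args
            regs[rd] = memory[regs[base] + off]
        elif op == "sw":
            rt, base, off = args
            memory[regs[base] + off] = regs[rt]
        elif op == "bnez":
            rs, target = args
            if regs[rs] != 0:
                pc = target
    return memory


def initial_memory():
    b = [10, 11, 12, 13, 0, 1]
    c = [1, 2, 3, 4, 5, 6]
    mem = {8 * i: 0 for i in range(6)}                      # a: .space 48
    mem.update({48 + 8 * i: v for i, v in enumerate(b)})    # b
    mem.update({96 + 8 * i: v for i, v in enumerate(c)})    # c
    return mem


mem = run(PROGRAM, initial_memory())
a = [mem[8 * i] for i in range(6)]
print(a)  # [11, 13, 15, 17, 5, 7]
```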
For the rescheduled code, CPI asymptotic = (11 + 3)/11 ≈ 1.273.
The simulated CPI is 1.296.
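The simulated 1.296 can be cross-checked by counting cycles by hand. This sketch rests on assumptions not stated in the original: no forwarding, operands read in ID, bnez resolved in ID with a one-cycle branch-taken stall, 4 extra cycles for the last instruction to leave the pipeline, and a CPI computed over all executed instructions (4 initialising daddi, the loop body, halt). Per iteration the rescheduled loop then has 2 RAW stalls (dadd r9 waits one cycle for r8, sw waits one cycle for r9) plus 1 branch-taken stall, which is the 3 in (11 + 3)/11.

```python
def instruction_count(iterations, body=11, outside=5):
    """Executed instructions: 4 initialising daddi + halt, plus the loop body."""
    return outside + iterations * body


def total_cycles(iterations, body=11, raw_per_iter=2, outside=5, drain=4):
    """Clock cycles of the rescheduled ex1.s under the stated assumptions."""
    raw_stalls = iterations * raw_per_iter
    taken_stalls = iterations - 1       # the last bnez falls through
    return instruction_count(iterations, body, outside) + raw_stalls + taken_stalls + drain


cycles = total_cycles(6)
print("cycles:", cycles)
print("whole-program CPI:", round(cycles / instruction_count(6), 3))
print("asymptotic CPI:", round((11 + 3) / 11, 3))
```

Under the same assumptions, total_cycles(12) / instruction_count(12) is about 1.285, closer to the asymptotic value, which is the trend question 6 asks about.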
6) Combine rescheduling and forwarding techniques and note the differences with respect to the forwarding-only and rescheduling-only cases. Try to enable the "Branch target buffer", look at the simulation code and determine the CPI. Has performance improved?
Try to modify the original code by adding 6 additional input values in a and b. What do you expect from CPI?
Answer:
Execution with forwarding enabled:
With the "Branch target buffer" also enabled, the results are as follows:
forwarding:
rescheduling:
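The change the BTB brings to the loop-closing bnez can be sketched by counting branch outcomes. This assumes the BTB behaves like a 1-bit predictor (an entry is created when the branch is first taken and then predicts its last outcome); the actual WinMIPS64 penalty per misprediction is not modelled, so only event counts are shown, not cycles.

```python
def loop_branch_costs(iterations):
    """Return (taken branches without a BTB, BTB mispredictions) for one loop."""
    outcomes = [True] * (iterations - 1) + [False]  # taken except on the last pass
    taken_without_btb = sum(outcomes)               # each taken branch discards one fetch
    mispredictions = 0
    predict_taken = False                           # no BTB entry before the first pass
    for taken in outcomes:
        if taken != predict_taken:
            mispredictions += 1
        predict_taken = taken                       # 1-bit scheme: remember last outcome
    return taken_without_btb, mispredictions


for n in (6, 12):
    print(n, loop_branch_costs(n))
```

Without a BTB the branch cost grows with every iteration, while under this assumption the BTB leaves only the first and the last execution of bnez mispredicted.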
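Increasing the number of inputs so that the loop runs 12 times improves the CPI further, because the fixed cost outside the loop is spread over more iterations and the CPI moves closer to its asymptotic value.
The code is as follows: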
.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
daddi r2,r2,8
dadd r8,r5,r6
daddi r3,r3,8
dadd r9,r7,r8
daddi r4,r4,-1
sw r9,0(r1)
daddi r1,r1,8
bnez r4,Loop
end:
halt
The program's execution results are as follows:
rescheduling:
After increasing the number of iterations, the code becomes:
.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1
.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
lw r5,0(r1)       ; element of a