project

; Exercise 1
; This is an assembly version of the following C code (assuming a, b and c already declared)
;
; for (int i = 0; i < 6; i++) {
;     a[i] = a[i] + b[i] + c[i];
; }

        .data
a:      .space 48
b:      .word 10,11,12,13,0,1
c:      .word 1,2,3,4,5,6

        .text
; initialize registers
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,6
Loop:
        lw r5,0(r1)     ; element of a
        lw r6,0(r2)     ; element of b
        lw r7,0(r3)     ; element of c
        dadd r8,r5,r6   ; a[i] + b[i]
        dadd r9,r7,r8   ; a[i] = a[i] + b[i] + c[i]
        sw r9,0(r1)     ; store value in a[i]
        daddi r1,r1,8   ; increment memory pointers
        daddi r2,r2,8
        daddi r3,r3,8
        daddi r4,r4,-1  ; decrement the loop counter (i++ in the C code)
        bnez r4,Loop
end:
        halt

1) Load ex1.s into the memory of MIPS64 and disable the forwarding logic, the delay slot and the Branch target buffer from the Configure menu in the main toolbar. Before running the program, try to predict where stalls occur, how many clock cycles they will take, and for what kind of hazards they occur. Then compare your prediction with the simulation results.

Answer:

Clock cycles = 19 × 6 + 4 + 4 = 122

RAW data hazards = 7 × 6 = 42 stall cycles.

Simulation results: [screenshot not preserved]
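The stall count above can be reproduced with a small model. The following is a minimal sketch, not the simulator's algorithm: it assumes that, without forwarding, a consumer can only be in ID in the cycle its producer is in WB (three cycles after the producer's ID), and it ignores dependences carried from one iteration to the next. The function name `raw_stalls`, the `gap` parameter, and the breakdown of the 122 cycles into setup, branch, halt and drain terms are illustrative assumptions.

```python
def raw_stalls(body, gap=3):
    """Count RAW stall cycles for one pass over a straight-line loop body.

    body: list of (dest, sources) pairs in program order; dest is None for
          instructions that write no register (sw, bnez).
    gap:  minimum number of cycles between a producer's ID stage and a
          consumer's ID stage. gap=3 is the assumed "no forwarding" case.
    """
    issue = []        # cycle in which each instruction is in ID
    last_writer = {}  # register name -> index of the latest writer
    for index, (dest, sources) in enumerate(body):
        cycle = issue[-1] + 1 if issue else 0
        for reg in sources:
            if reg in last_writer:
                cycle = max(cycle, issue[last_writer[reg]] + gap)
        issue.append(cycle)
        if dest is not None:
            last_writer[dest] = index
    # Without hazards the last instruction would be in ID at cycle len(body) - 1.
    return issue[-1] - (len(body) - 1)


ORIGINAL_LOOP = [
    ("r5", ["r1"]),        # lw    r5,0(r1)
    ("r6", ["r2"]),        # lw    r6,0(r2)
    ("r7", ["r3"]),        # lw    r7,0(r3)
    ("r8", ["r5", "r6"]),  # dadd  r8,r5,r6
    ("r9", ["r7", "r8"]),  # dadd  r9,r7,r8
    (None, ["r9", "r1"]),  # sw    r9,0(r1)
    ("r1", ["r1"]),        # daddi r1,r1,8
    ("r2", ["r2"]),        # daddi r2,r2,8
    ("r3", ["r3"]),        # daddi r3,r3,8
    ("r4", ["r4"]),        # daddi r4,r4,-1
    (None, ["r4"]),        # bnez  r4,Loop
]

stalls = raw_stalls(ORIGINAL_LOOP)
print(stalls)  # 7 per iteration, i.e. 42 over six iterations

# Assumed breakdown of the total: 4 setup instructions, 6 iterations,
# 5 taken-branch stalls, 1 halt, 4 cycles to drain the pipeline.
total_cycles = 4 + 6 * (len(ORIGINAL_LOOP) + stalls) + 5 + 1 + 4
print(total_cycles)  # 122
```

The four dependent pairs (lw/dadd, dadd/dadd, dadd/sw, daddi/bnez) contribute 1 + 2 + 2 + 2 stall cycles in this model.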

2) The program of ex1.s consists of a loop plus some other instructions outside it. After running the program to completion, estimate the CPI using the Statistics window. In the case of a program containing a “hot spot” (i.e. an internal loop whose instructions are executed much more frequently than all the other instructions) the CPI can be roughly estimated just using the asymptotic CPI, i.e.

$$\mathrm{CPI}_{asymptotic} = \lim_{L \to \infty} \frac{N_{out} + S_{out} + L \cdot (IC_{hot} + S_{hot})}{N_{out} + L \cdot IC_{hot}} = \frac{IC_{hot} + S_{hot}}{IC_{hot}}$$

where N_out and S_out are the number of instructions and the number of stalls outside the “hot spot”, respectively, L is the number of loop cycles of the innermost loop, and IC_hot and S_hot are the number of instructions and the number of stalls of the “hot spot”. Compare the asymptotic CPI with the value resulting from simulations. Are results compatible?

Answer:

Simulator CPI = ((11 + 7) × 6 + 4 + 5 + 4 + 1) / (11 × 6) = 122 / 66 = 1.848

Asymptotic CPI = (11 + 7) / 11 = 1.636

Execution in the simulator: [screenshot not preserved]
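To see how the measured CPI approaches the asymptotic value as the loop count grows, the two estimates can be written as small functions. This is a sketch under stated assumptions: the function names are illustrative, and the values chosen for the instructions and stalls outside the loop (5 and 9) are hypothetical placeholders rather than simulator output.

```python
def cpi(n_out, s_out, loops, ic_hot, s_hot):
    """CPI of a program with `loops` iterations of a hot loop."""
    cycles = n_out + s_out + loops * (ic_hot + s_hot)
    instructions = n_out + loops * ic_hot
    return cycles / instructions


def asymptotic_cpi(ic_hot, s_hot):
    """Limit of cpi() as the number of loop iterations goes to infinity."""
    return (ic_hot + s_hot) / ic_hot


print(round(asymptotic_cpi(11, 7), 3))  # 1.636, the no-forwarding case
print(round(asymptotic_cpi(11, 1), 3))  # 1.091, the forwarding case

# Hypothetical out-of-loop costs: 5 instructions and 9 stall cycles.
for loops in (6, 60, 600, 6000):
    print(loops, round(cpi(5, 9, loops, 11, 7), 4))
```

With only six iterations the cost outside the loop is still visible, which is why the simulated CPI sits above the asymptotic one.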

3) Enable the forwarding logic and execute the code again. Compute the CPI again. Justify the remaining stalls and comment why some of them occur after ID stage rather than after IF.

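Answer:

Simulator CPI = ((11 + 1) × 6 + 5 + 5 + 4) / (11 × 6) = 1.303

Asymptotic CPI = (11 + 1) / 11 = 1.091

The two values differ.

With forwarding enabled, the one remaining stall per iteration is between daddi r4,r4,-1 and bnez r4,Loop. bnez reads r4 in the ID stage, where the register file still holds the old value, so it cannot use a result forwarded to the EX stage and has to wait until daddi has produced the new value.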

4) Disable the forwarding logic and assume that the MIPS hardware cannot detect hazards. Modify the source code by inserting NOPs where appropriate without reordering the code (NOP stuffing technique). Check with the simulator that no stall occurs, and check whether the CPI has changed. Do we have better performance?

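Answer:

Code with NOPs inserted (one NOP for each of the 7 stall cycles per iteration counted in question 1):

        .data
a:      .space 48
b:      .word 10,11,12,13,0,1
c:      .word 1,2,3,4,5,6

        .text
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,6
Loop:
        lw r5,0(r1)
        lw r6,0(r2)
        lw r7,0(r3)
        NOP             ; r6 must be written back before dadd reads it
        dadd r8,r5,r6
        NOP
        NOP             ; wait for r8
        dadd r9,r7,r8
        NOP
        NOP             ; wait for r9
        sw r9,0(r1)
        daddi r1,r1,8
        daddi r2,r2,8
        daddi r3,r3,8
        daddi r4,r4,-1
        NOP
        NOP             ; wait for r4
        bnez r4,Loop
end:
        halt

With the NOPs in place no RAW stall remains. The CPI drops because the NOPs count as executed instructions, but the loop needs just as many clock cycles as before, so performance does not improve.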
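The effect of NOP stuffing on the CPI can be checked with a few lines of arithmetic. This is a sketch that assumes one NOP replaces each of the 7 RAW stall cycles per iteration and looks at the loop body only; the variable names are illustrative.

```python
LOOP_INSTRUCTIONS = 11  # useful instructions per iteration
RAW_STALLS = 7          # stall cycles per iteration without forwarding

# Hardware interlocks: 11 instructions issue, 7 cycles are lost to stalls.
cycles_with_stalls = LOOP_INSTRUCTIONS + RAW_STALLS
cpi_with_stalls = cycles_with_stalls / LOOP_INSTRUCTIONS

# NOP stuffing: every stall cycle becomes an executed NOP instruction.
instructions_with_nops = LOOP_INSTRUCTIONS + RAW_STALLS
cycles_with_nops = instructions_with_nops
cpi_with_nops = cycles_with_nops / instructions_with_nops

print(cycles_with_stalls, round(cpi_with_stalls, 3))  # 18 1.636
print(cycles_with_nops, round(cpi_with_nops, 3))      # 18 1.0
```

Both variants spend 18 cycles per iteration on the same useful work, so the lower CPI of the NOP version does not translate into a shorter run time.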

5) Reschedule the instructions (code moving technique) in order to avoid stalls without modifying the program semantics (check the final result to see if after moving the instructions the result is the same). Recompute the normal and asymptotic CPI values.

Answer:

The rescheduled code:

        .text
; initialize registers
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,6
Loop:
        lw r5,0(r1)
        lw r6,0(r2)
        lw r7,0(r3)
        daddi r2,r2,8
        dadd r8,r5,r6
        daddi r3,r3,8
        dadd r9,r7,r8
        daddi r4,r4,-1
        sw r9,0(r1)
        daddi r1,r1,8
        bnez r4,Loop
end:
        halt

Execution in the simulator: [screenshot not preserved]

Asymptotic CPI = (11 + 3) / 11 = 1.273

The measured CPI is 1.296.
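The exercise asks to check that moving instructions leaves the result unchanged. Besides inspecting memory in the simulator, the check can be sketched with a tiny interpreter for the four instruction types in the loop body. Everything here is an illustrative assumption rather than a model of WinMIPS64: the tuple encoding, the `run` function, the memory layout (a, b, c stored back to back in 8-byte slots, with a zero-initialized), and the treatment of `bnez r4,Loop` as an implicit test at the end of each pass.

```python
N = 6
B_VALUES = [10, 11, 12, 13, 0, 1]
C_VALUES = [1, 2, 3, 4, 5, 6]

ORIGINAL = [
    ("lw", "r5", "r1", 0),
    ("lw", "r6", "r2", 0),
    ("lw", "r7", "r3", 0),
    ("dadd", "r8", "r5", "r6"),
    ("dadd", "r9", "r7", "r8"),
    ("sw", "r9", "r1", 0),
    ("daddi", "r1", "r1", 8),
    ("daddi", "r2", "r2", 8),
    ("daddi", "r3", "r3", 8),
    ("daddi", "r4", "r4", -1),
]

RESCHEDULED = [
    ("lw", "r5", "r1", 0),
    ("lw", "r6", "r2", 0),
    ("lw", "r7", "r3", 0),
    ("daddi", "r2", "r2", 8),
    ("dadd", "r8", "r5", "r6"),
    ("daddi", "r3", "r3", 8),
    ("dadd", "r9", "r7", "r8"),
    ("daddi", "r4", "r4", -1),
    ("sw", "r9", "r1", 0),
    ("daddi", "r1", "r1", 8),
]


def run(body):
    """Execute the loop body until r4 reaches zero; return the final array a."""
    a, b, c = 0, 8 * N, 16 * N  # assumed base addresses of the three arrays
    mem = {}
    for i in range(N):
        mem[a + 8 * i] = 0
        mem[b + 8 * i] = B_VALUES[i]
        mem[c + 8 * i] = C_VALUES[i]
    reg = {"r1": a, "r2": b, "r3": c, "r4": N}
    while True:
        for op, x, y, z in body:
            if op == "lw":       # lw x,z(y)
                reg[x] = mem[reg[y] + z]
            elif op == "sw":     # sw x,z(y)
                mem[reg[y] + z] = reg[x]
            elif op == "dadd":   # dadd x,y,z
                reg[x] = reg[y] + reg[z]
            elif op == "daddi":  # daddi x,y,imm
                reg[x] = reg[y] + z
            else:
                raise ValueError(f"unsupported instruction: {op}")
        if reg["r4"] == 0:       # bnez r4,Loop falls through
            break
    return [mem[a + 8 * i] for i in range(N)]


print(run(ORIGINAL))     # [11, 13, 15, 17, 5, 7]
print(run(RESCHEDULED))  # [11, 13, 15, 17, 5, 7]
```

The two orders agree because `sw r9,0(r1)` still executes before `daddi r1,r1,8`, and the early increments of r2 and r3 happen only after the corresponding loads.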

6) Combine rescheduling and forwarding techniques and note the differences with respect to the forwarding-only and rescheduling-only cases. Try to enable the “Branch target buffer”, look at the simulation code and determine the CPI. Has performance improved? Try to modify the original code by adding 6 additional input values in a and b. What do you expect from CPI?

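Answer:

Execution with forwarding enabled: [screenshot not preserved]

Enabling the “Branch target buffer” on top of this gives the following results:

forwarding: [screenshot not preserved]

rescheduling: [screenshot not preserved]

When the loop count is raised to 12 and the number of input values is increased accordingly, the CPI improves further, since the fixed cost outside the loop is spread over more iterations.

The rescheduled code with 12 iterations:

        .data
a:      .space 96
b:      .word 10,11,12,13,0,1,1,0,13,12,11,10
c:      .word 1,2,3,4,5,6,6,5,4,3,2,1

        .text
; initialize registers
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,12
Loop:
        lw r5,0(r1)
        lw r6,0(r2)
        lw r7,0(r3)
        daddi r2,r2,8
        dadd r8,r5,r6
        daddi r3,r3,8
        dadd r9,r7,r8
        daddi r4,r4,-1
        sw r9,0(r1)
        daddi r1,r1,8
        bnez r4,Loop
end:
        halt

Execution of this program:

rescheduling: [screenshot not preserved]

With the larger loop count, the original (non-rescheduled) code becomes:

        .data
a:      .space 96
b:      .word 10,11,12,13,0,1,1,0,13,12,11,10
c:      .word 1,2,3,4,5,6,6,5,4,3,2,1

        .text
; initialize registers
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,12
Loop:
        lw r5,0(r1)     ; element of a
        lw r6,0(r2)     ; element of b
        lw r7,0(r3)     ; element of c
        dadd r8,r5,r6   ; a[i] + b[i]
        dadd r9,r7,r8   ; a[i] = a[i] + b[i] + c[i]
        sw r9,0(r1)     ; store value in a[i]
        daddi r1,r1,8   ; increment memory pointers
        daddi r2,r2,8
        daddi r3,r3,8
        daddi r4,r4,-1  ; decrement the loop counter (i++ in the C code)
        bnez r4,Loop
end:
        halt

forwarding: [screenshot not preserved]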

7) A well-known compiler optimization is known as “loop unrolling”. Basically, loop unrolling is the explicit repetition of the loop code a number of times. In this way we obtain a longer loop body that is executed fewer times. Consider the original code of ex1.s. Unroll the loop twice without any code moving, i.e. just repeat the first four loop instructions and make the necessary changes therein. Calculate the CPI for the case without forwarding. Is there any improvement?

Answer:

The code:

        .data
a:      .space 48
b:      .word 10,11,12,13,0,1
c:      .word 1,2,3,4,5,6

        .text
; initialize registers
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,6
Loop:
        lw r5,0(r1)         ; element a[i]
        lw r6,0(r2)         ; element b[i]
        lw r7,0(r3)         ; element c[i]
        dadd r8,r5,r6       ; a[i] + b[i]
        dadd r9,r7,r8       ; a[i] + b[i] + c[i]
        sw r9,0(r1)         ; store value in a[i]
        lw r10,8(r1)        ; element a[i+1]
        lw r11,8(r2)        ; element b[i+1]
        lw r12,8(r3)        ; element c[i+1]
        dadd r13,r10,r11    ; a[i+1] + b[i+1]
        dadd r14,r12,r13    ; a[i+1] + b[i+1] + c[i+1]
        sw r14,8(r1)        ; store value in a[i+1]
        daddi r1,r1,16      ; increment memory pointers by two elements
        daddi r2,r2,16
        daddi r3,r3,16
        daddi r4,r4,-2      ; two elements processed per pass
        bnez r4,Loop
end:
        halt

Execution in the simulator: [screenshot not preserved]

Asymptotic CPI = (17 + 13) / 17 = 1.765

So the CPI does not improve.
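Because the unrolled loop executes a different number of instructions per array element, it can help to compare cycles per element alongside the CPI. The sketch below assumes 19 cycles per iteration for the original loop (11 instructions, 7 RAW stalls and 1 taken-branch stall, as in the 19 × 6 term of question 1) and the 17 + 13 cycles per pass quoted above for the unrolled loop; the function name is illustrative.

```python
def cycles_per_element(instructions, stalls, elements):
    """Cycles spent per array element in one pass over the loop body."""
    return (instructions + stalls) / elements


# Original loop: one element per pass, 7 RAW + 1 taken-branch stall (assumed).
original = cycles_per_element(11, 8, 1)
# Loop unrolled twice: two elements per pass, 13 stall cycles.
unrolled = cycles_per_element(17, 13, 2)

print(original, round((11 + 8) / 11, 3))   # 19.0 1.727
print(unrolled, round((17 + 13) / 17, 3))  # 15.0 1.765
```

Under these assumptions the CPI gets slightly worse, yet each element costs fewer cycles, because the pointer updates and the branch are shared by two elements.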

8) Apply code rescheduling to the solution of the previous question and calculate both the CPI and the asymptotic CPI values with and without forwarding. Is there any improvement?

Answer:

The code:

        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,6
Loop:
        lw r5,0(r1)
        lw r6,0(r2)
        lw r7,0(r3)
        lw r10,8(r1)
        dadd r8,r5,r6
        lw r11,8(r2)
        lw r12,8(r3)
        dadd r9,r7,r8
        daddi r4,r4,-2
        dadd r13,r10,r11
        sw r9,0(r1)
        daddi r1,r1,16
        dadd r14,r12,r13
        daddi r2,r2,16
        daddi r3,r3,16
        sw r14,-8(r1)       ; r1 already advanced by 16, so this is a[i+1]
        bnez r4,Loop
end:
        halt

Execution in the simulator: [screenshot not preserved]

Normal CPI = ((17 + 0) × 6 + 5 + 0 + 4) / (17 × 6) = 1.088

Asymptotic CPI = (17 + 0) / 17 = 1.000

9) Suppose that the add operation in the original code is a floating point calculation and the loop is iterated for 12 times. Please use floating point registers for a[i], b[i], and c[i], and modify your assembly code. Please answer the following questions:

At least how many times do you need to unroll the loop to minimize stalls without forwarding?

What is the average latency of iterations for the original loop?

What is the code size?

Please show us your code.

The following is the input data of your code:

        .data
a:      .space 96
b:      .word 10,11,12,13,0,1,1,0,13,12,11,10
c:      .word 1,2,3,4,5,6,6,5,4,3,2,1

        .text …………

Answer:

The floating-point version of the original loop:

        .data
a:      .space 96
b:      .word 10,11,12,13,0,1,1,0,13,12,11,10
c:      .word 1,2,3,4,5,6,6,5,4,3,2,1

        .text
; initialize registers
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,12
Loop:
        l.d f1,0(r1)
        l.d f2,0(r2)
        l.d f3,0(r3)
        add.d f4,f2,f1
        add.d f5,f3,f4
        s.d f5,0(r1)
        daddi r1,r1,8
        daddi r2,r2,8
        daddi r3,r3,8
        daddi r4,r4,-1
        bnez r4,Loop
end:
        halt

Execution in the simulator: [screenshot not preserved]

The program above, unrolled three times and rescheduled:

        .data
a:      .space 96
b:      .double 10,11,12,13,0,1,1,0,13,12,11,10
c:      .double 1,2,3,4,5,6,6,5,4,3,2,1

        .text
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,12
Loop:
        l.d f5,0(r1)
        l.d f6,0(r2)
        l.d f10,8(r1)
        l.d f11,8(r2)
        add.d f8,f5,f6
        l.d f15,16(r1)
        l.d f16,16(r2)
        add.d f13,f10,f11
        l.d f7,0(r3)
        l.d f12,8(r3)
        add.d f18,f15,f16
        l.d f17,16(r3)
        add.d f9,f7,f8
        add.d f14,f13,f12
        daddi r1,r1,24
        add.d f19,f17,f18
        daddi r4,r4,-3
        s.d f9,-24(r1)      ; r1 already advanced by 24, so this is a[i]
        s.d f14,-16(r1)     ; a[i+1]
        daddi r3,r3,24
        daddi r2,r2,24
        s.d f19,-8(r1)      ; a[i+2]
        bnez r4,Loop
end:
        halt

Execution in the simulator: [screenshot not preserved]
