project.docx

Uploaded by: b****5 | Document ID: 8288083 | Uploaded: 2023-01-30 | Format: DOCX | Pages: 15 | Size: 318.84KB
; Exercise 1
; This is an assembly version of the following C code (assuming a, b and c are already declared):
;
; for (int i = 0; i < 6; i++) {
;     a[i] = a[i] + b[i] + c[i];
; }

.data
a:
.space 48
b:
.word 10, 11, 12, 13, 0, 1
c:
.word 1, 2, 3, 4, 5, 6

.text
; initialize registers
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 6
Loop:
lw r5, 0(r1)       ; element of a
lw r6, 0(r2)       ; element of b
lw r7, 0(r3)       ; element of c
dadd r8, r5, r6    ; a[i] + b[i]
dadd r9, r7, r8    ; a[i] + b[i] + c[i]
sw r9, 0(r1)       ; store value in a[i]
daddi r1, r1, 8    ; increment memory pointers
daddi r2, r2, 8
daddi r3, r3, 8
daddi r4, r4, -1   ; i++ (the counter runs down from 6 to 0)
bnez r4, Loop
end:
halt
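A later question asks you to confirm that moving instructions does not change the result. As a reference for that check, this short Python sketch computes the values the C loop should leave in `a`. It assumes `a` starts zero-filled, as reserved by `.space 48`. The function name `reference_result` is illustrative only.

```python
# Reference model of the C loop in ex1.s:
#   for (int i = 0; i < 6; i++) a[i] = a[i] + b[i] + c[i];
# Assumption: a starts zero-filled (reserved with ".space 48", i.e. 6 x 8 bytes).

def reference_result(a, b, c):
    """Return a new list with a[i] + b[i] + c[i] for every index i."""
    return [x + y + z for x, y, z in zip(a, b, c)]

a = [0] * 6
b = [10, 11, 12, 13, 0, 1]
c = [1, 2, 3, 4, 5, 6]
print(reference_result(a, b, c))  # [11, 13, 15, 17, 5, 7]
```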

1) Load ex1.s into the memory of MIPS64 and disable the forwarding logic, the delay slot and the Branch target buffer from the Configure menu in the main toolbar. Before running the program, try to predict where stalls occur, how many clock cycles they will take, and for what kind of hazards they occur. Then compare your prediction with the simulation results.

Answer:

Number of clock cycles = 19 × 6 + 4 + 4 = 122

RAW data hazard stalls = 7 × 6 = 42.
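The 7 RAW stalls per iteration can be reproduced with a small static timing model, sketched here in Python. The model is an assumption and does not describe the simulator's internals:

- a five-stage pipeline without forwarding;
- operands are read in ID and results are written in WB;
- the register file writes in the first half of a cycle and reads in the second half.

Under these assumptions a consumer can be decoded no earlier than three cycles after its producer. Branch-taken stalls are not modelled.

```python
# Where do RAW stalls occur in one iteration of the ex1.s loop (forwarding OFF)?
# Assumed timing model: consumer ID cycle >= producer ID cycle + 3 (producer
# writes in WB, consumer reads in ID, split-cycle register file).
# Control (branch-taken) stalls are NOT modelled.
from typing import Dict, List, Optional, Tuple

Instr = Tuple[Optional[str], Tuple[str, ...]]  # (destination, sources)
RAW_DISTANCE = 3

def stalls_per_instruction(body: List[Instr]) -> List[int]:
    """Stall cycles inserted in front of each instruction of `body`."""
    last_write: Dict[str, int] = {}  # register -> ID cycle of its latest producer
    prev_id = -1                      # ID cycle of the previous instruction
    result: List[int] = []
    for dest, srcs in body:
        earliest = max([prev_id + 1] +
                       [last_write[r] + RAW_DISTANCE for r in srcs if r in last_write])
        result.append(earliest - (prev_id + 1))
        prev_id = earliest
        if dest is not None:
            last_write[dest] = prev_id
    return result

LOOP: List[Instr] = [
    ("r5", ("r1",)),        # lw r5, 0(r1)
    ("r6", ("r2",)),        # lw r6, 0(r2)
    ("r7", ("r3",)),        # lw r7, 0(r3)
    ("r8", ("r5", "r6")),   # dadd r8, r5, r6
    ("r9", ("r7", "r8")),   # dadd r9, r7, r8
    (None, ("r9", "r1")),   # sw r9, 0(r1)
    ("r1", ("r1",)),        # daddi r1, r1, 8
    ("r2", ("r2",)),        # daddi r2, r2, 8
    ("r3", ("r3",)),        # daddi r3, r3, 8
    ("r4", ("r4",)),        # daddi r4, r4, -1
    (None, ("r4",)),        # bnez r4, Loop
]

per_instr = stalls_per_instruction(LOOP)
print(per_instr)       # [0, 0, 0, 1, 2, 2, 0, 0, 0, 0, 2]
print(sum(per_instr))  # 7
```

The stalls fall on the two `dadd` instructions, on `sw` and on `bnez`. Each one waits for a result produced one or two instructions earlier.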

Simulator results:

 

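2) The program of ex1.s consists of a loop plus some other instructions outside it. After running the program to completion, estimate the CPI using the Statistics window. In the case of a program containing a "hotspot" (i.e. an internal loop whose instructions are executed much more frequently than all the other instructions), the CPI can be roughly estimated just using the asymptotic CPI, i.e.

$$\mathrm{CPI} \approx \mathrm{CPI}_{AS} = \lim_{L \to \infty} \frac{N_{out} + S_{out} + L\,(IC_{hot} + S_{hot})}{N_{out} + L\,IC_{hot}} = \frac{IC_{hot} + S_{hot}}{IC_{hot}}$$

where $N_{out}$ and $S_{out}$ are the number of instructions and the number of stalls outside the "hotspot", respectively, $L$ is the number of iterations (loop cycles) of the innermost loop, and $IC_{hot}$ and $S_{hot}$ are the number of instructions and the number of stalls of the "hotspot". Compare the asymptotic CPI with the value resulting from simulations. Are the results compatible?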
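The formula can be checked numerically with a short Python sketch; the function names are illustrative. It uses the loop figures from the answer to this question, $IC_{hot} = 11$ and $S_{hot} = 7$. The value $N_{out} = 5$ is read off the listing (four set-up `daddi` plus `halt`). The value $S_{out} = 4$ is an assumed pipeline-fill cost, not a measured one.

```python
def cpi(n_out: int, s_out: int, loops: int, ic_hot: int, s_hot: int) -> float:
    """Whole-program CPI: (instructions + stalls) / instructions."""
    cycles = n_out + s_out + loops * (ic_hot + s_hot)
    instructions = n_out + loops * ic_hot
    return cycles / instructions

def cpi_asymptotic(ic_hot: int, s_hot: int) -> float:
    """Limit of cpi(...) as the number of loop iterations L goes to infinity."""
    return (ic_hot + s_hot) / ic_hot

# Loop figures for ex1.s without forwarding: IC_hot = 11, S_hot = 7.
# n_out = 5 (four set-up daddi + halt); s_out = 4 is an assumed pipeline-fill cost.
print(round(cpi_asymptotic(11, 7), 3))  # 1.636
for loops in (6, 60, 600):
    print(loops, round(cpi(5, 4, loops, 11, 7), 4))
```

As the number of iterations grows, the out-of-loop terms stop mattering and the whole-program CPI approaches $(IC_{hot} + S_{hot}) / IC_{hot}$.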

Answer:

Simulator CPI = ((11 + 7) × 6 + 4 + 5 + 5 + 1) / (11 × 6) = 123 / 66 ≈ 1.864

CPI Asymptotic = (11 + 7) / 11 = 1.636

Execution results:

3) Enable the forwarding logic and execute the code again. Compute the CPI again. Justify the remaining stalls and comment on why some of them occur after the ID stage rather than after IF.

Answer:

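Simulator CPI = ((11 + 1) × 6 + 5 + 5 + 4) / (11 × 6) = 1.303

CPI Asymptotic = (11 + 1) / 11 = 1.091

The two values differ.

Even with forwarding enabled, one stall per iteration remains, between daddi r4, r4, -1 and bnez r4, Loop. The branch is resolved in the ID stage, so bnez needs the new value of r4 while it is being decoded. The value it finds in the register file at that point is still the old one, because daddi has not yet produced its result. The value that daddi forwards out of its EX stage does not arrive in time for bnez's ID stage. bnez therefore has to wait in ID until the updated r4 becomes available.

This is why the stall shows up after ID, where operands are read and the hazard is detected, rather than after IF.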

 

4) Disable the forwarding logic and assume that the MIPS hardware cannot detect hazards. Modify the source code by inserting NOPs where appropriate without reordering the code (NOP stuffing technique). Check with the simulator that no stall occurs, and check whether the CPI has changed. Do we have better performance?

Answer:

Code with NOPs inserted:

.data
a:
.space 48
b:
.word 10, 11, 12, 13, 0, 1
c:
.word 1, 2, 3, 4, 5, 6

.text
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 6
Loop:
lw r5, 0(r1)
lw r6, 0(r2)
lw r7, 0(r3)
nop
dadd r8, r5, r6
nop
nop
dadd r9, r7, r8
nop
nop
sw r9, 0(r1)
daddi r1, r1, 8
daddi r2, r2, 8
daddi r3, r3, 8
daddi r4, r4, -1
nop
nop
bnez r4, Loop
end:
halt

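After the NOPs are inserted no stalls occur and the CPI changes, but performance does not improve; it is actually worse. Each NOP takes the pipeline slot that a stall used to occupy, so the cycle count is not reduced. The CPI only looks different because the NOPs are counted as executed instructions.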
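NOP stuffing can be automated with a simple timing assumption: with forwarding off, a consumer must sit at least three instruction slots after its producer. That distance is an assumed parameter, not a value read from the simulator. The Python sketch below only inserts NOPs and never reorders instructions. Applied to the loop body of ex1.s, it places seven NOPs in the same positions as the hand-written listing above.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[str], Tuple[str, ...]]  # (text, destination, sources)
NOP: Instr = ("nop", None, ())
RAW_DISTANCE = 3  # assumed: consumer at least 3 slots after its producer (no forwarding)

def nop_stuff(body: List[Instr]) -> List[Instr]:
    """Insert NOPs in front of consumers that are too close to their producers.
    Instructions are never reordered; a NOP has no destination and no sources."""
    out: List[Instr] = []
    last_write: Dict[str, int] = {}  # register -> index in `out` of its latest producer
    for instr in body:
        _text, dest, srcs = instr
        needed = max([len(out)] +
                     [last_write[r] + RAW_DISTANCE for r in srcs if r in last_write])
        out.extend([NOP] * (needed - len(out)))
        if dest is not None:
            last_write[dest] = len(out)  # index this instruction is about to occupy
        out.append(instr)
    return out

LOOP: List[Instr] = [
    ("lw r5, 0(r1)", "r5", ("r1",)),
    ("lw r6, 0(r2)", "r6", ("r2",)),
    ("lw r7, 0(r3)", "r7", ("r3",)),
    ("dadd r8, r5, r6", "r8", ("r5", "r6")),
    ("dadd r9, r7, r8", "r9", ("r7", "r8")),
    ("sw r9, 0(r1)", None, ("r9", "r1")),
    ("daddi r1, r1, 8", "r1", ("r1",)),
    ("daddi r2, r2, 8", "r2", ("r2",)),
    ("daddi r3, r3, 8", "r3", ("r3",)),
    ("daddi r4, r4, -1", "r4", ("r4",)),
    ("bnez r4, Loop", None, ("r4",)),
]

for text, _dest, _srcs in nop_stuff(LOOP):
    print(text)
```

The loop grows from 11 to 18 instructions per iteration. This is why the CPI drops while the cycle count does not.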

 

5) Reschedule the instructions (code moving technique) in order to avoid stalls without modifying the program semantics (check the final result to see whether it is the same after moving the instructions). Recompute the normal and asymptotic CPI values.

 

Answer:

The code is as follows:

Execution results:

.text
; initialize registers
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 6
Loop:
lw r5, 0(r1)
lw r6, 0(r2)
lw r7, 0(r3)
daddi r2, r2, 8
dadd r8, r5, r6
daddi r3, r3, 8
dadd r9, r7, r8
daddi r4, r4, -1
sw r9, 0(r1)
daddi r1, r1, 8
bnez r4, Loop
end:
halt

Hence CPI Asymptotic = (11 + 3) / 11 = 1.273

The actual (normal) CPI obtained from the simulation is 1.296.
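Question 5 asks you to check that moving instructions leaves the same final result. That check can also be done outside the simulator. The Python sketch below interprets only the integer instructions used here (`daddi`, `dadd`, `lw`, `sw`, `bnez`) on the ex1.s data, and compares the original and rescheduled loops.

It makes these assumptions:

- the byte addresses chosen for a, b and c are hypothetical;
- each `.word` is taken to be 8 bytes, matching the 8-byte pointer increments in the code;
- there is no delay slot, so `bnez` is the last instruction executed in each pass.

```python
from typing import Dict, List, Tuple

A, B, C = 0, 48, 96  # hypothetical byte addresses of a, b and c (8 bytes per word)

def run(init: List[Tuple], loop: List[Tuple]) -> List[int]:
    """Execute set-up code, then the loop body until bnez falls through; return a[0..5]."""
    regs: Dict[str, int] = {"r0": 0}
    mem: Dict[int, int] = {A + 8 * i: 0 for i in range(6)}
    mem.update({B + 8 * i: v for i, v in enumerate([10, 11, 12, 13, 0, 1])})
    mem.update({C + 8 * i: v for i, v in enumerate([1, 2, 3, 4, 5, 6])})

    def step(ins: Tuple) -> bool:
        """Execute one instruction; return True only for a taken branch."""
        op = ins[0]
        if op == "daddi":                      # rd = rs + imm
            _, rd, rs, imm = ins
            regs[rd] = regs[rs] + imm
        elif op == "dadd":                     # rd = rs + rt
            _, rd, rs, rt = ins
            regs[rd] = regs[rs] + regs[rt]
        elif op == "lw":                       # rt = mem[base + off]
            _, rt, off, base = ins
            regs[rt] = mem[regs[base] + off]
        elif op == "sw":                       # mem[base + off] = rt
            _, rt, off, base = ins
            mem[regs[base] + off] = regs[rt]
        elif op == "bnez":
            return regs[ins[1]] != 0
        return False

    for ins in init:
        step(ins)
    taken = True
    while taken:
        for ins in loop:   # the branch is the last instruction of the body
            taken = step(ins)
    return [mem[A + 8 * i] for i in range(6)]

INIT = [("daddi", "r1", "r0", A), ("daddi", "r2", "r0", B),
        ("daddi", "r3", "r0", C), ("daddi", "r4", "r0", 6)]

ORIGINAL = [("lw", "r5", 0, "r1"), ("lw", "r6", 0, "r2"), ("lw", "r7", 0, "r3"),
            ("dadd", "r8", "r5", "r6"), ("dadd", "r9", "r7", "r8"), ("sw", "r9", 0, "r1"),
            ("daddi", "r1", "r1", 8), ("daddi", "r2", "r2", 8), ("daddi", "r3", "r3", 8),
            ("daddi", "r4", "r4", -1), ("bnez", "r4")]

RESCHEDULED = [("lw", "r5", 0, "r1"), ("lw", "r6", 0, "r2"), ("lw", "r7", 0, "r3"),
               ("daddi", "r2", "r2", 8), ("dadd", "r8", "r5", "r6"), ("daddi", "r3", "r3", 8),
               ("dadd", "r9", "r7", "r8"), ("daddi", "r4", "r4", -1), ("sw", "r9", 0, "r1"),
               ("daddi", "r1", "r1", 8), ("bnez", "r4")]

print(run(INIT, ORIGINAL))                            # [11, 13, 15, 17, 5, 7]
print(run(INIT, ORIGINAL) == run(INIT, RESCHEDULED))  # True
```

The rescheduled version is safe because each pointer is incremented only after its last use as a base address in that iteration. The store still uses the old r1.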

6) Combine rescheduling and forwarding techniques and note the differences with respect to the forwarding-only and rescheduling-only cases. Try to enable the "Branch target buffer", look at the simulation code and determine the CPI. Has performance improved?

Try to modify the original code by adding 6 additional input values in a and b. What do you expect from the CPI?

Answer:

Execution with forwarding enabled:

Enabling the "Branch target buffer" on top of this gives the following results:

forwarding:

rescheduling:

 

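When the number of input values is increased so that the loop runs 12 times, the CPI improves further. The fixed cost outside the loop is spread over more iterations, so the CPI moves closer to its asymptotic value.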
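A rough Python sketch shows why more iterations should help. It evaluates the whole-program CPI for 6 and 12 iterations of the rescheduled loop. The loop has 11 instructions and 3 stalls per iteration, following the (11 + 3) / 11 of question 5. `n_out = 5` counts the set-up instructions and `halt`. `s_out = 4`, the cycles to fill a five-stage pipeline, is an assumption. The printed values are illustrative and are not expected to match the simulator exactly.

```python
def program_cpi(iterations: int, ic_hot: int = 11, s_hot: int = 3,
                n_out: int = 5, s_out: int = 4) -> float:
    """(instructions + stalls) / instructions for a loop-dominated program.
    s_out = 4 is an assumed pipeline-fill cost, not a measured value."""
    cycles = n_out + s_out + iterations * (ic_hot + s_hot)
    return cycles / (n_out + iterations * ic_hot)

asymptotic = (11 + 3) / 11
for n in (6, 12):
    print(n, round(program_cpi(n), 3))
print("asymptotic:", round(asymptotic, 3))
```

With these figures, the 12-iteration CPI lies between the 6-iteration value and the asymptote. This is the trend the answer above expects.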

The code is as follows:

.data
a:
.space 96
b:
.word 10, 11, 12, 13, 0, 1, 1, 0, 13, 12, 11, 10
c:
.word 1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1

.text
; initialize registers
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 12
Loop:
lw r5, 0(r1)
lw r6, 0(r2)
lw r7, 0(r3)
daddi r2, r2, 8
dadd r8, r5, r6
daddi r3, r3, 8
dadd r9, r7, r8
daddi r4, r4, -1
sw r9, 0(r1)
daddi r1, r1, 8
bnez r4, Loop
end:
halt

Execution results:

rescheduling:

After increasing the number of iterations, the original (non-rescheduled) code becomes:

.data
a:
.space 96
b:
.word 10, 11, 12, 13, 0, 1, 1, 0, 13, 12, 11, 10
c:
.word 1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1

.text
; initialize registers
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 12
Loop:
lw r5, 0(r1)       ; element of a
lw r6, 0(r2)       ; element of b
lw r7, 0(r3)       ; element of c
dadd r8, r5, r6    ; a[i] + b[i]
dadd r9, r7, r8    ; a[i] + b[i] + c[i]
sw r9, 0(r1)       ; store value in a[i]
daddi r1, r1, 8    ; increment memory pointers
daddi r2, r2, 8
daddi r3, r3, 8
daddi r4, r4, -1   ; i++ (the counter runs down from 12 to 0)
bnez r4, Loop
end:
halt

forwarding:

7) A well-known compiler optimization is known as "loop unrolling". Basically, loop unrolling is the explicit repetition of the loop code a number of times. In this way we obtain a longer loop body that is executed fewer times. Consider the original code of ex1.s. Unroll the loop twice without any code moving, i.e. just repeat the first four loop instructions and make the necessary changes therein. Calculate the CPI for the case without forwarding. Is there any improvement?

Answer:

The code is as follows:

.data
a:
.space 48
b:
.word 10, 11, 12, 13, 0, 1
c:
.word 1, 2, 3, 4, 5, 6

.text
; initialize registers
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 6
Loop:
lw r5, 0(r1)        ; element a[i]
lw r6, 0(r2)        ; element b[i]
lw r7, 0(r3)        ; element c[i]
dadd r8, r5, r6     ; a[i] + b[i]
dadd r9, r7, r8     ; a[i] + b[i] + c[i]
sw r9, 0(r1)        ; store value in a[i]
lw r10, 8(r1)       ; element a[i+1]
lw r11, 8(r2)       ; element b[i+1]
lw r12, 8(r3)       ; element c[i+1]
dadd r13, r10, r11  ; a[i+1] + b[i+1]
dadd r14, r12, r13  ; a[i+1] + b[i+1] + c[i+1]
sw r14, 8(r1)       ; store value in a[i+1]
daddi r1, r1, 16    ; increment memory pointers by two elements
daddi r2, r2, 16
daddi r3, r3, 16
daddi r4, r4, -2    ; two elements done per pass
bnez r4, Loop
end:
halt

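Execution results:

CPI Asymptotic = (17 + 13) / 17 = 1.765

Compared with the asymptotic CPI of 1.636 for the original loop (question 2), there is no improvement.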
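At the source level, unrolling twice corresponds to the following Python sketch; the function names are illustrative. Each pass of the unrolled loop does the work of two original iterations, so for 6 elements it runs 3 times. The counter and pointer updates and the branch are therefore executed half as often. A clean-up step handles an odd element count, which ex1.s does not need.

```python
def add_arrays(a, b, c):
    """Original loop: one element per iteration."""
    a = list(a)
    for i in range(len(a)):
        a[i] = a[i] + b[i] + c[i]
    return a

def add_arrays_unrolled2(a, b, c):
    """Same loop unrolled twice: two elements per pass, plus a clean-up step
    for an odd element count (not needed for the 6 elements of ex1.s)."""
    a = list(a)
    n = len(a)
    i = 0
    while i + 1 < n:
        a[i] = a[i] + b[i] + c[i]                  # first copy of the body
        a[i + 1] = a[i + 1] + b[i + 1] + c[i + 1]  # second copy, one element further
        i += 2                                     # one index update per two elements
    if i < n:                                      # leftover element when n is odd
        a[i] = a[i] + b[i] + c[i]
    return a

b = [10, 11, 12, 13, 0, 1]
c = [1, 2, 3, 4, 5, 6]
print(add_arrays_unrolled2([0] * 6, b, c) == add_arrays([0] * 6, b, c))  # True
```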

 

8) Apply code rescheduling to the solution of the previous question and calculate both the CPI and the asymptotic CPI values with and without forwarding. Is there any improvement?

Answer:

The code is as follows:

daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 6
Loop:
lw r5, 0(r1)
lw r6, 0(r2)
lw r7, 0(r3)
lw r10, 8(r1)
dadd r8, r5, r6
lw r11, 8(r2)
lw r12, 8(r3)
dadd r9, r7, r8
daddi r4, r4, -2
dadd r13, r10, r11
sw r9, 0(r1)
daddi r1, r1, 16
dadd r14, r12, r13
daddi r2, r2, 16
daddi r3, r3, 16
sw r14, -8(r1)      ; r1 was already advanced by 16, so this stores a[i+1]
bnez r4, Loop
end:
halt

Execution results:

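The unrolled loop handles two elements per pass, so it runs 6 / 2 = 3 times:

CPI Normal = ((17 + 0) × 3 + 5 + 0 + 4) / (17 × 3) = 60 / 51 ≈ 1.176

CPI Asymptotic = (17 + 0) / 17 = 1.000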
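The "+ 0" stalls in the loop can be cross-checked with a simple static timing model, sketched in Python. The model is an assumption, not a description of the simulator:

- there is no forwarding;
- operands are read in ID and results are written in WB, with a split-cycle register file;
- a consumer must therefore be decoded at least three cycles after its producer;
- branch-taken stalls are ignored.

Under this model the rescheduled, unrolled body has 17 instructions and no RAW stalls, in agreement with the 17 + 0 above.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[Optional[str], Tuple[str, ...]]  # (destination, sources)
RAW_DISTANCE = 3  # assumed minimum ID-to-ID distance between producer and consumer

def raw_stalls(body: List[Instr]) -> int:
    """Count RAW stall cycles in one pass over `body` (in-order, no forwarding)."""
    last_write: Dict[str, int] = {}
    prev_id, stalls = -1, 0
    for dest, srcs in body:
        earliest = max([prev_id + 1] +
                       [last_write[r] + RAW_DISTANCE for r in srcs if r in last_write])
        stalls += earliest - (prev_id + 1)
        prev_id = earliest
        if dest is not None:
            last_write[dest] = prev_id
    return stalls

# Rescheduled body of the twice-unrolled loop (question 8), as (dest, sources).
UNROLLED_RESCHEDULED: List[Instr] = [
    ("r5", ("r1",)),          # lw r5, 0(r1)
    ("r6", ("r2",)),          # lw r6, 0(r2)
    ("r7", ("r3",)),          # lw r7, 0(r3)
    ("r10", ("r1",)),         # lw r10, 8(r1)
    ("r8", ("r5", "r6")),     # dadd r8, r5, r6
    ("r11", ("r2",)),         # lw r11, 8(r2)
    ("r12", ("r3",)),         # lw r12, 8(r3)
    ("r9", ("r7", "r8")),     # dadd r9, r7, r8
    ("r4", ("r4",)),          # daddi r4, r4, -2
    ("r13", ("r10", "r11")),  # dadd r13, r10, r11
    (None, ("r9", "r1")),     # sw r9, 0(r1)
    ("r1", ("r1",)),          # daddi r1, r1, 16
    ("r14", ("r12", "r13")),  # dadd r14, r12, r13
    ("r2", ("r2",)),          # daddi r2, r2, 16
    ("r3", ("r3",)),          # daddi r3, r3, 16
    (None, ("r14", "r1")),    # sw r14, -8(r1)
    (None, ("r4",)),          # bnez r4, Loop
]

print(len(UNROLLED_RESCHEDULED), raw_stalls(UNROLLED_RESCHEDULED))  # 17 0
```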

 

9) Suppose that the add operation in the original code is a floating-point calculation and the loop is iterated 12 times. Please use floating-point registers for a[i], b[i] and c[i], and modify your assembly code. Please answer the following questions:

At least how many times do you need to unroll the loop to minimize stalls without forwarding?

What is the average latency of iterations for the original loop?

What is the code size?

Please show us your code.

The following is the input data of your code:

.data
a:
.space 96
b:
.word 10, 11, 12, 13, 0, 1, 1, 0, 13, 12, 11, 10
c:
.word 1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1
.text…………

Answer:

The code is as follows:

Execution results:

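.data
a:
.space 96
b:
.double 10, 11, 12, 13, 0, 1, 1, 0, 13, 12, 11, 10   ; .double so that l.d reads floating-point values
c:
.double 1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1

.text
; initialize registers
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 12
Loop:
l.d f1, 0(r1)       ; a[i]
l.d f2, 0(r2)       ; b[i]
l.d f3, 0(r3)       ; c[i]
add.d f4, f2, f1    ; a[i] + b[i]
add.d f5, f3, f4    ; a[i] + b[i] + c[i]
s.d f5, 0(r1)       ; store in a[i]
daddi r1, r1, 8
daddi r2, r2, 8
daddi r3, r3, 8
daddi r4, r4, -1
bnez r4, Loop
end:
halt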

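The loop above is unrolled so that each pass processes three elements: the body is copied three times, and with 12 elements the loop then runs 12 / 3 = 4 times.

The program is as follows:

Execution results: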

.data
a:
.space 96
b:
.double 10, 11, 12, 13, 0, 1, 1, 0, 13, 12, 11, 10
c:
.double 1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1

.text
daddi r1, r0, a
daddi r2, r0, b
daddi r3, r0, c
daddi r4, r0, 12
Loop:
l.d f5, 0(r1)
l.d f6, 0(r2)
l.d f10, 8(r1)
l.d f11, 8(r2)
add.d f8, f5, f6
l.d f15, 16(r1)
l.d f16, 16(r2)
add.d f13, f10, f11
l.d f7, 0(r3)
l.d f12, 8(r3)
add.d f18, f15, f16
l.d f17, 16(r3)
add.d f9, f7, f8
add.d f14, f13, f12
daddi r1, r1, 24
add.d f19, f17, f18
daddi r4, r4, -3
s.d f9, -24(r1)     ; r1 was already advanced by 24: this stores a[i]
s.d f14, -16(r1)    ; a[i+1]
daddi r3, r3, 24
daddi r2, r2, 24
s.d f19, -8(r1)     ; a[i+2]
bnez r4, Loop
end:
halt
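For the code-size question, a rough Python sketch can count the instructions in the two listings above. It assumes every MIPS64 instruction occupies 4 bytes (the fixed 32-bit encoding) and counts only the .text section. It also shows the dynamic instruction count, which is what unrolling actually reduces.

```python
# Static code size and dynamic instruction count for question 9.
# Assumption: every MIPS64 instruction is 4 bytes (fixed 32-bit encoding);
# only the .text section is counted.
INSTR_BYTES = 4
SETUP, HALT = 4, 1  # four daddi set-up instructions, one halt

def code_size(loop_body: int) -> int:
    """Bytes occupied by set-up + loop body + halt."""
    return (SETUP + loop_body + HALT) * INSTR_BYTES

def executed(loop_body: int, passes: int) -> int:
    """Dynamic instruction count: set-up and halt once, body once per pass."""
    return SETUP + loop_body * passes + HALT

original_body = 11             # 3 l.d, 2 add.d, 1 s.d, 4 daddi, 1 bnez
unrolled_body = 3 * 6 + 4 + 1  # three copies of (3 l.d, 2 add.d, 1 s.d), 4 daddi, 1 bnez

print(code_size(original_body), executed(original_body, 12))  # 64 137
print(code_size(unrolled_body), executed(unrolled_body, 4))   # 112 97
```

Unrolling trades a larger static footprint (112 vs 64 bytes) for fewer executed instructions (97 vs 137), because the pointer updates and the branch run once per three elements.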

 
