project.doc

Uploaded by: b****1 | Document ID: 231681 | Upload date: 2022-10-07 | Format: DOC | Pages: 12 | Size: 475KB

project

; Exercise 1
; This is an assembly version of the following C code (assuming a, b and c already declared)
;
; for (int i = 0; i < 6; i++) {
;     a[i] = a[i] + b[i] + c[i];
; }

.data
a:
.space 48
b:
.word 10,11,12,13,0,1
c:
.word 1,2,3,4,5,6

.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)      ; element of a
lw r6,0(r2)      ; element of b
lw r7,0(r3)      ; element of c
dadd r8,r5,r6    ; a[i] + b[i]
dadd r9,r7,r8    ; a[i] = a[i] + b[i] + c[i];
sw r9,0(r1)      ; store value in a[i]
daddi r1,r1,8    ; increment memory pointers
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1   ; decrement the remaining-iteration counter (plays the role of i++)
bnez r4,Loop
end:
halt

1) Load ex1.s into the memory of MIPS64 and disable the forwarding logic, the delay slot and the Branch target buffer from the Configure menu in the main toolbar. Before running the program, try to predict where stalls occur, how many clock cycles they will take, and for what kind of hazards they occur. Then compare your prediction with the simulation results.

Answer:

Clock cycles = 19 × 6 + 4 + 4 = 122

RAW data hazards = 7 × 6 = 42

The simulator results are:

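2) The program of ex1.s consists of a loop plus some other instructions outside it. After running the program to completion, estimate the CPI using the Statistics window. In the case of a program containing a “hot spot” (i.e. an internal loop whose instructions are executed much more frequently than all the other instructions) the CPI can be roughly estimated just using the asymptotic CPI, i.e.

$$\mathrm{CPI}_{asymptotic} = \lim_{L \to \infty} \frac{N_{out} + S_{out} + L\,(IC_{hot} + S_{hot})}{N_{out} + L \cdot IC_{hot}} = \frac{IC_{hot} + S_{hot}}{IC_{hot}}$$

where $N_{out}$ and $S_{out}$ are the number of instructions and the number of stalls outside the “hot spot”, respectively, $L$ is the number of loop cycles of the innermost loop, and $IC_{hot}$ and $S_{hot}$ are the number of instructions and the number of stalls of the “hot spot”. Compare the asymptotic CPI with the value resulting from simulations. Are results compatible?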

Answer:

Simulator CPI = ((11 + 7) * 6 + 4 + 5 + 5) / (11 * 6) = 1.848

CPI Asymptotic = (11 + 7) / 11 = 1.636
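As a rough numerical check of the asymptotic estimate, the sketch below evaluates the finite-loop ratio (N_out + S_out + L * (IC_hot + S_hot)) / (N_out + L * IC_hot) for growing L and compares it with (IC_hot + S_hot) / IC_hot. The loop figures (11 instructions, 7 stalls) are taken from the answer above; the out-of-loop counts `N_OUT = 5` and `S_OUT = 4` and the function names are illustrative assumptions, not simulator output.

```python
def finite_cpi(n_out, s_out, loops, ic_hot, s_hot):
    """Approximate CPI of a program with one hot loop executed `loops` times."""
    # Cycles are approximated as instructions plus stalls.
    cycles = n_out + s_out + loops * (ic_hot + s_hot)
    instructions = n_out + loops * ic_hot
    return cycles / instructions


def asymptotic_cpi(ic_hot, s_hot):
    """Limit of finite_cpi as the number of loop iterations grows."""
    return (ic_hot + s_hot) / ic_hot


if __name__ == "__main__":
    IC_HOT, S_HOT = 11, 7   # from the answer above (forwarding disabled)
    N_OUT, S_OUT = 5, 4     # assumed values, for illustration only
    for loops in (6, 12, 100, 10000):
        value = finite_cpi(N_OUT, S_OUT, loops, IC_HOT, S_HOT)
        print(loops, round(value, 3))
    print("asymptotic", round(asymptotic_cpi(IC_HOT, S_HOT), 3))
```

Under these assumptions the finite-loop value approaches the asymptotic one from above as L grows, which is why a short loop of 6 iterations can still show a visible gap between the two figures.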

The execution results are as follows:

3) Enable the forwarding logic and execute the code again. Compute the CPI again. Justify the remaining stalls and comment why some of them occur after the ID stage rather than after IF.

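Answer:

Simulator CPI = ((11 + 1) * 6 + 5 + 5 + 4) / (11 * 6) = 1.303

CPI Asymptotic = (11 + 1) / 11 = 1.091

The two values are not the same.

With forwarding enabled, an instruction can read its registers in the ID stage first, even though the value it reads there is stale by default, because the correct value is forwarded to it later. The bnez instruction, however, needs the value that has actually been written back to the register, so it cannot use the value that daddi forwards at the EXE stage and has to wait for the value available after WB.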

4) Disable the forwarding logic and assume that the MIPS hardware cannot detect hazards. Modify the source code by inserting NOPs where appropriate without reordering the code (NOP stuffing technique). Check with the simulator that no stall occurs, and check whether the CPI has changed. Do we have better performance?

Answer:

With NOPs inserted:

.data
a:
.space 48
b:
.word 10,11,12,13,0,1
c:
.word 1,2,3,4,5,6

.text
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
NOP
dadd r8,r5,r6
NOP
NOP
dadd r9,r7,r8
NOP
NOP
sw r9,0(r1)
daddi r1,r1,8
daddi r2,r2,8
daddi r3,r3,8
daddi r4,r4,-1
NOP
NOP
bnez r4,Loop
end:
halt

After the NOPs are inserted there are no stalls. The CPI changes, and performance gets worse.
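One way to double-check the NOP placement without the simulator is a small dependency scan. The sketch below assumes that, with forwarding disabled, a consumer must sit at least three instruction slots after its producer (register file written in the first half of WB and read in the second half of ID). That distance, the tuple encoding of the loop, and the helper name `raw_violations` are assumptions made for illustration; they are not part of the simulator.

```python
# Each entry: (mnemonic, destination register or None, source registers).
ORIGINAL_LOOP = [
    ("lw", "r5", ["r1"]),
    ("lw", "r6", ["r2"]),
    ("lw", "r7", ["r3"]),
    ("dadd", "r8", ["r5", "r6"]),
    ("dadd", "r9", ["r7", "r8"]),
    ("sw", None, ["r9", "r1"]),
    ("daddi", "r1", ["r1"]),
    ("daddi", "r2", ["r2"]),
    ("daddi", "r3", ["r3"]),
    ("daddi", "r4", ["r4"]),
    ("bnez", None, ["r4"]),
]

NOP = ("nop", None, [])

# Same loop with the seven NOPs placed as in the listing above.
LOOP_WITH_NOPS = (
    ORIGINAL_LOOP[0:3] + [NOP]
    + ORIGINAL_LOOP[3:4] + [NOP, NOP]
    + ORIGINAL_LOOP[4:5] + [NOP, NOP]
    + ORIGINAL_LOOP[5:10] + [NOP, NOP]
    + ORIGINAL_LOOP[10:11]
)

MIN_DISTANCE = 3  # assumed minimum producer-to-consumer distance, in slots


def raw_violations(body, min_distance=MIN_DISTANCE):
    """Return (producer index, consumer index, register) for every RAW pair
    that is closer than min_distance. The body is scanned twice in a row so
    that dependencies carried into the next iteration are covered too."""
    trace = body + body
    found = []
    for i, (_, dest, _) in enumerate(trace):
        if dest is None:
            continue
        for j in range(i + 1, min(i + min_distance, len(trace))):
            if dest in trace[j][2]:
                found.append((i, j, dest))
    return found


if __name__ == "__main__":
    print("original loop:", raw_violations(ORIGINAL_LOOP))
    print("with NOPs:    ", raw_violations(LOOP_WITH_NOPS))
```

The scan only looks at register read-after-write distances; it does not model the branch itself or memory dependencies, so it complements the simulator run rather than replacing it.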

5) Reschedule the instructions (code moving technique) in order to avoid stalls without modifying the program semantics (check the final result to see if after moving the instructions the result is the same). Recompute the normal and asymptotic CPI values.

Answer:

The code is as follows:

.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,6
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
daddi r2,r2,8
dadd r8,r5,r6
daddi r3,r3,8
dadd r9,r7,r8
daddi r4,r4,-1
sw r9,0(r1)
daddi r1,r1,8
bnez r4,Loop
end:
halt

Execution results:

Hence CPI Asymptotic = (11 + 3) / 11 = 1.273

The actual value is 1.296
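To check that rescheduling has not changed the program semantics, the final contents of a can be compared against a reference computed outside the simulator. The sketch below assumes the region reserved for a with .space starts out zeroed; the b and c values are copied from the .data section, and the helper name `expected_a` is hypothetical.

```python
B = [10, 11, 12, 13, 0, 1]
C = [1, 2, 3, 4, 5, 6]


def expected_a(a, b, c):
    """Reference result of a[i] = a[i] + b[i] + c[i] for every element."""
    return [x + y + z for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    a_initial = [0] * 6  # assumption: .space memory is zero-initialised
    result = expected_a(a_initial, B, C)
    print(result)                    # [11, 13, 15, 17, 5, 7]
    print([hex(v) for v in result])  # hex form, as shown in a data-memory view
```

Any reordering that is semantically equivalent to the original loop should leave these six values in a.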

6) Combine rescheduling and forwarding techniques and note the differences with respect to the forwarding-only and rescheduling-only cases. Try to enable the “Branch target buffer”, look at the simulation code and determine the CPI. Has performance improved?

Try to modify the original code by adding 6 additional input values in a and b. What do you expect from CPI?

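Answer:

Execution results with forwarding enabled:

Results after additionally enabling the “Branch target buffer”:

forwarding:

rescheduling:

When the number of inputs is increased so that the loop runs 12 times, the CPI improves further.

The code is as follows: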

.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1

.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
lw r5,0(r1)
lw r6,0(r2)
lw r7,0(r3)
daddi r2,r2,8
dadd r8,r5,r6
daddi r3,r3,8
dadd r9,r7,r8
daddi r4,r4,-1
sw r9,0(r1)
daddi r1,r1,8
bnez r4,Loop
end:
halt

The execution results of the program are as follows:

rescheduling:

After increasing the number of loop iterations, the code becomes:

.data
a:
.space 96
b:
.word 10,11,12,13,0,1,1,0,13,12,11,10
c:
.word 1,2,3,4,5,6,6,5,4,3,2,1

.text
; initialize registers
daddi r1,r0,a
daddi r2,r0,b
daddi r3,r0,c
daddi r4,r0,12
Loop:
lw r5,0(r1)      ; element of a
