SUN Technical Support and Training Department Technical Notes v10
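Notice
This 得实 (Dascom) SUN technical material is for circulation only among the field engineers (FE) of the Dascom Group SUN business unit. It must not be passed on to anyone outside.
SUN Technical Support and Training Department: Zhong Jian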
1. Replacing memory DIMMs:
Symptoms:
① /var/adm/messages*
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] errID 0x00000052.77005658 Corrected Memory Error on Board 5 J3201 is Persistent
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] errID 0x00000052.77005658 ECC Data Bit 42 was in error and corrected
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] Corrected Memory Error on CPU15, errID 0x00000052.77006f57
Dec 15 10:55:01 ibs-shandong2 unix: AFSR 0x00000000.00100000 AFAR 0x00000000.f99b1610
Dec 15 10:55:01 ibs-shandong2 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x100731d0
Dec 15 10:55:01 ibs-shandong2 unix: UDBH Syndrome 0x43 Memory Module Board 5 J3200
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] errID 0x00000052.77006f57 Corrected Memory Error on Board 5 J3200 is Persistent
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] errID 0x00000052.77006f57 ECC Data Bit 42 was in error and corrected
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] Corrected Memory Error on CPU15, errID 0x00000052.77008766
Dec 15 10:55:01 ibs-shandong2 unix: AFSR 0x00000000.00100000 AFAR 0x00000000.f99b1690
Dec 15 10:55:01 ibs-shandong2 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x100731d0
Dec 15 10:55:01 ibs-shandong2 unix: UDBH Syndrome 0x43 Memory Module Board 5 J3201
② All LED status is OK; prtdiag output is OK; format output is OK.
③ show-post-results displays: the status of boards 1, 3, and 16 is OK; the status of board 5 is failure. On board 5: bank0=2, bank0=***, bank1=2, bank1=***
Conclusion:
DIMMs J3201 and J3200 on board 5 are faulty.
Steps:
1 Agree with the customer on a downtime window for the system.
2 init 0
3 Pull out board 5.
4 Replace DIMMs J3201 and J3200.
5 Boot the system.
6 Check /var/adm/messages* to confirm that the problem has been resolved.
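The conclusion above is read off the "is Persistent" lines in /var/adm/messages*. As a rough illustration only, the following Python sketch counts those lines per board and DIMM socket. The function name `persistent_dimms`, the regular expression, and the sample text (two lines from the excerpt with their spacing restored) are assumptions made for this example, not part of any Sun tool.

```python
import re
from collections import Counter

# Matches the AFT0 "is Persistent" line. Whitespace is optional (\s*) because
# the spacing of the original log excerpt is an assumption.
PERSISTENT_RE = re.compile(
    r"Corrected\s*Memory\s*Error\s*on\s*Board\s*(\d+)\s*(J\d+)\s*is\s*Persistent"
)


def persistent_dimms(log_text):
    """Count persistent corrected-memory errors per (board, DIMM socket)."""
    counts = Counter()
    for match in PERSISTENT_RE.finditer(log_text):
        board, socket = int(match.group(1)), match.group(2)
        counts[(board, socket)] += 1
    return counts


# Two lines taken from the excerpt above (spacing restored by assumption).
SAMPLE = """\
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] errID 0x00000052.77005658 Corrected Memory Error on Board 5 J3201 is Persistent
Dec 15 10:55:01 ibs-shandong2 unix: [AFT0] errID 0x00000052.77006f57 Corrected Memory Error on Board 5 J3200 is Persistent
"""

if __name__ == "__main__":
    for (board, socket), n in sorted(persistent_dimms(SAMPLE).items()):
        print(f"board {board} {socket}: {n} persistent error(s)")
```

The same function can be run again on the messages file after step 6; an empty result for the period after the replacement is what one would hope to see.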
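2. Replacing a tape drive
Symptoms:
1 probe-scsi-all -> the tape drive is detected
2 boot -r; insert a new tape, mt -f /dev/rmt/0 -> /dev/rmt/0: no tape loaded or drive offline
3 Using two different cleaning tapes -> after cleaning, the cleaning LED is lit each time a tape is inserted
4 Trying with new tapes -> the tape drive still can't be used
Conclusion:
The tape drive is faulty; replace it.
Steps:
1 Agree with the customer on a downtime window for the system.
2 Determine the state of the system. If it is a cluster system, first switch the services of this machine over to the standby machine (using the haswitch command), then stop this node (using the scadmin stopnode command).
3 init 0; power off.
4 Replace the tape drive (pay attention to the jumper ID setting).
5 Boot the system.
6 Insert a known-good tape.
7 mt status
8 Optionally back up a file system as a test (# ufsdump 0cuf /dev/rmt/0 /)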
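3. Replacing a disk in a D1000 (disks managed by Volume Manager)
Symptoms:
1 format shows that the disk type of c1t12d0 is unknown
2 There are thousands of disk errors related to /sbus@3,0/QLGC,isp@0,10000/sd@c,0 (c1t12d0)
3 Volumes are mirrored, and all plexes that use c1t12d0 show a "nodevice" error
Conclusion:
Replace disk c1t12d0.
Steps:
1 Use vxprint to examine the mirror relationships and confirm that the other half of each mirror is healthy.
2 Locate the physical position of disk c1t12d0.
3 vxdiskadm -> 4: remove disk c1t12d0 in software.
4 Manually pull disk c1t12d0 out of the D1000.
5 Manually insert a new disk into the slot the old disk came from.
6 vxdiskadm -> 5: add disk c1t12d0 back in software.
7 Check the status with vxprint; when every state field shows ACTIVE, resynchronization is complete.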
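The symptoms pair the physical path .../sd@c,0 with the logical name c1t12d0. The link between them is that the target and LUN after "sd@" are written in hexadecimal, so "c" is target 12. The short Python sketch below works through that conversion; the helper names `sd_target_lun` and `disk_suffix` are made up for this example, and the controller number (c1) cannot be derived from the path alone.

```python
import re

# Trailing "sd@<target>,<lun>" component of a Solaris physical device path.
# Both numbers are written in hexadecimal.
SD_UNIT_RE = re.compile(r"/sd@([0-9a-f]+),([0-9a-f]+)$")


def sd_target_lun(device_path):
    """Return (target, lun) as integers for a path ending in sd@T,L."""
    match = SD_UNIT_RE.search(device_path)
    if match is None:
        raise ValueError(f"no sd@target,lun component in {device_path!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def disk_suffix(device_path):
    """Build the tNdM part of a cXtNdM name (controller number not included)."""
    target, lun = sd_target_lun(device_path)
    return f"t{target}d{lun}"


if __name__ == "__main__":
    # Path taken from the symptoms above; prints "t12d0".
    print(disk_suffix("/sbus@3,0/QLGC,isp@0,10000/sd@c,0"))
```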
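4. Replacing a disk in an E250
Symptoms:
1 format -> c0t8d0
2 Check the disk LEDs -> one disk LED is off; the other LED is on.
Conclusion:
Replace disk c0t8d0.
Steps:
1 Check the system and confirm that disk c0t8d0 needs to be replaced.
2 Ask the customer for background information, such as what the disk is used for and what caused the failure.
3 The customer reported that the disk is used only for backing up the system and serves no other purpose. I confirmed this myself on another of the customer's machines.
4 Inspecting the system shows that the system disk and the backup disk differ in capacity (this deserves particular attention, because it determines the steps the engineer must follow).
5 format c0t8d0 (take the partition information from the disk in the same slot of another system).
6 Use newfs to create a file system on each partition.
7 Use mount to mount the corresponding partition on /mnt.
8 cd /mnt
9 Use ufsdump and ufsrestore to copy the data, for example: ufsdump 0cuf - / | ufsrestore vxf -
10 Repeat the previous three steps until all the data that needs backing up has been copied.
11 # cd /usr/platform/sun4u/lib/fs/ufs
12 # installboot ./bootblk /dev/rdsk/c0t8d0s0
13 Check the system: OK.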
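Steps 6 to 10 repeat the same commands for every slice, so it can help to write the whole sequence out before starting. The Python sketch below only builds the command strings and never runs them. The function name `copy_plan` and the slice-to-file-system mapping in the usage example are hypothetical, and the `cd /` and `umount /mnt` lines are an assumption about how the "repeat" in step 10 is carried out.

```python
def copy_plan(disk, slices):
    """Return the shell commands for steps 6-9, one group per slice.

    `slices` maps a slice number on the backup disk to the source file
    system that is copied into it. The commands are listed, not executed.
    """
    commands = []
    for slice_no, source in sorted(slices.items()):
        commands += [
            f"newfs /dev/rdsk/{disk}s{slice_no}",
            f"mount /dev/dsk/{disk}s{slice_no} /mnt",
            "cd /mnt",
            f"ufsdump 0cuf - {source} | ufsrestore vxf -",
            # Assumption: /mnt is released before the next slice is mounted.
            "cd /",
            "umount /mnt",
        ]
    return commands


if __name__ == "__main__":
    # Hypothetical layout: slice 0 holds a copy of /, slice 6 a copy of /usr.
    for command in copy_plan("c0t8d0", {0: "/", 6: "/usr"}):
        print(command)
```

The real slice layout has to come from the partition table prepared in step 5.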
5. Replacing a disk in a MultiPack
Symptoms:
① from "vxprint -ht" output =>
v  lv_recchunk4     gen              ENABLED   ACTIVE    2048000   SELECT    -
pl lv_recchunk4-02  lv_recchunk4     ENABLED   ACTIVE    2050461   CONCAT    -         RW
sd c2t2d0-09        lv_recchunk4-02  c2t2d0    14672826  2050461   0         c2t2d0    ENA
pl lv_recchunk4-01  lv_recchunk4     DETACHED  STALE     2050461   CONCAT    -         WO
sd c1t2d0-09        lv_recchunk4-01  c1t2d0    14672826  2050461   0         c1t2d0    ENA
==> there is only one "enable, active" submirror in volume lv_recchunk4: plex "lv_recchunk4-02", sd "c2t2d0-09". The other submirror is "detached, stale".
② from /var/adm/messages =>
Mar 12 23:10:10 smcp02 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Mar 12 23:10:10 smcp02 unix: Error for Command: read(10) Error Level: Retryable
Mar 12 23:10:10 smcp02 unix: Requested Block: 15985361 Error Block: 15985403
Mar 12 23:10:10 smcp02 unix: Vendor: SEAGATE Serial Number: 9942534616
Mar 12 23:10:10 smcp02 unix: Sense Key: Media Error
③ from "iostat -E" =>
sd47 Soft Errors: 0 Hard Errors: 15 Transport Errors: 0
Vendor: SEAGATE Product: ST39103LCSUN9.0G Revision: 034A Serial No: 9942534616
RPM: 7200 Heads: 27 Size: 9.06GB <9055065600 bytes>
Media Error: 12 Device Not Ready: 0 No Device: 3 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
④ According to /var/adm/messages and "iostat -E", there is a bad block on /pci@1f,4000/scsi@5,1/sd@2,0 (sd47), which is c2t2d0.
⑤ Unfortunately, the only "enabled, active" submirror is on c2t2d0, so the attempt to attach the other "detached, stale" submirror back to the volume failed.
Conclusion:
Disk c2t2d0 has bad blocks and needs to be replaced.
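The conclusion rests on the Hard Errors and Media Error counters reported by "iostat -E" for sd47. As an illustration, the Python sketch below pulls those counters out of one device block. The names `error_counters` and `is_suspect` are made up for this example, the sample text restores spacing by assumption, and the "any hard or media error" rule is a simple heuristic rather than a documented threshold.

```python
import re

COUNTERS = ("Soft Errors", "Hard Errors", "Transport Errors", "Media Error")


def error_counters(iostat_text):
    """Extract selected error counters from one device's `iostat -E` block."""
    found = {}
    for name in COUNTERS:
        match = re.search(re.escape(name) + r":\s*(\d+)", iostat_text)
        if match:
            found[name] = int(match.group(1))
    return found


def is_suspect(counters):
    """Heuristic: flag a device that reports any hard or media errors."""
    return counters.get("Hard Errors", 0) > 0 or counters.get("Media Error", 0) > 0


# Lines taken from the iostat -E excerpt above (spacing restored by assumption).
SAMPLE = """\
sd47 Soft Errors: 0 Hard Errors: 15 Transport Errors: 0
Media Error: 12 Device Not Ready: 0 No Device: 3 Recoverable: 0
"""

if __name__ == "__main__":
    counters = error_counters(SAMPLE)
    print(counters, "suspect" if is_suspect(counters) else "ok")
```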
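Steps:
1 Understand the current state of the system: the disk that was detached, c1t2d0, is healthy, while the disk the system is currently using, c2t2d0, needs to be replaced. Because c2t2d0 has a bad block, c1t2d0 cannot be attached; yet the data stored on c1t2d0 is incomplete.
2 Switch the system services over to the other machine (using the haswitch command).
3 format -> repair: repair block 15985403 on c2t2d0 until it reports OK.
4 vxplex -g querydg att lv_recchunk4 lv_recchunk4-01
5 Run vxprint -g querydg | more; continue only once every state is ACTIVE.
6 On the machine that manages querydg, vxdiskadm -> 4: remove c2t2d0 in software.
7 Shut the system down (the MultiPack does not support hot-plugging), physically pull out c2t2d0, and insert the new disk in the same slot.
8 Boot the system; on the machine that manages querydg, vxdiskadm -> 5: add c2t2d0 back in software.
9 Run vxprint -g querydg | more; when every state is ACTIVE, the replacement is complete.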
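Steps 5 and 9 both wait for every state to read ACTIVE. A minimal Python sketch of that check against "vxprint -ht" style plex records follows. The function names are hypothetical, the sample reuses the two plex lines from the symptoms with whitespace-separated fields, and the field order (pl NAME VOLUME KSTATE STATE ...) is taken from that excerpt.

```python
def plex_states(vxprint_ht_text):
    """Map each plex name to (volume, kernel state, state)."""
    plexes = {}
    for line in vxprint_ht_text.splitlines():
        fields = line.split()
        # Plex records start with "pl": pl NAME VOLUME KSTATE STATE ...
        if len(fields) >= 5 and fields[0] == "pl":
            name, volume, kstate, state = fields[1:5]
            plexes[name] = (volume, kstate, state)
    return plexes


def unhealthy_plexes(plexes):
    """Return the plexes that are not ENABLED/ACTIVE, sorted by name."""
    return sorted(
        name
        for name, (_, kstate, state) in plexes.items()
        if (kstate, state) != ("ENABLED", "ACTIVE")
    )


# Plex and subdisk records from the symptoms (spacing restored by assumption).
SAMPLE = """\
v  lv_recchunk4 gen ENABLED ACTIVE 2048000 SELECT -
pl lv_recchunk4-02 lv_recchunk4 ENABLED ACTIVE 2050461 CONCAT - RW
sd c2t2d0-09 lv_recchunk4-02 c2t2d0 14672826 2050461 0 c2t2d0 ENA
pl lv_recchunk4-01 lv_recchunk4 DETACHED STALE 2050461 CONCAT - WO
sd c1t2d0-09 lv_recchunk4-01 c1t2d0 14672826 2050461 0 c1t2d0 ENA
"""

if __name__ == "__main__":
    print(unhealthy_plexes(plex_states(SAMPLE)))
```

On the sample from the symptoms, only lv_recchunk4-01 is reported; an empty list corresponds to the "all ACTIVE" condition the steps wait for.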
6. Replacing T3 power/cooling unit u1pcu2
Symptoms:
① fru list =>
ID      TYPE                VENDOR       MODEL         REVISION    SERIAL
------  ------------------  -----------  ------------  ----------  --------
u1ctr   controller card     SLR-MI       375-0084-02-  0210        032535
u1d1    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X6L8
u1d2    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X7XT
u1d3    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X7GA
u1d4    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X900
u1d5    disk drive          SEAGATE      ST173404FSUN  A727        3CE0XBLE
u1d6    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X5FH
u1d7    disk drive          SEAGATE      ST173404FSUN  A727        3CE0WV8S
u1d8    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X22P
u1d9    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X8AN
u1l1    loop card           SLR-MI       375-0085-01-  5.02 Flash  054986
u1l2    loop card           SLR-MI       375-0085-01-  5.02 Flash  053718
u1pcu1  power/cooling unit  TECTROL-CAN  300-1454-01(  0000        021453
u1pcu2  power/cooling unit  TECTROL-CAN  300-1454-01(  0000        004116
u1mpn   mid plane           SLR-MI       370-3990-01-  0000        031354
② fru stat =>
CTLR    STATUS   STATE    ROLE       PARTNER  TEMP
------  -------  -------  ---------  -------  ----
u1ctr   ready    enabled  master     -        34.5

DISK    STATUS   STATE    ROLE       PORT1    PORT2    TEMP  VOLUME
------  -------  -------  ---------  -------  -------  ----  ------
u1d1    ready    enabled  data disk  ready    ready    30    v0
u1d2    ready    enabled  data disk  ready    ready    34    v0
u1d3    ready    enabled  data disk  ready    ready    33    v0
u1d4    ready    enabled  data disk  ready    ready    34    v0
u1d5    ready    enabled  data disk  ready    ready    31    v0
u1d6    ready    enabled  data disk  ready    ready    30    v0
u1d7    ready    enabled  data disk  ready    ready    33    v0
u1d8    ready    enabled  data disk  ready    ready    41    v0
u1d9    ready    enabled  standby    ready    ready    37    v0

LOOP    STATUS   STATE    MODE       CABLE1   CABLE2   TEMP
------  -------  -------  ---------  -------  -------  ----
u1l1    ready    enabled  master     -        -        29.0
u1l2    ready    enabled  slave      -        -        34.0

POWER   STATUS   STATE    SOURCE  OUTPUT  BATTERY  TEMP    FAN1    FAN2
------  -------  -------  ------  ------  -------  ------  ------  ------
u1pcu1  ready    enabled  line    normal  normal   normal  normal  normal
u1pcu2  ready    enabled  line    normal  fault    normal  normal  normal
③ check T3 syslog =>
Jan 26 16:11:05 LPCT[1]: N: u1pcu2: Refreshing battery
Jan 26 16:16:17 BATD[1]: N: u1pcu2: hold time was 314 seconds.
Jan 26 16:16:18 BATD[1]: W: u1pcu2: Replace battery, hold time low. serial no = 004116
Jan 26 18:00:04 SCHD[1]: N: u1ctr: u1l1 temperature 29.0 Celsius
Jan 26 18:00:04 SCHD[1]: N: u1ctr: u1l2 temperature 34.5 Celsius
Jan 27 00:00:02 SCHD[1]: N: u1ctr: u1l1 temperature 29.5 Celsius
Jan 27 00:00:02 SCHD[1]: N: u1ctr: u1l2 temperature 34.5 Celsius
Jan 27 06:00:00 SCHD[1]: N: u1ctr: u1l1 temperature 29.0 Celsius
Jan 27 06:00:00 SCHD[1]: N: u1ctr: u1l2 temperature 34.0 Celsius
Jan 27 07:16:16 BATD[1]: N: cur_time = 1012115776, StartRechargeTime = 1012061776
Jan 27 07:16:16 BATD[1]: N: u1pcu2: Battery recharge timeout.
Jan 27 07:16:16 BATD[1]: N: u1pcu2 Battery recharge timeout.
Jan 27 07:16:16 BATD[1]: N: Battery Refreshing cycle ends at this point
④ The LED on one power supply is red
⑤ refresh -s =>
No battery refreshing Task is currently running.
        PCU1            PCU2
-----------------------------------------------------------------
U1      Normal          BAT Low
Current Time    Tue Jan 29 20:42:47 2002
Last Refresh    Sat Jan 26 04:45:07 2002
Next Refresh    Sat Feb 23 04:45:07 2002
Conclusion:
u1pcu2 is faulty; replace it.
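The BATD line in the T3 syslog carries two counters, cur_time and StartRechargeTime. Their difference is 54000 seconds, that is 15 hours, which matches the gap between the hold-time test at 16:16 on Jan 26 and the recharge timeout at 07:16 on Jan 27. The Python sketch below reproduces that arithmetic. Treating the counters as Unix epoch seconds is an assumption about the T3 firmware, although the converted values do line up with the syslog timestamps.

```python
from datetime import datetime, timezone

# Values copied from the BATD line in the T3 syslog excerpt.
CUR_TIME = 1012115776
START_RECHARGE_TIME = 1012061776


def recharge_elapsed_hours(cur_time, start_recharge_time):
    """Hours between the start of the recharge and the timeout message."""
    return (cur_time - start_recharge_time) / 3600


def as_utc(epoch_seconds):
    """Interpret a counter as Unix epoch seconds (assumption, see above)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(recharge_elapsed_hours(CUR_TIME, START_RECHARGE_TIME))  # 15.0
    print(as_utc(START_RECHARGE_TIME))  # 2002-01-26 16:16:16+00:00
    print(as_utc(CUR_TIME))             # 2002-01-27 07:16:16+00:00
```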
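Steps:
1 Agree with the customer on a downtime window and check the state of the system.
2 Stop the applications that use the T3 and shut down the T3.
3 Disconnect the Fibre Channel cable from the T3.
4 Replace the power/cooling unit of the T3 (u1pcu2).
5 Power on the T3.
6 Ping the T3 from the terminal once the T3 is up (if it does not answer, switch to the TPE of another T3).
7 Check the firmware version of the T3 to see whether it needs to be updated.
8 Reconnect the Fibre Channel cable to the T3 if the T3 is working normally.
9 Start the applications that use the T3.
10 Test: telnet to the server and run # vxdisk list and # format (if the output of vxdisk and format is inconsistent, reboot the server with # reboot -- -r).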
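The fault in this case showed up as "fault" in the BATTERY column of the POWER section of the fru stat output. The original steps do not say to re-run fru stat after the replacement; as a suggestion only, the Python sketch below shows how the POWER rows could be scanned for "fault" values. The function name `pcu_faults` and the column list are assumptions based on the excerpt in the symptoms.

```python
POWER_COLUMNS = ("STATUS", "STATE", "SOURCE", "OUTPUT", "BATTERY", "TEMP", "FAN1", "FAN2")


def pcu_faults(fru_stat_text):
    """Return {pcu_id: [column, ...]} for PCU columns whose value is "fault"."""
    faults = {}
    for line in fru_stat_text.splitlines():
        fields = line.split()
        # PCU rows: id followed by one value per POWER column.
        if len(fields) == 1 + len(POWER_COLUMNS) and "pcu" in fields[0]:
            bad = [col for col, value in zip(POWER_COLUMNS, fields[1:]) if value == "fault"]
            if bad:
                faults[fields[0]] = bad
    return faults


# POWER rows from the symptoms (spacing restored by assumption).
SAMPLE = """\
u1pcu1 ready enabled line normal normal normal normal normal
u1pcu2 ready enabled line normal fault normal normal normal
"""

if __name__ == "__main__":
    print(pcu_faults(SAMPLE))
```

An empty dictionary would mean that no PCU column reports "fault".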
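7. Replacing an Ultra 5 system board
Symptoms:
① During boot, the host can't connect to the boot server through the network port and reports a "link down" error
② watch-net-all => "resetting transceiver failed"
Conclusion:
Replace the system board.
Steps:
1 Shut the machine down.
2 Replace the system board.
3 Boot the system -> OK.
4 Connect to a hub or a notebook.
5 If watch-net-all succeeds, the repair is complete.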
8. Replacing the hard disk of a Cluster client Ultra 5
Symptoms:
1 The system can't start up; an "ARP/RARP timeout" occurs during boot
2 probe-ide -> fastdataa