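Monitoring HP Server Hardware with Zabbix
2014-05-17 22:35:01
Tags: monitoring, HP, zabbix
Original work. Reproduction is permitted, provided the original source of the article, the author information and this notice are indicated with a hyperlink; otherwise legal liability will be pursued. sfzhang88.blog.51cto./4995876/1413009

As a Linux systems engineer, maintaining servers means more than maintaining the operating system. Most important of all is monitoring the server hardware: whether the RAID status is normal (a faulty RAID controller slows down reads and writes), whether the disks are healthy (a failed disk can, in serious cases, lose data), whether a power supply has failed, and so on. Beyond that, the temperature of the CPU, memory, processor and other key components should be monitored, with an alert sent whenever a temperature exceeds the server's critical threshold.

For hardware management, HP servers come with their own tool, hpacucli, which shows the server's RAID information, disk information and so on.

1) Install hpacucli (download: HP hpacucli management tool)

    [root@monitor ~]# rpm -ivh hpacucli-9.40-12.0.x86_64.rpm

2) View the server's RAID information and check whether the disks are healthy.

    [root@monitor ~]# hpacucli ctrl all show config
    Smart Array P410i in Slot 0 (Embedded)    (sn: 42FF0)
       array A (SAS, Unused Space: 0 MB)
          logicaldrive 1 (279.4 GB, RAID 1, OK)
          physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
          physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)

3) The hpacucli ctrl all show config detail command shows the RAID and disk information in detail.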

    [root@monitor ~]# hpacucli ctrl all show config detail
    Smart Array P410i in Slot 0 (Embedded)
       Bus Interface: PCI
       Slot: 0
       Serial Number: 42FF0
       Cache Serial Number: PBCDH0CRH1FH62
       RAID 6 (ADG) Status: Disabled
       Controller Status: OK
       Chassis Slot:
       Hardware Revision: Rev C
       Firmware Version: 5.14
       Rebuild Priority: Medium
       Expand Priority: Medium
       Surface Scan Delay: 15 secs
       Monitor and Performance Delay: 60 min
       Elevator Sort: Enabled
       Degraded Performance Optimization: Disabled
       Inconsistency Repair Policy: Disabled
       Post Prompt Timeout: 0 secs
       Cache Board Present: True
       Cache Status: OK
       Accelerator Ratio: 25% Read / 75% Write
       Drive Write Cache: Disabled
       Total Cache Size: 512 MB
       No-Battery Write Cache: Disabled
       Cache Backup Power Source: Capacitors
       Battery/Capacitor Count: 1
       Battery/Capacitor Status: OK
       SATA NCQ Supported: True
       Array: A
          Interface Type: SAS
          Unused Space: 0 MB
          Status: OK
          Logical Drive: 1
             Size: 279.4 GB
             Fault Tolerance: RAID 1
             Heads: 255
             Sectors Per Track: 32
             Cylinders: 65535
             Stripe Size: 128 KB
             Status: OK
             Array Accelerator: Enabled
             Unique Identifier: 600508B20200002
             Disk Name: /dev/cciss/c0d0
             Mount Points: /boot 99 MB
             Logical Drive Label: A00ADBD9PR7AMU1472898D
             Mirror Group 0:
                physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
             Mirror Group 1:
                physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
          physicaldrive 1I:1:1
             Port: 1I
             Box: 1
             Bay: 1
             Status: OK
             Drive Type: Data Drive
             Interface Type: SAS
             Size: 300 GB
             Rotational Speed: 10000
             Firmware Revision: HPD4
             Serial Number: ECA1PC80GTS31234
             Model: HP EG0300FBDSP
             PHY Count: 2
             PHY Transfer Rate: 6.0GBPS, Unknown
          physicaldrive 1I:1:2
             Port: 1I
             Box: 1
             Bay: 2
             Status: OK
             Drive Type: Data Drive
             Interface Type: SAS
             Size: 300 GB
             Rotational Speed: 10000
             Firmware Revision: HPD7
             Serial Number: PMX6902D
             Model: HP EG0300FBDBR
             PHY Count: 2
             PHY Transfer Rate: 6.0GBPS, Unknown

HP also provides an official hpasmcli management tool, which shows detailed temperature information for the server's CPU, memory, processor, power supplies and so on.

1) Install hpasmcli (download: HP hpasmcli management tool)

    [root@monitor ~]# rpm -ivh hp-health-9.40-1602.44.rhel6.x86_64.rpm

2) The hpasmcli tool shows the temperature of each server component.
In the output, Temp is a component's current temperature and Threshold is its critical temperature; a reading above the threshold needs attention.

    [root@monitor ~]# hpasmcli -s "show temp"
    Sensor   Location              Temp       Threshold
    ------   --------              ----       ---------
    #1       AMBIENT               23C/73F    42C/107F
    #2       CPU#1                 40C/104F   82C/179F
    #3       CPU#2                 40C/104F   82C/179F
    #4       MEMORY_BD             33C/91F    87C/188F
    #5       MEMORY_BD             33C/91F    78C/172F
    #6       MEMORY_BD             -          87C/188F
    #7       MEMORY_BD             32C/89F    78C/172F
    #8       MEMORY_BD             32C/89F    87C/188F
    #9       MEMORY_BD             32C/89F    78C/172F
    #10      MEMORY_BD             -          87C/188F
    #11      MEMORY_BD             32C/89F    78C/172F
    #12      POWER_SUPPLY_BAY      33C/91F    59C/138F
    #13      POWER_SUPPLY_BAY      47C/116F   73C/163F
    #14      MEMORY_BD             29C/84F    72C/161F
    #15      PROCESSOR_ZONE        32C/89F    73C/163F
    #16      PROCESSOR_ZONE        30C/86F    64C/147F
    #17      MEMORY_BD             28C/82F    63C/145F
    #18      PROCESSOR_ZONE        39C/102F   69C/156F
    #19      SYSTEM_BD             35C/95F    69C/156F
    #20      SYSTEM_BD             38C/100F   71C/159F
    #21      SYSTEM_BD             44C/111F   65C/149F
    #22      SYSTEM_BD             45C/113F   71C/159F
    #23      SYSTEM_BD             39C/102F   69C/156F
    #24      SYSTEM_BD             47C/116F   69C/156F
    #25      SYSTEM_BD             35C/95F    63C/145F
    #26      SYSTEM_BD             45C/113F   66C/150F
    #27      SCSI_BACKPLANE_ZONE   35C/95F    60C/140F
    #28      SYSTEM_BD             73C/163F   110C/230F

3) Running hpasmcli -s show prints help-style usage information. When monitoring, pay particular attention to DIMM (memory), FANS (fans), POWERSUPPLY (power modules), SERVER (system), CPU and TEMP (temperature).

    [root@monitor ~]# hpasmcli -s show
    Invalid Arguments
        SHOW ASR
        SHOW BOOT
        SHOW DIMM [ SPD ]
        SHOW F1
        SHOW FANS
        SHOW HT
        SHOW IML
        SHOW IPL
        SHOW NAME
        SHOW PORTMAP
        SHOW POWERMETER
        SHOW POWERSUPPLY
        SHOW PXE
        SHOW SERIAL [ BIOS | EMBEDDED | VIRTUAL ]
        SHOW SERVER
        SHOW TEMP
        SHOW TPM
        SHOW UID
        SHOW WOL

4) A few common hpasmcli examples.

    Memory information: hpasmcli -s "show dimm" | egrep -i "module|stat"
    Fan information: hpasmcli -s "show fans"
    Hardware temperature: hpasmcli -s "show temp"
    Power supply modules: hpasmcli -s "show powersupply"
    Machine model, serial number, CPU and memory size: hpasmcli -s "show server"

Because servers come from different vendors with different management tools, Zabbix has no detailed, comprehensive solution of its own for server hardware. dl528888 earlier wrote about monitoring Dell servers from Zabbix through the OMSA tools, which is also a good approach; I borrowed from it, with many thanks.

Zabbix monitoring comes down to two approaches. In the first, the server obtains data through agentd, which requires defining a UserParameter, that is, a key. In the second, the server obtains data through a trapper item: the agent side actively sends the data to the server or proxy. I use the second, trapper, approach here. With the first approach the server sometimes fails to get the data, and the item reports: became not
supported: Received value is not suitable for value type Numeric (unsigned) and data type Decimal. That is the error this approach can produce.

Start with my monitoring script. Because it follows the trapper approach, the log_file file lists, in order, the hostname of the monitored server, the item key and the value.

    [root@monitor scripts]# cat hpacuclizabbix.sh
    #!/bin/sh
    # create by sfzhang 20140517
    # This script monitors an HP server: Smart Array status, hardware information and server temperature.
    zabbix_server=*.*.*.*    # IP of the Zabbix server or proxy the data should be sent to
    zabbix_sender=/usr/local/zabbix/bin/zabbix_sender
    log_file=/tmp/hpacuclizabbix.log    # file that defines the monitored host, key and value
    hpacucli=/usr/sbin/hpacucli
    options="ctrl all show config detail"
    hpacucli_log=/tmp/result.log
    PATH=$PATH:/usr/sbin:/sbin
    # Dump the controller details once, then read each status field from the dump.
    $hpacucli $options > $hpacucli_log
    Cache_status=$(cat $hpacucli_log | awk '/Cache Status:/{print $NF}')
    Controller_status=$(cat $hpacucli_log | awk '/Controller Status:/{print $NF}')
    Battery_capacitor_status=$(cat $hpacucli_log | awk '/Battery\/Capacitor Status:/{print $NF}')
    # Each check below compares the number of components found (total) with the number reporting a healthy state (normal).
    Physicaldrive_status=$(awk -v total=$(hpacucli ctrl slot=0 pd all show status | grep physicaldrive | wc -l) -v normal=$(hpacucli ctrl slot=0 pd all show status | awk '/physicaldrive/{if($NF=="OK") count+=1}END{print count}') 'BEGIN{if(total==normal) print "OK"; else print "NO"}')
    Memory_status=$(awk -v total=$(hpasmcli -s "SHOW DIMM" | grep -i "Status" | wc -l) -v normal=$(hpasmcli -s "SHOW DIMM" | awk '/Status:/{if($NF=="Ok") count+=1}END{print count}') 'BEGIN{if(total==normal) print "OK"; else print "NO"}')
    Fans_status=$(awk -v total=$(hpasmcli -s "SHOW FANS" | grep "#" | wc -l) -v normal=$(hpasmcli -s "SHOW FANS" | awk '/#/{if($3=="Yes") count+=1}END{print count}') 'BEGIN{if(total==normal) print "OK"; else print "NO"}')
    Power_status=$(awk -v total=$(hpasmcli -s "SHOW POWERSUPPLY" | grep "Power supply" | wc -l) -v normal=$(hpasmcli -s "SHOW POWERSUPPLY" | awk '/Condition:/{if($NF=="Ok") count+=1}END{print count}') 'BEGIN{if(total==normal) print "OK"; else print "NO"}')
    Processor_status=$(awk -v total=$(hpasmcli -s "SHOW SERVER" | grep "Processor:" | wc -l) -v normal=$(hpasmcli -s "SHOW SERVER" | awk '/Status/{if($NF=="Ok") count+=1}END{print count}') 'BEGIN{if(total==normal) print "OK"; else print "NO"}')
    # For each sensor group, keep the highest Celsius reading (the part of column 3 before "C").
    Power_temp_num=$(hpasmcli -s "SHOW TEMP" | awk '/POWER_SUPPLY_BAY/{print $3}' | awk -F"C" '{print $1}' | awk 'BEGIN{max=0}{if($1>max) max=$1}END{print max}')
    Ambient_temp_num=$(hpasmcli -s "SHOW TEMP" | awk '/AMBIENT/{print $3}' | awk -F"C" '{print $1}')
    Cpu_temp_num=$(hpasmcli -s "SHOW TEMP" | awk '/CPU/{print $3}' | awk -F"C" '{print $1}' | awk 'BEGIN{max=0}{if($1>max) max=$1}END{print max}')
    Memory_temp_num=$(hpasmcli -s "SHOW TEMP" | awk '/MEMORY_BD/{print $3}' | awk -F"C" '{print $1}' | awk 'BEGIN{max=0}{if($1>max) max=$1}END{print max}')
    System_temp_num=$(hpasmcli -s "SHOW TEMP" | awk '/SYSTEM_BD/{print $3}' | awk -F"C" '{print $1}' | awk 'BEGIN{max=0}{if($1>max) max=$1}END{print max}')
    Processor_temp_num=$(hpasmcli -s "SH...
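
The script is built around a file of host, key and value lines that zabbix_sender pushes to the server or proxy. As a rough illustration of that idea, and not the author's actual code, the sketch below picks the hottest sensor for a location from SHOW TEMP style text and prints it in the whitespace-separated hostname, key, value form that zabbix_sender reads through its -i option. The host name monitor and the hp.temp.* keys are invented for the example; the real key names would have to match the trapper items defined on the Zabbix server.

```shell
#!/bin/sh
# Sketch only. The host name "monitor" and the hp.temp.* item keys are
# made-up examples, not names taken from the original script.

# Print the highest Celsius reading among sensors whose location matches $1.
# Reads "hpasmcli -s 'SHOW TEMP'" style rows on stdin; sensors that report "-"
# instead of a temperature are skipped. Prints 0 when nothing matches.
max_temp() {
    awk -v loc="$1" '
        $2 ~ loc && $3 ~ /^[0-9]+C/ {
            split($3, t, "C")                  # "40C/104F" -> t[1] = "40"
            if (t[1] + 0 > max) max = t[1] + 0 # force a numeric comparison
        }
        END { print max + 0 }'
}

# Print one line of zabbix_sender input: <hostname> <key> <value>
item_line() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# A few sensor rows taken from the SHOW TEMP output shown in the article.
sample='#2 CPU#1 40C/104F 82C/179F
#6 MEMORY_BD - 87C/188F
#12 POWER_SUPPLY_BAY 33C/91F 59C/138F
#13 POWER_SUPPLY_BAY 47C/116F 73C/163F'

item_line monitor hp.temp.cpu "$(printf '%s\n' "$sample" | max_temp CPU)"
item_line monitor hp.temp.power "$(printf '%s\n' "$sample" | max_temp POWER_SUPPLY_BAY)"

# On a real host these lines would be redirected into $log_file and sent with:
#   $zabbix_sender -z "$zabbix_server" -i "$log_file"
```

Run as is, this prints monitor hp.temp.cpu 40 and monitor hp.temp.power 47. Doing the maximum in a single awk pass with an explicit numeric comparison avoids the string-comparison pitfalls of chaining several awk calls.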