HACMP Daily Operations Manual


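Stopping HACMP in forced mode:

HACMP can be stopped in three ways: graceful (normal), takeover (manual switchover), and force (forced).

Much of the maintenance work below requires a forced stop of HACMP. A forced stop does not release the resource groups, so the IP addresses, file systems, and other resources are unaffected; only HACMP itself stops and the application keeps serving. This makes it possible to check and change HACMP online.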

[host1][root][/]>smitty clstop

Stop Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

* Stop now, on system restart or both           now
  Stop Cluster Services on these nodes          [host1]
  BROADCAST cluster shutdown?                   true
* Select an Action on Resource Groups           Unmanage Resource Groups

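This is normally done on every node.

Starting HACMP after a forced stop:

After the HACMP configuration has been changed, in most cases the cluster has to be started with resources reacquired so that the new configuration takes effect.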

[host1][root][/]>smitty clstart

Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* Start now, on system restart or both          now
  Start Cluster Services on these nodes         [bgbcb04]
  BROADCAST message at startup?                 true
  Startup Cluster Information Daemon?           false
  Reacquire resources after forced down?        true

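Routine checks and handling

Regular checks and fixes are essential to keeping HACMP well maintained.

Unless stated otherwise, the checks and fixes below do not require system downtime; at most the application has to be stopped, and users are not affected.

Still, check the current state carefully before carrying any of them out.

clverify check

This check verifies the synchronization state of most of the HACMP configuration, including LVM, and is the main way to check whether HACMP is in sync.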

smitty clverify -> Verify HACMP Configuration

Verify Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
  Base HACMP Verification Methods               both
  (Cluster topology, resources, both, none)
  Custom Defined Verification Methods           []
  Error Count                                   []
  Log File to store output                      []
  Verify changes only?                          [No]
  Logging                                       [Standard]

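Just press Enter.

The result of the check should be OK.

If inconsistencies are found, handle them according to their type.

Errors that do not involve LVM can usually be resolved without stopping the application, using the following steps:

1. Stop HACMP services in forced mode.

Stop HACMP services on host2 in the same way.

2. Correct and synchronize only the problems that were found:

smitty hacmp -> Extended Configuration -> Extended Verification and Synchronization

Because HACMP services are stopped at this point, the automatic correction and forced synchronization options can be enabled.

LVM errors are usually caused by changing a file system, LV, or VG on one node only, without using HACMP's C-SPOC feature, which leaves the VG timestamps inconsistent.

In that case, even if the other node is corrected by hand (usually not possible, because the application is using the VG), synchronization still reports failed, whichever automatic correction option is chosen.

The only remedy is then to stop the application and repair the VG.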

cldump check:

cldump takes a snapshot of the current HACMP state; confirm that it shows UP and STABLE.

[host1][root][/]>/usr/sbin/cluster/utilities/cldump
____________________________________________________________________________
Cluster Name: test_cluster
Cluster State:
Cluster Substate: STABLE
_____________________________________________________________________________

Node Name: host1                State:

  Network Name: net_ether_01       State:

    Address: 10.2.100.1      Label: host1_l1_svc1     State:
    Address: 10.2.101.1      Label: host1_l1_svc2     State:
    Address: 10.2.1.1        Label: host1_l1_boot1    State:
    Address: 10.2.11.1       Label: host1_l1_boot2    State:

  Network Name: net_ether_02       State:

    Address: 10.2.200.1      Label: host1_l2_svc      State:
    Address: 10.2.2.1        Label: host1_l2_boot1    State:
    Address: 10.2.12.1       Label: host1_l2_boot2    State:

Node Name: host2                State:

  Network Name: net_ether_01       State:

    Address: 10.2.100.2      Label: host2_l1_svc1     State:
    Address: 10.2.101.2      Label: host2_l1_svc2     State:
    Address: 10.2.1.2        Label: host2_l1_boot1    State:
    Address: 10.2.11.2       Label: host2_l1_boot2    State:

  Network Name: net_ether_02       State:

    Address: 10.2.200.2      Label: host2_l2_svc      State:
    Address: 10.2.2.2        Label: host2_l2_boot1    State:
    Address: 10.2.12.2       Label: host2_l2_boot2    State:

Cluster Name: test_cluster

Resource Group Name: host1_RG
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Fallback To Higher Priority Node In The List
Site Policy: ignore
Node                         State
------------------------------
host1                        ONLINE
host2                        OFFLINE

Resource Group Name: host2_RG
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Fallback To Higher Priority Node In The List
Site Policy: ignore
Node                         State
------------------------------
host2                        ONLINE
host1                        OFFLINE
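
The UP/STABLE confirmation can also be scripted. The following is a minimal sketch, not part of HACMP: the function name `cluster_is_stable` is hypothetical, and it assumes the cldump output contains lines of the form `Cluster State: UP` and `Cluster Substate: STABLE`.

```shell
# Hypothetical helper: reads cldump output on stdin and succeeds only when
# the cluster state is UP and the substate is STABLE.
# Assumed line format: "Cluster State: UP" / "Cluster Substate: STABLE".
cluster_is_stable() {
    awk '
        /Cluster State:/    { state = $NF }
        /Cluster Substate:/ { substate = $NF }
        END { exit !(state == "UP" && substate == "STABLE") }
    '
}

# Example use on a cluster node:
#   /usr/sbin/cluster/utilities/cldump | cluster_is_stable && echo "cluster OK"
```

The exit status makes the check easy to call from a monitoring script; any other state, or a missing line, counts as a failure.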

clstat check

clstat monitors the HACMP state in real time; confirm that it shows UP and STABLE.

[host1][root][/]>/usr/sbin/cluster/clstat

Note: if clstat does not respond, start clinfo.

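/etc/hosts environment check

Normally /etc/hosts should be identical on two mutual-takeover nodes. In an active/standby setup, the standby node may carry some extra IP addresses and host names.

Comparing the two files shows whether there is a problem.

[host1][root][/]>rsh host2 cat /etc/hosts >/tmp/host2_hosts
[host1][root][/]>diff /etc/hosts /tmp/host2_hosts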
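
A plain diff also reports differences in comments, blank lines, and spacing. As a sketch of one way to compare only the actual entries, the hypothetical helpers `normalize_hosts` and `compare_hosts` below strip that noise first; they are illustrative and not part of HACMP or AIX.

```shell
# Hypothetical helper: print the entries of a hosts file with comments and
# blank lines removed, tabs turned into single spaces, and lines sorted.
normalize_hosts() {
    sed 's/#.*//' "$1" | tr '\t' ' ' | tr -s ' ' |
        sed -e 's/^ //' -e 's/ $//' -e '/^$/d' | sort
}

# compare_hosts FILE_A FILE_B: exit status 0 when the entries are identical.
compare_hosts() {
    a=$(normalize_hosts "$1")
    b=$(normalize_hosts "$2")
    [ "$a" = "$b" ]
}

# Example use, after copying the peer's file to /tmp/host2_hosts:
#   compare_hosts /etc/hosts /tmp/host2_hosts || echo "hosts files differ"
```

When the check fails, running diff on the two normalized outputs shows exactly which entries differ.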

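Script check

Keep the following in mind:

i. When the application changes, update the scripts promptly, keep the scripts on both nodes in sync, and request a test window promptly.

ii. This requires maintenance staff to communicate closely with the application team; every change to the runtime environment must be carried out through the maintenance staff.

iii. Maintenance staff should make a habit of starting and stopping the application with these scripts and avoid doing it by hand wherever possible.

iv. Compare the script listings on the two nodes:

[host1][root][/home/scripts]>rsh host2 "cd /home/scripts; ls -l host1 host2 comm" >/tmp/host2_scripts
[host1][root][/home/scripts]>ls -l host1 host2 comm >/tmp/host1_scripts
[host1][root][/]>diff /tmp/host1_scripts /tmp/host2_scripts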
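
An ls -l listing includes modification times and owners, which can differ even when the script contents are the same. As a sketch of a content-based comparison, the hypothetical helper `script_manifest` below lists a checksum per file using the standard cksum utility; the directory layout is taken from the commands above.

```shell
# Hypothetical helper: print "checksum size path" for every regular file
# under a directory, sorted by path, so two nodes can be compared by content.
script_manifest() {
    ( cd "$1" && find . -type f -exec cksum {} \; | sort -k 3 )
}

# Example use: run on each node, then diff the two results.
#   script_manifest /home/scripts > /tmp/host1_scripts
```

Two manifests that diff cleanly mean the same set of files with the same contents on both nodes.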

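User check

Normally the users that HA relies on should be set up identically on two mutual-takeover nodes. In an active/standby setup, the standby node may have some extra users.

Comparing the two nodes shows whether there is a problem.

[host1][root][/]>rsh host2 lsuser -f orarun,orarunc,tuxrun,bsx1,xcom >/tmp/host2_users
[host1][root][/]>lsuser -f orarun,orarunc,tuxrun,bsx1,xcom >/tmp/host1_users
[host1][root][/]>diff /tmp/host1_users /tmp/host2_users

Note: the two sides will always differ in some details, such as the last login time; it is enough that the main attributes match.

Also compare the .profile files and the user environment on both sides.

[host1][root][/]>rsh host2 su - orarun -c set >/tmp/b
[host1][root][/]>su - orarun -c set >/tmp/a
[host1][root][/]>diff /tmp/a /tmp/b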
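
Login-history attributes such as the last login time are expected to differ between nodes and only clutter the diff. The sketch below filters them out of lsuser -f output before comparing. The function name `strip_login_history` is hypothetical, and the attribute list is an assumption based on the usual AIX login-history attribute names; adjust it to what your lsuser output actually shows.

```shell
# Hypothetical helper: reads `lsuser -f` stanza output on stdin and drops
# attributes that record login history (assumed attribute names).
strip_login_history() {
    grep -v -E '^[[:space:]]*(time_last_login|time_last_unsuccessful_login|tty_last_login|tty_last_unsuccessful_login|host_last_login|host_last_unsuccessful_login|unsuccessful_login_count)='
}

# Example use on each node before running diff:
#   lsuser -f orarun,orarunc,tuxrun,bsx1,xcom | strip_login_history > /tmp/host1_users
```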

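tty heartbeat check

Once HACMP is started it holds the heartbeat device permanently, so HACMP has to be force-stopped for this check.

∙ Check the tty speed

Confirm that the speed does not exceed 9600.

[host1][root][/]>stty -a </dev/tty0
[host2][root][/]>cat /etc/hosts >/dev/tty0

host1 displays:

speed 9600 baud; 0 rows; 0 columns;
eucw 1:0, scrw 1:…

∙ Check the connection and configuration

[host1][root][/]>cat /etc/hosts >/dev/tty0
[host2][root][/]>cat </dev/tty0

The contents of /etc/hosts from host1 should appear on host2.

Repeat the test in the opposite direction.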
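
The speed check can be scripted as well. This is a minimal sketch with a hypothetical function name, `tty_speed_ok`; it assumes the stty -a output starts its speed line with `speed <number> baud`, as in the sample shown above.

```shell
# Hypothetical helper: reads `stty -a` output on stdin and succeeds only if
# a speed line is present and the speed is at most 9600 baud.
tty_speed_ok() {
    awk '
        $1 == "speed" { speed = $2 + 0; found = 1 }
        END { exit !(found && speed <= 9600) }
    '
}

# Example use (with HACMP force-stopped, as described above):
#   stty -a < /dev/tty0 | tty_speed_ok || echo "tty speed too high or unreadable"
```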

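errpt check

With all the checks above in place, do not neglect errpt, the log we look at most often: some of its errors deserve attention. HACMP adds the following line to crontab:

0 0 * * * /usr/es/sbin/cluster/utilities/clcycle 1>/dev/null 2>/dev/null # HACMP for AIX Logfile rotation

That is, the system automatically runs an HACMP check at midnight every day, and any problem it finds shows up in errpt.

Errors can also be logged at other times during operation, not only by the HACMP check. Most are caused by heartbeat connection problems or by load so high that the HACMP processes cannot keep up; they deserve attention and case-by-case analysis.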
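
When errpt is long, it helps to look at the permanent errors first. The sketch below assumes the standard errpt summary layout, where the third column (T) holds the error type and P marks a permanent error; the function name `errpt_permanent` is hypothetical.

```shell
# Hypothetical helper: reads the errpt summary on stdin and prints only the
# entries whose type column (third field, "T") is P, i.e. permanent errors.
errpt_permanent() {
    awk '$3 == "P"'
}

# Example use:
#   errpt | errpt_permanent
```

This is only a first filter; temporary and informational entries around the same timestamp often explain what led up to a permanent error.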

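Changes and how to carry them out

Situations that arise during maintenance are far more varied than those met during integration and rollout, and even the Redbooks cannot cover them all.

Only the common cases are described here. For more complex or rarer cases, consult the Redbooks; if all else fails, planning downtime and reconfiguring from scratch is a crude but often quick way to solve the problem.

In principle the changes here aim to avoid downtime. In practice, although HACMP partly supports DARE (dynamic reconfiguration) and most operations can be completed under a forced stop, we still prefer to make changes during downtime when circumstances allow.

I do not particularly recommend DARE, because misuse can leave the cluster in an uncontrollable state, which is the greater risk.

My usual approach is to force-stop HACMP first, carry out the operations below, synchronize and verify at the end, and then start HACMP again.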

Volume group change: adding a disk to a VG in use:

Note: the PVID must be recognized first; otherwise the disk will be missing or will not behave correctly.

1. Run cfgmgr on every node of the cluster and set the PVID

[host1][root][/]>cfgmgr
[host1][root][/]>lspv
….
hdisk2          00c1eedf6ddb9f5e                    host1vg
hdisk3          00c1eedffa577b0e                    host2vg
hdisk4          none                                none
[host1][root][/]>chdev -l hdisk4 -a pv=yes
[host1][root][/]>lspv
….
hdisk4          00c1eedffc677bfe                    none

Do the same on host2.

2. Use C-SPOC to add the disk to host2vg:

smitty hacmp -> System Management (C-SPOC) -> HACMP Logical Volume Management
-> Shared Volume Groups -> Set Characteristics of a Shared Volume Group
-> Add a Volume to a Shared Volume Group, then select the node, VG, and disk to add

  Resource Group Name                           host2_RG
  VOLUME GROUP name                             host2vg
  Reference node                                host2
  VOLUME names                                  hdisk4

When this completes, both nodes show:

hdisk3          00c1eedffa577b0e                    host2vg
hdisk4          00c1eedffc677bfe                    host2vg
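
To spot disks that still lack a PVID before going into C-SPOC, the lspv output can be filtered. This is a minimal sketch; the function name `disks_without_pvid` is hypothetical, and it assumes the lspv column layout shown above (disk name, PVID, volume group).

```shell
# Hypothetical helper: reads `lspv` output on stdin and prints the names of
# disks whose PVID column (second field) is "none".
disks_without_pvid() {
    awk '$2 == "none" { print $1 }'
}

# Example use: list the chdev commands that would assign the missing PVIDs,
# without running them.
#   for d in $(lspv | disks_without_pvid); do
#       echo "chdev -l $d -a pv=yes"
#   done
```

Printing the commands instead of running them leaves room to confirm that each listed disk really is the new shared disk.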

LV changes

Few LV attributes can be changed; those that can are reached as follows:

smitty hacmp -> System Management (C-SPOC) -> HACMP Logical Volume Management
-> Show Characteristics of a Shared Logical Volume -> Shared Logical Volumes
-> Change a Shared Logical Volume, then select the LV

* Resource Group Name                           host2_RG
  MAXIMUM NUMBER of PHYSICAL VOLUMES            [8]
                                                [/ora10runc]
  MAXIMUM NUMBER of LOGICAL PARTITIONS          [512]

Increasing the size of a raw device:

smitty hacmp -> System Management (C-SPOC) -> HACMP Logical Volume Management
-> Shared Logical Volumes -> Set Characteristics of a Shared Logical Volume
-> Increase the Size of a Shared Logical Volume

                                                [Entry Fields]
  Resource Group Name                           rac2_RG
  LOGICAL VOLUME name                           XWFTPlv
  Reference node                                rac2
* Number of ADDITIONAL logical partitions       [100]
  PHYSICAL VOLUME names                         hdisk3
  POSITION on physical volume                   outer_middle
  RANGE of physical volumes                     minimum
  MAXIMUM NUMBER of PHYSICAL VOLUMES            []
    to use for allocation
  Allocate each logical partition copy          superstrict
    on a SEPARATE physical volume?
  File containing ALLOCATION MAP                []
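
The "Number of ADDITIONAL logical partitions" field counts partitions, not megabytes, and each logical partition is one physical partition in size (the PP SIZE reported by lsvg for the volume group). A small sketch of the conversion, with a hypothetical function name and an assumed PP size of 64 MB in the example:

```shell
# lps_for_mb SIZE_MB PP_SIZE_MB
# Hypothetical helper: number of logical partitions needed to add SIZE_MB
# megabytes when each partition is PP_SIZE_MB megabytes, rounded up.
lps_for_mb() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: with an assumed PP size of 64 MB, adding 6400 MB takes
#   lps_for_mb 6400 64
# partitions, which is the value to enter in the field.
```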

File system changes

smitty hacmp -> System Management (C-SPOC) -> HACMP Logical Volume Management
-> Shared File Systems -> Enhanced Journaled File Systems
-> Change / Show Characteristics of a Shared Enhanced Journaled File System

  Resource Group Name                           bg595b02_RG
  File system name                              /cube
  NEW mount point                               [/cube]
  SIZE of file system                           [6291456]
  Mount GROUP                                   []
  PERMISSIONS                                   read/write
  Mount OPTIONS                                 []
  Start Disk Accounting?
  Block Size (bytes)                            4096
  Inline Log?
  Inline Log size (MBytes)                      0
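
The SIZE field is easy to misread. Assuming it is expressed in 512-byte blocks (the usual default unit on this kind of screen; the screen above does not show a unit field, so treat this as an assumption), the value 6291456 corresponds to 3072 MB. The hypothetical helpers below do the conversion in both directions.

```shell
# blocks_to_mb BLOCKS: convert a size in 512-byte blocks to megabytes
# (2048 blocks of 512 bytes make one megabyte).
blocks_to_mb() {
    echo $(( $1 / 2048 ))
}

# mb_to_blocks MB: the inverse, for filling in the SIZE field.
mb_to_blocks() {
    echo $(( $1 * 2048 ))
}

# Example: growing a 3072 MB file system to 4096 MB means entering
#   mb_to_blocks 4096
# as the new SIZE.
```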

Service IP changes

Adding a service IP

1. Edit /etc/hosts and add entries for the new service IP labels

2. Add the new service IP in HACMP

smitty hacmp -> Extended Configuration -> HACMP Extended Resources Configuration
-> Configure HACMP Service IP Labels/Addresses
-> Add a Service IP Label/Address -> Configurable on Multiple Nodes, then select the network

Add a Service IP Label/Address configurable on Multiple Nodes (extended)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

* IP Label/Address                              host1_svc2
* Network Name                                  net_ether_01
  Alternate HW Address to accompany IP Label/Address  []

Add host2_svc2 in the same way.

3. Update the resource group

smitty hacmp -> Extended Configuration -> Extended Resource Configuration
-> HACMP Extended Resource Group Configuration
-> Change/Show Resources and Attributes for a Resource Group
-> Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
  Resource Group Name                           eai1d0_RG
  Participating Nodes (Default Node Priority)   bgbcb11 bgbcb04
  Startup Policy                                Online On Home Node Only
  Fallover Policy                               Fallover To Next Priority Node In The List
  Fallback Policy                               Fallback To Higher Priority Node In The List
  Fallback Timer Policy (empty is immediate)    []
  Service IP Labels/Addresses                   [host1_svc host1_svc2]

4. Synchronize HACMP

Synchronization is required; see the section "Checking and synchronizing the HACMP configuration" in Part 2.

5. Start HACMP

Be sure to set the startup option so that resources are reacquired at startup; this is what brings the new service IP into effect.

netstat -in then shows that the address is active.
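
The final netstat -in check can be scripted. The sketch below assumes the address appears in the fourth column (Address) of the netstat -in output; the function name `ip_is_configured` is hypothetical, and the address in the example is only a placeholder for the real new service IP.

```shell
# ip_is_configured IP
# Hypothetical helper: reads `netstat -in` output on stdin and succeeds if
# IP appears in the Address column (assumed to be the fourth field).
ip_is_configured() {
    awk -v ip="$1" '$4 == ip { found = 1 } END { exit !found }'
}

# Example use (10.2.101.1 stands in for the new service IP):
#   netstat -in | ip_is_configured 10.2.101.1 && echo "service IP is active"
```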

Modifying a service IP

1. Normal
