High Availability (HA) using Pacemaker

Known issues:

High Availability
=================

1). When pcs cannot stop pacemaker on a node, it does not stop cman/corosync
on the remaining nodes. If one or more pacemaker cluster nodes are powered
off and you use the --all flag with the "pcs cluster stop" command to shut
down pacemaker on all cluster nodes, the command successfully kills all the
pacemaker processes but does NOT kill the corosync processes.

As a result, quorum may be established when it should not be: the cluster can
reach quorum even though pacemaker is not running on a sufficient number of
cluster nodes.

Workaround: To stop pacemaker on all cluster nodes, issue the command
"pcs cluster stop --force" on each active cluster node individually.
Do not use the --all flag if any cluster nodes are unreachable.
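
The workaround above can be sketched as a small shell helper that visits the
nodes one at a time instead of using --all. This is only an illustration under
assumptions: the node names in NODES are hypothetical, password-less ssh to
each node is assumed, and reachability is approximated with a single ping
(Linux iputils options). With DRY_RUN=1, the default in this sketch, the
commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: stop pacemaker node by node, skipping unreachable nodes.
# NODES holds hypothetical host names; replace them with your own.
# DRY_RUN=1 (the default here) only prints the commands.

# Return 0 if the node answers one ping within 2 seconds.
is_reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Run "pcs cluster stop --force" on a single node over ssh.
stop_node() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ssh $1 pcs cluster stop --force"
    else
        ssh -o ConnectTimeout=5 "$1" pcs cluster stop --force
    fi
}

# Visit every node in NODES; the --all flag is never used.
stop_reachable_nodes() {
    for node in ${NODES:-node1 node2 node3}; do
        if is_reachable "$node"; then
            stop_node "$node"
        else
            echo "skipping unreachable node: $node" >&2
        fi
    done
}
```

After sourcing the file, calling stop_reachable_nodes with NODES set to the
real node names and DRY_RUN=0 would run the stop command on each node that
answers the ping. Unreachable nodes are reported on stderr and left alone.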

2). If a resource has failed, a failure message appears when you display the
cluster status. After you resolve the problem with that resource, you can
clear the failure status with the pcs resource cleanup command. This command
resets the resource status and failcount, telling the cluster to forget the
operation history of the resource and re-detect its current state. It also
stops the repetitive log messages about the failed resource operation.

The following command cleans up the resource specified by resource_id:

    pcs resource cleanup resource_id

If you do not specify a resource_id, this command resets the resource status
and failcount for all resources.
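
As a rough illustration of the two forms of the command, the sketch below
wraps them in one shell function. The function name cleanup_resource, the PCS
variable, and the resource name my_vm are assumptions made for this example.
By default PCS is "echo pcs", so the commands are only printed. The
"pcs resource failcount show" step is included so that the fail count can be
inspected before it is reset.

```shell
#!/bin/sh
# Sketch: inspect and clear the failure state of a resource.
# PCS defaults to "echo pcs" so that commands are printed, not executed;
# set PCS=pcs to run them against a real cluster.
PCS="${PCS:-echo pcs}"

# With an argument: show the fail count, then clean up that resource.
# Without an argument: clean up all resources.
cleanup_resource() {
    if [ -n "${1:-}" ]; then
        $PCS resource failcount show "$1"
        $PCS resource cleanup "$1"
    else
        $PCS resource cleanup
    fi
}
```

For example, cleanup_resource my_vm prints the two commands for the
hypothetical resource my_vm, while cleanup_resource with no argument prints
the form that resets every resource.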

3). A temporary interruption of the network interface that corosync uses for
communication may cause resources to be started on multiple nodes. This can
cause data corruption if the resource is a VirtualDomain resource.

Workaround: Set up corosync with a redundant ring. This redundancy reduces
the risk of losing corosync communication.
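
A redundant ring setup might look like the corosync.conf fragment below. It is
a sketch under assumptions, not a tested configuration: it assumes a corosync
version that reads its totem settings from corosync.conf and supports
rrp_mode, and all network and multicast addresses are hypothetical. On
cman-based clusters the rings are typically defined through the cluster
configuration rather than in this file.

```
totem {
    version: 2
    # rrp_mode enables the redundant ring protocol
    # (possible values: none, active, passive)
    rrp_mode: passive

    # ring 0: hypothetical primary cluster network
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }

    # ring 1: hypothetical second, physically separate network
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
        mcastaddr: 239.255.2.1
        mcastport: 5405
    }
}
```

The second ring only helps if it runs over a network path that does not share
hardware with the first. After corosync is restarted, corosync-cfgtool -s can
be used to display the status of each ring.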
