CentOS Cluster-DRBD Setup

As part of a MySQL Cluster setup, I recently setup a 2-node web cluster using CentOS’s native cluster software suite with a twist. The web root was mounted on a DRBD partition in place of periodic file sync’ing. This article focuses on the cluster setup and does not cover the DRBD setup/configuration. Let’s go:

Install “Cluster Storage” group using yum:

[root@host]# yum installgroup “Cluster Storage”

Edit /etc/cluster/cluster.conf: ======================================================
<?xml version=”1.0″?>
<cluster name=”drbd_srv” config_version=”1″>
<cman two_node=”1″ expected_votes=”1″>
<clusternode name=”WEB_Node1″ votes=”1″ nodeid=”1″>
<method name=”single”>
<device name=”human” ipaddr=”″/>
<clusternode name=”WEB_Node2″ votes=”1″ nodeid=”2″>
<method name=”single”>
<device name=”human” ipaddr=”″/>
<fence_device name=”human” agent=”fence_manual”/>

Start the cluster:

[root@host]# service cman start (on both nodes)

To verify proper startup:

[root@host]# cman_tool nodes

Should show:

Node  Sts   Inc   Joined               Name
1   M     16   2009-08-11 22:13:27  WEB_Node1
2   M     24   2009-08-11 22:13:34  WEB_Node2

Status ‘M’ means normal, ‘X’ would mean there is a problem

Edit /etc/lvm/lvm.conf


locking_type = 1


locking_type = 3

and change:

filter = [“a/.*/”]


filter = [ “a|drbd.*|”, “r|.*|” ]

Start clvmd:

[root@host]# service clvmd start (on both nodes)

Set cman and clvmd to start on bootup on both nodes:

[root@host]# chkconfig –level 345 cman on
[root@host]# chkconfig –level 345 clvmd on

Run vgscan:

[root@host]# vgscan

Create a new PV (physical volume) using the drbd block device

[root@host]# pvcreate /dev/drbd1

Create a new VG (volume group) using the drbd block device

[root@host]# vgcreate name-it-something /dev/drbd1 (VolGroup01, for example)

Now when you run vgdisplay, you should see:

— Volume group —
VG Name             VolGroup01
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  1
VG Access             read/write
VG Status             resizable
Clustered             yes
Shared                no
MAX LV                0
Cur LV                0
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               114.81 GB
PE Size               4.00 MB
Total PE              29391
Alloc PE / Size       0 / 0
Free  PE / Size       29391 / 114.81 GB
VG UUID               k9TBBF-xdg7-as4a-2F0c-XGTv-M2Wh-CabVXZ

Notice the line that reads “Clustered yes”

Create a LV (logical volume) in the PV that we just created.
**Make sure your drbd nodes are both in primary roles before doing this.

If there is a “Split-Brain detected, dropping connection!” entry in /var/log/messages, then a manual split-brain recovery is necessary.

To manually recover from a split-brain scenario (on the split-brain machine):

[root@host]# drbdadm secondary r0
[root@host]# drbdadm — –discard-my-data connect r0

On verified that both nodes are in primary mode:

On node2:

[root@host]# service clvmd restart

On node1:

[root@host]# lvcreate -l 100%FREE -n gfs VolGroup01

Format the LV with gfs:

[root@host]# mkfs.gfs -p lock_dlm -t drbd_srv:www -j 2 /dev/VolGroup01/gfs (drbd_srv is the cluster name, www is a locking table name)

Start the gfs service (on both nodes):

[root@host]# service gfs start

Set the gfs service to start automatically (on both nodes):

[root@host]# chkconfig –level 345 gfs on

Mount the filesystem (on both nodes):

[root@host]# mount -t gfs /dev/VolGroup01/gfs /srv

Modify the startup sequence as follows:

1) network

2) drbd  (S15)
3) cman  (S21)
4) clvmd (S24)
5) gfs   (S26)

Remove the soft link for shutting down openais (K20openais in runlevel 3 for our install) because shutting down cman does the same.

Modify the shutdown sequence as follows (in reverse order as startup sequence):

1) gfs   (K21)
2) clvmd (K22)
3) cman  (K23)
4) drbd  (K24)

5) network

Reboot each server to test availability of gfs filesystem during reboot.


If one node’s drbd status is showing:

version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17

1: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r—
ns:0 nr:0 dw:40 dr:845 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 oos:12288

and the other node is showing:

version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-10-03 11:30:17

1: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r—
ns:0 nr:0 dw:56 dr:645 al:3 bm:2 lo:0 pe:0 ua:0 ap:0 oos:8200

Also, the node with cs:WFConnection in the status field should have the gfs filesystem mounted.

Then, there is a drbd sync issue.  To solve this, just restart the gfs cluster services on the node with “cs:Standalone” in the status field by:

1) service clvmd stop
2) service cman stop
3) service drbd restart

Verify you see Primary/Primary in the st: section of the drbd status (cat /proc/drbd)

4) service cman start
5) service clvmd start
6) service gfs start