In this tutorial we will set up a highly available server providing iSCSI targets to iSCSI initiators. Should a server become unavailable, services provided by our cluster will continue to be available to client systems.
Our highly available system will resemble the following:
iSCSI server1: node1.home.local   IP address: 10.10.1.251
iSCSI server2: node2.home.local   IP address: 10.10.1.252
iSCSI Virtual IP address: 10.10.1.250
To begin, set up two Ubuntu 9.04 (Jaunty Jackalope) systems. In this guide, the servers will be set up in a virtual environment using KVM-84. Using a virtual environment will allow us to add additional disk devices and NICs as needed.
The following partition scheme will be used for the Operating System installation:
/dev/vda1 -- 10 GB / (primary, jfs, Bootable flag: on)
/dev/vda5 -- 1 GB swap (logical)
After a minimal Ubuntu installation on both servers, we will install the packages required to configure a bonded network interface and, in turn, assign static IP addresses to bond0 on node1 and node2. Using a bonded interface prevents a single point of failure should the client-accessible network fail.
The majority of commands used will require us to employ the use of sudo. Alternatively we can set a password for root or sudo to the root account.
sudo su
Install ifenslave
apt-get -y install ifenslave
Append the following to /etc/modprobe.d/aliases.conf:
alias bond0 bonding
options bond0 mode=0 miimon=100 downdelay=200 updelay=200 max_bonds=2
Modify the network configuration and assign eth0 and eth1 as slaves of bond0.
Example /etc/network/interfaces:
# The loopback network interface
auto lo
iface lo inet loopback

# The interfaces that will be bonded
auto eth0
iface eth0 inet manual

auto eth1
iface eth1 inet manual

# The target-accessible network interface
auto bond0
iface bond0 inet static
        address 10.10.1.251
        netmask 255.255.255.0
        broadcast 10.10.1.255
        network 10.10.1.0
        gateway 10.10.1.1
        up /sbin/ifenslave bond0 eth0
        up /sbin/ifenslave bond0 eth1
We do not need to define eth0 or eth1 in /etc/network/interfaces as they will be brought up when the bond comes up. I have included them for documentation purposes.
We have added a module to be loaded when the system is booted. Either reboot the system, or manually modprobe bonding.
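If you would rather not reboot, loading the driver and bringing the bond up by hand should look something like this (a sketch; it assumes the modprobe alias and interfaces stanza above are already in place):

modprobe bond0    # the alias pulls in the bonding driver with the options defined above
ifup bond0        # bring up bond0 as defined in /etc/network/interfaces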
Review the current status of our bonded interface.
cat /proc/net/bonding/bond0

Example output:

Ethernet Channel Bonding Driver: v3.3.0 (June 10, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 54:52:00:6d:f7:4d

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 54:52:00:11:36:cf
Please note: A bonded network interface supports multiple modes. In this example, eth0 and eth1 are in a round-robin configuration.
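If you prefer an active-backup pair over round-robin, the modprobe options could instead look like this (mode=1 is active-backup; this guide sticks with mode=0):

alias bond0 bonding
options bond0 mode=1 miimon=100 downdelay=200 updelay=200 max_bonds=2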
Shut down both servers and add the additional devices. We will add disks to hold the DRBD meta data and the data that is mirrored between the two servers. We will also add an isolated network for the two servers to communicate and transfer the DRBD data.
The following partition scheme will be used for the DRBD data:
/dev/vdb1 -- 1 GB unmounted (primary) DRBD meta data
/dev/vdc1 -- 1 GB unmounted (primary) DRBD device used for iSCSI configuration files
/dev/vdd1 -- 10 GB unmounted (primary) DRBD device used as the iSCSI target
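Creating each of those partitions with fdisk would look roughly like the interactive session below (shown for /dev/vdb; repeat for /dev/vdc and /dev/vdd, where your cylinder counts will differ):

fdisk /dev/vdb

Command (m for help): <--- n
Command action
   e   extended
   p   primary partition (1-4)
<--- p
Partition number (1-4): <--- 1
First cylinder (1-2080, default 1): <--- enter
Last cylinder, +cylinders or +size{K,M,G} (1-2080, default 2080): <--- enter
Command (m for help): <--- w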
Sample output from fdisk -l:
Disk /dev/vda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000d570a

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *           1        1244     9992398+  83  Linux
/dev/vda2            1245        1305      489982+   5  Extended
/dev/vda5            1245        1305      489951   82  Linux swap / Solaris

Disk /dev/vdb: 1073 MB, 1073741824 bytes
16 heads, 63 sectors/track, 2080 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier: 0xba6f1cad

   Device Boot      Start         End      Blocks   Id  System
/dev/vdb1               1        2080     1048288+  83  Linux

Disk /dev/vdc: 1073 MB, 1073741824 bytes
16 heads, 63 sectors/track, 2080 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier: 0xdbde4889

   Device Boot      Start         End      Blocks   Id  System
/dev/vdc1               1        2080     1048288+  83  Linux

Disk /dev/vdd: 10.7 GB, 10737418240 bytes
16 heads, 63 sectors/track, 20805 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier: 0xf505afa1

   Device Boot      Start         End      Blocks   Id  System
/dev/vdd1               1       20805    10485688+  83  Linux
The isolated network between the two servers will be:
iSCSI server1: node1-private   IP address: 10.10.2.251
iSCSI server2: node2-private   IP address: 10.10.2.252
We will again bond these two interfaces. If our server is to be highly available, we should eliminate all single points of failure.
Append the following to /etc/modprobe.d/aliases.conf:
alias bond1 bonding
options bond1 mode=0 miimon=100 downdelay=200 updelay=200
Example /etc/network/interfaces:
# The loopback network interface
auto lo
iface lo inet loopback

# The interfaces that will be bonded
auto eth0
iface eth0 inet manual

auto eth1
iface eth1 inet manual

auto eth2
iface eth2 inet manual

auto eth3
iface eth3 inet manual

# The initiator-accessible network interface
auto bond0
iface bond0 inet static
        address 10.10.1.251
        netmask 255.255.255.0
        broadcast 10.10.1.255
        network 10.10.1.0
        gateway 10.10.1.1
        up /sbin/ifenslave bond0 eth0
        up /sbin/ifenslave bond0 eth1

# The isolated network interface
auto bond1
iface bond1 inet static
        address 10.10.2.251
        netmask 255.255.255.0
        broadcast 10.10.2.255
        network 10.10.2.0
        up /sbin/ifenslave bond1 eth2
        up /sbin/ifenslave bond1 eth3
Ensure that /etc/hosts on both nodes contains the names and IP addresses of the two servers.
Example /etc/hosts:
127.0.0.1       localhost
10.10.1.251     node1.home.local        node1
10.10.1.252     node2.home.local        node2
10.10.2.251     node1-private
10.10.2.252     node2-private
Install NTP to ensure both servers have the same time.
apt-get -y install ntp
You can verify the time is in sync with the date command.
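For example, run date on both nodes and compare, or ask the NTP daemon about its peers:

[node1]date
[node2]date
[node1]ntpq -p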
At this point, you can either modprobe the second bond, or restart both servers.
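If you choose not to reboot, something like this should bring the second bond up (assuming the modprobe alias and interfaces stanza above are in place):

modprobe bond1
ifup bond1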
Install drbd and heartbeat.
apt-get -y install drbd8-utils heartbeat
As we will be using heartbeat with drbd, we need to change ownership and permissions on several DRBD related files on both servers.
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta
Using the packaged /etc/drbd.conf as an example, create your resource configuration. We will define two resources:
- The drbd device that will contain our iSCSI configuration files
- The drbd device that will become our iSCSI target
Example /etc/drbd.conf:
resource iscsi.config {
        protocol C;

        handlers {
                pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
                pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
                local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
                outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        }

        startup {
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error detach;
        }

        net {
                cram-hmac-alg sha1;
                shared-secret "password";
                after-sb-0pri disconnect;
                after-sb-1pri disconnect;
                after-sb-2pri disconnect;
                rr-conflict disconnect;
        }

        syncer {
                rate 100M;
                verify-alg sha1;
                al-extents 257;
        }

        on node1 {
                device    /dev/drbd0;
                disk      /dev/vdc1;
                address   10.10.2.251:7788;
                meta-disk /dev/vdb1[0];
        }

        on node2 {
                device    /dev/drbd0;
                disk      /dev/vdc1;
                address   10.10.2.252:7788;
                meta-disk /dev/vdb1[0];
        }
}

resource iscsi.target.0 {
        protocol C;

        handlers {
                pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
                pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
                local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
                outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        }

        startup {
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error detach;
        }

        net {
                cram-hmac-alg sha1;
                shared-secret "password";
                after-sb-0pri disconnect;
                after-sb-1pri disconnect;
                after-sb-2pri disconnect;
                rr-conflict disconnect;
        }

        syncer {
                rate 100M;
                verify-alg sha1;
                al-extents 257;
        }

        on node1 {
                device    /dev/drbd1;
                disk      /dev/vdd1;
                address   10.10.2.251:7789;
                meta-disk /dev/vdb1[1];
        }

        on node2 {
                device    /dev/drbd1;
                disk      /dev/vdd1;
                address   10.10.2.252:7789;
                meta-disk /dev/vdb1[1];
        }
}
Duplicate the DRBD configuration to the other server.
scp /etc/drbd.conf root@10.10.1.252:/etc/
Initialize the meta-data disk on both servers.
[node1]drbdadm create-md iscsi.config
[node1]drbdadm create-md iscsi.target.0
[node2]drbdadm create-md iscsi.config
[node2]drbdadm create-md iscsi.target.0
We could have initialized the meta-data disk for both resources with:
[node1]drbdadm create-md all
[node2]drbdadm create-md all
If a reboot was not performed post-installation of drbd, the module for DRBD will not be loaded.
Start the drbd service (which will load the module).
[node1]/etc/init.d/drbd start
[node2]/etc/init.d/drbd start
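A quick check that the module is actually loaded:

[node1]lsmod | grep drbd
[node2]lsmod | grep drbd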
Decide which server will act as a primary for the DRBD device that will contain the iSCSI configuration files and initiate the first full sync between the two servers.
We will execute the following on node1:
drbdadm -- --overwrite-data-of-peer primary iscsi.config
Review the current status of DRBD.
cat /proc/drbd

Example output:

GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by ivoks@ubuntu, 2009-01-17 07:49:56
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---
    ns:761980 nr:0 dw:0 dr:769856 al:0 bm:46 lo:10 pe:228 ua:256 ap:0 ep:1 wo:b oos:293604
        [=============>......] sync'ed: 72.3% (293604/1048292)K
        finish: 0:00:13 speed: 21,984 (19,860) K/sec
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10485692
I prefer to wait for the initial sync to complete before proceeding; however, waiting is not a requirement.
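If you do want to wait, something like this lets you watch the progress until the sync finishes:

watch -n 5 cat /proc/drbd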
Once completed, format /dev/drbd0 and mount it.
[node1]mkfs.jfs /dev/drbd0
[node1]mkdir -p /srv/data
[node1]mount /dev/drbd0 /srv/data
To ensure replication is working correctly, create data on node1 and then switch node2 to be primary.
[node1]dd if=/dev/zero of=/srv/data/test.zeros bs=1M count=100
Switch the Primary DRBD device to node2:
On node1:
[node1]umount /srv/data
[node1]drbdadm secondary iscsi.config

On node2:
[node2]mkdir -p /srv/data
[node2]drbdadm primary iscsi.config
[node2]mount /dev/drbd0 /srv/data
You should now see the 100MB file in /srv/data on node2. We will now delete this file and make node1 the primary DRBD server to ensure replication is working in both directions.
Switch the Primary DRBD device to node1:
On node2:
[node2]rm /srv/data/test.zeros
[node2]umount /srv/data
[node2]drbdadm secondary iscsi.config

On node1:
[node1]drbdadm primary iscsi.config
[node1]mount /dev/drbd0 /srv/data
Performing an ls /srv/data on node1 will verify the file has been removed and that synchronization occurred successfully in both directions.
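For reference, the check on node1 is simply:

[node1]ls -l /srv/data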
Decide which server will act as a primary for the DRBD device that will be the iSCSI target and initiate the first full sync between the two servers.
We will execute the following on node1:
[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.target.0
We could have initiated the full sync for both resources with:
[node1]drbdadm -- --overwrite-data-of-peer primary all
Next we will install the iSCSI target package. The plan is to have heartbeat control the service instead of init, so we will prevent iscsitarget from starting through the normal init routines. We will then place the iSCSI target configuration files on the DRBD device so that both servers have the information available whenever they hold the primary DRBD role.
Install iscsitarget package on node1 and node2.
[node1]apt-get -y install iscsitarget
[node2]apt-get -y install iscsitarget
The ability to run as a daemon is disabled for iscsitarget when first installed.
Enable the ability for iscsitarget to run as a daemon.
[node1]sed -i s/false/true/ /etc/default/iscsitarget
[node2]sed -i s/false/true/ /etc/default/iscsitarget
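After running the sed, /etc/default/iscsitarget should contain something like the following (the exact comments in the file may differ):

ISCSITARGET_ENABLE=true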
Remove the runlevel init scripts for iscsitarget from node1 and node2.
[node1]update-rc.d -f iscsitarget remove
[node2]update-rc.d -f iscsitarget remove
Relocate the iSCSI configuration to the DRBD device.
[node1]mkdir /srv/data/iscsi
[node1]mv /etc/ietd.conf /srv/data/iscsi
[node1]ln -s /srv/data/iscsi/ietd.conf /etc/ietd.conf
[node2]rm /etc/ietd.conf
[node2]ln -s /srv/data/iscsi/ietd.conf /etc/ietd.conf
Define our iSCSI target.
Example /srv/data/iscsi/ietd.conf:
Target iqn.2008-04.local.home:storage.disk.0
        IncomingUser geekshlby secret
        OutgoingUser geekshlby password
        Lun 0 Path=/dev/drbd1,Type=blockio
        Alias disk0
        MaxConnections          1
        InitialR2T              Yes
        ImmediateData           No
        MaxRecvDataSegmentLength 8192
        MaxXmitDataSegmentLength 8192
        MaxBurstLength          262144
        FirstBurstLength        65536
        DefaultTime2Wait        2
        DefaultTime2Retain      20
        MaxOutstandingR2T       8
        DataPDUInOrder          Yes
        DataSequenceInOrder     Yes
        ErrorRecoveryLevel      0
        HeaderDigest            CRC32C,None
        DataDigest              CRC32C,None
        Wthreads                8
Last but not least, configure heartbeat to manage a virtual IP address and fail over the iSCSI target should a node fail.
On node1, define the cluster within /etc/heartbeat/ha.cf.
Example /etc/heartbeat/ha.cf:
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast bond0
bcast bond1
node node1
node node2
On node1, define the authentication mechanism the cluster will use within /etc/heartbeat/authkeys.
Example /etc/heartbeat/authkeys:
auth 3
3 md5 password
Change the permissions of /etc/heartbeat/authkeys.
chmod 600 /etc/heartbeat/authkeys
On node1, define the resources that will run on the cluster within /etc/heartbeat/haresources. We will define the master node for the resource, the Virtual IP address, the file systems used, and the service to start.
Example /etc/heartbeat/haresources:
node1 drbddisk::iscsi.config Filesystem::/dev/drbd0::/srv/data::jfs
node1 IPaddr::10.10.1.250/24/bond0 drbddisk::iscsi.target.0 iscsitarget
Copy the cluster configuration files from node1 to node2.
[node1]scp /etc/heartbeat/ha.cf root@10.10.1.252:/etc/heartbeat/
[node1]scp /etc/heartbeat/authkeys root@10.10.1.252:/etc/heartbeat/
[node1]scp /etc/heartbeat/haresources root@10.10.1.252:/etc/heartbeat/
At this point you can either:
- Unmount /srv/data, make node1 secondary for drbd, and start heartbeat (see the sketch after this list)
- Reboot both servers
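If you take the first option, the steps would look roughly like this (a sketch; heartbeat will then promote node1 and mount the filesystem itself):

[node1]umount /srv/data
[node1]drbdadm secondary iscsi.config
[node1]/etc/init.d/heartbeat start
[node2]/etc/init.d/heartbeat start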
To test connectivity to our new iSCSI target, configure an additional system to be an initiator.
I will use Ubuntu 9.04 (Jaunty Jackalope) for this as well.
Install the iSCSI initiator software.
apt-get -y install open-iscsi
The default configuration does not automatically start iSCSI node communication.
Modify the iSCSI daemon configuration to start up automatically and use the authentication methods we defined on the iSCSI target.
sed -i 's/node.startup = manual/node.startup = automatic\nnode.conn\[0\].startup = automatic/' /etc/iscsi/iscsid.conf
sed -i 's/#node.session.auth.authmethod = CHAP/node.session.auth.authmethod = CHAP/' /etc/iscsi/iscsid.conf
sed -i 's/#node.session.auth.username = username/node.session.auth.username = geekshlby/' /etc/iscsi/iscsid.conf
sed -i 's/#node.session.auth.password = password/node.session.auth.password = secret/' /etc/iscsi/iscsid.conf
sed -i 's/#node.session.auth.username_in = username_in/node.session.auth.username_in = geekshlby/' /etc/iscsi/iscsid.conf
sed -i 's/#node.session.auth.password_in = password_in/node.session.auth.password_in = password/' /etc/iscsi/iscsid.conf
sed -i 's/#discovery.sendtargets.auth.authmethod = CHAP/discovery.sendtargets.auth.authmethod = CHAP/' /etc/iscsi/iscsid.conf
sed -i 's/#discovery.sendtargets.auth.username = username/discovery.sendtargets.auth.username = geekshlby/' /etc/iscsi/iscsid.conf
sed -i 's/#discovery.sendtargets.auth.password = password/discovery.sendtargets.auth.password = secret/' /etc/iscsi/iscsid.conf
sed -i 's/node.session.iscsi.InitialR2T = No/node.session.iscsi.InitialR2T = Yes/' /etc/iscsi/iscsid.conf
sed -i 's/node.session.iscsi.ImmediateData = Yes/node.session.iscsi.ImmediateData = No/' /etc/iscsi/iscsid.conf
Example /etc/iscsi/iscsid.conf:
node.startup = automatic
node.conn[0].startup = automatic
node.session.auth.authmethod = CHAP
node.session.auth.username = geekshlby
node.session.auth.password = secret
node.session.auth.username_in = geekshlby
node.session.auth.password_in = password
discovery.sendtargets.auth.authmethod = CHAP
discovery.sendtargets.auth.username = geekshlby
discovery.sendtargets.auth.password = secret
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.InitialR2T = Yes
node.session.iscsi.ImmediateData = No
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.session.iscsi.FastAbort = Yes
The iSCSI initiator name is contained in /etc/iscsi/initiatorname.iscsi.
Example /etc/iscsi/initiatorname.iscsi:
InitiatorName=iqn.2009-04.local.home:client.01
Restart the iSCSI daemon.
/etc/init.d/open-iscsi restart
We will now instruct the initiator to discover available LUNs on the target.
iscsiadm -m discovery -t st -p 10.10.1.250
Example output:

10.10.1.250:3260,1 iqn.2008-04.local.home:storage.disk.0
Once the available LUNs are discovered, restart the iSCSI initiator daemon, and we should see a new disk.
/etc/init.d/open-iscsi restart
As expected, we now have a new disk: /dev/sda is our new iSCSI block device.
Sample fdisk -l output:
Disk /dev/vda: 4194 MB, 4194304000 bytes
255 heads, 63 sectors/track, 509 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000f3f8b

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *           1         480     3855568+  83  Linux
/dev/vda2             481         509      232942+   5  Extended
/dev/vda5             481         509      232911   82  Linux swap / Solaris

Disk /dev/sda: 10.7 GB, 10737345024 bytes
64 heads, 32 sectors/track, 10239 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Disk identifier: 0xe1db3c07

Disk /dev/sda doesn't contain a valid partition table
Create a partition and file-system on our new iSCSI device.
fdisk /dev/sda

Command (m for help): <--- n
Command action
   e   extended
   p   primary partition (1-4)
<--- p
Partition number (1-4): <--- 1
First cylinder (1-10239, default 1): <--- enter
Last cylinder, +cylinders or +size{K,M,G} (1-10239, default 10239): <--- enter
Command (m for help): <--- w

mkfs.jfs -q /dev/sda1
Create a mount point for the new file-system.
mkdir -p /mnt/iscsi
Update fstab to automatically mount the new filesystem at boot. Modern distros prefer to use the disk's UUID in fstab, though referring to the device by its "old school" name still works as well.
Determine the UUID of our new iSCSI disk and add it to /etc/fstab with:
blkid /dev/sda1 | cut -d' ' -f2 | sed s/\"//g

Example output:

UUID=e227bd05-f102-4c08-ae4f-3dbfade128aa

Add this UUID to fstab:

printf "UUID=e227bd05-f102-4c08-ae4f-3dbfade128aa\t/mnt/iscsi\tjfs\tnoatime\t0\t0\n" >> /etc/fstab
Mount the new iSCSI block device.
mount /mnt/iscsi
Create data on the initiator node and test failover of the target. I prefer using a movie or a sound file, as this helps show latency. Once you have the test data available, play the movie or mp3 and instruct node1 that it is no longer a member of the cluster. This can be done by simply shutting down heartbeat.
[node1]/etc/init.d/heartbeat stop
Once you have tested the latency of the data transfer when node1 fails, start heartbeat on node1; this will in turn move the resources back to node1.
[node1]/etc/init.d/heartbeat start
An alternative test would be to failover the nodes while writing data.
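A rough sketch of that test from the initiator, stopping heartbeat on node1 while the write is still in flight (the filename is arbitrary):

dd if=/dev/zero of=/mnt/iscsi/failover.test bs=1M count=2048 &
[node1]/etc/init.d/heartbeat stop
wait
ls -lh /mnt/iscsi/failover.test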