1 (edited by Basem 2011-06-27 18:33:34)

Topic: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

Hi everyone, this is my first contribution here. I wanted to produce something useful, and I hope it is worthy.

By: Basem Hegazy (Linux System Administrator)

This tutorial shows how to set up high-availability storage with two storage servers (CentOS 5.4) that use GlusterFS. Each storage server will be a mirror of the other, and files in the /var/vmail directory will be replicated automatically across both storage servers. The client (iRedMail) system (also CentOS 5.4) will be able to access the storage as if it were a local filesystem.

I prefer to work through this tutorial first, then install the iRedMail system, and only mount the shared folder after iRedMail is installed.

I do not issue any guarantee that this will work for you!

In this tutorial I use three systems, two servers and a client:

• server1.example.com: IP address 192.168.0.100 (server)
• server2.example.com: IP address 192.168.0.101 (server)
• client1.example.com: IP address 192.168.0.102 (client); the client in our case is also the iRedMail server.

All three systems should be able to resolve the other systems' hostnames. If this cannot be done through DNS, you should edit the /etc/hosts file so that it contains the following lines on all three systems:

vi /etc/hosts

[...]
192.168.0.100 server1.example.com server1
192.168.0.101 server2.example.com server2
192.168.0.102 client1.example.com client1
[...]

(It is also possible to use IP addresses instead of hostnames in the following setup. If you prefer to use IP addresses, you don't have to care about whether the hostnames can be resolved or not.)
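To double-check name resolution before going any further, a small sketch like the following can help. The helper function and the sample file are my own, not part of the official setup; run the real check against /etc/hosts on each box.

```shell
#!/bin/sh
# Hypothetical helper (not part of the official setup): check that a
# hosts-format file contains an IPv4 line mentioning the given hostname.
check_hosts() {
  # $1 = hosts file, $2 = hostname to look for (whole-word match)
  grep -qE "^[0-9.]+[[:space:]]+.*\b$2\b" "$1"
}

# Try it against a sample file with the entries from this tutorial:
cat > /tmp/hosts.sample <<'EOF'
192.168.0.100 server1.example.com server1
192.168.0.101 server2.example.com server2
192.168.0.102 client1.example.com client1
EOF

for h in server1.example.com server2.example.com client1.example.com; do
  check_hosts /tmp/hosts.sample "$h" && echo "$h: ok"
done
```

On the real systems you would call `check_hosts /etc/hosts server1.example.com` (and so on) on all three machines.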

2) Setting Up the GlusterFS Servers for both server1.example.com and server2.example.com:

GlusterFS may not be available as a package (RPM) on some CentOS 5.x distributions, so I will build it from source.

First I install the prerequisites:

yum groupinstall 'Development Tools'
yum groupinstall 'Development Libraries'
yum install libibverbs-devel fuse-devel


Then we download the latest GlusterFS release from http://www.gluster.org/download.php and build it as follows:

cd /tmp
wget http://ftp.gluster.com/pub/gluster/glus … 0.9.tar.gz
tar xvfz glusterfs-2.0.9.tar.gz
cd glusterfs-2.0.9
./configure

At the end of the ./configure command, you should see something like this:

[...]
GlusterFS configure summary
===========================
FUSE client : yes
Infiniband verbs : yes
epoll IO multiplex : yes
Berkeley-DB : yes
libglusterfsclient : yes
argp-standalone : no
[root@server1 glusterfs-2.0.9]#

Then run the make command:

make && make install
ldconfig

Check the GlusterFS version afterwards (should be 2.0.9):

[root@server1 glusterfs-2.0.9]# glusterfs --version

you should see something like:

glusterfs 2.0.9 built on Mar 1 2010 15:34:50
Repository revision: v2.0.9
Copyright (c) 2006-2009 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@server1 glusterfs-2.0.9]#

Next we create a few directories:


mkdir /data/
mkdir /data/export
mkdir /data/export-ns
mkdir /etc/glusterfs

Now we create the GlusterFS server configuration file /etc/glusterfs/glusterfsd.vol, which defines which directory will be exported (/data/export) and which client is allowed to connect (192.168.0.102 = client1.example.com):

vi /etc/glusterfs/glusterfsd.vol

enter the following data:

volume posix
type storage/posix
option directory /data/export
end-volume

volume locks
type features/locks
subvolumes posix
end-volume

volume brick
type performance/io-threads
option thread-count 8
subvolumes locks
end-volume

volume server
type protocol/server
option transport-type tcp
option auth.addr.brick.allow 192.168.0.102
subvolumes brick
end-volume

Please note that it is possible to use wildcards for the IP addresses (like 192.168.*) and that you can specify multiple IP addresses separated by commas (e.g. 192.168.0.102,192.168.0.103).
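For example, the server volume from above could be adjusted like this to also allow a second, hypothetical client at 192.168.0.103 (only the auth.addr line changes):

```
volume server
type protocol/server
option transport-type tcp
option auth.addr.brick.allow 192.168.0.102,192.168.0.103
subvolumes brick
end-volume
```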

Afterwards we create the following symlink:


ln -s /usr/local/sbin/glusterfsd /sbin/glusterfsd

... and then the system startup links for the GlusterFS server and start it:


chkconfig --levels 35 glusterfsd on
/etc/init.d/glusterfsd start

3) Setting Up the GlusterFS Client:
client1.example.com:

GlusterFS may not be available as a package (RPM) on some CentOS 5.x distributions, so I will build it from source.

First I install the prerequisites:


yum groupinstall 'Development Tools'
yum groupinstall 'Development Libraries'
yum install libibverbs-devel fuse-devel

Then we load the fuse kernel module...

modprobe fuse

... And create the file /etc/rc.modules with the following contents so that the fuse kernel module will be loaded automatically whenever the system boots:


vi /etc/rc.modules
modprobe fuse

Then make the file executable:

chmod +x /etc/rc.modules
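After the next reboot you can confirm the module was loaded by checking lsmod output. Here is a small sketch; the helper function and the canned sample output are mine (so the example can run anywhere), while on the real client you would feed it `"$(lsmod)"`:

```shell
#!/bin/sh
# Sketch: report whether a kernel module appears in lsmod-style output.
module_loaded() {
  # $1 = lsmod-style output, $2 = module name (matched against column 1)
  printf '%s\n' "$1" | awk -v m="$2" '$1 == m { found = 1 } END { exit !found }'
}

# Canned sample output, as lsmod would print it with fuse loaded:
sample='Module                  Size  Used by
fuse                   49237  3'

module_loaded "$sample" fuse && echo "fuse loaded"
```

On the real system: `module_loaded "$(lsmod)" fuse && echo "fuse loaded"`.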

Then we download the GlusterFS 2.0.9 sources (please note that this should be the same version as that installed on the server!) and build GlusterFS as follows:


cd /tmp
wget http://ftp.gluster.com/pub/gluster/glus … 0.9.tar.gz
tar xvfz glusterfs-2.0.9.tar.gz
cd glusterfs-2.0.9
./configure

At the end of the ./configure command, you should see something like this:

[...]

GlusterFS configure summary
===========================
FUSE client : yes
Infiniband verbs : yes
epoll IO multiplex : yes
Berkeley-DB : yes
libglusterfsclient : yes
argp-standalone : no

Then run the make command:


make && make install
ldconfig

Check the GlusterFS version afterwards (should be 2.0.9):

[root@client1 glusterfs-2.0.9]# glusterfs --version

you should see something like:

glusterfs 2.0.9 built on Mar 1 2010 15:58:06
Repository revision: v2.0.9
Copyright (c) 2006-2009 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@client1 glusterfs-2.0.9]#

Then we create the following directory:


mkdir /etc/glusterfs

Next we create the file /etc/glusterfs/glusterfs.vol:


vi /etc/glusterfs/glusterfs.vol

volume remote1
type protocol/client
option transport-type tcp
option remote-host server1.example.com
option remote-subvolume brick
end-volume

volume remote2
type protocol/client
option transport-type tcp
option remote-host server2.example.com
option remote-subvolume brick
end-volume

volume replicate
type cluster/replicate
subvolumes remote1 remote2
end-volume

volume writebehind
type performance/write-behind
option window-size 1MB
subvolumes replicate
end-volume

volume cache
type performance/io-cache
option cache-size 512MB
subvolumes writebehind
end-volume

Make sure you use the correct server hostnames or IP addresses in the option remote-host lines!
That's it! Now we can install the iRedMail system before we mount the GlusterFS filesystem to /var/vmail with one of the following two commands:


glusterfs -f /etc/glusterfs/glusterfs.vol /var/vmail

Or:


mount -t glusterfs /etc/glusterfs/glusterfs.vol /var/vmail


You should now see the new share in the output of the mount command:

[root@client1 ~]# mount

you will see:

/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
glusterfs#/etc/glusterfs/glusterfs.vol on /var/vmail type fuse (rw,allow_other,default_permissions,max_read=131072)

[root@client1 ~]#

... And ...

[root@client1 ~]# df -h

you will see:

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
29G 2.2G 25G 9% /
/dev/sda1 99M 13M 82M 14% /boot
tmpfs 187M 0 187M 0% /dev/shm
glusterfs#/etc/glusterfs/glusterfs.vol
28G 2.3G 25G 9% /var/vmail
[root@client1 ~]#

(server1.example.com and server2.example.com each have 28GB of space for the GlusterFS filesystem, but because the data is mirrored, the client doesn't see 56GB (2 x 28GB), but only 28GB.)
Instead of mounting the GlusterFS share manually on the client, you could modify /etc/fstab so that the share gets mounted automatically when the client boots.

Open /etc/fstab:

vi /etc/fstab

... and append the following line:

[...]
/etc/glusterfs/glusterfs.vol /var/vmail glusterfs defaults 0 0
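Before rebooting, it can be worth sanity-checking that the new entry splits into the six fields fstab(5) expects. This is just a rough sketch of my own, not an official tool:

```shell
#!/bin/sh
# Rough sanity check of the fstab entry added above. Per fstab(5) the
# fields are: device  mountpoint  fstype  options  dump  pass
line='/etc/glusterfs/glusterfs.vol /var/vmail glusterfs defaults 0 0'
set -- $line
[ $# -eq 6 ]            && echo "field count ok"
[ "$2" = "/var/vmail" ] && echo "mountpoint ok"
[ "$3" = "glusterfs" ]  && echo "fstype ok"
```

You could also simply run `mount -a` after editing /etc/fstab, which mounts everything listed there and will complain about a broken entry without requiring a reboot.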

To test if your modified /etc/fstab is working, reboot the client:

reboot

After the reboot, you should find the share in the output of:


df -h

... and...


mount

4) Testing

Now let's create some test files on the GlusterFS share:
client1.example.com:


touch /var/vmail/test1
touch /var/vmail/test2

Now let's check the /data/export directory on server1.example.com and server2.example.com. The test1 and test2 files should be present on each node:

server1.example.com and server2.example.com:

[root@server1 ~]# ls -l /data/export

the result is:

total 0
-rw-r--r-- 1 root root 0 2010-02-22 16:50 test1
-rw-r--r-- 1 root root 0 2010-02-22 16:50 test2
[root@server1 ~]#

Now we shut down server1.example.com and add/delete some files on the GlusterFS share on client1.example.com.
server1.example.com:


shutdown -h now

client1.example.com:


touch /var/vmail/test3
touch /var/vmail/test4
rm -f /var/vmail/test2

The changes should be visible in the /data/export directory on server2.example.com:
server2.example.com:

[root@server2 ~]# ls -l /data/export

the result is:


total 0
-rw-r--r-- 1 root root 0 2010-02-22 16:50 test1
-rw-r--r-- 1 root root 0 2010-02-22 16:53 test3
-rw-r--r-- 1 root root 0 2010-02-22 16:53 test4
[root@server2 ~]#

Let's boot server1.example.com again and take a look at the /data/export directory:
server1.example.com:

[root@server1 ~]# ls -l /data/export

the result is:


total 0
-rw-r--r-- 1 root root 0 2010-02-22 16:50 test1
-rw-r--r-- 1 root root 0 2010-02-22 16:50 test2
[root@server1 ~]#

As you see, server1.example.com hasn't noticed the changes that happened while it was down. This is easy to fix: all we need to do is invoke a read command on the GlusterFS share on client1.example.com, e.g.:
client1.example.com:


ls -l /var/vmail

In practice this read happens automatically each time a user accesses his mailbox through the RoundCube webmail. You may even notice that a newly created mailbox isn't replicated right away; don't worry, it will be as soon as the user accesses his webmail.
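If you'd rather force the replication for a whole tree instead of waiting for users to log in, reading every file once is enough. A minimal sketch (the function name is mine; the one-byte read is just the cheapest way to touch each file):

```shell
#!/bin/sh
# Sketch: trigger GlusterFS replicate self-heal by reading one byte of
# every regular file under the given directory (e.g. /var/vmail).
trigger_selfheal() {
  find "$1" -type f -print0 | xargs -0 -r head -c1 > /dev/null 2>&1
}

# Example run against a scratch directory:
mkdir -p /tmp/heal-demo && touch /tmp/heal-demo/test1 /tmp/heal-demo/test3
trigger_selfheal /tmp/heal-demo && echo "walked /tmp/heal-demo"
```

On the client you would call `trigger_selfheal /var/vmail`; on a large maildir tree this can take a while, since each file is opened once.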


[root@client1 ~]# ls -l /var/vmail/

... and the result is:


total 0
-rw-r--r-- 1 root root 0 2010-02-22 16:50 test1
-rw-r--r-- 1 root root 0 2010-02-22 16:53 test3
-rw-r--r-- 1 root root 0 2010-02-22 16:53 test4
[root@client1 ~]#

Now take a look at the /data/export directory on server1.example.com again, and you should see that the changes have been replicated to that node:
server1.example.com:

[root@server1 ~]# ls -l /data/export

the result is:


total 0
-rw-r--r-- 1 root root 0 2010-02-22 16:50 test1
-rw-r--r-- 1 root root 0 2010-02-22 16:53 test3
-rw-r--r-- 1 root root 0 2010-02-22 16:53 test4
[root@server1 ~]#

Thanks and welcome for comments.

Post's attachments

Replication.pdf 264.9 kb, 33 downloads since 2011-06-26 


2

Re: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

Hi,
my 2 cents: instead of compiling on CentOS, use the EPEL repository and:

yum install glusterfs-* -y

If you are on a 64-bit OS, use the RPMs from
http://download.gluster.com/pub/gluster … ST/CentOS/

Or rebuild via the following:

yum install python-ctypes bison flex  libibverbs-devel rpm-build -y
wget http://download.gluster.com/pub/gluster … -1.src.rpm
rpm --rebuild glusterfs-3.2.1-1.src.rpm

Then you'll get a ready-made RPM in
/usr/src/redhat/RPMS/i386/

3

Re: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

Thanks for your efforts. I haven't tried building the GlusterFS RPM before, but I will; that would work fine too. However, compiling GlusterFS takes no time, so I prefer to build from source instead.

4

Re: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

Basem wrote:

Thanks for your efforts. I haven't tried building the GlusterFS RPM before, but I will; that would work fine too. However, compiling GlusterFS takes no time, so I prefer to build from source instead.

Hi, great guide. I have a few questions:

1. It seems you are installing iRedMail on the client only; what happens if that goes down? This setup seems to provide redundancy for the data, and maybe the config files (if those are mirrored across to the other two servers).

2. How would you enable iRedMail redundancy, assuming point 1 is correct? In certain environments the servers need to stay up, especially the MTA.

5

Re: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

Just one thing here: I've played with GlusterFS for a while, and one of my main reasons at the time was an easily expandable file storage system. Sadly, there's a downside: IT'S REALLY SLOW. Mail behaviour means lots of random reads/writes of tiny files, and that is not suited to GlusterFS at all, I'm afraid. It may be fine if you're only running a few mailboxes, but it's not really scalable (in my opinion at least).

6

Re: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

All read/write operations are slow, or just mail related operations?

7

Re: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

All write operations, as it waits for the write to complete on all servers before returning the OK signal. So if you visit a mailbox that has 100 new emails, and these then get marked as current, it has to update 100 files.

Gluster's small-file performance is just not up to the job, I'm afraid.

There are a number of posts around relating to Gluster and small-file performance:

http://comments.gmane.org/gmane.comp.fi … .user/8593

Hence my original post a while back about iRedMail using the mdbox format (a cross between Maildir and mbox) to try to solve some of these small-file issues.

I'm using Gluster just to back up a mail server at the moment, and it's absolutely awful: I am backing up around 2 TB of data, but it takes days (yes, days) as opposed to a few hours on a normal server.

I would not recommend Gluster for any serious mail platform; you're liable to get timeouts in your mail client as soon as you have more than a few emails in your mailbox.

I'm looking at a ZFS filesystem at present, mounted over NFS for mail, using NexentaStor (they do have an OS version as well).

8

Re: High-Availability Maildir Storage With GlusterFS + CentOS 5.x

If anyone is interested, here's a comparison:

/usr/filebench is on the normal disks
/home/filebench is on a gluster mount


Direct HD:

[root@storage-br1 ~]# filebench-1.4.9.1/filebench
Filebench Version 1.4.9.1
IMPORTANT: Virtual address space randomization is enabled on this machine!
It is highly recommended to disable randomization to provide stable Filebench runs.
Echo 0 to /proc/sys/kernel/randomize_va_space file to disable the randomization.
16298: 0.000: Allocated 170MB of shared memory
filebench> load varmail
16298: 3.984: Varmail Version 3.0 personality successfully loaded
16298: 3.984: Usage: set $dir=<dir>
16298: 3.984:        set $meanfilesize=<size>    defaults to 16384
16298: 3.984:        set $nfiles=<value>     defaults to 1000
16298: 3.984:        set $nthreads=<value>   defaults to 16
16298: 3.984:        set $meanappendsize=<value> defaults to 16384
16298: 3.984:        set $iosize=<size>  defaults to 1048576
16298: 3.984:        set $meandirwidth=<size> defaults to 1000000
16298: 3.984:        run runtime (e.g. run 60)
filebench> set $dir=/usr/filebench
filebench> run 60
16298: 16.235: Creating/pre-allocating files and filesets
16298: 16.237: Fileset bigfileset: 1000 files, 0 leafdirs, avg dir width = 1000000, avg dir depth = 0.5, 14.959MB
16298: 16.240: Removed any existing fileset bigfileset in 1 seconds
16298: 16.240: making tree for filset /usr/filebench/bigfileset
16298: 16.240: Creating fileset bigfileset...
16298: 16.290: Preallocated 805 of 1000 of fileset bigfileset in 1 seconds
16298: 16.290: waiting for fileset pre-allocation to finish
16301: 16.298: Starting 1 filereader instances
16302: 16.320: Starting 16 filereaderthread threads
16298: 17.339: Running...
16298: 77.343: Run took 60 seconds...
16298: 77.344: Per-Operation Breakdown
closefile4           2619ops       44ops/s   0.0mb/s      0.0ms/op      164us/op-cpu [0ms - 0ms]
readfile4            2619ops       44ops/s   0.6mb/s     11.7ms/op      561us/op-cpu [0ms - 219ms]
openfile4            2620ops       44ops/s   0.0mb/s      1.1ms/op      153us/op-cpu [0ms - 2391ms]
closefile3           2620ops       44ops/s   0.0mb/s      0.0ms/op       46us/op-cpu [0ms - 0ms]
fsyncfile3           2620ops       44ops/s   0.0mb/s    147.4ms/op     7183us/op-cpu [10ms - 3289ms]
appendfilerand3      2625ops       44ops/s   0.3mb/s      2.9ms/op      408us/op-cpu [0ms - 164ms]
readfile3            2625ops       44ops/s   0.7mb/s      7.2ms/op      423us/op-cpu [0ms - 187ms]
openfile3            2625ops       44ops/s   0.0mb/s      0.1ms/op      122us/op-cpu [0ms - 85ms]
closefile2           2625ops       44ops/s   0.0mb/s      0.0ms/op      107us/op-cpu [0ms - 0ms]
fsyncfile2           2625ops       44ops/s   0.0mb/s    131.8ms/op     4895us/op-cpu [20ms - 441ms]
appendfilerand2      2628ops       44ops/s   0.3mb/s      0.2ms/op      312us/op-cpu [0ms - 187ms]
createfile2          2628ops       44ops/s   0.0mb/s      1.2ms/op      502us/op-cpu [0ms - 854ms]
deletefile1          2628ops       44ops/s   0.0mb/s     48.7ms/op     2002us/op-cpu [0ms - 270ms]
16298: 77.344: IO Summary: 34107 ops, 568.414 ops/s, (87/88 r/w),   2.0mb/s,    760us cpu/op,  88.0ms latency
16298: 77.344: Shutting down processes




Gluster:


closefile4           4094ops       68ops/s   0.0mb/s      0.2ms/op       81us/op-cpu [0ms - 3ms]
readfile4            4094ops       68ops/s   1.0mb/s      0.8ms/op      132us/op-cpu [0ms - 18ms]
openfile4            4094ops       68ops/s   0.0mb/s      2.5ms/op      217us/op-cpu [0ms - 393ms]
closefile3           4094ops       68ops/s   0.0mb/s      0.1ms/op       61us/op-cpu [0ms - 2ms]
fsyncfile3           4094ops       68ops/s   0.0mb/s      6.6ms/op      256us/op-cpu [0ms - 1463ms]
appendfilerand3      4094ops       68ops/s   0.5mb/s      0.2ms/op       88us/op-cpu [0ms - 1ms]
readfile3            4094ops       68ops/s   1.0mb/s      0.7ms/op      134us/op-cpu [0ms - 15ms]
openfile3            4094ops       68ops/s   0.0mb/s      4.2ms/op      215us/op-cpu [0ms - 1612ms]
closefile2           4094ops       68ops/s   0.0mb/s      0.2ms/op       46us/op-cpu [0ms - 2ms]
fsyncfile2           4094ops       68ops/s   0.0mb/s      5.8ms/op      320us/op-cpu [1ms - 758ms]
appendfilerand2      4094ops       68ops/s   0.5mb/s      0.3ms/op       44us/op-cpu [0ms - 2ms]
createfile2          4094ops       68ops/s   0.0mb/s    105.3ms/op     7684us/op-cpu [2ms - 2279ms]
deletefile1          4100ops       68ops/s   0.0mb/s    105.6ms/op     7712us/op-cpu [1ms - 2194ms]
16418: 81.919: IO Summary: 53228 ops, 887.076 ops/s, (136/136 r/w),   3.1mb/s,   1332us cpu/op,  58.2ms latency


Much to my surprise, Gluster overall is a bit faster. But the key operations for mail are creating and deleting files, and if you look at those (createfile2 / deletefile1), it's a HUGE amount slower than the normal filesystem.