1 (edited by craig 2014-01-23 18:41:41)

Topic: I/O Rate: Server Overloaded

============ Required information ====
- iRedMail version: 0.8.5
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MySQL
- Linux/BSD distribution name and version: CentOS 6.4
- Related log if you're reporting an issue:
====

Hi there,

First, I'm aware I haven't upgraded to the latest version yet. However, I've had performance issues almost since day one with my installation, so I'm hoping for some suggestions to deal with this.

The problem is that at peak times I see I/O rates of up to about 3,500 blocks per second, and today I hit a peak of 11,000. Usually at least 100 messages sit in the queue for about half an hour; right now it's over 1,000 and climbing, and emails sent two hours ago have still not been delivered. The server *is* up and mail is being processed, but it just can't keep up.

Ultimately I'm looking for a long-term solution, but in the meantime I'd be happy with a short-term fix -- e.g., disabling virus scanning or something like that -- plus a pointer on where I'd do that.

For what it's worth, I don't think this is a RAM issue. The server has 2 GB of RAM, the load right now is only 0.09, swap usage is only 98 MB, and Roundcube loads in a typical/reasonable amount of time. I'm not even going to log into iRedAdmin, though, as those initial database calls to display the dashboard aren't going to help the situation.
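
For reference, these are the sort of standard checks behind those numbers (illustrative commands, not output copied from the server):

    free -m                  # RAM and swap usage, in MB
    uptime                   # load averages
    vmstat 5 5               # si/so columns show swap activity; wa shows I/O wait
    iostat -dxk 5 2          # per-device throughput and utilisation (needs the sysstat package)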

I want to get this posted, and I'll probably think of something else to add shortly.

Thanks in advance for any input anyone can provide.


Craig


2 (edited by craig 2014-01-23 18:51:45)

Re: I/O Rate: Server Overloaded

Output of iotop -oa:

Total DISK READ: 11.36 K/s | Total DISK WRITE: 117.42 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 2628 be/4 mysql        37.47 M     92.05 M  0.00 %  1.05 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
 2224 be/4 mysql      1508.00 K    608.00 K  0.00 %  0.23 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
10391 be/4 mysql         2.91 M    612.00 K  0.00 %  0.17 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
 1326 be/4 root          0.00 B     40.78 M  0.00 %  0.11 % [kjournald]
11168 be/4 mysql      1352.00 K    488.00 K  0.00 %  0.11 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
16385 be/4 postfix       4.00 K     10.52 M  0.00 %  0.10 % cleanup -z -t unix -u
 9571 be/4 mysql       764.00 K    224.00 K  0.00 %  0.07 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
 2946 be/4 mysql       364.00 K   1912.00 K  0.00 %  0.04 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
10255 be/4 apache       24.00 K     16.00 K  0.00 %  0.04 % httpd
18452 be/4 mysql       976.00 K    628.00 K  0.00 %  0.05 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
18288 be/4 postfix       0.00 B     20.95 M  0.00 %  0.04 % cleanup -z -t unix -u
17366 be/4 postfix       0.00 B      8.91 M  0.00 %  0.03 % cleanup -z -t unix -u
14362 be/4 apache       20.00 K     28.00 K  0.00 %  0.02 % httpd
19423 be/4 mysql       788.00 K    204.00 K  0.00 %  0.06 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
13133 be/4 mysql       376.00 K    168.00 K  0.00 %  0.02 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
18282 be/4 postfix       0.00 B     20.44 M  0.00 %  0.02 % cleanup -z -t unix -u
11682 be/4 mysql       864.00 K    220.00 K  0.00 %  0.02 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
23830 be/4 apache        8.00 K     16.00 K  0.00 %  0.01 % httpd
19709 be/4 vmail       304.00 K      0.00 B  0.00 %  0.05 % dovecot/pop3
20070 be/4 mysql       272.00 K    140.00 K  0.00 %  0.12 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
20015 be/4 vmail         3.04 M      0.00 B  0.00 %  0.10 % dovecot/pop3
19990 be/4 postfix       2.39 M     28.00 K  0.00 %  0.08 % smtp -n smtp-amavis -t unix -u -c -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o disable_dns_lookups=yes -o max_use=20
19512 be/4 postfix       0.00 B     10.51 M  0.00 %  0.02 % cleanup -z -t unix -u
17885 be/4 apache      272.00 K     24.00 K  0.00 %  0.01 % httpd
15767 be/4 mysql        16.00 K    172.00 K  0.00 %  0.01 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
 2151 be/4 root          0.00 B      0.00 B  0.00 %  0.01 % [flush-202:0]
 2622 be/4 mysql       372.00 K      0.00 B  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
29510 be/4 mysql         4.00 K    112.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
20016 be/4 mysql        60.00 K     36.00 K  0.00 %  0.03 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
19706 be/4 vmail       380.00 K      4.00 K  0.00 %  0.01 % dovecot/imap
18298 be/4 clam       1488.00 K      0.00 B  0.00 %  0.00 % clamd
19908 be/4 postfix       0.00 B      4.08 M  0.00 %  0.02 % cleanup -z -t unix -u
17630 be/4 root         28.00 K      0.00 B  0.00 %  0.00 % -bash
20440 be/4 apache      132.00 K     16.00 K  0.00 %  0.00 % httpd
20000 be/4 vmail         8.00 K      0.00 B  0.00 %  0.01 % dovecot/imap
 6261 be/4 postfix     336.00 K      0.00 B  0.00 %  0.00 % qmgr -l -t fifo -u
 2055 be/4 mysql        28.00 K      0.00 B  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
 2621 be/4 mysql         0.00 B    596.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
27499 be/4 vmail       136.00 K      8.00 K  0.00 %  0.00 % dovecot/imap
32465 be/4 mysql        36.00 K      0.00 B  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/~=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
18009 be/4 vmail         8.00 K      0.00 B  0.00 %  0.00 % dovecot/imap
 5943 be/4 vmail         4.00 K     12.00 K  0.00 %  0.00 % dovecot/imap
20099 be/4 amavis       80.00 K      0.00 B  0.00 %  0.01 % amavisd (ch11-20099-11)
 6754 be/4 postfix       8.00 K      0.00 B  0.00 %  0.00 % proxymap -t unix -u
20339 be/4 postfix      28.00 K      0.00 B  0.00 %  0.03 % smtp -n smtp-amavis -t unix -u -c -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o disable_dns_lookups=yes -o max_use=20
19757 be/4 vmail         8.00 K      0.00 B  0.00 %  0.00 % dovecot/imap
17889 be/4 apache       52.00 K      8.00 K  0.00 %  0.00 % httpd
17904 be/4 apache       20.00 K     20.00 K  0.00 %  0.00 % httpd
22902 be/4 apache       28.00 K     40.00 K  0.00 %  0.00 % httpd
17469 be/4 root         32.00 K      0.00 B  0.00 %  0.00 % python /usr/bin/iotop -oa
22900 be/4 apache       60.00 K     20.00 K  0.00 %  0.00 % httpd
14364 be/4 apache       16.00 K     12.00 K  0.00 %  0.00 % httpd
 9106 be/4 vmail         4.00 K      0.00 B  0.00 %  0.00 % dovecot/imap
14363 be/4 apache        8.00 K      4.00 K  0.00 %  0.00 % httpd
17881 be/4 apache       44.00 K     12.00 K  0.00 %  0.00 % httpd
17888 be/4 apache      116.00 K     24.00 K  0.00 %  0.00 % httpd
23348 be/4 apache        4.00 K     16.00 K  0.00 %  0.00 % httpd
 5958 be/4 vmail         4.00 K      0.00 B  0.00 %  0.00 % dovecot/imap

3

Re: I/O Rate: Server Overloaded

I've commented out the following line in /etc/postfix/main.cf and things have improved *slightly* in that the mail queue is gradually (very gradually) decreasing:

content_filter = smtp-amavis:[127.0.0.1]:10024
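
(For anyone following along, commenting out that line and reloading amounted to roughly this -- sed is just one way to make the edit:)

    # comment out the content_filter line in main.cf ...
    sed -i 's/^content_filter = smtp-amavis/#&/' /etc/postfix/main.cf
    # ... then tell Postfix to pick up the change
    postfix reload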

However, at this rate it will still take hours to clear the queue, by which time it will be the end of the business day when it would have cleared on its own anyway.

I think if I could temporarily refuse inbound connections on port 25 to give Postfix some breathing room, things would improve more quickly. How can I do that without causing incoming email to be bounced?
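
Something like this is what I have in mind if firewalling is the way to go (untested; assumes iptables manages the firewall). As I understand it, remote MTAs treat a refused connection as a temporary failure and keep retrying, typically for several days, so mail should be delayed rather than bounced:

    # stop accepting new inbound SMTP connections for a while
    iptables -I INPUT -p tcp --dport 25 -j REJECT --reject-with tcp-reset
    # ... later, remove the rule to resume accepting mail
    iptables -D INPUT -p tcp --dport 25 -j REJECT --reject-with tcp-reset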

Any suggestions would be appreciated. Thanks.


Craig

4

Re: I/O Rate: Server Overloaded

*) You can try the command "postqueue -f" to flush the mail queue; Postfix will attempt to deliver all queued emails. Hopefully it will speed up mail delivery.
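
For example (illustrative only):

    # ask Postfix to attempt delivery of everything in the queue, including deferred mail
    postqueue -f
    # then keep an eye on the queue; the last line of the listing shows the totals
    postqueue -p | tail -n 1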

*) After this issue is solved, you can re-enable 'content_filter' in Postfix main.cf but disable spam/virus scanning for outgoing emails by following this tutorial:
http://www.iredmail.org/wiki/index.php? … oing.Mails

May I know which version of the Linux kernel you're running? Do you have the message "Package power limit notification" in any log files under /var/log/? (Please try 'grep'.)

5

Re: I/O Rate: Server Overloaded

Hi Zhang,

Thanks very much for your reply. Much appreciated.

ZhangHuangbin wrote:

*) You can try the command "postqueue -f" to flush the mail queue; Postfix will attempt to deliver all queued emails. Hopefully it will speed up mail delivery.

The queue was already being processed, just so slowly that it couldn't keep up, so I didn't see the point of doing this. Am I missing something?

ZhangHuangbin wrote:

*) After this issue is solved, you can re-enable 'content_filter' in Postfix main.cf but disable spam/virus scanning for outgoing emails by following this tutorial:
http://www.iredmail.org/wiki/index.php? … oing.Mails

Thanks for the pointer to disable spam and virus scanning for outbound emails. I'm of two minds about that, but I suppose if I was going to scan only one direction, inbound would make more sense. Then again, every so often I get an outbreak of compromised client machines, and it's no fun trying to battle that before the anti-spam blacklists notice.

ZhangHuangbin wrote:

May I know which version of the Linux kernel you're running? Do you have the message "Package power limit notification" in any log files under /var/log/? (Please try 'grep'.)

[07:19:39 root@host log]# grep "Package power limit notification" *
[08:54:38 root@host log]# grep -i "power limit" *
[08:56:23 root@host log]# pwd
/var/log
[08:57:39 root@host log]# uname -a
Linux host.example.com 3.9.3-x86_64-linode33 #1 SMP Mon May 20 10:22:57 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[08:57:42 root@host log]#

Currently, about 24 hours after my initial post, I still have "content_filter" disabled in "main.cf" and there is no problem on the server: no mail queued, and I/O looks to be down a bit, but I'll have to check the graphs in another 24 hours to do a decent visual comparison. However, normally there would be at least about 100 mails in the queue at this point with, as I said yesterday, delivery taking about half an hour. Yesterday was an anomaly (some mail was in the queue for over five hours!), but not one that can be tolerated.

Thanks. Happy to hear any thoughts or suggestions you may have.


Craig

6

Re: I/O Rate: Server Overloaded

craig wrote:

3.9.3-x86_64-linode33

Linode uses a development version of the Linux kernel.

Well, it looks like the load is caused by content-based spam/virus scanning. Did you modify any Amavisd/SpamAssassin settings?
How about turning off spam scanning in Amavisd and waiting 24 hours? Let's see whether the load is caused by Amavisd and ClamAV. If things are fine, keep virus scanning turned on and turn spam scanning (SpamAssassin) back on, and see whether it's caused by SpamAssassin.

7

Re: I/O Rate: Server Overloaded

Hi Zhang,

Thanks for your reply. No, I have no record of modifying any Amavisd or SpamAssassin settings from their defaults. Amavisd was new to me, so I had no reason to think I had a better way to configure it, and likewise I've generally been fine with letting SpamAssassin run as-is.

OK, so in order to test one item at a time I'm going to modify the following configuration block in /etc/amavisd/amavisd.conf by un-commenting one line at a time:

    # don't perform spam/virus/header check.
    #bypass_spam_checks_maps => [1],
    #bypass_virus_checks_maps => [1],
    #bypass_header_checks_maps => [1],

Right?
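
And to confirm the procedure: I assume each change needs an Amavisd restart to take effect (service name assumed from a stock CentOS 6 install):

    # restart amavisd so the edited policy bank is re-read
    service amavisd restart
    # watch the mail log to confirm it came back up cleanly
    tail -f /var/log/maillog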

And because this is in the "MYUSERS" section this will apply only to outgoing email, correct? If I want to try the same for incoming email, where do I find that configuration?

And since it's the weekend now, I'll need to wait until Monday because the traffic goes way down on the weekends and isn't a problem.


Craig

8

Re: I/O Rate: Server Overloaded

Oh, one other thing. I see the following section in /etc/amavisd/amavisd.conf:

    # notify administrator of locally originating malware
    virus_admin_maps => ["root\@$mydomain"],
    spam_admin_maps  => ["root\@$mydomain"],

However, although I see counts of caught viruses in the dashboard (except now; see below), I've never received an email notification, and I've verified that the address works.

And another thing: I've just noticed that all the "Statistics Of Latest 24 Hours" figures and the top ten senders and recipients in my dashboard are zero! The only change I've made recently was to comment out the "content_filter" line in "/etc/postfix/main.cf", but I can't see that being related. The sent and received logs seem to be available, though. I haven't checked the database directly yet, as I'm not on a secure enough connection, but I will.
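
When I do get to the database, I expect the check to look something like this (untested; the database name and schema are assumed from the stock amavisd-new SQL logging setup that the dashboard reads from):

    # hypothetical query -- adjust the database name to match the local setup
    mysql -u root -p amavisd -e \
        "SELECT COUNT(*) AS msgs_last_24h FROM msgs WHERE time_num > UNIX_TIMESTAMP() - 86400;"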

9

Re: I/O Rate: Server Overloaded

Hi there,

Seeing as this topic was not officially closed and I was the person who started it, I'm going to post here (even though it's a year and a half later) because otherwise I'd just need to reference this thread anyway. I am still having the same problem.

To recap, since I posted this issue the RAM on this server has been doubled (to 8 GB) and it now uses an SSD. For the last eighteen months I have had the following line commented out in /etc/postfix/main.cf:

content_filter = smtp-amavis:[127.0.0.1]:10024

The results of commenting that line out are the following, off the top of my head:

  • Most importantly, the server processes mail quickly and efficiently.

  • The managesieve filters in the webmail do not work, most noticeably the "vacation" auto-responder.

  • There are no statistics in the iRedAdmin-Pro (MySQL) dashboard.

There are possibly other results too, but these are the most obvious ones.

I decided to un-comment the line above today to see what might happen, and within 120 seconds of restarting Postfix the server went from zero mails in the queue to 7. Within ten minutes the queue was at 70. I commented out the line again and the queue was back to zero within eight minutes of restarting Postfix.

I already have my /etc/amavisd/amavisd.conf file set up according to the instructions at "Disable spam virus scanning for outgoing mails".

Zhang (or anyone): What could be the problem here? How can I identify precisely what is causing the problem? I doubt that a few auto-responders are bringing the server to its knees, and if I can identify the actual issue I'd like to get those auto-responders working again.

Thanks in advance for any assistance.


Craig

10

Re: I/O Rate: Server Overloaded

craig wrote:
  • Most importantly, the server processes mail quickly and efficiently.

  • The managesieve filters in the webmail do not work, most noticeably the "vacation" auto-responder.

  • There are no statistics in the iRedAdmin-Pro (MySQL) dashboard.

*) Port 10024 is the Amavisd service; it calls SpamAssassin (SA) + ClamAV for spam/virus scanning, plus DKIM signing and verification, SPF verification, and, optionally, appending disclaimer text. As you may already know, SA + ClamAV take huge system resources (CPU/RAM/disk I/O), so if you comment out 'content_filter =' in Postfix there's no spam/virus scanning, no DKIM/SPF, and no disclaimer text; that's why mail flow is much faster.

*) The managesieve filter is part of Dovecot; it doesn't rely on Amavisd/SA/ClamAV, so this doesn't make sense in this case. I guess you messed up some Dovecot or webmail settings. Just a guess; it has to be figured out with a detailed (Dovecot) debug log.

*) Statistics in iRedAdmin-Pro rely on Amavisd (the @storage_sql_dsn setting), so if you comment out 'content_filter =' in Postfix, no more data lands in the Amavisd database and there are no statistics in iRedAdmin-Pro.

craig wrote:

Zhang (or anyone): What could be the problem here? How can I identify precisely what is causing the problem? I doubt that a few auto-responders are bringing the server to its knees, and if I can identify the actual issue I'd like to get those auto-responders working again.

*) Since you have more RAM, try this:

- Amavisd: process more emails concurrently
http://www.iredmail.org/docs/concurrent.processing.html

*) About the auto-responder issue: Amavisd doesn't affect it, so I can't give any help here without a detailed debug log.
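
Roughly speaking -- this is based on the amavisd-new README rather than the linked page -- increasing concurrency means keeping Amavisd's worker count and Postfix's smtp-amavis client limit in sync, for example:

    # 1) in /etc/amavisd/amavisd.conf (Perl syntax), raise the worker count:
    #        $max_servers = 10;
    # 2) in /etc/postfix/master.cf, put the same number in the maxproc column
    #    of the smtp-amavis entry:
    #        smtp-amavis  unix  -  -  n  -  10  smtp
    grep -n 'max_servers' /etc/amavisd/amavisd.conf
    grep -n 'smtp-amavis' /etc/postfix/master.cf
    # restart both services after editing
    service amavisd restart && service postfix restart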

11

Re: I/O Rate: Server Overloaded

Zhang,

Really appreciate your reply.

First, I must apologise. The auto-responders are working properly. While helping someone set one up a while ago they didn't seem to be working, and because I had recently (at that time) made the configuration change I mentioned above (commenting out that line in "main.cf") I came to the conclusion that the problem was connected. My mistake, as I obviously didn't do enough research into the problem. One issue resolved.

Now, I have made the configuration changes you suggested at http://www.iredmail.org/docs/concurrent.processing.html , choosing 10 concurrent Amavisd processes as suggested (up from 2). Since it's the weekend and traffic is low anyway I can't be sure that this has helped, but at the moment things look fine.

I will check again on Monday and let you know what I learn. Thanks again for your help.


Craig

12

Re: I/O Rate: Server Overloaded

craig wrote:

Now, I have made the configuration changes you suggested at http://www.iredmail.org/docs/concurrent.processing.html , choosing 10 concurrent Amavisd processes as suggested (up from 2). Since it's the weekend and traffic is low anyway I can't be sure that this has helped, but at the moment things look fine.

Another thing to look into could be the storage, since Amavisd uses storage for temporary files, and banned/virus mails etc. are written to storage. What are you using for storage, and what is the performance of that storage?

13

Re: I/O Rate: Server Overloaded

mir wrote:

Another thing to look into could be the storage, since Amavisd uses storage for temporary files, and banned/virus mails etc. are written to storage. What are you using for storage, and what is the performance of that storage?

Thanks for your question. This is on a VPS using an SSD. There are 61 GB free on the drive. I don't have any indication that space is an issue.

When you ask "... what is the performance of this storage?", what specific specification are you looking for? Whatever it is I'll have to ask the data centre (Linode), but when they switched to SSDs a while ago they assured their customers that they were not consumer grade.

On another note, looking at the graphs for the eight hours since I made the configuration change there is a definite increase in all statistics. This is to be expected, of course, but I'm quite concerned that I/O is going to become a problem once again come Monday morning. We'll see.

Post's attachments

io_20150704_1_day_redacted.png (23.81 kB) -- one-day I/O graph

14

Re: I/O Rate: Server Overloaded

craig wrote:

When you ask "... what is the performance of this storage?", what specific specification are you looking for? Whatever it is I'll have to ask the data centre (Linode), but when they switched to SSDs a while ago they assured their customers that they were not consumer grade.

By performance I mean how much I/O the underlying storage is capable of delivering. The graph you show doesn't give any clue to performance; it merely shows the current I/O of the server.

install fio on the server and run this command:
fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64

Paste the result of running the command here.

My iRedMail server, which is also virtualized, produces this result:
fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.1.11
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [99.5% done] [49786KB/11849KB/0KB /s] [11.7K/2788/0 iops] [eta 00m:01s]
iometer: (groupid=0, jobs=1): err= 0: pid=30881: Sat Jul  4 20:37:06 2015
  Description  : [Emulation of Intel IOmeter File Server Access Pattern]
  read : io=3274.5MB, bw=15662KB/s, iops=2570, runt=214064msec
    slat (usec): min=4, max=294241, avg=13.32, stdev=589.38
    clat (usec): min=1, max=5615.5K, avg=19915.37, stdev=64420.45
     lat (usec): min=190, max=5615.7K, avg=19929.08, stdev=64423.14
    clat percentiles (usec):
     |  1.00th=[  249],  5.00th=[  298], 10.00th=[  334], 20.00th=[  394],
     | 30.00th=[  478], 40.00th=[  636], 50.00th=[  908], 60.00th=[ 2024],
     | 70.00th=[11072], 80.00th=[27776], 90.00th=[58624], 95.00th=[94720],
     | 99.00th=[211968], 99.50th=[284672], 99.90th=[544768], 99.95th=[749568],
     | 99.99th=[3096576]
    bw (KB  /s): min=    0, max=48881, per=100.00%, avg=16044.79, stdev=7594.75
  write: io=841688KB, bw=3931.1KB/s, iops=643, runt=214064msec
    slat (usec): min=5, max=90237, avg=15.12, stdev=271.96
    clat (usec): min=129, max=3227.3K, avg=19841.34, stdev=49454.12
     lat (usec): min=207, max=3227.3K, avg=19856.87, stdev=49456.27
    clat percentiles (usec):
     |  1.00th=[  274],  5.00th=[  330], 10.00th=[  374], 20.00th=[  446],
     | 30.00th=[  532], 40.00th=[  700], 50.00th=[  980], 60.00th=[ 2256],
     | 70.00th=[11968], 80.00th=[28800], 90.00th=[60160], 95.00th=[96768],
     | 99.00th=[211968], 99.50th=[276480], 99.90th=[477184], 99.95th=[577536],
     | 99.99th=[1089536]
    bw (KB  /s): min=    3, max=12255, per=100.00%, avg=4028.52, stdev=1931.62
    lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=0.91%, 500=30.04%, 750=13.41%, 1000=7.35%
    lat (msec) : 2=8.05%, 4=3.82%, 10=5.42%, 20=6.92%, 50=11.96%
    lat (msec) : 100=7.53%, 250=3.93%, 500=0.55%, 750=0.08%, 1000=0.01%
    lat (msec) : 2000=0.02%, >=2000=0.01%
  cpu          : usr=3.01%, sys=8.39%, ctx=520486, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=550156/w=137644/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3274.5MB, aggrb=15661KB/s, minb=15661KB/s, maxb=15661KB/s, mint=214064msec, maxt=214064msec
  WRITE: io=841688KB, aggrb=3931KB/s, minb=3931KB/s, maxb=3931KB/s, mint=214064msec, maxt=214064msec

Disk stats (read/write):
    dm-1: ios=550787/137836, merge=0/0, ticks=10914200/2732076, in_queue=13648212, util=100.00%, aggrios=551217/137937, aggrmerge=0/7, aggrticks=10917608/2732220, aggrin_queue=13649532, aggrutil=100.00%
  vdb: ios=551217/137937, merge=0/7, ticks=10917608/2732220, in_queue=13649532, util=100.00%

Which can be condensed to:
Read IOPS:  2570
Write IOPS:   643

15

Re: I/O Rate: Server Overloaded

mir wrote:

The graph you show doesn't give any clue to performance; it merely shows the current I/O of the server.

Yes, I'm aware of that. The last paragraph and the graph were an aside ("On another note ...") unrelated to your question/suggestion.

The server looks fine this morning. I/O is up, as is to be expected, but the server is not overwhelmed as it has been in the past.

As fio looks like a tool to generate I/O operations for testing purposes, I'm not going to run it during business hours. I'll take a look at it later. Thanks.

16

Re: I/O Rate: Server Overloaded

craig wrote:

choosing 10 concurrent Amavisd processes as suggested (up from 2)

If you have some free memory, it's OK to increase it again, but don't add too many processes at a time.

17 (edited by craig 2015-07-07 10:21:57)

Re: I/O Rate: Server Overloaded

mir wrote:

install fio on the server and run this command:
fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64

Paste the result of running the command here.

...

Which can be condensed to:
Read IOPS:  2570
Write IOPS:   643

Here you go:

[01:41:55 root@host ~]# fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.0.13
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m] [100.0% done] [99.51M/25340K/0K /s] [27.3K/6700 /0  iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=5975: Tue Jul  7 01:42:56 2015
  Description  : [Emulation of Intel IOmeter File Server Access Pattern]
  read : io=3276.5MB, bw=112740KB/s, iops=25269 , runt= 29759msec
    slat (usec): min=4 , max=2117 , avg=10.68, stdev= 7.96
    clat (usec): min=160 , max=48942 , avg=2074.05, stdev=2801.41
     lat (usec): min=172 , max=48949 , avg=2085.37, stdev=2800.97
    clat percentiles (usec):
     |  1.00th=[  580],  5.00th=[  724], 10.00th=[  812], 20.00th=[  924],
     | 30.00th=[ 1004], 40.00th=[ 1096], 50.00th=[ 1192], 60.00th=[ 1304],
     | 70.00th=[ 1480], 80.00th=[ 1912], 90.00th=[ 3952], 95.00th=[ 7072],
     | 99.00th=[15936], 99.50th=[18816], 99.90th=[23680], 99.95th=[25728],
     | 99.99th=[31104]
    bw (KB/s)  : min=79662, max=225435, per=100.00%, avg=112829.17, stdev=23172.18
  write: io=839260KB, bw=28202KB/s, iops=6314 , runt= 29759msec
    slat (usec): min=4 , max=17533 , avg=11.49, stdev=61.69
    clat (usec): min=118 , max=50955 , avg=1765.77, stdev=2328.12
     lat (usec): min=135 , max=50970 , avg=1777.91, stdev=2328.44
    clat percentiles (usec):
     |  1.00th=[  498],  5.00th=[  620], 10.00th=[  700], 20.00th=[  796],
     | 30.00th=[  868], 40.00th=[  932], 50.00th=[ 1020], 60.00th=[ 1128],
     | 70.00th=[ 1272], 80.00th=[ 1608], 90.00th=[ 3536], 95.00th=[ 6240],
     | 99.00th=[12992], 99.50th=[14912], 99.90th=[19072], 99.95th=[21120],
     | 99.99th=[27520]
    bw (KB/s)  : min=20064, max=55567, per=100.00%, avg=28231.34, stdev=5635.87
    lat (usec) : 250=0.01%, 500=0.37%, 750=7.56%, 1000=25.28%
    lat (msec) : 2=48.08%, 4=9.09%, 10=6.52%, 20=2.78%, 50=0.30%
    lat (msec) : 100=0.01%
  cpu          : usr=17.06%, sys=36.13%, ctx=35980, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=752007/w=187927/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=3276.5MB, aggrb=112740KB/s, minb=112740KB/s, maxb=112740KB/s, mint=29759msec, maxt=29759msec
  WRITE: io=839260KB, aggrb=28201KB/s, minb=28201KB/s, maxb=28201KB/s, mint=29759msec, maxt=29759msec

Disk stats (read/write):
  xvda: ios=746722/188792, merge=10315/1263, ticks=1233010/266396, in_queue=1505680, util=99.73%
[01:42:56 root@host ~]#

Am I correct in concluding that my storage is roughly ten times faster than yours?

18

Re: I/O Rate: Server Overloaded

ZhangHuangbin wrote:
craig wrote:

choosing 10 concurrent Amavisd processes as suggested (up from 2)

If you have some free memory, it's OK to increase it again, but don't add too many processes at a time.

I'm probably going to do this, moving up to 15, possibly tomorrow. Do you have any experience regarding RAM versus the number of processes?

19

Re: I/O Rate: Server Overloaded

Reference: http://www.ijs.si/software/amavisd/READ … ml#d0e1231

BTW, you can monitor the amavisd-new processes with command 'amavisd-nanny'.
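
It is run directly on the server, for example:

    # prints one row per amavisd-new child process and updates as children
    # go busy or idle; press Ctrl-C to stop
    amavisd-nanny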

20

Re: I/O Rate: Server Overloaded

ZhangHuangbin wrote:

Reference: http://www.ijs.si/software/amavisd/READ … ml#d0e1231

BTW, you can monitor the amavisd-new processes with command 'amavisd-nanny'.

Thanks Zhang. I'll let you know how it goes.

21

Re: I/O Rate: Server Overloaded

craig wrote:

Am I correct in concluding that my storage is roughly ten times faster than yours?

You cannot conclude from fio which storage is faster. What you can conclude is how many I/O operations a storage device can handle per second, while speed is a matter of how fast data is written to storage. I/O capacity is the sum of the write speed to storage, the amount of cache, and the write speed to cache. The amount of cache and the write speed to cache are an order of magnitude more important than the write speed to storage if the storage is a mechanical disk, while cache is less important for SSDs. fio shows that you have a lot of fast cache and high-performance SSDs, so you can rule out storage as part of your problems.

22

Re: I/O Rate: Server Overloaded

Zhang and Mir,

Apologies for my silence. Been too busy to deal with this.

However, I have finally upped the "max_servers" to 15 and am going to monitor for the rest of this week.

Mir, thanks for your analysis of my output of fio. I guess for now I'll play with "max_servers" and see if I can solve the problem that way. Otherwise it seems I have a deeper problem on my hands that will continue to suck up my time as I chase it down.


Craig

23 (edited by craig 2015-08-05 07:48:31)

Re: I/O Rate: Server Overloaded

Zhang,

I know from reading some older messages when searching for posts about a topic I am about to start that you hate old threads with which you need to reacquaint yourself weeks later, so I apologise that this thread has been running for so long.

At this point I cannot see making any headway on this issue. Previously I was able to run for a few days with "max_servers" set to 10 before the server choked, so I have gone back down to that number. Anything over that seems to be the immediate cause of the mail queue backing up instantly and growing until I comment out "content_filter = smtp-amavis:[127.0.0.1]:10024" in "main.cf" and restart Postfix.

I am probably going to end up writing a script to use cron to monitor the mail queue and reconfigure and restart Postfix on the fly. I can't babysit the server 24/7.
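
A rough, untested sketch of what I have in mind (the threshold, script path and log file are placeholders):

    #!/bin/bash
    # Count messages in the Postfix queue; if the queue grows past a limit,
    # disable the amavisd content_filter and reload Postfix.
    THRESHOLD=100
    QUEUE=$(postqueue -p | grep -c '^[0-9A-F]')
    if [ "$QUEUE" -gt "$THRESHOLD" ]; then
        # an empty value disables content filtering; postconf -e edits main.cf in place
        postconf -e "content_filter ="
        postfix reload
        echo "$(date): queue at $QUEUE, content_filter disabled" >> /var/log/queue-watchdog.log
    fi

It would run from /etc/cron.d every five minutes or so; re-enabling the filter would stay a manual step for now.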

Thanks to you and mir for the suggestions. I'll open a new thread in the future if I decide to tackle this issue again.


Craig


(Edit: Minor clarification.)

24

Re: I/O Rate: Server Overloaded

*) Did you bypass spam/virus scanning for outgoing emails to reduce server load (amavisd+sa+clamav part)?
*) Did you check our (short) performance tuning tutorial here?
http://www.iredmail.org/docs/performance.tuning.html

I'm not sure a local cache-only DNS server helps in your case, but it's worth a try. Postfix and SA do many DNS queries.
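
For what it's worth, a minimal local caching resolver could look like this (a generic sketch using dnsmasq on CentOS 6; unbound or a caching BIND would work just as well):

    yum install -y dnsmasq
    # in /etc/dnsmasq.conf, add something like:
    #   no-resolv          (don't read /etc/resolv.conf for upstreams -- avoids a loop)
    #   server=8.8.8.8     (upstream resolvers; placeholders, use your provider's)
    #   server=8.8.4.4
    #   cache-size=10000
    service dnsmasq start && chkconfig dnsmasq on
    # then point local services (Postfix, SpamAssassin, ...) at the cache
    echo "nameserver 127.0.0.1" > /etc/resolv.conf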

25

Re: I/O Rate: Server Overloaded

Hi Zhang,

ZhangHuangbin wrote:

*) Did you bypass spam/virus scanning for outgoing emails to reduce server load (amavisd+sa+clamav part)?

Thanks for your reply. Yes, I have been bypassing outbound spam and virus scanning for several weeks, although I had not un-commented "bypass_banned_checks_maps" for some reason. I have done that now.

So the relevant section of "amavisd.conf" is now as follows:

$policy_bank{'MYUSERS'} = {
    # declare that mail was submitted by our smtp client
    originating => 1,

    # enables disclaimer insertion if available
    allow_disclaimers => 1,

    # notify administrator of locally originating malware
    virus_admin_maps => ["root\@$mydomain"],
    spam_admin_maps  => ["root\@$mydomain"],
    warnbadhsender   => 0,

    # forward to a smtpd service providing DKIM signing service
    #forward_method => 'smtp:[127.0.0.1]:10027',

    # force MTA conversion to 7-bit (e.g. before DKIM signing)
    smtpd_discard_ehlo_keywords => ['8BITMIME'],

    # don't remove NOTIFY=SUCCESS option
    terminate_dsn_on_notify_success => 0,

    # don't perform spam/virus/header check.
    bypass_spam_checks_maps => [1],
    bypass_virus_checks_maps => [1],
    bypass_header_checks_maps => [1],

    # allow sending any file names and types
    bypass_banned_checks_maps => [1],
};

Does this look right to you?

ZhangHuangbin wrote:

*) Did you check our (short) performance tuning tutorial here?
http://www.iredmail.org/docs/performance.tuning.html

I'm not sure a local cache-only DNS server helps in your case, but it's worth a try. Postfix and SA do many DNS queries.

Thanks for the pointer to the performance tuning document. I had not seen it before. I do already use two DNSBLs (zen.spamhaus.org and bl.spamcop.net), but I will definitely enable postscreen (this weekend, hopefully) and I'll look for information on setting up a DNS server on localhost. Do you have a document or pointers for that?

Thanks again.


Craig