1

Topic: New everything, same problem with Amavis / SQL timeouts.

==== Required information ====
- iRedMail version (check /etc/iredmail-release): 0.9.2
- Linux/BSD distribution name and version: Debian 8.1
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): MySQL
- Web server (Apache or Nginx): Apache
- Manage mail accounts with iRedAdmin-Pro? No
- Related log if you're reporting an issue: /var/log/mail.log
====

So here's the story. I was getting errors like the ones below on my old server. I had thought that by tweaking the Amavis settings, that I may have fixed the problem, only to find out that when I went to send a mailing list campaign, the server failed spectacularly, postfix collapsed under the load, and as a result, I had to completely stop the campaign from going out. I built a new server from scratch as a hot-swap machine. Completely different hardware. Upgraded from 2 to 4GB of RAM. I installed everything from scratch. I re-created all of the accounts by hand. I built all new databases, except for my sa_bayes database. I migrated the users' mailboxes by hand. Basically, everything is brand new. And yet, still, I'm getting this error. I believe that it's more than coincidence that this is happening. I created the same setup, with Debian 8.1, iRedMail, Apache, and the exact same software, on a completely different machine.  I have no idea what file (eval 95) line 200 even references, but whatever file it is, there's a problem with it. Here's what's happening.

Aug 26 03:44:31 mail postfix/qmgr[15012]: C8BD56218E1: from=<root@mail.x.x>, size=1646, nrcpt=1 (queue active)
Aug 26 03:45:35 mail amavis[15053]: (15053-03) NOTICE: reconnecting in response to: err=2013, HY000, timed out
Aug 26 03:45:35 mail amavis[15053]: (15053-03) (!)lookup_sql: timed out at (eval 95) line 200., ,
Aug 26 03:45:35 mail amavis[15053]: (15053-03) (!)Requesting process rundown, task exceeded allowed time
Aug 26 03:49:31 mail postfix/smtp[21114]: C8BD56218E1: to=<root@mail.x.x>, relay=127.0.0.1[127.0.0.1]:10024, delay=1168, delays=868/0.05/0.01/300, dsn=4.4.2, status=deferred (conversation with 127.0.0.1[127.0.0.1] timed out while sending MAIL FROM)
Aug 26 04:00:02 mail amavis[15055]: (15055-03) NOTICE: reconnecting in response to: err=2013, HY000, timed out
Aug 26 04:00:02 mail amavis[15055]: (15055-03) (!)lookup_sql: timed out at (eval 95) line 200., ,
Aug 26 04:00:02 mail amavis[15055]: (15055-03) (!)Requesting process rundown, task exceeded allowed time

I've gotta get this issue fixed. I could definitely use the help, here, because I'm at a total loss. Thanks everyone!

----

Spider Email Archiver: On-Premises, lightweight email archiving software developed by iRedMail team. Supports Amazon S3 compatible storage and custom branding.

2

Re: New everything, same problem with Amavis / SQL timeouts.

I just noticed this in my syslog. I have a cron job for phpList to process the queue every 15 minutes, as well as to check a mailbox for bounces. It seems that the cron job ran right as Amavis started complaining about having no connection to MySQL. I'm not sure if the two are related, but this is from my syslog:

Aug 26 03:45:01 mail CRON[21117]: (root) CMD (phplist -pprocessqueue >/dev/null 2>&1)
Aug 26 03:45:35 mail amavis[15053]: (15053-03) NOTICE: reconnecting in response to: err=2013, HY000, timed out
Aug 26 03:45:35 mail amavis[15053]: (15053-03) (!)lookup_sql: timed out at (eval 95) line 200., ,
Aug 26 03:45:35 mail amavis[15053]: (15053-03) (!)Requesting process rundown, task exceeded allowed time

phpList uses a MySQL database for all of its information, mailing list contents, membership, etc. The SQL database is quite large. I would still like to know to what (eval 95) line 200 is referring, however.

3 (edited by SteveLuxe 2015-08-27 02:17:56)

Re: New everything, same problem with Amavis / SQL timeouts.

I did some research. The error=2013 is a MySQL error which basically states that the thread (MySQL) that is servicing the process (Amavis) has been closed. I am wondering if perhaps MySQL is not running enough threads on this server to keep Apache, Amavis, phpList, and everything else that relies on MySQL happy, and as a result, it's killing off older threads to spawn new ones for new requests?

I did notice that after Debian recently released their upgrade for MySQL-server, this began happening. I'm really grasping at straws here, but things appear to be working normally, until the server goes under heavy load from phpList sending out a campaign, and then the Pandora's box opens, and everything begins failing miserably.

---edit---

I was a wee bit off on the whole MySQL killing off threads to spawn new ones for new processes. What I didn't realize is that, with the way that MySQL is configured in my.cnf, there is no limit on how many new threads MySQL can spawn, which keeps it scalable for when new requests are made to MySQL. Looking over the data, I realize that Amavis can just request a new connection to MySQL using a different thread than the one that MySQL closed. I'm still confused about the Amavis eval 95 line 200 part though. Why isn't this just a normal "MySQL has gone away" error? What's different about this?

4

Re: New everything, same problem with Amavis / SQL timeouts.

No clue yet.

Do you have correct SQL username/password, server address, port number in Amavisd?

5 (edited by SteveLuxe 2015-08-27 22:48:13)

Re: New everything, same problem with Amavis / SQL timeouts.

Yes, I have the correct username and password for Amavisd in the config files. Amavis is able to connect to the database just fine. I'm not getting any errors saying that it can't connect. The only thing that's changed is Debian's version of mysql-server. Here's an excerpt of the mysql error log from my system, after I rebooted this morning:

150827 10:41:23 [Note] /usr/sbin/mysqld (mysqld 5.5.44-0+deb8u1) starting as process 2210 ...
150827 10:41:23 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150827 10:41:23 [Note] Plugin 'FEDERATED' is disabled.
150827 10:41:23 InnoDB: The InnoDB memory heap is disabled
150827 10:41:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150827 10:41:23 InnoDB: Compressed tables use zlib 1.2.8
150827 10:41:23 InnoDB: Using Linux native AIO
150827 10:41:23 InnoDB: Initializing buffer pool, size = 128.0M
150827 10:41:23 InnoDB: Completed initialization of buffer pool
150827 10:41:23 InnoDB: highest supported file format is Barracuda.
150827 10:41:23  InnoDB: Waiting for the background threads to start
150827 10:41:24 InnoDB: 5.5.44 started; log sequence number 336435488
150827 10:41:24 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306
150827 10:41:24 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
150827 10:41:24 [Note] Server socket created on IP: '127.0.0.1'.
150827 10:41:24 [Note] Event Scheduler: Loaded 0 events
150827 10:41:24 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.44-0+deb8u1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Debian)

Should the buffer pool be higher? I also did some reading up on the errors that I was getting in the threads that I had created about my old server / OS / installation, regarding to postfix dying when I was trying to send an email campaign. There were problems with trivial rewrite, problems with a rw loop, postfix-qmgr dying, and problems with proxymap not working, and basically everything that was tied to MySQL just completely grinding to a halt. Everything that I have been reading has been pointing at problems with MySQL. Hopefully someone can help me with this, as this has been a serious problem.

6

Re: New everything, same problem with Amavis / SQL timeouts.

I have no idea yet. I suggest asking support in Amavisd mailing list instead:
http://www.ijs.si/software/amavisd/#support

Sorry.

7

Re: New everything, same problem with Amavis / SQL timeouts.

I increased my MySQL InnoDB buffer pool to 512M to see if perhaps it makes a difference. I have one database on this machine that is 130M+ for phpList, alone. If there were cron jobs that were running against the phpList database yesterday at almost the same time as the Amavis disconnects, that could explain why it wasn't performing properly. It was taking too long for MySQL to respond to Amavis' request for information, because MySQL was too busy performing a cron job which was taking up quite a bit of MySQL's resources... I'm just guessing here, but I'm thinking that my buffer pool was set far too low. I'm going to monitor it and see if this helps. As I said, it seems that everything continues to tie back to MySQL as being the culprit, here.

8

Re: New everything, same problem with Amavis / SQL timeouts.

ZhangHuangbin wrote:

I have no idea yet. I suggest asking support in Amavisd mailing list instead:
http://www.ijs.si/software/amavisd/#support

Sorry.

Thanks for the link, Zhang. I'll pursue that if working out the MySQL thing doesn't pan out. I'm hoping that by increasing the buffer pool for InnoDB, I've fixed the problem... I'll know soon, I suppose.

9

Re: New everything, same problem with Amavis / SQL timeouts.

I also took my MySQL thread cache size from 8 to 20. That may eliminate some of the problems with MySQL closing Amavis' threads. The more I look at it, the more I realize that I don't think that MySQL was optimized correctly for my system's needs. I think that MySQL was choking on everything that I was throwing at it.

10

Re: New everything, same problem with Amavis / SQL timeouts.

Please share your tuning after you solved this issue. smile

11

Re: New everything, same problem with Amavis / SQL timeouts.

Absolutely! So far, so good with the new tuning. I'm going to keep an eye on it for the next couple of days, and if the problem seems to resolve, I'll definitely post my tuning.

12

Re: New everything, same problem with Amavis / SQL timeouts.

So far, as of today, there have been no disconnects at all between Amavis and MySQL to speak of. I'm keeping an eye on it for another day, and keeping an eye on the tuning that I did to MySQL, but so far, the MySQL database seems to have a 100% hit rate as far as being able to satisfy connections for all of the apps that are requesting queries to the databases. If all goes well, I'll post my tuning parameters tomorrow, so that hopefully anyone else who is experiencing this kind of problem can know what to do, and hopefully not have to suffer the same type of grief that this has caused for me. I'll keep you posted. smile