1 (edited by mir 2015-05-20 06:23:56)

Topic: iredapd -1.5.0 problems

==== Required information ====
- iRedMail version: 0.9.1
- Store mail accounts in which backend (LDAP/MySQL/PGSQL): LDAP
- Web server (Apache or Nginx): Apache
- Linux/BSD distribution name and version: Debian Jessie
- Related log if you're reporting an issue:
====

iredapd suddenly stops without prior notice and no information what so ever in either its own log file or in syslog. Only indication is this from mail.log:
warning: connect to 127.0.0.1:7777: Connection refused
warning: problem talking to server 127.0.0.1:7777: Connection refused

When iredapd starts this is what is logged:
2015-05-19 23:33:19 INFO Starting iRedAPD (version: 1.5.0, backend: ldap), listening on 127.0.0.1:7777.
2015-05-19 23:33:19 INFO Loading plugin: reject_null_sender
2015-05-19 23:33:19 INFO Loading plugin: ldap_maillist_access_policy
2015-05-19 23:33:19 INFO Loading plugin: ldap_amavisd_block_blacklisted_senders
2015-05-19 23:33:19 INFO Loading plugin: ldap_recipient_restrictions
2015-05-19 23:33:19 INFO Loading plugin: amavisd_message_size_limit

To mitigate the problem I have been force to write the following script(/usr/local/sbin/check_iredapd.sh):
#!/bin/sh

CMD=/etc/init.d/iredapd
[ -x $CMD ] || exit 0

STATUS=$($CMD status)
[ $? != 0 ] && $CMD start 2>&1 > /dev/null

exit 0

And create the following file /etc/cron.d/iredapd:
MAILTO=root

* * * * * root [ -x /usr/local/sbin/check_iredapd.sh ] && /usr/local/sbin/check_iredapd.sh

----

Spider Email Archiver: On-Premises, lightweight email archiving software developed by iRedMail team. Supports Amazon S3 compatible storage and custom branding.

2

Re: iredapd -1.5.0 problems

Dear mir,

Sorry about this trouble.
Could you please modify /opt/iredapd/iredapd.py with below patch, then restart iRedAPD and keep monitor its log file?

--- iredapd.py.bak    2015-05-20 12:27:42.000000000 +0800
+++ iredapd.py    2015-05-20 12:28:58.000000000 +0800
@@ -236,6 +236,8 @@
         asyncore.loop()
     except KeyboardInterrupt:
         pass
+    except Exception, e:
+        logging.info('Error: ' + str(e))
 
 if __name__ == '__main__':
     main()

If same issue occurs, please find out log line in /var/log/iredapd.log which contains 'Error':

# grep -i 'error' /var/log/iredapd.log

Let's see what the problem is.

3

Re: iredapd -1.5.0 problems

ZhangHuangbin wrote:

Let's see what the problem is.

I have added the patch but haven't seen any wrong doing yet. I will return as soon as I experience something but it appear to be something which dependents on some external event(s).

4

Re: iredapd -1.5.0 problems

OK, let's keep tracking this issue.

5

Re: iredapd -1.5.0 problems

Hi,
I have the same problem and I have add your patch.
You can see in logs now this message :

grep -i 'error' /var/log/iredapd.log
2015-05-21 15:33:13 INFO Error: [Errno 9] Bad file descriptor
2015-05-21 16:03:14 INFO Error: [Errno 9] Bad file descriptor

I need to restart but go down after some time

I can do other test if you need

6

Re: iredapd -1.5.0 problems

Hi tofinus,

Thanks for the feedback. But "bad file descriptior" is meaningless, i have no idea what the problem is.
Could you please try below fix based on patched iredapd.py?

*) Find below line in /opt/iredapd/iredapd.py:

asyncore.loop()

*) Updated it to:

asyncore.loop(use_poll=True)

*) Restart iredapd service and keep tracking this issue.

By the way, which Python version are you running? Check it with command:

# python -V

7

Re: iredapd -1.5.0 problems

OK, here's the final solution, found the patch from Fail2ban project.
https://bitbucket.org/zhb/iredapd/commi … c101b5c25b

Steps to apply this change on your server:

*) Download patched iredapd.py:

# cd /tmp/
# wget https://bitbucket.org/zhb/iredapd/raw/dbaa05f2bfc1bcb07faa43e06e1648c101b5c25b/iredapd.py

*) Override old one, set correct file owner:

# mv /tmp/iredapd.py /opt/iredapd/iredapd.py
# chown iredapd:iredapd /opt/iredapd/iredapd.py

*) Restart iRedAPD service.

8

Re: iredapd -1.5.0 problems

Hi,

The problem is still here after the change that you say.

warning: problem talking to server 127.0.0.1:7777: Connection refused
May 22 08:55:12 srvz postfix/smtpd[22738]: warning: connect to 127.0.0.1:7777: Connection refused
May 22 08:55:12 srvz postfix/smtpd[22738]: warning: problem talking to server 127.0.0.1:7777: Connection refused
May 22 08:55:12 srvz postfix/smtpd[22738]: NOQUEUE: reject: RCPT from localhost[127.0.0.1]: 451 4.3.5 Server configuration problem; from=<xxx@domaine.xxx> to=<xxx#xxxx.xx> proto=ESMTP helo=<xxxx.xxx.xxx>

My system :
Debian 8
Python 2.7.9
PHP 5.6.7-1
iredmail 0.9.1
iRedAPD (version: 1.5.0, backend: mysql), listening on 127.0.0.1:7777.


2015-05-22 09:15:21 INFO Starting iRedAPD (version: 1.5.0, backend: mysql), listening on 127.0.0.1:7777.
2015-05-22 09:15:21 INFO Loading plugin: reject_null_sender
2015-05-22 09:15:21 INFO Loading plugin: amavisd_message_size_limit
2015-05-22 09:15:21 INFO Loading plugin: amavisd_wblist
2015-05-22 09:15:21 INFO Loading plugin: sql_alias_access_policy
2015-05-22 09:19:03 INFO [127.0.0.1] RCPT,cxxx@xxx.com -> xxx.xxx@doamnine.ch, DUNNO
2015-05-22 10:30:37 INFO Error: [Errno 9] Bad file descriptor

9

Re: iredapd -1.5.0 problems

Served only one request and exited, too bad. Could you please show me output of below command?

# tail -20f /opt/iredapd/iredapd.py

Also, could you please try to replace 'use_poll=True' by 'use_poll=False', restart iredapd service and try again?

10 (edited by mir 2015-05-24 00:34:01)

Re: iredapd -1.5.0 problems

Hi all,

There seems to be problem with iredapd even after applying latest patch. This morning my script kicked in an restartet iredapd.

Nothing was logged to /var/log/iredapd.log even though the extra log instruction is present:
    # Starting loop.
    try:
        # There's a bug report for Python 2.6/3.0 that `use_poll=True` yields
        # some 2.5 incompatibilities:
        if (sys.version_info >= (2, 7) and sys.version_info < (2, 8)) \
           or (sys.version_info >= (3, 4)):     # if python 2.7 ...
            # workaround for the "Bad file descriptor" issue on Python 2.7, gh-161
            asyncore.loop(use_poll=True)
        else:
            # fixes the "Unexpected communication problem" issue on Python 2.6 and 3.0
            asyncore.loop(use_poll=False)

    except KeyboardInterrupt:
        pass
    except Exception, e:
    logging.info('Error: ' + str(e))

From syslog:
May 23 09:50:01 iredmail CRON[16950]: (root) CMD ([ -x /usr/local/sbin/check_iredapd.sh ] && /usr/local/sbin/check_iredapd.sh)
May 23 09:50:01 iredmail CRON[16949]: (www-data) CMD ([ -x /usr/share/awstats/tools/update.sh ] && /usr/share/awstats/tools/update.sh)
May 23 09:50:11 iredmail postfix/smtpd[16877]: connect from mail139.wdc02.mcdlv.net[205.201.130.139]
May 23 09:50:12 iredmail postfix/smtpd[16877]: warning: connect to 127.0.0.1:7777: Connection refused
May 23 09:50:12 iredmail postfix/smtpd[16877]: warning: problem talking to server 127.0.0.1:7777: Connection refused
May 23 09:51:01 iredmail CRON[17063]: (root) CMD ([ -x /usr/local/sbin/check_iredapd.sh ] && /usr/local/sbin/check_iredapd.sh)

From iredapd.log:
2015-05-23 09:29:15 INFO [94.136.31.133] RCPT, x@foo.bar -> mir@my.dom, DUNNO
2015-05-23 09:51:01 INFO Starting iRedAPD (version: 1.5.0, backend: ldap), listening on 127.0.0.1:77
77.
2015-05-23 09:51:01 INFO Loading plugin: reject_null_sender
2015-05-23 09:51:01 INFO Loading plugin: ldap_maillist_access_policy
2015-05-23 09:51:01 INFO Loading plugin: ldap_amavisd_block_blacklisted_senders
2015-05-23 09:51:01 INFO Loading plugin: ldap_recipient_restrictions
2015-05-23 09:51:01 INFO Loading plugin: amavisd_message_size_limit
2015-05-23 10:00:23 INFO [205.201.130.163] RCPT, <all so removed here>, DUNNO

Above facts gives me reason to expect that what we are seeing is that either iredapd or python simply is crashing unexpected since the last catchall 'except Exception, e:' is never reached.

11

Re: iredapd -1.5.0 problems

Not sure what the root cause is, might be some issue with Python's asyncore module (or socket).
I've no idea yet, sorry. sad

12 (edited by camel1cz 2015-05-24 15:32:06)

Re: iredapd -1.5.0 problems

Most likely the same problem striked also on my server this morning sad

I read thru the end of the logs and I found out this:

- last 2 emails was from the same (external) sender to the same (local) recipient... they was from postmaster@... maybe some concurrency happened?
- just before the session dump of the first of this 2 emails in the log, I see this line:

2015-05-24 06:11:03 DEBUG Connect from 127.0.0.1, port 41628.

Any hint based on this?

@mir:
I deployed your watchdog script for iRedAPd - thanks a lot! I modified it a bit, to send me email when iRedAPd dies:

#!/bin/sh
CMD=/etc/init.d/iredapd
ADMINEMAIL=<fill your email address>
FROMEMAIL=<fill from email address>

[ -x $CMD ] || exit 0
STATUS=$($CMD status)
if [ $? != 0 ]
then
  $CMD start 2>&1 > /dev/null
  sleep 10
  mail -r $FROMEMAIL -s "iRedAPd died!" $ADMINEMAIL < /dev/null
fi
exit 0

13

Re: iredapd -1.5.0 problems

ZhangHuangbin wrote:

Not sure what the root cause is, might be some issue with Python's asyncore module (or socket).
I've no idea yet, sorry. sad

It seems there have been issues related to asyncore in the past and somewhere I think I read that there is an unfixed bug in python 2.7 which cannot be fixed since it requires adding new features and python 2.7 is in a state of feature freeze.

Could the code be rewritten without using asyncore and maybe use twisted instead?
How big a code change will it require to port it to using twisted?

14

Re: iredapd -1.5.0 problems

Hi Mir,

Are you a Python developer?

iRedAPD itself is simple and easy, we just need to rewrite the socket/daemon part, maybe use 'SocketServer' module (not sure whether it's based on asyncore/asynchat yet, need to check doc), but i personally don't want to involve big component like Twisted in iRedAPD.

Or, maybe we rewrite it in Erlang?

15

Re: iredapd -1.5.0 problems

P.S. above fix was from Fail2ban:
https://github.com/fail2ban/fail2ban/bl … er.py#L151

If Fail2ban works fine, i have no idea why iRedAPD doesn't work as expected.

16

Re: iredapd -1.5.0 problems

The worst part of this bug is, i don't know how to reproduce this issue, and have to just wait.
If we can reproduce it, at least we can try to dive into source code and (probably) figure it out.

17

Re: iredapd -1.5.0 problems

ZhangHuangbin wrote:

The worst part of this bug is, i don't know how to reproduce this issue, and have to just wait.
If we can reproduce it, at least we can try to dive into source code and (probably) figure it out.

This thread mentions the bug I am referring to: https://github.com/fail2ban/fail2ban/issues/161

Seems to be related to socket timeout.

18

Re: iredapd -1.5.0 problems

ZhangHuangbin wrote:

P.S. above fix was from Fail2ban:
https://github.com/fail2ban/fail2ban/bl … er.py#L151

If Fail2ban works fine, i have no idea why iRedAPD doesn't work as expected.

Maybe use the work-around mentioned in my former post. Change timeout from 30 to 60 sec.

19

Re: iredapd -1.5.0 problems

ZhangHuangbin wrote:

Hi Mir,

Are you a Python developer?

iRedAPD itself is simple and easy, we just need to rewrite the socket/daemon part, maybe use 'SocketServer' module (not sure whether it's based on asyncore/asynchat yet, need to check doc), but i personally don't want to involve big component like Twisted in iRedAPD.

Or, maybe we rewrite it in Erlang?

SocketServer is not based on asyncore/asynchat. It is a python wrapper over a pure C socket implementation.

Regarding Erlang: I have no experience with Erlang so in this department I am totally blank;-)

20

Re: iredapd -1.5.0 problems

mir wrote:
ZhangHuangbin wrote:

P.S. above fix was from Fail2ban:
https://github.com/fail2ban/fail2ban/bl … er.py#L151

If Fail2ban works fine, i have no idea why iRedAPD doesn't work as expected.

Maybe use the work-around mentioned in my former post. Change timeout from 30 to 60 sec.

I have added this to iredapd.py to see whether increasing timeout will solve the issue.

    # Starting loop.
    try:
        # There's a bug report for Python 2.6/3.0 that `use_poll=True` yields
        # some 2.5 incompatibilities:
        if (sys.version_info >= (2, 7) and sys.version_info < (2, 8)) \
           or (sys.version_info >= (3, 4)):     # if python 2.7 ...
            # workaround for the "Bad file descriptor" issue on Python 2.7, gh-161
            asyncore.loop(timeout=60.0, use_poll=True)
        else:
            # fixes the "Unexpected communication problem" issue on Python 2.6 and 3.0
            asyncore.loop(use_poll=False)

    except KeyboardInterrupt:
        pass
    except Exception, e:
        logging.info('Error: ' + str(e))

21

Re: iredapd -1.5.0 problems

mir wrote:

This thread mentions the bug I am referring to: https://github.com/fail2ban/fail2ban/issues/161

The 'use_poll=True' is the fix for this bug.

By the way, how often does it happen on your server? It happens on my iRedMail server (CentOS 6.x, Python-2.6.6) only once, and i was not lucky enough to catch them with debug message.

22

Re: iredapd -1.5.0 problems

ZhangHuangbin wrote:
mir wrote:

This thread mentions the bug I am referring to: https://github.com/fail2ban/fail2ban/issues/161

By the way, how often does it happen on your server? It happens on my iRedMail server (CentOS 6.x, Python-2.6.6) only once, and i was not lucky enough to catch them with debug message.

On average 2 times a day.

23

Re: iredapd -1.5.0 problems

I think the problem is not 'timeout=30', it might be this line in iredapd.py:

daemon.daemonize()

could you help change it to below line and keep monitoring?

daemon.daemonize(noClose=True)

I'm running iRedAPD with this change for about 12 hours, no problem so far. but my server cannot frequently reproduce this issue, so i'm afraid that this doesn't help much on my server.

24

Re: iredapd -1.5.0 problems

hi mir,

Any progress? Did it occur again?

25 (edited by mir 2015-05-26 00:25:15)

Re: iredapd -1.5.0 problems

It has been running now for 8 hours without a restart. I will wait until tomorrow to make any final conclusion.