Postfix logs missing information on delivery status (postfix/smtp message)

Posted by hegemon on Server Fault See other posts from Server Fault or by hegemon
Published on 2012-06-06T08:08:06Z Indexed on 2012/06/06 22:42 UTC
Read the original article Hit count: 277

Filed under:
|
|
|

I noticed a problem with postfix logs, that information on some of emails delivery status is missing. The issue affects about 1% of emails.

"Healthy" log:

<server># grep 8EB992EFBB44 postfix_log/mail04.log
Jun  5 03:09:29 mail04 postfix/smtpd[8537]: 8EB992EFBB44: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  5 03:09:29 mail04 postfix/cleanup[34349]: 8EB992EFBB44: message-id=<[email protected]>
Jun  5 03:12:02 mail04 postfix/qmgr[76377]: 8EB992EFBB44: from=<[email protected]>, size=48845, nrcpt=1 (queue active)
Jun  5 03:15:12 mail04 postfix/smtp[35058]: 8EB992EFBB44: to=<[email protected]>, relay=mx.baz.com[xxx.xxx.xxx.xxx]:25, conn_use=70, delay=343, delays=153/190/0/0.24, dsn=2.0.0, status=sent (250 ok)
Jun  5 03:15:12 mail04 postfix/qmgr[76377]: 8EB992EFBB44: removed

"Broken" log:

<server># grep F3C362EF37CA postfix_log/mail04.log
Jun  5 04:03:27 mail04 postfix/smtpd[39666]: F3C362EF37CA: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  5 04:03:27 mail04 postfix/cleanup[41287]: F3C362EF37CA: message-id=<[email protected]>
Jun  5 04:03:28 mail04 postfix/qmgr[76377]: F3C362EF37CA: from=<[email protected]>, size=48892, nrcpt=1 (queue active)
** here should be a log line from postfix/smtp but there is none **
Jun  5 04:03:29 mail04 postfix/qmgr[76377]: F3C362EF37CA: removed

Background information:

system: FreeBSD xxx.xxx.xxx 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64

Postfix is installed inside jail. Logs are on the same machine, log dir is mounted thru nullfs. The site has spikes of heavy load, causing disks (local) to run at 100%.

Update

The log is rotated daily, current size is ~ 500MB.

I made a test by queuing 99000 messages to same destination (in order to rule out dns/network/mx issues). 5715 messages don't have any DSN record. Failed messages queue time is spread evenly over time, i don't see any time-bound issues.

Some undelivered emails:

  envelopeid  |       processed_time
--------------+----------------------------
 8D7652EF3BAE | 2012-06-06 13:19:11.072715
 DD53A2EF3C5C | 2012-06-06 13:33:24.374783
 8C52F2EF4E3F | 2012-06-06 13:39:15.810616
 BBC572EF525C | 2012-06-06 13:44:22.762812
 E95822EF54D1 | 2012-06-06 13:52:01.134533
 839DD2EF4FBB | 2012-06-06 14:13:48.511236
 017EE2EF6234 | 2012-06-06 15:04:48.618963

Those are a few picks, such records of undelivered email occur almost every second.

<server># egrep '(8D7652EF3BAE|BBC572EF525C|017EE2EF6234)' mail04.log
Jun  6 13:19:10 mail04 postfix/smtpd[20350]: 8D7652EF3BAE: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  6 13:19:10 mail04 postfix/cleanup[21024]: 8D7652EF3BAE: message-id=<[email protected]>
Jun  6 13:19:10 mail04 postfix/qmgr[7939]: 8D7652EF3BAE: from=<[email protected]>, size=63718, nrcpt=1 (queue active)
Jun  6 13:19:11 mail04 postfix/qmgr[7939]: 8D7652EF3BAE: removed
Jun  6 13:44:22 mail04 postfix/smtpd[20346]: BBC572EF525C: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  6 13:44:22 mail04 postfix/cleanup[24811]: BBC572EF525C: message-id=<[email protected]>
Jun  6 13:44:22 mail04 postfix/qmgr[7939]: BBC572EF525C: from=<[email protected]>, size=63758, nrcpt=1 (queue active)
Jun  6 15:04:49 mail04 postfix/smtpd[20344]: 017EE2EF6234: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun  6 15:04:49 mail04 postfix/cleanup[35585]: 017EE2EF6234: message-id=<[email protected]>
Jun  6 15:04:49 mail04 postfix/qmgr[7939]: 017EE2EF6234: from=<[email protected]>, size=63706, nrcpt=1 (queue active)
<server>#
<server># find /var/spool/postfix/active/ -type f -print | wc -l
       1
<server>#

IMPORTANT: As you can see above some of emails doesn't event have the removed line.

© Server Fault or respective owner

Related posts about postfix

Related posts about logging