Why is BIND giving me a SERVFAIL in this case? (Notes inside)

Posted by imaginative on Server Fault
Published on 2010-04-01T14:15:18Z

Woke up this morning to a bunch of the following:

root@foo:/etc/bind# dig @1.2.3.4 foo.example.com

; <<>> DiG 9.6.1-P2 <<>> @1.2.3.4 foo.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36121
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;;foo.example.com.   IN A

;; Query time: 0 msec
;; SERVER: 1.2.3.4#53(1.2.3.4)
;; WHEN: Thu Apr  1 09:57:59 2010
;; MSG SIZE  rcvd: 31
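
A check like the one above can be scripted so a failing name server is noticed before the phone rings. The following is a minimal sketch, assuming dig's default text output as shown above; the function names `summarize_dig` and `is_authoritative_answer` are made up for illustration and are not part of BIND or dig.

```python
import re

# Matches the header line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36121"
STATUS_RE = re.compile(r"->>HEADER<<-.*?status:\s*(\w+)")
# Matches the flags line, e.g. ";; flags: qr rd ra; QUERY: 1, ..."
FLAGS_RE = re.compile(r"^;; flags:\s*([a-z ]*);", re.MULTILINE)


def summarize_dig(output):
    """Return (status, flags) parsed from dig's text output.

    status is a string such as "NOERROR" or "SERVFAIL", or None if the
    header line is missing. flags is a set such as {"qr", "rd", "ra"}.
    """
    status_match = STATUS_RE.search(output)
    flags_match = FLAGS_RE.search(output)
    status = status_match.group(1) if status_match else None
    flags = set(flags_match.group(1).split()) if flags_match else set()
    return status, flags


def is_authoritative_answer(output):
    """True only for a NOERROR response that carries the aa flag."""
    status, flags = summarize_dig(output)
    return status == "NOERROR" and "aa" in flags


if __name__ == "__main__":
    # Header and flags lines copied from the dig output in this post.
    sample = (
        ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36121\n"
        ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n"
    )
    status, flags = summarize_dig(sample)
    print(status, sorted(flags), is_authoritative_answer(sample))
```

Note that the response above has no `aa` flag, so the sketch reports it as non-authoritative as well as failed.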

Some background on the fictitious "1.2.3.4": it's a slave name server in my nameserver "farm". Technically I have ns1 (the master) and ns2/ns3. ns1 and ns2 are currently down for maintenance, so I left ns3 serving live traffic on its own. That's the point: DNS is supposed to be resilient.

The odd part is that "1.2.3.4" had been serving requests for example.com just fine for the last 4-5 days. This morning I got a phone call that it was unresponsive, and on investigating I found the SERVFAIL shown above.

I looked into the zone file and saw the following:

example.com               IN SOA  ns1.example.com. hostmaster.mail.example.com. (

At this point I wondered whether the nameserver no longer considered itself authoritative for example.com, so I changed the line to the following:

example.com               IN SOA  ns3.example.com. hostmaster.mail.example.com. (

After that, it started responding again to all authoritative queries for example.com. I have no idea why. I thought these things were supposed to be normalized upon zone transfer from ns1 -> ns3?

Can someone please explain why this happened and how to prevent it from happening in the future? I've never had a similar problem, and because I don't understand it well, I might be missing some critical information in this question. Please let me know if I can add any detail to make things clearer.

One more thing to note: I have other domains that I'm authoritative for whose SOA still says ns1.example.com. and not ns3.example.com. Those domains are serving requests just fine! Is it only a matter of time before they stop too and I have to change their SOA to ns3.example.com? Is this also only required because ns1 and ns2 are currently offline?
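
Since the zone that failed and the zones that still answer all listed ns1.example.com. in their SOA, comparing the remaining SOA fields side by side may help show what actually differs between them. The following is a minimal sketch, assuming the one-line output of `dig +short SOA <zone>` (seven whitespace-separated fields in the standard SOA order); the helper names and the numeric values in the usage are hypothetical and are not taken from the zone files in question.

```python
from collections import namedtuple

# Field order of SOA RDATA as defined in RFC 1035.
SOA = namedtuple("SOA", "mname rname serial refresh retry expire minimum")


def parse_soa_rdata(rdata):
    """Parse one line of `dig +short SOA zone` output into an SOA tuple."""
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError("expected 7 SOA fields, got %d" % len(fields))
    mname, rname = fields[0], fields[1]
    numbers = [int(value) for value in fields[2:]]
    return SOA(mname, rname, *numbers)


def differing_fields(a, b):
    """Return the names of the SOA fields whose values differ."""
    return [name for name in SOA._fields if getattr(a, name) != getattr(b, name)]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    failed = parse_soa_rdata(
        "ns1.example.com. hostmaster.mail.example.com. 2010032701 3600 900 345600 86400"
    )
    working = parse_soa_rdata(
        "ns1.example.com. hostmaster.mail.example.com. 2010032501 3600 900 1209600 86400"
    )
    print(differing_fields(failed, working))
```

Running this against the real SOA line of each zone would list exactly which fields (serial, timers, or names) distinguish the zone that stopped answering from the ones that did not.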
