BIND sometimes crashes even if ticket #142 is fixed.
It is extremely hard to reproduce and as far as we know it happens only when stress test described in #142 is running on a loaded multi-core machine for couple of hours.
Further investigation is needed because we have no idea what is causing it. I do not see any obvious problem in bind-dyndb-ldap code but it certainly could be caused by some very weird interaction between bind-dyndb-ldap and BIND internals.
Core was generated by `/usr/sbin/named -u named -fg'. Program terminated with signal SIGSEGV, Segmentation fault. thr #0 0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81 81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) thread apply all bt Thread 7 (Thread 0x7ffff7fde800 (LWP 12412)): #0 0x00007ffff4f90cb7 in do_sigsuspend (set=0x7fffffffe2b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:29 #1 __GI___sigsuspend (set=set@entry=0x7fffffffe2b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:43 #2 0x00007ffff62063e0 in isc__app_ctxrun (ctx0=0x7ffff6438480 <isc_g_appctx>) at app.c:726 #3 0x0000555555578b10 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:1250 Thread 6 (Thread 0x7ffff3c5c700 (LWP 12416)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00007ffff61fd8d0 in dispatch (manager=0x7ffff7fa2010) at task.c:1069 #2 run (uap=0x7ffff7fa2010) at task.c:1294 #3 0x00007ffff5da952a in start_thread (arg=0x7ffff3c5c700) at pthread_create.c:310 #4 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 5 (Thread 0x7ffff345b700 (LWP 12417)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00007ffff61fcd0b in isc__task_beginexclusive (task0=<optimized out>) at task.c:1695 #2 0x00007ffff0f79336 in run_exclusive_enter (inst=inst@entry=0x7ffff7fa3160, statep=statep@entry=0x7ffff345aacc) at lock.c:52 #3 0x00007ffff0f6e63c in configure_zone_forwarders (entry=entry@entry=0x7ffff7fbd410, inst=inst@entry=0x7ffff7fa3160, name=0x7ffff7bc2000 <root>) at ldap_helper.c:1732 #4 0x00007ffff0f6edf9 in ldap_parse_configentry (inst=0x7ffff7fa3160, entry=0x7ffff7fbd410) at ldap_helper.c:1753 #5 update_config (task=0x7ffff7fb6f50, event=0x7ffff7fb5f78) at ldap_helper.c:4501 #6 0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122 #7 run (uap=0x7ffff7fa2010) at task.c:1294 #8 0x00007ffff5da952a in start_thread (arg=0x7ffff345b700) at pthread_create.c:310 #9 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
sockaddr=sockaddr@entry=0xdededededededede suggests use-after-free problem here:
sockaddr=sockaddr@entry=0xdededededededede
Thread 4 (Thread 0x7ffff2c5a700 (LWP 12418)): #0 isc_sockaddr_pf (sockaddr=sockaddr@entry=0xdededededededede) at sockaddr.c:352 #1 0x00007ffff78d92f0 in fctx_getaddresses (fctx=fctx@entry=0x7ffff000c010, badcache=badcache@entry=isc_boolean_false) at resolver.c:2946 #2 0x00007ffff78dc16f in fctx_try (fctx=fctx@entry=0x7ffff000c010, retrying=retrying@entry=isc_boolean_false, badcache=badcache@entry=isc_boolean_false) at resolver.c:3354 #3 0x00007ffff78dc600 in fctx_start (task=<optimized out>, event=0x7ffff000c0a8) at resolver.c:3738 #4 0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122 #5 run (uap=0x7ffff7fa2010) at task.c:1294 #6 0x00007ffff5da952a in start_thread (arg=0x7ffff2c5a700) at pthread_create.c:310 #7 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 3 (Thread 0x7ffff2459700 (LWP 12419)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 #1 0x00007ffff621ab08 in isc_condition_waituntil (c=c@entry=0x7ffff7fa3078, m=m@entry=0x7ffff7fa3028, t=t@entry=0x7ffff7fa306c) at condition.c:66 #2 0x00007ffff6202d1b in run (uap=0x7ffff7fa3010) at timer.c:810 #3 0x00007ffff5da952a in start_thread (arg=0x7ffff2459700) at pthread_create.c:310 #4 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 2 (Thread 0x7ffff1c58700 (LWP 12420)): #0 0x00007ffff505cda3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff6211ffe in watcher (uap=0x7ffff7fa5010) at socket.c:4193 #2 0x00007ffff5da952a in start_thread (arg=0x7ffff1c58700) at pthread_create.c:310 #3 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 1 (Thread 0x7fffe967f700 (LWP 12421)): #0 0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff0d31009 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:46 #2 ldap_int_select (ld=<optimized out>, timeout=<optimized out>) at os-ip.c:1138 #3 0x00007ffff0d1b555 in wait4msg (result=0x7fffe967ea08, timeout=<optimized out>, all=2, msgid=4, ld=0x7fffec313980) at result.c:312 #4 ldap_result (ld=0x7fffec313980, msgid=4, all=all@entry=2, timeout=<optimized out>, result=result@entry=0x7fffe967ea68) at result.c:117 #5 0x00007ffff0d46aec in ldap_sync_poll (ls=0x7fffe40008c0) at ldap_sync.c:881 #6 0x00007ffff0f712d3 in ldap_syncrepl_watcher (arg=0x7ffff7fa3160) at ldap_helper.c:5318 #7 0x00007ffff5da952a in start_thread (arg=0x7fffe967f700) at pthread_create.c:310 #8 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/named -u named -fg'. Program terminated with signal SIGABRT, Aborted. #0 0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81 81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) Missing separate debuginfos, use: debuginfo-install nss-3.17.3-2.fc21.x86_64 nss-softokn-freebl-3.17.3-1.fc21.x86_64 nss-util-3.17.3-1.fc21.x86_64 sssd-client-1.12.3-2.fc21.x86_64 (gdb) thread apply all bt Thread 7 (Thread 0x7ffff7fde800 (LWP 31870)): #0 0x00007ffff4f90cb7 in do_sigsuspend (set=0x7fffffffe2b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:29 #1 __GI___sigsuspend (set=set@entry=0x7fffffffe2b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:43 #2 0x00007ffff62063e0 in isc__app_ctxrun (ctx0=0x7ffff6438480 <isc_g_appctx>) at app.c:726 #3 0x0000555555578b10 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:1250 Thread 6 (Thread 0x7ffff3c5c700 (LWP 31893)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00007ffff61fcd0b in isc__task_beginexclusive (task0=<optimized out>) at task.c:1695 #2 0x00007ffff0f79336 in run_exclusive_enter (inst=inst@entry=0x7ffff7fa3160, statep=statep@entry=0x7ffff3c5bacc) at lock.c:52 #3 0x00007ffff0f6e63c in configure_zone_forwarders (entry=entry@entry=0x7ffff7fbd510, inst=inst@entry=0x7ffff7fa3160, name=0x7ffff7bc2000 <root>) at ldap_helper.c:1732 #4 0x00007ffff0f6edf9 in ldap_parse_configentry (inst=0x7ffff7fa3160, entry=0x7ffff7fbd510) at ldap_helper.c:1753 #5 update_config (task=0x7ffff7fb6f50, event=0x7ffff00ab2b8) at ldap_helper.c:4501 #6 0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122 #7 run (uap=0x7ffff7fa2010) at task.c:1294 #8 0x00007ffff5da952a in start_thread (arg=0x7ffff3c5c700) at pthread_create.c:310 #9 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 5 (Thread 0x7ffff345b700 (LWP 31895)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00007ffff61fd8d0 in dispatch (manager=0x7ffff7fa2010) at task.c:1069 #2 run (uap=0x7ffff7fa2010) at task.c:1294 #3 0x00007ffff5da952a in start_thread (arg=0x7ffff345b700) at pthread_create.c:310 #4 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 4 (Thread 0x7ffff2c5a700 (LWP 31896)): #0 0x00007ffff4f908c7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007ffff4f9252a in __GI_abort () at abort.c:89 #2 0x000055555558a244 in library_fatal_error (file=<optimized out>, line=431, format=0x7ffff62213cb "unknown address family: %d", args=0x7ffff2c598a0) at ./main.c:270 #3 0x00007ffff61dd770 in isc_error_fatal (file=file@entry=0x7ffff6221380 "sockaddr.c", line=line@entry=431, format=<optimized out>) at error.c:74 #4 0x00007ffff61f9bd9 in isc_sockaddr_getport (sockaddr=sockaddr@entry=0x7ffff1196170) at sockaddr.c:431 #5 0x00007ffff7812b87 in dns_adb_findaddrinfo (adb=0x7ffff11ce010, sa=sa@entry=0x7ffff1196170, addrp=addrp@entry=0x7ffff2c59a48, now=now@entry=0) at adb.c:4445 #6 0x00007ffff78d9343 in fctx_getaddresses (fctx=fctx@entry=0x7fffe8c36010, badcache=badcache@entry=isc_boolean_false) at resolver.c:2954 #7 0x00007ffff78dc16f in fctx_try (fctx=fctx@entry=0x7fffe8c36010, retrying=retrying@entry=isc_boolean_false, badcache=badcache@entry=isc_boolean_false) at resolver.c:3354 #8 0x00007ffff78dc600 in fctx_start (task=<optimized out>, event=0x7fffe8c360a8) at resolver.c:3738 #9 0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122 #10 run (uap=0x7ffff7fa2010) at task.c:1294 #11 0x00007ffff5da952a in start_thread (arg=0x7ffff2c5a700) at pthread_create.c:310 #12 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 3 (Thread 0x7ffff2459700 (LWP 31897)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 #1 0x00007ffff621ab08 in isc_condition_waituntil (c=c@entry=0x7ffff7fa3078, m=m@entry=0x7ffff7fa3028, t=t@entry=0x7ffff7fa306c) at condition.c:66 #2 0x00007ffff6202d1b in run (uap=0x7ffff7fa3010) at timer.c:810 #3 0x00007ffff5da952a in start_thread (arg=0x7ffff2459700) at pthread_create.c:310 #4 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 2 (Thread 0x7ffff1c58700 (LWP 31898)): #0 0x00007ffff505cda3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff6211ffe in watcher (uap=0x7ffff7fa5010) at socket.c:4193 #2 0x00007ffff5da952a in start_thread (arg=0x7ffff1c58700) at pthread_create.c:310 #3 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 1 (Thread 0x7fffe967f700 (LWP 31911)): #0 0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff0d31009 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:46 #2 ldap_int_select (ld=<optimized out>, timeout=<optimized out>) at os-ip.c:1138 #3 0x00007ffff0d1b555 in wait4msg (result=0x7fffe967ea08, timeout=<optimized out>, all=2, msgid=4, ld=0x7fffec313980) at result.c:312 #4 ldap_result (ld=0x7fffec313980, msgid=4, all=all@entry=2, timeout=<optimized out>, result=result@entry=0x7fffe967ea68) at result.c:117 #5 0x00007ffff0d46aec in ldap_sync_poll (ls=0x7fffe40008c0) at ldap_sync.c:881 #6 0x00007ffff0f712d3 in ldap_syncrepl_watcher (arg=0x7ffff7fa3160) at ldap_helper.c:5318 #7 0x00007ffff5da952a in start_thread (arg=0x7fffe967f700) at pthread_create.c:310 #8 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 (gdb) t 4 [Switching to thread 4 (Thread 0x7ffff2c5a700 (LWP 31896))] #0 0x00007ffff4f908c7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) bt #0 0x00007ffff4f908c7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007ffff4f9252a in __GI_abort () at abort.c:89 #2 0x000055555558a244 in library_fatal_error (file=<optimized out>, line=431, format=0x7ffff62213cb "unknown address family: %d", args=0x7ffff2c598a0) at ./main.c:270 #3 0x00007ffff61dd770 in isc_error_fatal (file=file@entry=0x7ffff6221380 "sockaddr.c", line=line@entry=431, format=<optimized out>) at error.c:74 #4 0x00007ffff61f9bd9 in isc_sockaddr_getport (sockaddr=sockaddr@entry=0x7ffff1196170) at sockaddr.c:431 #5 0x00007ffff7812b87 in dns_adb_findaddrinfo (adb=0x7ffff11ce010, sa=sa@entry=0x7ffff1196170, addrp=addrp@entry=0x7ffff2c59a48, now=now@entry=0) at adb.c:4445 #6 0x00007ffff78d9343 in fctx_getaddresses (fctx=fctx@entry=0x7fffe8c36010, badcache=badcache@entry=isc_boolean_false) at resolver.c:2954 #7 0x00007ffff78dc16f in fctx_try (fctx=fctx@entry=0x7fffe8c36010, retrying=retrying@entry=isc_boolean_false, badcache=badcache@entry=isc_boolean_false) at resolver.c:3354 #8 0x00007ffff78dc600 in fctx_start (task=<optimized out>, event=0x7fffe8c360a8) at resolver.c:3738 #9 0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122 #10 run (uap=0x7ffff7fa2010) at task.c:1294 #11 0x00007ffff5da952a in start_thread (arg=0x7ffff2c5a700) at pthread_create.c:310 #12 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 (gdb) up ... (gdb) up #4 0x00007ffff61f9bd9 in isc_sockaddr_getport (sockaddr=sockaddr@entry=0x7ffff1196170) at sockaddr.c:431 431 FATAL_ERROR(__FILE__, __LINE__, (gdb) p *sockaddr $1 = {type = {sa = {sa_family = 2, sa_data = "\000\065\b\b\b\b\000\000\000\000\000\000\000"}, sin = {sin_family = 2, sin_port = 13568, sin_addr = {s_addr = 134744072}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 2, sin6_port = 13568, sin6_flowinfo = 134744072, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, ss = {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, sunix = {sun_family = 2, sun_path = "\000\065\b\b\b\b", '\000' <repeats 101 times>}}, length = 16, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}}
Reproducible on RHEL-6 and RHEL-7 and all Fedora versions (20+) also with the latest BIND (9.10.2rc2)
This should be extremely hard to reproduce and extremely rare event so we decided to handle it in the same way as #114: Put into Backlog and wait until more data or better tools are available.
Metadata Update from @pspacek: - Issue assigned to someone - Issue set to the milestone: The Backlog
Log in to comment on this ticket.