#156 Race condition and crash after fixing #142 and [RT #38470] in adb.c
Opened by pspacek. Modified

BIND sometimes crashes even if ticket #142 is fixed.

It is extremely hard to reproduce and as far as we know it happens only when stress test described in #142 is running on a loaded multi-core machine for couple of hours.

Further investigation is needed because we have no idea what is causing it. I do not see any obvious problem in bind-dyndb-ldap code but it certainly could be caused by some very weird interaction between bind-dyndb-ldap and BIND internals.

Core was generated by `/usr/sbin/named -u named -fg'.
Program terminated with signal SIGSEGV, Segmentation fault.
thr #0  0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81
81  T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) thread apply all bt

Thread 7 (Thread 0x7ffff7fde800 (LWP 12412)):
#0  0x00007ffff4f90cb7 in do_sigsuspend (set=0x7fffffffe2b0) at
../sysdeps/unix/sysv/linux/sigsuspend.c:29
#1  __GI___sigsuspend (set=set@entry=0x7fffffffe2b0) at
../sysdeps/unix/sysv/linux/sigsuspend.c:43
#2  0x00007ffff62063e0 in isc__app_ctxrun (ctx0=0x7ffff6438480 <isc_g_appctx>)
at app.c:726
#3  0x0000555555578b10 in main (argc=<optimized out>, argv=<optimized out>) at
./main.c:1250

Thread 6 (Thread 0x7ffff3c5c700 (LWP 12416)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff61fd8d0 in dispatch (manager=0x7ffff7fa2010) at task.c:1069
#2  run (uap=0x7ffff7fa2010) at task.c:1294
#3  0x00007ffff5da952a in start_thread (arg=0x7ffff3c5c700) at pthread_create.c:310
#4  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7ffff345b700 (LWP 12417)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff61fcd0b in isc__task_beginexclusive (task0=<optimized out>) at
task.c:1695
#2  0x00007ffff0f79336 in run_exclusive_enter (inst=inst@entry=0x7ffff7fa3160,
statep=statep@entry=0x7ffff345aacc) at lock.c:52
#3  0x00007ffff0f6e63c in configure_zone_forwarders
(entry=entry@entry=0x7ffff7fbd410, inst=inst@entry=0x7ffff7fa3160,
name=0x7ffff7bc2000 <root>) at ldap_helper.c:1732
#4  0x00007ffff0f6edf9 in ldap_parse_configentry (inst=0x7ffff7fa3160,
entry=0x7ffff7fbd410) at ldap_helper.c:1753
#5  update_config (task=0x7ffff7fb6f50, event=0x7ffff7fb5f78) at ldap_helper.c:4501
#6  0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122
#7  run (uap=0x7ffff7fa2010) at task.c:1294
#8  0x00007ffff5da952a in start_thread (arg=0x7ffff345b700) at pthread_create.c:310
#9  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

sockaddr=sockaddr@entry=0xdededededededede suggests use-after-free problem here:

Thread 4 (Thread 0x7ffff2c5a700 (LWP 12418)):
#0  isc_sockaddr_pf (sockaddr=sockaddr@entry=0xdededededededede) at
sockaddr.c:352
#1  0x00007ffff78d92f0 in fctx_getaddresses (fctx=fctx@entry=0x7ffff000c010,
badcache=badcache@entry=isc_boolean_false) at resolver.c:2946
#2  0x00007ffff78dc16f in fctx_try (fctx=fctx@entry=0x7ffff000c010,
retrying=retrying@entry=isc_boolean_false,
badcache=badcache@entry=isc_boolean_false) at resolver.c:3354
#3  0x00007ffff78dc600 in fctx_start (task=<optimized out>,
event=0x7ffff000c0a8) at resolver.c:3738
#4  0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122
#5  run (uap=0x7ffff7fa2010) at task.c:1294
#6  0x00007ffff5da952a in start_thread (arg=0x7ffff2c5a700) at pthread_create.c:310
#7  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ffff2459700 (LWP 12419)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007ffff621ab08 in isc_condition_waituntil (c=c@entry=0x7ffff7fa3078,
m=m@entry=0x7ffff7fa3028, t=t@entry=0x7ffff7fa306c) at condition.c:66
#2  0x00007ffff6202d1b in run (uap=0x7ffff7fa3010) at timer.c:810
#3  0x00007ffff5da952a in start_thread (arg=0x7ffff2459700) at pthread_create.c:310
#4  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ffff1c58700 (LWP 12420)):
#0  0x00007ffff505cda3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff6211ffe in watcher (uap=0x7ffff7fa5010) at socket.c:4193
#2  0x00007ffff5da952a in start_thread (arg=0x7ffff1c58700) at pthread_create.c:310
#3  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fffe967f700 (LWP 12421)):
#0  0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff0d31009 in poll (__timeout=<optimized out>, __nfds=<optimized
out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
#2  ldap_int_select (ld=<optimized out>, timeout=<optimized out>) at os-ip.c:1138
#3  0x00007ffff0d1b555 in wait4msg (result=0x7fffe967ea08, timeout=<optimized
out>, all=2, msgid=4, ld=0x7fffec313980) at result.c:312
#4  ldap_result (ld=0x7fffec313980, msgid=4, all=all@entry=2, timeout=<optimized
out>, result=result@entry=0x7fffe967ea68) at result.c:117
#5  0x00007ffff0d46aec in ldap_sync_poll (ls=0x7fffe40008c0) at ldap_sync.c:881
#6  0x00007ffff0f712d3 in ldap_syncrepl_watcher (arg=0x7ffff7fa3160) at
ldap_helper.c:5318
#7  0x00007ffff5da952a in start_thread (arg=0x7fffe967f700) at pthread_create.c:310
#8  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109





Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/named -u named -fg'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81
81  T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install nss-3.17.3-2.fc21.x86_64
nss-softokn-freebl-3.17.3-1.fc21.x86_64 nss-util-3.17.3-1.fc21.x86_64
sssd-client-1.12.3-2.fc21.x86_64
(gdb) thread apply all bt

Thread 7 (Thread 0x7ffff7fde800 (LWP 31870)):
#0  0x00007ffff4f90cb7 in do_sigsuspend (set=0x7fffffffe2b0) at
../sysdeps/unix/sysv/linux/sigsuspend.c:29
#1  __GI___sigsuspend (set=set@entry=0x7fffffffe2b0) at
../sysdeps/unix/sysv/linux/sigsuspend.c:43
#2  0x00007ffff62063e0 in isc__app_ctxrun (ctx0=0x7ffff6438480 <isc_g_appctx>)
at app.c:726
#3  0x0000555555578b10 in main (argc=<optimized out>, argv=<optimized out>) at
./main.c:1250

Thread 6 (Thread 0x7ffff3c5c700 (LWP 31893)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff61fcd0b in isc__task_beginexclusive (task0=<optimized out>) at
task.c:1695
#2  0x00007ffff0f79336 in run_exclusive_enter (inst=inst@entry=0x7ffff7fa3160,
statep=statep@entry=0x7ffff3c5bacc) at lock.c:52
#3  0x00007ffff0f6e63c in configure_zone_forwarders
(entry=entry@entry=0x7ffff7fbd510, inst=inst@entry=0x7ffff7fa3160,
name=0x7ffff7bc2000 <root>) at ldap_helper.c:1732
#4  0x00007ffff0f6edf9 in ldap_parse_configentry (inst=0x7ffff7fa3160,
entry=0x7ffff7fbd510) at ldap_helper.c:1753
#5  update_config (task=0x7ffff7fb6f50, event=0x7ffff00ab2b8) at ldap_helper.c:4501
#6  0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122
#7  run (uap=0x7ffff7fa2010) at task.c:1294
#8  0x00007ffff5da952a in start_thread (arg=0x7ffff3c5c700) at pthread_create.c:310
#9  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7ffff345b700 (LWP 31895)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff61fd8d0 in dispatch (manager=0x7ffff7fa2010) at task.c:1069
#2  run (uap=0x7ffff7fa2010) at task.c:1294
#3  0x00007ffff5da952a in start_thread (arg=0x7ffff345b700) at pthread_create.c:310
#4  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7ffff2c5a700 (LWP 31896)):
#0  0x00007ffff4f908c7 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff4f9252a in __GI_abort () at abort.c:89
#2  0x000055555558a244 in library_fatal_error (file=<optimized out>, line=431,
format=0x7ffff62213cb "unknown address family: %d", args=0x7ffff2c598a0) at
./main.c:270
#3  0x00007ffff61dd770 in isc_error_fatal (file=file@entry=0x7ffff6221380
"sockaddr.c", line=line@entry=431, format=<optimized out>) at error.c:74
#4  0x00007ffff61f9bd9 in isc_sockaddr_getport
(sockaddr=sockaddr@entry=0x7ffff1196170) at sockaddr.c:431
#5  0x00007ffff7812b87 in dns_adb_findaddrinfo (adb=0x7ffff11ce010,
sa=sa@entry=0x7ffff1196170, addrp=addrp@entry=0x7ffff2c59a48, now=now@entry=0)
at adb.c:4445
#6  0x00007ffff78d9343 in fctx_getaddresses (fctx=fctx@entry=0x7fffe8c36010,
badcache=badcache@entry=isc_boolean_false) at resolver.c:2954
#7  0x00007ffff78dc16f in fctx_try (fctx=fctx@entry=0x7fffe8c36010,
retrying=retrying@entry=isc_boolean_false,
badcache=badcache@entry=isc_boolean_false) at resolver.c:3354
#8  0x00007ffff78dc600 in fctx_start (task=<optimized out>,
event=0x7fffe8c360a8) at resolver.c:3738
#9  0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122
#10 run (uap=0x7ffff7fa2010) at task.c:1294
#11 0x00007ffff5da952a in start_thread (arg=0x7ffff2c5a700) at pthread_create.c:310
#12 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ffff2459700 (LWP 31897)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007ffff621ab08 in isc_condition_waituntil (c=c@entry=0x7ffff7fa3078,
m=m@entry=0x7ffff7fa3028, t=t@entry=0x7ffff7fa306c) at condition.c:66
#2  0x00007ffff6202d1b in run (uap=0x7ffff7fa3010) at timer.c:810
#3  0x00007ffff5da952a in start_thread (arg=0x7ffff2459700) at pthread_create.c:310
#4  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ffff1c58700 (LWP 31898)):
#0  0x00007ffff505cda3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff6211ffe in watcher (uap=0x7ffff7fa5010) at socket.c:4193
#2  0x00007ffff5da952a in start_thread (arg=0x7ffff1c58700) at pthread_create.c:310
#3  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fffe967f700 (LWP 31911)):
#0  0x00007ffff50511fd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff0d31009 in poll (__timeout=<optimized out>, __nfds=<optimized
out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
#2  ldap_int_select (ld=<optimized out>, timeout=<optimized out>) at os-ip.c:1138
#3  0x00007ffff0d1b555 in wait4msg (result=0x7fffe967ea08, timeout=<optimized
out>, all=2, msgid=4, ld=0x7fffec313980) at result.c:312
#4  ldap_result (ld=0x7fffec313980, msgid=4, all=all@entry=2, timeout=<optimized
out>, result=result@entry=0x7fffe967ea68) at result.c:117
#5  0x00007ffff0d46aec in ldap_sync_poll (ls=0x7fffe40008c0) at ldap_sync.c:881
#6  0x00007ffff0f712d3 in ldap_syncrepl_watcher (arg=0x7ffff7fa3160) at
ldap_helper.c:5318
#7  0x00007ffff5da952a in start_thread (arg=0x7fffe967f700) at pthread_create.c:310
#8  0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) t 4
[Switching to thread 4 (Thread 0x7ffff2c5a700 (LWP 31896))]
#0  0x00007ffff4f908c7 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55
55    return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007ffff4f908c7 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff4f9252a in __GI_abort () at abort.c:89
#2  0x000055555558a244 in library_fatal_error (file=<optimized out>, line=431,
format=0x7ffff62213cb "unknown address family: %d", args=0x7ffff2c598a0) at
./main.c:270
#3  0x00007ffff61dd770 in isc_error_fatal (file=file@entry=0x7ffff6221380
"sockaddr.c", line=line@entry=431, format=<optimized out>) at error.c:74
#4  0x00007ffff61f9bd9 in isc_sockaddr_getport
(sockaddr=sockaddr@entry=0x7ffff1196170) at sockaddr.c:431
#5  0x00007ffff7812b87 in dns_adb_findaddrinfo (adb=0x7ffff11ce010,
sa=sa@entry=0x7ffff1196170, addrp=addrp@entry=0x7ffff2c59a48, now=now@entry=0)
at adb.c:4445
#6  0x00007ffff78d9343 in fctx_getaddresses (fctx=fctx@entry=0x7fffe8c36010,
badcache=badcache@entry=isc_boolean_false) at resolver.c:2954
#7  0x00007ffff78dc16f in fctx_try (fctx=fctx@entry=0x7fffe8c36010,
retrying=retrying@entry=isc_boolean_false,
badcache=badcache@entry=isc_boolean_false) at resolver.c:3354
#8  0x00007ffff78dc600 in fctx_start (task=<optimized out>,
event=0x7fffe8c360a8) at resolver.c:3738
#9  0x00007ffff61fdabf in dispatch (manager=0x7ffff7fa2010) at task.c:1122
#10 run (uap=0x7ffff7fa2010) at task.c:1294
#11 0x00007ffff5da952a in start_thread (arg=0x7ffff2c5a700) at pthread_create.c:310
#12 0x00007ffff505c79d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) up
...
(gdb) up
#4  0x00007ffff61f9bd9 in isc_sockaddr_getport
(sockaddr=sockaddr@entry=0x7ffff1196170) at sockaddr.c:431
431         FATAL_ERROR(__FILE__, __LINE__,
(gdb) p *sockaddr
$1 = {type = {sa = {sa_family = 2, sa_data =
"\000\065\b\b\b\b\000\000\000\000\000\000\000"}, sin = {sin_family = 2, sin_port
= 13568, sin_addr = {s_addr = 134744072},
      sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 2,
sin6_port = 13568, sin6_flowinfo = 134744072, sin6_addr = {__in6_u = {__u6_addr8
= '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0,
            0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, ss
= {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>},
sunix = {sun_family = 2,
      sun_path = "\000\065\b\b\b\b", '\000' <repeats 101 times>}}, length = 16,
link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}}

Reproducible on RHEL-6 and RHEL-7 and all Fedora versions (20+) also with the latest BIND (9.10.2rc2)

This should be extremely hard to reproduce and extremely rare event so we decided to handle it in the same way as #114: Put into Backlog and wait until more data or better tools are available.

Metadata Update from @pspacek:
- Issue assigned to someone
- Issue set to the milestone: The Backlog

Log in to comment on this ticket.

Metadata