Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32997

sporadic SIGABRT at my_sleep (timeout) in spider/bugfix.mdev_27575

Details

    Description

      Examples include 10.5 a356a940d256abd132ea237eb8c84fa78c756276 at
      amd64-debian-10
      https://buildbot.mariadb.org/#/builders/221/builds/28439/steps/7/logs/stdio

      and 10.5 1e80601b826730f11544cc8b58e28c3e73f0cd33 amd64-ubuntu-2204-debug-ps
      https://buildbot.mariadb.org/#/builders/534/builds/12422/steps/7/logs/stdio

      Not sure why this happens as it never happens locally. The crd threads
      should be freed at spider_db_done(), by signaling the thread cond.
      spider_db_done is called when mysqld_main() reaches
      handle_connections_sockets(), which is way before call to
      close_connections(). See also a comment below for details:

      https://jira.mariadb.org/browse/MDEV-32997?focusedCommentId=279179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-279179

      spider/bugfix.mdev_27575                 w1 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2023-12-08 09:11:23
      line
      Attempting backtrace. You can use the following information to find out
      ^ Found warnings in /home/buildbot/amd64-debian-10/build/mysql-test/var/1/log/mysqld.1.1.err
      ok
       
       - found 'core' (0/1)
      Core generated by '/home/buildbot/amd64-debian-10/build/sql/mariadbd'
      Output from gdb follows. The first stack trace is from the failing thread.
      The following stack traces are from all threads (so the failing one is
      duplicated).
      --------------------------
      [New LWP 24496]
      [New LWP 24507]
      [New LWP 24563]
      [New LWP 24612]
      [New LWP 24644]
      [New LWP 24601]
      [New LWP 24602]
      [New LWP 24513]
      [New LWP 24567]
      [New LWP 25337]
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      Core was generated by `/home/buildbot/amd64-debian-10/build/sql/mariadbd --defaults-group-suffix=.1.1'.
      Program terminated with signal SIGABRT, Aborted.
      #0  0x00007f8416ca25a1 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
      [Current thread is 1 (Thread 0x7f841665d040 (LWP 24496))]
      #0  0x00007f8416ca25a1 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
      #1  0x0000555f9f7a2e47 in my_write_core (sig=<optimized out>) at /home/buildbot/amd64-debian-10/build/mysys/stacktrace.c:424
      #2  0x0000555f9f282370 in handle_fatal_signal (sig=6) at /home/buildbot/amd64-debian-10/build/sql/signal_handler.cc:357
      #3  <signal handler called>
      #4  0x00007f841689bbd7 in select () from /lib/x86_64-linux-gnu/libc.so.6
      #5  0x0000555f9f7a0def in my_sleep (m_seconds=m_seconds@entry=1000) at /home/buildbot/amd64-debian-10/build/mysys/my_sleep.c:29
      #6  0x0000555f9efcb492 in close_connections () at /home/buildbot/amd64-debian-10/build/sql/mysqld.cc:1813
      #7  mysqld_main (argc=<optimized out>, argv=<optimized out>) at /home/buildbot/amd64-debian-10/build/sql/mysqld.cc:5741
      #8  0x00007f84167cf09b in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
      #9  0x0000555f9efbf18a in _start () at /home/buildbot/amd64-debian-10/build/include/mysql/plugin.h:215
       
      Thread 10 (Thread 0x7f83d6ffd700 (LWP 25337)):
      #0  0x00007f8416ca100c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x00007f8414106ddf in inline_mysql_cond_wait (src_file=0x7f8414165618 "/home/buildbot/amd64-debian-10/build/storage/spider/spd_table.cc", src_line=10171, mutex=0x7f839409d9f0, that=0x7f839409d9b8) at /home/buildbot/amd64-debian-10/build/include/mysql/psi/mysql_thread.h:1233
              result = <optimized out>
              result = <optimized out>
              locker = <optimized out>
              state = <optimized out>
      #2  spider_table_bg_crd_action (arg=0x7f839409d998) at /home/buildbot/amd64-debian-10/build/storage/spider/spd_table.cc:10171
              thread = 0x7f839409d998
              share = <optimized out>
              trx = <optimized out>
              error_num = 0
              spider = <optimized out>
              table = <optimized out>
              conns = <optimized out>
              thd = <optimized out>
      #3  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #4  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 9 (Thread 0x7f840f7ff700 (LWP 24567)):
      #0  0x00007f8416ca100c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x0000555f9f69abc1 in buf_flush_page_cleaner () at /home/buildbot/amd64-debian-10/build/storage/innobase/buf/buf0flu.cc:2297
              soft_lsn_limit = <optimized out>
              oldest_lsn = <optimized out>
              dirty_blocks = <optimized out>
              dirty_pct = <optimized out>
              pct_lwm = <optimized out>
              n_flushed = <optimized out>
              n = <optimized out>
              last_pages = 0
              abstime = {tv_sec = 1702026619, tv_nsec = 263975000}
              lsn_limit = 0
              last_activity_count = 0
      #2  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #3  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 8 (Thread 0x7f84159e4700 (LWP 24513)):
      #0  0x00007f8416ca135b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x0000555f9f433799 in inline_mysql_cond_timedwait (src_file=0x555f9f9b6fc0 "/home/buildbot/amd64-debian-10/build/storage/maria/ma_servicethread.c", src_line=116, abstime=0x7f84159e3d20, mutex=0x555fa084c9c0 <LOCK_checkpoint>, that=0x555fa084c980 <COND_checkpoint>) at /home/buildbot/amd64-debian-10/build/include/mysql/psi/mysql_thread.h:1270
              result = <optimized out>
              result = <optimized out>
              locker = <optimized out>
              state = <optimized out>
      #2  my_service_thread_sleep (control=control@entry=0x555f9ff87620 <checkpoint_control>, sleep_time=<optimized out>) at /home/buildbot/amd64-debian-10/build/storage/maria/ma_servicethread.c:115
              abstime = {tv_sec = 1702026706, tv_nsec = 155286000}
              res = 0 '\000'
      #3  0x0000555f9f42c5b9 in ma_checkpoint_background (arg=0x1e) at /home/buildbot/amd64-debian-10/build/storage/maria/ma_checkpoint.c:725
              interval = 30
              sleeps = 60
              sleep_time = 30
              log_horizon_at_last_checkpoint = 4295014401
              pagecache_flushes_at_last_checkpoint = 0
              pages_bunch_size = 0
              filter_param = {up_to_lsn = 4295014357, max_pages = 0}
              dfile = 0x0
              kfile = 0x0
      #4  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #5  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 7 (Thread 0x7f8414385700 (LWP 24602)):
      #0  0x00007f8416ca100c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x0000555f9f36234b in inline_mysql_cond_wait (src_file=0x555f9f9aa078 "/home/buildbot/amd64-debian-10/build/sql/log.cc", src_line=10496, mutex=0x555fa004a810 <mysql_bin_log+3184>, that=0x555fa004a840 <mysql_bin_log+3232>) at /home/buildbot/amd64-debian-10/build/include/mysql/psi/mysql_thread.h:1233
              result = <optimized out>
              result = <optimized out>
              locker = <optimized out>
              state = <optimized out>
      #2  binlog_background_thread (arg=arg@entry=0x0) at /home/buildbot/amd64-debian-10/build/sql/log.cc:10495
              stop = false
              queue = 0x0
              next = <optimized out>
              thd = 0x7f83c0000c18
      #3  0x0000555f9f492bab in pfs_spawn_thread (arg=0x555fa0bbcb48) at /home/buildbot/amd64-debian-10/build/storage/perfschema/pfs.cc:2201
              typed_arg = 0x555fa0bbcb48
              user_arg = 0x0
              user_start_routine = 0x555f9f361eb0 <binlog_background_thread(void*)>
              pfs = <optimized out>
              klass = <optimized out>
      #4  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #5  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 6 (Thread 0x7f84143d0700 (LWP 24601)):
      #0  0x00007f8416ca100c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x0000555f9f07a7bb in inline_mysql_cond_wait (that=0x555fa0019360 <COND_manager>, mutex=0x555fa00193a0 <LOCK_manager>, src_file=0x555f9f85cce8 "/home/buildbot/amd64-debian-10/build/sql/sql_manager.cc", src_line=103) at /home/buildbot/amd64-debian-10/build/include/mysql/psi/mysql_thread.h:1233
              result = <optimized out>
              result = <optimized out>
              locker = <optimized out>
              state = <optimized out>
      #2  handle_manager (arg=arg@entry=0x0) at /home/buildbot/amd64-debian-10/build/sql/sql_manager.cc:103
              cb = <optimized out>
              error = 0
              abstime = {tv_sec = 511101108316, tv_nsec = 0}
              reset_flush_time = true
      #3  0x0000555f9f492bab in pfs_spawn_thread (arg=0x555fa0e57fa8) at /home/buildbot/amd64-debian-10/build/storage/perfschema/pfs.cc:2201
              typed_arg = 0x555fa0e57fa8
              user_arg = 0x0
              user_start_routine = 0x555f9f07a510 <handle_manager(void*)>
              pfs = <optimized out>
              klass = <optimized out>
      #4  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #5  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 5 (Thread 0x7f841433a700 (LWP 24644)):
      #0  0x00007f84167e35fc in sigtimedwait () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
      #1  0x0000555f9efc17db in my_sigwait (code=<synthetic pointer>, sig=0x7f8414339ca8, set=0x7f8414339cb0) at /home/buildbot/amd64-debian-10/build/include/my_pthread.h:196
              siginfo = {si_signo = 7, si_errno = 0, si_code = 0, __pad0 = 0, _sifields = {_pad = {37, 21855, 280, 0, 575, 0, 624, 0, -312, -1, 9, 0, 37, 57, 1, 0, 92, 119, 0, 0, 124, 110, 0, 0, 0, 0, -1476394976, 32643}, _kill = {si_pid = 37, si_uid = 21855}, _timer = {si_tid = 37, si_overrun = 21855, si_sigval = {sival_int = 280, sival_ptr = 0x118}}, _rt = {si_pid = 37, si_uid = 21855, si_sigval = {sival_int = 280, sival_ptr = 0x118}}, _sigchld = {si_pid = 37, si_uid = 21855, si_status = 280, si_utime = 575, si_stime = 624}, _sigfault = {si_addr = 0x555f00000025, si_addr_lsb = 280, _bounds = {_addr_bnd = {_lower = 0x23f, _upper = 0x270}, _pkey = 575}}, _sigpoll = {si_band = 93866510254117, si_fd = 280}, _sigsys = {_call_addr = 0x555f00000025, _syscall = 280, _arch = 0}}}
              siginfo = <optimized out>
      #2  signal_hand (arg=arg@entry=0x0) at /home/buildbot/amd64-debian-10/build/sql/mysqld.cc:3007
              error = <optimized out>
              origin = <optimized out>
              set = {__val = {548869, 0 <repeats 15 times>}}
              sig = 0
      #3  0x0000555f9f492bab in pfs_spawn_thread (arg=0x555fa0ec4ca8) at /home/buildbot/amd64-debian-10/build/storage/perfschema/pfs.cc:2201
              typed_arg = 0x555fa0ec4ca8
              user_arg = 0x0
              user_start_routine = 0x555f9efc1650 <signal_hand(void*)>
              pfs = <optimized out>
              klass = <optimized out>
      #4  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #5  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 4 (Thread 0x7f83deffd700 (LWP 24612)):
      #0  0x00007f8416ca135b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x0000555f9f743c1d in __gthread_cond_timedwait (__abs_timeout=0x7f83deffcd40, __mutex=<optimized out>, __cond=0x555fa0cb8ad0) at /usr/include/x86_64-linux-gnu/c++/8/bits/gthr-default.h:871
      No locals.
      #2  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=<optimized out>, __lock=@0x7f83deffcd90: {_M_device = 0x555fa0cb81b8, _M_owns = true}, this=0x555fa0cb8ad0) at /usr/include/c++/8/condition_variable:178
              __s = <optimized out>
              __ns = <optimized out>
              __ts = {tv_sec = 1702026739, tv_nsec = 480040906}
              __s = <optimized out>
              __ns = <optimized out>
              __ts = <optimized out>
      #3  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=<optimized out>, __lock=@0x7f83deffcd90: {_M_device = 0x555fa0cb81b8, _M_owns = true}, this=0x555fa0cb8ad0) at /usr/include/c++/8/condition_variable:106
      No locals.
      #4  std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (__rtime=@0x555fa0cb81e0: {__r = 60000}, __lock=@0x7f83deffcd90: {_M_device = 0x555fa0cb81b8, _M_owns = true}, this=0x555fa0cb8ad0) at /usr/include/c++/8/condition_variable:143
              __reltime = <optimized out>
              __reltime = <optimized out>
      #5  tpool::thread_pool_generic::wait_for_tasks (this=this@entry=0x555fa0cb80a0, lk=@0x7f83deffcd90: {_M_device = 0x555fa0cb81b8, _M_owns = true}, thread_data=thread_data@entry=0x555fa0cb8ad0) at /home/buildbot/amd64-debian-10/build/tpool/tpool_generic.cc:499
      No locals.
      #6  0x0000555f9f743e5d in tpool::thread_pool_generic::get_task (this=0x555fa0cb80a0, thread_var=0x555fa0cb8ad0, t=0x7f83deffcde8) at /home/buildbot/amd64-debian-10/build/tpool/tpool_generic.cc:552
              lk = {_M_device = 0x555fa0cb81b8, _M_owns = true}
      #7  0x0000555f9f74413d in tpool::thread_pool_generic::worker_main (this=0x555fa0cb80a0, thread_var=0x555fa0cb8ad0) at /home/buildbot/amd64-debian-10/build/tpool/tpool_generic.cc:597
              task = 0x555fa0e89c80
      #8  0x00007f8416bc5b2f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      No symbol table info available.
      #9  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #10 0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 3 (Thread 0x7f8414dd1700 (LWP 24563)):
      #0  0x00007f841689eaf9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
      #1  0x0000555f9f746a65 in my_getevents (min_nr=1, nr=256, ev=0x7f8414dcede0, ctx=<optimized out>) at /home/buildbot/amd64-debian-10/build/tpool/aio_linux.cc:108
              saved_errno = <optimized out>
              ret = <optimized out>
              saved_errno = <optimized out>
              ret = <optimized out>
      #2  tpool::aio_linux::getevent_thread_routine (aio=0x555fa0cffe20) at /home/buildbot/amd64-debian-10/build/tpool/aio_linux.cc:108
              ret = <optimized out>
              events = {{data = 0x0, obj = 0x555fa0d208d0, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20990, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20a50, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20b10, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20bd0, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20c90, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20d50, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20e10, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20ed0, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d20f90, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d21050, res = 16384, res2 = 0}, {data = 0x0, obj = 0x555fa0d21110, res = 16384, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0} <repeats 244 times>}
      #3  0x00007f8416bc5b2f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      No symbol table info available.
      #4  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #5  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 2 (Thread 0x7f841655d700 (LWP 24507)):
      #0  0x00007f8416ca135b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x0000555f9f7a6f01 in inline_mysql_cond_timedwait (that=0x555fa08ec960 <COND_timer>, mutex=0x555fa08ec9a0 <LOCK_timer>, src_file=0x555f9fa24190 "/home/buildbot/amd64-debian-10/build/mysys/thr_timer.c", src_line=321, abstime=0x7f841655cdb0) at /home/buildbot/amd64-debian-10/build/include/mysql/psi/mysql_thread.h:1270
              result = <optimized out>
              result = <optimized out>
              locker = <optimized out>
              state = <optimized out>
      #2  timer_handler (arg=<optimized out>) at /home/buildbot/amd64-debian-10/build/mysys/thr_timer.c:321
              error = <optimized out>
              top_time = 0x555fa0cb82b8
              now = <optimized out>
              abstime = {tv_sec = 1702026680, tv_nsec = 328723000}
      #3  0x00007f8416c9afa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #4  0x00007f84168a406f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
       
      Thread 1 (Thread 0x7f841665d040 (LWP 24496)):
      #0  0x00007f8416ca25a1 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
      No symbol table info available.
      #1  0x0000555f9f7a2e47 in my_write_core (sig=<optimized out>) at /home/buildbot/amd64-debian-10/build/mysys/stacktrace.c:424
      No locals.
      #2  0x0000555f9f282370 in handle_fatal_signal (sig=6) at /home/buildbot/amd64-debian-10/build/sql/signal_handler.cc:357
              curr_time = 1702026678
              tm = {tm_sec = 18, tm_min = 11, tm_hour = 9, tm_mday = 8, tm_mon = 11, tm_year = 123, tm_wday = 5, tm_yday = 341, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x555fa0bb6280 "UTC"}
              thd = 0x0
              print_invalid_query_pointer = false
      #3  <signal handler called>
      No symbol table info available.
      #4  0x00007f841689bbd7 in select () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
      #5  0x0000555f9f7a0def in my_sleep (m_seconds=m_seconds@entry=1000) at /home/buildbot/amd64-debian-10/build/mysys/my_sleep.c:29
              t = {tv_sec = 0, tv_usec = 0}
      #6  0x0000555f9efcb492 in close_connections () at /home/buildbot/amd64-debian-10/build/sql/mysqld.cc:1813
              n_threads_awaiting_ack = 0
              n_threads_awaiting_ack = <optimized out>
              waiting_threads = <optimized out>
              i = <optimized out>
              i = <optimized out>
      #7  mysqld_main (argc=<optimized out>, argv=<optimized out>) at /home/buildbot/amd64-debian-10/build/sql/mysqld.cc:5741
              please_close_stdin = <optimized out>
              ho_error = <optimized out>
              new_thread_stack_size = <optimized out>
              user = <optimized out>
      #8  0x00007f84167cf09b in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
      No symbol table info available.
      #9  0x0000555f9efbf18a in _start () at /home/buildbot/amd64-debian-10/build/include/mysql/plugin.h:215
      No symbol table info available.
       
      Compressed file /home/buildbot/amd64-debian-10/build/mysql-test/var/1/log/spider/bugfix.mdev_27575/mysqld.1.1/data/core
       - saving '/home/buildbot/amd64-debian-10/build/mysql-test/var/1/log/spider/bugfix.mdev_27575/' to '/home/buildbot/amd64-debian-10/build/mysql-test/var/log/bugfix.mdev_27575/'
      

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment -

            This is happening a bit too often these days. Let's disable the test until we find a solution.

            example occurrence:
            https://buildbot.mariadb.org/#/builders/221/builds/29320

            ycp Yuchen Pei added a comment - This is happening a bit too often these days. Let's disable the test until we find a solution. example occurrence: https://buildbot.mariadb.org/#/builders/221/builds/29320
            ycp Yuchen Pei added a comment - Happening to spider/bugfix.mdev_28739 too: https://buildbot.mariadb.net/buildbot/builders/kvm-fulltest/builds/48573/steps/mtr_nm/logs/stdio
            ycp Yuchen Pei added a comment - - edited

            Following up on serg's observation of the stack in MDEV-33343

            • the server gets core in select called from sleep called from close_connections. Why? Well, clearly it's shutdown and whatever it was selecting or sleeping on gets destroyed.
            • the server sleeps in close_connections when it's waiting for the thread counter to become 0
            • threads don't end, because thread stack shows spider_table_bg_sts_action and spider_table_bg_crd_action threads waiting on conditions.
            • Why there are sts/crd bg threads if your intention was to remove spider persistent stats?

            These bg threads gather stats and store them in-memory rather than
            persistently. According to the stack they are waiting on a condition
            (3) which should be signaled (1) when spider is shutdown, which when I
            run the test locally, happens when mysqld_main() is at the call of
            handle_connections_sockets() (4).

            The sleep that waits for the threads until timeout is inside
            close_connections() (5), which is called after the call to
            handle_connections_sockets(). Thus the question is why the threads
            are still alive by then? Below the code blocks and stacks are at 10.5
            354e97cd722c6b8721f05e75084a6843e32b6b74.

            void spider_free_sts_threads(
              SPIDER_THREAD *spider_thread
            ) {
            //  [... 5 lines elided]
              if (!thread_killed)
              {
                if (spider_thread->thd_wait)
                {
                    pthread_cond_signal(&spider_thread->cond); // (1)
                }
                pthread_cond_wait(&spider_thread->sync_cond, &spider_thread->mutex);
              }
            //  [... 8 lines elided]
            }
            void *spider_table_bg_sts_action(
              void *arg
            ) {
            //  [... 65 lines elided]
                if (!thread->queue_first)
                {
                  DBUG_PRINT("info",("spider bg sts has no job"));
                  thread->thd_wait = TRUE;
                  pthread_cond_wait(&thread->cond, &thread->mutex); // (3)
                  thread->thd_wait = FALSE;
                  if (thd->killed)
                    thread->killed = TRUE;
                  continue;
                }
            //  [... 80 lines elided]
            }
            //  [... 2 lines elided]
            int mysqld_main(int argc, char **argv)
            #endif
            {
            //  [... 411 lines elided]
              handle_connections_sockets(); // (4)
             
              mysql_mutex_lock(&LOCK_start_thread);
              select_thread_in_use=0;
              mysql_cond_broadcast(&COND_start_thread);
              mysql_mutex_unlock(&LOCK_start_thread);
            #endif /* _WIN32 */
             
              /* Shutdown requested */
              char *user= shutdown_user.load(std::memory_order_relaxed);
              sql_print_information(ER_DEFAULT(ER_NORMAL_SHUTDOWN), my_progname,
                                    user ? user : "unknown");
              if (user)
                my_free(user);
            //  [... 11 lines elided]
              close_connections();          // (5)
            //  [... 29 lines elided]
            }
            

            spider_free_sts_threads > spider_db_done > ha_finalize_handlerton > plugin_deinitialize > reap_plugins > plugin_unlock > ha_close_connection > THD::free_connection > unlink_thd > do_handle_one_connection > handle_one_connection

            (hmm, I just noticed this is an expanded version of my analysis in the description...)

            ycp Yuchen Pei added a comment - - edited Following up on serg 's observation of the stack in MDEV-33343 the server gets core in select called from sleep called from close_connections. Why? Well, clearly it's shutdown and whatever it was selecting or sleeping on gets destroyed. the server sleeps in close_connections when it's waiting for the thread counter to become 0 threads don't end, because thread stack shows spider_table_bg_sts_action and spider_table_bg_crd_action threads waiting on conditions. Why there are sts/crd bg threads if your intention was to remove spider persistent stats? These bg threads gather stats and store them in-memory rather than persistently. According to the stack they are waiting on a condition (3) which should be signaled (1) when spider is shutdown, which when I run the test locally, happens when mysqld_main() is at the call of handle_connections_sockets() (4). The sleep that waits for the threads until timeout is inside close_connections() (5), which is called after the call to handle_connections_sockets() . Thus the question is why the threads are still alive by then? Below the code blocks and stacks are at 10.5 354e97cd722c6b8721f05e75084a6843e32b6b74. void spider_free_sts_threads( SPIDER_THREAD *spider_thread ) { // [... 5 lines elided] if (!thread_killed) { if (spider_thread->thd_wait) { pthread_cond_signal(&spider_thread->cond); // (1) } pthread_cond_wait(&spider_thread->sync_cond, &spider_thread->mutex); } // [... 8 lines elided] } void *spider_table_bg_sts_action( void *arg ) { // [... 65 lines elided] if (! thread ->queue_first) { DBUG_PRINT( "info" ,( "spider bg sts has no job" )); thread ->thd_wait = TRUE; pthread_cond_wait(& thread ->cond, & thread ->mutex); // (3) thread ->thd_wait = FALSE; if (thd->killed) thread ->killed = TRUE; continue ; } // [... 80 lines elided] } // [... 2 lines elided] int mysqld_main( int argc, char **argv) #endif { // [... 411 lines elided] handle_connections_sockets(); // (4)   mysql_mutex_lock(&LOCK_start_thread); select_thread_in_use=0; mysql_cond_broadcast(&COND_start_thread); mysql_mutex_unlock(&LOCK_start_thread); #endif /* _WIN32 */   /* Shutdown requested */ char *user= shutdown_user.load(std::memory_order_relaxed); sql_print_information(ER_DEFAULT(ER_NORMAL_SHUTDOWN), my_progname, user ? user : "unknown" ); if (user) my_free(user); // [... 11 lines elided] close_connections(); // (5) // [... 29 lines elided] } spider_free_sts_threads > spider_db_done > ha_finalize_handlerton > plugin_deinitialize > reap_plugins > plugin_unlock > ha_close_connection > THD::free_connection > unlink_thd > do_handle_one_connection > handle_one_connection (hmm, I just noticed this is an expanded version of my analysis in the description...)
            ycp Yuchen Pei added a comment -

            > Thus the question is why the threads are still alive by then?

            A closer look shows handle_connection_sockets() creates new threads, including the one that should eventually call spider_db_done(), so it could be that it does not wait for the threads to finish before the callsite at mysqld_main() moves forward to the call of close_connections().

            The failures don't seem to happen very often nowadays...

            ycp Yuchen Pei added a comment - > Thus the question is why the threads are still alive by then? A closer look shows handle_connection_sockets() creates new threads, including the one that should eventually call spider_db_done() , so it could be that it does not wait for the threads to finish before the callsite at mysqld_main() moves forward to the call of close_connections() . The failures don't seem to happen very often nowadays...

            People

              ycp Yuchen Pei
              ycp Yuchen Pei
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.