Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20323

MariaDB asserts in srv_error_monitor_thread at srv0srv.cc:2410

Details

    Description

      The 10.1 branch has been stable for us for a while now.
      10.2 crashes for us on a regular basis, so upgrading is not an option. (See MDEV-16923, and others)

      Unfortunately, 10.1.40 crashed for us with an update on JOINs.
      So we upgraded to 10.1.41, but since we did that the server crashes multiple times a day with the following log message:

      Server version: 10.1.41-MariaDB-1~bionic
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=192
      max_threads=502
      thread_count=45
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1233831 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7fa04ec10008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f9f52fb3d68 thread_stack 0x49000
      *** buffer overflow detected ***: /usr/sbin/mysqld terminated
      Fatal signal 6 while backtracing
      

      Attachments

        Activity

          george.ccto George To added a comment -

          Hello MariaDB support,

          We encountered the same - InnoDB: Assertion failure in thread 140581082752768 in file srv0srv.cc line 2410.

          Do you have the procedure for coredump ?

          Dec  1 03:14:32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex', lock var 1
          Dec  1 03:14:32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0, waiters flag 1
          Dec  1 03:14:32 xxxx mysqld: InnoDB: Warning: semaphore wait:
          Dec  1 03:14:32 xxxx mysqld: --Thread 140581379458816 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore:
          Dec  1 03:14:32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex', lock var 1
          Dec  1 03:14:32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0, waiters flag 1
          Dec  1 03:14:32 xxxx mysqld: InnoDB: Warning: semaphore wait:
          Dec  1 03:14:32 xxxx mysqld: --Thread 140581378537216 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore:
          Dec  1 03:14:32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex', lock var 1
          Dec  1 03:14:32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0, waiters flag 1
          Dec  1 03:14:32 xxxx mysqld: InnoDB: Warning: semaphore wait:
          Dec  1 03:14:32 xxxx mysqld: --Thread 140581378230016 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore:
          Dec  1 03:15:02 xxxx journal: Suppressed 921 messages from /system.slice/mysqld.service
          Dec  1 03:15:02 xxxx mysqld: InnoDB: ###### Diagnostic info printed to the standard error stream
          Dec  1 03:15:02 xxxx mysqld: InnoDB: Error: semaphore wait has lasted > 600 seconds
          Dec  1 03:15:02 xxxx mysqld: InnoDB: We intentionally crash the server, because it appears to be hung.
          Dec  1 03:15:02 xxxx mysqld: 2019-12-01 03:15:02 7fdb957fe700  InnoDB: Assertion failure in thread 140581082752768 in file srv0srv.cc line 2410
          Dec  1 03:15:02 xxxx mysqld: InnoDB: We intentionally generate a memory trap.
          Dec  1 03:15:02 xxxx mysqld: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
          Dec  1 03:15:02 xxxx mysqld: InnoDB: If you get repeated assertion failures or crashes, even
          Dec  1 03:15:02 xxxx mysqld: InnoDB: immediately after the mysqld startup, there may be
          Dec  1 03:15:02 xxxx mysqld: InnoDB: corruption in the InnoDB tablespace. Please refer to
          Dec  1 03:15:02 xxxx mysqld: InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
          Dec  1 03:15:02 xxxx mysqld: InnoDB: about forcing recovery.
          Dec  1 03:15:02 xxxx mysqld: 191201  3:15:02 [ERROR] mysqld got signal 6 ;
          Dec  1 03:15:02 xxxx mysqld: This could be because you hit a bug. It is also possible that this binary
          Dec  1 03:15:02 xxxx mysqld: or one of the libraries it was linked against is corrupt, improperly built,
          Dec  1 03:15:02 xxxx mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
          Dec  1 03:15:02 xxxx mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
          Dec  1 03:15:02 xxxx mysqld: We will try our best to scrape up some info that will hopefully help
          Dec  1 03:15:02 xxxx mysqld: diagnose the problem, but since we have already crashed,
          Dec  1 03:15:02 xxxx mysqld: something is definitely wrong and this may fail.
          Dec  1 03:15:02 xxxx mysqld: Server version: 10.1.41-MariaDB
          Dec  1 03:15:02 xxxx mysqld: key_buffer_size=134217728
          Dec  1 03:15:02 xxxx mysqld: read_buffer_size=2097152
          Dec  1 03:15:02 xxxx mysqld: max_used_connections=174
          Dec  1 03:15:02 xxxx mysqld: max_threads=502
          Dec  1 03:15:02 xxxx mysqld: thread_count=173
          Dec  1 03:15:02 xxxx mysqld: It is possible that mysqld could use up to
          Dec  1 03:15:02 xxxx mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2197671 K  bytes of memory
          Dec  1 03:15:02 xxxx mysqld: Hope that's ok; if not, decrease some variables in the equation.
          Dec  1 03:15:02 xxxx mysqld: Thread pointer: 0x0
          Dec  1 03:15:02 xxxx mysqld: Attempting backtrace. You can use the following information to find out
          Dec  1 03:15:02 xxxx mysqld: where mysqld died. If you see no messages after this, something went
          Dec  1 03:15:02 xxxx mysqld: terribly wrong...
          Dec  1 03:15:02 xxxx mysqld: stack_bottom = 0x0 thread_stack 0x49000
          Dec  1 03:15:02 xxxx mysqld: *** buffer overflow detected ***: /usr/sbin/mysqld terminated
          Dec  1 03:15:02 xxxx mysqld: ======= Backtrace: =========
          Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(__fortify_fail+0x37)[0x7fdc83d53527]
          Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(+0x1166a2)[0x7fdc83d516a2]
          Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(+0x118487)[0x7fdc83d53487]
          Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(my_addr_resolve+0xd0)[0x55a270e96c40]
          Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55a270e82e82]
          Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x55a2709a55d5]
          Dec  1 03:15:02 xxxx mysqld: /lib64/libpthread.so.0(+0xf5f0)[0x7fdc85bd45f0]
          Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(gsignal+0x37)[0x7fdc83c71337]
          Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(abort+0x148)[0x7fdc83c72a28]
          Dec  1 03:15:02 xxxx mysqld: /usr/sbin/mysqld(+0x941359)[0x55a270d20359]
          Dec  1 03:15:02 xxxx mysqld: /lib64/libpthread.so.0(+0x7e65)[0x7fdc85bcce65]
          Dec  1 03:15:02 xxxx mysqld: /lib64/libc.so.6(clone+0x6d)[0x7fdc83d3988d]
          

          george.ccto George To added a comment - Hello MariaDB support, We encountered the same - InnoDB: Assertion failure in thread 140581082752768 in file srv0srv.cc line 2410. Do you have the procedure for coredump ? Dec 1 03 : 14 : 32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex' , lock var 1 Dec 1 03 : 14 : 32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0 , waiters flag 1 Dec 1 03 : 14 : 32 xxxx mysqld: InnoDB: Warning: semaphore wait: Dec 1 03 : 14 : 32 xxxx mysqld: --Thread 140581379458816 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore: Dec 1 03 : 14 : 32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex' , lock var 1 Dec 1 03 : 14 : 32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0 , waiters flag 1 Dec 1 03 : 14 : 32 xxxx mysqld: InnoDB: Warning: semaphore wait: Dec 1 03 : 14 : 32 xxxx mysqld: --Thread 140581378537216 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore: Dec 1 03 : 14 : 32 xxxx mysqld: Mutex at 0x7fdc808112e8 '&dict_sys->mutex' , lock var 1 Dec 1 03 : 14 : 32 xxxx mysqld: Last time reserved by thread 140581074360064 in file not yet reserved line 0 , waiters flag 1 Dec 1 03 : 14 : 32 xxxx mysqld: InnoDB: Warning: semaphore wait: Dec 1 03 : 14 : 32 xxxx mysqld: --Thread 140581378230016 has waited at dict0dict.cc line 1138 for 844.00 seconds the semaphore: Dec 1 03 : 15 : 02 xxxx journal: Suppressed 921 messages from /system.slice/mysqld.service Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: ###### Diagnostic info printed to the standard error stream Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: Error: semaphore wait has lasted > 600 seconds Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: We intentionally crash the server, because it appears to be hung. Dec 1 03 : 15 : 02 xxxx mysqld: 2019 - 12 - 01 03 : 15 : 02 7fdb957fe700 InnoDB: Assertion failure in thread 140581082752768 in file srv0srv.cc line 2410 Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: We intentionally generate a memory trap. Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: Submit a detailed bug report to https: //jira.mariadb.org/ Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: If you get repeated assertion failures or crashes, even Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: immediately after the mysqld startup, there may be Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: corruption in the InnoDB tablespace. Please refer to Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: http: //dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html Dec 1 03 : 15 : 02 xxxx mysqld: InnoDB: about forcing recovery. Dec 1 03 : 15 : 02 xxxx mysqld: 191201 3 : 15 : 02 [ERROR] mysqld got signal 6 ; Dec 1 03 : 15 : 02 xxxx mysqld: This could be because you hit a bug. It is also possible that this binary Dec 1 03 : 15 : 02 xxxx mysqld: or one of the libraries it was linked against is corrupt, improperly built, Dec 1 03 : 15 : 02 xxxx mysqld: or misconfigured. This error can also be caused by malfunctioning hardware. Dec 1 03 : 15 : 02 xxxx mysqld: To report this bug, see https: //mariadb.com/kb/en/reporting-bugs Dec 1 03 : 15 : 02 xxxx mysqld: We will try our best to scrape up some info that will hopefully help Dec 1 03 : 15 : 02 xxxx mysqld: diagnose the problem, but since we have already crashed, Dec 1 03 : 15 : 02 xxxx mysqld: something is definitely wrong and this may fail. Dec 1 03 : 15 : 02 xxxx mysqld: Server version: 10.1 . 41 -MariaDB Dec 1 03 : 15 : 02 xxxx mysqld: key_buffer_size= 134217728 Dec 1 03 : 15 : 02 xxxx mysqld: read_buffer_size= 2097152 Dec 1 03 : 15 : 02 xxxx mysqld: max_used_connections= 174 Dec 1 03 : 15 : 02 xxxx mysqld: max_threads= 502 Dec 1 03 : 15 : 02 xxxx mysqld: thread_count= 173 Dec 1 03 : 15 : 02 xxxx mysqld: It is possible that mysqld could use up to Dec 1 03 : 15 : 02 xxxx mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2197671 K bytes of memory Dec 1 03 : 15 : 02 xxxx mysqld: Hope that's ok; if not, decrease some variables in the equation. Dec 1 03 : 15 : 02 xxxx mysqld: Thread pointer: 0x0 Dec 1 03 : 15 : 02 xxxx mysqld: Attempting backtrace. You can use the following information to find out Dec 1 03 : 15 : 02 xxxx mysqld: where mysqld died. If you see no messages after this , something went Dec 1 03 : 15 : 02 xxxx mysqld: terribly wrong... Dec 1 03 : 15 : 02 xxxx mysqld: stack_bottom = 0x0 thread_stack 0x49000 Dec 1 03 : 15 : 02 xxxx mysqld: *** buffer overflow detected ***: /usr/sbin/mysqld terminated Dec 1 03 : 15 : 02 xxxx mysqld: ======= Backtrace: ========= Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libc.so. 6 (__fortify_fail+ 0x37 )[ 0x7fdc83d53527 ] Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libc.so. 6 (+ 0x1166a2 )[ 0x7fdc83d516a2 ] Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libc.so. 6 (+ 0x118487 )[ 0x7fdc83d53487 ] Dec 1 03 : 15 : 02 xxxx mysqld: /usr/sbin/mysqld(my_addr_resolve+ 0xd0 )[ 0x55a270e96c40 ] Dec 1 03 : 15 : 02 xxxx mysqld: /usr/sbin/mysqld(my_print_stacktrace+ 0x1c2 )[ 0x55a270e82e82 ] Dec 1 03 : 15 : 02 xxxx mysqld: /usr/sbin/mysqld(handle_fatal_signal+ 0x305 )[ 0x55a2709a55d5 ] Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libpthread.so. 0 (+ 0xf5f0 )[ 0x7fdc85bd45f0 ] Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libc.so. 6 (gsignal+ 0x37 )[ 0x7fdc83c71337 ] Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libc.so. 6 (abort+ 0x148 )[ 0x7fdc83c72a28 ] Dec 1 03 : 15 : 02 xxxx mysqld: /usr/sbin/mysqld(+ 0x941359 )[ 0x55a270d20359 ] Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libpthread.so. 0 (+ 0x7e65 )[ 0x7fdc85bcce65 ] Dec 1 03 : 15 : 02 xxxx mysqld: /lib64/libc.so. 6 (clone+ 0x6d )[ 0x7fdc83d3988d ]
          CrewOne Sander Pilon added a comment -

          We have migraties to percona, wit much succes. No more crashes.

          CrewOne Sander Pilon added a comment - We have migraties to percona, wit much succes. No more crashes.

          Stack trace from george.ccto's occurrence of the failure is attached as gdb.txt

          elenst Elena Stepanova added a comment - Stack trace from george.ccto 's occurrence of the failure is attached as gdb.txt
          george.ccto George To added a comment -

          F.Y.I. -
          A few days ago, we have updated MariaDB from 10.1.41 to 10.1.44 , it seems the problem disappeared (for a few days already).

          george.ccto George To added a comment - F.Y.I. - A few days ago, we have updated MariaDB from 10.1.41 to 10.1.44 , it seems the problem disappeared (for a few days already).

          In gdb.txt (as well as the Description), I see that the server crashed while trying to produce a stack trace:

          mariadb-10.1.41

          #6  0x000055f142b40c40 in my_addr_resolve (ptr=<optimized out>, loc=loc@entry=0x7fdb42ffd0b0) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/my_addr_resolve.c:222
          #7  0x000055f142b2ce82 in print_with_addr_resolve (n=<optimized out>, addrs=0x7fdb42ffd0d0) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/stacktrace.c:253
          #8  my_print_stacktrace (stack_bottom=<optimized out>, thread_stack=299008) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/stacktrace.c:271
          #9  0x000055f14264f5d5 in handle_fatal_signal (sig=6) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/sql/signal_handler.cc:166
          #10 <signal handler called>
          #11 0x00007fdb7082d337 in raise () from /lib64/libc.so.6
          #12 0x00007fdb7082ea28 in abort () from /lib64/libc.so.6
          #13 0x000055f1429ca359 in srv_error_monitor_thread () at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/storage/xtradb/srv/srv0srv.cc:2410
          

          That is, XtraDB intentionally initiated a kill of itself after a hang. In addition, the server crashed while trying to produce a stack trace. cvicentiu should be aware of this issue and try to remove the use of functions that are not safe to use in signal handlers. There is similar crash-while-crashing output in MDEV-19310.

          Now, what could cause the hang? I was going to suggest MDEV-13983, but it turns out that that regression was fixed in 10.1.41.

          Most threads are waiting for dict_sys->mutex while trying to open an InnoDB table.

          Thread 5 (purge) is attempting to acquire dict_operation_lock in shared mode. That should prevent others from acquiring it in exclusive mode.

          A possible cause of the deadlock could be between Thread 4 and Thread 2. This strongly looks like a duplicate of MDEV-20621, which was fixed in 10.1.42. Also george.ccto’s comment supports this.

          CrewOne, do you use fulltext indexes on InnoDB or XtraDB tables? If not, your crash should be due to something else. Also, you mention "update on JOINs". Were there any messages in the server error log before the crash?

          paladox, jcrespo, the InnoDB diagnostic output is next to useless for diagnosing hangs. We need stack traces of all threads during the hang, and to ease debugging, the information on who is holding the mutexes that almost everyone is waiting for:

          gdb -ex 'set height 0' -ex 'thread apply all backtrace' -ex 'print *dict_operation_lock' -ex 'print *dict_sys.mutex' -ex detach -ex quit -p $(pgrep -x myqsld)
          

          marko Marko Mäkelä added a comment - In gdb.txt (as well as the Description), I see that the server crashed while trying to produce a stack trace: mariadb-10.1.41 #6 0x000055f142b40c40 in my_addr_resolve (ptr=<optimized out>, loc=loc@entry=0x7fdb42ffd0b0) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/my_addr_resolve.c:222 #7 0x000055f142b2ce82 in print_with_addr_resolve (n=<optimized out>, addrs=0x7fdb42ffd0d0) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/stacktrace.c:253 #8 my_print_stacktrace (stack_bottom=<optimized out>, thread_stack=299008) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/mysys/stacktrace.c:271 #9 0x000055f14264f5d5 in handle_fatal_signal (sig=6) at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/sql/signal_handler.cc:166 #10 <signal handler called> #11 0x00007fdb7082d337 in raise () from /lib64/libc.so.6 #12 0x00007fdb7082ea28 in abort () from /lib64/libc.so.6 #13 0x000055f1429ca359 in srv_error_monitor_thread () at /usr/src/debug/mariadb-10.1.41-linux-x86_64/server/src_0/storage/xtradb/srv/srv0srv.cc:2410 That is, XtraDB intentionally initiated a kill of itself after a hang. In addition, the server crashed while trying to produce a stack trace. cvicentiu should be aware of this issue and try to remove the use of functions that are not safe to use in signal handlers. There is similar crash-while-crashing output in MDEV-19310 . Now, what could cause the hang? I was going to suggest MDEV-13983 , but it turns out that that regression was fixed in 10.1.41. Most threads are waiting for dict_sys->mutex while trying to open an InnoDB table. Thread 5 (purge) is attempting to acquire dict_operation_lock in shared mode. That should prevent others from acquiring it in exclusive mode. A possible cause of the deadlock could be between Thread 4 and Thread 2. This strongly looks like a duplicate of MDEV-20621 , which was fixed in 10.1.42. Also george.ccto ’s comment supports this. CrewOne , do you use fulltext indexes on InnoDB or XtraDB tables? If not, your crash should be due to something else. Also, you mention "update on JOINs". Were there any messages in the server error log before the crash? paladox , jcrespo , the InnoDB diagnostic output is next to useless for diagnosing hangs. We need stack traces of all threads during the hang, and to ease debugging, the information on who is holding the mutexes that almost everyone is waiting for: gdb -ex 'set height 0' -ex 'thread apply all backtrace' -ex 'print *dict_operation_lock' -ex 'print *dict_sys.mutex' -ex detach -ex quit -p $(pgrep -x myqsld)

          People

            marko Marko Mäkelä
            CrewOne Sander Pilon
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.