Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23101

SIGSEGV in lock_rec_unlock() when Galera is enabled

Details

    Description

      Server is crashing few times a day with the following stacktrace:

      /lib64/libc.so.6(+0x115b62)[0x7fcadad58b62]
      /lib64/libc.so.6(+0x117947)[0x7fcadad5a947]
      /usr/sbin/mysqld(my_addr_resolve+0xda)[0x55f2dc88214a]
      /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55f2dc86a862]
      /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55f2dc300f7f]
      /lib64/libpthread.so.0(+0xf5d0)[0x7fcadc9a65d0]
      /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908]
      /usr/sbin/mysqld(+0xa9d47e)[0x55f2dc4ee47e]
      /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626]
      /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x2110)[0x55f2dc1a5c60]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2ff5)[0x55f2dc0fcd15]
      /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1fb)[0x55f2dc1028ab]
      /usr/sbin/mysqld(+0x5a0090)[0x55f2dbff1090]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x28f8)[0x55f2dc1059d8]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x109)[0x55f2dc106049]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1fa)[0x55f2dc1e3bfa]
      /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f2dc1e3cdd]
      /usr/sbin/mysqld(+0xdcb78d)[0x55f2dc81c78d]
      /lib64/libpthread.so.0(+0x7dd5)[0x7fcadc99edd5]
      /lib64/libc.so.6(clone+0x6d)[0x7fcadad40ead]
      

      3 Top frames, resolved

      /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908] lock_rec_has_to_wait_in_queue(ib_lock_t const*) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/lock0priv.ic:79).
       /usr/sbin/mysqld(0xa9d47e)[0x55f2dc4ee47e] lock_rec_unlock(trx_t*, buf_block_t const*, unsigned char const*, lock_mode) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4214).
      /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626] row_unlock_for_mysql(row_prebuilt_t*, unsigned long) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2093).
      

      We tried to set up for core dump generation, but it appears it crashes trying to access memory and therefore no core dump is generated:

      200703 13:10:16 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.4.13-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=10223
      max_threads=20002
      thread_count=8406
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 123517134 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7faf48086b98
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7facc4190cc0 thread_stack 0x49000
      *** buffer overflow detected ***: /usr/sbin/mysqld terminated
      ======= Backtrace: =========
      /lib64/libc.so.6(__fortify_fail+0x37)[0x7fcadad5a9e7]
      /lib64/libc.so.6(+0x115b62)[0x7fcadad58b62]
      /lib64/libc.so.6(+0x117947)[0x7fcadad5a947]
      /usr/sbin/mysqld(my_addr_resolve+0xda)[0x55f2dc88214a]
      /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55f2dc86a862]
      /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55f2dc300f7f]
      /lib64/libpthread.so.0(+0xf5d0)[0x7fcadc9a65d0]
      /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908]
      /usr/sbin/mysqld(+0xa9d47e)[0x55f2dc4ee47e]
      /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626]
      /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x2110)[0x55f2dc1a5c60]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2ff5)[0x55f2dc0fcd15]
      /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1fb)[0x55f2dc1028ab]
      /usr/sbin/mysqld(+0x5a0090)[0x55f2dbff1090]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x28f8)[0x55f2dc1059d8]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x109)[0x55f2dc106049]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1fa)[0x55f2dc1e3bfa]
      /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f2dc1e3cdd]
      /usr/sbin/mysqld(+0xdcb78d)[0x55f2dc81c78d]
      /lib64/libpthread.so.0(+0x7dd5)[0x7fcadc99edd5]
      /lib64/libc.so.6(clone+0x6d)[0x7fcadad40ead]
      ======= Memory map: ========
      55f2dba51000-55f2dceca000 r-xp 00000000 fd:00 436210356                  /usr/sbin/mysqld
      55f2dd0ca000-55f2dd1fd000 r--p 01479000 fd:00 436210356                  /usr/sbin/mysqld
      55f2dd1fd000-55f2dd2b7000 rw-p 015ac000 fd:00 436210356                  /usr/sbin/mysqld
      55f2dd2b7000-55f2ddb66000 rw-p 00000000 00:00 0 
      55f2de4cb000-55f348284000 rw-p 00000000 00:00 0                          [heap]
      7fabcc000000-7fabcdb8a000 rw-p 00000000 00:00 0 
       
      [ ... ]
       
      7fcadcdd6000-7fcadcdd7000 rw-p 00000000 00:00 0 
      7fff76cfc000-7fff76d1d000 rw-p 00000000 00:00 0                          [stack]
      7fff76d84000-7fff76d86000 r-xp 00000000 00:00 0                          [vdso]
      ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
      Fatal signal 6 while backtracing
      
      

      Attachments

        1. thd.txt
          71 kB
        2. trx.txt
          8 kB

        Issue Links

          Activity

            rpizzi Rick Pizzi (Inactive) created issue -
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Labels need_feedback

            Resolving the last 3 levels of stack trace:

            /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908]
            lock_rec_has_to_wait_in_queue(ib_lock_t const*) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/lock0priv.ic:79).
             
            /usr/sbin/mysqld(0xa9d47e)[0x55f2dc4ee47e]
            lock_rec_unlock(trx_t*, buf_block_t const*, unsigned char const*, lock_mode) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4214).
             
             
            /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626]
            row_unlock_for_mysql(row_prebuilt_t*, unsigned long) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2093).
            
            

            rpizzi Rick Pizzi (Inactive) added a comment - Resolving the last 3 levels of stack trace: /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908] lock_rec_has_to_wait_in_queue(ib_lock_t const*) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/lock0priv.ic:79).   /usr/sbin/mysqld(0xa9d47e)[0x55f2dc4ee47e] lock_rec_unlock(trx_t*, buf_block_t const*, unsigned char const*, lock_mode) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4214).     /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626] row_unlock_for_mysql(row_prebuilt_t*, unsigned long) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2093).

            The closest that we have among open issues seems to be MDEV-14846 (and consequently MDEV-20666 which MDEV-14846 links itself to). Both of them are only known to cause problems on debug builds, but it doesn't mean that the underlying cause can't affect non-debug ones.

            elenst Elena Stepanova added a comment - The closest that we have among open issues seems to be MDEV-14846 (and consequently MDEV-20666 which MDEV-14846 links itself to). Both of them are only known to cause problems on debug builds, but it doesn't mean that the underlying cause can't affect non-debug ones.
            elenst Elena Stepanova made changes -
            elenst Elena Stepanova made changes -
            elenst Elena Stepanova made changes -
            Fix Version/s 10.4 [ 22408 ]
            Assignee Oleksandr Byelkin [ sanja ]
            Labels need_feedback

            rpizzi if it crashes repeatadly where is test suite?

            sanja Oleksandr Byelkin added a comment - rpizzi if it crashes repeatadly where is test suite?

            This is a production system crashing, of a customer of RDBA. We were unable to reproduce the crash, especially because the cause is not known.

            rpizzi Rick Pizzi (Inactive) added a comment - This is a production system crashing, of a customer of RDBA. We were unable to reproduce the crash, especially because the cause is not known.

            Ah, that stack trace in innodb is a part which was unresolved
            so it looks like innodb crash

            sanja Oleksandr Byelkin added a comment - Ah, that stack trace in innodb is a part which was unresolved so it looks like innodb crash
            sanja Oleksandr Byelkin made changes -
            Comment [ A comment with security level 'Developers' was removed. ]
            sanja Oleksandr Byelkin made changes -
            Comment [ I see here at least 2 different traces...
            1) in my_addr_resolve
            2) lock_rec_has_to_wait_in_queue in innodb
            is it correct? how thay are related? ]
            sanja Oleksandr Byelkin made changes -
            Assignee Oleksandr Byelkin [ sanja ] Marko Mäkelä [ marko ]

            update processing order by very strange, what they do at startup?

            sanja Oleksandr Byelkin added a comment - update processing order by very strange, what they do at startup?

            It seems that lock is an invalid pointer:

                            for (lock = first_lock; lock != NULL;
                                     lock = lock_rec_get_next(heap_no, lock)) {
                                    if (lock_get_wait(lock)
                                            && !lock_rec_has_to_wait_in_queue(lock)) {
            

            The signal is sent in lock_rec_has_to_wait_in_queue(lock), which appears to be inlined in the stack trace:

            	for (lock = lock_rec_get_first_on_page_addr(hash, space, page_no);
            	     lock != wait_lock;
            	     lock = lock_rec_get_next_on_page_const(lock)) {
             
            		const byte*	p = (const byte*) &lock[1];
             
            		if (heap_no < lock_rec_get_n_bits(lock)
            

            We would fail to stop the iteration if we never find wait_lock. Invoking lock_rec_get_n_bits(lock=NULL) would certainly cause the SIGSEGV.

            rpizzi, Does the crash involve Galera? Or innodb_lock_schedule_algorithm=VATS? The root cause of this failure should be an invalid call to unlock a row. It might also involve some complex join. Unlocking rows ("semi-consistent reads") was originally introduced in MySQL Bug #3300.

            marko Marko Mäkelä added a comment - It seems that lock is an invalid pointer: for (lock = first_lock; lock != NULL; lock = lock_rec_get_next(heap_no, lock)) { if (lock_get_wait(lock) && !lock_rec_has_to_wait_in_queue(lock)) { The signal is sent in lock_rec_has_to_wait_in_queue(lock) , which appears to be inlined in the stack trace: for (lock = lock_rec_get_first_on_page_addr(hash, space, page_no); lock != wait_lock; lock = lock_rec_get_next_on_page_const(lock)) {   const byte* p = ( const byte*) &lock[1];   if (heap_no < lock_rec_get_n_bits(lock) We would fail to stop the iteration if we never find wait_lock . Invoking lock_rec_get_n_bits(lock=NULL) would certainly cause the SIGSEGV. rpizzi , Does the crash involve Galera? Or innodb_lock_schedule_algorithm=VATS ? The root cause of this failure should be an invalid call to unlock a row. It might also involve some complex join. Unlocking rows ("semi-consistent reads") was originally introduced in MySQL Bug #3300.

            Yes this is a galera node, crash always happens on the node that has the master role marko

            rpizzi Rick Pizzi (Inactive) added a comment - Yes this is a galera node, crash always happens on the node that has the master role marko

            As far as I can tell, lock_rec_unlock() is continuously holding lock_sys->mutex, which should protect the creation and destruction of explicit record locks. The most plausible explanation for the crash is the following code in lock_rec_has_to_wait_in_queue(lock), which causes us to skip wait_lock and reach the end of the list:

            	for (lock = lock_rec_get_first_on_page_addr(hash, space, page_no);
            	     lock != wait_lock;
            	     lock = lock_rec_get_next_on_page_const(lock)) {
             
            		const byte*	p = (const byte*) &lock[1];
             
            		if (heap_no < lock_rec_get_n_bits(lock)
            		    && (p[bit_offset] & bit_mask)
            		    && lock_has_to_wait(wait_lock, lock)) {
            #ifdef WITH_WSREP
            			if (wsrep_thd_is_BF(wait_lock->trx->mysql_thd, FALSE) &&
            			    wsrep_thd_is_BF(lock->trx->mysql_thd, TRUE)) {
            				if (UNIV_UNLIKELY(wsrep_debug)) {
            					mtr_t mtr;
            					ib::info() << "WSREP: waiting BF trx: " << ib::hex(wait_lock->trx->id)
            						   << " query: " << wsrep_thd_query(wait_lock->trx->mysql_thd);
            					lock_rec_print(stderr, wait_lock, mtr);
            					ib::info() << "WSREP: do not wait another BF trx: " << ib::hex(lock->trx->id)
            						   << " query: " << wsrep_thd_query(lock->trx->mysql_thd);
            					lock_rec_print(stderr, lock, mtr);
            				}
            				/* don't wait for another BF lock */
            				continue;
            			}
            #endif /* WITH_WSREP */
             
            			return(lock);
            		}
            	}
             
            	return(NULL);
            

            rpizzi, do you see those WSREP messages in the server error log right before the crash?

            It might be that this bug would be addressed by fixing MDEV-21910, or that the above WITH_WSREP code is trying to work around MDEV-21910. In any case, I think that the code is misplaced, and the code should be moved to the relevant caller of lock_rec_has_to_wait_in_queue().

            The problematic code is already present in 10.1 (which is the first major version where the WITH_WSREP changes are part of the main server).

            marko Marko Mäkelä added a comment - As far as I can tell, lock_rec_unlock() is continuously holding lock_sys->mutex , which should protect the creation and destruction of explicit record locks. The most plausible explanation for the crash is the following code in lock_rec_has_to_wait_in_queue(lock) , which causes us to skip wait_lock and reach the end of the list: for (lock = lock_rec_get_first_on_page_addr(hash, space, page_no); lock != wait_lock; lock = lock_rec_get_next_on_page_const(lock)) {   const byte* p = ( const byte*) &lock[1];   if (heap_no < lock_rec_get_n_bits(lock) && (p[bit_offset] & bit_mask) && lock_has_to_wait(wait_lock, lock)) { #ifdef WITH_WSREP if (wsrep_thd_is_BF(wait_lock->trx->mysql_thd, FALSE) && wsrep_thd_is_BF(lock->trx->mysql_thd, TRUE)) { if (UNIV_UNLIKELY(wsrep_debug)) { mtr_t mtr; ib::info() << "WSREP: waiting BF trx: " << ib::hex(wait_lock->trx->id) << " query: " << wsrep_thd_query(wait_lock->trx->mysql_thd); lock_rec_print(stderr, wait_lock, mtr); ib::info() << "WSREP: do not wait another BF trx: " << ib::hex(lock->trx->id) << " query: " << wsrep_thd_query(lock->trx->mysql_thd); lock_rec_print(stderr, lock, mtr); } /* don't wait for another BF lock */ continue ; } #endif /* WITH_WSREP */   return (lock); } }   return (NULL); rpizzi , do you see those WSREP messages in the server error log right before the crash? It might be that this bug would be addressed by fixing MDEV-21910 , or that the above WITH_WSREP code is trying to work around MDEV-21910 . In any case, I think that the code is misplaced, and the code should be moved to the relevant caller of lock_rec_has_to_wait_in_queue() . The problematic code is already present in 10.1 (which is the first major version where the WITH_WSREP changes are part of the main server).
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Component/s Galera [ 10124 ]
            Fix Version/s 10.1 [ 16100 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.5 [ 23123 ]
            Affects Version/s 10.1 [ 16100 ]
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ jplindst ]
            marko Marko Mäkelä made changes -
            Summary server crashing repeatedly SIGSEGV in lock_rec_unlock() when Galera is enabled

            No we did not see that error but maybe it requires wsrep_debug = ON. I will try and enable it.

            rpizzi Rick Pizzi (Inactive) added a comment - No we did not see that error but maybe it requires wsrep_debug = ON. I will try and enable it.

            I believe that this code should be disabled by default. However, until MDEV-19563 was addressed, Galera was recommending setting the deprecated parameter innodb_locks_unsafe_for_binlog. The lock_rec_unlock() should only be effective if ‘semi-consistent reads’ are enabled (when that deprecated parameter is set, or the transaction isolation level is READ UNCOMMITTED or READ COMMITTED).

            marko Marko Mäkelä added a comment - I believe that this code should be disabled by default. However, until MDEV-19563 was addressed, Galera was recommending setting the deprecated parameter innodb_locks_unsafe_for_binlog . The lock_rec_unlock() should only be effective if ‘semi-consistent reads’ are enabled (when that deprecated parameter is set, or the transaction isolation level is READ UNCOMMITTED or READ COMMITTED ).

            In Galera case it should go that piece of code only if we have two transactions running on applier node making them both Brute Force (BF) as there is wsrep_thd_is_BF conditions for both locks. I'm surprised if error log does not have anything else about this situation.

            jplindst Jan Lindström (Inactive) added a comment - In Galera case it should go that piece of code only if we have two transactions running on applier node making them both Brute Force (BF) as there is wsrep_thd_is_BF conditions for both locks. I'm surprised if error log does not have anything else about this situation.

            So get surprised Nothing in logs.

            rpizzi Rick Pizzi (Inactive) added a comment - So get surprised Nothing in logs.
            nicklamb Nick (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Nick [ nicklamb ]
            rpizzi Rick Pizzi (Inactive) made changes -

            We need to run the server with the already existing option to skip backtraces. I have investigated the source code and in fact that's where it crashes the 2nd time, so using --skip-stack-trace should avoid that.

            rpizzi Rick Pizzi (Inactive) added a comment - We need to run the server with the already existing option to skip backtraces. I have investigated the source code and in fact that's where it crashes the 2nd time, so using --skip-stack-trace should avoid that.
            julien.fritsch Julien Fritsch made changes -
            Assignee Nick [ nicklamb ] Julius Goryavsky [ sysprg ]
            julien.fritsch Julien Fritsch made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            rpizzi Does the problem repeat with --wsrep-debug=ON ? That setting has some negative effect on performance but I think we need that error log with it to narrow down the search.

            jplindst Jan Lindström (Inactive) added a comment - rpizzi Does the problem repeat with --wsrep-debug=ON ? That setting has some negative effect on performance but I think we need that error log with it to narrow down the search.
            julien.fritsch Julien Fritsch made changes -
            Assignee Julius Goryavsky [ sysprg ] Jan Lindström [ jplindst ]
            julien.fritsch Julien Fritsch made changes -
            Labels need_feedback

            jplindst the crash did not happen since about 2 weeks now (without any change that we are aware of).
            We have disabled backtracing in the hope to get a core dump. wsrep_debug in galera 4 is too much verbose and can't be enabled with customers which have a busy workload like in this case, as it produces gigabytes of logs per hour. If we get a core it's not important anyway.

            Thanks

            Rick

            rpizzi Rick Pizzi (Inactive) added a comment - jplindst the crash did not happen since about 2 weeks now (without any change that we are aware of). We have disabled backtracing in the hope to get a core dump. wsrep_debug in galera 4 is too much verbose and can't be enabled with customers which have a busy workload like in this case, as it produces gigabytes of logs per hour. If we get a core it's not important anyway. Thanks Rick

            I was trying to understand why we would not want to wait another BF transaction lock in lock_rec_has_to_wait_in_queue() function. However, I could not find any reason, if something like that happens there is no option except wait for your turn. I even tested it by removing the code and running regression test suite and could not find a single regression. For some reason there is more complex conditions on e.g. MySQL 8.0 version of this code.

            jplindst Jan Lindström (Inactive) added a comment - I was trying to understand why we would not want to wait another BF transaction lock in lock_rec_has_to_wait_in_queue() function. However, I could not find any reason, if something like that happens there is no option except wait for your turn. I even tested it by removing the code and running regression test suite and could not find a single regression. For some reason there is more complex conditions on e.g. MySQL 8.0 version of this code.
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.1 [ 16100 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            Roel Roel Van de Paar added a comment - - edited

            *** buffer overflow detected ***
            

            : If it is possible to use an ASAN instrumented binary for some time (for example on a clone/binlog slave, or even on a low-use production machine) then we may be able to more easily see the bug show up the error log.

            Roel Roel Van de Paar added a comment - - edited *** buffer overflow detected *** : If it is possible to use an ASAN instrumented binary for some time (for example on a clone/binlog slave, or even on a low-use production machine) then we may be able to more easily see the bug show up the error log.
            Roel Roel Van de Paar made changes -
            Description Server is crashing few times a day with the following stacktrace:

            {code}
            /lib64/libc.so.6(+0x115b62)[0x7fcadad58b62]
            /lib64/libc.so.6(+0x117947)[0x7fcadad5a947]
            /usr/sbin/mysqld(my_addr_resolve+0xda)[0x55f2dc88214a]
            /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55f2dc86a862]
            /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55f2dc300f7f]
            /lib64/libpthread.so.0(+0xf5d0)[0x7fcadc9a65d0]
            /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908]
            /usr/sbin/mysqld(+0xa9d47e)[0x55f2dc4ee47e]
            /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626]
            /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x2110)[0x55f2dc1a5c60]
            /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2ff5)[0x55f2dc0fcd15]
            /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1fb)[0x55f2dc1028ab]
            /usr/sbin/mysqld(+0x5a0090)[0x55f2dbff1090]
            /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x28f8)[0x55f2dc1059d8]
            /usr/sbin/mysqld(_Z10do_commandP3THD+0x109)[0x55f2dc106049]
            /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1fa)[0x55f2dc1e3bfa]
            /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f2dc1e3cdd]
            /usr/sbin/mysqld(+0xdcb78d)[0x55f2dc81c78d]
            /lib64/libpthread.so.0(+0x7dd5)[0x7fcadc99edd5]
            /lib64/libc.so.6(clone+0x6d)[0x7fcadad40ead]
            {code}

            We tried to set up for core dump generation, but it appears it crashes trying to access memory and therefore no core dump is generated:

            {code}
            200703 13:10:16 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.

            To report this bug, see https://mariadb.com/kb/en/reporting-bugs

            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.

            Server version: 10.4.13-MariaDB-log
            key_buffer_size=134217728
            read_buffer_size=2097152
            max_used_connections=10223
            max_threads=20002
            thread_count=8406
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 123517134 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.

            Thread pointer: 0x7faf48086b98
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7facc4190cc0 thread_stack 0x49000
            *** buffer overflow detected ***: /usr/sbin/mysqld terminated
            ======= Backtrace: =========
            /lib64/libc.so.6(__fortify_fail+0x37)[0x7fcadad5a9e7]
            /lib64/libc.so.6(+0x115b62)[0x7fcadad58b62]
            /lib64/libc.so.6(+0x117947)[0x7fcadad5a947]
            /usr/sbin/mysqld(my_addr_resolve+0xda)[0x55f2dc88214a]
            /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55f2dc86a862]
            /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55f2dc300f7f]
            /lib64/libpthread.so.0(+0xf5d0)[0x7fcadc9a65d0]
            /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908]
            /usr/sbin/mysqld(+0xa9d47e)[0x55f2dc4ee47e]
            /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626]
            /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x2110)[0x55f2dc1a5c60]
            /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2ff5)[0x55f2dc0fcd15]
            /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1fb)[0x55f2dc1028ab]
            /usr/sbin/mysqld(+0x5a0090)[0x55f2dbff1090]
            /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x28f8)[0x55f2dc1059d8]
            /usr/sbin/mysqld(_Z10do_commandP3THD+0x109)[0x55f2dc106049]
            /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1fa)[0x55f2dc1e3bfa]
            /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f2dc1e3cdd]
            /usr/sbin/mysqld(+0xdcb78d)[0x55f2dc81c78d]
            /lib64/libpthread.so.0(+0x7dd5)[0x7fcadc99edd5]
            /lib64/libc.so.6(clone+0x6d)[0x7fcadad40ead]
            ======= Memory map: ========
            55f2dba51000-55f2dceca000 r-xp 00000000 fd:00 436210356 /usr/sbin/mysqld
            55f2dd0ca000-55f2dd1fd000 r--p 01479000 fd:00 436210356 /usr/sbin/mysqld
            55f2dd1fd000-55f2dd2b7000 rw-p 015ac000 fd:00 436210356 /usr/sbin/mysqld
            55f2dd2b7000-55f2ddb66000 rw-p 00000000 00:00 0
            55f2de4cb000-55f348284000 rw-p 00000000 00:00 0 [heap]
            7fabcc000000-7fabcdb8a000 rw-p 00000000 00:00 0

            [ ... ]

            7fcadcdd6000-7fcadcdd7000 rw-p 00000000 00:00 0
            7fff76cfc000-7fff76d1d000 rw-p 00000000 00:00 0 [stack]
            7fff76d84000-7fff76d86000 r-xp 00000000 00:00 0 [vdso]
            ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
            Fatal signal 6 while backtracing

            {code}
            Server is crashing few times a day with the following stacktrace:

            {code}
            /lib64/libc.so.6(+0x115b62)[0x7fcadad58b62]
            /lib64/libc.so.6(+0x117947)[0x7fcadad5a947]
            /usr/sbin/mysqld(my_addr_resolve+0xda)[0x55f2dc88214a]
            /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55f2dc86a862]
            /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55f2dc300f7f]
            /lib64/libpthread.so.0(+0xf5d0)[0x7fcadc9a65d0]
            /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908]
            /usr/sbin/mysqld(+0xa9d47e)[0x55f2dc4ee47e]
            /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626]
            /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x2110)[0x55f2dc1a5c60]
            /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2ff5)[0x55f2dc0fcd15]
            /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1fb)[0x55f2dc1028ab]
            /usr/sbin/mysqld(+0x5a0090)[0x55f2dbff1090]
            /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x28f8)[0x55f2dc1059d8]
            /usr/sbin/mysqld(_Z10do_commandP3THD+0x109)[0x55f2dc106049]
            /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1fa)[0x55f2dc1e3bfa]
            /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f2dc1e3cdd]
            /usr/sbin/mysqld(+0xdcb78d)[0x55f2dc81c78d]
            /lib64/libpthread.so.0(+0x7dd5)[0x7fcadc99edd5]
            /lib64/libc.so.6(clone+0x6d)[0x7fcadad40ead]
            {code}

            {noformat:title=3 Top frames, resolved}
            /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908] lock_rec_has_to_wait_in_queue(ib_lock_t const*) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/lock0priv.ic:79).
             /usr/sbin/mysqld(0xa9d47e)[0x55f2dc4ee47e] lock_rec_unlock(trx_t*, buf_block_t const*, unsigned char const*, lock_mode) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4214).
            /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626] row_unlock_for_mysql(row_prebuilt_t*, unsigned long) (/usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2093).
            {noformat}

            We tried to set up for core dump generation, but it appears it crashes trying to access memory and therefore no core dump is generated:

            {code}
            200703 13:10:16 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.

            To report this bug, see https://mariadb.com/kb/en/reporting-bugs

            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.

            Server version: 10.4.13-MariaDB-log
            key_buffer_size=134217728
            read_buffer_size=2097152
            max_used_connections=10223
            max_threads=20002
            thread_count=8406
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 123517134 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.

            Thread pointer: 0x7faf48086b98
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7facc4190cc0 thread_stack 0x49000
            *** buffer overflow detected ***: /usr/sbin/mysqld terminated
            ======= Backtrace: =========
            /lib64/libc.so.6(__fortify_fail+0x37)[0x7fcadad5a9e7]
            /lib64/libc.so.6(+0x115b62)[0x7fcadad58b62]
            /lib64/libc.so.6(+0x117947)[0x7fcadad5a947]
            /usr/sbin/mysqld(my_addr_resolve+0xda)[0x55f2dc88214a]
            /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55f2dc86a862]
            /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55f2dc300f7f]
            /lib64/libpthread.so.0(+0xf5d0)[0x7fcadc9a65d0]
            /usr/sbin/mysqld(+0xa9a908)[0x55f2dc4eb908]
            /usr/sbin/mysqld(+0xa9d47e)[0x55f2dc4ee47e]
            /usr/sbin/mysqld(+0xb22626)[0x55f2dc573626]
            /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0x2110)[0x55f2dc1a5c60]
            /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2ff5)[0x55f2dc0fcd15]
            /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1fb)[0x55f2dc1028ab]
            /usr/sbin/mysqld(+0x5a0090)[0x55f2dbff1090]
            /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x28f8)[0x55f2dc1059d8]
            /usr/sbin/mysqld(_Z10do_commandP3THD+0x109)[0x55f2dc106049]
            /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1fa)[0x55f2dc1e3bfa]
            /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f2dc1e3cdd]
            /usr/sbin/mysqld(+0xdcb78d)[0x55f2dc81c78d]
            /lib64/libpthread.so.0(+0x7dd5)[0x7fcadc99edd5]
            /lib64/libc.so.6(clone+0x6d)[0x7fcadad40ead]
            ======= Memory map: ========
            55f2dba51000-55f2dceca000 r-xp 00000000 fd:00 436210356 /usr/sbin/mysqld
            55f2dd0ca000-55f2dd1fd000 r--p 01479000 fd:00 436210356 /usr/sbin/mysqld
            55f2dd1fd000-55f2dd2b7000 rw-p 015ac000 fd:00 436210356 /usr/sbin/mysqld
            55f2dd2b7000-55f2ddb66000 rw-p 00000000 00:00 0
            55f2de4cb000-55f348284000 rw-p 00000000 00:00 0 [heap]
            7fabcc000000-7fabcdb8a000 rw-p 00000000 00:00 0

            [ ... ]

            7fcadcdd6000-7fcadcdd7000 rw-p 00000000 00:00 0
            7fff76cfc000-7fff76d1d000 rw-p 00000000 00:00 0 [stack]
            7fff76d84000-7fff76d86000 r-xp 00000000 00:00 0 [vdso]
            ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
            Fatal signal 6 while backtracing

            {code}
            rpizzi Rick Pizzi (Inactive) added a comment - - edited

            Roelthis is what we (RDBA Team) have been asking since the beginning.
            We can certainly run a special binary as long as it is production quality, the only issue is the crash is not happening anymore since 2 weeks (it was happening almost daily before). We did not change anything and neither did the customer.

            Rick

            rpizzi Rick Pizzi (Inactive) added a comment - - edited Roel this is what we (RDBA Team) have been asking since the beginning. We can certainly run a special binary as long as it is production quality, the only issue is the crash is not happening anymore since 2 weeks (it was happening almost daily before). We did not change anything and neither did the customer. Rick
            serg Sergei Golubchik made changes -
            Priority Blocker [ 1 ] Major [ 3 ]
            valerii Valerii Kravchuk made changes -
            Labels need_feedback
            valerii Valerii Kravchuk made changes -
            Priority Major [ 3 ] Critical [ 2 ]

            This confirms that we could end up referencing NULL pointer on lock_rec_get_n_bits because of bug on lock_rec_has_to_wait_in_queue that is fixed a8ec45863b958757da61af3b2ce0a38b0a79d92c for 10.2 and it should be merged to other releases.

            jplindst Jan Lindström (Inactive) added a comment - This confirms that we could end up referencing NULL pointer on lock_rec_get_n_bits because of bug on lock_rec_has_to_wait_in_queue that is fixed a8ec45863b958757da61af3b2ce0a38b0a79d92c for 10.2 and it should be merged to other releases.
            marko Marko Mäkelä made changes -

            valerii posted information that confirms that the READ COMMITTED isolation level was being used: trx->isolation_level=1=TRX_ISO_READ_COMMITTED. It is the designed way of enabling the semi-consistent read for UPDATE operations. The idea of MySQL Bug #3300 is to unlock non-matching rows in a table scan.

            Setting the isolation level to the default REPEATABLE READ should avoid hitting this Galera bug.

            jplindst must debug this in order to create a repeatable test case. It is not enough to prevent a crash when Galera is enabled; we must also ensure that all locks are being replicated correctly across the cluster.

            marko Marko Mäkelä added a comment - valerii posted information that confirms that the READ COMMITTED isolation level was being used: trx->isolation_level=1=TRX_ISO_READ_COMMITTED . It is the designed way of enabling the semi-consistent read for UPDATE operations. The idea of MySQL Bug #3300 is to unlock non-matching rows in a table scan. Setting the isolation level to the default REPEATABLE READ should avoid hitting this Galera bug. jplindst must debug this in order to create a repeatable test case. It is not enough to prevent a crash when Galera is enabled; we must also ensure that all locks are being replicated correctly across the cluster.
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels need_feedback

            I checked all assignments to trx_t::isolation_level in MariaDB 10.4.13, and it looks like the only way for InnoDB to use READ COMMITTED is when thd_tx_isolation() returns ISO_READ_COMMITTED. Outside the SQL parser, the only assignments that refer to that constant are related to Galera:

            git grep -w ISO_READ_COMMITTED mariadb-10.4.13

            mariadb-10.4.13:sql/wsrep_high_priority_service.cc:  thd->variables.tx_isolation= ISO_READ_COMMITTED;
            mariadb-10.4.13:sql/wsrep_high_priority_service.cc:  thd->tx_isolation          = ISO_READ_COMMITTED;
            mariadb-10.4.13:sql/wsrep_schema.cc:  thd->variables.tx_isolation= ISO_READ_COMMITTED;
            mariadb-10.4.13:sql/wsrep_sst.cc:    thd->variables.tx_isolation= ISO_READ_COMMITTED;
            mariadb-10.4.13:sql/wsrep_storage_service.cc:  thd->variables.tx_isolation = ISO_READ_COMMITTED;
            

            Based on this finding, it might be that this bug cannot be worked around.

            marko Marko Mäkelä added a comment - I checked all assignments to trx_t::isolation_level in MariaDB 10.4.13, and it looks like the only way for InnoDB to use READ COMMITTED is when thd_tx_isolation() returns ISO_READ_COMMITTED . Outside the SQL parser, the only assignments that refer to that constant are related to Galera: git grep -w ISO_READ_COMMITTED mariadb-10.4.13 mariadb-10.4.13:sql/wsrep_high_priority_service.cc: thd->variables.tx_isolation= ISO_READ_COMMITTED; mariadb-10.4.13:sql/wsrep_high_priority_service.cc: thd->tx_isolation = ISO_READ_COMMITTED; mariadb-10.4.13:sql/wsrep_schema.cc: thd->variables.tx_isolation= ISO_READ_COMMITTED; mariadb-10.4.13:sql/wsrep_sst.cc: thd->variables.tx_isolation= ISO_READ_COMMITTED; mariadb-10.4.13:sql/wsrep_storage_service.cc: thd->variables.tx_isolation = ISO_READ_COMMITTED; Based on this finding, it might be that this bug cannot be worked around.

            Analysis:

            • wsrep_high_priority_service : Used only on applier threads (i.e. brute force)
            • wsrep_schema : Used only when node starts from empty database
            • wsrep_sst : Used on joiner during SST only
            • wsrep_storage_service : Used for streaming replication

            There is no indication that these are involved on this case.

            jplindst Jan Lindström (Inactive) added a comment - Analysis: wsrep_high_priority_service : Used only on applier threads (i.e. brute force) wsrep_schema : Used only when node starts from empty database wsrep_sst : Used on joiner during SST only wsrep_storage_service : Used for streaming replication There is no indication that these are involved on this case.
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) added a comment - - edited

            I made analysis from full stack dump and found out following points:

            Thread 9222 (Thread 0x7f694c3bf700 (LWP 54589)):
            #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
            #1  0x000055dee6f5c561 in wait (this=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:158
            #2  wait_low (reset_sig_count=<optimized out>, this=0x55df51c9c5d0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:325
            #3  os_event_wait_low (event=0x55df51c9c5d0, reset_sig_count=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:502
            #4  0x000055dee6feddb8 in sync_array_wait_event (arr=0x55deea37b190, cell=@0x7f694c3ba7e0: 0x7f858c6db810) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/sync/sync0arr.cc:471
            #5  0x000055dee6f1525f in TTASEventMutex<GenericPolicy>::enter (this=this@entry=0x55dee7d30280 <lock_sys+64>, max_spins=60, max_spins@entry=30, max_delay=max_delay@entry=4, filename=filename@entry=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", line=line@entry=1937) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:477
            #6  0x000055dee6f2750b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (n_spins=n_spins@entry=30, n_delay=4, line=line@entry=1937, name=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", this=0x55dee7d30280 <lock_sys+64>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:594
            #7  0x000055dee6f2e051 in lock_rec_lock (impl=impl@entry=false, mode=1027, block=0x7f7fa2f61118, heap_no=55, index=index@entry=0x7f6f5009db78, thr=thr@entry=0x7f6b88779e48) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:1937
            #8  0x000055dee6f2ebc9 in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=<optimized out>, rec=rec@entry=0x7f7fa47e5e4a <Address 0x7f7fa47e5e4a out of bounds>, index=index@entry=0x7f6f5009db78, offsets=<optimized out>, mode=<optimized out>, gap_mode=1024, thr=thr@entry=0x7f6b88779e48) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:5829
            #9  0x000055dee6a3d5a0 in sel_set_rec_lock (pcur=<optimized out>, rec=0x7f7fa47e5e4a <Address 0x7f7fa47e5e4a out of bounds>, index=0x7f6f5009db78, offsets=<optimized out>, mode=<optimized out>, type=<optimized out>, thr=0x7f6b88779e48, mtr=0x7f694c3bb870) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0sel.cc:1264
            #10 0x000055dee6fc99c9 in row_search_mvcc (buf=<optimized out>, mode=PAGE_CUR_G, prebuilt=0x7f6b88779628, match_mode=<optimized out>, direction=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0sel.cc:5063
            #11 0x000055dee6ee91d7 in ha_innobase::general_fetch (this=0x7f6b883c3500, buf=0x7f6b886e9ea0 "\365)\216\005", direction=1, match_mode=0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/handler/ha_innodb.cc:9544
            #12 0x000055dee6d3f411 in handler::ha_rnd_next (this=0x7f6b883c3500, buf=0x7f6b886e9ea0 "\365)\216\005") at /usr/src/debug/MariaDB-10.4.13/src_0/sql/handler.cc:2839
            #13 0x000055dee6e6da6c in rr_sequential (info=0x7f694c3bc020) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/records.cc:477
            #14 0x000055dee6bdea68 in read_record (this=0x7f694c3bc020) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/records.h:69
            #15 mysql_update (thd=thd@entry=0x7f6ef8305878, table_list=0x7f6ef8313f00, fields=..., values=..., conds=0x7f6ef8314d50, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f694c3bc560, updated_return=updated_return@entry=0x7f694c3bc620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:975
            #16 0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x7f6ef8305878) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376
            #17 0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x7f6ef8305878, rawbuf=rawbuf@entry=0x7f6ef8313ba0 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f694c3be350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900
            

            This is local UPDATE waiting lock_sys->mutex in lock_rec_lock().

            Thread 6699 (Thread 0x7f696e127700 (LWP 715)):
            #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
            #1  0x000055dee6f5c561 in wait (this=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:158
            #2  wait_low (reset_sig_count=<optimized out>, this=0x55df51c9c5d0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:325
            #3  os_event_wait_low (event=0x55df51c9c5d0, reset_sig_count=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:502
            #4  0x000055dee6feddb8 in sync_array_wait_event (arr=0x55deea37b190, cell=@0x7f696e123460: 0x7f858c6db190) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/sync/sync0arr.cc:471
            #5  0x000055dee6f1525f in TTASEventMutex<GenericPolicy>::enter (this=this@entry=0x55dee7d30280 <lock_sys+64>, max_spins=60, max_spins@entry=30, max_delay=max_delay@entry=4, filename=filename@entry=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", line=line@entry=4170) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:477
            #6  0x000055dee6f2750b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (n_spins=30, n_delay=4, line=line@entry=4170, name=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", this=0x55dee7d30280 <lock_sys+64>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:594
            #7  0x000055dee6f283a4 in lock_rec_unlock (trx=trx@entry=0x7f6bd00c9148, block=0x7f7fa2f5c0f0, rec=<optimized out>, lock_mode=LOCK_X) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4170
            #8  0x000055dee6fad626 in row_unlock_for_mysql (prebuilt=0x7f6f882010b8, has_latches_on_recs=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2091
            #9  0x000055dee6bdfc60 in mysql_update (thd=thd@entry=0x7f704c285228, table_list=0x7f704c27b190, fields=..., values=..., conds=0x7f704c27bfe0, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f696e124560, updated_return=updated_return@entry=0x7f696e124620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:1168
            #10 0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x7f704c285228) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376
            #11 0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x7f704c285228, rawbuf=rawbuf@entry=0x7f704c27ae30 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f696e126350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900
            

            Here local UPDATE waiting lock_sys->mutex on lock_rec_unlock().

            Thread 3484 (Thread 0x7f6968817700 (LWP 54282)):
            #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
            #1  0x000055dee6f5c561 in wait (this=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:158
            #2  wait_low (reset_sig_count=<optimized out>, this=0x55df51c9c5d0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:325
            #3  os_event_wait_low (event=0x55df51c9c5d0, reset_sig_count=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:502
            #4  0x000055dee6feddb8 in sync_array_wait_event (arr=0x55deea37b190, cell=@0x7f6968813460: 0x7f858c6db410) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/sync/sync0arr.cc:471
            #5  0x000055dee6f1525f in TTASEventMutex<GenericPolicy>::enter (this=this@entry=0x55dee7d30280 <lock_sys+64>, max_spins=60, max_spins@entry=30, max_delay=max_delay@entry=4, filename=filename@entry=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", line=line@entry=4170) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:477
            #6  0x000055dee6f2750b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (n_spins=30, n_delay=4, line=line@entry=4170, name=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", this=0x55dee7d30280 <lock_sys+64>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:594
            #7  0x000055dee6f283a4 in lock_rec_unlock (trx=trx@entry=0x7f6ea427bdd8, block=0x7f7fa2f5d608, rec=<optimized out>, lock_mode=LOCK_X) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4170
            #8  0x000055dee6fad626 in row_unlock_for_mysql (prebuilt=0x7f6ab8578ec8, has_latches_on_recs=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2091
            #9  0x000055dee6bdfc60 in mysql_update (thd=thd@entry=0x55df52f78a68, table_list=0x55df52f87930, fields=..., values=..., conds=0x55df52f88780, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f6968814560, updated_return=updated_return@entry=0x7f6968814620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:1168
            #10 0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x55df52f78a68) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376
            #11 0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x55df52f78a68, rawbuf=rawbuf@entry=0x55df52f875d0 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f6968816350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900
            

            This is also local UPDATE waiting for lock_sys->mutex in lock_rec_unlock(). Finally, the crashing thread:

            Thread 1 (Thread 0x7f6a24847700 (LWP 423789)):
            #0  0x00007f884411e9d1 in __pthread_kill (threadid=<optimized out>, signo=11) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:59
            #1  0x000055dee6d3affe in handle_fatal_signal (sig=11) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/signal_handler.cc:343
            #2  <signal handler called>
            #3  lock_rec_get_n_bits (lock=0x0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/lock0priv.ic:79
            #4  lock_rec_has_to_wait_in_queue (wait_lock=wait_lock@entry=0x7f80d12e3210) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:2050
            #5  0x000055dee6f2847e in lock_rec_unlock (trx=trx@entry=0x7f858c6a69b8, block=<optimized out>, rec=<optimized out>, lock_mode=LOCK_X) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4214
            #6  0x000055dee6fad626 in row_unlock_for_mysql (prebuilt=0x7f6f50094958, has_latches_on_recs=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2091
            #7  0x000055dee6bdfc60 in mysql_update (thd=thd@entry=0x7f6d8c0921d8, table_list=0x7f6d8c0a17c0, fields=..., values=..., conds=0x7f6d8c0a2610, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f6a24844560, updated_return=updated_return@entry=0x7f6a24844620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:1168
            #8  0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x7f6d8c0921d8) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376
            #9  0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x7f6d8c0921d8, rawbuf=rawbuf@entry=0x7f6d8c0a1460 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f6a24846350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900
            

            Finds NULL pointer on lock variable. Analysis:

            • Customer is using single-master Galera topology. There is no indication of any applier threads running during crash.
            • Transaction isolation level for crashing trx is READ COMMITTED
            • There is no indication of Galera replay
            • As this is single-master condition in lock_rec_has_to_wait_in_queue() i.e.

              			if (wsrep_thd_is_BF(wait_lock->trx->mysql_thd, FALSE) &&
              			    wsrep_thd_is_BF(lock->trx->mysql_thd, TRUE)) {
              

              is always false.

            • As this is master node this is not lock replication problem
            • At this point I do not see any evidence that this is Galera bug
            • Listed threads are only one's that are interesting at the crash point
            jplindst Jan Lindström (Inactive) added a comment - - edited I made analysis from full stack dump and found out following points: Thread 9222 (Thread 0x7f694c3bf700 (LWP 54589)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x000055dee6f5c561 in wait (this=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:158 #2 wait_low (reset_sig_count=<optimized out>, this=0x55df51c9c5d0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:325 #3 os_event_wait_low (event=0x55df51c9c5d0, reset_sig_count=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:502 #4 0x000055dee6feddb8 in sync_array_wait_event (arr=0x55deea37b190, cell=@0x7f694c3ba7e0: 0x7f858c6db810) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/sync/sync0arr.cc:471 #5 0x000055dee6f1525f in TTASEventMutex<GenericPolicy>::enter (this=this@entry=0x55dee7d30280 <lock_sys+64>, max_spins=60, max_spins@entry=30, max_delay=max_delay@entry=4, filename=filename@entry=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", line=line@entry=1937) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:477 #6 0x000055dee6f2750b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (n_spins=n_spins@entry=30, n_delay=4, line=line@entry=1937, name=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", this=0x55dee7d30280 <lock_sys+64>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:594 #7 0x000055dee6f2e051 in lock_rec_lock (impl=impl@entry=false, mode=1027, block=0x7f7fa2f61118, heap_no=55, index=index@entry=0x7f6f5009db78, thr=thr@entry=0x7f6b88779e48) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:1937 #8 0x000055dee6f2ebc9 in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=<optimized out>, rec=rec@entry=0x7f7fa47e5e4a <Address 0x7f7fa47e5e4a out of bounds>, index=index@entry=0x7f6f5009db78, offsets=<optimized out>, mode=<optimized out>, gap_mode=1024, thr=thr@entry=0x7f6b88779e48) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:5829 #9 0x000055dee6a3d5a0 in sel_set_rec_lock (pcur=<optimized out>, rec=0x7f7fa47e5e4a <Address 0x7f7fa47e5e4a out of bounds>, index=0x7f6f5009db78, offsets=<optimized out>, mode=<optimized out>, type=<optimized out>, thr=0x7f6b88779e48, mtr=0x7f694c3bb870) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0sel.cc:1264 #10 0x000055dee6fc99c9 in row_search_mvcc (buf=<optimized out>, mode=PAGE_CUR_G, prebuilt=0x7f6b88779628, match_mode=<optimized out>, direction=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0sel.cc:5063 #11 0x000055dee6ee91d7 in ha_innobase::general_fetch (this=0x7f6b883c3500, buf=0x7f6b886e9ea0 "\365)\216\005", direction=1, match_mode=0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/handler/ha_innodb.cc:9544 #12 0x000055dee6d3f411 in handler::ha_rnd_next (this=0x7f6b883c3500, buf=0x7f6b886e9ea0 "\365)\216\005") at /usr/src/debug/MariaDB-10.4.13/src_0/sql/handler.cc:2839 #13 0x000055dee6e6da6c in rr_sequential (info=0x7f694c3bc020) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/records.cc:477 #14 0x000055dee6bdea68 in read_record (this=0x7f694c3bc020) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/records.h:69 #15 mysql_update (thd=thd@entry=0x7f6ef8305878, table_list=0x7f6ef8313f00, fields=..., values=..., conds=0x7f6ef8314d50, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f694c3bc560, updated_return=updated_return@entry=0x7f694c3bc620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:975 #16 0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x7f6ef8305878) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376 #17 0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x7f6ef8305878, rawbuf=rawbuf@entry=0x7f6ef8313ba0 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f694c3be350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900 This is local UPDATE waiting lock_sys->mutex in lock_rec_lock(). Thread 6699 (Thread 0x7f696e127700 (LWP 715)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x000055dee6f5c561 in wait (this=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:158 #2 wait_low (reset_sig_count=<optimized out>, this=0x55df51c9c5d0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:325 #3 os_event_wait_low (event=0x55df51c9c5d0, reset_sig_count=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:502 #4 0x000055dee6feddb8 in sync_array_wait_event (arr=0x55deea37b190, cell=@0x7f696e123460: 0x7f858c6db190) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/sync/sync0arr.cc:471 #5 0x000055dee6f1525f in TTASEventMutex<GenericPolicy>::enter (this=this@entry=0x55dee7d30280 <lock_sys+64>, max_spins=60, max_spins@entry=30, max_delay=max_delay@entry=4, filename=filename@entry=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", line=line@entry=4170) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:477 #6 0x000055dee6f2750b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (n_spins=30, n_delay=4, line=line@entry=4170, name=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", this=0x55dee7d30280 <lock_sys+64>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:594 #7 0x000055dee6f283a4 in lock_rec_unlock (trx=trx@entry=0x7f6bd00c9148, block=0x7f7fa2f5c0f0, rec=<optimized out>, lock_mode=LOCK_X) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4170 #8 0x000055dee6fad626 in row_unlock_for_mysql (prebuilt=0x7f6f882010b8, has_latches_on_recs=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2091 #9 0x000055dee6bdfc60 in mysql_update (thd=thd@entry=0x7f704c285228, table_list=0x7f704c27b190, fields=..., values=..., conds=0x7f704c27bfe0, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f696e124560, updated_return=updated_return@entry=0x7f696e124620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:1168 #10 0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x7f704c285228) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376 #11 0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x7f704c285228, rawbuf=rawbuf@entry=0x7f704c27ae30 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f696e126350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900 Here local UPDATE waiting lock_sys->mutex on lock_rec_unlock(). Thread 3484 (Thread 0x7f6968817700 (LWP 54282)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x000055dee6f5c561 in wait (this=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:158 #2 wait_low (reset_sig_count=<optimized out>, this=0x55df51c9c5d0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:325 #3 os_event_wait_low (event=0x55df51c9c5d0, reset_sig_count=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/os/os0event.cc:502 #4 0x000055dee6feddb8 in sync_array_wait_event (arr=0x55deea37b190, cell=@0x7f6968813460: 0x7f858c6db410) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/sync/sync0arr.cc:471 #5 0x000055dee6f1525f in TTASEventMutex<GenericPolicy>::enter (this=this@entry=0x55dee7d30280 <lock_sys+64>, max_spins=60, max_spins@entry=30, max_delay=max_delay@entry=4, filename=filename@entry=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", line=line@entry=4170) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:477 #6 0x000055dee6f2750b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (n_spins=30, n_delay=4, line=line@entry=4170, name=0x55dee74aa480 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc", this=0x55dee7d30280 <lock_sys+64>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/ib0mutex.h:594 #7 0x000055dee6f283a4 in lock_rec_unlock (trx=trx@entry=0x7f6ea427bdd8, block=0x7f7fa2f5d608, rec=<optimized out>, lock_mode=LOCK_X) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4170 #8 0x000055dee6fad626 in row_unlock_for_mysql (prebuilt=0x7f6ab8578ec8, has_latches_on_recs=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2091 #9 0x000055dee6bdfc60 in mysql_update (thd=thd@entry=0x55df52f78a68, table_list=0x55df52f87930, fields=..., values=..., conds=0x55df52f88780, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f6968814560, updated_return=updated_return@entry=0x7f6968814620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:1168 #10 0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x55df52f78a68) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376 #11 0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x55df52f78a68, rawbuf=rawbuf@entry=0x55df52f875d0 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f6968816350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900 This is also local UPDATE waiting for lock_sys->mutex in lock_rec_unlock(). Finally, the crashing thread: Thread 1 (Thread 0x7f6a24847700 (LWP 423789)): #0 0x00007f884411e9d1 in __pthread_kill (threadid=<optimized out>, signo=11) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:59 #1 0x000055dee6d3affe in handle_fatal_signal (sig=11) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/signal_handler.cc:343 #2 <signal handler called> #3 lock_rec_get_n_bits (lock=0x0) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/include/lock0priv.ic:79 #4 lock_rec_has_to_wait_in_queue (wait_lock=wait_lock@entry=0x7f80d12e3210) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:2050 #5 0x000055dee6f2847e in lock_rec_unlock (trx=trx@entry=0x7f858c6a69b8, block=<optimized out>, rec=<optimized out>, lock_mode=LOCK_X) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/lock/lock0lock.cc:4214 #6 0x000055dee6fad626 in row_unlock_for_mysql (prebuilt=0x7f6f50094958, has_latches_on_recs=<optimized out>) at /usr/src/debug/MariaDB-10.4.13/src_0/storage/innobase/row/row0mysql.cc:2091 #7 0x000055dee6bdfc60 in mysql_update (thd=thd@entry=0x7f6d8c0921d8, table_list=0x7f6d8c0a17c0, fields=..., values=..., conds=0x7f6d8c0a2610, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false, found_return=found_return@entry=0x7f6a24844560, updated_return=updated_return@entry=0x7f6a24844620) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_update.cc:1168 #8 0x000055dee6b36d15 in mysql_execute_command (thd=thd@entry=0x7f6d8c0921d8) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:4376 #9 0x000055dee6b3c8ab in mysql_parse (thd=thd@entry=0x7f6d8c0921d8, rawbuf=rawbuf@entry=0x7f6d8c0a1460 "UPDATE ..."..., length=length@entry=317, parser_state=parser_state@entry=0x7f6a24846350, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /usr/src/debug/MariaDB-10.4.13/src_0/sql/sql_parse.cc:7900 Finds NULL pointer on lock variable. Analysis: Customer is using single-master Galera topology. There is no indication of any applier threads running during crash. Transaction isolation level for crashing trx is READ COMMITTED There is no indication of Galera replay As this is single-master condition in lock_rec_has_to_wait_in_queue() i.e. if (wsrep_thd_is_BF(wait_lock->trx->mysql_thd, FALSE) && wsrep_thd_is_BF(lock->trx->mysql_thd, TRUE)) { is always false. As this is master node this is not lock replication problem At this point I do not see any evidence that this is Galera bug Listed threads are only one's that are interesting at the crash point
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Marko Mäkelä [ marko ]
            jplindst Jan Lindström (Inactive) made changes -
            Attachment thd.txt [ 53215 ]
            jplindst Jan Lindström (Inactive) made changes -
            Attachment trx.txt [ 53216 ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels need_feedback
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ jplindst ]

            https://github.com/MariaDB/server/commit/fb2fc051004d11b4aeea81683596f505b58cab23

            This is only for requested condition move with additional instrumentation when something goes wrong. I have failed to reproduce the actual case using RQG, mtr and sysbench. At the moment I do not have good ideas how to continue. One possibility is to give special build to customer and see if we still hit same bug. At least in that case we have additional information on error log to analyze.

            jplindst Jan Lindström (Inactive) added a comment - https://github.com/MariaDB/server/commit/fb2fc051004d11b4aeea81683596f505b58cab23 This is only for requested condition move with additional instrumentation when something goes wrong. I have failed to reproduce the actual case using RQG, mtr and sysbench. At the moment I do not have good ideas how to continue. One possibility is to give special build to customer and see if we still hit same bug. At least in that case we have additional information on error log to analyze.
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Marko Mäkelä [ marko ]

            Some code cleanup in this area is welcome, but I think that more effort is needed to reproduce this error.

            Please revise the patch and request a further review.

            When it comes to the suspicion that the loop would fail to terminate without Galera involvement, I think that it is extremely unlikely. The function lock_rec_unlock() would acquire lock_sys.mutex and trx_t::mutex, find a matching lock, and proceed to release the lock while holding those mutexes. The lock should not possibly disappear while we are holding those mutexes. Besides, locks should only be released by the thread that is executing the transaction, not by other threads. Other threads may only convert implicit locks into explicit ones, on behalf of the lock-holder transaction. That activity is blocked by both said mutexes.

            marko Marko Mäkelä added a comment - Some code cleanup in this area is welcome, but I think that more effort is needed to reproduce this error. Please revise the patch and request a further review. When it comes to the suspicion that the loop would fail to terminate without Galera involvement, I think that it is extremely unlikely. The function lock_rec_unlock() would acquire lock_sys.mutex and trx_t::mutex , find a matching lock, and proceed to release the lock while holding those mutexes. The lock should not possibly disappear while we are holding those mutexes. Besides, locks should only be released by the thread that is executing the transaction, not by other threads. Other threads may only convert implicit locks into explicit ones, on behalf of the lock-holder transaction. That activity is blocked by both said mutexes.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ jplindst ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            https://github.com/MariaDB/server/commit/c2c4a77004e38bcdb735c330b5b9529c625b372d

            I have run lot of randgen runs on both release build and debug build, but still failed to reproduce this problem (some others I have found and created MDEVs for them).

            jplindst Jan Lindström (Inactive) added a comment - https://github.com/MariaDB/server/commit/c2c4a77004e38bcdb735c330b5b9529c625b372d I have run lot of randgen runs on both release build and debug build, but still failed to reproduce this problem (some others I have found and created MDEVs for them).
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            This looks mostly OK now. Please revert some unnecessary changes (in particular, the change that would have made lock_trx_handle_wait() dead code), and request one more review.

            marko Marko Mäkelä added a comment - This looks mostly OK now. Please revert some unnecessary changes (in particular, the change that would have made lock_trx_handle_wait() dead code), and request one more review.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ jplindst ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) added a comment - - edited 10.2: https://github.com/MariaDB/server/commit/6830ebe08139fda352573b2f224f3bcec280494f 10.3: https://github.com/MariaDB/server/commit/45093c0da83ecad05414d3092cb637df30361506 10.4: https://github.com/MariaDB/server/commit/2713dacd5d69654536b2a2c03eec693c799f025b 10.5: https://github.com/MariaDB/server/commit/d48ba030015b07d04ba2c78985be2a8208933131

            jplindst the customer is running the debug build since last week. We will try to trigger the crash again. Usually it appears a few hours later some DDL has been run.

            rpizzi Rick Pizzi (Inactive) added a comment - jplindst the customer is running the debug build since last week. We will try to trigger the crash again. Usually it appears a few hours later some DDL has been run.

            There is bigger difference between 10.3 and 10.4 (Galera 4), but all that is on server code.

            jplindst Jan Lindström (Inactive) added a comment - There is bigger difference between 10.3 and 10.4 (Galera 4), but all that is on server code.
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            I posted a few minor comments to the 10.2 version. OK push to 10.2 after addressing those. I will use the other versions for reference when merging to later branches.

            marko Marko Mäkelä added a comment - I posted a few minor comments to the 10.2 version. OK push to 10.2 after addressing those. I will use the other versions for reference when merging to later branches.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ jplindst ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            issue.field.resolutiondate 2020-09-10 10:19:56.0 2020-09-10 10:19:56.562
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.2.34 [ 24505 ]
            Fix Version/s 10.3.25 [ 24506 ]
            Fix Version/s 10.4.15 [ 24507 ]
            Fix Version/s 10.5.6 [ 24508 ]
            Fix Version/s 10.6.0 [ 24431 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5.7 [ 25019 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5.6 [ 24508 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.16 [ 25020 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.15 [ 24507 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.26 [ 25021 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.25 [ 24506 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.2.35 [ 25022 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.2.34 [ 24505 ]
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 110893 ] MariaDB v4 [ 158061 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 177484 152496

            People

              jplindst Jan Lindström (Inactive)
              rpizzi Rick Pizzi (Inactive)
              Votes:
              5 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.