MariaDB Server / MDEV-29575

Access to innodb_trx, innodb_locks and innodb_lock_waits along with detached XA's can cause SIGSEGV

Details

    Description

      trx_disconnect_prepared() resets trx->mysql_thd to NULL. If that happens between the thd_get_thread_id() and thd_query_safe() calls in fill_trx_row(), fill_trx_row() can dereference a null pointer and crash with SIGSEGV.

      The bug can be reproduced by adding the following two debug sync points:

      diff --git a/storage/innobase/trx/trx0i_s.cc b/storage/innobase/trx/trx0i_s.cc
      index 2dc39118d3d..c2cc8c970b0 100644
      --- a/storage/innobase/trx/trx0i_s.cc
      +++ b/storage/innobase/trx/trx0i_s.cc
      @@ -461,6 +461,8 @@ fill_trx_row(
              row->trx_mysql_thread_id = thd_get_thread_id(trx->mysql_thd);
       
              char    query[TRX_I_S_TRX_QUERY_MAX_LEN + 1];
      +       ut_d(if (trx->state == TRX_STATE_PREPARED)
      +           DEBUG_SYNC_C("fill_trx_row_before_query_request"));
              if (size_t stmt_len = thd_query_safe(trx->mysql_thd, query,
                                                   sizeof query)) {
                      row->trx_query = static_cast<const char*>(
      diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
      index 3b19d213d5a..92bf9de375a 100644
      --- a/storage/innobase/trx/trx0trx.cc
      +++ b/storage/innobase/trx/trx0trx.cc
      @@ -550,6 +550,7 @@ void trx_disconnect_prepared(trx_t *trx)
         trx->read_view.close();
         trx->is_recovered= true;
         trx->mysql_thd= NULL;
      +  DEBUG_SYNC_C("trx_disconnect_prepared_reset_thd");
         /* todo/fixme: suggest to do it at innodb prepare */
         trx->will_lock= false;
         trx_sys.rw_trx_hash.put_pins(trx);
      

      and the following test case:

      --source include/have_innodb.inc                                                
      --source include/have_debug.inc                                                 
      --source include/have_debug_sync.inc                                            
      --source include/count_sessions.inc                                             
                                                                                      
      --connection default                                                            
      create table t (a int) engine=innodb;                                           
      insert into t values(1);                                                        
                                                                                      
      --connect (con_xa, localhost, root,,)                                           
      SET DEBUG_SYNC="trx_disconnect_prepared_reset_thd SIGNAL thd_reset";            
      xa start '1';                                                                   
      insert into t values(1);                                                        
      xa end '1';                                                                     
      xa prepare '1';                                                                 
                                                                                      
      --connection default                                                            
      SET DEBUG_SYNC="fill_trx_row_before_query_request SIGNAL reached WAIT_FOR fill_row_cont";
      --send select * from information_schema.innodb_trx;                             
                                                                                      
      --connect (con_sync, localhost, root,,)                                         
      SET DEBUG_SYNC="now WAIT_FOR reached";                                          
      --disconnect con_xa                                                             
      SET DEBUG_SYNC="now WAIT_FOR thd_reset";                                        
      SET DEBUG_SYNC="now SIGNAL fill_row_cont";                                      
      --disconnect con_sync                                                           
                                                                                      
      --connection default                                                            
      --disable_result_log                                                            
      # Must crash here with SIGSEGV if not fixed                                     
      --reap;                                                                         
      --enable_result_log                                                             
      xa commit '1';                                                                  
      drop table t;                                                                   
      SET DEBUG_SYNC="RESET";                                                         
      --source include/wait_until_count_sessions.inc      
      

      The bug does not affect 10.3, because 10.3 does not detach XA transactions on disconnect (compare THD::cleanup() in 10.3 and 10.4+, and see trans_xa_detach() in 10.4+ for details).

      Until MDEV-29368 is fixed, the workaround is to avoid querying innodb_trx, innodb_locks and innodb_lock_waits in information_schema while detached XA transactions exist.

          Activity

            vlad.lesin Vladislav Lesin created issue -
            vlad.lesin Vladislav Lesin made changes -
            Field Original Value New Value
            vlad.lesin Vladislav Lesin made changes -
            vlad.lesin Vladislav Lesin made changes -
            vlad.lesin Vladislav Lesin made changes -
            julien.fritsch Julien Fritsch made changes -
            Assignee Vladislav Lesin [ vlad.lesin ]
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            vlad.lesin Vladislav Lesin made changes -
            vlad.lesin Vladislav Lesin made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            vlad.lesin Vladislav Lesin made changes -
            Description trx->mysql_thd can be zeroed-out between thd_get_thread_id() and thd_query_safe() calls in fill_trx_row(). trx_disconnect_prepared() zeroes out trx->mysql_thd. And this can cause null pointer dereferencing in fill_trx_row().

            The bug can be reproduced with the following new sync point:

            {code:java}
            --- a/storage/innobase/trx/trx0i_s.cc
            +++ b/storage/innobase/trx/trx0i_s.cc
            @@ -457,6 +457,7 @@ fill_trx_row(
                    row->trx_mysql_thread_id = thd_get_thread_id(trx->mysql_thd);
             
                    char query[TRX_I_S_TRX_QUERY_MAX_LEN + 1];
            + DEBUG_SYNC_C("fill_trx_row_before_query_safe");
                    if (size_t stmt_len = thd_query_safe(trx->mysql_thd, query,
                                                         sizeof query)) {
                            row->trx_query = static_cast<const char*>(
            {code}

            and the following test case:

            {code:java}
            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
            --source include/count_sessions.inc
                                                                                            
            create table t1 (a int) engine=innodb;
            insert into t1 values(1);
                                                                                            
            --connect (con_xa, localhost, root,,)
            xa start '1';
            insert into t1 values(1);
            xa end '1';
            xa prepare '1';
                                                                                            
            --connection default
            SET DEBUG_SYNC="fill_trx_row_before_query_safe SIGNAL reached WAIT_FOR cont";
            --send select * from information_schema.innodb_trx;
                                                                                            
            --connect (con_sync, localhost, root,,)
            SET DEBUG_SYNC="now WAIT_FOR reached";
            --disconnect con_xa
            SET DEBUG_SYNC="now SIGNAL cont";
            --disconnect con_sync
                                                                                            
            --connection default
            # Must crash here with SIGSEGV if not fixed
            --reap;
            xa commit '1';
            drop table t1;
            --source include/wait_until_count_sessions.inc
            {code}

            Note that the above test case can be unstable, because the "fill_trx_row_before_query_safe" sync point must wait until trx_disconnect_prepared() zeroes out trx->mysql_thd.

             It does not affect 10.3 as 10.3 does not detach XA on disconnection (compare THD::cleanup() in 10.3 and 10.4+ and see trans_xa_detach() in 10.4+ for details).

            Until MDEV-29368 is fixed the workaround is not to use innodb_trx, innodb_locks and innodb_lock_waits from information_schema along with detached XA's.
            vlad.lesin Vladislav Lesin made changes -
            Summary "access to innodb_trx, innodb_locks and innodb_lock_waits along with detached XA's can cause SIGSEGV" changed to "Access to innodb_trx, innodb_locks and innodb_lock_waits along with detached XA's can cause SIGSEGV"
            vlad.lesin Vladislav Lesin made changes -
            vlad.lesin Vladislav Lesin made changes -
            Fix Version/s 10.3 [ 22126 ]
            vlad.lesin Vladislav Lesin made changes -
            Fix Version/s 10.3.37 [ 28404 ]
            Fix Version/s 10.4.27 [ 28405 ]
            Fix Version/s 10.5.18 [ 28421 ]
            Fix Version/s 10.6.11 [ 28441 ]
            Fix Version/s 10.7.7 [ 28442 ]
            Fix Version/s 10.8.6 [ 28443 ]
            Fix Version/s 10.9.4 [ 28444 ]
            Fix Version/s 10.10.2 [ 28410 ]
            Fix Version/s 10.11.1 [ 28454 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 147679

            People

              vlad.lesin Vladislav Lesin
              vlad.lesin Vladislav Lesin