Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-40171

Binlog Fatal Error Can Cause Server to Crash

    XMLWordPrintable

Details

    Description

      The server can crash (segfault) if one thread causes a fatal error (e.g.
      a failed rotation due to disk space) while another thread is directly
      binlogging. This is due to a bug in the lock synchronization of
      MYSQL_BIN_LOG::write(), where the final check deciding whether or
      not to binlog because of `is_open()` is not done while holding the
      binlog lock. This allows another thread to come in and modify the binlog
      state (e.g. permantently close it) while the DDL committing thread
      thinks everything is ok. This JIRA highlights that this can cause a
      crash when the binlog is closed, although there are likely other strange
      behaviors that can be attributed to this.

      Reproduction is a bit awkward and requires code modification.

      Test case:

      --source include/have_log_bin.inc
      --source include/have_innodb.inc
       
      --echo #
      --echo # Setup the race condition:
      --echo #  * connection con1 writes a DDL and pauses (sleeps) after the un-locked is_open() check
      --echo #  * connection default triggers the binlog to fatally close
      --echo #
      --connection con1
      --echo # con1 (DDL): pauses inside MYSQL_BIN_LOG::write() after is_open()
      --send create table t_ddl (id int primary key) engine=innodb
       
      --connection default
      --echo # Let con1 reach the pause inside write()
      sleep 1;
       
      --echo # Fatally close the binlog via a failed rotation while con1 is paused
      --error ER_ERROR_ON_WRITE
      FLUSH LOGS;
       
      --echo # CRASH: Resume con1 to binlog, but the log is fatally closed
      --connection con1
      --reap
      

      Code modifications (done in 10.6, though DBUG_EXECUTE_IF was changed to
      DBUG_IF in later versions, so later versions will need to be amended
      appropriately):

      diff --git a/sql/log.cc b/sql/log.cc
      index 962557ad769..19a030b3a86 100644
      --- a/sql/log.cc
      +++ b/sql/log.cc
      @@ -5485,8 +5485,9 @@ int MYSQL_BIN_LOG::new_file_impl()
             r.checksum_alg= relay_log_checksum_alg;
           DBUG_ASSERT(!is_relay_log ||
                       relay_log_checksum_alg != BINLOG_CHECKSUM_ALG_UNDEF);
      -    if (DBUG_EVALUATE_IF("fault_injection_new_file_rotate_event",
      -                         (error= close_on_error= TRUE), FALSE) ||
      +    if ((DBUG_EVALUATE_IF("fault_injection_new_file_rotate_event",
      +                          (error= close_on_error= TRUE), FALSE) ||
      +         ((errno= 2) && (error= close_on_error= TRUE))) ||
               (error= write_event(&r)))
           {
             DBUG_EXECUTE_IF("fault_injection_new_file_rotate_event", errno= 2;);
      @@ -6745,6 +6746,13 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info, my_bool *with_annotate)
         if ((WSREP_EMULATE_BINLOG(thd) &&
              IF_WSREP(thd->wsrep_cs().mode() == wsrep::client_state::m_local, 0)) || is_open())
         {
      +    /*
      +      The above if condition commits us to writing to the binlog, but we are
      +      not holding the binlog lock. Once in this branch, pause (sleep()) to
      +      allow another thread to modify the binlog state from underneath us.
      +    */
      +    sleep(2);
      +
           my_off_t UNINIT_VAR(my_org_b_tell);
       #ifdef HAVE_REPLICATION
      

      with crash output:

      2026-06-26 12:32:13 5 [ERROR] Could not use ./master-bin.000002 for logging (error 2). Turning logging off for the whole duration of the MariaDB server process. To turn it on again: fix the cause, shutdown the MariaDB server and restart it.
      260626 12:32:14 [ERROR] /home/brandon/workspace/server_106/build_opt/sql/mariadbd got signal 11 ;
      Sorry, we probably made a mistake, and this is a bug.
       
      Your assistance in bug reporting will enable us to fix this for the next release.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs about how to report
      a bug on https://jira.mariadb.org/.
       
      Please include the information from the server start above, to the end of the
      information below.
       
      Server version: 10.6.28-MariaDB-log source revision: d404a47c25b8d5d3b37d9d62c14311934523fd17
       
      The information page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/
      contains instructions to obtain a better version of the backtrace below.
      Following these instructions will help MariaDB developers provide a fix quicker.
       
      Attempting backtrace. Include this in the bug report.
      (note: Retrieving this information may fail)
       
      Thread pointer: 0x76dafc000cd8
      stack_bottom = 0x76db3c489000 thread_stack 0x49000
      ./server_106/build_opt/sql/mariadbd(my_print_stacktrace+0x29)[0x6494cb973c09]
      ./server_106/build_opt/sql/mariadbd(handle_fatal_signal+0x199)[0x6494cb4c8e79]
      .store_rt)[0x76db43045330]
      .naligned-erms.S:461(__memcpy_avx_unaligned_erms)[0x76db43188beb]
      ./server_106/build_opt/sql/mariadbd(_my_b_write+0x5b)[0x6494cb95a22b]
      ./server_106/build_opt/sql/mariadbd(_ZN16Log_event_writer14write_internalEPKhm+0x13)[0x6494cb5cebb3]
      ./server_106/build_opt/sql/mariadbd(_ZN16Log_event_writer12write_headerEPhm+0xe9)[0x6494cb5cee69]
      ./server_106/build_opt/sql/mariadbd(_ZN9Log_event12write_headerEm+0xf3)[0x6494cb5cf0e3]
      ./server_106/build_opt/sql/mariadbd(_ZN14Gtid_log_event5writeEv+0xf1)[0x6494cb5d4a11]
      ./server_106/build_opt/sql/mariadbd(_ZN13MYSQL_BIN_LOG11write_eventEP9Log_eventP17binlog_cache_dataP11st_io_cache+0xb0)[0x6494cb5b77d0]
      ./server_106/build_opt/sql/mariadbd(_ZN13MYSQL_BIN_LOG16write_gtid_eventEP3THDbbybb+0x179)[0x6494cb5b8e09]
      ./server_106/build_opt/sql/mariadbd(_ZN13MYSQL_BIN_LOG5writeEP9Log_eventPc+0x3f3)[0x6494cb5b9823]
      ./server_106/build_opt/sql/mariadbd(_ZN3THD12binlog_queryENS_22enum_binlog_query_typeEPKcmbbbi+0x261)[0x6494cb267cc1]
      ./server_106/build_opt/sql/mariadbd(_Z13write_bin_logP3THDbPKcmb+0xbd)[0x6494cb343bdd]
      ./server_106/build_opt/sql/mariadbd(_Z18mysql_create_tableP3THDP10TABLE_LISTP22Table_specification_stP10Alter_info+0x388)[0x6494cb349228]
      ./server_106/build_opt/sql/mariadbd(_ZN25Sql_cmd_create_table_like7executeEP3THD+0x76a)[0x6494cb357d9a]
      ./server_106/build_opt/sql/mariadbd(_Z21mysql_execute_commandP3THDb+0x7e2)[0x6494cb2ae522]
      ./server_106/build_opt/sql/mariadbd(_Z11mysql_parseP3THDPcjP12Parser_state+0x254)[0x6494cb2aa064]
      ./server_106/build_opt/sql/mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjb+0xdd3)[0x6494cb2a84f3]
      ./server_106/build_opt/sql/mariadbd(_Z10do_commandP3THDb+0x268)[0x6494cb2aa468]
      ./server_106/build_opt/sql/mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x14d)[0x6494cb3b59dd]
      ./server_106/build_opt/sql/mariadbd(handle_one_connection+0xc0)[0x6494cb3b57a0]
      ./server_106/build_opt/sql/mariadbd(+0xbb7149)[0x6494cb6e3149]
      nptl/pthread_create.c:447(start_thread)[0x76db4309caa4]
      x86_64/clone3.S:80(clone3)[0x76db43129c6c]
      

      Note on a debug build, the result is an assertion, rather than a
      segfault (version here is 12.3, I think):

      ./server/mysys/mf_iocache.c:631: int _my_b_write(IO_CACHE *, const uchar *, size_t): Assertion `Count >= rest_length' failed.
      260626 12:16:00 [ERROR] /home/brandon/workspace/server/build/sql/mariadbd got signal 6 ;
      Sorry, we probably made a mistake, and this is a bug.
      

      Attachments

        Issue Links

          Activity

            People

              bnestere Brandon Nesterenko
              bnestere Brandon Nesterenko
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - 3d
                  3d
                  Remaining:
                  Time Spent - 3h Remaining Estimate - 2d 5h
                  2d 5h
                  Logged:
                  Time Spent - 3h Remaining Estimate - 2d 5h
                  3h

                  Git Integration

                    Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.