Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24625

Failure to open binary log does not cause a fatal error, but leads to further assorted sporadic problems

    XMLWordPrintable

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.4
    • 10.4
    • Replication
    • None

    Description

      Reported for 10.4 because that's the branch I was dealing with. I don't have information whether other versions behave the same way.

      The initial subject of investigation was a cluster of assorted sporadic assertion failures happening in concurrent tests in a specific environment with binary logging enabled.

      All failures were united by the fact that they happened immediately or very soon after an error like this appeared in the error log:

      2021-01-18 20:22:41 15 [ERROR] Could not use mysql-bin for logging (error 2). Turning logging off for the whole duration of the MariaDB server process. To turn it on again: fix the cause, shutdown the MariaDB server and restart it.
      2021-01-18 20:22:41 15 [ERROR] Could not use ./mysql-bin.000002 for logging (error 0). Turning logging off for the whole duration of the MariaDB server process. To turn it on again: fix the cause, shutdown the MariaDB server and restart it.
      

      The root cause, in this case, turned out to be an error in wolfSSL code which led to a failing call to my_random_bytes in MYSQL_BIN_LOG::open which, in turn, led to the binlog opening error. However, while looking into it, I found the handling of binlog opening errors on MariaDB side be rather questionable.

      The ERROR records are written as a part of error handling in MYSQL_BIN_LOG::open, after which the function returns with non-success.
      Turning binary logging off can be a pretty big deal, people don't enable it without a reason. Yet, MariaDB takes the situation rather lightly. There are about 15 points of failure in MYSQL_BIN_LOG::open alone, so it's far from a unique situation; and all of them seemingly lead to the same outcome, and no fatal errors or even debug assertions happen at this point.

      However, shortly after this various problems start happening. As a side effect, at least the following assertion failures occur sporadically, interchangeably, and sometimes simultaneously:

      mysqld: /data/src/10.4-bug/sql/sql_class.cc:6560: int THD::binlog_write_row(TABLE*, bool, const uchar*): Assertion `((__builtin_expect(((WSREP_ON_) != 0),0) && this->variables.wsrep_on) && wsrep_emulate_bin_log) || mysql_bin_log.is_open()' failed.
       
      #7  0x00007f84d7d4af36 in __GI___assert_fail (assertion=0x55f48fdf7338 "((__builtin_expect(((WSREP_ON_) != 0),0) && this->variables.wsrep_on) && wsrep_emulate_bin_log) || mysql_bin_log.is_open()", file=0x55f48fdf53d0 "/data/src/10.4-bug/sql/sql_class.cc", line=6560, function=0x55f48fdf72b8 "int THD::binlog_write_row(TABLE*, bool, const uchar*)") at assert.c:101
      #8  0x000055f48f09ce82 in THD::binlog_write_row (this=0x7f8450000d90, table=0x7f84500c86a0, is_trans=false, record=0x7f84500ca158 "\200\b") at /data/src/10.4-bug/sql/sql_class.cc:6560
      #9  0x000055f48f48cb39 in Write_rows_log_event::binlog_row_logging_function (thd=0x7f8450000d90, table=0x7f84500c86a0, is_transactional=false, before_record=0x0, after_record=0x7f84500ca158 "\200\b") at /data/src/10.4-bug/sql/log_event.h:4806
      #10 0x000055f48f485005 in binlog_log_row_internal (table=0x7f84500c86a0, before_record=0x0, after_record=0x7f84500ca158 "\200\b", log_func=0x55f48f48ca84 <Write_rows_log_event::binlog_row_logging_function(THD*, TABLE*, bool, unsigned char const*, unsigned char const*)>) at /data/src/10.4-bug/sql/handler.cc:6422
      #11 0x000055f48f485110 in binlog_log_row (table=0x7f84500c86a0, before_record=0x0, after_record=0x7f84500ca158 "\200\b", log_func=0x55f48f48ca84 <Write_rows_log_event::binlog_row_logging_function(THD*, TABLE*, bool, unsigned char const*, unsigned char const*)>) at /data/src/10.4-bug/sql/handler.cc:6442
      #12 0x000055f48f486847 in handler::ha_write_row (this=0x7f84500c9968, buf=0x7f84500ca158 "\200\b") at /data/src/10.4-bug/sql/handler.cc:6760
      #13 0x000055f48f0bfc0b in write_record (thd=0x7f8450000d90, table=0x7f84500c86a0, info=0x7f84d46b1670) at /data/src/10.4-bug/sql/sql_insert.cc:2060
      #14 0x000055f48f0f8120 in read_sep_field (thd=0x7f8450000d90, info=..., table_list=0x7f8450012300, fields_vars=..., set_fields=..., set_values=..., read_info=..., enclosed=..., skip_lines=0, ignore_check_option_errors=true) at /data/src/10.4-bug/sql/sql_load.cc:1164
      #15 0x000055f48f0f65d0 in mysql_load (thd=0x7f8450000d90, ex=0x7f8450012268, table_list=0x7f8450012300, fields_vars=..., set_fields=..., set_values=..., handle_duplicates=DUP_ERROR, ignore=true, read_file_from_client=false) at /data/src/10.4-bug/sql/sql_load.cc:669
      #16 0x000055f48f10d1cc in mysql_execute_command (thd=0x7f8450000d90) at /data/src/10.4-bug/sql/sql_parse.cc:4981
      

      mysqld: /data/src/10.4-bug/mysys/mf_iocache.c:598: _my_b_write: Assertion `Count >= rest_length' failed.
       
      #7  0x00007fb1910def36 in __GI___assert_fail (assertion=0x55d12c50507f "Count >= rest_length", file=0x55d12c504e38 "/data/src/10.4-bug/mysys/mf_iocache.c", line=598, function=0x55d12c5055e0 <__PRETTY_FUNCTION__.14595> "_my_b_write") at assert.c:101
      #8  0x000055d12c05abab in _my_b_write (info=0x55d12cc117c0 <mysql_bin_log+928>, Buffer=0x7fb17974cb90 "\252\304\005`\242o", Count=19) at /data/src/10.4-bug/mysys/mf_iocache.c:598
      #9  0x000055d12c05903f in my_b_write (info=0x55d12cc117c0 <mysql_bin_log+928>, Buffer=0x7fb17974cb90 "\252\304\005`\242o", Count=19) at /data/src/10.4-bug/include/my_sys.h:548
      #10 0x000055d12c05ce6a in my_b_safe_write (info=0x55d12cc117c0 <mysql_bin_log+928>, Buffer=0x7fb17974cb90 "\252\304\005`\242o", Count=19) at /data/src/10.4-bug/mysys/mf_iocache.c:1854
      #11 0x000055d12b926bf9 in Log_event_writer::write_internal (this=0x7fb17974cc80, pos=0x7fb17974cb90 "\252\304\005`\242o", len=19) at /data/src/10.4-bug/sql/log_event.cc:1600
      #12 0x000055d12b926eec in Log_event_writer::encrypt_and_write (this=0x7fb17974cc80, pos=0x7fb17974cb90 "\252\304\005`\242o", len=19) at /data/src/10.4-bug/sql/log_event.cc:1646
      #13 0x000055d12b9271b7 in Log_event_writer::write_header (this=0x7fb17974cc80, pos=0x7fb17974cb90 "\252\304\005`\242o", len=19) at /data/src/10.4-bug/sql/log_event.cc:1689
      #14 0x000055d12b9276cd in Log_event::write_header (this=0x7fb17974cd70, event_data_length=19) at /data/src/10.4-bug/sql/log_event.cc:1782
      #15 0x000055d12b934322 in Gtid_log_event::write (this=0x7fb17974cd70) at /data/src/10.4-bug/sql/log_event.cc:8061
      #16 0x000055d12b6dd371 in Log_event_writer::write (this=0x7fb17974cc80, ev=0x7fb17974cd70) at /data/src/10.4-bug/sql/log_event.h:5236
      #17 0x000055d12b90d95f in MYSQL_BIN_LOG::write_event (this=0x55d12cc11420 <mysql_bin_log>, ev=0x7fb17974cd70, cache_data=0x0, file=0x55d12cc117c0 <mysql_bin_log+928>) at /data/src/10.4-bug/sql/log.cc:5373
      #18 0x000055d12b365ba1 in MYSQL_BIN_LOG::write_event (this=0x55d12cc11420 <mysql_bin_log>, ev=0x7fb17974cd70) at /data/src/10.4-bug/sql/log.h:820
      #19 0x000055d12b9100a4 in MYSQL_BIN_LOG::write_gtid_event (this=0x55d12cc11420 <mysql_bin_log>, thd=0x7fb11c000d90, standalone=true, is_transactional=false, commit_id=0) at /data/src/10.4-bug/sql/log.cc:6100
      #20 0x000055d12b910dd6 in MYSQL_BIN_LOG::write (this=0x55d12cc11420 <mysql_bin_log>, event_info=0x7fb17974d1a0, with_annotate=0x0) at /data/src/10.4-bug/sql/log.cc:6420
      #21 0x000055d12b3fcd7c in THD::binlog_query (this=0x7fb11c000d90, qtype=THD::STMT_QUERY_TYPE, query_arg=0x7fb11c3cbd40 "CREATE DEFINER=`rqg`@`localhost` PROCEDURE IF NOT EXISTS `sp_1068769`()\nBEGIN INSERT INTO seq8 VALUES (-64, 113, 33309, -7346215417171410944, 6, 61, 0, 0) ; FLUSH PRIVILEGES ; END", query_len=179, is_trans=false, direct=false, suppress_use=false, errcode=0) at /data/src/10.4-bug/sql/sql_class.cc:7226
      #22 0x000055d12b98e516 in Sp_handler::sp_create_routine (this=0x55d12ca030f0 <sp_handler_procedure>, thd=0x7fb11c000d90, sp=0x7fb11c66eab8) at /data/src/10.4-bug/sql/sp.cc:1465
      

      mysqld: /data/src/10.4-bug/sql/log.cc:7988: void MYSQL_BIN_LOG::trx_group_commit_leader(MYSQL_BIN_LOG::group_commit_entry*): Assertion `is_open()' failed.
       
      #7  0x00007ff5632cbf36 in __GI___assert_fail (assertion=0x558f7dc4e79a "is_open()", file=0x558f7dc4c82c "/data/src/10.4-bug/sql/log.cc", line=7988, function=0x558f7dc506f0 "void MYSQL_BIN_LOG::trx_group_commit_leader(MYSQL_BIN_LOG::group_commit_entry*)") at assert.c:101
      #8  0x0000558f7d22f42d in MYSQL_BIN_LOG::trx_group_commit_leader (this=0x558f7e52b420 <mysql_bin_log>, leader=0x7ff54c717e80) at /data/src/10.4-bug/sql/log.cc:7988
      #9  0x0000558f7d22ed17 in MYSQL_BIN_LOG::write_transaction_to_binlog_events (this=0x558f7e52b420 <mysql_bin_log>, entry=0x7ff54c717e80) at /data/src/10.4-bug/sql/log.cc:7819
      #10 0x0000558f7d22e066 in MYSQL_BIN_LOG::write_transaction_to_binlog (this=0x558f7e52b420 <mysql_bin_log>, thd=0x7ff4a4000d90, cache_mngr=0x7ff4a4349fe0, end_ev=0x7ff54c718020, all=true, using_stmt_cache=true, using_trx_cache=true) at /data/src/10.4-bug/sql/log.cc:7440
      #11 0x0000558f7d21cdf9 in binlog_flush_cache (thd=0x7ff4a4000d90, cache_mngr=0x7ff4a4349fe0, end_ev=0x7ff54c718020, all=true, using_stmt=true, using_trx=true) at /data/src/10.4-bug/sql/log.cc:1788
      #12 0x0000558f7d21d43d in binlog_commit_flush_xid_caches (thd=0x7ff4a4000d90, cache_mngr=0x7ff4a4349fe0, all=true, xid=3034) at /data/src/10.4-bug/sql/log.cc:1896
      #13 0x0000558f7d234cde in MYSQL_BIN_LOG::log_and_order (this=0x558f7e52b420 <mysql_bin_log>, thd=0x7ff4a4000d90, xid=3034, all=true, need_prepare_ordered=false, need_commit_ordered=true) at /data/src/10.4-bug/sql/log.cc:9779
      #14 0x0000558f7d0ee59e in ha_commit_trans (thd=0x7ff4a4000d90, all=true) at /data/src/10.4-bug/sql/handler.cc:1646
      #15 0x0000558f7cf22719 in trans_commit_implicit (thd=0x7ff4a4000d90) at /data/src/10.4-bug/sql/transaction.cc:301
      #16 0x0000558f7cd80eb6 in mysql_execute_command (thd=0x7ff4a4000d90) at /data/src/10.4-bug/sql/sql_parse.cc:3745
      #17 0x0000558f7cd8f087 in mysql_parse (thd=0x7ff4a4000d90, rawbuf=0x7ff4a4012128 "ALTER /* QNO 476 CON_ID 22 */ ONLINE IGNORE TABLE `t8_Aria` DROP FOREIGN KEY e", length=78, parser_state=0x7ff54c7195a0, is_com_multi=false, is_next_command=false) at /data/src/10.4-bug/sql/sql_parse.cc:7958
      

      mysqld: /home/elenst/src/10.4/sql/log.cc:5976: int MYSQL_BIN_LOG::flush_and_set_pending_rows_event(THD*, Rows_log_event*, bool): Assertion `((thd && (__builtin_expect(((WSREP_ON_) != 0),0) && thd->variables.wsrep_on)) && wsrep_emulate_bin_log) || mysql_bin_log.is_open()' failed.
       
      #7  0x00007f8db228c0d2 in __assert_fail () from /lib64/libc.so.6
      #8  0x00005622dac87d39 in MYSQL_BIN_LOG::flush_and_set_pending_rows_event (this=0x5622dc2cff00 <mysql_bin_log>, thd=0x7f8d08000af0, event=0x7f8d080fd3e0, is_transactional=false) at /home/elenst/src/10.4/sql/log.cc:5976
      #9  0x00005622da78552e in THD::binlog_prepare_pending_rows_event<Write_rows_log_event> (this=0x7f8d08000af0, table=0x7f8d080334c0, serv_id=111, needed=18, is_transactional=false, hint=0x0) at /home/elenst/src/10.4/sql/sql_class.cc:6419
      #10 0x00005622da77d28b in THD::binlog_write_row (this=0x7f8d08000af0, table=0x7f8d080334c0, is_trans=false, record=0x7f8d080f9bb8 "\370\006\002") at /home/elenst/src/10.4/sql/sql_class.cc:6589
      #11 0x00005622dab67d4e in Write_rows_log_event::binlog_row_logging_function (thd=0x7f8d08000af0, table=0x7f8d080334c0, is_transactional=false, before_record=0x0, after_record=0x7f8d080f9bb8 "\370\006\002") at /home/elenst/src/10.4/sql/log_event.h:4806
      #12 0x00005622dab602f3 in binlog_log_row_internal (table=0x7f8d080334c0, before_record=0x0, after_record=0x7f8d080f9bb8 "\370\006\002", log_func=0x5622dab67ca0 <Write_rows_log_event::binlog_row_logging_function(THD*, TABLE*, bool, unsigned char const*, unsigned char const*)>) at /home/elenst/src/10.4/sql/handler.cc:6422
      #13 0x00005622dab603f5 in binlog_log_row (table=0x7f8d080334c0, before_record=0x0, after_record=0x7f8d080f9bb8 "\370\006\002", log_func=0x5622dab67ca0 <Write_rows_log_event::binlog_row_logging_function(THD*, TABLE*, bool, unsigned char const*, unsigned char const*)>) at /home/elenst/src/10.4/sql/handler.cc:6442
      #14 0x00005622dab61c2f in handler::ha_write_row (this=0x7f8d08034328, buf=0x7f8d080f9bb8 "\370\006\002") at /home/elenst/src/10.4/sql/handler.cc:6760
      #15 0x00005622da79f9c1 in write_record (thd=0x7f8d08000af0, table=0x7f8d080334c0, info=0x7f8d9c550630) at /home/elenst/src/10.4/sql/sql_insert.cc:1736
      #16 0x00005622da7d7635 in read_sep_field (thd=0x7f8d08000af0, info=..., table_list=0x7f8d08012060, fields_vars=..., set_fields=..., set_values=..., read_info=..., enclosed=..., skip_lines=0, ignore_check_option_errors=false) at /home/elenst/src/10.4/sql/sql_load.cc:1164
      #17 0x00005622da7d5b09 in mysql_load (thd=0x7f8d08000af0, ex=0x7f8d08011fc8, table_list=0x7f8d08012060, fields_vars=..., set_fields=..., set_values=..., handle_duplicates=DUP_REPLACE, ignore=false, read_file_from_client=false) at /home/elenst/src/10.4/sql/sql_load.cc:671
      

      Probably there are many more of them, and these are just most frequent or most obvious. And sometimes none of these occur, and the concurrent test "successfully" reaches the end.

      It is not in itself surprising that all these things fail when the binary log previously failed to open. However, the behavior is largely inconsistent.


      - if all these are debug-only errors which can never affect release builds, then they are bogus assertions which should be eased;

      - assertions with a complex condition, where the check for binary log being open is only a part of it, should probably be revised and maybe split;

      - assertions which only check that the binary log is open should probably be replaced or at least accompanied by a debug assertion in MYSQL_BIN_LOG::open error handling, so that the initial problem is caught timely and deterministically, instead of relying on aftermath checks;

      - finally, if failures upon these checks indicate a real probable or inevitable non-debug problem, then maybe the whole idea of turning off binary logging at runtime should be reconsidered. After all, it's better if the server shuts down gracefully right upon binlog opening failure, than crash or corrupt data later at random times in an obscure way.

      Attachments

        Issue Links

          Activity

            People

              bnestere Brandon Nesterenko
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.