Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24302

RESET MASTER hangs as Innodb does not report on binlog checkpoint

Details

    Description

      The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
      commit 657fcdf430f39a3103dff51a6a2b2bd3 with the following stack trace

      server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007ffb`ec653b68, struct st_mysql_mutex * mutex = 0x00007ffb`ec653b38, char * src_file = 0x00007ffb`ea98b140 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1222]
      server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000165`d43d15c8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
      server!reset_master(class THD * thd = 0x00000165`d43d15c8, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
      server!reload_acl_and_cache(class THD * thd = 0x00000165`d43d15c8, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x0000002a`c42fdc44) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
      server!mysql_execute_command(class THD * thd = 0x00000165`d43d15c8) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5480]
      server!mysql_parse(class THD * thd = 0x00000165`d43d15c8, char * rawbuf = 0x00000165`d44da280 "--- memory read error at address 0x00000165`d44da280 ---", unsigned int length = 0xc, class Parser_state * parser_state = 
      

      As it was never observed before the extra to 10.5 commits
      need examination to clear out
      possibility of missed out commit_checkpoint_notify_ha() invocation from innobase_mysql_log_notify() which is one of possibilities.
      657fcdf430f and 7b1252c03d7 are rated as potentially relevant to the failure showdown.

      A similar failure

      main.mysqldump-max 'innodb'              w2 [ fail ]  timeout after 900 seconds
      

      exists in http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio

      Upon more analysis the most probable suspect

      --- a/storage/innobase/handler/ha_innodb.cc
      +++ b/storage/innobase/handler/ha_innodb.cc
      @@ -4444,12 +4444,6 @@ innobase_mysql_log_notify(
       	struct pending_checkpoint *	entry;
       	struct pending_checkpoint *	last_ready;
       
      -	/* It is safe to do a quick check for NULL first without lock.
      -	Even if we should race, we will at most skip one checkpoint and
      -	take the next one, which is harmless. */
      -	if (!pending_checkpoint_list)
      -		return;
      -
      

      is identified. , to eliminate (the pasted block is from being tested fixes). Removal won't hurt performance in normal cases when binlog rotation is not frequent event (say not few in a second).

      Attachments

        Issue Links

          Activity

            Elkin Andrei Elkin created issue -
            Elkin Andrei Elkin made changes -
            Field Original Value New Value
            Assignee Andrei Elkin [ elkin ]
            Elkin Andrei Elkin made changes -
            Description The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
            commit {{657fcdf430f39a3103dff51a6a2b2bd3}} with the following stack trace


            {noformat}
            server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007ffb`ec653b68, struct st_mysql_mutex * mutex = 0x00007ffb`ec653b38, char * src_file = 0x00007ffb`ea98b140 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1222]
            server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000165`d43d15c8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
            server!reset_master(class THD * thd = 0x00000165`d43d15c8, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
            server!reload_acl_and_cache(class THD * thd = 0x00000165`d43d15c8, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x0000002a`c42fdc44) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
            server!mysql_execute_command(class THD * thd = 0x00000165`d43d15c8) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5480]
            server!mysql_parse(class THD * thd = 0x00000165`d43d15c8, char * rawbuf = 0x00000165`d44da280 "--- memory read error at address 0x00000165`d44da280 ---", unsigned int length = 0xc, class Parser_state * parser_state =
            {noformat}

            As it was never observed before the extra to 10.5 commits
             need examination to clear out
            possibility of missed out {{commit_checkpoint_notify_ha()}} invocation from {{innobase_mysql_log_notify()}} which is one of possibilities.
            657fcdf430f and 7b1252c03d7 are rated as potentially relevant to the failure showdown.
            The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
            commit {{657fcdf430f39a3103dff51a6a2b2bd3}} with the following stack trace


            {noformat}
            server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007ffb`ec653b68, struct st_mysql_mutex * mutex = 0x00007ffb`ec653b38, char * src_file = 0x00007ffb`ea98b140 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1222]
            server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000165`d43d15c8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
            server!reset_master(class THD * thd = 0x00000165`d43d15c8, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
            server!reload_acl_and_cache(class THD * thd = 0x00000165`d43d15c8, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x0000002a`c42fdc44) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
            server!mysql_execute_command(class THD * thd = 0x00000165`d43d15c8) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5480]
            server!mysql_parse(class THD * thd = 0x00000165`d43d15c8, char * rawbuf = 0x00000165`d44da280 "--- memory read error at address 0x00000165`d44da280 ---", unsigned int length = 0xc, class Parser_state * parser_state =
            {noformat}

            As it was never observed before the extra to 10.5 commits
             need examination to clear out
            possibility of missed out {{commit_checkpoint_notify_ha()}} invocation from {{innobase_mysql_log_notify()}} which is one of possibilities.
            657fcdf430f and 7b1252c03d7 are rated as potentially relevant to the failure showdown.

            A similar failure

            {noformat}
            main.mysqldump-max 'innodb' w2 [ fail ] timeout after 900 seconds
            {noformat}

            exists in http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio
            Elkin Andrei Elkin made changes -
            Description The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
            commit {{657fcdf430f39a3103dff51a6a2b2bd3}} with the following stack trace


            {noformat}
            server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007ffb`ec653b68, struct st_mysql_mutex * mutex = 0x00007ffb`ec653b38, char * src_file = 0x00007ffb`ea98b140 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1222]
            server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000165`d43d15c8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
            server!reset_master(class THD * thd = 0x00000165`d43d15c8, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
            server!reload_acl_and_cache(class THD * thd = 0x00000165`d43d15c8, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x0000002a`c42fdc44) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
            server!mysql_execute_command(class THD * thd = 0x00000165`d43d15c8) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5480]
            server!mysql_parse(class THD * thd = 0x00000165`d43d15c8, char * rawbuf = 0x00000165`d44da280 "--- memory read error at address 0x00000165`d44da280 ---", unsigned int length = 0xc, class Parser_state * parser_state =
            {noformat}

            As it was never observed before the extra to 10.5 commits
             need examination to clear out
            possibility of missed out {{commit_checkpoint_notify_ha()}} invocation from {{innobase_mysql_log_notify()}} which is one of possibilities.
            657fcdf430f and 7b1252c03d7 are rated as potentially relevant to the failure showdown.

            A similar failure

            {noformat}
            main.mysqldump-max 'innodb' w2 [ fail ] timeout after 900 seconds
            {noformat}

            exists in http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio
            The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
            commit {{657fcdf430f39a3103dff51a6a2b2bd3}} with the following stack trace


            {noformat}
            server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007ffb`ec653b68, struct st_mysql_mutex * mutex = 0x00007ffb`ec653b38, char * src_file = 0x00007ffb`ea98b140 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1222]
            server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000165`d43d15c8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
            server!reset_master(class THD * thd = 0x00000165`d43d15c8, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
            server!reload_acl_and_cache(class THD * thd = 0x00000165`d43d15c8, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x0000002a`c42fdc44) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
            server!mysql_execute_command(class THD * thd = 0x00000165`d43d15c8) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5480]
            server!mysql_parse(class THD * thd = 0x00000165`d43d15c8, char * rawbuf = 0x00000165`d44da280 "--- memory read error at address 0x00000165`d44da280 ---", unsigned int length = 0xc, class Parser_state * parser_state =
            {noformat}

            As it was never observed before the extra to 10.5 commits
             need examination to clear out
            possibility of missed out {{commit_checkpoint_notify_ha()}} invocation from {{innobase_mysql_log_notify()}} which is one of possibilities.
            657fcdf430f and 7b1252c03d7 are rated as potentially relevant to the failure showdown.

            A similar failure

            {noformat}
            main.mysqldump-max 'innodb' w2 [ fail ] timeout after 900 seconds
            {noformat}

            exists in http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio

            Upon more analysis the most probable suspect

            {code:java}
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -4444,12 +4444,6 @@ innobase_mysql_log_notify(
              struct pending_checkpoint * entry;
              struct pending_checkpoint * last_ready;
             
            - /* It is safe to do a quick check for NULL first without lock.
            - Even if we should race, we will at most skip one checkpoint and
            - take the next one, which is harmless. */
            - if (!pending_checkpoint_list)
            - return;
            -
            {code}
            is identified, to eliminate (the pasted block is from being tested fixes). Removal won't hurt performance in normal cases
            when binlog rotation is not frequent event (say not few in a second).

            alice Alice Sherepa made changes -
            Issue Type Task [ 3 ] Bug [ 1 ]
            alice Alice Sherepa made changes -
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.6 [ 24028 ]
            marko Marko Mäkelä made changes -
            Elkin Andrei Elkin made changes -
            Description The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
            commit {{657fcdf430f39a3103dff51a6a2b2bd3}} with the following stack trace


            {noformat}
            server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007ffb`ec653b68, struct st_mysql_mutex * mutex = 0x00007ffb`ec653b38, char * src_file = 0x00007ffb`ea98b140 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1222]
            server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000165`d43d15c8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
            server!reset_master(class THD * thd = 0x00000165`d43d15c8, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
            server!reload_acl_and_cache(class THD * thd = 0x00000165`d43d15c8, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x0000002a`c42fdc44) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
            server!mysql_execute_command(class THD * thd = 0x00000165`d43d15c8) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5480]
            server!mysql_parse(class THD * thd = 0x00000165`d43d15c8, char * rawbuf = 0x00000165`d44da280 "--- memory read error at address 0x00000165`d44da280 ---", unsigned int length = 0xc, class Parser_state * parser_state =
            {noformat}

            As it was never observed before the extra to 10.5 commits
             need examination to clear out
            possibility of missed out {{commit_checkpoint_notify_ha()}} invocation from {{innobase_mysql_log_notify()}} which is one of possibilities.
            657fcdf430f and 7b1252c03d7 are rated as potentially relevant to the failure showdown.

            A similar failure

            {noformat}
            main.mysqldump-max 'innodb' w2 [ fail ] timeout after 900 seconds
            {noformat}

            exists in http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio

            Upon more analysis the most probable suspect

            {code:java}
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -4444,12 +4444,6 @@ innobase_mysql_log_notify(
              struct pending_checkpoint * entry;
              struct pending_checkpoint * last_ready;
             
            - /* It is safe to do a quick check for NULL first without lock.
            - Even if we should race, we will at most skip one checkpoint and
            - take the next one, which is harmless. */
            - if (!pending_checkpoint_list)
            - return;
            -
            {code}
            is identified, to eliminate (the pasted block is from being tested fixes). Removal won't hurt performance in normal cases
            when binlog rotation is not frequent event (say not few in a second).

            The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
            commit {{657fcdf430f39a3103dff51a6a2b2bd3}} with the following stack trace


            {noformat}
            server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007ffb`ec653b68, struct st_mysql_mutex * mutex = 0x00007ffb`ec653b38, char * src_file = 0x00007ffb`ea98b140 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1222]
            server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000165`d43d15c8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
            server!reset_master(class THD * thd = 0x00000165`d43d15c8, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
            server!reload_acl_and_cache(class THD * thd = 0x00000165`d43d15c8, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x0000002a`c42fdc44) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
            server!mysql_execute_command(class THD * thd = 0x00000165`d43d15c8) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5480]
            server!mysql_parse(class THD * thd = 0x00000165`d43d15c8, char * rawbuf = 0x00000165`d44da280 "--- memory read error at address 0x00000165`d44da280 ---", unsigned int length = 0xc, class Parser_state * parser_state =
            {noformat}

            As it was never observed before the extra to 10.5 commits
             need examination to clear out
            possibility of missed out {{commit_checkpoint_notify_ha()}} invocation from {{innobase_mysql_log_notify()}} which is one of possibilities.
            657fcdf430f and 7b1252c03d7 are rated as potentially relevant to the failure showdown.

            A similar failure

            {noformat}
            main.mysqldump-max 'innodb' w2 [ fail ] timeout after 900 seconds
            {noformat}

            exists in http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio

            Upon more analysis the most probable suspect

            {code:java}
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -4444,12 +4444,6 @@ innobase_mysql_log_notify(
              struct pending_checkpoint * entry;
              struct pending_checkpoint * last_ready;
             
            - /* It is safe to do a quick check for NULL first without lock.
            - Even if we should race, we will at most skip one checkpoint and
            - take the next one, which is harmless. */
            - if (!pending_checkpoint_list)
            - return;
            -
            {code}
            is identified. -, to eliminate (the pasted block is from being tested fixes). Removal won't hurt performance in normal cases when binlog rotation is not frequent event (say not few in a second).-

            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Vladislav Vaintroub [ wlad ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            marko Marko Mäkelä made changes -
            wlad Vladislav Vaintroub made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            wlad Vladislav Vaintroub made changes -
            Assignee Vladislav Vaintroub [ wlad ] Andrei Elkin [ elkin ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Labels affects-tests
            monty Michael Widenius made changes -
            Priority Major [ 3 ] Blocker [ 1 ]
            Elkin Andrei Elkin made changes -
            Summary RESET MASTER hangs in innodb.group_commit_crash_no_optimize_thread RESET MASTER hangs as Innodb does not report on binlog checkpoint
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Vladislav Vaintroub [ wlad ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            wlad Vladislav Vaintroub made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            wlad Vladislav Vaintroub made changes -
            Assignee Vladislav Vaintroub [ wlad ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-03-29 12:50:02.0 2021-03-29 12:50:02.627
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Fix Version/s 10.5.10 [ 25204 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 116236 ] MariaDB v4 [ 158627 ]

            People

              marko Marko Mäkelä
              Elkin Andrei Elkin
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.