Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25611

RESET MASTER still causes the server to hang

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5, 10.6
    • Fix Version/s: 10.5
    • Component/s: Replication
    • Labels:

      Description

      We got a hung server:

      10.6 2ceadb390309f9c29786bb5ec2a6459dfe88344e

      versioning.rpl 'innodb,row,trx_id'       w3 [ fail ]  timeout after 900 seconds
              Test ended at 2021-05-06 15:23:30
       
      Test case timeout after 900 seconds
       
      == D:/winx64-debug/build/mysql-test/var/3/log/rpl.log == 
      include/master-slave.inc
       
       == D:/winx64-debug/build/mysql-test/var/3/tmp/analyze-timeout-mysqld.2.err ==
      SHOW PROCESSLIST;
      Id	User	Host	db	Command	Time	State	Info	Progress
      16	root	localhost:38509	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
       
       == D:/winx64-debug/build/mysql-test/var/3/tmp/analyze-timeout-mysqld.1.err ==
      SHOW PROCESSLIST;
      Id	User	Host	db	Command	Time	State	Info	Progress
      15	root	localhost:37368	test	Query	960	starting	RESET MASTER	0.000
      17	root	localhost:38510	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
       
      mysqltest failed but provided no output
      

      Among the stack traces, no InnoDB code is executing (only buf_flush_page_cleaner() is in an untimed wait). The culprit for the hang seems to be a thread that is executing RESET MASTER:

      10.6 2ceadb390309f9c29786bb5ec2a6459dfe88344e

      server!pthread_cond_wait(struct _RTL_CONDITION_VARIABLE * cond = 0x00007fff`622137a8, struct _RTL_CRITICAL_SECTION * mutex = 0x00007fff`62213778) [D:\winx64-debug\build\src\mysys\my_wincond.c @ 91]
      server!psi_cond_wait(struct st_mysql_cond * that = 0x00007fff`622137a8, struct st_mysql_mutex * mutex = 0x00007fff`62213778, char * file = 0x00007fff`60671dc8 "D:\winx64-debug\build\src\sql\log.cc", unsigned int line = 0x10f8) [D:\winx64-debug\build\src\mysys\my_thr_init.c @ 596]
      server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007fff`622137a8, struct st_mysql_mutex * mutex = 0x00007fff`62213778, char * src_file = 0x00007fff`60671dc8 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1070]
      server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000227`de152788, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
      server!reset_master(class THD * thd = 0x00000227`de152788, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
      server!reload_acl_and_cache(class THD * thd = 0x00000227`de152788, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x00000063`731edb74) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
      server!mysql_execute_command(class THD * thd = 0x00000227`de152788) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5452]
      server!mysql_parse(class THD * thd = 0x00000227`de152788, char * rawbuf = 0x00000227`de1ca690 "--- memory read error at address 0x00000227`de1ca690 ---", unsigned int length = 0xc, class Parser_state * parser_state = 0x00000063`731ee950) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 8018]
      server!dispatch_command(enum_server_command command = COM_QUERY (0n3), class THD * thd = 0x00000227`de152788, char * packet = 0x00000227`ddfceec9 "--- memory read error at address 0x00000227`ddfceec9 ---", unsigned int packet_length = 0xc, bool blocking = false) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 1899]
      server!do_command(class THD * thd = 0x00000227`de152788, bool blocking = false) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 1406]
      server!threadpool_process_request(class THD * thd = 0x00000227`de152788) [D:\winx64-debug\build\src\sql\threadpool_common.cc @ 394]
      server!tp_callback(struct TP_connection * c = 0x00000227`ddf80e30) [D:\winx64-debug\build\src\sql\threadpool_common.cc @ 203]
      server!tp_callback(struct _TP_CALLBACK_INSTANCE * instance = 0x00000063`731ef968, void * context = 0x00000227`ddf80e30) [D:\winx64-debug\build\src\sql\threadpool_win.cc @ 279]
      server!io_completion_callback(struct _TP_CALLBACK_INSTANCE * instance = 0x00000063`731ef968, void * context = 0x00000227`ddf80e30, void * overlapped = 0x00000227`ddf80ea0, unsigned long io_result = 0, unsigned int64 nbytes = 0x11, struct _TP_IO * io = 0x00000227`de05a5e0) [D:\winx64-debug\build\src\sql\threadpool_win.cc @ 299]
      

      I think that also 10.5 must be affected by this. On older server versions, it is probably harder to reproduce this, because we never observed MDEV-24302 on older versions than 10.5. This could correlate with scheduling improvements in 10.5 (MDEV-23855) and 10.6 (MDEV-20612, MDEV-25404); earlier, there could have been more scattered calls to write InnoDB log that would wake up RESET MASTER from its wait.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              Elkin Andrei Elkin
              Reporter:
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated: