Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6825

mysqld server "deadlocks" after issuing STOP SLAVE command while replicated DELETE query is executing (freeing items vs killing slave)

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5.37, 5.5.39
    • Fix Version/s: 5.5.43
    • Environment:

      Description

      Hello,

      This problem I noticed after replacing MySQL 5.5.x with MariaDB 5.5.
      It occured on both MariaDB 5.5.39, 5.5.37 as well as some older MariaDB 5.5 versions.

      It is possible to render MariaDB instance unusable by putting it into some kind of internal "deadlock state":

      How to reproduce:

      1. Use master-slave setup with statement based replication and InnoDB as storage engine.
      2. Wait until slave server replicates and executes a DELETE query replicated from master.
      3. Issue "STOP SLAVE" command on the slave server before execution of the DELETE query is completed by the replication sql_thread.

      Effect:
      MariaDB on slave server will enter "deadlock state" (kind of):

      • STOP SLAVE stalls on State: "Killing slave"
      • DELETE query stalls on State: "Freeing items"
      • CPU load generated by mysqld drops to nearly 0,
      • IO load generated by mysqld drops to 0,

      Last entry in mysq.err:

      140909  3:05:02 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.034913' at position 53766638

      Output head from "SHOW PROCESSLIST" (after "deadlock state" is in place for a few hours here, normally our DELETE queries take much less time to complete):

       
      Id      User    Host    db      Command Time    State   Info    Progress
      24611   system user             NULL    Connect 89203   Waiting for master to send event        NULL    0.000
      24612   system user             NULL    Connect 17161   freeing items   NULL    0.000
      35399   [user1-namechanged]  [ip-removed]:43814    NULL    Query   15421   Killing slave   stop slave      0.000
      35400   [user2-namechanged]       [ip-removed]:50631    NULL    Query   15420   executing       SHOW /*!50002 GLOBAL */ STATUS  0.000
      35401   [user2-namechanged]       [ip-removed]:51717    NULL    Sleep   15396           NULL    0.000
      35402   [user2-namechanged]       [ip-removed]:51718    NULL    Sleep   15371           NULL    0.000
      35403   [user3-namechanged] [ip-removed]:52953    NULL    Query   15347   executing       show global status      0.000
      35404   [user2-namechanged]       [ip-removed]:51719    NULL    Sleep   15346           NULL    0.000
      35405   [user2-namechanged]       [ip-removed]:51720    NULL    Sleep   15321           NULL    0.000

      (full processlist attached)

      Output head from running Poor Man's Profiler on mysqld in the "deadlock state":

          855 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,at,at,one_thread_per_connection_end,do_handle_one_connection,handle_one_connection,start_thread,clone,??
            2 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,at,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??
            1 read,vio_read,vio_read_buff,my_real_read,my_net_read,cli_safe_read,handle_slave_io,start_thread,clone,??
            1 pthread_cond_wait@@GLIBC_2.3.2,os_cond_wait,reset_sig_count=5),srv_purge_thread,start_thread,clone,??
            1 pthread_cond_wait@@GLIBC_2.3.2,os_cond_wait,reset_sig_count=43),srv_master_thread,start_thread,clone,??
            1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_monitor_thread,start_thread,clone,??
            1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_LRU_dump_restore_thread,start_thread,clone,??
            1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_lock_timeout_thread,start_thread,clone,??
            1 pthread_cond_timedwait@@GLIBC_2.3.2,os_cond_wait_timed,time_in_usec=<value,srv_error_monitor_thread,start_thread,clone,??
            1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,term_lock=0x7f7a1cb26d48,,terminate_slave_threads,stop_slave,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??
            1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,sleep_time=<value,ma_checkpoint_background,start_thread,clone,??
            1 poll,handle_connections_sockets,mysqld_main,__libc_start_main,_start
            1 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,tables=0x7f8e36dd2338,,get_schema_tables_result,JOIN::exec,mysql_select,handle_select,execute_sqlcom_select,execute_show_status,at,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??
            1 __lll_lock_wait,_L_lock_1172,pthread_mutex_lock,inline_mysql_mutex_lock,tables=0x7f8e36cd5338,,get_schema_tables_result,JOIN::exec,mysql_select,handle_select,execute_sqlcom_select,execute_show_status,at,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone,??

      (full output attached)

        Attachments

        1. hanged_pmp.txt
          55 kB
        2. hanged_processlist_edited.txt
          55 kB
        3. master1.my.cnf
          1 kB
        4. slave1.my.cnf
          1 kB

          Activity

            People

            • Assignee:
              serg Sergei Golubchik
              Reporter:
              maczal Maciej Zalewski
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: