Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9595

Shutdown takes forever with many replication channels

Details

    Description

      Shutting down MariaDB by SIGTERM when it has many (20) replication channels (with parallel replication enabled) seems never-ending. But if I stop all slaves before shutdown, it takes a few seconds. Moreover, mysqld reports errors on I/O threads.

      I'd expect that server stops all slaves itself.

      # sudo systemctl status mariadb -n 64
      ● mariadb.service - MariaDB server
         Loaded: loaded (/nix/store/4hm49xrn8md0swvc3sm9n2qhafwllgc1-unit-mariadb.service/mariadb.service)
         Active: deactivating (stop-sigterm) since Fri 2016-02-19 21:06:29 SGT; 1h 50min ago
        Process: 3189 ExecStartPre=/nix/store/m8hhisn7gag7iycddw8hflila3gi71h3-unit-script/bin/mariadb-pre-start (code=exited, status=0/SUCCESS)
       Main PID: 3640 (mysqld)
         CGroup: /system.slice/mariadb.service
                 └─3640 /nix/store/z7x0ndrckrsmqr3163hp9kgs9hwags61-mariadb-10.1.11/bin/mysqld --defaults-file=/nix/store/8lgvar6h0696ngpsladkvmqwbd5k6x5v-mysqld.cnf
       
      Feb 19 20:04:23 grumpycat mysqld[3640]: 2016-02-19 20:04:23 139653215652608 [Note] Master 'oms-my': Slave SQL thread initialized, starting replication in l...n: 3761334
      Feb 19 21:06:29 grumpycat systemd[1]: Stopping MariaDB server...
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652892388096 [Note] /nix/store/z7x0ndrckrsmqr3163hp9kgs9hwags61-mariadb-10.1.11/bin/mysqld: ...l shutdown
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652892388096 [Note] Event Scheduler: Purging the queue. 0 events
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139653540129536 [Note] Master 'bob-th': Error reading relay log event: slave SQL thread was killed
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652246162176 [ERROR] Master 'bob-th': Error reading packet from server: Lost connection to M...rrno=2013)
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652246162176 [Note] Master 'bob-th': Slave I/O thread exiting, read up to log 'mysql-bin-cha...on 3256007
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139653539826432 [Note] Master 'zops': Error reading relay log event: slave SQL thread was killed
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652894509824 [ERROR] Master 'zops': Error reading packet from server: Lost connection to MyS...rrno=2013)
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652894509824 [Note] Master 'zops': Slave I/O thread exiting, read up to log 'mysql-bin-chang...sition 422
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139653999532800 [Note] Master 'bob-my': Error reading relay log event: slave SQL thread was killed
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652570942208 [ERROR] Master 'bob-my': Error reading packet from server: Lost connection to M...rrno=2013)
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652570942208 [Note] Master 'bob-my': Slave I/O thread exiting, read up to log 'mysql-bin-cha...on 3947621
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139653538310912 [Note] Master 'bob-hk': Error reading relay log event: slave SQL thread was killed
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652570336000 [ERROR] Master 'bob-hk': Error reading packet from server: Lost connection to M...rrno=2013)
      Feb 19 21:06:29 grumpycat mysqld[3640]: 2016-02-19 21:06:29 139652570336000 [Note] Master 'bob-hk': Slave I/O thread exiting, read up to log 'mysql-bin-cha...on 1303108
      Feb 19 21:06:30 grumpycat mysqld[3640]: 2016-02-19 21:06:30 139653217471232 [Note] Master 'bob-ph': Error reading relay log event: slave SQL thread was killed
      Feb 19 21:06:30 grumpycat mysqld[3640]: 2016-02-19 21:06:30 139652892691200 [ERROR] Master 'bob-ph': Error reading packet from server: Lost connection to M...rrno=2013)
      Feb 19 21:06:30 grumpycat mysqld[3640]: 2016-02-19 21:06:30 139652892691200 [Note] Master 'bob-ph': Slave I/O thread exiting, read up to log 'mysql-bin-cha...on 8157090
      ...

      Server has 30G of RAM, and 4 CPUs. iotop shows ~100K/s read and write for mysqld process

      [mysqld]
      basedir = /nix/store/z7x0ndrckrsmqr3163hp9kgs9hwags61-mariadb-10.1.11
      init_file = /nix/store/71malbvn6kkn29yiwrsalnpirf2b6v62-init
      pid_file = /run/mysqld/mysqld.pid
      plugin_load = unix_socket=auth_socket.so
      datadir = /mariadb/db
      ignore_db_dirs = lost+found
      innodb_buffer_pool_instances = 64
      innodb_buffer_pool_size = 19327352832
      innodb_file_format = barracuda
      innodb_file_per_table = ON
      innodb_flush_log_at_trx_commit = 2
      innodb_flush_method = O_DIRECT
      innodb_lock_wait_timeout = 1800
      innodb_log_file_size = 314572800
      join_buffer_size = 1048576
      log_slave_updates = OFF
      max_allowed_packet = 268435456
      max_connections = 1000
      net_read_timeout = 1000
      net_write_timeout = 1000
      port = 3306
      relay_log = /mariadb/relay/cat-bin
      server_id = 112233
      skip_log_bin
      slave_compressed_protocol = ON
      slave_domain_parallel_threads = 10
      slave_net_timeout = 600
      slave_parallel_max_queued = 8388608
      slave_parallel_threads = 40
      sort_buffer_size = 4194304
      ssl_cert = /nix/store/rmlq71y8fnyx1k7sp3qx55fqwg45pl40-cats-cert.pem
      ssl_key = /run/keys/cats-key.pem
      table_open_cache = 30000
      ; replicate_*
      !include /nix/store/czaiav63ddqrz18sizhfy69jvv04nfcn-mysqld-repl.cnf

      Attachments

        Activity

          ip1981 Igor Pashev created issue -
          elenst Elena Stepanova made changes -
          Field Original Value New Value
          Component/s Replication [ 10100 ]
          Fix Version/s 10.1 [ 16100 ]
          Assignee Kristian Nielsen [ knielsen ]

          Do you have a suggestion for reproducing this? Setting up 20-way
          multi-source replication is not trivial, and it's not clear what "seems
          never-ending" means, exactly?

          The error messages look like just the normal messages when a slave is
          stopped, or did I miss something?

          I can imagine that shutdown with such a setup could take a while. What
          happens is that each of the 20 master-slave connections can have 8MB (your
          value of slave_parallel_max_queued) of events queued up. At stop, a
          conservative safe point to stop is selected in each of these queues, so
          there might be a bunch of events still need to be executed before stop can
          complete. And with slave_parallel_threads=40 and
          slave_domain_parallel_threads=10, most of the 20 master-slave might have to
          wait for the others to release their threads.

          Though this is assuming everything is busy and catching up to their masters
          ... and "seems never-ending" sounds more that what would be expected.

          I checked the code, and I did not immediately spot that parallel replication
          stop would act differently for STOP ALL SLAVES and SIGTERM. The stop of
          parallel replication threads happens the same way after the SQL thread is
          stopped. But it is definitely possible that there could be a bug or race in
          slave stop, it is just not clear what it could be from the information given
          here...

          Maybe you could do a detailed check of the error log to see if all SQL and
          IO threads stopped? I mean, if each of your 20 master-slave connections say
          something like "slave SQL thread was killed" and "Slave I/O thread exiting".
          It would help understand if the problem is related to stopping the IO/SQL
          threads, or if it is related to parallel replication completing its queue of
          events...

          knielsen Kristian Nielsen added a comment - Do you have a suggestion for reproducing this? Setting up 20-way multi-source replication is not trivial, and it's not clear what "seems never-ending" means, exactly? The error messages look like just the normal messages when a slave is stopped, or did I miss something? I can imagine that shutdown with such a setup could take a while. What happens is that each of the 20 master-slave connections can have 8MB (your value of slave_parallel_max_queued) of events queued up. At stop, a conservative safe point to stop is selected in each of these queues, so there might be a bunch of events still need to be executed before stop can complete. And with slave_parallel_threads=40 and slave_domain_parallel_threads=10, most of the 20 master-slave might have to wait for the others to release their threads. Though this is assuming everything is busy and catching up to their masters ... and "seems never-ending" sounds more that what would be expected. I checked the code, and I did not immediately spot that parallel replication stop would act differently for STOP ALL SLAVES and SIGTERM. The stop of parallel replication threads happens the same way after the SQL thread is stopped. But it is definitely possible that there could be a bug or race in slave stop, it is just not clear what it could be from the information given here... Maybe you could do a detailed check of the error log to see if all SQL and IO threads stopped? I mean, if each of your 20 master-slave connections say something like "slave SQL thread was killed" and "Slave I/O thread exiting". It would help understand if the problem is related to stopping the IO/SQL threads, or if it is related to parallel replication completing its queue of events...
          ip1981 Igor Pashev added a comment -

          I waited 24 hours.

          I can reproduce it:

          # sudo systemctl status mariadb -l -n 60 | grep 'Slave I/O thread exiting' | wc -l
          20
          # sudo systemctl status mariadb -l -n 60 | grep 'Error reading packet from server' | wc -l
          14
          # sudo systemctl status mariadb -l -n 60 | grep 'slave SQL thread was killed' | wc -l
          20
           
          # sudo systemctl status mariadb -l -n 60
          ● mariadb.service - MariaDB server
             Loaded: loaded (/nix/store/zx1q5mdi1jfxf2gkyc0kf2xajyjqmf2l-unit-mariadb.service/mariadb.service)
             Active: deactivating (stop-sigterm) since Fri 2016-02-26 18:52:37 SGT; 5min ago
           Main PID: 15106 (mysqld)
             CGroup: /system.slice/mariadb.service
                     └─15106 /nix/store/z7x0ndrckrsmqr3163hp9kgs9hwags61-mariadb-10.1.11/bin/mysqld --defaults-file=/nix/store/8xbdbp9646d0hkld1kjjkfzvlicwill4-mysqld.cnf
           
          Feb 26 13:21:43 grumpycat mysqld[15106]: 2016-02-26 13:21:43 140357032143616 [Note] Master 'bob-au': InnoDB: Online DDL : Start applying row log
          Feb 26 13:21:43 grumpycat mysqld[15106]: 2016-02-26 13:21:43 140357032143616 [Note] Master 'bob-au': InnoDB: Online DDL : End of applying row log
          Feb 26 13:21:43 grumpycat mysqld[15106]: 2016-02-26 13:21:43 140357032143616 [Note] Master 'bob-au': InnoDB: Online DDL : Completed
          Feb 26 18:52:37 grumpycat systemd[1]: Stopping MariaDB server...
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355741813504 [Note] /nix/store/z7x0ndrckrsmqr3163hp9kgs9hwags61-mariadb-10.1.11/bin/mysqld: Normal shutdown
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355741813504 [Note] Event Scheduler: Purging the queue. 0 events
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357032446720 [Note] Master 'bob-au': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.140993', position 2514261
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356062956288 [Note] Master 'sc': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.055135', position 3899105
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064471808 [Note] Master 'bob-th': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064774912 [ERROR] Master 'bob-th': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064774912 [Note] Master 'bob-th': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029169', position 4014853
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357032749824 [Note] Master 'zops': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356388342528 [Note] Master 'bob-vn': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.031459', position 27519726
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357033052928 [ERROR] Master 'zops': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357033052928 [Note] Master 'zops': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.191525', position 144203
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387736320 [Note] Master 'bob-id': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029404', position 11517216
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709182208 [Note] Master 'bob-my': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709485312 [ERROR] Master 'bob-my': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709485312 [Note] Master 'bob-my': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.027500', position 8437840
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356388645632 [Note] Master 'bob-hk': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356708879104 [ERROR] Master 'bob-hk': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356708879104 [Note] Master 'bob-hk': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.032108', position 19164112
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711000832 [Note] Master 'bob-ph': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711303936 [ERROR] Master 'bob-ph': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711303936 [Note] Master 'bob-ph': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029561', position 37345365
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742722816 [Note] Master 'sc': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357032143616 [Note] Master 'bob-au': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709788416 [Note] Master 'bob-sg': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710091520 [ERROR] Master 'bob-sg': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710091520 [Note] Master 'bob-sg': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029613', position 25842793
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711607040 [Note] Master 'bob-tw': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357031840512 [ERROR] Master 'bob-tw': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357031840512 [Note] Master 'bob-tw': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.032533', position 1015256
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356388039424 [Note] Master 'bob-vn': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387433216 [Note] Master 'bob-id': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357914893056 [Note] Master 'oms-my': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.027098', position 3658520
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386827008 [Note] Master 'oms-id': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387130112 [ERROR] Master 'oms-id': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387130112 [Note] Master 'oms-id': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.028565', position 1862130
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356063259392 [Note] Master 'delivery': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.036789', position 549027
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386220800 [Note] Master 'oms-hk': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386523904 [ERROR] Master 'oms-hk': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386523904 [Note] Master 'oms-hk': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.027210', position 658686
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742116608 [Note] Master 'oms-tw': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742419712 [ERROR] Master 'oms-tw': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742419712 [Note] Master 'oms-tw': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.003260', position 15332
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710394624 [Note] Master 'oms-ph': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710697728 [ERROR] Master 'oms-ph': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710697728 [Note] Master 'oms-ph': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029009', position 2157019
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065684224 [Note] Master 'oms-sg': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356385917696 [ERROR] Master 'oms-sg': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356385917696 [Note] Master 'oms-sg': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029242', position 2189116
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065078016 [Note] Master 'oms-vn': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065381120 [ERROR] Master 'oms-vn': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065381120 [Note] Master 'oms-vn': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.031309', position 1933682
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356063562496 [Note] Master 'delivery': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356063865600 [Note] Master 'oms-th': Error reading relay log event: slave SQL thread was killed
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064168704 [ERROR] Master 'oms-th': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064168704 [Note] Master 'oms-th': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029404', position 380272
          Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357033356032 [Note] Master 'oms-my': Error reading relay log event: slave SQL thread was killed

          ip1981 Igor Pashev added a comment - I waited 24 hours. I can reproduce it: # sudo systemctl status mariadb -l -n 60 | grep 'Slave I/O thread exiting' | wc -l 20 # sudo systemctl status mariadb -l -n 60 | grep 'Error reading packet from server' | wc -l 14 # sudo systemctl status mariadb -l -n 60 | grep 'slave SQL thread was killed' | wc -l 20   # sudo systemctl status mariadb -l -n 60 ● mariadb.service - MariaDB server Loaded: loaded (/nix/store/zx1q5mdi1jfxf2gkyc0kf2xajyjqmf2l-unit-mariadb.service/mariadb.service) Active: deactivating (stop-sigterm) since Fri 2016-02-26 18:52:37 SGT; 5min ago Main PID: 15106 (mysqld) CGroup: /system.slice/mariadb.service └─15106 /nix/store/z7x0ndrckrsmqr3163hp9kgs9hwags61-mariadb-10.1.11/bin/mysqld --defaults-file=/nix/store/8xbdbp9646d0hkld1kjjkfzvlicwill4-mysqld.cnf   Feb 26 13:21:43 grumpycat mysqld[15106]: 2016-02-26 13:21:43 140357032143616 [Note] Master 'bob-au': InnoDB: Online DDL : Start applying row log Feb 26 13:21:43 grumpycat mysqld[15106]: 2016-02-26 13:21:43 140357032143616 [Note] Master 'bob-au': InnoDB: Online DDL : End of applying row log Feb 26 13:21:43 grumpycat mysqld[15106]: 2016-02-26 13:21:43 140357032143616 [Note] Master 'bob-au': InnoDB: Online DDL : Completed Feb 26 18:52:37 grumpycat systemd[1]: Stopping MariaDB server... Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355741813504 [Note] /nix/store/z7x0ndrckrsmqr3163hp9kgs9hwags61-mariadb-10.1.11/bin/mysqld: Normal shutdown Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355741813504 [Note] Event Scheduler: Purging the queue. 0 events Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357032446720 [Note] Master 'bob-au': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.140993', position 2514261 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356062956288 [Note] Master 'sc': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.055135', position 3899105 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064471808 [Note] Master 'bob-th': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064774912 [ERROR] Master 'bob-th': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064774912 [Note] Master 'bob-th': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029169', position 4014853 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357032749824 [Note] Master 'zops': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356388342528 [Note] Master 'bob-vn': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.031459', position 27519726 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357033052928 [ERROR] Master 'zops': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357033052928 [Note] Master 'zops': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.191525', position 144203 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387736320 [Note] Master 'bob-id': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029404', position 11517216 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709182208 [Note] Master 'bob-my': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709485312 [ERROR] Master 'bob-my': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709485312 [Note] Master 'bob-my': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.027500', position 8437840 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356388645632 [Note] Master 'bob-hk': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356708879104 [ERROR] Master 'bob-hk': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356708879104 [Note] Master 'bob-hk': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.032108', position 19164112 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711000832 [Note] Master 'bob-ph': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711303936 [ERROR] Master 'bob-ph': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711303936 [Note] Master 'bob-ph': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029561', position 37345365 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742722816 [Note] Master 'sc': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357032143616 [Note] Master 'bob-au': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356709788416 [Note] Master 'bob-sg': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710091520 [ERROR] Master 'bob-sg': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710091520 [Note] Master 'bob-sg': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029613', position 25842793 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356711607040 [Note] Master 'bob-tw': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357031840512 [ERROR] Master 'bob-tw': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357031840512 [Note] Master 'bob-tw': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.032533', position 1015256 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356388039424 [Note] Master 'bob-vn': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387433216 [Note] Master 'bob-id': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357914893056 [Note] Master 'oms-my': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.027098', position 3658520 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386827008 [Note] Master 'oms-id': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387130112 [ERROR] Master 'oms-id': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356387130112 [Note] Master 'oms-id': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.028565', position 1862130 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356063259392 [Note] Master 'delivery': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.036789', position 549027 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386220800 [Note] Master 'oms-hk': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386523904 [ERROR] Master 'oms-hk': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356386523904 [Note] Master 'oms-hk': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.027210', position 658686 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742116608 [Note] Master 'oms-tw': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742419712 [ERROR] Master 'oms-tw': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140355742419712 [Note] Master 'oms-tw': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.003260', position 15332 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710394624 [Note] Master 'oms-ph': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710697728 [ERROR] Master 'oms-ph': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356710697728 [Note] Master 'oms-ph': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029009', position 2157019 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065684224 [Note] Master 'oms-sg': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356385917696 [ERROR] Master 'oms-sg': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356385917696 [Note] Master 'oms-sg': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029242', position 2189116 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065078016 [Note] Master 'oms-vn': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065381120 [ERROR] Master 'oms-vn': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356065381120 [Note] Master 'oms-vn': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.031309', position 1933682 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356063562496 [Note] Master 'delivery': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356063865600 [Note] Master 'oms-th': Error reading relay log event: slave SQL thread was killed Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064168704 [ERROR] Master 'oms-th': Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140356064168704 [Note] Master 'oms-th': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.029404', position 380272 Feb 26 18:52:37 grumpycat mysqld[15106]: 2016-02-26 18:52:37 140357033356032 [Note] Master 'oms-my': Error reading relay log event: slave SQL thread was killed
          ip1981 Igor Pashev added a comment -

          I tried with slave_domain_parallel_threads = 2 - no difference.

          ip1981 Igor Pashev added a comment - I tried with slave_domain_parallel_threads = 2 - no difference.
          knielsen Kristian Nielsen made changes -
          Attachment mdev9595.patch [ 41910 ]
          ip1981 Igor Pashev added a comment -

          Let me try it out

          ip1981 Igor Pashev added a comment - Let me try it out
          knielsen Kristian Nielsen made changes -
          Attachment mdev9595.patch [ 41910 ]
          knielsen Kristian Nielsen made changes -
          Attachment mdev9595.patch [ 41911 ]

          Ok, I found something in the code. A race during shutdown, which would seem
          likely to trigger at SIGTERM shutdown with many master connections running.
          It seems likely (though not certain) that this race could be what causes the
          hang that you are seeing.

          Ok, so you were quick to notice the patch Yes, it would be great if you
          can try the attached patch (which is just the below one-liner) and see if
          it fixes the hang for you.

          diff --git a/sql/slave.cc b/sql/slave.cc
          index 72bed45..e9bcfb0 100644
          --- a/sql/slave.cc
          +++ b/sql/slave.cc
          @@ -4955,6 +4955,7 @@ pthread_handler_t handle_slave_sql(void *arg)
             */
             mysql_mutex_lock(&LOCK_active_mi);
             if (opt_slave_parallel_threads > 0 &&
          +      master_info_index &&// master_info_index is set to NULL on server shutdown
                 !master_info_index->any_slave_sql_running())
               rpl_parallel_inactivate_pool(&global_rpl_thread_pool);
             mysql_mutex_unlock(&LOCK_active_mi);
          

          knielsen Kristian Nielsen added a comment - Ok, I found something in the code. A race during shutdown, which would seem likely to trigger at SIGTERM shutdown with many master connections running. It seems likely (though not certain) that this race could be what causes the hang that you are seeing. Ok, so you were quick to notice the patch Yes, it would be great if you can try the attached patch (which is just the below one-liner) and see if it fixes the hang for you. diff --git a/sql/slave.cc b/sql/slave.cc index 72bed45..e9bcfb0 100644 --- a/sql/slave.cc +++ b/sql/slave.cc @@ -4955,6 +4955,7 @@ pthread_handler_t handle_slave_sql(void *arg) */ mysql_mutex_lock(&LOCK_active_mi); if (opt_slave_parallel_threads > 0 && + master_info_index &&// master_info_index is set to NULL on server shutdown !master_info_index->any_slave_sql_running()) rpl_parallel_inactivate_pool(&global_rpl_thread_pool); mysql_mutex_unlock(&LOCK_active_mi);
          ip1981 Igor Pashev added a comment - - edited

          Guys, Kristian, you are the best

          It takes 40 secs to cleanly shutdown or restart.

          To be clear: the patch helped.

          ip1981 Igor Pashev added a comment - - edited Guys, Kristian, you are the best It takes 40 secs to cleanly shutdown or restart. To be clear: the patch helped.
          ip1981 Igor Pashev added a comment -

          Mar 03 16:12:49 grumpycat mysqld[7229]: 2016-03-03 16:12:49 140095975377664 [ERROR] Master 'oms-th': Error reading packet from server: Lost connection to MySQL server during ..._errno=2013)
          Mar 03 16:12:49 grumpycat mysqld[7229]: 2016-03-03 16:12:49 140095975377664 [Note] Master 'oms-th': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.031106', position 3356802
          Mar 03 16:12:49 grumpycat mysqld[7229]: 2016-03-03 16:12:49 140097589578496 [Note] Master 'oms-my': Error reading relay log event: slave SQL thread was killed
          Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140076247906048 [Note] InnoDB: FTS optimize thread exiting.
          Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140097266313984 [Note] InnoDB: Starting shutdown...
          Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140076264691456 [Warning] InnoDB: Dumping buffer pool(s) to .//ib_buffer_pool
          Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140076264691456 [Warning] InnoDB: Buffer pool(s) dump completed at 160303 16:13:01
          Mar 03 16:13:29 grumpycat mysqld[7229]: 2016-03-03 16:13:29 140097266313984 [Note] InnoDB: Shutdown completed; log sequence number 6971445438743
          Mar 03 16:13:29 grumpycat mysqld[7229]: 2016-03-03 16:13:29 140097266313984 [Note] /nix/store/c4xfa0cyszgppf4079gwsz484vj2aqnd-mariadb-10.1.11/bin/mysqld: Shutdown complete
          
          

          ip1981 Igor Pashev added a comment - Mar 03 16:12:49 grumpycat mysqld[7229]: 2016-03-03 16:12:49 140095975377664 [ERROR] Master 'oms-th': Error reading packet from server: Lost connection to MySQL server during ..._errno=2013) Mar 03 16:12:49 grumpycat mysqld[7229]: 2016-03-03 16:12:49 140095975377664 [Note] Master 'oms-th': Slave I/O thread exiting, read up to log 'mysql-bin-changelog.031106', position 3356802 Mar 03 16:12:49 grumpycat mysqld[7229]: 2016-03-03 16:12:49 140097589578496 [Note] Master 'oms-my': Error reading relay log event: slave SQL thread was killed Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140076247906048 [Note] InnoDB: FTS optimize thread exiting. Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140097266313984 [Note] InnoDB: Starting shutdown... Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140076264691456 [Warning] InnoDB: Dumping buffer pool(s) to .//ib_buffer_pool Mar 03 16:13:01 grumpycat mysqld[7229]: 2016-03-03 16:13:01 140076264691456 [Warning] InnoDB: Buffer pool(s) dump completed at 160303 16:13:01 Mar 03 16:13:29 grumpycat mysqld[7229]: 2016-03-03 16:13:29 140097266313984 [Note] InnoDB: Shutdown completed; log sequence number 6971445438743 Mar 03 16:13:29 grumpycat mysqld[7229]: 2016-03-03 16:13:29 140097266313984 [Note] /nix/store/c4xfa0cyszgppf4079gwsz484vj2aqnd-mariadb-10.1.11/bin/mysqld: Shutdown complete

          Excellent, I've pushed the patch into 10.0.

          Thanks, Igor, for reporting this bug, and for quickly testing the fix!

          knielsen Kristian Nielsen added a comment - Excellent, I've pushed the patch into 10.0. Thanks, Igor, for reporting this bug, and for quickly testing the fix!
          knielsen Kristian Nielsen made changes -
          Fix Version/s 10.0.25 [ 21701 ]
          Fix Version/s 10.1.13 [ 21803 ]
          Fix Version/s 10.1 [ 16100 ]
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 74029 ] MariaDB v4 [ 150129 ]

          People

            knielsen Kristian Nielsen
            ip1981 Igor Pashev
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.