MariaDB Server
MDEV-13667: Connection stuck in Unlocking tables state

Details

    Description

After upgrading MariaDB from 10.1 to 10.2.8.
We are using wsrep with a three-node cluster:
one node for writes and two nodes for reads.

Sometimes I see several connections stuck in the "Unlocking tables" state on the write node.
Even if I kill them, they stay alive.
I don't know when it happens, but I know it can happen on different databases; it is not tied to a specific database or table.
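
For reference, such a stuck session can be observed and a kill attempted like this (a minimal sketch; the thread id 12345 is hypothetical):

-- list sessions stuck in the 'Unlocking tables' state
SELECT ID, USER, DB, TIME, STATE, INFO
  FROM information_schema.PROCESSLIST
 WHERE STATE = 'Unlocking tables';

-- in this report, the session remains in the processlist even after KILL returns
KILL 12345;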

      my.cnf setting(this is same in all nodes)

      #
# This group is read both by the client and the server
      # use it for options that affect everything
      #
      [client-server]
       
      #
      # include all files from the config directory
      #
      !includedir /etc/my.cnf.d
       
      [mysqld]
      skip-name-resolve
      datadir         = /home/mysql/
      tmpdir          = /home/mysqltmp/
      port            = 3306
      socket          = /home/mysql/mysql.sock
       
      # General #
      log_error                      = /home/mysql/node1.err
      default-storage-engine         = innodb
      query_cache_limit              = 0
      query_cache_size               = 0
      query_cache_type               = 0
      bind-address                   = 0.0.0.0
      max-connections                = 500
      max-allowed-packet             = 15M
      table_open_cache               = 60000
      tmp_table_size                 = 512M
       
      #FTS
      innodb_ft_result_cache_limit=256M
       
       
      # Binlog #
      binlog_format                  = ROW
      sync_binlog                    = 0
       
      # Galera #
      wsrep_on=on
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_cluster_address="gcomm://xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx"
      wsrep_cluster_name='galera_cluster'
      wsrep_node_address='xx.xx.xx.xx'
      wsrep_node_name='node1'
      wsrep_provider_options="gcache.size=25G;gcs.fc_limit=4000;gcache.page_size=512M;gcs.fc_master_slave=YES;gcs.fc_factor=1.0;"
      wsrep_sst_method=xtrabackup-v2
      wsrep_sst_auth=xtrabackup:1111111
       
       
      #Replication
      gtid_domain_id=1
      server_id=1
      log_slave_updates=1
      log_bin=binlog
      expire_logs_days=2
       
      # Innodb #
      innodb_file_format             = BARRACUDA
      innodb-flush-method            = O_DIRECT
      innodb_autoinc_lock_mode       = 2
      innodb_log_file_size           = 1G
      innodb_log_buffer_size         = 512M
      innodb_file_per_table          = 1
      innodb_flush_log_at_trx_commit = 0
      innodb_buffer_pool_size        = 100G
      innodb_buffer_pool_instances   = 20
      innodb_write_io_threads        = 16
      innodb_read_io_threads         = 16
      innodb_change_buffering        = all
      transaction-isolation          = READ-COMMITTED
       
      [sst]
      compressor="pigz"
      decompressor="pigz -d"
      

Attachments

  1. 2.png (10 kB)
  2. df.log (0.4 kB)
  3. mariadb.jpg (32 kB)
  4. response.sql (136 kB)
  5. top.log (31 kB)

          Activity

HamoonDBA, your English is good; it is just still not clear how your system recovers from the problem.
I understand that the server remains operational and the KILL command doesn't help, but for how long does the connection remain in 'Unlocking tables'? E.g. does it hang for a few hours after KILL, a few days, weeks, or until you restart the server? (And how long does it usually take before a restart, then?)

anikitin Andrii Nikitin (Inactive) added a comment

Thank you, Andrii.
OK. We have three nodes in our cluster.
When this problem happens, it remains until the server is restarted. The important point is that I can't restart MariaDB gracefully, because MariaDB gets stuck waiting for this query to complete or be killed, so I have to kill the mysqld process and start it again. And because I couldn't restart gracefully, after the restart the node goes through SST.
Two of the nodes (node1 and node2) are stopped now, because I want to solve the problem first and then join the two nodes back into the cluster.
But most recently, on node3, after this problem happened again, three days later the node crashed with this error:

            2017-09-05 20:20:40 0x7faa3381e700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.8/storage/innobase/dict/dict0stats.cc line 1572
            InnoDB: Failing assertion: offsets_rec != NULL
            InnoDB: We intentionally generate a memory trap.
            InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
            InnoDB: If you get repeated assertion failures or crashes, even
            InnoDB: immediately after the mysqld startup, there may be
            InnoDB: corruption in the InnoDB tablespace. Please refer to
            InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
            InnoDB: about forcing recovery.
            170905 20:20:40 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.
             
            Server version: 10.2.8-MariaDB-log
            key_buffer_size=134217728
            read_buffer_size=131072
            max_used_connections=400
            max_threads=502
            thread_count=280
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1234023 K  bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x0
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x0 thread_stack 0x49000
            /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7fbb519aca7e]
            /usr/sbin/mysqld(handle_fatal_signal+0x30d)[0x7fbb513f2dbd]
            /lib64/libpthread.so.0(+0xf370)[0x7fbb5095b370]
            /lib64/libc.so.6(gsignal+0x37)[0x7fbb4eee41d7]
            /lib64/libc.so.6(abort+0x148)[0x7fbb4eee58c8]
            /usr/sbin/mysqld(+0x423578)[0x7fbb511ae578]
            /usr/sbin/mysqld(+0xa76c70)[0x7fbb51801c70]
            /usr/sbin/mysqld(+0xa78e77)[0x7fbb51803e77]
            /usr/sbin/mysqld(+0xa7c69d)[0x7fbb5180769d]
            /usr/sbin/mysqld(+0xa7e733)[0x7fbb51809733]
            /lib64/libpthread.so.0(+0x7dc5)[0x7fbb50953dc5]
            /lib64/libc.so.6(clone+0x6d)[0x7fbb4efa676d]
            The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            information that should help you find out what is causing the crash.
            

And after starting it again, the problem happened again (the 'Unlocking tables' state).

HamoonDBA Hamoon Mohammadian Pour added a comment

HamoonDBA, this looks related to MDEV-13534; it is possible that disabling InnoDB persistent statistics may help to work around the problem. There is a small chance that it may help the problem connection as well.

To disable persistent stats, put the following line into the [mysqld] section of the .cnf file:
innodb_stats_persistent=0

To disable persistent stats for the current server without a restart, execute this SQL command:

SET GLOBAL innodb_stats_persistent=0;

The bug is fixed in 10.2.9, which is not released yet.

anikitin Andrii Nikitin (Inactive) added a comment
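
A minimal sketch for verifying this workaround (the table name example_table is hypothetical; note that an explicit STATS_PERSISTENT table option overrides the global variable):

-- confirm the global setting took effect
SHOW GLOBAL VARIABLES LIKE 'innodb_stats_persistent';

-- a per-table override wins over the global; disable it explicitly if needed
ALTER TABLE example_table STATS_PERSISTENT=0;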

Hi Andrii,
I disabled innodb_stats_persistent online, without a restart. After changing this variable, it seems the stuck queries don't happen anymore, but the already-stuck sessions still persist, so I tried to restart the server; as I said, I had to restart MariaDB by killing the process with kill -9.
I'm testing again to see whether the problem happens again or not.

HamoonDBA Hamoon Mohammadian Pour added a comment

Andrii Nikitin, unfortunately I had to downgrade all of our nodes to MariaDB 10.0.
My boss forced me to downgrade (because we had only one node), and I couldn't test it again to be sure whether the problem is solved or not.

HamoonDBA Hamoon Mohammadian Pour added a comment

People

  Assignee: jplindst Jan Lindström (Inactive)
  Reporter: HamoonDBA Hamoon Mohammadian Pour
  Votes: 3
  Watchers: 7
