Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6751

possible deadlock with ALTER + threadpool + events

Details

    Description

      Am running the following on a slave:

      • Largish (24h, 600M rows, 200G) ALTER TABLE
      • Events with INFORMATION_SCHEMA queries
      • Threadpool pool-of-threads active
      • Replication active
      • No other significant traffic

      After several hours, MariaDB locks up with 0% CPU and disk activity, and no response on existing or new connections on port, extra_port, or socket.

      Attached are gdb backtraces for two occurrences, examples of the ALTER and the INFORMATION_SCHEMA activity, and other info. Would appreciate any insight from devs to identify the deadlock, and to narrow down the variables for a test case that isn't 200G.

      Am presently trialing the ALTER outside the threadpool using the extra_port, with all other settings unchanged.

      Other notes:

      • It doesn't seem to be a thread pool overload, as there aren't enough threads in the backtrace.
      • The INFORMATION_SCHEMA event traffic uses GET_LOCK to serialize some activity and prevent pile-up.

      Attachments

        1. db1062_lockup_gdb.2.log
          172 kB
        2. db1062_lockup_gdb.3.log
          183 kB
        3. db1062_lockup_gdb.log
          162 kB
        4. db1062_lockup.ALTER.txt
          4 kB
        5. db1062_lockup.cmake.txt
          0.6 kB
        6. db1062_lockup.global_vars.txt
          325 kB
        7. db1062_lockup.i_s_query.txt
          0.4 kB
        8. db1062_lockup.processlist.txt
          13 kB

        Activity

          sean Sean Pringle added a comment - - edited

          Tried disabling thread pool, replication, other traffic, all to no effect. The problem recurred. Each time after restart I waited until transactions had completed rollback.

          Observing the threads stuck in buf_, mtr_, and log_ calls I tried reverting to a single buffer pool instance, innodb_buffer_pool_instances=1 (which matches our 5.5 config). That allowed the ALTER to complete normally without a hiccup.

          A flushing lock-cycle of some sort?

          sean Sean Pringle added a comment - - edited Tried disabling thread pool, replication, other traffic, all to no effect. The problem recurred. Each time after restart I waited until transactions had completed rollback. Observing the threads stuck in buf_, mtr_, and log_ calls I tried reverting to a single buffer pool instance, innodb_buffer_pool_instances=1 (which matches our 5.5 config). That allowed the ALTER to complete normally without a hiccup. A flushing lock-cycle of some sort?

          If you can experiment with your instance, could you maybe try setting innodb_buffer_pool_instances to 1?
          We are currently investigating a problem which looks related to multiple buffer pool instances. I am not sure at all that your deadlock is the same issue, but it won't hurt to try.

          elenst Elena Stepanova added a comment - If you can experiment with your instance, could you maybe try setting innodb_buffer_pool_instances to 1? We are currently investigating a problem which looks related to multiple buffer pool instances. I am not sure at all that your deadlock is the same issue, but it won't hurt to try.

          Do you see long semaphore wait warnings/errors on error log ?

          jplindst Jan Lindström (Inactive) added a comment - Do you see long semaphore wait warnings/errors on error log ?
          sean Sean Pringle added a comment -

          @Elena, yes innodb_buffer_pool_instances=1 helped. See comment above.

          @Jan, no long semaphore warnings, and no 600 second abort.

          sean Sean Pringle added a comment - @Elena, yes innodb_buffer_pool_instances=1 helped. See comment above. @Jan, no long semaphore warnings, and no 600 second abort.

          Elena, yes innodb_buffer_pool_instances=1 helped. See comment above.

          Sorry, I somehow missed that part of the comment. Thanks.

          elenst Elena Stepanova added a comment - Elena, yes innodb_buffer_pool_instances=1 helped. See comment above. Sorry, I somehow missed that part of the comment. Thanks.

          People

            jplindst Jan Lindström (Inactive)
            sean Sean Pringle
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.