MariaDB Server / MDEV-33977

MariaDB hangs on --wsrep-recover phase

Details

    • Type: Bug
    • Status: Confirmed
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.6.17, 10.11.7
    • Fix Version/s: 10.6, 10.11
    • Component/s: None
    • Labels: None

    Description

      An otherwise healthy Galera cluster hangs on node restart when the node has been running for a few days.
      The hang happens in the galera_recovery/--wsrep-recover phase.
      If one waits for hours, the node eventually restarts. Restarting it again right away is fast; the hang seems to happen only when the node has been running for days.

      When recovery is run, it produces the file /tmp/wsrep_recovery.xxxxxxx:

      2024-04-10 11:07:20 0 [Note] Starting MariaDB 10.11.7-MariaDB-log source revision 87e13722a95af5d9378d990caf48cb6874439347 as process 539904
      2024-04-10 11:07:20 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2024-04-10 11:07:20 0 [Note] InnoDB: Number of transaction pools: 1
      2024-04-10 11:07:20 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2024-04-10 11:07:20 0 [Note] InnoDB: Using Linux native AIO
      2024-04-10 11:07:20 0 [Note] InnoDB: Initializing buffer pool, total size = 12.000GiB, chunk size = 192.000MiB
      2024-04-10 11:07:20 0 [Note] InnoDB: Completed initialization of buffer pool
      2024-04-10 11:07:20 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
      2024-04-10 11:07:20 0 [Note] InnoDB: End of log at LSN=2145632313933
      2024-04-10 11:07:20 0 [Note] InnoDB: 128 rollback segments are active.
      2024-04-10 11:07:20 0 [Note] InnoDB: Setting file '/mysql/data/ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2024-04-10 11:07:20 0 [Note] InnoDB: File '/mysql/data/ibtmp1' size is now 12.000MiB.
      2024-04-10 11:07:20 0 [Note] InnoDB: log sequence number 2145632313933; transaction id 2789662810
      2024-04-10 11:07:20 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
      2024-04-10 11:07:20 0 [Note] Plugin 'FEEDBACK' is disabled.
      

      At this point one CPU core is at 100% and no progress is seen.
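
A minimal sketch for checking whether a recovery log like the one above ever got past this point. It assumes the line format shown in the excerpt (timestamp, thread id, level in brackets, message) and assumes that a completed recovery logs a line containing "WSREP: Recovered position:"; the function name summarize_recovery_log is hypothetical and only for illustration.

```python
import re
from datetime import datetime

# Line format assumed from the excerpt above:
# "YYYY-MM-DD HH:MM:SS <thread id> [Level] message"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} +\d{1,2}:\d{2}:\d{2}) +\d+ +\[(\w+)\] +(.*)$"
)
# Assumption: a finished recovery logs "WSREP: Recovered position: <uuid>:<seqno>"
RECOVERED = re.compile(r"WSREP: Recovered position:\s*(\S+)")


def summarize_recovery_log(text):
    """Return (recovered position or None, time of last entry, last message)."""
    position = None
    last_time = None
    last_message = None
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed format
        stamp, _level, message = match.groups()
        # Collapse repeated spaces so single-digit hours also parse
        last_time = datetime.strptime(" ".join(stamp.split()), "%Y-%m-%d %H:%M:%S")
        last_message = message
        found = RECOVERED.search(message)
        if found:
            position = found.group(1)
    return position, last_time, last_message
```

Run against the contents of /tmp/wsrep_recovery.xxxxxxx while the process is spinning, a result with no position and a last entry that stays at "Plugin 'FEEDBACK' is disabled." would match the hang described here.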

      The MariaDB log is not involved in this phase, so it is probably not relevant: nothing is logged there, and no file in the datadir is updated.

      The aggregated stack trace is attached, together with a graphed version.

      Relevant config:
      innodb_buffer_pool_size = 120G
      innodb_log_file_size = 5120M
      innodb_log_buffer_size = 500M

      Attachments

        1. CS0733762-106ES-recovery-trace.log.gz (1 kB, Claudio Nanni)
        2. flame-june04-mdb106-recovery-trace.svg (29 kB, Claudio Nanni)
        3. offcputime.out (14 kB, Jeffrey)
        4. recovery-hang-trace.log (13 kB, Claudio Nanni)
        5. recovery-hang-trace.png (32 kB, Claudio Nanni)
        6. strace.out.gz (34 kB, Jeffrey)

            People

              Assignee: Jan Lindström (janlindstrom)
              Reporter: Claudio Nanni (claudio.nanni)
              Votes: 14
              Watchers: 15
