MDEV-33977

MariaDB hangs on --wsrep-recover phase

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 10.6.17, 10.11.7
    • Fix Version/s: N/A
    • Component/s: Galera
    • Labels: None

    Description

      On an otherwise healthy Galera cluster, a node hangs on restart when it has been running for a few days.
      The hang happens in the galera_recovery/--wsrep-recover phase.
      If one waits for hours until the node eventually restarts, restarting it again right away is fast; the hang seems to occur only after the node has been running for days.

      When recovery is run, it produces the file /tmp/wsrep_recovery.xxxxxxx:

      2024-04-10 11:07:20 0 [Note] Starting MariaDB 10.11.7-MariaDB-log source revision 87e13722a95af5d9378d990caf48cb6874439347 as process 539904
      2024-04-10 11:07:20 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2024-04-10 11:07:20 0 [Note] InnoDB: Number of transaction pools: 1
      2024-04-10 11:07:20 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2024-04-10 11:07:20 0 [Note] InnoDB: Using Linux native AIO
      2024-04-10 11:07:20 0 [Note] InnoDB: Initializing buffer pool, total size = 12.000GiB, chunk size = 192.000MiB
      2024-04-10 11:07:20 0 [Note] InnoDB: Completed initialization of buffer pool
      2024-04-10 11:07:20 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
      2024-04-10 11:07:20 0 [Note] InnoDB: End of log at LSN=2145632313933
      2024-04-10 11:07:20 0 [Note] InnoDB: 128 rollback segments are active.
      2024-04-10 11:07:20 0 [Note] InnoDB: Setting file '/mysql/data/ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2024-04-10 11:07:20 0 [Note] InnoDB: File '/mysql/data/ibtmp1' size is now 12.000MiB.
      2024-04-10 11:07:20 0 [Note] InnoDB: log sequence number 2145632313933; transaction id 2789662810
      2024-04-10 11:07:20 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
      2024-04-10 11:07:20 0 [Note] Plugin 'FEEDBACK' is disabled.
      

      At this point one CPU core is at 100% and no progress is seen.

      The MariaDB error log is not involved in this phase, so it is probably not relevant: nothing is logged there, and no file in the datadir is updated.

      Both the aggregated stack trace and a graphed (flame graph) version are attached.

      Relevant config:
      innodb_buffer_pool_size = 120G
      innodb_log_file_size = 5120M
      innodb_log_buffer_size = 500M

      Attachments

        Issue Links

          Activity

            sysprg Julius Goryavsky added a comment - - edited

            There's also a possibility that it's related to MDEV-34134.

            bnestere Brandon Nesterenko added a comment - - edited

            It looks like this is caused by MDEV-33799 (fixed in 10.11.8), but presents as a hang instead of a segfault.

            bnestere Brandon Nesterenko added a comment - - edited

            With more investigation (I didn't see the flame graph earlier), I think I see what is going on. When MDEV-31413 added the replication of mysql.gtid_slave_pos, it seems no cleanup of records was added (on a regular slave, the records are automatically cleaned up once they are no longer needed). This is problematic because, at recovery time, the entire mysql.gtid_slave_pos table is read into memory (during load_gtid_state_cb), with an additional malloc call for each row during mysql_manager_submit (which builds a linked list with an individual malloc per row of mysql.gtid_slave_pos). On a regular slave this is not a problem, as gtid_slave_pos won't grow too big. However, on a node which replicates gtid_slave_pos, I imagine this can take up a significant amount of memory. On my laptop, I can generate 12MB worth of records in just 5 minutes (in the size of mysql.gtid_slave_pos alone, i.e. not counting the additional metadata/linked list).

            So I would expect that during the hang the server runs out of resources, leading to an effective halt of the system. To confirm: do we have any memory profiling information from the hang, e.g. memory/swap usage?

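            For reference, a minimal sketch of how the growth described above could be checked on an affected node before a restart, assuming the standard mysql.gtid_slave_pos table and information_schema views (the row count and on-disk size give a rough idea of how much state --wsrep-recover will have to load into memory):

            -- Number of accumulated GTID position rows.
            SELECT COUNT(*) AS gtid_rows FROM mysql.gtid_slave_pos;

            -- Approximate on-disk footprint of the table (and any engine-specific
            -- variants created via gtid_pos_auto_engines).
            SELECT table_name,
                   ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
              FROM information_schema.tables
             WHERE table_schema = 'mysql'
               AND table_name LIKE 'gtid_slave_pos%';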
            ryudidoo Anirudh Menon added a comment - - edited

            We are currently blocked from going to production because of this bug. It has been assigned to the original reporter for feedback. What kind of feedback is needed? Are there tests that need to be run, or additional details that need to be provided? We can chip in; anything that helps move this issue forward would be appreciated.


            bnestere Brandon Nesterenko added a comment -

            Thanks ryudidoo for the report. Moving this out of "needs feedback", as Richard has reproduced the issue and confirmed my theory (and filed MDEV-34924 accordingly). Assigning to sysprg of the Galera team for next steps.
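            Until the missing cleanup (apparently tracked in MDEV-34924) is addressed, one possible stop-gap, sketched here as an assumption rather than an official recommendation, is to manually trim superseded rows from mysql.gtid_slave_pos before restarting a node, mimicking the cleanup a regular slave performs by keeping only the newest sub_id per replication domain. Test this carefully on a non-production node first:

            -- Delete rows that have been superseded by a newer position in the
            -- same replication domain; assumes the standard
            -- (domain_id, sub_id, server_id, seq_no) layout of mysql.gtid_slave_pos.
            DELETE t
              FROM mysql.gtid_slave_pos AS t
              JOIN (SELECT domain_id, MAX(sub_id) AS max_sub_id
                      FROM mysql.gtid_slave_pos
                     GROUP BY domain_id) AS newest
                ON t.domain_id = newest.domain_id
             WHERE t.sub_id < newest.max_sub_id;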

            People

              Assignee: janlindstrom Jan Lindström
              Reporter: claudio.nanni Claudio Nanni
              Votes: 14
              Watchers: 15

