Details
Type: Bug
Status: Confirmed
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.6.17, 10.11.7
None
None
Description
In an otherwise healthy Galera cluster, a node hangs on restart if it has been running for a few days.
The hang happens in the galera_recovery/--wsrep-recover phase.
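For context, galera_recovery runs mysqld with --wsrep-recover, sends the server output to a temporary /tmp/wsrep_recovery.* file, and then looks for a line containing "WSREP: Recovered position" to obtain the start position. The Python sketch below mimics only that extraction step; the function name and the sample UUID/seqno are made up for illustration. A run that hangs, as described here, never writes that line, so the sketch returns None for it.

```python
import re
from typing import Optional

# Matches lines such as:
#   ... [Note] WSREP: Recovered position: <cluster-uuid>:<seqno>
# (format assumed from the string galera_recovery searches for)
RECOVERED_RE = re.compile(r"WSREP: Recovered position:\s*(\S+)")


def recovered_position(log_text: str) -> Optional[str]:
    """Return the last 'uuid:seqno' found in a wsrep recovery log,
    or None if recovery never got that far."""
    position = None
    for line in log_text.splitlines():
        match = RECOVERED_RE.search(line)
        if match:
            position = match.group(1)
    return position


if __name__ == "__main__":
    # Hypothetical samples: a completed recovery and one that stalled
    # after the last line seen in this report.
    completed = ("2024-04-10 11:07:20 0 [Note] WSREP: Recovered position: "
                 "00000000-1111-2222-3333-444444444444:12345\n")
    stalled = "2024-04-10 11:07:20 0 [Note] Plugin 'FEEDBACK' is disabled.\n"
    print(recovered_position(completed))  # 00000000-1111-2222-3333-444444444444:12345
    print(recovered_position(stalled))    # None
```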
If one waits for hours until the node eventually starts, restarting it again right away is fast; the hang seems to occur only when the node has been running for days.
When recovery runs, it writes its output to the file /tmp/wsrep_recovery.xxxxxxx:
2024-04-10 11:07:20 0 [Note] Starting MariaDB 10.11.7-MariaDB-log source revision 87e13722a95af5d9378d990caf48cb6874439347 as process 539904
2024-04-10 11:07:20 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2024-04-10 11:07:20 0 [Note] InnoDB: Number of transaction pools: 1
2024-04-10 11:07:20 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2024-04-10 11:07:20 0 [Note] InnoDB: Using Linux native AIO
2024-04-10 11:07:20 0 [Note] InnoDB: Initializing buffer pool, total size = 12.000GiB, chunk size = 192.000MiB
2024-04-10 11:07:20 0 [Note] InnoDB: Completed initialization of buffer pool
2024-04-10 11:07:20 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2024-04-10 11:07:20 0 [Note] InnoDB: End of log at LSN=2145632313933
2024-04-10 11:07:20 0 [Note] InnoDB: 128 rollback segments are active.
2024-04-10 11:07:20 0 [Note] InnoDB: Setting file '/mysql/data/ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2024-04-10 11:07:20 0 [Note] InnoDB: File '/mysql/data/ibtmp1' size is now 12.000MiB.
2024-04-10 11:07:20 0 [Note] InnoDB: log sequence number 2145632313933; transaction id 2789662810
2024-04-10 11:07:20 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
2024-04-10 11:07:20 0 [Note] Plugin 'FEEDBACK' is disabled.
At this point one CPU core is at 100% and no progress is seen.
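One way to confirm from the outside that the recovering mysqld is spinning rather than blocked is to sample its CPU time twice from /proc/&lt;pid&gt;/stat on Linux. This is a generic Linux technique, not something the report describes, and the function names below are hypothetical. The value is summed over all threads of the process, so a result near 100 means roughly one core's worth of CPU.

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may contain spaces,
    so the remaining fields are split only after the last ')'.
    """
    fields = stat_line[stat_line.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(pid: int, interval: float = 5.0) -> float:
    """Sample a process twice and return its CPU usage as a percentage
    of one core, summed over all of its threads."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    return 100.0 * (after - before) / ticks_per_sec / interval
```

For the run shown above, the argument would be the process ID printed in the first log line (539904), while that process is still hanging.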
The MariaDB error log is not written during this phase, so it is probably not relevant: nothing is logged there, and no file in the datadir is updated either.
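The observation that no file in the datadir is updated can be checked by finding the most recent modification time under the data directory. The sketch below is an illustration with hypothetical function names; it relies only on file mtimes, so it detects writes but not reads, and it assumes the datadir is /mysql/data, as the ibtmp1 path in the log suggests.

```python
import os
import time
from typing import Optional, Tuple


def newest_file(datadir: str) -> Optional[Tuple[str, float]]:
    """Walk datadir and return (path, mtime) of the most recently modified
    file, or None if the directory tree contains no files."""
    newest = None
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest


def seconds_since_last_write(datadir: str) -> Optional[float]:
    """Return how many seconds ago anything under datadir was modified."""
    found = newest_file(datadir)
    return None if found is None else time.time() - found[1]
```

Calling seconds_since_last_write("/mysql/data") repeatedly during the hang should show a steadily growing value if, as reported, nothing in the datadir is being written.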
The aggregated stack trace is attached, along with a graphed version of it.
Relevant config:
innodb_buffer_pool_size = 120G
innodb_log_file_size = 5120M
innodb_log_buffer_size = 500M
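The recovery log reports a buffer pool "total size = 12.000GiB, chunk size = 192.000MiB", while the configuration above lists innodb_buffer_pool_size = 120G; the report does not say why the two differ. The sketch below converts the my.cnf values to bytes (MariaDB size suffixes are powers of 1024) and works out the number of 192 MiB chunks for each figure. The helper name is hypothetical.

```python
import re

# MariaDB size suffixes are powers of 1024.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Convert a my.cnf size such as '120G' or '5120M' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)\s*", value, re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2).upper()]


config = {
    "innodb_buffer_pool_size": "120G",
    "innodb_log_file_size": "5120M",
    "innodb_log_buffer_size": "500M",
}
GIB = 1024**3
for name, raw in config.items():
    print(f"{name} = {raw} -> {parse_size(raw) / GIB:.3f} GiB")

# Chunk size as reported in the recovery log (192 MiB).
chunk = parse_size("192M")
print(parse_size("120G") // chunk)  # 640
print(parse_size("12G") // chunk)   # 64
```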
Attachments
Issue Links
- is caused by
  - MDEV-31413 Node has been dropped from the cluster on Startup / Shutdown with async replica (Closed)
  - MDEV-34924 gtid_slave_pos table neven been deleted on non replica nodes (wsrep_gtid_mode = 1) (Confirmed)
- relates to
  - MDEV-33799 mysql_manager_submit Segfault at Startup Still Possible During Recovery (Closed)
  - MDEV-34211 Seconds_Behind_Master gives impossible values (Open)
  - MDEV-35627 mysql.gtid_slave_pos gets really big (Open)