Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
10.11
-
None
-
Gentoo Linux
mariadb-10.11.11
galera-26.4.22
Description
Hello,
While trying to test solutions for https://jira.mariadb.org/browse/MDEV-35926, we found another problem during the Galera SST process.
For context, this is a Galera cluster of 3 nodes with around 4TB of data, using mariabackup both as the SST method and for the daily cluster backup.
When the SST process is started, it begins transmitting data successfully. However, after a few hours, data transmission stops with the following error in mariabackup.backup.log:
[01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/report.frm to <STDOUT>
[01] 2025-03-26 04:42:32 ...done
[01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/sms_statistic.frm to <STDOUT>
[01] 2025-03-26 04:42:32 ...done
[01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/email_statistic_opened.frm to <STDOUT>
[01] 2025-03-26 04:42:32 ...done
[00] 2025-03-26 04:42:32 Finished backing up non-InnoDB tables and files
[00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141
[00] 2025-03-26 04:42:32 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:33 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:34 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:35 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Was only able to copy log from 257737152711197 to 257754432284012, not 257758688468141; try increasing innodb_log_file_size
mariabackup: Stopping log copying thread[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012
We've already tested increasing innodb_log_file_size from the original 90GB to the current 220GB. This did not help with the SST, but the daily backup was able to finish.
During the time we ran the SSTs, the traffic was not high enough to fill the log.
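As a rough check on that claim, the LSN values from the "Was only able to copy log" line can be compared against the configured log size. This is only a sketch: it assumes that InnoDB LSNs count bytes of redo log written and that "220GB" means 220 GiB, and the variable names are purely illustrative.

```python
# LSN values copied from the "Was only able to copy log" line in the log above.
start_lsn = 257737152711197    # where log copying started
reached_lsn = 257754432284012  # last LSN the log copy thread managed to read
target_lsn = 257758688468141   # LSN the backup was waiting for

GIB = 1024 ** 3
log_file_size = 220 * GIB  # assumption: "220GB" means 220 GiB

copied = reached_lsn - start_lsn    # redo bytes copied during the SST
missing = target_lsn - reached_lsn  # redo bytes still outstanding at failure
total = target_lsn - start_lsn      # redo bytes generated over the whole run

print(f"copied:  {copied / GIB:.2f} GiB")
print(f"missing: {missing / GIB:.2f} GiB")
print(f"total:   {total / GIB:.2f} GiB ({total / log_file_size:.1%} of the log)")
```

Under those assumptions, the whole run generated roughly 20 GiB of redo, which is well below both the original 90GB and the current 220GB log size.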
We've noticed that the process waits for
[00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141
to be read, with a timeout of 5 seconds; when the timeout runs out, the SST fails. Please note that the LSN it is waiting for is from the future, i.e. it is ahead of the last LSN the log copy thread was able to read (257754432284012).
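The 5-second window and the stuck LSN can be read directly off the log timestamps. The sketch below does this for the lines quoted earlier; the regular expression is an assumption about the log format based only on those lines, not on any documented mariabackup format.

```python
import re
from datetime import datetime

# Relevant lines from mariabackup.backup.log, as quoted in this report.
log = """\
[00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141
[00] 2025-03-26 04:42:32 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:33 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:34 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:35 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Was only able to copy log from 257737152711197 to 257754432284012, not 257758688468141; try increasing innodb_log_file_size
"""

# Assumed line layout: "[thread] YYYY-MM-DD HH:MM:SS message"
pattern = re.compile(r"^\[\d+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")

events = []
for line in log.splitlines():
    m = pattern.match(line)
    if m:
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        events.append((ts, m.group(2)))

wait_start = next(ts for ts, msg in events if msg.startswith("Waiting for log copy thread"))
gave_up = next(ts for ts, msg in events if msg.startswith("Was only able to copy log"))

# Every retry should report the same LSN if the copy thread made no progress.
retry_lsns = {msg.rsplit("=", 1)[1] for ts, msg in events if msg.startswith("Retrying read")}

elapsed = (gave_up - wait_start).total_seconds()
print(f"gave up after {elapsed:.0f} s; LSNs seen in retries: {sorted(retry_lsns)}")
```

All retries report the same LSN, so the log copy thread made no progress at all during the wait.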
We've tried to increase the timeout by adding
[mariabackup]
ftwrl-wait-timeout=60
This did not change anything, even after also adding it to the [SST] section.
Do you have any idea what could be causing this?