[MDEV-36423] Galera SST using mariabackup fails while transmitting data - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.11
Fix Version/s: 10.11
Component/s: Galera, Galera SST, mariabackup
Labels:
None
Environment:
Gentoo Linux
mariadb-10.11.11
galera-26.4.22

Description

Hello,

while trying to test solutions for MDEV-35926 we've found another problem during the Galera SST process.

For context, this is a Galera cluster of 3 nodes with around 4TB of data. We use mariabackup both as the SST method and for the daily cluster backup.

When the SST starts, it successfully begins transmitting data. However, after a few hours the transmission stops with the following error in mariabackup.backup.log:

[01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/report.frm to <STDOUT>
[01] 2025-03-26 04:42:32         ...done
[01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/sms_statistic.frm to <STDOUT>
[01] 2025-03-26 04:42:32         ...done
[01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/email_statistic_opened.frm to <STDOUT>
[01] 2025-03-26 04:42:32         ...done
[00] 2025-03-26 04:42:32 Finished backing up non-InnoDB tables and files
[00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141
[00] 2025-03-26 04:42:32 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:33 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:34 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:35 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Was only able to copy log from 257737152711197 to 257754432284012, not 257758688468141; try increasing innodb_log_file_size
mariabackup: Stopping log copying thread
[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012

We have already tried increasing innodb_log_file_size from the original 90GB to the current 220GB. This did not help with the SST, but the daily backup was able to finish.
While the SSTs were running, the traffic was not high enough to fill the log.
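To put the log-size claim in numbers, here is a minimal Python sketch that reads the three LSNs from the final error line quoted above and compares the span with the configured log size. It assumes that LSN values count bytes of redo log and that "220GB" means 220 GiB; the names redo_span and LOG_FILE_SIZE are made up for this illustration and are not part of mariabackup.

```python
import re

# Format of the final error line, as quoted in the log excerpt of this report.
COPY_ERROR = re.compile(r"Was only able to copy log from (\d+) to (\d+), not (\d+)")


def redo_span(log_text):
    """Return (copied, missing, total) in LSN units, or None if no error line is found."""
    m = COPY_ERROR.search(log_text)
    if m is None:
        return None
    start, reached, target = map(int, m.groups())
    return reached - start, target - reached, target - start


# Assumption: "220GB" means 220 GiB; adjust if the setting uses decimal units.
LOG_FILE_SIZE = 220 * 1024**3

sample = (
    "[00] 2025-03-26 04:42:37 Was only able to copy log from 257737152711197 "
    "to 257754432284012, not 257758688468141; try increasing innodb_log_file_size"
)

copied, missing, total = redo_span(sample)
print(f"copied:  {copied} LSN units")
print(f"missing: {missing} LSN units")
print(f"total:   {total} LSN units, {total / LOG_FILE_SIZE:.1%} of the configured log size")
```

Under those assumptions the whole span from the backup's start LSN to the target LSN is well under a tenth of the 220GB log. This only compares sizes; it does not show what the server did with the log in the meantime.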

We've noticed that the process waits for

[00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141

to be read, with a timeout of 5 seconds. When the timeout runs out, the SST fails. Please note that the LSN it is waiting for is from the future: 257758688468141 is higher than 257754432284012, the LSN at which the log copy thread keeps retrying.
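The 5-second figure can be checked against the timestamps in mariabackup.backup.log. The sketch below is only an illustration: it assumes the "[NN] YYYY-MM-DD HH:MM:SS message" line layout seen in the excerpt above, and wait_seconds is a hypothetical helper name, not a mariabackup function.

```python
import re
from datetime import datetime

# Line layout assumed from the log excerpt: "[NN] YYYY-MM-DD HH:MM:SS message"
LINE = re.compile(r"\[\d+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)")


def wait_seconds(lines):
    """Seconds between the first 'Waiting for log copy thread' line and the
    'Was only able to copy log' failure line, or None if either is missing."""
    started = failed = None
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(2)
        if started is None and msg.startswith("Waiting for log copy thread"):
            started = ts
        elif msg.startswith("Was only able to copy log"):
            failed = ts
    if started is None or failed is None:
        return None
    return (failed - started).total_seconds()


sample_lines = [
    "[00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141",
    "[00] 2025-03-26 04:42:32 Retrying read of log at LSN=257754432284012",
    "[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012",
    "[00] 2025-03-26 04:42:37 Was only able to copy log from 257737152711197 "
    "to 257754432284012, not 257758688468141; try increasing innodb_log_file_size",
]

print(wait_seconds(sample_lines))
```

Running the same function over the full log after a configuration change would show whether the wait actually got longer.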

We tried to increase the timeout by adding

[mariabackup]
ftwrl-wait-timeout=60

to the configuration, but this did not change anything, even after adding the option to the [SST] section as well.

Do you have any idea what could be causing this?

Attachments

logs.7z.001 (10.00 MB, 2025-04-07 11:50)
logs.7z.002 (10.00 MB, 2025-04-07 11:50)
logs.7z.003 (4.72 MB, 2025-04-07 11:50)

People

Assignee: Marko Mäkelä
Reporter: Jakub Gajdos
Votes: 1
Watchers: 5

Dates

Created: 2025-03-28 14:01
Updated: 2025-07-10 19:57