  1. MariaDB Server
  2. MDEV-36423

Galera SST using mariabackup fails while transmitting data

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.11
    • Fix Version/s: 10.11
    • Component/s: Galera, Galera SST, mariabackup
    • Labels: None
    • Environment: Gentoo Linux
      mariadb-10.11.11
      galera-26.4.22

    Description

      Hello,

      While testing solutions for https://jira.mariadb.org/browse/MDEV-35926, we found another problem during the Galera SST process.

      For context, this is a Galera cluster of 3 nodes with around 4 TB of data, using mariabackup both as the SST method and for the daily cluster backup.

      When the SST process is started, it begins transmitting data successfully. However, after a few hours, data transmission stops with the following error in mariabackup.backup.log:

      [01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/report.frm to <STDOUT>
      [01] 2025-03-26 04:42:32         ...done
      [01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/sms_statistic.frm to <STDOUT>
      [01] 2025-03-26 04:42:32         ...done
      [01] 2025-03-26 04:42:32 Streaming ./app_user_5054672/email_statistic_opened.frm to <STDOUT>
      [01] 2025-03-26 04:42:32         ...done
      [00] 2025-03-26 04:42:32 Finished backing up non-InnoDB tables and files
      [00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141
      [00] 2025-03-26 04:42:32 Retrying read of log at LSN=257754432284012
      [00] 2025-03-26 04:42:33 Retrying read of log at LSN=257754432284012
      [00] 2025-03-26 04:42:34 Retrying read of log at LSN=257754432284012
      [00] 2025-03-26 04:42:35 Retrying read of log at LSN=257754432284012
      [00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012
      [00] 2025-03-26 04:42:37 Was only able to copy log from 257737152711197 to 257754432284012, not 257758688468141; try increasing innodb_log_file_size
      mariabackup: Stopping log copying thread[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012
      

      We have already tested increasing innodb_log_file_size from the original 90 GB to the current 220 GB. This did not help with the SST, but the daily backup was able to finish.
      While the SSTs were running, the traffic was not high enough to fill the log.
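      As a rough cross-check of that statement, the LSNs from the log excerpt above can be compared with both log sizes that were tried. This is only a sketch: it assumes that InnoDB LSNs count bytes of redo log (so the difference between two LSNs is the amount of redo written between them) and that "GB" means 10^9 bytes. The function and constant names are illustrative and not part of mariabackup.

```python
# LSN values copied from the mariabackup.backup.log excerpt in this report.
COPY_START_LSN = 257737152711197   # LSN at which the log copy started
TARGET_LSN = 257758688468141       # LSN the backup was waiting for

GB = 10**9  # assumption: decimal gigabytes


def redo_written_bytes(start_lsn: int, end_lsn: int) -> int:
    """Bytes of redo between two LSNs, assuming LSNs are byte offsets."""
    if end_lsn < start_lsn:
        raise ValueError("end_lsn must not be smaller than start_lsn")
    return end_lsn - start_lsn


written = redo_written_bytes(COPY_START_LSN, TARGET_LSN)

# innodb_log_file_size values mentioned in this report (original and current).
for log_size_gb in (90, 220):
    share = written / (log_size_gb * GB)
    print(f"{written / GB:.1f} GB of redo = {share:.0%} of a {log_size_gb} GB log")
```

      Under these assumptions the LSN range spans roughly 21.5 GB, well below either log size, which is consistent with the observation that the log was not filled during the SST.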

      We've noticed that the process waits for

      [00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 25775868846814
      

      to be read, with a timeout of 5 seconds; when the timeout runs out, the SST fails. Please note that the LSN it is waiting for is from the future: it is higher than the last LSN the log copy thread was able to read.
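      The 5-second window and the size of the gap can be read directly from the quoted log lines. The sketch below parses a shortened copy of the excerpt. The regular expressions are derived only from the lines shown in this report, not from a documented log format, and the byte count again assumes that LSNs are byte offsets in the redo log. The helper name summarize is hypothetical.

```python
import re
from datetime import datetime

# Shortened copy of the excerpt from mariabackup.backup.log quoted above.
LOG = """\
[00] 2025-03-26 04:42:32 Waiting for log copy thread to read lsn 257758688468141
[00] 2025-03-26 04:42:32 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Retrying read of log at LSN=257754432284012
[00] 2025-03-26 04:42:37 Was only able to copy log from 257737152711197 to 257754432284012, not 257758688468141; try increasing innodb_log_file_size
"""

LINE = re.compile(r"^\[\d+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
WAIT = re.compile(r"Waiting for log copy thread to read lsn (\d+)")
FAIL = re.compile(r"Was only able to copy log from (\d+) to (\d+), not (\d+)")


def summarize(log_text):
    """Return (seconds waited, bytes of log still missing), or None if incomplete."""
    wait_started = None
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip lines without the "[NN] timestamp message" shape
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        message = m.group(2)
        if WAIT.search(message):
            wait_started = stamp
        failed = FAIL.search(message)
        if failed and wait_started is not None:
            reached, wanted = int(failed.group(2)), int(failed.group(3))
            return (stamp - wait_started).total_seconds(), wanted - reached
    return None


waited, missing = summarize(LOG)
print(f"waited {waited:.0f} s, still missing {missing} bytes of redo log")
```

      For the excerpt above this gives a wait of 5 seconds and a gap of 4256184129 bytes (about 4.3 GB) between the last LSN read and the LSN being waited for.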

      We've tried to increase the timeout by adding

      [mariabackup]
      ftwrl-wait-timeout=60
      

      This did not change anything, even after we also added it to the [SST] section.

      Do you have any idea what could be causing this?

        Activity

          Gajdos Jakub Gajdos created issue -
          Gajdos Jakub Gajdos made changes -
          Field Original Value New Value
          Description Original: "... data transmission stops with following error:" New: "... data transmission stops with following error in the mariabackup.backup.log:" (the rest of the description is unchanged)
          Gajdos Jakub Gajdos made changes -
          Summary Original: "Galera SST" New: "Galera SST using mariabackup fails while transmitting data"
          Gajdos Jakub Gajdos made changes -
          Description Original: "We've noticed that the process waints for", "to be read with a timout of 5 seconds, when it runs out the SST fails." New: "We've noticed that the process waits for", "to be read with a timeout of 5 seconds, when it runs out the SST fails. Please notice that *lsn* it is waiting for is from the future" (the rest of the description is unchanged)
          serg Sergei Golubchik made changes -
          Assignee Jan Lindström [ JIRAUSER53125 ]
          serg Sergei Golubchik made changes -
          Affects Version/s 10.11 [ 27614 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 10.11 [ 27614 ]

          People

            Assignee: Jan Lindström
            Reporter: Jakub Gajdos
            Votes: 0
            Watchers: 2
