  1. MariaDB Server
  2. MDEV-26391

mariabackup always triggers node desync

Details

    Description

      This is most probably a result of the fixes related to MDEV-23080, as this behavior appeared after updating to version 10.4.21.

      We run a Galera cluster of 3 nodes. Since the update, every backup performed with mariabackup produces "Member desyncs itself from group" during the final phase of the backup.

      Mariabackup is started with:

      mariabackup -u root -p PASSWORD --backup --galera-info --stream=xbstream --parallel 8 --use-memory=16G --socket=/var/run/mysqld/mysqld.sock --datadir=/var/lib/mysql 2>>/var/log/mariabackup_copy.log| /usr/bin/zstd --fast -T8 -q -o /home/mariabackup/backup.zst
      

      While the backup is being created, the phase of streaming InnoDB data is followed by the phase of streaming non-InnoDB data, for example:
      mariabackup log:

      [00] 2021-08-17 02:54:31 Acquiring BACKUP LOCKS...
      [00] 2021-08-17 02:54:34 Starting to backup non-InnoDB tables and files
      ...
      [00] 2021-08-17 02:57:53 Finished backing up non-InnoDB tables and files
      [00] 2021-08-17 02:57:53 Waiting for log copy thread to read lsn 32481906458304
      ...
      [00] 2021-08-17 02:57:56 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
      [00] 2021-08-17 02:57:56 mariabackup: The latest check point (for incremental): '32481906568593'
      [00] 2021-08-17 02:57:56 Executing BACKUP STAGE END
      [00] 2021-08-17 02:57:56 All tables unlocked
      [00] 2021-08-17 02:57:56 Streaming ib_buffer_pool to <STDOUT>
      [00] 2021-08-17 02:57:56 Backup created in directory '/xtrabackup_backupfiles/'
      [00] 2021-08-17 02:57:56 MySQL binlog position: filename 'mariadb-bin.019684', position '421', GTID of the last change ''
      [00] 2021-08-17 02:57:56 Streaming backup-my.cnf
      [00] 2021-08-17 02:57:56 Streaming xtrabackup_info
      [00] 2021-08-17 02:57:56 Redo log (from LSN 32481655310193 to 32481906568602) was copied.
      [00] 2021-08-17 02:57:56 completed OK!
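
To put numbers on the phase shown in this log, here is a minimal Python sketch that measures how long the non-InnoDB copy took and how long passed between acquiring the backup locks and `BACKUP STAGE END`. The log lines are copied from the excerpt above; the `stamp` helper and the variable names are made up for this illustration and are not part of mariabackup.

```python
from datetime import datetime

# Lines copied from the mariabackup log excerpt in this report.
LOG = """\
[00] 2021-08-17 02:54:31 Acquiring BACKUP LOCKS...
[00] 2021-08-17 02:54:34 Starting to backup non-InnoDB tables and files
[00] 2021-08-17 02:57:53 Finished backing up non-InnoDB tables and files
[00] 2021-08-17 02:57:56 Executing BACKUP STAGE END
"""


def stamp(log, marker):
    """Return the timestamp of the first log line containing `marker`."""
    for line in log.splitlines():
        if marker in line:
            # Line layout: "[00] YYYY-MM-DD HH:MM:SS message"
            _, date, time, _ = line.split(" ", 3)
            return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    raise ValueError(f"marker not found: {marker}")


# Duration of the non-InnoDB copy itself.
copy_seconds = int(
    (stamp(LOG, "Finished backing up non-InnoDB")
     - stamp(LOG, "Starting to backup non-InnoDB")).total_seconds()
)

# Time from lock acquisition to the end of the backup stages.
lock_window_seconds = int(
    (stamp(LOG, "BACKUP STAGE END")
     - stamp(LOG, "Acquiring BACKUP LOCKS")).total_seconds()
)

print(copy_seconds, lock_window_seconds)  # prints: 199 205
```

For this run, the non-InnoDB copy accounts for 199 of the 205 seconds between lock acquisition and `BACKUP STAGE END`.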
      

      Within this phase, .TRG, .PAR, .FRM and other metadata files are copied. But since the last update, the node reports self-desync as soon as the backup reaches this phase.
      The node remains desynced until the backup finishes and then resynchronizes with the others.
      mysqld log:

      2021-08-17  2:54:34 18009831 [Note] WSREP: Desyncing and pausing the provider
      2021-08-17  2:54:34 0 [Note] WSREP: Member 0.0 (node2.localdomain) desyncs itself from group
      2021-08-17  2:54:34 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3821784561)
      2021-08-17  2:54:34 18009831 [Note] WSREP: pause
      2021-08-17  2:54:34 18009831 [Note] WSREP: Provider paused at 0ca12340-****-****-****-******ed4dfb:3821784561 (30463042)
      2021-08-17  2:54:34 18009831 [Note] WSREP: Provider paused at: 3821784561
      2021-08-17  2:57:56 18009831 [Note] WSREP: Resuming and resyncing the provider
      2021-08-17  2:57:56 18009831 [Note] WSREP: resume
      2021-08-17  2:57:56 18009831 [Note] WSREP: resuming provider at 30463042
      2021-08-17  2:57:56 18009831 [Note] WSREP: Provider resumed.
      2021-08-17  2:57:56 0 [Note] WSREP: Member 0.0 (node2.localdomain) resyncs itself to group.
      2021-08-17  2:57:56 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3821789518)
      2021-08-17  2:57:57 0 [Note] WSREP: Member 0.0 (node2.localdomain) synced with group.
      2021-08-17  2:57:57 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3821789533)
      2021-08-17  2:57:57 24 [Note] WSREP: Server node2.localdomain synced with group
      

      Note that the desync event occurs immediately after the streaming of non-InnoDB files starts!
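
The length of the desync window can be read straight from the "Shifting" lines in the mysqld log above. The following Python sketch does that under a few assumptions: the `transitions` helper is hypothetical, the three log lines are copied from this report, and the `TO:` value is taken to be the group's total-order sequence number, so the difference is only a rough indication of how far the cluster advanced in the meantime.

```python
from datetime import datetime

# "Shifting" lines copied from the mysqld log excerpt in this report.
MYSQLD_LOG = """\
2021-08-17  2:54:34 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3821784561)
2021-08-17  2:57:56 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3821789518)
2021-08-17  2:57:57 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3821789533)
"""


def transitions(log):
    """Yield (timestamp, old_state, new_state, to_value) per 'Shifting' line."""
    for line in log.splitlines():
        if "WSREP: Shifting " not in line:
            continue
        # split() without arguments tolerates the double space before the hour.
        date, time = line.split()[:2]
        when = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        states, _, rest = line.split("Shifting ", 1)[1].partition(" (TO: ")
        old, new = states.split(" -> ")
        yield when, old, new, int(rest.rstrip(")"))


events = list(transitions(MYSQLD_LOG))
left = next(e for e in events if e[1] == "SYNCED")   # node leaves SYNCED
back = next(e for e in events if e[2] == "SYNCED")   # node returns to SYNCED

out_of_sync_seconds = int((back[0] - left[0]).total_seconds())
to_advance = back[3] - left[3]

print(out_of_sync_seconds, to_advance)  # prints: 203 4972
```

In this run the node was out of the SYNCED state for 203 seconds, which lines up with the non-InnoDB streaming phase in the mariabackup log.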

      As the dataset includes a significant number of databases (several thousand), the phase of streaming non-InnoDB data can take several minutes. The real length of this desynced phase depends on the number of tables the cluster handles and can be even longer than what we hit. For all this time the node remains desynced.

      There are two questions:

      • Is it really necessary to put the node into the Desynced state while streaming non-InnoDB data (considering that WSREP only replicates InnoDB transactions), and
      • is there any safe workaround for this that wouldn't leave the backup with an inconsistent set of data?

      We have used mariabackup for a long time, and there was no such behavior before.
      This issue neutralizes mariabackup's main advantage of being non-blocking and artificially decreases cluster availability. Can this be fixed without breaking the previous fixes for MDEV-23080?

      Attachments

        1. bt_all.txt
          78 kB
        2. error.log
          79 kB

          Activity

            euglorg Eugene created issue -
            euglorg Eugene made changes -
            Field Original Value New Value
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 124361 ] MariaDB v4 [ 143098 ]
            elenst Elena Stepanova made changes -
            Fix Version/s 10.4 [ 22408 ]
            Assignee Jan Lindström [ jplindst ]
            julien.fritsch Julien Fritsch made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Seppo Jaakola [ seppo ]
            valerii Valerii Kravchuk made changes -
            Priority Minor [ 4 ] Major [ 3 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            seppo Seppo Jaakola made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            seppo Seppo Jaakola made changes -
            Status In Progress [ 3 ] Needs Feedback [ 10501 ]
            valerii Valerii Kravchuk made changes -
            Status Needs Feedback [ 10501 ] Open [ 1 ]
            seppo Seppo Jaakola made changes -
            Status Open [ 1 ] Needs Feedback [ 10501 ]
            valerii Valerii Kravchuk made changes -
            Status Needs Feedback [ 10501 ] Open [ 1 ]
            seppo Seppo Jaakola made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Affects Version/s 10.4.21 [ 26030 ]
            Environment Linux 5.10.58-gentoo x86_64 AMD EPYC 7451
            Issue Type Bug [ 1 ] Task [ 3 ]
            ccalender Chris Calender (Inactive) made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ jplindst ]
            ccalender Chris Calender (Inactive) made changes -
            Fix Version/s 10.4 [ 22408 ]
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.4 [ 22408 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Ramesh Sivaraman [ JIRAUSER48189 ]
            ramesh Ramesh Sivaraman made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ] Jan Lindström [ jplindst ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            ramesh Ramesh Sivaraman made changes -
            Attachment bt_all.txt [ 65590 ]
            Attachment error.log [ 65591 ]
            ramesh Ramesh Sivaraman made changes -
            Assignee Jan Lindström [ jplindst ] Seppo Jaakola [ seppo ]
            seppo Seppo Jaakola made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            seppo Seppo Jaakola made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            issue.field.resolutiondate 2023-01-17 09:38:58.0 2023-01-17 09:38:58.972
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.6.12 [ 28513 ]
            Fix Version/s 10.7.8 [ 28515 ]
            Fix Version/s 10.9.5 [ 28519 ]
            Fix Version/s 10.10.3 [ 28521 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 180160 115217 162633

            People

              Assignee: jplindst Jan Lindström (Inactive)
              Reporter: euglorg Eugene
              Votes: 8
              Watchers: 17
