Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7903

xtrabackup SST failing with maria-10.0-galera

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.0.16-galera
    • 10.0.20-galera
    • Galera SST
    • None
    • prod

    Description

      Performing an SST using xtrabackup-v2 methods
      fails with following errors:

       
      150330 11:09:37 [ERROR] I/O error reading the header from the binary log, errno=175, io cache code=0
      150330 11:09:37 [ERROR] I/O error reading the header from the binary log
      150330 11:09:37 [ERROR] Can't init tc log
      150330 11:09:37 [ERROR] Aborting

      It creates some corrupt binary log file on the joiner node

      root@con:~# ls -lh /data/mysql/binlog/
      total 4.0K
      -rw-rw---- 1 mysql mysql 32 Mar 30 11:09 binlog.index
      -rw-rw---- 1 mysql mysql  0 Mar 30 11:09 binlog.state

      This issue is repeatable, set log_bin to non-default location on all the nodes e.g. we have set it to

      log_bin=/data/mysql/binlog/binlog

      Work around was to change log_bin location to its default location on all the nodes i.e.

      log_bin=binlog

      SST using xtrabackup-v2 on a joiner node would be a success.

      Attachments

        Issue Links

          Activity

            A part of the fix depends on MDEV-7110 as PXB uses log_bin_basename to determine binlog directory.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - A part of the fix depends on MDEV-7110 as PXB uses log_bin_basename to determine binlog directory.
            doliver3 David Oliver added a comment -

            The workaround only worked for me if I used the exact name and also commented out the log_bin_index like this. For each test I simply bootstrap the first node then start the second and if it didn't fail then I would restart the second again. It failed with the name mariadb-bin and whenever the log_bin_index was specified even when the name was provided without a path as just the filename like log_bin_index = mariadb-bin.index. WSREP_SST cleaning datadir failed in these cases where the log_bin_index name was set. It would be good to confirm that whatever fix is provided also works for the WSREP_SST cleaning step. I'm testing on Ubuntu 14.10 with latest MariaDB Galera packages.

            WORKAROUND THAT WORKED:
            in /etc/mysql/my.cnf
            log_bin=binlog
            #log_bin_index = /var/log/mysql/mariadb-bin.index

            FAILED SCENARIO 2:
            in /etc/mysql/conf.d/galera.conf
            log_bin = /var/log/mysql/mariadb-bin
            log_bin_index = /var/log/mysql/mariadb-bin.index

            worked initially then on restart of the joiner node, it showed the error already described in this issue

            FAILED SCENARIO 3:
            in /etc/mysql/conf.d/galera.conf
            log_bin = mariadb-bin
            log_bin_index = mariadb-bin.index

            150503 15:49:57 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
            150503 15:49:57 [Note] WSREP: Requesting state transfer: success, donor: 1
            WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20150503 15:49:57.377)
            WSREP_SST: [INFO] Proceeding with SST (20150503 15:49:57.879)
            WSREP_SST: [INFO] Cleaning the existing datadir (20150503 15:49:57.882)
            removed '/var/lib/mysql/gvwstate.dat'
            WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql
            . as well (20150503 15:49:57.903)
            find: paths must precede expression: ./mariadb-bin\.[0-9]+$
            Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
            WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 15:49:57.909)
            WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 15:49:57.913)
            150503 15:49:57 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '1533' --binlog 'mariadb-bin' : 1 (Operation not permitted)
            150503 15:49:57 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
            150503 15:49:57 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
            150503 15:49:57 [ERROR] Aborting

            FAILED SCENARIO 3:
            in /etc/mysql/conf.d/galera.conf
            log_bin = /var/lib/mysql/mariadb-bin
            log_bin_index = /var/lib/mysql/mariadb-bin.index

            WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20150503 16:13:08.917)
            WSREP_SST: [INFO] Proceeding with SST (20150503 16:13:09.456)
            WSREP_SST: [INFO] Cleaning the existing datadir (20150503 16:13:09.462)
            removed '/var/lib/mysql/gvwstate.dat'
            WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql
            /var/lib/mysql as well (20150503 16:13:09.488)
            find: paths must precede expression: /var/lib/mysql/mariadb-bin\.[0-9]+$
            Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
            WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 16:13:09.494)
            WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 16:13:09.498)
            150503 16:13:09 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '17625' --binlog '/var/lib/mysql/mariadb-bin' : 1 (Operation not permitted)
            150503 16:13:09 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
            150503 16:13:09 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
            150503 16:13:09 [ERROR] Aborting

            doliver3 David Oliver added a comment - The workaround only worked for me if I used the exact name and also commented out the log_bin_index like this. For each test I simply bootstrap the first node then start the second and if it didn't fail then I would restart the second again. It failed with the name mariadb-bin and whenever the log_bin_index was specified even when the name was provided without a path as just the filename like log_bin_index = mariadb-bin.index. WSREP_SST cleaning datadir failed in these cases where the log_bin_index name was set. It would be good to confirm that whatever fix is provided also works for the WSREP_SST cleaning step. I'm testing on Ubuntu 14.10 with latest MariaDB Galera packages. WORKAROUND THAT WORKED: in /etc/mysql/my.cnf log_bin=binlog #log_bin_index = /var/log/mysql/mariadb-bin.index FAILED SCENARIO 2: in /etc/mysql/conf.d/galera.conf log_bin = /var/log/mysql/mariadb-bin log_bin_index = /var/log/mysql/mariadb-bin.index worked initially then on restart of the joiner node, it showed the error already described in this issue FAILED SCENARIO 3: in /etc/mysql/conf.d/galera.conf log_bin = mariadb-bin log_bin_index = mariadb-bin.index 150503 15:49:57 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0) 150503 15:49:57 [Note] WSREP: Requesting state transfer: success, donor: 1 WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS [@] } ) (20150503 15:49:57.377) WSREP_SST: [INFO] Proceeding with SST (20150503 15:49:57.879) WSREP_SST: [INFO] Cleaning the existing datadir (20150503 15:49:57.882) removed '/var/lib/mysql/gvwstate.dat' WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql . as well (20150503 15:49:57.903) find: paths must precede expression: ./mariadb-bin\. [0-9] +$ Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression] WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 15:49:57.909) WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 15:49:57.913) 150503 15:49:57 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '1533' --binlog 'mariadb-bin' : 1 (Operation not permitted) 150503 15:49:57 [ERROR] WSREP: Failed to read uuid:seqno from joiner script. 150503 15:49:57 [ERROR] WSREP: SST failed: 1 (Operation not permitted) 150503 15:49:57 [ERROR] Aborting FAILED SCENARIO 3: in /etc/mysql/conf.d/galera.conf log_bin = /var/lib/mysql/mariadb-bin log_bin_index = /var/lib/mysql/mariadb-bin.index WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS [@] } ) (20150503 16:13:08.917) WSREP_SST: [INFO] Proceeding with SST (20150503 16:13:09.456) WSREP_SST: [INFO] Cleaning the existing datadir (20150503 16:13:09.462) removed '/var/lib/mysql/gvwstate.dat' WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql /var/lib/mysql as well (20150503 16:13:09.488) find: paths must precede expression: /var/lib/mysql/mariadb-bin\. [0-9] +$ Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression] WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 16:13:09.494) WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 16:13:09.498) 150503 16:13:09 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '17625' --binlog '/var/lib/mysql/mariadb-bin' : 1 (Operation not permitted) 150503 16:13:09 [ERROR] WSREP: Failed to read uuid:seqno from joiner script. 150503 16:13:09 [ERROR] WSREP: SST failed: 1 (Operation not permitted) 150503 16:13:09 [ERROR] Aborting

            Hi,

            problem comes from following snippet in xtrabackup-v2:

            /usr/bin/xtrabackup-v2

            873             for bfiles in $(ls -1 ${BINLOG_FILENAME}.*);do
            874                 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index
            875             done

            When commenting this code, the node is starting correctly.

            I think regression comes from https://mariadb.atlassian.net/browse/MDEV-6391 patch, which was included in 10.0.17.

            I am not sure if we have access to xtrabackup-v2 code. Only binary logs (files ending with numbers) should be put in the index file:

            Proposed "Fix":

            /usr/bin/xtrabackup-v2

            873             for bfiles in $(ls -1 ${BINLOG_FILENAME}.[0-9]+);do
            874                 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index
            875             done

            Joffrey

            joffrey Joffrey MICHAIE (Inactive) added a comment - Hi, problem comes from following snippet in xtrabackup-v2: /usr/bin/xtrabackup-v2 873 for bfiles in $(ls -1 ${BINLOG_FILENAME}.*);do 874 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index 875 done When commenting this code, the node is starting correctly. I think regression comes from https://mariadb.atlassian.net/browse/MDEV-6391 patch, which was included in 10.0.17. I am not sure if we have access to xtrabackup-v2 code. Only binary logs (files ending with numbers) should be put in the index file: Proposed "Fix": /usr/bin/xtrabackup-v2 873 for bfiles in $(ls -1 ${BINLOG_FILENAME}.[0-9]+);do 874 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index 875 done Joffrey

            joffrey I can't reproduce the scenario where state file ends up in index file on joiner node.
            The reason possibly being mariadb deletes state file during startup and later recreates it
            on normal shutdown to recover GTID state. So, during SST the state file should not exist
            on donor node in first place.

            This issue, however, seem related to the missing log_bin_basename variable in 10.0.
            PXB uses log_bin_basename to determine binary log's base directory.

            ^Gxbstream: Can't get stat of './mysql.000003' (Errcode: 2 - No such file or directory)
            innobackupex: 'xbstream -c' returned with exit code 1.
            innobackupex: Ignoring nonexistent file '/home/nirbhay/project/git-repo/maria/10.0-galera/install/data1//mysql.000003'.

            https://github.com/percona/percona-xtrabackup/blob/2.2/storage/innobase/xtrabackup/innobackupex.pl#L3225,L3233

            Without this variable, binary log directory outside data directory will simply not work.

            BTW, I have already ported these missing variables to 10.1 (MDEV-7110). But I am not
            sure if its safe to backport it to 10.0 given its GA.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - joffrey I can't reproduce the scenario where state file ends up in index file on joiner node. The reason possibly being mariadb deletes state file during startup and later recreates it on normal shutdown to recover GTID state. So, during SST the state file should not exist on donor node in first place. This issue, however, seem related to the missing log_bin_basename variable in 10.0. PXB uses log_bin_basename to determine binary log's base directory. ^Gxbstream: Can't get stat of './mysql.000003' (Errcode: 2 - No such file or directory) innobackupex: 'xbstream -c' returned with exit code 1. innobackupex: Ignoring nonexistent file '/home/nirbhay/project/git-repo/maria/10.0-galera/install/data1//mysql.000003'. https://github.com/percona/percona-xtrabackup/blob/2.2/storage/innobase/xtrabackup/innobackupex.pl#L3225,L3233 Without this variable, binary log directory outside data directory will simply not work. BTW, I have already ported these missing variables to 10.1 ( MDEV-7110 ). But I am not sure if its safe to backport it to 10.0 given its GA.

            Ok, so its the state file left on the joiner node that's causing the problem.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - Ok, so its the state file left on the joiner node that's causing the problem.
            nirbhay_c Nirbhay Choubey (Inactive) added a comment - https://github.com/MariaDB/server/commit/5467b12dc3894c36eba092430a785806e0d008c9

            People

              nirbhay_c Nirbhay Choubey (Inactive)
              aftab.khan aftab khan
              Votes:
              5 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.