Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31155

How to troubleshoot "operation not permitted" error in Galera cluster?

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.6.10
    • 10.6
    • Galera SST
    • None
    • RHEL7 on VMware

    Description

      Hi,

      Our databases are mostly Galera cluster - 2 db nodes + 1 arbitrator.
      There was a patching last year from MariDB 10.6.7 (with Galera 4-26-11) to 10.6.10 (with Galera 4-26-12).
      After the patching, when we restart one of DB node, there is a chance of hitting "operation not permitted" error in both donor & joiner. Then, joiner cannot join back cluster.
      We tried several ways but still could not figure out the root cause.

      • no SST log found in DB
      • switch from SST method from mariabackup to rsync
      • enable wsrep_debug=SERVER
        The last action in each incident is to restart (bootstrap) the remaining DB node first. Then, we can start the joinor to join back cluster. But this is a service downtime.

      This problem happened 4~5 times already out of 20 to 100 DB node restart.

      Any way to troubleshoot in next occurrence?

      We guess the problem is at donor node side. But since we need to resume the cluster, we restarted donor and cannot troubleshoot from donor at this moment. Only can troubleshoot in next occurrence.

      DB log of one case are uploaded:
      mariadb-error.log-node1-20230415
      mariadb-error.log-node2-20230415

      DB parameter file are uploaded:
      mariadb.cnf.node1
      mariadb.cnf.node2

      Regards,
      William Wong

      Attachments

        Activity

          danblack Daniel Black added a comment -

          The main error is in the node1 logs:

          2023-04-14 12:32:57 0 [ERROR] WSREP: Failed to read from: wsrep_sst_mariabackup --role 'donor' --address 't2vdbs-itcttass-ttastdb1-2-01-db:18303/xtrabackup_sst//1' --local-port 19307 --socket '/var/run/mariadb/.mariadb.itcttass_ttastdb1.socket' --progress 0 --datadir '/u01/mydata/itcttass_ttastdb1/' --defaults-file '/u01/mycnf/itcttass_ttastdb1/mariadb.cnf' --gtid 'c1062071-70d1-11eb-a95b-db537cbe8f2e:46173018' --gtid-domain-id 0 --binlog '/u01/myarch/itcttass_ttastdb1/mariadb-bin' --bypass --mysqld-args --defaults-file=/u01/mycnf/itcttass_ttastdb1/mariadb.cnf --wsrep_start_position=c1062071-70d1-11eb-a95b-db537cbe8f2e:45650007
          2023-04-14 12:32:57 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'donor' --address 't2vdbs-itcttass-ttastdb1-2-01-db:18303/xtrabackup_sst//1' --local-port 19307 --socket '/var/run/mariadb/.mariadb.itcttass_ttastdb1.socket' --progress 0 --datadir '/u01/mydata/itcttass_ttastdb1/' --defaults-file '/u01/mycnf/itcttass_ttastdb1/mariadb.cnf' --gtid 'c1062071-70d1-11eb-a95b-db537cbe8f2e:46173018' --gtid-domain-id 0 --binlog '/u01/myarch/itcttass_ttastdb1/mariadb-bin' --bypass --mysqld-args --defaults-file=/u01/mycnf/itcttass_ttastdb1/mariadb.cnf --wsrep_start_position=c1062071-70d1-11eb-a95b-db537cbe8f2e:45650007: 1 (Operation not permitted)
          

          The underlying case is that fgets failing to read. MariaDB could be more descriptive with the cause of this error with ferror on the first error message in sql/wsrep_sst.cc:sst_donor_thread

          The "Operations not permitted" is annoying galera fantasy of making up errno based on other info and pretending its the same
          https://github.com/MariaDB/server/blob/10.6/sql/wsrep_utils.cc#L399 which relates to the waitpid rather than the fgets.

          It seems likely the sst script started, and mariabackup too, and then terminated itself immediately.

          The joiner message of:

          12:32:57 0 [Warning] WSREP: 2.0 (t1vdbs-itcttass-ttastdb1-1-01): State transfer to 0.0 (t2vdbs-itcttass-ttastdb1-2-01) failed: -1 (Operation not permitted)
          

          The joiner is just unpacking the donors message in group_unserialize_code_msg.

          Are there the follow log files in the datadir?

          • mariabackup.prepare.log
          • mariabackup.move.log
          • mariabackup.backup.log

          Can you include these?

          danblack Daniel Black added a comment - The main error is in the node1 logs: 2023-04-14 12:32:57 0 [ERROR] WSREP: Failed to read from: wsrep_sst_mariabackup --role 'donor' --address 't2vdbs-itcttass-ttastdb1-2-01-db:18303/xtrabackup_sst//1' --local-port 19307 --socket '/var/run/mariadb/.mariadb.itcttass_ttastdb1.socket' --progress 0 --datadir '/u01/mydata/itcttass_ttastdb1/' --defaults-file '/u01/mycnf/itcttass_ttastdb1/mariadb.cnf' --gtid 'c1062071-70d1-11eb-a95b-db537cbe8f2e:46173018' --gtid-domain-id 0 --binlog '/u01/myarch/itcttass_ttastdb1/mariadb-bin' --bypass --mysqld-args --defaults-file=/u01/mycnf/itcttass_ttastdb1/mariadb.cnf --wsrep_start_position=c1062071-70d1-11eb-a95b-db537cbe8f2e:45650007 2023-04-14 12:32:57 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'donor' --address 't2vdbs-itcttass-ttastdb1-2-01-db:18303/xtrabackup_sst//1' --local-port 19307 --socket '/var/run/mariadb/.mariadb.itcttass_ttastdb1.socket' --progress 0 --datadir '/u01/mydata/itcttass_ttastdb1/' --defaults-file '/u01/mycnf/itcttass_ttastdb1/mariadb.cnf' --gtid 'c1062071-70d1-11eb-a95b-db537cbe8f2e:46173018' --gtid-domain-id 0 --binlog '/u01/myarch/itcttass_ttastdb1/mariadb-bin' --bypass --mysqld-args --defaults-file=/u01/mycnf/itcttass_ttastdb1/mariadb.cnf --wsrep_start_position=c1062071-70d1-11eb-a95b-db537cbe8f2e:45650007: 1 (Operation not permitted) The underlying case is that fgets failing to read. MariaDB could be more descriptive with the cause of this error with ferror on the first error message in sql/wsrep_sst.cc:sst_donor_thread The "Operations not permitted" is annoying galera fantasy of making up errno based on other info and pretending its the same https://github.com/MariaDB/server/blob/10.6/sql/wsrep_utils.cc#L399 which relates to the waitpid rather than the fgets. It seems likely the sst script started, and mariabackup too, and then terminated itself immediately. The joiner message of: 12:32:57 0 [Warning] WSREP: 2.0 (t1vdbs-itcttass-ttastdb1-1-01): State transfer to 0.0 (t2vdbs-itcttass-ttastdb1-2-01) failed: -1 (Operation not permitted) The joiner is just unpacking the donors message in group_unserialize_code_msg. Are there the follow log files in the datadir? mariabackup.prepare.log mariabackup.move.log mariabackup.backup.log Can you include these?
          frelist William Wong added a comment -

          Thanks @daniel

          Agree the root cause should be at donor side.

          We could not find mariabackup log file in datadir of donor node at issue time in past several incidents.

          frelist William Wong added a comment - Thanks @daniel Agree the root cause should be at donor side. We could not find mariabackup log file in datadir of donor node at issue time in past several incidents.
          danblack Daniel Black added a comment -

          Alternately look at attempting a mariabackup outside of SST following similar options and see if it succeeds.

          danblack Daniel Black added a comment - Alternately look at attempting a mariabackup outside of SST following similar options and see if it succeeds.
          frelist William Wong added a comment -

          Below is found in node 1 DB log, run wsrep_sst_mariabackup in next occurrence? Cannot find mariabackup alone command in DB log.

          2023-04-14 12:32:57 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address 't2vdbs-itcttass-ttastdb1-2-01-db:18303/xtrabackup_sst//1' --local-port 19307 --socket '/var/run/mariadb/.mariadb.itcttass_ttastdb1.socket' --progress 0 --datadir '/u01/mydata/itcttass_ttastdb1/' --defaults-file '/u01/mycnf/itcttass_ttastdb1/mariadb.cnf' --gtid 'c1062071-70d1-11eb-a95b-db537cbe8f2e:46173018' --gtid-domain-id 0 --binlog '/u01/myarch/itcttass_ttastdb1/mariadb-bin' --bypass --mysqld-args --defaults-file=/u01/mycnf/itcttass_ttastdb1/mariadb.cnf --wsrep_start_position=c1062071-70d1-11eb-a95b-db537cbe8f2e:45650007'
          

          frelist William Wong added a comment - Below is found in node 1 DB log, run wsrep_sst_mariabackup in next occurrence? Cannot find mariabackup alone command in DB log. 2023-04-14 12:32:57 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address 't2vdbs-itcttass-ttastdb1-2-01-db:18303/xtrabackup_sst//1' --local-port 19307 --socket '/var/run/mariadb/.mariadb.itcttass_ttastdb1.socket' --progress 0 --datadir '/u01/mydata/itcttass_ttastdb1/' --defaults-file '/u01/mycnf/itcttass_ttastdb1/mariadb.cnf' --gtid 'c1062071-70d1-11eb-a95b-db537cbe8f2e:46173018' --gtid-domain-id 0 --binlog '/u01/myarch/itcttass_ttastdb1/mariadb-bin' --bypass --mysqld-args --defaults-file=/u01/mycnf/itcttass_ttastdb1/mariadb.cnf --wsrep_start_position=c1062071-70d1-11eb-a95b-db537cbe8f2e:45650007'

          People

            Unassigned Unassigned
            frelist William Wong
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.