MariaDB Server / MDEV-23586

Mariabackup: GTID saved for replication in 10.4.14 is wrong

Details

    Description

      It appears that the fix for https://jira.mariadb.org/browse/MDEV-21953 has introduced a serious regression and now any backup taken with MariaDB server 10.4.14 and above saves the wrong GTID.

      To reproduce, take a mariabackup from a 10.4.14 instance, then notice that the reported position refers to a GTID which is not correct. In the example below, the PIT corresponds to position 25035029, but the reported GTID 0-1-18691 is wrong and, if used, yields several duplicate key errors. In fact the correct GTID for that position would be 0-1-18699, as reported by the BINLOG_GTID_POS() function.

      This wrong GTID is saved to xtrabackup_slave_info, and since the position is no longer saved there (WHY???) the resulting backup no longer carries valid PIT (point-in-time) information.
      This is hitting a customer of ours and makes all backups inconsistent due to wrong PIT info, hence the critical priority.

      Thanks to aponzo who found and verified this issue.

      Workaround: downgrade server to 10.4.13

      [00] 2020-08-26 12:25:52 Backup created in directory '/root/backup/andre2/xtrabackup_backupfiles/'
      [00] 2020-08-26 12:25:52 MySQL binlog position: filename 'm2-bin.000007', position '25035029', GTID of the last change '0-1-18691'
      [00] 2020-08-26 12:25:52 MySQL slave binlog position: master host '192.168.56.101', gtid_slave_pos 0-1-18691
      [00] 2020-08-26 12:25:52 Streaming backup-my.cnf
      [00] 2020-08-26 12:25:52         ...done
      [00] 2020-08-26 12:25:52 Streaming xtrabackup_info
      [00] 2020-08-26 12:25:52         ...done
      [00] 2020-08-26 12:25:52 Redo log (from LSN 29523906 to 31884588) was copied.
      [00] 2020-08-26 12:25:52 completed OK!
      prepare
      2020-08-26 12:26:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=30269374
      2020-08-26 12:26:53 0 [Note] InnoDB: Starting final batch to recover 286 pages from redo log.
      2020-08-26 12:26:53 0 [Note] InnoDB: Last binlog file './m2-bin.000007', position 25035029
      [00] 2020-08-26 12:26:54 Last binlog file ./m2-bin.000007, position 25035029
      [00] 2020-08-26 12:26:54 completed OK!
      MariaDB [test]> SELECT BINLOG_GTID_POS("m2-bin.000007",25035029 );
      +--------------------------------------------+
      | BINLOG_GTID_POS("m2-bin.000007",25035029 ) |
      +--------------------------------------------+
      | 0-1-18699                                  |
      +--------------------------------------------+
      1 row in set (0.133 sec)
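
      Until the bug is fixed, a possible recovery path besides downgrading (a sketch, assuming the binlog file name and position recorded in xtrabackup_binlog_info are themselves consistent, which is what we observe) is to recompute the GTID on the backed-up server and use that value instead of the one saved in xtrabackup_slave_info:

      -- On the server the backup was taken from: the file name and position are
      -- the first two fields of xtrabackup_binlog_info (the values below are the
      -- ones from the log above).
      SELECT BINLOG_GTID_POS('m2-bin.000007', 25035029);
      -- On the new replica, use the returned GTID instead of the saved one
      -- (plus master_user / master_password as appropriate):
      SET GLOBAL gtid_slave_pos = '0-1-18699';
      CHANGE MASTER TO master_host = '192.168.56.101', master_use_gtid = slave_pos;
      START SLAVE;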
      

          Activity

            rpizzi Rick Pizzi (Inactive) added a comment (edited) -

            From further testing:

            • the problem also occurs with slave_parallel_threads = 0
            • the drift between the incorrect and the correct GTID appears to equal the number of parallel threads (with no parallel threads there is only one applier thread, so the drift is 1)

            GeoffMontee Geoff Montee (Inactive) added a comment -

            In the example below, the PIT corresponds to position 25035029, but the reported GTID 0-1-18691 is wrong and, if used, yields several duplicate key errors. In fact the correct GTID for that position would be 0-1-18699, as reported by the BINLOG_GTID_POS() function.

            Are you using the position printed by mariabackup --prepare to validate the correctness of the position in xtrabackup_binlog_info, i.e. this position?

            2020-08-26 12:26:53 0 [Note] InnoDB: Last binlog file './m2-bin.000007', position 25035029
            [00] 2020-08-26 12:26:54 Last binlog file ./m2-bin.000007, position 25035029
            

            If so, then I want to point out that the position printed by mariabackup --prepare is not guaranteed to be consistent, so it is not a good idea to use this position to validate the position in xtrabackup_binlog_info. In fact, the position printed by mariabackup --prepare should never be used at all. For more information about that, see MDEV-21611.

            Only the positions in xtrabackup_binlog_info and xtrabackup_slave_info are currently guaranteed to be consistent by Mariabackup.

            In other words, it is totally normal and expected if the position saved in xtrabackup_binlog_info is different from the position printed by Mariabackup, because only the former position is supposed to be consistent.

            Is your position in xtrabackup_binlog_info actually incorrect? I don't know, so I can't comment on that. I just want to point out that it is unsafe to use the position printed by mariabackup --prepare to make any judgement about the correctness of the position in xtrabackup_binlog_info.


            rpizzi Rick Pizzi (Inactive) added a comment -

            The printed position and the position saved in the file are identical.

            monty Michael Widenius added a comment -

            What is PIT? Point in time? (Binlog position would maybe be a better term in that case.)

            From what I understand from the above, the printed binlog file and positions are correct.
            Only the GTID printed by mariabackup is wrong. Is this correct?
            rpizzi Rick Pizzi (Inactive) added a comment (edited) -

            Yes, sorry, I meant the binlog position, which is correct; only the GTID is wrong.

            The issue is that in the file xtrabackup_slave_info only the GTID is saved. I think it would be best to go back to also printing the position...

            Rick

            rpizzi Rick Pizzi (Inactive) added a comment (edited) -

            julien.fritsch A test case is more than trivial to build for this issue... just set up one master and one replica with GTID enabled and slave parallel threads > 1, then take a mariabackup on the replica and verify that the GTID saved in the slave_info file does not correspond to the GTID of the saved position as reported by BINLOG_GTID_POS().
            You need some traffic flowing on the master for this, of course; we used sysbench.
            As additional proof: if the sysbench workload is inserting rows, using the recommended GTID to set up replication will result in duplicate key errors.
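
            A minimal sketch of that recipe in SQL (the thread count is illustrative, and the mariabackup invocation is shown as a comment because it runs from the shell):

            -- On the replica, before the test:
            STOP SLAVE;
            SET GLOBAL slave_parallel_threads = 4;   -- any value > 1 makes the drift > 1
            START SLAVE;

            -- Keep sysbench (or any INSERT workload) running against the master,
            -- then take the backup from the shell, e.g.:
            --   mariabackup --backup --slave-info --target-dir=/root/backup/repro

            -- Finally, compare the GTID saved in xtrabackup_slave_info with the GTID
            -- computed from the file/position saved in xtrabackup_binlog_info
            -- (the values below are the ones from the example above):
            SELECT BINLOG_GTID_POS('m2-bin.000007', 25035029);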


            vlad.lesin Vladislav Lesin added a comment -

            The binlog position is obtained with SHOW MASTER STATUS in mariabackup. The GTID of the last change is the gtid_current_pos variable, which is obtained with a SHOW VARIABLES statement (see write_binlog_info()). Both statements are executed under the BACKUP STAGE BLOCK_COMMIT lock (if the --no-lock option is not used).
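
            In other words, the sequence mariabackup runs against the server is roughly the following (a sketch of the statements named above; the intermediate backup stages are elided):

            BACKUP STAGE START;
            -- ... BACKUP STAGE FLUSH / BLOCK_DDL, file copying ...
            BACKUP STAGE BLOCK_COMMIT;
            SHOW MASTER STATUS;                        -- binlog file name and position
            SHOW VARIABLES LIKE 'gtid_current_pos';    -- the "GTID of the last change"
            BACKUP STAGE END;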

            I failed to repeat it with an mtr test. I started replication with several slave threads and set watchpoints on the variables which are used to compute the binlog position (mysql_bin_log.log_file.pos_in_file, mysql_bin_log.log_file.write_pos, mysql_bin_log.log_file.request_pos; see also MYSQL_BIN_LOG::raw_get_current_log(), the function which is used to compute the binlog position when SHOW MASTER STATUS is executed). I also set breakpoints on the function which is used to update gtid_current_pos (rpl_binlog_state::update_nolock()). Then I took the BLOCK_COMMIT lock on the slave and executed a bunch of INSERTs on the master. There were no hits after BLOCK_COMMIT on the slave.

            I am trying to repeat the bug with sysbench.


            aponzo Andrea Ponzo (Inactive) added a comment -

            vlad.lesin let's explain this issue better:
            we have a server A (the MariaDB version is not important)
            and a server B (a slave of A, MariaDB version 10.4.14).
            We need to create a server C (version 10.4.14) as a slave of A by performing a mariabackup on B.
            During the backup, traffic is flowing on master A.
            At the end of the backup we need to use the file xtrabackup_slave_info to get the CHANGE MASTER statement in order to connect C as a slave of A.

            and we have

            [root@m2 andre2]# cat xtrabackup_slave_info
            SET GLOBAL gtid_slave_pos = '0-1-18691';
            CHANGE MASTER TO master_use_gtid = slave_pos

            Of course we will add the MASTER_HOST of A (see the sketch below).

            Doing this, we get duplicate key errors because the above gtid_slave_pos is wrong.
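
            For completeness, the statements run on C look roughly like this (a sketch; the replication credentials are placeholders):

            SET GLOBAL gtid_slave_pos = '0-1-18691';   -- the (wrong) value from xtrabackup_slave_info
            CHANGE MASTER TO
              master_host = '192.168.56.101',          -- server A
              master_user = 'repl',                    -- placeholder
              master_password = 'secret',              -- placeholder
              master_use_gtid = slave_pos;
            START SLAVE;
            -- replication then fails with duplicate key errors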

            If we open the file:

            [root@m2 andre2]# cat xtrabackup_binlog_info
            m2-bin.000007 25035029 0-1-18691

            (binlog file and position of server B)
            and calculate the GTID for this position:

            MariaDB [test]> SELECT BINLOG_GTID_POS("m2-bin.000007",25035029 );
            +--------------------------------------------+
            | BINLOG_GTID_POS("m2-bin.000007",25035029 ) |
            +--------------------------------------------+
            | 0-1-18699                                  |
            +--------------------------------------------+

            it is different from the one reported in xtrabackup_slave_info.

            So, the MASTER_LOG_FILE and MASTER_LOG_POS are correct if we want to set up server C as a slave of B, but the GTID is not.

            The sequence-number difference is due to the number of parallel workers used:
            the difference is 1 for slave_parallel_threads = 0 or 1,
            and n for slave_parallel_threads = n on server B.

            Can you please report, from your test, the output of:
            cat xtrabackup_binlog_info
            SELECT BINLOG_GTID_POS(file, pos);

            thanks
            Andrea


            vlad.lesin Vladislav Lesin added a comment -

            My current assumption, which I am trying to reproduce in an MTR test, is the following:

            Consider the following code:

            int ha_commit_trans(THD *thd, bool all)
            {
            ...
              if (!is_real_trans)
              {
                error= commit_one_phase_2(thd, all, trans, is_real_trans, rw_trans);
                goto done;
              }
            ...
              /* Flushes the binlog and advances the binlog position. */
              cookie= tc_log->log_and_order(thd, xid, all, need_prepare_ordered,
                                            need_commit_ordered);
            ...
              /* Acquires the backup block-commit (MDL_BACKUP_COMMIT) lock. */
              error= commit_one_phase_2(thd, all, trans, is_real_trans, rw_trans) ? 2 : 0;
            ...
            }


            tc_log->log_and_order() flushes the binlog and advances the binlog position. commit_one_phase_2() acquires the backup block-commit lock. As we can see, in the case when is_real_trans == true, the binlog is flushed before the block-commit lock is acquired.

            So, the following scenario is possible:
            T2, a slave worker, has started a transaction and executed some Write_rows events;
            T1, the connection from mariabackup, holds the backup block-commit lock;
            T2 executes an Xid_log_event in ha_commit_trans(), and as is_real_trans == true for Xid_log_event, it executes tc_log->log_and_order() first, i.e. flushes the binlog and advances the binlog position, but is then blocked in commit_one_phase_2() because the backup block-commit lock is held by T1;
            T1 executes 'SHOW MASTER STATUS' and 'SHOW VARIABLES LIKE "gtid_current_pos"' and receives a binlog position which is ahead of gtid_current_pos.
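
            Written out as two sessions, the race looks roughly like this (schematic; the table and value stand in for whatever the slave worker is applying):

            -- T1 (mariabackup connection, after BACKUP STAGE START etc.):
            BACKUP STAGE BLOCK_COMMIT;                 -- now holds the block-commit lock

            -- T2 (slave worker):
            INSERT INTO t1 VALUES (1);                 -- Write_rows events
            COMMIT;                                    -- Xid_log_event: log_and_order() flushes the
                                                       -- binlog (the position advances), then the
                                                       -- thread blocks in commit_one_phase_2() on
                                                       -- the backup lock

            -- T1, while T2 is still blocked:
            SHOW MASTER STATUS;                        -- already sees the advanced position
            SHOW VARIABLES LIKE 'gtid_current_pos';    -- still returns the old GTID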


            vlad.lesin Vladislav Lesin added a comment -

            monty, I have just confirmed the above scenario with a test case: https://github.com/MariaDB/server/commit/cbcb921bb83be8b84142dfa73583724c8af843a8.

            vlad.lesin Vladislav Lesin added a comment (edited) -

            I have just added one more test case that reproduces the scenario without replication: https://github.com/MariaDB/server/commit/f19ff5b68e70214c4d75a2165ce8f5ef8cd2739f. The problem is the same: binlog flushing in ha_commit_trans() is not covered by the backup lock.


            monty Michael Widenius added a comment -

            Code to review is in bb-10.4-monty.
            serg Sergei Golubchik made changes - Priority: Critical → Blocker

            monty Michael Widenius added a comment -

            Fixed by partly reverting MDEV-21953 to put back MDL_BACKUP_COMMIT locking
            before log_and_order.

            The original problem in MDEV-21953 was that while a thread was waiting
            for other threads to commit in 'log_and_order', it held the
            MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
            MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
            This caused a deadlock, as the waited-for thread could never get past the
            MDL_BACKUP_COMMIT lock in ha_commit_trans.

            The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
            a thread is waiting for other 'previous' threads to commit. This ensures
            that no transactional thread keeps MDL_BACKUP_COMMIT while waiting, which
            ensures that there are no deadlocks any more.

            monty Michael Widenius made changes - Resolution: Fixed; Status: In Review → Closed; Fix Version/s: 10.4.15, 10.5.6
            ralf.gebhardt Ralf Gebhardt made changes - Fix Version/s: 10.4.15 → 10.4.16, 10.5.6 → 10.5.7

            People

              Assignee: monty Michael Widenius
              Reporter: rpizzi Rick Pizzi (Inactive)
