[MDEV-26031] unnecessary xid logging in one phase commit case Created: 2021-06-28  Updated: 2022-08-15  Resolved: 2021-06-29

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.6.3

Type: Bug
Priority: Blocker
Reporter: Andrei Elkin
Assignee: Andrei Elkin
Resolution: Fixed
Votes: 0
Labels: None


 Description   

The bug was originally observed as a binlog background thread hanging at shutdown, similar to one of the cases in MDEV-21120:

#14 MYSQL_BIN_LOG::stop_background_thread (this=0x55660e6b9ba0 <mysql_bin_log>) at /data/Server/10.6D/sql/log.cc:3411
#15 0x000055660af0ff8e in close_connections () at /data/Server/10.6D/sql/mysqld.cc:1720
#16 0x000055660af215bc in mysqld_main (argc=44, argv=<optimized out>) at /data/Server/10.6D/sql/mysqld.cc:5839

The hang suggested either a missed unlogging of an xid or a lost signal notification to the thread.

It turns out the former is the case.
The MDEV-21117 commit reveals two inherent defects in the loop in MYSQL_BIN_LOG::write_transaction_to_binlog that marks event groups as needing explicit xid unlogging:
(1) the loop never expected to start from an already reset ha_info (the one phase commit case, which does not need the unlogging), and
(2) its continuation condition had a logical flaw: it broke out after the first iteration, ignoring any further ha_info in the list even though they might represent engines incapable of commit_checkpoint_request. Such an engine requires the group to be marked, which may not have happened on the first iteration.
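The two defects can be illustrated with a minimal, self-contained sketch. This is not the server code or the pushed patch: EngineInfo, needs_unlog and needs_unlog_first_only are hypothetical names invented here, and the one phase commit case is modelled as a plain boolean flag rather than a reset ha_info.

```cpp
#include <vector>

// Hypothetical stand-in for one ha_info entry in the transaction's engine list.
struct EngineInfo {
  bool started;                 // the engine took part in the transaction
  bool is_binlog;               // the entry is the binlog pseudo-engine itself
  bool has_checkpoint_request;  // the engine supports commit_checkpoint_request
};

// Intended behaviour (assumed from the description): mark the group only if
// this is not a one phase commit and ANY participating engine lacks
// commit_checkpoint_request.
bool needs_unlog(const std::vector<EngineInfo>& engines, bool one_phase_commit) {
  if (one_phase_commit)
    return false;  // defect (1): this case must never be marked
  for (const EngineInfo& e : engines) {
    if (e.started && !e.is_binlog && !e.has_checkpoint_request)
      return true;  // one incapable engine is enough to require unlogging
  }
  return false;
}

// Illustration of defect (2): only the first entry is ever examined, so an
// incapable engine later in the list goes unnoticed.
bool needs_unlog_first_only(const std::vector<EngineInfo>& engines) {
  if (engines.empty())
    return false;
  const EngineInfo& e = engines.front();
  return e.started && !e.is_binlog && !e.has_checkpoint_request;
}
```

With a list whose first engine supports commit_checkpoint_request and whose second does not, needs_unlog reports that the group must be marked, while the first-entry-only variant misses it.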

I set out to fix this starting from 10.2, though 10.6 is the most vulnerable because of (1): there the loop marks groups that should not be marked.

Thanks to elenst, alice and marko who helped to identify it!



 Comments   
Comment by Elena Stepanova [ 2021-06-28 ]

Setting to blocker, as a big part of the problem is a regression that is highly visible, at least in concurrent tests on 10.6.

Comment by Andrei Elkin [ 2021-06-28 ]

I also asked serg to review on slack.

Comment by Sujatha Sivakumar (Inactive) [ 2021-06-29 ]

Hello Andrei,

Thanks for working on this issue. Changes look good to me.

Comment by Andrei Elkin [ 2021-06-29 ]

390014781b6 pushed to 10.6. The vulnerability in 10.2-10.5 is much less likely to be hit and will be addressed by backporting the 10.6 patch.
