Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 10.5.16
Description
Affects/is observed on 821808c45dd
A transaction that updates, inserts, or deletes a single distinct record identified by its PK value
should not take any gap lock. One example is `call foo1();`, where the table and procedure
are defined as follows:
CREATE TABLE `t2_text` (
  `a` int(11) NOT NULL,
  `b` int(11) DEFAULT NULL,
  `c` text DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB;

delimiter |
create procedure foo1()
begin
  declare av int;
  declare ni int;
  declare mk int;
  declare u0,u1,d int;
  declare skip_max int;

  -- random starting key, number of rows to write, and maximum key skip
  set av = ceil(rand()*100);
  set ni = ceil(rand()*10);
  set skip_max = 2;
  -- keys for the final single-row delete and update
  set d = av + mod(ceil(rand()*100), skip_max*ni);
  set u0 = av + mod(ceil(rand()*100), skip_max*ni);
  set u1 = av + mod(ceil(rand()*100), skip_max*ni);

  while (ni > 0) do
    set mk = mod(ceil(rand()*100),4);
    replace into t2_text values (av, av, repeat('a', mk*1024));
    set ni = ni - 1;
    set av = av + 1 + mod(ceil(rand()*100), skip_max);
  end while;
  delete from t2_text where a = d;
  update t2_text set a=u1,b=u1 where a = u0;
end|
delimiter ;
However, an X lock on the supremum record appears when the above `call foo1()` runs concurrently in two connections, for example:
--connection one
--send \
select sleep(0.1);xa start '1'; call foo1(); xa end '1'; xa prepare '1'; select sleep(0.05); xa commit '1'; select sleep(0.1);
--connection two
--send \
select sleep(0.1);xa start '2'; call foo1(); xa end '2'; xa prepare '2'; select sleep(0.05); xa commit '2'; select sleep(0.1);
As a result, an assert (see the attached diff file) fires after a few repeats.
The lock also survives the action of the MDEV-26682 commit that clears gap locks from the user XA transaction.
As in the MDEV-26682 context, this can cause seemingly non-conflicting XA transactions to actually conflict
when they are replayed on the slave.
The attached diff suggests a fix, though its idea is merely to work around the XA replication issues, not possible
isolation issues (where the presence of an X lock on the supremum is indeed 'unexpected' and cannot be justified).
(Edited to remove unnecessary Unique constraint from `b`)
Attachments
Issue Links
- is blocked by
  - MDEV-29575 Access to innodb_trx, innodb_locks and innodb_lock_waits along with detached XA's can cause SIGSEGV (Closed)
- relates to
  - MDEV-30165 X-lock on supremum for prepared transaction for RR (Closed)
  - MDEV-17814 Server crashes in is_current_stmt_binlog_format_row (Open)
  - MDEV-26682 slave lock timeout with xa and gap locks (Closed)
Some thoughts about testing. We need a tool for testing isolation levels, to prevent cases like
MDEV-27025, where we released the fix but it caused MDEV-27992, which was reported only after the release. Marko gave me a hint about a possible testing method: https://aphyr.com/posts/327-call-me-maybe-mariadb-galera-cluster. The general idea is to generate operations on a limited number of rows, so that there is some concurrency, and then check data consistency.
This method does not work for all isolation levels, and does not even work for some kinds of queries. For example, it was shown here that the initial test was expected to cause data inconsistency at the single-node RR level. Even if a transaction runs at the RR level, locking reads and writes work in RC mode.
But it can still be used for some particular cases, for example Serializable testing, and this test can be a starting point for it.
One more interesting project is here. It can also be used as a starting point and can give an idea of what to test, but its initial tests do not involve some concurrency-related operations, such as page splitting or purging.