[MDEV-29069] ER_KEY_NOT_FOUND upon concurrent online auto-increment addition and DELETE Created: 2022-07-08  Updated: 2023-08-16  Resolved: 2023-03-29

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table
Affects Version/s: N/A
Fix Version/s: 11.2.1

Type: Bug
Priority: Critical
Reporter: Elena Stepanova
Assignee: Sergei Golubchik
Resolution: Fixed
Votes: 0
Labels: online-ddl

Issue Links:
Problem/Incident
is caused by MDEV-16329 Engine-independent online ALTER TABLE Closed
Relates
relates to MDEV-28808 Test MDEV-16329 (ALTER ONLINE TABLE) ... Stalled

 Description   

--source include/have_debug_sync.inc
--source include/have_innodb.inc
 
create table t1 (a int) engine=InnoDB;
insert into t1 values (10),(20),(30);
--send
  set debug_sync= 'now wait_for downgraded';
 
--connect (con_alter,localhost,root,,test)
set debug_sync= 'alter_table_online_downgraded signal downgraded wait_for goforit';
--send
  alter table t1 add pk int auto_increment primary key, algorithm=copy, lock=none;
 
--connection default
--reap
delete from t1 where a = 20;
set debug_sync= 'now signal goforit';
 
--connection con_alter
--reap
select * from t1;
show create table t1;
 
# Cleanup
drop table t1;
set debug_sync= reset;

ALTER fails with:

bb-10.10-MDEV-16329 49ad87590

mysqltest: At line 20: query 'reap' failed: ER_KEY_NOT_FOUND (1032): Can't find record in 't1'

As a side note, in the client the error message is reliably garbled (at least for me), like this:

MariaDB [test]> alter table t1 add pk int auto_increment primary key, algorithm=copy, lock=none;
ERROR 1032 (HY000): Can't find record in 't1'e done



 Comments   
Comment by Nikita Malyavin [ 2022-07-25 ]

Sergei, please review commits e2f8dff...52f489e, branch bb-10.10-ddl-nikita

Comment by Nikita Malyavin [ 2022-10-17 ]

I have updated the branch to 10.11, please see bb-10.11-ddl-nikita [github]

Sergei, in this new branch, the commits you have already reviewed are:

150f8747 MDEV-29069 follow-up: support partially usable keys
bc70105f MDEV-29069 follow-up: allow deterministic DEFAULTs
04678329 MDEV-29069 ER_KEY_NOT_FOUND on online autoinc addition + concurrent DELETE

New commits for review are c71d8e92...774a0bb9, namely:

55c376fd MDEV-29069 follow-up: improve DEFAULT rules
fa07e36b few rgi assertions. this can proof that rgi is always present
b97f88f2 MDEV-29069 follow-up optimize find_key
de7a5fc8 MDEV-29069 follow-up: fix replication with extra fields + tests
c71d8e92 rpl: check should go after defaults and vcols update

I have fixed a few bugs here. First, it turned out that update_default_fields updates ALL default
fields that have no explicit values, while fields with index < master_columns are not marked for
write, which led to a crash (assertion failure).
This is fixed in de7a5fc8 (fix replication with extra fields + tests), which also contains a few other
post-review changes related to master_columns, record_compare() and get_usable_key_parts().

We have discussed the optimization that avoids traversing key_parts for each event
(and therefore on each find_key() call). I have implemented it in b97f88f2 (optimize find_key).
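
To illustrate the idea behind this optimization, here is a minimal Python sketch, not the server code. It assumes that the set of columns present in the logged row images is fixed for the lifetime of the ALTER, so the key choice can be made once per table and reused for every event. The names Key, TableApplyState, logged_columns, traversals and apply_events are hypothetical and exist only for this example; only find_key is borrowed from the discussion above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for an index definition."""
    name: str
    columns: Tuple[str, ...]


@dataclass
class TableApplyState:
    """Hypothetical per-table state kept while logged row events are applied."""
    keys: List[Key]
    logged_columns: FrozenSet[str]      # columns present in the logged row images
    traversals: int = 0                 # how many times the key list was walked
    _cached_key: Optional[Key] = None
    _resolved: bool = False

    def find_key(self) -> Optional[Key]:
        # Walk the key parts only on the first call; later calls reuse the result.
        if not self._resolved:
            self.traversals += 1
            self._cached_key = next(
                (k for k in self.keys
                 if all(c in self.logged_columns for c in k.columns)),
                None,
            )
            self._resolved = True
        return self._cached_key


def apply_events(state: TableApplyState, events: list) -> list:
    """Return the key that would be used to locate the row for each event."""
    return [state.find_key() for _ in events]
```

With a new PRIMARY key on pk and an existing index on a, and only a present in the logged images, every event resolves to the index on a while the key list is walked a single time.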

Still, I have a feeling that find_key now looks complicated, and I was looking for reasons
to pass RPL_TABLE_LIST directly, and return directly instead of Rpl_table_data.
I have placed a few assertions for rgi presence to prove that this can be safe, but I am not sure yet.

Finally, I have completely reworked how the usability of a key part is determined in 55c376fd (improve DEFAULT rules).
It looks complicated now, so here's the idea: it's not enough for a field to have a deterministic default
value; the underlying fields must also be either explicitly set or, in turn, have a
deterministic/explicit DEFAULT.
The same, by the way, applies to virtual columns, and a few tests are included to demonstrate this.
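
The rule above can be sketched as a small recursive check. This is a hedged Python illustration of the idea as described in this comment, not the code from 55c376fd. Column, is_usable, example_columns and all field names are hypothetical, and treating an auto-increment column as unusable is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class Column:
    """Hypothetical column description used only for this sketch."""
    name: str
    explicit: bool = False                # value is present in the logged row
    deterministic_expr: bool = False      # has a deterministic DEFAULT or virtual expression
    depends_on: FrozenSet[str] = frozenset()  # columns referenced by that expression


def is_usable(name: str, columns: Dict[str, Column],
              _visiting: Optional[Set[str]] = None) -> bool:
    """A key part is usable if its value can be reproduced for a logged row."""
    visiting = set() if _visiting is None else _visiting
    col = columns[name]
    if col.explicit:
        return True
    if not col.deterministic_expr:
        return False
    if name in visiting:                  # guard against circular references
        return False
    visiting.add(name)
    try:
        # A deterministic expression is not enough: every underlying field
        # must itself be explicit or deterministically derivable.
        return all(is_usable(dep, columns, visiting) for dep in col.depends_on)
    finally:
        visiting.discard(name)


def example_columns() -> Dict[str, Column]:
    cols = [
        Column("a", explicit=True),
        Column("b", deterministic_expr=True, depends_on=frozenset({"a"})),  # e.g. DEFAULT (a + 1)
        Column("c"),                                                        # e.g. DEFAULT (rand())
        Column("d", deterministic_expr=True, depends_on=frozenset({"c"})),  # e.g. DEFAULT (c * 2)
        Column("e", deterministic_expr=True),                               # e.g. DEFAULT 5
        Column("pk"),                                                       # assumed: auto_increment
    ]
    return {c.name: c for c in cols}
```

In this example, b and e are usable, while d is not: its own expression is deterministic, but it depends on c, whose value cannot be reproduced.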

EDIT: Sorry, wrong branch, it's 10.11, not 10.10
