[MDEV-31058] ER_KEY_NOT_FOUND upon concurrent CHANGE column to autoinc and DML Created: 2023-04-15  Updated: 2023-08-16  Resolved: 2023-08-16

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table
Affects Version/s: N/A
Fix Version/s: 11.2.1

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Nikita Malyavin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
includes MDEV-31128 Server crashes in Rows_log_event::fin... Closed
includes MDEV-31136 Online ALTER is allowed on master but... Closed
includes MDEV-31172 Server crash or ASAN errors in online... Closed
Problem/Incident
causes MDEV-31128 Server crashes in Rows_log_event::fin... Closed
causes MDEV-31172 Server crash or ASAN errors in online... Closed
is caused by MDEV-16329 Engine-independent online ALTER TABLE Closed
Relates
relates to MDEV-28808 Test MDEV-16329 (ALTER ONLINE TABLE) ... Stalled

 Description   

Notes:

  • A fix for MDEV-31043 is already in the branch. Besides, this time the failure affects transactional engines as well, reproducible with at least InnoDB, MyISAM, Aria, HEAP.
  • The test case is non-deterministic, run with --repeat=N. It fails for me on almost every attempt, but it can vary on different machines and builds.
  • The second table (t2) seems to be important somehow, at least I couldn't replace its operation with just a sleep.

--source include/have_sequence.inc
 
CREATE TABLE t1 (b int);
INSERT INTO t1 VALUES (0),(0);
 
CREATE TABLE t2 (a int);
INSERT INTO t2 (a) select seq from seq_1_to_128;
 
--connect (con1,localhost,root,,)
--send
  ALTER TABLE t1 MODIFY b SERIAL;
 
--connection default
DELETE FROM t2 ORDER BY a LIMIT 1;
DELETE FROM t1 ORDER BY b LIMIT 2;
 
--connection con1
--reap
 
# Cleanup
DROP TABLE t1, t2;

bb-11.0-oalter 6f5427417eff

mysqltest: At line 18: query 'reap' failed: ER_KEY_NOT_FOUND (1032): Can't find record in 't1'



 Comments   
Comment by Nikita Malyavin [ 2023-04-17 ]

ER_KEY_NOT_FOUND seems to become a generic marker for ONLINE ALTER TABLE.
The problem here is with an added AUTO_INCREMENT (SERIAL) for a field spec, whereas the initial values were 0:

CREATE TABLE t1 (b int) engine=InnoDB;
INSERT INTO t1 VALUES (0),(0);
connect  con1,localhost,root,,;
ALTER TABLE t1 MODIFY b SERIAL;
select * from t1;
b
1
2
DROP TABLE t1;

The autoinc value was silently applied under disabled NO_AUTO_VALUE_ON_ZERO mode.
So the value '0' from the online change could not be found.

Comment by Nikita Malyavin [ 2023-04-18 ]

here's the deterministic test:

--source include/have_innodb.inc
CREATE TABLE t1 (b int) engine=InnoDB;
INSERT INTO t1 VALUES (0);
 
--connect (con1,localhost,root,,)
set debug_sync= "alter_table_copy_end signal copy wait_for goon";
send ALTER TABLE t1 MODIFY b SERIAL;
--connection default
set debug_sync= "now wait_for copy";
DELETE FROM t1;
set debug_sync= "now signal goon";
 
--connection con1
--reap
 
# Cleanup
DROP TABLE t1;

Comment by Sergei Golubchik [ 2023-04-30 ]

let's try to disable ONLINE if AUTO_INC if forced on an existing column

Comment by Nikita Malyavin [ 2023-04-30 ]

serg I already wrote an algorithm that covers a good subset. I think I'd really like to avoid mentioning that something with AUTO_INC doesn't work in the article

Comment by Sergei Golubchik [ 2023-05-01 ]

You cannot disable NO_AUTO_VALUE_ON_ZERO, non-blocking alter must produce the same result as blocking.

Technically, you can do online when AUTO_INC is added if the old table had a PK and this PK isn't changed in the ALTER TABLE. But I think it's a very corner use case, normally AUTO_INC is PK.

Comment by Nikita Malyavin [ 2023-05-03 ]

serg Please review commits 11cc554...88a2ec

Comment by Sergei Golubchik [ 2023-05-03 ]

consider this example

create table t1 (a int, b int);
insert t1 values (1,NULL),(2,NULL),(3,NULL),(4,NULL);
update t1 set b=10 where a=2;
alter table t1 modify b int auto_increment, add key(b);
select * from t1 order by a,b;
drop table t1;

it prints

a       b
1       1
2       10
3       11
4       12

I don't understand how it could be even possible to do the above online.
If I recall correctly, conceptually online alter should produce the same result as if all concurrent updates were done before the alter.

Comment by Nikita Malyavin [ 2023-05-13 ]

Right, if we stick to taking effect after the changes applied, then this method won't work

Comment by Sergei Golubchik [ 2023-05-24 ]

But of course. A user does SELECT — sees the old table structure. Performs UPDATE, commits, runs SELECT again — sees the old table structure, with the updated content. There's no other way to interpret it, the UPDATE was applied to the old table structure, before the ALTER.

Comment by Sergei Golubchik [ 2023-06-24 ]

db2d411bc is obsolete, it seems, because d23555b19206 (MDEV-30984) rewrites it.

I've already commented on d23555b19206, so there's no need to review db2d411bc anymore.

Comment by Nikita Malyavin [ 2023-06-25 ]

Well, yes, it can be done so, almost. Thanks. the commit could be useful to understand the intentions at first place, but I can squash it

Generated at Thu Feb 08 10:20:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.