[MCOL-5603] Online alter can break data consistency Created: 2023-11-07  Updated: 2024-01-16

Status: Stalled
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Nikita Malyavin Assignee: Roman
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-16329 Engine-independent online ALTER TABLE Closed

 Description   

Online alter (LOCK=NONE, which is default) works as follows:
1. Acquire an exclusive MDL lock for a short duration
2. Acquire TL_READ
3. Downgrade MDL to MDL_SHARED_UPGRADABLE to allow concurrent writes
4. Copy and replicate online changes
5. Unlock TL
6. Replicate the rest of online changes
5. Upgrade to MDL_EXCLUSIVE
6. Unlock all

This means that now MCS can end up with inconsistent data if some concurrent writes were executed during alter table:

con1: ALTER TABLE t ....
con1: # concider it reached stage 4
con2: INSERT y; COMMIT;

this may end up in the record y inserted twice in the resulting table.

Context: MDL – Metadata lock, multi-granular lock that protects table metadata. Represented by MDL_request class and TABLE_LIST::mdl_request field respectively. See enum_mdl_type for the granularity types and MDL_lock::MDL_object_lock::m_granted_incompatible for the compatibility matrix.
TL lock - a table data lock represented by thr_lock_type enum and stored in TABLE_LIST::lock_type and TABLE::reg_info.lock_type. Typically propagated to a stronger lock type then provided by a storage engine in handler::store_lock. Refer to mysql_lock_tables/mysql_unlock_tables for further info.

Reproduction:
1. Compile the sql server with -DCMAKE_BUILD_TYPE=Debug
2. Run the mtr test

create or replace table t1 (a int, b int, key(a)) engine=columnstore;
insert t1 values (1, 11);
insert t1 values (2, 22);
insert t1 values (3, 33);
insert t1 values (4, 44);
insert t1 values (5, 55);
 
set debug_sync= 'alter_table_online_downgraded SIGNAL downgraded WAIT_FOR ins';
send alter table t1 add c int algorithm= copy, lock= none;
 
--connect (con1, localhost, root,,)
set debug_sync= 'now WAIT_FOR downgraded';
set lock_wait_timeout= 0; # the correct behavior is to wait until alter table ends
--error 0,ER_LOCK_WAIT_TIMEOUT
insert t1 values (7, 77); # autocommit
 
set debug_sync= 'now SIGNAL ins';
 
--connection default
reap;
 
# row 7, 77 should not happen twice! (but can be 0 times in case of lock timeout)
select * from t1; 



 Comments   
Comment by Roman [ 2023-11-07 ]

Simplified description of the repro case:

Run INSERT exactly while the copy stage of ALTER TABLE goes on. If insert finishes successfully, the test is successful (i.e. the bug is present). In insert never finishes (practically, ends with timeout), then the fix works as expected.

Comment by Nikita Malyavin [ 2023-11-14 ]

Changed to critical since it's a regression

Generated at Thu Feb 08 02:59:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.