[MDEV-32100] Online ALTER TABLE ends with 1032 (Can't find record) under some isolation levels Created: 2023-09-05  Updated: 2023-11-04  Resolved: 2023-11-04

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 11.2
Fix Version/s: 11.2.2

Type: Bug Priority: Blocker
Reporter: Matthias Leich Assignee: Nikita Malyavin
Resolution: Fixed Votes: 0
Labels: rr-profile

Attachments: File TBR-403.yy    

 Description   

origin/11.3, 11.3 8ad1e26b1bafa4ed9928306efc10c047f2274108 2023-09-04T09:36:44+04:00
origin/bb-11.3-MDEV-28699 d2947464c841d0a989098010e04bfda7dcf7df58 2023-09-04T21:41:01+05:30 (rr traces)
Per Marko and Thiru:
The problems are most probably caused by MDEV-16329.
 
Scenario:
1. Start the server and generate some initial data.
2. Several session run concurrent some DDL/DML mix
    Sometimes this kind of test fails with
     ALTER TABLE t3 MODIFY COLUMN col2 INT NOT NULL, LOCK = DEFAULT  /* E_R Thread2 QNO 2894 CON_ID 15 */  failed: 1032 , errstr: Can't find record in 't3'
     pluto:/data/results/1693844534/TBR-403--TBR-403D_1$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
    or
    ALTER TABLE t1 DROP PRIMARY KEY, ALGORITHM = DEFAULT, LOCK = DEFAULT  /* E_R Thread16 QNO 59 CON_ID 28 */  failed: 1032 , errstr: Can't find record in 't1'
    pluto:/data/results/1693844534/TBR-403--TBR-403F--TBR-403D_better$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
 
One of the various test setups which showed the trouble
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD d5889a108c00974367f47657cabe4804fe7df566 2023-09-04T13:53:19+02:00
# rqg.pl  : Version 4.4.0 (2023-08)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/table_stress_innodb.yy \
# --gendata=conf/mariadb/table_stress.zz \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --mysqld=--transaction-isolation=SERIALIZABLE \
# --validator=SelectStability \
# --redefine=conf/mariadb/redefine_innodb_truncate_temporary_tablespace_flipper.yy \    <==  not required
# --redefine=conf/mariadb/redefine_temporary_tables.yy \                                                               <==  not required
# --rounds=2 \                                                                                                                                                      <==  not required
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --no_mask \
# --queries=10000000 \
# --seed=random \
# --reporters=None \
# --reporters=ErrorLog \
# --reporters=Deadlock \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --mysqld=--loose-innodb_compression_level=1 \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --mysqld=--sql_mode=traditional \
# --mysqld=--innodb_file_per_table=1 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--innodb_stats_persistent=on \
# --mysqld=--innodb_adaptive_hash_index=off \
# --mysqld=--innodb_random_read_ahead=ON \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=2 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--wait \
# --mysqld=--innodb_undo_log_truncate=OFF \
# --mysqld=--loose_innodb_change_buffering=deletes \
# --mysqld=--innodb_undo_tablespaces=16 \
# --mysqld=--innodb_rollback_on_timeout=ON \
# --mysqld=--innodb_page_size=8K \
# --mysqld=--innodb-buffer-pool-size=256M \
# <local settings>
    



 Comments   
Comment by Elena Stepanova [ 2023-09-06 ]

Thanks.
The test case below was made from the original non-simplified source. As long as the simplified one was run with the same startup options, the grammar shouldn't matter, but it might matter a lot whether these failures were only observed with --mysqld=--transaction-isolation=SERIALIZABLE. For now I assume that it's SERIALIZABLE.

The test case is non-deterministic, run with --repeat=N. It usually fails for me within several attempts.

--source include/have_innodb.inc
 
CREATE TABLE t1 (a INT, b INT, KEY(b)) ENGINE=InnoDB;
 
--connect (con1,localhost,root,,test)
INSERT INTO t1 VALUES (1,1),(NULL,NULL),(3,3),(4,NULL),(NULL,5);
--send
  DELETE FROM t1 WHERE a IS NULL;
 
--connection default
SET lock_wait_timeout= 1;
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
--send
  ALTER TABLE t1 FORCE, ALGORITHM=COPY;
 
--connection con1
--reap
DELETE FROM t1 WHERE b IS NULL;
 
--connection default
--error 0,ER_LOCK_WAIT_TIMEOUT
--reap
 
DROP TABLE t1;

11.2 9ad7c899

mysqltest: At line 22: query 'reap' failed with wrong errno ER_KEY_NOT_FOUND (1032): 'Can't find record in 't1'', instead of  (0)...

Comment by Marko Mäkelä [ 2023-09-06 ]

I think that also using READ UNCOMMITTED could lead to completely wrong results. REPEATABLE READ and READ COMMITTED are hopefully equivalent here, but that should be checked by nikitamalyavin. Look for ReadView::open() (trx->read_view.open()).

Comment by Marko Mäkelä [ 2023-09-06 ]

For what it is worth, ha_innobase::check() is changing trx_t::isolation_level to TRX_ISO_REPEATABLE_READ for the duration of running CHECK TABLE on an InnoDB table.

Comment by Matthias Leich [ 2023-09-06 ]

I tried to replay the problem without setting the ISOLATION LEVEL and without the RQG validator.
      # 2023-09-06T07:07:41 [2001010] GenTest_e::Executor::MySQL::execute:  Thread18 Query: ALTER TABLE t3 ADD FULLTEXT KEY IF NOT EXISTS `ftidx1` ( col_text ), ALGORITHM = COPY /* E_R Thread18 QNO 538 CON_ID 49 */  failed: 1032 , errstr: Can't find record in 't3'
Afaik the ISOLATION LEVEL is REPEATABLE READ if not set otherwise.

Comment by Elena Stepanova [ 2023-09-07 ]

So far I couldn't reproduce the variation without SERIALIZABLE (the one with it was rather easily reproducible with Matthias'es original test).

For the sake of time-saving, as we want as much as possible to be fixed before GA, I suggest to fix the one with SERIALIZABLE. Maybe while working on it and creating a deterministic test case, Nikita will be able to extend the scenario onto normal isolation levels.

Meanwhile, I'll continue looking into the non-SERIALIZABLE test and hopefully will get it fail eventually. The original grammar involves virtual columns, maybe it's related somehow, we've seen lots of "Can't find record" errors in relation to those. Or, maybe it's just a really thin race condition.

Comment by Nikita Malyavin [ 2023-09-07 ]

The problem summarized by Elena is likely an isolation level problem. ReadView::open is never called. In my first versions, i was explicitly calling hton->start_transaction_with_consistent snapshots, but then because of problems with some other storage engines we reorganized the code make sure the snapshot has started implicitly, before initializing the online hooks and downgrading the lock, by calling read_record() once.

As it turns out, tx_isolation itself is not enough to have a proper table-wide isolation (I didn't know). We should either honor the user setting and update the table lock accordingly, or make sure it is properly ignored. I don't know yet, which one of these two ways is correct. I'm going to check how INPLACE behaves and do the same.

EDIT: DELETE ... WHERE a IS NULL would traverse every record in the order of clustered index, i.e. it would begin from (1,1).
At the same time, Online ALTER reads a single record before a lock, which also will be (1,1). So probably it would work for READ COMMITED, while SERIALIZEABLE should hold a X lock on it.

Comment by Nikita Malyavin [ 2023-09-11 ]

Here goes the deterministic test:

--source include/have_innodb.inc
 
create table t1 (a int, b int, key(b)) engine=innodb;
 
--connect (con1,localhost,root,,test)
insert into t1 values (1,1),(null,null),(3,3),(4,null),(null,5);
 
--connection default
set session transaction isolation level read committed;
set debug_sync= "alter_table_online_downgraded signal downgraded wait_for goalters";
 
--send
alter table t1 force, algorithm=copy;
 
--connection con1
set debug_sync= "now wait_for downgraded";
delete from t1 where b is null;
set debug_sync= "now signal goalters";
 
--connection default
--reap
 
drop table t1;

Few notes:
1. READ COMMITTED works as good as SERIALIZEABLE
2. One may find that the first DELETE, searching by a (WHERE a IS NULL) is missing in the test.
It is not possible to reproduce the test with the subj, as it just serializes with ALTER. Probably, the reason is that it makes searches by the clustered index instead of secondary key. It's surprising, though, that searching by key doesn't lock that much. Maybe marko can know why.

No #3, pardon

Comment by Elena Stepanova [ 2023-09-11 ]

1. READ COMMITTED works as good as SERIALIZEABLE

And READ UNCOMMITTED too (for completeness.).

Comment by Nikita Malyavin [ 2023-09-11 ]

So far, the INPLACE innodb's trx is initialized during the INPLACE ALTER TABLE with the default isolation_level, which is TRX_ISO_REPEATABLE_READ,
whereas COPY ALTER TABLE uses trx assigned to current thd.

(gdb) bt
#0  0x000055a77f5f567a in ha_innobase_inplace_ctx::ha_innobase_inplace_ctx (this=0x6290000fc1a8, prebuilt_arg=@0x61d000284e90: 0x620000020120, drop_arg=0x0, num_to_drop_arg=0, drop_fk_arg=0x0, num_to_drop_fk_arg=0, add_fk_arg=0x0, num_to_add_fk_arg=0, online_arg=true, heap_arg=0x61a000025898, new_table_arg=0x61800003b120, col_names_arg=0x0, add_autoinc_arg=18446744073709551615, autoinc_col_min_value_arg=0, autoinc_col_max_value_arg=0, allow_not_null_flag=false, page_compressed=false, page_compression_level_arg=0) at /home/nik/mariadb/storage/innobase/handler/handler0alter.cc:1118
#1  0x000055a77f59adbd in ha_innobase::prepare_inplace_alter_table (this=0x61d0002846b8, altered_table=0x7fd66f25d810, ha_alter_info=0x7fd66f25d080) at /home/nik/mariadb/storage/innobase/handler/handler0alter.cc:8621
#2  0x000055a77d511b7e in handler::ha_prepare_inplace_alter_table (this=0x61d0002846b8, altered_table=0x7fd66f25d810, ha_alter_info=0x7fd66f25d080) at /home/nik/mariadb/sql/handler.cc:5429
#3  0x000055a77e3b6060 in mysql_inplace_alter_table (thd=0x62b00017a218, table_list=0x6290000fa398, table=0x6190000c1798, altered_table=0x7fd66f25d810, ha_alter_info=0x7fd66f25d080, target_mdl_request=0x7fd66f25b6b0, ddl_log_state=0x7fd66f25b040, trigger_param=0x7fd66f25b8e0, alter_ctx=0x7fd66f25bd70, partial_alter=@0x7fd66f25b0c0: false, start_alter_id=@0x7fd66f25b0d0: 0, if_exists=false) at /home/nik/mariadb/sql/sql_table.cc:7679
#4  0x000055a77e39f4ff in mysql_alter_table (thd=0x62b00017a218, new_db=0x62b00017efe0, new_name=0x62b00017f430, create_info=0x7fd66f028020, table_list=0x6290000fa398, recreate_info=0x7fd66f028c80, alter_info=0x7fd66f028320, order_num=0, order=0x0, ignore=false, if_exists=false) at /home/nik/mariadb/sql/sql_table.cc:11020
#5  0x000055a77e64b1d6 in Sql_cmd_alter_table::execute (this=0x6290000fad10, thd=0x62b00017a218) at /home/nik/mariadb/sql/sql_alter.cc:615
#6  0x000055a77dffd17a in mysql_execute_command (thd=0x62b00017a218, is_called_from_prepared_stmt=false) at /home/nik/mariadb/sql/sql_parse.cc:5775
#7  0x000055a77dfdc10b in mysql_parse (thd=0x62b00017a218, rawbuf=0x6290000fa238 "alter table t1 add c int, add key(c), algorithm=inplace, lock=none", length=66, parser_state=0x7fd66f1677f0) at /home/nik/mariadb/sql/sql_parse.cc:7811
#8  0x000055a77dfd5b8e in dispatch_command (command=COM_QUERY, thd=0x62b00017a218, packet=0x629000276219 "alter table t1 add c int, add key(c), algorithm=inplace, lock=none", packet_length=66, blocking=true) at /home/nik/mariadb/sql/sql_parse.cc:1893
#9  0x000055a77dfddeda in do_command (thd=0x62b00017a218, blocking=true) at /home/nik/mariadb/sql/sql_parse.cc:1406
#10 0x000055a77e622536 in do_handle_one_connection (connect=0x608000002e38, put_in_cache=true) at /home/nik/mariadb/sql/sql_connect.cc:1445
#11 0x000055a77e621ccf in handle_one_connection (arg=0x608000002db8) at /home/nik/mariadb/sql/sql_connect.cc:1347

Comment by Nikita Malyavin [ 2023-09-11 ]

Please review branch bb-11.2-nikita

Comment by Sergei Golubchik [ 2023-09-22 ]

fb158d2fece is ok to push

Generated at Thu Feb 08 10:28:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.