[MDEV-30983] ALTER TABLE times out, but does not roll back the operation Created: 2023-04-01  Updated: 2024-01-02  Resolved: 2024-01-02

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 11.0
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Nikita Malyavin
Resolution: Cannot Reproduce Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-31957 Concurrent ALTER and ANALYZE collecti... Closed

 Description   

In the test case below, one connection performs ALTER, while another has something to do with statistical tables. Every ones in a while, ALTER fails with a timeout. Apparently the timeout concerns statistical tables, as there seems to be no other relation between connections. Even though the ALTER fails, the modification it performed remains, e.g. in this case the column remains renamed.

The practical effect is, for example, further replication discrepancy: since ALTER fails, it is not written in the binlog, so table remains unchanged on the slave, which causes further discrepancies.

Notes:

  • the test case is very non-deterministic, run with big enough --repeat=N. Do not put it into the regression suite, create a deterministic one instead! It usually fails for me within ~20 attempts, but it can vary a lot on different machines and builds;
  • sequences and prepared statement are not important, they are just used to create tables with many columns;
  • tables can be (at least) MyISAM, Aria, or InnoDB, although it seems to take a bit longer with MyISAM;
  • could not reproduce on the baseline.

--source include/have_sequence.inc
 
select concat('create table t1 (',group_concat(concat('c',seq,' int')),')') into @create_tbl from seq_1_to_300;
execute immediate @create_tbl;
create table t2 like t1;
 
--connect (con1,localhost,root,,)
--send
  analyze table t1 persistent for all;
 
--connection default
--error 0,ER_LOCK_WAIT_TIMEOUT
  alter table t2 nowait rename column c298 TO c298_renamed;
if ($mysql_errno)
{
  show create table t2;
  --die ALTER failed with ER_LOCK_WAIT_TIMEOUT, check whether `c298` column is still there
}
 
drop table t1, t2;

bb-11.0-oalter e599b8b4586

show create table t2;
Table	Create Table
t2	CREATE TABLE `t2` (
  `c1` int(11) DEFAULT NULL,
  `c2` int(11) DEFAULT NULL,
...
  `c298_renamed` int(11) DEFAULT NULL,
  `c299` int(11) DEFAULT NULL,
  `c300` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
 
mysqltest: At line 17: ALTER failed with ER_LOCK_WAIT_TIMEOUT, check whether `c298` column is still there



 Comments   
Comment by Nikita Malyavin [ 2023-04-06 ]

This happens under INPLACE:

(gdb) l
10821	      inplace_alter_table_committed_argument=
10822	        ha_alter_info.inplace_alter_table_committed_argument;
10823	      if (res)
10824	      {
10825	        cleanup_table_after_inplace_alter(&altered_table);
10826	        goto err_cleanup;
10827	      }
10828	      cleanup_table_after_inplace_alter_keep_files(&altered_table);
10829	
10830	      goto end_inplace;
(gdb) p (char*)thd->m_stmt_da->m_message
$3 = 0x62b000180212 "Lock wait timeout exceeded; try restarting transaction"
(gdb) bt
#0  mysql_alter_table (thd=0x62b00017a218, new_db=0x62b00017ef08, new_name=0x62b00017f350, create_info=0x7fd272f4a020, table_list=0x6290000fa550, recreate_info=0x7fd272f4ac30, alter_info=0x7fd272f4a320, order_num=0, order=0x0, ignore=false, if_exists=false) at /home/nik/mariadb/sql/sql_table.cc:10826
#1  0x00005608e9c6dcca in Sql_cmd_alter_table::execute (this=0x6290000facf8, thd=0x62b00017a218) at /home/nik/mariadb/sql/sql_alter.cc:563
#2  0x00005608e96e9600 in mysql_execute_command (thd=0x62b00017a218, is_called_from_prepared_stmt=false) at /home/nik/mariadb/sql/sql_parse.cc:6001
#3  0x00005608e96c942f in mysql_parse (thd=0x62b00017a218, rawbuf=0x6290000fa238 "alter table t2 nowait rename column c298 TO c298_renamed", length=56, parser_state=0x7fd2730db7b0) at /home/nik/mariadb/sql/sql_parse.cc:8000
#4  0x00005608e96c3ca4 in dispatch_command (command=COM_QUERY, thd=0x62b00017a218, packet=0x629000271219 "alter table t2 nowait rename column c298 TO c298_renamed", packet_length=56, blocking=true) at /home/nik/mariadb/sql/sql_parse.cc:1894
#5  0x00005608e96ca725 in do_command (thd=0x62b00017a218, blocking=true) at /home/nik/mariadb/sql/sql_parse.cc:1407
#6  0x00005608e9c4f436 in do_handle_one_connection (connect=0x6080000039b8, put_in_cache=true) at /home/nik/mariadb/sql/sql_connect.cc:1416
#7  0x00005608e9c4ec0f in handle_one_connection (arg=0x608000003938) at /home/nik/mariadb/sql/sql_connect.cc:1318
#8  0x00005608ea3d1ce9 in pfs_spawn_thread (arg=0x617000008218) at /home/nik/mariadb/storage/perfschema/pfs.cc:2201
#9  0x00007fd28969ebb5 in ?? () from target:/usr/lib/libc.so.6
#10 0x00007fd289720cb4 in clone () from target:/usr/lib/libc.so.6

Comment by Nikita Malyavin [ 2023-04-06 ]

This was successfully reproduced on the most recent common with 11.0 commit: 799f7595 bump the maturity

Comment by Nikita Malyavin [ 2023-04-06 ]

I've also reproduced it on fresh 11.0 8e55d7ea4a2f9. After failure, c298_renamed was visible to SHOW CREATE.

I didn't investigate which versions are affected.

marko please adjust the priority of your choice

Comment by Marko Mäkelä [ 2023-04-11 ]

nikitamalyavin, can you please reproduce an rr replay trace of this on a shared machine so that it can be analyzed further? I failed to reproduce any failure with the posted mtr test case on a 11.1 based branch, even after adjusting the test to create InnoDB tables.

Note: The ANALYZE TABLE t1 PERSISTENT FOR ALL is not InnoDB statistics, but "engine independent table statistics". If there is a locking problem, it likely resides somewhere outside InnoDB.

Related to InnoDB persistent statistics, there is a bug MDEV-29566, which might play a role here. To rule it out, please try creating the tables with STATS_PERSISTENT=0.

Comment by Marko Mäkelä [ 2023-04-12 ]

Actually, in the end I too reproduced this on a 11.1 based branch once, but without rr record.

Comment by Nikita Malyavin [ 2023-05-16 ]

This bug wasn't related to innodb's inplace, but to a generic inplace (as it shows up, myisam and maria can also do renaming), and, as predicted by marko, is related to "engine independent table statistics".

The lock timeout is reported, because the stat table is opened by ANALYZE TABLE, and a table was still ALTERed, because the proper error handling was missing in the alter table code.

Comment by Nikita Malyavin [ 2023-05-16 ]

serg can we just ignore the error from rename_column_in_stat_tables?

I see that thd->clear_error() was there until commit 313855766f3c

Comment by Nikita Malyavin [ 2023-05-22 ]

Note: the fix with a proper handing (bailout) is here
(it can be thd->clear_error() instead)

the deterministic test:

create table t1 (c1 int, c2 int, c3 int);
create table t2 like t1;
 
--connect (con1,localhost,root,,)
set debug_sync= "statistics_update_table_opened SIGNAL open WAIT_FOR done";
--send
analyze table t1 persistent for all;
 
--connection default
set debug_sync= "now WAIT_FOR open";
--error ER_LOCK_WAIT_TIMEOUT
alter table t2 nowait rename column c3 TO c3_renamed;
show warnings;
 
show create table t2;
 
set debug_sync= "now SIGNAL done";
 
drop table t1, t2;

Comment by Nikita Malyavin [ 2024-01-02 ]

It was likely fixed by MDEV-31957

Generated at Thu Feb 08 10:20:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.