[MDEV-9679] main.delayed fails sporadically Created: 2016-03-03  Updated: 2016-04-26  Resolved: 2016-04-26

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Tests
Affects Version/s: 10.0, 10.1
Fix Version/s: 10.0.25, 10.1.13

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None

Sprint: 10.1.13

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-deb-precise-amd64/builds/5724/steps/test_4/logs/stdio

main.delayed                             w4 [ fail ]  timeout after 900 seconds
        Test ended at 2016-02-18 03:34:18
 
Test case timeout after 900 seconds
 
== /run/shm/var/4/log/delayed.log == 
INSERT DELAYED INTO t2 VALUES (1);
ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
UNLOCK TABLES;
# Connection default
# Reaping: INSERT DELAYED INTO t1 VALUES (1)
UNLOCK TABLES;
# Test 2: Using ALTER TABLE
START TRANSACTION;
SELECT * FROM t1 WHERE a=0;
a
# Connection con1
# Sending:
ALTER TABLE t1 COMMENT 'test';
# Connection default
# Wait until ALTER TABLE is blocked on table 't1'.
INSERT DELAYED INTO t1 VALUES (3);
ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
COMMIT;
# Connection con1
# Reaping: ALTER TABLE t1 COMMENT 'test'
 
 == /run/shm/var/4/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/run/shm/var/tmp/4/mysqld.1.sock' (111 "Connection refused")

I got the same problem by running this test alone with --repeat=100 on 10.0.24.

MariaDB [test]> show processlist;
+------+---------+-----------------+------+---------+------+---------------------------------+-------------------------------+----------+
| Id   | User    | Host            | db   | Command | Time | State                           | Info                          | Progress |
+------+---------+-----------------+------+---------+------+---------------------------------+-------------------------------+----------+
| 1303 | root    | localhost       | test | Sleep   |  297 |                                 | NULL                          |    0.000 |
| 1318 | root    | localhost       | test | Query   |  297 | Waiting for table metadata lock | ALTER TABLE t1 COMMENT 'test' |    0.000 |
| 1319 | DELAYED | localhost       | test | Killed  |  297 | Waiting for table level lock    |                               |    0.000 |
| 1320 | root    | localhost:40374 | test | Query   |    0 | init                            | show processlist              |    0.000 |
+------+---------+-----------------+------+---------+------+---------------------------------+-------------------------------+----------+
4 rows in set (0.00 sec)

Thread 3 (Thread 0x7f3358feb700 (LWP 13595)):
#0  0x00007f3358466408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000069a0d5 in inline_mysql_cond_timedwait (abstime=<optimized out>, mutex=<optimized out>, that=0x7f3351fe6130, src_file=<optimized out>, src_line=<optimized out>) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1191
#2  MDL_wait::timed_wait (this=this@entry=0x7f3351fe6100, owner=0x7f3351fe60b8, abs_timeout=abs_timeout@entry=0x7f3358fe6c20, set_status_on_timeout=set_status_on_timeout@entry=false, wait_state_name=<optimized out>) at /home/buildbot/buildbot/build/sql/mdl.cc:1435
#3  0x000000000069b130 in MDL_context::acquire_lock (this=this@entry=0x7f3351fe6100, mdl_request=mdl_request@entry=0x7f3358fe6c80, lock_wait_timeout=lock_wait_timeout@entry=31536000) at /home/buildbot/buildbot/build/sql/mdl.cc:2375
#4  0x000000000069b97e in MDL_context::upgrade_shared_lock (this=this@entry=0x7f3351fe6100, mdl_ticket=0x7f334fc0f080, new_type=new_type@entry=MDL_EXCLUSIVE, lock_wait_timeout=31536000) at /home/buildbot/buildbot/build/sql/mdl.cc:2568
#5  0x0000000000587fd6 in wait_while_table_is_used (thd=thd@entry=0x7f3351fe6008, table=table@entry=0x7f335005b408, function=function@entry=HA_EXTRA_PREPARE_FOR_RENAME) at /home/buildbot/buildbot/build/sql/sql_base.cc:1834
#6  0x00000000006521ac in mysql_inplace_alter_table (alter_ctx=0x7f32ffffffc6, inplace_supported=HA_ALTER_INPLACE_NO_LOCK, ha_alter_info=0x7f3358fe7020, altered_table=0x7f334fcd5008, table=0x7f335005b408, table_list=0x7f334fc10108, thd=0x7f3351fe6008, target_mdl_request=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_table.cc:7081
#7  mysql_alter_table (thd=0x7f3300000006, thd@entry=0x7f3351fe6008, new_db=<optimized out>, new_name=<optimized out>, create_info=0x7f3354bfac08, create_info@entry=0x7f3358fe88a0, table_list=table_list@entry=0x7f334fc10108, alter_info=0x7f3354bfac08, alter_info@entry=0x7f3358fe8810, order_num=0, order=0x0, ignore=false) at /home/buildbot/buildbot/build/sql/sql_table.cc:8866
#8  0x00000000006951d1 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x7f3351fe6008) at /home/buildbot/buildbot/build/sql/sql_alter.cc:312
#9  0x00000000005cbb88 in mysql_execute_command (thd=thd@entry=0x7f3351fe6008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:5125
#10 0x00000000005d0897 in mysql_parse (thd=thd@entry=0x7f3351fe6008, parser_state=parser_state@entry=0x7f3358fea5d0, length=<optimized out>, rawbuf=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_parse.cc:6565
#11 0x00000000005d2cb8 in dispatch_command (command=COM_QUERY, thd=0x7f3351fe6008, packet=0x7f334fc10020 "ALTER TABLE t1 COMMENT 'test'", packet_length=29) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1780
#12 0x00000000005d2e53 in do_command (thd=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_parse.cc:998
#13 0x00000000006918f3 in do_handle_one_connection (thd_arg=thd_arg@entry=0x7f3351fe6008) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1377
#14 0x00000000006919c2 in handle_one_connection (arg=arg@entry=0x7f3351fe6008) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1292
#15 0x0000000000a7425d in pfs_spawn_thread (arg=0x7f335703f808) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1860
#16 0x00007f33584620a4 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f3357b1504d in clone () from /lib64/libc.so.6
 
Thread 2 (Thread 0x7f3358fa2700 (LWP 13596)):
#0  0x00007f3358466408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000bc54c1 in inline_mysql_cond_timedwait (abstime=0x7f3358fa1c10, mutex=0x7f334fc478a0, that=0x7f334f80e108, src_file=<optimized out>, src_line=<optimized out>) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1191
#2  wait_for_lock (data=data@entry=0x7f334fc13e68, in_wait_list=in_wait_list@entry=1 '\001', lock_wait_timeout=lock_wait_timeout@entry=31536000, wait=0x7f334fc478f0) at /home/buildbot/buildbot/build/mysys/thr_lock.c:570
#3  0x0000000000bc6b6f in thr_upgrade_write_delay_lock (data=0x7f334fc13e68, new_lock_type=TL_WRITE, lock_wait_timeout=31536000) at /home/buildbot/buildbot/build/mysys/thr_lock.c:1493
#4  0x00000000005b6eea in Delayed_insert::handle_inserts (this=this@entry=0x7f335023c008) at /home/buildbot/buildbot/build/sql/sql_insert.cc:3042
#5  0x00000000005b8354 in handle_delayed_insert (arg=arg@entry=0x7f335023c008) at /home/buildbot/buildbot/build/sql/sql_insert.cc:2932
#6  0x0000000000a7425d in pfs_spawn_thread (arg=0x7f3350183808) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1860
#7  0x00007f33584620a4 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f3357b1504d in clone () from /lib64/libc.so.6

From buildbot history it seems that the problem was introduced recently, possibly by this push:
http://buildbot.askmonty.org/buildbot/builders/kvm-tarbake-jaunty-x86/builds/9663

I did not check if it's indeed so, though.



 Comments   
Comment by Elena Stepanova [ 2016-03-03 ]

Unfortunately it does not happen on my machine consistently enough to pinpoint the exact commit, I suspected 9b9522a8b5952c2ce9138f13112cc4fd3efac930 , but I haven't got the problem after 1000 test runs on this commit.

Comment by Otto Kekäläinen [ 2016-03-03 ]

Also occured in these builds:
https://tests.reproducible-builds.org/logs/unstable/armhf/mariadb-10.0_10.0.24-1.build2.log.gz
https://tests.reproducible-builds.org/rbuild/unstable/amd64/mariadb-10.0_10.0.24-1.rbuild.log
https://buildd.debian.org/status/fetch.php?pkg=mariadb-10.0&arch=s390x&ver=10.0.24-1&stamp=1456018445

I disabled it in Debian/Ubuntu to get the .24 update forward, as this has been confirmed to be a build environment related thing only.
https://github.com/ottok/mariadb-10.0/commit/6db9e089ba97e4bd82867f7993d1daa78beef393

Comment by Michael Widenius [ 2016-04-25 ]

There is still a race condition in delayed.test if ALTER TABLE gets it's thd_locks before the insert delayed thread has a chance to process the rows.

Comment by Michael Widenius [ 2016-04-26 ]

Problem was that notify_shared_lock() didn't abort an insert delayed thread
if it was in thr_upgrade_write_delay_lock().

ALTER TABLE first takes a weak_mdl_lock, then a thr_lock and then tries to upgrade
the mdl_lock.
Delayed insert thread first takes a mdl lock followed by a
thr_upgrade_write_delay_lock()
This caused insert delay to wait for alter table in thr_lock, while
alter table was waiting for the mdl lock by insert delay.

Fixed by telling mdl to run thr_lock_abort() for the insert delay thread table.
We also set thd->mysys_var->abort to 1 for the delay thread when it's killed
by alter table to ensure it doesn't ever get locked in thr_lock

Generated at Thu Feb 08 07:36:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.