[MDEV-13316] Unresolved conflict between OPTIMIZE TABLE and DML transaction, causing cluster hang Created: 2017-07-13  Updated: 2019-05-20  Resolved: 2019-05-20

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Galera, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.19
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Seppo Jaakola Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-13379 subsequent OPTIMIZEs on the same tabl... Closed

 Description   

MariaDB Galera Cluster node hangs for the duration of innodb_lock_wait_timeout, if there is an OPTIMIZE TABLE statement conflicting with regular DML transaction. The problem appears to be that OPTIMIZE TABLE tried to abort the victim DML transaction, but DML transaction is in innodb lock wait, and aborts only after the lock wait times out. Stack traces for these threads are as follows:

Thread 3 (Thread 0x7f1e7024fb00 (LWP 11581)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f1e838bc6e6 in os_cond_wait (fast_mutex=0x7f1e200273a0, 
    cond=0x7f1e200273d8)
    at /home/seppo/work/MariaDB/server/storage/xtradb/os/os0sync.cc:196
#2  os_event_wait_low (event=0x7f1e200273a0, reset_sig_count=326, 
    reset_sig_count@entry=0)
    at /home/seppo/work/MariaDB/server/storage/xtradb/os/os0sync.cc:477
#3  0x00007f1e8388945d in lock_wait_suspend_thread (thr=thr@entry=0x7f1e4804e718)
    at /home/seppo/work/MariaDB/server/storage/xtradb/lock/lock0wait.cc:342
#4  0x00007f1e8392f6af in row_mysql_handle_errors (
    new_err=new_err@entry=0x7f1e7024bbfc, trx=trx@entry=0x7f1e0406af88, 
    thr=thr@entry=0x7f1e4804e718, savept=savept@entry=0x0)
    at /home/seppo/work/MariaDB/server/storage/xtradb/row/row0mysql.cc:1173
#5  0x00007f1e8395f74a in row_search_for_mysql (
    buf=buf@entry=0x7f1e4804a7e8 "\341\302\001", mode=<optimized out>, 
    mode@entry=0, prebuilt=0x7f1e4804dfb8, match_mode=match_mode@entry=0, 
    direction=direction@entry=1)
    at /home/seppo/work/MariaDB/server/storage/xtradb/row/row0sel.cc:5198
#6  0x00007f1e8383c62e in ha_innodb::general_fetch (this=0x7f1e4804d378, 
    buf=0x7f1e4804a7e8 "\341\302\001", direction=direction@entry=1, 
    match_mode=match_mode@entry=0)
    at /home/seppo/work/MariaDB/server/storage/xtradb/handler/ha_innodb.cc:10437
#7  0x00007f1e8383c8d7 in ha_innodb::index_next (this=<optimized out>, 
    buf=<optimized out>)
    at /home/seppo/work/MariaDB/server/storage/xtradb/handler/ha_innodb.cc:10507
#8  0x000056500ab9fc32 in handler::ha_index_next (this=this@entry=0x7f1e4804d378, 
    buf=0x7f1e4804a7e8 "\341\302\001")
    at /home/seppo/work/MariaDB/server/sql/handler.cc:2666
#9  0x000056500aba527f in handler::read_range_next (this=0x7f1e4804d378)
    at /home/seppo/work/MariaDB/server/sql/handler.cc:5331
#10 0x000056500ab01e74 in handler::multi_range_read_next (this=0x7f1e4804d378, 
    range_info=range_info@entry=0x7f1e7024c658)
    at /home/seppo/work/MariaDB/server/sql/multi_range_read.cc:257
#11 0x000056500ab01f78 in Mrr_simple_index_reader::get_next (this=0x7f1e4804d980, 
    range_info=0x7f1e7024c658)
    at /home/seppo/work/MariaDB/server/sql/multi_range_read.cc:322
#12 0x000056500ab03806 in DsMrr_impl::dsmrr_next (this=0x7f1e4804d840, 
    range_info=0x7f1e7024c658)
    at /home/seppo/work/MariaDB/server/sql/multi_range_read.cc:1408
#13 0x00007f1e83827b92 in ha_innodb::multi_range_read_next (this=<optimized out>, 
    range_info=<optimized out>)
    at /home/seppo/work/MariaDB/server/storage/xtradb/handler/ha_innodb.cc:22465
#14 0x000056500ac8b6b0 in QUICK_RANGE_SELECT::get_next (this=0x7f1e0405fa70)
    at /home/seppo/work/MariaDB/server/sql/opt_range.cc:11184
#15 0x000056500acab554 in rr_quick (info=0x7f1e7024c830)
    at /home/seppo/work/MariaDB/server/sql/records.cc:349
#16 0x000056500aa89847 in mysql_update (thd=thd@entry=0x56500df30a50, 
    table_list=0x7f1e040c9fc0, fields=..., values=..., conds=0x7f1e040cd300, 
    order_num=<optimized out>, order=<optimized out>, limit=18446744073709551611, 
    handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7f1e7024cc00, 
    updated_return=0x7f1e7024cc90)
    at /home/seppo/work/MariaDB/server/sql/sql_update.cc:639
#17 0x000056500a9dfeaa in mysql_execute_command (thd=thd@entry=0x56500df30a50)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:3746
#18 0x000056500a9e6688 in mysql_parse (thd=thd@entry=0x56500df30a50, 
    rawbuf=rawbuf@entry=0x7f1e040c9d68 "UPDATE comm38 SET x = (x + 1) % 65537, y = ((y*x % 65537) + (41504*y % 65537) + (y*z % 65537)) % 65537, z = ((z*x % 65537) + (z*y % 65537) + (41504*z % 65537)) % 65537, r = RAND() WHERE r > 0.329837 A"..., 
    length=length@entry=215, parser_state=parser_state@entry=0x7f1e7024e5e0)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:7319
#19 0x000056500a9e71c6 in wsrep_mysql_parse (thd=thd@entry=0x56500df30a50, 
    rawbuf=0x7f1e040c9d68 "UPDATE comm38 SET x = (x + 1) % 65537, y = ((y*x % 65537) + (41504*y % 65537) + (y*z % 65537)) % 65537, z = ((z*x % 65537) + (z*y % 65537) + (41504*z % 65537)) % 65537, r = RAND() WHERE r > 0.329837 A"..., length=215, 
    parser_state=parser_state@entry=0x7f1e7024e5e0)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:7142
#20 0x000056500a9e8705 in dispatch_command (command=command@entry=COM_QUERY, 
    thd=thd@entry=0x56500df30a50, 
    packet=packet@entry=0x56500de10541 "UPDATE comm38 SET x = (x + 1) % 65537, y = ((y*x % 65537) + (41504*y % 65537) + (y*z % 65537)) % 65537, z = ((z*x % 65537) + (z*y % 65537) + (41504*z % 65537)) % 65537, r = RAND() WHERE r > 0.329837 A"..., 
    packet_length=packet_length@entry=215)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:1485
#21 0x000056500a9ead14 in do_command (thd=0x56500df30a50)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:1108
#22 0x000056500aac92d0 in do_handle_one_connection (
    thd_arg=thd_arg@entry=0x56500df30a50)
    at /home/seppo/work/MariaDB/server/sql/sql_connect.cc:1350
#23 0x000056500aac94bc in handle_one_connection (arg=arg@entry=0x56500df30a50)
    at /home/seppo/work/MariaDB/server/sql/sql_connect.cc:1262
#24 0x000056500ad26587 in pfs_spawn_thread (arg=0x56500de2d2e0)
    at /home/seppo/work/MariaDB/server/storage/perfschema/pfs.cc:1860
#25 0x00007f1e91df66ba in start_thread (arg=0x7f1e7024fb00)
    at pthread_create.c:333
#26 0x00007f1e914a13dd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
 
Thread 2 (Thread 0x7f1e70299b00 (LWP 11591)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000056500ae6daf9 in safe_cond_timedwait (cond=cond@entry=0x7f1e08014f98, 
    mp=mp@entry=0x7f1e4802e008, abstime=abstime@entry=0x7f1e70295f80, 
    file=file@entry=0x56500aefe398 "/home/seppo/work/MariaDB/server/include/mysql/psi/mysql_thread.h", line=line@entry=1191)
    at /home/seppo/work/MariaDB/server/mysys/thr_mutex.c:547
#2  0x000056500ae6acb7 in inline_mysql_cond_timedwait (src_line=586, 
    src_file=0x56500b03f710 "/home/seppo/work/MariaDB/server/mysys/thr_lock.c", 
    abstime=0x7f1e70295f80, mutex=0x7f1e4802e008, that=0x7f1e08014f98)
    at /home/seppo/work/MariaDB/server/include/mysql/psi/mysql_thread.h:1191
#3  wait_for_lock (wait=wait@entry=0x7f1e4802e0d8, 
    data=data@entry=0x7f1e140a8140, in_wait_list=in_wait_list@entry=1 '\001', 
    lock_wait_timeout=lock_wait_timeout@entry=31536000)
    at /home/seppo/work/MariaDB/server/mysys/thr_lock.c:586
#4  0x000056500ae6bbae in thr_lock (lock_wait_timeout=31536000, 
    owner=0x56500df278d8, data=0x7f1e140a8140)
    at /home/seppo/work/MariaDB/server/mysys/thr_lock.c:1021
#5  thr_multi_lock (data=0x7f1e08077120, count=<optimized out>, 
    owner=owner@entry=0x56500df278d8, 
    lock_wait_timeout=lock_wait_timeout@entry=31536000)
    at /home/seppo/work/MariaDB/server/mysys/thr_lock.c:1302
#6  0x000056500ac650ed in mysql_lock_tables (thd=thd@entry=0x56500df26070, 
    sql_lock=sql_lock@entry=0x7f1e08077100, flags=flags@entry=0)
    at /home/seppo/work/MariaDB/server/sql/lock.cc:320
#7  0x000056500ac65ea1 in mysql_lock_tables (thd=thd@entry=0x56500df26070, 
    tables=tables@entry=0x7f1e080770f8, count=count@entry=1, flags=flags@entry=0)
    at /home/seppo/work/MariaDB/server/sql/lock.cc:275
#8  0x000056500a98e5f5 in lock_tables (thd=thd@entry=0x56500df26070, 
    tables=0x7f1e080768a0, count=<optimized out>, flags=flags@entry=0)
    at /home/seppo/work/MariaDB/server/sql/sql_base.cc:5438
#9  0x000056500a9918b6 in open_and_lock_tables (thd=thd@entry=0x56500df26070, 
    options=..., tables=tables@entry=0x7f1e080768a0, derived=derived@entry=true, 
    flags=flags@entry=0, 
    prelocking_strategy=prelocking_strategy@entry=0x7f1e70296440)
    at /home/seppo/work/MariaDB/server/sql/sql_base.cc:5267
#10 0x000056500aad5c66 in open_and_lock_tables (flags=0, derived=true, 
    tables=0x7f1e080768a0, thd=0x56500df26070)
    at /home/seppo/work/MariaDB/server/sql/sql_base.h:534
#11 mysql_admin_table(THD *, TABLE_LIST *, HA_CHECK_OPT *, const char *, thr_lock_type, bool, bool, uint, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *), struct {...}, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *)) (thd=thd@entry=0x56500df26070, 
    tables=tables@entry=0x7f1e080768a0, check_opt=check_opt@entry=0x56500df2a8d0, 
    operator_name=operator_name@entry=0x56500af2b42b "optimize", 
    lock_type=lock_type@entry=TL_WRITE, 
    open_for_modify=open_for_modify@entry=true, repair_table_use_frm=false, 
    extra_open_options=0, prepare_func=0x0, operator_func=
    (int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x56500aba295e <handler::ha_optimize(THD*, st_ha_check_opt*)>, view_operator_func=0x0)
    at /home/seppo/work/MariaDB/server/sql/sql_admin.cc:465
#12 0x000056500aad7df1 in Sql_cmd_optimize_table::execute (this=<optimized out>, 
    thd=0x56500df26070) at /home/seppo/work/MariaDB/server/sql/sql_admin.cc:1308
#13 0x000056500a9e511c in mysql_execute_command (thd=thd@entry=0x56500df26070)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:5671
#14 0x000056500a9e6688 in mysql_parse (thd=thd@entry=0x56500df26070, 
    rawbuf=rawbuf@entry=0x7f1e080767b8 "optimize table test.comm38", 
    length=length@entry=26, parser_state=parser_state@entry=0x7f1e702985e0)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:7319
#15 0x000056500a9e71c6 in wsrep_mysql_parse (thd=thd@entry=0x56500df26070, 
    rawbuf=0x7f1e080767b8 "optimize table test.comm38", length=26, 
    parser_state=parser_state@entry=0x7f1e702985e0)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:7142
#16 0x000056500a9e8705 in dispatch_command (command=command@entry=COM_QUERY, 
    thd=thd@entry=0x56500df26070, packet=packet@entry=0x56500df35f41 "", 
    packet_length=packet_length@entry=26)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:1485
#17 0x000056500a9ead14 in do_command (thd=0x56500df26070)
    at /home/seppo/work/MariaDB/server/sql/sql_parse.cc:1108
#18 0x000056500aac92d0 in do_handle_one_connection (
    thd_arg=thd_arg@entry=0x56500df26070)
    at /home/seppo/work/MariaDB/server/sql/sql_connect.cc:1350
#19 0x000056500aac94bc in handle_one_connection (arg=arg@entry=0x56500df26070)
    at /home/seppo/work/MariaDB/server/sql/sql_connect.cc:1262
#20 0x000056500ad26587 in pfs_spawn_thread (arg=0x56500def29e0)
    at /home/seppo/work/MariaDB/server/storage/perfschema/pfs.cc:1860
#21 0x00007f1e91df66ba in start_thread (arg=0x7f1e70299b00)
    at pthread_create.c:333
#22 0x00007f1e914a13dd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

I have a test case which can reproduce the issue reliably, but I cannot share it. For reproducing following sinmple scenario will make the issue surface:

  • Start one MariaDB Galera Cluster 10.1.9 node
  • create ~30 simple innodb tables, and populate tables with ~100 rows
  • start 10 client sessions running transactions issuing a few single row updates randomly on the tables. Use proper MST transaction, not AC
  • start one client session cycling through tables issuing OPTIMIZE TABLE for each table in turn

The OPTIMIZE TABLE session will abort DML transactions, which usually abort gracefully. But, in my test rig, after ̃ ~10 minute test run I can observe a total hang for the duration of innodb_lock_wait_timeout. It makes sense to set innodb_lock_wait_timeout to high value so that the hang is easier to diagnose.

I have reproduced the issue with MariaDB 10.1.19, so far



 Comments   
Comment by Seppo Jaakola [ 2017-07-13 ]

I have a potential fix for this now, looks like wsrep patch merge into mariaDB was not complete in storage/innobase/lock0lock.cc. Need to test some more to see if no issues with longer test loads

Comment by Andrii Nikitin (Inactive) [ 2017-07-13 ]

Referring https://mariadb.com/kb/en/mariadb/mariadb-galera-cluster-known-limitations/ :

Locking is lax when DDL is involved. For example, if your DML transaction uses a table, and a parallel DDL statement is started, in the normal MySQL setup it would have waited for the metadata lock, but in Galera context it will be executed right away. It happens even if you are running a single node, as long as you configured it as a cluster node. See also MDEV-468. *This behavior might cause various side-effects, the consequences have not been investigated yet. Try to avoid such parallelism. *

I tend to close this as 'Not a bug'.
But what are you trying to achieve with OPTIMIZE TABLE here? Maybe NO_WRITE_TO_BINLOG clause will help to achieve the same without 'side effects'?

Comment by Seppo Jaakola [ 2017-07-13 ]

The use of OPTIMIZE TABLE comes from the end user use case, probably any DDL would cause the hang. I'm just trying to simulate their issue and come up with a fix.

Note that OPTIMIZE TABLE is pending in thr lock wait, so it is not able to start processing. It tried to abort the victim, but the victim transaction is waiting for innodb table level lock and keeps on waiting until innodb_lock_wait_timeout elapses. During this wait, nobody is able to commit, the node is totally silent for the default 50 secs. If this node is in a cluster, then because of flow control, the cluster would hang also.

Comment by Andrii Nikitin (Inactive) [ 2017-07-13 ]

Step aside this particular problem: I have quite extended experience with InnoDB and cannot imagine any practical scenario where regular OPTIMIZE TABLE may bring any practical benefits.
So maybe solution to business challenge is to not use Optimize command frequently (especially when mentioned 'limitations' section warns against concurrent DDL and transactions).

Otherwise, if you can come with InnoDB patch - I expect that developers will eventually review and probably accept it. (Especially if you are totally sure that mentioned limitations are not applicable to this particular scenario and the issue may be fixed). Without such reasoning I don't have enough ground to verify the bug considering part "This behavior might cause various side-effects, the consequences have not been investigated yet. Try to avoid such parallelism".

Comment by Andrii Nikitin (Inactive) [ 2017-07-24 ]

seppo (While working on different case) I see concurrency problem in MYSQL_BIN_LOG::rotate - it checks `if (wsrep_to_isolation)` without any mutex guard, thus logic related to TOI and checkpoint in log.cc is unreliable.
(The code is very complicated for me, so it is hard to tell consequences of this problem.)

If I insert proper sleep() calls around that variable - I can run into complete cluster deadlock with concurrent flush tables / optimize table (which I still should report when ready).
But do you think it may be related to what you see in this ticket?

Comment by Andrii Nikitin (Inactive) [ 2017-07-25 ]

I've opened MDEV-13379 for case I observed, - maybe it is duplicate of this one - it just is hanging forever and doesn't honour innodb_lock_wait_timeout

Comment by Andrii Nikitin (Inactive) [ 2017-09-07 ]

Just for reference - I can't reproduce described problem with script below, processlist doesn't show any major stalls (all commits happen within few secods):

ver=${ENVIRON:-10.1.19}
 
# just use current directory if called from framework
if [ ! -f common.sh ] ; then
  [ -d mariadb-environs ] || git clone http://github.com/AndriiNikitin/mariadb-environs
  cd mariadb-environs
  ./get_plugin.sh galera
fi
 
_template/plant_cluster.sh cluster1
echo m0 > cluster1/nodes.lst
# echo m1 >> cluster1/nodes.lst
# echo m2 >> cluster1/nodes.lst
cluster1/replant.sh ${ver}
 
./build_or_download.sh m0
 
. cluster1/gen_cnf.sh innodb_flush_log_at_trx_commit=2 innodb_lock_wait_timeout=1000
. cluster1/install_db.sh
. cluster1/galera_setup_acl.sh
. cluster1/galera_start_new.sh $WSREP_EXTRA_OPT
 
sleep 45
. cluster1/galera_cluster_size.sh
 
for i in {1..30}; do
  m0*/sql.sh create table t$i select seq as a, seq as b from seq_1_to_100
  m0*/sql.sh "alter table t$i add unique index(a)"
done
 
trap 'kill $(jobs -p)' EXIT
 
for i in {1..30}; do
  export I=$i
  ( while : ; do m0*/sql.sh "start transaction; update t$I set b=b+1 where a=$I; delete from t$I where a=2*$I; commit" 2>/dev/null || : ; done; )&
done
 
( while : ; do 
for i in {1..30}; do m0*/sql.sh optimize table t$i 2>/dev/null ; done
done ) &
 
while : ; do
m0*/sql.sh show processlist && sleep 10
done

Generated at Thu Feb 08 08:04:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.