[MDEV-20225] MDL BF-BF conflict of DROP TRIGGER vs INSERT ... SELECT for the table with trigger vs DELETE from table trigger logs to Created: 2019-07-31  Updated: 2022-01-10  Resolved: 2019-10-01

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.6
Fix Version/s: 10.4.9

Type: Bug Priority: Major
Reporter: Valerii Kravchuk Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File galera_trigger.test     File mysqld.1.err.gz    

 Description   

The following conflict happens on MariaDB 10.4.6 (schema details removed/anonymized:

2019-07-30 0:33:47 42 [Note] WSREP: MDL BF-BF conflict
schema: db
request: (42 seqno 2086357067 wsrep (toi, exec, committed) cmd 0 102 DROP TRIGGER IF EXISTS aud_tr_t)
granted: (37 seqno 2086357068 wsrep (high priority, exec, executing) cmd 0 161 INSERT INTO t ... SELECT ... FROM t1, t2 WHERE ... ON DUPLICATE KEY
UPDATE c1 = VALUES(...)
2019-07-30 0:33:47 42 [ERROR] Aborting
 
2019-07-30 0:34:07 0 [Warning] WSREP: 0x55859a56dea8 down context(s) not set
2019-07-30 0:34:07 0 [Warning] WSREP: Failed to send state UUID: -107 (Transport endpoint is not connected)

On node1 we see in the processlist (eventually threads were killed):

...
1320412 jobs ...:7678 db Killed 3959 wsrep replaying trx DELETE FROM audit_change WHERE id = 459275 0.000
1324781 jobs ...l:18998 db Killed 3959 wsrep replaying trx INSERT INTO t ... SELECT ... FROM t1, t2 WHERE ... ON DUPLICATE KEY UPDATE c1 = VALUES(...) 0.000
1324822 jobs ...:19118 db Killed 3959 Commit_implicit DROP TRIGGER IF EXISTS aud_tr_t 0.000
...

Trigger works with that audit_change table.

On node2 we see:

...
27 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
28 system user NULL Sleep 4004 After apply log event NULL 0.000
29 system user edoc_live_md Sleep 4004 Commit_implicit DROP TRIGGER IF EXISTS aud_tr_t 0.000
30 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
31 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
32 system user NULL Sleep 4004 After apply log event NULL 0.000
35 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
33 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
36 system user NULL Sleep 4004 committing NULL 0.000
37 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
38 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
39 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
40 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
41 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
42 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
43 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
44 system user NULL Sleep 4004 After apply log event NULL 0.000
45 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
46 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
47 system user NULL Sleep 4004 Opening tables INSERT INTO t ... SELECT ... FROM t1, t2 WHERE ... ON DUPLICATE KEY UPDATE c1 = VALUES(...) 0.000
49 system user NULL Sleep 4004 wsrep applier committed NULL 0.000
...

We have:

wsrep_slave_threads = 64
wsrep_retry_autocommit = 50
wsrep_provider_options = "gcache.size=10G; gmcast.segment=2; evs.send_window=256; evs.user_send_window=64;gcs.fc_limit=512;gcs.fc_factor=0.9;evs.keepalive_period=PT3S; evs.suspect_timeout=PT15S; evs.inactive_timeout=PT30S; evs.install_timeout=PT30S; evs.inactive_check_period=PT5S; evs.join_retrans_period=PT2S"

I'd expect metadata locks to be set on applier node and prevent attempt to execute DROP TRIGGER in parallel, further fail and retries etc. IMHO it is a bug.



 Comments   
Comment by Jan Lindström (Inactive) [ 2019-09-09 ]

I was able to repeat MDL BF-BF conflict using attached test case and with default parameters. I attached error logs from both servers.

Comment by Seppo Jaakola [ 2019-09-10 ]

The mtr test reveals a problem in BF aborting a stored procedure execution, which happens in too low level inside SP execution loop.
There is a potential fix for this issue, and is currently under testing.

However, the actual issue reported here is about conflict between DROP TRIGGER and INSERT...SELECT executions. Both executIons are marked as high priority, and MariaDB should not allow these to be applied parallelly, in the first place.

More detailed information is needed for reproducing this issue:

  • in which node was DROP TRIGGER statement entered?
  • in which node was INSERT..SELECT statement entered?
  • in which node was this conflict observed?
  • were other nodes able to apply both DROP TRIGGER and INSERT?
  • what was the impact of this BF conflict, e.g. did one or several nodes terminate?
  • Did the INSERT...SELECT statement execution cause a trigger to fire, and if so, was it the same trigger as was dropped?
Comment by Valerii Kravchuk [ 2019-09-10 ]

As for the impact, all 3 nodes went non-Primary.

I'll try to get the details to answer your other questions.

Comment by Seppo Jaakola [ 2019-09-10 ]

Another fix was pushed in our PR, with the purpose of preventing parallel applying of DROP TRIGGER and overlapping DML. However, there is no test script yet for this potential error scenario

Comment by Jan Lindström (Inactive) [ 2019-09-30 ]

seppo I tried your PR with attached test case and it still crashes. Now differently and it seems that we try to send result set after actual SP-execution has been terminated maybe because of BF kill or after a error. Here is stack trace from my laptop:

(gdb) where
#0  __pthread_kill (threadid=<optimized out>, signo=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000558ffee8e41b in my_write_core (sig=11) at /home/jan/mysql/10.4/mysys/stacktrace.c:481
#2  0x0000558ffe61465a in handle_fatal_signal (sig=11) at /home/jan/mysql/10.4/sql/signal_handler.cc:343
#3  <signal handler called>
#4  0x0000558ffe1a0617 in Protocol::valid_handler (this=0x7f7ca0001328, pos=0, type=PROTOCOL_SEND_LONG) at /home/jan/mysql/10.4/sql/protocol.h:48
#5  0x0000558ffe19d9de in Protocol_text::store_long (this=0x7f7ca0001328, from=0) at /home/jan/mysql/10.4/sql/protocol.cc:1200
#6  0x0000558ffe4e2ac8 in Type_handler::Item_send_long (this=0x558fff83db88 <type_handler_long>, item=0x7f7ca00f8800, protocol=0x7f7ca0001328, buf=0x7f7d21a07ad0) at /home/jan/mysql/10.4/sql/sql_type.cc:6751
#7  0x0000558ffe4ef48c in Type_handler_long::Item_send (this=0x558fff83db88 <type_handler_long>, item=0x7f7ca00f8800, protocol=0x7f7ca0001328, buf=0x7f7d21a07ad0) at /home/jan/mysql/10.4/sql/sql_type.h:4876
#8  0x0000558ffe1a2f16 in Item::send (this=0x7f7ca00f8800, protocol=0x7f7ca0001328, buffer=0x7f7d21a07ad0) at /home/jan/mysql/10.4/sql/item.h:1034
#9  0x0000558ffe19d071 in Protocol::send_result_set_row (this=0x7f7ca0001328, row_items=0x7f7ca004c3f0) at /home/jan/mysql/10.4/sql/protocol.cc:1035
#10 0x0000558ffe2513da in select_send::send_data (this=0x7f7ca00fe178, items=...) at /home/jan/mysql/10.4/sql/sql_class.cc:3003
#11 0x0000558ffe31699a in JOIN::exec_inner (this=0x7f7ca00fe1a0) at /home/jan/mysql/10.4/sql/sql_select.cc:4254
#12 0x0000558ffe316242 in JOIN::exec (this=0x7f7ca00fe1a0) at /home/jan/mysql/10.4/sql/sql_select.cc:4168
#13 0x0000558ffe317949 in mysql_select (thd=0x7f7ca0000d60, tables=0x0, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147749632, result=0x7f7ca00fe178, unit=0x7f7ca00fc0e0, select_lex=0x7f7ca004c2a8) at /home/jan/mysql/10.4/sql/sql_select.cc:4600
#14 0x0000558ffe307a67 in handle_select (thd=0x7f7ca0000d60, lex=0x7f7ca00fc018, result=0x7f7ca00fe178, setup_tables_done_option=0) at /home/jan/mysql/10.4/sql/sql_select.cc:408
#15 0x0000558ffe2cea8c in execute_sqlcom_select (thd=0x7f7ca0000d60, all_tables=0x0) at /home/jan/mysql/10.4/sql/sql_parse.cc:6352
#16 0x0000558ffe2c41c4 in mysql_execute_command (thd=0x7f7ca0000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:3894
#17 0x0000558ffe1d71e4 in sp_instr_stmt::exec_core (this=0x7f7ca00f90a8, thd=0x7f7ca0000d60, nextp=0x7f7d21a09204) at /home/jan/mysql/10.4/sql/sp_head.cc:3666
#18 0x0000558ffe1d6539 in sp_lex_keeper::reset_lex_and_exec_core (this=0x7f7ca00f90f0, thd=0x7f7ca0000d60, nextp=0x7f7d21a09204, open_tables=false, instr=0x7f7ca00f90a8) at /home/jan/mysql/10.4/sql/sp_head.cc:3394
#19 0x0000558ffe1d6da4 in sp_instr_stmt::execute (this=0x7f7ca00f90a8, thd=0x7f7ca0000d60, nextp=0x7f7d21a09204) at /home/jan/mysql/10.4/sql/sp_head.cc:3572
#20 0x0000558ffe1d052c in sp_head::execute (this=0x7f7ca0025398, thd=0x7f7ca0000d60, merge_da_on_success=true) at /home/jan/mysql/10.4/sql/sp_head.cc:1346
#21 0x0000558ffe1d3345 in sp_head::execute_procedure (this=0x7f7ca0025398, thd=0x7f7ca0000d60, args=0x7f7ca0005b48) at /home/jan/mysql/10.4/sql/sp_head.cc:2347
#22 0x0000558ffe2c15c5 in do_execute_sp (thd=0x7f7ca0000d60, sp=0x7f7ca0025398) at /home/jan/mysql/10.4/sql/sql_parse.cc:3014
#23 0x0000558ffe2c22b0 in Sql_cmd_call::execute (this=0x7f7ca0020b00, thd=0x7f7ca0000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:3256
#24 0x0000558ffe2cd502 in mysql_execute_command (thd=0x7f7ca0000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:6094
#25 0x0000558ffe2d2b8b in mysql_parse (thd=0x7f7ca0000d60, rawbuf=0x7f7ca001f2f8 "CALL p2()", length=9, parser_state=0x7f7d21a0b150, is_com_multi=false, is_next_command=false) at /home/jan/mysql/10.4/sql/sql_parse.cc:7912
#26 0x0000558ffe2d218b in wsrep_mysql_parse (thd=0x7f7ca0000d60, rawbuf=0x7f7ca001f2f8 "CALL p2()", length=9, parser_state=0x7f7d21a0b150, is_com_multi=false, is_next_command=false) at /home/jan/mysql/10.4/sql/sql_parse.cc:7716
#27 0x0000558ffe2bde1c in dispatch_command (command=COM_QUERY, thd=0x7f7ca0000d60, packet=0x7f7ca00171c1 "", packet_length=10, is_com_multi=false, is_next_command=false) at /home/jan/mysql/10.4/sql/sql_parse.cc:1826
#28 0x0000558ffe2bc5f1 in do_command (thd=0x7f7ca0000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:1358
#29 0x0000558ffe441490 in do_handle_one_connection (connect=0x5590020a1040) at /home/jan/mysql/10.4/sql/sql_connect.cc:1412
#30 0x0000558ffe4411df in handle_one_connection (arg=0x5590020a1040) at /home/jan/mysql/10.4/sql/sql_connect.cc:1316
#31 0x0000558ffee1ca60 in pfs_spawn_thread (arg=0x5590020f3930) at /home/jan/mysql/10.4/storage/perfschema/pfs.cc:1862
#32 0x00007f7d30ffa182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#33 0x00007f7d30873b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) p field_handlers
$1 = (const Type_handler **) 0x7f7ca00358e8
(gdb) p *field_handlers
$2 = (const Type_handler *) 0xa5a5a5a5a5a5a5a5
(gdb) p type
$3 = PROTOCOL_SEND_LONG
(gdb) p field_pos
$4 = 0

Comment by Jan Lindström (Inactive) [ 2019-09-30 ]

Attached latest error log here. Second type of crash I had contains following stack:

(gdb) where
#0  __pthread_kill (threadid=<optimized out>, signo=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000558fded3e41b in my_write_core (sig=11) at /home/jan/mysql/10.4/mysys/stacktrace.c:481
#2  0x0000558fde4c465a in handle_fatal_signal (sig=11) at /home/jan/mysql/10.4/sql/signal_handler.cc:343
#3  <signal handler called>
#4  0x0000558fde05061e in Protocol::valid_handler (this=0x7fb79c001328, pos=0, type=PROTOCOL_SEND_LONG) at /home/jan/mysql/10.4/sql/protocol.h:48
#5  0x0000558fde04d9de in Protocol_text::store_long (this=0x7fb79c001328, from=0) at /home/jan/mysql/10.4/sql/protocol.cc:1200
#6  0x0000558fde392ac8 in Type_handler::Item_send_long (this=0x558fdf6edb88 <type_handler_long>, item=0x7fb79c131a60, protocol=0x7fb79c001328, buf=0x7fb824ebdad0) at /home/jan/mysql/10.4/sql/sql_type.cc:6751
#7  0x0000558fde39f48c in Type_handler_long::Item_send (this=0x558fdf6edb88 <type_handler_long>, item=0x7fb79c131a60, protocol=0x7fb79c001328, buf=0x7fb824ebdad0) at /home/jan/mysql/10.4/sql/sql_type.h:4876
#8  0x0000558fde052f16 in Item::send (this=0x7fb79c131a60, protocol=0x7fb79c001328, buffer=0x7fb824ebdad0) at /home/jan/mysql/10.4/sql/item.h:1034
#9  0x0000558fde04d071 in Protocol::send_result_set_row (this=0x7fb79c001328, row_items=0x7fb79c130db0) at /home/jan/mysql/10.4/sql/protocol.cc:1035
#10 0x0000558fde1013da in select_send::send_data (this=0x7fb79c028478, items=...) at /home/jan/mysql/10.4/sql/sql_class.cc:3003
#11 0x0000558fde1c699a in JOIN::exec_inner (this=0x7fb79c0284a0) at /home/jan/mysql/10.4/sql/sql_select.cc:4254
#12 0x0000558fde1c6242 in JOIN::exec (this=0x7fb79c0284a0) at /home/jan/mysql/10.4/sql/sql_select.cc:4168
#13 0x0000558fde1c7949 in mysql_select (thd=0x7fb79c000d60, tables=0x0, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147749632, result=0x7fb79c028478, unit=0x7fb79c04a910, select_lex=0x7fb79c130c68) at /home/jan/mysql/10.4/sql/sql_select.cc:4600
#14 0x0000558fde1b7a67 in handle_select (thd=0x7fb79c000d60, lex=0x7fb79c04a848, result=0x7fb79c028478, setup_tables_done_option=0) at /home/jan/mysql/10.4/sql/sql_select.cc:408
#15 0x0000558fde17ea8c in execute_sqlcom_select (thd=0x7fb79c000d60, all_tables=0x0) at /home/jan/mysql/10.4/sql/sql_parse.cc:6352
#16 0x0000558fde1741c4 in mysql_execute_command (thd=0x7fb79c000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:3894
#17 0x0000558fde0871e4 in sp_instr_stmt::exec_core (this=0x7fb79c132308, thd=0x7fb79c000d60, nextp=0x7fb824ebf204) at /home/jan/mysql/10.4/sql/sp_head.cc:3666
#18 0x0000558fde086539 in sp_lex_keeper::reset_lex_and_exec_core (this=0x7fb79c132350, thd=0x7fb79c000d60, nextp=0x7fb824ebf204, open_tables=false, instr=0x7fb79c132308) at /home/jan/mysql/10.4/sql/sp_head.cc:3394
#19 0x0000558fde086da4 in sp_instr_stmt::execute (this=0x7fb79c132308, thd=0x7fb79c000d60, nextp=0x7fb824ebf204) at /home/jan/mysql/10.4/sql/sp_head.cc:3572
#20 0x0000558fde08052c in sp_head::execute (this=0x7fb79c0253d8, thd=0x7fb79c000d60, merge_da_on_success=true) at /home/jan/mysql/10.4/sql/sp_head.cc:1346
#21 0x0000558fde083345 in sp_head::execute_procedure (this=0x7fb79c0253d8, thd=0x7fb79c000d60, args=0x7fb79c005b48) at /home/jan/mysql/10.4/sql/sp_head.cc:2347
#22 0x0000558fde1715c5 in do_execute_sp (thd=0x7fb79c000d60, sp=0x7fb79c0253d8) at /home/jan/mysql/10.4/sql/sql_parse.cc:3014
#23 0x0000558fde1722b0 in Sql_cmd_call::execute (this=0x7fb79c023290, thd=0x7fb79c000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:3256
#24 0x0000558fde17d502 in mysql_execute_command (thd=0x7fb79c000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:6094
#25 0x0000558fde182b8b in mysql_parse (thd=0x7fb79c000d60, rawbuf=0x7fb79c01f338 "CALL p2()", length=9, parser_state=0x7fb824ec1150, is_com_multi=false, is_next_command=false) at /home/jan/mysql/10.4/sql/sql_parse.cc:7912
#26 0x0000558fde18218b in wsrep_mysql_parse (thd=0x7fb79c000d60, rawbuf=0x7fb79c01f338 "CALL p2()", length=9, parser_state=0x7fb824ec1150, is_com_multi=false, is_next_command=false) at /home/jan/mysql/10.4/sql/sql_parse.cc:7716
#27 0x0000558fde16de1c in dispatch_command (command=COM_QUERY, thd=0x7fb79c000d60, packet=0x7fb79c017201 "", packet_length=10, is_com_multi=false, is_next_command=false) at /home/jan/mysql/10.4/sql/sql_parse.cc:1826
#28 0x0000558fde16c5f1 in do_command (thd=0x7fb79c000d60) at /home/jan/mysql/10.4/sql/sql_parse.cc:1358
#29 0x0000558fde2f1490 in do_handle_one_connection (connect=0x558fe21061f0) at /home/jan/mysql/10.4/sql/sql_connect.cc:1412
#30 0x0000558fde2f11df in handle_one_connection (arg=0x558fe21061f0) at /home/jan/mysql/10.4/sql/sql_connect.cc:1316
#31 0x0000558fdeccca60 in pfs_spawn_thread (arg=0x558fe2039570) at /home/jan/mysql/10.4/storage/perfschema/pfs.cc:1862
#32 0x00007fb834484182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#33 0x00007fb833cfdb1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) p field_pos
$1 = 0
(gdb) p field_handlers
$2 = (const Type_handler **) 0x7fb79c0359a8
(gdb) p type
$3 = PROTOCOL_SEND_LONG
(gdb) p field_handlers[0]
$4 = (const Type_handler *) 0x558fde503f7e <dummy_error_processor(THD*, void*)>
(gdb) p field_handlers[1]
$5 = (const Type_handler *) 0x0

Comment by Jan Lindström (Inactive) [ 2019-09-30 ]

I could not repeat these crashes when running standalone server with same test case (i.e. galera disabled).

Comment by Jan Lindström (Inactive) [ 2019-10-01 ]

I could not repeat the problem with 10.2

Comment by Julien Fritsch [ 2020-12-03 ]

Found in disabled.def that it was disabled, also I'm adding the missing label.

Comment by Julien Fritsch [ 2020-12-03 ]

Found in disabled.def for ES that it was disabled, also I'm adding the missing label.

Generated at Thu Feb 08 08:57:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.