[MDEV-28777] binlog.binlog_truncate_multi_engine failed in bb with Lost connection to server during query Created: 2022-06-08  Updated: 2023-04-17  Resolved: 2023-04-17

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.6, 10.8
Fix Version/s: 10.11.3, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Major
Reporter: Angelique Sklavounos (Inactive) Assignee: Andrei Elkin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates

 Description   

https://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/8853

10.6 4179f93d28035ea2798cb1c16feeaaef

binlog.binlog_truncate_multi_engine 'innodb,row' w4 [ fail ]
        Test ended at 2022-06-06 16:16:56
 
CURRENT_TEST: binlog.binlog_truncate_multi_engine
mysqltest: In included file "/usr/share/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc": 
included from /usr/share/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test at line 47:
At line 42: query 'if (`SELECT $case = "B"`)
{' failed: <Unknown> (2013): Lost connection to server during query
 
The result from queries just before the failure was:
< snip >
#
# Case "B" : "one engine has committed its transaction branch"
#
RESET MASTER;
FLUSH LOGS;
SET GLOBAL max_binlog_size= 4096;
connect con1,localhost,root,,;
List of binary logs before rotation
show binary logs;
Log_name	File_size
master-bin.000001	#
master-bin.000002	#
INSERT INTO t1 VALUES (1, REPEAT("x", 1));
INSERT INTO t2 VALUES (1, REPEAT("x", 1));
SET GLOBAL debug_dbug="d,enable_log_write_upto_crash";
BEGIN;
INSERT INTO t2 VALUES (2, REPEAT("x", 4100));
INSERT INTO t1 VALUES (2, REPEAT("x", 4100));
COMMIT;
connection default;

Has happened a few times, always on kvm-rpm-centos74-amd64-debug.



 Comments   
Comment by Sergei Golubchik [ 2022-11-15 ]

Binlog does

void
MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
{
...
      mysql_mutex_lock(&current->thd->LOCK_thd_data);
      run_commit_ordered(current->thd, current->all);
      mysql_mutex_unlock(&current->thd->LOCK_thd_data);

Then in run_commit_ordered()

void
TC_LOG::run_commit_ordered(THD *thd, bool all)
{
...    DEBUG_SYNC(thd, "commit_after_run_commit_ordered");
  }
}

When the server is being shut down it runs

static my_bool kill_thread_phase_1(THD *thd, int *n_threads_awaiting_ack)
{
  if (thd->slave_thread || thd->is_binlog_dump_thread() ||
      (shutdown_wait_for_slaves &&
      ...

which does

bool THD::is_binlog_dump_thread()
{
  mysql_mutex_lock(&LOCK_thd_data);

but this mutex is already taken by the group commit leader, which is forever waiting on the "commit_after_run_commit_ordered" sync point. mysqltest gives up waiting for the server to shut down and kills it with SIGABRT.

Comment by Andrei Elkin [ 2023-04-14 ]

Interestingly S@rg's comments in there talk about a different already fixed by him `2ab52cc0e5c3` issue in the case C.

This time it is B.
Could you please study the commit which is hopefully instructive.

Comment by Angelique Sklavounos (Inactive) [ 2023-04-14 ]

Thanks, the fix makes sense to me.

Generated at Thu Feb 08 10:03:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.