Description
This test is failing. I suspect the problem is in the test case itself.
https://buildbot.mariadb.org/#/builders/728/builds/7981/steps/7/logs/stdio
bb-10.11-release (10.11.14) 81d460b16a6e4dc1faaa23813b1e2aea01af916c
CURRENT_TEST: galera.mysql-wsrep#198
mysqltest: At line 43: query 'UNLOCK TABLES' failed: <Unknown> (2026): TLS/SSL error: unexpected eof while reading
The result from queries just before the failure was:
< snip >
connection node_2_ctrl;
SET SESSION wsrep_sync_wait = 0;
connect node_2b, 127.0.0.1, root, , test, $NODE_MYPORT_2;
connection node_2b;
REPAIR TABLE t1,t2;;
connection node_2_ctrl;
Timeout in wait_condition.inc for SELECT COUNT(*) = 2 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE LIKE 'Waiting for table metadata lock%' OR STATE LIKE 'acquiring total order isolation%';
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST;
ID USER HOST DB COMMAND TIME STATE INFO TIME_MS STAGE MAX_STAGE PROGRESS MEMORY_USED MAX_MEMORY_USED EXAMINED_ROWS QUERY_ID INFO_BINARY TID
14 root localhost:48864 test Query 30 Waiting to execute in isolation REPAIR TABLE t1,t2 30982.554 0 0 0.000 88712 89728 0 50 REPAIR TABLE t1,t2 25354
13 root localhost:48858 test Query 31 Waiting for table metadata lock OPTIMIZE TABLE t1,t2 31100.824 0 0 0.000 88712 97848 0 46 OPTIMIZE TABLE t1,t2 25321
12 root localhost:48842 test Query 0 Filling schema table SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST 1.763 0 0 0.000 173872 187752 0 351 SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST 25319
11 root localhost:55088 test Sleep 31 NULL 31123.816 0 0 0.000 105320 298720 0 44 NULL 24681
2 system user NULL Sleep 31 After apply log event NULL 31124.787 0 0 0.000 79704 88840 0 42 NULL 22267
1 system user NULL Sleep 38 wsrep aborter idle NULL 38382.577 0 0 0.000 72296 72296 0 0 NULL 22266
Killing server ...
connection node_1;
INSERT INTO t2 VALUES (1);
connection node_2;
UNLOCK TABLES;
Problem 1:
The SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST output is printed because the wait condition timed out. The process list shows OPTIMIZE TABLE in "Waiting for table metadata lock", so it is the "acquiring total order isolation" state that was not reached, or was bypassed: REPAIR TABLE is in "Waiting to execute in isolation" instead.
Instead of polling for this condition, the test could use the "wsrep_before_toi_begin" debug sync point, which sits immediately after the "acquiring total order isolation" thd_proc_info status is set, to reach this state.
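A minimal sketch of how the sync point could replace the PROCESSLIST poll for the REPAIR thread. It assumes the test runs on a debug build and that the sync point is reachable through the DEBUG_SYNC facility; the signal names repair_at_toi and repair_continue are hypothetical, chosen only for illustration.

```
# Sketch only: signal names repair_at_toi / repair_continue are hypothetical.
--source include/have_debug_sync.inc

--connection node_2b
# Pause REPAIR at the sync point and announce that it has arrived there.
SET DEBUG_SYNC = 'wsrep_before_toi_begin SIGNAL repair_at_toi WAIT_FOR repair_continue';
--send REPAIR TABLE t1,t2;

--connection node_2_ctrl
# Blocks until REPAIR has reached the sync point; no state-string matching needed.
SET DEBUG_SYNC = 'now WAIT_FOR repair_at_toi';
```

With this approach the OPTIMIZE TABLE thread would still need its own check, for example the existing wait condition reduced to the metadata lock state only.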
Problem 2:
Why is the test performing an UNLOCK TABLES on a Galera instance that has just been killed?
kill_galera.inc disables reconnection, so whether the statement actually reaches the server is a race. And if the server is being killed anyway, is there any point in UNLOCK TABLES?
Problem 3:
The ./mysql-test/include/kill_galera.inc and ./mysql-test/suite/galera/include/kill_galera.inc scripts (is there a need for two? They are compatible) do not wait until the process is actually killed.
Tests such as galera.GAL-419 and galera_gcache_recover just sleep to allow the process to die, which is unreliable. The MDEV-29142 test uses include/wait_until_disconnected.inc, and others probe the cluster size until it drops. One consistent approach should be applied for reliability.
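One possible consistent pattern, sketched on the assumption (as described above) that kill_galera.inc returns without waiting, and that losing the connection is an acceptable proxy for the process being gone:

```
# Sketch only: assumes the current connection is to the node being killed.
--connection node_2
--source include/kill_galera.inc
# Loops until queries on this connection fail, instead of a fixed sleep.
--source include/wait_until_disconnected.inc
```

If this wait were moved into kill_galera.inc itself, callers would not need their own sleeps or cluster-size probes just to confirm the kill.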