  1. MariaDB Server
  2. MDEV-37269

Test failure: galera.mysql-wsrep#198 (and others using kill_galera.inc)

Details

    Description

      This test is failing. I suspect it is just the test case.

      https://buildbot.mariadb.org/#/builders/728/builds/7981/steps/7/logs/stdio

      bb-10.11-release (10.11.14) 81d460b16a6e4dc1faaa23813b1e2aea01af916c

      CURRENT_TEST: galera.mysql-wsrep#198
      mysqltest: At line 43: query 'UNLOCK TABLES' failed: <Unknown> (2026): TLS/SSL error: unexpected eof while reading
      The result from queries just before the failure was:
      < snip >
      connection node_2_ctrl;
      SET SESSION wsrep_sync_wait = 0;
      connect node_2b, 127.0.0.1, root, , test, $NODE_MYPORT_2;
      connection node_2b;
      REPAIR TABLE t1,t2;;
      connection node_2_ctrl;
      Timeout in wait_condition.inc for SELECT COUNT(*) = 2 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE LIKE 'Waiting for table metadata lock%' OR STATE LIKE 'acquiring total order isolation%';
      SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST;
      ID	USER	HOST	DB	COMMAND	TIME	STATE	INFO	TIME_MS	STAGE	MAX_STAGE	PROGRESS	MEMORY_USED	MAX_MEMORY_USED	EXAMINED_ROWS	QUERY_ID	INFO_BINARY	TID
      14	root	localhost:48864	test	Query	30	Waiting to execute in isolation	REPAIR TABLE t1,t2	30982.554	0	0	0.000	88712	89728	0	50	REPAIR TABLE t1,t2	25354
      13	root	localhost:48858	test	Query	31	Waiting for table metadata lock	OPTIMIZE TABLE t1,t2	31100.824	0	0	0.000	88712	97848	0	46	OPTIMIZE TABLE t1,t2	25321
      12	root	localhost:48842	test	Query	0	Filling schema table	SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST	1.763	0	0	0.000	173872	187752	0	351	SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST	25319
      11	root	localhost:55088	test	Sleep	31		NULL	31123.816	0	0	0.000	105320	298720	0	44	NULL	24681
      2	system user		NULL	Sleep	31	After apply log event	NULL	31124.787	0	0	0.000	79704	88840	0	42	NULL	22267
      1	system user		NULL	Sleep	38	wsrep aborter idle	NULL	38382.577	0	0	0.000	72296	72296	0	0	NULL	22266
      Killing server ...
      connection node_1;
      INSERT INTO t2 VALUES (1);
      connection node_2;
      UNLOCK TABLES;
      

      Problem 1:

      The SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST output is there because the wait condition timed out. The process list shows one thread in "Waiting for table metadata lock", so it is the "acquiring total order isolation" state that was not reached, or was bypassed: the REPAIR TABLE thread is in "Waiting to execute in isolation" instead.

      Rather than relying on this condition, the test could use the "wsrep_before_toi_begin" DEBUG sync point, which sits immediately after thd_proc_info is set to "acquiring total order isolation", to reach this state.
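
      As a rough illustration of that suggestion, the fragment below parks the REPAIR at the sync point and has the control connection wait for it. It is a sketch, not a tested patch: it assumes a debug build, assumes "wsrep_before_toi_begin" can be armed through the regular DEBUG_SYNC interface, and the signal names toi_reached and toi_continue are made up for the example.

      ```
      # Sketch only; signal names toi_reached / toi_continue are hypothetical.
      --source include/have_debug_sync.inc

      --connection node_2b
      # Arm the sync point named in this report for this session,
      # then send REPAIR without waiting for its result.
      SET DEBUG_SYNC = 'wsrep_before_toi_begin SIGNAL toi_reached WAIT_FOR toi_continue';
      --send REPAIR TABLE t1,t2

      --connection node_2_ctrl
      # Block until REPAIR has hit the sync point, instead of
      # matching on a thread STATE string in PROCESSLIST.
      SET DEBUG_SYNC = 'now WAIT_FOR toi_reached';
      ```

      Since the node is killed right afterwards, toi_continue never needs to be signalled here. The OPTIMIZE thread's "Waiting for table metadata lock" state would still need its own wait condition.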

      Problem 2:

      Why is the test performing an UNLOCK TABLES on a galera instance that just got killed?

      kill_galera.inc disables reconnection, so whether the SQL actually reaches the server is a race condition. And if the server is being killed anyway, is there any point in UNLOCK TABLES?

      Problem 3:

      The ./mysql-test/include/kill_galera.inc and ./mysql-test/suite/galera/include/kill_galera.inc scripts (is there a need for two? They are compatible) do not wait until the process is actually killed.

      Tests such as galera.GAL-419 and galera_gcache_recover just sleep to ensure the process has died, which is incorrect. The MDEV-29142 test uses include/wait_until_disconnected.inc, and others probe the cluster size until it shrinks. Some consistency should be applied for reliability.
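
      One possible shape for a consistent pattern, sketched with the two include files already named above. Placing the wait directly after the kill is an assumption rather than an agreed fix, and a lost connection is only a proxy for the process having exited.

      ```
      # Sketch only; whether the wait belongs in the test or
      # inside kill_galera.inc itself is an open question.
      --connection node_2
      --source include/kill_galera.inc
      # Returns once this connection can no longer reach the server,
      # replacing a fixed sleep after the kill.
      --source include/wait_until_disconnected.inc
      ```

      Moving the wait into kill_galera.inc would give every caller the same behaviour without touching each test.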

          People

            sysprg Julius Goryavsky
            danblack Daniel Black