[MDEV-20928] Galera test failure on galera.galera_var_innodb_disallow_writes: Result length mismatch Created: 2019-10-30  Updated: 2020-07-27  Resolved: 2020-07-24

Status: Closed
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.3.21, 10.4.9
Fix Version/s: 10.2.33, 10.3.24, 10.4.14, 10.5.5

Type: Bug Priority: Critical
Reporter: Stepan Patryshev (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

OS: CentOS Linux release 7.6.1810 (Core). TestTarball_2 rhel-8.


Attachments: Text File 191029_HDD_stdout.log     Zip Archive 191030_galera_var_innodb_disallow_writes.zip     Text File 20200112.2_tb2_rhel8.txt    

 Description   

It sporadically fails on bb and local suite runs.

MariaDB Version 10.4.9-MariaDB-debug: Repository: MariaDB/server; branch 10.4; Revision 9afbb1069ab4178741b76adefacd3f6a0ed51369; debug built from sources.

Galera lib 26.4.3(r4535): Repository: MariaDB/galera; branch mariadb-4.x; Revision 752664dc3c7065d8e0c73ac99d0028a5f84eb250; debug built from sources.

Run:

./mtr --suite galera --testcase-timeout=90 --force --big-test --max-test-fail=0 --timer --timestamp --timediff

stdout.log (the whole file is attached: 191029_HDD_stdout.log):

mysqltest: Result length mismatch

191029 20:10:41 +55.136 galera.galera_var_innodb_disallow_writes 'innodb' [ fail ]
        Test ended at 2019-10-29 20:10:41
 
CURRENT_TEST: galera.galera_var_innodb_disallow_writes
--- /home/stepan/mariadb/10.4/github/mysql-test/suite/galera/r/galera_var_innodb_disallow_writes.result	2019-10-17 14:30:27.382278020 +0300
+++ /home/stepan/mariadb/10.4/github/mysql-test/suite/galera/r/galera_var_innodb_disallow_writes.reject	2019-10-29 20:10:41.229513794 +0200
@@ -10,6 +10,18 @@
 SELECT COUNT(*) = 1 FROM t1;
 COUNT(*) = 1
 0
+Timeout in wait_condition.inc for SELECT 1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE INFO = 'INSERT INTO t1 VALUES (1)' AND State = 'Commit'
+Id	User	Host	db	Command	Time	State	Info	Progress
+1	system user		NULL	Sleep	43	wsrep aborter idle	NULL	0.000
+2	system user		NULL	Sleep	43	Closing tables	NULL	0.000
+3	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
+4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+5	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+6	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+7	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
+18	root	localhost	test	Sleep	35		NULL	0.000
+19	root	localhost:48696	test	Query	31	Update	INSERT INTO t1 VALUES (1)	0.000
+20	root	localhost:48698	test	Query	0	Init	show full processlist	0.000
 SELECT COUNT(*) = 1 FROM t1;
 COUNT(*) = 1
 0
 
mysqltest: Result length mismatch

See also other logs attached: 191030_galera_var_innodb_disallow_writes.zip.



 Comments   
Comment by Stepan Patryshev (Inactive) [ 2020-01-15 ]

It also sporadically fails on Azure.

stdout.log:

10.3.21-5 76cd6d4б rhel-8

2020-01-12T03:36:58.8333020Z galera.galera_var_innodb_disallow_writes 'innodb' w1 [ fail ]
2020-01-12T03:36:58.8334667Z         Test ended at 2020-01-12 03:36:58
2020-01-12T03:36:58.8335002Z 
2020-01-12T03:36:58.8335902Z CURRENT_TEST: galera.galera_var_innodb_disallow_writes
2020-01-12T03:36:58.8338173Z --- /__w/1/s/mysql-test/suite/galera/r/galera_var_innodb_disallow_writes.result	2020-01-12 01:04:46.000000000 +0000
2020-01-12T03:36:58.8338910Z +++ /__w/1/s/mysql-test/suite/galera/r/galera_var_innodb_disallow_writes.reject	2020-01-12 03:36:58.461969323 +0000
2020-01-12T03:36:58.8339402Z @@ -8,6 +8,18 @@
2020-01-12T03:36:58.8339644Z  SELECT COUNT(*) = 1 FROM t1;
2020-01-12T03:36:58.8339858Z  COUNT(*) = 1
2020-01-12T03:36:58.8340042Z  0
2020-01-12T03:36:58.8340597Z +Timeout in wait_condition.inc for SELECT 1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE INFO = 'INSERT INTO t1 VALUES (1)' AND State = 'Commit'
2020-01-12T03:36:58.8340916Z +Id	User	Host	db	Command	Time	State	Info	Progress
2020-01-12T03:36:58.8341161Z +1	system user		NULL	Sleep	108	WSREP aborter idle	NULL	0.000
2020-01-12T03:36:58.8341392Z +2	system user		NULL	Sleep	33	Committed 87	NULL	0.000
2020-01-12T03:36:58.8341604Z +3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
2020-01-12T03:36:58.8341841Z +4	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
2020-01-12T03:36:58.8342073Z +5	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
2020-01-12T03:36:58.8342314Z +6	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
2020-01-12T03:36:58.8342549Z +7	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
2020-01-12T03:36:58.8342763Z +101	root	localhost	test	Sleep	33		NULL	0.000
2020-01-12T03:36:58.8342996Z +102	root	localhost:34018	test	Query	30	Update	INSERT INTO t1 VALUES (1)	0.000
2020-01-12T03:36:58.8343239Z +103	root	localhost:34020	test	Query	0	Init	show full processlist	0.000
2020-01-12T03:36:58.8345241Z  SELECT COUNT(*) = 1 FROM t1;
2020-01-12T03:36:58.8345511Z  COUNT(*) = 1
2020-01-12T03:36:58.8345699Z  0
2020-01-12T03:36:58.8345852Z 
2020-01-12T03:36:58.8346020Z mysqltest: Result length mismatch
2020-01-12T03:36:58.8346188Z 
2020-01-12T03:36:58.8399015Z  - saving '/var/tmp/mtr/1/log/galera.galera_var_innodb_disallow_writes-innodb/' to '/var/tmp/mtr/log/galera.galera_var_innodb_disallow_writes-innodb/'

Comment by Stepan Patryshev (Inactive) [ 2020-01-15 ]

Increased the priority since it failed on Azure.

Comment by Seppo Jaakola [ 2020-06-26 ]

This test can end up in mutex deadlock between innodb background threads and two test connection thread executions:

2020-06-18 15:37:20 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139712668890880 has waited at row0row.cc line 1212 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f1173416308 created in file buf0buf.cc line 1580
a writer (thread id 0) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file buf0flu.cc line 1165
2020-06-18 15:37:20 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139712923383552 has waited at buf0lru.cc line 936 for 241.00 seconds the semaphore:
Mutex at 0x5585f8439e30, Mutex BUF_POOL created buf0buf.cc:1873, lock var 2
Last time reserved in file /home/mikko/work/codership_mariadb/mariadb-server/storage/innobase/buf/buf0buf.cc line 6214
2020-06-18 15:37:20 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139713340323584 has waited at btr0pcur.ic line 545 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x7f117342c4f0 created in file buf0buf.cc line 1580
a writer (thread id 139712923383552) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time write locked in file row0ins.cc line 2607
2020-06-18 15:37:20 0 [Note] InnoDB: A semaphore wait:
--Thread 139712803108608 has waited at buf0flu.cc line 2226 for 240.00 seconds the semaphore:
Mutex at 0x5585f8439e30, Mutex BUF_POOL created buf0buf.cc:1873, lock var 2
Last time reserved in file /home/mikko/work/codership_mariadb/mariadb-server/storage/innobase/buf/buf0buf.cc line 6214

1. test connection processing INSERT, hangs
Thread 29 (Thread 0x7f11732b3700 (LWP 99285)):
buf_pool_mutex_enter(buf_pool);

2. test connection processing SELECT, hangs:
Thread 30 (Thread 0x7f118c053700 (LWP 99043)):
rw_lock_s_lock_inline(&block->lock, 0, file, line);

3. innodb background thread hangs:
#25 0x00005585f60a6841 in srv_purge_coordinator_thread (arg=0x0) at /home/mikko/work/codership_mariadb/mariadb-server/storage/innobase/srv/srv0srv.cc:2740
Thread 23 (Thread 0x7f1163fff700 (LWP 99030)):
rw_lock_x_lock_inline(&block->lock, 0, file, line);

4. another innodb background threads hangs for the actual innodb_disallow_writes block:
#15 0x00005585f60a59e6 in srv_master_thread (arg=0x0) at /home/mikko/work/codership_mariadb/mariadb-server/storage/innobase/srv/srv0srv.cc:2384
Thread 18 (Thread 0x7f1168df4700 (LWP 99029)):
WAIT_ALLOW_WRITES();

Comment by Seppo Jaakola [ 2020-06-26 ]

a PR proposal to fix this will be submitted soon'ish, it contains test logic refactoring

Comment by Seppo Jaakola [ 2020-07-02 ]

a PR was submitted

Comment by Seppo Jaakola [ 2020-07-02 ]

note: the test was not enabled as part of the PR, it probably should be enabled if PR passes reviews

Generated at Thu Feb 08 09:03:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.