[MDEV-14489] rpl.rpl_stm_mixing_engines, rpl.rpl_non_direct_mixed_mixing_engines, rpl.rpl_mixed_mixing_engines, rpl.rpl_non_direct_stm_mixing_engines failed in buildbot, sync_slave_with_master failed Created: 2017-11-23  Updated: 2019-12-10  Resolved: 2019-12-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.1
Fix Version/s: 10.1.34, 10.2.16, 10.3.8, 10.4.0

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: None

Attachments: HTML File output    
Issue Links:
Relates
relates to MDEV-16283 ALTER TABLE...DISCARD TABLESPACE stil... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest-big/builds/1328/steps/mtr_ps/logs/stdio

10.1 7e1326cfcf5aea38f40fb787f1b987cb325c785b

rpl.rpl_non_direct_stm_mixing_engines 'innodb_plugin,stmt' w2 [ fail ]
        Test ended at 2017-11-15 15:12:05
 
CURRENT_TEST: rpl.rpl_non_direct_stm_mixing_engines
=== SHOW MASTER STATUS ===
---- 1. ----
File	slave-bin.000001
Position	124783
Binlog_Do_DB	
Binlog_Ignore_DB	
==========================
 
=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16020
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	144680
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	130500
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	Yes
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	0
Last_Error	
Skip_Counter	0
Exec_Master_Log_Pos	130211
Relay_Log_Space	145267
Until_Condition	None
Until_Log_File	
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File	
Master_SSL_CA_Path	
Master_SSL_Cert	
Master_SSL_Cipher	
Master_SSL_Key	
Seconds_Behind_Master	405
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error	
Last_SQL_Errno	0
Last_SQL_Error	
Replicate_Ignore_Server_Ids	
Master_Server_Id	1
Master_SSL_Crl	
Master_SSL_Crlpath	
Using_Gtid	No
Gtid_IO_Pos	
Replicate_Do_Domain_Ids	
Replicate_Ignore_Domain_Ids	
Parallel_Mode	conservative
=========================
 
=== SHOW PROCESSLIST ===
---- 1. ----
Id	4
User	root
Host	localhost:37568
db	test
Command	Sleep
Time	1141
State	
Info	
Progress	0.000
---- 2. ----
Id	5
User	root
Host	localhost:37570
db	test
Command	Sleep
Time	1141
State	
Info	
Progress	0.000
---- 3. ----
Id	6
User	system user
Host	
db	
Command	Connect
Time	1141
State	Waiting for master to send event
Info	
Progress	0.000
---- 4. ----
Id	7
User	system user
Host	
db	test
Command	Connect
Time	405
State	Unlocking tables
Info	DROP TABLE IF EXISTS `tt_xx_7`,`nt_xx_7` /* generated by server */
Progress	0.000
---- 5. ----
Id	8
User	root
Host	localhost:37582
db	test
Command	Query
Time	0
State	init
Info	SHOW PROCESSLIST
Progress	0.000
---- 6. ----
Id	9
User	root
Host	localhost:37584
db	test
Command	Sleep
Time	1141
State	
Info	
Progress	0.000
========================
 
analyze: sync_with_master
mysqltest: In included file "./extra/rpl_tests/rpl_mixing_engines.test": 
included from /mnt/buildbot/build/mariadb-10.1.30/mysql-test/suite/rpl/t/rpl_non_direct_stm_mixing_engines.test at line 15:
At line 1803: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 144680, 300, '')' returned -1 indicating timeout after 300 seconds

The full failure output is too long, it's in the attachment.



 Comments   
Comment by Elena Stepanova [ 2017-12-20 ]

rpl.rpl_mixed_mixing_engines
http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest-big/builds/1427/steps/mtr_ps/logs/stdio

rpl.rpl_mixed_mixing_engines 'innodb_plugin,mix' w1 [ fail ]
        Test ended at 2017-12-15 00:11:16
 
CURRENT_TEST: rpl.rpl_mixed_mixing_engines
=== SHOW MASTER STATUS ===
---- 1. ----
File	slave-bin.000001
Position	131587
Binlog_Do_DB	
Binlog_Ignore_DB	
==========================
 
=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	168022
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	138938
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	Yes
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	0
Last_Error	
Skip_Counter	0
Exec_Master_Log_Pos	138649
Relay_Log_Space	168609
Until_Condition	None
Until_Log_File	
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File	
Master_SSL_CA_Path	
Master_SSL_Cert	
Master_SSL_Cipher	
Master_SSL_Key	
Seconds_Behind_Master	402
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error	
Last_SQL_Errno	0
Last_SQL_Error	
Replicate_Ignore_Server_Ids	
Master_Server_Id	1
Master_SSL_Crl	
Master_SSL_Crlpath	
Using_Gtid	No
Gtid_IO_Pos	
Replicate_Do_Domain_Ids	
Replicate_Ignore_Domain_Ids	
Parallel_Mode	conservative
=========================
 
=== SHOW PROCESSLIST ===
---- 1. ----
Id	14
User	root
Host	localhost:38668
db	test
Command	Sleep
Time	1052
State	
Info	
Progress	0.000
---- 2. ----
Id	15
User	root
Host	localhost:38670
db	test
Command	Sleep
Time	1052
State	
Info	
Progress	0.000
---- 3. ----
Id	16
User	system user
Host	
db	
Command	Connect
Time	1052
State	Waiting for master to send event
Info	
Progress	0.000
---- 4. ----
Id	17
User	system user
Host	
db	test
Command	Connect
Time	401
State	Unlocking tables
Info	DROP TABLE IF EXISTS `tt_xx_7`,`nt_xx_7` /* generated by server */
Progress	0.000
---- 5. ----
Id	18
User	root
Host	localhost:38682
db	test
Command	Query
Time	0
State	init
Info	SHOW PROCESSLIST
Progress	0.000
---- 6. ----
Id	19
User	root
Host	localhost:38684
db	test
Command	Sleep
Time	1052
State	
Info	
Progress	0.000
========================
 
analyze: sync_with_master
mysqltest: In included file "./extra/rpl_tests/rpl_mixing_engines.test": 
included from /mnt/buildbot/build/mariadb-10.1.30/mysql-test/suite/rpl/t/rpl_mixed_mixing_engines.test at line 12:
At line 1803: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 168022, 300, '')' returned -1 indicating timeout after 300 seconds
 
The result from queries just before the failure was:
< snip >
-e-e-e-e-e-e-e-e-e-e-e- >> set-NT << -e-e-e-e-e-e-e-e-e-e-e-
-b-b-b-b-b-b-b-b-b-b-b- >> R << -b-b-b-b-b-b-b-b-b-b-b-
ROLLBACK;
Warnings:
Warning	#	Some non-transactional changed tables couldn't be rolled back
include/show_binlog_events.inc
-e-e-e-e-e-e-e-e-e-e-e- >> R << -e-e-e-e-e-e-e-e-e-e-e-
-b-b-b-b-b-b-b-b-b-b-b- >> B set-T set-NT R << -b-b-b-b-b-b-b-b-b-b-b-
include/show_binlog_events.inc
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	#	Gtid	#	#	BEGIN GTID #-#-#
master-bin.000001	#	Table_map	#	#	table_id: # (test.nt_3)
master-bin.000001	#	Table_map	#	#	table_id: # (test.nt_3)
master-bin.000001	#	Write_rows_v1	#	#	table_id: # flags: STMT_END_F
master-bin.000001	#	Query	#	#	COMMIT
-e-e-e-e-e-e-e-e-e-e-e- >> B set-T set-NT R << -e-e-e-e-e-e-e-e-e-e-e-
 
###################################################################################
#                               CHECK CONSISTENCY 
###################################################################################
 
More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.1.30/mysql-test/var/1/log/rpl_mixed_mixing_engines.log
 
 
 == /mnt/buildbot/build/mariadb-10.1.30/mysql-test/var/1/tmp/analyze-sync_with_master-mysqld.1.err ==
 
############################## default ##############################
 
**** SHOW WARNINGS on default ****
SHOW WARNINGS;
Level	Code	Message
 
**** SELECT replication-related variables on default ****
SELECT NOW(), @@SERVER_ID;
NOW()	@@SERVER_ID
2017-12-15 00:11:15	1
 
**** SHOW SLAVE STATUS on default ****
SHOW SLAVE STATUS;
 
**** SHOW MASTER STATUS on default ****
SHOW MASTER STATUS;
File	master-bin.000001
Position	168022
Binlog_Do_DB	
Binlog_Ignore_DB	
 
**** SHOW SLAVE HOSTS on default ****
SHOW SLAVE HOSTS;
Server_id	2
Host	127.0.0.1
Port	16001
Master_id	1
 
**** SHOW PROCESSLIST on default ****
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info	Progress
130	root	localhost:45826	NULL	Binlog Dump	1056	Master has sent all binlog to slave; waiting for binlog to be updated	NULL	0.000
1862	root	localhost	NULL	Query	0	init	SHOW PROCESSLIST	0.000
 
**** SHOW BINARY LOGS on default ****
SHOW BINARY LOGS;
Log_name	File_size
master-bin.000001	168022
 
**** SHOW BINLOG EVENTS on default ****
binlog_name = 'master-bin.000001'
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	249	Server ver: 10.1.30-MariaDB, Binlog ver: 4
master-bin.000001	249	Gtid_list	1	274	[]
master-bin.000001	274	Binlog_checkpoint	1	314	master-bin.000001
master-bin.000001	314	Gtid	1	352	BEGIN GTID 0-1-1
master-bin.000001	352	Query	1	620	use `mtr`; INSERT INTO test_suppressions (pattern) VALUES ( NAME_CONST('pattern',_latin1'Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT' COLLATE 'latin1_swedish_ci'))
master-bin.000001	620	Query	1	688	COMMIT
...
slave-relay-bin.000002	168054	Gtid	1	167803	BEGIN GTID 0-1-602
slave-relay-bin.000002	168092	Table_map	1	167850	table_id: 47 (test.nt_3)
slave-relay-bin.000002	168139	Table_map	1	167897	table_id: 47 (test.nt_3)
slave-relay-bin.000002	168186	Write_rows_v1	1	167953	table_id: 47 flags: STMT_END_F
slave-relay-bin.000002	168242	Query	1	168022	COMMIT
 
 
 
 - saving '/mnt/buildbot/build/mariadb-10.1.30/mysql-test/var/1/log/rpl.rpl_mixed_mixing_engines-innodb_plugin,mix/' to '/mnt/buildbot/build/mariadb-10.1.30/mysql-test/var/log/rpl.rpl_mixed_mixing_engines-innodb_plugin,mix/'
worker[1] > Restart  - not started
worker[1] > Restart  - not started
***Warnings generated in error logs during shutdown after running tests: rpl.rpl_mixed_drop_create_temp_table rpl.rpl_mixed_mixing_engines
 
InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0.
...
InnoDB: Error: Waited for 235 secs for hash index ref_count (1) to drop to 0.
InnoDB: Warning: semaphore wait:
InnoDB: Error: Waited for 240 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 245 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 250 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 255 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 260 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 265 secs for hash index ref_count (1) to drop to 0.
InnoDB: Warning: semaphore wait:
InnoDB: Warning: semaphore wait:
InnoDB: Error: Waited for 270 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 275 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 280 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 285 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 290 secs for hash index ref_count (1) to drop to 0.
InnoDB: Error: Waited for 295 secs for hash index ref_count (1) to drop to 0.
InnoDB: Warning: semaphore wait:
InnoDB: Warning: semaphore wait:
InnoDB: Error: Waited for 300 secs for hash index ref_count (1) to drop to 0.
...
InnoDB: Error: Waited for 395 secs for hash index ref_count (1) to drop to 0.

Comment by Marko Mäkelä [ 2018-05-29 ]

As part of the MDEV-16283 fix, I replaced the flawed logic with one that actively evicts the adaptive hash index. My understanding is that the wait while holding the InnoDB data dictionary lock could essentially lock the whole InnoDB instance. When the InnoDB buffer pool is idle, nothing can possibly evict pages, which would be the only way how adaptive hash index entries for a to-be-dropped index could be removed.

Because I am not sure if this loop of

InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0.

is the only source of these test failures, I am leaving this issue open.

Comment by Marko Mäkelä [ 2019-12-10 ]

I have good reason to believe that this was indeed fixed by the MDEV-16283 fix, which was pushed on May 29, 2018. According to buildbot cross-reference, the latest failure of this test was on April 27, 2018. Due to the nondeterministic nature of the InnoDB adaptive hash index, the time between failures varied to up to 3 months.

Generated at Thu Feb 08 08:13:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.