[MDEV-24485] galera.galera_bf_kill_debug MTR failed: A long semaphore wait Created: 2020-12-23  Updated: 2023-12-05

Status: Stalled
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.5.9, 10.4
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: Stepan Patryshev (Inactive) Assignee: Seppo Jaakola
Resolution: Unresolved Votes: 0
Labels: None
Environment: kvm-rpm-centos74-amd64-debug


Attachments: Zip Archive MDEV-24485_105_logs_201221.zip, Zip Archive MDEV-24485_106_logs_201219.zip
Issue Links:
Blocks
blocks MDEV-22122 Galera test failures on 10.5 Open

 Description   

galera.galera_bf_kill_debug failed on BB 10.5 CS with the warning "A long semaphore wait".
The failure appears to be sporadic.

stdio.log:

10.5.9, 39378e1366f78b38c05e45103b9fb9c829cc5f4f, kvm-rpm-centos74-amd64-debug

galera.galera_bf_kill_debug 'innodb'     w2 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2020-12-21 19:46:40
line
2020-12-21 19:45:52 0 [Warning] InnoDB: A long semaphore wait:
2020-12-21 19:46:07 0 [Warning] InnoDB: A long semaphore wait:
2020-12-21 19:46:22 0 [Warning] InnoDB: A long semaphore wait:
2020-12-21 19:46:37 0 [Warning] InnoDB: A long semaphore wait:
^ Found warnings in /dev/shm/var/2/log/mysqld.2.err
ok
 
 - skipping '/dev/shm/var/2/log/galera.galera_bf_kill_debug-innodb/'
 
Retrying test galera.galera_bf_kill_debug, attempt(2/3)...
 
worker[2] > Restart  - not started
worker[2] > Restart  - not started

10.5 Server logs.

On BB 10.6 CS the same test failed with a different message: "safe_mutex: Found wrong usage of mutex 'LOCK_thd_kill' and 'mutex'".

stdio.log:

10.6.0, 30dc4287ec3d46bae7593f56383b9f3738e3c4e6, kvm-rpm-centos74-amd64-debug

galera.galera_bf_kill_debug 'innodb'     w2 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2020-12-19 12:47:06
line
safe_mutex: Found wrong usage of mutex 'LOCK_thd_kill' and 'mutex'
^ Found warnings in /dev/shm/var/2/log/mysqld.2.err
ok
 
 - skipping '/dev/shm/var/2/log/galera.galera_bf_kill_debug-innodb/'
 
Retrying test galera.galera_bf_kill_debug, attempt(2/3)...
 
worker[2] > Restart  - not started
worker[2] > Restart  - not started
galera.galera_bf_kill_debug 'innodb'     w2 [ retry-pass ]   2059
 
Retrying test galera.galera_bf_kill_debug, attempt(3/3)...
 
galera.galera_bf_kill_debug 'innodb'     w2 [ retry-fail ]  Found warnings/errors in server log file!
        Test ended at 2020-12-19 12:47:14
line
safe_mutex: Found wrong usage of mutex 'mutex' and 'LOCK_thd_data'
^ Found warnings in /dev/shm/var/2/log/mysqld.2.err
ok
 
 - skipping '/dev/shm/var/2/log/galera.galera_bf_kill_debug-innodb/'
worker[2] > Restart  - using different config file
worker[2] > Restart  - using different config file

10.6 Server logs.
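
To triage the attached server logs, it may help to count how often each of the two reported warning signatures appears in a given mysqld.N.err file. The following is only a minimal sketch: the helper name classify_log and the signature labels are hypothetical, the patterns are copied from the stdio.log excerpts above, and this is not how MTR itself checks the server log.

```python
import re
from collections import Counter

# Signatures copied from the two stdio.log excerpts in this report.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "long_semaphore_wait": re.compile(r"InnoDB: A long semaphore wait"),
    "safe_mutex_wrong_usage": re.compile(
        r"safe_mutex: Found wrong usage of mutex '([^']+)' and '([^']+)'"
    ),
}


def classify_log(text):
    """Count how many lines of an error log match each signature."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines taken from the 10.5 and 10.6 excerpts in this report.
sample = """\
2020-12-21 19:45:52 0 [Warning] InnoDB: A long semaphore wait:
2020-12-21 19:46:07 0 [Warning] InnoDB: A long semaphore wait:
safe_mutex: Found wrong usage of mutex 'LOCK_thd_kill' and 'mutex'
"""
print(dict(classify_log(sample)))
```

For a real log, the file contents would be read first (for example with open(path).read()) and passed to classify_log.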



 Comments   
Comment by Marko Mäkelä [ 2021-03-19 ]

I will disable the test on 10.5 due to this local failure:

CURRENT_TEST: galera.galera_bf_kill_debug
mysqltest: At line 93: query 'drop table t1' failed: 2013: Lost connection to MySQL server during query
2021-03-19 12:28:31 0 [Note] WSREP: enqueuing trx abort for (11)
2021-03-19 12:28:31 10 [Note] WSREP: wsrep_thd_binlog_reset
mariadbd: /mariadb/10.5m/wsrep-lib/src/client_state.cpp:775: void wsrep::client_state::do_acquire_ownership(wsrep::unique_lock<wsrep::mutex> &): Assertion `state_ == s_idle || mode_ != m_local' failed.
#7  0x000055c0a15c828f in wsrep::client_state::do_acquire_ownership (this=0x7f93f4008308, lock=<optimized out>) at /mariadb/10.5m/wsrep-lib/src/client_state.cpp:775
#8  0x000055c0a10dffa7 in wsrep::client_state::acquire_ownership (this=this@entry=0x7f93f4008308) at /mariadb/10.5m/wsrep-lib/include/wsrep/client_state.hpp:153
#9  0x000055c0a10f6383 in wsrep_rollback_process (rollbacker=0x7f93fc000d48, arg=<optimized out>) at /mariadb/10.5m/sql/wsrep_thd.cc:253
#10 0x000055c0a10ec4db in start_wsrep_THD (arg=arg@entry=0x55c0a3310c10) at /mariadb/10.5m/sql/wsrep_mysqld.cc:3050
#11 0x000055c0a107aa2f in pfs_spawn_thread (arg=0x55c0a3335398) at /mariadb/10.5m/storage/perfschema/pfs.cc:2201
#12 0x00007f9423fcdea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f94235fadef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Jan Lindström (Inactive) [ 2021-04-15 ]

I can reproduce the last assertion in wsrep-lib using --repeat=100.
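
A reproduction attempt along these lines could be scripted as below. This is a sketch under assumptions: only the --repeat=100 option and the test name come from this report; the ./mtr wrapper path and the helper name mtr_repeat_command are assumptions, and the command is expected to be run from the mysql-test directory of a Galera-enabled debug build.

```python
import shlex


def mtr_repeat_command(test, repeat, mtr="./mtr"):
    """Build an argv list that runs one MTR test `repeat` times.

    The default `mtr` path is an assumption about the build tree layout.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return [mtr, f"--repeat={repeat}", test]


argv = mtr_repeat_command("galera.galera_bf_kill_debug", 100)
# Print the command instead of running it; pass argv to subprocess.run
# from the mysql-test directory to actually execute the test.
print(shlex.join(argv))
```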

Comment by Julien Fritsch [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated for 6 weeks, it's time to move it back to Stalled.

Generated at Thu Feb 08 09:30:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.