[MDEV-7174] perfschema.global_read_lock fails in buildbot Created: 2014-11-23  Updated: 2015-01-11  Resolved: 2015-01-11

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.1
Fix Version/s: 10.1.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: buildbot, tests

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
blocks MDEV-7172 Fix buildbot failures in 10.1 tree Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-deb-precise-amd64/builds/3551/steps/test_4/logs/stdio

perfschema.global_read_lock              w3 [ fail ]
        Test ended at 2014-11-23 03:31:21
 
CURRENT_TEST: perfschema.global_read_lock
--- /usr/share/mysql/mysql-test/suite/perfschema/r/global_read_lock.result	2014-11-22 23:33:32.000000000 +0200
+++ /run/shm/var/3/log/global_read_lock.reject	2014-11-23 03:31:21.547093061 +0200
@@ -18,13 +18,16 @@
 unlock tables;
 lock tables performance_schema.setup_instruments write;
 connection default;
+Timeout in wait_condition.inc for select 1 from performance_schema.events_waits_current where event_name like "wait/synch/cond/sql/MDL_context::COND_wait_status"
+Id	User	Host	db	Command	Time	State	Info	Progress
+12470	root	localhost	performance_schema	Query	0	init	show full processlist	0.000
+12471	pfsuser	localhost	test	Query	30	Waiting for global read lock	lock tables performance_schema.setup_instruments write	0.000
 select event_name,
 left(source, locate(":", source)) as short_source,
 timer_end, timer_wait, operation
 from performance_schema.events_waits_current
 where event_name like "wait/synch/cond/sql/MDL_context::COND_wait_status";
 event_name	short_source	timer_end	timer_wait	operation
-wait/synch/cond/sql/MDL_context::COND_wait_status	mdl.cc:	NULL	NULL	timed_wait
 unlock tables;
 update performance_schema.setup_instruments set enabled='NO';
 update performance_schema.setup_instruments set enabled='YES';
 
mysqltest: Result length mismatch



 Comments   
Comment by Elena Stepanova [ 2014-11-23 ]

The sequence that triggers the failure is perl ./mtr --noreorder main.mdev-504 perfschema.global_read_lock

Comment by Elena Stepanova [ 2014-11-23 ]

Here are the essential parts of the test, put together to reproduce the same failure:

--let $trial = 10000
 
while ($trial)
{
  --connect (con3,localhost,root,,)
  --disconnect con3
  --dec $trial
}
 
--connection default
 
use performance_schema;
 
update performance_schema.setup_instruments set enabled='YES';
 
connect (con1, localhost, root, , test);
 
lock tables performance_schema.setup_instruments read;
--disable_result_log
select * from performance_schema.setup_instruments;
--enable_result_log
unlock tables;
 
lock tables performance_schema.setup_instruments write;
update performance_schema.setup_instruments set enabled='NO';
update performance_schema.setup_instruments set enabled='YES';
unlock tables;
 
--echo connection default;
connection default;
 
flush tables with read lock;
 
connection con1;
 
--send
lock tables performance_schema.setup_instruments write;
 
connection default;
 
let $wait_condition= select 1 from performance_schema.events_waits_current where event_name like "wait/synch/cond/sql/MDL_context::COND_wait_status";
let $wait_timeout= 5;
 
--source include/wait_condition.inc
 
unlock tables;

It passes on 10.0, but on 10.1 it times out in wait_condition.inc.
That's because instead of the expected wait/synch/cond/sql/MDL_context::COND_wait_status we are now getting wait/synch/mutex/sql/MDL_wait::LOCK_wait_status.
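
To check which instrument the blocked connection is actually reporting, the filter from the wait condition can be loosened. The query below is a minimal diagnostic sketch and is not part of the original test; the `wait/synch/%/sql/MDL%` pattern is an assumption chosen so that it matches both event names mentioned above.

```sql
-- Run from the default connection while con1 is blocked in
-- "lock tables performance_schema.setup_instruments write".
-- Lists every MDL-related synchronization wait currently recorded,
-- so the cond and mutex instruments can be told apart.
select thread_id,
       event_name,
       left(source, locate(':', source)) as short_source,
       operation
from performance_schema.events_waits_current
where event_name like 'wait/synch/%/sql/MDL%';
```

Going by the report above, the row of interest would be the COND_wait_status instrument on 10.0 and the LOCK_wait_status mutex on the affected 10.1 revisions.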

For me, it starts happening on the 10.1 tree with revision 3f2d9a902ec93327515ae94ae0c8c0c2c485d15f. On the previous revision, f1afc003eefe0aafd3e070c7453d9e029d8445a8, there is no timeout.

I am not sure whether this change is expected, because my further attempt to investigate got lost in git magic. Revision 3f2d9a902ec93327515ae94ae0c8c0c2c485d15f itself appears to be just a tiny change in an unrelated test case, which couldn't possibly make a difference.
But if I do a git diff between f1afc003eefe0aafd3e070c7453d9e029d8445a8 and 3f2d9a902ec93327515ae94ae0c8c0c2c485d15f, I get a huge diff (like 400K lines).

Comment by Elena Stepanova [ 2014-12-20 ]

After resolving the git mystery with serg's help, I have another suspect for breaking the test:

commit ab150128ce78fd363f6041277862686a61730b2b
Merge: 9534fd8 20e20f6
Author: Jan Lindström <jan.lindstrom@skysql.com>
Date:   Wed Aug 27 13:15:37 2014 +0300
 
    MDEV-6247: Merge 10.0-galera to 10.1.
    
        Merged lp:maria/maria-10.0-galera up to revision 3880.
    
        Added a new functions to handler API to forcefully abort_transaction,
        producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
        were added for future possiblity to add more storage engines that
        could use galera replication.
