[MDEV-4322] Broken XID counting during binlog rotation Created: 2013-03-24  Updated: 2024-01-23  Resolved: 2013-03-25

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.1
Fix Version/s: 10.0.2

Type: Bug Priority: Major
Reporter: Pavel Ivanov Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: replication


 Description   

It looks like rotation of binlogs involves some broken logic involving . After binlog rotation MYSQL_BIN_LOG::do_checkpoint_request() is called. It calls ha_commit_checkpoint_request(), and from there for each engine that has commit_checkpoint_request function binlog_checkpoint_callback() is called. There mysql_bin_log.mark_xids_active() is called and corresponding mark_xid_done() is not called from anywhere.

I don't understand why this problem is not exposed anywhere in the test suite but I was able to hit it like this:

cmake . -DCMAKE_BUILD_TYPE=Debug
make
cd mysql-test/
echo "--innodb" >> suite/sys_vars/t/rpl_max_binlog_size_func-master.opt
./mtr sys_vars.rpl_max_binlog_size_func

After that I get "void MYSQL_BIN_LOG::cleanup(): Assertion `b->xid_count == 0' failed" during shutdown after the test.



 Comments   
Comment by Elena Stepanova [ 2013-03-24 ]

perl ./mtr sys_vars.rpl_max_binlog_size_func -mysqld=-innodb

#7 0x00007fc5c1010192 in _GI__assert_fail (assertion=0xe4eaa4 "b->xid_count == 0", file=0xe4dd48 "/data/bzr/10.0/sql/log.cc", line=2979, function=0xe51d90 "void MYSQL_BIN_LOG::cleanup()") at assert.c:103
#8 0x00000000008a1bb3 in MYSQL_BIN_LOG::cleanup (this=0x15d8220) at /data/bzr/10.0/sql/log.cc:2979
#9 0x0000000000563f23 in clean_up (print_message=true) at /data/bzr/10.0/sql/mysqld.cc:1906
#10 0x0000000000563d42 in unireg_end () at /data/bzr/10.0/sql/mysqld.cc:1837
#11 0x0000000000563c66 in kill_server (sig_ptr=0x0) at /data/bzr/10.0/sql/mysqld.cc:1765
#12 0x0000000000563c81 in kill_server_thread (arg=0x7fc5b9424e48) at /data/bzr/10.0/sql/mysqld.cc:1788
#13 0x0000000000cd9f64 in pfs_spawn_thread (arg=0x7fc588001340) at /data/bzr/10.0/storage/perfschema/pfs.cc:1800
#14 0x00007fc5c1de0e9a in start_thread (arg=0x7fc5b9393700) at pthread_create.c:308
#15 0x00007fc5c10d4cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#16 0x0000000000000000 in ?? ()

Comment by Kristian Nielsen [ 2013-03-24 ]

Thanks for catching this!

> There mysql_bin_log.mark_xids_active() is called and corresponding
> mark_xid_done() is not called from anywhere.

It is called from binlog_background_thread(). This happens asynchroneously, in
the background.

I suspect the problem here is that we need to wait for the binlog background
thread to have time to process any pending requests before we do binlog
cleanup. At least if I add this at the end of the test case, it does not
assert:

--source include/wait_for_binlog_checkpoint.inc

I'll take a closer look next week and find out what the real solution should
be.

Comment by Kristian Nielsen [ 2013-03-25 ]

Fix pushed to 10.0-base (will be later merged to 10.0).

Comment by Kristian Nielsen [ 2013-03-25 ]

Here is the patch for the bug, including test case:

http://lists.askmonty.org/pipermail/commits/2013-March/004483.html

Generated at Thu Feb 08 06:55:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.