[MDEV-33320] Assertion `(longlong) binlog_space_total >= 0' failed in MYSQL_BIN_LOG::real_purge_logs_by_size #2 Created: 2024-01-27  Updated: 2024-01-30  Resolved: 2024-01-30

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: N/A
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: rr-profile

Issue Links:
Problem/Incident
is caused by MDEV-31404 Implement binlog_space_limit In Testing
Relates
relates to MDEV-33282 Assertion `(longlong) binlog_space_to... Closed

 Description   

Note: the fix for MDEV-33282 should already be in the tree.

bb-11.4-timestamp 53df87b27d9e0c0e7e5f2e2e7e5aabd86aa20bee

(rr) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=8488158332480) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=8488158332480) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=8488158332480, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00006e5130eb0476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00006e5130e967f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00006e5130e9671b in __assert_fail_base (fmt=0x6e513104b130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d2d4209828 "(longlong) binlog_space_total >= 0", 
    file=0x55d2d4206f20 "/data/MDEV-33320/src/sql/log.cc", line=5297, function=<optimized out>) at ./assert/assert.c:92
#6  0x00006e5130ea7e96 in __GI___assert_fail (assertion=0x55d2d4209828 "(longlong) binlog_space_total >= 0", file=0x55d2d4206f20 "/data/MDEV-33320/src/sql/log.cc", line=5297, 
    function=0x55d2d42097f0 "int MYSQL_BIN_LOG::real_purge_logs_by_size(ulonglong)") at ./assert/assert.c:101
#7  0x000055d2d37da2e0 in MYSQL_BIN_LOG::real_purge_logs_by_size (this=0x55d2d4c47580 <mysql_bin_log>, binlog_pos=343) at /data/MDEV-33320/src/sql/log.cc:5297
#8  0x000055d2d37eec2f in MYSQL_BIN_LOG::purge_logs_by_size (this=0x55d2d4c47580 <mysql_bin_log>, binlog_pos=343) at /data/MDEV-33320/src/sql/log.h:1059
#9  0x000055d2d37e04a7 in MYSQL_BIN_LOG::rotate (this=0x55d2d4c47580 <mysql_bin_log>, force_rotate=true, check_purge=0x7b84da96a4f) at /data/MDEV-33320/src/sql/log.cc:7462
#10 0x000055d2d37e0dda in MYSQL_BIN_LOG::rotate_and_purge (this=0x55d2d4c47580 <mysql_bin_log>, force_rotate=true, domain_drop_lex=0x0) at /data/MDEV-33320/src/sql/log.cc:7651
#11 0x000055d2d34b013e in reload_acl_and_cache (thd=0x45200002568, options=16130, tables=0x0, write_to_binlog=0x7b84da96df0) at /data/MDEV-33320/src/sql/sql_reload.cc:160
#12 0x000055d2d329e29b in mysql_execute_command (thd=0x45200002568, is_called_from_prepared_stmt=false) at /data/MDEV-33320/src/sql/sql_parse.cc:5253
#13 0x000055d2d32a5dc0 in mysql_parse (thd=0x45200002568, rawbuf=0x45200014c50 "/* WRK-4 QNO 1546 */ FLUSH LOGS", length=31, parser_state=0x7b84da97320) at /data/MDEV-33320/src/sql/sql_parse.cc:7798
#14 0x000055d2d32934ff in dispatch_command (command=COM_QUERY, thd=0x45200002568, packet=0x4520000c919 "/* WRK-4 QNO 1546 */ FLUSH LOGS", packet_length=31, blocking=true)
    at /data/MDEV-33320/src/sql/sql_parse.cc:1893
#15 0x000055d2d32920a9 in do_command (thd=0x45200002568, blocking=true) at /data/MDEV-33320/src/sql/sql_parse.cc:1406
#16 0x000055d2d3467bbb in do_handle_one_connection (connect=0x55d2d794ca48, put_in_cache=true) at /data/MDEV-33320/src/sql/sql_connect.cc:1417
#17 0x000055d2d3467955 in handle_one_connection (arg=0x55d2d794ca48) at /data/MDEV-33320/src/sql/sql_connect.cc:1319
#18 0x00006e5130f02ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#19 0x00006e5130f93a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

The stress test mostly performs admin operations – flush logs, purge logs, setting max_binlog_total_size, and such.



 Comments   
Comment by Elena Stepanova [ 2024-01-27 ]

Here is a simplified test case. It is still highly non-deterministic, needs to be run with a big value of --repeat=N. It usually fails for me in 20-40 attempts without --rr, but it can vary a lot on different machines and builds. It's also rr-able with --rr="-h", but it takes hundreds of repetitions.

--source include/have_log_bin.inc
 
SET @total= @@max_binlog_total_size, @slaves= @@slave_connections_needed_for_purge;
SET GLOBAL SLAVE_CONNECTIONS_NEEDED_FOR_PURGE= 0;
 
FLUSH BINARY LOGS;
FLUSH BINARY LOGS;
 
--connect (con1,localhost,root,,)
--connect (con2,localhost,root,,)
--connect (con3,localhost,root,,)
--send
  FLUSH BINARY LOGS;
--connection con1
--send
  FLUSH BINARY LOGS;
--connection con2
--send
  PURGE BINARY LOGS BEFORE NOW(6);
 
--connection default
SET GLOBAL MAX_BINLOG_TOTAL_SIZE= 1073741824;
 
# Cleanup
SET GLOBAL MAX_BINLOG_TOTAL_SIZE= @total, SLAVE_CONNECTIONS_NEEDED_FOR_PURGE= @slaves;
--connection con1
--reap
--disconnect con1
--connection con2
--reap
--disconnect con2
--connection con3
--reap
--disconnect con3

Comment by Michael Widenius [ 2024-01-30 ]

Added a mutex around the variables and this fixed the issue.
I verified that without the patch I got a crash. without it, things works (repeat=100)

Generated at Thu Feb 08 10:38:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.