[MDEV-33107] Assertion `!writer.checksum_len || writer.remains == 0' failed in Event_log::write_cache on OOS in --tmpdir Created: 2023-12-22  Updated: 2023-12-22

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 11.3.1
Fix Version/s: 11.3

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: OOS, resources

Issue Links:
Relates
relates to MDEV-25488 Assertion `!writer.checksum_len || wr... Stalled

 Description   

This issue looks to be different from both MDEV-23529 and MDEV-25488. It was likely previously seen by mleich, here in 11.3 as well.

11.3.0 126157061b4376496c034a809ea4943e863d1465 (Debug)

mariadbd: /test/11.3_dbg/sql/log.cc:8068: int Event_log::write_cache(THD*, IO_CACHE*): Assertion `!writer.checksum_len || writer.remains == 0' failed.

11.3.0 126157061b4376496c034a809ea4943e863d1465 (Debug)

Core was generated by `/test/MD311023-mariadb-11.3.0-linux-x86_64-dbg/bin/mariadbd --no-defaults --gti'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22527977334336)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x147d34162640 (LWP 2379815))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22527977334336) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=22527977334336) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=22527977334336, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000147d49242476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x0000147d492287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000147d4922871b in __assert_fail_base (fmt=0x147d493dd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5627d25c6ce0 "!writer.checksum_len || writer.remains == 0", file=0x5627d25c48dc "/test/11.3_dbg/sql/log.cc", line=8068, function=<optimized out>) at ./assert/assert.c:92
#6  0x0000147d49239e96 in __GI___assert_fail (assertion=0x5627d25c6ce0 "!writer.checksum_len || writer.remains == 0", file=0x5627d25c48dc "/test/11.3_dbg/sql/log.cc", line=8068, function=0x5627d25c6bf8 "int Event_log::write_cache(THD*, IO_CACHE*)") at ./assert/assert.c:101
#7  0x00005627d1c66000 in Event_log::write_cache (this=this@entry=0x5627d2eb0a48 <mysql_bin_log+8>, thd=<optimized out>, cache=cache@entry=0x147d1802be28) at /test/11.3_dbg/sql/log.cc:8068
#8  0x00005627d1c664fb in MYSQL_BIN_LOG::write_transaction_or_stmt (this=this@entry=0x5627d2eb0a40 <mysql_bin_log>, entry=entry@entry=0x147d34160740, commit_id=commit_id@entry=0) at /test/11.3_dbg/sql/log.cc:567
#9  0x00005627d1c73130 in MYSQL_BIN_LOG::trx_group_commit_leader (this=this@entry=0x5627d2eb0a40 <mysql_bin_log>, leader=leader@entry=0x147d34160740) at /test/11.3_dbg/sql/log.cc:8918
#10 0x00005627d1c73e1e in MYSQL_BIN_LOG::write_transaction_to_binlog_events (this=this@entry=0x5627d2eb0a40 <mysql_bin_log>, entry=entry@entry=0x147d34160740) at /test/11.3_dbg/sql/log.cc:8710
#11 0x00005627d1c7444e in MYSQL_BIN_LOG::write_transaction_to_binlog (this=this@entry=0x5627d2eb0a40 <mysql_bin_log>, thd=thd@entry=0x147d18005dc8, cache_mngr=cache_mngr@entry=0x147d1802be28, end_ev=end_ev@entry=0x147d341608d0, all=all@entry=false, using_stmt_cache=using_stmt_cache@entry=true, using_trx_cache=false, is_ro_1pc=false) at /test/11.3_dbg/sql/log.cc:8307
#12 0x00005627d1c74632 in binlog_flush_cache (thd=thd@entry=0x147d18005dc8, cache_mngr=cache_mngr@entry=0x147d1802be28, end_ev=end_ev@entry=0x147d341608d0, all=all@entry=false, using_stmt=using_stmt@entry=true, using_trx=using_trx@entry=false, is_ro_1pc=false) at /test/11.3_dbg/sql/log.cc:1907
#13 0x00005627d1c74902 in binlog_commit_flush_stmt_cache (thd=thd@entry=0x147d18005dc8, all=all@entry=false, cache_mngr=cache_mngr@entry=0x147d1802be28) at /test/11.3_dbg/sql/log.cc:1960
#14 0x00005627d1c74ceb in binlog_rollback (hton=<optimized out>, thd=0x147d18005dc8, all=false) at /test/11.3_dbg/sql/log.cc:2484
#15 0x00005627d1b11c76 in ha_rollback_trans (thd=thd@entry=0x147d18005dc8, all=all@entry=false) at /test/11.3_dbg/sql/handler.cc:2266
#16 0x00005627d199039e in trans_rollback_stmt (thd=thd@entry=0x147d18005dc8) at /test/11.3_dbg/sql/transaction.cc:535
#17 0x00005627d18106aa in mysql_execute_command (thd=thd@entry=0x147d18005dc8, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/11.3_dbg/sql/sql_parse.cc:5813
#18 0x00005627d1811193 in mysql_parse (thd=thd@entry=0x147d18005dc8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x147d341611e0) at /test/11.3_dbg/sql/sql_parse.cc:7742
#19 0x00005627d1813509 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x147d18005dc8, packet=packet@entry=0x147d1800faf9 "2717029162\t19095264\t17983729\n2514624261\t1805713521\t473331427\t87102688\t303387515\n311589450\t838519948\t1300610719\t2455619390\t433631583\n40910711\t2017015626\t111163223\t1757432376\t184167807\n514823146\t1079126"..., packet_length=packet_length@entry=163, blocking=blocking@entry=true) at /test/11.3_dbg/sql/sql_class.h:251
#20 0x00005627d1815625 in do_command (thd=0x147d18005dc8, blocking=blocking@entry=true) at /test/11.3_dbg/sql/sql_parse.cc:1406
#21 0x00005627d197a54d in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5627d4ff33e8, put_in_cache=put_in_cache@entry=true) at /test/11.3_dbg/sql/sql_connect.cc:1418
#22 0x00005627d197a842 in handle_one_connection (arg=arg@entry=0x5627d4ff33e8) at /test/11.3_dbg/sql/sql_connect.cc:1320
#23 0x00005627d1dbfa28 in pfs_spawn_thread (arg=0x5627d4ff7968) at /test/11.3_dbg/storage/perfschema/pfs.cc:2201
#24 0x0000147d49294ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#25 0x0000147d49326660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The issue was seen with the testcase here and is likely caused due to OOS in --tmpdir.



 Comments   
Comment by Roel Van de Paar [ 2023-12-22 ]

Adding stack from mleich:
A significant share of RQG tests fails with:

11.3 8ad1e26b1bafa4ed9928306efc10c047f2274108

origin/11.3, 11.3 8ad1e26b1bafa4ed9928306efc10c047f2274108 2023-09-04T09:36:44+04:00  debug build claims to be a 11.3.0
 
mariadbd: /data/Server/11.3/sql/log.cc:8052: int Event_log::write_cache(THD*, IO_CACHE*): Assertion `!writer.checksum_len || writer.remains == 0' failed.
Query (0x6767640133a0): COMMIT
Status: KILL_TIMEOUT
 
# 2023-09-07T04:51:32 [3411390] | Thread 3 (Thread 3415550.3422250 (mariadbd)):
# 2023-09-07T04:51:32 [3411390] | #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=33824607811136) at ./nptl/pthread_kill.c:44
# 2023-09-07T04:51:32 [3411390] | #1  __pthread_kill_internal (signo=6, threadid=33824607811136) at ./nptl/pthread_kill.c:78
# 2023-09-07T04:51:32 [3411390] | #2  __GI___pthread_kill (threadid=33824607811136, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
# 2023-09-07T04:51:32 [3411390] | #3  0x00004c9a6874e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
# 2023-09-07T04:51:32 [3411390] | #4  0x00004c9a687347f3 in __GI_abort () at ./stdlib/abort.c:79
# 2023-09-07T04:51:32 [3411390] | #5  0x00004c9a6873471b in __assert_fail_base (fmt=0x4c9a688e9150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55bb03b799d0 "!writer.checksum_len || writer.remains == 0", file=0x55bb03b76ae8 "/data/Server/11.3/sql/log.cc", line=8052, function=<optimized out>) at ./assert/assert.c:92
# 2023-09-07T04:51:32 [3411390] | #6  0x00004c9a68745e96 in __GI___assert_fail (assertion=0x55bb03b799d0 "!writer.checksum_len || writer.remains == 0", file=0x55bb03b76ae8 "/data/Server/11.3/sql/log.cc", line=8052, function=0x55bb03b798e8 "int Event_log::write_cache(THD*, IO_CACHE*)") at ./assert/assert.c:101
# 2023-09-07T04:51:32 [3411390] | #7  0x000055bb0322fc75 in Event_log::write_cache (this=this@entry=0x55bb044a2a68 <mysql_bin_log+8>, thd=<optimized out>, cache=cache@entry=0x67676401a368) at /data/Server/11.3/sql/log.cc:8052
# 2023-09-07T04:51:32 [3411390] | #8  0x000055bb0323035a in MYSQL_BIN_LOG::write_transaction_or_stmt (this=this@entry=0x55bb044a2a60 <mysql_bin_log>, entry=entry@entry=0x1ec367bb8b60, commit_id=commit_id@entry=0) at /data/Server/11.3/sql/log.cc:9182
# 2023-09-07T04:51:32 [3411390] | #9  0x000055bb0323ba55 in MYSQL_BIN_LOG::trx_group_commit_leader (this=this@entry=0x55bb044a2a60 <mysql_bin_log>, leader=leader@entry=0x1ec367bb8b60) at /data/Server/11.3/sql/log.cc:8902
# 2023-09-07T04:51:32 [3411390] | #10 0x000055bb0323c98b in MYSQL_BIN_LOG::write_transaction_to_binlog_events (this=this@entry=0x55bb044a2a60 <mysql_bin_log>, entry=entry@entry=0x1ec367bb8b60) at /data/Server/11.3/sql/log.cc:8694
# 2023-09-07T04:51:32 [3411390] | #11 0x000055bb0323cdeb in MYSQL_BIN_LOG::write_transaction_to_binlog (this=this@entry=0x55bb044a2a60 <mysql_bin_log>, thd=thd@entry=0x676764000d58, cache_mngr=cache_mngr@entry=0x67676401a1c8, end_ev=end_ev@entry=0x1ec367bb8e80, all=all@entry=true, using_stmt_cache=using_stmt_cache@entry=false, using_trx_cache=true, is_ro_1pc=true) at /data/Server/11.3/sql/log.cc:8291
# 2023-09-07T04:51:32 [3411390] | #12 0x000055bb0323d070 in binlog_flush_cache (thd=thd@entry=0x676764000d58, cache_mngr=cache_mngr@entry=0x67676401a1c8, end_ev=end_ev@entry=0x1ec367bb8e80, all=all@entry=true, using_stmt=using_stmt@entry=false, using_trx=using_trx@entry=true, is_ro_1pc=true) at /data/Server/11.3/sql/log.cc:1907
# 2023-09-07T04:51:32 [3411390] | #13 0x000055bb0323eabf in binlog_commit_flush_trx_cache (thd=thd@entry=0x676764000d58, all=all@entry=true, cache_mngr=cache_mngr@entry=0x67676401a1c8, ro_1pc=ro_1pc@entry=true) at /data/Server/11.3/sql/log.cc:2006
# 2023-09-07T04:51:32 [3411390] | #14 0x000055bb0323ee82 in binlog_commit (thd=thd@entry=0x676764000d58, all=all@entry=true, ro_1pc=<optimized out>) at /data/Server/11.3/sql/log.cc:2416
# 2023-09-07T04:51:32 [3411390] | #15 0x000055bb03098b94 in commit_one_phase_2 (thd=thd@entry=0x676764000d58, all=all@entry=true, trans=trans@entry=0x676764004800, is_real_trans=is_real_trans@entry=true) at /data/Server/11.3/sql/handler.cc:2130
# 2023-09-07T04:51:32 [3411390] | #16 0x000055bb03098f8f in ha_commit_one_phase (thd=thd@entry=0x676764000d58, all=all@entry=true) at /data/Server/11.3/sql/handler.cc:2092
# 2023-09-07T04:51:32 [3411390] | #17 0x000055bb0309a572 in ha_commit_trans (thd=thd@entry=0x676764000d58, all=all@entry=true) at /data/Server/11.3/sql/handler.cc:1886
# 2023-09-07T04:51:32 [3411390] | #18 0x000055bb02ee8ee8 in trans_commit (thd=thd@entry=0x676764000d58) at /data/Server/11.3/sql/transaction.cc:266
# 2023-09-07T04:51:32 [3411390] | #19 0x000055bb02d3fa9a in mysql_execute_command (thd=thd@entry=0x676764000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/11.3/sql/sql_parse.cc:5361
# 2023-09-07T04:51:32 [3411390] | #20 0x000055bb02d41d81 in mysql_parse (thd=thd@entry=0x676764000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x1ec367bba270) at /data/Server/11.3/sql/sql_parse.cc:7760
# 2023-09-07T04:51:32 [3411390] | #21 0x000055bb02d4419d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x676764000d58, packet=packet@entry=0x67676400b009 " COMMIT  /* E_R Thread9 QNO 244 CON_ID 19 */ ", packet_length=packet_length@entry=45, blocking=blocking@entry=true) at /data/Server/11.3/sql/sql_parse.cc:1893
# 2023-09-07T04:51:32 [3411390] | #22 0x000055bb02d4665c in do_command (thd=0x676764000d58, blocking=blocking@entry=true) at /data/Server/11.3/sql/sql_parse.cc:1406
# 2023-09-07T04:51:32 [3411390] | #23 0x000055bb02ed25e7 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55bb0531b498, put_in_cache=put_in_cache@entry=true) at /data/Server/11.3/sql/sql_connect.cc:1445
# 2023-09-07T04:51:32 [3411390] | #24 0x000055bb02ed2855 in handle_one_connection (arg=0x55bb0531b498) at /data/Server/11.3/sql/sql_connect.cc:1347
# 2023-09-07T04:51:32 [3411390] | #25 0x00004c9a687a0b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
# 2023-09-07T04:51:32 [3411390] | #26 0x00004c9a68831bb4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

No replay of the problem on a RelWithDebInfo build. Query: COMMIT

Comment by Roel Van de Paar [ 2023-12-22 ]

The issue has proved readily reproducible with the testcase provided. No separate replication channel is necessary, and standard replication setup is sufficient.

Comment by Roel Van de Paar [ 2023-12-22 ]

Issue does not seem to affect InnoDB.

Comment by Roel Van de Paar [ 2023-12-22 ]

On optimized builds, WSREP complains that handlerton rollback failed:

11.3.0 126157061b4376496c034a809ea4943e863d1465 (Optimized, Master)

2023-12-22 12:55:15 6 [Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1), gtid('')
2023-12-22 12:56:00 43 [ERROR] mariadbd: Error writing file 'binlog' (errno: 28 "No space left on device")
2023-12-22 12:56:00 43 [Warning] WSREP: handlerton rollback failed, thd 43 99 conf 0 SQL load data concurrent local infile 'load.sql' REPLACE into table test_corruption.test ( ID, ID2, ID3, ID4, ID5 )

However, it does handle the slave stop (also discussed in MENT-1989) correctly:

11.3.0 126157061b4376496c034a809ea4943e863d1465 (Optimized, Slave)

Slave_IO_Running: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event '.' at 4, the last event read from 'binlog.000001' at 30408, the last byte read from 'binlog.000001' at 32768.'

Generated at Thu Feb 08 10:36:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.