[MDEV-31646] Online alter applies binlog cache limit to cache writes Created: 2023-07-07  Updated: 2023-08-16  Resolved: 2023-08-16

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table
Affects Version/s: N/A
Fix Version/s: 11.2.1

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Nikita Malyavin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-16329 Engine-independent online ALTER TABLE Closed

 Description   

Note: To get the error shown below (ER_TRANS_CACHE_FULL), I had to modify log.cc:

--- a/sql/log.cc
+++ b/sql/log.cc
@@ -7984,7 +7984,7 @@ int Event_log::write_cache(THD *thd, IO_CACHE *cache)
   } while ((length= my_b_fill(cache)));
 
   DBUG_ASSERT(carry == 0);
-  DBUG_ASSERT(!writer.checksum_len || writer.remains == 0);
+//  DBUG_ASSERT(!writer.checksum_len || writer.remains == 0);
 
   DBUG_RETURN(0);                               // All OK
 }

This over-zealous assertion prevents any testing causing binlog write errors, even those which are supposed to be properly handled (it has been discussed with the replication team); and the test could not be run on a non-debug build because it uses debug sync facility.
The test can be run with non-modified code, it will just fail with the assertion failure upon update instead of ER_TRANS_CACHE_FULL; the point remains the same. Just note that the assertion failure as such is not the topic of the report, it's just a side-effect of applying the limit.


Binary logging has two hard limits for the caches, max_binlog_cache_size is applied to (multi-statement) transactions and max_binlog_stmt_cache_size to individual statements.
Online alter applies max_binlog_cache_size to both. It would be easier to understand if the limits weren't applied at all; I think it is going to be confusing enough that binlog variables start having effect when binary logging isn't even enabled, it will be even more confusing if they also have a different meaning.

Online alter test case

--source include/have_debug_sync.inc
--source include/have_sequence.inc
 
create table t (pk int primary key, a varchar(16)) engine=MyISAM;
insert into t select seq, NULL from seq_1_to_20000;
 
set @cache.size= @@max_binlog_cache_size;
set global max_binlog_cache_size= 4096;
 
--connect (con2,localhost,root,,)
--send
  set debug_sync= 'now wait_for do_updates';
 
--connect (con1,localhost,root,,)
set debug_sync= 'alter_table_online_progress signal do_updates wait_for go';
--send
  alter table t add b int, algorithm=copy, lock=none;
 
--connection con2
--reap
--error ER_TRANS_CACHE_FULL
update t set a = 'update-2' where pk > 1 order by pk limit 10000;
show warnings;
 
set debug_sync= 'now signal go';
 
--connection con1
--reap
show warnings;
 
# Cleanup
--disconnect con1
--disconnect con2
--connection default
drop table t;
set debug_sync= reset;
set global max_binlog_stmt_cache_size= @cache.size;

bb-11.2-oalter 6e6e950c8abf40041b146b538c4825856c89e64b

update t set a = 'update-2' where pk > 1 order by pk limit 10000;
ERROR HY000: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mariadbd variable and try again
show warnings;
Level	Code	Message
Error	1197	Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mariadbd variable and try again
Error	1534	Writing one row to the row-based binary log failed
Warning	1196	Some non-transactional changed tables couldn't be rolled back

If we instead modify max_binlog_stmt_cache_size in the test case, it doesn't fail.

Here is the matching test case for the traditional binary log:

Binlog test case (for comparison)

--source include/have_binlog_format_row.inc
--source include/have_sequence.inc
 
create table t (pk int primary key, a varchar(16)) engine=MyISAM;
insert into t select seq, NULL from seq_1_to_20000;
 
set @cache.size= @@max_binlog_cache_size;
set global max_binlog_cache_size= 4096;
 
--connect (con2,localhost,root,,)
update t set a = 'update-2' where pk > 1 order by pk limit 10000;
show warnings;
 
# Cleanup
--disconnect con2
--connection default
drop table t;
set debug_sync= reset;
set global max_binlog_cache_size= @cache.size;

Here update is not affected by the limit change and passes.
And if we modify max_binlog_stmt_cache_size instead, it fails.



 Comments   
Comment by Elena Stepanova [ 2023-07-09 ]

Come to think of it, it doesn't seem right at all that the DML fails here (regardless which cache limit is in use). Binary logging of DML here is only needed for ALTER, so if anything fails here, it should be ALTER.
However, I don't know how difficult it will be to achieve with the current design, and limited max_binlog_stmt_cache_size / max_binlog_cache_size should be a fairly rare configuration in real life.

Comment by Nikita Malyavin [ 2023-07-13 ]

elenst there is binlog_bug23533.test, that sets to the minimal value and makes lots of inserts, so it may be my wrong usage of some underlying functions

Comment by Elena Stepanova [ 2023-07-13 ]

If you mean the assertion failure, no, it's not you, it fails readily with vanilla binary logging. If you take the 2nd test case from the description (of course with max_binlog_stmt_cache_size instead of max_binlog_cache_size), it will fail on that assertion even with 10.4.

I'm not sure what is different in binlog_bug23533. Maybe it doesn't fail the exact same way with the transaction cache; or maybe it doesn't fail the same way with CREATE ... SELECT (which is the only query they execute with minimal cache limits); or maybe binlog_cache_size prevents the failure on max_binlog_cache_size; or whatever.

Comment by Nikita Malyavin [ 2023-07-13 ]

Good to know. I didn't manae to reproduce it myself by varying an amount of rows and their width and inserting sporadic log flushes in-between with the aforementioned test.

Comment by Nikita Malyavin [ 2023-07-19 ]

Please review these commits:
ca64ddcc MDEV-31646 Online alter applies binlog cache limit to cache writes
8c774b12 MDEV-31646 preserve DMLs in case of online binlog fault
a31b2a52 MDEV-31646 untie from max_allowed_packet and opt_binlog_rows_event_max_size

Generated at Thu Feb 08 10:25:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.