[MDEV-32877] record big transactions to master error log Created: 2023-11-25  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Sean Peng Assignee: Sean Peng
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Binlog flushing is serial, so a large transaction can take a long time to flush. Large transactions hitting the server have several serious consequences:
1. They can cause replication lag on the slave node.
2. They can cause a large number of events to be written from the binlog cache to the binlog file, which may stall the master node and make it unavailable. The flush also prevents other transactions from writing to the binlog. In more serious cases it may cause I/O pressure, and the database may hang or even crash.

When these scenarios happen, people may not be able to quickly determine the root cause because monitoring is insufficient or nonexistent. It therefore makes sense to log a warning message in the master error log when a transaction's binlog cache size exceeds a certain threshold (say 100MB; it should be configurable).
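
A minimal sketch of the proposed check, written in Python purely for illustration (the server itself is C++). The names `DEFAULT_THRESHOLD_BYTES` and `big_trx_warning`, the message text, and the rule that a non-positive threshold disables the check are all assumptions; none of them correspond to an existing MariaDB variable or function.

```python
from typing import Optional

# Hypothetical default; the description suggests roughly 100MB, configurable.
DEFAULT_THRESHOLD_BYTES = 100 * 1024 * 1024


def big_trx_warning(cache_bytes: int,
                    threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> Optional[str]:
    """Return a warning line if the binlog cache size exceeds the threshold."""
    if threshold_bytes <= 0:
        # Assumption: a non-positive threshold disables the check.
        return None
    if cache_bytes <= threshold_bytes:
        return None
    return (f"[Warning] Large transaction: binlog cache size {cache_bytes} bytes "
            f"exceeds threshold {threshold_bytes} bytes")
```

In this sketch the check would run once per transaction, at the point where the binlog cache is about to be flushed, so the cost is a single comparison.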



 Comments   
Comment by Andrew Hutchings [ 2023-11-27 ]

After discussions with Elkin, we think this might be better in the slow log rather than the error log.

Comment by Andrew Hutchings [ 2023-11-27 ]

Further to this, serg said that if we do this, it should probably be logged on the COMMIT, for example "COMMIT took more than X ms, trx cache size=Y, etc". "COMMIT" here is shorthand for "any statement that caused a trx binlog cache write", which might be an implicit commit too.
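
The kind of entry described in this comment could be sketched as follows, in Python for illustration only. The function name `slow_commit_entry`, its parameters, and the exact line format are hypothetical; the comment only suggests the general shape "COMMIT took more than X ms, trx cache size=Y".

```python
from typing import Optional


def slow_commit_entry(elapsed_ms: float, trx_cache_bytes: int,
                      limit_ms: float) -> Optional[str]:
    """Return a log line when the cache-flushing statement ran longer than limit_ms."""
    if elapsed_ms <= limit_ms:
        return None
    # "COMMIT" stands for any statement that caused the trx binlog cache write.
    return (f"COMMIT took more than {limit_ms:g} ms "
            f"(elapsed={elapsed_ms:g} ms, trx cache size={trx_cache_bytes})")
```

Keying the entry on elapsed time rather than on cache size alone matches the suggestion to use the slow log, with the cache size reported as supporting detail.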

Generated at Thu Feb 08 10:34:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.