[MDEV-33291] Slave SQL: Could not execute Delete_rows_v1/Update_rows_v1 event; Can't find record, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND or HA_ERR_NO_PARTITION_FOUND Created: 2024-01-22  Updated: 2024-02-08

Status: Open
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Affects Version/s: 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4
Fix Version/s: 10.6, 10.11, 11.0, 11.1, 11.2, 11.3

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: affects-tests, regression-10.6

Issue Links:
Relates
relates to MDEV-33359 Inconsistent MariaDB error codes 0, 1... Open
relates to MDEV-33426 Assertion `status_var.local_memory_us... Open
relates to MDEV-16710 Slave SQL: Could not execute Update_r... Open
relates to MDEV-27135 error code: 1032 on random slaves Open
relates to MDEV-33022 Galera: Error_code: 1032; handler er... Open

 Description   

While the testcase does not immediately look significant (we're deleting from mysql.innodb_table_stats) note that this is likely just an artifact, i.e. the issue can likely be reproduced in other ways without deleting from innodb_table_stats. There are number of bugs open that look somewhat related (linked), but they are on older versions (except MDEV-33022 which is in combination with Galera), and this issue could be reproduced on 10.6+ only.

# Requires standard master/slave setup
CREATE TABLE t (c VARCHAR(2000) BINARY CHARACTER SET 'utf8') ENGINE=InnoDB;
ALTER TABLE t ADD UNIQUE (c);
SELECT c FROM t;
DELETE FROM mysql.innodb_table_stats;

Leads to (on slave):

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Debug, Slave)

2024-01-22 15:37:12 6 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './relaylog.000001' position: 4; GTID position ''
2024-01-22 15:37:15 69 [ERROR] mariadbd: Can't find record in 'innodb_table_stats'
2024-01-22 15:37:15 69 [Warning] Slave SQL: Could not execute Delete_rows_v1 event on table mysql.innodb_table_stats; Can't find record in 'innodb_table_stats', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000001, end_log_pos 0, Gtid 0-1-6, Internal MariaDB error code: 1032

Present in 10.6+, opt+dbg builds



 Comments   
Comment by Roel Van de Paar [ 2024-01-22 ]

MTR Testcase. May be very lightly sporadic.

--source include/have_innodb.inc
--source include/master-slave.inc
CREATE TABLE t (c VARCHAR(2000) BINARY CHARACTER SET 'utf8') ENGINE=InnoDB;
ALTER TABLE t ADD UNIQUE (c);
SELECT c FROM t;
DELETE FROM mysql.innodb_table_stats;
--sync_slave_with_master
DROP TABLE t;
--source include/rpl_end.inc

11.3.2 63fb478f88e0061d149f5cdd3c4d21d4a35c7bd9 (Debug)

Last_SQL_Errno	1032
Last_SQL_Error	Could not execute Delete_rows_v1 event on table mysql.innodb_table_stats; Can't find record in 'innodb_table_stats', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 910

Comment by Roel Van de Paar [ 2024-01-22 ]

Affects tests as the filtering of this issue may mean missing other issues with a similar or identical outcome.

Comment by Roel Van de Paar [ 2024-01-24 ]

I ran into this again, with different SQL. Attempts to reproduce this version in MTR failed, it reproduces using the pquery client only.

# Requires standard master/slave setup and binlog_format=ROW on master. Execute SQL on the master.
SET sql_mode='';
RESET MASTER;
CREATE TABLE t1(c INT);
GRANT ALL ON a.* to a;
CREATE TABLE t2(c INT);
DELETE FROM mysql.db;
SELECT SLEEP(2);

This results in, on the master:

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Debug, Master)

2024-01-24 14:00:06 6 [Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1), gtid('')
2024-01-24 14:00:08 9 [Note] Start binlog_dump to slave_server(2), pos(binlog.000001, 959), using_gtid(1), gtid('0-1-3')

However, the second Start binlog_dump to slave_server line only happens on the DELETE being executed
And on the slave:

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Debug, Slave)

2024-01-24 14:00:08 5 [Note] Slave: received end packet from server, apparent master shutdown:
2024-01-24 14:00:08 5 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.000001' at position 959; GTID position '0-1-3'
2024-01-24 14:00:08 6 [ERROR] mariadbd: Can't find record in 'db'
2024-01-24 14:00:08 6 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table mysql.db; Can't find record in 'db', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000001, end_log_pos 0, Gtid 0-1-4, Internal MariaDB error code: 1032
2024-01-24 14:00:08 6 [Warning] Slave: Can't find record in 'db' Error_code: 1032
2024-01-24 14:00:08 6 [Warning] Slave: Engine Aria does not support rollback. Changes were committed during rollback call Error_code: 4173
2024-01-24 14:00:08 6 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 751; GTID position '0-1-3'

The retry reconnect looks fine (inc the GTID used), but the Can't find record in 'db' looks incorrect, as there are two entries in mysql.db (on the master and the slave) - the test db's. 1032=ER_KEY_NOT_FOUND.

Final MTR attempt (does not reproduce the issue):

--source include/have_binlog_format_row.inc
--source include/master-slave.inc
INSERT INTO mysql.db VALUES('','test','PUBLIC','Y','Y','Y','Y','Y','Y','N','Y','Y','Y','Y','Y','Y','Y','Y','N','N','Y','Y','Y','Y');  # Match generic setup
INSERT INTO mysql.db VALUES('','test\_%','PUBLIC','Y','Y','Y','Y','Y','Y','N','Y','Y','Y','Y','Y','Y','Y','Y','N','N','Y','Y','Y','Y');  # Match generic setup
SET sql_mode='';
RESET MASTER;
CREATE TABLE t1(c INT);
GRANT ALL ON a.* to a;
CREATE TABLE t2(c INT);
--sync_slave_with_master
DELETE FROM mysql.db;
SELECT SLEEP(2);
--source include/rpl_end.inc

Comment by Roel Van de Paar [ 2024-01-30 ]

Increased prio on this ticket as this issue significantly and regularly affects replication testing.

Comment by Roel Van de Paar [ 2024-01-30 ]

# Requires standard master/slave setup and binlog_format=ROW on master. Execute SQL on the master.
CREATE TABLE t (c INT) ENGINE=MyISAM;
RESET MASTER;
INSERT t VALUES (0);
SET GLOBAL binlog_checksum=NONE;
UPDATE t SET c=1;

Leads to:

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Debug)

2024-01-30 16:34:57 4 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='10411', master_log_file='', master_log_pos='4'.
2024-01-30 16:34:57 4 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
2024-01-30 16:34:57 5 [Note] Slave I/O thread: Start asynchronous replication to master 'repl_user@127.0.0.1:10411' in log '' at position 4
2024-01-30 16:34:57 6 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './qa-roel-2-relay-bin.000001' position: 4; GTID position ''
2024-01-30 16:34:57 5 [Note] Slave I/O thread: connected to master 'repl_user@127.0.0.1:10411',replication starts at GTID position ''
2024-01-30 16:35:03 6 [ERROR] mariadbd: Can't find record in 't'
2024-01-30 16:35:03 6 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log binlog.000002, end_log_pos 520, Gtid 0-1-2, Internal MariaDB error code: 1032
2024-01-30 16:35:03 6 [Warning] Slave: Can't find record in 't' Error_code: 1032
2024-01-30 16:35:03 6 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000002' position 367; GTID position '0-1-4'

Same testcase but with a PK:

CREATE TABLE t (c INT,PRIMARY KEY(c)) ENGINE=MyISAM;

Gives instead:

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Debug)

2024-01-30 17:19:22 6 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000002, end_log_pos 520, Gtid 0-1-2, Internal MariaDB error code: 1032

Comment by Roel Van de Paar [ 2024-01-30 ]

It is likely that the original issue (inc first comment MTR testcase) is different from the later ones which require RESET MASTER and ROW based replication.

Comment by Kristian Nielsen [ 2024-02-02 ]

1. The issue with mysql.innodb_table_stats happens because that table is updated in the background by InnoDB internals, and the contents not replicated. For some reason I didn't get any data in the table in my quick test. But if at some point it contains different data on master and slave, a row-binlogged update/delete on this table on the master is likely to result in "row not found" on the slave.

I think tests that do DML on mysql.* tables can be used to check that the slaves do not crash or deadlock the whole slave server somehow, but the slave failing with error or getting different data must be expected.

2. I checked the code. When RESET MASTER is run while a slave is connected, it looks like the dump thread happily continues with its open filehandle to the old (now deleted) master-bin.000001 file. Then, once sufficient data is written to the new master-bin.000001 to make it longer than the old, the dump thread wakes up, tries to read from its stale file handle, finds EOF, and exits. The slave IO thread gets disconnected, and tries to reconnect with its current binlog position, which is now completely wrong since RESET MASTER was done on master.

In (2), I think normally an error like "bogus data in log event" or checksum error will occur when the slave jumps to the invalid position. But if by chance the new position is valid, the slave will skip a bunch of events and can get a "can't find record" error like you saw.

It's clear that RESET MASTER will in general require re-configuring or resetting slaves, since it deletes the binlogs on the master and reverts to the starting master-bin.000001 position and GTID 0-1-1.

But the current behaviour of leaving the old dump thread reading a stale file handle, and then at some random time later jumping to an invalid position in the new binlog file, seems ... not ideal.

I wonder what the correct behaviour should be? Maybe an error should be given if any slaves are connected when RESET MASTER is attempted. (The locking around RESET MASTER is already tricky, and it might be hard to avoid a race if a slave connects at the same time as RESET MASTER is run, but at least any connected slave could cause the RESET MASTER to fail until all slave connections are stopped from the slave servers).

Comment by Andrei Elkin [ 2024-02-02 ]

knielsen, thanks for the analysis.

Wrt mysql.innodb_table_stats the issue is more general. Clearly for tables of this kind replication events should be normally ignored on slave (logging alone may have sense). Perhaps a "slave" server option for that needs to be introduced.

To follow up your RESET MASTER thinking, another policy would be to prevent the command when there's a connected slave that is a running dump thread that then would mark binlog files (one that is being read and the following ones) busy.

bnestere, feel free to comment on.

Comment by Roel Van de Paar [ 2024-02-02 ]

Thank you for the input and analysis knielsen and Elkin!

Any idea what the connection with ROW based replication is? i.e. with MIXED and STATEMENT the RESET MASTER issues do not seem reproducible.

Answering my own question after pondering it for a while, I guess due to the offset "mismatch" in the master binlog position. So it would seem not impossible to also see such outcomes with MIXED/STATEMENT setups, though they would be much less frequent (i.e. exact statement start position required?).

Comment by Roel Van de Paar [ 2024-02-02 ]

Saw one more variation which was not filtered yet. Similar to earlier one, but on Delete_rows_v1 and yielding HA_ERR_END_OF_FILE.

# Requires standard master/slave setup and binlog_format=ROW on master. Execute SQL on the master.
SET sql_mode='';
CREATE TABLE t1 (col VARCHAR(10)) ENGINE=InnoDB;
RESET MASTER;
INSERT INTO t1 VALUES (1);
SET GLOBAL binlog_checksum=NONE;
DELETE FROM t1;

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Debug)

2024-02-03  9:18:17 6 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './qa-roel-2-relay-bin.000001' position: 4; GTID position ''
2024-02-03  9:18:17 5 [Note] Slave I/O thread: connected to master 'repl_user@127.0.0.1:12966',replication starts at GTID position ''
2024-02-03  9:21:28 6 [ERROR] mariadbd: Can't find record in 't1'
2024-02-03  9:21:28 6 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log binlog.000002, end_log_pos 477, Gtid 0-1-2, Internal MariaDB error code: 1032
2024-02-03  9:21:28 6 [Warning] Slave: Can't find record in 't1' Error_code: 1032
2024-02-03  9:21:28 6 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000002' position 331; GTID position '0-1-4'
2024-02-03  9:21:28 6 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.000002' at position 331; GTID position '0-1-4', master: 127.0.0.1:12966

Comment by Kristian Nielsen [ 2024-02-03 ]

In row-based replication, the slave searches the table for a specific row (compares all fields or at least the primary key), and thus can detect if that row is not present on the slave.

In statement-based, there's just a DELETE or UPDATE statement, the statement is run on whatever data is in the table, there's no knowledge of which rows are expected to be there.

You would be able to get a duplicate key error in statement-based if the RESET MASTER causes the slave to skip a DELETE, and then later an insert happens with a conflict of the row that was not deleted on the slave.

But the root issue here is that the slave diverges from the master. Once that happens, a variety of errors can happen later, it completely depends on the data and query pattern.

Comment by Elena Stepanova [ 2024-02-03 ]

But the root issue here is that the slave diverges from the master. Once that happens, a variety of errors can happen later, it completely depends on the data and query pattern.

I don't think it's realistic to expect that technical data (statistics, auth data and such) would always be identical on a master and a slave. Even if you completely forbid any direct action on a slave, there can still be legitimate difference. Let's take for example ignore-db and similar parameters, they'll surely lead to some statistics existing on the master but not on the slave. Or different histogram_type will cause EITS differences. Or in case of OM => NS replication system tables on master can have different structure.
For normal user actions changing such data indirectly (ANALYZE TABLE, massive DML for InnoDB stats, table DDL, etc.) special care is taken so that it doesn't trigger row events to be written in the binary log. If however a user starts tampering with such data directly, there is always a possibility that things will go wrong, "user must be aware" and use SQL_LOG_BIN=0 for example.
Or, alternatively, changes to system tables should never be replicated, but it sounds like a strange hack to me.

Comment by Brandon Nesterenko [ 2024-02-03 ]

To the RESET MASTER problem:

Can we kill the active dump threads to force a reconnect (similar to kill_zombie_dump_threads)? And then on reconnect, the binlog dump thread would know if there is an ongoing RESET MASTER because reset_master_pending would be true. So we'd then have a few options, e.g. 1) send some sort of error to the replica and self-kill the binlog dump thread connection, 2) before sending the initial fake rotate event, wait for the RESET MASTER to finish (reset_master_pending would be false) so we'd initialize the replica with the new binlog.

And if master-slave consistency is a problem, perhaps we should track if active binlog dump thread connections are "caught up" with the existing binlogs before resetting them. Then we could delay the RESET MASTER until the binlogs have been sent to the replicas, and then RESET MASTER would kill the dump threads. I imagine this would be optional though, perhaps with an extension to the grammar similar to SHUTDOWN, e.g. RESET MASTER [WAIT FOR ALL REPLICAS].

Another thing to note, we can track reset masters using the pattern established in MDEV-20215 (46c3e7e3).

Comment by Roel Van de Paar [ 2024-02-05 ]

# Requires standard master/slave setup. Execute SQL on the master.
CREATE TABLE t1 (c1 INT) PARTITION BY RANGE (c1) (PARTITION p0 VALUES LESS THAN (7));
RESET MASTER;
DROP TABLE t1;
CREATE TABLE t1 (c1 INT);
CREATE TABLE t2 (c1 INT);
CREATE TABLE t3 (c1 INT);
CREATE TABLE t4 (c1 INT);
INSERT INTO t1 VALUES (1+75);

When not resulting in MDEV-33376, will - sporadically - run into:

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Optimized)

2024-02-05 15:57:37 6 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.t1; Table has no partition for value 76, Error_code: 1526; handler error HA_ERR_NO_PARTITION_FOUND; the event's master log binlog.000001, end_log_pos 0, Gtid 0-1-6, Internal MariaDB error code: 1526
2024-02-05 15:57:37 6 [Warning] Slave: Table has no partition for value 76 Error_code: 1526
2024-02-05 15:57:37 6 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 1052; GTID position '0-1-5'
2024-02-05 15:57:37 6 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.000001' at position 1052; GTID position '0-1-5', master: 127.0.0.1:12196

This is reproducible with RBR, SBR and MBR. The 'Write_rows_v1' error is seen only when using RBR, otherwise (SBR/MBR) we see:

11.4.0 8bf9f21855606ddeb0fe90fb1e70165adbec6746 (Optimized)

2024-02-05 16:08:58 6 [ERROR] Slave SQL: Error 'Table has no partition for value 76' on query. Default database: 'test'. Query: 'INSERT INTO t1 VALUES (1+75)', Gtid 0-1-6, Internal MariaDB error code: 1526
2024-02-05 16:08:58 6 [Warning] Slave: Table has no partition for value 76 Error_code: 1526
2024-02-05 16:08:58 6 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 1052; GTID position '0-1-5'
2024-02-05 16:08:58 6 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.000001' at position 1052; GTID position '0-1-5', master: 127.0.0.1:10395

Comment by Andrei Elkin [ 2024-02-05 ]

The fact that ROW format is replication-unsafe for the "statistics" system tables can be captured formally in THD::decide_logging_format.

Comment by Kristian Nielsen [ 2024-02-05 ]

Roel, there are two different issues here:

1. Row-based replication, user replicates an update to system tables that differ (here: mysql.innodb_table_stats). This can lead to row not found errors. I think this is "not a bug", as Elena wrote, "user must be aware".

2. RESET MASTER is run on the master while a slave is connected. Again, this is something that cannot really work, RESET MASTER requires stopping and reconfiguring replication. But in this case the way things "do not work" seems poor (the active dump threads remain attached to the deleted binlog file until the new file grows larger than the old one). I think this is a good test find, and I suggest simply giving an error for RESET MASTER with attached slaves is an appropriate fix. But again, once RESET MASTER is run without reconfiguring replication, slaves will be diverged and some random error will occur later; the exact nature of those errors are not meaningful as the root cause is misconfigured replication and master-slave difference.

Hope this helps,

- Kristian.

Comment by Roel Van de Paar [ 2024-02-05 ]

knielsen Thank you. Agree on 2nd item. For the first one I hope there can be a better solution in time. I agree the priority is low for that one.

As for the various errors observed, while there is indeed a common root cause, the various issues are documented here 1) to show the variety in resulting outcomes - for example the last testcase also reproduces on SBR and MBR, which is significantly different from all the others, and 2) before adding them to a filter list.

Comment by Roel Van de Paar [ 2024-02-05 ]

Elkin Thank you. Can you clarify a bit further what you have in mind?

Comment by Andrei Elkin [ 2024-02-06 ]

Roel, one way to fix the p.1 is to mirror the STATEMENT format unsafety. That is to refuse to log in ROW format with a new replication unsafe message. MIXED would translate to the actual STMT.

Comment by Elena Stepanova [ 2024-02-06 ]

What will then MIXED do if you write delete from mysql.innodb_table_stats limit 1?

Comment by Andrei Elkin [ 2024-02-06 ]

elenst, there won't be any error on slave - and to avoid that is my point of course.

Comment by Elena Stepanova [ 2024-02-06 ]

Elkin,
delete from <table> limit 1 is SBR-unsafe statement, that's when MBR switches to row format.
If you make delete from mysql.innodb_table_stats RBR-unsafe, then delete from mysql.innodb_table_stats limit 1 becomes both SBR-unsafe and RBR-unsafe. Hence my question: what will then MBR do?

Comment by Andrei Elkin [ 2024-02-06 ]

elenst, sure the STATEMENT format of queries of the likes of mysql.innodb_table_stats would not be declared UNSAFE anymore. It's technically feasible.

Comment by Elena Stepanova [ 2024-02-06 ]

I tried to offer a simplified example, but okay.

create table t (a int);
insert into t values (1),(2);
delete from t where a in (select n_rows from mysql.innodb_table_stats) limit 1;

Will the delete be logged in statement or row format? If it's SBR, it's unsafe at least because of limit. If it's RBR, it would be unsafe because of the stats table, as for RBR it's the same problem as a direct delete from the stats table.

My point is, be it SBR or RBR, if you operate on tables with different contents between master and slave, you can always end up with inconsistency leading to replication abort, the only question is how many steps you'll have to make for that. So, I don't believe that making stat tables RBR-unsafe will solve the problem.

Comment by Andrei Elkin [ 2024-02-06 ]

In this case mysql.innodb_table_stats is not the target table which is t. The UNSAFEty applies to the operation over t. This DELETE query must be regarded as UNSAFE, so ROW format for it.

On the second thought the data consistency argument is actually serious.
And since ROW format can do harm too, let me retreat to do nothing on slave for events on such tables.
Practically I would add to my.cnf templates

replicate_ignore_table=mysql.innodb_table_stats

Comment by Elena Stepanova [ 2024-02-06 ]

It doesn't matter whether the table is a target or not, this distinction may only prolong the suffering a little bit. As long as a table with inconsistent contents participates in a query which gives birth to binary log events, it can always cause discrepancy => errors on the slave server side => replication abort in a non-idempotent mode.
Which is why replicate_ignore_table won't do much good.

--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc
--source include/master-slave.inc
 
--connection slave
stop slave;
set global replicate_ignore_table='mysql.innodb_table_stats';
start slave;
 
--connection master
create table t1 (a int) engine=InnoDB;
insert into t1 values (1),(2);
delete from mysql.innodb_table_stats;
update t1 set a = 10 where a not in (select count(*) from mysql.innodb_table_stats);
delete from t1 limit 2;
 
--sync_slave_with_master
 
# Cleanup
--connection master
drop table t1;
--source include/rpl_end.inc

It is likely not the shortest possible example, just the first that came to mind.

Comment by Andrei Elkin [ 2024-02-07 ]

Being the target table matters in your example. That is

   delete from t where a in (select n_rows from mysql.innodb_table_stats) limit 1;

would be SAFE in ROW format, t would be consistent across master-slave. (Conversely and in contrast mysql.innodb_table_stats as the target table is not SAFE in ROW format).
And the same to the last comment's

update t1 set a = 10 where a not in (select count(*) from mysql.innodb_table_stats);

which must be detected as UNSAFE and binlogged in ROW format which you test demonstrates it does not do.

Comment by Elena Stepanova [ 2024-02-07 ]

Being the target table matters in your example

That's because these are just examples, they are expected to be extrapolated properly when you think of a common solution. But you are suggesting a solution each time based on one example only, not on a general problem, that's why I provide the next one trying to explain why the suggestion won't work.

update t1 set a = 10 where a not in (select count(*) from mysql.innodb_table_stats);
which must be detected as UNSAFE and binlogged in ROW format which you test demonstrates it does not do.

Why is that?
In general (not taking into account specific tables involved), this statement is deterministic, it's not converted to row automatically and has no reason to. Unlike the previous one with limit.
Or is it the next solution, in which a statement would be converted to ROW because it uses a stat table? That is,
a) ignore replicating changes to stat tables +
b) convert everything else that uses stat tables to RBR?

Comment by Roel Van de Paar [ 2024-02-07 ]

Also observed as a result of RESET MASTER:

[ERROR] Error reading packet from server: binlog truncated in the middle of event; consider out of disk space on master; the first event '.' at 4, the last event read from 'binlog.000001' at 1839, the last byte read from 'binlog.000001' at 1858. (server_errno=1236)

As well as

[ERROR] Error reading packet from server: The binlog on the master is missing the GTID 0-2-130 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled (server_errno=1236)

Interesting here is that the in both cases master binlog issues were detected.

Generated at Thu Feb 08 10:37:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.