[MDEV-32279] ANALYZE TABLE is replicated without use_stat_tables Created: 2023-09-28  Updated: 2023-10-17

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.4, 10.5, 10.6, 10.11
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None

Attachments: File rpl_a1.test    
Issue Links:
Relates

 Description   

ANALYZE TABLE statement may or may not collect EITS statistics, depending
on the use_stat_tables setting.

It is replicated as statement, that is, the ANALYZE TABLE statement itself is written into the binary log.
If ANALYZE TABLE makes changes to mysql.column|table|index_stat tables, these are not caught by RBR and are not written into the binary log.

That is, we rely on ANALYZE TABLE to be executed on the slave in the same way as it was executed on the master.
But the value of use_stat_tables is not captured and not logged, so ANALYZE TABLE on the slave might not collect EITS when the master did, and vice versa.

Put the attached testcase into e.g. mysql-test/suite/rpl/t/rpl_a1.test and look into its output:

# EITS STATS ON MASTER:
select * from mysql.table_stats;
db_name table_name      cardinality
connection slave;

no stats on master, but

# EITS STATS ON SLAVE:
select * from mysql.table_stats;
db_name table_name      cardinality
test    t1      3

there are stats on the slave.
One can construct cases where it's other way around, too.



 Comments   
Comment by Sergei Golubchik [ 2023-09-28 ]

server has hundreds of variables that affect how statements are executed. most of them aren't replicated.
how to know what variables should be replicated and what shouldn't, what's the rule here?

Comment by Sergei Petrunia [ 2023-10-11 ]

serg, I've tried to see how for example check_constraint_checks is handled and found MDEV-32447.

Discussed with Elkin , take-aways: are that two solutions are possible

  • Log all variables that affect the statement.
  • Capture updates to EITS tables and write them into the binary log, then write something like

    SET STATEMENT use_stat_tables=never ANALYZE TABLE t1
    

    to make sure that the slave doesn't try to collect EITS data on its own.

I'm not sure which one is better.
The first is probably easier to implement as there must be code somewhere that logs the value of the used variable.
The second is a better solution as ANALYZE will have less overhead on the slave. A disadvantage is that slave must have the same data as the master. But do we support replication when the slave has different data than the master in the same table? (Generally, we support multi-source, so the slave may have more data than the master).

Any other arguments?

Comment by Sergei Golubchik [ 2023-10-11 ]

I don't see what is the bug here. If you configure master and slave to do different things they will do different things, exactly as you wanted. I think there's no bug here, everything behaves as expected.

Generated at Thu Feb 08 10:30:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.