[MDEV-26618] OPTIMIZE table not replicating Created: 2021-09-16  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Replication
Affects Version/s: 10.4.19, 10.4.21
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Manuel Arostegui Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 1
Labels: None
Environment:

debian buster


Issue Links:
Blocks
blocks MDEV-25274 OPTIMIZE TABLE does not ignore read_only Open

 Description   

We are trying to optimize a table which is around 50GB and that after the optimization should reduce to around 14GB.

We have noticed that running the optimize on the master (10.4.19+STATEMENT or 10.4.21) it doesn't get written on the binlog and hence it doesn't get replicated to any of the slaves.

	=== db2123 (s5 codfw master): ===
 
mysql:root@localhost [dewiki]> alter table logging engine=InnoDB;
Query OK, 0 rows affected (8 min 47.942 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql:root@localhost [dewiki]> optimize table logging;
+----------------+----------+----------+-------------------------------------------------------------------+
| Table          | Op       | Msg_type | Msg_text                                                          |
+----------------+----------+----------+-------------------------------------------------------------------+
| dewiki.logging | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dewiki.logging | optimize | status   | OK                                                                |
+----------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (8 min 31.735 sec)
 
root@db2123:/srv/sqldata# mysqlbinlog --base64-output=DECODE-ROWS db2123-bin.001717 | grep "optimize table"
root@db2123:/srv/sqldata# mysqlbinlog --base64-output=DECODE-ROWS db2123-bin.001717 | grep "alter table"
alter table logging engine=InnoDB
root@db2123:/srv/sqldata#
 
=== db2111 (s5 codfw slave - happens everywhere): === 
 
root@db2111:/srv/sqldata#  mysqlbinlog -vvv   db2111-relay-bin.001406 | grep "alter table"
alter table logging engine=InnoDB
root@db2111:/srv/sqldata#  mysqlbinlog -vvv   db2111-relay-bin.001406 | grep "optimize table"
root@db2111:/srv/sqldata#

Using alter table engine=InnoDB does get replicated.

This is a different host (runs ROW based):

mysql:root@localhost [dewiki]> optimize table logging;
+----------------+----------+----------+-------------------------------------------------------------------+
| Table          | Op       | Msg_type | Msg_text                                                          |
+----------------+----------+----------+-------------------------------------------------------------------+
| dewiki.logging | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| dewiki.logging | optimize | status   | OK                                                                |
+----------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (13 min 12.645 sec)
 
mysql:root@localhost [dewiki]> alter table logging engine=InnoDB ROW_FORMAT=COMPRESSED;
Query OK, 0 rows affected (13 min 6.271 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
root@db1154:/srv/sqldata.s5#  mysqlbinlog -vvv --base64-output=DECODE-ROWS db1154-bin.000430 | grep -i "alter table"
alter table logging engine=InnoDB ROW_FORMAT=COMPRESSED
 
root@db1154:/srv/sqldata.s5#  mysqlbinlog -vvv --base64-output=DECODE-ROWS db1154-bin.000430 | grep -i "optimize table"
root@db1154:/srv/sqldata.s5#

We have seen this on 10.4.19 and 10.4.21
Both hosts are intermediate masters

Table definition (happens with both compressed and non compressed version of this table):

*************************** 1. row ***************************
       Table: logging
Create Table: CREATE TABLE `logging` (
  `log_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `log_type` varbinary(32) NOT NULL DEFAULT '',
  `log_action` varbinary(32) NOT NULL DEFAULT '',
  `log_timestamp` varbinary(14) NOT NULL DEFAULT '19700101000000',
  `log_namespace` int(11) NOT NULL DEFAULT 0,
  `log_title` varbinary(255) NOT NULL DEFAULT '',
  `log_comment_id` bigint(20) unsigned NOT NULL,
  `log_params` blob NOT NULL,
  `log_deleted` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `log_actor` bigint(20) unsigned NOT NULL,
  `log_page` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`log_id`),
  KEY `type_time` (`log_type`,`log_timestamp`),
  KEY `page_time` (`log_namespace`,`log_title`,`log_timestamp`),
  KEY `times` (`log_timestamp`),
  KEY `log_page_id_time` (`log_page`,`log_timestamp`),
  KEY `actor_time` (`log_actor`,`log_timestamp`),
  KEY `log_actor_type_time` (`log_actor`,`log_type`,`log_timestamp`),
  KEY `log_type_action` (`log_type`,`log_action`,`log_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=133843953 DEFAULT CHARSET=binary

Server variables: https://phabricator.wikimedia.org/P17279

All this made me doubt if OPTIMIZE (or even ANALYZE) gets replicated by default and I double checked it does:



 Comments   
Comment by Manuel Arostegui [ 2021-09-16 ]

Just tried the same thing on a 10.1 intermediate master we still have and there it does show up on binlogs:

mysql:root@localhost [enwiki]> optimize table mathoid;
+----------------+----------+----------+-------------------------------------------------------------------+
| Table          | Op       | Msg_type | Msg_text                                                          |
+----------------+----------+----------+-------------------------------------------------------------------+
| enwiki.mathoid | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| enwiki.mathoid | optimize | status   | OK                                                                |
+----------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (3 min 23.98 sec)
 
root@db2112:/srv/sqldata# mysqlbinlog -vvv --base64-output=DECODE-ROWS db2112-bin.004066 | grep -i "optimize table"
optimize table mathoid

Comment by Alice Sherepa [ 2021-09-16 ]

Could you please describe the situation more or demonstrate the way to reproduce it?
I tried to repeat, but in my case "optimize table" gets written to the binlog.

So, you have 1->2->3, server 2 and 3 have semi-sync rpl, and then you're doing
alter table + optimize on server 2, but optimize is not in the server's 2 binlog .
Is that correct? What engine was used for the table before the alter?

Comment by Alice Sherepa [ 2021-09-16 ]

Could you please describe the situation more or demonstrate the way to reproduce it?
I tried to repeat, but in my case "optimize table" gets written to the binlog.

So, you have 1->2->3, server 2 and 3 have semi-sync rpl, and then you're doing
alter table + optimize on server 2, but optimize is not in the server's 2 binlog .
Is that correct? What engine was used for the table before the alter?

Comment by Manuel Arostegui [ 2021-09-16 ]

Thanks for the answer.

  • The table has always used InnoDB. We do not use anything else. The alter engine = InnoDB was just to show this issue
  • Correct, we have 1->2->3 and the optimize is applied on #2 (manually, not coming thru #1) (they all use SBR)
  • This only seems to happen on 10.4 as on 10.1 it gets written to the binlog as shown before. The ALTER always gets written but not the optimize.
  • We have tried changing GTID replication from Slave_pos to "no" in case it was some sort of mess caused by "out of band" changes. But there was no difference.
  • The optimize doesn't arrive in anyway to the slaves as the tables do not get defragmented at all (and once the change is applied on the slaves, the on-disk reduction shows up).

This has been tried on different hosts, different databases and even different tables with the same result on 10.4 hosts.
Unfortunately we have run out of ideas on why this could be happening other than a bug.

Comment by Manuel Arostegui [ 2021-09-16 ]

Worth mentioning that there are no errors at all on logs or on the prompt. It simply runs fine but then it sort of vanishes from binlog and replication.

Comment by Manuel Arostegui [ 2021-09-16 ]

Providing more insights:

mysql:root@localhost [shwiki]> show master status\G
*************************** 1. row ***************************
            File: db2123-bin.001717
        Position: 685114045
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.000 sec)
 
mysql:root@localhost [shwiki]> show session variables like '%sql_log_bin%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sql_log_bin   | ON    |
+---------------+-------+
1 row in set (0.001 sec)
 
mysql:root@localhost [shwiki]> optimize table mathoid;
+----------------+----------+----------+-------------------------------------------------------------------+
| Table          | Op       | Msg_type | Msg_text                                                          |
+----------------+----------+----------+-------------------------------------------------------------------+
| shwiki.mathoid | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| shwiki.mathoid | optimize | status   | OK                                                                |
+----------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.024 sec)
 
mysql:root@localhost [shwiki]> alter table mathoid engine=InnoDB;
Query OK, 0 rows affected (0.025 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql:root@localhost [shwiki]>
 
 
 
root@db2123:/srv/sqldata# mysqlbinlog --base64-output=DECODE-ROWS db2123-bin.001717 | grep "alter table"
alter table logging engine=InnoDB --> previous alter from previous tests
alter table mathoid engine=InnoDB
root@db2123:/srv/sqldata#

Comment by Alice Sherepa [ 2021-09-16 ]

Thanks for the clarification!
I think I've found the reason - this is the option read_only=ON

KB says: Starting with MariaDB 10.3.20 we have fixed some issues related to read only replicas:
The Admin statements ANALYZE, CHECK and REPAIR will not be logged to the binary log under read-only. ( https://mariadb.com/kb/en/read-only-replicas/ )

Maybe it should be improved so, that if admin statements are executed by the user with the super privilege - they should be written in the binlog (unless NO_WRITE_TO_BINLOG flag was used). Otherwise - it is allowed to execute optimize/analyze, but they are not written to the binlog.

Comment by Manuel Arostegui [ 2021-09-16 ]

Good catch Alice.
This is definitely something that I would love to see changed (and backported to 10.4) or at least produce a "notice" so it is known when trying out of bad changes in production. And also I believe the documentation for analyze and optimize should definitely include that if the behaviour isn't changed as it can mess up things in production

These two:

Thanks a lot

Comment by Sergei Golubchik [ 2021-09-16 ]

This was first changed in the commit b62101f84be4.

I don't see any explanation for why it was changed or any reference to a MDEV number, so perhaps monty could explain why it was done.

There could've been reasons for that, but until I know them, I'd say that OPTIMIZE and ANALYZE should respect read_only. And if executed — read_only or not — they should be written into binlog. This is the most logical behavior, no exception. If somebody wants to run OPTIMIZE despite read_only and not write it to binlog, one can always login as SUPER and do OPTIMIZE ... NO_WRITE_TO_BINLOG.

Comment by Manuel Arostegui [ 2021-10-25 ]

Any rough estimation on when this could be fixed?

Comment by Jaime Crespo [ 2021-11-18 ]

I understand whether OPTIMIZE should respect read_only or not can be changed and discussed. MySQL does not apply it, but mariadb can reasonable choose to keep the same behaviour or change it.

A different issue is that a user with SUPER should be able to run it (typical case- optimization of a read only replica by an administrator). And if it has been executed and log was enabled, has to be logged. I consider this a regression (either fail or complete, but not something in between!).

However, may I ask, until a decision is taken, that the change in behavior is documented on the release notes: https://mariadb.com/kb/en/mariadb-10320-release-notes/ , changelog: https://mariadb.com/kb/en/mariadb-10320-changelog/ (there is no mention of binlog behaviour and SQL semantics change) and possible, the wiki, given that binlog/replication changes are a huge deal for upgrades and compatibility?!

Comment by Sergei Golubchik [ 2021-11-18 ]

greenman, could you please document the status quo and then reassign back to me?

Comment by Ian Gilfillan [ 2021-11-29 ]

Changes documented. This was actually in the pulled MariaDB 10.3.19 release, and already present in the changelog. I've added to the release notes, and to the OPTIMIZE, ANALYZE and CHECK pages.

Generated at Thu Feb 08 09:46:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.