[MDEV-9519] Start Slave on a Galera should error if master binlog_format=statement and wsrep_auto_increment_control=1 Created: 2016-02-04  Updated: 2019-03-06  Resolved: 2019-02-26

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: 10.1.11
Fix Version/s: 10.2.23, 10.3.13, 10.1.39, 10.4.4

Type: Bug Priority: Major
Reporter: VAROQUI Stephane Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: galera

Issue Links:
Blocks
is blocked by MDEV-6445 UUID column type addition for distrib... Closed
Problem/Incident
causes MDEV-17016 Assertion `!auto_increment_lock && !a... Closed

 Description   

Data corruption will happen as soon as the galera cluster size change

Let's take a table t made of single auto increment and the following binlog

# at 770118590
#160203 21:31:19 server id 10  end_log_pos 770118618    Intvar
SET INSERT_ID=134024921/*!*/;
# at 770118618
#160203 21:31:19 server id 10  end_log_pos 770121328    Query   thread_id=961224213     exec_time=0     error_code=0
SET TIMESTAMP=1454531479/*!*/;
INSERT INTO t values  (NULL),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL )
  /*!*/;
# at 770121328
#160203 21:31:19 server id 10  end_log_pos 770121355    Xid = 701968214
COMMIT/*!*/;
# at 770121355
#160203 21:31:19 server id 10  end_log_pos 770121414    Query   thread_id=961224213     exec_time=0     error_code=0
SET TIMESTAMP=1454531479/*!*/;
BEGIN
/*!*/;
# at 770121414
#160203 21:31:19 server id 10  end_log_pos 770121442    Intvar
SET INSERT_ID=134024933/*!*/;
# at 770121442
#160203 21:31:19 server id 10  end_log_pos 770124812    Query   thread_id=961224213     exec_time=0     error_code=0
SET TIMESTAMP=1454531479/*!*/;
INSERT INTO t values  (NULL),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL ),(NULL )

Replication will break because the step of auto increment will change as soon the cluster size change. In statement base the master did not change the step and auto increment overlay will happen and corrupt the data ?

Last_SQL_Error: Error 'Duplicate entry '134024933
 
MariaDB [db_calameo_v2]> SELECT  ID FROM t WHERE id >= 134024921;
+-----------+
| LinkID    |
+-----------+
| 134024921 |
| 134024923 |
| 134024925 |
| 134024927 |
| 134024929 |
| 134024931 |
| 134024933 |
| 134024935 |
| 134024937 |
| 134024939 |
| 134024941 |
| 134024943 |
+-----------+

So this proposal is to introduce verbose error message at replication start to either change wsrep_auto_increment_control=1 or to use ROW based .



 Comments   
Comment by Kolbe Kegel (Inactive) [ 2016-05-26 ]

I don't see how START SLAVE could fail in this case, since the START SLAVE command won't have any visibility into the content of the log. But perhaps the SQL thread could stop with an error if wsrep_auto_increment_control=1 and SET INSERT_ID is seen in the relay log.

Comment by Julius Goryavsky [ 2018-07-19 ]

Proposed fix: https://github.com/MariaDB/server/pull/817

If we have a 2+ node cluster which is replicating from an async master and the binlog_format is set to STATEMENT and multi-row inserts are executed on a table with an auto_increment column such that values are automatically generated by MySQL, then the server node generates wrong auto_increment values, which (in some cases) are different from what was generated on the async master.

The causes and fixes:

1. We need to improve processing of changing the auto-increment values after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of the node, then we should immediately update the auto_increment_increment and auto_increment_offset global variables, without waiting of the next invocation of the wsrep_view_handler_cb() callback. In the current version these variables retain its initial values if wsrep_auto_increment_control is switched on during operation of the node, which leads to inconsistent results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node, then we must return the original values of the auto_increment_increment and auto_increment_offset global variables, as the user has set. To make this possible, we need to add a "shadow copies" of these variables (which stores the latest values set by the user).

Comment by Julius Goryavsky [ 2018-07-19 ]

https://github.com/MariaDB/server/pull/817

Comment by Jan Lindström (Inactive) [ 2018-07-20 ]

I have small improvements comments on git.

Comment by Julius Goryavsky [ 2018-07-24 ]

Thanks, patch updated in accordance with the review

Comment by Julius Goryavsky [ 2018-07-27 ]

Version for 10.1 branch: https://github.com/MariaDB/server/pull/827

Comment by Jan Lindström (Inactive) [ 2018-07-30 ]

See git for comments.

Comment by Julius Goryavsky [ 2018-08-14 ]

I corrected all the regressions found and fixed another error that interfered with auto-increment inserts/updates in STATEMENT mode (with WSREP enabled)

Comment by Marko Mäkelä [ 2018-08-31 ]

I reverted this for now due to regressions, which might have been fixed by MDEV-17016, which in turn has been waiting for review for a long time.

Comment by Julius Goryavsky [ 2019-02-15 ]

Regressions are closed using MDEV-17016, but since then the server code has changed significantly, so I re-posted this fix as a new pull request for branch 10.3, after which it can be ported to 10.1 / 10.2. I made the change for 10.3, since the code in version 10.3 is the most complex and generates many conflicts with automatic merge. New pull request here: https://github.com/MariaDB/server/pull/1187

Generated at Thu Feb 08 07:35:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.