[MDEV-24273] System versioning replication fails when adding column on master Created: 2020-11-24  Updated: 2023-08-22  Resolved: 2023-08-21

Status: Closed
Project: MariaDB Server
Component/s: Replication, Versioned Tables
Affects Version/s: 10.4.12, 10.3, 10.4, 10.5
Fix Version/s: 10.3.35, 10.4.25, 10.5.16, 10.6.8

Type: Bug Priority: Major
Reporter: Catalin Dumitriu Assignee: Aleksey Midenkov
Resolution: Fixed Votes: 2
Labels: ProductionIssue, ProductionStory
Environment:

Debian 9


Issue Links:
Duplicate
duplicates MDEV-30430 Enabling system versioning on tables ... Closed
Relates
relates to MDEV-12483 Add foreign keys support for partitio... Stalled
relates to MDEV-14767 system_versioning_alter_history break... Closed
relates to MDEV-30430 Enabling system versioning on tables ... Closed

 Description   

Hello,

I have this setup:

  • master
  • slave with system versioning activated

When I add a new column on master for a table, this is also propagated to slave , but when an update is running on master on that table and is replicated to slave, it fails, with error that it cannot change column type from my define type to timestamp(6). I have row based replication with GTID.

I have both secure timestamp=YES and system_versioning_alter_history=KEEP on the slave.

I have the same setup with mariadb 10.3.14 but replication is not going down.

What can be the cause ?



 Comments   
Comment by Elena Stepanova [ 2020-11-24 ]

Thanks for the report. Reproducible as described.

--source include/have_binlog_format_row.inc
--source include/master-slave.inc
 
create table t1 (a int);
 
--sync_slave_with_master
set global system_versioning_alter_history= KEEP;
alter table t1 add system versioning;
 
--connection master
alter table t1 add b int;
insert into t1 values (1,10);
 
--sync_slave_with_master

10.3 75e7132f

Replicate_Wild_Ignore_Table	
Last_Errno	1677
Last_Error	Column 1 of table 'test.t1' cannot be converted from type 'int' to type 'timestamp(6)'
Skip_Counter	0
Exec_Master_Log_Pos	594

Comment by Elena Stepanova [ 2020-11-24 ]

I have the same setup with mariadb 10.3.14 but replication is not going down.

cata06,
One possible reason I can think of is that either you have binlog_format=ROW on your 10.4 instance and binlog_format=MIXED (default) on your 10.3 instance, or you have binlog_format=MIXED (default) on both, but the SQL flow on these instances is somewhat different, and it is such on 10.4 that previous statements make the session in question switch to row mode, which can happen on any number of reasons, while on 10.3 it remains being replicated as statement. Statement binary log events don't cause the problem, only row events do.

Comment by Catalin Dumitriu [ 2020-11-24 ]

Glad that you manage to reproduce the problem. I have to check because i just realized that maybe I didn't alter a table on slave which had system versioning. I use only binlog_format=ROW for my replications. Also did you test it with binlog_format=MIXED(default) ?

Comment by Elena Stepanova [ 2020-11-24 ]

Also did you test it with binlog_format=MIXED(default) ?

There is no need. The important part here is not the binlog_format value itself, but the format with which binary log events are written at the time when the DML statement in question (e.g. update which you mentioned in the description) is executed.
When you have binlog_format=ROW, DML is always written in the row format, so it will always be reproducible.
When you have binlog_format=STATEMENT, DML is always written in the statement format, so this particular problem will not be reproducible (but I do not recommend switching to STATEMENT, because you are likely to get more problems than you would solve).
And with binlog_format=MIXED, DML is sometimes written in the statement format, and sometimes in the row format (hence "mixed"). There are different reasons which make a session operating with binlog_format=MIXED switch from the statement format to the row format; but once it has happened, the session remains in the row format unless instructed otherwise.
So, while with my minimal test case above it wouldn't be reproducible with binlog_format=MIXED, because there is nothing in there that would make the session switch to the row mode by the time my INSERT is executed, there is an endless variety of use cases when it would happen with binlog_format=MIXED, it just needs some specific (and not necessarily unnatural) activity before the problematic DML is executed.

Comment by Catalin Dumitriu [ 2020-11-24 ]

I will wait for a fix than. In the mean time the solution I see is to drop system versioning for the table before doing alter on master and to reactivate it after .

Comment by Elena Stepanova [ 2020-11-24 ]

I am wondering if you could describe in a little more detail your setup of non-versioned => versioned replication – the purpose of such setup, what your servers' configuration and replication topology is, what kind of objects and SQL you use, etc.
It is not necessary for bugfixing, but we are generally interested in real-life use cases in order to improve our testing, and yours is the first of this particular kind that we have encountered.

Comment by Catalin Dumitriu [ 2020-11-24 ]

Sure , I will come back with short descriptions and also with examples .

Comment by Catalin Dumitriu [ 2020-11-26 ]

We wanted to implement a delta mecanism in order to sync tables changes for an interval, like last hour, last day etc. We looked at system versioning and we saw that it keeps track of changes and we can use it.

We created an api which runs a query to take what was changed in the last hour for example. We don't use the commands provided in MariaDB documentation for querying historical data as we not interested how table look at a point in time or which rows were visible. Our interest was to see strictly what changed in database.

This is used for data project were we sync data to an Azure Data Warehouse instance, for people to use it forward.

First we activated system versioning on master server, but because of changes , the history started to increase fast and to slow our daily operations. We cannot create partition due to foreign key constrains to another tables. Even if we cleanup the history periodically the queries were slow. Than we decided to implement a master slave replication with system versioning on slave. The queries were still slow, but we managed to make them fast by adding index for row_start and row_end.
When the indexes were created everything started to move very fast, history was skipped instant, but this was ok only for select, because update queries were not using the indexes , having to scan all the rows.

Here is and example of query used by us:

SELECT 
    {$mergedColumns},
    cast(max(row_start) as char) as last_modified_date,
    'delete' as action_versioning
FROM `{$table}` FOR SYSTEM_TIME ALL 
WHERE 
    row_end >= ? 
    AND row_end < ? 
    AND row_end < CURRENT_TIMESTAMP() 
GROUP BY {$primaryKey}
UNION ALL 
SELECT
    {$mergedColumns},
    cast(max(row_start) as char) as last_modified_date,
    'insert' as action_versioning
FROM {$table} FOR SYSTEM_TIME ALL
WHERE    
    row_start >= ?
    AND row_start < ?
    AND row_end > ?
GROUP BY {$primaryKey}
ORDER BY {$sortBy}
LIMIT ?, ?     

I don't exactly know if MariaDB system versioning is intended to be used this way. Maybe it can be extended for something like this is not case, because I'm sure many users will love this. I'm also available to share other information if needed , even to test. We can schedule a call, maybe is easy this way.

We are using MariaDB 10.3,10.4 on Debian 9. Replication is master-slave with format ROW.

Other questions:
What happens with history when table is altered ? Is still reliable ?

Thank you for your involvement

Comment by Elena Stepanova [ 2020-11-27 ]

cata06,

Thanks a lot for all the information, I'll get back to you on this if I have further questions.
For the specifics of ALTER on a versioning table, I'll leave it to midenok to give an expert answer.

Comment by Aleksey Midenkov [ 2023-07-19 ]

cata06 Looks like this was fixed in 10.4.29 with MDEV-30430
If not please provide reproduction sequence with minimal set of tables and fields.

Comment by Elena Stepanova [ 2023-08-21 ]

The failure stopped happening after this commit in 10.3.35 / 10.4.25 / 10.5.16 / 10.6.8:

commit 9286c9e6472e62b7cd5c5a0cf36006ce9b6f5893
Author: Aleksey Midenkov
Date:   Fri Apr 22 15:49:37 2022 +0300
 
    MDEV-28254 Wrong position for row_start, row_end after adding column to implicit versioned table

Generated at Thu Feb 08 09:28:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.