[MDEV-15309] MariaDB-server w/ Galera not enforcing slave_type_conversions Created: 2018-02-13  Updated: 2018-09-11  Resolved: 2018-09-11

Status: Closed
Project: MariaDB Server
Component/s: Galera, Server
Affects Version/s: 10.1.25, 10.2.12, 10.2.13
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Josh Cullum Assignee: Zdravelina Sokolovska (Inactive)
Resolution: Fixed Votes: 0
Labels: Galera, MariaDB-server
Environment:

QA environment, 3 Node MariaDB Galera Cluster - 10.2.12 + galera-25.3.22, Amazon AMI 2017.9
1.3TB worth of data

QA Environment, 3 Node MariaDB Galera Cluster - 10.1.25 + galera-25.3.20, Amazon AMI 2017.3
1.3TB worth of data


Attachments: File cluster.cnf     File my.cnf.j2    
Issue Links:
Relates
relates to MDEV-9490 MariaDB Galera Cluster does not enfor... Closed
relates to MDEV-15360 add slave_type_conversions value Open

 Description   

I'm running the following procedure on one node in a 3 node galera cluster:

mysql -u root -p
use {foobar};
SET SESSION wsrep_OSU_method='RSU';
SET GLOBAL slave_type_conversions='ALL_LOSSY';
ALTER TABLE {foo} DROP FOREIGN KEY fkb1;
ALTER TABLE {foo} MODIFY COLUMN {column_name} BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, MODIFY COLUMN {second_column} BIGINT UNSIGNED;
ALTER TABLE {foo} ADD CONSTRAINT `fkb1` FOREIGN KEY {second column}  REFERENCES `{foobar}` (`{foo}`) ON DELETE NO ACTION ON UPDATE NO ACTION;
SET SESSION wsrep_OSU_method='TOI';

The change works as expected, however, when trying to sync the node back to the cluster, the following occurs:

2018-02-13 22:52:03 139478253385472 [Note] Start binlog_dump to slave_server(0), pos(nodename-bin.000011, 4)
2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1
	 at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1
	 at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1
	 at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Failed to apply trx: source: 517b5eed-10d4-11e8-9200-9ed4847004dd version: 3 local: 0 state: APPLYING flags: 1 conn_id: 2438 trx_id: 25630496491 seqnos (l: 13, g: 15861831987, s: 15861831986, d: 15861831986, ts: 1168584749936241)
2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Failed to apply trx 15861831987 4 times
2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Node consistency compromised, aborting...
2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing send monitor...
2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closed send monitor.
2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: terminating thread
2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: joining thread
2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: closing backend
2018-02-13 22:52:07 139478327580416 [Note] WSREP: view(view_id(NON_PRIM,1f26c4fc,3) memb {
	1f26c4fc,0
} joined {
} left {
} partitioned {
	517b5eed,0
	70b64642,0
})
2018-02-13 22:52:07 139478327580416 [Note] WSREP: view((empty))
2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: closed
2018-02-13 22:52:07 139622020704000 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Flow-control interval: [253, 256]
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Trying to continue unpaused monitor
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received NON-PRIMARY.
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 15861831987)
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received self-leave message.
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Flow-control interval: [253, 256]
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Trying to continue unpaused monitor
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2018-02-13 22:52:07 139622020704000 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 15861831987)
2018-02-13 22:52:07 139622020704000 [Note] WSREP: RECV thread exiting 0: Success
2018-02-13 22:52:07 139478327580416 [Note] WSREP: recv_thread() joined.
2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing replication queue.
2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing slave action queue.
2018-02-13 22:52:07 139478327580416 [Note] WSREP: /usr/sbin/mysqld: Terminated.

This is after having set slave_type_conversions=ALL_LOSSY; on all Galera cluster nodes.

Reference to MDEV-9490 where this issue was apparently fixed - we're still seeing these problems in MariaDB 10.2.12 and 10.1.25



 Comments   
Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-15 ]

jpc2350 Hello Josh , can you attach server.cnf configs ?

Comment by Josh Cullum [ 2018-02-15 ]

Hi @winstone, cluster.cnf my.cnf.j2 attached.

Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-15 ]

hello jpc2350,
the Errors in MDEV-9490 are indeed similar to the Errors you have received
it seems to be from the slave_type_conversions value
try SET GLOBAL slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY' ;

Comment by Josh Cullum [ 2018-02-15 ]

Hi Winstone,

If you look at the procedure’s that we’re running, we’re already setting GLOBAL slave_type_conversions to ALL_LOSSY which implies ALL_NON_LOSSY....

But I will try and set both, and try again.

Thanks,

Josh

Comment by Josh Cullum [ 2018-02-19 ]

Hi @winstone,
We're double checking but on Initial pass, it seems to have worked. Can you explain why the docs say ALL_LOSSY implies ALL_NON_LOSSY, but In practice, it doesn't?

Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-20 ]

hello jpc2350 , from docs

slave_type_conversions
 
    Description: Determines the type conversion mode on the slave when using row-based replication, including replications in MariaDB Galera cluster. Multiple options can be set, delimited by commas. If left empty, the default, type conversions are disallowed. The variable is dynamic and a change in its value takes effect immediately. This variable tells the server what to do if the table definition is different between the master and slave (for example a column is 'int' on the master and 'bigint' on the slave).
        ALL_NON_LOSSY means that all safe conversions (no data loss) are allowed.
        ALL_LOSSY means that all lossy conversions are allowed (for example 'bigint' to 'int'). This, however, does not imply that safe conversions (non-lossy) are allowed as well. In order to allow all conversions, one needs to allow both lossy as well as non-lossy conversions by setting this variable to ALL_NON_LOSSY,ALL_LOSSY.
        Empty (default) means that the server should give an error and replication should stop if the table definition is different between the master and slave. 
    Commandline: --slave-type-conversions=set
    Scope: Global
    Dynamic: Yes
    Data Type: set
    Default Value: Empty variable
    Valid Values: ALL_LOSSY, ALL_NON_LOSSY, empty
    Introduced: MariaDB 5.5 

but in order to be more sweeping proposed an other yet value for slave_type_conversions to cover implication

Comment by Josh Cullum [ 2018-02-20 ]

Hi Winstone,

Thanks - I hadn't seen `does not imply that safe conversions` bit.

No problem - Thanks for the support!

KR,

Josh

Comment by Zdravelina Sokolovska (Inactive) [ 2018-09-11 ]

jpc2350,
there is actually another intended wsrep_osu_method – NBO;
wsrep_osu_method=NBO
In upcoming Galera Replication Library 4.x it would be a new schema upgrade method, NBO,
It is expected that it will overcome the disadvantage of the TOI method for replicating DDL, blocking practically all updates in the cluster while a long-running DDL is in progress.
that's the short description of it's work flow
At the start of the DDL, table locks (also known as metadata locks) are taken on all nodes in order to ensure that the DDL operation will be consistent on all nods
The DDL operation is executed on all nodes, using a separate applier thread
The table locks are released on all nodes simultaneously

Generated at Thu Feb 08 08:20:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.