[MDEV-15309] MariaDB-server w/ Galera not enforcing slave_type_conversions - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1.25, 10.2.12, 10.2.13
Fix Version/s: N/A
Component/s: Galera, Server
Labels:
- Galera
- MariaDB-server
Environment:
QA environment, 3 Node MariaDB Galera Cluster - 10.2.12 + galera-25.3.22, Amazon AMI 2017.9
1.3TB worth of data

QA Environment, 3 Node MariaDB Galera Cluster - 10.1.25 + galera-25.3.20, Amazon AMI 2017.3
1.3TB worth of data

Description

I'm running the following procedure on one node in a 3 node galera cluster:

mysql -u root -p

use {foobar};

SET SESSION wsrep_OSU_method='RSU';

SET GLOBAL slave_type_conversions='ALL_LOSSY';

ALTER TABLE {foo} DROP FOREIGN KEY fkb1;

ALTER TABLE {foo} MODIFY COLUMN {column_name} BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, MODIFY COLUMN {second_column} BIGINT UNSIGNED;

ALTER TABLE {foo} ADD CONSTRAINT `fkb1` FOREIGN KEY {second column}  REFERENCES `{foobar}` (`{foo}`) ON DELETE NO ACTION ON UPDATE NO ACTION;

SET SESSION wsrep_OSU_method='TOI';

The change works as expected, however, when trying to sync the node back to the cluster, the following occurs:

2018-02-13 22:52:03 139478253385472 [Note] Start binlog_dump to slave_server(0), pos(nodename-bin.000011, 4)

2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677

2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987

2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1

	 at galera/src/trx_handle.cpp:apply():351

Retrying 2th time

2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677

2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987

2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1

	 at galera/src/trx_handle.cpp:apply():351

Retrying 3th time

2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677

2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987

2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1

	 at galera/src/trx_handle.cpp:apply():351

Retrying 4th time

2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677

2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987

2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Failed to apply trx: source: 517b5eed-10d4-11e8-9200-9ed4847004dd version: 3 local: 0 state: APPLYING flags: 1 conn_id: 2438 trx_id: 25630496491 seqnos (l: 13, g: 15861831987, s: 15861831986, d: 15861831986, ts: 1168584749936241)

2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Failed to apply trx 15861831987 4 times

2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Node consistency compromised, aborting...

2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing send monitor...

2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closed send monitor.

2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: terminating thread

2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: joining thread

2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: closing backend

2018-02-13 22:52:07 139478327580416 [Note] WSREP: view(view_id(NON_PRIM,1f26c4fc,3) memb {

	1f26c4fc,0

} joined {

} left {

} partitioned {

	517b5eed,0

	70b64642,0

})

2018-02-13 22:52:07 139478327580416 [Note] WSREP: view((empty))

2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: closed

2018-02-13 22:52:07 139622020704000 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Flow-control interval: [253, 256]

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Trying to continue unpaused monitor

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received NON-PRIMARY.

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 15861831987)

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received self-leave message.

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Flow-control interval: [253, 256]

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Trying to continue unpaused monitor

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received SELF-LEAVE. Closing connection.

2018-02-13 22:52:07 139622020704000 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 15861831987)

2018-02-13 22:52:07 139622020704000 [Note] WSREP: RECV thread exiting 0: Success

2018-02-13 22:52:07 139478327580416 [Note] WSREP: recv_thread() joined.

2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing replication queue.

2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing slave action queue.

2018-02-13 22:52:07 139478327580416 [Note] WSREP: /usr/sbin/mysqld: Terminated.

This is after having set slave_type_conversions=ALL_LOSSY; on all Galera cluster nodes.

Reference to ~~MDEV-9490~~ where this issue was apparently fixed - we're still seeing these problems in MariaDB 10.2.12 and 10.1.25

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

cluster.cnf
1 kB
2018-02-15 12:11
my.cnf.j2
3 kB
2018-02-15 12:12

Issue Links

relates to

MDEV-9490 MariaDB Galera Cluster does not enforce slave_type_conversions

Closed

MDEV-15360 add slave_type_conversions value

Open

Activity

Ascending order - Click to sort in descending order

Zdravelina Sokolovska (Inactive) added a comment - 2018-02-15 12:03

jpc2350 Hello Josh , can you attach server.cnf configs ?

Zdravelina Sokolovska (Inactive) added a comment - 2018-02-15 12:03 jpc2350 Hello Josh , can you attach server.cnf configs ?

Josh Cullum added a comment - 2018-02-15 12:12

Hi @winstone, cluster.cnf my.cnf.j2 attached.

Josh Cullum added a comment - 2018-02-15 12:12 Hi @winstone, cluster.cnf my.cnf.j2 attached.

Zdravelina Sokolovska (Inactive) added a comment - 2018-02-15 17:16

hello jpc2350,
the Errors in ~~MDEV-9490~~ are indeed similar to the Errors you have received
it seems to be from the slave_type_conversions value
try SET GLOBAL slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY' ;

Zdravelina Sokolovska (Inactive) added a comment - 2018-02-15 17:16 hello jpc2350 , the Errors in MDEV-9490 are indeed similar to the Errors you have received it seems to be from the slave_type_conversions value try SET GLOBAL slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY' ;

Josh Cullum added a comment - 2018-02-15 21:00

Hi Winstone,

If you look at the procedure’s that we’re running, we’re already setting GLOBAL slave_type_conversions to ALL_LOSSY which implies ALL_NON_LOSSY....

But I will try and set both, and try again.

Thanks,

Josh

Josh Cullum added a comment - 2018-02-15 21:00 Hi Winstone, If you look at the procedure’s that we’re running, we’re already setting GLOBAL slave_type_conversions to ALL_LOSSY which implies ALL_NON_LOSSY.... But I will try and set both, and try again. Thanks, Josh

Josh Cullum added a comment - 2018-02-19 17:00

Hi @winstone,
We're double checking but on Initial pass, it seems to have worked. Can you explain why the docs say ALL_LOSSY implies ALL_NON_LOSSY, but In practice, it doesn't?

Josh Cullum added a comment - 2018-02-19 17:00 Hi @winstone, We're double checking but on Initial pass, it seems to have worked. Can you explain why the docs say ALL_LOSSY implies ALL_NON_LOSSY, but In practice, it doesn't?

Zdravelina Sokolovska (Inactive) added a comment - 2018-02-20 11:37

hello jpc2350 , from docs

slave_type_conversions

    Description: Determines the type conversion mode on the slave when using row-based replication, including replications in MariaDB Galera cluster. Multiple options can be set, delimited by commas. If left empty, the default, type conversions are disallowed. The variable is dynamic and a change in its value takes effect immediately. This variable tells the server what to do if the table definition is different between the master and slave (for example a column is 'int' on the master and 'bigint' on the slave).

        ALL_NON_LOSSY means that all safe conversions (no data loss) are allowed.

        ALL_LOSSY means that all lossy conversions are allowed (for example 'bigint' to 'int'). This, however, does not imply that safe conversions (non-lossy) are allowed as well. In order to allow all conversions, one needs to allow both lossy as well as non-lossy conversions by setting this variable to ALL_NON_LOSSY,ALL_LOSSY.

        Empty (default) means that the server should give an error and replication should stop if the table definition is different between the master and slave.

    Commandline: --slave-type-conversions=set

    Scope: Global

    Dynamic: Yes

    Data Type: set

    Default Value: Empty variable

    Valid Values: ALL_LOSSY, ALL_NON_LOSSY, empty

    Introduced: MariaDB 5.5

but in order to be more sweeping proposed an other yet value for slave_type_conversions to cover implication

Zdravelina Sokolovska (Inactive) added a comment - 2018-02-20 11:37 hello jpc2350 , from docs slave_type_conversions Description: Determines the type conversion mode on the slave when using row-based replication, including replications in MariaDB Galera cluster. Multiple options can be set, delimited by commas. If left empty, the default, type conversions are disallowed. The variable is dynamic and a change in its value takes effect immediately. This variable tells the server what to do if the table definition is different between the master and slave (for example a column is 'int' on the master and 'bigint' on the slave). ALL_NON_LOSSY means that all safe conversions (no data loss) are allowed. ALL_LOSSY means that all lossy conversions are allowed (for example 'bigint' to 'int'). This, however, does not imply that safe conversions (non-lossy) are allowed as well. In order to allow all conversions, one needs to allow both lossy as well as non-lossy conversions by setting this variable to ALL_NON_LOSSY,ALL_LOSSY. Empty (default) means that the server should give an error and replication should stop if the table definition is different between the master and slave. Commandline: --slave-type-conversions=set Scope: Global Dynamic: Yes Data Type: set Default Value: Empty variable Valid Values: ALL_LOSSY, ALL_NON_LOSSY, empty Introduced: MariaDB 5.5 but in order to be more sweeping proposed an other yet value for slave_type_conversions to cover implication

Josh Cullum added a comment - 2018-02-20 11:39

Hi Winstone,

Thanks - I hadn't seen `does not imply that safe conversions` bit.

No problem - Thanks for the support!

KR,

Josh

Josh Cullum added a comment - 2018-02-20 11:39 Hi Winstone, Thanks - I hadn't seen `does not imply that safe conversions` bit. No problem - Thanks for the support! KR, Josh

Zdravelina Sokolovska (Inactive) added a comment - 2018-09-11 14:07

jpc2350,
there is actually another intended wsrep_osu_method – NBO;
wsrep_osu_method=NBO
In upcoming Galera Replication Library 4.x it would be a new schema upgrade method, NBO,
It is expected that it will overcome the disadvantage of the TOI method for replicating DDL, blocking practically all updates in the cluster while a long-running DDL is in progress.
that's the short description of it's work flow
At the start of the DDL, table locks (also known as metadata locks) are taken on all nodes in order to ensure that the DDL operation will be consistent on all nods
The DDL operation is executed on all nodes, using a separate applier thread
The table locks are released on all nodes simultaneously

Zdravelina Sokolovska (Inactive) added a comment - 2018-09-11 14:07 jpc2350 , there is actually another intended wsrep_osu_method – NBO; wsrep_osu_method=NBO In upcoming Galera Replication Library 4.x it would be a new schema upgrade method, NBO, It is expected that it will overcome the disadvantage of the TOI method for replicating DDL, blocking practically all updates in the cluster while a long-running DDL is in progress. that's the short description of it's work flow At the start of the DDL, table locks (also known as metadata locks) are taken on all nodes in order to ensure that the DDL operation will be consistent on all nods The DDL operation is executed on all nodes, using a separate applier thread The table locks are released on all nodes simultaneously

People

Assignee:: Zdravelina Sokolovska (Inactive)

Reporter:: Josh Cullum

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2018-02-13 23:14

Updated:: 2018-09-11 14:08

Resolved:: 2018-09-11 14:08

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration