Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15309

MariaDB-server w/ Galera not enforcing slave_type_conversions

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1.25, 10.2.12, 10.2.13
    • N/A
    • Galera, Server
    • QA environment, 3 Node MariaDB Galera Cluster - 10.2.12 + galera-25.3.22, Amazon AMI 2017.9
      1.3TB worth of data

      QA Environment, 3 Node MariaDB Galera Cluster - 10.1.25 + galera-25.3.20, Amazon AMI 2017.3
      1.3TB worth of data

    Description

      I'm running the following procedure on one node in a 3 node galera cluster:

      mysql -u root -p
      use {foobar};
      SET SESSION wsrep_OSU_method='RSU';
      SET GLOBAL slave_type_conversions='ALL_LOSSY';
      ALTER TABLE {foo} DROP FOREIGN KEY fkb1;
      ALTER TABLE {foo} MODIFY COLUMN {column_name} BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, MODIFY COLUMN {second_column} BIGINT UNSIGNED;
      ALTER TABLE {foo} ADD CONSTRAINT `fkb1` FOREIGN KEY {second column}  REFERENCES `{foobar}` (`{foo}`) ON DELETE NO ACTION ON UPDATE NO ACTION;
      SET SESSION wsrep_OSU_method='TOI';
      

      The change works as expected, however, when trying to sync the node back to the cluster, the following occurs:

      2018-02-13 22:52:03 139478253385472 [Note] Start binlog_dump to slave_server(0), pos(nodename-bin.000011, 4)
      2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
      2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
      2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1
      	 at galera/src/trx_handle.cpp:apply():351
      Retrying 2th time
      2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
      2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
      2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1
      	 at galera/src/trx_handle.cpp:apply():351
      Retrying 3th time
      2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
      2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
      2018-02-13 22:52:07 139478327580416 [Warning] WSREP: Failed to apply app buffer: seqno: 15861831987, status: 1
      	 at galera/src/trx_handle.cpp:apply():351
      Retrying 4th time
      2018-02-13 22:52:07 139478327580416 [ERROR] Slave SQL: Column 0 of table '{foobar}.{foo}' cannot be converted from type 'int' to type 'bigint(20) unsigned', Internal MariaDB error code: 1677
      2018-02-13 22:52:07 139478327580416 [Warning] WSREP: RBR event 3 Update_rows_v1 apply warning: 3, 15861831987
      2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Failed to apply trx: source: 517b5eed-10d4-11e8-9200-9ed4847004dd version: 3 local: 0 state: APPLYING flags: 1 conn_id: 2438 trx_id: 25630496491 seqnos (l: 13, g: 15861831987, s: 15861831986, d: 15861831986, ts: 1168584749936241)
      2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Failed to apply trx 15861831987 4 times
      2018-02-13 22:52:07 139478327580416 [ERROR] WSREP: Node consistency compromised, aborting...
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing send monitor...
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closed send monitor.
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: terminating thread
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: joining thread
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: closing backend
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: view(view_id(NON_PRIM,1f26c4fc,3) memb {
      	1f26c4fc,0
      } joined {
      } left {
      } partitioned {
      	517b5eed,0
      	70b64642,0
      })
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: view((empty))
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: gcomm: closed
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Flow-control interval: [253, 256]
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Trying to continue unpaused monitor
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received NON-PRIMARY.
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 15861831987)
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received self-leave message.
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Flow-control interval: [253, 256]
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Trying to continue unpaused monitor
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 15861831987)
      2018-02-13 22:52:07 139622020704000 [Note] WSREP: RECV thread exiting 0: Success
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: recv_thread() joined.
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing replication queue.
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: Closing slave action queue.
      2018-02-13 22:52:07 139478327580416 [Note] WSREP: /usr/sbin/mysqld: Terminated.
      

      This is after having set slave_type_conversions=ALL_LOSSY; on all Galera cluster nodes.

      Reference to MDEV-9490 where this issue was apparently fixed - we're still seeing these problems in MariaDB 10.2.12 and 10.1.25

      Attachments

        Issue Links

          Activity

            jpc2350 Hello Josh , can you attach server.cnf configs ?

            winstone Zdravelina Sokolovska (Inactive) added a comment - jpc2350 Hello Josh , can you attach server.cnf configs ?
            jpc2350 Josh Cullum added a comment -

            Hi @winstone, cluster.cnf my.cnf.j2 attached.

            jpc2350 Josh Cullum added a comment - Hi @winstone, cluster.cnf my.cnf.j2 attached.

            hello jpc2350,
            the Errors in MDEV-9490 are indeed similar to the Errors you have received
            it seems to be from the slave_type_conversions value
            try SET GLOBAL slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY' ;

            winstone Zdravelina Sokolovska (Inactive) added a comment - hello jpc2350 , the Errors in MDEV-9490 are indeed similar to the Errors you have received it seems to be from the slave_type_conversions value try SET GLOBAL slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY' ;
            jpc2350 Josh Cullum added a comment -

            Hi Winstone,

            If you look at the procedure’s that we’re running, we’re already setting GLOBAL slave_type_conversions to ALL_LOSSY which implies ALL_NON_LOSSY....

            But I will try and set both, and try again.

            Thanks,

            Josh

            jpc2350 Josh Cullum added a comment - Hi Winstone, If you look at the procedure’s that we’re running, we’re already setting GLOBAL slave_type_conversions to ALL_LOSSY which implies ALL_NON_LOSSY.... But I will try and set both, and try again. Thanks, Josh
            jpc2350 Josh Cullum added a comment -

            Hi @winstone,
            We're double checking but on Initial pass, it seems to have worked. Can you explain why the docs say ALL_LOSSY implies ALL_NON_LOSSY, but In practice, it doesn't?

            jpc2350 Josh Cullum added a comment - Hi @winstone, We're double checking but on Initial pass, it seems to have worked. Can you explain why the docs say ALL_LOSSY implies ALL_NON_LOSSY, but In practice, it doesn't?

            hello jpc2350 , from docs

            slave_type_conversions
             
                Description: Determines the type conversion mode on the slave when using row-based replication, including replications in MariaDB Galera cluster. Multiple options can be set, delimited by commas. If left empty, the default, type conversions are disallowed. The variable is dynamic and a change in its value takes effect immediately. This variable tells the server what to do if the table definition is different between the master and slave (for example a column is 'int' on the master and 'bigint' on the slave).
                    ALL_NON_LOSSY means that all safe conversions (no data loss) are allowed.
                    ALL_LOSSY means that all lossy conversions are allowed (for example 'bigint' to 'int'). This, however, does not imply that safe conversions (non-lossy) are allowed as well. In order to allow all conversions, one needs to allow both lossy as well as non-lossy conversions by setting this variable to ALL_NON_LOSSY,ALL_LOSSY.
                    Empty (default) means that the server should give an error and replication should stop if the table definition is different between the master and slave. 
                Commandline: --slave-type-conversions=set
                Scope: Global
                Dynamic: Yes
                Data Type: set
                Default Value: Empty variable
                Valid Values: ALL_LOSSY, ALL_NON_LOSSY, empty
                Introduced: MariaDB 5.5 
            

            but in order to be more sweeping proposed an other yet value for slave_type_conversions to cover implication

            winstone Zdravelina Sokolovska (Inactive) added a comment - hello jpc2350 , from docs slave_type_conversions   Description: Determines the type conversion mode on the slave when using row-based replication, including replications in MariaDB Galera cluster. Multiple options can be set, delimited by commas. If left empty, the default, type conversions are disallowed. The variable is dynamic and a change in its value takes effect immediately. This variable tells the server what to do if the table definition is different between the master and slave (for example a column is 'int' on the master and 'bigint' on the slave). ALL_NON_LOSSY means that all safe conversions (no data loss) are allowed. ALL_LOSSY means that all lossy conversions are allowed (for example 'bigint' to 'int'). This, however, does not imply that safe conversions (non-lossy) are allowed as well. In order to allow all conversions, one needs to allow both lossy as well as non-lossy conversions by setting this variable to ALL_NON_LOSSY,ALL_LOSSY. Empty (default) means that the server should give an error and replication should stop if the table definition is different between the master and slave. Commandline: --slave-type-conversions=set Scope: Global Dynamic: Yes Data Type: set Default Value: Empty variable Valid Values: ALL_LOSSY, ALL_NON_LOSSY, empty Introduced: MariaDB 5.5 but in order to be more sweeping proposed an other yet value for slave_type_conversions to cover implication
            jpc2350 Josh Cullum added a comment -

            Hi Winstone,

            Thanks - I hadn't seen `does not imply that safe conversions` bit.

            No problem - Thanks for the support!

            KR,

            Josh

            jpc2350 Josh Cullum added a comment - Hi Winstone, Thanks - I hadn't seen `does not imply that safe conversions` bit. No problem - Thanks for the support! KR, Josh

            jpc2350,
            there is actually another intended wsrep_osu_method – NBO;
            wsrep_osu_method=NBO
            In upcoming Galera Replication Library 4.x it would be a new schema upgrade method, NBO,
            It is expected that it will overcome the disadvantage of the TOI method for replicating DDL, blocking practically all updates in the cluster while a long-running DDL is in progress.
            that's the short description of it's work flow
            At the start of the DDL, table locks (also known as metadata locks) are taken on all nodes in order to ensure that the DDL operation will be consistent on all nods
            The DDL operation is executed on all nodes, using a separate applier thread
            The table locks are released on all nodes simultaneously

            winstone Zdravelina Sokolovska (Inactive) added a comment - jpc2350 , there is actually another intended wsrep_osu_method – NBO; wsrep_osu_method=NBO In upcoming Galera Replication Library 4.x it would be a new schema upgrade method, NBO, It is expected that it will overcome the disadvantage of the TOI method for replicating DDL, blocking practically all updates in the cluster while a long-running DDL is in progress. that's the short description of it's work flow At the start of the DDL, table locks (also known as metadata locks) are taken on all nodes in order to ensure that the DDL operation will be consistent on all nods The DDL operation is executed on all nodes, using a separate applier thread The table locks are released on all nodes simultaneously

            People

              winstone Zdravelina Sokolovska (Inactive)
              jpc2350 Josh Cullum
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.