Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17300

MariaDB Galera cluster crash after issuing ALTER table

Details

    Description

      A simple alter crashed complete cluster after changing ENUM with mariadb internal code: 1677

      ALTER TABLE `table_name`
      CHANGE COLUMN `column_name_9` `column_name_9` ENUM('1','2','3','4') NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `column_name_8`;

      All 5 nodes exited with:
      2018-09-26 8:40:42 87095127 [ERROR] Slave SQL: Column 7 of table 'xxx.table_name' cannot be converted from type 'enum' to type 'varchar(300)', Internal MariaDB error code: 1677

      Table schema:
      column_name_1 int(1) unsigned
      column_name_2 int(1) unsigned
      column_name_3 datetime
      column_name_4 int(1) unsigned
      column_name_5 smallint(1) unsigned
      column_name_6 int(1) unsigned
      column_name_7 int(1) unsigned
      column_name_8 varchar(300)
      column_name_9 enum('1','2','3')

      So server can't change column 7 from type of column 8 to type of column 9 ?

      Why would server issue a change on wrong column?

      This happened on both TOI and RSU, while on RSU just the local node crashed, on TOI this happened to all nodes.

      We were checking if slave_type_conversations could fix the issue, but this can't fix the problem of changing the type of a wrong column?

      https://mariadb.com/kb/en/library/replication-and-binary-log-server-system-variables/#slave_type_conversions

      Hopefully we can resolve this quickly, since this never happened on 10.1.x versions.

      Thanks!

      Attachments

        Issue Links

          Activity

            ibro Ibro created issue -

            Please paste or attach the error log from the server startup up to and including the crash report.

            elenst Elena Stepanova added a comment - Please paste or attach the error log from the server startup up to and including the crash report.
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Labels 1677 alter 1677 alter need_feedback
            ibro Ibro made changes -
            Attachment mariadb-start-to-crash.log [ 46533 ]
            ibro Ibro added a comment -

            Hello, please see attached (one of the 5 nodes).

            ibro Ibro added a comment - Hello, please see attached (one of the 5 nodes).
            elenst Elena Stepanova made changes -
            Labels 1677 alter need_feedback 1677 alter
            elenst Elena Stepanova made changes -
            Component/s Galera [ 10124 ]
            Fix Version/s 10.3 [ 22126 ]
            Assignee Jan Lindström [ jplindst ]

            ibro, can you also include the SHOW CREATE TABLE output before the ALTER TABLE operation? Is this repeatable with just CREATE TABLE and ALTER TABLE?

            Changing the column type or the order of columns should require the table to be rebuilt. Changing the order of columns is supported with the InnoDB-specific ALGORITHM=INPLACE by rebuilding the table, but changing the column type could require ALGORITHM=COPY, except in special cases where the storage format is not changing.

            If I understood correctly, column_name9 is already located immediately after column_name8, and the storage format of a 4-value ENUM should be the same as the format of a 3-value ENUM. So, the operation should theoretically be instantaneous (no-op for InnoDB) in all 10.x.

            I have the feeling that the problem resides in the SQL layer, outside either InnoDB or Galera.

            I was not able to repeat the failure. The following test passes for me:

            --source include/have_innodb.inc
            CREATE TABLE table_name
            (
            column_name_1 int(1) unsigned,
            column_name_2 int(1) unsigned,
            column_name_3 datetime,
            column_name_4 int(1) unsigned,
            column_name_5 smallint(1) unsigned,
            column_name_6 int(1) unsigned,
            column_name_7 int(1) unsigned,
            column_name_8 varchar(300),
            column_name_9 enum('1','2','3')
            ) ENGINE=InnoDB;
            SHOW CREATE TABLE table_name;
             
            ALTER TABLE `table_name`
            CHANGE COLUMN `column_name_9` `column_name_9` ENUM('1','2','3','4'),
            ALGORITHM=INSTANT;
            SHOW CREATE TABLE table_name;
             
            DROP TABLE table_name;
            

            Same thing if I add either of the redundant clauses AFTER `column_name_8` or NULL DEFAULT NULL.

            I see a problem with the COLLATE clause. How should it even work with anything else than character string types? Internally, ENUM is stored as an integer. As far as I understand, collations should only matter when there are indexes on string columns. If I add a COLLATE clause, the only option is ALGORITHM=COPY:

            mysqltest: At line 16: query 'ALTER TABLE `table_name`
            CHANGE COLUMN `column_name_9` `column_name_9` ENUM('1','2','3','4') NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `column_name_8`,
            ALGORITHM=INPLACE' failed: 1846: ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY

            I think that we need a complete test case.

            marko Marko Mäkelä added a comment - ibro , can you also include the SHOW CREATE TABLE  output before the ALTER TABLE  operation? Is this repeatable with just CREATE TABLE and ALTER TABLE ? Changing the column type or the order of columns should require the table to be rebuilt. Changing the order of columns is supported with the InnoDB-specific ALGORITHM=INPLACE by rebuilding the table, but changing the column type could require ALGORITHM=COPY , except in special cases where the storage format is not changing. If I understood correctly, column_name9 is already located immediately after column_name8 , and the storage format of a 4-value ENUM should be the same as the format of a 3-value ENUM . So, the operation should theoretically be instantaneous (no-op for InnoDB) in all 10.x. I have the feeling that the problem resides in the SQL layer, outside either InnoDB or Galera. I was not able to repeat the failure. The following test passes for me: --source include/have_innodb.inc CREATE TABLE table_name ( column_name_1 int (1) unsigned, column_name_2 int (1) unsigned, column_name_3 datetime, column_name_4 int (1) unsigned, column_name_5 smallint (1) unsigned, column_name_6 int (1) unsigned, column_name_7 int (1) unsigned, column_name_8 varchar (300), column_name_9 enum( '1' , '2' , '3' ) ) ENGINE=InnoDB; SHOW CREATE TABLE table_name;   ALTER TABLE `table_name` CHANGE COLUMN `column_name_9` `column_name_9` ENUM( '1' , '2' , '3' , '4' ), ALGORITHM=INSTANT; SHOW CREATE TABLE table_name;   DROP TABLE table_name; Same thing if I add either of the redundant clauses AFTER `column_name_8` or NULL DEFAULT NULL . I see a problem with the COLLATE  clause. How should it even work with anything else than character string types? Internally, ENUM  is stored as an integer. As far as I understand, collations should only matter when there are indexes on string columns. If I add a COLLATE  clause, the only option is ALGORITHM=COPY : mysqltest: At line 16: query 'ALTER TABLE `table_name` CHANGE COLUMN `column_name_9` `column_name_9` ENUM('1','2','3','4') NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `column_name_8`, ALGORITHM=INPLACE' failed: 1846: ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY I think that we need a complete test case.
            marko Marko Mäkelä made changes -
            ibro Ibro added a comment - - edited

            Got create table from backup:

            /*!40101 SET NAMES binary*/;
            /*!40014 SET FOREIGN_KEY_CHECKS=0*/;
             
            CREATE TABLE `dea_events` (
              `id` int(1) unsigned NOT NULL AUTO_INCREMENT,
              `id_admin` int(1) unsigned NOT NULL,
              `created` datetime NOT NULL,
              `id_school` int(1) unsigned NOT NULL,
              `lang` smallint(1) unsigned NOT NULL,
              `id_event` int(1) unsigned NOT NULL,
              `id_teacher` int(1) unsigned NOT NULL,
              `substitution_type` enum('substitution_by_contribution','substitution_without_contribution','absence','presence') COLLATE utf8_general_ci DEFAULT NULL,
              `comment` varchar(300) COLLATE utf8_general_ci NOT NULL,
              PRIMARY KEY (`id`),
              KEY `FK_id_admin` (`id_admin`),
              KEY `FK_id_teacher` (`id_teacher`),
              KEY `FK_id_school` (`id_school`),
              KEY `FK_id_event` (`id_event`),
              KEY `lang` (`lang`),
              CONSTRAINT `dea_events_ibfk_10` FOREIGN KEY (`id_admin`) REFERENCES `dea_users` (`id`) ON UPDATE CASCADE,
              CONSTRAINT `dea_events_ibfk_11` FOREIGN KEY (`lang`) REFERENCES `dea_lang` (`lang`),
              CONSTRAINT `dea_events_ibfk_2` FOREIGN KEY (`id_school`) REFERENCES `dea_schools` (`id`),
              CONSTRAINT `dea_events_ibfk_6` FOREIGN KEY (`id_event`) REFERENCES `dea_events` (`id`) ON DELETE CASCADE
            ) ENGINE=InnoDB AUTO_INCREMENT=111256026 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
            

            And the ALTER table:

            ALTER TABLE `dea_events`
            CHANGE COLUMN `substitution_type` `substitution_type` ENUM('substitution_by_contribution','substitution_without_contribution','absence','presence', 'other') NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `comment`;
            

            I think that we figured it out.

            This is probably due to "AFTER `comment`" statement at the end and since column substitution_type is in fact before column `comment` this DDL is not only changing ENUM values, but it's also changing columns position.

            Unfortunately I cannot test this on live env since we will lose all the nodes again.
            Will do test on some development environment instead.

            Thank you for your time!

            ibro Ibro added a comment - - edited Got create table from backup: /*!40101 SET NAMES binary*/ ; /*!40014 SET FOREIGN_KEY_CHECKS=0*/ ;   CREATE TABLE `dea_events` ( `id` int (1) unsigned NOT NULL AUTO_INCREMENT, `id_admin` int (1) unsigned NOT NULL , `created` datetime NOT NULL , `id_school` int (1) unsigned NOT NULL , `lang` smallint (1) unsigned NOT NULL , `id_event` int (1) unsigned NOT NULL , `id_teacher` int (1) unsigned NOT NULL , `substitution_type` enum( 'substitution_by_contribution' , 'substitution_without_contribution' , 'absence' , 'presence' ) COLLATE utf8_general_ci DEFAULT NULL , `comment` varchar (300) COLLATE utf8_general_ci NOT NULL , PRIMARY KEY (`id`), KEY `FK_id_admin` (`id_admin`), KEY `FK_id_teacher` (`id_teacher`), KEY `FK_id_school` (`id_school`), KEY `FK_id_event` (`id_event`), KEY `lang` (`lang`), CONSTRAINT `dea_events_ibfk_10` FOREIGN KEY (`id_admin`) REFERENCES `dea_users` (`id`) ON UPDATE CASCADE , CONSTRAINT `dea_events_ibfk_11` FOREIGN KEY (`lang`) REFERENCES `dea_lang` (`lang`), CONSTRAINT `dea_events_ibfk_2` FOREIGN KEY (`id_school`) REFERENCES `dea_schools` (`id`), CONSTRAINT `dea_events_ibfk_6` FOREIGN KEY (`id_event`) REFERENCES `dea_events` (`id`) ON DELETE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=111256026 DEFAULT CHARSET=utf8 COLLATE =utf8_general_ci; And the ALTER table: ALTER TABLE `dea_events` CHANGE COLUMN `substitution_type` `substitution_type` ENUM( 'substitution_by_contribution' , 'substitution_without_contribution' , 'absence' , 'presence' , 'other' ) NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `comment`; I think that we figured it out. This is probably due to "AFTER `comment`" statement at the end and since column substitution_type is in fact before column `comment` this DDL is not only changing ENUM values, but it's also changing columns position. Unfortunately I cannot test this on live env since we will lose all the nodes again. Will do test on some development environment instead. Thank you for your time!
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            sbktrifork Søren Kröger added a comment - - edited

            I have similar issue on Mariadb 10.2.19, just the other way arround...:

            Setup

            1. 3 nodes running on ubuntu 18.04 LTS, Maxscale in front
            2. Create schema and table

              create database mydb;
              CREATE TABLE `jobs` (
                `jobName` varchar(32) NOT NULL,
                `jobQuery` text DEFAULT NULL,
                `cronSec` varchar(10) DEFAULT '*',
                `cronMin` varchar(10) DEFAULT '*',
                `cronHour` varchar(10) DEFAULT '*',
                `cronDay` varchar(10) DEFAULT '*',
                `cronMonth` varchar(10) DEFAULT '*',
                `cronDayOfWeek` varchar(10) DEFAULT '*',
                `targetId` bigint(20) NOT NULL,
                `sourceId` bigint(20) NOT NULL,
                PRIMARY KEY (`jobName`)
              ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
              

            How to reproduce

            1. Start script creating random writes towards maxscale

              while true; do mysql -utest -ptest -h 127.0.0.1 mydb -e "replace into jobs (Name,job,targetId,sourceId) values (\"test$RANDOM\",\"select $RANDOM\",0,0)"; mysql -utest -ptest -h 127.0.0.1 mydb -e "delete from jobs where Name=\"test$RANDOM\""; echo $count; count=$[count+1]; done
              

            2. Initiate rolling schema change, by draining the first node (which is writer at that point) for connections in maxscale
            3. Run

              SET GLOBAL wsrep_OSU_method='RSU';

              on Node1

            4. Add columns to the table on node1:

              ALTER TABLE `jobs` ADD `targetType` enum('foo','bar','bla','something') DEFAULT 'bla', ADD `targetUser` varchar(100) DEFAULT 'default',ADD `targetSecret` varchar(100) DEFAULT 'default',ADD `targetEndpoint` varchar(100) DEFAULT 'default',ADD `targetFormat` enum('kv','json','csv','ssv','default') DEFAULT 'default',ADD `jobType` enum('sdf','eerw') DEFAULT 'sdf',CHANGE `targetId` `targetId` bigint NOT NULL,CHANGE `sourceId` `sourceId` bigint NOT NULL;
              

            5. Run

              SET GLOBAL wsrep_OSU_method='TOI';

              on Node1

            6. Re-enable node 1 in maxscale and drain node 2
            7. Run

              SET GLOBAL wsrep_OSU_method='RSU';

              on Node2

            8. Add columns to the table on node2:

              ALTER TABLE `jobs` ADD `targetType` enum('foo','bar','bla','something') DEFAULT 'bla', ADD `targetUser` varchar(100) DEFAULT 'default',ADD `targetSecret` varchar(100) DEFAULT 'default',ADD `targetEndpoint` varchar(100) DEFAULT 'default',ADD `targetFormat` enum('kv','json','csv','ssv','default') DEFAULT 'default',ADD `jobType` enum('sdf','eerw') DEFAULT 'sdf',CHANGE `targetId` `targetId` bigint NOT NULL,CHANGE `sourceId` `sourceId` bigint NOT NULL;
              

            9. Node 2 crashes with following message in errorlog

              2020-07-08 10:41:08 140225045473024 [ERROR] Slave SQL: Column 10 of table 'mydb.jobs' cannot be converted from type 'varchar(300)' to type 'enum('foo','bar','bla','somet', Internal MariaDB error code: 1677
              2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63332 trx_id: 173888 seqnos (l: 5822, g: 71311, s: 71310, d: 71229, ts: 125314058139380)
              2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Failed to apply trx 71311 4 times
              2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Node consistency compromised, aborting...
              2020-07-08 10:41:08 140224644335360 [Note] WSREP: Closing send monitor...
              2020-07-08 10:41:08 140224644335360 [Note] WSREP: Closed send monitor.
              2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: terminating thread
              2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: joining thread
              2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: closing backend
              2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63328 trx_id: 173881 seqnos (l: 5818, g: 71308, s: 71307, d: 71179, ts: 125314010896580)
              2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Failed to apply trx 71308 4 times
              2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Node consistency compromised, aborting...
              2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63327 trx_id: 173879 seqnos (l: 5817, g: 71307, s: 71306, d: 71179, ts: 125313997770580)
              2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Failed to apply trx 71307 4 times
              2020-07-08 10:41:08 140224644065024 [Warning] WSREP: Failed to apply app buffer: seqno: 71310, status: 1
              	 at galera/src/trx_handle.cpp:apply():353
              Retrying 2th time
              2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Node consistency compromised, aborting...
              2020-07-08 10:41:08 140225314678528 [Note] WSREP: /pack/mysql/bin/mysqld: Terminated.
              

            Note: The above steps are done by script, so they are done quite fast... maybe too fast?

            Update:
            I tried to do it slowly, introducing sleeps in the insert/delete query loop and also while altering the table - it makes no difference.
            What might be of interest is, that the following appears in the mariadb log at exactly the same time when the alter table command is run (which completes sucessfully btw):

            2020-07-08 14:24:08 140395652138752 [Note] WSREP: Member 1.0 (galera2) desyncs itself from group
            2020-07-08 14:24:08 140395652138752 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 146649)
            2020-07-08 14:24:08 140395385272064 [Note] WSREP: Provider paused at 7bd2be74-bf76-11ea-b37c-eb56bf6479fb:146649 (860)
            2020-07-08 14:24:09 140395385272064 [Note] WSREP: resuming provider at 860
            2020-07-08 14:24:09 140395385272064 [Note] WSREP: Provider resumed.
            2020-07-08 14:24:09 140395652138752 [Note] WSREP: Member 1.0 (galera2) resyncs itself to group
            2020-07-08 14:24:09 140395652138752 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 146649)
            2020-07-08 14:24:09 140395652138752 [Note] WSREP: Member 1.0 (galera2) synced with group.
            2020-07-08 14:24:09 140395652138752 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 146649)
            2020-07-08 14:24:09 140395904915200 [Note] WSREP: Synchronized with group, ready for connections
            2020-07-08 14:24:09 140395904915200 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            

            Also, the node doesn't crash if I stop the insert/delete loop, so it clearly fails to apply the data after the schema has changed even though the queries are compatible with the old and the new schema.

            sbktrifork Søren Kröger added a comment - - edited I have similar issue on Mariadb 10.2.19, just the other way arround...: Setup 3 nodes running on ubuntu 18.04 LTS, Maxscale in front Create schema and table create database mydb; CREATE TABLE `jobs` ( `jobName` varchar(32) NOT NULL, `jobQuery` text DEFAULT NULL, `cronSec` varchar(10) DEFAULT '*', `cronMin` varchar(10) DEFAULT '*', `cronHour` varchar(10) DEFAULT '*', `cronDay` varchar(10) DEFAULT '*', `cronMonth` varchar(10) DEFAULT '*', `cronDayOfWeek` varchar(10) DEFAULT '*', `targetId` bigint(20) NOT NULL, `sourceId` bigint(20) NOT NULL, PRIMARY KEY (`jobName`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; How to reproduce Start script creating random writes towards maxscale while true; do mysql -utest -ptest -h 127.0.0.1 mydb -e "replace into jobs (Name,job,targetId,sourceId) values (\"test$RANDOM\",\"select $RANDOM\",0,0)"; mysql -utest -ptest -h 127.0.0.1 mydb -e "delete from jobs where Name=\"test$RANDOM\""; echo $count; count=$[count+1]; done Initiate rolling schema change, by draining the first node (which is writer at that point) for connections in maxscale Run SET GLOBAL wsrep_OSU_method='RSU'; on Node1 Add columns to the table on node1: ALTER TABLE `jobs` ADD `targetType` enum('foo','bar','bla','something') DEFAULT 'bla', ADD `targetUser` varchar(100) DEFAULT 'default',ADD `targetSecret` varchar(100) DEFAULT 'default',ADD `targetEndpoint` varchar(100) DEFAULT 'default',ADD `targetFormat` enum('kv','json','csv','ssv','default') DEFAULT 'default',ADD `jobType` enum('sdf','eerw') DEFAULT 'sdf',CHANGE `targetId` `targetId` bigint NOT NULL,CHANGE `sourceId` `sourceId` bigint NOT NULL; Run SET GLOBAL wsrep_OSU_method='TOI'; on Node1 Re-enable node 1 in maxscale and drain node 2 Run SET GLOBAL wsrep_OSU_method='RSU'; on Node2 Add columns to the table on node2: ALTER TABLE `jobs` ADD `targetType` enum('foo','bar','bla','something') DEFAULT 'bla', ADD `targetUser` varchar(100) DEFAULT 'default',ADD `targetSecret` varchar(100) DEFAULT 'default',ADD `targetEndpoint` varchar(100) DEFAULT 'default',ADD `targetFormat` enum('kv','json','csv','ssv','default') DEFAULT 'default',ADD `jobType` enum('sdf','eerw') DEFAULT 'sdf',CHANGE `targetId` `targetId` bigint NOT NULL,CHANGE `sourceId` `sourceId` bigint NOT NULL; Node 2 crashes with following message in errorlog 2020-07-08 10:41:08 140225045473024 [ERROR] Slave SQL: Column 10 of table 'mydb.jobs' cannot be converted from type 'varchar(300)' to type 'enum('foo','bar','bla','somet', Internal MariaDB error code: 1677 2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63332 trx_id: 173888 seqnos (l: 5822, g: 71311, s: 71310, d: 71229, ts: 125314058139380) 2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Failed to apply trx 71311 4 times 2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Node consistency compromised, aborting... 2020-07-08 10:41:08 140224644335360 [Note] WSREP: Closing send monitor... 2020-07-08 10:41:08 140224644335360 [Note] WSREP: Closed send monitor. 2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: terminating thread 2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: joining thread 2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: closing backend 2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63328 trx_id: 173881 seqnos (l: 5818, g: 71308, s: 71307, d: 71179, ts: 125314010896580) 2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Failed to apply trx 71308 4 times 2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Node consistency compromised, aborting... 2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63327 trx_id: 173879 seqnos (l: 5817, g: 71307, s: 71306, d: 71179, ts: 125313997770580) 2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Failed to apply trx 71307 4 times 2020-07-08 10:41:08 140224644065024 [Warning] WSREP: Failed to apply app buffer: seqno: 71310, status: 1 at galera/src/trx_handle.cpp:apply():353 Retrying 2th time 2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Node consistency compromised, aborting... 2020-07-08 10:41:08 140225314678528 [Note] WSREP: /pack/mysql/bin/mysqld: Terminated. Note: The above steps are done by script, so they are done quite fast... maybe too fast? Update: I tried to do it slowly, introducing sleeps in the insert/delete query loop and also while altering the table - it makes no difference. What might be of interest is, that the following appears in the mariadb log at exactly the same time when the alter table command is run (which completes sucessfully btw): 2020-07-08 14:24:08 140395652138752 [Note] WSREP: Member 1.0 (galera2) desyncs itself from group 2020-07-08 14:24:08 140395652138752 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 146649) 2020-07-08 14:24:08 140395385272064 [Note] WSREP: Provider paused at 7bd2be74-bf76-11ea-b37c-eb56bf6479fb:146649 (860) 2020-07-08 14:24:09 140395385272064 [Note] WSREP: resuming provider at 860 2020-07-08 14:24:09 140395385272064 [Note] WSREP: Provider resumed. 2020-07-08 14:24:09 140395652138752 [Note] WSREP: Member 1.0 (galera2) resyncs itself to group 2020-07-08 14:24:09 140395652138752 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 146649) 2020-07-08 14:24:09 140395652138752 [Note] WSREP: Member 1.0 (galera2) synced with group. 2020-07-08 14:24:09 140395652138752 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 146649) 2020-07-08 14:24:09 140395904915200 [Note] WSREP: Synchronized with group, ready for connections 2020-07-08 14:24:09 140395904915200 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. Also, the node doesn't crash if I stop the insert/delete loop, so it clearly fails to apply the data after the schema has changed even though the queries are compatible with the old and the new schema.

            Hello,
            I reproduced this issue with mariadb 10.3.31 in production

            Stedounet La Cancellera Yoann added a comment - Hello, I reproduced this issue with mariadb 10.3.31 in production
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 89773 ] MariaDB v4 [ 140946 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Ramesh Sivaraman [ JIRAUSER48189 ]

            Stedounet If you have a reproducible test case, please share it

            ramesh Ramesh Sivaraman added a comment - Stedounet If you have a reproducible test case, please share it
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Cannot Reproduce [ 5 ]
            Status Open [ 1 ] Closed [ 6 ]

            I tried to reproduce with table schema and the DDL that crashed the production, without success
            Everything I saw back then was identical with what Soren submitted (log, ddl)

            The DDL I had:
            ALTER TABLE <database>.<table>
            ADD new_col1 TEXT NOT NULL AFTER old_col,
            ADD new_col2 TEXT NOT NULL AFTER new_col1,
            ADD new_col3 TEXT NOT NULL AFTER new_col2;

            Stedounet La Cancellera Yoann added a comment - I tried to reproduce with table schema and the DDL that crashed the production, without success Everything I saw back then was identical with what Soren submitted (log, ddl) The DDL I had: ALTER TABLE <database>.<table> ADD new_col1 TEXT NOT NULL AFTER old_col, ADD new_col2 TEXT NOT NULL AFTER new_col1, ADD new_col3 TEXT NOT NULL AFTER new_col2;

            People

              ramesh Ramesh Sivaraman
              ibro Ibro
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.