Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Cannot Reproduce
-
10.3.8
-
5 node MariaDB galera cluster on CentOS 7.x
Description
A simple alter crashed complete cluster after changing ENUM with mariadb internal code: 1677
ALTER TABLE `table_name`
CHANGE COLUMN `column_name_9` `column_name_9` ENUM('1','2','3','4') NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `column_name_8`;
All 5 nodes exited with:
2018-09-26 8:40:42 87095127 [ERROR] Slave SQL: Column 7 of table 'xxx.table_name' cannot be converted from type 'enum' to type 'varchar(300)', Internal MariaDB error code: 1677
Table schema:
column_name_1 int(1) unsigned
column_name_2 int(1) unsigned
column_name_3 datetime
column_name_4 int(1) unsigned
column_name_5 smallint(1) unsigned
column_name_6 int(1) unsigned
column_name_7 int(1) unsigned
column_name_8 varchar(300)
column_name_9 enum('1','2','3')
So server can't change column 7 from type of column 8 to type of column 9 ?
Why would server issue a change on wrong column?
This happened on both TOI and RSU, while on RSU just the local node crashed, on TOI this happened to all nodes.
We were checking if slave_type_conversations could fix the issue, but this can't fix the problem of changing the type of a wrong column?
Hopefully we can resolve this quickly, since this never happened on 10.1.x versions.
Thanks!
Attachments
Issue Links
- relates to
-
MDEV-17301 Change of COLLATE unnecessarily requires ALGORITHM=COPY
-
- Closed
-
Activity
Hello,
I reproduced this issue with mariadb 10.3.31 in production
I have similar issue on Mariadb 10.2.19, just the other way arround...:
Setup
- 3 nodes running on ubuntu 18.04 LTS, Maxscale in front
- Create schema and table
create database mydb;
CREATE TABLE `jobs` (
`jobName` varchar(32) NOT NULL,
`jobQuery` text DEFAULT NULL,
`cronSec` varchar(10) DEFAULT '*',
`cronMin` varchar(10) DEFAULT '*',
`cronHour` varchar(10) DEFAULT '*',
`cronDay` varchar(10) DEFAULT '*',
`cronMonth` varchar(10) DEFAULT '*',
`cronDayOfWeek` varchar(10) DEFAULT '*',
`targetId` bigint(20) NOT NULL,
`sourceId` bigint(20) NOT NULL,
PRIMARY KEY (`jobName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
How to reproduce
- Start script creating random writes towards maxscale
while true; do mysql -utest -ptest -h 127.0.0.1 mydb -e "replace into jobs (Name,job,targetId,sourceId) values (\"test$RANDOM\",\"select $RANDOM\",0,0)"; mysql -utest -ptest -h 127.0.0.1 mydb -e "delete from jobs where Name=\"test$RANDOM\""; echo $count; count=$[count+1]; done
- Initiate rolling schema change, by draining the first node (which is writer at that point) for connections in maxscale
- Run
SET GLOBAL wsrep_OSU_method='RSU';
on Node1
- Add columns to the table on node1:
ALTER TABLE `jobs` ADD `targetType` enum('foo','bar','bla','something') DEFAULT 'bla', ADD `targetUser` varchar(100) DEFAULT 'default',ADD `targetSecret` varchar(100) DEFAULT 'default',ADD `targetEndpoint` varchar(100) DEFAULT 'default',ADD `targetFormat` enum('kv','json','csv','ssv','default') DEFAULT 'default',ADD `jobType` enum('sdf','eerw') DEFAULT 'sdf',CHANGE `targetId` `targetId` bigint NOT NULL,CHANGE `sourceId` `sourceId` bigint NOT NULL;
- Run
SET GLOBAL wsrep_OSU_method='TOI';
on Node1
- Re-enable node 1 in maxscale and drain node 2
- Run
SET GLOBAL wsrep_OSU_method='RSU';
on Node2
- Add columns to the table on node2:
ALTER TABLE `jobs` ADD `targetType` enum('foo','bar','bla','something') DEFAULT 'bla', ADD `targetUser` varchar(100) DEFAULT 'default',ADD `targetSecret` varchar(100) DEFAULT 'default',ADD `targetEndpoint` varchar(100) DEFAULT 'default',ADD `targetFormat` enum('kv','json','csv','ssv','default') DEFAULT 'default',ADD `jobType` enum('sdf','eerw') DEFAULT 'sdf',CHANGE `targetId` `targetId` bigint NOT NULL,CHANGE `sourceId` `sourceId` bigint NOT NULL;
- Node 2 crashes with following message in errorlog
2020-07-08 10:41:08 140225045473024 [ERROR] Slave SQL: Column 10 of table 'mydb.jobs' cannot be converted from type 'varchar(300)' to type 'enum('foo','bar','bla','somet', Internal MariaDB error code: 1677
2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63332 trx_id: 173888 seqnos (l: 5822, g: 71311, s: 71310, d: 71229, ts: 125314058139380)
2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Failed to apply trx 71311 4 times
2020-07-08 10:41:08 140224644335360 [ERROR] WSREP: Node consistency compromised, aborting...
2020-07-08 10:41:08 140224644335360 [Note] WSREP: Closing send monitor...
2020-07-08 10:41:08 140224644335360 [Note] WSREP: Closed send monitor.
2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: terminating thread
2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: joining thread
2020-07-08 10:41:08 140224644335360 [Note] WSREP: gcomm: closing backend
2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63328 trx_id: 173881 seqnos (l: 5818, g: 71308, s: 71307, d: 71179, ts: 125314010896580)
2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Failed to apply trx 71308 4 times
2020-07-08 10:41:08 140225314678528 [ERROR] WSREP: Node consistency compromised, aborting...
2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Failed to apply trx: source: a4a849a2-c089-11ea-91bc-5a8b4b2c70b7 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 63327 trx_id: 173879 seqnos (l: 5817, g: 71307, s: 71306, d: 71179, ts: 125313997770580)
2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Failed to apply trx 71307 4 times
2020-07-08 10:41:08 140224644065024 [Warning] WSREP: Failed to apply app buffer: seqno: 71310, status: 1
at galera/src/trx_handle.cpp:apply():353
Retrying 2th time
2020-07-08 10:41:08 140224643794688 [ERROR] WSREP: Node consistency compromised, aborting...
2020-07-08 10:41:08 140225314678528 [Note] WSREP: /pack/mysql/bin/mysqld: Terminated.
Note: The above steps are done by script, so they are done quite fast... maybe too fast?
Update:
I tried to do it slowly, introducing sleeps in the insert/delete query loop and also while altering the table - it makes no difference.
What might be of interest is, that the following appears in the mariadb log at exactly the same time when the alter table command is run (which completes sucessfully btw):
2020-07-08 14:24:08 140395652138752 [Note] WSREP: Member 1.0 (galera2) desyncs itself from group
|
2020-07-08 14:24:08 140395652138752 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 146649)
|
2020-07-08 14:24:08 140395385272064 [Note] WSREP: Provider paused at 7bd2be74-bf76-11ea-b37c-eb56bf6479fb:146649 (860)
|
2020-07-08 14:24:09 140395385272064 [Note] WSREP: resuming provider at 860
|
2020-07-08 14:24:09 140395385272064 [Note] WSREP: Provider resumed.
|
2020-07-08 14:24:09 140395652138752 [Note] WSREP: Member 1.0 (galera2) resyncs itself to group
|
2020-07-08 14:24:09 140395652138752 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 146649)
|
2020-07-08 14:24:09 140395652138752 [Note] WSREP: Member 1.0 (galera2) synced with group.
|
2020-07-08 14:24:09 140395652138752 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 146649)
|
2020-07-08 14:24:09 140395904915200 [Note] WSREP: Synchronized with group, ready for connections
|
2020-07-08 14:24:09 140395904915200 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
Also, the node doesn't crash if I stop the insert/delete loop, so it clearly fails to apply the data after the schema has changed even though the queries are compatible with the old and the new schema.
Got create table from backup:
/*!40101 SET NAMES binary*/; |
/*!40014 SET FOREIGN_KEY_CHECKS=0*/; |
|
CREATE TABLE `dea_events` ( |
`id` int(1) unsigned NOT NULL AUTO_INCREMENT, |
`id_admin` int(1) unsigned NOT NULL, |
`created` datetime NOT NULL, |
`id_school` int(1) unsigned NOT NULL, |
`lang` smallint(1) unsigned NOT NULL, |
`id_event` int(1) unsigned NOT NULL, |
`id_teacher` int(1) unsigned NOT NULL, |
`substitution_type` enum('substitution_by_contribution','substitution_without_contribution','absence','presence') COLLATE utf8_general_ci DEFAULT NULL, |
`comment` varchar(300) COLLATE utf8_general_ci NOT NULL, |
PRIMARY KEY (`id`), |
KEY `FK_id_admin` (`id_admin`), |
KEY `FK_id_teacher` (`id_teacher`), |
KEY `FK_id_school` (`id_school`), |
KEY `FK_id_event` (`id_event`), |
KEY `lang` (`lang`), |
CONSTRAINT `dea_events_ibfk_10` FOREIGN KEY (`id_admin`) REFERENCES `dea_users` (`id`) ON UPDATE CASCADE, |
CONSTRAINT `dea_events_ibfk_11` FOREIGN KEY (`lang`) REFERENCES `dea_lang` (`lang`), |
CONSTRAINT `dea_events_ibfk_2` FOREIGN KEY (`id_school`) REFERENCES `dea_schools` (`id`), |
CONSTRAINT `dea_events_ibfk_6` FOREIGN KEY (`id_event`) REFERENCES `dea_events` (`id`) ON DELETE CASCADE |
) ENGINE=InnoDB AUTO_INCREMENT=111256026 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci; |
And the ALTER table:
ALTER TABLE `dea_events` |
CHANGE COLUMN `substitution_type` `substitution_type` ENUM('substitution_by_contribution','substitution_without_contribution','absence','presence', 'other') NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `comment`; |
I think that we figured it out.
This is probably due to "AFTER `comment`" statement at the end and since column substitution_type is in fact before column `comment` this DDL is not only changing ENUM values, but it's also changing columns position.
Unfortunately I cannot test this on live env since we will lose all the nodes again.
Will do test on some development environment instead.
Thank you for your time!
ibro, can you also include the SHOW CREATE TABLE output before the ALTER TABLE operation? Is this repeatable with just CREATE TABLE and ALTER TABLE?
Changing the column type or the order of columns should require the table to be rebuilt. Changing the order of columns is supported with the InnoDB-specific ALGORITHM=INPLACE by rebuilding the table, but changing the column type could require ALGORITHM=COPY, except in special cases where the storage format is not changing.
If I understood correctly, column_name9 is already located immediately after column_name8, and the storage format of a 4-value ENUM should be the same as the format of a 3-value ENUM. So, the operation should theoretically be instantaneous (no-op for InnoDB) in all 10.x.
I have the feeling that the problem resides in the SQL layer, outside either InnoDB or Galera.
I was not able to repeat the failure. The following test passes for me:
--source include/have_innodb.inc
|
CREATE TABLE table_name |
(
|
column_name_1 int(1) unsigned, |
column_name_2 int(1) unsigned, |
column_name_3 datetime,
|
column_name_4 int(1) unsigned, |
column_name_5 smallint(1) unsigned, |
column_name_6 int(1) unsigned, |
column_name_7 int(1) unsigned, |
column_name_8 varchar(300), |
column_name_9 enum('1','2','3') |
) ENGINE=InnoDB;
|
SHOW CREATE TABLE table_name; |
|
ALTER TABLE `table_name` |
CHANGE COLUMN `column_name_9` `column_name_9` ENUM('1','2','3','4'), |
ALGORITHM=INSTANT;
|
SHOW CREATE TABLE table_name; |
|
DROP TABLE table_name; |
Same thing if I add either of the redundant clauses AFTER `column_name_8` or NULL DEFAULT NULL.
I see a problem with the COLLATE clause. How should it even work with anything else than character string types? Internally, ENUM is stored as an integer. As far as I understand, collations should only matter when there are indexes on string columns. If I add a COLLATE clause, the only option is ALGORITHM=COPY:
mysqltest: At line 16: query 'ALTER TABLE `table_name`
CHANGE COLUMN `column_name_9` `column_name_9` ENUM('1','2','3','4') NULL DEFAULT NULL COLLATE 'utf8_general_ci' AFTER `column_name_8`,
ALGORITHM=INPLACE' failed: 1846: ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY
I think that we need a complete test case.
Please paste or attach the error log from the server startup up to and including the crash report.
I tried to reproduce with table schema and the DDL that crashed the production, without success
Everything I saw back then was identical with what Soren submitted (log, ddl)
The DDL I had:
ALTER TABLE <database>.<table>
ADD new_col1 TEXT NOT NULL AFTER old_col,
ADD new_col2 TEXT NOT NULL AFTER new_col1,
ADD new_col3 TEXT NOT NULL AFTER new_col2;