The bug occurs when system_versioning_alter_history is changed while MariaDB is running. The setting is not present in the cnf file, so it defaults to "ERROR".
Both installations are fresh Debian 9 VMs running the official MariaDB binaries with the stock configuration, plus the following file.
Config on both nodes:
root@galera-test-1:/# cat /etc/mysql/conf.d/galera.cnf
[mysqld]
character-set-server=utf8mb4
log_slave_updates=1
innodb_buffer_pool_size=768M
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
skip-name-resolve

[galera]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_provider_options="gmcast.segment=1"
wsrep_cluster_address="gcomm://192.168.100.124,192.168.100.132"
wsrep_sst_method=rsync

[client]
default-character-set=utf8mb4
On each node, the following is executed:
MariaDB [test]> set global system_versioning_alter_history='keep';
Node 1:
MariaDB [test]> create table tbl3 (x int) with system versioning;
MariaDB [test]> alter table tbl3 add y int;
MariaDB [test]> desc tbl3;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| x     | int(11) | YES  |     | NULL    |       |
| y     | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
Node 2:
MariaDB [test]> desc tbl3;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| x     | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
Node 2 Log:
Aug 3 14:44:46 galera-test-2 mysqld[15499]: 2018-08-03 14:44:46 1 [ERROR] Slave SQL: Error 'Not allowed for system-versioned `test`.`tbl3`. Change @@system_versioning_alter_history to proceed with ALTER.' on query. Default database: 'test'. Query: 'alter table tbl3 add z int', Internal MariaDB error code: 4119
Because the Galera session is already started, I believe it inherits and keeps its own copy of system_versioning_alter_history='error'. When the global variable is then changed while the instance is running, the statement still fails on the nodes that receive the change. Only putting system_versioning_alter_history='keep' in the cnf file, followed by a rolling cluster restart, seems to work properly.
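The workaround described above can be sketched as a config fragment. This is only a minimal sketch: placing the variable under the [mysqld] section of the same galera.cnf on every node is my assumption, not something verified beyond the behaviour reported here.

```ini
# /etc/mysql/conf.d/galera.cnf (same file as above; placement under [mysqld] is an assumption)
[mysqld]
# Set the value at server startup instead of with SET GLOBAL at runtime,
# so that it replaces the default of ERROR before any session is created.
system_versioning_alter_history=KEEP
```

After adding this on each node, the nodes are restarted one at a time (rolling restart).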
MDEV-14767 described almost this exact scenario and was marked as fixed, but the issue persists. I think that fix was for master > slave replication, though, not Galera replication.
If it is indeed an issue with the Galera session's instance of this particular variable, it might be wise to always force it to 'keep' for that particular session. If it is set to 'error' globally or for a user session, the alter table statement will fail before it is even propagated to the other Galera nodes, so there shouldn't be any conflicts.
Please provide the table structure, an example of the ALTER statement that is not replicated, the error logs from the node that performs the ALTER and from the one that should receive the change but doesn't, and the configuration files for both. I can't reproduce it on a basic example:
node 1
| Variable_name | Value |
| wsrep_cluster_address | gcomm:// |
| wsrep_cluster_name | my_wsrep_cluster |
node 2
| Variable_name | Value |
| wsrep_cluster_address | gcomm://127.0.0.1:4567?gmcast.listen_addr=tcp://127.0.0.1:4566 |
| wsrep_cluster_name | my_wsrep_cluster |
node 1
Records: 0 Duplicates: 0 Warnings: 0
node 2
Records: 0 Duplicates: 0 Warnings: 0
node 1
Records: 2 Duplicates: 0 Warnings: 0
Query OK, 1 row affected (0.12 sec)
node 2
| a | b |
| 2 | 2 |
node 1
Records: 2 Duplicates: 0 Warnings: 0
node 2