Details
Description
I upgraded a mariadb galera cluster from version 10.1.37 to version 10.3.12. The galera version did not change (though the package was rebuilt). Since the update the cluster nodes show different behavior on key usage.
MariaDB [(none)]> SELECT @@hostname;
|
+------------+
|
| @@hostname |
|
+------------+
|
| mariadb01 |
|
+------------+
|
1 row in set (0.000 sec)
|
|
MariaDB [(none)]> CREATE TABLE `test`.`reproduce` (
|
-> `id` INT NOT NULL AUTO_INCREMENT,
|
-> `value1` INT NOT NULL,
|
-> `value2` VARCHAR(255) NOT NULL,
|
-> PRIMARY KEY (`id`),
|
-> KEY `value1` (`value1`),
|
-> KEY `value2` (`value2`));
|
Query OK, 0 rows affected (0.023 sec)
|
|
MariaDB [(none)]> INSERT INTO `test`.`reproduce` VALUES (NULL, 1, "1"),
|
-> (NULL, 1, "2"),
|
-> (NULL, 2, "3"),
|
-> (NULL, 3, "4");
|
Query OK, 4 rows affected (0.008 sec)
|
Records: 4 Duplicates: 0 Warnings: 0
|
|
MariaDB [(none)]> EXPLAIN SELECT * FROM `test`.`reproduce` WHERE `value1` = 1;
|
+------+-------------+-----------+------+---------------+--------+---------+-------+------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-----------+------+---------------+--------+---------+-------+------+-------+
|
| 1 | SIMPLE | reproduce | ref | value1 | value1 | 4 | const | 2 | |
|
+------+-------------+-----------+------+---------------+--------+---------+-------+------+-------+
|
1 row in set (0.000 sec)
|
That looks ok. However another node does not use the key:
MariaDB [(none)]> SELECT @@hostname;
|
+------------+
|
| @@hostname |
|
+------------+
|
| mariadb02 |
|
+------------+
|
1 row in set (0.000 sec)
|
|
MariaDB [(none)]> EXPLAIN SELECT * FROM `test`.`reproduce` WHERE `value1` = 1;
|
+------+-------------+-----------+------+---------------+------+---------+------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-----------+------+---------------+------+---------+------+------+-------------+
|
| 1 | SIMPLE | reproduce | ALL | value1 | NULL | NULL | NULL | 1 | Using where |
|
+------+-------------+-----------+------+---------------+------+---------+------+------+-------------+
|
1 row in set (0.000 sec)
|
The host receiving the structural change from client is ok, others fail.
This effects tables with millions of rows, breaking joins that need several seconds or minutes to complete where it finished in some milliseconds before.
This takes down the whole cluster due to massively increased load.