[MDEV-24956] ALTER TABLE not replicated with Galera in MariaDB 10.5.9 Created: 2021-02-23 Updated: 2021-09-30 Resolved: 2021-04-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.5.9 |
| Fix Version/s: | 10.4.19, 10.5.10 |
| Type: | Bug | Priority: | Major |
| Reporter: | Petr Šťastný | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | galera | ||
| Environment: |
Debian Buster (LXC containers in Ubuntu 20.10), MariaDB from official repo (10.5.9+maria~buster) |
||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Description |
|
After upgrading 3-node cluster from 10.5.8 to 10.5.9, I noticed that ALTER TABLE statements are not replicated. Change is visible only on initiator. No log message anywhere. Later when I try to write a row in that table, other node fails:
I tried many different ALTER TABLE statements, without success. For example, CREATE TABLE works. All DML commands are also ok. After downgrading back to 10.5.8, everything works. |
| Comments |
| Comment by Oli Sennhauser [ 2021-03-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
CANNOT reproduce: CREATE TABLE `test` ( Would be nice to have more details? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Petr Šťastný [ 2021-03-29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So I decided to upgrade another set of my servers that form a Galera cluster. Unfortunately I ran into the same problem. I spent few hours trying to change configuration to see what can be causing this. It seems that I am the only one who faces this problem? Let's dig into details. I use MariaDB 10.5, always 3 node in Galera cluster, running Debian 10. The problem is that ALTER TABLE commands are not propagated to other nodes. For example, I have the following table:
Let's print the columns:
I issue the following command on one of the nodes:
On that node I see that column definition has changed. But it didn't on other 2 nodes. There is no error message in log file (I event turned wsrep_debug on). It doesn't matter on which node I execute the command or in which table I do it, it always turns out the same way. Tested with the following version from official MariaDB repository: 10.5.9+maria~buster. Whenever I downgrade back to 10.5.8+maria~buster, everything works fine. A also tried to have different versions on the nodes. The problem arises when the node where ALTER command is initiated has 10.5.9 version. When the initiator is 10.5.8 and others are 10.5.9, there is no problem. This is my my.cnf (symlink to mariadb.cnf):
Note that I am using ZFS file system. There are few InnoDB optimizations for that. I tried to disable them (thougth they are causing it), but it did not help. If this still can't help you to find the bug (something that has changed between 10.5.8 and 10.5.9), I can try to deploy new cluster with the same problem and give you a SSH key. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elton M. Labajo [ 2021-03-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, We are experiencing the same issue with 10.5.9-6-MariaDB-enterprise-log MariaDB Enterprise Server. The setup is two galera clusters replicating each other via 3rd node (node c on each cluster). *ALTER TABLE * on cluster A works and gets replicated to Cluster B node c (the replication slave connected to Cluster A), however it doesn't propagate to Cluster B node a and node b. So basically, any *ALTER TABLE * statements are not propagated to the Cluster B node a and node b. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2021-03-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, I can't repeat the problem:
One strange thing on your test case is the fact that 'CREATE TABLE' has 'VARCHAR(140)' but in 'SHOW COLUMNS' it is 'VARCHAR(250)' but this does not really change anything. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Petr Šťastný [ 2021-03-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So do you want me to set up a cluster with my exact configuration with that problem and give you a SSH access? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2021-03-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I would like repeatable test case, that would be lot more easier. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Petr Šťastný [ 2021-03-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Unfortunately I don't think this case is so easy. It seems that the problem depends on maybe on server configuration or maybe other enviroment settings. In my case, on my 2 clusters, any ALTER TABLE command does not work. There is no one special test case that fails and other that is ok. It just don't work at all. I can try to do any change on any column in any table in any database (some testing or real-life) - in all cases, table definition change is not propagated (at the same time, DML is ok). The case I presented here was just a very simple example, not exact situation when it happens and not in other situations. Do you think there is a change between 10.5.8 and 10.5.9 that could inlfluence this so that I can focus on it and try to change configuration settings? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2021-03-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Unfortunately, I do not know any change that could cause the issue. To further diagnose this please provide full error log from all nodes preferable with --wsrep-debug=1 but only in case when you hit the issue. If you hit the issue, did you use SHOW WARNINGS ? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Petr Šťastný [ 2021-03-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I believe that I have reproducible test case for you. After long investigation I realized that I may have made few mistakes during previous investigations. But the bug is still here a I am able to easy reproduce it using the following steps. The bug is reproducible using persistent connections and mysql_change_user() API call, I used PHP for that. Create table, for example:
Now on Node A, run the following PHP code:
On node A, check table structure:
Do the same thing on Node B:
As you can see, the second ALTER TABLE command has not propagated to other nodes! And now use the same PHP script, but remove persistent connection ("p:" prefix in host) and everything works! Let me explain what and why I did. After a lot of research I figured out that I can 100% reproduce the error using persistent connection. I don't know why. When PHP disconnects and connects persistent connection, it uses mysql_change_user() API call, which should clean up session. I am using PHP because there is no option to call mysql_change_user() API in mysql client on command line. So I used PHP for this simple test case. No warnings are returned. It looks like user session is in some strange state after mysql_change_user() which results that DDL is not propagated in Galera cluster. But surprisingly DML works good. And I repeat again - when I downgrade back to 10.5.8, the problem disappears. So it does not look like there is something wrong in PHP in this test case, but in MariaDB server. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elton M. Labajo [ 2021-03-31 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, to illustrate on reproducing the bug I created a video. please click here I hope this will show much more clearer presentation. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mario Karuza (Inactive) [ 2021-03-31 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Added galera_3nodes test based on Elton's video, tested on 10.4.19(dbg) |