[MDEV-31325] Can not run db patches using flyway when use the latest 10.6.13 lts MariaDB version with galera 4 Created: 2023-05-22 Updated: 2023-07-21 Resolved: 2023-06-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.6.13 |
| Fix Version/s: | 10.4.30, 10.5.21, 10.6.14, 10.9.7, 10.10.5, 10.11.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | GeorgeKon | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | Flyway, Galera, MariaDB-server, regression | ||
| Environment: |
OS: Three Almalinux 9 servers running MariaDB 10.6.13 version with galera-4-26.4.14-1 |
||
| Issue Links: |
|
||||||||||||||||||||||||
| Description |
|
After upgrading our Almalinux 9 Mariadb servers to the latest lts version ( 10.6.13 ) we got the following error when we apply a simple alter table patch:
When we downgrade to the previous lts version ( 10.6.12 ) everthing works fine again. Also the problem occured when we used Flyway `9.2.2` version. Then we upgrade the `flyway` version to the latest one (9.18.0) but still the same error message. We got nothing from the /var/lib/mysql/mysqld.log . Below you can see the sql queries (create, insert, alter, rename, drop) that we want to migrate using flyway and all of them failed with the same above error on MariaDB version `10.6.13` but work as expected on `10.6.12` version
Last, when we run the above sql queries manually from a MariaDB server they all worked so the problem seems to be when those patches are applied from flyway. |
| Comments |
| Comment by Daniel Black [ 2023-05-23 ] |
|
FYI Flyway has included a fix which hasn't made it to a release yet. I see the original implemenation was there to stop metadata and migration race conditions. Like the comment, MariaDB has supported "IF NOT EXISTS" for a while now (10.2/10.3?). So sysprg, does LOCK TABLE[S] still cause crashes on Galera? |
| Comment by Petr Šťastný [ 2023-05-30 ] |
|
Why you suddenly changed GET_LOCK() behavior in patch version? Do you realize that you just broke many applications by triggering fatal errors when calling GET_LOCK()? I understand if you do it in new major/minor version, but why this breaking change in patch version 10.6.12 -> 10.6.13? |
| Comment by Romain Lapoux [ 2023-06-02 ] |
|
I can confirm that this has an impact on our backup system and on Matomo (https://github.com/matomo-org/matomo/issues/20752). |
| Comment by Mike Andrews [ 2023-06-02 ] |
|
This change also broke MediaWiki. Edits to any page results in the new "GET_LOCK in cluster" error. I assume Wikipedia doesn't run on Galera, but I suspect I'm not the only private install that does. Also Request Tracker (https://bestpractical.com/request-tracker) broke in the same way. I've let them know in their forums in case they can work around it in a future release. Echoing the sentiment that this was kind of a big change for a minor point release, with unintended consequences. |
| Comment by Mike Andrews [ 2023-06-04 ] |
|
...that said, I kinda get the idea behind the change (just maybe not the timing) so maybe it should be a configurable option, to give app developers more time to work around it? |
| Comment by Sergei Golubchik [ 2023-06-04 ] |
|
We consider reverting it for now and then will think of a less intrusive fix |
| Comment by Jan Lindström [ 2023-06-15 ] |
|
GET_LOCK is not recommended on a cluster see https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/ if you really need to use it you could do SET SESSSION wsrep_on = OFF. I did not know that some tools would use this function on a cluster. |
| Comment by Petr Šťastný [ 2023-06-15 ] |
|
I know how GET_LOCK() works and what are its limitations in Galera cluster. But still I use it and I have good use case for that. I use it for simple synchronisation between multiple processes, and since I know that all these processes are connected to the same server at the same time, it works. And I'm also glad that it doesn't require any cluster-wide synchronization or locking, which makes it very fast. Many tools just don't know that users are going to run them on a cluster. But they still work correctly in a cluster for a similar reason as above - all clients are connected to one server, the other servers perform synchronous replication and are "stand-by" in case the first one fails. So why disable GET_LOCK()? It has its use case even with Galera. |
| Comment by Moises Garcia [ 2023-06-15 ] |
|
I agree with Petr, I use it in the same situation as him, so GET_LOCK() it's really useful even used with Galera. |