[MDEV-31325] Can not run db patches using flyway when use the latest 10.6.13 lts MariaDB version with galera 4 Created: 2023-05-22  Updated: 2023-07-21  Resolved: 2023-06-05

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.6.13
Fix Version/s: 10.4.30, 10.5.21, 10.6.14, 10.9.7, 10.10.5, 10.11.4

Type: Bug Priority: Major
Reporter: GeorgeKon Assignee: Sergei Golubchik
Resolution: Fixed Votes: 3
Labels: Flyway, Galera, MariaDB-server, regression
Environment:

OS: Three Almalinux 9 servers running MariaDB 10.6.13 version with galera-4-26.4.14-1


Issue Links:
Blocks
Duplicate
is duplicated by MDEV-30967 sst mariabackup broken on rocksdb - c... Closed
is duplicated by MDEV-31388 Error in PROCEDURE in v.10.6.13 with ... Closed
Problem/Incident
is caused by MDEV-30473 Do not allow GET_LOCK() / RELEASE_LOC... Closed

 Description   

After upgrading our Almalinux 9 Mariadb servers to the latest lts version ( 10.6.13 ) we got the following error when we apply a simple alter table patch:

ERROR: Unable to release MySQL named lock: Flyway--1574291248
java.sql.SQLSyntaxErrorException: (conn=279023) This version of MariaDB doesn't yet support 'RELEASE_LOCK in cluster (WSREP_ON=ON)'
  at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:62)
  at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:153)
  at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:274)
  at org.mariadb.jdbc.ClientSidePreparedStatement.executeInternal(ClientSidePreparedStatement.java:229)
  at org.mariadb.jdbc.ClientSidePreparedStatement.execute(ClientSidePreparedStatement.java:149)
  at org.flywaydb.core.internal.jdbc.JdbcTemplate.execute(JdbcTemplate.java:182)
  at org.flywaydb.database.mysql.MySQLNamedLockTemplate.execute(MySQLNamedLockTemplate.java:71)
  at org.flywaydb.database.mysql.MySQLConnection.lock(MySQLConnection.java:152)
  at org.flywaydb.core.internal.schemahistory.JdbcTableSchemaHistory.lock(JdbcTableSchemaHistory.java:143)
  at org.flywaydb.core.internal.command.DbMigrate.migrateAll(DbMigrate.java:141)
  at org.flywaydb.core.internal.command.DbMigrate.migrate(DbMigrate.java:98)
  at org.flywaydb.core.Flyway.lambda$migrate$0(Flyway.java:176)
  at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:204)
  at org.flywaydb.core.Flyway.migrate(Flyway.java:128)
  at org.flywaydb.commandline.Main.executeOperation(Main.java:275)
  at org.flywaydb.commandline.Main.main(Main.java:183)
Caused by: org.mariadb.jdbc.internal.util.exceptions.MariaDbSqlException: This version of MariaDB doesn't yet support 'RELEASE_LOCK in cluster (WSREP_ON=ON)'
  at org.mariadb.jdbc.internal.util.exceptions.MariaDbSqlException.of(MariaDbSqlException.java:34)
  at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.exceptionWithQuery(AbstractQueryProtocol.java:192)
  at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.exceptionWithQuery(AbstractQueryProtocol.java:175)
  at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:319)
  at org.mariadb.jdbc.ClientSidePreparedStatement.executeInternal(ClientSidePreparedStatement.java:220)
  ... 12 more
Caused by: java.sql.SQLException: This version of MariaDB doesn't yet support 'RELEASE_LOCK in cluster (WSREP_ON=ON)'
  at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:1681)
  at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1543)
  at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1506)
  at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:316)
  ... 13 more
ERROR: Unable to acquire MySQL named lock: Flyway--1574291248
------------------------------------------------------
SQL State  : 42000
Error Code : 1235
Message    : (conn=279023) This version of MariaDB doesn't yet support 'GET_LOCK in cluster (WSREP_ON=ON)'
Caused by: java.sql.SQLSyntaxErrorException: (conn=279023) This version of MariaDB doesn't yet support 'GET_LOCK in cluster (WSREP_ON=ON)'
Caused by: org.mariadb.jdbc.internal.util.exceptions.MariaDbSqlException: This version of MariaDB doesn't yet support 'GET_LOCK in cluster (WSREP_ON=ON)'
Caused by: java.sql.SQLException: This version of MariaDB doesn't yet support 'GET_LOCK in cluster (WSREP_ON=ON)'

When we downgrade to the previous lts version ( 10.6.12 ) everthing works fine again.
Downgrading to the previous minor version ( dnf downgrade MariaDB-client MariaDB-common MariaDB-server MariaDB-shared ). Only MariaDB packages have changed not galera.

Also the problem occured when we used Flyway `9.2.2` version. Then we upgrade the `flyway` version to the latest one (9.18.0) but still the same error message. We got nothing from the /var/lib/mysql/mysqld.log .

Below you can see the sql queries (create, insert, alter, rename, drop) that we want to migrate using flyway and all of them failed with the same above error on MariaDB version `10.6.13` but work as expected on `10.6.12` version

# cat V26_1__insert_debug.sql
INSERT INTO `failed_thingie` (`failed_id`) VALUES
(1),
(2),
(3),
(4),
(5);
 
 
# cat V26_2__alter_debug.sql
ALTER TABLE
    `failed_thingie`
ADD COLUMN
    `failed_version` VARCHAR(10) NOT NULL DEFAULT '10.6.13'
AFTER
    `date`;
 
# cat V26_3__rename_debug.sql
rename table failed_thingie to failed_mariadb_version
 
 
# cat V26_4__drop_debug.sql
DROP TABLE IF EXISTS `failed_mariadb_version`;

Last, when we run the above sql queries manually from a MariaDB server they all worked so the problem seems to be when those patches are applied from flyway.



 Comments   
Comment by Daniel Black [ 2023-05-23 ]

FYI Flyway has included a fix which hasn't made it to a release yet.

I see the original implemenation was there to stop metadata and migration race conditions. Like the comment, MariaDB has supported "IF NOT EXISTS" for a while now (10.2/10.3?).

So sysprg, does LOCK TABLE[S] still cause crashes on Galera?

Comment by Petr Šťastný [ 2023-05-30 ]

Why you suddenly changed GET_LOCK() behavior in patch version? Do you realize that you just broke many applications by triggering fatal errors when calling GET_LOCK()? I understand if you do it in new major/minor version, but why this breaking change in patch version 10.6.12 -> 10.6.13?

Comment by Romain Lapoux [ 2023-06-02 ]

I can confirm that this has an impact on our backup system and on Matomo (https://github.com/matomo-org/matomo/issues/20752).

Comment by Mike Andrews [ 2023-06-02 ]

This change also broke MediaWiki. Edits to any page results in the new "GET_LOCK in cluster" error. I assume Wikipedia doesn't run on Galera, but I suspect I'm not the only private install that does.

Also Request Tracker (https://bestpractical.com/request-tracker) broke in the same way. I've let them know in their forums in case they can work around it in a future release.

Echoing the sentiment that this was kind of a big change for a minor point release, with unintended consequences.

Comment by Mike Andrews [ 2023-06-04 ]

...that said, I kinda get the idea behind the change (just maybe not the timing) so maybe it should be a configurable option, to give app developers more time to work around it?

Comment by Sergei Golubchik [ 2023-06-04 ]

We consider reverting it for now and then will think of a less intrusive fix

Comment by Jan Lindström [ 2023-06-15 ]

GET_LOCK is not recommended on a cluster see https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/ if you really need to use it you could do SET SESSSION wsrep_on = OFF. I did not know that some tools would use this function on a cluster.

Comment by Petr Šťastný [ 2023-06-15 ]

I know how GET_LOCK() works and what are its limitations in Galera cluster. But still I use it and I have good use case for that. I use it for simple synchronisation between multiple processes, and since I know that all these processes are connected to the same server at the same time, it works. And I'm also glad that it doesn't require any cluster-wide synchronization or locking, which makes it very fast.

Many tools just don't know that users are going to run them on a cluster. But they still work correctly in a cluster for a similar reason as above - all clients are connected to one server, the other servers perform synchronous replication and are "stand-by" in case the first one fails.

So why disable GET_LOCK()? It has its use case even with Galera.

Comment by Moises Garcia [ 2023-06-15 ]

I agree with Petr, I use it in the same situation as him, so GET_LOCK() it's really useful even used with Galera.

Generated at Thu Feb 08 10:22:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.