[MDEV-28308] DELETE hangs on one Galera Cluster node (10.2), on others nodes fine Created: 2022-04-13  Updated: 2022-05-17  Resolved: 2022-05-17

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.2.36
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Jon Barca Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None


 Description   

Hello, I have a problem when executing a complex DELETE (with subqueries) of about 200,000 records in a Galera Cluster of 3 nodes (10.2.36 version).

On one of the nodes, the query hangs (it doesn't finish, I have to kill it after several hours) and on the rest nodes it lasts just a couple of minutes.

It's been running every month for a couple of years and had never been longer than 3 minutes, but now it just won't finish.

I've run an analyze on the table, reorganized several indexes, and even run an alter table rebuild, but nothing has changed.

Database is not very big, I have thought about removing that node and adding it again but I would like to know the reason for the problem.

Any advice?

Thanks in advance,
Jon.



 Comments   
Comment by Daniel Black [ 2022-04-13 ]

There isn't anything that can be done with so few details. There are 7 more releases in the 10.2 series some of which contain optimizer fixes and improvements.

In a few weeks the last 10.2 release will happen. Without a very concrete simplified example of the problem you are facing, this is unlikely to be fixed. Even with details its unlikely there is time to fix it in 10.2.

I recommend:

  • Does the EXPLAIN DELETE ... contain the same query plan on all nodes?
  • look at SHOW ENGINE INNODB STATUS does it show an increasing amount of locks held for the delete query?
  • While the query hasn't changed, maybe the data has, and therefore the previous indexes used on the tables are no-longer applicable to the current unchanged query. Look closely at the query and examine if better indexing can be used.
  • Start planning for a major version upgrade e.g. 10.6 and test your system on that.

If you still have problems on an ideally indexed tables system with a well written DELETE query on a later version that 10.2 please include details in this or another bug.

Comment by Jon Barca [ 2022-04-18 ]

Hi Daniel, thanks for the tips.

The EXPLAIN showed the same execution plan on all nodes and the SHOW ENGINE INNODB STATUS did not show any differences either when it lasted a few minutes or when it did not finish. I suspected there was a physical issue with this table so I reorganized it, but that didn't work either.
I really didn't know where else to find out.

We are studying migration to MariaDB 10.4 on CentOS 7.x, which is supported until June 2024.
By the way, there is MariaDB support for the AlmaLlinux, RockyLinux or OracleLinux distros. In "Current Package Platforms" ( https://mariadb.com/kb/en/deprecation-policy/ ) neither of these is mentioned.

Thanks and regards,
Jon.

Comment by Daniel Black [ 2022-04-19 ]

We're still doing RHEL8 releases. At the moment we are doing CI on Rocky Linux using the RHEL8 packages. I don't believe a Alma Linux/Rocky Linux/ Oracle Linux commitment has been made yet.

There is also EXPLAIN FOR <threadid>.

Generated at Thu Feb 08 09:59:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.