[MDEV-23303] deadlocks after upgrade from 10.4.12 to 10.5.4 Created: 2020-07-27  Updated: 2020-09-14  Resolved: 2020-09-14

Status: Closed
Project: MariaDB Server
Component/s: Galera, Locking
Affects Version/s: 10.5.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Aleksandr Omelchuk Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: need_feedback, upgrade
Environment:

CentOS Linux release 7.4.1708


Attachments: File galera.cnf     File galera_error.log_2node.gz     File galera_error.log_3node.gz     Text File galera_error_log_part.log     PNG File screenshot-1.png    

 Description   

A couple of weeks ago we have upgraded our MariaDB + Galera (3 nodes) cluster to the last stable version - 10.5.4. In our configuration, all writes go to 1 node only. And we started facing deadlocks (gap locking) issue on that node. Nothing was changed from the DB side except upgrade and nothing was changed from the application side.

Could you please advise on this and please let me know what additional information will you require for further investigation?

Thank you.



 Comments   
Comment by Jan Lindström (Inactive) [ 2020-07-28 ]

You could start sharing your config files and error logs but you must be aware that gap locks are normal on InnoDB and they are needed for consistency. 10.5 does not anymore support innobase_locks_unsafe_for_binlog so you may need to use READ COMMITTED transaction isolation level instead on some cases or you need to handle deadlocks because of gap locking in your application.

Comment by Aleksandr Omelchuk [ 2020-07-28 ]

Thank you for the update.
Attached to the ticket our config file and part from error log before upgrade, during upgrade and with "innodb_print_all_deadlocks" enabled.

As for the gap locks, we are aware of that mechanism and we use READ COMMITTED isolation level in our transactions. (We also tried to set the "READ COMMITTED" for the whole DB but that didn't help).
As I mentioned in the description we used the same configuration from the DB side and application side before and after the upgrade. And only after the upgrade the deadlocks start happening.

Please take a look.
Thank you.

Comment by Jan Lindström (Inactive) [ 2020-07-30 ]

Can you provide show create table `stock`.`stocks`

Comment by Aleksandr Omelchuk [ 2020-07-31 ]

MariaDB [stock]> show create table stocks\G
*************************** 1. row ***************************
       Table: stocks
Create Table: CREATE TABLE `stocks` (
  `stock_key` varchar(100) NOT NULL,
  `last_update` datetime NOT NULL,
  `location_id` varchar(50) NOT NULL,
  `product_id` varchar(50) NOT NULL,
  `stock_data` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL CHECK (json_valid(`stock_data`)),
  `available_for_sale` decimal(15,5) NOT NULL DEFAULT 0.00000,
  `selling_scheme` varchar(15) NOT NULL DEFAULT 'GENERAL',
  `loc_type` varchar(15) DEFAULT NULL,
  `stock_source_key` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`stock_key`),
  KEY `fk_stock_source_key_stock_key` (`stock_source_key`),
  KEY `idx_location_id` (`location_id`),
  CONSTRAINT `fk_stock_source_key_stock_key` FOREIGN KEY (`stock_source_key`) REFERENCES `stocks` (`stock_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Comment by Jan Lindström (Inactive) [ 2020-07-31 ]

Table has self-reference foreign key. Any write operation (INSERT, UPDATE, DELETE) on table stocks might require row lock also to rows parent if one exists. For INSERTS we naturally need to make sure that primary key column remains unique. GAP-locks are also necessary and it seems that 10.5 might be stricter on rules at least Galera case.

Comment by Jan Lindström (Inactive) [ 2020-07-31 ]

Based on provided information I do not think there is a Bug, just a deadlock situation that needs to be addressed on application logic.

Comment by Aleksandr Omelchuk [ 2020-08-03 ]

Thank you for the update.
So we will need to rollback our upgrade till we found a solution for our application.

Could you please provide me information about that stricter rules for Galera case? Cause I didn't find any information in Change log regarding the new behavior for deadlocks.

Comment by Aleksandr Omelchuk [ 2020-08-06 ]

We performed a rollback to version 10.4.12 and deadlocks gone.
I agree that deadlocks definitely depend on application logic but the main thing here is - what have been changed in 10.5 version for that case?
Why there is no information in Change Log regarding the new behavior?
Is it possible that there are other undocumented changes in 10.5 version?

Thank you.

Comment by Jan Lindström (Inactive) [ 2020-08-07 ]

With current information it must be innodb_locks_unsafe_for_binlog see https://mariadb.com/kb/en/changes-improvements-in-mariadb-105/

Comment by Aleksandr Omelchuk [ 2020-08-07 ]

We have never used that variable. And it is set to 0 by default.
We always use READ COMMITTED isolation level.
This is not the cause.

Kindly please take a look at this issue.
Thank you.

Comment by Aleksandr Omelchuk [ 2020-08-12 ]

Is it possible to reopen this ticket?

Comment by Jan Lindström (Inactive) [ 2020-08-12 ]

I do not see isolation level setting on your configuration file. Can you provide full unedited error log from all nodes. Do you use only one master?

Comment by Jan Lindström (Inactive) [ 2020-08-12 ]

If I correctly understand the first deadlock on provided error log it is caused by fact that two concurrent INSERT-clauses insert a row so that we take gap-lock to primary key row with stock_key 82736280_1.

Comment by Aleksandr Omelchuk [ 2020-08-13 ]

1. Yes, we have set up MariaDB connector in the sequential mode for writes, so all our writes go to 1st node as default and there we faced with deadlocks.
2. We set isolation level inside the transaction (but once we faced with the issue we tried to set on DB level and that didn't help).
3. Attached the logs from 2 other nodes.
4. As for the first deadlock in error log, there are 2 inserts which was trying to put data with different PKs, but looks like locked with the same key (82736280_1).

Comment by Jan Lindström (Inactive) [ 2020-08-20 ]

Yes, it does lock same key using gap-lock but that is normal in this kind of self reference. I really do not see anything that would indicate a bug. We do not have any change on locking rules between 10.4 and 10.5.

Comment by Aleksandr Omelchuk [ 2020-08-20 ]

Understood. But is it normal with any type of isolation level?
We are not facing with that deadlocks currently on 10.4.12 and 10.4.13 versions.
How can we proceed further with the investigation?

Thank you.

Comment by Aleksandr Omelchuk [ 2020-08-25 ]

As from documentation https://mariadb.com/kb/en/innodb-lock-modes:

Gap locks are disabled if the innodb_locks_unsafe_for_binlog system variable is set, or the isolation level is set to READ COMMITTED.

So we use READ COMMITTED isolation level and as expected we don't face with the gap locking in 10.4. But something goes wrong in 10.5.
Please take a look.

Comment by Jan Lindström (Inactive) [ 2020-08-26 ]

Can you repeat the problematic deadlock in 10.5 easily? If you can, can you share a repeatable test case.

Comment by Aleksandr Omelchuk [ 2020-08-26 ]

We have downgraded all our environments to 10.4 and currently, we can't reproduce the issue.

Comment by Jan Lindström (Inactive) [ 2020-09-14 ]

Galera had a bug https://jira.mariadb.org/browse/MDEV-23557 that could have effect here also as code used persistent b-tree cursor after mini-transaction commit that could cause page contents to change e.g. split or merge, this naturally would be visible only with high concurrency and lot of foreign key actions.

Comment by Jan Lindström (Inactive) [ 2020-09-14 ]

I recommend using more recent version of the server and if this problem is still reproducible, I would need more detailed description how to repeat.

Generated at Thu Feb 08 09:21:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.