[MDEV-23303] deadlocks after upgrade from 10.4.12 to 10.5.4 Created: 2020-07-27 Updated: 2020-09-14 Resolved: 2020-09-14 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Locking |
| Affects Version/s: | 10.5.4 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Aleksandr Omelchuk | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | need_feedback, upgrade | ||
| Environment: |
CentOS Linux release 7.4.1708 |
||
| Attachments: |
|
| Description |
|
A couple of weeks ago we have upgraded our MariaDB + Galera (3 nodes) cluster to the last stable version - 10.5.4. In our configuration, all writes go to 1 node only. And we started facing deadlocks (gap locking) issue on that node. Nothing was changed from the DB side except upgrade and nothing was changed from the application side. Could you please advise on this and please let me know what additional information will you require for further investigation? Thank you. |
| Comments |
| Comment by Jan Lindström (Inactive) [ 2020-07-28 ] | ||||||||||||||||||
|
You could start sharing your config files and error logs but you must be aware that gap locks are normal on InnoDB and they are needed for consistency. 10.5 does not anymore support innobase_locks_unsafe_for_binlog so you may need to use READ COMMITTED transaction isolation level instead on some cases or you need to handle deadlocks because of gap locking in your application. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-07-28 ] | ||||||||||||||||||
|
Thank you for the update. As for the gap locks, we are aware of that mechanism and we use READ COMMITTED isolation level in our transactions. (We also tried to set the "READ COMMITTED" for the whole DB but that didn't help). Please take a look. | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-07-30 ] | ||||||||||||||||||
|
Can you provide show create table `stock`.`stocks` | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-07-31 ] | ||||||||||||||||||
|
| ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-07-31 ] | ||||||||||||||||||
|
Table has self-reference foreign key. Any write operation (INSERT, UPDATE, DELETE) on table stocks might require row lock also to rows parent if one exists. For INSERTS we naturally need to make sure that primary key column remains unique. GAP-locks are also necessary and it seems that 10.5 might be stricter on rules at least Galera case. | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-07-31 ] | ||||||||||||||||||
|
Based on provided information I do not think there is a Bug, just a deadlock situation that needs to be addressed on application logic. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-03 ] | ||||||||||||||||||
|
Thank you for the update. Could you please provide me information about that stricter rules for Galera case? Cause I didn't find any information in Change log regarding the new behavior for deadlocks. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-06 ] | ||||||||||||||||||
|
We performed a rollback to version 10.4.12 and deadlocks gone. Thank you. | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-08-07 ] | ||||||||||||||||||
|
With current information it must be innodb_locks_unsafe_for_binlog see https://mariadb.com/kb/en/changes-improvements-in-mariadb-105/ | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-07 ] | ||||||||||||||||||
|
We have never used that variable. And it is set to 0 by default. Kindly please take a look at this issue. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-12 ] | ||||||||||||||||||
|
Is it possible to reopen this ticket? | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-08-12 ] | ||||||||||||||||||
|
I do not see isolation level setting on your configuration file. Can you provide full unedited error log from all nodes. Do you use only one master? | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-08-12 ] | ||||||||||||||||||
|
If I correctly understand the first deadlock on provided error log it is caused by fact that two concurrent INSERT-clauses insert a row so that we take gap-lock to primary key row with stock_key 82736280_1. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-13 ] | ||||||||||||||||||
|
1. Yes, we have set up MariaDB connector in the sequential mode for writes, so all our writes go to 1st node as default and there we faced with deadlocks. | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-08-20 ] | ||||||||||||||||||
|
Yes, it does lock same key using gap-lock but that is normal in this kind of self reference. I really do not see anything that would indicate a bug. We do not have any change on locking rules between 10.4 and 10.5. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-20 ] | ||||||||||||||||||
|
Understood. But is it normal with any type of isolation level? Thank you. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-25 ] | ||||||||||||||||||
|
As from documentation https://mariadb.com/kb/en/innodb-lock-modes:
So we use READ COMMITTED isolation level and as expected we don't face with the gap locking in 10.4. But something goes wrong in 10.5. | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-08-26 ] | ||||||||||||||||||
|
Can you repeat the problematic deadlock in 10.5 easily? If you can, can you share a repeatable test case. | ||||||||||||||||||
| Comment by Aleksandr Omelchuk [ 2020-08-26 ] | ||||||||||||||||||
|
We have downgraded all our environments to 10.4 and currently, we can't reproduce the issue. | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-09-14 ] | ||||||||||||||||||
|
Galera had a bug https://jira.mariadb.org/browse/MDEV-23557 that could have effect here also as code used persistent b-tree cursor after mini-transaction commit that could cause page contents to change e.g. split or merge, this naturally would be visible only with high concurrency and lot of foreign key actions. | ||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-09-14 ] | ||||||||||||||||||
|
I recommend using more recent version of the server and if this problem is still reproducible, I would need more detailed description how to repeat. |