[MDEV-29484] Assertion `!trx_free || !trx->locked_connections' failed in spider_free_trx_conn on LOCK TABLES
Created: 2022-09-07 Updated: 2023-10-06 Resolved: 2022-10-04
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking, Storage Engine - Spider |
| Affects Version/s: | 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11 |
| Fix Version/s: | 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Roel Van de Paar | Assignee: | Nayuta Yanagisawa (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | locking, not-10.4, regression-10.5 |
| Issue Links: |
|
| Description |
|
Leads to:
Bug confirmed present in:
Bug (or feature/syntax) confirmed not present in:
| Comments |
| Comment by Roel Van de Paar [ 2022-09-07 ] |
|
Please note that SET GLOBAL spider_same_server_link=ON; is not used here.
| Comment by Roel Van de Paar [ 2022-09-07 ] |
|
Bug was introduced between 12/06/22 and 12/07/22 AEST
| Comment by Roel Van de Paar [ 2022-09-07 ] |
|
Execution at the CLI may not work, try:
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-07 ] |
|
The failing assertion was newly added by the following commit: https://github.com/MariaDB/server/commit/a26700cca579926cddf9a48c45f13b32785746bb
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-14 ] |
|
I can reproduce the bug with ./bin/mysql -A -uroot -S./socket.sock --force --binary-mode test < ./in.sql > ./mysql.out 2>&1 but not with the interactive CLI or MTR.
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-14 ] |
|
trx->locked_connections is incremented (0 -> 1) by the first LOCK TABLE statement. Then, when the second LOCK TABLE statement fails, the variable is not decremented and the corresponding lock is not released.
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-14 ] |
|
If I understand correctly, Spider is designed not to explicitly release table locks on a backend server when the second LOCK TABLE statement is issued. This seems reasonable in a sense because, once the second LOCK TABLE is transmitted to the backend server, the locks taken by the first LOCK TABLE are automatically released. So the assertion introduced by a26700c does not hold in some cases. I expect that a much simpler test case, one that just locks a Spider table and then shuts down the server or something similar, can produce the same assertion failure (although I have not yet found such a test case).
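The bookkeeping described in the two comments above can be sketched as a toy model. This is only an illustration under stated assumptions, not Spider's code: ToyBackend, ToyTrx, the fail flag, and can_free() are hypothetical names, and the failure path is assumed to return before any counter update. The only behaviour taken as given is that LOCK TABLES on the backend implicitly releases the locks the session already holds.

```python
class ToyBackend:
    """Toy stand-in for one data node session (hypothetical, not Spider's connection)."""

    def __init__(self):
        self.locked_tables = set()

    def lock_tables(self, *tables):
        # LOCK TABLES implicitly releases locks the session already holds,
        # so the new lock set simply replaces the old one.
        self.locked_tables = set(tables)


class ToyTrx:
    """Toy transaction that counts connections it believes hold table locks."""

    def __init__(self, backend):
        self.backend = backend
        self.locked_connections = 0

    def lock_tables(self, *tables, fail=False):
        if fail:
            # Assumed failure path: nothing is sent to the backend and the
            # counter raised by an earlier LOCK TABLE is left untouched.
            return False
        if not self.backend.locked_tables:
            self.locked_connections += 1
        self.backend.lock_tables(*tables)
        return True

    def can_free(self):
        # The asserted condition evaluated with trx_free == true.
        return self.locked_connections == 0


backend = ToyBackend()
trx = ToyTrx(backend)
trx.lock_tables("t1")             # counter goes 0 -> 1
trx.lock_tables("t2", fail=True)  # second statement fails, counter stays at 1
print(trx.locked_connections, trx.can_free())  # prints: 1 False
```

In this model the counter can only return to zero through an explicit unlock, which the failing statement never triggers, so freeing the transaction at that point would trip a check of the form shown in the issue title.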
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-15 ] |
|
The following test case hangs at dropping auto_test_remote when it is run by MTR. This happens on 10.5.16 and 10.4 HEAD, neither of which has a26700c. So it is likely to be an old problem.
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-15 ] |
|
OK. Now I can reproduce the bug with MTR.
| Comment by Roel Van de Paar [ 2022-09-28 ] |
|
Additional similar testcase
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ] |
|
The failing assertion in spider_free_trx_conn() was added by my recent commit. It holds in most cases, but not when the connection to a remote data node has been disconnected.
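To make the failing condition concrete, the expression quoted in the issue title, `!trx_free || !trx->locked_connections`, can be read as "if the transaction is being freed, no connection may still be counted as locked". The sketch below is a plain Python rendering of that expression for illustration; assertion_holds is a hypothetical helper, not a function in the server.

```python
from itertools import product


def assertion_holds(trx_free: bool, locked_connections: int) -> bool:
    # Python rendering of `!trx_free || !trx->locked_connections`.
    return (not trx_free) or (locked_connections == 0)


# Enumerate the four combinations of "freeing the trx" and "counter is nonzero".
for trx_free, locked in product((False, True), (0, 1)):
    print(trx_free, locked, assertion_holds(trx_free, locked))
```

Only the combination trx_free = True with a nonzero counter violates the condition. According to the comment above, that is the state reached when the connection to a remote data node has been disconnected.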
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ] |
|
However, even if I remove the assertion, the test case results in a hang. According to the debug trace, the hang seems to occur at Query_cache::lock_and_suspend().
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ] |
|
OK. The hang has been introduced by https://github.com/MariaDB/server/commit/a26700cca579926cddf9a48c45f13b32785746bb.
| Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ] |
|
Please review: https://github.com/MariaDB/server/commit/1ae6d188c161cf07d0c389d5705b913b9504fc44
| Comment by Alexey Botchkov [ 2022-10-04 ] |
|
OK to push.