[MDEV-31088] Server freeze due to innodb_change_buffering and innodb_file_per_table=0
| Created: | 2023-04-19 | Updated: | 2023-06-02 | Resolved: | 2023-06-02 |
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.13 |
| Fix Version/s: | 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | Matthias Leich | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hang |
| Description |
|
|
| Comments |
| Comment by Marko Mäkelä [ 2023-04-19 ] |
|
The hang occurs due to the non-default settings of innodb_change_buffering (
Also, Thread 18 is waiting on the system tablespace latch due to a change buffer merge:
The system tablespace latch is being held by Thread 15, which is waiting for the page latch that Thread 18 is holding (0x7fe340cb82e0+0x18 = 0x7fe340cb82f8):
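The two waits described in this comment form a cycle: each thread holds the latch that the other one needs. As a minimal sketch of that reasoning (not InnoDB code), the following models the situation as a wait-for graph. The dictionaries `holds` and `waits` and the helper `find_wait_cycle` are hypothetical names introduced only for illustration; the thread numbers and the latch address come from the comment.

```python
# Hypothetical model of the two waits described in the comment.
# holds[latch] = thread that currently owns the latch
# waits[thread] = latch that the thread is blocked on
holds = {
    "system tablespace latch": "Thread 15",
    "page latch 0x7fe340cb82f8": "Thread 18",
}
waits = {
    "Thread 18": "system tablespace latch",
    "Thread 15": "page latch 0x7fe340cb82f8",
}


def find_wait_cycle(start, holds, waits):
    """Follow waiter -> latch -> owner edges; return the cycle or None."""
    path = [start]
    thread = start
    while thread in waits:
        owner = holds.get(waits[thread])
        if owner is None:
            return None  # the latch is free, so the wait can finish
        if owner in path:
            # We came back to a thread already on the path: a deadlock.
            return path[path.index(owner):] + [owner]
        path.append(owner)
        thread = owner
    return None  # the last thread is not waiting for anything


# The page latch address is the sum quoted in the comment.
assert 0x7fe340cb82e0 + 0x18 == 0x7fe340cb82f8

cycle = find_wait_cycle("Thread 18", holds, waits)
print(" -> ".join(cycle))
```

If either thread were not waiting, the helper would return `None`, which corresponds to the case where the other thread can eventually make progress.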
| Comment by Marko Mäkelä [ 2023-06-01 ] |
|
The call to fseg_page_is_allocated() (before
| Comment by Marko Mäkelä [ 2023-06-01 ] |
|
Potentially this could also cause a hang between DROP INDEX and a concurrent page allocation in an .ibd file, but the probability of that should be much lower. A possible fix seems to be to acquire an exclusive tablespace latch earlier in several places, including dict_drop_index_tree(). That is somewhat difficult to arrange, because btr_free_if_exists() executes nested mini-transactions, and fil_space_t::latch is not recursive starting with MariaDB Server 10.6.
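The difficulty with a non-recursive latch can be illustrated outside InnoDB. The sketch below uses Python's `threading.Lock` (non-recursive) and `threading.RLock` (recursive) as stand-ins; it is not `fil_space_t::latch`, and the helper `nested_acquire_succeeds` is a hypothetical name. It shows that when an outer scope already holds a non-recursive lock, a nested attempt by the same thread cannot succeed, which is why acquiring the tablespace latch early conflicts with nested mini-transactions that would acquire it again.

```python
import threading


def nested_acquire_succeeds(latch):
    """Take the latch, then try to take it again from the same thread."""
    with latch:
        # Non-blocking attempt: a blocking acquire on a non-recursive
        # lock would hang here forever (a self-deadlock).
        got_it = latch.acquire(blocking=False)
        if got_it:
            latch.release()  # undo the nested acquisition
        return got_it


# Recursive lock: the owning thread may re-acquire it.
recursive = nested_acquire_succeeds(threading.RLock())
# Non-recursive lock: the nested attempt is refused.
non_recursive = nested_acquire_succeeds(threading.Lock())
print(recursive, non_recursive)
```

With a non-recursive latch, the outer caller would have to pass ownership down to the nested operation, or the nested operation would have to skip the acquisition, instead of simply locking again.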
| Comment by Marko Mäkelä [ 2023-06-02 ] |
|
I don’t think that we can easily fix this completely. What we can do is the following:
I believe that a server deadlock is possible in any operation that involves a change buffer merge (such as SELECT or CHECK TABLE when buffered changes exist) while a page is being allocated in the same tablespace. The probability of that occurring for tables stored in .ibd files (the default is innodb_file_per_table=1) should be much lower than for tables stored in the system tablespace; I imagine that it might happen if the server had been killed right after a DROP INDEX operation was committed, and right after recovery there were writes to the table that caused page allocation. Note: the change buffer was disabled by default in
| Comment by Matthias Leich [ 2023-06-02 ] |
|
|