[MDEV-29883] Deadlock between InnoDB statistics update and BLOB insert Created: 2022-10-26 Updated: 2023-03-02 Resolved: 2022-10-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11 |
| Fix Version/s: | 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1, 10.3.38, 10.4.28, 10.5.19 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hang, upstream |
| Issue Links: |
|
| Description |
|
Today, I was lucky and got the test innodb.innodb-wl5522-debug hanging once. The deadlock would seem to be between these two threads:
This was with some patches, removing the flags BTR_LATCH_FOR_INSERT and BTR_MODIFY_EXTERNAL. I attempted to run the test 300 more times, but the hang was not reproduced. Thread 8 (writing a BLOB) is holding the clustered index U-latch and the tablespace U-latch or X-latch. Thread 10 (updating persistent statistics) in dict_stats_analyze_index() had executed the following code:
That is, it is holding an index S-latch (which does not conflict with the U-latch that the BLOB insert is holding) and the clustered index root page latch, and it is waiting for an exclusive latch on the tablespace. I was served an additional portion of luck, and the following hung on the first try:
After killall -ABRT mariadbd I got a nice trace of the hang between an INSERT of a BLOB and an update of statistics. The BLOB write had acquired the tablespace latch before trying to acquire any further page latches:
Hence, the culprit for the invalid latching order must be dict_stats_analyze_index(). I think that I saw some waits for fil_space_t::latch in some core dumps for |
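The latching order implied by the description above (index latch, then tablespace latch, then page latches) can be illustrated with a small checker. This is only a sketch: the `Latch` ranks and the `follows_order()` helper are hypothetical names invented for this illustration and are not InnoDB code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ranks for illustration only: a lower rank must be acquired
// before a higher one. This mirrors the order the BLOB write follows
// (tablespace latch before any further page latches).
enum class Latch { Index = 0, Tablespace = 1, Page = 2 };

// Returns true if no latch is requested while a higher-ranked latch
// is already held, that is, if the sequence respects the order.
bool follows_order(const std::vector<Latch>& acquisitions) {
  int highest_held = -1;
  for (Latch l : acquisitions) {
    const int rank = static_cast<int>(l);
    if (rank < highest_held) {
      return false;  // e.g. tablespace latch requested after a page latch
    }
    highest_held = rank;
  }
  return true;
}
```

Under these assumed ranks, the BLOB insert sequence (index, tablespace, page) passes, while the statistics sequence (index, root page, tablespace) is flagged as a violation.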
| Comments |
| Comment by Marko Mäkelä [ 2022-10-26 ] |
|
Before
In
That call was moved from fseg_n_reserved_pages() to btr_get_size() in Before |
| Comment by Marko Mäkelä [ 2022-10-26 ] |
|
This hang may have been introduced by WL#6326 in MySQL 5.7 and MariaDB Server 10.2.2. Before that change, the shared index latch that had to be held upon calling btr_get_size() would have conflicted with an exclusive latch that had to be held by BLOB operations. When the new dict_index_t::lock mode (SX a.k.a. Update in |
| Comment by Marko Mäkelä [ 2022-10-26 ] |
|
It looks like the hangs should also be possible with the InnoDB non-persistent statistics. The fix for versions older than 10.6 would be a little different. |
| Comment by Marko Mäkelä [ 2022-11-09 ] |
|
In the fix for release series older than 10.6, I retained the exclusive fil_space_t::latch acquisition. Only the dict_index_t::lock mode was escalated to exclusive. |
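To illustrate why escalating the index lock to exclusive removes the overlap, here is a minimal sketch of the usual compatibility rules for shared (S), update (U, a.k.a. SX) and exclusive (X) modes. The `Mode` enum and `compatible()` function are hypothetical names for this illustration, not the InnoDB implementation.

```cpp
#include <cassert>

// Illustrative lock modes: shared, update (SX), exclusive.
enum class Mode { S, U, X };

// Returns true if a request in mode `requested` can be granted while
// another thread holds the same lock in mode `held`.
constexpr bool compatible(Mode held, Mode requested) {
  // X conflicts with everything.
  if (held == Mode::X || requested == Mode::X) {
    return false;
  }
  // Among S and U, only a U/U pair conflicts.
  return held == Mode::S || requested == Mode::S;
}

// An S holder coexists with the U-latch of a BLOB insert ...
static_assert(compatible(Mode::S, Mode::U), "S and U do not conflict");
// ... whereas an X holder excludes it.
static_assert(!compatible(Mode::X, Mode::U), "X conflicts with U");
```

With the statistics thread holding the index lock in X mode, a BLOB insert that needs the U-latch on the same index cannot be in progress at the same time, so the two threads can no longer wait on each other's tablespace and page latches.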