Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects versions: 10.2.35, 10.2.36, 10.3.26, 10.3.27, 10.4.16, 10.4.17, 10.5.7, 10.5.8
- Environment: Tested on CentOS 7, CentOS 8, MariaDB 10.4.16, 10.4.17, 10.5.8
Description
MDEV-23991 reduced the lock scope of ANALYZE TABLE and background analysis. In doing so, the result of btr_get_size(index, BTR_N_LEAF_PAGES, &mtr) was stored temporarily in result.n_leaf_pages instead of index->stat_n_leaf_pages, to avoid needing the lock.
However, the comparison that follows still uses index->stat_n_leaf_pages to determine whether a full table scan is necessary. That variable is neither protected by a lock nor calculated correctly at this point: it reads as 1 no matter how many leaf pages the index has.
This causes an unnecessary full scan of the table, which locks the index for write access. At least when a replication thread attempts to write into a larger table, the 600-second semaphore wait limit is exceeded and the server intentionally crashes, producing a core dump. Because the table analysis never completes, automated table analysis is re-triggered after crash recovery, causing an endless crash loop.
The fix appears to be to use result.n_leaf_pages instead of index->stat_n_leaf_pages in the comparison that checks whether sampling the whole table has been requested, since it is local to the running thread and holds the value that was used prior to the patch.
    if (root_level == 0
        || N_SAMPLE_PAGES(index) * n_uniq > result.n_leaf_pages) {
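The effect of the stale value can be illustrated with a minimal, self-contained sketch. This is not the InnoDB source: IndexStats, LocalResult and needs_full_scan are hypothetical names introduced only for illustration, and the sketch assumes the decision is exactly the two-part condition quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared index statistics. The default of 1
// models the value the report says is read no matter how large the index is.
struct IndexStats {
    std::uint64_t stat_n_leaf_pages = 1;
};

// Hypothetical stand-in for the thread-local result filled in during analysis.
struct LocalResult {
    std::uint64_t n_leaf_pages = 0;
};

// Returns true when the whole index would be scanned instead of sampled:
// either the tree consists of a single page (root_level == 0), or the number
// of pages to sample exceeds the number of leaf pages available.
bool needs_full_scan(unsigned root_level, std::uint64_t n_sample_pages,
                     std::uint64_t n_uniq, std::uint64_t n_leaf_pages) {
    assert(n_uniq > 0);  // an index has at least one key column
    return root_level == 0 || n_sample_pages * n_uniq > n_leaf_pages;
}
```

With illustrative numbers (20 sample pages, n_uniq of 2, an index with 100000 leaf pages and root_level 2), passing the stale shared value of 1 makes the condition true and forces a full scan, while passing the thread-local count makes it false and allows sampling.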
Issue Links
- duplicates
  - MDEV-24504 [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. (Closed)
- is caused by
  - MDEV-23991 dict_table_stats_lock() has unnecessarily long scope (Closed)
- is duplicated by
  - MDEV-24266 Possible optimizer regression on 10.4.17 with DELETE statements (Closed)
  - MDEV-24438 Primary KEY not used in range lookups (Closed)
  - MDEV-25955 InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. (Closed)
- relates to
  - MDEV-24606 InnoDB: Semaphore wait has lasted > 600 second (Closed)
  - MDEV-24869 The replication suddenly stops for N minutes in version after version 10.4.15 (Closed)
  - MDEV-25111 Long semaphore wait (> 800 secs), server stops responding (Closed)