Details
Bug
Status: Closed
Major
Resolution: Fixed
10.6, 10.11, 11.4, 11.8, 12.0
Description
--source include/have_innodb.inc
CREATE OR REPLACE TABLE t (a INT) ENGINE=INNODB;
SELECT @@innodb_io_capacity;
SELECT @@innodb_io_capacity_max;
SET GLOBAL innodb_io_capacity=18446744073709551615;
SET GLOBAL innodb_max_dirty_pages_pct=1;
CREATE OR REPLACE TABLE t (a INT) ENGINE=INNODB;
CREATE OR REPLACE TABLE t (a INT) ENGINE=INNODB;
CREATE OR REPLACE TABLE t (a INT) ENGINE=INNODB;
CREATE OR REPLACE TABLE t (a INT) ENGINE=INNODB;
CREATE OR REPLACE TABLE t (a INT) ENGINE=INNODB;
CREATE OR REPLACE TABLE t (a INT) ENGINE=INNODB;
Leads to:
CS 11.8.1 6f1161aa34cbb178b00fc24cbc46e2e0e2af767a (Optimized, UBASAN, Clang) Build 24/02/2025
/test/11.8_opt_san/storage/innobase/buf/buf0flu.cc:2302:19: runtime error: 1.85291e+19 is outside the range of representable values of type 'unsigned long'
#0 0x557d0b95b4b3 in page_cleaner_flush_pages_recommendation(unsigned long, unsigned long, double, unsigned long, double) /test/11.8_opt_san/storage/innobase/buf/buf0flu.cc:2302:19
#1 0x557d0b95b4b3 in buf_flush_page_cleaner() /test/11.8_opt_san/storage/innobase/buf/buf0flu.cc:2619:18
#2 0x14ffc48ecdb3 in execute_native_thread_routine /build/gcc-14-ig5ci0/gcc-14-14.2.0/build/x86_64-linux-gnu/libstdc++-v3/src/c++11/../../../../../src/libstdc++-v3/src/c++11/thread.cc:104:18
#3 0x557d0930695c in asan_thread_start(void*) asan_interceptors.cpp.o
#4 0x14ffc449caa3 in start_thread nptl/pthread_create.c:447:8
#5 0x14ffc4529c3b in clone3 misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

SUMMARY: UndefinedBehaviorSanitizer: float-cast-overflow /test/11.8_opt_san/storage/innobase/buf/buf0flu.cc:2302:19
Setup:
Compiled with a recent version of Clang (I used Clang 18.1.3) with LLVM 18. Ubuntu instructions:
# Note: It is strongly recommended to uninstall all old Clang & LLVM packages (ref dpkg --list | grep -iE 'clang|llvm' and use apt purge and dpkg --purge to remove the packages) before following these steps
# Note: llvm-17-linker-tools installs /usr/lib/llvm-17/lib/LLVMgold.so, which is needed for compilation, and LLVMgold.so is no longer included in LLVM 18
sudo apt install clang llvm-18 llvm-18-linker-tools llvm-18-runtime llvm-18-tools llvm-18-dev libstdc++-14-dev llvm-dev llvm-17-linker-tools
sudo ln -s /usr/lib/llvm-17/lib/LLVMgold.so /usr/lib/llvm-18/lib/LLVMgold.so
Compiled with: "-DCMAKE_C_COMPILER=/usr/bin/clang -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C{,XX}_FLAGS='-march=native -mtune=native'" and:
-DWITH_ASAN=ON -DWITH_ASAN_SCOPE=ON -DWITH_UBSAN=ON -DWSREP_LIB_WITH_ASAN=ON
Set before execution:
export UBSAN_OPTIONS=print_stacktrace=1:report_error_type=1 # You may also want to suppress UBSAN startup issues using 'suppressions=UBSAN.filter' in UBSAN_OPTIONS. For an example of UBSAN.filter that includes current startup issues, see: https://github.com/mariadb-corporation/mariadb-qa/blob/master/UBSAN.filter
Bug confirmed present in:
MariaDB: 11.4.6 (opt), 11.8.1 (opt), 12.0.0 (opt)
Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.11.12 (dbg), 10.11.12 (opt), 12.0.0 (dbg)
Issue Links
relates to: MDEV-24369 Page cleaner sleeps indefinitely despite innodb_max_dirty_pages_pct_lwm being exceeded (Closed)
I don’t see why this would only affect 11.4 and later versions. The logic has not been changed since MariaDB Server 10.6.
GDB won’t cooperate when a floating-point value is stored in a SIMD register, so I could not print dirty_pct directly. In another debugging session where I repeated this, I could imagine its value being as follows:
(gdb) p buf_pool.LRU.count
$1 = 343
(gdb) p buf_pool.free.count
$2 = 7721
(gdb) p buf_pool.flush_list.count
$3 = 81
(gdb) p 81*100.0/(343+7721)
$4 = 1.0044642857142858
In page_cleaner_flush_pages_recommendation() there is some logic that looks questionable:
prev_time = curr_time;
prev_lsn = cur_lsn;
dirty_pct /= max_pct;
}
We have max_pct=1, and dividing by 1 is a no-op. So, it looks like we could try to multiply ~0ULL by something larger than 1, and clearly the result would not fit in 64 bits. In the Description, we seem to have had dirty_pct=1.00446, which is close to what I observed in this debugging session.
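The out-of-range conversion described above can be illustrated in isolation. The following is only a minimal sketch, not the code in buf0flu.cc: fits_in_uint64() and scale_capacity_clamped() are hypothetical helpers, and saturating at the maximum is just one possible way to keep the conversion defined, not necessarily the fix that was applied. Note that ~0ULL converted to double rounds up to 2^64, and 2^64 * 1.00446 is roughly 1.85291e+19, which is consistent with the value in the UBSAN report.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from buf0flu.cc): true if converting `value`
// to uint64_t is well-defined, that is, the truncated value is representable.
inline bool fits_in_uint64(double value)
{
  // 2^64 is exactly representable as a double.
  constexpr double two_pow_64 = 18446744073709551616.0;
  // NaN fails both comparisons and is therefore rejected as well.
  return value > -1.0 && value < two_pow_64;
}

// Hypothetical helper: scale an I/O capacity by a non-negative ratio,
// saturating at the maximum instead of performing an out-of-range
// floating-point to integer conversion (which is undefined behavior).
inline uint64_t scale_capacity_clamped(uint64_t io_capacity, double ratio)
{
  assert(ratio >= 0.0);
  const double scaled = static_cast<double>(io_capacity) * ratio;
  if (!fits_in_uint64(scaled))
    return std::numeric_limits<uint64_t>::max();
  return static_cast<uint64_t>(scaled);
}
```

With io_capacity set to 18446744073709551615 and a ratio of 1.0044642857142858 (the figure computed in the GDB session), the unclamped product does not fit in 64 bits, so the sketch returns the saturated maximum rather than invoking undefined behavior.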
I think that we should not only fix the limits of dirty_pct, but also change the type of innodb_io_capacity and related parameters to be INT UNSIGNED (always 32 bits). With the default innodb_page_size=16k and innodb_doublewrite=ON, the maximum write rate that can be represented by a 32-bit unsigned integer would be 2^(32+14+1) = 2^47 bytes per second, or 128 TiB/s. It will take a while before we reach such write speeds. I believe that a 16-bit innodb_io_capacity would be insufficient already today: 2^(16+14+1) = 2^31 bytes per second, or 2 GiB/s, is on the slow side even for consumer-grade NVMe SSDs.
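The write-rate arithmetic above can be checked at compile time. This is a small sketch under the assumptions stated in the comment (16 KiB pages, doublewrite counted as a factor of two); max_write_rate_log2() is a hypothetical helper introduced only for this illustration, and it treats the capacity as the round upper bound 2^bits rather than 2^bits - 1.

```cpp
#include <cstdint>

// Assumptions taken from the comment: innodb_page_size=16k (2^14 bytes)
// and innodb_doublewrite=ON (each page written twice, a factor of 2^1).
constexpr unsigned page_size_shift = 14;
constexpr unsigned doublewrite_shift = 1;

// Hypothetical helper: base-2 logarithm of the upper bound, in bytes per
// second, on the write rate expressible by an I/O capacity of the given width.
constexpr unsigned max_write_rate_log2(unsigned capacity_bits)
{
  return capacity_bits + page_size_shift + doublewrite_shift;
}

// 32-bit capacity: 2^47 bytes per second, which is 128 TiB/s.
static_assert(max_write_rate_log2(32) == 47, "32-bit bound is 2^47 B/s");
static_assert((UINT64_C(1) << max_write_rate_log2(32)) == (UINT64_C(128) << 40),
              "2^47 bytes equals 128 TiB");

// 16-bit capacity: 2^31 bytes per second, which is 2 GiB/s.
static_assert(max_write_rate_log2(16) == 31, "16-bit bound is 2^31 B/s");
static_assert((UINT64_C(1) << max_write_rate_log2(16)) == (UINT64_C(2) << 30),
              "2^31 bytes equals 2 GiB");
```

If any of the figures were off, the translation unit would fail to compile, so the static_asserts double as a check of the numbers quoted in the comment.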