[MDEV-16660] Inadequate DEFAULT_THREAD_STACK size for AddressSanitizer Created: 2018-07-02  Updated: 2024-01-10  Resolved: 2023-11-17

Status: Closed
Project: MariaDB Server
Component/s: Compiling, Parser
Affects Version/s: 10.3
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: ASAN

Issue Links:
Relates
relates to MDEV-33210 Several performance_schema tests fail... Confirmed

 Description   

When the code is compiled with Clang 6.0.3 and cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_ASAN:BOOL=ON, 3 of the 4 tests in the following command crash the server instead of reporting a stack overflow error; only compat/oracle.parser reports error 1436. If -O1 is added to CMAKE_C_FLAGS and CMAKE_CXX_FLAGS, all 4 tests pass:

ASAN_OPTIONS=abort_on_error=1,disable_coredump=0,detect_leaks=0 ./mtr --parallel=auto --force --retry=0 --max-test-fail=0 compat/oracle.parser compat/oracle.sp-package compat/oracle.sp-package-mysqldump compat/oracle.sp-package-security

10.3 71144afa966a85d08053eb616a1021fd339102d1

CURRENT_TEST: compat/oracle.sp-package-mysqldump
mysqltest: At line 42: query 'CALL p1' failed: 2013: Lost connection to MySQL server during query
CURRENT_TEST: compat/oracle.sp-package
mysqltest: At line 1470: query 'CALL pack.p1('p2 pack.p3')' failed: 2013: Lost connection to MySQL server during query
CURRENT_TEST: compat/oracle.parser
mysqltest: At line 73: query 'CALL p2('date')' failed: 1436: Thread stack overrun:  240640 bytes used of a 299008 byte stack, and 81920 bytes needed.  Use 'mysqld --thread_stack=#' to specify a bigger stack
CURRENT_TEST: compat/oracle.sp-package-security
mysqltest: At line 233: query 'GRANT EXECUTE ON PACKAGE BODY db1.pkg1 TO u1@localhost' failed: 2013: Lost connection to MySQL server during query



 Comments   
Comment by Marko Mäkelä [ 2023-11-17 ]

By design, AddressSanitizer surrounds stack-allocated variables with poisoned "sentinel" areas (redzones), so that it can catch buffer overflows by trapping accesses to memory addresses that lie between stack-allocated variables. As a result, each instrumented stack frame is larger than in a non-instrumented build.

Apparently something has changed in recent compilers: I need a larger thread stack size when using -DWITH_ASAN=ON with GCC 12.3.0, GCC 13.2.0, or clang 16.0.6. The minimum stack size needed to pass bootstrap is smaller for non-debug builds, and smaller with GCC 12 than with GCC 13. Here is an example from clang 16.0.6 with CMAKE_BUILD_TYPE=RelWithDebInfo and WITH_ASAN=ON:

10.6 44b9e4169412205f2f1d013d3346420aee9d09d5

main.1st                                 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2023-11-17 13:53:43
line
2023-11-17 13:53:42 0 [ERROR] Could not open mysql.plugin table: "Thread stack overrun:  6566560 bytes used of a 5242880 byte stack, and 81920 bytes needed. Consider increasing the thread_stack system variable.". Some plugins may be not loaded
2023-11-17 13:53:42 0 [Warning] Can't open and lock time zone table: Thread stack overrun:  8642784 bytes used of a 5242880 byte stack, and 81920 bytes needed. Consider increasing the thread_stack system variable. trying to live without them
2023-11-17 13:53:42 0 [ERROR] Can't open the mysql.func table. Please run mysql_upgrade to create it.

I don’t think this is a bug in our actual code or in the stack overflow detection; it is just an issue with the build parameters. The following patch fixes it for me:

diff --git a/include/my_pthread.h b/include/my_pthread.h
index 3e68538b424..31157c9f063 100644
--- a/include/my_pthread.h
+++ b/include/my_pthread.h
@@ -667,15 +667,11 @@ extern void my_mutex_end(void);
   We need to have at least 256K stack to handle calls to myisamchk_init()
   with the current number of keys and key parts.
 */
-#if defined(__SANITIZE_ADDRESS__) || defined(WITH_UBSAN)
-#ifndef DBUG_OFF
-#define DEFAULT_THREAD_STACK	(1024*1024L)
-#else
-#define DEFAULT_THREAD_STACK	(383*1024L) /* 392192 */
-#endif
-#else
-#define DEFAULT_THREAD_STACK	(292*1024L) /* 299008 */
-#endif
+# if defined(__SANITIZE_ADDRESS__) || defined(WITH_UBSAN)
+#  define DEFAULT_THREAD_STACK	(9L<<20)
+# else
+#  define DEFAULT_THREAD_STACK	(292*1024L) /* 299008 */
+# endif
 #endif
 
 #define MY_PTHREAD_LOCK_READ 0

I think that to be on the safe side, we’d better use 10 MiB instead of the above 9 MiB. That is what I have been using in my local builds recently.

The compilers used for ASAN builds on our CI systems are apparently older, since the problem has not occurred there.

Comment by Marko Mäkelä [ 2023-11-30 ]

On 10.5 and GCC 13.2.0 we seem to need 11 MiB of thread stack; 10 MiB is not enough:

10.5 89a5a8d234832ef9ed5ee814e4db42c636fcde1e

ERROR: 1436  Thread stack overrun:  10551136 bytes used of a 10485760 byte stack, and 16000 bytes needed. Consider increasing the thread_stack system variable.

Generated at Thu Feb 08 08:30:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.