Details
Description
10.6 has failing tests when built with clang 16.
Attachments
Issue Links
- relates to
-
MDEV-33209 Stack overflow in main.json_debug_nonembedded due to incorrect debug injection
-
- Closed
-
-
MDEV-31151 Inaccurate stack size caculation caused stack overflow in pinbox allocator
-
- Closed
-
-
MDEV-34099 AddressSanitizer running out of memory regardless of stack_thread size
-
- Closed
-
Activity
Which commit is supposed to fix this? For me, the following tests would fail when compiling the code with clang-16 for AMD64:
10.6 cb364a78d6be7938a77ff8774a95035b8f703c65 |
CURRENT_TEST: main.execution_constants
|
mysqltest: At line 75: query '$query_head 0 $query_tail' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of (0)...
|
CURRENT_TEST: main.sp_notembedded
|
mysqltest: At line 191: query 'call bug10100p(255, @var)' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of ER_STACK_OVERRUN_NEED_MORE (1436)...
|
CURRENT_TEST: main.sp-error
|
mysqltest: At line 2445: query 'call p1(1)' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of (0)...
|
CURRENT_TEST: main.json_debug_nonembedded
|
mysqltest: At line 16: query 'SELECT * from JSON_TABLE('[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]', '$[*]' COLUMNS( a INT PATH '$.a')) as tt' succeeded - should have failed with error ER_STACK_OVERRUN_NEED_MORE (1436)...
|
All these look like a failure to detect a stack overrun.
I should point out that these 4 tests are crashing in CMAKE_BUILD_TYPE=RelWithDebInfo, that is, in a release build. This should be independent of any ALLOCATE_MEM_ON_STACK() debug instrumentation in sql/item_jsonfunc.cc.
For some reason, available_stack_size() is returning negative values. The following experimental patch fixed the 3 non-debug tests for me, in a RelWithDebInfo build. The test main.json_debug_nonembedded depends on debug instrumentation.
diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
|
index 655dde315aa..c8ac5f9c502 100644
|
--- a/sql/sql_parse.cc
|
+++ b/sql/sql_parse.cc
|
@@ -7438,20 +7438,21 @@ long max_stack_used;
|
*/
|
|
bool
|
-#if defined __GNUC__ && !defined __clang__
|
/*
|
Do not optimize the function in order to preserve a stack variable creation.
|
Otherwise, the variable pointed as "buf" can be removed due to a missing
|
usage.
|
*/
|
-__attribute__((optimize("-O0")))
|
+#ifdef __clang__
|
+__attribute__((optnone))
|
+#elif defined __GNUC__
|
+__attribute__((optimize(0)))
|
#endif
|
check_stack_overrun(THD *thd, long margin, uchar *buf __attribute__((unused)))
|
{
|
- long stack_used;
|
DBUG_ASSERT(thd == current_thd);
|
- if ((stack_used= available_stack_size(thd->thread_stack, &stack_used)) >=
|
- (long) (my_thread_stack_size - margin))
|
+ const long stack_used= -available_stack_size(thd->thread_stack, &stack_used);
|
+ if (stack_used < 0 || stack_used >= (long) (my_thread_stack_size - margin))
|
{
|
thd->is_fatal_error= 1;
|
/* |
On an ASAN instrumented -O3 debug build using GCC 13, I observed a bug in the debug instrumentation:
10.5 055f2e308bf1dd4026df65093a1166526ac35d9b |
CURRENT_TEST: main.json_debug_nonembedded
|
mysqltest: At line 17: query 'SELECT JSON_CONTAINS(@json1, @json2)' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 1436...
|
…
|
AddressSanitizer:DEADLYSIGNAL
|
=================================================================
|
==969262==ERROR: AddressSanitizer: SEGV on unknown address 0x7f3c95699000 (pc 0x7f3ca0770f0a bp 0x7f3c96a95e10 sp 0x7f3c953cdf38 T5)
|
==969262==The signal is caused by a WRITE memory access.
|
#0 0x7f3ca0770f0a in __memset_avx2_unaligned_erms ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:328
|
#1 0x55ab7b559aac in check_contains /mariadb/10.5/sql/item_jsonfunc.cc:1174
|
#2 0x55ab7b5665bb in Item_func_json_contains::val_int() /mariadb/10.5/sql/item_jsonfunc.cc:1374
|
According to man 2 alloca, the behaviour is undefined if the buffer returned by alloca() exceeds the bounds of the stack. And that is exactly what can happen here:
static int check_contains(json_engine_t *js, json_engine_t *value) |
{
|
json_engine_t loc_js;
|
bool set_js; |
DBUG_EXECUTE_IF("json_check_min_stack_requirement", |
{
|
long arbitrary_var; |
long stack_used_up= (available_stack_size(current_thd->thread_stack, &arbitrary_var)); |
ALLOCATE_MEM_ON_STACK(my_thread_stack_size-stack_used_up-STACK_MIN_SIZE);
|
});
|
if (check_stack_overrun(current_thd, STACK_MIN_SIZE , NULL)) |
return 1; |
The ALLOCATE_MEM_ON_STACK macro invokes alloca() and then bzero() and my_checksum() on the buffer. Here, the SIGSEGV occurred during the bzero() call.
I think that in order to have correctly working the debug instrumentation, we should do the following:
- Declare the function attribute optimize(0) (GCC) or optnone (clang).
- Replace ALLOCATE_MEM_ON_STACK with a simple alloca(). Because optimizations for the function are disabled, the alloca() should not be optimized away. In this way, we will not have to engage in undefined behaviour (accessing the potentially overrunning buffer) to outsmart compiler optimizations.
- Revise the check_stack_overrun() so that it will work correctly when the code is compiled with clang version 16.
The issue seems to be that under clang version 16 targeting AMD64, configure.cmake would wrongly set STACK_DIRECTION to 1 instead of -1 and therefore effectively disable check_stack_overrun(). https://stackoverflow.com/questions/6419304/c-program-to-find-direction-of-stack-growth suggests that such checks are futile and we should perhaps simply just apply the logic that debian/rules is already using for cross-compiling:
# Cross building requires stack direction instruction
|
ifneq ($(DEB_BUILD_ARCH),$(DEB_HOST_ARCH))
|
ifneq (,$(filter $(DEB_HOST_ARCH_CPU),alpha amd64 arm arm64 i386 ia64 m68k mips64el mipsel powerpc ppc64 ppc64el riscv64 s390x sh4 sparc64))
|
CMAKEFLAGS += -DSTACK_DIRECTION=-1
|
endif
|
ifneq (,$(filter $(DEB_HOST_ARCH_CPU),hppa))
|
CMAKEFLAGS += -DSTACK_DIRECTION=1
|
endif
|
endif
|
All the affected tests would pass on AMD64:
10.6 b0a43818b4ffda3dd9b353e222a24dd0b5d1513e Debug with correct STACK_DIRECTION=-1 |
main.json_debug_nonembedded w3 [ pass ] 1
|
main.sp-error w2 [ pass ] 203
|
main.execution_constants w1 [ pass ] 187
|
main.sp_notembedded w4 [ pass ] 208
|
10.6 b0a43818b4ffda3dd9b353e222a24dd0b5d1513e RelWithDebInfo with correct STACK_DIRECTION=-1 |
main.json_debug_nonembedded [ skipped ] Requires debug build
|
main.sp-error w1 [ pass ] 188
|
main.execution_constants w3 [ pass ] 169
|
main.sp_notembedded w4 [ pass ] 184
|
There does not seem to be any absolute need to touch the DBUG_EXECUTE_IF("json_check_min_stack_requirement", …) in sql/json_table.cc or sql/item_jsonfunc.cc, although in my opinion they are rather useless because they basically duplicate the logic of check_stack_overrun(). I would suggest something simpler and less obfuscated, like this:
DBUG_EXECUTE_IF("json_check_min_stack_requirement", return 1;); |
if (check_stack_overrun(current_thd, STACK_MIN_SIZE , NULL)) |
return 1; |
The following patch (which I also posted earlier) does not appear to be necessary for the stack overflow check to work:
diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
|
index abcd690ba8a..fdbc82e9ad9 100644
|
--- a/sql/sql_parse.cc
|
+++ b/sql/sql_parse.cc
|
@@ -7444,15 +7444,17 @@ long max_stack_used;
|
- Passing to check_stack_overrun() prevents the compiler from removing it.
|
*/
|
|
-bool
|
-#if defined __GNUC__ && !defined __clang__
|
/*
|
Do not optimize the function in order to preserve a stack variable creation.
|
Otherwise, the variable pointed as "buf" can be removed due to a missing
|
usage.
|
*/
|
+#ifdef __clang__
|
+__attribute__((optnone))
|
+#elif defined __GNUC__
|
__attribute__((optimize("-O0")))
|
#endif
|
+bool
|
check_stack_overrun(THD *thd, long margin, uchar *buf __attribute__((unused)))
|
{
|
long stack_used; |
I believe that cmake/stack_direction.c invokes undefined behaviour and therefore it may not work.
The stack direction is a property of the ISA. On the large set of architectures that is supported by Debian GNU/Linux, only on HP PA-RISC the stack grows upwards, according to this 10.5 change by otto.
I think that the simplest fix of this is to implement similar logic in CMake and remove the problematic runtime check.
Sergei fixed it ( as informed by sanja )