Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31605

cmake/stack_direction.c does not work correctly on clang 16

Details

    Description

      10.6 has failing tests when built with clang 16.

      Attachments

        Issue Links

          Activity

            rucha174 Rucha Deodhar added a comment -

            Sergei fixed it ( as informed by sanja )

            rucha174 Rucha Deodhar added a comment - Sergei fixed it ( as informed by sanja )

            Which commit is supposed to fix this? For me, the following tests would fail when compiling the code with clang-16 for AMD64:

            10.6 cb364a78d6be7938a77ff8774a95035b8f703c65

            CURRENT_TEST: main.execution_constants
            mysqltest: At line 75: query '$query_head 0 $query_tail' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of  (0)...
            CURRENT_TEST: main.sp_notembedded
            mysqltest: At line 191: query 'call bug10100p(255, @var)' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of ER_STACK_OVERRUN_NEED_MORE (1436)...
            CURRENT_TEST: main.sp-error
            mysqltest: At line 2445: query 'call p1(1)' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of  (0)...
            CURRENT_TEST: main.json_debug_nonembedded
            mysqltest: At line 16: query 'SELECT * from JSON_TABLE('[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]', '$[*]' COLUMNS( a INT PATH '$.a')) as tt' succeeded - should have failed with error ER_STACK_OVERRUN_NEED_MORE (1436)...
            

            All these look like a failure to detect a stack overrun.

            marko Marko Mäkelä added a comment - Which commit is supposed to fix this? For me, the following tests would fail when compiling the code with clang-16 for AMD64: 10.6 cb364a78d6be7938a77ff8774a95035b8f703c65 CURRENT_TEST: main.execution_constants mysqltest: At line 75: query '$query_head 0 $query_tail' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of (0)... CURRENT_TEST: main.sp_notembedded mysqltest: At line 191: query 'call bug10100p(255, @var)' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of ER_STACK_OVERRUN_NEED_MORE (1436)... CURRENT_TEST: main.sp-error mysqltest: At line 2445: query 'call p1(1)' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of (0)... CURRENT_TEST: main.json_debug_nonembedded mysqltest: At line 16: query 'SELECT * from JSON_TABLE('[{"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]', '$[*]' COLUMNS( a INT PATH '$.a')) as tt' succeeded - should have failed with error ER_STACK_OVERRUN_NEED_MORE (1436)... All these look like a failure to detect a stack overrun.
            rucha174 Rucha Deodhar added a comment -

            Reopening because the tests are still crashing

            rucha174 Rucha Deodhar added a comment - Reopening because the tests are still crashing

            I should point out that these 4 tests are crashing in CMAKE_BUILD_TYPE=RelWithDebInfo, that is, in a release build. This should be independent of any ALLOCATE_MEM_ON_STACK() debug instrumentation in sql/item_jsonfunc.cc.

            marko Marko Mäkelä added a comment - I should point out that these 4 tests are crashing in CMAKE_BUILD_TYPE=RelWithDebInfo , that is, in a release build. This should be independent of any ALLOCATE_MEM_ON_STACK() debug instrumentation in sql/item_jsonfunc.cc .

            For some reason, available_stack_size() is returning negative values. The following experimental patch fixed the 3 non-debug tests for me, in a RelWithDebInfo build. The test main.json_debug_nonembedded depends on debug instrumentation.

            diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
            index 655dde315aa..c8ac5f9c502 100644
            --- a/sql/sql_parse.cc
            +++ b/sql/sql_parse.cc
            @@ -7438,20 +7438,21 @@ long max_stack_used;
             */
             
             bool
            -#if defined __GNUC__ && !defined __clang__
             /*
               Do not optimize the function in order to preserve a stack variable creation.
               Otherwise, the variable pointed as "buf" can be removed due to a missing
               usage.
              */
            -__attribute__((optimize("-O0")))
            +#ifdef __clang__
            +__attribute__((optnone))
            +#elif defined __GNUC__
            +__attribute__((optimize(0)))
             #endif
             check_stack_overrun(THD *thd, long margin, uchar *buf __attribute__((unused)))
             {
            -  long stack_used;
               DBUG_ASSERT(thd == current_thd);
            -  if ((stack_used= available_stack_size(thd->thread_stack, &stack_used)) >=
            -      (long) (my_thread_stack_size - margin))
            +  const long stack_used= -available_stack_size(thd->thread_stack, &stack_used);
            +  if (stack_used < 0 || stack_used >= (long) (my_thread_stack_size - margin))
               {
                 thd->is_fatal_error= 1;
                 /*
            

            marko Marko Mäkelä added a comment - For some reason, available_stack_size() is returning negative values. The following experimental patch fixed the 3 non-debug tests for me, in a RelWithDebInfo build. The test main.json_debug_nonembedded depends on debug instrumentation. diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc index 655dde315aa..c8ac5f9c502 100644 --- a/sql/sql_parse.cc +++ b/sql/sql_parse.cc @@ -7438,20 +7438,21 @@ long max_stack_used; */ bool -#if defined __GNUC__ && !defined __clang__ /* Do not optimize the function in order to preserve a stack variable creation. Otherwise, the variable pointed as "buf" can be removed due to a missing usage. */ -__attribute__((optimize("-O0"))) +#ifdef __clang__ +__attribute__((optnone)) +#elif defined __GNUC__ +__attribute__((optimize(0))) #endif check_stack_overrun(THD *thd, long margin, uchar *buf __attribute__((unused))) { - long stack_used; DBUG_ASSERT(thd == current_thd); - if ((stack_used= available_stack_size(thd->thread_stack, &stack_used)) >= - (long) (my_thread_stack_size - margin)) + const long stack_used= -available_stack_size(thd->thread_stack, &stack_used); + if (stack_used < 0 || stack_used >= (long) (my_thread_stack_size - margin)) { thd->is_fatal_error= 1; /*

            On an ASAN instrumented -O3 debug build using GCC 13, I observed a bug in the debug instrumentation:

            10.5 055f2e308bf1dd4026df65093a1166526ac35d9b

            CURRENT_TEST: main.json_debug_nonembedded
            mysqltest: At line 17: query 'SELECT JSON_CONTAINS(@json1, @json2)' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 1436...
            …
            AddressSanitizer:DEADLYSIGNAL
            =================================================================
            ==969262==ERROR: AddressSanitizer: SEGV on unknown address 0x7f3c95699000 (pc 0x7f3ca0770f0a bp 0x7f3c96a95e10 sp 0x7f3c953cdf38 T5)
            ==969262==The signal is caused by a WRITE memory access.
                #0 0x7f3ca0770f0a in __memset_avx2_unaligned_erms ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:328
                #1 0x55ab7b559aac in check_contains /mariadb/10.5/sql/item_jsonfunc.cc:1174
                #2 0x55ab7b5665bb in Item_func_json_contains::val_int() /mariadb/10.5/sql/item_jsonfunc.cc:1374
            

            According to man 2 alloca, the behaviour is undefined if the buffer returned by alloca() exceeds the bounds of the stack. And that is exactly what can happen here:

            static int check_contains(json_engine_t *js, json_engine_t *value)
            {
              json_engine_t loc_js;
              bool set_js;
              DBUG_EXECUTE_IF("json_check_min_stack_requirement",
                              {
                                long arbitrary_var;
                                long stack_used_up= (available_stack_size(current_thd->thread_stack, &arbitrary_var));
                                ALLOCATE_MEM_ON_STACK(my_thread_stack_size-stack_used_up-STACK_MIN_SIZE);
                              });
              if (check_stack_overrun(current_thd, STACK_MIN_SIZE , NULL))
                return 1;
            

            The ALLOCATE_MEM_ON_STACK macro invokes alloca() and then bzero() and my_checksum() on the buffer. Here, the SIGSEGV occurred during the bzero() call.

            I think that in order to have correctly working the debug instrumentation, we should do the following:

            1. Declare the function attribute optimize(0) (GCC) or optnone (clang).
            2. Replace ALLOCATE_MEM_ON_STACK with a simple alloca(). Because optimizations for the function are disabled, the alloca() should not be optimized away. In this way, we will not have to engage in undefined behaviour (accessing the potentially overrunning buffer) to outsmart compiler optimizations.
            3. Revise the check_stack_overrun() so that it will work correctly when the code is compiled with clang version 16.
            marko Marko Mäkelä added a comment - On an ASAN instrumented -O3 debug build using GCC 13, I observed a bug in the debug instrumentation: 10.5 055f2e308bf1dd4026df65093a1166526ac35d9b CURRENT_TEST: main.json_debug_nonembedded mysqltest: At line 17: query 'SELECT JSON_CONTAINS(@json1, @json2)' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 1436... … AddressSanitizer:DEADLYSIGNAL ================================================================= ==969262==ERROR: AddressSanitizer: SEGV on unknown address 0x7f3c95699000 (pc 0x7f3ca0770f0a bp 0x7f3c96a95e10 sp 0x7f3c953cdf38 T5) ==969262==The signal is caused by a WRITE memory access. #0 0x7f3ca0770f0a in __memset_avx2_unaligned_erms ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:328 #1 0x55ab7b559aac in check_contains /mariadb/10.5/sql/item_jsonfunc.cc:1174 #2 0x55ab7b5665bb in Item_func_json_contains::val_int() /mariadb/10.5/sql/item_jsonfunc.cc:1374 According to man 2 alloca , the behaviour is undefined if the buffer returned by alloca() exceeds the bounds of the stack. And that is exactly what can happen here: static int check_contains(json_engine_t *js, json_engine_t *value) { json_engine_t loc_js; bool set_js; DBUG_EXECUTE_IF( "json_check_min_stack_requirement" , { long arbitrary_var; long stack_used_up= (available_stack_size(current_thd->thread_stack, &arbitrary_var)); ALLOCATE_MEM_ON_STACK(my_thread_stack_size-stack_used_up-STACK_MIN_SIZE); }); if (check_stack_overrun(current_thd, STACK_MIN_SIZE , NULL)) return 1; The ALLOCATE_MEM_ON_STACK macro invokes alloca() and then bzero() and my_checksum() on the buffer. Here, the SIGSEGV occurred during the bzero() call. I think that in order to have correctly working the debug instrumentation, we should do the following: Declare the function attribute optimize(0) (GCC) or optnone (clang). Replace ALLOCATE_MEM_ON_STACK with a simple alloca() . Because optimizations for the function are disabled, the alloca() should not be optimized away. In this way, we will not have to engage in undefined behaviour (accessing the potentially overrunning buffer) to outsmart compiler optimizations. Revise the check_stack_overrun() so that it will work correctly when the code is compiled with clang version 16.

            The issue seems to be that under clang version 16 targeting AMD64, configure.cmake would wrongly set STACK_DIRECTION to 1 instead of -1 and therefore effectively disable check_stack_overrun(). https://stackoverflow.com/questions/6419304/c-program-to-find-direction-of-stack-growth suggests that such checks are futile and we should perhaps simply just apply the logic that debian/rules is already using for cross-compiling:

            # Cross building requires stack direction instruction
            ifneq ($(DEB_BUILD_ARCH),$(DEB_HOST_ARCH))
                ifneq (,$(filter $(DEB_HOST_ARCH_CPU),alpha amd64 arm arm64 i386 ia64 m68k mips64el mipsel powerpc ppc64 ppc64el riscv64 s390x sh4 sparc64))
                    CMAKEFLAGS += -DSTACK_DIRECTION=-1
                endif
                ifneq (,$(filter $(DEB_HOST_ARCH_CPU),hppa))
                    CMAKEFLAGS += -DSTACK_DIRECTION=1
                endif
            endif
            

            All the affected tests would pass on AMD64:

            10.6 b0a43818b4ffda3dd9b353e222a24dd0b5d1513e Debug with correct STACK_DIRECTION=-1

            main.json_debug_nonembedded              w3 [ pass ]      1
            main.sp-error                            w2 [ pass ]    203
            main.execution_constants                 w1 [ pass ]    187
            main.sp_notembedded                      w4 [ pass ]    208
            

            10.6 b0a43818b4ffda3dd9b353e222a24dd0b5d1513e RelWithDebInfo with correct STACK_DIRECTION=-1

            main.json_debug_nonembedded              [ skipped ]  Requires debug build
            main.sp-error                            w1 [ pass ]    188
            main.execution_constants                 w3 [ pass ]    169
            main.sp_notembedded                      w4 [ pass ]    184
            

            There does not seem to be any absolute need to touch the DBUG_EXECUTE_IF("json_check_min_stack_requirement", …) in sql/json_table.cc or sql/item_jsonfunc.cc, although in my opinion they are rather useless because they basically duplicate the logic of check_stack_overrun(). I would suggest something simpler and less obfuscated, like this:

              DBUG_EXECUTE_IF("json_check_min_stack_requirement", return 1;);
              if (check_stack_overrun(current_thd, STACK_MIN_SIZE , NULL))
                return 1;
            

            The following patch (which I also posted earlier) does not appear to be necessary for the stack overflow check to work:

            diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
            index abcd690ba8a..fdbc82e9ad9 100644
            --- a/sql/sql_parse.cc
            +++ b/sql/sql_parse.cc
            @@ -7444,15 +7444,17 @@ long max_stack_used;
               - Passing to check_stack_overrun() prevents the compiler from removing it.
             */
             
            -bool
            -#if defined __GNUC__ && !defined __clang__
             /*
               Do not optimize the function in order to preserve a stack variable creation.
               Otherwise, the variable pointed as "buf" can be removed due to a missing
               usage.
              */
            +#ifdef __clang__
            +__attribute__((optnone))
            +#elif defined __GNUC__
             __attribute__((optimize("-O0")))
             #endif
            +bool
             check_stack_overrun(THD *thd, long margin, uchar *buf __attribute__((unused)))
             {
               long stack_used;
            

            marko Marko Mäkelä added a comment - The issue seems to be that under clang version 16 targeting AMD64, configure.cmake would wrongly set STACK_DIRECTION to 1 instead of -1 and therefore effectively disable check_stack_overrun() . https://stackoverflow.com/questions/6419304/c-program-to-find-direction-of-stack-growth suggests that such checks are futile and we should perhaps simply just apply the logic that debian/rules is already using for cross-compiling: # Cross building requires stack direction instruction ifneq ($(DEB_BUILD_ARCH),$(DEB_HOST_ARCH)) ifneq (,$(filter $(DEB_HOST_ARCH_CPU),alpha amd64 arm arm64 i386 ia64 m68k mips64el mipsel powerpc ppc64 ppc64el riscv64 s390x sh4 sparc64)) CMAKEFLAGS += -DSTACK_DIRECTION=-1 endif ifneq (,$(filter $(DEB_HOST_ARCH_CPU),hppa)) CMAKEFLAGS += -DSTACK_DIRECTION=1 endif endif All the affected tests would pass on AMD64: 10.6 b0a43818b4ffda3dd9b353e222a24dd0b5d1513e Debug with correct STACK_DIRECTION=-1 main.json_debug_nonembedded w3 [ pass ] 1 main.sp-error w2 [ pass ] 203 main.execution_constants w1 [ pass ] 187 main.sp_notembedded w4 [ pass ] 208 10.6 b0a43818b4ffda3dd9b353e222a24dd0b5d1513e RelWithDebInfo with correct STACK_DIRECTION=-1 main.json_debug_nonembedded [ skipped ] Requires debug build main.sp-error w1 [ pass ] 188 main.execution_constants w3 [ pass ] 169 main.sp_notembedded w4 [ pass ] 184 There does not seem to be any absolute need to touch the DBUG_EXECUTE_IF("json_check_min_stack_requirement", …) in sql/json_table.cc or sql/item_jsonfunc.cc , although in my opinion they are rather useless because they basically duplicate the logic of check_stack_overrun() . I would suggest something simpler and less obfuscated, like this: DBUG_EXECUTE_IF( "json_check_min_stack_requirement" , return 1;); if (check_stack_overrun(current_thd, STACK_MIN_SIZE , NULL)) return 1; The following patch (which I also posted earlier) does not appear to be necessary for the stack overflow check to work: diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc index abcd690ba8a..fdbc82e9ad9 100644 --- a/sql/sql_parse.cc +++ b/sql/sql_parse.cc @@ -7444,15 +7444,17 @@ long max_stack_used; - Passing to check_stack_overrun() prevents the compiler from removing it. */ -bool -#if defined __GNUC__ && !defined __clang__ /* Do not optimize the function in order to preserve a stack variable creation. Otherwise, the variable pointed as "buf" can be removed due to a missing usage. */ +#ifdef __clang__ +__attribute__((optnone)) +#elif defined __GNUC__ __attribute__((optimize("-O0"))) #endif +bool check_stack_overrun(THD *thd, long margin, uchar *buf __attribute__((unused))) { long stack_used;

            I believe that cmake/stack_direction.c invokes undefined behaviour and therefore it may not work.

            The stack direction is a property of the ISA. On the large set of architectures that is supported by Debian GNU/Linux, only on HP PA-RISC the stack grows upwards, according to this 10.5 change by otto.

            I think that the simplest fix of this is to implement similar logic in CMake and remove the problematic runtime check.

            marko Marko Mäkelä added a comment - I believe that cmake/stack_direction.c invokes undefined behaviour and therefore it may not work. The stack direction is a property of the ISA. On the large set of architectures that is supported by Debian GNU/Linux, only on HP PA-RISC the stack grows upwards, according to this 10.5 change by otto . I think that the simplest fix of this is to implement similar logic in CMake and remove the problematic runtime check.

            People

              marko Marko Mäkelä
              rucha174 Rucha Deodhar
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.