  1. MariaDB Server
  2. MDEV-28430

lf_alloc isn't safe on aarch64 (or ppc64le)

Details

    Description

      Since 2020-08-24, the unit.lf test has frequently failed in buildbot on aarch64, and a few times on ppc64le.

      This is occurring after the attempted fix in MDEV-27088. The unit.lf test now runs sufficient iterations to frequently catch out its faulty implementation.

      An example of a stalled test:

      gdb of lf-t stalled on aarch64 from 10.9-43fa8e0b8f3bae1ff8493cfd3adb39443da6a809

      (gdb) directory /source
      Source directories searched: /source:$cdir:$cwd
      (gdb) thread apply all bt -frame-arguments all full
       
      Thread 2 (Thread 0xffff23fff120 (LWP 7523) "lf-t"):
      #0  lf_pinbox_real_free (pins=0xffff980017d8) at /home/mdborg/mariadb-server-10.8/mysys/lf_alloc-pin.c:376
              a = 0xffff23ffe3d8
              b = 0xffff23ffe3f0
              c = 0xffff23ffe3e0
              cur = 0xffff34001518
              npins = 25
              list = 0xffff34001518
              addr = 0xffff23ffe390
              first = 0xfffff74f8aaf
              last = 0xffff34001518
              var = <optimized out>
              stack_ends_here = <optimized out>
              pinbox = 0xaaaac84743f8 <lf_allocator>
      #1  0x0000aaaac811963c in lf_pinbox_free (pins=pins@entry=0xffff980017d8, addr=addr@entry=0xffff68001ea8) at /home/mdborg/mariadb-server-10.8/mysys/lf_alloc-pin.c:271
      No locals.
      #2  0x0000aaaac8116874 in test_lf_alloc (arg=<optimized out>) at /home/mdborg/mariadb-server-10.8/unittest/mysys/lf-t.c:90
              node1 = 0xffff68001ea8
              node2 = 0xffff28001db8
              m = 9409
              x = <optimized out>
              y = 0
              pins = 0xffff980017d8
      #3  0x0000ffff9fdad5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
              ret = <optimized out>
              pd = 0x0
              out = <optimized out>
              unwind_buf = {cancel_jmp_buf = {{jmp_buf = {281471285719328, 281474830928560, 281474830928558, 8448352, 281474830928559, 0, 281471277268992, 8448352, 281473365200928, 281471277268992, 281471285717056, 4514424196649599986, 0, 4514424198325567406, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
              not_first_call = 0
      #4  0x0000ffff9fe15d1c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
      No locals.
       
      Thread 1 (Thread 0xffff9ff25020 (LWP 7491) "lf-t"):
      #0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=7523, futex_word=0xffff23fff1f0) at ./nptl/futex-internal.c:57
              _x3tmp = 0
              _x0tmp = 281471285719536
              _x0 = 281471285719536
              _x3 = 0
              _x4tmp = 0
              _x1tmp = 265
              _x1 = 265
              _x4 = 0
              _x5tmp = 4294967295
              _x2tmp = 7523
              _x2 = 7523
              _x5 = 4294967295
              _x8 = 98
              _sys_result = <optimized out>
              sc_cancel_oldtype = 0
              sc_ret = <optimized out>
              _sys_result = <optimized out>
              _x5tmp = <optimized out>
              _x4tmp = <optimized out>
              _x3tmp = <optimized out>
              _x2tmp = <optimized out>
              _x1tmp = <optimized out>
              _x0tmp = <optimized out>
              _x0 = <optimized out>
              _x1 = <optimized out>
              _x2 = <optimized out>
              _x3 = <optimized out>
              _x4 = <optimized out>
              _x5 = <optimized out>
              _x8 = <optimized out>
      #1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=7523, futex_word=0xffff23fff1f0) at ./nptl/futex-internal.c:87
              err = <optimized out>
              clockbit = 256
              op = 265
              err = <optimized out>
              clockbit = <optimized out>
              op = <optimized out>
      #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0xffff23fff1f0, expected=7523, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
      No locals.
      #3  0x0000ffff9fdaef2c in __pthread_clockjoin_ex (threadid=281471285719328, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at ./nptl/pthread_join_common.c:105
              ret = <optimized out>
              _buffer = {__routine = 0xffff9fdaedb0 <cleanup>, __arg = 0xffff23fff548, __canceltype = -1611481088, __prev = 0x0}
              tid = <optimized out>
              pd = 0xffff23fff120
              self = <optimized out>
              result = 0
              pd_result = <optimized out>
      #4  0x0000ffff9fdaedb0 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at ./nptl/pthread_join.c:24
      No locals.
      #5  0x0000aaaac8116c54 in test_concurrently (test=test@entry=0xaaaac8157e10 "lf_alloc (with my_thread_init)", handler=handler@entry=0xaaaac8116800 <test_lf_alloc>, n=n@entry=30, m=<optimized out>, m@entry=30000) at /home/mdborg/mariadb-server-10.8/unittest/mysys/thr_template.c:46
              threads = 0xaaaae03e27a0
              i = 1
              now = 3568672611079524
      #6  0x0000aaaac8116da0 in do_tests () at /home/mdborg/mariadb-server-10.8/unittest/mysys/lf-t.c:188
      No locals.
      #7  0x0000aaaac81165ec in main (argc=<optimized out>, argv=<optimized out>) at /home/mdborg/mariadb-server-10.8/unittest/mysys/thr_template.c:67
      No locals.
      (gdb) p *lf_allocator
      Structure has no component named operator*.
      (gdb) p lf_allocator
      $1 = {pinbox = {pinarray = {level = {0xffff98000d10, 0x0, 0x0, 0x0}, size_of_element = 184}, free_func = 0xaaaac81193e4 <alloc_free>, free_func_arg = 0xaaaac84743f8 <lf_allocator>, 
          free_ptr_offset = 0, pinstack_top_ver = 2006515725, pins_in_array = 30}, top = 0xffff48004cc8 "x\033", element_size = 8, mallocs = 0, constructor = 0x0, destructor = 0x0}
      (gdb) p lf_hash
      $2 = {array = {level = {0x0, 0x0, 0x0, 0x0}, size_of_element = 8}, alloc = {pinbox = {pinarray = {level = {0x0, 0x0, 0x0, 0x0}, size_of_element = 184}, 
            free_func = 0xaaaac81193e4 <alloc_free>, free_func_arg = 0xaaaac8474358 <lf_hash+40>, free_ptr_offset = 8, pinstack_top_ver = 0, pins_in_array = 0}, top = 0x0, element_size = 36, 
          mallocs = 0, constructor = 0x0, destructor = 0x0}, get_key = 0x0, initializer = 0xaaaac8119c24 <default_initializer(LF_HASH*, void*, void const*)>, 
        hash_function = 0xaaaac8119bf0 <calc_hash(CHARSET_INFO*, uchar const*, size_t)>, charset = 0xaaaac83ff888 <my_charset_bin>, key_offset = 0, key_length = 4, element_size = 4, flags = 1, 
        size = 1, count = 0}
      

      mbeck, svoj, if you have a moment/interest, can you please check the implementation again.

          Activity

            xiaoniu Xiaotong Niu added a comment -

            Thanks Daniel, we have reproduced it and are investigating.

            xiaoniu Xiaotong Niu added a comment - - edited

            While debugging, we simulated time delays in the lf_alloc_new function and detected a new error. It shows up very quickly on Arm and does not occur on x86. Posted here for discussion.

            Code to simulate delays:

            diff --git a/mysys/lf_alloc-pin.c b/mysys/lf_alloc-pin.c
            index 6d80b381..101965c3 100644
            --- a/mysys/lf_alloc-pin.c
            +++ b/mysys/lf_alloc-pin.c
            @@ -501,6 +501,8 @@ void *lf_alloc_new(LF_PINS *pins)
                 do
                 {
                   node= allocator->top;
            +      static volatile int vvv;
            +      for (int i = 0; i < 33; ++i) ++vvv;
                   lf_pin(pins, 0, node);
                 } while (node != allocator->top && LF_BACKOFF());
                 if (!node)
            

            An error then occurred; the detailed gdb output is in report_gdb.txt.
            It contains the following key information. Please note that node = 0xffff00000000:

            Thread 59 (Thread 0xffff9affd1e0 (LWP 3589006) "lf-t"):
            #0  0x0000aaaaaaac7674 in lf_alloc_new (pins=pins@entry=0xfffff0002088) at /home/nxt/bugfix/lf_new_node_delay/mariadb-server/mysys/lf_alloc-pin.c:519
                    allocator = 0xaaaaaae2bcf8 <lf_allocator>
                    node = 0xffff00000000 <error: Cannot access memory at address 0xffff00000000>
            #1  0x0000aaaaaaac4818 in test_lf_alloc (arg=<optimized out>) at /home/nxt/bugfix/lf_new_node_delay/mariadb-server/unittest/mysys/lf-t.c:82
                    node1 = <optimized out>
                    node2 = <optimized out>
                    m = 15000
                    x = <optimized out>
                    y = 0
                    pins = 0xfffff0002088
            #2  0x0000fffff7f89624 in start_thread (arg=0xaaaaaaac47c8 <test_lf_alloc>) at pthread_create.c:477
                    ret = <optimized out>
                    pd = <optimized out>
                    unwind_buf = {cancel_jmp_buf = {{jmp_buf = {281473282201084, 281474976704464, 281474976704462, 281474842152960, 281474976704463, 187649984579528, 281473282201824, 281473282200032, 281474842157056, 281473282200032, 281473282197952, 8158329283288982986, 0, 8158329283742021910, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
                    not_first_call = <optimized out>
            #3  0x0000fffff7ee049c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
            No locals.
            
            

            Other information:

            MariaDB version: 10.4

            CMake options: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DPLUGIN_TOKUDB=NO -DPLUGIN_MROONGA=NO -DPLUGIN_SPIDER=YES -DPLUGIN_OQGRAPH=NO -DPLUGIN_PERFSCHEMA=YES -DPLUGIN_SPHINX=NO
             
            gcc-9 (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
            

            In addition, the lines at #L85 and #L89 in lf-t.c set the next pointer of the node returned by lf_alloc_new() to 0xffff00000000. Related code link:
            https://github.com/MariaDB/server/blob/ef7fc586aea1048bc5526192158a8e8e935ddd1a/unittest/mysys/lf-t.c#L85
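
One way a value like 0xffff00000000 can arise, assuming the node's link word shares storage with a 32-bit test value: storing 0 into the 32-bit member clears only the low half of a 64-bit pointer on a little-endian target. The sketch below illustrates that effect only. The union `node_word`, its members, the function `clobber_low_half`, and the sample address are hypothetical and are not copied from lf-t.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a node whose link word overlaps test data. */
union node_word
{
  uint64_t next;  /* stands in for a 64-bit "next" pointer */
  uint32_t data;  /* 32-bit payload sharing the same storage */
};

/* Store a pointer-like value, then overwrite the 32-bit member with 0. */
static uint64_t clobber_low_half(uint64_t pointer_bits)
{
  union node_word w;

  /* The illustration assumes the union is exactly one 64-bit word. */
  assert(sizeof(union node_word) == sizeof(uint64_t));

  w.next= pointer_bits;
  w.data= 0;      /* on little-endian, this zeroes only the low 32 bits */
  return w.next;  /* reading the other member is permitted in C */
}
```

On a little-endian 64-bit build, `clobber_low_half(0xffff34001518)` returns 0xffff00000000: the upper half of the stale pointer survives and the lower half is wiped.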

            xiaoniu Xiaotong Niu added a comment -

            A PR was submitted for this issue, link:
            https://github.com/MariaDB/server/pull/2804


            marko Marko Mäkelä added a comment -

            xiaoniu, thank you, great work. I think that we should also consider refactoring that code further, converting it to C++11 std::atomic, and replacing some excessive use of std::memory_order_seq_cst where possible.
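
As a C11 analogue of the explicit-atomics refactoring suggested above, the following sketch writes a pin-and-revalidate loop with `<stdatomic.h>` and spells out the memory order of each access. All names (`stack_top`, `pin_slot`, `pin_top`) are hypothetical. This is not the code in lf_alloc-pin.c or the change in the pull request, and the chosen orders are one conservative option rather than a verified minimum.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state: a stack top and one hazard-style pin slot. */
static _Atomic(void *) stack_top;
static _Atomic(void *) pin_slot;

/*
  Read the current top and publish it in the pin slot, retrying until the
  top is observed unchanged after the pin was stored.
*/
static void *pin_top(void)
{
  void *node;
  do
  {
    node= atomic_load_explicit(&stack_top, memory_order_acquire);
    /*
      The pin store must be ordered before the re-read of stack_top
      (a store-load ordering), so both accesses use seq_cst here.
    */
    atomic_store_explicit(&pin_slot, node, memory_order_seq_cst);
  } while (node != atomic_load_explicit(&stack_top, memory_order_seq_cst));
  return node;
}
```

The first load is only an acquire because the loop re-reads `stack_top` before trusting the value. Whether any of the seq_cst accesses could be weakened depends on what the reclaiming side does, which this sketch does not model.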

            marko Marko Mäkelä added a comment -

            serg, I just confirmed that the generated AMD64 code with the fix applied should be equivalent. This is in line with what https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html says: on typical Total Store Order implementations, there will be a penalty for stores, not loads.
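
To make the load/store asymmetry concrete, here is a small C11 sketch with one sequentially consistent load and one sequentially consistent store. The function names and the variable `flag` are hypothetical. The instruction sequences in the comments are the typical x86 mappings listed on the page linked above, and actual compiler output can differ.

```c
#include <stdatomic.h>

/* Hypothetical shared variable used only for this illustration. */
static atomic_int flag;

/* On x86 (TSO) a seq_cst load is typically a plain MOV. */
static int read_flag(void)
{
  return atomic_load_explicit(&flag, memory_order_seq_cst);
}

/* On x86 a seq_cst store is typically XCHG, or MOV followed by MFENCE. */
static void write_flag(int value)
{
  atomic_store_explicit(&flag, value, memory_order_seq_cst);
}
```

On AArch64, compilers typically emit LDAR for the load and STLR for the store, so strengthening a plain load there changes the generated code, while on AMD64 it usually does not.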

            People

              marko Marko Mäkelä
              danblack Daniel Black