[MDEV-28430] lf_alloc isn't safe on aarch64 (or ppc64le) Created: 2022-04-28  Updated: 2024-01-23

Status: In Review
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.1.48, 10.9.0
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Critical
Reporter: Daniel Black Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: ARMv8, aarch64, hash, lock-free, powerpc, synchronization
Environment:

Mostly aarch64, but a small number of ppc64le test failures too. Never on other architectures.


Attachments: Text File report_gdb.txt    
Issue Links:
Relates
relates to MDEV-12897 unit.lf failed in buildbot Closed
relates to MDEV-27088 Server crash on ARM (WMM architecture... Closed
relates to MDEV-31151 Inaccurate stack size caculation caus... Open

 Description   

Since 2020-08-24 unit.lf test frequently fails in buildbot on aarch64, and a few times on ppc64le.

This is occurring after the attempted fix in MDEV-27088. The unit.lf test now runs enough iterations to frequently catch the faulty implementation.

An example of a stalled test:

gdb of lf-t stalled on aarch64 from 10.9-43fa8e0b8f3bae1ff8493cfd3adb39443da6a809

(gdb) directory /source
Source directories searched: /source:$cdir:$cwd
(gdb) thread apply all bt -frame-arguments all full
 
Thread 2 (Thread 0xffff23fff120 (LWP 7523) "lf-t"):
#0  lf_pinbox_real_free (pins=0xffff980017d8) at /home/mdborg/mariadb-server-10.8/mysys/lf_alloc-pin.c:376
        a = 0xffff23ffe3d8
        b = 0xffff23ffe3f0
        c = 0xffff23ffe3e0
        cur = 0xffff34001518
        npins = 25
        list = 0xffff34001518
        addr = 0xffff23ffe390
        first = 0xfffff74f8aaf
        last = 0xffff34001518
        var = <optimized out>
        stack_ends_here = <optimized out>
        pinbox = 0xaaaac84743f8 <lf_allocator>
#1  0x0000aaaac811963c in lf_pinbox_free (pins=pins@entry=0xffff980017d8, addr=addr@entry=0xffff68001ea8) at /home/mdborg/mariadb-server-10.8/mysys/lf_alloc-pin.c:271
No locals.
#2  0x0000aaaac8116874 in test_lf_alloc (arg=<optimized out>) at /home/mdborg/mariadb-server-10.8/unittest/mysys/lf-t.c:90
        node1 = 0xffff68001ea8
        node2 = 0xffff28001db8
        m = 9409
        x = <optimized out>
        y = 0
        pins = 0xffff980017d8
#3  0x0000ffff9fdad5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = 0x0
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {281471285719328, 281474830928560, 281474830928558, 8448352, 281474830928559, 0, 281471277268992, 8448352, 281473365200928, 281471277268992, 281471285717056, 4514424196649599986, 0, 4514424198325567406, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#4  0x0000ffff9fe15d1c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
No locals.
 
Thread 1 (Thread 0xffff9ff25020 (LWP 7491) "lf-t"):
#0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=7523, futex_word=0xffff23fff1f0) at ./nptl/futex-internal.c:57
        _x3tmp = 0
        _x0tmp = 281471285719536
        _x0 = 281471285719536
        _x3 = 0
        _x4tmp = 0
        _x1tmp = 265
        _x1 = 265
        _x4 = 0
        _x5tmp = 4294967295
        _x2tmp = 7523
        _x2 = 7523
        _x5 = 4294967295
        _x8 = 98
        _sys_result = <optimized out>
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        _sys_result = <optimized out>
        _x5tmp = <optimized out>
        _x4tmp = <optimized out>
        _x3tmp = <optimized out>
        _x2tmp = <optimized out>
        _x1tmp = <optimized out>
        _x0tmp = <optimized out>
        _x0 = <optimized out>
        _x1 = <optimized out>
        _x2 = <optimized out>
        _x3 = <optimized out>
        _x4 = <optimized out>
        _x5 = <optimized out>
        _x8 = <optimized out>
#1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=7523, futex_word=0xffff23fff1f0) at ./nptl/futex-internal.c:87
        err = <optimized out>
        clockbit = 256
        op = 265
        err = <optimized out>
        clockbit = <optimized out>
        op = <optimized out>
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0xffff23fff1f0, expected=7523, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
No locals.
#3  0x0000ffff9fdaef2c in __pthread_clockjoin_ex (threadid=281471285719328, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at ./nptl/pthread_join_common.c:105
        ret = <optimized out>
        _buffer = {__routine = 0xffff9fdaedb0 <cleanup>, __arg = 0xffff23fff548, __canceltype = -1611481088, __prev = 0x0}
        tid = <optimized out>
        pd = 0xffff23fff120
        self = <optimized out>
        result = 0
        pd_result = <optimized out>
#4  0x0000ffff9fdaedb0 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at ./nptl/pthread_join.c:24
No locals.
#5  0x0000aaaac8116c54 in test_concurrently (test=test@entry=0xaaaac8157e10 "lf_alloc (with my_thread_init)", handler=handler@entry=0xaaaac8116800 <test_lf_alloc>, n=n@entry=30, m=<optimized out>, m@entry=30000) at /home/mdborg/mariadb-server-10.8/unittest/mysys/thr_template.c:46
        threads = 0xaaaae03e27a0
        i = 1
        now = 3568672611079524
#6  0x0000aaaac8116da0 in do_tests () at /home/mdborg/mariadb-server-10.8/unittest/mysys/lf-t.c:188
No locals.
#7  0x0000aaaac81165ec in main (argc=<optimized out>, argv=<optimized out>) at /home/mdborg/mariadb-server-10.8/unittest/mysys/thr_template.c:67
No locals.
(gdb) p *lf_allocator
Structure has no component named operator*.
(gdb) p lf_allocator
$1 = {pinbox = {pinarray = {level = {0xffff98000d10, 0x0, 0x0, 0x0}, size_of_element = 184}, free_func = 0xaaaac81193e4 <alloc_free>, free_func_arg = 0xaaaac84743f8 <lf_allocator>, 
    free_ptr_offset = 0, pinstack_top_ver = 2006515725, pins_in_array = 30}, top = 0xffff48004cc8 "x\033", element_size = 8, mallocs = 0, constructor = 0x0, destructor = 0x0}
(gdb) p lf_hash
$2 = {array = {level = {0x0, 0x0, 0x0, 0x0}, size_of_element = 8}, alloc = {pinbox = {pinarray = {level = {0x0, 0x0, 0x0, 0x0}, size_of_element = 184}, 
      free_func = 0xaaaac81193e4 <alloc_free>, free_func_arg = 0xaaaac8474358 <lf_hash+40>, free_ptr_offset = 8, pinstack_top_ver = 0, pins_in_array = 0}, top = 0x0, element_size = 36, 
    mallocs = 0, constructor = 0x0, destructor = 0x0}, get_key = 0x0, initializer = 0xaaaac8119c24 <default_initializer(LF_HASH*, void*, void const*)>, 
  hash_function = 0xaaaac8119bf0 <calc_hash(CHARSET_INFO*, uchar const*, size_t)>, charset = 0xaaaac83ff888 <my_charset_bin>, key_offset = 0, key_length = 4, element_size = 4, flags = 1, 
  size = 1, count = 0}

mbeck, svoj, if you have a moment/interest, could you please check the implementation again?



 Comments   
Comment by Daniel Black [ 2022-04-28 ]

The lf_hash unit tests appear to be fixed by MDEV-27088; it's after this point that lf_alloc has taken prominence as the sole failure, probably amplified by the increase in CYCLES from MDEV-27088. However, lf_alloc failures did occur before MDEV-27088 as well.

Comment by Sergey Vojtovich [ 2022-05-01 ]

danblack, did lf-t crash or just fail? A few lines from the log or a link to the failure would be useful.
UPD: sorry, missed "stalled".

Comment by Daniel Black [ 2022-05-02 ]

Fairly common across all build environments.

from 10.6, fedora (https://buildbot.mariadb.org/#/builders/304/builds/4237/steps/6/logs/stdio)

unit.lf                                  w44 [ fail ]
        Test ended at 2022-04-30 15:02:37
CURRENT_TEST: unit.lf
# N CPUs: 160
1..6
# Testing lf_pinbox (with my_thread_init) with 30 threads, 30000 iterations... 
ok 1 - tested lf_pinbox (with my_thread_init) in 0.276304 secs (0)
# Testing lf_alloc (with my_thread_init) with 30 threads, 30000 iterations... 
Bail out! Signal 11 thrown
# 6 tests planned,  0 failed,  1 was last executed
Bail out! Signal 11 thrown
# 6 tests planned,  0 failed,  1 was last executed
mysqltest failed with unexpected return code 255

Comment by Sergey Vojtovich [ 2022-05-05 ]

This part of coredump (lf_pinbox_real_free() frame) looks suspicious:

list = 0xffff34001518
cur = 0xffff34001518
last = 0xffff34001518

Note that all addresses are equal. last is the address of the element processed in the previous iteration, cur in the current iteration, and list in the next iteration. In other words, a dead loop, that is element->next == element.

Feels like the per-thread purgatory list is broken. On the other hand, I can't tell if this coredump can be trusted (e.g. if it comes from a heavily optimized binary). It is hard to dig further without access to a similar machine.
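To illustrate the condition described above, here is a minimal sketch (the pnode type and has_self_loop() helper are hypothetical, not the real lf_alloc structures) of a bounded scan that detects the element->next == element dead loop:

```c
#include <stddef.h>

/* Hypothetical, simplified node mirroring the purgatory-list traversal
   described above -- not the real lf_alloc code.  The stall is the
   degenerate case element->next == element: a scan never advances past
   such a node. */
struct pnode { struct pnode *next; };

/* Returns 1 if the list starting at head reaches a self-referencing
   element within max_steps, i.e. the dead-loop condition seen in the
   coredump (cur == list == last). */
int has_self_loop(struct pnode *head, int max_steps)
{
  struct pnode *cur;
  for (cur = head; cur != NULL && max_steps-- > 0; cur = cur->next)
    if (cur->next == cur)
      return 1;
  return 0;
}
```

In the real code no such guard exists, so a corrupted purgatory list spins forever, which matches the stalled lf_pinbox_real_free() frame.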

Comment by Daniel Black [ 2023-02-02 ]

rseq (restartable sequences) might be a viable alternative implementation: https://lwn.net/Articles/883104/

Comment by Marko Mäkelä [ 2023-04-28 ]

vlad.lesin reproduced an assertion failure on AMD64 that suggests a race condition between lf_hash_search() and some other operations:

#7  0x00007ff134c8dfd6 in __assert_fail () from /dev/shm/core-rw_trx_hash_t/libs/libc.so.6
#8  0x00005613abefad92 in rw_trx_hash_t::find (this=<optimized out>, caller_trx=caller_trx@entry=0x7ff1344c5c50, trx_id=<optimized out>, do_ref_count=do_ref_count@entry=true) at /home/vlesin/work/git/10.5-enterprise/storage/innobase/include/trx0sys.h:646
#9  0x00005613abef00cd in trx_sys_t::find (do_ref_count=true, id=<optimized out>, caller_trx=0x7ff1344c5c50, this=<optimized out>) at /home/vlesin/work/git/10.5-enterprise/storage/innobase/include/trx0sys.h:1110
#10 lock_rec_convert_impl_to_expl (caller_trx=caller_trx@entry=0x7ff1344c5c50, block=block@entry=0x7ff129419e60, rec=rec@entry=0x7ff129b0007e "\200", index=index@entry=0x7fef8402b480, offsets=offsets@entry=0x7ff0dc2658c0) at /home/vlesin/work/git/10.5-enterprise/storage/innobase/lock/lock0lock.cc:5546

In the record, the DB_TRX_ID is 0x6b684 (439940). In the trx_sys.rw_trx_hash there are 3 transactions, none with that identifier. One of them was found by the crashing thread:

646	        DBUG_ASSERT(trx_id == trx->id);
(gdb) i lo
__PRETTY_FUNCTION__ = "trx_t* rw_trx_hash_t::find(trx_t*, trx_id_t, bool)"
trx = 0x7ff1344bada0
pins = 0x5613aef81d68
element = 0x7fefc8016a78
(gdb) p *element
$19 = {id = 439951, no = {m_counter = {<std::__atomic_base<unsigned long>> = {_M_i = 439952}, <No data fields>}}, trx = 0x7ff1344bada0, mutex = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {_M_i = 2}, <No data fields>}, m_event = 0x7fefc8016b30, m_policy = {context = {<latch_t> = {
            _vptr.latch_t = 0x5613ac9ebf60 <vtable for MutexDebug<TTASEventMutex<GenericPolicy> >+16>, m_id = LATCH_ID_RW_TRX_HASH_ELEMENT, m_rw_lock = false}, m_mutex = 0x7fefc8016a90, m_filename = 0x5613ac566068 "/home/vlesin/work/git/10.5-enterprise/storage/innobase/include/trx0sys.h", m_line = 643, m_thread_id = 140672462386944, m_debug_mutex = {m_freed = false, 
            m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}}, m_filename = 0x5613ac566068 "/home/vlesin/work/git/10.5-enterprise/storage/innobase/include/trx0sys.h", m_line = 349, m_count = {m_spins = 0, m_waits = 0, 
          m_calls = 0, m_enabled = false}, m_id = LATCH_ID_RW_TRX_HASH_ELEMENT}}}}

This transaction is being committed in another thread:

#6  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x7fefc8016a90, n_spins=30, n_delay=4, name=0x5613ac566068 "/home/vlesin/work/git/10.5-enterprise/storage/innobase/include/trx0sys.h", line=701) at /home/vlesin/work/git/10.5-enterprise/storage/innobase/include/ib0mutex.h:590
#7  0x00005613ac0873f5 in rw_trx_hash_t::erase (trx=0x7ff1344bada0, this=<optimized out>) at /home/vlesin/work/git/10.5-enterprise/storage/innobase/include/trx0sys.h:701
#8  trx_sys_t::deregister_rw (trx=0x7ff1344bada0, this=<optimized out>) at /home/vlesin/work/git/10.5-enterprise/storage/innobase/include/trx0sys.h:1098
#9  trx_t::commit_in_memory (this=this@entry=0x7ff1344bada0, mtr=mtr@entry=0x7ff0dc472460) at /home/vlesin/work/git/10.5-enterprise/storage/innobase/trx/trx0trx.cc:1321

This thread is waiting for the element->mutex that the crashing thread is holding.

Note: The transaction ID is 439951, just like the element->id. It looks like the transaction 439940 had been committed and the memory reused for a new transaction 439951 while lf_hash_search() was in progress. Is the element returned by lf_hash_search() supposed to be validated by the caller again? If yes, what is the purpose of element->id?

I believe that the impact of this bug could be ACID violation or weird locking anomalies. If we assume that the debug assertion does not exist, the returned transaction object would have been passed to lock_rec_convert_impl_to_expl_for_trx(). It only checks that the transaction has not been committed. If the unrelated transaction has not been committed, it would grant an exclusive lock to that transaction. This would unnecessarily block the lock-waiting transaction until the unrelated transaction is committed or rolled back.

Comment by Marko Mäkelä [ 2023-04-28 ]

For an Oracle-internal 8.0.0 release of MySQL, the implementation was rewritten in C++11 std::atomic (using the implicit std::memory_order_seq_cst) in 2015. There have been some changes to the code since then, including replacing more volatile with std::atomic, and adding a cmp_func parameter.

I admit that I do not understand the purpose of the pins. I wonder if in this function there is a narrow opportunity for race conditions:

static LF_SLIST *l_search(LF_SLIST **head, CHARSET_INFO *cs,
                         uint32 hashnr, const uchar *key, uint keylen,
                         LF_PINS *pins)
{
  CURSOR cursor;
  int res= l_find(head, cs, hashnr, key, keylen, &cursor, pins, 0);
  if (res)
    lf_pin(pins, 2, cursor.curr);
  else
    lf_unpin(pins, 2);
  lf_unpin(pins, 1);
  lf_unpin(pins, 0);
  return res ? cursor.curr : 0;
}

res would only hold if r==0 in l_find():

      else if (cur_hashnr >= hashnr)
      {
        int r= 1;
        if (cur_hashnr > hashnr ||
            (r= my_strnncoll(cs, cur_key, cur_keylen, key, keylen)) >= 0)
          return !r;
      }

Without understanding the logic, I wonder if the lf_pin(pins, 2, …) should be executed in l_find() before comparing anything. Possibly, adding a sleep right after the l_find() call in l_search() would allow the anomaly to be reproduced more easily.

Comment by Vladislav Lesin [ 2023-05-03 ]

> I admit that I do not understand the purpose of the pins.

As I understand it, the purpose is to prevent memory deallocation while it's in use. I.e. a node in the lf_hash linked list can be deleted, but its content, including the pointer to the next element, will still be valid until it's unpinned.

marko, when the comparison is happening, cursor->curr is already pinned to pin 1 with the lf_pin(pins, 1, cursor->curr) call, either at the beginning of the "retry" label or at the end of the "for (;;)" loop in l_find().

LF_SLIST::key points to a memory object allocated with the pinbox allocator; see the rw_trx_hash_t::init()->lf_hash_init()->lf_alloc_init()->lf_pinbox_init() call stack and the LF_PINBOX::free_func_arg variable. That is, lf_alloc_new() in lf_hash_insert() allocates the memory for storing LF_SLIST+rw_trx_hash_element_t in one chunk (see also the lf_alloc_new() call from rw_trx_hash_t::insert()->lf_hash_insert()), and LF_SLIST::key points to rw_trx_hash_element_t::id, which follows that LF_SLIST object.

In other words, the memory which LF_SLIST::key points to is pinned to pin 1, and it's safe to compare it with the local trx_id (at least if the pinbox allocator works as intended).
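A minimal sketch of the idea, with illustrative names (pin_addr, safe_to_free and the fixed pin_slots array are not the real lf_pinbox API): a freer may only release memory once no pin slot holds its address, which is what keeps a pinned node's key and next pointer readable:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hedged sketch of what the pins buy.  Memory may only be handed back
   to the allocator once no thread has the address published in a pin
   slot; until then the node's content -- including LF_SLIST::key and the
   next pointer -- stays valid for the readers that pinned it. */
enum { N_PIN_SLOTS = 4 };
static _Atomic(void *) pin_slots[N_PIN_SLOTS];

void pin_addr(int slot, void *addr) { atomic_store(&pin_slots[slot], addr); }
void unpin_addr(int slot)           { atomic_store(&pin_slots[slot], NULL); }

/* The freer's side: 1 if addr may be freed, 0 if it must stay in the
   per-thread purgatory list because somebody still pins it. */
int safe_to_free(void *addr)
{
  for (int i = 0; i < N_PIN_SLOTS; i++)
    if (atomic_load(&pin_slots[i]) == addr)
      return 0;
  return 1;
}
```

The real lf_pinbox_real_free() does this scan over all threads' pins, deferring the free of any address that is still pinned.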

Comment by Vladislav Lesin [ 2023-05-03 ]

One more addition: l_search() can return a deleted element (the element can be "deleted" after the !DELETED(link) condition was checked in l_find(), but, as the current and the next elements of the list are pinned, their content stays valid). rw_trx_hash_t::find() checks element->trx, which is zeroed out by rw_trx_hash_t::erase() before the element is deleted from the hash. If the element was deleted with rw_trx_hash_t::erase(), element->trx would be zeroed out, and the "if ((trx= element->trx))" branch with the assertion would not be executed.

This is the answer to Marko's question "Is the element returned by lf_hash_search() supposed to be validated by the caller again?": it is validated by examining element->trx in rw_trx_hash_t::find().

> The transaction ID is 439951, just like the element->id. It looks like the transaction 439940 had been committed and the memory reused for a new transaction 439951 while lf_hash_search() was in progress.

If the transaction had been committed, rw_trx_hash_t::erase() would have been invoked for it, and element->trx would be zeroed out.

One more note: element->trx is protected by element->mutex, which is acquired from both rw_trx_hash_t::erase() and rw_trx_hash_t::find().

Comment by Vladislav Lesin [ 2023-05-03 ]

The only assumption I have at the moment is that the LF_SLIST+rw_trx_hash_element_t chunk was reused while it was pinned.

Take a look at rw_trx_hash_t::find():

    rw_trx_hash_element_t *element= reinterpret_cast<rw_trx_hash_element_t*>
      (lf_hash_search(&hash, pins, reinterpret_cast<const void*>(&trx_id),
                      sizeof(trx_id_t)));
    if (element)
    {
      mutex_enter(&element->mutex);
      lf_hash_search_unpin(pins);
      if ((trx= element->trx))
      {
        DBUG_ASSERT(trx_id == trx->id);
        ...
      }
      mutex_exit(&element->mutex);
    }

It acquires element->mutex, then unpins the search pins. After that the element can be deallocated and reused by some other thread.

If we look at the rw_trx_hash_t::insert()->lf_hash_insert()->lf_alloc_new() calls, we will not find any element->mutex acquisition there, as the mutex is not yet initialized at allocation time. My assumption is that rw_trx_hash_t::insert() can easily reuse the chunk that was unpinned in rw_trx_hash_t::find().

The scenario is the following:

1. Thread 1 has just executed lf_hash_search() in rw_trx_hash_t::find(), but has not acquired element->mutex yet.

2. Thread 2 removed the element from the hash table with a rw_trx_hash_t::erase() call.

3. Thread 1 acquired element->mutex and unpinned the element with the lf_hash_search_unpin(pins) call.

4. Some thread purged the memory of the element.

5. Thread 3 reused the memory for a new element, filling in element->id and element->trx.

6. Thread 1 crashes with the failed "DBUG_ASSERT(trx_id == trx->id)" assertion.

If so, then the bug is not related to the allocator issue described in this ticket. The fix is to invoke "lf_hash_search_unpin(pins);" after the "mutex_exit(&element->mutex);" call in rw_trx_hash_t::find().
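A hedged sketch of that ordering (find_trx, hash_element and the pin_slot parameter are simplified stand-ins for the InnoDB code, not the actual fix): keep the element pinned until after the mutex is released, so the chunk cannot be purged and reused while element->trx is read.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative stand-in for rw_trx_hash_element_t. */
struct hash_element { pthread_mutex_t mutex; void *trx; };

void *find_trx(struct hash_element *e, _Atomic(void *) *pin_slot)
{
  void *trx;
  /* e is assumed to be already pinned in *pin_slot by the hash search */
  pthread_mutex_lock(&e->mutex);
  trx = e->trx;                     /* NULL if erase() already ran */
  pthread_mutex_unlock(&e->mutex);
  atomic_store(pin_slot, NULL);     /* unpin only AFTER mutex_exit() */
  return trx;
}
```

With the unpin moved last, any concurrent freer sees the element still pinned for the whole critical section and keeps it in the purgatory list.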

Comment by Vladislav Lesin [ 2023-05-03 ]

The above scenario is indirectly confirmed by the following trick: if we add one my_sleep(1) before the mutex_enter(&element->mutex) call in rw_trx_hash_t::find(), and another my_sleep(1) after the lf_hash_search_unpin(pins) call, then the assertion failure is reproduced much faster with the initial test case that caused it.

Comment by Vladislav Lesin [ 2023-05-04 ]

I filed a separate bug for it: MDEV-31185.

Comment by Nikita Malyavin [ 2023-06-26 ]

Hi all,
after MDEV-31185 has been closed, I see no news on this one. Does it still reproduce?

I'll note that during the earlier discussions on the latter, my finding was the following:

According to the description, the failure happens on this line:

if (cur == *c)

Here only c is dereferenced. It depends on npins, and all I see is that npins is assigned like this:

npins= pinbox->pins_in_array+1;

however, the pinbox->pins_in_array fetch is not protected by any atomic access.

I assume that on x86 this has less chance of causing a problem, but it can be critical on ARM.
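A sketch of what an explicit atomic fetch could look like, assuming a simplified pinbox_sketch stand-in for LF_PINBOX (the field name follows the source; everything else is illustrative):

```c
#include <stdatomic.h>

/* pins_in_array is updated by threads grabbing a new pin and read by
   lf_pinbox_real_free() on another CPU, so on a weakly ordered machine
   the read should be an explicit atomic load rather than a plain fetch.
   Minimal stand-in for LF_PINBOX, not the real definition. */
typedef struct { _Atomic int pins_in_array; } pinbox_sketch;

/* Equivalent of npins= pinbox->pins_in_array+1 with explicit ordering. */
int scan_npins(pinbox_sketch *pinbox)
{
  return atomic_load_explicit(&pinbox->pins_in_array,
                              memory_order_acquire) + 1;
}
```

An acquire load also orders the subsequent reads of the pin slots themselves after the count fetch, which a plain load does not guarantee on ARM.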

Comment by Daniel Black [ 2023-10-09 ]

Still occurring regularly in buildbot.

Comment by Xiaotong Niu [ 2023-10-18 ]

Hi Daniel, could you please provide information about the ARM platform where the error occurred? This may help us reproduce the error, thank you.

Comment by Daniel Black [ 2023-10-18 ]

Thanks xiaoniu, is this sufficient?

Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          160
On-line CPU(s) list:             0-159
Thread(s) per core:              1
Core(s) per socket:              80
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       ARM
Model:                           1
Model name:                      Neoverse-N1
Stepping:                        r3p1
BogoMIPS:                        50.00
L1d cache:                       10 MiB
L1i cache:                       10 MiB
L2 cache:                        160 MiB
NUMA node0 CPU(s):               0-79
NUMA node1 CPU(s):               80-159
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; CSV2, BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

Comment by Xiaotong Niu [ 2023-10-19 ]

Thanks Daniel, we have reproduced it, and we are investigating this issue.

Comment by Xiaotong Niu [ 2023-10-19 ]

During the debugging process, we simulated time delays in the lf_alloc_new function, and a new error was detected. This error can be reproduced very quickly on ARM, and there is no such problem on x86. Posted here for discussion.

Code to simulate delays:

diff --git a/mysys/lf_alloc-pin.c b/mysys/lf_alloc-pin.c
index 6d80b381..101965c3 100644
--- a/mysys/lf_alloc-pin.c
+++ b/mysys/lf_alloc-pin.c
@@ -501,6 +501,8 @@ void *lf_alloc_new(LF_PINS *pins)
     do
     {
       node= allocator->top;
+      static volatile int vvv;
+      for (int i = 0; i < 33; ++i) ++vvv;
      lf_pin(pins, 0, node);
     } while (node != allocator->top && LF_BACKOFF());
     if (!node)

Then an error occurred; the detailed gdb information is in report_gdb.txt.
It contains the following key information; please note that node = 0xffff00000000:

Thread 59 (Thread 0xffff9affd1e0 (LWP 3589006) "lf-t"):
#0  0x0000aaaaaaac7674 in lf_alloc_new (pins=pins@entry=0xfffff0002088) at /home/nxt/bugfix/lf_new_node_delay/mariadb-server/mysys/lf_alloc-pin.c:519
        allocator = 0xaaaaaae2bcf8 <lf_allocator>
        node = 0xffff00000000 <error: Cannot access memory at address 0xffff00000000>
#1  0x0000aaaaaaac4818 in test_lf_alloc (arg=<optimized out>) at /home/nxt/bugfix/lf_new_node_delay/mariadb-server/unittest/mysys/lf-t.c:82
        node1 = <optimized out>
        node2 = <optimized out>
        m = 15000
        x = <optimized out>
        y = 0
        pins = 0xfffff0002088
#2  0x0000fffff7f89624 in start_thread (arg=0xaaaaaaac47c8 <test_lf_alloc>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {281473282201084, 281474976704464, 281474976704462, 281474842152960, 281474976704463, 187649984579528, 281473282201824, 281473282200032, 281474842157056, 281473282200032, 281473282197952, 8158329283288982986, 0, 8158329283742021910, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x0000fffff7ee049c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
No locals.

other information:

MariaDB version: 10.4

CMake options: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DPLUGIN_TOKUDB=NO -DPLUGIN_MROONGA=NO -DPLUGIN_SPIDER=YES -DPLUGIN_OQGRAPH=NO -DPLUGIN_PERFSCHEMA=YES -DPLUGIN_SPHINX=NO

gcc-9 (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

In addition, #L85 and #L89 in lf-t.c set the next pointer of the node returned by alloc_new() to 0xffff00000000; related code link:
https://github.com/MariaDB/server/blob/ef7fc586aea1048bc5526192158a8e8e935ddd1a/unittest/mysys/lf-t.c#L85
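For reference, a hedged sketch of the pin loop with explicit C11 ordering (pin_top, allocator_sketch and free_node are illustrative names; the actual change is in the PR and is not reproduced here):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Once `node != top` stops holding, `node` may have been popped, freed
   and reused; on a weakly ordered CPU the loads of `top` therefore need
   acquire semantics so the later dereference of node->next cannot be
   satisfied by a stale or speculated value such as 0xffff00000000. */
struct free_node { struct free_node *next; };
typedef struct { _Atomic(struct free_node *) top; } allocator_sketch;

struct free_node *pin_top(allocator_sketch *a,
                          _Atomic(struct free_node *) *pin_slot)
{
  struct free_node *node;
  do
  {
    node = atomic_load_explicit(&a->top, memory_order_acquire);
    atomic_store(pin_slot, node);        /* publish the pin first ... */
  } while (node != atomic_load_explicit(&a->top, memory_order_acquire));
  return node;  /* ... then it is safe to read node->next before the CAS */
}
```

The injected delay above widens exactly the window between the first load and the pin publication, which is why the error reproduces so quickly on ARM.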

Comment by Xiaotong Niu [ 2023-10-27 ]

A PR was submitted for this issue, link:
https://github.com/MariaDB/server/pull/2804

Comment by Marko Mäkelä [ 2023-10-27 ]

xiaoniu, thank you, great work. I think that we should also consider refactoring that code further, converting it to C++11 std::atomic and replacing some excessive use of std::memory_order_seq_cst where possible.

Generated at Thu Feb 08 10:00:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.