[MDEV-23510] Crash in my_strnncoll_binary while running point-select Created: 2020-08-19  Updated: 2021-11-30  Resolved: 2021-02-25

Status: Closed
Project: MariaDB Server
Component/s: Locking, Server, Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6
Fix Version/s: 10.2.38, 10.3.29, 10.4.19, 10.5.10

Type: Bug Priority: Critical
Reporter: Krunal Bauskar Assignee: Daniel Black
Resolution: Fixed Votes: 0
Labels: race
Environment:

MDBonARM, arm64


Issue Links:
PartOf
is part of MDEV-27088 Server crash on ARM (WMM architecture... Closed
Relates
relates to MDEV-6089 MySQL WL#7305 "Improve MDL scalabilit... Closed
relates to MDEV-14638 Replace trx_sys_t::rw_trx_set with LF... Closed

 Description   
  • Running a CPU bound workload (point-select) with increase scalability from 1-1024 (in multiple of 2).
  • Hit the said issue at 32 threads but it is random could hit is even at different scalability.
  • Issues seems to suggest something wrong with memcmp (less likely memory violation more likely alignment issue).
  • Observed with MDB-10.5-trunk (10.5.6) on ARM only. Not observed with x86 yet. If present may need different test-case.
    Existing bug in this area MDEV-20619

----------------

(gdb) bt
#0  0x0000ffffbe077f50 in memcmp () from /lib64/libc.so.6
#1  0x0000aaaaab82747c in my_strnncoll_binary (cs=<optimized out>, s=<optimized out>, slen=14, t=<optimized out>, tlen=14, t_is_prefix=0 '\000')
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/strings/ctype-bin.c:87
#2  0x0000aaaaab817898 in l_find (head=0xffe998006cf8, head@entry=0xaaab058aec20, cs=0xaaaaac02d3f0 <my_charset_bin>, hashnr=<optimized out>, 
    key=key@entry=0xffe924b3c668 "\002arm", keylen=keylen@entry=14, cursor=0xffeafc4c55d8, cursor@entry=0xffeafc4c5618, pins=pins@entry=0xaaab058a3c00, 
    callback=callback@entry=0x0) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/mysys/lf_hash.c:132
#3  0x0000aaaaab818240 in l_search (pins=0xaaab058a3c00, keylen=14, key=0xffe924b3c668 "\002arm", hashnr=<optimized out>, cs=<optimized out>, 
    head=0xaaab058aec20) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/mysys/lf_hash.c:286
#4  lf_hash_search_using_hash_value (hash=hash@entry=0xaaaaac0b9010 <mdl_locks>, pins=pins@entry=0xaaab058a3c00, hashnr=<optimized out>, 
    key=key@entry=0xffe924b3c668, keylen=14) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/mysys/lf_hash.c:488
#5  0x0000aaaaab8183b4 in lf_hash_search (hash=hash@entry=0xaaaaac0b9010 <mdl_locks>, pins=pins@entry=0xaaab058a3c00, key=key@entry=0xffe924b3c668, 
    keylen=<optimized out>) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/mysys/lf_hash.c:528
#6  0x0000aaaaab24acb4 in MDL_map::find_or_insert (this=0xaaaaac0b9010 <mdl_locks>, pins=0xaaab058a3c00, mdl_key=0xffe924b3c660)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/mdl.cc:825
#7  0x0000aaaaab24c1e0 in MDL_context::try_acquire_lock_impl (this=this@entry=0xffe9240009f8, mdl_request=mdl_request@entry=0xffe924b3c640, 
    out_ticket=0xffeafc4c5768, out_ticket@entry=0xffeafc4c57e8) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/mdl.cc:2076
#8  0x0000aaaaab24c860 in MDL_context::acquire_lock (this=this@entry=0xffe9240009f8, mdl_request=mdl_request@entry=0xffe924b3c640, 
    lock_wait_timeout=86400) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/mdl.cc:2251
#9  0x0000aaaaab122f10 in open_table_get_mdl_lock (thd=thd@entry=0xffe9240008d8, ot_ctx=ot_ctx@entry=0xffeafc4c6118, 
    mdl_request=mdl_request@entry=0xffe924b3c640, flags=flags@entry=0, mdl_ticket=0xffeafc4c5ae0, mdl_ticket@entry=0xffeafc4c5b60)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_base.cc:1542
#10 0x0000aaaaab126640 in open_table (thd=thd@entry=0xffe9240008d8, table_list=table_list@entry=0xffe924b3c1f8, ot_ctx=ot_ctx@entry=0xffeafc4c6118)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_base.cc:1803
#11 0x0000aaaaab1291bc in open_and_process_table (ot_ctx=0xffeafc4c6118, has_prelocking_list=24, prelocking_strategy=0xffeafc4c6218, flags=65514, 
    counter=0xffeafc4c619c, tables=0xffe924b3c1f8, thd=0xffe9240008d8)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_base.cc:3784
#12 open_tables (thd=thd@entry=0xffe9240008d8, options=..., start=0xffeafc4c6188, start@entry=0xffeafc4c61a8, counter=0xffeafc4c619c, 
    counter@entry=0xffeafc4c61bc, flags=65514, flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0xffeafc4c6218)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_base.cc:4256
#13 0x0000aaaaab12995c in open_and_lock_tables (thd=thd@entry=0xffe9240008d8, options=..., tables=<optimized out>, tables@entry=0xffe924b3c1f8, 
    derived=derived@entry=true, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0xffeafc4c6218)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_base.cc:5160
#14 0x0000aaaaab17d1c8 in open_and_lock_tables (flags=0, derived=true, tables=0xffe924b3c1f8, thd=0xffe9240008d8)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_base.h:509
#15 execute_sqlcom_select (thd=thd@entry=0xffe9240008d8, all_tables=0xffe924b3c1f8)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_parse.cc:6131
#16 0x0000aaaaab17a3a8 in mysql_execute_command (thd=0xffe9240008d8) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_parse.cc:3932
#17 0x0000aaaaab1910c0 in Prepared_statement::execute (this=this@entry=0xffe924c04908, expanded_query=expanded_query@entry=0xffeafc4c6db0, 
    open_cursor=open_cursor@entry=false) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_prepare.cc:4736
---Type <return> to continue, or q <return> to quit---
#18 0x0000aaaaab1911bc in Prepared_statement::execute_loop (this=0xffe924c04908, expanded_query=0xffeafc4c6db0, open_cursor=false, 
    packet=<optimized out>, packet_end=<optimized out>) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_prepare.cc:4225
#19 0x0000aaaaab191a28 in mysql_stmt_execute_common (thd=thd@entry=0xffe9240008d8, stmt_id=42, packet=packet@entry=0xffe924c144e2 "", 
    packet_end=0xffe924c144ec "", packet_end@entry=0xffe924c144d9 "*", cursor_flags=<optimized out>, bulk_op=bulk_op@entry=false, 
    read_types=read_types@entry=false) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_prepare.cc:3226
#20 0x0000aaaaab191ab4 in mysqld_stmt_execute (thd=thd@entry=0xffe9240008d8, packet_arg=packet_arg@entry=0xffe924c144d9 "*", 
    packet_length=packet_length@entry=0) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_prepare.cc:3121
#21 0x0000aaaaab177558 in dispatch_command (command=command@entry=COM_STMT_EXECUTE, thd=thd@entry=0xffe9240008d8, 
    packet=packet@entry=0xffe924c144d9 "*", packet_length=0, packet_length@entry=19, is_com_multi=is_com_multi@entry=false, 
    is_next_command=is_next_command@entry=false) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_parse.cc:1791
#22 0x0000aaaaab176914 in do_command (thd=0xffe9240008d8) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_parse.cc:1348
#23 0x0000aaaaab244870 in do_handle_one_connection (connect=<optimized out>, connect@entry=0xaaab059451b8, put_in_cache=put_in_cache@entry=true)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_connect.cc:1410
#24 0x0000aaaaab244c70 in handle_one_connection (arg=arg@entry=0xaaab059451b8)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/sql/sql_connect.cc:1312
#25 0x0000aaaaab5337e0 in pfs_spawn_thread (arg=0xaaab05876d48)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/storage/perfschema/pfs.cc:2201
#26 0x0000ffffbe767d38 in start_thread () from /lib64/libpthread.so.0
#27 0x0000ffffbe0cf5f0 in thread_start () from /lib64/libc.so.6



 Comments   
Comment by Krunal Bauskar [ 2020-09-04 ]
  • some more analysis suggest it to be race with lf-hash.
  • Code flow shows deleted = false so execution took the said path but when it crashed core-dump state shows said cursor is marked deleted (Deleted bit set in cursor->link). Not sure what is causing the race but the issue happens with the ARM server at high scalability workloads (observed mostly with > 512 threads for 64 core cpu).
Comment by Elena Stepanova [ 2020-10-18 ]

In addition to MDEV-20619, any of MDEV-18900, MDEV-16699, MDEV-23895 can be related.
Without a test case we will have to assume that this one is a duplicate of one of them.

Is it a production server? If not, could you maybe get a stack trace from an instrumented build, e.g. with ASAN-debug? It could help to match it more precisely with other issues.

Comment by Krunal Bauskar [ 2021-01-13 ]

The said issue is now reproduced easily using unit.lf (unit-test meant for lf-hash) on ARM server. Disabling the single handler should help us get the core-dump. Reason looks quite same as the one posted earlier. May be expert in lf-hash can take a look or suggest what else could be looked into.
(I am running unit.lf N times (N=100) on 24 cores machine with default 30 threads and 3000 cycles .. seems like changing it on the said machine reduces reproducibility).

(gdb) bt
#0  memcmp () at ../sysdeps/aarch64/memcmp.S:134
#1  0x0000aaaaaaadb8dc in my_strnncoll_binary (cs=<optimized out>, s=<optimized out>, slen=4, t=<optimized out>, tlen=4, t_is_prefix=0 '\000')
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/strings/ctype-bin.c:87
#2  0x0000aaaaaaad1618 in l_find (head=0xffff88005b18, head@entry=0xfffff0004c20, cs=cs@entry=0xaaaaaada6840 <my_charset_bin>, 
    hashnr=hashnr@entry=1707758225, key=key@entry=0xffffb9ffa9e8 "\001`\332vM\245&\230", keylen=keylen@entry=4, cursor=cursor@entry=0xffffb9ffa958, 
    pins=pins@entry=0xffff60000d38, callback=callback@entry=0x0) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/mysys/lf_hash.c:132
#3  0x0000aaaaaaad1d8c in l_delete (pins=0xffff60000d38, keylen=<optimized out>, key=0xffffb9ffa9e8 "\001`\332vM\245&\230", hashnr=1707758225, 
    cs=0xaaaaaada6840 <my_charset_bin>, head=0xfffff0004c20) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/mysys/lf_hash.c:231
#4  lf_hash_delete (hash=0xaaaaaae20d38 <lf_hash>, pins=0xffff60000d38, key=0xffffb9ffa9e8, keylen=<optimized out>)
    at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/mysys/lf_hash.c:454
#5  0x0000aaaaaaace480 in test_lf_hash (arg=<optimized out>) at /opt/projects/mysql/105/non-forked-mdb/mdb/codebase/server/unittest/mysys/lf-t.c:155
#6  0x0000fffff7fab088 in start_thread (arg=0xffffffffb59f) at pthread_create.c:463
#7  0x0000fffff7f1affc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Comment by Marko Mäkelä [ 2021-01-13 ]

We also have the unit.lf test failing on buildbot.mariadb.org on aarch64-ubuntu-1804:

10.6 92abdcca5a5324f0727112ab2417d10c7a8d5094 + some InnoDB changes

CURRENT_TEST: unit.lf
# N CPUs: 8
1..6
# Testing lf_pinbox (with my_thread_init) with 30 threads, 3000 iterations...
ok 1 - tested lf_pinbox (with my_thread_init) in 0.0110765 secs (0)
# Testing lf_alloc (with my_thread_init) with 30 threads, 3000 iterations...
# 0 mallocs, 9 pins in stack
ok 2 - tested lf_alloc (with my_thread_init) in 0.0144825 secs (0)
# Testing lf_hash (with my_thread_init) with 30 threads, 3000 iterations...
# 0 mallocs, 10 pins in stack, 2048 hash size, 20848 inserts, 465 scans
ok 3 - tested lf_hash (with my_thread_init) in 0.106343 secs (0)
# Testing lf_pinbox (without my_thread_init) with 30 threads, 3000 iterations...
ok 4 - tested lf_pinbox (without my_thread_init) in 0.00633498 secs (0)
# Testing lf_alloc (without my_thread_init) with 30 threads, 3000 iterations...
# 0 mallocs, 9 pins in stack
ok 5 - tested lf_alloc (without my_thread_init) in 0.0531876 secs (0)
# Testing lf_hash (without my_thread_init) with 30 threads, 3000 iterations...
Bail out! Signal 11 thrown
# 6 tests planned,  0 failed,  5 was last executed

Comment by Marko Mäkelä [ 2021-01-20 ]

wlad, the reported stack trace occurs in MDL and you are the maintainer of that area. Apart from MDL (since MDEV-6089), also InnoDB trx_sys.rw_trx_hash is using the lf_hash (ever since MDEV-14638).

I think that we should consider replacing the lf_hash implementation with some regularly maintained library, possibly based on C++ std::atomic.

Comment by Sergei Golubchik [ 2021-01-20 ]

might be a lf-hash issue on arm?

Comment by Elena Stepanova [ 2021-01-20 ]

Taking into account (at least) the stack trace in MDEV-23895, 4th from the top, would you still think it's an arm-specific issue?

Comment by Vladislav Vaintroub [ 2021-01-20 ]

The pure unit test, not poisoned by any Item_func fails, so the problem is in lf_hash and multithreading, and ARM.
The lock-free did not have a lot of coverage on weak memory models before, given prevalence of Intel processors. And lock-free is tricky, s. https://randomascii.wordpress.com/2020/11/29/arm-and-lock-free-programming/

Comment by Vladislav Vaintroub [ 2021-01-20 ]

As for concurrent "not invented here" libraries implementing concurrent hash maps, there is something I just googled up
https://github.com/preshing/junction

The author seems to have a good understanding of concurrency and CPU models etc.

Comment by Sergei Golubchik [ 2021-01-25 ]

Doesn't fail on buildbot.askmonty.org. Using aarch64-ubuntu-1804 VM on an aarch64 ubuntu host.

I need some way to repeat the issue.
krunalbauskar, any suggestions?

Comment by Krunal Bauskar [ 2021-01-27 ]

Sergei,

Reproducing failure has been challenging. Fortunately, recently unit-test started failing for me on my 24 cores vm when run repeatedly.

I tried to reduce it by lowering the scalability but then test-case started to pass continuously. Maybe you can also try to play around with it to see if some configuration can help reproduce the crash with the said VM.

#define CYCLES 3000
#define THREADS 30

Comment by Marko Mäkelä [ 2021-02-12 ]

Another SIGSEGV on aarch64-rhel-8:

CURRENT_TEST: innodb.innodb-autoinc-61209
mysqltest: At line 43: query 'DROP TABLE bug_61209' failed: 2013: Lost connection to MySQL server during query
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffff85f6066c]
:0(bcmp)[0xffff8543e3c8]
strings/ctype-bin.c:88(my_strnncoll_binary)[0xaaaae6a747e4]
mysys/lf_hash.c:132(l_find)[0xaaaae6a65174]
mysys/lf_hash.c:287(l_search)[0xaaaae6a65aa8]
sql/mdl.cc:822(MDL_map::find_or_insert(LF_PINS*, MDL_key const*))[0xaaaae6510990]
sql/mdl.cc:2073(MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**))[0xaaaae651256c]
sql/mdl.cc:2248(MDL_context::acquire_lock(MDL_request*, double))[0xaaaae651297c]
sql/mdl.cc:2459(MDL_context::acquire_locks(I_P_List<MDL_request, I_P_List_adapter<MDL_request, &MDL_request::next_in_list, &MDL_request::prev_in_list>, I_P_List_counter, I_P_List_no_push_back<MDL_request> >*, double))[0xaaaae6513384]
sql/sql_base.cc:4041(lock_table_names(THD*, DDL_options_st const&, TABLE_LIST*, TABLE_LIST*, unsigned long, unsigned int))[0xaaaae63e67d0]
sql/sql_table.cc:2083(mysql_rm_table(THD*, TABLE_LIST*, bool, bool, bool, bool))[0xaaaae64b24cc]

Comment by Daniel Black [ 2021-02-15 ]

I had a theory that a SEGV would be caused by a read of a memory address that was only partially written. Normally we assume writes are atomic but they don't need to be. The l_find uses the LF_SLIST structure for searching. there are a number of 64 pointer types with a uint32 in the middle. This breaks up the alignment. The order isn't necessary so uint32 goes at the end.

https://github.com/MariaDB/server/commit/d30c1331a18d875e553f3fcf544997e4f33fb943 - branch bb-10.5-danielblack-MDEV-23510-arm-lfhash

Comment by Krunal Bauskar [ 2021-02-15 ]

I have tested the fix suggested by Daniel and it is working fine when run with the said unit-test-case.
TC use to crash intermittently but has sufficiently stabilized (w/o any crash observed) postfix.
We can assume said issue fixed with the proposed patch.

Comment by Daniel Black [ 2021-02-21 ]

Ok, ready. I do remember hitting a lf_hash thing in ppc64le but never got to the bottom of this.

bb-10.2-danielblack-MDEV-23510-arm-lfhash

  • lf_hash -> same patch as above, with a few extra volatile directive removed because they are wrong.
  • HAVE_ALIGNED_MALLOC / HAVE_POSIX_MEMALIGN configure tests where missing from the beginning of the 10.0 branch (where only used in performance schema).
  • To ensure that lf_hash structure is aligned my_[mc]alloc_aligned was created (using HAVE_* above) and used by lf_dynarray_lvalue. The uses of this are in lf_hash, maria/lockman.c, and lf_pinbox_get_pins/lf_alloc_get_pins (that only get used in unit tests). Windows use of aligned requires a special free function so these need to be mapped right. In all cases, lf_dynarray_destroy is the destructor of usage in hence my_free_aligned in recursive_free.
  • As minor cleanup, pfs, innodb allocations changed to use new mysys my_[mc]alloc_aligned.

For 10.5 - bb-10.5-danielblack-MDEV-23510-arm-lfhash

  • lf_hash contains movement to C++ to use my_assume_aligned assertions but is otherwise the same.
  • pfs - uses klass->count_alloc for counting (presumably to fix 10.2 bug that pfs_allocated_memory was never decreased).
  • tpool (added 10.5) to use my_ {malloc,free}

    _aligned rather than own implementation

  • innodb/mariabackup 10.5 extended allocated usage significantly so includes extra my_[mc]alloc_aligned rather than own implementation. In a slight case of going to far, added error handling for memory allocation where an obvious within function solution existed.
Comment by Marko Mäkelä [ 2021-02-24 ]

Thank you. I think that I am really only entitled to review the InnoDB change (using the common wrapper my_malloc_aligned()). That change looked OK to me.

As far as I understand, the necessary change is the first one, which to my understanding passed tests on the affected platform.

I did not completely understand the issue, though. If some data must or must not be in the same cache line with something else, then we probably would want to allocate aligned memory, passing the cache line size as the parameter. I did not see that happening. You passed an alignment constraint of only sizeof(void*), which I think should be trivially satisfied by any malloc() implementation.

The implementation of my_malloc_aligned() looks wrong to me in one aspect: It allows a fall-back to unaligned malloc(). We do not want to reintroduce MDEV-21337 and break innodb_flush_method=O_DIRECT or encrypted or page_compressed tables, on any platform.

Comment by Daniel Black [ 2021-02-25 ]

Pushed uncontroversial first commit.

The entire `uchar *key` must be within a cache line to be able to atomicly read it by a CPU. So with 32bits before and after the cache line boundary, one CPU could be writing an upper 1/2 of the address while another CPU is reading the lower 1/2. Eventually the writer CPU will finish, however the reader combining it with the other 1/2 is an invalid address.

leaving other commits for a separate MDEV later:

Generated at Thu Feb 08 09:22:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.