[MDEV-27954] server crash in signal handler, my_write_core, on sparc64 Created: 2022-02-26  Updated: 2023-10-17

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.6.7
Fix Version/s: 10.6

Type: Bug
Priority: Minor
Reporter: Otto Kekäläinen
Assignee: Andrew Hutchings
Resolution: Unresolved
Votes: 0
Labels: crash


 Description   

While reviewing MariaDB 10.6.7 build errors on Debian I saw multiple test failures on sparc64 that crash the server:

main.partition_order                     w2 [ fail ]
        Test ended at 2022-02-25 10:53:53
 
CURRENT_TEST: main.partition_order
mysqltest: At line 534: query 'select * from t1 force index (b) where b > '0' order by b' failed: <Unknown> (2013): Lost connection to server during query
 
The result from queries just before the failure was:
< snip >
a	b
1	1
35	2
30	4
2	5
drop table t1;
CREATE TABLE t1 (
a int not null,
b tinytext not null,
primary key(a),
index (b(10)))
partition by range (a)
partitions 2
(partition x1 values less than (25),
partition x2 values less than (100));
INSERT into t1 values (1, '1');
INSERT into t1 values (2, '5');
INSERT into t1 values (30, '4');
INSERT into t1 values (35, '2');
select * from t1 force index (b) where b > '0' order by b;
 
More results from queries before failure can be found in /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/partition_order.log
 
 - found 'core' (0/5)
 
Trying 'dbx' to get a backtrace
 
Trying 'gdb' to get a backtrace from coredump /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.partition_order/mysqld.1/data/core
Core generated by '/<<PKGBUILDDIR>>/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
[New LWP 2294694]
[New LWP 2294047]
[New LWP 2294053]
[New LWP 2294055]
[New LWP 2294051]
[New LWP 2294049]
[New LWP 2296516]
[New LWP 2294031]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
Core was generated by `/<<PKGBUILDDIR>>/builddir/sql/mariadbd --defaults'.
Program terminated with signal SIGUSR1, User defined signal 1.
#0  0xffff800101047fcc in pthread_kill () from /lib/sparc64-linux-gnu/libpthread.so.0
[Current thread is 1 (Thread 0xffff800107944870 (LWP 2294694))]
#0  0xffff800101047fcc in pthread_kill () from /lib/sparc64-linux-gnu/libpthread.so.0
#1  0x000001000098d944 in handle_fatal_signal (sig=<optimized out>) at ./sql/signal_handler.cc:345
Backtrace stopped: Cannot access memory at address 0xf0
 
Thread 8 (Thread 0xffff800100037780 (LWP 2294031)):
#0  0xffff800101834398 in poll () from /lib/sparc64-linux-gnu/libc.so.6
#1  0x00000100006b3bb4 in poll (__timeout=-1, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/sparc64-linux-gnu/bits/poll2.h:47
#2  handle_connections_sockets () at ./sql/mysqld.cc:6118
#3  0x00000100006b4c0c in mysqld_main (argc=<optimized out>, argv=<optimized out>) at ./sql/mysqld.cc:5823
#4  0xffff800101773cfc in __libc_start_main () from /lib/sparc64-linux-gnu/libc.so.6
#5  0x00000100006a79b0 in _start () at ./sql/sql_list.h:159
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 
Thread 7 (Thread 0xffff8001063f2870 (LWP 2296516)):
#0  0xffff80010104d890 in __futex_abstimed_wait_common64 () from /lib/sparc64-linux-gnu/libpthread.so.0
#1  0xffff8001010468a0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/sparc64-linux-gnu/libpthread.so.0
#2  0x000001000067595c in psi_cond_timedwait (that=0x100017a85b0 <thread_cache>, mutex=0x100017a8620 <thread_cache+112>, abstime=0xffff8001063f1a80, file=0x10000f38388 "./sql/thread_cache.h", line=<optimized out>) at ./mysys/my_thr_init.c:609
#3  0x0000010000871724 in inline_mysql_cond_timedwait (src_file=0x10000f38388 "./sql/thread_cache.h", src_line=176, abstime=0xffff8001063f1a80, mutex=<optimized out>, that=<optimized out>) at ./include/mysql/psi/mysql_thread.h:1086
#4  Thread_cache::park (this=<optimized out>) at ./sql/thread_cache.h:176
#5  do_handle_one_connection (connect=<optimized out>, put_in_cache=<optimized out>) at ./sql/sql_connect.cc:1431
#6  0x0000010000871858 in handle_one_connection (arg=<optimized out>) at ./sql/sql_connect.cc:1312
#7  0x0000010000bb6690 in pfs_spawn_thread (arg=<optimized out>) at ./storage/perfschema/pfs.cc:2201
#8  0xffff80010103f494 in start_thread () from /lib/sparc64-linux-gnu/libpthread.so.0
#9  0xffff80010183eadc in ?? () from /lib/sparc64-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 
Thread 6 (Thread 0xffff80010780d870 (LWP 2294049)):
#0  0xffff80010104d890 in __futex_abstimed_wait_common64 () from /lib/sparc64-linux-gnu/libpthread.so.0
#1  0xffff8001010468a0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/sparc64-linux-gnu/libpthread.so.0
#2  0x000001000067595c in psi_cond_timedwait (that=0x10001fdb4f8 <COND_checkpoint>, mutex=0x10001fdb530 <LOCK_checkpoint>, abstime=0xffff80010780ca60, file=0x100010a1188 "./storage/maria/ma_servicethread.c", line=<optimized out>) at ./mysys/my_thr_init.c:609
#3  0x0000010000b541d4 in inline_mysql_cond_timedwait (src_file=0x100010a1188 "./storage/maria/ma_servicethread.c", src_line=115, abstime=0xffff80010780ca60, mutex=<optimized out>, that=<optimized out>) at ./include/mysql/psi/mysql_thread.h:1086
#4  my_service_thread_sleep (control=0x1000170ea98 <checkpoint_control>, sleep_time=<optimized out>) at ./storage/maria/ma_servicethread.c:115
#5  0x0000010000b4bcec in ma_checkpoint_background (arg=0x1e) at ./storage/maria/ma_checkpoint.c:725
#6  0x0000010000bb6690 in pfs_spawn_thread (arg=<optimized out>) at ./storage/perfschema/pfs.cc:2201
#7  0xffff80010103f494 in start_thread () from /lib/sparc64-linux-gnu/libpthread.so.0
#8  0xffff80010183eadc in ?? () from /lib/sparc64-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 
Thread 5 (Thread 0xffff80010785a870 (LWP 2294051)):
#0  0xffff80010178bc40 in sigtimedwait () from /lib/sparc64-linux-gnu/libc.so.6
#1  0x00000100006aa680 in my_sigwait (code=<synthetic pointer>, sig=0xffff800107859a28, set=0xffff800107859a30) at ./include/my_pthread.h:195
#2  signal_hand (arg=<optimized out>) at ./sql/mysqld.cc:3116
#3  0x0000010000bb6690 in pfs_spawn_thread (arg=<optimized out>) at ./storage/perfschema/pfs.cc:2201
#4  0xffff80010103f494 in start_thread () from /lib/sparc64-linux-gnu/libpthread.so.0
#5  0xffff80010183eadc in ?? () from /lib/sparc64-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 
Thread 4 (Thread 0xffff8001078f6870 (LWP 2294055)):
#0  0xffff80010104d890 in __futex_abstimed_wait_common64 () from /lib/sparc64-linux-gnu/libpthread.so.0
#1  0xffff8001010468a0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/sparc64-linux-gnu/libpthread.so.0
#2  0x000001000067595c in psi_cond_timedwait (that=0x100017a85b0 <thread_cache>, mutex=0x100017a8620 <thread_cache+112>, abstime=0xffff8001078f5a80, file=0x10000f38388 "./sql/thread_cache.h", line=<optimized out>) at ./mysys/my_thr_init.c:609
#3  0x0000010000871724 in inline_mysql_cond_timedwait (src_file=0x10000f38388 "./sql/thread_cache.h", src_line=176, abstime=0xffff8001078f5a80, mutex=<optimized out>, that=<optimized out>) at ./include/mysql/psi/mysql_thread.h:1086
#4  Thread_cache::park (this=<optimized out>) at ./sql/thread_cache.h:176
#5  do_handle_one_connection (connect=<optimized out>, put_in_cache=<optimized out>) at ./sql/sql_connect.cc:1431
#6  0x0000010000871858 in handle_one_connection (arg=<optimized out>) at ./sql/sql_connect.cc:1312
#7  0x0000010000bb6690 in pfs_spawn_thread (arg=<optimized out>) at ./storage/perfschema/pfs.cc:2201
#8  0xffff80010103f494 in start_thread () from /lib/sparc64-linux-gnu/libpthread.so.0
#9  0xffff80010183eadc in ?? () from /lib/sparc64-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 
Thread 3 (Thread 0xffff8001078a8870 (LWP 2294053)):
#0  0xffff80010104d890 in __futex_abstimed_wait_common64 () from /lib/sparc64-linux-gnu/libpthread.so.0
#1  0xffff80010104652c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/sparc64-linux-gnu/libpthread.so.0
#2  0x00000100006758e4 in psi_cond_wait (that=0x100017aa9b8 <COND_manager>, mutex=0x100017aa9f0 <LOCK_manager>, file=0x10000f4d390 "./sql/sql_manager.cc", line=<optimized out>) at ./mysys/my_thr_init.c:596
#3  0x00000100007689d4 in inline_mysql_cond_wait (that=<optimized out>, mutex=<optimized out>, src_file=<optimized out>, src_line=<optimized out>) at ./include/mysql/psi/mysql_thread.h:1070
#4  handle_manager (arg=<optimized out>) at ./sql/sql_manager.cc:103
#5  0x0000010000bb6690 in pfs_spawn_thread (arg=<optimized out>) at ./storage/perfschema/pfs.cc:2201
#6  0xffff80010103f494 in start_thread () from /lib/sparc64-linux-gnu/libpthread.so.0
#7  0xffff80010183eadc in ?? () from /lib/sparc64-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 
Thread 2 (Thread 0xffff80010008f870 (LWP 2294047)):
#0  0xffff80010104d890 in __futex_abstimed_wait_common64 () from /lib/sparc64-linux-gnu/libpthread.so.0
#1  0xffff8001010468a0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/sparc64-linux-gnu/libpthread.so.0
#2  0x000001000067595c in psi_cond_timedwait (that=0x10002079e40 <COND_timer>, mutex=0x10002079e78 <LOCK_timer>, abstime=0xffff80010008eb40, file=0x1000111c520 "./mysys/thr_timer.c", line=<optimized out>) at ./mysys/my_thr_init.c:609
#3  0x0000010000e9c740 in inline_mysql_cond_timedwait (that=0x10002079e40 <COND_timer>, mutex=0x10002079e78 <LOCK_timer>, src_file=0x1000111c520 "./mysys/thr_timer.c", src_line=321, abstime=0xffff80010008eb40) at ./include/mysql/psi/mysql_thread.h:1086
#4  timer_handler (arg=<optimized out>) at ./mysys/thr_timer.c:321
#5  0x0000010000bb6690 in pfs_spawn_thread (arg=<optimized out>) at ./storage/perfschema/pfs.cc:2201
#6  0xffff80010103f494 in start_thread () from /lib/sparc64-linux-gnu/libpthread.so.0
#7  0xffff80010183eadc in ?? () from /lib/sparc64-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 
Thread 1 (Thread 0xffff800107944870 (LWP 2294694)):
#0  0xffff800101047fcc in pthread_kill () from /lib/sparc64-linux-gnu/libpthread.so.0
#1  0x000001000098d944 in handle_fatal_signal (sig=<optimized out>) at ./sql/signal_handler.cc:345
Backtrace stopped: Cannot access memory at address 0xf0
 
...
 
Too many failed: Failed 18/997 tests, 98.19% were successful.
 
Failing test(s): main.partition_order main.func_str main.group_by main.group_by_null main.features main.repair_symlink-5543 main.type_datetime main.xml main.func_like main.func_math
 
The log files in var/log may give you some hint of what went wrong.
 
If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
 
Errors/warnings were found in logfiles during server shutdown after running the
following sequence(s) of tests:
    main.lock main.partition_hash main.ignored_index main.ctype_uca_partitions main.partition_cache_myisam main.partition_default main.partition_mgm_err2 main.partition_myisam main.partition_mgm main.partition_column main.information_schema_part main.partition_list main.huge_frm-6224 main.partition_key_cache main.ctype_partitions main.long_unique main.explain_non_select main.explain_json_format_partitions main.partition_order main.partition_column_prune main.partition_charset main.partition_mrr_myisam main.partition_grant main.partition_csv main.assign_key_cache main.partition_error main.partition_mgm_err main.partition_bug18198 main.partition_mrr_aria main.auto_increment_ranges_myisam main.column_compression_parts
    main.partition_order
    main.func_regexp_pcre main.func_regexp main.func_sapdb main.func_op main.func_str main.func_set
    main.func_str
    main.grant_not_windows main.grant_slave_monitor main.grant_explain_non_select main.grant_master_admin main.grant_read_only main.greedy_optimizer main.grant_server main.group_by main.grant_cache_ps_prot main.grant_kill main.grant_repair main.grant_slave_admin main.grant_lowercase_fs
    main.group_by
    main.group_by_null
    main.group_by_null
    main.explain main.events_2 main.events_grant main.events_slowlog main.failed_auth_3909 main.explain_json main.events_embedded main.enforce_storage_engine main.features main.execution_constants main.except_all main.events_1 main.events_microsec main.except main.events_scheduling
    main.features
    main.subselect_extra main.subselect_extra_no_semijoin main.subselect_cache main.empty_user_table main.type_datetime main.subselect_exists2in_costmat main.subselect_exists2in main.subselect_gis main.subselect_mat main.subselect3_jcl6 main.empty_string_literal main.subselect3 main.empty_table main.empty_server_name-8224
    main.type_datetime
    main.win_orderby main.udf main.win_percent_cume main.xml
149 tests were skipped, 61 by the test itself.

Full log: https://buildd.debian.org/status/fetch.php?pkg=mariadb-10.6&arch=sparc64&ver=1%3A10.6.7-2%7Eexp1&stamp=1645787355&raw=0



 Comments   
Comment by Daniel Black [ 2022-02-27 ]

It looks like pthread_self is returning null, and that value is then passed to pthread_kill in my_write_core, which is called from handle_fatal_signal. Nothing has changed in this code in a very long time.

Comment by Daniel Black [ 2022-02-28 ]

The build logs show the test failure with gcc-11_11.2.0-16, but the -17 release notes list significant gcc fixes, including "Default 32-bit mode to V8+ on sparc64 (Adrian Glaubitz). Closes: #1004659."

Given the significance of these fixes, please retest.

Comment by Daniel Black [ 2022-02-28 ]

Please also update the ppc64/ppc64le build failure on MDEV-27936; it potentially has the same underlying cause.

Comment by Daniel Black [ 2022-03-02 ]

Rebuild: https://buildd.debian.org/status/fetch.php?pkg=mariadb-10.6&arch=sparc64&ver=1%3A10.6.7-1&stamp=1646198426&raw=0
Same error with gcc-11_11.2.0-17.

Comment by Otto Kekäläinen [ 2022-03-13 ]

The latest log shows a different kind of failure: https://buildd.debian.org/status/fetch.php?pkg=mariadb-10.6&arch=sparc64&ver=1%3A10.6.7-3&stamp=1647029566&raw=0

Filed as MDEV-28052

Comment by Otto Kekäläinen [ 2023-10-14 ]

In https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/335a444d999c05954ffd7ae51c2cb9f4de5d5816 I removed the overrides for this Jira, based on the suggestion above by danblack that the issue is probably fixed in the latest GCC. Unfortunately that does not seem to be the case, as the exact same test failures still exist (and I will need to add them back to the skiplist).

Latest sparc64 build + MTR run in Debian: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=sparc64&ver=1%3A10.11.5-2&stamp=1696889502&raw=0. If you need more build logs, all of them can be found at https://buildd.debian.org/status/logs.php?pkg=mariadb&arch=sparc64.

Comment by Daniel Black [ 2023-10-17 ]

Temeo's approach on UBSAN in https://github.com/codership/galera/issues/558#issuecomment-1764588270 seems like a productive thing to try against MariaDB.
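
As background for the suggestion above: UBSan is enabled in GCC and Clang with -fsanitize=undefined, and one of its checks (-fsanitize=alignment) reports dereferences of misaligned pointers, which strict-alignment architectures tend to punish with a fatal signal. Whether misalignment is what breaks these tests is not established anywhere in this report, so the sketch below is only an assumption about the kind of finding such a run might surface. The function name load_u32_any_alignment is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address that may not be 4-byte aligned.
   memcpy places no alignment requirement on its source, so this is
   well defined on every architecture. */
uint32_t load_u32_any_alignment(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* The pattern an alignment check is meant to flag, left as a comment so
   that this sketch itself stays free of undefined behaviour:

     uint32_t v = *(const uint32_t *) p;    // p may be misaligned
*/
```

A build with -fsanitize=alignment would report the cast-based read at run time when p is misaligned, while the memcpy-based helper passes cleanly.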

Generated at Thu Feb 08 09:56:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.