[MDEV-22443] terminate called after throwing an instance of 'wsrep::runtime_error' in std::terminate on START TRANSACTION Created: 2020-05-04  Updated: 2023-12-20  Resolved: 2020-08-19

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4, 10.5
Fix Version/s: 10.4.16, 10.5.7

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: affects-tests, not-10.1, not-10.2, not-10.3, regression

Issue Links:
Relates
relates to MDEV-23092 SIGABRT in wsrep::server_state::provi... Closed
relates to MDEV-23466 SIGABRT in wsrep::server_state::provi... Closed

 Description   

SET @@SESSION.wsrep_causal_reads=ON;
SET SESSION wsrep_on=1;
START TRANSACTION READ WRITE;

Leads to:

10.5.3 98003440c2f8d20164a191ced1b7d92b283bb68f

2020-05-04  3:10:25 0 [Note] /test/MD300420-mariadb-10.5.3-linux-x86_64-opt/bin/mysqld: ready for connections.
Version: '10.5.3-MariaDB'  socket: '/test/MD300420-mariadb-10.5.3-linux-x86_64-opt/socket.sock'  port: 10419  MariaDB Server
terminate called after throwing an instance of 'wsrep::runtime_error'
  what():  provider not loaded
200504  3:10:54 [ERROR] mysqld got signal 6 ;

10.5.3 98003440c2f8d20164a191ced1b7d92b283bb68f

Core was generated by `/test/MD300420-mariadb-10.5.3-linux-x86_64-opt/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
[Current thread is 1 (Thread 0x7fcccb22d700 (LWP 3156879))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x000055888b4ec707 in my_write_core (sig=sig@entry=6) at /test/10.5_opt/mysys/stacktrace.c:518
#2  0x000055888aeae92a in handle_fatal_signal (sig=6) at /test/10.5_opt/sql/signal_handler.cc:329
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#5  0x00007fccc9971801 in __GI_abort () at abort.c:79
#6  0x00007fccca14c957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fccca152ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fccca152b21 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007fccca152d54 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000055888b61db2a in wsrep::server_state::provider (this=<optimized out>) at /test/10.5_opt/wsrep-lib/include/wsrep/server_state.hpp:310
#11 wsrep::server_state::causal_read (this=<optimized out>, timeout=timeout@entry=-1) at /test/10.5_opt/wsrep-lib/src/server_state.cpp:838
#12 0x000055888b614a0a in wsrep::client_state::sync_wait (this=this@entry=0x7fcc91c18120, timeout=timeout@entry=-1) at /test/10.5_opt/wsrep-lib/src/client_state.cpp:716
#13 0x000055888b18f93d in wsrep_sync_wait (thd=thd@entry=0x7fcc91c12018, mask=mask@entry=1) at /test/10.5_opt/sql/wsrep_mysqld.cc:1150
#14 0x000055888adb6605 in trans_begin (thd=thd@entry=0x7fcc91c12018, flags=4) at /test/10.5_opt/sql/transaction.cc:187
#15 0x000055888acba165 in mysql_execute_command (thd=thd@entry=0x7fcc91c12018) at /test/10.5_opt/sql/sql_parse.cc:5525
#16 0x000055888acbfecc in mysql_parse (thd=thd@entry=0x7fcc91c12018, rawbuf=<optimized out>, length=28, parser_state=parser_state@entry=0x7fcccb22c4d0, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_opt/sql/sql_parse.cc:7957
#17 0x000055888acb4d00 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fcc91c12018, packet=packet@entry=0x7fcc91c3a019 "START TRANSACTION READ WRITE", packet_length=packet_length@entry=28, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_opt/sql/sql_parse.cc:1839
#18 0x000055888acb301f in do_command (thd=0x7fcc91c12018) at /test/10.5_opt/sql/sql_parse.cc:1358
#19 0x000055888ada822e in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7fccc8c329b8, put_in_cache=put_in_cache@entry=true) at /test/10.5_opt/sql/sql_connect.cc:1422
#20 0x000055888ada83d4 in handle_one_connection (arg=arg@entry=0x7fccc8c329b8) at /test/10.5_opt/sql/sql_connect.cc:1319
#21 0x000055888b114c9a in pfs_spawn_thread (arg=0x7fccc8c4b018) at /test/10.5_opt/storage/perfschema/pfs.cc:2201
#22 0x00007fccca6546db in start_thread (arg=0x7fcccb22d700) at pthread_create.c:463
#23 0x00007fccc9a5288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.4.13 (dbg), 10.4.13 (opt), 10.5.2 (dbg), 10.5.2 (opt), 10.5.3 (dbg), 10.5.3 (opt)

Bug confirmed not present in:
MariaDB: 10.1.45 (dbg), 10.1.45 (opt), 10.2.32 (dbg), 10.2.32 (opt), 10.3.23 (dbg), 10.3.23 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

MySQL 10.3.23 (dbg):

10.3.23>SET @@SESSION.wsrep_causal_reads=ON;
Query OK, 0 rows affected, 1 warning (0.000 sec)
 
10.3.23>SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1287
Message: '@@wsrep_causal_reads' is deprecated and will be removed in a future release. Please use '@@wsrep_sync_wait=1' instead
1 row in set (0.000 sec)
 
10.3.23>SET SESSION wsrep_on=1;
Query OK, 0 rows affected (0.000 sec)
 
10.3.23>START TRANSACTION READ WRITE;
Query OK, 0 rows affected (0.000 sec)



 Comments   
Comment by Roel Van de Paar [ 2020-07-14 ]

More bugs in this area. This testcase (without [pre-]loading plugin):

SET NAMES utf8, collation_connection='utf16le_bin';
SET GLOBAL wsrep_provider='/invalid/libgalera_smm.so';
SET GLOBAL wsrep_cluster_address='OFF';
SET GLOBAL wsrep_slave_threads=10;

Results in no core dump, and the error log (optimized) shows:

10.5.5 30e7a0a866dce530d8328c6d614e48d39a264f9b (Optimized)

2020-07-14 19:39:29 0 [Note] /test/MD140720-mariadb-10.5.5-linux-x86_64-opt/bin/mysqld: ready for connections.
Version: '10.5.5-MariaDB'  socket: '/test/MD140720-mariadb-10.5.5-linux-x86_64-opt/socket.sock'  port: 12530  MariaDB Server
2020-07-14 19:42:38 4 [Note] WSREP: Stop replication by 4
wsrep loader: [INFO] wsrep_load(): loading provider library '/'
wsrep loader: [ERROR] wsrep_load(): dlopen(): /: cannot read file data: Is a directory
2020-07-14 19:42:38 4 [ERROR] WSREP: Failed to load provider
2020-07-14 19:42:38 4 [Note] WSREP: WSREP (galera) provider is not loaded, can't re(start) replication.
2020-07-14 19:42:38 11 [Note] WSREP: Starting applier thread 11
2020-07-14 19:42:38 14 [Note] WSREP: Starting applier thread 14
2020-07-14 19:42:38 5 [Note] WSREP: Starting applier thread 5
2020-07-14 19:42:38 13 [Note] WSREP: Starting applier thread 13
2020-07-14 19:42:38 10 [Note] WSREP: Starting applier thread 10
2020-07-14 19:42:38 8 [Note] WSREP: Starting applier thread 8
2020-07-14 19:42:38 9 [Note] WSREP: Starting applier thread 9
2020-07-14 19:42:38 6 [Note] WSREP: Starting applier thread 6
2020-07-14 19:42:38 7 [Note] WSREP: Starting applier thread 7
2020-07-14 19:42:38 12 [Note] WSREP: Starting applier thread 12
terminate called after throwing an instance of 'wsrep::runtime_error'
terminate called recursively
200714 19:42:38 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.5-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=12
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467802 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x14b8b3012018
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
terminate called recursively
terminate called recursively
terminate called recursively
# END OF LOG

And on debug:

10.5.5 30e7a0a866dce530d8328c6d614e48d39a264f9b (Debug)

2020-07-14 19:39:27 0 [Note] /test/MD140720-mariadb-10.5.5-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.5.5-MariaDB-debug'  socket: '/test/MD140720-mariadb-10.5.5-linux-x86_64-dbg/socket.sock'  port: 17206  MariaDB Server
2020-07-14 19:42:36 4 [Note] WSREP: Stop replication by 4
wsrep loader: [INFO] wsrep_load(): loading provider library '/'
wsrep loader: [ERROR] wsrep_load(): dlopen(): /: cannot read file data: Is a directory
2020-07-14 19:42:36 4 [ERROR] WSREP: Failed to load provider
2020-07-14 19:42:36 4 [Note] WSREP: WSREP (galera) provider is not loaded, can't re(start) replication.
2020-07-14 19:42:36 9 [Note] WSREP: Starting applier thread 9
2020-07-14 19:42:36 7 [Note] WSREP: Starting applier thread 7
terminate called after throwing an instance of 'wsrep::runtime_error'
terminate called recursively
2020-07-14 19:42:36 5 [Note] WSREP: Starting applier thread 5
200714 19:42:36 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.5-MariaDB-debug
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=12
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467906 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x154826837088
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
terminate called recursively
2020-07-14 19:42:36 6 [Note] WSREP: Starting applier thread 6
# END OF LOG

Comment by Roel Van de Paar [ 2020-07-14 ]

This crash affects testing significantly as no core dump is written (resulting in re-reducing test cases each run)

Comment by Roel Van de Paar [ 2020-07-18 ]

Looks like I previously logged MDEV-23092 for the last bug above ^ without updating filters. It can likely be marked as duplicate of this one, assuming the issues are the same.

Comment by Roel Van de Paar [ 2020-08-12 ]

Another testcase...

SET SESSION wsrep_on = 1;
SET SESSION wsrep_sync_wait=7;
START TRANSACTION READ ONLY;

Leads to:

10.4.15 eae968f62d285de97ed607c87bc131cd863d5d03 (Optimized)

Core was generated by `/test/MD110820-mariadb-10.4.15-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
[Current thread is 1 (Thread 0x1515bfc8e700 (LWP 1827289))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x000055d5ced87a77 in my_write_core (sig=sig@entry=6) at /test/10.4_opt/mysys/stacktrace.c:482
#2  0x000055d5ce75f62a in handle_fatal_signal (sig=6) at /test/10.4_opt/sql/signal_handler.cc:343
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#5  0x00001515bdf258b1 in __GI_abort () at abort.c:79
#6  0x00001515be700957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00001515be706ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00001515be706b21 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00001515be706d54 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000055d5cee2860a in wsrep::server_state::provider (this=<optimized out>) at /test/10.4_opt/wsrep-lib/include/wsrep/server_state.hpp:310
#11 wsrep::server_state::causal_read (this=<optimized out>, timeout=timeout@entry=-1) at /test/10.4_opt/wsrep-lib/src/server_state.cpp:850
#12 0x000055d5cee20a6a in wsrep::client_state::sync_wait (this=this@entry=0x151598817f68, timeout=timeout@entry=-1) at /test/10.4_opt/wsrep-lib/src/client_state.cpp:714
#13 0x000055d5ce6c26bd in wsrep_sync_wait (thd=thd@entry=0x151598812008, mask=mask@entry=1) at /test/10.4_opt/sql/wsrep_mysqld.cc:1118
#14 0x000055d5ce644d8d in trans_begin (thd=thd@entry=0x151598812008, flags=2) at /test/10.4_opt/sql/transaction.cc:183
#15 0x000055d5ce5507bf in mysql_execute_command (thd=thd@entry=0x151598812008) at /test/10.4_opt/sql/sql_parse.cc:5597
#16 0x000055d5ce55622a in mysql_parse (thd=0x151598812008, rawbuf=<optimized out>, length=27, parser_state=0x1515bfc8d4d0, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.4_opt/sql/sql_parse.cc:7896
#17 0x000055d5ce558685 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x151598812008, packet=packet@entry=0x151598832009 "START TRANSACTION READ ONLY", packet_length=packet_length@entry=27, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.4_opt/sql/sql_parse.cc:1834
#18 0x000055d5ce559e04 in do_command (thd=0x151598812008) at /test/10.4_opt/sql/sql_parse.cc:1352
#19 0x000055d5ce636dbe in do_handle_one_connection (connect=connect@entry=0x1515bc832748) at /test/10.4_opt/sql/sql_connect.cc:1412
#20 0x000055d5ce636e7d in handle_one_connection (arg=0x1515bc832748) at /test/10.4_opt/sql/sql_connect.cc:1316
#21 0x00001515bee8c6db in start_thread (arg=0x1515bfc8e700) at pthread_create.c:463
#22 0x00001515be006a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Roel Van de Paar [ 2020-08-12 ]

jplindst If possible, a fix for this bug would be appreciated. Thank you!

Comment by Roel Van de Paar [ 2020-08-13 ]

jplindst Thank you
sciascid Please include the just-logged MDEV-23466 in the fix if possible, thank you

Generated at Thu Feb 08 09:14:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.