[MDEV-14875] galera_new_cluster crashes mysqld when existing server contains databases - Jira

Marko Mäkelä added a comment - 2018-01-05 16:50

The SIGSEGV occurs because the null pointer wsrep is being dereferenced in the member function call:

      if (WSREP_PROVIDER_EXISTS) {

        if (wsrep->post_rollback(wsrep, &thd->wsrep_ws_handle))

          DBUG_PRINT("wsrep", ("setting rollback fail"));

          WSREP_ERROR("settting rollback fail: thd: %llu, schema: %s, SQL: %s",

                      (long long)thd->real_id, (thd->db ? thd->db : "(null)"),

                      thd->query());

Why is the function wsrep_commit() being called at all if wsrep==NULL?

Marko Mäkelä added a comment - 2018-01-05 16:50 The SIGSEGV occurs because the null pointer wsrep is being dereferenced in the member function call: if (WSREP_PROVIDER_EXISTS) { if (wsrep->post_rollback(wsrep, &thd->wsrep_ws_handle)) { DBUG_PRINT( "wsrep" , ( "setting rollback fail" )); WSREP_ERROR( "settting rollback fail: thd: %llu, schema: %s, SQL: %s" , ( long long )thd->real_id, (thd->db ? thd->db : "(null)" ), thd->query()); } } Why is the function wsrep_commit() being called at all if wsrep==NULL ?

Jan Lindström (Inactive) added a comment - 2018-01-08 12:35

Better question is why wsrep==NULL when WSREP_PROVIDER_EXISTS is true

Jan Lindström (Inactive) added a comment - 2018-01-08 12:35 Better question is why wsrep==NULL when WSREP_PROVIDER_EXISTS is true

Jan Lindström (Inactive) added a comment - 2018-01-08 12:38

Could you give your error log ?

Jan Lindström (Inactive) added a comment - 2018-01-08 12:38 Could you give your error log ?

Juergen Gnoss added a comment - 2018-01-08 13:11

Jan 8 08:03:40 tam05 systemd[1]: Starting MariaDB 10.2.12 database server...
Jan 8 08:03:44 tam05 mysqld[1287]: WSREP: Failed to start mysqld for wsrep recovery: '2018-01-08 8:03:41 0 [Note] /usr/sbin/mysqld (mysqld 10.3.4-MariaDB-debug-log) starting as process 1488 ...
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Uses event mutexes
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Using Linux native AIO
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Number of pools: 1
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Using SSE2 crc32 instructions
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Completed initialization of buffer pool
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:42 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:42 0 [Note] InnoDB: The log sequence number 1018153280213 in the system tablespace does not match the log sequence number 1018153280524 in the ib_logfiles!
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:42 0 [Note] InnoDB: Last binlog file 'mysql-bin.000002', position 402911975
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Creating shared tablespace for temporary tables
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Waiting for purge to start
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: 5.7.20 started; log sequence number 1018153280524
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] Plugin 'FEEDBACK' is disabled.
Jan 8 08:03:44 tam05 mysqld[1287]: 180108 8:03:43 [ERROR] mysqld got signal 11 ;
Jan 8 08:03:44 tam05 mysqld[1287]: This could be because you hit a bug. It is also possible that this binary
Jan 8 08:03:44 tam05 mysqld[1287]: or one of the libraries it was linked against is corrupt, improperly built,
Jan 8 08:03:44 tam05 mysqld[1287]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jan 8 08:03:44 tam05 mysqld[1287]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jan 8 08:03:44 tam05 mysqld[1287]: We will try our best to scrape up some info that will hopefully help
Jan 8 08:03:44 tam05 mysqld[1287]: diagnose the problem, but since we have already crashed,
Jan 8 08:03:44 tam05 mysqld[1287]: something is definitely wrong and this may fail.
Jan 8 08:03:44 tam05 mysqld[1287]: Server version: 10.3.4-MariaDB-debug-log
Jan 8 08:03:44 tam05 mysqld[1287]: key_buffer_size=134217728
Jan 8 08:03:44 tam05 mysqld[1287]: read_buffer_size=2097152
Jan 8 08:03:44 tam05 mysqld[1287]: max_used_connections=0
Jan 8 08:03:44 tam05 mysqld[1287]: max_threads=702
Jan 8 08:03:44 tam05 mysqld[1287]: thread_count=6
Jan 8 08:03:44 tam05 mysqld[1287]: It is possible that mysqld could use up to
Jan 8 08:03:44 tam05 mysqld[1287]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4459921 K bytes of memory
Jan 8 08:03:44 tam05 mysqld[1287]: Hope that's ok; if not, decrease some variables in the equation.
Jan 8 08:03:44 tam05 mysqld[1287]: Thread pointer: 0x55a486fb3470
Jan 8 08:03:44 tam05 mysqld[1287]: Attempting backtrace. You can use the following information to find out
Jan 8 08:03:44 tam05 mysqld[1287]: where mysqld died. If you see no messages after this, something went
Jan 8 08:03:44 tam05 mysqld[1287]: terribly wrong...
Jan 8 08:03:44 tam05 mysqld[1287]: stack_bottom = 0x7ffe02b9ab90 thread_stack 0x49000
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(my_print_stacktrace+0x3d)[0x55a481de86ca]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(handle_fatal_signal+0x3bb)[0x55a481636f2a]
Jan 8 08:03:44 tam05 mysqld[1287]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f93adb3c0c0]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z12wsrep_commitP10handlertonP3THDb+0x202)[0x55a481567621]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(+0x90ecf0)[0x55a48163acf0]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z19ha_commit_one_phaseP3THDb+0xfe)[0x55a48163abec]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x53a)[0x55a48163a438]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0xc3)[0x55a4814d8c66]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z18close_mysql_tablesP3THD+0x26)[0x55a48130d437]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(+0x66b946)[0x55a481397946]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z11plugin_initPiPPci+0x966)[0x55a4813971fa]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(+0x568cc3)[0x55a481294cc3]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x5c4)[0x55a481295d08]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(main+0x20)[0x55a48128b1c0]
Jan 8 08:03:44 tam05 systemd[1]: mariadb.service: Control process exited, code=exited status=1
Jan 8 08:03:44 tam05 mysqld[1287]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93ac69b2b1]
Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_start+0x2a)[0x55a48128b09a]
Jan 8 08:03:44 tam05 mysqld[1287]: Trying to get some variables.
Jan 8 08:03:44 tam05 mysqld[1287]: Some pointers may be invalid and cause the dump to abort.
Jan 8 08:03:44 tam05 mysqld[1287]: Query (0x0):
Jan 8 08:03:44 tam05 mysqld[1287]: Connection ID (thread ID): 0
Jan 8 08:03:44 tam05 mysqld[1287]: Status: NOT_KILLED
Jan 8 08:03:44 tam05 mysqld[1287]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
Jan 8 08:03:44 tam05 mysqld[1287]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Jan 8 08:03:44 tam05 mysqld[1287]: information that should help you find out what is causing the crash.
Jan 8 08:03:44 tam05 mysqld[1287]: Writing a core file at /var/lib/mysql/
Jan 8 08:03:44 tam05 mysqld[1287]: Segmentation fault (core dumped)'
Jan 8 08:03:44 tam05 systemd[1]: Failed to start MariaDB 10.2.12 database server.
Jan 8 08:03:44 tam05 systemd[1]: mariadb.service: Unit entered failed state.
Jan 8 08:03:44 tam05 systemd[1]: mariadb.service: Failed with result 'exit-code'.

Juergen Gnoss added a comment - 2018-01-08 13:11 Jan 8 08:03:40 tam05 systemd [1] : Starting MariaDB 10.2.12 database server... Jan 8 08:03:44 tam05 mysqld [1287] : WSREP: Failed to start mysqld for wsrep recovery: '2018-01-08 8:03:41 0 [Note] /usr/sbin/mysqld (mysqld 10.3.4-MariaDB-debug-log) starting as process 1488 ... Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!! Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Uses event mutexes Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Compressed tables use zlib 1.2.8 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Using Linux native AIO Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Number of pools: 1 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Using SSE2 crc32 instructions Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Completed initialization of buffer pool Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:42 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:42 0 [Note] InnoDB: The log sequence number 1018153280213 in the system tablespace does not match the log sequence number 1018153280524 in the ib_logfiles! Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:42 0 [Note] InnoDB: Last binlog file 'mysql-bin.000002', position 402911975 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: 128 out of 128 rollback segments are active. Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Creating shared tablespace for temporary tables Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Waiting for purge to start Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: 5.7.20 started; log sequence number 1018153280524 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery. Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] Plugin 'FEEDBACK' is disabled. Jan 8 08:03:44 tam05 mysqld [1287] : 180108 8:03:43 [ERROR] mysqld got signal 11 ; Jan 8 08:03:44 tam05 mysqld [1287] : This could be because you hit a bug. It is also possible that this binary Jan 8 08:03:44 tam05 mysqld [1287] : or one of the libraries it was linked against is corrupt, improperly built, Jan 8 08:03:44 tam05 mysqld [1287] : or misconfigured. This error can also be caused by malfunctioning hardware. Jan 8 08:03:44 tam05 mysqld [1287] : To report this bug, see https://mariadb.com/kb/en/reporting-bugs Jan 8 08:03:44 tam05 mysqld [1287] : We will try our best to scrape up some info that will hopefully help Jan 8 08:03:44 tam05 mysqld [1287] : diagnose the problem, but since we have already crashed, Jan 8 08:03:44 tam05 mysqld [1287] : something is definitely wrong and this may fail. Jan 8 08:03:44 tam05 mysqld [1287] : Server version: 10.3.4-MariaDB-debug-log Jan 8 08:03:44 tam05 mysqld [1287] : key_buffer_size=134217728 Jan 8 08:03:44 tam05 mysqld [1287] : read_buffer_size=2097152 Jan 8 08:03:44 tam05 mysqld [1287] : max_used_connections=0 Jan 8 08:03:44 tam05 mysqld [1287] : max_threads=702 Jan 8 08:03:44 tam05 mysqld [1287] : thread_count=6 Jan 8 08:03:44 tam05 mysqld [1287] : It is possible that mysqld could use up to Jan 8 08:03:44 tam05 mysqld [1287] : key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4459921 K bytes of memory Jan 8 08:03:44 tam05 mysqld [1287] : Hope that's ok; if not, decrease some variables in the equation. Jan 8 08:03:44 tam05 mysqld [1287] : Thread pointer: 0x55a486fb3470 Jan 8 08:03:44 tam05 mysqld [1287] : Attempting backtrace. You can use the following information to find out Jan 8 08:03:44 tam05 mysqld [1287] : where mysqld died. If you see no messages after this, something went Jan 8 08:03:44 tam05 mysqld [1287] : terribly wrong... Jan 8 08:03:44 tam05 mysqld [1287] : stack_bottom = 0x7ffe02b9ab90 thread_stack 0x49000 Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(my_print_stacktrace+0x3d) [0x55a481de86ca] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(handle_fatal_signal+0x3bb) [0x55a481636f2a] Jan 8 08:03:44 tam05 mysqld [1287] : /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0) [0x7f93adb3c0c0] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z12wsrep_commitP10handlertonP3THDb+0x202) [0x55a481567621] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(+0x90ecf0) [0x55a48163acf0] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z19ha_commit_one_phaseP3THDb+0xfe) [0x55a48163abec] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x53a) [0x55a48163a438] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0xc3) [0x55a4814d8c66] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z18close_mysql_tablesP3THD+0x26) [0x55a48130d437] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(+0x66b946) [0x55a481397946] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z11plugin_initPiPPci+0x966) [0x55a4813971fa] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(+0x568cc3) [0x55a481294cc3] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x5c4) [0x55a481295d08] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(main+0x20) [0x55a48128b1c0] Jan 8 08:03:44 tam05 systemd [1] : mariadb.service: Control process exited, code=exited status=1 Jan 8 08:03:44 tam05 mysqld [1287] : /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f93ac69b2b1] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_start+0x2a) [0x55a48128b09a] Jan 8 08:03:44 tam05 mysqld [1287] : Trying to get some variables. Jan 8 08:03:44 tam05 mysqld [1287] : Some pointers may be invalid and cause the dump to abort. Jan 8 08:03:44 tam05 mysqld [1287] : Query (0x0): Jan 8 08:03:44 tam05 mysqld [1287] : Connection ID (thread ID): 0 Jan 8 08:03:44 tam05 mysqld [1287] : Status: NOT_KILLED Jan 8 08:03:44 tam05 mysqld [1287] : Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on Jan 8 08:03:44 tam05 mysqld [1287] : The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains Jan 8 08:03:44 tam05 mysqld [1287] : information that should help you find out what is causing the crash. Jan 8 08:03:44 tam05 mysqld [1287] : Writing a core file at /var/lib/mysql/ Jan 8 08:03:44 tam05 mysqld [1287] : Segmentation fault (core dumped)' Jan 8 08:03:44 tam05 systemd [1] : Failed to start MariaDB 10.2.12 database server. Jan 8 08:03:44 tam05 systemd [1] : mariadb.service: Unit entered failed state. Jan 8 08:03:44 tam05 systemd [1] : mariadb.service: Failed with result 'exit-code'.

Juergen Gnoss added a comment - 2018-01-08 13:25

If it helps, that happens if I use
>galera_new_cluster
to bootstrap the first node.

If I bootstrap the first node with
>mysqld -umysql --wsrep-new-cluster

The first node comes up, other nodes can connect and sync.
But I cannot afterwards shut them down and restart.

Juergen Gnoss added a comment - 2018-01-08 13:25 If it helps, that happens if I use >galera_new_cluster to bootstrap the first node. If I bootstrap the first node with >mysqld -umysql --wsrep-new-cluster The first node comes up, other nodes can connect and sync. But I cannot afterwards shut them down and restart.

Marko Mäkelä added a comment - 2018-01-08 17:00

The macro WSREP_PROVIDER_EXISTS compares a string, not the wsrep pointer:

(wsrep_provider && strncasecmp(wsrep_provider, WSREP_NONE, FN_REFLEN))

This string is not being updated atomically with wsrep. One example where this might fail is in the error handling of the function wsrep_provider_init().
Wouldn’t it be more reliable to just test wsrep instead of wsrep_provider?

Is either variable protected by any mutex? It does not seem so to me. What would serialize multiple concurrent executions of wsrep_provider_update(), for example?

Marko Mäkelä added a comment - 2018-01-08 17:00 The macro WSREP_PROVIDER_EXISTS compares a string, not the wsrep pointer: (wsrep_provider && strncasecmp(wsrep_provider, WSREP_NONE, FN_REFLEN)) This string is not being updated atomically with wsrep . One example where this might fail is in the error handling of the function wsrep_provider_init() . Wouldn’t it be more reliable to just test wsrep instead of wsrep_provider ? Is either variable protected by any mutex? It does not seem so to me. What would serialize multiple concurrent executions of wsrep_provider_update() , for example?

Marko Mäkelä added a comment - 2018-01-08 19:58

In the longer mysqld.trace most references to wsrep are from thread T@1, which is the initial thread. The trace is similar to the earlier error log that leads to a crash.
Unfortunately most of the WSREP code is lacking DBUG_ENTER and DBUG_RETURN instrumentation. Because of this, it is hard to tell which subsystems init_server_components() actually initialized after wsrep_thr_init(). Some of the WSREP initialization is conditional.

jgnoss, one thing that you should be able to do in your setup is that you add more instrumentation to the server startup. Something like this, possibly also in other places:

diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc

index 173e955e44b..cd6a3497999 100644

--- a/sql/wsrep_mysqld.cc

+++ b/sql/wsrep_mysqld.cc

@@ -796,6 +796,11 @@ void wsrep_thr_init()

 void wsrep_init_startup (bool first)

+  sql_print_information("wsrep_init_startup (%d, %s, %s, %s)",

+                        first, wsrep_provider, wsrep_sst_method,

+                        wsrep_cluster_address

+                        ? wsrep_cluster_address : "(none)");

   if (wsrep_init()) unireg_abort(1);

   wsrep_thr_lock_init(

@@ -829,6 +834,7 @@ void wsrep_init_startup (bool first)

   wsrep_create_appliers(1);

   if (first && !wsrep_sst_wait()) unireg_abort(1);// wait until SST is completed

+  sql_print_information("wsrep_init_startup completed");

These messages should be displayed in the server error log with a Note prefix. If these messages do not appear in the log, then this function is for some reason not being called at all, and you should instead add some output to the two callers in the file sql/mysqld.cc.

Marko Mäkelä added a comment - 2018-01-08 19:58 In the longer mysqld.trace most references to wsrep are from thread T@1 , which is the initial thread. The trace is similar to the earlier error log that leads to a crash. Unfortunately most of the WSREP code is lacking DBUG_ENTER and DBUG_RETURN instrumentation. Because of this, it is hard to tell which subsystems init_server_components() actually initialized after wsrep_thr_init() . Some of the WSREP initialization is conditional. jgnoss , one thing that you should be able to do in your setup is that you add more instrumentation to the server startup. Something like this, possibly also in other places: diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc index 173e955e44b..cd6a3497999 100644 --- a/sql/wsrep_mysqld.cc +++ b/sql/wsrep_mysqld.cc @@ -796,6 +796,11 @@ void wsrep_thr_init() void wsrep_init_startup (bool first) { + sql_print_information("wsrep_init_startup (%d, %s, %s, %s)", + first, wsrep_provider, wsrep_sst_method, + wsrep_cluster_address + ? wsrep_cluster_address : "(none)"); + if (wsrep_init()) unireg_abort(1); wsrep_thr_lock_init( @@ -829,6 +834,7 @@ void wsrep_init_startup (bool first) wsrep_create_appliers(1); if (first && !wsrep_sst_wait()) unireg_abort(1);// wait until SST is completed + sql_print_information("wsrep_init_startup completed"); } These messages should be displayed in the server error log with a Note prefix. If these messages do not appear in the log, then this function is for some reason not being called at all, and you should instead add some output to the two callers in the file sql/mysqld.cc.

Jan Lindström (Inactive) added a comment - 2018-01-09 06:10

One way to get more information is to set wsrep_debug=ON; Could you do that?

Jan Lindström (Inactive) added a comment - 2018-01-09 06:10 One way to get more information is to set wsrep_debug=ON; Could you do that?

Juergen Gnoss added a comment - 2018-01-09 11:39

The trace I added right now is with

wsrep_debug=ON
in my.cnf

What else I did:
on IRC we talked about a database that isn't accessible inside that server.
I used the original mysqld and started that server in standalone
then I dropped that database.
changed config to include galera stuff
run
galera_new_cluster

because I lost trace of all changes I made in my.cnf
I'll attach the last version I used

Juergen Gnoss added a comment - 2018-01-09 11:39 The trace I added right now is with wsrep_debug=ON in my.cnf What else I did: on IRC we talked about a database that isn't accessible inside that server. I used the original mysqld and started that server in standalone then I dropped that database. changed config to include galera stuff run galera_new_cluster because I lost trace of all changes I made in my.cnf I'll attach the last version I used

Marko Mäkelä added a comment - 2018-01-09 12:38 - edited

jgnoss, you mentioned that you would upload the first 128 KiB of the system tablespace, but I cannot find that file either attached to this issue or on ftp.mariadb.com/uploads. (I suspect that Galera is looking for something in the first pages, and will not find it there. I’d want to transplant the content from those pages to an empty database, hoping to reproduce the problem.)

It seems to me that the output that is generated by wsrep_debug=ON would land in the main server error log, not in the DBUG trace log. I did not notice any additional WSREP messages compared to the previous mysqld.trace file.

Can you find out which arguments the wrapper script is passing to mysqld? To do that, I would suggest the following:

sudo su -

mv /usr/sbin/mysqld /usr/sbin/mysqld-real

cat > /usr/sbin/mysqld << EOF

#!/bin/sh

echo mysqld "$@" >> /tmp/mysqld-real-cmd.txt

exec /usr/sbin/mysqld-real "$@"

EOF

chmod +x /usr/sbin/mysqld

This wrapper script should append a line with the arguments to the file /tmp/mysqld-real-cmd.txt each time the server is invoked, and then invoke the server. Aimed with this information, it should hopefully be possible to start the server with similar arguments in gdb. (And maybe the problem is somehow related to the arguments that are being passed.)

Marko Mäkelä added a comment - 2018-01-09 12:38 - edited jgnoss , you mentioned that you would upload the first 128 KiB of the system tablespace, but I cannot find that file either attached to this issue or on ftp.mariadb.com/uploads. (I suspect that Galera is looking for something in the first pages, and will not find it there. I’d want to transplant the content from those pages to an empty database, hoping to reproduce the problem.) It seems to me that the output that is generated by wsrep_debug=ON would land in the main server error log, not in the DBUG trace log. I did not notice any additional WSREP messages compared to the previous mysqld.trace file. Can you find out which arguments the wrapper script is passing to mysqld? To do that, I would suggest the following: sudo su - mv /usr/sbin/mysqld /usr/sbin/mysqld-real cat > /usr/sbin/mysqld << EOF #!/bin/sh echo mysqld "$@" >> /tmp/mysqld-real-cmd.txt exec /usr/sbin/mysqld-real "$@" EOF chmod +x /usr/sbin/mysqld This wrapper script should append a line with the arguments to the file /tmp/mysqld-real-cmd.txt each time the server is invoked, and then invoke the server. Aimed with this information, it should hopefully be possible to start the server with similar arguments in gdb. (And maybe the problem is somehow related to the arguments that are being passed.)

Jan Lindström (Inactive) added a comment - 2018-01-30 10:20

I could not repeat the assertion. I started standalone systemd mariadb server, created few databases and InnoDB tables, stopped mariadb, run galera_new_cluster after configuration change. All started fine for me.

Jan Lindström (Inactive) added a comment - 2018-01-30 10:20 I could not repeat the assertion. I started standalone systemd mariadb server, created few databases and InnoDB tables, stopped mariadb, run galera_new_cluster after configuration change. All started fine for me.

Juergen Gnoss added a comment - 2018-01-30 12:28

Hi Jan,
same here, if I start the galera cluster from a fresh db with a few db's in it, it works fine.
Problem occurs if I start the galera on an existing db, I have.
The db I try is a 260G+ standalone db, upgraded from mysql to mariadb.
Now I try to initiate the first node on that db and that is, where it crashes.

Juergen Gnoss added a comment - 2018-01-30 12:28 Hi Jan, same here, if I start the galera cluster from a fresh db with a few db's in it, it works fine. Problem occurs if I start the galera on an existing db, I have. The db I try is a 260G+ standalone db, upgraded from mysql to mariadb. Now I try to initiate the first node on that db and that is, where it crashes.

Jan Lindström (Inactive) added a comment - 2018-01-30 13:33

What version of mysql db you have ? Did it contain also galera or tc.log ?

Jan Lindström (Inactive) added a comment - 2018-01-30 13:33 What version of mysql db you have ? Did it contain also galera or tc.log ?

Juergen Gnoss added a comment - 2018-01-30 13:45

mysql was 5.7 when I upgraded to mariadb 10.2.12.

The server in question actually is running productive on mariadb 10.2.12

Juergen Gnoss added a comment - 2018-01-30 13:45 mysql was 5.7 when I upgraded to mariadb 10.2.12. The server in question actually is running productive on mariadb 10.2.12

Juergen Gnoss added a comment - 2018-01-30 13:51

sorry, the production server runs

Server version: 10.2.11-MariaDB-10.2.11+maria~jessie-log mariadb.org binary distribution

Juergen Gnoss added a comment - 2018-01-30 13:51 sorry, the production server runs Server version: 10.2.11-MariaDB-10.2.11+maria~jessie-log mariadb.org binary distribution

Jan Lindström (Inactive) added a comment - 2018-01-30 15:42

commit 446b3d356218bff06efe4b3f5df89595cdfe8284
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Tue Jan 30 17:40:40 2018 +0200

~~MDEV-14875~~: galera_new_cluster crashes mysqld when existing server contains databases

Fortify wsrep_hton so that wsrep calls are not done to NULL-pointers.

Jan Lindström (Inactive) added a comment - 2018-01-30 15:42 commit 446b3d356218bff06efe4b3f5df89595cdfe8284 Author: Jan Lindström <jan.lindstrom@mariadb.com> Date: Tue Jan 30 17:40:40 2018 +0200 MDEV-14875 : galera_new_cluster crashes mysqld when existing server contains databases Fortify wsrep_hton so that wsrep calls are not done to NULL-pointers.

Jan Lindström (Inactive) added a comment - 2018-01-30 15:42

Can you test this fix and use debug build ?

Jan Lindström (Inactive) added a comment - 2018-01-30 15:42 Can you test this fix and use debug build ?

Juergen Gnoss added a comment - 2018-01-30 16:05

OK, give me a few min. to test it.

Juergen Gnoss added a comment - 2018-01-30 16:05 OK, give me a few min. to test it.

Juergen Gnoss added a comment - 2018-01-30 16:25

first output of
journalctl -xe

https://pastebin.com/MLn6gQUS

Tell me what I should do to get information you need

Juergen Gnoss added a comment - 2018-01-30 16:25 first output of journalctl -xe https://pastebin.com/MLn6gQUS Tell me what I should do to get information you need

Juergen Gnoss added a comment - 2018-01-30 16:50

output of

coredumpctl gdb -1

https://pastebin.com/DnpY05Ez

Juergen Gnoss added a comment - 2018-01-30 16:50 output of coredumpctl gdb -1 https://pastebin.com/DnpY05Ez

Juergen Gnoss added a comment - 2018-01-30 17:06 - edited

Sorry, I'm on the wrong release.
I just git pulled, forgot to checkout the right version and ended up on 10.3.5

Now it's building on HEAD 446b3d35621

Sorry for that

Juergen Gnoss added a comment - 2018-01-30 17:06 - edited Sorry, I'm on the wrong release. I just git pulled, forgot to checkout the right version and ended up on 10.3.5 Now it's building on HEAD 446b3d35621 Sorry for that

Juergen Gnoss added a comment - 2018-01-30 18:06

I just added the trace of the last

galera_new_cluster

Juergen Gnoss added a comment - 2018-01-30 18:06 I just added the trace of the last galera_new_cluster

Jan Lindström (Inactive) added a comment - 2018-01-31 08:24

Can I have full error log? Something must go wrong at wsrep initialization.

Jan Lindström (Inactive) added a comment - 2018-01-31 08:24 Can I have full error log? Something must go wrong at wsrep initialization.

Juergen Gnoss added a comment - 2018-01-31 11:14

here the files from an actual try to bootstrap the cluster

on my debian 9 install if I use galera_new_cluster the log goes to syslog
so the attached file has the relevant output from syslog

@jan: would it be of help to chat on freenode #maria ?

Juergen Gnoss added a comment - 2018-01-31 11:14 here the files from an actual try to bootstrap the cluster on my debian 9 install if I use galera_new_cluster the log goes to syslog so the attached file has the relevant output from syslog @jan: would it be of help to chat on freenode #maria ?

Jan Lindström (Inactive) added a comment - 2018-01-31 12:04

Yes, it would help, what files you have on datadir ? You should not have galera state files when you create a new cluster, why it can't create normal error log?

Jan Lindström (Inactive) added a comment - 2018-01-31 12:04 Yes, it would help, what files you have on datadir ? You should not have galera state files when you create a new cluster, why it can't create normal error log?

Jan Lindström (Inactive) added a comment - 2018-02-08 09:13

It seems that MariaDB 10.2.13 has fixed this issue:

jan@jan-laptop-asus:/$ sudo /usr/local/bin/galera_new_cluster

jan@jan-laptop-asus:/$ sudo systemctl status mariadb

● mariadb.service - MariaDB 10.2.13 database server

   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled

  Drop-In: /etc/systemd/system/mariadb.service.d

           └─migrated-from-my.cnf-settings.conf

   Active: active (running) since Tue 2018-02-06 16:41:22 EET; 1 day 18h ago

     Docs: man:mysqld(8)

           https://mariadb.com/kb/en/library/systemd/

 Main PID: 2316 (mysqld)

   Status: "Taking your SQL requests now..."

    Tasks: 35 (limit: 4915)

   CGroup: /system.slice/mariadb.service

           └─2316 /usr/sbin/mysqld --wsrep_start_position=00000000-0000-0000-0000-00000

Warning: Journal has been rotated since unit was started. Log output is incomplete or u

Jan Lindström (Inactive) added a comment - 2018-02-08 09:13 It seems that MariaDB 10.2.13 has fixed this issue: jan@jan-laptop-asus:/$ sudo /usr/local/bin/galera_new_cluster jan@jan-laptop-asus:/$ sudo systemctl status mariadb ● mariadb.service - MariaDB 10.2.13 database server Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled Drop-In: /etc/systemd/system/mariadb.service.d └─migrated-from-my.cnf-settings.conf Active: active (running) since Tue 2018-02-06 16:41:22 EET; 1 day 18h ago Docs: man:mysqld(8) https://mariadb.com/kb/en/library/systemd/ Main PID: 2316 (mysqld) Status: "Taking your SQL requests now..." Tasks: 35 (limit: 4915) CGroup: /system.slice/mariadb.service └─2316 /usr/sbin/mysqld --wsrep_start_position=00000000-0000-0000-0000-00000 Warning: Journal has been rotated since unit was started. Log output is incomplete or u

MariaDB Server

galera_new_cluster crashes mysqld when existing server contains databases

Details

Description

Attachments

Attachments

Activity

People

Dates

Git Integration