Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14875

galera_new_cluster crashes mysqld when existing server contains databases

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.2.12
    • 10.2.13
    • Debian 9 fresh install
      mariadb from debian repos
      galera from debian repos
    • 10.2.13

    Description

      Try to setup galera cluster from standalone mariadb server that has a few databases.

      From existing mariadb server use /var/lib/mysql as is
      on stock /etc/mysql/my.cnf
      add

      [galera]

      1. Mandatory settings
        wsrep_on=ON
        wsrep_provider=/usr/lib/galera/libgalera_smm.so
        wsrep_cluster_name="tam_cluster"
        wsrep_cluster_address="gcomm://ip1, ip2, ip5"
        wsrep_node_address="ip5"
        wsrep_node_name="node_5"
        wsrep_sst_method=rsync
        binlog_format=row
        default_storage_engine=InnoDB
        innodb_autoinc_lock_mode=2

      run: galera_new_cluster

      result: mysqld crashes

      Attachments

        1. jgnoss_first_pages_001.bin
          128 kB
        2. mdb_dbg.tgz
          302 kB
        3. my.cnf
          6 kB
        4. mysqld_debug.txt
          17 kB
        5. mysqld.debug_outputs.txt
          14 kB
        6. mysqld.trace
          9.42 MB
        7. mysqld.trace
          1.76 MB
        8. mysqld.trace
          1.82 MB
        9. mysqld.trace
          88 kB

        Activity

          jgnoss Juergen Gnoss created issue -
          marko Marko Mäkelä made changes -
          Field Original Value New Value
          Component/s wsrep [ 11500 ]
          marko Marko Mäkelä made changes -
          Fix Version/s 10.2 [ 14601 ]
          Fix Version/s 10.3 [ 22126 ]

          The SIGSEGV occurs because the null pointer wsrep is being dereferenced in the member function call:

                if (WSREP_PROVIDER_EXISTS) {
                  if (wsrep->post_rollback(wsrep, &thd->wsrep_ws_handle))
                  {
                    DBUG_PRINT("wsrep", ("setting rollback fail"));
                    WSREP_ERROR("settting rollback fail: thd: %llu, schema: %s, SQL: %s",
                                (long long)thd->real_id, (thd->db ? thd->db : "(null)"),
                                thd->query());
                  }
                }
          

          Why is the function wsrep_commit() being called at all if wsrep==NULL?

          marko Marko Mäkelä added a comment - The SIGSEGV occurs because the null pointer wsrep is being dereferenced in the member function call: if (WSREP_PROVIDER_EXISTS) { if (wsrep->post_rollback(wsrep, &thd->wsrep_ws_handle)) { DBUG_PRINT( "wsrep" , ( "setting rollback fail" )); WSREP_ERROR( "settting rollback fail: thd: %llu, schema: %s, SQL: %s" , ( long long )thd->real_id, (thd->db ? thd->db : "(null)" ), thd->query()); } } Why is the function wsrep_commit() being called at all if wsrep==NULL ?
          marko Marko Mäkelä made changes -
          Assignee Jan Lindström [ jplindst ]

          Better question is why wsrep==NULL when WSREP_PROVIDER_EXISTS is true

          jplindst Jan Lindström (Inactive) added a comment - Better question is why wsrep==NULL when WSREP_PROVIDER_EXISTS is true

          Could you give your error log ?

          jplindst Jan Lindström (Inactive) added a comment - Could you give your error log ?
          jgnoss Juergen Gnoss added a comment -

          Jan 8 08:03:40 tam05 systemd[1]: Starting MariaDB 10.2.12 database server...
          Jan 8 08:03:44 tam05 mysqld[1287]: WSREP: Failed to start mysqld for wsrep recovery: '2018-01-08 8:03:41 0 [Note] /usr/sbin/mysqld (mysqld 10.3.4-MariaDB-debug-log) starting as process 1488 ...
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Uses event mutexes
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Using Linux native AIO
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Number of pools: 1
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Using SSE2 crc32 instructions
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:41 0 [Note] InnoDB: Completed initialization of buffer pool
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:42 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:42 0 [Note] InnoDB: The log sequence number 1018153280213 in the system tablespace does not match the log sequence number 1018153280524 in the ib_logfiles!
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:42 0 [Note] InnoDB: Last binlog file 'mysql-bin.000002', position 402911975
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Creating shared tablespace for temporary tables
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: Waiting for purge to start
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] InnoDB: 5.7.20 started; log sequence number 1018153280524
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
          Jan 8 08:03:44 tam05 mysqld[1287]: 2018-01-08 8:03:43 0 [Note] Plugin 'FEEDBACK' is disabled.
          Jan 8 08:03:44 tam05 mysqld[1287]: 180108 8:03:43 [ERROR] mysqld got signal 11 ;
          Jan 8 08:03:44 tam05 mysqld[1287]: This could be because you hit a bug. It is also possible that this binary
          Jan 8 08:03:44 tam05 mysqld[1287]: or one of the libraries it was linked against is corrupt, improperly built,
          Jan 8 08:03:44 tam05 mysqld[1287]: or misconfigured. This error can also be caused by malfunctioning hardware.
          Jan 8 08:03:44 tam05 mysqld[1287]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
          Jan 8 08:03:44 tam05 mysqld[1287]: We will try our best to scrape up some info that will hopefully help
          Jan 8 08:03:44 tam05 mysqld[1287]: diagnose the problem, but since we have already crashed,
          Jan 8 08:03:44 tam05 mysqld[1287]: something is definitely wrong and this may fail.
          Jan 8 08:03:44 tam05 mysqld[1287]: Server version: 10.3.4-MariaDB-debug-log
          Jan 8 08:03:44 tam05 mysqld[1287]: key_buffer_size=134217728
          Jan 8 08:03:44 tam05 mysqld[1287]: read_buffer_size=2097152
          Jan 8 08:03:44 tam05 mysqld[1287]: max_used_connections=0
          Jan 8 08:03:44 tam05 mysqld[1287]: max_threads=702
          Jan 8 08:03:44 tam05 mysqld[1287]: thread_count=6
          Jan 8 08:03:44 tam05 mysqld[1287]: It is possible that mysqld could use up to
          Jan 8 08:03:44 tam05 mysqld[1287]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4459921 K bytes of memory
          Jan 8 08:03:44 tam05 mysqld[1287]: Hope that's ok; if not, decrease some variables in the equation.
          Jan 8 08:03:44 tam05 mysqld[1287]: Thread pointer: 0x55a486fb3470
          Jan 8 08:03:44 tam05 mysqld[1287]: Attempting backtrace. You can use the following information to find out
          Jan 8 08:03:44 tam05 mysqld[1287]: where mysqld died. If you see no messages after this, something went
          Jan 8 08:03:44 tam05 mysqld[1287]: terribly wrong...
          Jan 8 08:03:44 tam05 mysqld[1287]: stack_bottom = 0x7ffe02b9ab90 thread_stack 0x49000
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(my_print_stacktrace+0x3d)[0x55a481de86ca]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(handle_fatal_signal+0x3bb)[0x55a481636f2a]
          Jan 8 08:03:44 tam05 mysqld[1287]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f93adb3c0c0]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z12wsrep_commitP10handlertonP3THDb+0x202)[0x55a481567621]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(+0x90ecf0)[0x55a48163acf0]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z19ha_commit_one_phaseP3THDb+0xfe)[0x55a48163abec]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x53a)[0x55a48163a438]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0xc3)[0x55a4814d8c66]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z18close_mysql_tablesP3THD+0x26)[0x55a48130d437]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(+0x66b946)[0x55a481397946]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z11plugin_initPiPPci+0x966)[0x55a4813971fa]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(+0x568cc3)[0x55a481294cc3]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x5c4)[0x55a481295d08]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(main+0x20)[0x55a48128b1c0]
          Jan 8 08:03:44 tam05 systemd[1]: mariadb.service: Control process exited, code=exited status=1
          Jan 8 08:03:44 tam05 mysqld[1287]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93ac69b2b1]
          Jan 8 08:03:44 tam05 mysqld[1287]: /usr/sbin/mysqld(_start+0x2a)[0x55a48128b09a]
          Jan 8 08:03:44 tam05 mysqld[1287]: Trying to get some variables.
          Jan 8 08:03:44 tam05 mysqld[1287]: Some pointers may be invalid and cause the dump to abort.
          Jan 8 08:03:44 tam05 mysqld[1287]: Query (0x0):
          Jan 8 08:03:44 tam05 mysqld[1287]: Connection ID (thread ID): 0
          Jan 8 08:03:44 tam05 mysqld[1287]: Status: NOT_KILLED
          Jan 8 08:03:44 tam05 mysqld[1287]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
          Jan 8 08:03:44 tam05 mysqld[1287]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
          Jan 8 08:03:44 tam05 mysqld[1287]: information that should help you find out what is causing the crash.
          Jan 8 08:03:44 tam05 mysqld[1287]: Writing a core file at /var/lib/mysql/
          Jan 8 08:03:44 tam05 mysqld[1287]: Segmentation fault (core dumped)'
          Jan 8 08:03:44 tam05 systemd[1]: Failed to start MariaDB 10.2.12 database server.
          Jan 8 08:03:44 tam05 systemd[1]: mariadb.service: Unit entered failed state.
          Jan 8 08:03:44 tam05 systemd[1]: mariadb.service: Failed with result 'exit-code'.

          jgnoss Juergen Gnoss added a comment - Jan 8 08:03:40 tam05 systemd [1] : Starting MariaDB 10.2.12 database server... Jan 8 08:03:44 tam05 mysqld [1287] : WSREP: Failed to start mysqld for wsrep recovery: '2018-01-08 8:03:41 0 [Note] /usr/sbin/mysqld (mysqld 10.3.4-MariaDB-debug-log) starting as process 1488 ... Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!! Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Uses event mutexes Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Compressed tables use zlib 1.2.8 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Using Linux native AIO Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Number of pools: 1 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Using SSE2 crc32 instructions Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:41 0 [Note] InnoDB: Completed initialization of buffer pool Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:42 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:42 0 [Note] InnoDB: The log sequence number 1018153280213 in the system tablespace does not match the log sequence number 1018153280524 in the ib_logfiles! Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:42 0 [Note] InnoDB: Last binlog file 'mysql-bin.000002', position 402911975 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: 128 out of 128 rollback segments are active. Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Creating shared tablespace for temporary tables Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: Waiting for purge to start Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] InnoDB: 5.7.20 started; log sequence number 1018153280524 Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery. Jan 8 08:03:44 tam05 mysqld [1287] : 2018-01-08 8:03:43 0 [Note] Plugin 'FEEDBACK' is disabled. Jan 8 08:03:44 tam05 mysqld [1287] : 180108 8:03:43 [ERROR] mysqld got signal 11 ; Jan 8 08:03:44 tam05 mysqld [1287] : This could be because you hit a bug. It is also possible that this binary Jan 8 08:03:44 tam05 mysqld [1287] : or one of the libraries it was linked against is corrupt, improperly built, Jan 8 08:03:44 tam05 mysqld [1287] : or misconfigured. This error can also be caused by malfunctioning hardware. Jan 8 08:03:44 tam05 mysqld [1287] : To report this bug, see https://mariadb.com/kb/en/reporting-bugs Jan 8 08:03:44 tam05 mysqld [1287] : We will try our best to scrape up some info that will hopefully help Jan 8 08:03:44 tam05 mysqld [1287] : diagnose the problem, but since we have already crashed, Jan 8 08:03:44 tam05 mysqld [1287] : something is definitely wrong and this may fail. Jan 8 08:03:44 tam05 mysqld [1287] : Server version: 10.3.4-MariaDB-debug-log Jan 8 08:03:44 tam05 mysqld [1287] : key_buffer_size=134217728 Jan 8 08:03:44 tam05 mysqld [1287] : read_buffer_size=2097152 Jan 8 08:03:44 tam05 mysqld [1287] : max_used_connections=0 Jan 8 08:03:44 tam05 mysqld [1287] : max_threads=702 Jan 8 08:03:44 tam05 mysqld [1287] : thread_count=6 Jan 8 08:03:44 tam05 mysqld [1287] : It is possible that mysqld could use up to Jan 8 08:03:44 tam05 mysqld [1287] : key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4459921 K bytes of memory Jan 8 08:03:44 tam05 mysqld [1287] : Hope that's ok; if not, decrease some variables in the equation. Jan 8 08:03:44 tam05 mysqld [1287] : Thread pointer: 0x55a486fb3470 Jan 8 08:03:44 tam05 mysqld [1287] : Attempting backtrace. You can use the following information to find out Jan 8 08:03:44 tam05 mysqld [1287] : where mysqld died. If you see no messages after this, something went Jan 8 08:03:44 tam05 mysqld [1287] : terribly wrong... Jan 8 08:03:44 tam05 mysqld [1287] : stack_bottom = 0x7ffe02b9ab90 thread_stack 0x49000 Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(my_print_stacktrace+0x3d) [0x55a481de86ca] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(handle_fatal_signal+0x3bb) [0x55a481636f2a] Jan 8 08:03:44 tam05 mysqld [1287] : /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0) [0x7f93adb3c0c0] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z12wsrep_commitP10handlertonP3THDb+0x202) [0x55a481567621] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(+0x90ecf0) [0x55a48163acf0] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z19ha_commit_one_phaseP3THDb+0xfe) [0x55a48163abec] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x53a) [0x55a48163a438] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0xc3) [0x55a4814d8c66] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z18close_mysql_tablesP3THD+0x26) [0x55a48130d437] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(+0x66b946) [0x55a481397946] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z11plugin_initPiPPci+0x966) [0x55a4813971fa] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(+0x568cc3) [0x55a481294cc3] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x5c4) [0x55a481295d08] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(main+0x20) [0x55a48128b1c0] Jan 8 08:03:44 tam05 systemd [1] : mariadb.service: Control process exited, code=exited status=1 Jan 8 08:03:44 tam05 mysqld [1287] : /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f93ac69b2b1] Jan 8 08:03:44 tam05 mysqld [1287] : /usr/sbin/mysqld(_start+0x2a) [0x55a48128b09a] Jan 8 08:03:44 tam05 mysqld [1287] : Trying to get some variables. Jan 8 08:03:44 tam05 mysqld [1287] : Some pointers may be invalid and cause the dump to abort. Jan 8 08:03:44 tam05 mysqld [1287] : Query (0x0): Jan 8 08:03:44 tam05 mysqld [1287] : Connection ID (thread ID): 0 Jan 8 08:03:44 tam05 mysqld [1287] : Status: NOT_KILLED Jan 8 08:03:44 tam05 mysqld [1287] : Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on Jan 8 08:03:44 tam05 mysqld [1287] : The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains Jan 8 08:03:44 tam05 mysqld [1287] : information that should help you find out what is causing the crash. Jan 8 08:03:44 tam05 mysqld [1287] : Writing a core file at /var/lib/mysql/ Jan 8 08:03:44 tam05 mysqld [1287] : Segmentation fault (core dumped)' Jan 8 08:03:44 tam05 systemd [1] : Failed to start MariaDB 10.2.12 database server. Jan 8 08:03:44 tam05 systemd [1] : mariadb.service: Unit entered failed state. Jan 8 08:03:44 tam05 systemd [1] : mariadb.service: Failed with result 'exit-code'.
          jgnoss Juergen Gnoss added a comment -

          If it helps, that happens if I use
          >galera_new_cluster
          to bootstrap the first node.

          If I bootstrap the first node with
          >mysqld -umysql --wsrep-new-cluster

          The first node comes up, other nodes can connect and sync.
          But I cannot afterwards shut them down and restart.

          jgnoss Juergen Gnoss added a comment - If it helps, that happens if I use >galera_new_cluster to bootstrap the first node. If I bootstrap the first node with >mysqld -umysql --wsrep-new-cluster The first node comes up, other nodes can connect and sync. But I cannot afterwards shut them down and restart.
          jgnoss Juergen Gnoss made changes -
          Attachment mysqld.trace [ 44869 ]

          The macro WSREP_PROVIDER_EXISTS compares a string, not the wsrep pointer:

          (wsrep_provider && strncasecmp(wsrep_provider, WSREP_NONE, FN_REFLEN))
          

          This string is not being updated atomically with wsrep. One example where this might fail is in the error handling of the function wsrep_provider_init().
          Wouldn’t it be more reliable to just test wsrep instead of wsrep_provider?

          Is either variable protected by any mutex? It does not seem so to me. What would serialize multiple concurrent executions of wsrep_provider_update(), for example?

          marko Marko Mäkelä added a comment - The macro WSREP_PROVIDER_EXISTS compares a string, not the wsrep pointer: (wsrep_provider && strncasecmp(wsrep_provider, WSREP_NONE, FN_REFLEN)) This string is not being updated atomically with wsrep . One example where this might fail is in the error handling of the function wsrep_provider_init() . Wouldn’t it be more reliable to just test wsrep instead of wsrep_provider ? Is either variable protected by any mutex? It does not seem so to me. What would serialize multiple concurrent executions of wsrep_provider_update() , for example?
          jgnoss Juergen Gnoss made changes -
          Attachment mysqld.trace [ 44870 ]

          In the longer mysqld.trace most references to wsrep are from thread T@1, which is the initial thread. The trace is similar to the earlier error log that leads to a crash.
          Unfortunately most of the WSREP code is lacking DBUG_ENTER and DBUG_RETURN instrumentation. Because of this, it is hard to tell which subsystems init_server_components() actually initialized after wsrep_thr_init(). Some of the WSREP initialization is conditional.

          jgnoss, one thing that you should be able to do in your setup is that you add more instrumentation to the server startup. Something like this, possibly also in other places:

          diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc
          index 173e955e44b..cd6a3497999 100644
          --- a/sql/wsrep_mysqld.cc
          +++ b/sql/wsrep_mysqld.cc
          @@ -796,6 +796,11 @@ void wsrep_thr_init()
           
           void wsrep_init_startup (bool first)
           {
          +  sql_print_information("wsrep_init_startup (%d, %s, %s, %s)",
          +                        first, wsrep_provider, wsrep_sst_method,
          +                        wsrep_cluster_address
          +                        ? wsrep_cluster_address : "(none)");
          +
             if (wsrep_init()) unireg_abort(1);
           
             wsrep_thr_lock_init(
          @@ -829,6 +834,7 @@ void wsrep_init_startup (bool first)
             wsrep_create_appliers(1);
           
             if (first && !wsrep_sst_wait()) unireg_abort(1);// wait until SST is completed
          +  sql_print_information("wsrep_init_startup completed");
           }
           
           
          

          These messages should be displayed in the server error log with a Note prefix. If these messages do not appear in the log, then this function is for some reason not being called at all, and you should instead add some output to the two callers in the file sql/mysqld.cc.

          marko Marko Mäkelä added a comment - In the longer mysqld.trace most references to wsrep are from thread T@1 , which is the initial thread. The trace is similar to the earlier error log that leads to a crash. Unfortunately most of the WSREP code is lacking DBUG_ENTER and DBUG_RETURN instrumentation. Because of this, it is hard to tell which subsystems init_server_components() actually initialized after wsrep_thr_init() . Some of the WSREP initialization is conditional. jgnoss , one thing that you should be able to do in your setup is that you add more instrumentation to the server startup. Something like this, possibly also in other places: diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc index 173e955e44b..cd6a3497999 100644 --- a/sql/wsrep_mysqld.cc +++ b/sql/wsrep_mysqld.cc @@ -796,6 +796,11 @@ void wsrep_thr_init() void wsrep_init_startup (bool first) { + sql_print_information("wsrep_init_startup (%d, %s, %s, %s)", + first, wsrep_provider, wsrep_sst_method, + wsrep_cluster_address + ? wsrep_cluster_address : "(none)"); + if (wsrep_init()) unireg_abort(1); wsrep_thr_lock_init( @@ -829,6 +834,7 @@ void wsrep_init_startup (bool first) wsrep_create_appliers(1); if (first && !wsrep_sst_wait()) unireg_abort(1);// wait until SST is completed + sql_print_information("wsrep_init_startup completed"); } These messages should be displayed in the server error log with a Note prefix. If these messages do not appear in the log, then this function is for some reason not being called at all, and you should instead add some output to the two callers in the file sql/mysqld.cc.

          One way to get more information is to set wsrep_debug=ON; Could you do that?

          jplindst Jan Lindström (Inactive) added a comment - One way to get more information is to set wsrep_debug=ON; Could you do that?
          jgnoss Juergen Gnoss made changes -
          Attachment mysqld.trace [ 44871 ]
          jgnoss Juergen Gnoss added a comment -

          The trace I added right now is with

          wsrep_debug=ON
          in my.cnf

          What else I did:
          on IRC we talked about a database that isn't accessible inside that server.
          I used the original mysqld and started that server in standalone
          then I dropped that database.
          changed config to include galera stuff
          run
          galera_new_cluster

          because I lost trace of all changes I made in my.cnf
          I'll attach the last version I used

          jgnoss Juergen Gnoss added a comment - The trace I added right now is with wsrep_debug=ON in my.cnf What else I did: on IRC we talked about a database that isn't accessible inside that server. I used the original mysqld and started that server in standalone then I dropped that database. changed config to include galera stuff run galera_new_cluster because I lost trace of all changes I made in my.cnf I'll attach the last version I used
          jgnoss Juergen Gnoss made changes -
          Attachment my.cnf [ 44872 ]
          marko Marko Mäkelä added a comment - - edited

          jgnoss, you mentioned that you would upload the first 128 KiB of the system tablespace, but I cannot find that file either attached to this issue or on ftp.mariadb.com/uploads. (I suspect that Galera is looking for something in the first pages, and will not find it there. I’d want to transplant the content from those pages to an empty database, hoping to reproduce the problem.)

          It seems to me that the output that is generated by wsrep_debug=ON would land in the main server error log, not in the DBUG trace log. I did not notice any additional WSREP messages compared to the previous mysqld.trace file.

          Can you find out which arguments the wrapper script is passing to mysqld? To do that, I would suggest the following:

          sudo su -
          mv /usr/sbin/mysqld /usr/sbin/mysqld-real
          cat > /usr/sbin/mysqld << EOF
          #!/bin/sh
          echo mysqld "$@" >> /tmp/mysqld-real-cmd.txt
          exec /usr/sbin/mysqld-real "$@"
          EOF
          chmod +x /usr/sbin/mysqld
          

          This wrapper script should append a line with the arguments to the file /tmp/mysqld-real-cmd.txt each time the server is invoked, and then invoke the server. Aimed with this information, it should hopefully be possible to start the server with similar arguments in gdb. (And maybe the problem is somehow related to the arguments that are being passed.)

          marko Marko Mäkelä added a comment - - edited jgnoss , you mentioned that you would upload the first 128 KiB of the system tablespace, but I cannot find that file either attached to this issue or on ftp.mariadb.com/uploads. (I suspect that Galera is looking for something in the first pages, and will not find it there. I’d want to transplant the content from those pages to an empty database, hoping to reproduce the problem.) It seems to me that the output that is generated by wsrep_debug=ON would land in the main server error log, not in the DBUG trace log. I did not notice any additional WSREP messages compared to the previous mysqld.trace file. Can you find out which arguments the wrapper script is passing to mysqld? To do that, I would suggest the following: sudo su - mv /usr/sbin/mysqld /usr/sbin/mysqld-real cat > /usr/sbin/mysqld << EOF #!/bin/sh echo mysqld "$@" >> /tmp/mysqld-real-cmd.txt exec /usr/sbin/mysqld-real "$@" EOF chmod +x /usr/sbin/mysqld This wrapper script should append a line with the arguments to the file /tmp/mysqld-real-cmd.txt each time the server is invoked, and then invoke the server. Aimed with this information, it should hopefully be possible to start the server with similar arguments in gdb. (And maybe the problem is somehow related to the arguments that are being passed.)
          jgnoss Juergen Gnoss made changes -
          Attachment jgnoss_first_pages_001.bin [ 44873 ]
          jgnoss Juergen Gnoss made changes -
          Attachment mysqld.debug_outputs.txt [ 44875 ]

          I could not repeat the assertion. I started standalone systemd mariadb server, created few databases and InnoDB tables, stopped mariadb, run galera_new_cluster after configuration change. All started fine for me.

          jplindst Jan Lindström (Inactive) added a comment - I could not repeat the assertion. I started standalone systemd mariadb server, created few databases and InnoDB tables, stopped mariadb, run galera_new_cluster after configuration change. All started fine for me.
          jgnoss Juergen Gnoss added a comment -

          Hi Jan,
          same here, if I start the galera cluster from a fresh db with a few db's in it, it works fine.
          Problem occurs if I start the galera on an existing db, I have.
          The db I try is a 260G+ standalone db, upgraded from mysql to mariadb.
          Now I try to initiate the first node on that db and that is, where it crashes.

          jgnoss Juergen Gnoss added a comment - Hi Jan, same here, if I start the galera cluster from a fresh db with a few db's in it, it works fine. Problem occurs if I start the galera on an existing db, I have. The db I try is a 260G+ standalone db, upgraded from mysql to mariadb. Now I try to initiate the first node on that db and that is, where it crashes.

          What version of mysql db you have ? Did it contain also galera or tc.log ?

          jplindst Jan Lindström (Inactive) added a comment - What version of mysql db you have ? Did it contain also galera or tc.log ?
          jgnoss Juergen Gnoss added a comment -

          mysql was 5.7 when I upgraded to mariadb 10.2.12.

          The server in question actually is running productive on mariadb 10.2.12

          jgnoss Juergen Gnoss added a comment - mysql was 5.7 when I upgraded to mariadb 10.2.12. The server in question actually is running productive on mariadb 10.2.12
          jgnoss Juergen Gnoss added a comment -

          sorry, the production server runs

          Server version: 10.2.11-MariaDB-10.2.11+maria~jessie-log mariadb.org binary distribution

          jgnoss Juergen Gnoss added a comment - sorry, the production server runs Server version: 10.2.11-MariaDB-10.2.11+maria~jessie-log mariadb.org binary distribution

          commit 446b3d356218bff06efe4b3f5df89595cdfe8284
          Author: Jan Lindström <jan.lindstrom@mariadb.com>
          Date: Tue Jan 30 17:40:40 2018 +0200

          MDEV-14875: galera_new_cluster crashes mysqld when existing server contains databases

          Fortify wsrep_hton so that wsrep calls are not done to NULL-pointers.

          jplindst Jan Lindström (Inactive) added a comment - commit 446b3d356218bff06efe4b3f5df89595cdfe8284 Author: Jan Lindström <jan.lindstrom@mariadb.com> Date: Tue Jan 30 17:40:40 2018 +0200 MDEV-14875 : galera_new_cluster crashes mysqld when existing server contains databases Fortify wsrep_hton so that wsrep calls are not done to NULL-pointers.

          Can you test this fix and use debug build ?

          jplindst Jan Lindström (Inactive) added a comment - Can you test this fix and use debug build ?
          jgnoss Juergen Gnoss added a comment -

          OK, give me a few min. to test it.

          jgnoss Juergen Gnoss added a comment - OK, give me a few min. to test it.
          jgnoss Juergen Gnoss added a comment -

          first output of
          journalctl -xe

          https://pastebin.com/MLn6gQUS

          Tell me what I should do to get information you need

          jgnoss Juergen Gnoss added a comment - first output of journalctl -xe https://pastebin.com/MLn6gQUS Tell me what I should do to get information you need
          jgnoss Juergen Gnoss added a comment -

          output of

          coredumpctl gdb -1

          https://pastebin.com/DnpY05Ez

          jgnoss Juergen Gnoss added a comment - output of coredumpctl gdb -1 https://pastebin.com/DnpY05Ez
          jgnoss Juergen Gnoss added a comment - - edited

          Sorry, I'm on the wrong release.
          I just git pulled, forgot to checkout the right version and ended up on 10.3.5

          Now it's building on HEAD 446b3d35621

          Sorry for that

          jgnoss Juergen Gnoss added a comment - - edited Sorry, I'm on the wrong release. I just git pulled, forgot to checkout the right version and ended up on 10.3.5 Now it's building on HEAD 446b3d35621 Sorry for that
          jgnoss Juergen Gnoss made changes -
          Attachment mysqld.trace [ 45100 ]
          jgnoss Juergen Gnoss added a comment -

          I just added the trace of the last

          galera_new_cluster

          jgnoss Juergen Gnoss added a comment - I just added the trace of the last galera_new_cluster

          Can I have full error log? Something must go wrong at wsrep initialization.

          jplindst Jan Lindström (Inactive) added a comment - Can I have full error log? Something must go wrong at wsrep initialization.
          jgnoss Juergen Gnoss made changes -
          Attachment mdb_dbg.tgz [ 45105 ]
          jgnoss Juergen Gnoss added a comment -

          here the files from an actual try to bootstrap the cluster

          on my debian 9 install if I use galera_new_cluster the log goes to syslog
          so the attached file has the relevant output from syslog

          @jan: would it be of help to chat on freenode #maria ?

          jgnoss Juergen Gnoss added a comment - here the files from an actual try to bootstrap the cluster on my debian 9 install if I use galera_new_cluster the log goes to syslog so the attached file has the relevant output from syslog @jan: would it be of help to chat on freenode #maria ?

          Yes, it would help, what files you have on datadir ? You should not have galera state files when you create a new cluster, why it can't create normal error log?

          jplindst Jan Lindström (Inactive) added a comment - Yes, it would help, what files you have on datadir ? You should not have galera state files when you create a new cluster, why it can't create normal error log?
          serg Sergei Golubchik made changes -
          Priority Blocker [ 1 ] Critical [ 2 ]
          jplindst Jan Lindström (Inactive) made changes -
          Status Open [ 1 ] Confirmed [ 10101 ]
          serg Sergei Golubchik made changes -
          Sprint 10.2.13 [ 228 ]
          jplindst Jan Lindström (Inactive) made changes -
          Assignee Jan Lindström [ jplindst ] Seppo Jaakola [ seppo ]
          jplindst Jan Lindström (Inactive) made changes -
          Assignee Seppo Jaakola [ seppo ] Jan Lindström [ jplindst ]

          It seems that MariaDB 10.2.13 has fixed this issue:

          jan@jan-laptop-asus:/$ sudo /usr/local/bin/galera_new_cluster 
          jan@jan-laptop-asus:/$ sudo systemctl status mariadb
          ● mariadb.service - MariaDB 10.2.13 database server
             Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled
            Drop-In: /etc/systemd/system/mariadb.service.d
                     └─migrated-from-my.cnf-settings.conf
             Active: active (running) since Tue 2018-02-06 16:41:22 EET; 1 day 18h ago
               Docs: man:mysqld(8)
                     https://mariadb.com/kb/en/library/systemd/
           Main PID: 2316 (mysqld)
             Status: "Taking your SQL requests now..."
              Tasks: 35 (limit: 4915)
             CGroup: /system.slice/mariadb.service
                     └─2316 /usr/sbin/mysqld --wsrep_start_position=00000000-0000-0000-0000-00000
           
          Warning: Journal has been rotated since unit was started. Log output is incomplete or u
          

          jplindst Jan Lindström (Inactive) added a comment - It seems that MariaDB 10.2.13 has fixed this issue: jan@jan-laptop-asus:/$ sudo /usr/local/bin/galera_new_cluster jan@jan-laptop-asus:/$ sudo systemctl status mariadb ● mariadb.service - MariaDB 10.2.13 database server Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled Drop-In: /etc/systemd/system/mariadb.service.d └─migrated-from-my.cnf-settings.conf Active: active (running) since Tue 2018-02-06 16:41:22 EET; 1 day 18h ago Docs: man:mysqld(8) https://mariadb.com/kb/en/library/systemd/ Main PID: 2316 (mysqld) Status: "Taking your SQL requests now..." Tasks: 35 (limit: 4915) CGroup: /system.slice/mariadb.service └─2316 /usr/sbin/mysqld --wsrep_start_position=00000000-0000-0000-0000-00000   Warning: Journal has been rotated since unit was started. Log output is incomplete or u
          jplindst Jan Lindström (Inactive) made changes -
          Status Confirmed [ 10101 ] In Progress [ 3 ]
          jplindst Jan Lindström (Inactive) made changes -
          issue.field.resolutiondate 2018-02-08 09:15:16.0 2018-02-08 09:15:16.574
          jplindst Jan Lindström (Inactive) made changes -
          Component/s Storage Engine - InnoDB [ 10129 ]
          Component/s Admin statements [ 11400 ]
          Fix Version/s 10.2.13 [ 22910 ]
          Fix Version/s 10.2 [ 14601 ]
          Fix Version/s 10.3 [ 22126 ]
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 84742 ] MariaDB v4 [ 153513 ]

          People

            jplindst Jan Lindström (Inactive)
            jgnoss Juergen Gnoss
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.