MariaDB Server / MDEV-4179

Server crash / memory corruption with MariaDB-Galera

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version: 5.5.28a-galera
    • Fix Version: N/A
    • Component: Galera
    • Labels: None
    • Environment: CentOS 5, Ubuntu 11.10

    Description

      130218 7:14:08 [Note] Slave I/O thread: connected to master 'replication@10.240.170.40:53306',replication started in log 'mysql-bin.001384' at position 796660420
      130218 7:14:08 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see http://kb.askmonty.org/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 5.5.28a-MariaDB-log
      key_buffer_size=32768
      read_buffer_size=32768
      max_used_connections=0
      max_threads=102
      thread_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2511512 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0x56ceb010
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x2b4098605080 thread_stack 0x48000
      130218 7:14:08 InnoDB: Starting shutdown...
      ??:0(my_print_stacktrace)[0xab166e]
      ??:0(handle_fatal_signal)[0x6eb91c]
      :0()[0x37e080eca0]
      ??:0(my_real_read(st_net*, unsigned long*))[0x50609e]
      ??:0(my_net_read)[0x5065ce]
      ??:0(cli_safe_read)[0x6c4416]
      ??:0(read_event(st_mysql*, Master_info*, bool*))[0x5125ee]
      ??:0(handle_slave_io)[0x51eb11]
      ??:0(pfs_spawn_thread)[0x85e608]
      :0()[0x37e080683d]
      :0()[0x37e00d503d]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 2
      Status: NOT_KILLED

      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.

      [root@WPDB03 ~]# *** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption (fast): 0x0000000058a58cd0 ***
      ======= Backtrace: =========
      /lib64/libc.so.6[0x37e0072c3b]
      /lib64/libc.so.6(__libc_malloc+0x6e)[0x37e0073f6e]
      /usr/sbin/mysqld(my_malloc+0x42)[0xaada92]
      /usr/sbin/mysqld(_Z17flush_master_infoP11Master_infobb+0x10b)[0x64269b]
      /usr/sbin/mysqld(handle_slave_io+0x12f9)[0x51ebd9]
      /usr/sbin/mysqld[0x85e608]
      /lib64/libpthread.so.0[0x37e080683d]
      /lib64/libc.so.6(clone+0x6d)[0x37e00d503d]
      ======= Memory map: ========
      00400000-00f4c000 r-xp 00000000 08:03 11798179 /usr/sbin/mysqld
      0114b000-01197000 rw-p 00b4b000 08:03 11798179 /usr/sbin/mysqld
      01197000-019e0000 rw-p 01197000 00:00 0
      01b96000-01c14000 rw-p 00b96000 08:03 11798179 /usr/sbin/mysqld
      1cf5e000-58d51000 rw-p 1cf5e000 00:00 0 [heap]
      37dfc00000-37dfc1c000 r-xp 00000000 08:03 134611002 /lib64/ld-2.5.so
      37dfe1c000-37dfe1d000 r--p 0001c000 08:03 134611002 /lib64/ld-2.5.so
      37dfe1d000-37dfe1e000 rw-p 0001d000 08:03 134611002 /lib64/ld-2.5.so
      37e0000000-37e014f000 r-xp 00000000 08:03 134611021 /lib64/libc-2.5.so
      37e014f000-37e034f000 ---p 0014f000 08:03 134611021 /lib64/libc-2.5.so
      37e034f000-37e0353000 r--p 0014f000 08:03 134611021 /lib64/libc-2.5.so
      37e0353000-37e0354000 rw-p 00153000 08:03 134611021 /lib64/libc-2.5.so
      37e0354000-37e0359000 rw-p 37e0354000 00:00 0
      37e0400000-37e0402000 r-xp 00000000 08:03 134611025 /lib64/libdl-2.5.so
      37e0402000-37e0602000 ---p 00002000 08:03 134611025 /lib64/libdl-2.5.so
      37e0602000-37e0603000 r--p 00002000 08:03 134611025 /lib64/libdl-2.5.so
      37e0603000-37e0604000 rw-p 00003000 08:03 134611025 /lib64/libdl-2.5.so
      37e0800000-37e0816000 r-xp 00000000 08:03 134611023 /lib64/libpthread-2.5.so
      37e0816000-37e0a16000 ---p 00016000 08:03 134611023 /lib64/libpthread-2.5.so
      37e0a16000-37e0a17000 r--p 00016000 08:03 134611023 /lib64/libpthread-2.5.so
      37e0a17000-37e0a18000 rw-p 00017000 08:03 134611023 /lib64/libpthread-2.5.so
      37e0a18000-37e0a1c000 rw-p 37e0a18000 00:00 0
      37e0c00000-37e0c3b000 r-xp 00000000 08:03 134611028 /lib64/libsepol.so.1
      37e0c3b000-37e0e3b000 ---p 0003b000 08:03 134611028 /lib64/libsepol.so.1
      37e0e3b000-37e0e3c000 rw-p 0003b000 08:03 134611028 /lib64/libsepol.so.1
      37e0e3c000-37e0e46000 rw-p 37e0e3c000 00:00 0
      37e1000000-37e1015000 r-xp 00000000 08:03 134611029 /lib64/libselinux.so.1
      37e1015000-37e1215000 ---p 00015000 08:03 134611029 /lib64/libselinux.so.1
      37e1215000-37e1217000 rw-p 00015000 08:03 134611029 /lib64/libselinux.so.1
      37e1217000-37e1218000 rw-p 37e1217000 00:00 0
      37e1400000-37e1482000 r-xp 00000000 08:03 134611040 /lib64/libm-2.5.so
      37e1482000-37e1681000 ---p 00082000 08:03 134611040 /lib64/libm-2.5.so
      37e1681000-37e1682000 r--p 00081000 08:03 134611040 /lib64/libm-2.5.so
      37e1682000-37e1683000 rw-p 00082000 08:03 134611040 /lib64/libm-2.5.so
      37e1800000-37e1814000 r-xp 00000000 08:03 134611031 /lib64/libz.so.1.2.3
      37e1814000-37e1a13000 ---p 00014000 08:03 134611031 /lib64/libz.so.1.2.3
      37e1a13000-37e1a14000 rw-p 00013000 08:03 134611031 /lib64/libz.so.1.2.3
      37e1c00000-37e1c07000 r-xp 00000000 08:03 134611024 /lib64/librt-2.5.so
      37e1c07000-37e1e07000 ---p 00007000 08:03 134611024 /lib64/librt-2.5.so
      37e1e07000-37e1e08000 r--p 00007000 08:03 134611024 /lib64/librt-2.5.so
      37e1e08000-37e1e09000 rw-p 00008000 08:03 134611024 /lib64/librt-2.5.so
      37e2000000-37e200d000 r-xp 00000000 08:03 134611041 /lib64/libgcc_s-4.1.2-20080825.so.1
      37e200d000-37e220d000 ---p 0000d000 08:03 134611041 /lib64/libgcc_s-4.1.2-20080825.so.1
      37e220d000-37e220e000 rw-p 0000d000 08:03 134611041 /lib64/libgcc_s-4.1.2-20080825.so.1
      37e2400000-37e2409000 r-xp 00000000 08:03 134611034 /lib64/libcrypt-2.5.so
      37e2409000-37e2608000 ---p 00009000 08:03 134611034 /lib64/libcrypt-2.5.so
      37e2608000-37e2609000 r--p 00008000 08:03 134611034 /lib64/libcrypt-2.5.so
      37e2609000-37e260a000 rw-p 00009000 08:03 134611034 /lib64/libcrypt-2.5.so
      37e260a000-37e2638000 rw-p 37e260a000 00:00 0
      37e2800000-37e28e6000 r-xp 00000000 08:03 11798342 /usr/lib64/libstdc++.so.6.0.8
      37e28e6000-37e2ae5000 ---p 000e6000 08:03 11798342 /usr/lib64/libstdc++.so.6.0.8
      37e2ae5000-37e2aeb000 r--p 000e5000 08:03 11798342 /usr/lib64/libstdc++.so.6.0.8
      37e2aeb000-37e2aee000 rw-p 000eb000 08:03 11798342 /usr/lib64/libstdc++.so.6.0.8
      37e2aee000-37e2b00000 rw-p 37e2aee000 00:00 0
      37e2c00000-37e2c11000 r-xp 00000000 08:03 134611027 /lib64/libresolv-2.5.so
      37e2c11000-37e2e11000 ---p 00011000 08:03 134611027 /lib64/libresolv-2.5.so
      37e2e11000-37e2e12000 r--p 00011000 08:03 134611027 /lib64/libresolv-2.5.so
      37e2e12000-37e2e13000 rw-p 00012000 08:03 134611027 /lib64/libresolv-2.5.so
      37e2e13000-37e2e15000 rw-p 37e2e13000 00:00 0
      37e3000000-37e3002000 r-xp 00000000 08:03 134611026 /lib64/libkeyutils-1.2.so
      37e3002000-37e3201000 ---p 00002000 08:03 134611026 /lib64/libkeyutils-1.2.so
      37e3201000-37e3202000 rw-p 00001000 08:03 134611026 /lib64/libkeyutils-1.2.so
      37e3400000-37e3402000 r-xp 00000000 08:03 134611030 /lib64/libcom_err.so.2.1
      37e3402000-37e3601000 ---p 00002000 08:03 134611030 /lib64/libcom_err.so.2.1
      37e3601000-37e3602000 rw-p 00001000 08:03 134611030 /lib64/libcom_err.so.2.1
      37e3800000-37e3940000 r-xp 00000000 08:03 134611032 /lib64/libcrypto.so.0.9.8x
      37e3940000-37e3b40000 ---p 00140000 08:03 134611032 /lib64/libcrypto.so.0.9.8x
      37e3b40000-37e3b62000 rw-p 00140000 08:03 134611032 /lib64/libcrypto.so.0.9.8x
      37e3b62000-37e3b66000 rw-p 37e3b62000 00:00 0
      37e4000000-37e4024000 r-xp 00000000 08:03 11798306 /usr/lib64/libk5crypto.so.3.1
      37e4024000-37e4223000 ---p 00024000 08:03 11798306 /usr/lib64/libk5crypto.so.3.1
      37e4223000-37e4225000 rw-p 00023000 08:03 11798306 /usr/lib64/libk5crypto.so.3.1
      37e4400000-37e442c000 r-xp 00000000 08:03 11798308 /usr/lib64/libgssapi_krb5.so.2.2
      37e442c000-37e462c000 ---p 0002c000 08:03 11798308 /usr/lib64/libgssapi_krb5.so.2.2
      37e462c000-37e462e000 rw-p 0002c000 08:03 11798308 /usr/lib64/libgssapi_krb5.so.2.2
      37e4800000-37e4891000 r-xp 00000000 08:03 11798307 /usr/lib64/libkrb5.so.3.3
      37e4891000-37e4a91000 ---p 00091000 08:03 11798307 /usr/lib64/libkrb5.so.3.3
      37e4a91000-37e4a95000 rw-p 00091000 08:03 11798307 /usr/lib64/libkrb5.so.3.3
      37e4c00000-37e4c08000 r-xp 00000000 08:03 11798305 /usr/lib64/libkrb5support.so.0.1
      37e4c08000-37e4e07000 ---p 00008000 08:03 11798305 /usr/lib64/libkrb5support.so.0.1
      37e4e07000-37e4e08000 rw-p 00007000 08:03 11798305 /usr/lib64/libkrb5support.so.0.1
      37e5800000-37e584f000 r-xp 00000000 08:03 134611033 /lib64/libssl.so.0.9.8x
      37e584f000-37e5a4f000 ---p 0004f000 08:03 134611033 /lib64/libssl.so.0.9.8x
      37e5a4f000-37e5a55000 rw-p 0004f000 08:03 134611033 /lib64/libssl.so.0.9.8x
      2b3572849000-2b357284b000 rw-p 2b3572849000 00:00 0
      2b3572853000-2b357285b000 rw-p 2b3572853000 00:00 0
      2b357285b000-2b357285c000 ---p 2b357285b000 00:00 0
      2b357285c000-2b357325c000 rwxp 2b357285c000 00:00 0
      2b357325c000-2b358abef000 rw-p 2b357325c000 00:00 0
      2b358abf6000-2b358ac00000 r-xp 00000000 08:03 134610979 /lib64/libnss_files-2.5.so
      2b358ac00000-2b358adff000 ---p 0000a000 08:03 134610979 /lib64/libnss_files-2.5.so
      2b358adff000-2b358ae00000 r--p 00009000 08:03 134610979 /lib64/libnss_files-2.5.so
      2b358ae00000-2b358ae01000 rw-p 0000a000 08:03 134610979 /lib64/libnss_files-2.5.so
      2b358ae01000-2b3ce9cdc000 rw-p 2b358ae01000 00:00 0
      2b3ce9cdc000-2b3ce9cdd000 ---p 2b3ce9cdc000 00:00 0
      2b3ce9cdd000-2b3cea6dd000 rwxp 2b3ce9cdd000 00:00 0
      2b3cea6dd000-2b3cea6de000 ---p 2b3cea6dd000 00:00 0
      2b3cea6de000-2b3ceb0de000 rwxp 2b3cea6de000 00:00 0
      2b3ceb0de000-2b3ceb0df000 ---p 2b3ceb0de000 00:00 0
      2b3ceb0df000-2b3cebadf000 rwxp 2b3ceb0df000 00:00 0
      2b3cebadf000-2b3cebae0000 ---p 2b3cebadf000 00:00 0
      2b3cebae0000-2b3cec4e0000 rwxp 2b3cebae0000 00:00 0
      2b3cec4e0000-2b3cec4e1000 ---p 2b3cec4e0000 00:00 0
      2b3cec4e1000-2b3cecee1000 rwxp 2b3cec4e1000 00:00 0
      2b3cecee1000-2b3cecee2000 ---p 2b3cecee1000 00:00 0
      2b3cecee2000-2b3ced8e2000 rwxp 2b3cecee2000 00:00 0
      2b3ced8e2000-2b3ced8e3000 ---p 2b3ced8e2000 00:00 0
      2b3ced8e3000-2b3cee2e3000 rwxp 2b3ced8e3000 00:00 0
      2b3cee2e3000-2b3cee2e4000 ---p 2b3cee2e3000 00:00 0
      2b3cee2e4000-2b3ceece4000 rwxp 2b3cee2e4000 00:00 0
      2b3ceece4000-2b3ceece5000 ---p 2b3ceece4000 00:00 0
      2b3ceece5000-2b3cef6e5000 rwxp 2b3ceece5000 00:00 0
      2b3cef6e5000-2b3cef6e6000 ---p 2b3cef6e5000 00:00 0
      2b3cef6e6000-2b3cf00e6000 rwxp 2b3cef6e6000 00:00 0
      2b3cf00e6000-2b3cf00e7000 ---p 2b3cf00e6000 00:00 0
      2b3cf00e7000-2b3cf0ae7000 rwxp 2b3cf00e7000 00:00 0
      2b3cf0ae7000-2b3cf0ae8000 ---p 2b3cf0ae7000 00:00 0
      2b3cf0ae8000-2b3cf14e8000 rwxp 2b3cf0ae8000 00:00 0
      2b3cf14e8000-2b3cf14e9000 ---p 2b3cf14e8000 00:00 0
      2b3cf14e9000-2b3cf1ee9000 rwxp 2b3cf14e9000 00:00 0
      2b3cf1ee9000-2b3cf1eea000 ---p 2b3cf1ee9000 00:00 0
      2b3cf1eea000-2b3cf28ea000 rwxp 2b3cf1eea000 00:00 0
      2b3cf28ea000-2b3cf28eb000 ---p 2b3cf28ea000 00:00 0
      2b3cf28eb000-2b3cf32eb000 rwxp 2b3cf28eb000 00:00 0
      2b3cf32eb000-2b3cf32ec000 ---p 2b3cf32eb000 00:00 0
      2b3cf32ec000-2b3cf3cec000 rwxp 2b3cf32ec000 00:00 0
      2b3cf3cec000-2b3cf3ced000 ---p 2b3cf3cec000 00:00 0
      2b3cf3ced000-2b3cf46ed000 rwxp 2b3cf3ced000 00:00 0
      2b3cf46ed000-2b3cf46ee000 ---p 2b3cf46ed000 00:00 0
      2b3cf46ee000-2b3cf4736000 rwxp 2b3cf46ee000 00:00 0
      2b3cf4736000-2b3cf4737000 ---p 2b3cf4736000 00:00 0
      2b3cf4737000-2b3cf477f000 rwxp 2b3cf4737000 00:00 0
      2b3cf477f000-2b3cf4780000 ---p 2b3cf477f000 00:00 0
      2b3cf4780000-2b3cf47c8000 rwxp 2b3cf4780000 00:00 0
      2b3cf8000000-2b3cf8034000 rw-p 2b3cf8000000 00:00 0
      2b3cf8034000-2b3cfc000000 ---p 2b3cf8034000 00:00 0
      2b3d0c6d8000-2b3d13a5c000 rw-p 2b3d0c6d8000 00:00 0
      7fff16415000-7fff1642a000 rwxp 7ffffffe9000 00:00 0 [stack]
      7fff16436000-7fff16439000 r-xp 7fff16436000 00:00 0 [vdso]
      ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vsyscall]

      Attachments

        Activity

          100% reproducible (3/3 tries):
          1) Load pretty large DB from mysqldump
          2) Configure slave
          3) Wait for it to catch up
          4) Try to restart

          After core dump, the main mysqld is hanging and is not responding. After kill -9, the data is corrupted - the default crash recovery fails and slave doesn't start

          aleksey.sanin Aleksey Sanin (Inactive) added a comment

          Hi Aleksey,

          How big is the dump? Can you upload it to our ftp? (ftp.askmonty.org, you can choose the private section if you wish)
          Please also either upload or attach your server config files.

          Are you using the thread pool in this installation as well? I'm asking because the procedure you described involves a node restart, and, as you already reported in another bug, in 5.5.28a it doesn't work with the thread pool; so I'm wondering if it could be related in any way.

          elenst Elena Stepanova added a comment

          Hi Elena,

          Unfortunately this bug was happening on production data and we cannot share it for various reasons. Plus, the gzipped mysqldump is about 500G, so it's not practical either. I'll get the config files uploaded tomorrow, though the exact same configs (ignoring different memory sizes, thread counts and SSL settings) run fine on our dev environment.

          One more piece of a puzzle. We did another test:
          1) Got stock MariaDB 5.5.28a up and running with the data as slave.
          2) Switched binaries to MariaDB-Galera, didn't change any configs
          3) After restart there is the same crash

          I know it's not much but with 500G of data the setup for the test takes almost 20 hours and it gets into an unrecoverable state afterwards so it's really hard to do any debugging.

          Best,

          Aleksey

          aleksey.sanin Aleksey Sanin (Inactive) added a comment

          Hi Aleksey,

          Yes, you are right, uploading 500Gb wouldn't be practical even if the data was not private.
          Please do provide your configs, and also full error logs from the nodes, we'll try to figure out what's going on from them.

          Meanwhile, back to the initial description..
          When you said that you'd set up a slave, you actually meant the traditional MySQL replication, not Galera cluster? At least from the part of the log that you'd quoted it looks like you have a normal slave:

          130218 7:14:08 [Note] Slave I/O thread: connected to master 'replication@10.240.170.40:53306',replication started in log 'mysql-bin.001384' at position 796660420

          Or are you using both Galera replication and traditional replication on the same server?

          When you restart the slave after the crash, does it again attempt to start from the same position? If so, can you check the binary log to see what kind of event is at that position?

          You also said "After core dump, the main mysqld is hanging and is not responding." – what is main mysqld, do you mean the master?

          Thanks

          elenst Elena Stepanova added a comment

          The mysql config file is attached. It has the galera options enabled, but at the moment of the crash all of those options were disabled. Since there were no other logs, those are the only log files we have.

          Sorry, I was not clear what is going on, let me try again.

          We currently have a Master->Slave setup that we are trying to convert to a Galera cluster. As a first step, we tried to simply swap the MariaDB binaries with MariaDB-Galera binaries on the Slave, using the same my.cnf config (all the galera options disabled), and continue replication as before. This is when we got the crash. Again, I'll stress that there has been no Galera replication just yet. Just regular slave replication.

          Regarding the thread pool, it was already disabled. The other bug I've run into was purely during shutdown; I didn't see any issue with it during normal operation. So this is not a likely candidate.

          During Galera startup, it starts some kind of "recovery" as a separate process:

          mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.XXXXXX

          I believe this is the one that crashed (though I am not 100% sure). The "main" mysqld was still running but did not respond to anything but kill -9. This is not the master (in the replication) but the main mysqld started by mysqld_safe.

          After restarting from the crash, it doesn't even get as far as restarting the slave. The process dies complaining about the usual "InnoDB: Database page corruption on disk or a failed". We tried to recover it with innodb_force_recovery, but after recovery pt-checksum reported numerous diffs, so we decided it was not worth it next time (note that we had run pt-checksum just before the upgrade to the MariaDB-Galera binaries and it was clean).
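
          For reference, this kind of forced-recovery attempt is normally driven from my.cnf; a minimal sketch (the level shown is illustrative, not necessarily what was used here):

          # hypothetical my.cnf fragment for a one-off recovery attempt
          [mysqld]
          # 1 is the least intrusive level; values of 4 and above can permanently corrupt data
          innodb_force_recovery = 1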

          Not sure I answered all your questions, let me know if I missed something

          aleksey.sanin Aleksey Sanin (Inactive) added a comment
          elenst Elena Stepanova added a comment (edited)

          Hi Aleksey,

          Thank you for the information, it clarifies a lot.
          It seems I am able to reproduce the crash now, just need to run some tests to determine the essential part.


          Great! Do you know which options caused it?

          aleksey.sanin Aleksey Sanin (Inactive) added a comment

          I don't think there was anything wrong with your configuration as such, it's just a bug that needs to be fixed.

          Before starting the mysqld server for real, mysqld_safe from the Galera distribution runs it with the wsrep-recover option to obtain the wsrep start position. This happens unconditionally, even if no wsrep options are set in the server configuration.
          It's a fast operation, the server is up and down in a matter of seconds; but unlike, for example, bootstrap, here mysqld is run with all the usual options. So, if the server was previously configured as a traditional slave, the slave also starts. Apparently, this conflicts with the wsrep recovery algorithm, and the conflict causes the crash.

          That said, if I understand the reason for the crash correctly, you could have avoided it if you had skip-slave-start in your server config and started replication manually instead; but since experiments in your environment are so expensive due to the size of the database, it might make sense to wait until the Galera developers confirm the theory.
          Of course, in any case it's just a workaround which is not supposed to be needed under normal conditions.
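
          In config terms, the suggested workaround would look roughly like this (a sketch assuming a standard my.cnf layout; the manual start command is just one way to do it):

          # my.cnf fragment on the node being converted
          [mysqld]
          # keep the slave threads from starting during the wsrep-recover pass
          skip-slave-start

          # once mysqld is fully up, start replication manually, e.g.:
          # mysql -uroot -e "START SLAVE;"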

          The database corruption is another story; I haven't seen it in my tests. Probably it's due to the huge size of your on-disk data or the memory footprint (since you have 28 GB for the buffer pool) that the crashing server hangs, or just takes too long to die, and you have to kill it in a dirty way. Still, it's not particularly clear why the data would have been corrupted. I have a few wild guesses about that, but unfortunately from the log excerpt we can't see what InnoDB had been doing at the moment of the crash, or what kind of event the slave was applying, if any (e.g. if it was an ALTER TABLE, the data could indeed have been damaged). So I'm afraid this part might remain a mystery for the time being.

          elenst Elena Stepanova added a comment

          Thanks for the update. Since I am very interested in figuring this out, and since the whole slave setup was a temporary thing for the upgrade anyway, I am going to ask our guys to try skip-slave-start tonight. In the best case we get a working galera cluster; in the worst case the slave DB server will have to load our data one more time.

          Regarding the data corruption, the 28G value was actually a test. Since the error involves a NULL pointer, we tried to shrink the pool size to see if there was an OOM error in wsrep or something. The actual value is 80G, which covers our "active" data set pretty well.

          Lastly, I've remembered that we've seen a similar issue in our dev environment, though the stack trace was different:

          https://mariadb.atlassian.net/browse/MDEV-4158

          It was the same upgrade process, though it didn't crash 100% of the time. Maybe it is a timing issue somewhere?

          Anyway, thanks again for your help.

          aleksey.sanin Aleksey Sanin (Inactive) added a comment

          Hi Seppo,

          To reproduce the crash, I do the following (on Ubuntu 11.10 64-bit, maria-5.5-galera revno 3378, debug build; can't use 3380 due to MDEV-4185):

          • Start master as
            mysqld --no-defaults --server-id=1 --log-bin=master-bin --binlog-format=row --datadir=<datadir1> --log-bin=master-bin --basedir=<basedir> --port=8306 --loose-lc-messages-dir=<basedir>/sql/share --loose-language=<basedir>/sql/share/english/english --socket=<socket1>
          • Start slave as
            mysqld --no-defaults --server-id=2 --datadir=<datadir2> --basedir=<basedir> --port=8307 --loose-lc-messages-dir=<basedir>/sql/share --loose-language=<basedir>/sql/share/english/english --socket=<socket2>
          • On slave, run
            STOP SLAVE;
            CHANGE MASTER TO master_port=8306, master_host='127.0.0.1', master_user='root';
            START SLAVE;
          • Shutdown slave in the nicest possible way
            mysqladmin -uroot --protocol=tcp --port=8307 shutdown
          • run wsrep recovery (same slave command line as before, but with wsrep-recover), imitating mysqld_safe activities:

          mysqld --no-defaults --server-id=2 --datadir=<datadir2> --basedir=<basedir> --port=8307 --loose-lc-messages-dir=<basedir>/sql/share --loose-language=<basedir>/sql/share/english/english --socket=<socket2> wsrep-recover

          • observe the crash (selected stack traces):

          safe_mutex: Trying to lock unitialized mutex at /home/elenst/maria-5.5-galera/sql/slave.cc, line 4267
          130220 0:48:12 [ERROR] mysqld got signal 6 ;

          Thread 5 (Thread 0x7fb924414740 (LWP 6745)):
          #0 0x00007fb922d4cfc3 in select () at ../sysdeps/unix/syscall-template.S:82
          #1 0x0000000000bd9c0e in os_thread_sleep (tm=100000) at maria-5.5-galera/storage/xtradb/os/os0thread.c:261
          #2 0x0000000000bbb20b in logs_empty_and_mark_files_at_shutdown () at maria-5.5-galera/storage/xtradb/log/log0log.c:3313
          #3 0x0000000000ab38b0 in innobase_shutdown_for_mysql () at maria-5.5-galera/storage/xtradb/srv/srv0start.c:2351
          #4 0x0000000000a3e565 in innobase_end (hton=0x36cd0e0, type=HA_PANIC_CLOSE) at maria-5.5-galera/storage/xtradb/handler/ha_innodb.cc:3259
          #5 0x00000000007e67d3 in ha_finalize_handlerton (plugin=0x36ccc70) at maria-5.5-galera/sql/handler.cc:417
          #6 0x000000000062f778 in plugin_deinitialize (plugin=0x36ccc70, ref_check=false) at maria-5.5-galera/sql/sql_plugin.cc:1168
          #7 0x00000000006319d2 in plugin_shutdown () at maria-5.5-galera/sql/sql_plugin.cc:1953
          #8 0x0000000000564110 in clean_up (print_message=true) at maria-5.5-galera/sql/mysqld.cc:1930
          #9 0x0000000000563f8a in unireg_abort (exit_code=0) at maria-5.5-galera/sql/mysqld.cc:1861
          #10 0x000000000056b70d in mysqld_main (argc=11, argv=0x35e4c80) at maria-5.5-galera/sql/mysqld.cc:5647
          #11 0x0000000000561654 in main (argc=11, argv=0x7fff72fb9dc8) at maria-5.5-galera/sql/main.cc:25

          Thread 1 (Thread 0x7fb9243c9700 (LWP 6766)):
          #0 __pthread_kill (threadid=<optimized out>, signo=6) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:63
          #1 0x0000000000cf57a0 in my_write_core (sig=6) at maria-5.5-galera/mysys/stacktrace.c:457
          #2 0x00000000007e52e0 in handle_fatal_signal (sig=6) at maria-5.5-galera/sql/signal_handler.cc:262
          #3 <signal handler called>
          #4 0x00007fb922ca53a5 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
          #5 0x00007fb922ca8b0b in __GI_abort () at abort.c:92
          #6 0x0000000000cfcbb3 in safe_mutex_lock (mp=0x7fb8fc09d658, my_flags=0, file=0xd50a30 "maria-5.5-galera/sql/slave.cc", line=4267) at maria-5.5-galera/mysys/thr_mutex.c:247
          #7 0x0000000000583859 in inline_mysql_mutex_lock (that=0x7fb8fc09d658, src_file=0xd50a30 "maria-5.5-galera/sql/slave.cc", src_line=4267) at maria-5.5-galera/include/mysql/psi/mysql_thread.h:618
          #8 0x000000000058fa25 in queue_event (mi=0x7fb8fc09c2b0, buf=0x4b672d1 "", event_len=44) at maria-5.5-galera/sql/slave.cc:4267
          #9 0x000000000058cb67 in handle_slave_io (arg=0x7fb8fc09c2b0) at maria-5.5-galera/sql/slave.cc:3267
          #10 0x00007fb9235a4efc in start_thread (arg=0x7fb9243c9700) at pthread_create.c:304
          #11 0x00007fb922d53f4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
          #12 0x0000000000000000 in ?? ()

          elenst Elena Stepanova added a comment
          seppo Seppo Jaakola added a comment

          There are two issues:
          1. the slave IO thread failed with a segfault while reading a replication event
          2. after that, server restart fails due to a bug in wsrep recovery processing

          The issue #2 has been reported in: https://bugs.launchpad.net/codership-mysql/+bug/1132974 and the fix was merged as a part of revision: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3383
          Issue #1 is not possible to analyze further with the given information. I can only see that slave IO thread has corrupted memory, but what caused the memory corruption is an open question.

          guoxiaoqiao Xiaoqiao Guo (Inactive) added a comment (edited)

          My MariaDB cluster (5.5.29 with Galera 23.2.4) also crashed, and I cannot get stack info.

          130425 9:04:25 [Warning] WSREP: last seen seqno below limit for trx source: d07d44b1-acab-11e2-0800-a4b4a09fdbf6 version: 2 local: 1 state: CERTIFYING flags: 129 conn_id: 5 trx_id: 23378603 seqnos (l: 89318, g: 95459, s: 95448, d: -1, ts: 1366851863915610973)
          130425 9:58:02 [ERROR] mysqld got signal 11 ;
          This could be because you hit a bug. It is also possible that this binary
          or one of the libraries it was linked against is corrupt, improperly built,
          or misconfigured. This error can also be caused by malfunctioning hardware.

          To report this bug, see http://kb.askmonty.org/en/reporting-bugs

          We will try our best to scrape up some info that will hopefully help
          diagnose the problem, but since we have already crashed,
          something is definitely wrong and this may fail.

          Server version: 5.5.29-MariaDB-log
          key_buffer_size=134217728
          read_buffer_size=131072
          max_used_connections=25
          max_threads=1026
          thread_count=22
          It is possible that mysqld could use up to
          key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2378518 K bytes of memory
          Hope that's ok; if not, decrease some variables in the equation.

          Thread pointer: 0x0x7f86c00428d0
          Attempting backtrace. You can use the following information to find out
          where mysqld died. If you see no messages after this, something went
          terribly wrong...
          stack_bottom = 0x7f86b412dd68 thread_stack 0x48000
          :0()[0x7f86bb91774e]
          :0()[0x7f86bb55f1ec]
          :0()[0x7f86bac7d500]
          :0()[0x7f86bb4c3d08]
          :0()[0x7f86bb415e65]
          :0()[0x7f86bb41655c]
          :0()[0x7f867a7e02ad]
          :0()[0x7f867a7e362f]
          :0()[0x7f867a7f9b06]
          :0()[0x7f86bb41dfcb]
          :0()[0x7f86bb41ec50]
          :0()[0x7f86bb4209c7]
          :0()[0x7f86bb421283]
          :0()[0x7f86bb4d1c47]
          :0()[0x7f86bb4d1e21]
          :0()[0x7f86bac75851]
          :0()[0x7f86b95e990d]

          Trying to get some variables.
          Some pointers may be invalid and cause the dump to abort.
          Query (0x7f8618004c38): is an invalid pointer
          Connection ID (thread ID): 15
          Status: NOT_KILLED

          Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

          The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
          information that should help you find out what is causing the crash.
          130425 09:58:03 mysqld_safe Number of processes running now: 0
          130425 09:58:03 mysqld_safe WSREP: not restarting wsrep node automatically
          130425 09:58:03 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
          130425 10:03:30 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

          and my.cnf content:

          [mysqld]
          datadir=/var/lib/mysql
          socket=/var/lib/mysql/mysql.sock
          user=mysql

          # Disabling symbolic-links is recommended to prevent assorted security risks
          symbolic-links=0

          wsrep_provider=/usr/lib64/galera/libgalera_smm.so
          wsrep_cluster_address=gcomm://
          wsrep_node_address=gcomm://192.168.100.1
          wsrep_sst_auth=galera:passwd
          wsrep_node_incoming_address=192.168.100.1
          wsrep_sst_receive_address=192.168.100.1

          general_log=1
          general_log_file=/var/log/mysql/mysql.log
          slow_query_log=1
          slow_query_log_file=/var/log/mysql/mysql-slow.log
          long_query_time=1

          skip-name-resolve
          character-set-server=utf8
          max_connection=1024
          #max_connect_errors=1000

          #log-bin=/var/log/mysql/mysql-bin
          #expire_logs_day=3

          innodb_buffer_pool_size = 512M
          innodb_file_per_table = 1

          [mysqld_safe]
          log-error=/var/log/mysqld.log
          pid-file=/var/run/mysqld/mysqld.pid


          It crashed after running for only a short time; it has crashed 3 times in total this week.

          guoxiaoqiao Xiaoqiao Guo (Inactive) added a comment
          seppo Seppo Jaakola added a comment

          @Xiaoqiao, it looks like your issue is not related to the slave processing bug reported in this tracker.

          To troubleshoot your crash further, first make sure that you have binlog_format=ROW.
          And if you can enable core files, that will help in analyzing your crash.
          Since you have the general log enabled, can you see if there is a specific SQL statement which always triggers this crash?
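
          A rough sketch of how to check and enable those, using standard mysql/Linux tooling (paths and the exact way you start the server may differ):

          # confirm row-based binlogging on the node
          mysql -uroot -e "SHOW GLOBAL VARIABLES LIKE 'binlog_format';"

          # allow core dumps: add core-file to the [mysqld] section of my.cnf,
          # and raise the core size limit in the shell that starts mysqld_safe
          ulimit -c unlimited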


          No feedback, so closing

          serg Sergei Golubchik added a comment

          People

            Assignee: sachin.setiya.007 Sachin Setiya (Inactive)
            Reporter: aleksey.sanin Aleksey Sanin (Inactive)
            Votes: 0
            Watchers: 6
