Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25378

MariaDB 10.4.18 (galera) crash

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.18
    • Fix Version/s: 10.4.19
    • Component/s: Galera
    • Labels:
      None
    • Environment:
      Debian GNU/Linux 8.11 (jessie) x86_64

      Description

      Yesterday my entire cluster crashed and was eventually reinitialised from scratch. In the process we also updated MariaDB on some of the servers. This afternoon I had this crash on node 5, which then took out 4 other nodes, leaving only 1 (everything recovered luckily)

      MariaDB 10.4.18
      galera: 4.7(ree4f10fc)

      The crash that started it seems to have been:
      (some information obfuscated)
      ```
      2021-04-09 14:37:31 1 [ERROR] InnoDB: Conflicting lock on table: ``.`table1` index: PRIMARY that has lock
      RECORD LOCKS space id 267 page no 21525 n bits 120 index PRIMARY of table ``.`table1` trx id 196886519 lock mode S locks rec but not gap
      Record lock, heap no 47 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
      0: len 8; hex 000000000057d9c6; asc W ;;
      1: len 6; hex 000000000000; asc ;;
      2: len 7; hex 80000000000000; asc ;;
      3: len 4; hex 8000166d; asc m;;
      4: SQL NULL;
      5: len 11; hex 424d5f5748415453415050; asc astring;;
      6: len 13; hex 2b393730353637353535383033; asc astring;;
      7: len 1; hex 80; asc ;;

      2021-04-09 14:37:31 1 [ERROR] InnoDB: WSREP state:
      2021-04-09 14:37:31 1 [ERROR] WSREP: Thread BF trx_id: 196886521 thread: 1 seqno: 99055 client_state: exec client_mode: high priority transaction_mode: replaying applier: 1 toi: 0 local: 0 query: UPDATE table1 SET `profileId` = number, `isBot` = 0 WHERE `serviceId` = number AND `channelId` = 'astring' AND `channelUserId` = 'astring'?Jp`
      2021-04-09 14:37:31 1 [ERROR] WSREP: Thread BF trx_id: 196886519 thread: 34 seqno: 99054 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: INSERT INTO table2 (`uuid`, `type`, `conversationId`, `timestamp`, `channelUserId`, `channelAccountId`) VALUES ('guid', 'string', 11731790, 1617971851864, 5546465, 10799)?Jp`
      2021-04-09 14:37:31 0x7f53edbe1700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.18/storage/innobase/lock/lock0lock.cc line 706
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      210409 14:37:31 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.4.18-MariaDB-1:10.4.18+maria~bionic-log
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=129
      max_threads=65537
      thread_count=78
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 403037846 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f533005de18
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f53edbe0d28 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5605d336452e]
      /usr/sbin/mysqld(handle_fatal_signal+0x515)[0x5605d2dda2d5]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f59e4c648a0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f59e3598f47]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f59e359a8b1]
      /usr/sbin/mysqld(+0x5889fd)[0x5605d2ae09fd]
      /usr/sbin/mysqld(+0xa71d49)[0x5605d2fc9d49]
      /usr/sbin/mysqld(+0xa72ecd)[0x5605d2fcaecd]
      /usr/sbin/mysqld(+0xa782d1)[0x5605d2fd02d1]
      /usr/sbin/mysqld(+0xb10c37)[0x5605d3068c37]
      /usr/sbin/mysqld(+0xb16d74)[0x5605d306ed74]
      /usr/sbin/mysqld(+0xa353b2)[0x5605d2f8d3b2]
      /usr/sbin/mysqld(ZN7handler10ha_rnd_posEPhS0+0x132)[0x5605d2ddf562]
      /usr/sbin/mysqld(+0x7a47c3)[0x5605d2cfc7c3]
      /usr/sbin/mysqld(_ZN14Rows_log_event8find_rowEP14rpl_group_info+0x3a0)[0x5605d2edc4e0]
      /usr/sbin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0xe0)[0x5605d2edcb90]
      /usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x237)[0x5605d2ecfcb7]
      /usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x1d8)[0x5605d2d4d488]
      /usr/sbin/mysqld(_ZN22Wsrep_replayer_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0xf4)[0x5605d2d344a4]
      /usr/sbin/mysqld(+0xe83432)[0x5605d33db432]
      /usr/sbin/mysqld(_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_metaERKNS_12const_bufferE+0x155)[0x5605d33dc245]
      /usr/sbin/mysqld(+0xe98af4)[0x5605d33f0af4]
      /usr/lib/galera/libgalera_smm.so(+0x30fb5)[0x7f59e157cfb5]
      /usr/lib/galera/libgalera_smm.so(+0x4845c)[0x7f59e159445c]
      /usr/lib/galera/libgalera_smm.so(+0x1fd75)[0x7f59e156bd75]
      /usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v266replayERKNS_9ws_handleEPNS_21high_priority_serviceE+0x29)[0x5605d33f1459]
      /usr/sbin/mysqld(_ZN20Wsrep_client_service6replayEv+0xf4)[0x5605d2d340d4]
      /usr/sbin/mysqld(_ZN5wsrep11transaction6replayERSt11unique_lockINS_5mutexEE+0x96)[0x5605d33ea996]
      /usr/sbin/mysqld(_ZN5wsrep11transaction15after_statementEv+0x1b4)[0x5605d33ec5a4]
      /usr/sbin/mysqld(_ZN5wsrep12client_state15after_statementEv+0xb9)[0x5605d33cf229]
      /usr/sbin/mysqld(+0x6757ad)[0x5605d2bcd7ad]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x22d2)[0x5605d2bd01e2]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x11c)[0x5605d2bd0aec]
      /usr/sbin/mysqld(_Z11tp_callbackP13TP_connection+0xef)[0x5605d2dab70f]
      /usr/sbin/mysqld(+0xa0b3e0)[0x5605d2f633e0]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f59e4c596db]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f59e367ba3f]
      ```

      other nodes then tried to recover, did quorum checks and start state transfer routines, which caused 4 of them to crash with this stack trace
      ```
      =================================================
      2021-04-09 14:37:44 17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      210409 14:37:44 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.4.18-MariaDB-1:10.4.18+maria~bionic-log
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=91
      max_threads=65537
      thread_count=83
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 403037846 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f74ec000c08
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f75741c9e08 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5632ddda152e]
      /usr/sbin/mysqld(handle_fatal_signal+0x515)[0x5632dd8172d5]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f7b5ac268a0]
      /usr/sbin/mysqld(+0x60843d)[0x5632dd59d43d]
      /usr/sbin/mysqld(_Z11lock_tablesP3THDP10TABLE_LISTjj+0x4a0)[0x5632dd5a3460]
      /usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDRK14DDL_options_stP10TABLE_LISTbjP19Prelocking_strategy+0x8a)[0x5632dd5a4eea]
      /usr/sbin/mysqld(_Z24open_n_lock_single_tableP3THDP10TABLE_LIST13thr_lock_typejP19Prelocking_strategy+0x41)[0x5632dd5a4f91]
      /usr/sbin/mysqld(+0x7f78a7)[0x5632dd78c8a7]
      /usr/sbin/mysqld(+0x7f8324)[0x5632dd78d324]
      /usr/sbin/mysqld(_ZNK12Wsrep_schema12restore_viewEP3THDRKN5wsrep2idE+0x24b)[0x5632dd79039b]
      /usr/sbin/mysqld(_ZN20Wsrep_server_service8log_viewEPN5wsrep21high_priority_serviceERKNS0_4viewE+0x700)[0x5632dd999dc0]
      /usr/sbin/mysqld(_ZN5wsrep12server_state7on_viewERKNS_4viewEPNS_21high_priority_serviceE+0x3c5)[0x5632dde1eef5]
      /usr/sbin/mysqld(+0xe9bc1c)[0x5632dde30c1c]
      /usr/lib/galera/libgalera_smm.so(+0x3ee89)[0x7f7b5754ce89]
      /usr/lib/galera/libgalera_smm.so(+0x3fc98)[0x7f7b5754dc98]
      /usr/lib/galera/libgalera_smm.so(+0x4b188)[0x7f7b57559188]
      /usr/lib/galera/libgalera_smm.so(+0x4b3be)[0x7f7b575593be]
      /usr/lib/galera/libgalera_smm.so(+0x7c848)[0x7f7b5758a848]
      /usr/lib/galera/libgalera_smm.so(+0x7caa2)[0x7f7b5758aaa2]
      /usr/lib/galera/libgalera_smm.so(+0x49890)[0x7f7b57557890]
      /usr/lib/galera/libgalera_smm.so(+0x1c2ab)[0x7f7b5752a2ab]
      /usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0xe)[0x5632dde2e1de]
      /usr/sbin/mysqld(+0x7f758f)[0x5632dd78c58f]
      /usr/sbin/mysqld(_Z15start_wsrep_THDPv+0x413)[0x5632dd77bf33]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f7b5ac1b6db]
      2021-04-09 14:37:46 0 [Note] WSREP: (1cc929aa-b399, 'tcp://0.0.0.0:4567') reconnecting to dd6e91d4-a06b (tcp://10.30.126.13:4567), attempt 0
      2021-04-09 14:37:46 0 [Note] WSREP: (1cc929aa-b399, 'tcp://0.0.0.0:4567') reconnecting to cc37987d-b641 (tcp://10.35.126.15:4567), attempt 0
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f7b5963da3f]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): (null)
      Connection ID (thread ID): 17
      Status: NOT_KILLED
      ```

      This all looks very not normal... I'm not sure how to analyse this.
      The only thing that I could find that looks a little bit similar is MDEV-23851

      Any help with figuring out what this stack trace means would be appreciated

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              jplindst Jan Lindström
              Reporter:
              djhartman Derk-Jan Hartman
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.