MDEV-24437 (project: MariaDB Server)

Galera 4 read node crashes after DML statement from writer node

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 10.5.8
    • Fix Version/s: N/A
    • Component/s: Galera
    • Labels: None

    Description

      In a 3-node cluster, both read nodes crashed after a DML statement from the writer node. The nodes did not send a SELF-LEAVE, so the last remaining node became non-Primary and the entire cluster was down.

      This cluster had run without major issues for some time. My guess is that changing `wsrep_slave_threads` to 24, combined with high load from batch processing, caused this issue to appear. I will try changing it back to 1 (the default) to see whether the issue persists. For now, the system runs stably as a single node.
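      For reference, a minimal sketch of the planned revert, assuming the option is set in a server option file on each node (the file name, location, and option group vary by installation and distribution). `wsrep_slave_threads` is a dynamic variable, so it can also be changed at runtime with `SET GLOBAL wsrep_slave_threads = 1;`; the option file entry keeps the value across restarts.

      ```ini
      # Server option file on each Galera node (path varies by installation)
      [mysqld]
      # Revert the number of Galera applier threads to the default of 1
      wsrep_slave_threads = 1
      ```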

      When the crash happened, the crashed nodes were NOT being used for reads. The only traffic they received was write transactions from the writer node.

      This is a snippet from the log on the slave nodes that crashed:

      root@customer-db02:~# zgrep mariadb /var/log/syslog|more
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: 2020-12-18  2:57:57 15 [ERROR] InnoDB: Conflicting lock on table: `customer`.`customer_customer_table` index: PRIMARY that has lock
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: RECORD LOCKS space id 134 page no 84422 n bits 296 index PRIMARY of table `customer`.`customer_ingestiontime` trx id 54797528 lock mode S
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
      Dec 18 02:57:57 customer-db02 mariadbd[159135]:  0: len 8; hex 73757072656d756d; asc supremum;;
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: 2020-12-18  2:57:57 15 [ERROR] InnoDB: WSREP state:
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: 2020-12-18  2:57:57 15 [ERROR] WSREP: Thread BF trx_id: 54797528 thread: 9 seqno: 27251360 client_state: exec client_mode: high priority transaction_mode: committing applier: 1 toi: 0 local: 0 query: NULL
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: 2020-12-18 02:57:57 0x7ff404137700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.8/storage/innobase/lock/lock0lock.cc line 655
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: InnoDB: We intentionally generate a memory trap.
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: InnoDB: If you get repeated assertion failures or crashes, even
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: InnoDB: immediately after the mysqld startup, there may be
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: InnoDB: corruption in the InnoDB tablespace. Please refer to
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: InnoDB: about forcing recovery.
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: 201218  2:57:57 [ERROR] mysqld got signal 6 ;
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: This could be because you hit a bug. It is also possible that this binary
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: or one of the libraries it was linked against is corrupt, improperly built,
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: or misconfigured. This error can also be caused by malfunctioning hardware.
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: We will try our best to scrape up some info that will hopefully help
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: diagnose the problem, but since we have already crashed,
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: something is definitely wrong and this may fail.
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: Server version: 10.5.8-MariaDB-1:10.5.8+maria~focal-log
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: key_buffer_size=134217728
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: read_buffer_size=131072
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: max_used_connections=4
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: max_threads=3002
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: thread_count=31
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: It is possible that mysqld could use up to
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6739644 K  bytes of memory
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: Hope that's ok; if not, decrease some variables in the equation.
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: Thread pointer: 0x7fef88000c58
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: Attempting backtrace. You can use the following information to find out
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: where mysqld died. If you see no messages after this, something went
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: terribly wrong...
      Dec 18 02:57:57 customer-db02 mariadbd[159135]: stack_bottom = 0x7ff404136d98 thread_stack 0x49000
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x5640ca7e1692]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: Printing to addr2line failed
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x5640ca238e45]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7ff540fe53c0]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7ff540aec18b]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7ff540acb859]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0x62da5c)[0x5640c9efda5c]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0x608e17)[0x5640c9ed8e17]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xcb7b3a)[0x5640ca587b3a]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xcbe09b)[0x5640ca58e09b]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xdc68f8)[0x5640ca6968f8]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xd19be5)[0x5640ca5e9be5]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xd1b1ca)[0x5640ca5eb1ca]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xd1ba44)[0x5640ca5eba44]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xd2c711)[0x5640ca5fc711]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xc70735)[0x5640ca540735]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_ZN7handler12ha_write_rowEPKh+0x188)[0x5640ca247bd8]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_ZN14Rows_log_event9write_rowEP14rpl_group_infob+0x184)[0x5640ca3621b4]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_ZN20Write_rows_log_event11do_exec_rowEP14rpl_group_info+0x81)[0x5640ca3627a1]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x27f)[0x5640ca35775f]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x1e9)[0x5640ca50ae79]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_ZN21Wsrep_applier_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0xab)[0x5640ca4f386b]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xf94270)[0x5640ca864270]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_metaERKNS_12const_bufferE+0xc1)[0x5640ca8652c1]
      Dec 18 02:57:58 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xfa4a3c)[0x5640ca874a3c]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x1b6de5)[0x7ff5407c7de5]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x1fb6a2)[0x7ff54080c6a2]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x203548)[0x7ff540814548]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x1d5e33)[0x7ff5407e6e33]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x1d67fb)[0x7ff5407e77fb]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x1d6a02)[0x7ff5407e7a02]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x200610)[0x7ff540811610]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/lib/galera/libgalera_smm.so(+0x21cc01)[0x7ff54082dc01]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x5640ca874fc2]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xc3cf37)[0x5640ca50cf37]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(_Z15start_wsrep_THDPv+0x267)[0x5640ca4fe5e7]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /usr/sbin/mariadbd(+0xbbe266)[0x5640ca48e266]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7ff540fd9609]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7ff540bc8293]
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Trying to get some variables.
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Some pointers may be invalid and cause the dump to abort.
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Query (0x7ff4e9fe1630): insert into customer_customer_table (created, created_by, modified, modified_by, is_absent, customer_id, completed_at, completed_by, ingestion_time, consumer_id) values ('2020-12-18 02:57:52.47', 'Anonymous', '2020-12-18 02:57:52.47', 'Anonymous', 0, NULL, NULL, NULL, '2021-01-09 22:00:00', 41769)
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Connection ID (thread ID): 15
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Status: NOT_KILLED
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: information that should help you find out what is causing the crash.
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Writing a core file...
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Working directory at /var/lib/mysql
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Resource Limits:
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Limit                     Soft Limit           Hard Limit           Units
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max cpu time              unlimited            unlimited            seconds
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max file size             unlimited            unlimited            bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max data size             unlimited            unlimited            bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max stack size            8388608              unlimited            bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max core file size        0                    unlimited            bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max resident set          unlimited            unlimited            bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max processes             96078                96078                processes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max open files            16384                16384                files
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max locked memory         65536                65536                bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max address space         unlimited            unlimited            bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max file locks            unlimited            unlimited            locks
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max pending signals       96078                96078                signals
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max msgqueue size         819200               819200               bytes
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max nice priority         0                    0
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max realtime priority     0                    0
      Dec 18 02:57:59 customer-db02 mariadbd[159135]: Max realtime timeout      unlimited            unlimited            us
      

    People

      Assignee: Unassigned
      Reporter: Michaël de groot (michaeldg)
      Votes: 0
      Watchers: 2