[MDEV-19775] Bug in WSREP/Galera with virtual columns (keeps crashing on startup) Created: 2019-06-16  Updated: 2021-12-23  Resolved: 2021-12-23

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB, Virtual Columns
Affects Version/s: 10.3.15
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Piotr Nizynski
Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-14643 InnoDB: Failing assertion: !cursor->... Closed
relates to MDEV-22759 Failing assertion: !cursor->index->is... Confirmed
relates to MDEV-17466 Virtual column value not available du... Closed
relates to MDEV-19338 InnoDB: Failing assertion: !cursor->i... Closed

 Description   

2019-06-16 22:48:34 2 [ERROR] InnoDB: Record in index `serverBitmap-5` of table `xxxxx`.`mt_log` was not found on update: TUPLE (info_bits=0, 2 fields):

{NULL,[4] (0x000000A9)}

at: COMPACT RECORD(info_bits=0, 1 fields):

{[8]infimum (0x696E66696D756D00)}

2019-06-16 22:48:34 0x7f0e341d6700 InnoDB: Assertion failure in file /build/mariadb-10.3-A1CJUH/mariadb-10.3-10.3.15/storage/innobase/row/row0ins.cc line 270
InnoDB: Failing assertion: !cursor->index->is_committed()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
190616 22:48:34 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.3.15-MariaDB-1-log
key_buffer_size=16384
read_buffer_size=131072
max_used_connections=0
max_threads=258
thread_count=9
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 567206 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f0e10000c08
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f0e341d5e08 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x56186e8aaa5e]
/usr/sbin/mysqld(handle_fatal_signal+0x54d)[0x56186e414fcd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f0e395ca730]
2019-06-16 22:48:34 11 [Warning] Access denied for user 'debian-sys-maint'@'localhost' (using password: TAK)
2019-06-16 22:48:34 12 [Warning] Access denied for user 'debian-sys-maint'@'localhost' (using password: TAK)
2019-06-16 22:48:34 13 [Warning] Access denied for user 'debian-sys-maint'@'localhost' (using password: TAK)
linux/raise.c:51(__GI_raise)[0x7f0e38b877bb]
stdlib/abort.c:81(__GI_abort)[0x7f0e38b72535]
/usr/sbin/mysqld(+0x4bd708)[0x56186e15f708]
/usr/sbin/mysqld(+0x4ab372)[0x56186e14d372]
/usr/sbin/mysqld(+0x9814a1)[0x56186e6234a1]
/usr/sbin/mysqld(+0x9b3f7f)[0x56186e655f7f]
/usr/sbin/mysqld(+0x9b8d17)[0x56186e65ad17]
/usr/sbin/mysqld(+0x991754)[0x56186e633754]
/usr/sbin/mysqld(+0x8ec7ab)[0x56186e58e7ab]
/usr/sbin/mysqld(_ZN7handler13ha_update_rowEPKhS1_+0x102)[0x56186e41f762]
/usr/sbin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x243)[0x56186e4fcbb3]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x22f)[0x56186e4f045f]
/usr/sbin/mysqld(wsrep_apply_cb+0x4d4)[0x56186e396f14]
/usr/lib/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xcb)[0x7f0e370d06ab]
/usr/lib/libgalera_smm.so(+0x228db3)[0x7f0e37113db3]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x158)[0x7f0e37116e48]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x28d)[0x7f0e3712780d]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM22request_state_transferEPvRK10wsrep_uuidlPKvl+0xbb6)[0x7f0e3712ca36]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM19process_conf_changeEPvRK15wsrep_view_infoiNS_10Replicator5StateEl+0x4f5)[0x7f0e3711af15]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x2bf)[0x7f0e370f60ef]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x98)[0x7f0e370f6c18]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x13d)[0x7f0e3711a67d]
/usr/lib/libgalera_smm.so(galera_recv+0x2b)[0x7f0e3713199b]
/usr/sbin/mysqld(+0x6f5db5)[0x56186e397db5]
/usr/sbin/mysqld(start_wsrep_THD+0x31f)[0x56186e38afdf]
nptl/pthread_create.c:487(start_thread)[0x7f0e395bffa3]
x86_64/clone.S:97(clone)[0x7f0e38c494cf]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f0e0c014cdb): UPDATE mt_log l1 INNER JOIN mt_log_import ON l1.id=lid SET serverBitmap=serverBitmap| NAME_CONST('serverMask',64)
Connection ID (thread ID): 2
Status: NOT_KILLED



 Comments   
Comment by Piotr Nizynski [ 2019-06-16 ]

serverBitmap-5 is a virtual column defined as serverBitmap & (1 << 5).
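For readers trying to reproduce this, a minimal sketch of what such a schema might look like follows. The real definition of mt_log is not included in this report, so the column types, the primary key and the index definition are assumptions; only the table name, the column name `serverBitmap-5`, its expression and the index name come from the report and the error log.

```sql
-- Hypothetical reconstruction; the actual mt_log definition is not shown in the report.
CREATE TABLE mt_log (
  id INT UNSIGNED NOT NULL PRIMARY KEY,          -- assumed: the log shows a 4-byte key field
  serverBitmap INT UNSIGNED NULL,                -- assumed type and nullability
  -- Virtual (not stored) generated column that isolates bit 5 of serverBitmap.
  `serverBitmap-5` INT UNSIGNED AS (serverBitmap & (1 << 5)) VIRTUAL,
  -- Secondary index on the virtual column; the error message names an index `serverBitmap-5`.
  KEY `serverBitmap-5` (`serverBitmap-5`)
) ENGINE=InnoDB;
```

With a layout like this, the replicated UPDATE quoted in the crash report changes serverBitmap, so the applier thread also has to maintain the index on the virtual column.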

Comment by Marko Mäkelä [ 2019-08-30 ]

This looks very similar to MDEV-19338. Maybe fixing MDEV-17466 would also fix these?

Comment by Piotr Nizynski [ 2019-08-31 ]

Here's how I managed to get the cluster running (already on the first day when the report was filed):

  • I shut down the whole cluster
  • I disabled WSREP
  • I started one of the servers
  • on that server, I redefined the column as STORED instead of VIRTUAL
  • I reenabled WSREP and made the 2nd node join the cluster

(Contrary to what some comments in MDEV-19338 may suggest, I noticed that the bug is in fact strictly tied to WSREP being enabled. It needs both WSREP=on and a virtual column. If either one is missing, the crash does not occur.)

Comment by Jan Lindström (Inactive) [ 2021-12-23 ]

piotrniz You should not disable WSREP for this operation; it is not cluster-safe. You should use wsrep_osu_method='RSU' instead.
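A sketch of how the column redefinition might be done with RSU instead, assuming the hypothetical column type INT UNSIGNED and an index named after the column (neither is confirmed in this report). In RSU mode the DDL is applied only on the local node and is not replicated, so the statements would have to be repeated on each node in turn. The drop-and-re-add form is used here as an assumption, because the report does not say which ALTER statement was actually used.

```sql
-- Run on one node at a time; under RSU the DDL is executed locally and not replicated.
SET SESSION wsrep_OSU_method = 'RSU';

-- Dropping the column also removes the single-column index defined on it.
ALTER TABLE mt_log
  DROP COLUMN `serverBitmap-5`;

-- Re-create the column as STORED instead of VIRTUAL, then re-create the index.
-- Column type and index definition are assumptions, not taken from the report.
ALTER TABLE mt_log
  ADD COLUMN `serverBitmap-5` INT UNSIGNED AS (serverBitmap & (1 << 5)) STORED,
  ADD KEY `serverBitmap-5` (`serverBitmap-5`);

-- Return to the default Total Order Isolation method for later DDL in this session.
SET SESSION wsrep_OSU_method = 'TOI';
```

Whether a rolling change of this kind is safe while other nodes still have the VIRTUAL definition depends on the workload, so it is worth testing on a non-production cluster first.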
