[MDEV-24502] MariaDB 10.2.35 crashes after conflicting lock during DELETE Created: 2020-12-30  Updated: 2021-01-14  Resolved: 2021-01-14

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.2.35
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: jmox Assignee: Jan Lindström (Inactive)
Resolution: Duplicate Votes: 0
Labels: crash
Environment:

CentOS 7.9


Attachments: File 20201229-node1-mariadb.err     File 20201229-node2-mariadb.err     File 20201229-node3-mariadb.err    
Issue Links:
Relates
relates to MDEV-23851 Galera assertion at lock0lock.cc line... Closed

 Description   

Hello,

Edit: This may be related to MDEV-23851, but we would like to have confirmation from your side.

We upgraded MariaDB from 10.2.24 to 10.2.35, and the nodes in the cluster started crashing one day after the update. It seems to happen when there is a conflicting lock during a DELETE.

It is a three-node cluster. Each node may crash a couple of times per day, and the whole cluster has gone down a few times over the last two weeks.
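For context, here is a minimal sketch of the pattern the logs suggest. The schema is hypothetical (the real table is obfuscated in the attached logs), but the index name GEN_CLUST_INDEX in the lock dump indicates the table has no explicit primary key, so InnoDB generates a hidden clustered index and the Galera applier has to locate rows by full table scan (ha_rnd_next in the backtrace below):

```sql
-- Hypothetical reconstruction based on the obfuscated log output;
-- real column names and types are unknown.
-- No PRIMARY KEY, so InnoDB creates the hidden GEN_CLUST_INDEX
-- seen in the conflicting-lock dump:
CREATE TABLE process_id (
  process_name VARCHAR(64),
  process_host VARCHAR(255)
) ENGINE=InnoDB;

-- Two applier threads executing statements like this concurrently
-- (seqnos 87923499 and 87923500 in the log) appear to hit the
-- conflicting-lock assertion in lock0lock.cc:
DELETE FROM process_id
 WHERE process_name = 'my-process'
   AND process_host = 'my-app.example.com';
```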

log-node1

2020-12-29  8:40:08 140172980565760 [ERROR] InnoDB: Conflicting lock on table: `$DB`.`$TABLE1` index: GEN_CLUST_INDEX that has lock
RECORD LOCKS space id 945 page no 3 n bits 168 index GEN_CLUST_INDEX of table `$DB`.`$TABLE1` trx id 275420837 lock_mode X locks rec but not gap
Record lock, heap no 2
Record lock, heap no 98
2020-12-29  8:40:08 140172980565760 [ERROR] InnoDB: WSREP state:
2020-12-29  8:40:08 140172980565760 [ERROR] WSREP: Thread BF trx_id: 275420838 thread: 2 seqno: 87923500 query_state: executing conf_state: no conflict exec_mode: applier applier: 1 query: DELETE FROM process_id
        WHERE process_name = 'my-process'
        AND process_host = 'my-app.example.com'
2020-12-29  8:40:08 140172980565760 [ERROR] WSREP: Thread BF trx_id: 275420837 thread: 10 seqno: 87923499 query_state: executing conf_state: no conflict exec_mode: applier applier: 1 query: DELETE FROM process_id
        WHERE process_name = 'my-process-2'
        AND process_host = 'my-app.example.com'
2020-12-29 08:40:08 0x7f7c90b6b700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694

Stack:

Server version: 10.2.35-MariaDB-log
key_buffer_size=268435456
read_buffer_size=2097152
max_used_connections=15
max_threads=502
thread_count=26
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2329025 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f7c780009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7c90b6ad20 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55587dc621ee]
/usr/sbin/mysqld(handle_fatal_signal+0x30d)[0x55587d6ff04d]
/lib64/libpthread.so.0(+0xf630)[0x7f7c9b2c0630]
:0(__GI_raise)[0x7f7c99590387]
:0(__GI_abort)[0x7f7c99591a78]
/usr/sbin/mysqld(+0x44918e)[0x55587d4a418e]
/usr/sbin/mysqld(+0x87cd6d)[0x55587d8d7d6d]
/usr/sbin/mysqld(+0x87d9b4)[0x55587d8d89b4]
/usr/sbin/mysqld(+0x884145)[0x55587d8df145]
/usr/sbin/mysqld(+0x884b2a)[0x55587d8dfb2a]
/usr/sbin/mysqld(+0x91a1ba)[0x55587d9751ba]
/usr/sbin/mysqld(+0x91d48f)[0x55587d97848f]
/usr/sbin/mysqld(+0x849855)[0x55587d8a4855]
/usr/sbin/mysqld(+0x82cbb7)[0x55587d887bb7]
/usr/sbin/mysqld(+0x841cc9)[0x55587d89ccc9]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x1c7)[0x55587d703c37]
/usr/sbin/mysqld(_ZN14Rows_log_event8find_rowEP14rpl_group_info+0x50e)[0x55587d800efe]
/usr/sbin/mysqld(_ZN21Delete_rows_log_event11do_exec_rowEP14rpl_group_info+0x8e)[0x55587d80100e]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x2fd)[0x55587d7f3e8d]
/usr/sbin/mysqld(wsrep_apply_cb+0x482)[0x55587d6a48c2]
src/trx_handle.cpp:312(galera::TrxHandle::apply(void*, wsrep_cb_status (*)(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*), wsrep_trx_meta const&) const)[0x7f7c93d47ef8]
src/replicator_smm.cpp:92(apply_trx_ws(void*, wsrep_cb_status (*)(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*), wsrep_cb_status (*)(void*, unsigned int, wsrep_trx_meta const*, bool*, bool), galera::TrxHandle const&, wsrep_trx_meta const&))[0x7f7c93d856f3]
src/replicator_smm.cpp:458(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandle*))[0x7f7c93d8877c]
src/replicator_smm.cpp:1258(galera::ReplicatorSMM::process_trx(void*, galera::TrxHandle*))[0x7f7c93d8b99e]
src/gcs_action_source.cpp:116(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&))[0x7f7c93d67078]
src/gcs_action_source.cpp:28(~Release)[0x7f7c93d6876c]
src/replicator_smm.cpp:362(galera::ReplicatorSMM::async_recv(void*))[0x7f7c93d8bf7b]
src/wsrep_provider.cpp:271(galera_recv)[0x7f7c93d99f38]
/usr/sbin/mysqld(+0x64a976)[0x55587d6a5976]
/usr/sbin/mysqld(start_wsrep_THD+0x3eb)[0x55587d698c5b]
pthread_create.c:0(start_thread)[0x7f7c9b2b8ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f7c9965896d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f7c85a75fcb): DELETE FROM process_id
        WHERE process_name = 'my-process'
        AND process_host = 'my-app.example.com'
 
Connection ID (thread ID): 2
Status: NOT_KILLED

The following packages are installed on the servers:

galera-25.3.31-1.el7.centos.x86_64
MariaDB-client-10.2.36-1.el7.centos.x86_64
MariaDB-compat-10.2.36-1.el7.centos.x86_64
MariaDB-common-10.2.36-1.el7.centos.x86_64
MariaDB-server-10.2.35-1.el7.centos.x86_64

Taking 29.12.2020 as an example: the monitoring system raised alerts a few times for node1 and node2 with the following message:

wsrep_cluster_status: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock'

On node1 it happened on December 29th at:

  • 08:40am
  • 09:40am
  • 03:50pm

And on node2, also on December 29th, at:

  • 12:40am
  • 12:50am

MariaDB actually crashed more often than the monitoring alerts indicate, though.

20201229-node1-mariadb.err

2020-12-29 08:40:08 0x7f7c90b6b700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 08:40:19 0x7fb11796e700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 09:40:12 0x7f937c61b700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 15:50:19 0x7f06184cf700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 16:30:02 0x7f80c7722700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 21:50:07 0x7f7d9062e700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694

20201229-node2-mariadb.err

2020-12-29 00:40:12 0x7f19f84c6700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 00:50:13 0x7f77084b1700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 00:50:26 0x7febec0bc700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 00:50:38 0x7f43207d5700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 00:50:52 0x7fcc8c395700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 00:51:03 0x7faa98c43700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 09:00:09 0x7f80983b7700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 14:10:11 0x7f2890261700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 22:40:10 0x7f296120d700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694

20201229-node3-mariadb.err

2020-12-29 08:10:06 0x7faa6473d700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 16:40:03 0x7f3d78084700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694
2020-12-29 19:50:18 0x7f8944225700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.35/storage/innobase/lock/lock0lock.cc line 694

Attached are the obfuscated logs from 29.12.2020 for all three nodes.

Is there any known workaround to avoid further crashes? I couldn't find any.

Many thanks in advance.



 Comments   
Comment by Jan Lindström (Inactive) [ 2021-01-14 ]

Duplicate of MDEV-23851

Generated at Thu Feb 08 09:30:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.