Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 10.3.29
Description
We have a 3-node cluster in our UAT environment, with all traffic going to node 3. An incoming delete conflict causes nodes 1 and 2 to crash, which in turn causes node 3 to go non-primary (as expected).
The crash is always on the non-write nodes (either one of them or both crash), which are applying the deletes concurrently.
The delete is always on the same table, named "blobs", which has a self-referencing foreign key (table definition below).
Of note, the logged SQL for the conflict appears to have some garbled data on the end of it (that I can't quite capture in this form):
"SQL: DELETE FROM blobs WHERE id = '7432858'???`^S^F"
The table has been rebuilt with ALTER TABLE ... ENGINE=InnoDB, yet the issue still occurs.
The crashes started 5 days after we upgraded from:
10.3.24-MariaDB-1:10.3.24+maria~bionic-log
patch: wsrep_25.24
prov: 25.3.29(r3902)
On a related note, though I must emphasise this is a different cluster entirely: our Production cluster, which has yet to be upgraded (version as above), is logging "[ERROR] InnoDB: Record field 15 len 18446744073709551615", which I've traced back to the same collection of tables in the same schema. I've been unable to identify any corruption in the tables themselves (by selecting out data and forcing index usage). The UAT cluster to which this report relates hasn't logged these messages, but I have a sneaking suspicion that there is some relationship, even if a loose one, between the issues.
Detail of the crash in the UAT env
*************************** 1. row ***************************
Table: blobs
Create Table: CREATE TABLE `blobs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `original_blob_id` int(11) DEFAULT NULL,
  `sys_name` varchar(100) DEFAULT NULL,
  `storage_loc` varchar(50) DEFAULT NULL,
  `storage_loc_pref` varchar(50) DEFAULT NULL,
  `storage_loc_specific` varchar(50) DEFAULT NULL,
  `save_path` varchar(255) DEFAULT NULL,
  `file_url` varchar(255) DEFAULT NULL,
  `filename` varchar(255) NOT NULL,
  `filesize` int(11) NOT NULL,
  `content_type` varchar(50) NOT NULL,
  `authcode` varchar(50) NOT NULL,
  `blob_hash` varchar(40) NOT NULL,
  `is_media_upload` tinyint(1) NOT NULL,
  `title` varchar(255) NOT NULL,
  `dim_w` int(11) NOT NULL,
  `dim_h` int(11) NOT NULL,
  `date_created` datetime NOT NULL,
  `is_temp` tinyint(1) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_896C3E356BBE2052` (`original_blob_id`),
  KEY `authcode_idx` (`authcode`),
  KEY `storage_loc_idx` (`storage_loc`,`storage_loc_pref`),
  KEY `sys_name_idx` (`sys_name`),
  KEY `date_created_idx` (`date_created`,`is_temp`),
  KEY `storage_loc_pref_idx` (`storage_loc_pref`),
  CONSTRAINT `FK_896C3E356BBE2052` FOREIGN KEY (`original_blob_id`) REFERENCES `blobs` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=7700386 DEFAULT CHARSET=utf
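For illustration only: a minimal sketch of what the self-referencing `ON DELETE SET NULL` constraint does when a row is deleted. It uses Python's `sqlite3` as a stand-in for InnoDB, a reduced two-column version of the table, and made-up ids. All of these are assumptions and nothing here is taken from the crash logs. It shows only that deleting one row also writes to any rows that reference it. Whether that cascade is involved in the applier conflict is not established by this report.

```python
import sqlite3

# Illustration only: SQLite stands in for InnoDB, and the table is cut down
# to the two columns involved in the self-referencing foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    """
    CREATE TABLE blobs (
        id INTEGER PRIMARY KEY,
        original_blob_id INTEGER DEFAULT NULL
            REFERENCES blobs (id) ON DELETE SET NULL
    )
    """
)

# Hypothetical rows: blobs 2 and 3 are both derived from blob 1.
conn.executemany(
    "INSERT INTO blobs (id, original_blob_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1)],
)

# Deleting the parent row also rewrites every child row that references it.
conn.execute("DELETE FROM blobs WHERE id = 1")
rows = conn.execute(
    "SELECT id, original_blob_id FROM blobs ORDER BY id"
).fetchall()
print(rows)  # [(2, None), (3, None)]
```

In this sketch a single-row delete of id 1 modifies rows 2 and 3 as well, so two deletes with different ids can end up writing to the same row.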
node 1 crash:
2021-06-22 12:35:02 11 [Note] WSREP: cluster conflict due to high priority abort for threads:
2021-06-22 12:35:02 11 [Note] WSREP: Winning thread:
THD: 11, mode: applier, state: executing, conflict: no conflict, seqno: 89327026
SQL: DELETE FROM blobs WHERE id = '7432858'???`^S^F
2021-06-22 12:35:02 11 [Note] WSREP: Victim thread:
THD: 15, mode: applier, state: idle, conflict: no conflict, seqno: 89327028
SQL: NULL
2021-06-22 12:35:02 0 [ERROR] WSREP: Trx 89327026 tries to abort slave trx 89327028. This could be caused by:
1) unsupported configuration options combination, please check documentation.
2) a bug in the code.
3) a database corruption.
Node consistency compromized, need to abort. Restart the node to resync with cluster.
210622 12:35:02 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.3.29-MariaDB-1:10.3.29+maria~bionic-log
key_buffer_size=134217728
read_buffer_size=134217728
max_used_connections=26
max_threads=1002
thread_count=24
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 262821945 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x12c00000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x560ee4349b8e]
/usr/sbin/mysqld(handle_fatal_signal+0x515)[0x560ee3dde8c5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f9ca6c54980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f9ca688ffb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f9ca6891921]
/usr/sbin/mysqld(+0x90a914)[0x560ee3f68914]
/usr/sbin/mysqld(+0x90f83c)[0x560ee3f6d83c]
/usr/sbin/mysqld(handle_manager+0x1f3)[0x560ee3bf0a83]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f9ca6c496db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f9ca697271f]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /srv/galera-uat/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15430                15430                processes
Max open files            4096                 4096                 files
Max locked memory         67108864             67108864             bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15430                15430                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/usr/share/apport/apport %p %s %c %d %P %E
node 2 crash:
2021-06-22 12:35:02 15 [Note] WSREP: cluster conflict due to high priority abort for threads:
2021-06-22 12:35:02 15 [Note] WSREP: Winning thread:
THD: 15, mode: applier, state: executing, conflict: no conflict, seqno: 89327026
SQL: DELETE FROM blobs WHERE id = '7432858'???`^S^F
2021-06-22 12:35:02 15 [Note] WSREP: Victim thread:
THD: 9, mode: applier, state: executing, conflict: no conflict, seqno: 89327028
SQL: DELETE FROM blobs WHERE id = '7432867'???`^S^F
2021-06-22 12:35:02 0 [ERROR] WSREP: Trx 89327026 tries to abort slave trx 89327028. This could be caused by:
1) unsupported configuration options combination, please check documentation.
2) a bug in the code.
3) a database corruption.
Node consistency compromized, need to abort. Restart the node to resync with cluster.
210622 12:35:02 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.3.29-MariaDB-1:10.3.29+maria~bionic-log
key_buffer_size=134217728
read_buffer_size=134217728
max_used_connections=8
max_threads=1002
thread_count=23
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 262821945 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2021-06-22 12:35:02 9 [Warning] WSREP: conflict state after RBR event applying: 1, 89327028
2021-06-22 12:35:02 9 [Warning] WSREP: RBR event apply failed, rolling back: 89327028
stack_bottom = 0x0 thread_stack 0x12c00000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5591922ceb8e]
/usr/sbin/mysqld(handle_fatal_signal+0x515)[0x559191d638c5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7ffb20e45980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7ffb20a80fb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7ffb20a82921]
/usr/sbin/mysqld(+0x90a914)[0x559191eed914]
/usr/sbin/mysqld(+0x90f83c)[0x559191ef283c]
/usr/sbin/mysqld(handle_manager+0x1f3)[0x559191b75a83]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7ffb20e3a6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ffb20b6371f]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /srv/galera-uat/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15430                15430                processes
Max open files            4096                 4096                 files
Max locked memory         67108864             67108864             bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15430                15430                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/usr/share/apport/apport %p %s %c %d %P %E
Issue Links
- is caused by MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL) (Closed)