Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Incomplete
-
10.5.9
-
CentOS Linux release 7.9.2009 (Core)
Description
Hello thanks for taking the time to review this. I am happy to add additional information as needed.
Explanation.
The most basic way to explain this issue is if we run a query with MsgIds from 2.3 million to 4.3 million with an MsgId greater then 237007468 million but less then 436312266 million and include in the output the message content and that the message includes a specific MsgSource. The server Crashes and restarts.
Queries
– Crashes
WHERE MsgId >= 237007468 AND MsgId <= 436312266
WHERE MsgId > 237007467 AND MsgId < 436312266
WHERE MsgId > 237007466
– Work
WHERE MsgId >= 237007467 AND MsgId <= 436312266
WHERE MsgId BETWEEN 237007467 AND 436312266
This only happens if the return has a blob and we pull back the MsgContent field with these numbers.
The full query that was crashing the DB is number 1.
Further testing showed that number 2 and 3 also crashes the DB
Yet number 4 does not
1. EXPLAIN SELECT MsgId as lastid, MsgId, MsgSource, CONVERT(MsgContent, CHAR) AS EDImessage FROM MessageStore
WHERE MsgId > 237007466 AND MsgId < 436312267
AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571")
ORDER BY MsgId asc
2. SELECT MsgId as lastid, MsgId, MsgSource, MsgContent FROM MessageStore
WHERE MsgId > 237007466 AND MsgId < 436312267
AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571")
ORDER BY MsgId asc
3. SELECT MsgId AS lastid, MsgId, MsgSource, CONVERT( MsgContent , CHAR) AS EDImessage FROM MessageStore
WHERE MsgId > 237007466 AND MsgId < 436312267
AND MsgSource IN ('MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571')
ORDER BY MsgId ASC LIMIT 1;
4. SELECT MsgId AS lastid, MsgId, MsgSource, CONVERT( MsgContent , CHAR) AS EDImessage FROM warehouse.MessageStore
WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN ('MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571')
ORDER BY MsgId ASC
LIMIT 10,10;
We ran a lot of queries to pull all records one of those gave us the MsgID's that match the IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571") statement then wrote a script to go through and select * from MessageStore where MsgID = (All the Ids returned). This also works.
We ran a mysqlcheck -all-databases and everything came back with an OK
When it fails the entire server locks and reboots the error before that and a bunch of what appears to be base64 code is sent to the logs
The errors we receive are below.
|
2024-04-03 15:39:26 867 [Note] InnoDB: Uncompressed page, stored checksum in field1 5595288, calculated checksums for field1: crc32 3970976477, innodb 5595288, page type 10 == BLOB.none 3735928559, stored checksum in field2 3970976477, calculated checksums for field2: crc32 3970976477, innodb 3970976477, none 3735928559, page LSN 11534 1034519743, low 4 bytes of LSN at page end 1034519743, page number (if stored to page already) 319240129, space id (if created with >= MySQL-4.1.1 and stored already) 61
|
InnoDB: Page may be a BLOB page
|
2024-04-03 15:39:26 867 [Note] InnoDB: You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
|
2024-04-03 15:39:26 867 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './warehouse/MessageStore.ibd' page [page id: space=61, page number=319240129]. You may have to recover from a backup.
|
2024-04-03 15:39:26 867 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
|
len 16384; hex 00556 ETC ETC but the bottom of this bump is a ton of zeros.
|
The first part of the dump gives what looks like data followed by a bunch of zeros. Then asc and what appears to be base64 encoded data and then end of page DUMP.
|
2024-04-03 16:28:50 139 [Note] InnoDB: Uncompressed page, stored checksum in field1 5595288, calculated checksums for field1: crc32 3970976477, innodb 5595288, page type 10 == BLOB.none 3735928559, stored checksum in field2 3970976477, calculated checksums for field2: crc32 3970976477, innodb 3970976477, none 3735928559, page LSN 11534 1034519743, low 4 bytes of LSN at page end 1034519743, page number (if stored to page already) 319240129, space id (if created with >= MySQL-4.1.1 and stored already) 61
|
InnoDB: Page may be a BLOB page
|
2024-04-03 16:28:50 139 [Note] InnoDB: You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
|
240403 16:28:50 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.5.9-MariaDB-log
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=36
|
max_threads=153
|
thread_count=37
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467864 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x7e96900009b8
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7e991f0afc90 thread_stack 0x49000
|
|
2024-04-03 15:39:19 867 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './warehouse/MessageStore.ibd' page [page id
|
: space=61, page number=319240129]. You may have to recover from a backup.
|
EXPLAINS
Query 1.
|
|
MariaDB [warehouse]> EXPLAIN SELECT MsgId as lastid , MsgId , MsgSource , CONVERT(MsgContent, CHAR) AS EDImessage FROM MessageStore WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571") ORDER BY MsgId asc\G
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: MessageStore
|
type: ref
|
possible_keys: PRIMARY,MsgSourceStatusReceivedFacility,MsgSourceStatusStatusDateFacility
|
key: MsgSourceStatusReceivedFacility
|
key_len: 259
|
ref: const
|
rows: 16024
|
Extra: Using index condition; Using where; Using filesort
|
1 row in set (0.001 sec)
|
Query 2.
|
MariaDB [warehouse]> EXPLAIN SELECT MsgId as lastid, MsgId, MsgSource, MsgContent FROM MessageStore WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571") ORDER BY MsgId asc\G
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: MessageStore
|
type: ref
|
possible_keys: PRIMARY,MsgSourceStatusReceivedFacility,MsgSourceStatusStatusDateFacility
|
key: MsgSourceStatusReceivedFacility
|
key_len: 259
|
ref: const
|
rows: 16248
|
Extra: Using index condition; Using where; Using filesort
|
1 row in set (0.001 sec)
|
|
Query 3.
|
|
|
Query 4.
|
EXPLAIN SELECT MsgId AS lastid, MsgId, MsgSource, CONVERT( MsgContent , CHAR) AS EDImessage FROM warehouse.MessageStore WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN ('MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571') ORDER BY MsgId ASC LIMIT 10,10\G
|
*************************** 1. row ***************************
|
id: 1
|
select_type: SIMPLE
|
table: MessageStore
|
type: ref
|
possible_keys: PRIMARY,MsgSourceStatusReceivedFacility,MsgSourceStatusStatusDateFacility
|
key: MsgSourceStatusReceivedFacility
|
key_len: 259
|
ref: const
|
rows: 16248
|
Extra: Using index condition; Using where; Using filesort
|
1 row in set (0.001 sec)
|
MsgID information.
HEX of ID (237007466): E20726A
HEX of ID (436312267): 1A0198CB
This is only interesting because the hex for 1 is 7 digits and the hex for the higher ID is 8. If the programming only reads 7 digits when running the query.
Table Configuration
CREATE TABLE `MessageStore` (
|
`MsgId` bigint(20) NOT NULL AUTO_INCREMENT,
|
`MsgContent` longblob DEFAULT NULL,
|
`MsgReceived` timestamp /* mariadb-5.3 */ NOT NULL DEFAULT current_timestamp(),
|
`MsgSource` varchar(256) DEFAULT NULL,
|
`MsgFacility` varchar(256) DEFAULT NULL,
|
`MsgType1` varchar(256) DEFAULT NULL,
|
`MsgType2` varchar(256) DEFAULT NULL,
|
`MsgAltId1` varchar(256) DEFAULT NULL,
|
`MsgAltId2` varchar(256) DEFAULT NULL,
|
`MsgMisc1` varchar(256) DEFAULT NULL,
|
`MsgMisc2` varchar(256) DEFAULT NULL,
|
`MsgMisc3` varchar(256) DEFAULT NULL,
|
`MsgStatus` char(3) DEFAULT NULL,
|
`MsgStatusDate` timestamp /* mariadb-5.3 */ NULL DEFAULT NULL,
|
`MsgStatusInfo` text DEFAULT NULL,
|
PRIMARY KEY (`MsgId`),
|
KEY `MsgAltId2` (`MsgAltId2`),
|
KEY `MsgType1` (`MsgType1`),
|
KEY `MsgMisc2` (`MsgMisc2`),
|
KEY `MsgSourceStatusReceivedFacility` (`MsgSource`,`MsgStatus`,`MsgReceived`,`MsgFacility`),
|
KEY `MsgFacility` (`MsgFacility`),
|
KEY `MsgStatus` (`MsgStatus`),
|
KEY `MsgStatusDate` (`MsgStatusDate`),
|
KEY `MsgSourceStatusStatusDateFacility` (`MsgSource`,`MsgStatus`,`MsgStatusDate`,`MsgFacility`)
|
) ENGINE=InnoDB AUTO_INCREMENT=660698066 DEFAULT CHARSET=latin1
|
Additional Thoughts
Value limits on the size of the MsgId or another column may be making the calculation fail. Seems like a value on the ID is "wrapping" if you have a 32 bit number and go over the max value and go up one it will wrap over to 0 like an odometer. This may not be the real issue but the evidence looks like this behavior.
Things done to attempt to resolve.
We ran a mysqlcheck -all-databases Everything reported OK
Moved to a replica and got the same error. I guess corrupted data could have been replicated? I am not certain that can happen.
We are discussing upgrading to the latest 10.5.* but wanted to wait till we here back just in case you need us to test something.
Configuration
[mysqld]
|
datadir=/var/lib/mysql
|
socket=/var/lib/mysql/mysql.sock
|
max_allowed_packet=1024M
|
innodb_max_dirty_pages_pct=0
|
innodb_fast_shutdown=0
|
innodb_buffer_pool_size=768G
|
innodb_log_file_size=64G
|
log-bin
|
log-slave-updates
|
log_error=error.log
|
slow_query_log = 1
|
slow_query_log_file = /var/log/mysql/slow-query.log
|
long_query_time = 15
|
innodb_print_all_deadlocks = 1
|
open_files_limit=200000
|
sync_binlog=1
|
skip-slave-start=ON
|
slave-skip-errors=1062
|
server_id=70148
|
innodb_open_files=6000
|
table_open_cache=4000
|
join_buffer_size=1048576
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
table_definition_cache=4000
|
log-basename=master1
|
log_error=/var/log/mariadb/error.log
|
binlog-format=mixed
|
expire_logs_days=90
|
log_bin_compress=ON
|
# Disabling symbolic-links is recommended to prevent assorted security risks
|
symbolic-links=0
|
log_queries_not_using_indexes = ON
|
log-error = /var/log/mariadb/mariadb.log
|
# Settings user and group are ignored when systemd is used.
|
# If you need to run mysqld under a different user or group,
|
# customize your systemd unit file for mariadb according to the
|
# instructions in http://fedoraproject.org/wiki/Systemd
|
|
[mysqld_safe]
|
pid-file=/var/run/mariadb/mariadb.pid
|
|
#
|
# include all files from the config directory
|
#
|
!includedir /etc/my.cnf.d
|
[mariabackup]
|
open_files_limit=65535
|
|
Attachments
Issue Links
- blocks
-
MDEV-28440 MariaDB crashes during select
- Closed