Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.11.6, 11.2(EOL), 11.4
None
Environment: Debian GNU/Linux 11 (bullseye), Dell PowerEdge R750, XFS filesystem
Description
We experienced a one-time server crash in production, so far not reproducible.
We are running MariaDB 10.11.6 (1:10.11.6+maria~deb11), installed from a MariaDB repo mirror on Debian GNU/Linux 11 (bullseye), as the database primary for a read- and write-heavy application. It runs on a bare-metal Dell PowerEdge R750 server with 64 cores (Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz) and 512 GiB RAM, on software RAID-1 NVMe storage with an XFS filesystem.
The server crash happened on 2023-12-21. On 2023-12-02 we had upgraded to v10.11.6. Prior to that the DB ran without any problems on v10.6.7 for almost 1.5 years.
The server itself crashed with:
InnoDB: Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1
InnoDB: File './ibdata1' is corrupted
and two assertion failures, in trx0undo.cc and buf0lru.cc. All subsequent restart attempts failed, so we switched the application over to the replica database.
We did not attempt any forced recovery. The assertion failures:
InnoDB: Assertion failure in file ./storage/innobase/trx/trx0undo.cc line 1416
InnoDB: Failing assertion: rollback
231221 14:24:48 [ERROR] mysqld got signal 6 ;
The backtrace printed only one line before the next assertion failure occurred:
stack_bottom = 0x7f614d088cd8 thread_stack 0x49000
InnoDB: Assertion failure in file ./storage/innobase/buf/buf0lru.cc line 285
InnoDB: Failing assertion: !block->page.in_file()
See attachment db-syslog.2023-12-21.txt for all the relevant syslog entries.
We have preserved the corrupt 716 MiB ibdata1 (750780416 B) file for further inspection, should the need arise.
Attachments
Issue Links
- relates to
  - MDEV-32817 After a recent upgrade to 10.11.5, shortly after frequent reads and writes on a table, "index for table xxxx is corrupt" appeared, then "tablespace xxxxxx corrupted" for this table, and finally "Tablespace is missing for a table"; the table is now completely unusable (Closed)
  - MDEV-33922 InnoDB undo log tablespace file corruption (Closed)
  - MDEV-34233 InnoDB crashes due to corrupted ibdata1 (Assertion failure in innodb.undo_page) (Closed)
  - MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1 (Closed)
  - MDEV-35385 Server crash after reading outside of bounds on ibdata1 (Closed)
  - MDEV-33275 buf_flush_LRU(): mysql_mutex_assert_owner(&buf_pool.mutex) failed (Closed)
  - MDEV-34233 InnoDB crashes due to corrupted ibdata1 (Assertion failure in innodb.undo_page) (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Description | […] See attachment for all the relevant syslog entries. | […] See attachment [^db-syslog.2023-12-21.txt] for all the relevant syslog entries. |
Description | […] See attachment [^db-syslog.2023-12-21.txt] for all the relevant syslog entries. | […] See attachment [^db-syslog.2023-12-21.txt] for all the relevant syslog entries. We have preserved the corrupt 716 MiB {{ibdata1}} (750780416 B) file for further inspection, should the need arise. |
Assignee | Marko Mäkelä [ marko ] | |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Attachment | crash-11.4_g5.err [ 72799 ] |
Attachment | crash-11.4_my.cnf [ 72800 ] |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 11.0 [ 28320 ] | |
Fix Version/s | 11.1 [ 28549 ] | |
Fix Version/s | 11.2 [ 28603 ] | |
Fix Version/s | 11.3 [ 28565 ] | |
Fix Version/s | 11.4 [ 29301 ] | |
Affects Version/s | 11.2 [ 28603 ] | |
Affects Version/s | 11.4 [ 29301 ] | |
Priority | Major [ 3 ] | Blocker [ 1 ] |
Link | This issue relates to |
Priority | Blocker [ 1 ] | Critical [ 2 ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 11.0 [ 28320 ] | |
Fix Version/s | 11.1 [ 28549 ] | |
Fix Version/s | 11.3 [ 28565 ] | |
Fix Version/s | 11.2 [ 28603 ] | |
Fix Version/s | 11.4 [ 29301 ] | |
Resolution | Incomplete [ 4 ] | |
Status | Needs Feedback [ 10501 ] | Closed [ 6 ] |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue relates to |
The byte offset 70368744161280 corresponds to the page number 0xffffffff when using the default innodb_page_size=16k.
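The arithmetic behind that statement can be checked with a short sketch. It uses only the numbers quoted in this report (the read offset, the preserved file size, and the default 16 KiB page size); the helper name `page_number` is hypothetical and not part of any MariaDB tool.

```python
# Values quoted in the report; 16 KiB is the default innodb_page_size.
PAGE_SIZE = 16 * 1024
READ_OFFSET = 70368744161280   # offset from the "Trying to read" message
FILE_SIZE = 750780416          # size in bytes of the preserved ibdata1


def page_number(offset: int, page_size: int = PAGE_SIZE) -> int:
    """Return the number of the page that starts at the given byte offset."""
    quotient, remainder = divmod(offset, page_size)
    if remainder:
        raise ValueError("offset is not page-aligned")
    return quotient


page = page_number(READ_OFFSET)
pages_in_file = FILE_SIZE // PAGE_SIZE

print(hex(page))              # 0xffffffff
print(pages_in_file)          # 45824
print(page < pages_in_file)   # False: the requested page lies far beyond the file
```

With these inputs, the requested page is the largest value a 32-bit page number can hold, while the 716 MiB file contains only 45824 pages, which is consistent with the "outside the bounds of the file" message.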
Now that MDEV-13542 has been fixed, it should be relatively easy to fix the crash on this corrupted data, if you can provide access to the corrupted data directory. Hopefully you can reproduce this with a test workload that does not include any confidential data.
Without having access to the data directory, it is hard to say what could cause this corruption. Ever since MDEV-24854 (making innodb_flush_method=O_DIRECT the default) was implemented in MariaDB Server 10.6, we have been getting reports of corruption on some file systems. I do not remember whether there have been any reports of corruption on XFS yet. The Linux kernel version may be relevant. You could check whether innodb_flush_method=fsync works better. If this build uses io_uring, then you might want to test innodb_use_native_aio=0 as well, to check whether some bug in io_uring affects XFS.
Most of our internal testing takes place on Linux ext4 or tmpfs.