  1. MariaDB Server
  2. MDEV-33189

Server crash after reading outside of bounds on ibdata1, file corrupted, no auto-recovery

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 10.11.6, 11.2 (EOL), 11.4
    • Fix Version/s: N/A
    • None
    • Environment: Debian GNU/Linux 11 (bullseye)
      Dell PowerEdge R750
      XFS filesystem

    Description

      We experienced a one-time server crash in production, so far not reproducible.

      We are running MariaDB 10.11.6 (1:10.11.6+maria~deb11), installed from a MariaDB repo mirror on Debian GNU/Linux 11 (bullseye), as the primary database for a read- and write-heavy application. It runs on a bare-metal Dell PowerEdge R750 server with 64 cores (Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz), 512 GiB RAM, and a software RAID-1 NVMe array with an XFS filesystem.

      The crash happened on 2023-12-21, 19 days after we upgraded to v10.11.6 on 2023-12-02. Before the upgrade, the database had run on v10.6.7 for almost 1.5 years without any problems.

      The server itself crashed with:

      InnoDB: Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1
      InnoDB: File './ibdata1' is corrupted
      

      and two assertion failures in trx0undo.cc and buf0lru.cc. All subsequent restart attempts failed, so we switched the application over to the replica database.

      We did not attempt any forced recovery. The assertion failures were:

      InnoDB: Assertion failure in file ./storage/innobase/trx/trx0undo.cc line 1416
      InnoDB: Failing assertion: rollback
      231221 14:24:48 [ERROR] mysqld got signal 6 ;
      

      The backtrace printed only one line before the next assertion failure occurred:

      stack_bottom = 0x7f614d088cd8 thread_stack 0x49000
      InnoDB: Assertion failure in file ./storage/innobase/buf/buf0lru.cc line 285
      InnoDB: Failing assertion: !block->page.in_file()
      

      See attachment db-syslog.2023-12-21.txt for all the relevant syslog entries.
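
For anyone going through the full syslog, the following is a minimal sketch of how the assertion reports could be pulled out of the surrounding noise. It assumes the two-line pattern quoted above ("Assertion failure in file ... line N" followed by "Failing assertion: ...") holds for every report; the function name `extract_assertions` and the embedded sample text are illustrative only, and a real run would read the attached syslog file instead.

```python
import re

# Sample lines copied from this report; replace with the contents of the syslog attachment.
LOG = """\
InnoDB: Assertion failure in file ./storage/innobase/trx/trx0undo.cc line 1416
InnoDB: Failing assertion: rollback
231221 14:24:48 [ERROR] mysqld got signal 6 ;
stack_bottom = 0x7f614d088cd8 thread_stack 0x49000
InnoDB: Assertion failure in file ./storage/innobase/buf/buf0lru.cc line 285
InnoDB: Failing assertion: !block->page.in_file()
"""

LOCATION = re.compile(r"InnoDB: Assertion failure in file (\S+) line (\d+)")
CONDITION = re.compile(r"InnoDB: Failing assertion: (.+)")


def extract_assertions(text):
    """Pair each 'Assertion failure' line with the next 'Failing assertion' line."""
    results = []
    pending = None  # (file, line) waiting for its failing condition
    for line in text.splitlines():
        loc = LOCATION.search(line)
        if loc:
            pending = (loc.group(1), int(loc.group(2)))
            continue
        cond = CONDITION.search(line)
        if cond and pending:
            results.append((pending[0], pending[1], cond.group(1).strip()))
            pending = None
    return results


assertions = extract_assertions(LOG)
for path, line_no, condition in assertions:
    print(f"{path}:{line_no}: {condition}")
```

Because the patterns are matched with `search`, any syslog prefix (timestamp, host name, process id) in front of the `InnoDB:` text is ignored.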

      We have preserved the corrupt 716 MiB (750780416 B) ibdata1 file for further inspection, should the need arise.
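
A note on the numbers in the out-of-bounds message, which may help with triage: the reported offset is far beyond the preserved file size, and dividing it by the page size gives a recognizable value. The sketch below assumes the default 16 KiB InnoDB page size (suggested by the 16384-byte read, but not confirmed in this report). Under that assumption the requested page number is 0xFFFFFFFF, the value InnoDB uses as FIL_NULL ("no page"). This is only arithmetic on the reported figures and does not by itself identify a root cause.

```python
# Assumption: default 16 KiB InnoDB page size, inferred from the 16384-byte read.
PAGE_SIZE = 16384
FIL_NULL = 0xFFFFFFFF  # InnoDB's 32-bit "no page" marker

read_offset = 70368744161280  # from the "Trying to read ..." message
file_size = 750780416         # size of the preserved ibdata1, in bytes


def page_number(offset, page_size=PAGE_SIZE):
    """Return the page number addressed by a page-aligned byte offset."""
    if offset % page_size != 0:
        raise ValueError("offset is not page-aligned")
    return offset // page_size


requested_page = page_number(read_offset)
pages_in_file = file_size // PAGE_SIZE
size_mib = file_size / 2**20

print(f"requested page: {requested_page} (0x{requested_page:X})")
print(f"pages in file:  {pages_in_file} ({size_mib:.0f} MiB)")
print(f"is FIL_NULL:    {requested_page == FIL_NULL}")
```

If the page size of the affected instance differs from 16 KiB, `PAGE_SIZE` has to be adjusted and the conclusion above no longer applies as stated.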

          Activity

            wschemmel Wolfgang Schemmel created issue -
            wschemmel Wolfgang Schemmel made changes -
            Field Original Value New Value
            Description Attachment reference added to the last sentence: "See attachment for all the relevant syslog entries." became "See attachment [^db-syslog.2023-12-21.txt] for all the relevant syslog entries." The rest of the description was unchanged.
            wschemmel Wolfgang Schemmel made changes -
            Description Sentence appended: "We have preserved the corrupt 716 MiB {{ibdata1}} (750780416 B) file for further inspection, should the need arise." The rest of the description was unchanged.
            marko Marko Mäkelä made changes -
            Assignee: (none) → Marko Mäkelä [ marko ]
            Status: Open [ 1 ] → Needs Feedback [ 10501 ]
            marko Marko Mäkelä made changes -
            Status: Needs Feedback [ 10501 ] → Open [ 1 ]
            axel Axel Schwenke made changes -
            Attachment: (none) → crash-11.4_g5.err [ 72799 ]
            axel Axel Schwenke made changes -
            Attachment: (none) → crash-11.4_my.cnf [ 72800 ]
            marko Marko Mäkelä made changes -
            Fix Version/s: (none) → 10.6 [ 24028 ]
            Fix Version/s: (none) → 10.11 [ 27614 ]
            Fix Version/s: (none) → 11.0 [ 28320 ]
            Fix Version/s: (none) → 11.1 [ 28549 ]
            Fix Version/s: (none) → 11.2 [ 28603 ]
            Fix Version/s: (none) → 11.3 [ 28565 ]
            Fix Version/s: (none) → 11.4 [ 29301 ]
            Affects Version/s: (none) → 11.2 [ 28603 ]
            Affects Version/s: (none) → 11.4 [ 29301 ]
            Priority: Major [ 3 ] → Blocker [ 1 ]
            marko Marko Mäkelä made changes -
            Priority: Blocker [ 1 ] → Critical [ 2 ]
            marko Marko Mäkelä made changes -
            Status: Open [ 1 ] → Needs Feedback [ 10501 ]
            serg Sergei Golubchik made changes -
            Fix Version/s: (none) → N/A [ 14700 ]
            Fix Version/s: 10.6 [ 24028 ] → (none)
            Fix Version/s: 10.11 [ 27614 ] → (none)
            Fix Version/s: 11.0 [ 28320 ] → (none)
            Fix Version/s: 11.1 [ 28549 ] → (none)
            Fix Version/s: 11.3 [ 28565 ] → (none)
            Fix Version/s: 11.2 [ 28603 ] → (none)
            Fix Version/s: 11.4 [ 29301 ] → (none)
            Resolution: (none) → Incomplete [ 4 ]
            Status: Needs Feedback [ 10501 ] → Closed [ 6 ]

            People

              Assignee: Marko Mäkelä (marko)
              Reporter: Wolfgang Schemmel (wschemmel)
              Votes: 0
              Watchers: 6
