Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30728

Frequent 'InnoDB: Database page corruption on disk' on MariaDB 10.11 in Debian autopktest on arch ppc64el

Details

    Description

      This might be related to the Debian ppc64el hosts that run the autopkgtests, but reporting this upstream anyway in case other people run into the same errors.

      In Debian the ci.debian.net system frequently reports failures in the upstream mariadb-test-run for arch ppc64el:

      The same does not happen on other archs (no problem on amd64, arm64, s390x).

      The test varies, but error is always the same: SQL fails to execute because a table or all of InnoDB was missing, followed by warnings about database page corruption and some times also a full page dump.

      Example:

      main.index_merge_innodb 'innodb'         w3 [ fail ]
              Test ended at 2023-02-23 05:11:30
       
      CURRENT_TEST: main.index_merge_innodb
      mysqltest: At line 86: query 'SELECT COUNT(*) FROM 
      (SELECT * FROM t1 FORCE INDEX(primary,idx)
      WHERE a BETWEEN 2 AND 7 OR pk=1000000) AS t' failed: ER_NOT_KEYFILE (1034): Index for table 't1' is corrupt; try to repair it
       
      The result from queries just before the failure was:
      < snip >
      INSERT INTO t0(a,b) SELECT a+40, b+4000 FROM t0;
      INSERT INTO t0(a,b) SELECT a+80, b+8000 FROM t0;
      begin;
      INSERT INTO t1(a,b) SELECT t0.a,t0.b FROM t0, seq_1_to_1024;
      INSERT INTO t1 VALUES (1000000, 0, 0);
      commit;
      DROP TABLE t0;
      SET SESSION sort_buffer_size = 1024*36;
      set @tmp_optimizer_switch=@@optimizer_switch;
      set optimizer_switch='derived_merge=off,derived_with_keys=off';
      EXPLAIN
      SELECT COUNT(*) FROM 
      (SELECT * FROM t1 FORCE INDEX(primary,idx)
      WHERE a BETWEEN 2 AND 7 OR pk=1000000) AS t;
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
      1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	#	
      2	DERIVED	t1	index_merge	PRIMARY,idx	idx,PRIMARY	5,4	NULL	#	Using sort_union(idx,PRIMARY); Using where
      SELECT COUNT(*) FROM 
      (SELECT * FROM t1 FORCE INDEX(primary,idx)
      WHERE a BETWEEN 2 AND 7 OR pk=1000000) AS t;
       
      More results from queries before failure can be found in /tmp/tmp.enF8Q76mMp/var/3/log/index_merge_innodb.log
       
      Warnings from just before the error:
      Error 1034 Index for table 't1' is corrupt; try to repair it
       
       - saving '/tmp/tmp.enF8Q76mMp/var/3/log/main.index_merge_innodb-innodb/' to '/tmp/tmp.enF8Q76mMp/var/log/main.index_merge_innodb-innodb/'
       
      Retrying test main.index_merge_innodb, attempt(2/3)...
       
      worker[3] > Restart  - not started
      main.rowid_filter_innodb 'innodb'        w1 [ fail ]
              Test ended at 2023-02-23 05:11:31
       
      CURRENT_TEST: main.rowid_filter_innodb
      mysqltest: At line 82: query 'create table t1 (a int, b int, key (b), key (a)) engine=innodb' failed: <Unknown> (2013): Lost connection to server during query
       
      The result from queries just before the failure was:
      < snip >
      create table t1 (id int not null primary key) engine=innodb;
      insert into t1 values (2),(1);
      create table t2 (y int,x int,index (x),index (y)) engine=innodb;
      insert into t2 values
      (4,1),(4,777),(2,1),(2,888),(111,1),(222,1),(333,345),(444,1),
      (555,555),(666,1);
      select * from t1 join t2 on t1.id = t2.x where t2.y = 2 and t1.id = 1;
      id	y	x
      1	2	1
      explain extended select * from t1 join t2 on t1.id = t2.x where t2.y = 2 and t1.id = 1;
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	SIMPLE	t1	const	PRIMARY	PRIMARY	4	const	1	100.00	Using index
      1	SIMPLE	t2	index_merge	x,y	y,x	5,5	NULL	1	100.00	Using intersect(y,x); Using where; Using index
      Warnings:
      Note	1003	select 1 AS `id`,`test`.`t2`.`y` AS `y`,`test`.`t2`.`x` AS `x` from `test`.`t1` join `test`.`t2` where `test`.`t2`.`y` = 2 and `test`.`t2`.`x` = 1
      drop table t1, t2;
      #
      # MDEV-19820: use of rowid filter for innodb table without primary key
      #
      create table t1 (a int, b int, key (b), key (a)) engine=innodb;
       
      More results from queries before failure can be found in /tmp/tmp.enF8Q76mMp/var/1/log/rowid_filter_innodb.log
       
       
      Server [mysqld.1 - pid: 6392, winpid: 6392, exit: 256] failed during test run
      Server log from this test:
      ----------SERVER LOG START-----------
      $ /usr/sbin/mariadbd --defaults-group-suffix=.1 --defaults-file=/tmp/tmp.enF8Q76mMp/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --loose-sequence --loose-sequence --core-file --loose-debug-sync-timeout=300
      2023-02-23  5:11:26 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32186)
      2023-02-23  5:11:26 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
      2023-02-23  5:11:26 0 [Note] Starting MariaDB 10.11.2-MariaDB-1-log source revision  as process 6393
      2023-02-23  5:11:26 0 [Note] Plugin 'partition' is disabled.
      2023-02-23  5:11:26 0 [Note] InnoDB: Compressed tables use zlib 1.2.13
      2023-02-23  5:11:26 0 [Note] InnoDB: Number of transaction pools: 1
      2023-02-23  5:11:26 0 [Note] InnoDB: Using POWER8 crc32 instructions
      2023-02-23  5:11:26 0 [Note] InnoDB: Using liburing
      2023-02-23  5:11:26 0 [Note] InnoDB: Initializing buffer pool, total size = 8.000MiB, chunk size = 1.000MiB
      2023-02-23  5:11:26 0 [Note] InnoDB: Completed initialization of buffer pool
      2023-02-23  5:11:26 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      2023-02-23  5:11:27 0 [Note] InnoDB: 128 rollback segments are active.
      2023-02-23  5:11:27 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2023-02-23  5:11:27 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
      2023-02-23  5:11:27 0 [Note] InnoDB: log sequence number 46990; transaction id 16
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_CONFIG' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_SYS_TABLESTATS' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_DELETED' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMP' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_WAITS' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMP_RESET' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_QUEUES' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'FEEDBACK' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_INDEX_TABLE' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_GROUPS' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMP_PER_INDEX_RESET' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_INDEX_CACHE' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_BEING_DELETED' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMPMEM_RESET' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_DEFAULT_STOPWORD' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_SYS_TABLESPACES' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'user_variables' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_TABLESPACES_ENCRYPTION' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_STATS' is disabled.
      2023-02-23  5:11:27 0 [Note] Plugin 'unix_socket' is disabled.
      2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-feedback-debug-startup-interval=20'
      2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-feedback-debug-first-interval=60'
      2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-feedback-debug-interval=60'
      2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown option '--loose-pam-debug'
      2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown option '--loose-aria'
      2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-debug-sync-timeout=300'
      2023-02-23  5:11:27 0 [Note] InnoDB: Loading buffer pool(s) from /tmp/tmp.enF8Q76mMp/var/1/mysqld.1/data/ib_buffer_pool
      2023-02-23  5:11:27 0 [Note] Server socket created on IP: '127.0.0.1'.
      2023-02-23  5:11:27 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.11.2-MariaDB-1-log'  socket: '/tmp/tmp.enF8Q76mMp/tmp/1/mysqld.1.sock'  port: 16000  Debian n/a
      2023-02-23  5:11:27 0 [Note] InnoDB: Buffer pool(s) load completed at 230223  5:11:27
      2023-02-23  5:11:30 4 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=289]. You may have to recover from a backup.
      2023-02-23  5:11:30 4 [Note] InnoDB: Page dump (16384 bytes):
      2023-02-23  5:11:30 4 [Note] InnoDB: 0000000000000121ffffffffffffffff00000000002ea01f0006000000000000
      2023-02-23  5:11:30 4 [Note] InnoDB: 000000000000000000000000000000000002000001a80330000001a800780000
      2023-02-23  5:11:30 4 [Note] InnoDB: 0000000000f321f2000001a8ffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
      ...
      

      Full logs:
      Failing test(s): main.index_merge_innodb main.rowid_filter_innodb main.bootstrap_innodb main.parser_bug21114_innodb
      Failing test(s): main.xa_prepared_binlog_off
      Failing test(s): main.ctype_utf8mb4_innodb main.range_vs_index_merge_innodb
      Failing test(s): main.innodb_ext_key main.index_merge_innodb
      Failing test(s): main.innodb_ext_key
      Failing test(s): main.index_merge_innodb
      Failing test(s): main.rowid_filter_innodb main.range_vs_index_merge_innodb
      Failing test(s): main.innodb_ext_key

      Attachments

        Issue Links

          Activity

            Well, in the end it ended up being even deeper:

            https://lore.kernel.org/linuxppc-dev/1105090647.48374193.1700351103830.JavaMail.zimbra@raptorengineeringinc.com/T/#u

            I'm confident it's now resolved, and the explaination makes a lot more sense. Was quite a journey to get there!

            tpearson-raptor Timothy Pearson added a comment - Well, in the end it ended up being even deeper: https://lore.kernel.org/linuxppc-dev/1105090647.48374193.1700351103830.JavaMail.zimbra@raptorengineeringinc.com/T/#u I'm confident it's now resolved, and the explaination makes a lot more sense. Was quite a journey to get there!
            danblack Daniel Black added a comment - - edited

            Wow, yes, makes more sense.

            I read through the very long thread on debugging:

            https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/

            My deepest, deepest thanks to you and Jens for this massive effort.

            I think marko and I where 90%+ sure it was in ppc kernel. Per offlist some rr like capability would have been able to identify maybe some state changes.

            Summary of debugging:

            1. kernel bisect
            1. validation by smt1 pinning
            1. alternating testing from kernel regression point and latest
            1. experimentation with timing delays, pinning, kernel worker thread recreation

            Taking notes of concepts mentioned to take into account inside MariaDB.

            To check

            write safety

            https://lore.kernel.org/all/727263b4-2f8c-44bf-b7b8-5a936718df76@kernel.dk/ - previously submitted writes get written? Crash safety in mariadb is the ordering of those commited, we wait until response before a user OK. Hope we're not leaving kernel in unstable state during termination, but maybe that's just kernel problem.

            expectations: https://lore.kernel.org/all/7bd67c30-3b9f-4541-864a-b082917ee8b5@kernel.dk/

            I think we're right.

            signal termination - write doesn't complete

            https://lore.kernel.org/all/431496020.45688461.1699416044413.JavaMail.zimbra@raptorengineeringinc.com/

            SIGKILL was MTR issued

            I think you worked this out and this seems right:

            https://lore.kernel.org/all/469846692.45828995.1699469480871.JavaMail.zimbra@raptorengineeringinc.com/

            exec with pending IO

            https://lore.kernel.org/all/f1761483-1f59-42bd-b556-ad3dde998dbc@kernel.dk/

            exec with pending IO will cancel pending IO

            implications with galera notify and sst in general?

            mix of uring write and async read on same file

            https://lore.kernel.org/all/414471427.46558741.1699728159797.JavaMail.zimbra@raptorengineeringinc.com/

            odd

            but ok, just don't mix buffered and O_DIRECT - https://lore.kernel.org/all/cb7b2bc1-3cf0-43b4-ad66-c35f9c59cbb9@kernel.dk/

            "so we're in potentially less-tested territory. " https://lore.kernel.org/all/1962251239.46563579.1699729495314.JavaMail.zimbra@raptorengineeringinc.com/

            stale read - https://lore.kernel.org/all/1626632f-b190-48e4-b04d-a2d5a3d2083b@kernel.dk/

            danblack Daniel Black added a comment - - edited Wow, yes, makes more sense. I read through the very long thread on debugging: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/ My deepest, deepest thanks to you and Jens for this massive effort. I think marko and I where 90%+ sure it was in ppc kernel. Per offlist some rr like capability would have been able to identify maybe some state changes. Summary of debugging: 1. kernel bisect 1. validation by smt1 pinning 1. alternating testing from kernel regression point and latest 1. experimentation with timing delays, pinning, kernel worker thread recreation Taking notes of concepts mentioned to take into account inside MariaDB. To check write safety https://lore.kernel.org/all/727263b4-2f8c-44bf-b7b8-5a936718df76@kernel.dk/ - previously submitted writes get written? Crash safety in mariadb is the ordering of those commited, we wait until response before a user OK. Hope we're not leaving kernel in unstable state during termination, but maybe that's just kernel problem. expectations: https://lore.kernel.org/all/7bd67c30-3b9f-4541-864a-b082917ee8b5@kernel.dk/ I think we're right. signal termination - write doesn't complete https://lore.kernel.org/all/431496020.45688461.1699416044413.JavaMail.zimbra@raptorengineeringinc.com/ SIGKILL was MTR issued I think you worked this out and this seems right: https://lore.kernel.org/all/469846692.45828995.1699469480871.JavaMail.zimbra@raptorengineeringinc.com/ exec with pending IO https://lore.kernel.org/all/f1761483-1f59-42bd-b556-ad3dde998dbc@kernel.dk/ exec with pending IO will cancel pending IO implications with galera notify and sst in general? mix of uring write and async read on same file https://lore.kernel.org/all/414471427.46558741.1699728159797.JavaMail.zimbra@raptorengineeringinc.com/ odd but ok, just don't mix buffered and O_DIRECT - https://lore.kernel.org/all/cb7b2bc1-3cf0-43b4-ad66-c35f9c59cbb9@kernel.dk/ "so we're in potentially less-tested territory. " https://lore.kernel.org/all/1962251239.46563579.1699729495314.JavaMail.zimbra@raptorengineeringinc.com/ stale read - https://lore.kernel.org/all/1626632f-b190-48e4-b04d-a2d5a3d2083b@kernel.dk/
            tpearson-raptor Timothy Pearson added a comment - - edited

            It gets even better since I ended up starting a second, much smaller debug thead here:

            https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/

            That was after I recognized the data I was seeing in a debugger [1], after one of my test modifications managed to inadvertently crash mariadb within a few hundred instructions of the presumed race (long story short, basically glibc sprintf() calls a lot of VSX instructions when run in a tight loop, which is perfect to provoke the issue).

            In hindsight, I was closer than I thought a week ago – I knew the corrupt data pattern was 128 bits long, but just didn't put two and two together regarding that being the length of our vector registers. Not that it would have helped much – even in that message I was still slightly off, there were two similar code sections and the other one that used vs0 was the actual canary. The rest of the commentary there is correct, however.

            I'm honestly not sure what other tools I could have used, given the race condition. The only thought I have right now is a brand new tool written in assembler that basically writes random patterns to registers as fast as possible, and checks for register corruption every iteration. Either way, one doesn't expect to find bugs in 20 year old working kernel code, which is a testament to how well the Linux kernel is designed and tested overall.

            [1] 0x82004000 is the normal content of the FPSCR under most conditions

            tpearson-raptor Timothy Pearson added a comment - - edited It gets even better since I ended up starting a second, much smaller debug thead here: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/ That was after I recognized the data I was seeing in a debugger [1] , after one of my test modifications managed to inadvertently crash mariadb within a few hundred instructions of the presumed race (long story short, basically glibc sprintf() calls a lot of VSX instructions when run in a tight loop, which is perfect to provoke the issue). In hindsight, I was closer than I thought a week ago – I knew the corrupt data pattern was 128 bits long, but just didn't put two and two together regarding that being the length of our vector registers. Not that it would have helped much – even in that message I was still slightly off, there were two similar code sections and the other one that used vs0 was the actual canary. The rest of the commentary there is correct, however. I'm honestly not sure what other tools I could have used, given the race condition. The only thought I have right now is a brand new tool written in assembler that basically writes random patterns to registers as fast as possible, and checks for register corruption every iteration. Either way, one doesn't expect to find bugs in 20 year old working kernel code, which is a testament to how well the Linux kernel is designed and tested overall. [1] 0x82004000 is the normal content of the FPSCR under most conditions
            danblack Daniel Black added a comment -

            Merged into Linus Tree.

            Will be part of Kernel 6.7

            danblack Daniel Black added a comment - Merged into Linus Tree . Will be part of Kernel 6.7
            danblack Daniel Black added a comment -

            So finally not so much a NOTABUG as NOTOURBUG.

            danblack Daniel Black added a comment - So finally not so much a NOTABUG as NOTOURBUG.

            People

              danblack Daniel Black
              otto Otto Kekäläinen
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.