Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Not a Bug
Affects Version/s: 10.11.3
Fix Version/s: N/A
Component/s: N/A
Labels:
- corruption
- regression

Description

This might be related to the Debian ppc64el hosts that run the autopkgtests, but reporting this upstream anyway in case other people run into the same errors.

In Debian the ci.debian.net system frequently reports failures in the upstream mariadb-test-run for arch ppc64el:

The same does not happen on other archs (no problem on amd64, arm64, s390x).

The test varies, but error is always the same: SQL fails to execute because a table or all of InnoDB was missing, followed by warnings about database page corruption and some times also a full page dump.

Example:

main.index_merge_innodb 'innodb'         w3 [ fail ]

        Test ended at 2023-02-23 05:11:30

CURRENT_TEST: main.index_merge_innodb

mysqltest: At line 86: query 'SELECT COUNT(*) FROM

(SELECT * FROM t1 FORCE INDEX(primary,idx)

WHERE a BETWEEN 2 AND 7 OR pk=1000000) AS t' failed: ER_NOT_KEYFILE (1034): Index for table 't1' is corrupt; try to repair it

The result from queries just before the failure was:

< snip >

INSERT INTO t0(a,b) SELECT a+40, b+4000 FROM t0;

INSERT INTO t0(a,b) SELECT a+80, b+8000 FROM t0;

begin;

INSERT INTO t1(a,b) SELECT t0.a,t0.b FROM t0, seq_1_to_1024;

INSERT INTO t1 VALUES (1000000, 0, 0);

commit;

DROP TABLE t0;

SET SESSION sort_buffer_size = 1024*36;

set @tmp_optimizer_switch=@@optimizer_switch;

set optimizer_switch='derived_merge=off,derived_with_keys=off';

EXPLAIN

SELECT COUNT(*) FROM

(SELECT * FROM t1 FORCE INDEX(primary,idx)

WHERE a BETWEEN 2 AND 7 OR pk=1000000) AS t;

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra

1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	#

2	DERIVED	t1	index_merge	PRIMARY,idx	idx,PRIMARY	5,4	NULL	#	Using sort_union(idx,PRIMARY); Using where

SELECT COUNT(*) FROM

(SELECT * FROM t1 FORCE INDEX(primary,idx)

WHERE a BETWEEN 2 AND 7 OR pk=1000000) AS t;

More results from queries before failure can be found in /tmp/tmp.enF8Q76mMp/var/3/log/index_merge_innodb.log

Warnings from just before the error:

Error 1034 Index for table 't1' is corrupt; try to repair it

 - saving '/tmp/tmp.enF8Q76mMp/var/3/log/main.index_merge_innodb-innodb/' to '/tmp/tmp.enF8Q76mMp/var/log/main.index_merge_innodb-innodb/'

Retrying test main.index_merge_innodb, attempt(2/3)...

worker[3] > Restart  - not started

main.rowid_filter_innodb 'innodb'        w1 [ fail ]

        Test ended at 2023-02-23 05:11:31

CURRENT_TEST: main.rowid_filter_innodb

mysqltest: At line 82: query 'create table t1 (a int, b int, key (b), key (a)) engine=innodb' failed: <Unknown> (2013): Lost connection to server during query

The result from queries just before the failure was:

< snip >

create table t1 (id int not null primary key) engine=innodb;

insert into t1 values (2),(1);

create table t2 (y int,x int,index (x),index (y)) engine=innodb;

insert into t2 values

(4,1),(4,777),(2,1),(2,888),(111,1),(222,1),(333,345),(444,1),

(555,555),(666,1);

select * from t1 join t2 on t1.id = t2.x where t2.y = 2 and t1.id = 1;

id	y	x

1	2	1

explain extended select * from t1 join t2 on t1.id = t2.x where t2.y = 2 and t1.id = 1;

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra

1	SIMPLE	t1	const	PRIMARY	PRIMARY	4	const	1	100.00	Using index

1	SIMPLE	t2	index_merge	x,y	y,x	5,5	NULL	1	100.00	Using intersect(y,x); Using where; Using index

Warnings:

Note	1003	select 1 AS `id`,`test`.`t2`.`y` AS `y`,`test`.`t2`.`x` AS `x` from `test`.`t1` join `test`.`t2` where `test`.`t2`.`y` = 2 and `test`.`t2`.`x` = 1

drop table t1, t2;

# MDEV-19820: use of rowid filter for innodb table without primary key

create table t1 (a int, b int, key (b), key (a)) engine=innodb;

More results from queries before failure can be found in /tmp/tmp.enF8Q76mMp/var/1/log/rowid_filter_innodb.log

Server [mysqld.1 - pid: 6392, winpid: 6392, exit: 256] failed during test run

Server log from this test:

----------SERVER LOG START-----------

$ /usr/sbin/mariadbd --defaults-group-suffix=.1 --defaults-file=/tmp/tmp.enF8Q76mMp/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --loose-sequence --loose-sequence --core-file --loose-debug-sync-timeout=300

2023-02-23  5:11:26 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32186)

2023-02-23  5:11:26 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)

2023-02-23  5:11:26 0 [Note] Starting MariaDB 10.11.2-MariaDB-1-log source revision  as process 6393

2023-02-23  5:11:26 0 [Note] Plugin 'partition' is disabled.

2023-02-23  5:11:26 0 [Note] InnoDB: Compressed tables use zlib 1.2.13

2023-02-23  5:11:26 0 [Note] InnoDB: Number of transaction pools: 1

2023-02-23  5:11:26 0 [Note] InnoDB: Using POWER8 crc32 instructions

2023-02-23  5:11:26 0 [Note] InnoDB: Using liburing

2023-02-23  5:11:26 0 [Note] InnoDB: Initializing buffer pool, total size = 8.000MiB, chunk size = 1.000MiB

2023-02-23  5:11:26 0 [Note] InnoDB: Completed initialization of buffer pool

2023-02-23  5:11:26 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)

2023-02-23  5:11:27 0 [Note] InnoDB: 128 rollback segments are active.

2023-02-23  5:11:27 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...

2023-02-23  5:11:27 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.

2023-02-23  5:11:27 0 [Note] InnoDB: log sequence number 46990; transaction id 16

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_CONFIG' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_SYS_TABLESTATS' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_DELETED' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMP' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_WAITS' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMP_RESET' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_QUEUES' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'FEEDBACK' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_INDEX_TABLE' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_GROUPS' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMP_PER_INDEX_RESET' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_INDEX_CACHE' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_BEING_DELETED' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_CMPMEM_RESET' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_FT_DEFAULT_STOPWORD' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_SYS_TABLESPACES' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'user_variables' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'INNODB_TABLESPACES_ENCRYPTION' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'THREAD_POOL_STATS' is disabled.

2023-02-23  5:11:27 0 [Note] Plugin 'unix_socket' is disabled.

2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-feedback-debug-startup-interval=20'

2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-feedback-debug-first-interval=60'

2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-feedback-debug-interval=60'

2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown option '--loose-pam-debug'

2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown option '--loose-aria'

2023-02-23  5:11:27 0 [Warning] /usr/sbin/mariadbd: unknown variable 'loose-debug-sync-timeout=300'

2023-02-23  5:11:27 0 [Note] InnoDB: Loading buffer pool(s) from /tmp/tmp.enF8Q76mMp/var/1/mysqld.1/data/ib_buffer_pool

2023-02-23  5:11:27 0 [Note] Server socket created on IP: '127.0.0.1'.

2023-02-23  5:11:27 0 [Note] /usr/sbin/mariadbd: ready for connections.

Version: '10.11.2-MariaDB-1-log'  socket: '/tmp/tmp.enF8Q76mMp/tmp/1/mysqld.1.sock'  port: 16000  Debian n/a

2023-02-23  5:11:27 0 [Note] InnoDB: Buffer pool(s) load completed at 230223  5:11:27

2023-02-23  5:11:30 4 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=289]. You may have to recover from a backup.

2023-02-23  5:11:30 4 [Note] InnoDB: Page dump (16384 bytes):

2023-02-23  5:11:30 4 [Note] InnoDB: 0000000000000121ffffffffffffffff00000000002ea01f0006000000000000

2023-02-23  5:11:30 4 [Note] InnoDB: 000000000000000000000000000000000002000001a80330000001a800780000

2023-02-23  5:11:30 4 [Note] InnoDB: 0000000000f321f2000001a8ffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

2023-02-23  5:11:30 4 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

...

Full logs:
Failing test(s): main.index_merge_innodb main.rowid_filter_innodb main.bootstrap_innodb main.parser_bug21114_innodb
Failing test(s): main.xa_prepared_binlog_off
Failing test(s): main.ctype_utf8mb4_innodb main.range_vs_index_merge_innodb
Failing test(s): main.innodb_ext_key main.index_merge_innodb
Failing test(s): main.innodb_ext_key
Failing test(s): main.index_merge_innodb
Failing test(s): main.rowid_filter_innodb main.range_vs_index_merge_innodb
Failing test(s): main.innodb_ext_key

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

image-2023-02-25-13-11-25-771.png
332 kB
2023-02-25 21:11
screenshot-1.png
59 kB
2023-03-24 17:09
screenshot-2.png
71 kB
2023-03-24 17:09

Issue Links

causes

MDBF-542 kernel upgrade on ppc64le-db-bbw1

Closed

relates to

MDEV-29832 rpl.rpl_semi_sync_after_sync_row frequently fails

Open

MDEV-32170 Frequent 'InnoDB: Database page corruption on disk' on MariaDB 10.11 on Debian Bookworm on arch ppc64el

Closed

links to

Debian bug 1032104

Activity

Ascending order - Click to sort in descending order

View 38 older comments

Timothy Pearson added a comment - 2023-11-18 23:48

Well, in the end it ended up being even deeper:

https://lore.kernel.org/linuxppc-dev/1105090647.48374193.1700351103830.JavaMail.zimbra@raptorengineeringinc.com/T/#u

I'm confident it's now resolved, and the explaination makes a lot more sense. Was quite a journey to get there!

Timothy Pearson added a comment - 2023-11-18 23:48 Well, in the end it ended up being even deeper: https://lore.kernel.org/linuxppc-dev/1105090647.48374193.1700351103830.JavaMail.zimbra@raptorengineeringinc.com/T/#u I'm confident it's now resolved, and the explaination makes a lot more sense. Was quite a journey to get there!

Daniel Black added a comment - 2023-11-20 00:45 - edited

Wow, yes, makes more sense.

I read through the very long thread on debugging:

https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/

My deepest, deepest thanks to you and Jens for this massive effort.

I think marko and I where 90%+ sure it was in ppc kernel. Per offlist some rr like capability would have been able to identify maybe some state changes.

Summary of debugging:

1. kernel bisect
1. validation by smt1 pinning
1. alternating testing from kernel regression point and latest
1. experimentation with timing delays, pinning, kernel worker thread recreation

Taking notes of concepts mentioned to take into account inside MariaDB.

To check

write safety

https://lore.kernel.org/all/727263b4-2f8c-44bf-b7b8-5a936718df76@kernel.dk/ - previously submitted writes get written? Crash safety in mariadb is the ordering of those commited, we wait until response before a user OK. Hope we're not leaving kernel in unstable state during termination, but maybe that's just kernel problem.

expectations: https://lore.kernel.org/all/7bd67c30-3b9f-4541-864a-b082917ee8b5@kernel.dk/

I think we're right.

signal termination - write doesn't complete

https://lore.kernel.org/all/431496020.45688461.1699416044413.JavaMail.zimbra@raptorengineeringinc.com/

SIGKILL was MTR issued

I think you worked this out and this seems right:

https://lore.kernel.org/all/469846692.45828995.1699469480871.JavaMail.zimbra@raptorengineeringinc.com/

exec with pending IO

https://lore.kernel.org/all/f1761483-1f59-42bd-b556-ad3dde998dbc@kernel.dk/

exec with pending IO will cancel pending IO

implications with galera notify and sst in general?

mix of uring write and async read on same file

https://lore.kernel.org/all/414471427.46558741.1699728159797.JavaMail.zimbra@raptorengineeringinc.com/

odd

but ok, just don't mix buffered and O_DIRECT - https://lore.kernel.org/all/cb7b2bc1-3cf0-43b4-ad66-c35f9c59cbb9@kernel.dk/

"so we're in potentially less-tested territory. " https://lore.kernel.org/all/1962251239.46563579.1699729495314.JavaMail.zimbra@raptorengineeringinc.com/

stale read - https://lore.kernel.org/all/1626632f-b190-48e4-b04d-a2d5a3d2083b@kernel.dk/

Daniel Black added a comment - 2023-11-20 00:45 - edited Wow, yes, makes more sense. I read through the very long thread on debugging: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/ My deepest, deepest thanks to you and Jens for this massive effort. I think marko and I where 90%+ sure it was in ppc kernel. Per offlist some rr like capability would have been able to identify maybe some state changes. Summary of debugging: 1. kernel bisect 1. validation by smt1 pinning 1. alternating testing from kernel regression point and latest 1. experimentation with timing delays, pinning, kernel worker thread recreation Taking notes of concepts mentioned to take into account inside MariaDB. To check write safety https://lore.kernel.org/all/727263b4-2f8c-44bf-b7b8-5a936718df76@kernel.dk/ - previously submitted writes get written? Crash safety in mariadb is the ordering of those commited, we wait until response before a user OK. Hope we're not leaving kernel in unstable state during termination, but maybe that's just kernel problem. expectations: https://lore.kernel.org/all/7bd67c30-3b9f-4541-864a-b082917ee8b5@kernel.dk/ I think we're right. signal termination - write doesn't complete https://lore.kernel.org/all/431496020.45688461.1699416044413.JavaMail.zimbra@raptorengineeringinc.com/ SIGKILL was MTR issued I think you worked this out and this seems right: https://lore.kernel.org/all/469846692.45828995.1699469480871.JavaMail.zimbra@raptorengineeringinc.com/ exec with pending IO https://lore.kernel.org/all/f1761483-1f59-42bd-b556-ad3dde998dbc@kernel.dk/ exec with pending IO will cancel pending IO implications with galera notify and sst in general? mix of uring write and async read on same file https://lore.kernel.org/all/414471427.46558741.1699728159797.JavaMail.zimbra@raptorengineeringinc.com/ odd but ok, just don't mix buffered and O_DIRECT - https://lore.kernel.org/all/cb7b2bc1-3cf0-43b4-ad66-c35f9c59cbb9@kernel.dk/ "so we're in potentially less-tested territory. " https://lore.kernel.org/all/1962251239.46563579.1699729495314.JavaMail.zimbra@raptorengineeringinc.com/ stale read - https://lore.kernel.org/all/1626632f-b190-48e4-b04d-a2d5a3d2083b@kernel.dk/

Timothy Pearson added a comment - 2023-11-20 03:25 - edited

It gets even better since I ended up starting a second, much smaller debug thead here:

https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/

That was after I recognized the data I was seeing in a debugger [1], after one of my test modifications managed to inadvertently crash mariadb within a few hundred instructions of the presumed race (long story short, basically glibc sprintf() calls a lot of VSX instructions when run in a tight loop, which is perfect to provoke the issue).

In hindsight, I was closer than I thought a week ago – I knew the corrupt data pattern was 128 bits long, but just didn't put two and two together regarding that being the length of our vector registers. Not that it would have helped much – even in that message I was still slightly off, there were two similar code sections and the other one that used vs0 was the actual canary. The rest of the commentary there is correct, however.

I'm honestly not sure what other tools I could have used, given the race condition. The only thought I have right now is a brand new tool written in assembler that basically writes random patterns to registers as fast as possible, and checks for register corruption every iteration. Either way, one doesn't expect to find bugs in 20 year old working kernel code, which is a testament to how well the Linux kernel is designed and tested overall.

[1] 0x82004000 is the normal content of the FPSCR under most conditions

Timothy Pearson added a comment - 2023-11-20 03:25 - edited It gets even better since I ended up starting a second, much smaller debug thead here: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/ That was after I recognized the data I was seeing in a debugger [1] , after one of my test modifications managed to inadvertently crash mariadb within a few hundred instructions of the presumed race (long story short, basically glibc sprintf() calls a lot of VSX instructions when run in a tight loop, which is perfect to provoke the issue). In hindsight, I was closer than I thought a week ago – I knew the corrupt data pattern was 128 bits long, but just didn't put two and two together regarding that being the length of our vector registers. Not that it would have helped much – even in that message I was still slightly off, there were two similar code sections and the other one that used vs0 was the actual canary. The rest of the commentary there is correct, however. I'm honestly not sure what other tools I could have used, given the race condition. The only thought I have right now is a brand new tool written in assembler that basically writes random patterns to registers as fast as possible, and checks for register corruption every iteration. Either way, one doesn't expect to find bugs in 20 year old working kernel code, which is a testament to how well the Linux kernel is designed and tested overall. [1] 0x82004000 is the normal content of the FPSCR under most conditions

Daniel Black added a comment - 2023-12-03 04:11

Merged into Linus Tree.

Will be part of Kernel 6.7

Daniel Black added a comment - 2023-12-03 04:11 Merged into Linus Tree . Will be part of Kernel 6.7

Daniel Black added a comment - 2023-12-20 01:45

So finally not so much a NOTABUG as NOTOURBUG.

Daniel Black added a comment - 2023-12-20 01:45 So finally not so much a NOTABUG as NOTOURBUG.

MariaDB Server

Frequent 'InnoDB: Database page corruption on disk' on MariaDB 10.11 in Debian autopktest on arch ppc64el

Details

Description

Attachments

Attachments

Issue Links

Activity

To check

write safety

signal termination - write doesn't complete

SIGKILL was MTR issued

exec with pending IO

mix of uring write and async read on same file

People

Dates

Git Integration