New bb kernel version on the aarch64-debian-11 builder: 4.18.0-394.el8.aarch64 (that one has no failures of the type tracked in this MDEV; no Debug builds are being run there, which limits, but does not eliminate, the range of the encryption.innodb tests).
kvm-deb-bullseye-aarch64 5.10.0-17-arm64 (5.10.136)
debian has 5.10.0-18 / 5.10.140-1
fs: Add missing umask strip in vfs_tmpfile
blk-mq: fix io hung due to missing commit_rqs
Can't tell if relevant.
bb.org: e.g: https://ci.mariadb.org/28579/logs/aarch64-debian-11/mysqld.1.err.6
2022-09-22 22:26:45 0 [Warning] mariadbd: io_uring_queue_init() failed with ENOMEM: try larger memory locked limit, ulimit -l, or https://mariadb.com/kb/en/systemd/#configuring-limitmemlock under systemd
2022-09-22 22:26:45 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
Daniel Black added a comment - edited
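For context on the ENOMEM warning above: io_uring_queue_init() returns a negative errno when the ring cannot be set up, and on older kernels the ring memory is accounted against RLIMIT_MEMLOCK, so a small ulimit -l produces exactly this failure and the fallback to innodb_use_native_aio=OFF. A minimal sketch of that detect-and-fall-back pattern with liburing (the function name and messages below are illustrative, not the actual server code):

#include <liburing.h>
#include <sys/resource.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

/* Try to set up an io_uring instance; return false so that the caller can
   fall back to synchronous I/O (the equivalent of innodb_use_native_aio=OFF).
   ENOMEM here typically means RLIMIT_MEMLOCK is too small on kernels that
   account ring memory against it. */
static bool init_native_aio(io_uring &ring, unsigned entries)
{
  int err= io_uring_queue_init(entries, &ring, 0);
  if (err == 0)
    return true;                     /* native AIO is available */
  if (err == -ENOMEM)
  {
    rlimit rl{};
    getrlimit(RLIMIT_MEMLOCK, &rl);
    std::fprintf(stderr,
                 "io_uring_queue_init() failed with ENOMEM: "
                 "RLIMIT_MEMLOCK is %llu; try a larger memory locked limit\n",
                 (unsigned long long) rl.rlim_cur);
  }
  else
    std::fprintf(stderr, "io_uring_queue_init() failed: %s\n",
                 std::strerror(-err));
  return false;                      /* fall back to synchronous I/O */
}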
The amd64-sid failure was on a Debian kernel (seemingly bullseye's too), 5.10.0-13-amd64, the latest crashing fault (~1 month before the failure, 2 weeks ago) on bz-bbw2.
Daniel Black added a comment
With memlock limits raised on bb.org, more bb workers such as ppc64le-ubuntu-2204 and ppc64le-rhel-9 are showing the same test assertions as a common aspect (though it is still very early). All have Debian 5.10 kernels.
Daniel Black added a comment
Got it on my local Debian 11 (amd64), coincidentally(?) next day after upgrade from 5.10.0-14 to 5.10.0-18. One-time occurrence so far, on MariaDB 10.11, uring, tmpfs.
Elena Stepanova added a comment - edited
I am not sure if this is related, but I thought to mention it. mleich asked me to check a core dump of a server that had been hung for 15 minutes, doing crash recovery. The culprit seemed to be that 2 asynchronous page reads via io_uring never completed. I suppose that this was run on /dev/shm, like the majority of his tests.
The kernel is much older, though:
marko@pluto:~$ cat /proc/version
Linux version 5.4.0-94-generic (buildd@lcy02-amd64-053) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022
If I remember correctly, this is with a manually built liburing, because Ubuntu 20.04 does not include a package of it.
Here is the interesting part of the stack traces:
#7 0x0000556a3937b1ad in safe_cond_wait (cond=cond@entry=0x556a3a913890 <recv_sys+176>, mp=mp@entry=0x556a3a9137e0 <recv_sys>, file=file@entry=0x556a39c81920 "/data/Server/bb-10.11-MDEV-30216-MDEV-26790/storage/innobase/log/log0recv.cc", line=line@entry=3479) at /data/Server/bb-10.11-MDEV-30216-MDEV-26790/mysys/thr_mutex.c:494
#8 0x0000556a38a7a731 in recv_sys_t::apply (this=0x556a3a9137e0 <recv_sys>, last_batch=last_batch@entry=true) at /data/Server/bb-10.11-MDEV-30216-MDEV-26790/storage/innobase/log/log0recv.cc:3479
#9 0x0000556a38d9f021 in srv_start (create_new_db=<optimized out>) at /data/Server/bb-10.11-MDEV-30216-MDEV-26790/storage/innobase/srv/srv0start.cc:1565
#10 0x0000556a388c9025 in innodb_init (p=<optimized out>) at /data/Server/bb-10.11-MDEV-30216-MDEV-26790/storage/innobase/handler/ha_innodb.cc:4157
This is waiting for recv_sys.cond and for recv_sys.pages to become empty. There were two pending page read requests:
(gdb) p recv_sys.pages
$8 = std::map with 2 elements = {[{m_id = 167503724544}] = {state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f2bd98d2558, tail = 0x7f2bd994d868}}, [{m_id = 167503724547}] = {state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f2bd9984220, tail = 0x7f2bd9a75600}}}
These reads were for tablespace 39 (./test/t2.ibd), pages 0 and 3. One thread was waiting in io_uring_wait_cqe(), and the buf_flush_page_cleaner() thread was idle, as expected.
Marko Mäkelä added a comment
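To illustrate the mechanism described above (one thread blocked in io_uring_wait_cqe() for completions that are never posted), here is a minimal, hypothetical reaping loop using liburing; the actual InnoDB/tpool handler is more involved, and the function and variable names below are illustrative only:

#include <liburing.h>
#include <cstdio>

/* Reap completions until `pending` outstanding requests have finished.
   If the kernel never posts a CQE for one of them (the situation analysed
   above), io_uring_wait_cqe() blocks here forever. */
static int reap_completions(io_uring &ring, unsigned pending)
{
  while (pending > 0)
  {
    io_uring_cqe *cqe;
    int err= io_uring_wait_cqe(&ring, &cqe);   /* blocks until a CQE arrives */
    if (err < 0)
      return err;                              /* e.g. -EINTR */
    void *req= io_uring_cqe_get_data(cqe);     /* pointer attached at submit time */
    int res= cqe->res;                         /* bytes transferred, or -errno */
    io_uring_cqe_seen(&ring, cqe);             /* mark the CQE as consumed */
    std::printf("request %p completed: %d\n", req, res);
    pending--;
  }
  return 0;
}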
danblack, we have upgraded our 2 main ARM64 builders to the latest stable kernel, 5.10.0-20-arm64 > 5.10.0-21-arm64.
As discussed with marko, we could probably also use backported kernels (branch 6.0 or 6.1) on those builders.
What do you think?
Faustin Lammler added a comment
I just reproduced this on my system (AMD64, not ARMv8):
Linux version 6.1.0-6-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.15-1 (2023-03-05)
I was running 48 concurrent instances of the test innodb.undo_truncate,4k on a 10.6 RelWithDebInfo build.
innodb.undo_truncate '2,4k,innodb' w9 [ 38 fail ] Found warnings/errors in server log file!
Test ended at 2023-03-21 14:02:02
line
2023-03-21 14:01:59 0 [ERROR] [FATAL] InnoDB: IO Error: 125 during async write of 4096 bytes, for file 76, returned 0
2023-03-21 14:01:59 0 [ERROR] [FATAL] InnoDB: IO Error: 125 during async write of 4096 bytes, for file 76, returned 0
Attempting backtrace. You can use the following information to find out
^ Found warnings in /dev/shm/10.6g/mysql-test/var/9/log/mysqld.1.err
Actually that fatal error was output before the server had been restarted, but mtr ignored those crashes and did not preserve any core dumps for them. The stack trace is due to a different error that was not listed in the summary above:
2023-03-21 14:02:01 0 [ERROR] [FATAL] InnoDB: Trying to write 4096 bytes at 6815744 outside the bounds of the file: .//undo001
10.6 535a38c4559160c7a46f0d9ece52495b83718d45
#7 0x000055f590502435 in fil_invalid_page_access_msg (is_read=<optimized out>, len=94513198740384, offset=6815744, name=<optimized out>) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2661
#8 fil_space_t::io (this=this@entry=0x55f591a333a0, type=@0x7ffc60d53080: {bpage = 0x0, slot = 0x0, node = 0x0, type = IORequest::WRITE_ASYNC}, offset=<optimized out>, offset@entry=6815744,
len=len@entry=4096, buf=<optimized out>, bpage=bpage@entry=0x0) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2737
#9 0x000055f5904f50c4 in fil_space_t::flush_freed (this=this@entry=0x55f591a333a0, writable=writable@entry=true) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1036
#10 0x000055f5904f635b in buf_flush_space (id=1) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1187
#11 buf_do_flush_list_batch (lsn=<optimized out>, max_n=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1449
#12 buf_flush_list_holding_mutex (lsn=<optimized out>, max_n=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1547
#13 buf_flush_list_holding_mutex (max_n=<optimized out>, lsn=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1528
#14 0x000055f5904f867a in buf_flush_list (max_n=2000, lsn=lsn@entry=18446744073709551615) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:1573
#15 0x000055f5904f890f in buf_flush_buffer_pool () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:2465
#16 0x000055f5905aed2e in logs_empty_and_mark_files_at_shutdown () at /mariadb/10.6/storage/innobase/log/log0log.cc:1177
Marko Mäkelä added a comment
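A note on reading "IO Error: 125 during async write ... returned 0": with io_uring, the completion's cqe->res field carries either the number of bytes transferred or a negative errno, and 125 is ECANCELED. A hedged sketch of how a write-completion handler might classify the result (illustrative names, not the actual tpool/InnoDB callback):

#include <liburing.h>
#include <cstdio>
#include <cstring>

/* Classify the result of an asynchronous write of `expected` bytes.
   cqe->res < 0 is a kernel error (e.g. -ECANCELED, i.e. -125, when the
   request was cancelled); a smaller non-negative value is a short write. */
static void on_write_complete(const io_uring_cqe *cqe, unsigned expected,
                              const char *name)
{
  if (cqe->res < 0)
    std::fprintf(stderr, "async write of %u bytes to %s failed: errno %d (%s)\n",
                 expected, name, -cqe->res, std::strerror(-cqe->res));
  else if ((unsigned) cqe->res != expected)
    std::fprintf(stderr, "short write to %s: %d of %u bytes\n",
                 name, cqe->res, expected);
  /* else: the full buffer was written */
}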
Update: it looks like the fatal error 125 may have been output from another thread while some debug information was being produced for the first fatal error. There was no actual stack trace output for either thread:
2023-03-21 14:01:59 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
2023-03-21 14:01:59 0 [Note] InnoDB: FTS optimize thread exiting.
2023-03-21 14:01:59 0 [Note] InnoDB: Starting shutdown...
2023-03-21 14:01:59 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/10.6g/mysql-test/var/9/mysqld.1/data/ib_buffer_pool
2023-03-21 14:01:59 0 [Note] InnoDB: Restricted to 1478 pages due to innodb_buf_pool_dump_pct=25
2023-03-21 14:01:59 0 [Note] InnoDB: Buffer pool(s) dump completed at 230321 14:01:59
2023-03-21 14:01:59 0 [ERROR] [FATAL] InnoDB: IO Error: 125 during async write of 4096 bytes, for file 76, returned 0
230321 14:01:59 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.6.13-MariaDB-log source revision: 142587071b22254a87b1378312b1cebaa8f3f11e
key_buffer_size=1048576
read_buffer_size=131072
2023-03-21 14:01:59 0 [ERROR] [FATAL] InnoDB: IO Error: 125 during async write of 4096 bytes, for file 76, returned 0
max_used_connections=3
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63667 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
The source code revision is not being reported as 535a38c4559160c7a46f0d9ece52495b83718d45 because there were some unrelated code changes applied.
Marko Mäkelä added a comment
I do not know if it is related to this, but I also observed 2 shutdown hangs in a RelWithDebInfo build (on write_slots) when running several innodb.undo_truncate,4k in parallel:
10.6 535a38c4559160c7a46f0d9ece52495b83718d45
#9 io_slots::wait (this=0x5621f0b76a50) at /mariadb/10.6/storage/innobase/os/os0file.cc:117
#10 io_slots::~io_slots (this=0x5621f0b76a50, __in_chrg=<optimized out>) at /mariadb/10.6/storage/innobase/os/os0file.cc:132
#11 os_aio_free () at /mariadb/10.6/storage/innobase/os/os0file.cc:3660
#12 0x00005621ee609e2d in innodb_shutdown () at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1978
In one of the hangs that I checked, the buf_dblwr indicated no pending page writes. Log writes are never asynchronous (until MDEV-26603 is done).
Marko Mäkelä added a comment
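The stack above is the usual "wait until every submitted slot has drained" pattern: a counter of in-flight requests plus a condition variable, which blocks forever if even one completion is lost. A minimal sketch of that pattern under assumed, simplified bookkeeping (the class and member names are hypothetical; the real implementation is io_slots in storage/innobase/os/os0file.cc):

#include <condition_variable>
#include <mutex>

/* Simplified stand-in for the write_slots bookkeeping: each submitted write
   increments `pending`, each completion callback decrements it, and shutdown
   waits for it to reach zero.  A completion that never arrives leaves wait()
   blocked forever, which is what the io_slots::wait() frame above shows. */
class pending_io
{
  std::mutex mtx;
  std::condition_variable cv;
  unsigned pending= 0;
public:
  void submitted() { std::lock_guard<std::mutex> g(mtx); pending++; }
  void completed()
  {
    std::lock_guard<std::mutex> g(mtx);
    if (--pending == 0)
      cv.notify_all();
  }
  void wait()   /* called during shutdown, e.g. from os_aio_free() */
  {
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [this] { return pending == 0; });
  }
};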
Got email from Jens Axboe:
>> I did see that one, and I think that's either specific to the old
>> 5.10 stable kernel (eg -21 won't have it), because it's cancelation
>> off thread exit.
>
> possible
>
>> Either that, or you're doing fork/exit with IO inflight.
>
> exit with IO inflight might be possible. fork much less likely.
It's exit, from the trace that I saw. Easily reproducible with just
adding a delay to the worker handling.
> I came in as https://jira.mariadb.org/browse/MDEV-29610 which I note
> Marko claims 6.1.15-1 and by the build date it might be the same era
(JIRA edit - my mistake - its a much later build)
> as -20, so since 6.1.16+ is out I assume its the same fix.
>
> thanks for the pointer
Yeah, if it's exit with IO pending, it'll be in any kernel. You should
not do that...
Daniel Black added a comment
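The point of the exchange is that the submitting context must not go away while io_uring requests are still in flight; otherwise the kernel cancels them and the completions come back as -ECANCELED, which matches error 125 seen elsewhere in this ticket. A hedged sketch of a teardown order that avoids this, using plain liburing calls (the function name and bookkeeping are illustrative only):

#include <liburing.h>

/* Tear down an io_uring instance only after every submitted request has been
   reaped.  Letting the submitter exit while `inflight > 0` is what produces
   the ECANCELED (125) completions discussed in this ticket. */
static void shutdown_ring(io_uring &ring, unsigned inflight)
{
  while (inflight > 0)
  {
    io_uring_cqe *cqe;
    if (io_uring_wait_cqe(&ring, &cqe) < 0)
      continue;                      /* e.g. -EINTR: keep draining */
    io_uring_cqe_seen(&ring, cqe);
    inflight--;
  }
  io_uring_queue_exit(&ring);        /* now safe: nothing is pending */
}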
danblack, any news on this? I see that 125 is ECANCELED in /usr/include/asm-generic/errno.h.
mleich got this on 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux, on an ext4 file system loopback-mounted from a file residing in /dev/shm. (Such indirection makes sense when using the older libaio; it does not support files residing directly on /dev/shm.)
Marko Mäkelä added a comment
Yes, "doing fork/exit with IO inflight" seems to apply here:
# 2023-05-17T11:03:59 [216405] | 2023-05-17 11:03:55 0 [Note] /data/Server_bin/bb-10.6-MDEV-31234_RelWithDebInfo/bin/mysqld (initiated by: root[root] @ localhost [127.0.0.1]): Normal shutdown
# 2023-05-17T11:03:59 [216405] | 2023-05-17 11:03:55 0 [Note] InnoDB: FTS optimize thread exiting.
# 2023-05-17T11:03:59 [216405] | 2023-05-17 11:03:55 0 [ERROR] [FATAL] InnoDB: IO Error: 125 during async lru write of 4096 bytes, for file 9, returned 0
danblack, do you have a patch that makes it easier to hit this bug?
Marko Mäkelä added a comment
In MDEV-27593 I changed the wording of the message and made it a non-fatal error. Actually, I accidentally made it Note instead of ERROR, so this should not cause any mtr test failures anymore:
diff --git a/storage/innobase/os/os0file.cc b/storage/innobase/os/os0file.cc
index 906ce79e8fd..99f212b6eeb 100644
--- a/storage/innobase/os/os0file.cc
+++ b/storage/innobase/os/os0file.cc
@@ -3459,8 +3459,11 @@ static void write_io_callback(void *c)
   const IORequest &request= *static_cast<const IORequest*>
     (static_cast<const void*>(cb->m_userdata));
 
+  if (!ut_rnd_interval(100))
+    cb->m_err= 42;
+
   if (UNIV_UNLIKELY(cb->m_err != 0))
-    ib::info () << "IO Error: " << cb->m_err
+    ib::error() << "IO Error: " << cb->m_err
                 << "during write of "
                 << cb->m_len << " bytes, for file "
                 << request.node->name << "(" << cb->m_fh << "), returned "
10.6 72928e640e62559f4d4b82e9ff1102bb72bd5828 with patch
var/9/log/mysqld.1.err:2023-08-01 15:27:09 0 [Note] InnoDB: IO Error: 42during write of 16384 bytes, for file ./ibdata1(11), returned 16384
Marko Mäkelä added a comment
I got some logs of failures that occurred after MDEV-27593 was pushed into 10.6 (2023-08-01). I found only one failure that includes that fix:
bb-11.0-ycp-mdev-31996 d07012f4d84e57f1895f848b029fabd220b5d5be ppc64le-ubuntu-2204
encryption.innodb-checksum-algorithm '32k,ctr,innodb,strict_crc32' w8 [ fail ]
…
2023-10-06 3:37:06 4 [Note] InnoDB: Importing tablespace for table 'test/tp' that was exported from host 'c8473bd36d6b'
2023-10-06 3:37:06 4 [Note] InnoDB: Phase I - Update all pages
2023-10-06 3:37:06 4 [Note] InnoDB: Sync to disk
2023-10-06 3:37:06 4 [Note] InnoDB: Sync to disk - done!
2023-10-06 3:37:06 4 [Note] InnoDB: Phase III - Flush changes to disk
2023-10-06 3:37:06 4 [Note] InnoDB: Phase IV - Flush complete
2023-10-06 3:37:06 4 [Note] InnoDB: `test`.`tp` autoinc value set to 2
2023-10-06 3:37:07 0 [Note] InnoDB: IO Error: 125during write of 491520 bytes, for file ./ibdata1(11), returned 0
----------SERVER LOG END-------------
mysqltest failed but provided no output
The server apparently hung here. The reason could be unrelated to the io_uring error, such as MDEV-32588.
Marko Mäkelä added a comment
A Linux kernel bug that was behind MDEV-30728 is specific to the POWER ISA. Therefore, this issue (errors on io_uring write completion callback, observed on ARMv8 and AMD64) should be unaffected by that fix.
Marko Mäkelä added a comment