[MDEV-32011] 'aio write' returned OS error 222. Cannot continue operation Created: 2023-08-25  Updated: 2023-10-30  Resolved: 2023-10-30

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.7
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: mansi dadheech Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Environment:

NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"


Attachments: PNG File screenshot-1.png     PNG File screenshot-2.png     PNG File screenshot-3.png    
Issue Links:
Duplicate
duplicates MDEV-27593 Crashing on I/O error is unhelpful Closed

 Description   

Our environment have three DB replication cluster including 2 master nodes and 1 slave. [When writer points to 1st database 2nd one will behave like a slave]. The issue is on both the slave nodes mysqld is getting crashed with the following error in /var/log/messages

Aug 24 10:55:18 ALEPO-DB2-ILB NetworkManager[1266]: <info>  [1692856518.8257] dhcp4 (enp9s0): canceled DHCP transaction
Aug 24 10:59:13 ALEPO-DB2-ILB systemd[1]: Started Process Core Dump (PID 128368/UID 0).
Aug 24 10:59:13 ALEPO-DB2-ILB systemd-coredump[128369]: Resource limits disable core dumping for process 4079310 (mariadbd).
Aug 24 10:59:13 ALEPO-DB2-ILB systemd-coredump[128369]: Process 4079310 (mariadbd) of user 993 dumped core.
Aug 24 10:59:13 ALEPO-DB2-ILB systemd[1]: systemd-coredump@10-128368-0.service: Succeeded.
Aug 24 10:59:14 ALEPO-DB2-ILB systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Aug 24 10:59:14 ALEPO-DB2-ILB systemd[1]: mariadb.service: Failed with result 'signal'.
Aug 24 10:59:19 ALEPO-DB2-ILB systemd[1]: mariadb.service: Service RestartSec=5s expired, scheduling restart.
Aug 24 10:59:19 ALEPO-DB2-ILB systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 2.
Aug 24 10:59:19 ALEPO-DB2-ILB systemd[1]: Stopped MariaDB 10.7.4 database server.
Aug 24 10:59:19 ALEPO-DB2-ILB systemd[1]: Starting MariaDB 10.7.4 database server...
Aug 24 10:59:19 ALEPO-DB2-ILB mariadbd[128596]: 2023-08-24 10:59:19 0 [Note] /usr/sbin/mariadbd (server 10.7.4-MariaDB-log) starting as process 128596 ...
Aug 24 10:59:19 ALEPO-DB2-ILB mariadbd[128596]: 2023-08-24 10:59:19 0 [Warning] Could not increase number of max_open_files to more than 32768 (request: 33629)

And at the same time, mysqld log shows this issue:

2023-08-24 10:59:13 0 [ERROR] InnoDB: Operating system error number 22 in a file operation.
2023-08-24 10:59:13 0 [ERROR] InnoDB: Error number 22 means 'Invalid argument'
2023-08-24 10:59:13 0 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
2023-08-24 10:59:13 0 [ERROR] InnoDB: File ./aaadb/authenticationlog#P#p202308.ibd: 'aio write' returned OS error 222. Cannot continue operation
230824 10:59:13 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.7.4-MariaDB-log
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=8
max_threads=802
thread_count=15
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1831745 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
??:0(my_print_stacktrace)[0x55f4b120e31e]
??:0(handle_fatal_signal)[0x55f4b0d09755]
??:0(__restore_rt)[0x7f387467dcf0]
:0(__GI_raise)[0x7f38739d8aff]
:0(__GI_abort)[0x7f38739abea5]
??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55f4b0968add]
??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool> std::_Rb_tree<unsigned int, std::pair<unsigned int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::_Select1st<std::pair<unsigned int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, std::__cxx??:0(std::thread::thread<void (&)(), , void>(void (&)()))[0x55f4b114262e]
??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x55f4b1102ad2]
??:0(std::thread::thread<void (&)(), , void>(void (&)()))[0x55f4b1142abc]
??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool> std::_Rb_tree<unsigned int, std::pair<unsigned int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::_Select1st<std::pair<unsigned int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, std::__cxx??:0(tpool::task_group::execute(tpool::task*))[0x55f4b11a333c]
??:0(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x55f4b11a220f]
??:0(std::error_code::default_error_condition() const)[0x7f3874194ba3]
??:0(start_thread)[0x7f38746731ca]
:0(__GI___clone)[0x7f38739c3e73]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             2061710              2061710              processes
Max open files            32768                32768                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       2061710              2061710              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e



 Comments   
Comment by Daniel Black [ 2023-08-25 ]

What storage type and filesystem are in use by your datadir?

Comment by Marko Mäkelä [ 2023-08-25 ]

I believe that the crash has likely been fixed by MDEV-27593.

Still, it would be very nice to know what might have caused the 100+100+EINVAL to be returned. Was this with the MDEV-24854 default innodb_flush_method=O_DIRECT? What is the innodb_page_size? Is ROW_FORMAT=COMPRESSED being used on the table authenticationlog? If yes, with which KEY_BLOCK_SIZE?

Comment by mansi dadheech [ 2023-08-25 ]

Storage type is 10k SAS RAID10 and filesystem of datadir is
/dev/mapper/rhel-var ext4 5.7T 506G 4.9T 10% /var


Let me know if anything else is required. Thanks

Comment by Daniel Black [ 2023-08-26 ]

So confirming that SHOW CREATE TABLE authenticationlog also shows the ROW_FORMAT=DYNAMIC?

For the rhel/var LVM - if you look at the pvs the physical volume, navigate to /sys/dev/block/DEVICE then cat queue/physical_block_size what does that show?

innodb_flush_method=flush is probably the workaround for you.

Note 10.7.5 would have included this change that provides more information on what was attempted. If you are able to bump to this or later version to test that would be useful.

Also note that 10.7 is out of maintainence.

Comment by mansi dadheech [ 2023-08-28 ]

Thanks,
Yes, the authentication table also has the same format as we are going with the default one only.
And innodb_flush_method=flush, is "flush" an accurate value in this configuration, as I can see that no such value is available in the official documentation.
please find the physical block size output.

Comment by Marko Mäkelä [ 2023-08-28 ]

I think that danblack intended to write innodb_flush_method=fsync. The default value was changed in MDEV-24854. In MDEV-30136 (MariaDB Server 11.0), the parameter was refactored into 4 Boolean parameters that can be updated while the server is running.

Comment by Sergei Golubchik [ 2023-10-02 ]

mansi, did you try innodb_flush_method=fsync as a workaround as suggested above? Alternatively you could've upgraded to any maintained version and try there.

Generated at Thu Feb 08 10:28:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.