[MDEV-30572] main.large_pages 'innodb' fails on architecture hppa: InnoDB: Operating system error number 14 in a file operation - Jira

Otto Kekäläinen created issue - 2023-02-05 21:26

Otto Kekäläinen made changes - 2023-02-05 21:26

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-12039~~ [ ~~MDEV-12039~~ ]

Otto Kekäläinen made changes - 2023-02-05 21:27

Description

The official Debian build of MariaDB 1:10.11.1-2 failed on the Debian hppa builder (log: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-2&stamp=1675231483&raw=0) with:

{noformat}
main.large_pages 'innodb' w2 [ fail ]
Test ended at 2023-02-01 00:35:17
CURRENT_TEST: main.large_pages
Failed to start mysqld.1
mysqltest failed but provided no output
- found 'core' (0/5)
Trying 'dbx' to get a backtrace
Trying 'lldb' to get a backtrace from coredump /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/mysqld.1/data/core
Compressed file /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/mysqld.1/data/core
- saving '/<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/' to '/<<PKGBUILDDIR>>/builddir/mysql-test/var/log/main.large_pages-innodb/'
Retrying test main.large_pages, attempt(2/3)...
***Warnings generated in error logs during shutdown after running tests: main.large_pages
2023-02-01 0:35:16 0 [Warning] mariadbd: Couldn't allocate 8388608 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01 0:35:16 0 [Warning] mariadbd: Couldn't allocate 6291456 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01 0:35:16 0 [Warning] mariadbd: Couldn't allocate 6291456 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01 0:35:16 0 [Warning] mariadbd: Couldn't allocate 4194304 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01 0:35:16 0 [Warning] mariadbd: Couldn't allocate 4194304 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01 0:35:16 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
2023-02-01 0:35:16 0 [ERROR] InnoDB: Operating system error number 14 in a file operation.
2023-02-01 0:35:16 0 [ERROR] InnoDB: Error number 14 means 'Bad address'
2023-02-01 0:35:16 0 [ERROR] InnoDB: File (unknown): 'read' returned OS error 214. Cannot continue operation
Attempting backtrace. You can use the following information to find out
{noformat}

The only other recorded case of OS error 14 was in ~~MDEV-12039~~.

This and other hppa issues are tracked downstream in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006529

Marko Mäkelä added a comment - 2023-02-06 04:48

The error code 22 should be EINVAL. The warning message that danblack clarified some time ago is somewhat misleading, because a smaller size would only be attempted if errno==ENOMEM.

errno=14 should be EFAULT. My first guess is that there is some fallback that allocates unaligned memory (instead of invoking memalign() or similar), and then an O_DIRECT file operation using an unaligned buffer will fail.
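
As a quick aid for reading the log above, here is a minimal sketch that decodes the raw errno numbers into their symbolic names. `errno_name` and `format_errno` are hypothetical helpers written for this illustration, not MariaDB functions, and the numeric values 14 and 22 are assumed to be the Linux ones.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: symbolic name for the few errno values that
   appear in this report; "?" for anything else. */
const char *errno_name(int err)
{
    switch (err) {
    case ENOMEM: return "ENOMEM";
    case EFAULT: return "EFAULT";
    case EINVAL: return "EINVAL";
    default:     return "?";
    }
}

/* Hypothetical helper: writes e.g. "errno 14 = EFAULT (<system text>)"
   into out and returns the value returned by snprintf(). */
int format_errno(char *out, size_t out_size, int err)
{
    return snprintf(out, out_size, "errno %d = %s (%s)",
                    err, errno_name(err), strerror(err));
}
```

Calling `format_errno(buf, sizeof buf, 22)` and `format_errno(buf, sizeof buf, 14)` gives the names for the two codes in the failing log; the descriptive text comes from the C library's `strerror()`.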

Marko Mäkelä made changes - 2023-02-06 04:48

Component/s		Server [ 13907 ]
Assignee		Daniel Black [ danblack ]

Otto Kekäläinen added a comment - 2023-02-06 05:15

For the record, the latest build (https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-3&stamp=1675654714&raw=0) of mariadb 1:10.11.1-3 passed, but the test suite failed with:

main.mysqldump 'innodb'                  w25 [ fail ]
        Test ended at 2023-02-06 03:25:25
CURRENT_TEST: main.mysqldump
--- /<<PKGBUILDDIR>>/mysql-test/main/mysqldump.result 2022-11-14 18:10:21.000000000 +0000
+++ /<<PKGBUILDDIR>>/mysql-test/main/mysqldump.reject 2023-02-06 03:25:24.020036990 +0000
@@ -5524,7 +5524,7 @@
 proc
one
 DROP DATABASE bug25717383;
-mariadb-dump: Got error: 2005: "Unknown server host 'unknownhost'" when trying to connect
+mariadb-dump: Got error: 2002: "Can't connect to server on 'unknownhost'" when trying to connect
 mariadb-dump: Couldn't execute 'SHOW SLAVE STATUS': Server has gone away (2006)
 Usage: mariadb-dump [OPTIONS] database [tables]
 OR     mariadb-dump [OPTIONS] --databases DB1 [DB2 DB3...]

Daniel Black added a comment - 2023-02-06 06:57

EINVAL from the memory allocation means an invalid length, so it looks like hppa is returning the wrong error code or populating /sys/kernel/mm/hugepages incorrectly (have I mentioned unsupported arch often enough?). As Marko pointed out, it will only try a smaller size on ENOMEM, so it's returning null. (Fixing the message now.)

Further down, EFAULT is "Bad address", so at least that's consistent.

For O_DIRECT errors I'd expect EINVAL, so I'd really expect EFAULT to mean a read into a memory address that is invalid (null).

Could it be that log_t::attach has a ut_malloc_dontdump that fails, followed by an assumption that buf is allocated, like what occurs in recv_sys_t::find_checkpoint?
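
The EFAULT hypothesis can be illustrated outside MariaDB with a minimal sketch, assuming Linux behaviour: `read()` into an invalid (null) buffer fails with EFAULT rather than EINVAL. `errno_of_read_into` is a hypothetical helper for this illustration; it reads from a pipe instead of a data file and has no connection to the InnoDB code paths named above.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: puts a few bytes into a pipe, then read()s them
   into buf. Returns 0 if the read succeeded, the errno value if it
   failed, or -1 if the pipe could not be set up. */
int errno_of_read_into(void *buf, size_t len)
{
    static const char payload[] = "checkpoint";
    int fds[2];
    int result = -1;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], payload, sizeof payload) == (ssize_t) sizeof payload) {
        errno = 0;
        ssize_t n = read(fds[0], buf, len);
        result = (n < 0) ? errno : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On Linux, passing a valid buffer should return 0, while passing NULL with a non-zero length should return EFAULT, the same "Bad address" that InnoDB reported.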

mariadb-dump error ignored.

Daniel Black added a comment - 2023-02-07 10:31

Corrected large page error message.

Otto Kekäläinen added a comment - 2023-02-10 16:54

The above was backported to the latest Debian build in https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/7ac10dee3b961cf69b330de23df5f8554450783e. However, the server now fails to start at all; see https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-4&stamp=1676007600&raw=0

```
MariaDB Version 10.11.1-MariaDB-4

SSL connections supported
Using suites: main
Collecting tests...
Installing system database...
found 'core' (0/5)
Core generated by '/<<PKGBUILDDIR>>/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
[New LWP 31431]
[New LWP 31427]
[New LWP 31429]
[New LWP 31428]
[New LWP 31430]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/hppa-linux-gnu/libthread_db.so.1".
Core was generated by `/<<PKGBUILDDIR>>/builddir/sql/mariadbd --no-defaults --dis'.
Program terminated with signal SIGABRT, Aborted.
#0 0x43469c84 in my_register_filename (fd=1137958840, FileName=0x6 <error: Cannot access memory at address 0x6>, type_of_file=3646115528, error_message_number=<optimized out>, MyFlags=<optimized out>) at ./mysys/my_open.c:140
140 ./mysys/my_open.c: No such file or directory.
[Current thread is 1 (Thread 0xd9d34380 (LWP 31431))]
#0 0x43469c84 in my_register_filename (fd=1137958840, FileName=0x6 <error: Cannot access memory at address 0x6>, type_of_file=3646115528, error_message_number=<optimized out>, MyFlags=<optimized out>) at ./mysys/my_open.c:140
Backtrace stopped: Cannot access memory at address 0x7ab3
```

The same upload also had other patches, so what we are seeing might be due to something else as well.

Thus I re-opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006529 and I also see that https://jira.mariadb.org/browse/MDEV-30572 remains open.

Daniel Black added a comment - 2023-02-13 03:39

Putting every hppa issue in this one Jira issue isn't helpful. The above stack trace isn't complete. A call to my_register_filename with such a large file descriptor seems implausible, as do the FileName address and type_of_file (an enum that only goes up to 7), though it would segfault on line 140 if this did occur. Without a full stack trace I couldn't guess why this occurred.

The only things to be considered here that I see are:

Should my_large_malloc fall back to a conventional malloc when mmap fails with EINVAL? (I'm currently not convinced.)
Does InnoDB correctly handle my_large_malloc allocation failures?

Marko Mäkelä added a comment - 2023-02-13 07:02

danblack, I think that my_large_malloc() needs to fall back to the aligned_malloc() wrapper that is defined in include/aligned.h. InnoDB certainly assumes that it gets aligned memory.
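
The suggested fallback could look roughly like the sketch below. It is only an illustration under stated assumptions: `large_alloc_or_aligned`, `large_free`, and `struct large_buf` are hypothetical names, the code uses the standard C11 `aligned_alloc()` rather than the wrapper in include/aligned.h, and it is not how my_large_malloc() is actually implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical descriptor so the caller knows how to release the memory. */
struct large_buf {
    void  *ptr;      /* NULL if both attempts failed */
    size_t size;     /* bytes actually reserved */
    int    is_mmap;  /* 1: release with munmap(), 0: release with free() */
};

/* Try a huge-page mapping first; on any failure (EINVAL, ENOMEM, ...)
   fall back to an ordinary allocation that is still aligned.
   align must be a power of two. For the huge-page path, size is assumed
   to be a multiple of the huge page size and align no larger than it. */
struct large_buf large_alloc_or_aligned(size_t size, size_t align)
{
    struct large_buf b = { NULL, 0, 0 };

    if (size == 0 || align == 0 || (align & (align - 1)) != 0)
        return b;

#if defined(MAP_HUGETLB) && defined(MAP_ANONYMOUS)
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        b.ptr = p;
        b.size = size;
        b.is_mmap = 1;
        return b;
    }
#endif

    /* aligned_alloc() expects the size to be a multiple of the alignment. */
    size_t rounded = (size + align - 1) / align * align;
    b.ptr = aligned_alloc(align, rounded);
    b.size = b.ptr ? rounded : 0;
    return b;
}

/* Release a buffer with the method that matches how it was obtained. */
void large_free(struct large_buf *b)
{
    if (b->ptr == NULL)
        return;
    if (b->is_mmap)
        munmap(b->ptr, b->size);
    else
        free(b->ptr);
    b->ptr = NULL;
    b->size = 0;
    b->is_mmap = 0;
}
```

A caller would still have to check `ptr` for NULL, which is the second question raised above: the fallback narrows the failure window but does not remove the need to handle allocation failure.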

Otto Kekäläinen added a comment - 2023-02-22 07:22

Helge Deller reported on the second issue:

mariadb fails on the hppa architecture, because there is a kernel bug
(on parisc and probably other architectures) in the io_uring syscall.
This is worked on upstream, e.g. this mail thread:
https://lore.kernel.org/io-uring/507c7873-8888-dbcb-c512-4659af486848@bell.net/T/#t
We hope to get the kernel fixed in upcoming versions.

Elena Stepanova made changes - 2023-03-27 23:49

Fix Version/s		10.11 [ 27614 ]
Affects Version/s		10.11 [ 27614 ]

Daniel Black made changes - 2023-09-19 03:58

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.11 [ 27614 ]
Resolution		Not a Bug [ 6 ]
Status	Open [ 1 ]	Closed [ 6 ]

Otto Kekäläinen added a comment - 2023-10-19 04:36

This is passing in the latest build: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.5-3&stamp=1697355657&raw=0

main.large_pages 'innodb'                [ pass ]    104
