[MDEV-30572] main.large_pages 'innodb' fails on architecture hppa: InnoDB: Operating system error number 14 in a file operation
Created: 2023-02-05  Updated: 2023-10-19  Resolved: 2023-09-19

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.11
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Otto Kekäläinen Assignee: Daniel Black
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-12039 Out of memory with page compression Closed

 Description   

The official Debian builds of MariaDB 1:10.11.1-2 failed on the Debian hppa builders (https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-2&stamp=1675231483&raw=0) with:

```
main.large_pages 'innodb'                w2 [ fail ]
        Test ended at 2023-02-01 00:35:17
CURRENT_TEST: main.large_pages
Failed to start mysqld.1
mysqltest failed but provided no output
 - found 'core' (0/5)
Trying 'dbx' to get a backtrace
Trying 'lldb' to get a backtrace from coredump /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/mysqld.1/data/core
Compressed file /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/mysqld.1/data/core
 - saving '/<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/' to '/<<PKGBUILDDIR>>/builddir/mysql-test/var/log/main.large_pages-innodb/'
Retrying test main.large_pages, attempt(2/3)...
***Warnings generated in error logs during shutdown after running tests: main.large_pages
2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 8388608 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 6291456 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 6291456 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 4194304 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 4194304 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
2023-02-01  0:35:16 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
2023-02-01  0:35:16 0 [ERROR] InnoDB: Operating system error number 14 in a file operation.
2023-02-01  0:35:16 0 [ERROR] InnoDB: Error number 14 means 'Bad address'
2023-02-01  0:35:16 0 [ERROR] InnoDB: File (unknown): 'read' returned OS error 214. Cannot continue operation
Attempting backtrace. You can use the following information to find out
```

The only other recorded case of OS error 14 was in MDEV-12039.

This and other hppa issues are tracked downstream in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006529



 Comments   
Comment by Marko Mäkelä [ 2023-02-06 ]

The error code 22 should be EINVAL. The warning message that danblack clarified some time ago is somewhat misleading, because a smaller size would only be attempted if errno==ENOMEM.
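The retry rule described here (move on to a smaller page size only when errno is ENOMEM) can be sketched as below. This is not MariaDB's my_large_malloc() code: `large_alloc_sketch()`, `try_alloc_fn` and the simulated `fake_alloc()` are hypothetical names, and the callback stands in for mmap() with MAP_HUGETLB so the control flow can be exercised without huge pages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocation hook: returns memory, or NULL with errno set.
   In a real server this role would be played by mmap() with MAP_HUGETLB. */
typedef void *(*try_alloc_fn)(size_t size, size_t page_size);

/* Walk page sizes from largest to smallest (page_sizes[] is assumed to be
   sorted in descending order). Only ENOMEM moves on to the next, smaller
   page size; any other errno, such as EINVAL, stops the loop and NULL is
   returned without trying anything smaller. */
static void *large_alloc_sketch(size_t size, const size_t *page_sizes,
                                size_t n_sizes, try_alloc_fn try_alloc)
{
  assert(page_sizes != NULL && try_alloc != NULL);
  for (size_t i = 0; i < n_sizes; i++)
  {
    size_t ps = page_sizes[i];
    size_t rounded = (size + ps - 1) / ps * ps; /* round up to whole pages */
    void *p = try_alloc(rounded, ps);
    if (p != NULL)
      return p;
    if (errno != ENOMEM)
      break; /* e.g. EINVAL: give up instead of trying smaller pages */
  }
  return NULL;
}

/* Simulated allocator for illustration only: rejects any page size larger
   than fake_max_page with errno = fake_errno, otherwise uses malloc(). */
static size_t fake_max_page = 4096;
static int fake_errno = ENOMEM;
static int fake_calls = 0;

static void *fake_alloc(size_t size, size_t page_size)
{
  fake_calls++;
  if (page_size > fake_max_page)
  {
    errno = fake_errno;
    return NULL;
  }
  return malloc(size);
}
```

With `fake_errno = ENOMEM` the loop falls through to the smallest page size and succeeds; with `fake_errno = EINVAL` it stops after the first attempt and returns NULL.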

errno=14 should be EFAULT. My first guess is that there is some fallback that allocates unaligned memory (instead of invoking memalign() or similar), and then an O_DIRECT file operation using an unaligned buffer will fail.
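The hypothesis rests on O_DIRECT requiring aligned buffers, which plain malloc() does not guarantee. A minimal sketch of allocating such a buffer with POSIX posix_memalign(), assuming a 4096-byte alignment (the real requirement depends on the file system and block device); `alloc_direct_io_buffer()` and `is_aligned()` are hypothetical helpers, not MariaDB functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed O_DIRECT alignment; often 512 or 4096 bytes in practice. */
#define DIRECT_IO_ALIGN 4096

/* Nonzero if ptr is a multiple of align (align must be a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) ptr & (align - 1)) == 0;
}

/* Allocate size bytes usable as an O_DIRECT buffer. posix_memalign()
   reports failure through its return value, not errno. */
static void *alloc_direct_io_buffer(size_t size)
{
  void *buf = NULL;
  if (posix_memalign(&buf, DIRECT_IO_ALIGN, size) != 0)
    return NULL;
  return buf;
}
```

A buffer obtained this way is released with free(), like any other heap allocation.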

Comment by Otto Kekäläinen [ 2023-02-06 ]

For the record, the latest build https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-3&stamp=1675654714&raw=0 of mariadb 1:10.11.1-3 passed, but the test suite failed with:

```
main.mysqldump 'innodb'                  w25 [ fail ]
        Test ended at 2023-02-06 03:25:25
CURRENT_TEST: main.mysqldump
--- /<<PKGBUILDDIR>>/mysql-test/main/mysqldump.result 2022-11-14 18:10:21.000000000 +0000
+++ /<<PKGBUILDDIR>>/mysql-test/main/mysqldump.reject 2023-02-06 03:25:24.020036990 +0000
@@ -5524,7 +5524,7 @@
 proc
 one
 DROP DATABASE bug25717383;
-mariadb-dump: Got error: 2005: "Unknown server host 'unknownhost'" when trying to connect
+mariadb-dump: Got error: 2002: "Can't connect to server on 'unknownhost'" when trying to connect
 mariadb-dump: Couldn't execute 'SHOW SLAVE STATUS': Server has gone away (2006)
 Usage: mariadb-dump [OPTIONS] database [tables]
 OR     mariadb-dump [OPTIONS] --databases DB1 [DB2 DB3...]
```

Comment by Daniel Black [ 2023-02-06 ]

EINVAL from the memory allocation means an invalid length, so it looks like hppa is returning the wrong error code or populating /sys/kernel/mm/hugepages incorrectly (have I mentioned unsupported arch often enough?). As Marko pointed out, it will only try a smaller size on ENOMEM, so it's returning NULL. (Fixing the message now.)

Further down, EFAULT is "Bad address" so at least that's consistent.

For O_DIRECT errors I'd expect EINVAL. So I really expect EFAULT to mean a read into a memory address that is invalid (NULL).

Could it be that log_t::attach has a ut_malloc_dontdump call that fails, followed by an assumption that buf was allocated, like what occurs in recv_sys_t::find_checkpoint?
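If a failed allocation is passed to read() anyway, the kernel reports EFAULT ("Bad address") instead of the real cause. A minimal sketch of the defensive pattern, using plain malloc() and pread(); `read_block()` is a hypothetical helper, not InnoDB's log_t::attach or recv_sys_t::find_checkpoint, and reporting a short read as EIO is an arbitrary choice for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read len bytes at offset into a freshly allocated buffer. The allocation
   is checked before the buffer reaches pread(), so an out-of-memory
   condition is reported as ENOMEM rather than surfacing later as EFAULT. */
static void *read_block(int fd, size_t len, off_t offset, int *err)
{
  assert(err != NULL);
  void *buf = malloc(len); /* stands in for an allocator that may fail */
  if (buf == NULL)
  {
    *err = ENOMEM;
    return NULL;
  }
  ssize_t n = pread(fd, buf, len, offset);
  if (n < 0 || (size_t) n != len)
  {
    *err = n < 0 ? errno : EIO; /* capture errno before free() */
    free(buf);
    return NULL;
  }
  *err = 0;
  return buf;
}
```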

mariadb-dump error ignored.

Comment by Daniel Black [ 2023-02-07 ]

Corrected large page error message.

Comment by Otto Kekäläinen [ 2023-02-10 ]

The above was backported in https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/7ac10dee3b961cf69b330de23df5f8554450783e to the latest Debian build. However, the server now fails to start at all in https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-4&stamp=1676007600&raw=0

```
MariaDB Version 10.11.1-MariaDB-4

 - SSL connections supported
Using suites: main
Collecting tests...
Installing system database...
 - found 'core' (0/5)
Core generated by '/<<PKGBUILDDIR>>/builddir/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
[New LWP 31431]
[New LWP 31427]
[New LWP 31429]
[New LWP 31428]
[New LWP 31430]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/hppa-linux-gnu/libthread_db.so.1".
Core was generated by `/<<PKGBUILDDIR>>/builddir/sql/mariadbd --no-defaults --dis'.
Program terminated with signal SIGABRT, Aborted.
#0 0x43469c84 in my_register_filename (fd=1137958840, FileName=0x6 <error: Cannot access memory at address 0x6>, type_of_file=3646115528, error_message_number=<optimized out>, MyFlags=<optimized out>) at ./mysys/my_open.c:140
140 ./mysys/my_open.c: No such file or directory.
[Current thread is 1 (Thread 0xd9d34380 (LWP 31431))]
#0 0x43469c84 in my_register_filename (fd=1137958840, FileName=0x6 <error: Cannot access memory at address 0x6>, type_of_file=3646115528, error_message_number=<optimized out>, MyFlags=<optimized out>) at ./mysys/my_open.c:140
Backtrace stopped: Cannot access memory at address 0x7ab3
```

The same upload also had other patches, so what we are seeing might be due to something else as well.

Thus I re-opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006529 and I also see that https://jira.mariadb.org/browse/MDEV-30572 remains open.

Comment by Daniel Black [ 2023-02-13 ]

Putting every hppa issue in this one Jira issue isn't helpful. The stack trace above isn't complete. A my_register_filename call with such a large file descriptor seems implausible, as do the FileName address and type_of_file (an enum that only goes up to 7), though line 140 would segfault if this did occur. Without a full stack trace I can't guess why this occurred.

The only things to be considered here that I see are:

  • should my_large_malloc fall back to a conventional malloc when mmap fails with EINVAL (I'm currently not convinced)?
  • does InnoDB correctly handle my_large_malloc allocation failures?

Comment by Marko Mäkelä [ 2023-02-13 ]

danblack, I think that my_large_malloc() needs to fall back to the aligned_malloc() wrapper that is defined in include/aligned.h. InnoDB certainly assumes that it gets aligned memory.
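The suggested fallback can be sketched as follows. The sketch assumes Linux mmap() with MAP_HUGETLB for the large-page attempt and uses POSIX posix_memalign() as a stand-in for the aligned_malloc() wrapper in include/aligned.h, whose exact interface is not shown here; `large_malloc_with_fallback()`, `large_free_with_fallback()` and `enum alloc_kind` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Records how a block was obtained, so it can be released correctly. */
enum alloc_kind { ALLOC_NONE, ALLOC_HUGE_MMAP, ALLOC_ALIGNED_HEAP };

/* huge_page and align are assumed to be powers of two; align must also be
   a multiple of sizeof(void *) for posix_memalign(). */
static void *large_malloc_with_fallback(size_t size, size_t huge_page,
                                        size_t align, enum alloc_kind *kind)
{
  assert(kind != NULL);
  size_t rounded = (size + huge_page - 1) & ~(huge_page - 1);
#ifdef MAP_HUGETLB
  void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  if (p != MAP_FAILED)
  {
    *kind = ALLOC_HUGE_MMAP;
    return p;
  }
#endif
  (void) rounded;
  /* Any mmap() failure (ENOMEM, EINVAL, ...) falls through to an aligned
     heap allocation, so callers get either aligned memory or NULL. */
  void *q = NULL;
  if (posix_memalign(&q, align, size) == 0)
  {
    *kind = ALLOC_ALIGNED_HEAP;
    return q;
  }
  *kind = ALLOC_NONE;
  return NULL;
}

static void large_free_with_fallback(void *p, size_t size, size_t huge_page,
                                     enum alloc_kind kind)
{
  if (kind == ALLOC_HUGE_MMAP)
    munmap(p, (size + huge_page - 1) & ~(huge_page - 1));
  else if (kind == ALLOC_ALIGNED_HEAP)
    free(p);
}
```

Recording the allocation kind matters because huge-page mappings must be released with munmap(), while blocks from the fallback path must be released with free().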

Comment by Otto Kekäläinen [ 2023-02-22 ]

Helge Deller reported on the second issue:

mariadb fails on the hppa architecture, because there is a kernel bug
(on parisc and probably other architectures) in the io_uring syscall.
This is worked on upstream, e.g. this mail thread:
https://lore.kernel.org/io-uring/507c7873-8888-dbcb-c512-4659af486848@bell.net/T/#t
We hope to get the kernel fixed in upcoming versions.

Comment by Otto Kekäläinen [ 2023-10-19 ]

This is passing in the latest build: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.5-3&stamp=1697355657&raw=0

main.large_pages 'innodb'                [ pass ]    104

Generated at Thu Feb 08 10:17:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.