  1. MariaDB Server
  2. MDEV-30572

main.large_pages 'innodb' fails on architecture hppa: InnoDB: Operating system error number 14 in a file operation

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not a Bug
    • Affects Version/s: 10.11
    • Fix Version/s: N/A
    • Component/s: Server
    • Labels: None

    Description

      The official Debian build of MariaDB 1:10.11.1-2 failed on the Debian hppa builders (log: https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-2&stamp=1675231483&raw=0) with:

      main.large_pages 'innodb'                w2 [ fail ]
              Test ended at 2023-02-01 00:35:17
      CURRENT_TEST: main.large_pages
      Failed to start mysqld.1
      mysqltest failed but provided no output
       - found 'core' (0/5)
      Trying 'dbx' to get a backtrace
      Trying 'lldb' to get a backtrace from coredump /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/mysqld.1/data/core
      Compressed file /<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/mysqld.1/data/core
       - saving '/<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.large_pages-innodb/' to '/<<PKGBUILDDIR>>/builddir/mysql-test/var/log/main.large_pages-innodb/'
      Retrying test main.large_pages, attempt(2/3)...
      ***Warnings generated in error logs during shutdown after running tests: main.large_pages
      2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 8388608 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
      2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 6291456 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
      2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 6291456 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
      2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 4194304 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
      2023-02-01  0:35:16 0 [Warning] mariadbd: Couldn't allocate 4194304 bytes (Large/HugeTLB memory page size 2097152); errno 22; continuing to smaller size
      2023-02-01  0:35:16 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
      2023-02-01  0:35:16 0 [ERROR] InnoDB: Operating system error number 14 in a file operation.
      2023-02-01  0:35:16 0 [ERROR] InnoDB: Error number 14 means 'Bad address'
      2023-02-01  0:35:16 0 [ERROR] InnoDB: File (unknown): 'read' returned OS error 214. Cannot continue operation
      Attempting backtrace. You can use the following information to find out
      

      The only other recorded case of OS error 14 was in MDEV-12039.

      This and other hppa issues are tracked downstream in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006529

          Activity

            otto Otto Kekäläinen created issue -
            marko Marko Mäkelä added a comment -

            The error code 22 should be EINVAL. The warning message that danblack clarified some time ago is somewhat misleading, because a smaller size would only be attempted if errno==ENOMEM.

            errno=14 should be EFAULT. My first guess is that there is some fallback that allocates unaligned memory (instead of invoking memalign() or similar), and then an O_DIRECT file operation using an unaligned buffer will fail.
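The mapping from the numbers in the log to symbolic names can be checked with a few lines of C. This is only an illustrative sketch: `errno_name` is a hypothetical helper, not part of MariaDB, and the numeric values 14 and 22 are those used on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map the errno values discussed in this issue
   to their symbolic names. */
static const char *errno_name(int e)
{
    switch (e) {
    case EFAULT: return "EFAULT";   /* 14 on Linux: "Bad address" */
    case EINVAL: return "EINVAL";   /* 22 on Linux: "Invalid argument" */
    case ENOMEM: return "ENOMEM";   /* the only code that triggers a retry
                                       with a smaller size, per the comment */
    default:     return "(other)";
    }
}

/* Print "<number> = <name>: <system message>" for one errno value. */
static void print_errno(int e)
{
    printf("%d = %s: %s\n", e, errno_name(e), strerror(e));
}
```

Calling `print_errno(14)` and `print_errno(22)` on the affected machine would show whether the platform agrees with the names used in the comment above.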
            marko Marko Mäkelä made changes -
            Component/s Server [ 13907 ]
            Assignee Daniel Black [ danblack ]

            otto Otto Kekäläinen added a comment -

            For the record, the latest build of mariadb 1:10.11.1-3 (https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-3&stamp=1675654714&raw=0) passed, but the test suite failed with:

            main.mysqldump 'innodb'                  w25 [ fail ]
                    Test ended at 2023-02-06 03:25:25
            CURRENT_TEST: main.mysqldump
            --- /<<PKGBUILDDIR>>/mysql-test/main/mysqldump.result 2022-11-14 18:10:21.000000000 +0000
            +++ /<<PKGBUILDDIR>>/mysql-test/main/mysqldump.reject 2023-02-06 03:25:24.020036990 +0000
            @@ -5524,7 +5524,7 @@
             proc
             one
             DROP DATABASE bug25717383;
            -mariadb-dump: Got error: 2005: "Unknown server host 'unknownhost'" when trying to connect
            +mariadb-dump: Got error: 2002: "Can't connect to server on 'unknownhost'" when trying to connect
             mariadb-dump: Couldn't execute 'SHOW SLAVE STATUS': Server has gone away (2006)
             Usage: mariadb-dump [OPTIONS] database [tables]
             OR     mariadb-dump [OPTIONS] --databases DB1 [DB2 DB3...]
            danblack Daniel Black added a comment -

            EINVAL in the memory allocation is an invalid length, so it looks like hppa is returning a wrong error code or populating /sys/kernel/mm/hugepages incorrectly (have I mentioned unsupported arch often enough?). As Marko pointed out, it will only try a smaller size on ENOMEM, so it's returning null. (Fixing the message now.)

            Further down, EFAULT is "Bad address", so at least that's consistent.

            For O_DIRECT errors I'd expect EINVAL. So I really expect EFAULT to mean a read into a memory address that is invalid (null).

            Could it be that log_t::attach has a ut_malloc_dontdump that fails, followed by an assumption that buf was allocated, like what occurs in recv_sys_t::find_checkpoint?

            mariadb-dump error ignored.
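The EFAULT hypothesis in the comment above (a read into a null buffer) can be reproduced in isolation. The following is a minimal sketch, assuming Linux behaviour; `errno_of_read_into` is a hypothetical name, and a pipe stands in for the InnoDB log file so that nothing is needed on disk. It does not exercise O_DIRECT, only the invalid-destination case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: push `len` bytes through a pipe, read them back
   into `buf`, and return the errno of the read (0 on success). */
static int errno_of_read_into(void *buf, size_t len)
{
    int fds[2];
    char data[64] = {0};
    int err = 0;

    assert(len > 0 && len <= sizeof data);   /* precondition of this sketch */

    if (pipe(fds) != 0)
        return errno;

    ssize_t written = write(fds[1], data, len);
    if (written < 0)
        err = errno;
    else if ((size_t) written != len)
        err = EIO;                           /* short write: treat as I/O error */
    else if (read(fds[0], buf, len) < 0)
        err = errno;                         /* EFAULT expected when buf is invalid */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

With a valid buffer the helper returns 0; with a null destination the kernel cannot copy the data out and, on Linux, the read is expected to fail with EFAULT, which would match the "Bad address" line in the error log if an allocation had returned null unnoticed.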
            danblack Daniel Black added a comment - Corrected large page error message.

            otto Otto Kekäläinen added a comment -

            The above was backported in https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/7ac10dee3b961cf69b330de23df5f8554450783e to the latest Debian build. However, the server now fails to start at all in https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.1-4&stamp=1676007600&raw=0

            ```
            MariaDB Version 10.11.1-MariaDB-4
             - SSL connections supported
            Using suites: main
            Collecting tests...
            Installing system database...
             - found 'core' (0/5)
            Core generated by '/<<PKGBUILDDIR>>/builddir/sql/mariadbd'
            Output from gdb follows. The first stack trace is from the failing thread.
            The following stack traces are from all threads (so the failing one is
            duplicated).
            --------------------------
            warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
            warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
            [New LWP 31431]
            [New LWP 31427]
            [New LWP 31429]
            [New LWP 31428]
            [New LWP 31430]
            [Thread debugging using libthread_db enabled]
            Using host libthread_db library "/lib/hppa-linux-gnu/libthread_db.so.1".
            Core was generated by `/<<PKGBUILDDIR>>/builddir/sql/mariadbd --no-defaults --dis'.
            Program terminated with signal SIGABRT, Aborted.
            #0 0x43469c84 in my_register_filename (fd=1137958840, FileName=0x6 <error: Cannot access memory at address 0x6>, type_of_file=3646115528, error_message_number=<optimized out>, MyFlags=<optimized out>) at ./mysys/my_open.c:140
            140 ./mysys/my_open.c: No such file or directory.
            [Current thread is 1 (Thread 0xd9d34380 (LWP 31431))]
            #0 0x43469c84 in my_register_filename (fd=1137958840, FileName=0x6 <error: Cannot access memory at address 0x6>, type_of_file=3646115528, error_message_number=<optimized out>, MyFlags=<optimized out>) at ./mysys/my_open.c:140
            Backtrace stopped: Cannot access memory at address 0x7ab3
            ```

            The same upload also had other patches, so what we are seeing might be due to something else as well.

            Thus I re-opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006529 and I also see that https://jira.mariadb.org/browse/MDEV-30572 remains open.
            danblack Daniel Black added a comment -

            Putting every hppa issue in this one Jira issue isn't helpful. The above stack trace isn't complete. my_register_filename with just a large file descriptor seems implausible, as do the FileName address and type_of_file (an enum up to 7), though it would segfault on line 140 if this did occur. Without a full stack trace I couldn't guess why this occurred.

            The only things to be considered here that I see are:

            • should my_large_malloc fall back to a conventional malloc when mmap fails with EINVAL (I'm currently not convinced)
            • does InnoDB correctly handle my_large_malloc allocation failures?

            marko Marko Mäkelä added a comment -

            danblack, I think that my_large_malloc() needs to fall back to the aligned_malloc() wrapper that is defined in include/aligned.h. InnoDB certainly assumes that it gets aligned memory.
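The fallback discussed in the two comments above could look roughly like the sketch below. This is an assumption-laden illustration, not MariaDB's my_large_malloc() or aligned_malloc(): the names `big_buf`, `big_alloc` and `big_free` are hypothetical, and POSIX `posix_memalign()` stands in for the aligned allocation wrapper. It assumes Linux and a `size` that is a multiple of the huge page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS / MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical handle: remembers how the memory was obtained so that it
   can be released with the matching call. */
struct big_buf {
    void  *ptr;
    size_t size;
    int    is_mmap;
};

/* Try a huge-page mapping first; on any mmap failure (ENOMEM, EINVAL, ...)
   fall back to an aligned heap allocation instead of returning null. */
static struct big_buf big_alloc(size_t size, size_t align)
{
    struct big_buf b = { NULL, size, 0 };

    /* posix_memalign needs a power-of-two multiple of sizeof(void *). */
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

#ifdef MAP_HUGETLB
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        b.ptr = p;
        b.is_mmap = 1;
        return b;
    }
#endif
    if (posix_memalign(&b.ptr, align, size) != 0)
        b.ptr = NULL;          /* both paths failed; the caller must check */
    return b;
}

/* Release the buffer with munmap() or free(), whichever matches. */
static void big_free(struct big_buf *b)
{
    if (b->ptr == NULL)
        return;
    if (b->is_mmap)
        munmap(b->ptr, b->size);
    else
        free(b->ptr);
    b->ptr = NULL;
}
```

Whichever path is taken, the caller gets memory aligned to at least `align`, which is the property InnoDB is said to rely on. Whether falling back on EINVAL is desirable at all is exactly the open question raised in the comment above, and the caller still has to handle a null result.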

            otto Otto Kekäläinen added a comment -

            Helge Deller reported on the second issue:

            mariadb fails on the hppa architecture, because there is a kernel bug
            (on parisc and probably other architectures) in the io_uring syscall.
            This is worked on upstream, e.g. this mail thread:
            https://lore.kernel.org/io-uring/507c7873-8888-dbcb-c512-4659af486848@bell.net/T/#t
            We hope to get the kernel fixed in upcoming versions.
            elenst Elena Stepanova made changes -
            Fix Version/s 10.11 [ 27614 ]
            Affects Version/s 10.11 [ 27614 ]
            danblack Daniel Black made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Not a Bug [ 6 ]
            Status Open [ 1 ] Closed [ 6 ]
            otto Otto Kekäläinen added a comment -

            This is passing in the latest build, https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=hppa&ver=1%3A10.11.5-3&stamp=1697355657&raw=0:

            main.large_pages 'innodb' [ pass ] 104

            People

              Assignee: danblack Daniel Black
              Reporter: otto Otto Kekäläinen