Details

    Description

      The server has the following configuration:

      innodb_flush_method			= O_DIRECT
      

      The more intensively the server is used ("intensive" here means 20-30 SELECT queries in the processlist plus 1-2 INSERT queries updating multiple rows), the more frequently the following occurs:

      2020-01-28 16:40:58 0 [ERROR] InnoDB: Operating system error number 22 in a file operation.
      2020-01-28 16:40:58 0 [ERROR] InnoDB: Error number 22 means 'Invalid argument'
      2020-01-28 16:40:58 0 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
      2020-01-28 16:40:58 0 [ERROR] InnoDB: File ./user_167583/email.ibd: 'Linux aio' returned OS error 222. Cannot continue operation
      200128 16:40:58 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      

      and the corresponding system message:

      [Tue Jan 28 16:40:58 2020] mysqld[48573]: segfault at 0 ip 000055a53fc5a5a1 sp 00007e93a6eb1e60 error 6 in mysqld[55a53f426000+8ac000]
      [Tue Jan 28 16:40:58 2020] Code: c7 04 24 00 00 00 00 48 89 ea 4c 89 ee 44 89 f7 e8 14 cf 7c ff 49 89 c7 48 39 c5 0f 84 f8 00 00 00 e8 63 1d 00 00 41 8b 0c 24 <89> 08 85 c9 74 39 49 83 ff ff 0f 84 9f 00 00 00 f6 c3 06 75 2a 4d
      

      This happens once every several hours. It is still a problem, because the node is part of a Galera cluster and each crash results in a snapshot transfer to the failed node.

      Changing the configuration to

      #innodb_flush_method			= O_DIRECT
      innodb_use_native_aio = 0
      

      "solves" the stability issue by reverting the flush method to the default fsync() one.

      The expected behavior is stable MariaDB operation with O_DIRECT.

          Activity

            euglorg Eugene created issue -
            asun Andrew Sun added a comment -

            I have reproduced this crash on Linux 5.5.4, 2x Xeon X5670, XFS on Intel DC P4600 NVMe SSD with MariaDB 10.4.12.
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Eugene Kosov [ kevg ]
            kevg Eugene Kosov (Inactive) added a comment - - edited

            Hi. We're aware of this bug, but we haven't been able to reproduce it, so we can't fix it yet. Can you help us reproduce it?
            serg Sergei Golubchik made changes -
            Summary Linux aio returned OS erorr 222 Linux aio returned OS erorr 22
            serg Sergei Golubchik made changes -
            Labels innodb innodb need_feedback

            kevg Eugene Kosov (Inactive) added a comment -

            Please provide us with the following data: run /sbin/blockdev --getss /dev/nvme0n1p2 on the disk device where the InnoDB data is located.
            asun Andrew Sun added a comment -

            @kevg blockdev --getss /dev/nvme0n1p1 returns 4096
            marko Marko Mäkelä made changes -
            Summary Linux aio returned OS erorr 22 Linux aio returned OS error 22
            kevg Eugene Kosov (Inactive) made changes -
            Labels innodb need_feedback innodb
            kevg Eugene Kosov (Inactive) made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            kevg Eugene Kosov (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            kevg Eugene Kosov (Inactive) made changes -
            Priority Major [ 3 ] Blocker [ 1 ]
            kevg Eugene Kosov (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            kevg Eugene Kosov (Inactive) added a comment -

            InnoDB was written with the assumption that the block size is always 512 bytes, and this won't be fixed in GA versions. We know that ROW_FORMAT=COMPRESSED definitely doesn't work with a block size of 4096.

            The fix in 10.2 is to disable O_DIRECT. In the best case it will be disabled partially; in the worst case it will be disabled completely.
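The block-size point above can be illustrated with a minimal sketch. It assumes that, on Linux, the offset and length of an O_DIRECT request generally have to be multiples of the device's logical block size (the value reported by blockdev --getss), and that a misaligned request is rejected with EINVAL (error 22). The function name and the 2 KiB page size are hypothetical and are not taken from the InnoDB source; the buffer address alignment that O_DIRECT also requires is left out.

```python
def is_direct_io_aligned(offset: int, length: int, block_size: int) -> bool:
    """Return True if a request starts on a block boundary and spans whole blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return offset % block_size == 0 and length % block_size == 0


# Hypothetical write of one 2 KiB compressed page at byte offset 2048.
PAGE_SIZE = 2048
print(is_direct_io_aligned(2048, PAGE_SIZE, 512))   # True: fine with 512-byte blocks
print(is_direct_io_aligned(2048, PAGE_SIZE, 4096))  # False: misaligned with 4096-byte blocks
```

Under these assumptions, the same request that passes on a 512-byte device fails the check on the 4096-byte device reported in this issue.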
            nicklamb Nick (Inactive) made changes -
            Assignee Eugene Kosov [ kevg ] Nick [ nicklamb ]

            nicklamb Nick (Inactive) added a comment -

            kevg Does this mean customers should use a different innodb_flush_method? If yes, do we have one that's recommended? Is this something that will be fixed eventually in a later release?

            marko Marko Mäkelä added a comment -

            The simple solution of disabling O_DIRECT on page_compressed tables, or on ROW_FORMAT=COMPRESSED tables with a page size smaller than 4KiB, seems to work fine. I tested it on my SSD (512-byte block size) with an instrumentation patch by kevg that complains about writes to O_DIRECT files that are not 4KiB-aligned.
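The rule described in this comment can be sketched as a small predicate. This is only an illustration of the stated rule, not the actual patch: the function name and parameters are hypothetical, and "page size" is assumed to mean the compressed (on-disk) page size of a ROW_FORMAT=COMPRESSED table.

```python
MIN_DIRECT_IO_PAGE_SIZE = 4096  # 4 KiB threshold named in the comment


def may_use_o_direct(page_compressed: bool, row_format: str, physical_page_size: int) -> bool:
    """Decide whether a data file may be opened with O_DIRECT under the described rule."""
    if page_compressed:
        # page_compressed tables: O_DIRECT is disabled.
        return False
    if row_format.upper() == "COMPRESSED" and physical_page_size < MIN_DIRECT_IO_PAGE_SIZE:
        # Compressed pages smaller than 4 KiB: O_DIRECT is disabled.
        return False
    return True


print(may_use_o_direct(False, "COMPRESSED", 2048))   # False
print(may_use_o_direct(False, "DYNAMIC", 16384))     # True
```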
            marko Marko Mäkelä made changes -
            Assignee Nick [ nicklamb ] Eugene Kosov [ kevg ]
            kevg Eugene Kosov (Inactive) made changes -
            Fix Version/s 10.2.35 [ 25022 ]
            Fix Version/s 10.3.26 [ 25021 ]
            Fix Version/s 10.4.16 [ 25020 ]
            Fix Version/s 10.5.7 [ 25019 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            marko Marko Mäkelä added a comment -

            As noted in MDEV-25121, this fix was incomplete. Unfortunately, we did not have access to hardware where the block size would be larger than 512 bytes.

            wlad Vladislav Vaintroub added a comment -

            Also, there was no OS_DATA_FILE_NO_O_DIRECT on Windows, even though the alignment, offset, and size requirements have been well documented on Windows (for at least 20 years, if not more).
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 103446 ] MariaDB v4 [ 157253 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 137893

            People

              kevg Eugene Kosov (Inactive)
              euglorg Eugene
              Votes: 1
              Watchers: 10
