MDEV-34859 (MariaDB Server)

Failed to initialise non-blocking API

Details

    Description

      Running the tests on amd64 and arm64, we only see this on arm64.

      Happens with both 10.9 and 11.4.

      worker[1]  - saving '/home/brad/ports/pobj/mariadb-10.9.8/build-aarch64/mysql-test/var/mysqld.1'
      worker[1] main.connect                             worker[1] [ fail ]
              Test ended at 2024-09-03 03:14:47
      worker[1] 
      CURRENT_TEST: main.connect
      safe_process[98524]: parent_pid: 51941
      safe_process[98524]: Started child 12834, terminated: 0
      mysqltest: At line 19: Failed to initialise non-blocking API
       
      The result from queries just before the failure was:
      SET global secure_auth=0;
      safe_process[98524]: Got signal 20, child_pid: 12834
      safe_process[98524]: Killing child: 12834
      safe_process[98524]: Child exit: 1
       
      main.connect                             [ fail ]
              Test ended at 2024-09-03 03:14:47
       
      CURRENT_TEST: main.connect
      safe_process[98524]: parent_pid: 51941
      safe_process[98524]: Started child 12834, terminated: 0
      mysqltest: At line 19: Failed to initialise non-blocking API
      

        Activity

          brad0 Brad Smith created issue -
          knielsen Kristian Nielsen made changes -
          Field Original Value New Value
          Assignee Kristian Nielsen [ knielsen ]

          knielsen Kristian Nielsen added a comment -

          Hi brad0, thanks for the report.

          I think I know what the problem is. The non-blocking API uses a co-routine implementation. It can use a native assembler backend, but that is not implemented for arm64. Or it can fall back to ucontext, but ucontext is no longer supported in OpenBSD AFAIK. So the result is that the non-blocking API is not available on OpenBSD arm64 :-/

          It would actually be good to implement native arm64 support; that is much faster than ucontext even on platforms that support it.

          If I implement a patch that provides a native implementation on arm64, will you be able to test it? (I can supply either a custom build or a patch for you to compile yourself.)
          brad0 Brad Smith added a comment - - edited

          Yes, very much so. A patch please.

          And to clarify, OpenBSD has never supported ucontext.

          It sounds like we should be looking for this on other archs we support.

          knielsen Kristian Nielsen made changes -
          Fix Version/s 10.5 [ 23123 ]
          knielsen Kristian Nielsen made changes -
          Status Open [ 1 ] Confirmed [ 10101 ]
          knielsen Kristian Nielsen made changes -
          Status Confirmed [ 10101 ] In Progress [ 3 ]

          knielsen Kristian Nielsen added a comment -

          Aha, thanks for the clarification on ucontext. To my knowledge, there is no simple CPU-architecture-independent alternative to ucontext available, is there? If there is, it could be used as a fallback for more exotic architectures.

          The code required for a native implementation is 3 assembler functions in libmariadb/libmariadb/ma_context.c for saving and restoring the co-routine context. The implementation for amd64, for reference, is around 200 lines.

          I will try to come up with an arm64 implementation and supply a patch for testing.
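To make the idea of saving and restoring a co-routine context concrete, here is a minimal sketch of three such operations built on the portable ucontext calls rather than on assembler. It assumes a platform that ships <ucontext.h> (for example Linux with glibc; per this thread, OpenBSD does not). All `demo_*` names are hypothetical and are not the functions in ma_context.c.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical co-routine state; not the libmariadb structure. */
struct demo_ctx {
  ucontext_t caller;            /* context to return to on yield or finish */
  ucontext_t callee;            /* the suspended co-routine */
  void (*func)(struct demo_ctx *, void *);
  void *arg;
  int active;                   /* 1 until the co-routine function returns */
  char stack[64 * 1024];        /* private stack for the co-routine */
};

/* makecontext() only passes int arguments, so hand the pointer over here. */
static struct demo_ctx *demo_starting;

static void demo_trampoline(void)
{
  struct demo_ctx *c = demo_starting;
  c->func(c, c->arg);
  c->active = 0;                /* returning resumes uc_link, i.e. c->caller */
}

/* Start func on its own stack. Returns 1 if it yielded, 0 if it finished. */
static int demo_spawn(struct demo_ctx *c,
                      void (*func)(struct demo_ctx *, void *), void *arg)
{
  if (getcontext(&c->callee) != 0)
    return -1;
  c->callee.uc_stack.ss_sp = c->stack;
  c->callee.uc_stack.ss_size = sizeof(c->stack);
  c->callee.uc_link = &c->caller;
  c->func = func;
  c->arg = arg;
  c->active = 1;
  demo_starting = c;
  makecontext(&c->callee, demo_trampoline, 0);
  if (swapcontext(&c->caller, &c->callee) != 0)
    return -1;
  return c->active;
}

/* Called inside the co-routine: suspend and return control to the caller. */
static int demo_yield(struct demo_ctx *c)
{
  return swapcontext(&c->callee, &c->caller);
}

/* Resume a suspended co-routine. Returns 1 if it yielded again, 0 if done. */
static int demo_continue(struct demo_ctx *c)
{
  if (!c->active)
    return -1;
  if (swapcontext(&c->caller, &c->callee) != 0)
    return -1;
  return c->active;
}

/* Stand-in for a client call that would block twice waiting on a socket. */
static void demo_worker(struct demo_ctx *c, void *arg)
{
  int *steps = arg;
  (*steps)++;
  demo_yield(c);
  (*steps)++;
  demo_yield(c);
  (*steps)++;
}

/* Drive the worker to completion; encodes steps and resumes in one number. */
static int demo_run(void)
{
  static struct demo_ctx c;
  int steps = 0, resumes = 0;
  int status = demo_spawn(&c, demo_worker, &steps);
  while (status == 1) {
    resumes++;
    status = demo_continue(&c);
  }
  assert(status >= 0);
  return steps * 10 + resumes;
}
```

Calling `demo_run()` from a `main()` runs the worker in three steps with two resumes in between. A native backend would replace the `getcontext`/`makecontext`/`swapcontext` calls with a few register saves and restores, which is where the speed advantage mentioned above would come from.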
          brad0 Brad Smith added a comment -

          There is Boost context, which PHP has copied from for its co-routine support. We ran into this on both sides for sparc64, and support was written.

          knielsen Kristian Nielsen made changes -

          knielsen Kristian Nielsen added a comment -

          brad0, I implemented a patch for native arm64 / aarch64 support for the non-blocking client library, attached as mdev34859_aarch64_nonblock_library.diff

          Note that this is a patch for the client library, contained in the libmariadb/ sub-directory of the server sources.

          I did not have easy access to OpenBSD/arm64, so I tested on Debian/arm64, where it seems to work. Just let me know if you have any problems testing on OpenBSD.

          And thanks Brad for the pointer to Boost context. That looks interesting; I will try to look into using that as a better portable fallback, since ucontext is sub-optimal for several reasons. This might require some more work on the higher levels of libmariadb, as it currently appears to be built using C only, no C++. Hopefully that can be sorted out.
          knielsen Kristian Nielsen made changes -
          knielsen Kristian Nielsen made changes -
          Attachment async_api_boost_context.patch [ 74008 ]
          knielsen Kristian Nielsen added a comment - - edited

          I implemented a second patch for falling back to boost::context when available (and native implementation does not exist for the architecture). Attached as async_api_boost_context.patch .

          This could be useful for platforms that are not natively supported and where ucontext is not available; it should also be more efficient than ucontext.

          But boost::context is C++, and libmariadb is otherwise pure C. The patch introduces a dependency on C++ into the project, as well as a library dependency on boost. This seems not ideal for the commonly used platforms that have native non-blocking API support and do not need any boost dependency. Thus I don't think the patch as given is suitable for inclusion in a libmariadb release. Maybe instead the use of boost::context could be optional, disabled by default, and enabled only when the user passes a specific cmake argument.

          I have also pushed the two patches to github here (user knielsen and branch knielsen_async_api): https://github.com/knielsen/mariadb-connector-c/commits/knielsen_async_api/

          brad0: If you have access to other platforms than amd64, i386, arm64 which are missing ucontext but have boost::context available, it would be very valuable if you can test the second patch and see if it works to enable the non-blocking API there.

          brad0 Brad Smith added a comment - - edited

          That would leave powerpc, powerpc64, mips64, riscv64 and sparc64.

          I'll see what I can do.

          I would expect the Boost fallback to be something that has to be explicitly enabled, i.e. optional and off by default, but provided as a fallback for a few of the other architectures in use.

          On another note, would you be interested in adding more native implementations if there was hardware to test on?


          knielsen Kristian Nielsen added a comment -

          Thanks, Brad.

          Yes, I think I can add some more native implementations if I have hardware available to test on, especially now that how it should be done is fresh in my memory. How many will also depend on how easily I can find documentation on the ABI and assembler syntax of each platform.

          - Kristian.
          brad0 Brad Smith added a comment - - edited

          I have built and run the native diff through a full test suite run. It appears to be working Ok.

          I have modified the diff to force testing on AArch64 and it finished the whole test suite Ok.

          I'll see about testing on another arch.

          On another note AArch64 support wasn't added until GCC 4.8. The diff should be modified to check for >= 5 || 4.8 || Clang.
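
As an illustration only, the condition suggested above (GCC 5 or newer, or GCC 4.8, or Clang) could be spelled with the usual predefined compiler macros as in the sketch below. The `DEMO_*` and `demo_*` names are hypothetical and do not come from the patch. Clang has to be tested separately because it reports itself as GCC 4.2 through `__GNUC__` and `__GNUC_MINOR__`.

```c
#include <assert.h>

/* Hypothetical macro: 1 when the compiler is Clang, or GCC 4.8 or newer. */
#if defined(__clang__)
#  define DEMO_AARCH64_CAPABLE_COMPILER 1
#elif defined(__GNUC__) && \
      (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
#  define DEMO_AARCH64_CAPABLE_COMPILER 1
#else
#  define DEMO_AARCH64_CAPABLE_COMPILER 0
#endif

/* The same GCC comparison as a plain function, checkable for any version. */
static int demo_gcc_at_least_4_8(int major, int minor)
{
  assert(major >= 0 && minor >= 0);
  return major >= 5 || (major == 4 && minor >= 8);
}
```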


          knielsen Kristian Nielsen added a comment -

          Thanks, Brad!

          The GCC version test is for DWARF support, not for AArch64 support. It sounds like the DWARF support was there for AArch64 from the beginning, so the version check in that place in the code can probably just be removed.
          knielsen Kristian Nielsen made changes -
          Attachment async_api_boost_context_v2.patch [ 74078 ]

          knielsen Kristian Nielsen added a comment -

          brad0, I attached a version 2 of the boost::context patch as async_api_boost_context_v2.patch
          This patch is functionally identical, but it is now enabled explicitly when running cmake: -DWITH_BOOST_CONTEXT=ON

          Note that this is when building libmariadb directly. When building the server, you need to pass -DCONC_WITH_BOOST_CONTEXT=ON to the server-level cmake; this will then be interpreted by the recursive libmariadb build as -DWITH_BOOST_CONTEXT. (Once this gets pushed in libmariadb I will add similar magic to the server to make -DWITH_BOOST_CONTEXT work directly there as well.)

          I talked to Georg Richter last week, and he seemed positive towards the idea. I have created a pull request for the two patches: https://github.com/mariadb-corporation/mariadb-connector-c/pull/257
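
The backend precedence described in this thread can be summarised in a small sketch: a native implementation wins on amd64, i386 and arm64; otherwise boost::context is used when explicitly enabled; otherwise ucontext if the platform has it; otherwise the non-blocking API is unavailable. The real choice is made at build time, so the runtime function below, and every `demo_*` / `DEMO_*` name in it, is a hypothetical stand-in used only to make the rules checkable.

```c
#include <assert.h>
#include <string.h>

enum demo_backend {
  DEMO_NATIVE_ASM,
  DEMO_BOOST_CONTEXT,
  DEMO_UCONTEXT,
  DEMO_DISABLED
};

/* Architectures the thread names as having a native implementation. */
static int demo_has_native_asm(const char *arch)
{
  static const char *const native[] = { "x86_64", "i386", "aarch64" };
  size_t i;
  for (i = 0; i < sizeof(native) / sizeof(native[0]); i++)
    if (strcmp(arch, native[i]) == 0)
      return 1;
  return 0;
}

/* with_boost_context models -DWITH_BOOST_CONTEXT=ON; have_ucontext models
   whether the platform provides ucontext at all. */
static enum demo_backend demo_pick_backend(const char *arch,
                                           int with_boost_context,
                                           int have_ucontext)
{
  assert(arch != NULL);
  if (demo_has_native_asm(arch))
    return DEMO_NATIVE_ASM;
  if (with_boost_context)
    return DEMO_BOOST_CONTEXT;
  if (have_ucontext)
    return DEMO_UCONTEXT;
  return DEMO_DISABLED;
}
```

For example, `demo_pick_backend("sparc64", 0, 0)` yields `DEMO_DISABLED`, which models a platform with neither a native backend nor ucontext and with the Boost fallback left off, while passing `1` for `with_boost_context` yields `DEMO_BOOST_CONTEXT`.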
          knielsen Kristian Nielsen made changes -
          Assignee Kristian Nielsen [ knielsen ] Georg Richter [ georg ]
          Status In Progress [ 3 ] In Review [ 10002 ]
          brad0 Brad Smith added a comment -

          The second rev works Ok and looks good to me. The option seems to function as expected.


          knielsen Kristian Nielsen added a comment -

          Great, thanks brad0!
          brad0 Brad Smith added a comment - - edited

          I have added the aarch64 diff on top of 11.4.3 to our -current MariaDB port.

          https://github.com/openbsd/ports/commit/3b37cbe7e14e952d9cc9a2a8a5a536dbdabb5d84


          knielsen Kristian Nielsen added a comment -

          Looks good Brad, thanks!

          Georg Richter has merged my patches to connector-c, so they will be part of the next release, both the AArch64 native one and the generic boost::context fallback.

          Thanks for the help with testing, much appreciated.

          - Kristian.
          knielsen Kristian Nielsen made changes -
          Assignee Georg Richter [ georg ] Kristian Nielsen [ knielsen ]
          knielsen Kristian Nielsen made changes -
          Status In Review [ 10002 ] Stalled [ 10000 ]

          serg Sergei Golubchik added a comment -

          Unfortunately, it now fails in buildbot in the server 10.6 branch on many (maybe all?) aarch64 builders. For example:

          main.non_blocking_api                    w3 [ fail ]
                  Test ended at 2024-10-17 21:44:45
           
          CURRENT_TEST: main.non_blocking_api
          mysqltest got signal 11
          read_command_buf (0xaaab00952210): CREATE TABLE t1 (a INT PABLE t1 EY)
           
          conn->name (0xaaab00969dd8): con_nonblock
           
          Attempting backtrace...
          stack_bottom = 0x0 thread_stack 0x3c000
          /usr/bin/mariadb-test(my_print_stacktrace+0x30)[0xaaaac42bd1b8]
          /usr/bin/mariadb-test(+0x63920)[0xaaaac4274920]
          addr2line: 'linux-vdso.so.1': No such file
          linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffa8f9f5b0]
          multiarch/memcpy_advsimd.S:101(__memcpy_simd)[0xffffa88cb068]
          /usr/bin/mariadb-test(+0x97d70)[0xaaaac42a8d70]
          /usr/bin/mariadb-test(ma_net_write_command+0x208)[0xaaaac42a91c0]
          /usr/bin/mariadb-test(mthd_my_send_cmd+0x118)[0xaaaac428ea10]
          /usr/bin/mariadb-test(_Z21wrap_mysql_send_queryP8st_mysqlPKcm+0x18)[0xaaaac4274bf0]
          /usr/bin/mariadb-test(_Z16run_query_normalP13st_connectionP10st_commandiPKcmP17st_dynamic_stringS6_+0x32c)[0xaaaac42852f4]
          /usr/bin/mariadb-test(_Z9run_queryP13st_connectionP10st_commandi+0x1e0)[0xaaaac4285538]
          /usr/bin/mariadb-test(main+0xc5c)[0xaaaac4273664]
          csu/libc-start.c:342(__libc_start_main)[0xffffa8866e10]
          /usr/bin/mariadb-test(+0x63620)[0xaaaac4274620]
          Writing a core file...
           
           - saving '/dev/shm/var/3/log/main.non_blocking_api/' to '/dev/shm/var/log/main.non_blocking_api/'

          See https://buildbot.mariadb.net/buildbot/grid?category=main&branch=10.6
          serg Sergei Golubchik made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 10.6 [ 24028 ]
          Fix Version/s 10.5 [ 23123 ]

          knielsen Kristian Nielsen added a comment -

          serg The fix for the failure is here: https://github.com/knielsen/mariadb-connector-c/commit/fe1517d15fc804e22a0321ecaf603746e2216c0f

          brad0 You will need to update the aarch64 diff in your -current MariaDB port with this patch, sorry for the trouble. There is a typo in the code that restores registers incorrectly and leads to crashes in some cases, depending on what code the compiler generates; we must have been unlucky to never trigger that in our testing.
          knielsen Kristian Nielsen made changes -
          Assignee Kristian Nielsen [ knielsen ] Sergei Golubchik [ serg ]

          knielsen Kristian Nielsen added a comment -

          I wrote an update for the documentation (https://github.com/mariadb-corporation/mariadb-connector-c/wiki/configuration_options) for the new build option. But I am not sure how to get that included?

          diff --git a/configuration_options.md b/configuration_options.md
          index 6e8da57..726b4f8 100644
          --- a/configuration_options.md
          +++ b/configuration_options.md
          @@ -27,6 +27,11 @@ If you want to use a different generator, e.g. for nmake on Windows, you need to
           |WITH_OPENSSL|ON| Possible values are ON or OFF. Not supported anymore since Connector/C 3.0|
           |WITH_SSL|SCHANNEL (windows), otherwise OPENSSL|Specifies type of TLS/SSL library. E.g. GNUTLS, OPENSSL or SCHANNEL (Windows only). OFF disables TLS/SSL functionality|
           
          +### Non-blocking client library options
          +| Option | Default | Description |
          +|-|-|-
          +|WITH_BOOST_CONTEXT|OFF| Use `boost::context` instead of `ucontext` for the non-blocking client API. Can be used to build the non-blocking API on platforms that do not have `ucontext`. Note that on x86_64 (aka amd64), i386, and aarch64 (aka arm64), a native implementation is always used over `ucontext` or `boost::context`. (Added in 3.3.12)|
          +
           ### Client plugins
           Client plugins can be configured as dynamic plugins (DYNAMIC) or built-in plugins (STATIC) by specifying the plugin name followed by suffix `_PLUGIN_TYPE` as key, and `DYNAMIC` or `STATIC` as value.

          brad0 Brad Smith added a comment - - edited

          Thanks. I'll push the bug fix.

          serg Sergei Golubchik made changes -
          Component/s libmariadb [ 14006 ]
          Fix Version/s 10.6.20 [ 29903 ]
          Fix Version/s 10.11.10 [ 29904 ]
          Fix Version/s 11.2.6 [ 29906 ]
          Fix Version/s 11.4.4 [ 29907 ]
          Fix Version/s 10.6 [ 24028 ]
          Resolution Fixed [ 1 ]
          Status Stalled [ 10000 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Assignee Sergei Golubchik [ serg ] Kristian Nielsen [ knielsen ]

          People

            knielsen Kristian Nielsen
            brad0 Brad Smith