MariaDB Server / MDEV-25760

Assertion failure on io_uring_cqe_get_data() returning -EAGAIN

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 10.6.0, 10.6.1
    • Fix Version/s: 10.6.2
    • Component/s: Server
    • Labels: None

    Description

      This crash is almost 100% reproducible on MariaDB Server 10.6 as of commit 7e1ec1550ceff29a983bf799622d97b73b79ce43, compiled with -DWITH_URING=yes.

      I run the sysbench-tpcc (https://github.com/Percona-Lab/sysbench-tpcc) prepare step on a 40-core machine as follows:

      ./tpcc.lua --mysql-host=yang04g --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --time=1200 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0 --mysql_table_options='DEFAULT CHARSET=utf8mb4' prepare
      

      against a similar 40-core machine running the MariaDB server. After several minutes of this workload, the server crashes.

      Backtrace:

      10.6 7e1ec1550ceff29a983bf799622d97b73b79ce43

      #0  0x00007f2e109d8aa1 in pthread_kill () from /lib64/libpthread.so.0
      #1  0x000055af0b0902c7 in my_write_core (sig=<optimized out>) at /root/krizhanovsky/server/mysys/stacktrace.c:424
      #2  0x000055af0abd3610 in handle_fatal_signal (sig=6) at /root/krizhanovsky/server/sql/signal_handler.cc:343
      #3  <signal handler called>
      #4  0x00007f2e10634387 in raise () from /lib64/libc.so.6
      #5  0x00007f2e10635a78 in abort () from /lib64/libc.so.6
      #6  0x000055af0a8da889 in ut_dbg_assertion_failed (expr=expr@entry=0x55af0b2a86a7 "cb->m_err == DB_SUCCESS", 
          file=file@entry=0x55af0b2a8a10 "/root/krizhanovsky/server/storage/innobase/os/os0file.cc", line=line@entry=3843)
          at /root/krizhanovsky/server/storage/innobase/ut/ut0dbg.cc:60
      #7  0x000055af0a8c3fe0 in io_callback (cb=<optimized out>) at /root/krizhanovsky/server/storage/innobase/os/os0file.cc:3843
      #8  io_callback (cb=<optimized out>) at /root/krizhanovsky/server/storage/innobase/os/os0file.cc:3841
      #9  0x000055af0b035668 in tpool::task_group::execute (this=0x55af0c8f27d0, t=0x55af0c917c78) at /root/krizhanovsky/server/tpool/task_group.cc:55
      #10 0x000055af0b0345af in tpool::thread_pool_generic::worker_main (this=0x55af0c816320, thread_var=0x55af0c823fc0) at /root/krizhanovsky/server/tpool/tpool_generic.cc:550
      #11 0x000055af0b0f7cff in execute_native_thread_routine ()
      #12 0x00007f2e109d3ea5 in start_thread () from /lib64/libpthread.so.0
      #13 0x00007f2e106fc8dd in clone () from /lib64/libc.so.6
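The assertion in frames #6 to #8 fires because the error recorded for the request reaches InnoDB's io_callback(), which expects every request to have succeeded. Below is a minimal mock of that path, not the server code: mock_aiocb, record_completion(), would_trip_success_assertion() and mock_io_callback() are hypothetical names, only the fields m_err and m_ret_len are taken from the patch in this report, and success is assumed to be encoded as m_err == 0.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical, simplified stand-in for the tpool I/O control block.
// Only the two fields touched by the patch in this report are modelled.
struct mock_aiocb
{
  int m_err= 0;             // positive errno on failure, 0 on success (assumed)
  std::size_t m_ret_len= 0; // bytes transferred on success
};

// Mirrors the branch shown in the patch: a negative completion result is
// a negated errno (for example -11, which is -EAGAIN on Linux).
void record_completion(mock_aiocb &cb, int res)
{
  if (res < 0)
  {
    cb.m_err= -res;
    cb.m_ret_len= 0;
  }
  else
  {
    cb.m_err= 0;
    cb.m_ret_len= static_cast<std::size_t>(res);
  }
}

// Reports whether an "every request succeeded" check would fail for cb.
bool would_trip_success_assertion(const mock_aiocb &cb)
{
  return cb.m_err != 0;
}

// Hypothetical stand-in for io_callback() in frames #7 and #8: like the
// real callback, it asserts success, so an EAGAIN completion aborts here.
void mock_io_callback(const mock_aiocb &cb)
{
  assert(!would_trip_success_assertion(cb));
}
```

For example, record_completion(cb, -EAGAIN) leaves cb.m_err == EAGAIN and cb.m_ret_len == 0, so would_trip_success_assertion(cb) is true and mock_io_callback(cb) would abort, as the server did.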
      

      The following patch

      --- a/tpool/aio_liburing.cc
      +++ b/tpool/aio_liburing.cc
      @@ -152,6 +152,9 @@ class aio_uring final : public tpool::aio
             if (res < 0)
             {
               iocb->m_err= -res;
      +        my_printf_error(ER_UNKNOWN_ERROR,
      +                        "io_uring_cqe_get_data() returned %d\n",
      +                        ME_ERROR_LOG | ME_FATAL, res);
               iocb->m_ret_len= 0;
             }
             else
      

      produces the line

      2021-05-23 11:07:09 0 [ERROR] mariadbd: io_uring_cqe_get_data() returned -11
      

      in the error log.
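On Linux, EAGAIN is 11, so the logged value -11 is -EAGAIN: a transient "try again" condition rather than a hard I/O error. The sketch below shows one possible policy, assuming such completions can simply be resubmitted: retry on -EAGAIN up to a bound, and record any other negative result as a positive errno, as the patch does. It is not necessarily the fix that went into 10.6.2; request, submit_fn, complete_with_retry() and max_attempts are hypothetical, and submit_fn stands in for submitting an I/O and waiting for its completion result.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>

// Hypothetical request record; m_err and m_ret_len follow the patch above.
struct request
{
  int m_err= 0;             // positive errno on failure, 0 on success
  std::size_t m_ret_len= 0; // bytes transferred on success
  int attempts= 0;          // how many times the request was submitted
};

// Hypothetical stand-in for "submit the I/O and wait for its completion":
// returns the completion result, i.e. bytes transferred or a negated errno.
using submit_fn= std::function<int(request &)>;

// Sketch of a retry policy (an assumption, not the committed fix): treat
// -EAGAIN as transient and resubmit, up to max_attempts submissions,
// instead of passing the error to a callback that asserts success.
// Any other negative result is recorded as a hard error.
void complete_with_retry(request &req, const submit_fn &submit,
                         int max_attempts)
{
  assert(max_attempts > 0);
  for (;;)
  {
    ++req.attempts;
    const int res= submit(req);
    if (res == -EAGAIN && req.attempts < max_attempts)
      continue;                 // transient: submit the request again
    if (res < 0)
    {
      req.m_err= -res;          // positive errno, as in the patch
      req.m_ret_len= 0;
    }
    else
    {
      req.m_err= 0;
      req.m_ret_len= static_cast<std::size_t>(res);
    }
    return;
  }
}
```

For example, with max_attempts = 5, a submit function that returns -EAGAIN twice and then 4096 leaves m_err == 0, m_ret_len == 4096 and attempts == 3. Bounding the retries keeps a persistently failing request from looping forever; once the bound is reached, the error is reported like any other.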

          Activity

            krizhanovsky Alexander Krizhanovsky created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            Issue Type Task [ 3 ] Bug [ 1 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Affects Version/s 10.6.1 [ 24437 ]
            Affects Version/s 10.6.0 [ 24431 ]
            Assignee Eugene Kosov [ kevg ]
            marko Marko Mäkelä made changes -
            Description: text unchanged; {code} and {noformat} markup added (the old and new values duplicate the description above)
            marko Marko Mäkelä made changes -
            Priority Major [ 3 ] Blocker [ 1 ]
            Summary Assertion failure on bad io_uring_cqe_get_data() return code Assertion failure on io_uring_cqe_get_data() returning -EAGAIN
            marko Marko Mäkelä made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-06-14 10:32:40.0 2021-06-14 10:32:40.993
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.2 [ 25800 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 122087 ] MariaDB v4 [ 159328 ]

            People

              Assignee: Marko Mäkelä (marko)
              Reporter: Alexander Krizhanovsky (krizhanovsky)
              Votes: 0
              Watchers: 6
